Fork of Lightning_Report adding:
- n8n_report_branch.json: workflow branch for storm-triggered report delivery
- report_service/: FastAPI microservice wrapping create_docx_report() so n8n can produce byte-identical reports without fighting the Python Code sandbox
Lightning Report Generator
A comprehensive Python application for analyzing lightning strike data in relation to wind turbine locations and generating detailed DOCX reports with risk assessments, visualizations, and statistical analysis.
Overview
This application processes lightning strike data and wind turbine coordinates to:
- Calculate lightning risk scores for each turbine using advanced mathematical models
- Generate interactive maps showing lightning strikes and turbine locations
- Create statistical analysis and histograms with temporal distribution
- Group turbines based on proximity and risk levels
- Generate comprehensive DOCX reports with visualizations and risk assessment charts
- Support storm cell analysis and mapping
- Provide detailed risk score interpretation and calculation methodology
Features
Core Analysis
- Risk Assessment: Fast per-turbine scoring using BallTree radius queries (Haversine metric) with automatic fallback to vectorized matrix math
- Advanced Risk Formula: Risk = P₀ × (1 + α×Current/10000) × e^(-α×Distance), with configurable parameters
- Geospatial Analysis: Vectorized Haversine utilities and configurable distance rings
- Statistical Analysis: Lightning density, frequency, and temporal distribution analysis
- Daily Lightning Density: Calculates daily average using actual number of days in date range (not fixed month)
- Turbine Grouping: Proximity-based clustering using DBSCAN (Haversine) with graceful fallback to O(N^2) grouping for small datasets
API Integration
- Automated Data Fetching: Fetch lightning and storm data directly from API
- Flexible Location Bounds: Auto-calculate center + radius from turbines or specify manually
- Date Range Management: Auto-detect actual period from data or use manual date ranges
- Batch Processing: Process multiple wind farms in a single run
- Error Handling: Graceful handling of empty data, API timeouts, and failures
Visualization
- Interactive Maps: Plotly-based coordinate-plane maps for CG/IC lightning with ring-aware coloring
- Risk Score Heatmap: 2D visualization with current magnitude on X-axis (up to 300k amps) and distance on Y-axis, with contour curves
- Fixed Interval Coloring: Consistent color gradient mapping (blue to red) based on predefined risk score ranges (0.1-1.5)
- Lightning Histograms: Temporal distribution of lightning events with peak detection
- Storm Cell Maps: Visualization of storm cell data (when available)
- Coordinate Plane Views: Standard geographic orientation (latitude on Y-axis, longitude on X-axis)
Reporting
- DOCX Generation: Reports are produced as Word (DOCX) documents
- Risk Score Chart: Integrated heatmap showing distance vs. current magnitude relationship
- Multiple Map Types: Coordinate plane maps for different lightning types
- Statistical Tables: Detailed lightning strike information with proximity data (precomputed distances)
- Risk Summaries: Grouped risk analysis and recommendations with fixed interval color coding
- Enhanced Appendix: Detailed methodology explanations including risk calculation method, interpretation guide, and algorithm descriptions
Data Processing
- JSON Data Loading: Support for various JSON data structures
- Date Range Filtering: Configurable analysis periods
- Date/Time Formatting: Centralized, consistent DD-MM-YYYY and DD-MM-YYYY HH:MM:SS formatting
- Data Validation: Comprehensive input validation and error handling
- Precomputation: Shared per-group distance and ring-index precompute reused by maps and tables
- Coordinate Conversion: UTM ED50 to WGS84 coordinate system conversion
Installation
Prerequisites
- Python 3.8 or higher
- pip package manager
Dependencies
Install the required packages:
pip install -r requirements.txt
Required Packages
- pandas>=1.5.0 - Data manipulation and analysis
- numpy>=1.21.0 - Numerical computations
- plotly>=5.15.0 - Interactive visualizations
- kaleido>=0.2.1 - Static image export for Plotly
- scikit-learn>=1.3.0 - BallTree radius queries and DBSCAN clustering (used when available)
- requests>=2.31.0 - API HTTP requests
- python-dotenv>=1.0.0 - Environment variable management
- python-docx>=1.1.2 - DOCX (Word) report generation
Optional Dependencies
For coordinate conversion functionality:
pip install -r utm_converter_requirements.txt
Configuration
The application supports two modes of operation:
1. Single Report Generation (Legacy Mode)
Uses src/config.py for configuration. See the legacy section below for details.
2. Batch Report Generation (Recommended)
Uses wind_farms_config.json for multi-farm batch processing with API integration.
Setup
- Create a .env file with your API key:
API_KEY=your_api_key_here
- Create wind_farms_config.json:
{
  "api_config": {
    "base_url": "https://risk.tarla.io/api",
    "timeout_seconds": 30,
    "retry_attempts": 3,
    "default_query_range": {
      "method": "current_month"
    }
  },
  "output_base_directory": "reports/",
  "default_padding_km": 5,
  "wind_farms": [
    {
      "farm_id": "dagpazari_RES",
      "name": "Dağpazarı RES",
      "enabled": true,
      "coordinates_file": "/path/to/coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "ring_colors": ["purple", "red", "orange", "coral", "green"],
      "api_params": {
        "location_bounds": {
          "method": "auto",
          "padding_km": 5
        },
        "date_range": {
          "method": "auto",
          "query_range": {
            "method": "current_month"
          }
        }
      },
      "report_config": {
        "output_directory": "reports/dagpazari_RES/",
        "wind_farm_name": "Dağpazarı RES"
      }
    }
  ]
}
Configuration Parameters
Farm-Level Settings:
- enabled: true/false - Enable/disable report generation for this farm
- distance_rings: Array of distance rings in meters (e.g., [1000, 2000, 3000, 4000, 10000])
- ring_colors: Array of colors for each ring
- coordinates_file: Path to turbine coordinates JSON file
Location Bounds:
- method: "auto" (calculate from turbines) or "manual" (specify)
- padding_km: Extra buffer beyond max distance ring (default: 5km)
- For manual: provide center_lat, center_lng, radius_km
Date Range:
- method: "auto" (detect from data) or "manual" (specify)
- For manual: provide start_date and end_date in DD-MM-YYYY format
- For auto: specify query_range to control API query period
Query Range Options (for auto mode):
- "current_month": First day of current month to today
- "last_month": Entire previous month
- "days_back": Last N days (requires days parameter)
- "custom": Specific dates (requires start_date and end_date)
Global Configuration (src/config.py)
The src/config.py file now only contains global defaults:
- Risk calculation parameters (risk_params)
- Histogram parameters (histogram_params)
- PDF layout parameters (pdf_params)
- Grouping parameters (grouping_params)
Note: Farm-specific settings (distance_rings, ring_colors, wind_farm_name, file paths, date ranges) are managed in wind_farms_config.json and should NOT be configured in config.py.
Location Bounds Auto-Calculation
When location_bounds.method = "auto", the system calculates:
- Centroid (Center Point):
  - center_lat = average of all turbine latitudes
  - center_lng = average of all turbine longitudes
- Maximum Distance from Centroid:
  - Calculates distance from centroid to each turbine
  - Finds the maximum distance
- Total Radius: radius_km = (max_turbine_distance / 1000) + (max_distance_ring / 1000) + padding_km
Example: If turbines span 2.5km from centroid, max ring is 10km, padding is 5km:
- Total radius = 2.5 + 10 + 5 = 17.5km
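The auto-calculation steps can be sketched in a few lines of Python. This is an illustration only, not the project's implementation: the function names `haversine_m` and `auto_location_bounds` and the 6,371 km Earth radius are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; assumed for this sketch


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def auto_location_bounds(turbines, distance_rings_m, padding_km=5.0):
    """Hypothetical helper: centroid plus total query radius in km."""
    # Step 1: centroid as the plain average of turbine coordinates
    center_lat = sum(t["lat"] for t in turbines) / len(turbines)
    center_lng = sum(t["lng"] for t in turbines) / len(turbines)
    # Step 2: farthest turbine from the centroid, in meters
    max_turbine_distance = max(
        haversine_m(center_lat, center_lng, t["lat"], t["lng"]) for t in turbines
    )
    # Step 3: farthest turbine + largest ring + padding, all in km
    radius_km = max_turbine_distance / 1000 + max(distance_rings_m) / 1000 + padding_km
    return {"center_lat": center_lat, "center_lng": center_lng, "radius_km": radius_km}


turbines = [{"lat": 39.0, "lng": 26.7}, {"lat": 39.1, "lng": 26.7}]
bounds = auto_location_bounds(turbines, [1000, 2000, 3000, 4000, 10000])
```

With the two sample turbines the farthest one lies roughly 5.56 km from the centroid, so the total radius comes out at roughly 20.56 km.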
Date Range Handling
- If date_range.method = "auto": Uses query_range to determine what dates to fetch; the report uses those query dates for the analyzed period.
- If date_range.method = "manual": Uses specified start_date and end_date for both API fetch and report (supports DD-MM-YYYY or ISO with time, e.g. 2026-01-22T07:00:00Z).
Daily Lightning Density Calculation
The daily lightning density is calculated using the actual number of days in the analysis period:
daily_lightning_per_km2 = total_lightning_per_km2 / actual_days_in_range
Where actual_days_in_range is calculated from the start and end dates (inclusive).
Example:
- Date range: September 1-15 (15 days)
- Total lightning density: 150 events/km²
- Daily lightning density: 150 / 15 = 10 events/km²/day
This ensures accurate daily averages for partial months or custom date ranges.
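The calculation above, with its inclusive day count, can be written as a short function. The name `daily_lightning_density` is chosen for this example and is not taken from the codebase.

```python
from datetime import date


def daily_lightning_density(total_per_km2, start, end):
    """Divide total density by the number of days in [start, end], inclusive."""
    actual_days_in_range = (end - start).days + 1  # +1 makes the end date inclusive
    if actual_days_in_range <= 0:
        raise ValueError("end date must not be before start date")
    return total_per_km2 / actual_days_in_range


# September 1-15 is 15 days, so 150 events/km² gives 10 events/km²/day
density = daily_lightning_density(150.0, date(2025, 9, 1), date(2025, 9, 15))
```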
Risk Score Categories
The system uses fixed interval coloring based on specific risk score ranges:
- Very Low Risk (<0.1): Blue - Distant lightning with low current
- Low Risk (0.1-0.2): Teal - Moderate distance lightning
- Med-Low Risk (0.2-0.4): Green - Closer lightning
- Medium Risk (0.4-0.6): Yellow - Moderate risk lightning
- Med-High Risk (0.6-0.8): Orange - High risk lightning
- High Risk (0.8-1.0): Dark Orange - Very high risk lightning
- Very High Risk (1.0-1.2): Red - Extreme risk lightning
- Critical Risk (>1.2): Dark Red - Critical risk lightning
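One way to map a score to its category is a sorted list of boundaries and a binary search. The sketch below is not the project's coloring code. The list above does not say which category a score exactly on a boundary belongs to, so the sketch assumes it falls into the higher one.

```python
from bisect import bisect_right

# Upper boundaries between adjacent categories, taken from the list above
_BOUNDS = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
_CATEGORIES = [
    ("Very Low Risk", "Blue"),
    ("Low Risk", "Teal"),
    ("Med-Low Risk", "Green"),
    ("Medium Risk", "Yellow"),
    ("Med-High Risk", "Orange"),
    ("High Risk", "Dark Orange"),
    ("Very High Risk", "Red"),
    ("Critical Risk", "Dark Red"),
]


def risk_category(score):
    """Return (label, color) for a risk score; boundary handling is an assumption."""
    return _CATEGORIES[bisect_right(_BOUNDS, score)]


label, color = risk_category(0.5)  # falls in the 0.4-0.6 interval
```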
Grouping vs Analysis Radius
- grouping_params.max_distance_m (meters): Controls ONLY turbine clustering (grouping). If set (>0), it overrides ring-based grouping. Used to decide which turbines are in the same group.
- grouping_params.distance_ring_index (0-based): Selects a ring from distance_rings.
  - For grouping: used only if max_distance_m is not set; determines grouping radius.
  - For analysis (histogram, stats, report labels): ALWAYS used to choose the analysis radius/cutoff. Does not change grouping when max_distance_m is provided.
Examples
- If max_distance_m=2500 and distance_ring_index=4 (10 km ring):
  - Grouping radius = 2.5 km (from max_distance_m)
  - Analysis radius = 10 km (from distance_ring_index)
- If max_distance_m unset and distance_ring_index=1 (2 km ring):
  - Grouping radius = 2 km
  - Analysis radius = 2 km
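The precedence between the two parameters can be expressed as a small function. `resolve_radii` is a hypothetical name used only to illustrate the rule described above.

```python
def resolve_radii(distance_rings, distance_ring_index, max_distance_m=None):
    """Return (grouping_radius_m, analysis_radius_m) following the rules above."""
    # The analysis radius always comes from the selected ring
    analysis_radius = distance_rings[distance_ring_index]
    # The grouping radius prefers max_distance_m when it is set to a positive value
    if max_distance_m is not None and max_distance_m > 0:
        grouping_radius = max_distance_m
    else:
        grouping_radius = analysis_radius
    return grouping_radius, analysis_radius


rings = [1000, 2000, 3000, 4000, 10000]
first_example = resolve_radii(rings, 4, max_distance_m=2500)   # (2500, 10000)
second_example = resolve_radii(rings, 1)                       # (2000, 2000)
```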
Clustering Algorithm
- Preferred: DBSCAN with Haversine metric
  - Convert lat/lng to radians; eps = (radius_km / 6371), min_samples=1
  - Clusters are formed transitively (density reachability). Example with R=2 km: A–B=1.5 km, B–C=1.5 km, A–C=3.0 km → one cluster {A,B,C} due to B bridging A and C
- Fallback: Greedy O(N^2) proximity grouping if scikit-learn is unavailable
  - Starts a group at turbine i; adds any j within R of i; moves on. No transitive chaining
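The difference between the two strategies shows up on the A, B, C example. The sketch below uses only the standard library and is not the project's grouping code. With min_samples=1 every point is a core point, so DBSCAN clusters are the connected components of the "within R" graph, and the sketch reproduces that with a union-find rather than calling scikit-learn. All function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) pairs in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def transitive_groups(points, radius_km):
    """Connected components of the radius graph (DBSCAN-like, min_samples=1)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= radius_km:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def greedy_groups(points, radius_km):
    """Greedy fallback: seed at i, add unassigned j within R of i, no chaining."""
    assigned, groups = set(), []
    for i in range(len(points)):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, len(points)):
            if j not in assigned and haversine_km(points[i], points[j]) <= radius_km:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups


# A, B, C spaced 1.5 km apart along a meridian: A-B=1.5, B-C=1.5, A-C=3.0
step = math.degrees(1.5 / EARTH_RADIUS_KM)
points = [(39.0, 26.7), (39.0 + step, 26.7), (39.0 + 2 * step, 26.7)]
chained = transitive_groups(points, 2.0)
greedy = greedy_groups(points, 2.0)
```

With R=2 km the transitive variant returns a single group containing all three turbines, while the greedy variant leaves C in a group of its own because C is 3.0 km from the seed A.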
Wind Farm Configuration
wind_farm_name = "Your Wind Farm Name"
Usage
Batch Report Generation (Recommended)
Generate reports for multiple wind farms automatically:
# Process all enabled farms
python batch_generate.py --config wind_farms_config.json
# Process specific farm
python batch_generate.py --config wind_farms_config.json --farm-id dagpazari_RES
# List farms and their enabled status
python batch_generate.py --config wind_farms_config.json --list-farms
# Process all farms (ignore enabled flag)
python batch_generate.py --config wind_farms_config.json --force-all
The batch system will:
- Load configuration from wind_farms_config.json
- For each enabled farm:
  - Load turbine coordinates
  - Auto-calculate location bounds (center + radius) from turbines
  - Determine date range for API query
  - Fetch lightning data from API
  - Fetch storm data from API
  - Calculate risk scores
  - Generate DOCX report
  - Save to farm's output directory
- Generate batch summary report
Single Report Generation (Legacy)
Run the main application for a single report:
python main.py
The application will:
- Load lightning and turbine data from configured JSON files (in src/config.py)
- Calculate risk scores for each turbine using the advanced risk formula
- Create turbine groups based on proximity
- Generate visualizations including the new risk score heatmap
- Create a comprehensive DOCX report with enhanced appendix
Data Format Requirements
Lightning Data JSON
{
  "data": [
    {
      "lat": 39.85420,
      "lng": 26.71218,
      "local_time": "2025-07-15T14:30:25",
      "current": -15000,
      "p_type": "0",
      "height": 5000
    }
  ]
}
Required Fields:
- lat, lng: Lightning strike coordinates
- local_time: Timestamp (various formats supported)
- current: Lightning current in amperes
- p_type: Lightning type ("0" for cloud-to-ground, others for intercloud)
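A minimal loader for this structure might look like the following. It is a sketch rather than the project's loader: it handles only the {"data": [...]} shape shown above, and the `load_strikes` name and the added "kind" key are illustrative choices.

```python
import json

REQUIRED_FIELDS = ("lat", "lng", "local_time", "current", "p_type")


def load_strikes(text):
    """Parse lightning JSON, check required fields, and tag each strike CG or IC."""
    payload = json.loads(text)
    strikes = []
    for index, record in enumerate(payload["data"]):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record {index} is missing fields: {missing}")
        # p_type "0" is cloud-to-ground; every other value is treated as intercloud
        kind = "CG" if str(record["p_type"]) == "0" else "IC"
        strikes.append({**record, "kind": kind})
    return strikes


sample = """
{"data": [
  {"lat": 39.8542, "lng": 26.71218, "local_time": "2025-07-15T14:30:25",
   "current": -15000, "p_type": "0", "height": 5000},
  {"lat": 39.86, "lng": 26.72, "local_time": "2025-07-15T14:31:02",
   "current": 8000, "p_type": "1", "height": 7000}
]}
"""
strikes = load_strikes(sample)
```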
Turbine Data JSON
[
  {
    "lat": 39.85420,
    "lng": 26.71218,
    "turbine_id": "T001"
  }
]
Required Fields:
- lat, lng: Turbine coordinates
- turbine_id: Unique turbine identifier
Advanced Usage
Coordinate Conversion
Convert UTM ED50 coordinates to WGS84:
python utm_ed50_to_wgs84_converter.py input.csv output.csv
Data Separation by Month
Separate large JSON files by month:
python separate_by_month.py input_data.json [output_directory]
Output
DOCX Report Structure
- Cover Page: Wind farm information and analysis period
- Report Summary: Automated narrative summary (Gemini-backed when available)
- Risk Analysis: Detailed risk scores and rankings with fixed interval coloring
- Lightning Maps: Coordinate plane visualizations with proper geographic orientation
- Statistical Analysis: Lightning density and frequency data
- Detailed Tables: Complete lightning strike information with color-coded distance rings
- Storm Analysis: Storm cell data and maps (if available)
- Enhanced Appendix: Comprehensive methodology including:
- Risk calculation method and formula explanation
- Risk score interpretation guide
- Centroid and distance ring calculation methodology
- Turbine grouping algorithm description
- Frequent lightning activity period detection algorithm
Generated Files
Single Report Mode:
- lightning_report.log: Application execution log
- {wind_farm_name}_lightning_report.docx: Main DOCX report
- Interactive HTML maps (temporary files)
Batch Generation Mode:
- batch_generation_YYYY-MM-DD.log: Batch execution log
- batch_summary_YYYY-MM-DD.json: Batch processing summary
- {farm_id}_report.docx: DOCX report for each farm (in respective output directories)
Project Structure
lightning_report/
├── main.py # Single report generation (legacy)
├── batch_generate.py # Batch report generation with API
├── wind_farms_config.json # Batch configuration file
├── .env # API credentials (gitignored)
├── requirements.txt # Python dependencies
├── src/
│ ├── config.py # Global configuration defaults
│ ├── api/
│ │ └── data_fetcher.py # API integration for data fetching
│ ├── data/
│ │ └── loader.py # Data loading and validation
│ ├── analysis/
│ │ ├── geospatial.py # Distance calculations (vectorized Haversine)
│ │ ├── grouping.py # Turbine grouping (DBSCAN + fallback)
│ │ ├── histogram.py # Temporal analysis
│ │ ├── risk.py # Risk calculation (BallTree + fallback)
│ │ └── statistics.py # Statistical analysis (includes daily density)
│ ├── reporting/
│ │ ├── docx.py # DOCX report generation
│ │ ├── docx_sections.py # Shared DOCX helpers (charts/tables)
│ │ └── precompute.py # Shared precomputations (distances, ring indices)
│ ├── visualization/
│ │ ├── maps.py # Map generation with risk score heatmap
│ │ └── storm_cells.py # Storm cell visualization
│ └── utils.py # Utility functions including fixed interval coloring
├── separate_by_month.py # Data separation utility
└── utm_ed50_to_wgs84_converter.py # Coordinate conversion
Configuration Examples
Batch Generation Setup
Example: Multiple Farms with Different Settings
{
  "api_config": {
    "base_url": "https://risk.tarla.io/api",
    "timeout_seconds": 30,
    "retry_attempts": 3
  },
  "wind_farms": [
    {
      "farm_id": "farm1",
      "name": "Farm 1",
      "enabled": true,
      "coordinates_file": "/path/to/farm1_coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "api_params": {
        "location_bounds": {
          "method": "auto",
          "padding_km": 5
        },
        "date_range": {
          "method": "manual",
          "start_date": "01-09-2025",
          "end_date": "30-09-2025"
        }
      },
      "report_config": {
        "output_directory": "reports/farm1/",
        "wind_farm_name": "Farm 1"
      }
    },
    {
      "farm_id": "farm2",
      "name": "Farm 2",
      "enabled": false,
      "coordinates_file": "/path/to/farm2_coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "api_params": {
        "location_bounds": {
          "method": "manual",
          "center_lat": 36.90,
          "center_lng": 33.575,
          "radius_km": 35
        },
        "date_range": {
          "method": "auto",
          "query_range": {
            "method": "days_back",
            "days": 30
          }
        }
      },
      "report_config": {
        "output_directory": "reports/farm2/",
        "wind_farm_name": "Farm 2"
      }
    }
  ]
}
Custom Risk Parameters
# Adjust risk calculation sensitivity in src/config.py
risk_params = {
    'P_0': 1.5,            # Higher base probability
    'alpha': 0.3,          # Slower distance decay
    'current_weight': 0.2  # Higher current importance
}
Note: Farm-specific settings (distance_rings, ring_colors, etc.) should be configured in wind_farms_config.json, not in config.py.
Risk Score Methodology
Risk Calculation Formula
The system uses an advanced risk calculation formula:
Risk = P₀ × (1 + α×Current/10000) × e^(-α×Distance)
Where:
- P₀: Base probability (configurable)
- α: Distance decay factor (configurable)
- Current: Lightning current magnitude in amperes
- Distance: Distance from turbine in kilometers
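Written out in Python, the formula reads as follows. The sketch follows the formula exactly as stated, with α appearing in both the current term and the distance decay. It does not model the separate current_weight key shown in the configuration example, and the parameter values in the usage lines are illustrative only.

```python
import math


def risk_score(current_a, distance_km, p0, alpha):
    """Risk = P0 * (1 + alpha * |current| / 10000) * exp(-alpha * distance)."""
    current_magnitude = abs(current_a)  # the formula uses current magnitude
    return p0 * (1 + alpha * current_magnitude / 10000) * math.exp(-alpha * distance_km)


# Illustrative parameter values, not project defaults
near = risk_score(-15000, 1.0, p0=1.5, alpha=0.3)
far = risk_score(-15000, 5.0, p0=1.5, alpha=0.3)
```

For a fixed current the score decays exponentially with distance, so `near` is larger than `far`. At zero current and zero distance the score equals P₀.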
Risk Score Interpretation
The risk score heatmap provides a visual reference for interpreting risk levels:
- X-axis: Lightning current magnitude (1,000 to 300,000 amperes)
- Y-axis: Distance from turbine (0.1 km to max distance ring, dynamically scaled)
- Color intensity: Risk score level (blue to red gradient using palette: F94144, F3722C, F8961E, F9C74F, 90BE6D, 43AA8B, 577590)
- Contour curves: Specific risk level boundaries (0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.5)
API Integration
The system integrates with the Tarla.io API for automated data fetching:
Endpoints:
- Lightning data: https://risk.tarla.io/api/lightning-data/historical/
- Storm data: https://risk.tarla.io/api/storm-data/historical/
Authentication:
- API key stored in .env file as API_KEY
- Sent as x-api-key header in requests
Request Format:
- Query type: circle (center + radius)
- Parameters: centerLatitude, centerLongitude, radius (in meters), startDate, endDate
- Date format: YYYY-MM-DD
Response Handling:
- Automatically converts API responses to expected DataFrame format
- Handles empty datasets gracefully
- Validates data structure before processing
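The request format described above could be assembled roughly as shown below. The sketch only builds the request object and does not send it. Two details are assumptions rather than documented behavior: that the parameters travel in the query string and that the HTTP method is GET. How the circle query type is encoded is not specified here, so the sketch leaves it out. The function name is hypothetical.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://risk.tarla.io/api"


def build_lightning_request(center_lat, center_lng, radius_m, start, end, api_key):
    """Build (but do not send) a historical lightning data request."""
    params = {
        "centerLatitude": center_lat,
        "centerLongitude": center_lng,
        "radius": radius_m,              # meters
        "startDate": start.isoformat(),  # YYYY-MM-DD
        "endDate": end.isoformat(),
    }
    url = f"{BASE_URL}/lightning-data/historical/?{urlencode(params)}"
    # GET is an assumption; the API key goes in the x-api-key header
    return Request(url, headers={"x-api-key": api_key}, method="GET")


request = build_lightning_request(
    36.90, 33.575, 17500, date(2025, 9, 1), date(2025, 9, 15), api_key="your_api_key_here"
)
```

In real use the key would come from the API_KEY entry in .env rather than a literal, and the timeout_seconds value from api_config would be passed as the timeout when the request is sent.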
Troubleshooting
Common Issues
- API Authentication Errors (401 Unauthorized)
  - Verify .env file exists with API_KEY=your_key
  - Check that API key is correct and active
  - Ensure special characters in the API key are preserved exactly (e.g., == at the end)
- API Timeout Errors
  - Increase timeout_seconds in api_config
  - Check network connectivity
  - Verify API endpoint is accessible
- File Not Found Errors
  - For batch mode: Verify file paths in wind_farms_config.json
  - For single mode: Verify file paths in src/config.py
  - Ensure JSON files exist and are readable
- Data Validation Errors
  - Check JSON format matches required structure
  - Verify coordinate values are valid numbers
  - Ensure timestamp format is supported
  - For API data: Check API response format matches expected structure
- Empty Data / NaT Errors
  - System handles empty datasets gracefully
  - Check API date range - data might not exist for specified period
  - Verify location bounds cover the area of interest
  - Check logs for API response details
- Memory Issues with Large Datasets
  - Use separate_by_month.py to split large files
  - Adjust analysis period to smaller time ranges
  - Process farms individually using --farm-id flag
- DOCX Generation Errors
  - Ensure sufficient disk space
  - Check write permissions for output directory
- Risk Score Heatmap Issues
  - Verify distance_rings configuration is valid
  - Check that lightning data contains valid current values
  - Ensure turbine coordinates are properly formatted
- Batch Generation Issues
  - Check batch_summary_YYYY-MM-DD.json for detailed error information
  - Verify all farms have valid configuration
  - Check batch_generation_YYYY-MM-DD.log for detailed logs
  - Use --list-farms to verify farm configuration
Logging
Single Report Mode:
lightning_report.log: Application execution log
Batch Generation Mode:
- batch_generation_YYYY-MM-DD.log: Batch execution log with per-farm details
- batch_summary_YYYY-MM-DD.json: Structured summary of batch processing
Logs include:
- Data loading progress
- API request/response details
- Risk calculation details
- Error messages and stack traces
- Performance metrics
- Farm processing status
Performance Considerations
- Large Datasets: For datasets with >100,000 lightning strikes, consider:
  - Using date range filtering
  - Splitting data by month
  - Increasing system memory allocation
- Optimizations used:
  - BallTree neighbor queries for CG risk scoring (O(n log n) build; sublinear queries)
  - DBSCAN clustering with Haversine metric for grouping; O(N^2) fallback maintained
  - Vectorized Haversine distance utilities (array-based)
  - Shared per-group precomputation of distances and ring indices reused by maps and tables
  - Centralized date/time parsing and formatting
  - Efficient risk score heatmap generation with contour overlay
Contributing
- Follow the existing code structure and naming conventions
- Add appropriate error handling and logging
- Update configuration options as needed
- Test with various data formats and sizes
- Update documentation for new features
- Maintain consistency with the fixed interval coloring system
License
This project is proprietary software. All rights reserved.
Support
For technical support or feature requests, please contact the development team with:
- Detailed error messages
- Sample data (if possible)
- System configuration details
- Expected vs actual behavior description