BE-LightningReport/README.md
erdemerikci 45d80dfaa6 Initial import: Lightning_Report with n8n integration
Fork of Lightning_Report adding:
- n8n_report_branch.json: workflow branch for storm-triggered report delivery
- report_service/: FastAPI microservice wrapping create_docx_report() so n8n
  can produce byte-identical reports without fighting the Python Code sandbox

Made-with: Cursor
2026-04-22 15:13:08 +03:00


Lightning Report Generator

A comprehensive Python application for analyzing lightning strike data in relation to wind turbine locations and generating detailed DOCX reports with risk assessments, visualizations, and statistical analysis.

Overview

This application processes lightning strike data and wind turbine coordinates to:

  • Calculate lightning risk scores for each turbine from strike current and distance, using an exponential distance-decay formula
  • Generate interactive maps showing lightning strikes and turbine locations
  • Create statistical analysis and histograms with temporal distribution
  • Group turbines based on proximity and risk levels
  • Generate comprehensive DOCX reports with visualizations and risk assessment charts
  • Support storm cell analysis and mapping
  • Provide detailed risk score interpretation and calculation methodology

Features

Core Analysis

  • Risk Assessment: Fast per-turbine scoring using BallTree radius queries (Haversine metric) with automatic fallback to vectorized matrix math
  • Advanced Risk Formula: Risk = P₀ × (1 + α×Current/10000) × e^(-α×Distance) with configurable parameters
  • Geospatial Analysis: Vectorized Haversine utilities and configurable distance rings
  • Statistical Analysis: Lightning density, frequency, and temporal distribution analysis
  • Daily Lightning Density: Calculates the daily average using the actual number of days in the date range (not a fixed month length)
  • Turbine Grouping: Proximity-based clustering using DBSCAN (Haversine) with graceful fallback to O(N^2) grouping for small datasets

API Integration

  • Automated Data Fetching: Fetch lightning and storm data directly from API
  • Flexible Location Bounds: Auto-calculate center + radius from turbines or specify manually
  • Date Range Management: Auto-detect actual period from data or use manual date ranges
  • Batch Processing: Process multiple wind farms in a single run
  • Error Handling: Graceful handling of empty data, API timeouts, and failures

Visualization

  • Interactive Maps: Plotly-based coordinate-plane maps for CG/IC lightning with ring-aware coloring
  • Risk Score Heatmap: 2D visualization with current magnitude on X-axis (up to 300k amps) and distance on Y-axis, with contour curves
  • Fixed Interval Coloring: Consistent color gradient mapping (blue to red) based on predefined risk score ranges (0.1-1.5)
  • Lightning Histograms: Temporal distribution of lightning events with peak detection
  • Storm Cell Maps: Visualization of storm cell data (when available)
  • Coordinate Plane Views: Standard geographic orientation (latitude on Y-axis, longitude on X-axis)

Reporting

  • DOCX Generation: Word (.docx) reports built with python-docx
  • Risk Score Chart: Integrated heatmap showing distance vs. current magnitude relationship
  • Multiple Map Types: Coordinate plane maps for different lightning types
  • Statistical Tables: Detailed lightning strike information with proximity data (precomputed distances)
  • Risk Summaries: Grouped risk analysis and recommendations with fixed interval color coding
  • Enhanced Appendix: Detailed methodology explanations including risk calculation method, interpretation guide, and algorithm descriptions

Data Processing

  • JSON Data Loading: Support for various JSON data structures
  • Date Range Filtering: Configurable analysis periods
  • Date/Time Formatting: Centralized, consistent DD-MM-YYYY and DD-MM-YYYY HH:MM:SS formatting
  • Data Validation: Comprehensive input validation and error handling
  • Precomputation: Shared per-group distance and ring-index precompute reused by maps and tables
  • Coordinate Conversion: UTM ED50 to WGS84 coordinate system conversion

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Dependencies

Install the required packages:

pip install -r requirements.txt

Required Packages

  • pandas>=1.5.0 - Data manipulation and analysis
  • numpy>=1.21.0 - Numerical computations
  • plotly>=5.15.0 - Interactive visualizations
  • kaleido>=0.2.1 - Static image export for Plotly
  • scikit-learn>=1.3.0 - BallTree radius queries and DBSCAN clustering (used when available)
  • requests>=2.31.0 - API HTTP requests
  • python-dotenv>=1.0.0 - Environment variable management
  • python-docx>=1.1.2 - DOCX (Word) report generation

Optional Dependencies

For coordinate conversion functionality:

pip install -r utm_converter_requirements.txt

Configuration

The application supports two modes of operation:

1. Single Report Generation (Legacy Mode)

Uses src/config.py for configuration. See the legacy section below for details.

2. Batch Report Generation (API Integration)

Uses wind_farms_config.json for multi-farm batch processing with API integration.

Setup

  1. Create .env file with your API key:
API_KEY=your_api_key_here
  2. Create wind_farms_config.json:
{
  "api_config": {
    "base_url": "https://risk.tarla.io/api",
    "timeout_seconds": 30,
    "retry_attempts": 3,
    "default_query_range": {
      "method": "current_month"
    }
  },
  "output_base_directory": "reports/",
  "default_padding_km": 5,
  "wind_farms": [
    {
      "farm_id": "dagpazari_RES",
      "name": "Dağpazarı RES",
      "enabled": true,
      "coordinates_file": "/path/to/coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "ring_colors": ["purple", "red", "orange", "coral", "green"],
      "api_params": {
        "location_bounds": {
          "method": "auto",
          "padding_km": 5
        },
        "date_range": {
          "method": "auto",
          "query_range": {
            "method": "current_month"
          }
        }
      },
      "report_config": {
        "output_directory": "reports/dagpazari_RES/",
        "wind_farm_name": "Dağpazarı RES"
      }
    }
  ]
}

Configuration Parameters

Farm-Level Settings:

  • enabled: true/false - Enable/disable report generation for this farm
  • distance_rings: Array of distance rings in meters (e.g., [1000, 2000, 3000, 4000, 10000])
  • ring_colors: Array of colors for each ring
  • coordinates_file: Path to turbine coordinates JSON file

Location Bounds:

  • method: "auto" (calculate from turbines) or "manual" (specify)
  • padding_km: Extra buffer beyond max distance ring (default: 5km)
  • For manual: provide center_lat, center_lng, radius_km

Date Range:

  • method: "auto" (detect from data) or "manual" (specify)
  • For manual: provide start_date and end_date in DD-MM-YYYY format
  • For auto: specify query_range to control API query period

Query Range Options (for auto mode):

  • "current_month": First day of current month to today
  • "last_month": Entire previous month
  • "days_back": Last N days (requires days parameter)
  • "custom": Specific dates (requires start_date and end_date)
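The four query range options can be read as a small date calculation. The sketch below is illustrative only: resolve_query_range is a hypothetical helper, not a function from this project, and it assumes that "days_back" ends today and that "custom" dates use the DD-MM-YYYY format.

```python
from datetime import date, datetime, timedelta


def resolve_query_range(query_range, today=None):
    """Return (start, end) dates for a query_range dict (hypothetical helper)."""
    today = today or date.today()
    method = query_range["method"]
    if method == "current_month":
        # First day of the current month up to today.
        return today.replace(day=1), today
    if method == "last_month":
        # Last day of the previous month, then back to its first day.
        end = today.replace(day=1) - timedelta(days=1)
        return end.replace(day=1), end
    if method == "days_back":
        # Assumption: the window ends today and starts N days earlier.
        return today - timedelta(days=query_range["days"]), today
    if method == "custom":
        # Assumption: custom dates are given as DD-MM-YYYY.
        fmt = "%d-%m-%Y"
        return (
            datetime.strptime(query_range["start_date"], fmt).date(),
            datetime.strptime(query_range["end_date"], fmt).date(),
        )
    raise ValueError(f"Unknown query_range method: {method!r}")
```

Passing today explicitly keeps the function deterministic, which is convenient when checking month boundaries.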

Global Configuration (src/config.py)

The src/config.py file now contains only global defaults:

  • Risk calculation parameters (risk_params)
  • Histogram parameters (histogram_params)
  • PDF layout parameters (pdf_params)
  • Grouping parameters (grouping_params)

Note: Farm-specific settings (distance_rings, ring_colors, wind_farm_name, file paths, date ranges) are managed in wind_farms_config.json and should NOT be configured in config.py.

Location Bounds Auto-Calculation

When location_bounds.method = "auto", the system calculates:

  1. Centroid (Center Point):

    • center_lat = average of all turbine latitudes
    • center_lng = average of all turbine longitudes
  2. Maximum Distance from Centroid:

    • Calculates distance from centroid to each turbine
    • Finds the maximum distance
  3. Total Radius:

    radius_km = (max_turbine_distance / 1000) + 
                 (max_distance_ring / 1000) + 
                 padding_km
    

    Example: If turbines span 2.5km from centroid, max ring is 10km, padding is 5km:

    • Total radius = 2.5 + 10 + 5 = 17.5km
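The three steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's implementation: auto_location_bounds and haversine_m are hypothetical names, and a mean Earth radius of 6371 km is assumed.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two lat/lng points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def auto_location_bounds(turbines, distance_rings, padding_km=5):
    """Return (center_lat, center_lng, radius_km) for a list of turbine dicts."""
    # Step 1: centroid as the plain average of latitudes and longitudes.
    center_lat = sum(t["lat"] for t in turbines) / len(turbines)
    center_lng = sum(t["lng"] for t in turbines) / len(turbines)
    # Step 2: farthest turbine from the centroid, in metres.
    max_turbine_distance = max(
        haversine_m(center_lat, center_lng, t["lat"], t["lng"]) for t in turbines
    )
    # Step 3: turbine spread + largest ring + padding, all in kilometres.
    radius_km = max_turbine_distance / 1000 + max(distance_rings) / 1000 + padding_km
    return center_lat, center_lng, radius_km
```

With a single turbine the spread term is zero, so the radius reduces to the largest ring plus the padding.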

Date Range Handling

  • If date_range.method = "auto": Uses query_range to determine what dates to fetch; the report uses those query dates for the analyzed period.
  • If date_range.method = "manual": Uses specified start_date and end_date for both API fetch and report (supports DD-MM-YYYY or ISO with time, e.g. 2026-01-22T07:00:00Z).
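Accepting both manual date formats can be done with a short fallback parser. The helper below is a hypothetical sketch (parse_config_date is not a name from this project) that tries DD-MM-YYYY first and then ISO 8601.

```python
from datetime import datetime


def parse_config_date(value):
    """Parse DD-MM-YYYY or ISO 8601 with time into a datetime (hypothetical helper)."""
    try:
        return datetime.strptime(value, "%d-%m-%Y")
    except ValueError:
        pass
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so it is normalised to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

Note that the DD-MM-YYYY form yields a naive datetime, while the ISO form with Z yields a timezone-aware one; a real implementation would need to pick one convention before comparing them.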

Daily Lightning Density Calculation

The daily lightning density is calculated using the actual number of days in the analysis period:

daily_lightning_per_km2 = total_lightning_per_km2 / actual_days_in_range

Where actual_days_in_range is calculated from the start and end dates (inclusive).

Example:

  • Date range: September 1-15 (15 days)
  • Total lightning density: 150 events/km²
  • Daily lightning density: 150 / 15 = 10 events/km²/day

This ensures accurate daily averages for partial months or custom date ranges.
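As a quick check of the arithmetic, the calculation can be written as follows. The function name is illustrative, and the year 2025 is chosen arbitrarily for the September 1-15 example.

```python
from datetime import date


def daily_lightning_density(total_lightning_per_km2, start, end):
    """Divide the total density by the inclusive number of days in the range."""
    actual_days_in_range = (end - start).days + 1  # both endpoints count
    if actual_days_in_range <= 0:
        raise ValueError("end date must not be before start date")
    return total_lightning_per_km2 / actual_days_in_range


print(daily_lightning_density(150, date(2025, 9, 1), date(2025, 9, 15)))  # 10.0
```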

Risk Score Categories

The system uses fixed interval coloring based on specific risk score ranges:

  • Very Low Risk (<0.1): Blue - Distant lightning with low current
  • Low Risk (0.1-0.2): Teal - Moderate distance lightning
  • Med-Low Risk (0.2-0.4): Green - Closer lightning
  • Medium Risk (0.4-0.6): Yellow - Moderate risk lightning
  • Med-High Risk (0.6-0.8): Orange - High risk lightning
  • High Risk (0.8-1.0): Dark Orange - Very high risk lightning
  • Very High Risk (1.0-1.2): Red - Extreme risk lightning
  • Critical Risk (>1.2): Dark Red - Critical risk lightning
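A lookup over the fixed intervals might look like the sketch below. risk_category is a hypothetical helper; the list above does not say which band a score lying exactly on an edge belongs to, so this sketch assumes it falls into the higher band.

```python
import bisect

# Upper edges of the first seven bands; anything at or above 1.2 lands in the last one.
_EDGES = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
_CATEGORIES = [
    ("Very Low Risk", "Blue"),
    ("Low Risk", "Teal"),
    ("Med-Low Risk", "Green"),
    ("Medium Risk", "Yellow"),
    ("Med-High Risk", "Orange"),
    ("High Risk", "Dark Orange"),
    ("Very High Risk", "Red"),
    ("Critical Risk", "Dark Red"),
]


def risk_category(score):
    """Return (label, color) for a risk score."""
    # Assumption: bisect_right sends a score equal to an edge to the higher band.
    return _CATEGORIES[bisect.bisect_right(_EDGES, score)]


print(risk_category(0.5))  # ('Medium Risk', 'Yellow')
```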

Grouping vs Analysis Radius

  • grouping_params.max_distance_m (meters): Controls ONLY turbine clustering (grouping). If set (>0), it overrides ring-based grouping. Used to decide which turbines are in the same group.
  • grouping_params.distance_ring_index (0-based): Selects a ring from distance_rings.
    • For grouping: used only if max_distance_m is not set; determines grouping radius.
    • For analysis (histogram, stats, report labels): ALWAYS used to choose the analysis radius/cutoff. Does not change grouping when max_distance_m is provided.

Examples

  • If max_distance_m=2500 and distance_ring_index=4 (10 km ring):
    • Grouping radius = 2.5 km (from max_distance_m)
    • Analysis radius = 10 km (from distance_ring_index)
  • If max_distance_m unset and distance_ring_index=1 (2 km ring):
    • Grouping radius = 2 km
    • Analysis radius = 2 km
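The precedence rule can be stated compactly in code. resolve_radii is a hypothetical helper written only to mirror the two examples above.

```python
def resolve_radii(distance_rings, distance_ring_index, max_distance_m=None):
    """Return (grouping_radius_m, analysis_radius_m)."""
    # The analysis radius always comes from the selected ring.
    analysis_radius_m = distance_rings[distance_ring_index]
    # max_distance_m overrides the ring for grouping only, and only when set (>0).
    if max_distance_m and max_distance_m > 0:
        grouping_radius_m = max_distance_m
    else:
        grouping_radius_m = analysis_radius_m
    return grouping_radius_m, analysis_radius_m


rings = [1000, 2000, 3000, 4000, 10000]
print(resolve_radii(rings, 4, max_distance_m=2500))  # (2500, 10000)
print(resolve_radii(rings, 1))                       # (2000, 2000)
```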

Clustering Algorithm

  • Preferred: DBSCAN with Haversine metric
    • Convert lat/lng to radians; eps = (radius_km / 6371), min_samples=1
    • Clusters are formed transitively (density reachability). Example with R=2 km: AB=1.5 km, BC=1.5 km, AC=3.0 km → one cluster {A,B,C} due to B bridging A and C
  • Fallback: Greedy O(N^2) proximity grouping if scikit-learn is unavailable
    • Starts a group at turbine i; adds any j within R of i; moves on. No transitive chaining
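The difference between the two strategies shows up on the A, B, C example. The sketch below uses only the standard library: transitive_groups is a stand-in that computes connected components, which is what DBSCAN produces when min_samples=1, and greedy_groups follows the fallback description. Both function names and the sample coordinates are illustrative assumptions, not project code.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) tuples in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def transitive_groups(points, radius_km):
    """Connected components of the 'within radius' graph (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= radius_km:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def greedy_groups(points, radius_km):
    """Seed a group at i, add unassigned j within radius of i only, then move on."""
    assigned, groups = set(), []
    for i in range(len(points)):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, len(points)):
            if j not in assigned and haversine_km(points[i], points[j]) <= radius_km:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups


# Three points on one meridian, 1.5 km apart: AB = BC = 1.5 km, AC = 3.0 km.
step = math.degrees(1.5 / EARTH_RADIUS_KM)
A, B, C = (39.0, 26.0), (39.0 + step, 26.0), (39.0 + 2 * step, 26.0)
print(transitive_groups([A, B, C], 2.0))  # [[0, 1, 2]]
print(greedy_groups([A, B, C], 2.0))      # [[0, 1], [2]]
```

The greedy result depends on input order, which is one reason the transitive DBSCAN path is preferred when scikit-learn is installed.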

Wind Farm Configuration

wind_farm_name = "Your Wind Farm Name"

Usage

Batch Report Generation

Generate reports for multiple wind farms automatically:

# Process all enabled farms
python batch_generate.py --config wind_farms_config.json

# Process specific farm
python batch_generate.py --config wind_farms_config.json --farm-id dagpazari_RES

# List farms and their enabled status
python batch_generate.py --config wind_farms_config.json --list-farms

# Process all farms (ignore enabled flag)
python batch_generate.py --config wind_farms_config.json --force-all

The batch system will:

  1. Load configuration from wind_farms_config.json
  2. For each enabled farm:
    • Load turbine coordinates
    • Auto-calculate location bounds (center + radius) from turbines
    • Determine date range for API query
    • Fetch lightning data from API
    • Fetch storm data from API
    • Calculate risk scores
    • Generate DOCX report
    • Save to farm's output directory
  3. Generate batch summary report
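How the command-line flags might translate into a farm selection is sketched below. select_farms is a hypothetical helper; it assumes that --farm-id selects a farm regardless of its enabled flag and that a missing enabled key counts as disabled, neither of which is confirmed by this document.

```python
def select_farms(config, farm_id=None, force_all=False):
    """Pick the farms to process from a parsed wind_farms_config.json dict."""
    farms = config.get("wind_farms", [])
    if farm_id is not None:
        # Assumption: an explicit farm id bypasses the enabled flag.
        return [f for f in farms if f["farm_id"] == farm_id]
    if force_all:
        return list(farms)
    # Assumption: a farm without an "enabled" key is treated as disabled.
    return [f for f in farms if f.get("enabled", False)]


config = {
    "wind_farms": [
        {"farm_id": "farm1", "enabled": True},
        {"farm_id": "farm2", "enabled": False},
    ]
}
print([f["farm_id"] for f in select_farms(config)])                  # ['farm1']
print([f["farm_id"] for f in select_farms(config, force_all=True)])  # ['farm1', 'farm2']
```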

Single Report Generation (Legacy)

Run the main application for a single report:

python main.py

The application will:

  1. Load lightning and turbine data from configured JSON files (in src/config.py)
  2. Calculate risk scores for each turbine using the advanced risk formula
  3. Create turbine groups based on proximity
  4. Generate visualizations including the new risk score heatmap
  5. Create a comprehensive DOCX report with enhanced appendix

Data Format Requirements

Lightning Data JSON

{
  "data": [
    {
      "lat": 39.85420,
      "lng": 26.71218,
      "local_time": "2025-07-15T14:30:25",
      "current": -15000,
      "p_type": "0",
      "height": 5000
    }
  ]
}

Required Fields:

  • lat, lng: Lightning strike coordinates
  • local_time: Timestamp (various formats supported)
  • current: Lightning current in amperes
  • p_type: Lightning type ("0" for cloud-to-ground, others for intercloud)
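A minimal reader for this structure could check the required fields and split strikes by type. split_by_type is a hypothetical helper; it assumes the top-level "data" key shown above and treats any p_type other than "0" as intercloud.

```python
import json

REQUIRED_FIELDS = {"lat", "lng", "local_time", "current", "p_type"}


def split_by_type(payload):
    """Return (cloud_to_ground, intercloud) lists from a parsed lightning JSON dict."""
    cg, ic = [], []
    for strike in payload["data"]:
        missing = REQUIRED_FIELDS - set(strike)
        if missing:
            raise ValueError(f"Strike is missing fields: {sorted(missing)}")
        # p_type "0" is cloud-to-ground; everything else is intercloud.
        (cg if str(strike["p_type"]) == "0" else ic).append(strike)
    return cg, ic


sample = json.loads("""
{"data": [
  {"lat": 39.85420, "lng": 26.71218, "local_time": "2025-07-15T14:30:25",
   "current": -15000, "p_type": "0", "height": 5000}
]}
""")
cg, ic = split_by_type(sample)
print(len(cg), len(ic))  # 1 0
```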

Turbine Data JSON

[
  {
    "lat": 39.85420,
    "lng": 26.71218,
    "turbine_id": "T001"
  }
]

Required Fields:

  • lat, lng: Turbine coordinates
  • turbine_id: Unique turbine identifier

Advanced Usage

Coordinate Conversion

Convert UTM ED50 coordinates to WGS84:

python utm_ed50_to_wgs84_converter.py input.csv output.csv

Data Separation by Month

Separate large JSON files by month:

python separate_by_month.py input_data.json [output_directory]

Output

DOCX Report Structure

  1. Cover Page: Wind farm information and analysis period
  2. Report Summary: Automated narrative summary (Gemini-backed when available)
  3. Risk Analysis: Detailed risk scores and rankings with fixed interval coloring
  4. Lightning Maps: Coordinate plane visualizations with proper geographic orientation
  5. Statistical Analysis: Lightning density and frequency data
  6. Detailed Tables: Complete lightning strike information with color-coded distance rings
  7. Storm Analysis: Storm cell data and maps (if available)
  8. Enhanced Appendix: Comprehensive methodology including:
    • Risk calculation method and formula explanation
    • Risk score interpretation guide
    • Centroid and distance ring calculation methodology
    • Turbine grouping algorithm description
    • Frequent lightning activity period detection algorithm

Generated Files

Single Report Mode:

  • lightning_report.log: Application execution log
  • {wind_farm_name}_lightning_report.docx: Main DOCX report
  • Interactive HTML maps (temporary files)

Batch Generation Mode:

  • batch_generation_YYYY-MM-DD.log: Batch execution log
  • batch_summary_YYYY-MM-DD.json: Batch processing summary
  • {farm_id}_report.docx: DOCX report for each farm (in respective output directories)

Project Structure

lightning_report/
├── main.py                          # Single report generation (legacy)
├── batch_generate.py                # Batch report generation with API
├── wind_farms_config.json          # Batch configuration file
├── .env                             # API credentials (gitignored)
├── requirements.txt                 # Python dependencies
├── src/
│   ├── config.py                   # Global configuration defaults
│   ├── api/
│   │   └── data_fetcher.py        # API integration for data fetching
│   ├── data/
│   │   └── loader.py              # Data loading and validation
│   ├── analysis/
│   │   ├── geospatial.py          # Distance calculations (vectorized Haversine)
│   │   ├── grouping.py            # Turbine grouping (DBSCAN + fallback)
│   │   ├── histogram.py           # Temporal analysis
│   │   ├── risk.py                # Risk calculation (BallTree + fallback)
│   │   └── statistics.py          # Statistical analysis (includes daily density)
│   ├── reporting/
│   │   ├── docx.py                # DOCX report generation
│   │   ├── docx_sections.py       # Shared DOCX helpers (charts/tables)
│   │   └── precompute.py          # Shared precomputations (distances, ring indices)
│   ├── visualization/
│   │   ├── maps.py                # Map generation with risk score heatmap
│   │   └── storm_cells.py         # Storm cell visualization
│   └── utils.py                   # Utility functions including fixed interval coloring
├── separate_by_month.py           # Data separation utility
└── utm_ed50_to_wgs84_converter.py # Coordinate conversion

Configuration Examples

Batch Generation Setup

Example: Multiple Farms with Different Settings

{
  "api_config": {
    "base_url": "https://risk.tarla.io/api",
    "timeout_seconds": 30,
    "retry_attempts": 3
  },
  "wind_farms": [
    {
      "farm_id": "farm1",
      "name": "Farm 1",
      "enabled": true,
      "coordinates_file": "/path/to/farm1_coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "api_params": {
        "location_bounds": {
          "method": "auto",
          "padding_km": 5
        },
        "date_range": {
          "method": "manual",
          "start_date": "01-09-2025",
          "end_date": "30-09-2025"
        }
      },
      "report_config": {
        "output_directory": "reports/farm1/",
        "wind_farm_name": "Farm 1"
      }
    },
    {
      "farm_id": "farm2",
      "name": "Farm 2",
      "enabled": false,
      "coordinates_file": "/path/to/farm2_coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "api_params": {
        "location_bounds": {
          "method": "manual",
          "center_lat": 36.90,
          "center_lng": 33.575,
          "radius_km": 35
        },
        "date_range": {
          "method": "auto",
          "query_range": {
            "method": "days_back",
            "days": 30
          }
        }
      },
      "report_config": {
        "output_directory": "reports/farm2/",
        "wind_farm_name": "Farm 2"
      }
    }
  ]
}

Custom Risk Parameters

# Adjust risk calculation sensitivity in src/config.py
risk_params = {
    'P_0': 1.5,           # Higher base probability
    'alpha': 0.3,         # Slower distance decay
    'current_weight': 0.2  # Higher current importance
}

Note: Farm-specific settings (distance_rings, ring_colors, etc.) should be configured in wind_farms_config.json, not in config.py.

Risk Score Methodology

Risk Calculation Formula

The system uses an advanced risk calculation formula:

Risk = P₀ × (1 + α×Current/10000) × e^(-α×Distance)

Where:

  • P₀: Base probability (configurable)
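  • α: Distance decay factor (configurable). As written, the formula also uses α to scale the current term, while src/config.py exposes a separate current_weight parameter (see Custom Risk Parameters)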
  • Current: Lightning current magnitude in amperes
  • Distance: Distance from turbine in kilometers
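The formula translates directly into a small function. This is a sketch under stated assumptions rather than the code in src/analysis/risk.py: the default values of p0 and alpha are placeholders, the current is taken as an absolute value because the formula refers to its magnitude, and the optional current_weight argument is a guess at how the config key of the same name might enter the calculation.

```python
import math


def risk_score(current_a, distance_km, p0=1.0, alpha=0.5, current_weight=None):
    """Risk = P0 * (1 + w * |Current| / 10000) * exp(-alpha * Distance).

    w is alpha, as in the written formula, unless current_weight is given
    (assumption: that is how the config parameter would be used).
    """
    w = alpha if current_weight is None else current_weight
    return p0 * (1 + w * abs(current_a) / 10000) * math.exp(-alpha * distance_km)


# A 10 kA strike at zero distance: 1.0 * (1 + 0.5 * 1) * e^0
print(risk_score(10000, 0.0, p0=1.0, alpha=0.5))  # 1.5
```

Because of the exponential term, the score falls by a constant factor for every additional kilometre, so distance dominates current once a strike is a few kilometres away.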

Risk Score Interpretation

The risk score heatmap provides a visual reference for interpreting risk levels:

  • X-axis: Lightning current magnitude (1,000 to 300,000 amperes)
  • Y-axis: Distance from turbine (0.1 km to max distance ring, dynamically scaled)
  • Color intensity: Risk score level (blue to red gradient; palette listed here from red to blue: #F94144, #F3722C, #F8961E, #F9C74F, #90BE6D, #43AA8B, #577590)
  • Contour curves: Specific risk level boundaries (0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.5)

API Integration

The system integrates with the Tarla.io API for automated data fetching:

Endpoints:

  • Lightning data: https://risk.tarla.io/api/lightning-data/historical/
  • Storm data: https://risk.tarla.io/api/storm-data/historical/

Authentication:

  • API key stored in .env file as API_KEY
  • Sent as x-api-key header in requests

Request Format:

  • Query type: circle (center + radius)
  • Parameters: centerLatitude, centerLongitude, radius (in meters), startDate, endDate
  • Date format: YYYY-MM-DD
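Putting the endpoint, header, and parameters together, a request could be assembled as in the sketch below. It builds the request without sending it. build_lightning_request is a hypothetical helper, and the use of GET with a query string is an assumption, since the HTTP method and the way the circle query type is passed are not specified here.

```python
import os
from datetime import date
from urllib.parse import urlencode
from urllib.request import Request

LIGHTNING_URL = "https://risk.tarla.io/api/lightning-data/historical/"


def build_lightning_request(center_lat, center_lng, radius_km, start_date, end_date, api_key=None):
    """Build (but do not send) a circle query for historical lightning data."""
    params = {
        "centerLatitude": center_lat,
        "centerLongitude": center_lng,
        "radius": int(round(radius_km * 1000)),  # the API expects metres
        "startDate": start_date.isoformat(),     # YYYY-MM-DD
        "endDate": end_date.isoformat(),
    }
    # Fall back to the API_KEY environment variable (e.g. loaded from .env).
    key = api_key or os.environ.get("API_KEY", "")
    # Assumption: parameters travel in the query string of a GET request.
    return Request(f"{LIGHTNING_URL}?{urlencode(params)}", headers={"x-api-key": key})


req = build_lightning_request(36.90, 33.575, 35, date(2025, 9, 1), date(2025, 9, 30),
                              api_key="your_api_key_here")
print(req.full_url)
```

Keeping request construction separate from sending makes it easy to log or inspect the exact URL when debugging 401 or timeout errors.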

Response Handling:

  • Automatically converts API responses to expected DataFrame format
  • Handles empty datasets gracefully
  • Validates data structure before processing

Troubleshooting

Common Issues

  1. API Authentication Errors (401 Unauthorized)

    • Verify .env file exists with API_KEY=your_key
    • Check that API key is correct and active
    • Ensure special characters in the API key are preserved exactly (e.g., a trailing ==)
  2. API Timeout Errors

    • Increase timeout_seconds in api_config
    • Check network connectivity
    • Verify API endpoint is accessible
  3. File Not Found Errors

    • For batch mode: Verify file paths in wind_farms_config.json
    • For single mode: Verify file paths in src/config.py
    • Ensure JSON files exist and are readable
  4. Data Validation Errors

    • Check JSON format matches required structure
    • Verify coordinate values are valid numbers
    • Ensure timestamp format is supported
    • For API data: Check API response format matches expected structure
  5. Empty Data / NaT Errors

    • System handles empty datasets gracefully
    • Check API date range - data might not exist for specified period
    • Verify location bounds cover the area of interest
    • Check logs for API response details
  6. Memory Issues with Large Datasets

    • Use separate_by_month.py to split large files
    • Adjust analysis period to smaller time ranges
    • Process farms individually using --farm-id flag
  7. DOCX Generation Errors

    • Ensure sufficient disk space
    • Check write permissions for output directory
  8. Risk Score Heatmap Issues

    • Verify distance_rings configuration is valid
    • Check that lightning data contains valid current values
    • Ensure turbine coordinates are properly formatted
  9. Batch Generation Issues

    • Check batch_summary_YYYY-MM-DD.json for detailed error information
    • Verify all farms have valid configuration
    • Check batch_generation_YYYY-MM-DD.log for detailed logs
    • Use --list-farms to verify farm configuration

Logging

Single Report Mode:

  • lightning_report.log: Application execution log

Batch Generation Mode:

  • batch_generation_YYYY-MM-DD.log: Batch execution log with per-farm details
  • batch_summary_YYYY-MM-DD.json: Structured summary of batch processing

Logs include:

  • Data loading progress
  • API request/response details
  • Risk calculation details
  • Error messages and stack traces
  • Performance metrics
  • Farm processing status

Performance Considerations

  • Large Datasets: For datasets with >100,000 lightning strikes, consider:

    • Using date range filtering
    • Splitting data by month
    • Increasing system memory allocation
  • Optimizations used:

    • BallTree neighbor queries for CG risk scoring (O(n log n) build; sublinear queries)
    • DBSCAN clustering with Haversine metric for grouping; O(N^2) fallback maintained
    • Vectorized Haversine distance utilities (array-based)
    • Shared per-group precomputation of distances and ring indices reused by maps and tables
    • Centralized date/time parsing and formatting
    • Efficient risk score heatmap generation with contour overlay

Contributing

  1. Follow the existing code structure and naming conventions
  2. Add appropriate error handling and logging
  3. Update configuration options as needed
  4. Test with various data formats and sizes
  5. Update documentation for new features
  6. Maintain consistency with the fixed interval coloring system

License

This project is proprietary software. All rights reserved.

Support

For technical support or feature requests, please contact the development team with:

  • Detailed error messages
  • Sample data (if possible)
  • System configuration details
  • Expected vs actual behavior description