BE-LightningReport/README.md
erdemerikci 45d80dfaa6 Initial import: Lightning_Report with n8n integration
Fork of Lightning_Report adding:
- n8n_report_branch.json: workflow branch for storm-triggered report delivery
- report_service/: FastAPI microservice wrapping create_docx_report() so n8n
  can produce byte-identical reports without fighting the Python Code sandbox

Made-with: Cursor
2026-04-22 15:13:08 +03:00

# Lightning Report Generator
A comprehensive Python application for analyzing lightning strike data in relation to wind turbine locations and generating detailed DOCX reports with risk assessments, visualizations, and statistical analysis.
## Overview
This application processes lightning strike data and wind turbine coordinates to:
- Calculate lightning risk scores for each turbine using advanced mathematical models
- Generate interactive maps showing lightning strikes and turbine locations
- Create statistical analysis and histograms with temporal distribution
- Group turbines based on proximity and risk levels
- Generate comprehensive DOCX reports with visualizations and risk assessment charts
- Support storm cell analysis and mapping
- Provide detailed risk score interpretation and calculation methodology
## Features
### Core Analysis
- **Risk Assessment**: Fast per-turbine scoring using BallTree radius queries (Haversine metric) with automatic fallback to vectorized matrix math
- **Advanced Risk Formula**: `Risk = P₀ × (1 + α×Current/10000) × e^(-α×Distance)` with configurable parameters
- **Geospatial Analysis**: Vectorized Haversine utilities and configurable distance rings
- **Statistical Analysis**: Lightning density, frequency, and temporal distribution analysis
- **Daily Lightning Density**: Calculates daily average using actual number of days in date range (not fixed month)
- **Turbine Grouping**: Proximity-based clustering using DBSCAN (Haversine) with graceful fallback to O(N^2) grouping for small datasets
### API Integration
- **Automated Data Fetching**: Fetch lightning and storm data directly from API
- **Flexible Location Bounds**: Auto-calculate center + radius from turbines or specify manually
- **Date Range Management**: Auto-detect actual period from data or use manual date ranges
- **Batch Processing**: Process multiple wind farms in a single run
- **Error Handling**: Graceful handling of empty data, API timeouts, and failures
### Visualization
- **Interactive Maps**: Plotly-based coordinate-plane maps for CG/IC lightning with ring-aware coloring
- **Risk Score Heatmap**: 2D visualization with current magnitude on X-axis (up to 300k amps) and distance on Y-axis, with contour curves
- **Fixed Interval Coloring**: Consistent color gradient mapping (blue to red) based on predefined risk score ranges (0.1-1.5)
- **Lightning Histograms**: Temporal distribution of lightning events with peak detection
- **Storm Cell Maps**: Visualization of storm cell data (when available)
- **Coordinate Plane Views**: Standard geographic orientation (latitude on Y-axis, longitude on X-axis)
### Reporting
- **DOCX Generation**: Generates Word (DOCX) reports
- **Risk Score Chart**: Integrated heatmap showing distance vs. current magnitude relationship
- **Multiple Map Types**: Coordinate plane maps for different lightning types
- **Statistical Tables**: Detailed lightning strike information with proximity data (precomputed distances)
- **Risk Summaries**: Grouped risk analysis and recommendations with fixed interval color coding
- **Enhanced Appendix**: Detailed methodology explanations including risk calculation method, interpretation guide, and algorithm descriptions
### Data Processing
- **JSON Data Loading**: Support for various JSON data structures
- **Date Range Filtering**: Configurable analysis periods
- **Date/Time Formatting**: Centralized, consistent DD-MM-YYYY and DD-MM-YYYY HH:MM:SS formatting
- **Data Validation**: Comprehensive input validation and error handling
- **Precomputation**: Shared per-group distance and ring-index precompute reused by maps and tables
- **Coordinate Conversion**: UTM ED50 to WGS84 coordinate system conversion
## Installation
### Prerequisites
- Python 3.8 or higher
- pip package manager
### Dependencies
Install the required packages:
```bash
pip install -r requirements.txt
```
### Required Packages
- `pandas>=1.5.0` - Data manipulation and analysis
- `numpy>=1.21.0` - Numerical computations
- `plotly>=5.15.0` - Interactive visualizations
- `kaleido>=0.2.1` - Static image export for Plotly
- `scikit-learn>=1.3.0` - BallTree radius queries and DBSCAN clustering (used when available)
- `requests>=2.31.0` - API HTTP requests
- `python-dotenv>=1.0.0` - Environment variable management
- `python-docx>=1.1.2` - DOCX (Word) report generation
### Optional Dependencies
For coordinate conversion functionality:
```bash
pip install -r utm_converter_requirements.txt
```
## Configuration
The application supports two modes of operation:
### 1. Single Report Generation (Legacy Mode)
Uses `src/config.py` for configuration. See the legacy section below for details.
### 2. Batch Report Generation (Recommended)
Uses `wind_farms_config.json` for multi-farm batch processing with API integration.
#### Setup
1. **Create `.env` file** with your API key:
```env
API_KEY=your_api_key_here
```
2. **Create `wind_farms_config.json`**:
```json
{
  "api_config": {
    "base_url": "https://risk.tarla.io/api",
    "timeout_seconds": 30,
    "retry_attempts": 3,
    "default_query_range": {
      "method": "current_month"
    }
  },
  "output_base_directory": "reports/",
  "default_padding_km": 5,
  "wind_farms": [
    {
      "farm_id": "dagpazari_RES",
      "name": "Dağpazarı RES",
      "enabled": true,
      "coordinates_file": "/path/to/coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "ring_colors": ["purple", "red", "orange", "coral", "green"],
      "api_params": {
        "location_bounds": {
          "method": "auto",
          "padding_km": 5
        },
        "date_range": {
          "method": "auto",
          "query_range": {
            "method": "current_month"
          }
        }
      },
      "report_config": {
        "output_directory": "reports/dagpazari_RES/",
        "wind_farm_name": "Dağpazarı RES"
      }
    }
  ]
}
```
#### Configuration Parameters
**Farm-Level Settings:**
- `enabled`: `true`/`false` - Enable/disable report generation for this farm
- `distance_rings`: Array of distance rings in meters (e.g., `[1000, 2000, 3000, 4000, 10000]`)
- `ring_colors`: Array of colors for each ring
- `coordinates_file`: Path to turbine coordinates JSON file
**Location Bounds:**
- `method`: `"auto"` (calculate from turbines) or `"manual"` (specify)
- `padding_km`: Extra buffer beyond max distance ring (default: 5km)
- For manual: provide `center_lat`, `center_lng`, `radius_km`
**Date Range:**
- `method`: `"auto"` (detect from data) or `"manual"` (specify)
- For manual: provide `start_date` and `end_date` in `DD-MM-YYYY` format
- For auto: specify `query_range` to control API query period
**Query Range Options (for auto mode):**
- `"current_month"`: First day of current month to today
- `"last_month"`: Entire previous month
- `"days_back"`: Last N days (requires `days` parameter)
- `"custom"`: Specific dates (requires `start_date` and `end_date`)
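The query range options above can be sketched as a small date resolver. This is an illustrative sketch, not the project's implementation: the function name `resolve_query_range` is hypothetical, `days_back` is assumed to mean "today minus N days through today", and `custom` dates are assumed to use the same `DD-MM-YYYY` format as manual date ranges.

```python
from datetime import date, datetime, timedelta


def resolve_query_range(query_range, today=None):
    """Return (start, end) dates for a `query_range` config dict (hypothetical helper)."""
    today = today or date.today()
    method = query_range["method"]
    if method == "current_month":
        # First day of the current month through today
        return today.replace(day=1), today
    if method == "last_month":
        # Step back from the 1st of this month to reach the last day of the previous month
        last_day = today.replace(day=1) - timedelta(days=1)
        return last_day.replace(day=1), last_day
    if method == "days_back":
        # Assumption: the window ends today and starts N days earlier
        return today - timedelta(days=query_range["days"]), today
    if method == "custom":
        # Assumption: DD-MM-YYYY, as documented for manual date ranges
        fmt = "%d-%m-%Y"
        start = datetime.strptime(query_range["start_date"], fmt).date()
        end = datetime.strptime(query_range["end_date"], fmt).date()
        return start, end
    raise ValueError(f"Unknown query_range method: {method!r}")
```

Passing `today` explicitly keeps the function deterministic, which is convenient when checking month-boundary behavior.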
#### Global Configuration (src/config.py)
The `src/config.py` file now only contains global defaults:
- Risk calculation parameters (`risk_params`)
- Histogram parameters (`histogram_params`)
- PDF layout parameters (`pdf_params`)
- Grouping parameters (`grouping_params`)
**Note:** Farm-specific settings (distance_rings, ring_colors, wind_farm_name, file paths, date ranges) are managed in `wind_farms_config.json` and should NOT be configured in `config.py`.
### Location Bounds Auto-Calculation
When `location_bounds.method = "auto"`, the system calculates:
1. **Centroid (Center Point)**:
   - `center_lat` = average of all turbine latitudes
   - `center_lng` = average of all turbine longitudes
2. **Maximum Distance from Centroid**:
   - Calculates distance from centroid to each turbine
   - Finds the maximum distance
3. **Total Radius**:
   ```
   radius_km = (max_turbine_distance / 1000) +
               (max_distance_ring / 1000) +
               padding_km
   ```

Example: If turbines span 2.5 km from centroid, max ring is 10 km, padding is 5 km:
- Total radius = 2.5 + 10 + 5 = 17.5 km
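The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: `auto_location_bounds` and `haversine_km` are hypothetical names, the Earth radius is taken as 6371 km, and the turbine distance is computed directly in kilometers rather than converted from meters.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two lat/lng points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def auto_location_bounds(turbines, distance_rings_m, padding_km=5.0):
    """Centroid plus query radius (hypothetical helper mirroring the steps above)."""
    # Step 1: centroid as the arithmetic mean of turbine coordinates
    center_lat = sum(t["lat"] for t in turbines) / len(turbines)
    center_lng = sum(t["lng"] for t in turbines) / len(turbines)
    # Step 2: farthest turbine from the centroid
    max_turbine_km = max(
        haversine_km(center_lat, center_lng, t["lat"], t["lng"]) for t in turbines
    )
    # Step 3: turbine spread + largest ring + padding
    radius_km = max_turbine_km + max(distance_rings_m) / 1000 + padding_km
    return {"center_lat": center_lat, "center_lng": center_lng, "radius_km": radius_km}
```

For two turbines placed 2.5 km north and south of a common point, with rings up to 10,000 m and 5 km padding, this returns a radius of about 17.5 km, matching the worked example.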
### Date Range Handling
- If `date_range.method = "auto"`: Uses `query_range` to determine what dates to fetch; the report uses those query dates for the analyzed period.
- If `date_range.method = "manual"`: Uses specified `start_date` and `end_date` for both API fetch and report (supports `DD-MM-YYYY` or ISO with time, e.g. `2026-01-22T07:00:00Z`).
### Daily Lightning Density Calculation
The daily lightning density is calculated using the **actual number of days** in the analysis period:
```
daily_lightning_per_km2 = total_lightning_per_km2 / actual_days_in_range
```
Where `actual_days_in_range` is calculated from the start and end dates (inclusive).
**Example:**
- Date range: September 1-15 (15 days)
- Total lightning density: 150 events/km²
- Daily lightning density: 150 / 15 = 10 events/km²/day
This ensures accurate daily averages for partial months or custom date ranges.
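The calculation can be sketched as follows. The function name `daily_lightning_density` is hypothetical; the only behavior taken from the description above is the inclusive day count.

```python
from datetime import date


def daily_lightning_density(total_per_km2, start, end):
    """Average daily density over an inclusive date range (hypothetical helper)."""
    # Inclusive count: September 1 to September 15 is 15 days, not 14
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end date must not be before start date")
    return total_per_km2 / days


# Worked example from the text: 150 events/km² over September 1-15
density = daily_lightning_density(150, date(2025, 9, 1), date(2025, 9, 15))
```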
### Risk Score Categories
The system uses fixed interval coloring based on specific risk score ranges:
- **Very Low Risk (<0.1)**: Blue - Distant lightning with low current
- **Low Risk (0.1-0.2)**: Teal - Moderate distance lightning
- **Med-Low Risk (0.2-0.4)**: Green - Closer lightning
- **Medium Risk (0.4-0.6)**: Yellow - Moderate risk lightning
- **Med-High Risk (0.6-0.8)**: Orange - High risk lightning
- **High Risk (0.8-1.0)**: Dark Orange - Very high risk lightning
- **Very High Risk (1.0-1.2)**: Red - Extreme risk lightning
- **Critical Risk (>1.2)**: Dark Red - Critical risk lightning
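A lookup for these categories might look like the sketch below. The function name `risk_category` is hypothetical, and the boundary handling is an assumption: the list above does not say which category a score exactly on a boundary belongs to, so this sketch treats lower bounds as inclusive, except that only scores strictly above 1.2 count as Critical.

```python
from bisect import bisect_right

# Lower bounds of each category after "Very Low" (taken from the list above)
LOWER_BOUNDS = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
CATEGORIES = [
    ("Very Low", "Blue"),
    ("Low", "Teal"),
    ("Med-Low", "Green"),
    ("Medium", "Yellow"),
    ("Med-High", "Orange"),
    ("High", "Dark Orange"),
    ("Very High", "Red"),
]


def risk_category(score):
    """Map a risk score to (category, color); boundary handling is an assumption."""
    if score > 1.2:
        return ("Critical", "Dark Red")
    # bisect_right makes each lower bound inclusive: 0.1 maps to "Low"
    return CATEGORIES[bisect_right(LOWER_BOUNDS, score)]
```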
### Grouping vs Analysis Radius
- **grouping_params.max_distance_m (meters)**: Controls ONLY turbine clustering (grouping). If set (>0), it overrides ring-based grouping. Used to decide which turbines are in the same group.
- **grouping_params.distance_ring_index (0-based)**: Selects a ring from `distance_rings`.
  - For grouping: used only if `max_distance_m` is not set; determines grouping radius.
  - For analysis (histogram, stats, report labels): ALWAYS used to choose the analysis radius/cutoff. Does not change grouping when `max_distance_m` is provided.

**Examples:**
- If `max_distance_m=2500` and `distance_ring_index=4` (10 km ring):
  - Grouping radius = 2.5 km (from max_distance_m)
  - Analysis radius = 10 km (from distance_ring_index)
- If `max_distance_m` unset and `distance_ring_index=1` (2 km ring):
  - Grouping radius = 2 km
  - Analysis radius = 2 km
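The precedence rules can be summarized in a few lines of Python. This is a sketch of the documented behavior only; `resolve_radii` is a hypothetical name and may not match how the project structures this logic.

```python
def resolve_radii(distance_rings, distance_ring_index, max_distance_m=None):
    """Return (grouping_radius_m, analysis_radius_m) per the rules above (hypothetical helper)."""
    # The analysis radius always comes from the selected ring
    analysis_m = distance_rings[distance_ring_index]
    # The grouping radius prefers max_distance_m when it is set and positive
    if max_distance_m is not None and max_distance_m > 0:
        grouping_m = max_distance_m
    else:
        grouping_m = analysis_m
    return grouping_m, analysis_m


rings = [1000, 2000, 3000, 4000, 10000]
first = resolve_radii(rings, 4, max_distance_m=2500)   # grouping 2.5 km, analysis 10 km
second = resolve_radii(rings, 1)                        # both 2 km
```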
**Clustering Algorithm:**
- Preferred: DBSCAN with Haversine metric
  - Convert lat/lng to radians; `eps = (radius_km / 6371)`, `min_samples=1`
  - Clusters are formed transitively (density reachability). Example with R=2 km: AB=1.5 km, BC=1.5 km, AC=3.0 km → one cluster {A,B,C} due to B bridging A and C
- Fallback: Greedy O(N^2) proximity grouping if scikit-learn is unavailable
  - Starts a group at turbine i; adds any j within R of i; moves on. No transitive chaining
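The difference between the two strategies can be illustrated without scikit-learn. The sketch below is not the project's code: `group_transitive` computes connected components of the "within R" graph, which is what DBSCAN produces when `min_samples=1`, and `group_greedy` follows the fallback description. All function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # same radius as in eps = radius_km / 6371


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two lat/lng points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def group_transitive(points, radius_km):
    """Connected components via union-find (equivalent to DBSCAN with min_samples=1)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(*points[i], *points[j]) <= radius_km:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def group_greedy(points, radius_km):
    """Greedy fallback: each group holds only points within R of its seed."""
    unassigned = list(range(len(points)))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [j for j in unassigned
                            if haversine_km(*points[seed], *points[j]) <= radius_km]
        unassigned = [j for j in unassigned if j not in members]
        groups.append(members)
    return groups


# A, B, C on one meridian, 1.5 km apart: AB = 1.5 km, BC = 1.5 km, AC = 3.0 km
step = math.degrees(1.5 / EARTH_RADIUS_KM)
pts = [(37.0, 33.0), (37.0 + step, 33.0), (37.0 + 2 * step, 33.0)]
transitive = group_transitive(pts, 2.0)  # one cluster: B bridges A and C
greedy = group_greedy(pts, 2.0)          # A takes B; C is left on its own
```

Note that the greedy result depends on processing order: seeding from B instead of A would capture all three turbines in one group.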
### Wind Farm Configuration
```python
wind_farm_name = "Your Wind Farm Name"
```
## Usage
### Batch Report Generation (Recommended)
Generate reports for multiple wind farms automatically:
```bash
# Process all enabled farms
python batch_generate.py --config wind_farms_config.json
# Process specific farm
python batch_generate.py --config wind_farms_config.json --farm-id dagpazari_RES
# List farms and their enabled status
python batch_generate.py --config wind_farms_config.json --list-farms
# Process all farms (ignore enabled flag)
python batch_generate.py --config wind_farms_config.json --force-all
```
The batch system will:
1. Load configuration from `wind_farms_config.json`
2. For each enabled farm:
   - Load turbine coordinates
   - Auto-calculate location bounds (center + radius) from turbines
   - Determine date range for API query
   - Fetch lightning data from API
   - Fetch storm data from API
   - Calculate risk scores
   - Generate DOCX report
   - Save to farm's output directory
3. Generate batch summary report
### Single Report Generation (Legacy)
Run the main application for a single report:
```bash
python main.py
```
The application will:
1. Load lightning and turbine data from configured JSON files (in `src/config.py`)
2. Calculate risk scores for each turbine using the advanced risk formula
3. Create turbine groups based on proximity
4. Generate visualizations including the new risk score heatmap
5. Create a comprehensive DOCX report with enhanced appendix
### Data Format Requirements
#### Lightning Data JSON
```json
{
  "data": [
    {
      "lat": 39.85420,
      "lng": 26.71218,
      "local_time": "2025-07-15T14:30:25",
      "current": -15000,
      "p_type": "0",
      "height": 5000
    }
  ]
}
```
**Required Fields:**
- `lat`, `lng`: Lightning strike coordinates
- `local_time`: Timestamp (various formats supported)
- `current`: Lightning current in amperes
- `p_type`: Lightning type ("0" for cloud-to-ground, others for intercloud)
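A minimal check of these required fields could look like the sketch below. It handles only the `{"data": [...]}` shape shown above, and both the function name `load_lightning_records` and the added `strike_type` key are hypothetical; the project's loader may validate differently.

```python
import json

REQUIRED_FIELDS = ("lat", "lng", "local_time", "current", "p_type")


def load_lightning_records(text):
    """Parse lightning JSON and verify required fields (hypothetical helper)."""
    records = json.loads(text)["data"]
    checked = []
    for index, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record {index} is missing fields: {missing}")
        # p_type "0" is cloud-to-ground (CG); any other value is treated as intercloud (IC)
        strike_type = "CG" if str(record["p_type"]) == "0" else "IC"
        checked.append({**record, "strike_type": strike_type})
    return checked


sample = '''{"data": [{"lat": 39.85420, "lng": 26.71218,
  "local_time": "2025-07-15T14:30:25", "current": -15000,
  "p_type": "0", "height": 5000}]}'''
records = load_lightning_records(sample)
```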
#### Turbine Data JSON
```json
[
  {
    "lat": 39.85420,
    "lng": 26.71218,
    "turbine_id": "T001"
  }
]
```
**Required Fields:**
- `lat`, `lng`: Turbine coordinates
- `turbine_id`: Unique turbine identifier
### Advanced Usage
#### Coordinate Conversion
Convert UTM ED50 coordinates to WGS84:
```bash
python utm_ed50_to_wgs84_converter.py input.csv output.csv
```
#### Data Separation by Month
Separate large JSON files by month:
```bash
python separate_by_month.py input_data.json [output_directory]
```
## Output
### DOCX Report Structure
1. **Cover Page**: Wind farm information and analysis period
2. **Report Summary**: Automated narrative summary (Gemini-backed when available)
3. **Risk Analysis**: Detailed risk scores and rankings with fixed interval coloring
4. **Lightning Maps**: Coordinate plane visualizations with proper geographic orientation
5. **Statistical Analysis**: Lightning density and frequency data
6. **Detailed Tables**: Complete lightning strike information with color-coded distance rings
7. **Storm Analysis**: Storm cell data and maps (if available)
8. **Enhanced Appendix**: Comprehensive methodology including:
   - Risk calculation method and formula explanation
   - Risk score interpretation guide
   - Centroid and distance ring calculation methodology
   - Turbine grouping algorithm description
   - Frequent lightning activity period detection algorithm
### Generated Files
**Single Report Mode:**
- `lightning_report.log`: Application execution log
- `{wind_farm_name}_lightning_report.docx`: Main DOCX report
- Interactive HTML maps (temporary files)
**Batch Generation Mode:**
- `batch_generation_YYYY-MM-DD.log`: Batch execution log
- `batch_summary_YYYY-MM-DD.json`: Batch processing summary
- `{farm_id}_report.docx`: DOCX report for each farm (in respective output directories)
## Project Structure
```
lightning_report/
├── main.py # Single report generation (legacy)
├── batch_generate.py # Batch report generation with API
├── wind_farms_config.json # Batch configuration file
├── .env # API credentials (gitignored)
├── requirements.txt # Python dependencies
├── src/
│ ├── config.py # Global configuration defaults
│ ├── api/
│ │ └── data_fetcher.py # API integration for data fetching
│ ├── data/
│ │ └── loader.py # Data loading and validation
│ ├── analysis/
│ │ ├── geospatial.py # Distance calculations (vectorized Haversine)
│ │ ├── grouping.py # Turbine grouping (DBSCAN + fallback)
│ │ ├── histogram.py # Temporal analysis
│ │ ├── risk.py # Risk calculation (BallTree + fallback)
│ │ └── statistics.py # Statistical analysis (includes daily density)
│ ├── reporting/
│ │ ├── docx.py # DOCX report generation
│ │ ├── docx_sections.py # Shared DOCX helpers (charts/tables)
│ │ └── precompute.py # Shared precomputations (distances, ring indices)
│ ├── visualization/
│ │ ├── maps.py # Map generation with risk score heatmap
│ │ └── storm_cells.py # Storm cell visualization
│ └── utils.py # Utility functions including fixed interval coloring
├── separate_by_month.py # Data separation utility
└── utm_ed50_to_wgs84_converter.py # Coordinate conversion
```
## Configuration Examples
### Batch Generation Setup
**Example: Multiple Farms with Different Settings**
```json
{
  "api_config": {
    "base_url": "https://risk.tarla.io/api",
    "timeout_seconds": 30,
    "retry_attempts": 3
  },
  "wind_farms": [
    {
      "farm_id": "farm1",
      "name": "Farm 1",
      "enabled": true,
      "coordinates_file": "/path/to/farm1_coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "api_params": {
        "location_bounds": {
          "method": "auto",
          "padding_km": 5
        },
        "date_range": {
          "method": "manual",
          "start_date": "01-09-2025",
          "end_date": "30-09-2025"
        }
      },
      "report_config": {
        "output_directory": "reports/farm1/",
        "wind_farm_name": "Farm 1"
      }
    },
    {
      "farm_id": "farm2",
      "name": "Farm 2",
      "enabled": false,
      "coordinates_file": "/path/to/farm2_coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "api_params": {
        "location_bounds": {
          "method": "manual",
          "center_lat": 36.90,
          "center_lng": 33.575,
          "radius_km": 35
        },
        "date_range": {
          "method": "auto",
          "query_range": {
            "method": "days_back",
            "days": 30
          }
        }
      },
      "report_config": {
        "output_directory": "reports/farm2/",
        "wind_farm_name": "Farm 2"
      }
    }
  ]
}
```
### Custom Risk Parameters
```python
# Adjust risk calculation sensitivity in src/config.py
risk_params = {
    'P_0': 1.5,            # Higher base probability
    'alpha': 0.3,          # Slower distance decay
    'current_weight': 0.2  # Higher current importance
}
```
**Note:** Farm-specific settings (distance_rings, ring_colors, etc.) should be configured in `wind_farms_config.json`, not in `config.py`.
## Risk Score Methodology
### Risk Calculation Formula
The system uses an advanced risk calculation formula:
```
Risk = P₀ × (1 + α×Current/10000) × e^(-α×Distance)
```
Where:
- **P₀**: Base probability (configurable)
- **α**: Distance decay factor (configurable)
- **Current**: Lightning current magnitude in amperes
- **Distance**: Distance from turbine in kilometers
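A direct transcription of the formula into Python is sketched below. Several points are assumptions rather than documented behavior: the function name `risk_score` is hypothetical, the current is taken as an absolute value (the sample data carries negative currents), and the default parameter values are placeholders. The formula as printed uses α in both the current term and the distance term, while `risk_params` also defines a `current_weight`; the optional argument lets either reading be tried.

```python
import math


def risk_score(current_a, distance_km, p0=1.0, alpha=0.5, current_weight=None):
    """Risk = P0 * (1 + w * |Current| / 10000) * exp(-alpha * Distance).

    With current_weight=None, w falls back to alpha, matching the formula
    as printed. Defaults are placeholders, not the project's values.
    """
    weight = alpha if current_weight is None else current_weight
    current_term = 1 + weight * abs(current_a) / 10000  # assumption: magnitude only
    distance_term = math.exp(-alpha * distance_km)       # exponential decay with distance
    return p0 * current_term * distance_term


# A -15 kA strike 2 km from a turbine: 1.0 * (1 + 0.5 * 1.5) * e^(-1.0)
example = risk_score(-15000, 2.0)
```

The score grows linearly with current magnitude and decays exponentially with distance, so a nearby moderate strike can outrank a distant strong one.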
### Risk Score Interpretation
The risk score heatmap provides a visual reference for interpreting risk levels:
- **X-axis**: Lightning current magnitude (1,000 to 300,000 amperes)
- **Y-axis**: Distance from turbine (0.1 km to max distance ring, dynamically scaled)
- **Color intensity**: Risk score level (blue to red gradient; palette hex codes listed from the red end to the blue end: F94144, F3722C, F8961E, F9C74F, 90BE6D, 43AA8B, 577590)
- **Contour curves**: Specific risk level boundaries (0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.5)
### API Integration
The system integrates with the Tarla.io API for automated data fetching:
**Endpoints:**
- Lightning data: `https://risk.tarla.io/api/lightning-data/historical/`
- Storm data: `https://risk.tarla.io/api/storm-data/historical/`
**Authentication:**
- API key stored in `.env` file as `API_KEY`
- Sent as `x-api-key` header in requests
**Request Format:**
- Query type: `circle` (center + radius)
- Parameters: `centerLatitude`, `centerLongitude`, `radius` (in meters), `startDate`, `endDate`
- Date format: `YYYY-MM-DD`
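The request pieces described above can be assembled as in the following sketch. It only builds the URL, header, and query parameters and sends nothing. The function name `build_lightning_request` is hypothetical, and input dates are assumed to arrive in the config's `DD-MM-YYYY` format. How the `circle` query type is passed is not specified here, so the sketch leaves it out.

```python
from datetime import datetime

BASE_URL = "https://risk.tarla.io/api"


def build_lightning_request(api_key, center_lat, center_lng, radius_km,
                            start_date, end_date):
    """Return (url, headers, params) for a historical lightning query (hypothetical helper)."""

    def to_api_date(value):
        # Config dates are DD-MM-YYYY; the API expects YYYY-MM-DD
        return datetime.strptime(value, "%d-%m-%Y").strftime("%Y-%m-%d")

    url = f"{BASE_URL}/lightning-data/historical/"
    headers = {"x-api-key": api_key}
    params = {
        "centerLatitude": center_lat,
        "centerLongitude": center_lng,
        "radius": int(round(radius_km * 1000)),  # the API takes meters
        "startDate": to_api_date(start_date),
        "endDate": to_api_date(end_date),
    }
    return url, headers, params


url, headers, params = build_lightning_request(
    "your_api_key_here", 36.90, 33.575, 17.5, "01-09-2025", "30-09-2025"
)
```

The result can be passed to `requests.get(url, headers=headers, params=params, timeout=30)`, with the timeout taken from `timeout_seconds` in `api_config`.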
**Response Handling:**
- Automatically converts API responses to expected DataFrame format
- Handles empty datasets gracefully
- Validates data structure before processing
## Troubleshooting
### Common Issues
1. **API Authentication Errors (401 Unauthorized)**
   - Verify `.env` file exists with `API_KEY=your_key`
   - Check that API key is correct and active
   - Ensure special characters in the API key are preserved exactly (e.g., a trailing `==`)
2. **API Timeout Errors**
   - Increase `timeout_seconds` in `api_config`
   - Check network connectivity
   - Verify API endpoint is accessible
3. **File Not Found Errors**
   - For batch mode: Verify file paths in `wind_farms_config.json`
   - For single mode: Verify file paths in `src/config.py`
   - Ensure JSON files exist and are readable
4. **Data Validation Errors**
   - Check JSON format matches required structure
   - Verify coordinate values are valid numbers
   - Ensure timestamp format is supported
   - For API data: Check API response format matches expected structure
5. **Empty Data / NaT Errors**
   - System handles empty datasets gracefully
   - Check API date range; data might not exist for the specified period
   - Verify location bounds cover the area of interest
   - Check logs for API response details
6. **Memory Issues with Large Datasets**
   - Use `separate_by_month.py` to split large files
   - Adjust analysis period to smaller time ranges
   - Process farms individually using `--farm-id` flag
7. **DOCX Generation Errors**
   - Ensure sufficient disk space
   - Check write permissions for output directory
8. **Risk Score Heatmap Issues**
   - Verify `distance_rings` configuration is valid
   - Check that lightning data contains valid current values
   - Ensure turbine coordinates are properly formatted
9. **Batch Generation Issues**
   - Check `batch_summary_YYYY-MM-DD.json` for detailed error information
   - Verify all farms have valid configuration
   - Check `batch_generation_YYYY-MM-DD.log` for detailed logs
   - Use `--list-farms` to verify farm configuration
### Logging
**Single Report Mode:**
- `lightning_report.log`: Application execution log
**Batch Generation Mode:**
- `batch_generation_YYYY-MM-DD.log`: Batch execution log with per-farm details
- `batch_summary_YYYY-MM-DD.json`: Structured summary of batch processing
Logs include:
- Data loading progress
- API request/response details
- Risk calculation details
- Error messages and stack traces
- Performance metrics
- Farm processing status
## Performance Considerations
- **Large Datasets**: For datasets with >100,000 lightning strikes, consider:
  - Using date range filtering
  - Splitting data by month
  - Increasing system memory allocation
- **Optimizations used**:
  - BallTree neighbor queries for CG risk scoring (O(n log n) build; sublinear queries)
  - DBSCAN clustering with Haversine metric for grouping; O(N^2) fallback maintained
  - Vectorized Haversine distance utilities (array-based)
  - Shared per-group precomputation of distances and ring indices reused by maps and tables
  - Centralized date/time parsing and formatting
  - Efficient risk score heatmap generation with contour overlay
## Contributing
1. Follow the existing code structure and naming conventions
2. Add appropriate error handling and logging
3. Update configuration options as needed
4. Test with various data formats and sizes
5. Update documentation for new features
6. Maintain consistency with the fixed interval coloring system
## License
This project is proprietary software. All rights reserved.
## Support
For technical support or feature requests, please contact the development team with:
- Detailed error messages
- Sample data (if possible)
- System configuration details
- Expected vs actual behavior description