Fork of Lightning_Report adding: - n8n_report_branch.json: workflow branch for storm-triggered report delivery - report_service/: FastAPI microservice wrapping create_docx_report() so n8n can produce byte-identical reports without fighting the Python Code sandbox Made-with: Cursor
# Lightning Report Generator

A comprehensive Python application for analyzing lightning strike data in relation to wind turbine locations and generating detailed DOCX reports with risk assessments, visualizations, and statistical analysis.

## Overview

This application processes lightning strike data and wind turbine coordinates to:
- Calculate lightning risk scores for each turbine using advanced mathematical models
- Generate interactive maps showing lightning strikes and turbine locations
- Create statistical analysis and histograms with temporal distribution
- Group turbines based on proximity and risk levels
- Generate comprehensive DOCX reports with visualizations and risk assessment charts
- Support storm cell analysis and mapping
- Provide detailed risk score interpretation and calculation methodology

## Features

### Core Analysis
- **Risk Assessment**: Fast per-turbine scoring using BallTree radius queries (Haversine metric) with automatic fallback to vectorized matrix math
- **Advanced Risk Formula**: `Risk = P₀ × (1 + α×Current/10000) × e^(-α×Distance)` with configurable parameters
- **Geospatial Analysis**: Vectorized Haversine utilities and configurable distance rings
- **Statistical Analysis**: Lightning density, frequency, and temporal distribution analysis
- **Daily Lightning Density**: Calculates daily average using actual number of days in date range (not fixed month)
- **Turbine Grouping**: Proximity-based clustering using DBSCAN (Haversine) with graceful fallback to O(N^2) grouping for small datasets

### API Integration
- **Automated Data Fetching**: Fetch lightning and storm data directly from API
- **Flexible Location Bounds**: Auto-calculate center + radius from turbines or specify manually
- **Date Range Management**: Auto-detect actual period from data or use manual date ranges
- **Batch Processing**: Process multiple wind farms in a single run
- **Error Handling**: Graceful handling of empty data, API timeouts, and failures

### Visualization
- **Interactive Maps**: Plotly-based coordinate-plane maps for CG/IC lightning with ring-aware coloring
- **Risk Score Heatmap**: 2D visualization with current magnitude on X-axis (up to 300k amps) and distance on Y-axis, with contour curves
- **Fixed Interval Coloring**: Consistent color gradient mapping (blue to red) based on predefined risk score ranges (0.1-1.5)
- **Lightning Histograms**: Temporal distribution of lightning events with peak detection
- **Storm Cell Maps**: Visualization of storm cell data (when available)
- **Coordinate Plane Views**: Standard geographic orientation (latitude on Y-axis, longitude on X-axis)

### Reporting
- **DOCX Generation**: Word reports (DOCX)
- **Risk Score Chart**: Integrated heatmap showing distance vs. current magnitude relationship
- **Multiple Map Types**: Coordinate plane maps for different lightning types
- **Statistical Tables**: Detailed lightning strike information with proximity data (precomputed distances)
- **Risk Summaries**: Grouped risk analysis and recommendations with fixed interval color coding
- **Enhanced Appendix**: Detailed methodology explanations including risk calculation method, interpretation guide, and algorithm descriptions

### Data Processing
- **JSON Data Loading**: Support for various JSON data structures
- **Date Range Filtering**: Configurable analysis periods
- **Date/Time Formatting**: Centralized, consistent DD-MM-YYYY and DD-MM-YYYY HH:MM:SS formatting
- **Data Validation**: Comprehensive input validation and error handling
- **Precomputation**: Shared per-group distance and ring-index precompute reused by maps and tables
- **Coordinate Conversion**: UTM ED50 to WGS84 coordinate system conversion

## Installation

### Prerequisites
- Python 3.8 or higher
- pip package manager

### Dependencies
Install the required packages:

```bash
pip install -r requirements.txt
```

### Required Packages
- `pandas>=1.5.0` - Data manipulation and analysis
- `numpy>=1.21.0` - Numerical computations
- `plotly>=5.15.0` - Interactive visualizations
- `kaleido>=0.2.1` - Static image export for Plotly
- `scikit-learn>=1.3.0` - BallTree radius queries and DBSCAN clustering (used when available)
- `requests>=2.31.0` - API HTTP requests
- `python-dotenv>=1.0.0` - Environment variable management
- `python-docx>=1.1.2` - DOCX (Word) report generation

### Optional Dependencies
For coordinate conversion functionality:
```bash
pip install -r utm_converter_requirements.txt
```

## Configuration

The application supports two modes of operation:

### 1. Single Report Generation (Legacy Mode)

Uses `src/config.py` for configuration. See the legacy section below for details.

### 2. Batch Report Generation (Recommended)

Uses `wind_farms_config.json` for multi-farm batch processing with API integration.

#### Setup

1. **Create `.env` file** with your API key:
   ```env
   API_KEY=your_api_key_here
   ```

2. **Create `wind_farms_config.json`**:
   ```json
   {
     "api_config": {
       "base_url": "https://risk.tarla.io/api",
       "timeout_seconds": 30,
       "retry_attempts": 3,
       "default_query_range": {
         "method": "current_month"
       }
     },
     "output_base_directory": "reports/",
     "default_padding_km": 5,
     "wind_farms": [
       {
         "farm_id": "dagpazari_RES",
         "name": "Dağpazarı RES",
         "enabled": true,
         "coordinates_file": "/path/to/coordinates.json",
         "distance_rings": [1000, 2000, 3000, 4000, 10000],
         "ring_colors": ["purple", "red", "orange", "coral", "green"],
         "api_params": {
           "location_bounds": {
             "method": "auto",
             "padding_km": 5
           },
           "date_range": {
             "method": "auto",
             "query_range": {
               "method": "current_month"
             }
           }
         },
         "report_config": {
           "output_directory": "reports/dagpazari_RES/",
           "wind_farm_name": "Dağpazarı RES"
         }
       }
     ]
   }
   ```

#### Configuration Parameters

**Farm-Level Settings:**
- `enabled`: `true`/`false` - Enable/disable report generation for this farm
- `distance_rings`: Array of distance rings in meters (e.g., `[1000, 2000, 3000, 4000, 10000]`)
- `ring_colors`: Array of colors for each ring
- `coordinates_file`: Path to turbine coordinates JSON file

**Location Bounds:**
- `method`: `"auto"` (calculate from turbines) or `"manual"` (specify)
- `padding_km`: Extra buffer beyond max distance ring (default: 5km)
- For manual: provide `center_lat`, `center_lng`, `radius_km`

**Date Range:**
- `method`: `"auto"` (detect from data) or `"manual"` (specify)
- For manual: provide `start_date` and `end_date` in `DD-MM-YYYY` format
- For auto: specify `query_range` to control API query period

**Query Range Options (for auto mode):**
- `"current_month"`: First day of current month to today
- `"last_month"`: Entire previous month
- `"days_back"`: Last N days (requires `days` parameter)
- `"custom"`: Specific dates (requires `start_date` and `end_date`)

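The query range options above can be illustrated with a small date resolver. This is a minimal sketch, not the project's code: `resolve_query_range` is a hypothetical helper, the `days_back` window is assumed to run from `today - days` to today, and `custom` dates are assumed to use the same `DD-MM-YYYY` format as manual date ranges.

```python
from datetime import date, datetime, timedelta


def resolve_query_range(query_range, today=None):
    """Return (start, end) dates for a `query_range` block (hypothetical helper)."""
    today = today or date.today()
    method = query_range["method"]
    if method == "current_month":
        # First day of the current month up to today
        return today.replace(day=1), today
    if method == "last_month":
        # Last day of the previous month, then back to its first day
        end = today.replace(day=1) - timedelta(days=1)
        return end.replace(day=1), end
    if method == "days_back":
        # Assumption: the window is [today - days, today]
        return today - timedelta(days=query_range["days"]), today
    if method == "custom":
        fmt = "%d-%m-%Y"  # Assumption: DD-MM-YYYY, as for manual date ranges
        return (
            datetime.strptime(query_range["start_date"], fmt).date(),
            datetime.strptime(query_range["end_date"], fmt).date(),
        )
    raise ValueError(f"Unknown query_range method: {method!r}")
```

Passing `today` explicitly keeps the function deterministic, which is convenient when checking month boundaries.
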
#### Global Configuration (src/config.py)

The `src/config.py` file now only contains global defaults:
- Risk calculation parameters (`risk_params`)
- Histogram parameters (`histogram_params`)
- PDF layout parameters (`pdf_params`)
- Grouping parameters (`grouping_params`)

**Note:** Farm-specific settings (distance_rings, ring_colors, wind_farm_name, file paths, date ranges) are managed in `wind_farms_config.json` and should NOT be configured in `config.py`.

### Location Bounds Auto-Calculation

When `location_bounds.method = "auto"`, the system calculates:

1. **Centroid (Center Point)**:
   - `center_lat` = average of all turbine latitudes
   - `center_lng` = average of all turbine longitudes

2. **Maximum Distance from Centroid**:
   - Calculates distance from centroid to each turbine
   - Finds the maximum distance

3. **Total Radius**:
   ```
   radius_km = (max_turbine_distance / 1000) +
               (max_distance_ring / 1000) +
               padding_km
   ```

   Example: If turbines span 2.5km from centroid, max ring is 10km, padding is 5km:
   - Total radius = 2.5 + 10 + 5 = 17.5km

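The three steps above can be sketched in plain Python. The function names (`haversine_m`, `auto_location_bounds`) and the returned dictionary shape are illustrative assumptions; the project itself is described as using vectorized Haversine utilities, which this scalar version does not reproduce.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def auto_location_bounds(turbines, distance_rings_m, padding_km=5.0):
    """Hypothetical helper: centroid plus total query radius in km."""
    # Step 1: centroid as the plain average of latitudes and longitudes
    center_lat = sum(t["lat"] for t in turbines) / len(turbines)
    center_lng = sum(t["lng"] for t in turbines) / len(turbines)
    # Step 2: farthest turbine from the centroid, in meters
    max_turbine_m = max(
        haversine_m(center_lat, center_lng, t["lat"], t["lng"]) for t in turbines
    )
    # Step 3: turbine spread + largest ring + padding, all in km
    radius_km = max_turbine_m / 1000 + max(distance_rings_m) / 1000 + padding_km
    return {"center_lat": center_lat, "center_lng": center_lng, "radius_km": radius_km}
```

With a single turbine the spread term is zero, so a 10 km outer ring and 5 km padding give a 15 km radius.
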
### Date Range Handling

- If `date_range.method = "auto"`: Uses `query_range` to determine what dates to fetch; the report uses those query dates for the analyzed period.
- If `date_range.method = "manual"`: Uses specified `start_date` and `end_date` for both API fetch and report (supports `DD-MM-YYYY` or ISO with time, e.g. `2026-01-22T07:00:00Z`).

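Accepting both manual date formats could look like the sketch below. `parse_config_date` is a hypothetical name, and trying `DD-MM-YYYY` first is an assumption about ordering, not a description of the project's parser.

```python
from datetime import datetime


def parse_config_date(value):
    """Parse 'DD-MM-YYYY' or an ISO 8601 timestamp such as '2026-01-22T07:00:00Z'."""
    try:
        return datetime.strptime(value, "%d-%m-%Y")
    except ValueError:
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
        # so it is rewritten as an explicit UTC offset first.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

Note that the `DD-MM-YYYY` branch returns a naive datetime while the ISO branch with `Z` returns a timezone-aware one; a real implementation would need to normalize the two before comparing them.
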
### Daily Lightning Density Calculation

The daily lightning density is calculated using the **actual number of days** in the analysis period:

```
daily_lightning_per_km2 = total_lightning_per_km2 / actual_days_in_range
```

Where `actual_days_in_range` is calculated from the start and end dates (inclusive).

**Example:**
- Date range: September 1-15 (15 days)
- Total lightning density: 150 events/km²
- Daily lightning density: 150 / 15 = 10 events/km²/day

This ensures accurate daily averages for partial months or custom date ranges.

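The inclusive day count is the only subtle part of this calculation, so here is a minimal sketch of it. The function name is hypothetical; the arithmetic follows the formula above.

```python
from datetime import date


def daily_lightning_density(total_per_km2, start, end):
    """Divide the total density by the inclusive number of days in [start, end]."""
    days = (end - start).days + 1  # +1 because both endpoints count
    if days <= 0:
        raise ValueError("end date must not be before start date")
    return total_per_km2 / days


# The example from the text: September 1-15 with 150 events/km²
print(daily_lightning_density(150, date(2025, 9, 1), date(2025, 9, 15)))  # 10.0
```
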
### Risk Score Categories
The system uses fixed interval coloring based on specific risk score ranges:
- **Very Low Risk (<0.1)**: Blue - Distant lightning with low current
- **Low Risk (0.1-0.2)**: Teal - Moderate distance lightning
- **Med-Low Risk (0.2-0.4)**: Green - Closer lightning
- **Medium Risk (0.4-0.6)**: Yellow - Moderate risk lightning
- **Med-High Risk (0.6-0.8)**: Orange - High risk lightning
- **High Risk (0.8-1.0)**: Dark Orange - Very high risk lightning
- **Very High Risk (1.0-1.2)**: Red - Extreme risk lightning
- **Critical Risk (>1.2)**: Dark Red - Critical risk lightning

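A fixed-interval lookup like the one listed above can be expressed with `bisect`. This is a sketch only: the list does not say which band a score exactly on a boundary belongs to, so the code assumes each band includes its lower bound. The names `UPPER_BOUNDS`, `CATEGORIES`, and `risk_category` are illustrative.

```python
from bisect import bisect_right

# Upper edges of every band except the open-ended "Critical" band
UPPER_BOUNDS = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
CATEGORIES = [
    ("Very Low", "Blue"),
    ("Low", "Teal"),
    ("Med-Low", "Green"),
    ("Medium", "Yellow"),
    ("Med-High", "Orange"),
    ("High", "Dark Orange"),
    ("Very High", "Red"),
    ("Critical", "Dark Red"),
]


def risk_category(score):
    """Return (label, color) for a risk score.

    Assumption: a score equal to a boundary falls into the higher band.
    """
    return CATEGORIES[bisect_right(UPPER_BOUNDS, score)]
```

Because the thresholds are fixed rather than derived from the data, the same score maps to the same color in every report.
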
### Grouping vs Analysis Radius
- **grouping_params.max_distance_m (meters)**: Controls ONLY turbine clustering (grouping). If set (>0), it overrides ring-based grouping. Used to decide which turbines are in the same group.
- **grouping_params.distance_ring_index (0-based)**: Selects a ring from `distance_rings`.
  - For grouping: used only if `max_distance_m` is not set; determines grouping radius.
  - For analysis (histogram, stats, report labels): ALWAYS used to choose the analysis radius/cutoff. Does not change grouping when `max_distance_m` is provided.

**Examples:**
- If `max_distance_m=2500` and `distance_ring_index=4` (10 km ring):
  - Grouping radius = 2.5 km (from max_distance_m)
  - Analysis radius = 10 km (from distance_ring_index)
- If `max_distance_m` unset and `distance_ring_index=1` (2 km ring):
  - Grouping radius = 2 km
  - Analysis radius = 2 km

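The precedence between the two parameters can be summarized in a few lines. `resolve_radii` is a hypothetical helper written for illustration; it only encodes the rules stated above.

```python
def resolve_radii(distance_rings_m, distance_ring_index, max_distance_m=None):
    """Return (grouping_radius_m, analysis_radius_m)."""
    # The analysis radius always comes from the selected ring
    analysis_m = distance_rings_m[distance_ring_index]
    # The grouping radius prefers max_distance_m when it is set and positive
    if max_distance_m is not None and max_distance_m > 0:
        grouping_m = max_distance_m
    else:
        grouping_m = analysis_m
    return grouping_m, analysis_m


rings = [1000, 2000, 3000, 4000, 10000]
print(resolve_radii(rings, 4, max_distance_m=2500))  # (2500, 10000)
print(resolve_radii(rings, 1))                       # (2000, 2000)
```
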
**Clustering Algorithm:**
- Preferred: DBSCAN with Haversine metric
  - Convert lat/lng to radians; `eps = (radius_km / 6371)`, `min_samples=1`
  - Clusters are formed transitively (density reachability). Example with R=2 km: A–B=1.5 km, B–C=1.5 km, A–C=3.0 km → one cluster {A,B,C} due to B bridging A and C
- Fallback: Greedy O(N^2) proximity grouping if scikit-learn is unavailable
  - Starts a group at turbine i; adds any j within R of i; moves on. No transitive chaining

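The difference between transitive and greedy grouping is easiest to see on the A, B, C example. The sketch below deliberately avoids scikit-learn: with `min_samples=1` every point is a core point, so DBSCAN's clusters coincide with the connected components of the "within radius" graph, which `transitive_groups` computes with a small union-find. Both function names are illustrative, not the project's API.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) pairs in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def transitive_groups(points, radius_km):
    """Connected components of the within-radius graph (DBSCAN-like, min_samples=1)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= radius_km:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def greedy_groups(points, radius_km):
    """Seed a group at the first unassigned point and add only its direct neighbors."""
    unassigned = list(range(len(points)))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [j for j in unassigned if haversine_km(points[seed], points[j]) <= radius_km]
        unassigned = [j for j in unassigned if j not in members]
        groups.append(members)
    return groups


# A, B, C spaced 1.5 km apart along a meridian, so A-C is 3.0 km
step = math.degrees(1.5 / EARTH_RADIUS_KM)
turbines = [(36.0, 33.0), (36.0 + step, 33.0), (36.0 + 2 * step, 33.0)]
print(transitive_groups(turbines, 2.0))  # [[0, 1, 2]]
print(greedy_groups(turbines, 2.0))      # [[0, 1], [2]]
```

The two strategies can therefore produce different groups for the same input, which matters when comparing reports generated with and without scikit-learn installed.
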
### Wind Farm Configuration
```python
wind_farm_name = "Your Wind Farm Name"
```

## Usage

### Batch Report Generation (Recommended)

Generate reports for multiple wind farms automatically:

```bash
# Process all enabled farms
python batch_generate.py --config wind_farms_config.json

# Process specific farm
python batch_generate.py --config wind_farms_config.json --farm-id dagpazari_RES

# List farms and their enabled status
python batch_generate.py --config wind_farms_config.json --list-farms

# Process all farms (ignore enabled flag)
python batch_generate.py --config wind_farms_config.json --force-all
```

The batch system will:
1. Load configuration from `wind_farms_config.json`
2. For each enabled farm:
   - Load turbine coordinates
   - Auto-calculate location bounds (center + radius) from turbines
   - Determine date range for API query
   - Fetch lightning data from API
   - Fetch storm data from API
   - Calculate risk scores
   - Generate DOCX report
   - Save to farm's output directory
3. Generate batch summary report

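How the `--farm-id` and `--force-all` flags might narrow the farm list can be sketched as below. `select_farms` is a hypothetical function, not taken from `batch_generate.py`. Two behaviors are assumptions: that an explicit farm ID selects the farm regardless of its `enabled` flag, and that a farm without an `enabled` key is treated as disabled.

```python
def select_farms(config, farm_id=None, force_all=False):
    """Pick the farms to process from a parsed wind_farms_config.json dict."""
    farms = config.get("wind_farms", [])
    if farm_id is not None:
        # Assumption: an explicit ID wins over the enabled flag
        return [f for f in farms if f["farm_id"] == farm_id]
    if force_all:
        return list(farms)
    # Assumption: a missing "enabled" key means the farm is skipped
    return [f for f in farms if f.get("enabled", False)]


config = {
    "wind_farms": [
        {"farm_id": "farm1", "enabled": True},
        {"farm_id": "farm2", "enabled": False},
    ]
}
print([f["farm_id"] for f in select_farms(config)])                  # ['farm1']
print([f["farm_id"] for f in select_farms(config, force_all=True)])  # ['farm1', 'farm2']
```
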
### Single Report Generation (Legacy)

Run the main application for a single report:
```bash
python main.py
```

The application will:
1. Load lightning and turbine data from configured JSON files (in `src/config.py`)
2. Calculate risk scores for each turbine using the advanced risk formula
3. Create turbine groups based on proximity
4. Generate visualizations including the new risk score heatmap
5. Create a comprehensive DOCX report with enhanced appendix

### Data Format Requirements

#### Lightning Data JSON
```json
{
  "data": [
    {
      "lat": 39.85420,
      "lng": 26.71218,
      "local_time": "2025-07-15T14:30:25",
      "current": -15000,
      "p_type": "0",
      "height": 5000
    }
  ]
}
```

**Required Fields:**
- `lat`, `lng`: Lightning strike coordinates
- `local_time`: Timestamp (various formats supported)
- `current`: Lightning current in amperes
- `p_type`: Lightning type ("0" for cloud-to-ground, others for intercloud)

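A pre-flight check for the required lightning fields might look like the following sketch. The function names and the shape of the returned problem list are assumptions made for illustration; the project's own validation in `src/data/loader.py` may differ. Coordinates are assumed to be numeric.

```python
REQUIRED_FIELDS = ("lat", "lng", "local_time", "current", "p_type")


def validate_lightning_records(payload):
    """Return a list of (index, message) tuples; an empty list means no problems found."""
    problems = []
    for i, rec in enumerate(payload.get("data", [])):
        missing = [k for k in REQUIRED_FIELDS if k not in rec]
        if missing:
            problems.append((i, f"missing fields: {', '.join(missing)}"))
            continue
        if not (-90 <= rec["lat"] <= 90 and -180 <= rec["lng"] <= 180):
            problems.append((i, "coordinates out of range"))
    return problems


def is_cloud_to_ground(rec):
    """p_type "0" marks a cloud-to-ground strike, per the field list above."""
    return str(rec["p_type"]) == "0"
```

Collecting all problems instead of raising on the first one makes it easier to report every malformed record in a single log entry.
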
#### Turbine Data JSON
```json
[
  {
    "lat": 39.85420,
    "lng": 26.71218,
    "turbine_id": "T001"
  }
]
```

**Required Fields:**
- `lat`, `lng`: Turbine coordinates
- `turbine_id`: Unique turbine identifier

### Advanced Usage

#### Coordinate Conversion
Convert UTM ED50 coordinates to WGS84:
```bash
python utm_ed50_to_wgs84_converter.py input.csv output.csv
```

#### Data Separation by Month
Separate large JSON files by month:
```bash
python separate_by_month.py input_data.json [output_directory]
```

## Output

### DOCX Report Structure
1. **Cover Page**: Wind farm information and analysis period
2. **Report Summary**: Automated narrative summary (Gemini-backed when available)
3. **Risk Analysis**: Detailed risk scores and rankings with fixed interval coloring
4. **Lightning Maps**: Coordinate plane visualizations with proper geographic orientation
5. **Statistical Analysis**: Lightning density and frequency data
6. **Detailed Tables**: Complete lightning strike information with color-coded distance rings
7. **Storm Analysis**: Storm cell data and maps (if available)
8. **Enhanced Appendix**: Comprehensive methodology including:
   - Risk calculation method and formula explanation
   - Risk score interpretation guide
   - Centroid and distance ring calculation methodology
   - Turbine grouping algorithm description
   - Frequent lightning activity period detection algorithm

### Generated Files

**Single Report Mode:**
- `lightning_report.log`: Application execution log
- `{wind_farm_name}_lightning_report.docx`: Main DOCX report
- Interactive HTML maps (temporary files)

**Batch Generation Mode:**
- `batch_generation_YYYY-MM-DD.log`: Batch execution log
- `batch_summary_YYYY-MM-DD.json`: Batch processing summary
- `{farm_id}_report.docx`: DOCX report for each farm (in respective output directories)

## Project Structure

```
lightning_report/
├── main.py                          # Single report generation (legacy)
├── batch_generate.py                # Batch report generation with API
├── wind_farms_config.json           # Batch configuration file
├── .env                             # API credentials (gitignored)
├── requirements.txt                 # Python dependencies
├── src/
│   ├── config.py                    # Global configuration defaults
│   ├── api/
│   │   └── data_fetcher.py          # API integration for data fetching
│   ├── data/
│   │   └── loader.py                # Data loading and validation
│   ├── analysis/
│   │   ├── geospatial.py            # Distance calculations (vectorized Haversine)
│   │   ├── grouping.py              # Turbine grouping (DBSCAN + fallback)
│   │   ├── histogram.py             # Temporal analysis
│   │   ├── risk.py                  # Risk calculation (BallTree + fallback)
│   │   └── statistics.py            # Statistical analysis (includes daily density)
│   ├── reporting/
│   │   ├── docx.py                  # DOCX report generation
│   │   ├── docx_sections.py         # Shared DOCX helpers (charts/tables)
│   │   └── precompute.py            # Shared precomputations (distances, ring indices)
│   ├── visualization/
│   │   ├── maps.py                  # Map generation with risk score heatmap
│   │   └── storm_cells.py           # Storm cell visualization
│   └── utils.py                     # Utility functions including fixed interval coloring
├── separate_by_month.py             # Data separation utility
└── utm_ed50_to_wgs84_converter.py   # Coordinate conversion
```

## Configuration Examples

### Batch Generation Setup

**Example: Multiple Farms with Different Settings**
```json
{
  "api_config": {
    "base_url": "https://risk.tarla.io/api",
    "timeout_seconds": 30,
    "retry_attempts": 3
  },
  "wind_farms": [
    {
      "farm_id": "farm1",
      "name": "Farm 1",
      "enabled": true,
      "coordinates_file": "/path/to/farm1_coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "api_params": {
        "location_bounds": {
          "method": "auto",
          "padding_km": 5
        },
        "date_range": {
          "method": "manual",
          "start_date": "01-09-2025",
          "end_date": "30-09-2025"
        }
      },
      "report_config": {
        "output_directory": "reports/farm1/",
        "wind_farm_name": "Farm 1"
      }
    },
    {
      "farm_id": "farm2",
      "name": "Farm 2",
      "enabled": false,
      "coordinates_file": "/path/to/farm2_coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "api_params": {
        "location_bounds": {
          "method": "manual",
          "center_lat": 36.90,
          "center_lng": 33.575,
          "radius_km": 35
        },
        "date_range": {
          "method": "auto",
          "query_range": {
            "method": "days_back",
            "days": 30
          }
        }
      },
      "report_config": {
        "output_directory": "reports/farm2/",
        "wind_farm_name": "Farm 2"
      }
    }
  ]
}
```

### Custom Risk Parameters
```python
# Adjust risk calculation sensitivity in src/config.py
risk_params = {
    'P_0': 1.5,            # Higher base probability
    'alpha': 0.3,          # Slower distance decay
    'current_weight': 0.2  # Higher current importance
}
```

**Note:** Farm-specific settings (distance_rings, ring_colors, etc.) should be configured in `wind_farms_config.json`, not in `config.py`.

## Risk Score Methodology

### Risk Calculation Formula
The system uses an advanced risk calculation formula:
```
Risk = P₀ × (1 + α×Current/10000) × e^(-α×Distance)
```

Where:
- **P₀**: Base probability (configurable)
- **α**: Distance decay factor (configurable)
- **Current**: Lightning current magnitude in amperes
- **Distance**: Distance from turbine in kilometers

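For a single strike, the formula as printed above translates directly into code. This is a sketch under stated assumptions: `risk_score` is a hypothetical name, the default `p0` and `alpha` values are placeholders rather than the project's defaults, and the `current_weight` entry from `risk_params` is left out because the formula as written uses α in both terms. How per-strike scores are combined into a per-turbine score is not covered here.

```python
import math


def risk_score(current_a, distance_km, p0=1.0, alpha=0.5):
    """Risk = P0 * (1 + alpha * |Current| / 10000) * exp(-alpha * Distance).

    current_a is in amperes (the sign is dropped, since the formula uses magnitude);
    distance_km is the strike-to-turbine distance in kilometers.
    p0 and alpha defaults are placeholders for illustration only.
    """
    current_term = 1 + alpha * abs(current_a) / 10_000
    distance_term = math.exp(-alpha * distance_km)
    return p0 * current_term * distance_term


# A -15 kA strike scores lower as it moves away from the turbine
near = risk_score(-15_000, 1.0, p0=1.5, alpha=0.3)
far = risk_score(-15_000, 5.0, p0=1.5, alpha=0.3)
print(near > far)  # True
```

The exponential term dominates at long range, which is why distant strikes land in the lowest risk bands even at high current.
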
### Risk Score Interpretation
The risk score heatmap provides a visual reference for interpreting risk levels:
- **X-axis**: Lightning current magnitude (1,000 to 300,000 amperes)
- **Y-axis**: Distance from turbine (0.1 km to max distance ring, dynamically scaled)
- **Color intensity**: Risk score level (blue to red gradient using palette: F94144, F3722C, F8961E, F9C74F, 90BE6D, 43AA8B, 577590)
- **Contour curves**: Specific risk level boundaries (0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.5)

### API Integration

The system integrates with the Tarla.io API for automated data fetching:

**Endpoints:**
- Lightning data: `https://risk.tarla.io/api/lightning-data/historical/`
- Storm data: `https://risk.tarla.io/api/storm-data/historical/`

**Authentication:**
- API key stored in `.env` file as `API_KEY`
- Sent as `x-api-key` header in requests

**Request Format:**
- Query type: `circle` (center + radius)
- Parameters: `centerLatitude`, `centerLongitude`, `radius` (in meters), `startDate`, `endDate`
- Date format: `YYYY-MM-DD`

**Response Handling:**
- Automatically converts API responses to expected DataFrame format
- Handles empty datasets gracefully
- Validates data structure before processing

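The request format above can be assembled without touching the network, which is useful for checking units before a real call. This sketch makes several assumptions: the parameters are sent as a query string (the HTTP method is not stated in this README), the name of the parameter that carries the `circle` query type is unknown and therefore omitted, and `build_historical_request` is a hypothetical helper rather than part of `src/api/data_fetcher.py`.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://risk.tarla.io/api"


def build_historical_request(kind, api_key, center_lat, center_lng, radius_km, start, end):
    """Return (url, headers) for a historical query.

    kind is "lightning-data" or "storm-data", matching the endpoints listed above.
    """
    params = {
        "centerLatitude": center_lat,
        "centerLongitude": center_lng,
        "radius": int(round(radius_km * 1000)),  # the API expects meters
        "startDate": start.isoformat(),          # YYYY-MM-DD
        "endDate": end.isoformat(),
    }
    url = f"{BASE_URL}/{kind}/historical/?{urlencode(params)}"
    headers = {"x-api-key": api_key}
    return url, headers


url, headers = build_historical_request(
    "lightning-data", "your_api_key_here", 36.9, 33.575, 17.5,
    date(2025, 9, 1), date(2025, 9, 30),
)
print(url)
```

The km-to-meters conversion is the step most worth double-checking, since the location bounds are expressed in kilometers while the `radius` parameter is in meters.
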
## Troubleshooting

### Common Issues

1. **API Authentication Errors (401 Unauthorized)**
   - Verify `.env` file exists with `API_KEY=your_key`
   - Check that API key is correct and active
   - Ensure special characters in the API key are copied intact (e.g., a trailing `==`)

2. **API Timeout Errors**
   - Increase `timeout_seconds` in `api_config`
   - Check network connectivity
   - Verify API endpoint is accessible

3. **File Not Found Errors**
   - For batch mode: Verify file paths in `wind_farms_config.json`
   - For single mode: Verify file paths in `src/config.py`
   - Ensure JSON files exist and are readable

4. **Data Validation Errors**
   - Check JSON format matches required structure
   - Verify coordinate values are valid numbers
   - Ensure timestamp format is supported
   - For API data: Check API response format matches expected structure

5. **Empty Data / NaT Errors**
   - System handles empty datasets gracefully
   - Check API date range - data might not exist for specified period
   - Verify location bounds cover the area of interest
   - Check logs for API response details

6. **Memory Issues with Large Datasets**
   - Use `separate_by_month.py` to split large files
   - Adjust analysis period to smaller time ranges
   - Process farms individually using `--farm-id` flag

7. **DOCX Generation Errors**
   - Ensure sufficient disk space
   - Check write permissions for output directory

8. **Risk Score Heatmap Issues**
   - Verify `distance_rings` configuration is valid
   - Check that lightning data contains valid current values
   - Ensure turbine coordinates are properly formatted

9. **Batch Generation Issues**
   - Check `batch_summary_YYYY-MM-DD.json` for detailed error information
   - Verify all farms have valid configuration
   - Check `batch_generation_YYYY-MM-DD.log` for detailed logs
   - Use `--list-farms` to verify farm configuration

### Logging

**Single Report Mode:**
- `lightning_report.log`: Application execution log

**Batch Generation Mode:**
- `batch_generation_YYYY-MM-DD.log`: Batch execution log with per-farm details
- `batch_summary_YYYY-MM-DD.json`: Structured summary of batch processing

Logs include:
- Data loading progress
- API request/response details
- Risk calculation details
- Error messages and stack traces
- Performance metrics
- Farm processing status

## Performance Considerations

- **Large Datasets**: For datasets with >100,000 lightning strikes, consider:
  - Using date range filtering
  - Splitting data by month
  - Increasing system memory allocation

- **Optimizations used**:
  - BallTree neighbor queries for CG risk scoring (O(n log n) build; sublinear queries)
  - DBSCAN clustering with Haversine metric for grouping; O(N^2) fallback maintained
  - Vectorized Haversine distance utilities (array-based)
  - Shared per-group precomputation of distances and ring indices reused by maps and tables
  - Centralized date/time parsing and formatting
  - Efficient risk score heatmap generation with contour overlay

## Contributing

1. Follow the existing code structure and naming conventions
2. Add appropriate error handling and logging
3. Update configuration options as needed
4. Test with various data formats and sizes
5. Update documentation for new features
6. Maintain consistency with the fixed interval coloring system

## License

This project is proprietary software. All rights reserved.

## Support

For technical support or feature requests, please contact the development team with:
- Detailed error messages
- Sample data (if possible)
- System configuration details
- Expected vs actual behavior description