Initial import: Lightning_Report with n8n integration

Fork of Lightning_Report adding:
- n8n_report_branch.json: workflow branch for storm-triggered report delivery
- report_service/: FastAPI microservice wrapping create_docx_report() so n8n can produce byte-identical reports without fighting the Python Code sandbox

Made-with: Cursor

commit 45d80dfaa6

.gitignore (new file, 9 lines)
```
/distance_analysis.html
/firtina_sorgulama_2025-07-22_2025-04-01.xlsx
/turbine_report.pdf
/yildirim_simsek_sorgulama-2025-07-22_2025-04-01.xlsx
/~$yildirim_simsek_sorgulama-2025-07-22_2025-04-01.xlsx
.env
*.log
__pycache__/
*.pyc
```

BATCH_GENERATION_README.md (new file, 145 lines)

# Batch Report Generation Guide

## Overview

The batch generation system automates lightning report generation for multiple wind farms. It fetches each farm's data from the API and generates all of the reports in a single batch run.

## Setup

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Create .env File

Create a `.env` file in the project root that contains your API key:

```env
API_KEY=your_api_key_here
```

### 3. Create Configuration File

Copy `wind_farms_config.example.json` to `wind_farms_config.json`, then configure your wind farms:

```bash
cp wind_farms_config.example.json wind_farms_config.json
```

Edit `wind_farms_config.json` with your farms' details.

## Configuration

### API Configuration

```json
{
  "api_config": {
    "base_url": "https://risk.tarla.io/api",
    "timeout_seconds": 30,
    "retry_attempts": 3,
    "default_query_range": {
      "method": "current_month"
    }
  }
}
```

### Wind Farm Configuration

Each farm can have:

- **enabled**: `true` or `false`. Controls whether a report is generated for the farm.
- **location_bounds.method**: `"auto"` (calculate from turbines) or `"manual"` (specify explicitly)
- **location_bounds.padding_km**: Extra buffer beyond the largest distance ring (default: 5 km)
- **date_range.method**: `"auto"` (fetch using `query_range`, then detect the period from the data) or `"manual"` (specify dates)

### Query Range Options (for auto date_range)

- `"current_month"`: First day of the current month through today
- `"last_month"`: Entire previous month
- `"days_back"`: Last N days (requires the `days` parameter)
- `"custom"`: Specific dates (requires `start_date` and `end_date`)
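Here is a minimal sketch of how the four `query_range` methods could map to a concrete fetch window. The function name `resolve_query_range` is hypothetical. Two behaviors are assumptions rather than documented facts: `days_back` starts exactly `days` days before today, and `custom` dates use the same `DD-MM-YYYY` format as manual date ranges.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Optional, Tuple


def resolve_query_range(query_range: Dict, today: Optional[date] = None) -> Tuple[date, date]:
    """Turn a query_range config entry into an inclusive (start, end) date pair."""
    today = today or date.today()
    method = query_range.get("method", "current_month")
    if method == "current_month":
        # First day of this month through today
        return today.replace(day=1), today
    if method == "last_month":
        # The day before the 1st of this month is the last day of the previous month
        end = today.replace(day=1) - timedelta(days=1)
        return end.replace(day=1), end
    if method == "days_back":
        # Assumption: the window starts exactly `days` days before today
        return today - timedelta(days=int(query_range["days"])), today
    if method == "custom":
        # Assumption: custom dates use the same DD-MM-YYYY format as manual ranges
        start = datetime.strptime(query_range["start_date"], "%d-%m-%Y").date()
        end = datetime.strptime(query_range["end_date"], "%d-%m-%Y").date()
        return start, end
    raise ValueError(f"unknown query_range method: {method!r}")
```

The API expects `YYYY-MM-DD`, and `date.isoformat()` produces that format directly (for example, `start.isoformat()`).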

## Usage

### Process All Enabled Farms

```bash
python batch_generate.py --config wind_farms_config.json
```

### List Farms

```bash
python batch_generate.py --config wind_farms_config.json --list-farms
```

### Process Specific Farm

```bash
python batch_generate.py --config wind_farms_config.json --farm-id dagpazari_RES
```

### Process All Farms (Ignore Enabled Flag)

```bash
python batch_generate.py --config wind_farms_config.json --force-all
```

### Process Disabled Farm

```bash
python batch_generate.py --config wind_farms_config.json --farm-id disabled_farm --force
```
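Together, the flags above define a simple rule for selecting farms. The sketch below shows one way to express that rule. `select_farms` is a hypothetical helper, not the actual code in `batch_generate.py`, and it assumes that a farm with no `enabled` key is treated as disabled.

```python
from typing import Dict, List, Optional


def select_farms(config: Dict, farm_id: Optional[str] = None,
                 force: bool = False, force_all: bool = False) -> List[Dict]:
    """Return the farm entries a batch run would process (illustrative)."""
    farms = config.get("wind_farms", [])
    if farm_id is not None:
        # --farm-id: restrict the run to one farm
        matches = [f for f in farms if f.get("farm_id") == farm_id]
        if not matches:
            raise KeyError(f"no farm with farm_id={farm_id!r}")
        # A disabled farm is skipped unless --force is given
        return [f for f in matches if force or f.get("enabled", False)]
    if force_all:
        # --force-all: ignore the enabled flag entirely
        return list(farms)
    # Default: only farms with "enabled": true
    return [f for f in farms if f.get("enabled", False)]
```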

## How It Works

### Location Bounds Auto-Calculation

1. **Calculate Centroid**: Average of all turbine coordinates
2. **Find Max Distance**: Maximum distance from the centroid to any turbine
3. **Add Distance Ring**: Add the largest distance ring (e.g., 30 km)
4. **Add Padding**: Add the padding (e.g., 5 km)
5. **Result**: The center is the centroid, and the radius is the max turbine distance plus the largest ring plus the padding

## Output

- Reports are saved to each farm's `output_directory` in the config
- The batch summary is saved to `reports/batch_summary_YYYY-MM-DD.json`
- Log file: `batch_generation_YYYY-MM-DD.log`

## API Endpoints

The system uses:
- Lightning data: `https://risk.tarla.io/api/lightning-data/historical/`
- Storm data: `https://risk.tarla.io/api/storm-data/historical/`

Parameters:
- `queryType=circle`
- `centerLongitude` (longitude first)
- `centerLatitude`
- `radius` (in meters)
- `startDate=YYYY-MM-DD`
- `endDate=YYYY-MM-DD`

## Troubleshooting

### API Key Not Found

Ensure the `.env` file exists and contains `API_KEY=your_key`.

### No Data Fetched

- Check API credentials
- Verify that the date range is correct
- Check that the location bounds cover the area
- Verify the API endpoint URLs

### Farm Skipped

- Check whether `enabled: false` is set in the config
- Use the `--force` flag to process disabled farms

README.md (new file, 646 lines)

# Lightning Report Generator

A comprehensive Python application for analyzing lightning strike data in relation to wind turbine locations and generating detailed DOCX reports with risk assessments, visualizations, and statistical analysis.

## Overview

This application processes lightning strike data and wind turbine coordinates to:
- Calculate a lightning risk score for each turbine from strike distance and current magnitude
- Generate interactive maps showing lightning strikes and turbine locations
- Create statistical analysis and histograms with temporal distribution
- Group turbines based on proximity and risk levels
- Generate comprehensive DOCX reports with visualizations and risk assessment charts
- Support storm cell analysis and mapping
- Provide detailed risk score interpretation and calculation methodology

## Features

### Core Analysis
- **Risk Assessment**: Fast per-turbine scoring using BallTree radius queries (Haversine metric), with automatic fallback to vectorized matrix math
- **Advanced Risk Formula**: `Risk = P₀ × (1 + α×Current/10000) × e^(-α×Distance)` with configurable parameters
- **Geospatial Analysis**: Vectorized Haversine utilities and configurable distance rings
- **Statistical Analysis**: Lightning density, frequency, and temporal distribution analysis
- **Daily Lightning Density**: Calculates the daily average using the actual number of days in the date range (not a fixed month length)
- **Turbine Grouping**: Proximity-based clustering using DBSCAN (Haversine), with graceful fallback to O(N^2) grouping for small datasets

### API Integration
- **Automated Data Fetching**: Fetch lightning and storm data directly from the API
- **Flexible Location Bounds**: Auto-calculate center + radius from turbines, or specify them manually
- **Date Range Management**: Auto-detect the actual period from data, or use manual date ranges
- **Batch Processing**: Process multiple wind farms in a single run
- **Error Handling**: Graceful handling of empty data, API timeouts, and failures

### Visualization
- **Interactive Maps**: Plotly-based coordinate-plane maps for CG/IC lightning with ring-aware coloring
- **Risk Score Heatmap**: 2D visualization with current magnitude on the X-axis (up to 300,000 A) and distance on the Y-axis, with contour curves
- **Fixed Interval Coloring**: Consistent blue-to-red color mapping based on predefined risk score ranges (0.1-1.5)
- **Lightning Histograms**: Temporal distribution of lightning events with peak detection
- **Storm Cell Maps**: Visualization of storm cell data (when available)
- **Coordinate Plane Views**: Standard geographic orientation (latitude on the Y-axis, longitude on the X-axis)

### Reporting
- **DOCX Generation**: Microsoft Word (.docx) reports
- **Risk Score Chart**: Integrated heatmap showing the relationship between distance and current magnitude
- **Multiple Map Types**: Coordinate plane maps for different lightning types
- **Statistical Tables**: Detailed lightning strike information with proximity data (precomputed distances)
- **Risk Summaries**: Grouped risk analysis and recommendations with fixed interval color coding
- **Enhanced Appendix**: Detailed methodology explanations, including the risk calculation method, interpretation guide, and algorithm descriptions

### Data Processing
- **JSON Data Loading**: Support for various JSON data structures
- **Date Range Filtering**: Configurable analysis periods
- **Date/Time Formatting**: Centralized, consistent DD-MM-YYYY and DD-MM-YYYY HH:MM:SS formatting
- **Data Validation**: Comprehensive input validation and error handling
- **Precomputation**: Shared per-group distance and ring-index precomputation, reused by maps and tables
- **Coordinate Conversion**: UTM ED50 to WGS84 coordinate system conversion

## Installation

### Prerequisites
- Python 3.8 or higher
- pip package manager

### Dependencies
Install the required packages:

```bash
pip install -r requirements.txt
```

### Required Packages
- `pandas>=1.5.0` - Data manipulation and analysis
- `numpy>=1.21.0` - Numerical computations
- `plotly>=5.15.0` - Interactive visualizations
- `kaleido>=0.2.1` - Static image export for Plotly
- `scikit-learn>=1.3.0` - BallTree radius queries and DBSCAN clustering (used when available)
- `requests>=2.31.0` - API HTTP requests
- `python-dotenv>=1.0.0` - Environment variable management
- `python-docx>=1.1.2` - DOCX (Word) report generation

### Optional Dependencies
For coordinate conversion functionality:
```bash
pip install -r utm_converter_requirements.txt
```

## Configuration

The application supports two modes of operation:

### 1. Single Report Generation (Legacy Mode)

Uses `src/config.py` for configuration. See the legacy section below for details.

### 2. Batch Report Generation (Recommended)

Uses `wind_farms_config.json` for multi-farm batch processing with API integration.

#### Setup

1. **Create `.env` file** with your API key:
```env
API_KEY=your_api_key_here
```

2. **Create `wind_farms_config.json`**:
```json
{
  "api_config": {
    "base_url": "https://risk.tarla.io/api",
    "timeout_seconds": 30,
    "retry_attempts": 3,
    "default_query_range": {
      "method": "current_month"
    }
  },
  "output_base_directory": "reports/",
  "default_padding_km": 5,
  "wind_farms": [
    {
      "farm_id": "dagpazari_RES",
      "name": "Dağpazarı RES",
      "enabled": true,
      "coordinates_file": "/path/to/coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "ring_colors": ["purple", "red", "orange", "coral", "green"],
      "api_params": {
        "location_bounds": {
          "method": "auto",
          "padding_km": 5
        },
        "date_range": {
          "method": "auto",
          "query_range": {
            "method": "current_month"
          }
        }
      },
      "report_config": {
        "output_directory": "reports/dagpazari_RES/",
        "wind_farm_name": "Dağpazarı RES"
      }
    }
  ]
}
```

#### Configuration Parameters

**Farm-Level Settings:**
- `enabled`: `true`/`false`. Enables or disables report generation for this farm.
- `distance_rings`: Array of distance rings in meters (e.g., `[1000, 2000, 3000, 4000, 10000]`)
- `ring_colors`: Array of colors, one per ring
- `coordinates_file`: Path to the turbine coordinates JSON file

**Location Bounds:**
- `method`: `"auto"` (calculate from turbines) or `"manual"` (specify explicitly)
- `padding_km`: Extra buffer beyond the largest distance ring (default: 5 km)
- For manual: provide `center_lat`, `center_lng`, `radius_km`

**Date Range:**
- `method`: `"auto"` (detect from data) or `"manual"` (specify)
- For manual: provide `start_date` and `end_date` in `DD-MM-YYYY` format
- For auto: specify `query_range` to control the API query period

**Query Range Options (for auto mode):**
- `"current_month"`: First day of the current month through today
- `"last_month"`: Entire previous month
- `"days_back"`: Last N days (requires the `days` parameter)
- `"custom"`: Specific dates (requires `start_date` and `end_date`)

#### Global Configuration (src/config.py)

The `src/config.py` file now contains only global defaults:
- Risk calculation parameters (`risk_params`)
- Histogram parameters (`histogram_params`)
- PDF layout parameters (`pdf_params`)
- Grouping parameters (`grouping_params`)

**Note:** Farm-specific settings (`distance_rings`, `ring_colors`, `wind_farm_name`, file paths, date ranges) are managed in `wind_farms_config.json` and should NOT be configured in `config.py`.

### Location Bounds Auto-Calculation

When `location_bounds.method = "auto"`, the system calculates:

1. **Centroid (Center Point)**:
   - `center_lat` = average of all turbine latitudes
   - `center_lng` = average of all turbine longitudes

2. **Maximum Distance from Centroid**:
   - Calculates the distance from the centroid to each turbine
   - Finds the maximum distance

3. **Total Radius**:
   ```
   radius_km = (max_turbine_distance / 1000) +
               (max_distance_ring / 1000) +
               padding_km
   ```

Example: if the farthest turbine is 2.5 km from the centroid, the largest ring is 10 km, and the padding is 5 km:
- Total radius = 2.5 + 10 + 5 = 17.5 km
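Here is a minimal sketch of the auto-calculation above. It assumes a spherical Earth with a mean radius of 6371 km (the same constant used for the DBSCAN `eps` conversion) and turbine records shaped like the Turbine Data JSON (`lat`, `lng`). The names `auto_location_bounds` and `haversine_m` are illustrative, not the project's API. A plain arithmetic mean of longitudes is only valid away from the ±180° meridian.

```python
import math
from typing import Dict, List, Sequence

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical model)


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two lat/lng points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def auto_location_bounds(turbines: List[Dict[str, float]],
                         distance_rings_m: Sequence[float],
                         padding_km: float = 5.0) -> Dict[str, float]:
    # 1. Centroid: arithmetic mean of turbine latitudes and longitudes
    center_lat = sum(t["lat"] for t in turbines) / len(turbines)
    center_lng = sum(t["lng"] for t in turbines) / len(turbines)
    # 2. Farthest turbine from the centroid, in meters
    max_turbine_distance = max(
        haversine_m(center_lat, center_lng, t["lat"], t["lng"]) for t in turbines
    )
    # 3. radius_km = turbine spread + largest ring + padding (rings are in meters)
    radius_km = max_turbine_distance / 1000 + max(distance_rings_m) / 1000 + padding_km
    return {"center_lat": center_lat, "center_lng": center_lng, "radius_km": radius_km}
```

In the test case, two turbines placed 2.5 km north and south of (40.0, 27.0) reproduce the 17.5 km example.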

### Date Range Handling

- If `date_range.method = "auto"`: `query_range` determines which dates to fetch, and the report uses those query dates as the analyzed period.
- If `date_range.method = "manual"`: the specified `start_date` and `end_date` are used for both the API fetch and the report (supports `DD-MM-YYYY` or ISO 8601 with time, e.g. `2026-01-22T07:00:00Z`).
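To accept both date styles, one approach is to try `DD-MM-YYYY` first and fall back to ISO 8601. This is a sketch with hypothetical function names. It rewrites a trailing `Z` because `datetime.fromisoformat` only accepts `Z` from Python 3.11 onward. This README does not say whether the real fetcher forwards the time of day to the API. The sketch keeps only the date because the API takes `YYYY-MM-DD`.

```python
from datetime import datetime


def parse_report_date(value: str) -> datetime:
    """Parse DD-MM-YYYY or an ISO 8601 timestamp such as 2026-01-22T07:00:00Z."""
    try:
        return datetime.strptime(value, "%d-%m-%Y")
    except ValueError:
        pass
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so rewrite it as +00:00
    iso = value[:-1] + "+00:00" if value.endswith("Z") else value
    return datetime.fromisoformat(iso)


def to_api_date(value: str) -> str:
    """Format either input style as the API's YYYY-MM-DD."""
    return parse_report_date(value).strftime("%Y-%m-%d")
```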

### Daily Lightning Density Calculation

The daily lightning density is calculated using the **actual number of days** in the analysis period:

```
daily_lightning_per_km2 = total_lightning_per_km2 / actual_days_in_range
```

Here, `actual_days_in_range` is calculated from the start and end dates, counting both endpoints.

**Example:**
- Date range: September 1-15 (15 days)
- Total lightning density: 150 events/km²
- Daily lightning density: 150 / 15 = 10 events/km²/day

This ensures accurate daily averages for partial months and custom date ranges.
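The inclusive day count is the only subtle part. September 1 to 15 spans 15 days, not the 14 that a plain date subtraction gives. Here is a minimal sketch (the function name is hypothetical):

```python
from datetime import date


def daily_lightning_density(total_lightning_per_km2: float, start: date, end: date) -> float:
    """Events per km² per day over an inclusive [start, end] date range."""
    actual_days_in_range = (end - start).days + 1  # +1 so both endpoints count
    if actual_days_in_range < 1:
        raise ValueError("end date precedes start date")
    return total_lightning_per_km2 / actual_days_in_range
```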

### Risk Score Categories
The system uses fixed interval coloring based on specific risk score ranges:
- **Very Low Risk (<0.1)**: Blue - Distant lightning with low current
- **Low Risk (0.1-0.2)**: Teal - Moderate distance lightning
- **Med-Low Risk (0.2-0.4)**: Green - Closer lightning
- **Medium Risk (0.4-0.6)**: Yellow - Moderate risk lightning
- **Med-High Risk (0.6-0.8)**: Orange - High risk lightning
- **High Risk (0.8-1.0)**: Dark Orange - Very high risk lightning
- **Very High Risk (1.0-1.2)**: Red - Extreme risk lightning
- **Critical Risk (>1.2)**: Dark Red - Critical risk lightning
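Here is a sketch of the fixed-interval lookup, using `bisect` over the band edges. The README does not say which band a score exactly on a boundary (e.g. 0.2) belongs to. This sketch assumes half-open intervals, so a boundary value falls into the higher band. `risk_category` is a hypothetical name; according to the project structure, the project's own coloring lives in `src/utils.py`.

```python
from bisect import bisect_right
from typing import Tuple

# Upper edges of each band; half-open [lo, hi) intervals are an assumption
BAND_EDGES = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
BANDS = [
    ("Very Low", "Blue"),
    ("Low", "Teal"),
    ("Med-Low", "Green"),
    ("Medium", "Yellow"),
    ("Med-High", "Orange"),
    ("High", "Dark Orange"),
    ("Very High", "Red"),
    ("Critical", "Dark Red"),
]


def risk_category(score: float) -> Tuple[str, str]:
    """Return (category, color) for a risk score."""
    # bisect_right counts how many edges are <= score, which is the band index
    return BANDS[bisect_right(BAND_EDGES, score)]
```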

### Grouping vs Analysis Radius
- **grouping_params.max_distance_m (meters)**: Controls ONLY turbine clustering (grouping). If set (>0), it overrides ring-based grouping. Used to decide which turbines are in the same group.
- **grouping_params.distance_ring_index (0-based)**: Selects a ring from `distance_rings`.
  - For grouping: used only if `max_distance_m` is not set; determines the grouping radius.
  - For analysis (histogram, stats, report labels): ALWAYS used to choose the analysis radius/cutoff. Does not change grouping when `max_distance_m` is provided.

**Examples:**
- If `max_distance_m=2500` and `distance_ring_index=4` (10 km ring):
  - Grouping radius = 2.5 km (from `max_distance_m`)
  - Analysis radius = 10 km (from `distance_ring_index`)
- If `max_distance_m` is unset and `distance_ring_index=1` (2 km ring):
  - Grouping radius = 2 km
  - Analysis radius = 2 km
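The precedence rules above reduce to a few lines of code. This hypothetical `resolve_radii_m` helper uses the `grouping_params` key names from the README and returns `(grouping_radius_m, analysis_radius_m)`:

```python
from typing import Optional, Sequence, Tuple


def resolve_radii_m(distance_rings: Sequence[float],
                    distance_ring_index: int,
                    max_distance_m: Optional[float] = None) -> Tuple[float, float]:
    ring_m = distance_rings[distance_ring_index]  # 0-based ring selection
    # The analysis radius always comes from the selected ring
    analysis_radius_m = ring_m
    # max_distance_m overrides the ring for grouping only when it is set and > 0
    if max_distance_m is not None and max_distance_m > 0:
        grouping_radius_m = max_distance_m
    else:
        grouping_radius_m = ring_m
    return grouping_radius_m, analysis_radius_m
```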

**Clustering Algorithm:**
- Preferred: DBSCAN with the Haversine metric
  - Convert lat/lng to radians; `eps = radius_km / 6371`, `min_samples=1`
  - Clusters are formed transitively (density reachability). Example with R = 2 km: A–B = 1.5 km, B–C = 1.5 km, A–C = 3.0 km → one cluster {A, B, C}, because B bridges A and C
- Fallback: greedy O(N^2) proximity grouping if scikit-learn is unavailable
  - Starts a group at turbine i, adds any j within R of i, then moves on. No transitive chaining
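The difference between the two strategies is easiest to see in the A/B/C example. The sketch below is pure Python, so it runs without scikit-learn. `transitive_groups` computes the connected components of the "within R" graph, which is what DBSCAN with `min_samples=1` yields: every point is a core point, so the clusters are exactly the components. `greedy_groups` follows the fallback description and assumes that turbines already placed in a group are not reconsidered. Both names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lat, lng) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in km on a sphere of radius 6371 km."""
    (lat1, lng1), (lat2, lng2) = a, b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def transitive_groups(points: Sequence[Point], radius_km: float) -> List[List[int]]:
    """Connected components of the 'within radius_km' graph (DBSCAN, min_samples=1)."""
    labels = [-1] * len(points)
    n_groups = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = n_groups
        stack = [start]
        while stack:  # depth-first flood fill chains through neighbours
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and haversine_km(points[i], points[j]) <= radius_km:
                    labels[j] = n_groups
                    stack.append(j)
        n_groups += 1
    groups: List[List[int]] = [[] for _ in range(n_groups)]
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    return groups


def greedy_groups(points: Sequence[Point], radius_km: float) -> List[List[int]]:
    """Greedy O(N^2) fallback: seed a group, add ungrouped points within
    radius_km of the seed only, then move on (no chaining)."""
    ungrouped = list(range(len(points)))
    groups: List[List[int]] = []
    while ungrouped:
        seed = ungrouped.pop(0)
        members = [j for j in ungrouped if haversine_km(points[seed], points[j]) <= radius_km]
        ungrouped = [j for j in ungrouped if j not in members]
        groups.append([seed] + members)
    return groups
```

With scikit-learn installed, the preferred path corresponds to `DBSCAN(eps=radius_km / 6371, min_samples=1, metric="haversine").fit(np.radians(coords)).labels_`, where `coords` is an N×2 array of `[lat, lng]`.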

### Wind Farm Configuration
For legacy single-report mode (`src/config.py`):
```python
wind_farm_name = "Your Wind Farm Name"
```

## Usage

### Batch Report Generation (Recommended)

Generate reports for multiple wind farms automatically:

```bash
# Process all enabled farms
python batch_generate.py --config wind_farms_config.json

# Process specific farm
python batch_generate.py --config wind_farms_config.json --farm-id dagpazari_RES

# List farms and their enabled status
python batch_generate.py --config wind_farms_config.json --list-farms

# Process all farms (ignore enabled flag)
python batch_generate.py --config wind_farms_config.json --force-all
```

The batch system will:
1. Load configuration from `wind_farms_config.json`
2. For each enabled farm:
   - Load turbine coordinates
   - Auto-calculate location bounds (center + radius) from turbines, unless manual bounds are configured
   - Determine the date range for the API query
   - Fetch lightning data from the API
   - Fetch storm data from the API
   - Calculate risk scores
   - Generate the DOCX report
   - Save it to the farm's output directory
3. Generate a batch summary report

### Single Report Generation (Legacy)

Run the main application for a single report:
```bash
python main.py
```

The application will:
1. Load lightning and turbine data from configured JSON files (in `src/config.py`)
2. Calculate risk scores for each turbine using the advanced risk formula
3. Create turbine groups based on proximity
4. Generate visualizations, including the new risk score heatmap
5. Create a comprehensive DOCX report with an enhanced appendix

### Data Format Requirements

#### Lightning Data JSON
```json
{
  "data": [
    {
      "lat": 39.85420,
      "lng": 26.71218,
      "local_time": "2025-07-15T14:30:25",
      "current": -15000,
      "p_type": "0",
      "height": 5000
    }
  ]
}
```

**Required Fields:**
- `lat`, `lng`: Lightning strike coordinates
- `local_time`: Timestamp (various formats supported)
- `current`: Lightning current in amperes
- `p_type`: Lightning type (`"0"` for cloud-to-ground, any other value for intercloud)
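Here is a minimal sketch that uses only the standard library to read this structure and split strikes by type. It assumes the top-level `{"data": [...]}` shape shown above; the loader also supports other layouts, which this sketch does not handle. The sketch compares `p_type` as a string, so both `"0"` and `0` count as cloud-to-ground. The function name is hypothetical.

```python
import json
from typing import Dict, List, Tuple

REQUIRED_FIELDS = ("lat", "lng", "local_time", "current", "p_type")


def split_strikes(payload: str) -> Tuple[List[Dict], List[Dict]]:
    """Parse {"data": [...]} lightning JSON and return (cloud_to_ground, intercloud)."""
    records = json.loads(payload)["data"]
    for n, rec in enumerate(records):
        missing = [k for k in REQUIRED_FIELDS if k not in rec]
        if missing:
            raise ValueError(f"record {n} is missing required fields: {missing}")
    # p_type "0" marks cloud-to-ground; every other value is treated as intercloud
    cg = [r for r in records if str(r["p_type"]) == "0"]
    ic = [r for r in records if str(r["p_type"]) != "0"]
    return cg, ic
```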

#### Turbine Data JSON
```json
[
  {
    "lat": 39.85420,
    "lng": 26.71218,
    "turbine_id": "T001"
  }
]
```

**Required Fields:**
- `lat`, `lng`: Turbine coordinates
- `turbine_id`: Unique turbine identifier

### Advanced Usage

#### Coordinate Conversion
Convert UTM ED50 coordinates to WGS84:
```bash
python utm_ed50_to_wgs84_converter.py input.csv output.csv
```

#### Data Separation by Month
Split large JSON files by month:
```bash
python separate_by_month.py input_data.json [output_directory]
```

## Output

### DOCX Report Structure
1. **Cover Page**: Wind farm information and analysis period
2. **Report Summary**: Automated narrative summary (Gemini-backed when available)
3. **Risk Analysis**: Detailed risk scores and rankings with fixed interval coloring
4. **Lightning Maps**: Coordinate plane visualizations with proper geographic orientation
5. **Statistical Analysis**: Lightning density and frequency data
6. **Detailed Tables**: Complete lightning strike information with color-coded distance rings
7. **Storm Analysis**: Storm cell data and maps (if available)
8. **Enhanced Appendix**: Comprehensive methodology, including:
   - Risk calculation method and formula explanation
   - Risk score interpretation guide
   - Centroid and distance ring calculation methodology
   - Turbine grouping algorithm description
   - Frequent lightning activity period detection algorithm

### Generated Files

**Single Report Mode:**
- `lightning_report.log`: Application execution log
- `{wind_farm_name}_lightning_report.docx`: Main DOCX report
- Interactive HTML maps (temporary files)

**Batch Generation Mode:**
- `batch_generation_YYYY-MM-DD.log`: Batch execution log
- `batch_summary_YYYY-MM-DD.json`: Batch processing summary
- `{farm_id}_report.docx`: DOCX report for each farm (in the respective output directories)

## Project Structure

```
lightning_report/
├── main.py                      # Single report generation (legacy)
├── batch_generate.py            # Batch report generation with API
├── wind_farms_config.json       # Batch configuration file
├── .env                         # API credentials (gitignored)
├── requirements.txt             # Python dependencies
├── src/
│   ├── config.py                # Global configuration defaults
│   ├── api/
│   │   └── data_fetcher.py      # API integration for data fetching
│   ├── data/
│   │   └── loader.py            # Data loading and validation
│   ├── analysis/
│   │   ├── geospatial.py        # Distance calculations (vectorized Haversine)
│   │   ├── grouping.py          # Turbine grouping (DBSCAN + fallback)
│   │   ├── histogram.py         # Temporal analysis
│   │   ├── risk.py              # Risk calculation (BallTree + fallback)
│   │   └── statistics.py        # Statistical analysis (includes daily density)
│   ├── reporting/
│   │   ├── docx.py              # DOCX report generation
│   │   ├── docx_sections.py     # Shared DOCX helpers (charts/tables)
│   │   └── precompute.py        # Shared precomputations (distances, ring indices)
│   ├── visualization/
│   │   ├── maps.py              # Map generation with risk score heatmap
│   │   └── storm_cells.py       # Storm cell visualization
│   └── utils.py                 # Utility functions including fixed interval coloring
├── separate_by_month.py         # Data separation utility
└── utm_ed50_to_wgs84_converter.py  # Coordinate conversion
```

## Configuration Examples

### Batch Generation Setup

**Example: Multiple Farms with Different Settings**
```json
{
  "api_config": {
    "base_url": "https://risk.tarla.io/api",
    "timeout_seconds": 30,
    "retry_attempts": 3
  },
  "wind_farms": [
    {
      "farm_id": "farm1",
      "name": "Farm 1",
      "enabled": true,
      "coordinates_file": "/path/to/farm1_coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "api_params": {
        "location_bounds": {
          "method": "auto",
          "padding_km": 5
        },
        "date_range": {
          "method": "manual",
          "start_date": "01-09-2025",
          "end_date": "30-09-2025"
        }
      },
      "report_config": {
        "output_directory": "reports/farm1/",
        "wind_farm_name": "Farm 1"
      }
    },
    {
      "farm_id": "farm2",
      "name": "Farm 2",
      "enabled": false,
      "coordinates_file": "/path/to/farm2_coordinates.json",
      "distance_rings": [1000, 2000, 3000, 4000, 10000],
      "api_params": {
        "location_bounds": {
          "method": "manual",
          "center_lat": 36.90,
          "center_lng": 33.575,
          "radius_km": 35
        },
        "date_range": {
          "method": "auto",
          "query_range": {
            "method": "days_back",
            "days": 30
          }
        }
      },
      "report_config": {
        "output_directory": "reports/farm2/",
        "wind_farm_name": "Farm 2"
      }
    }
  ]
}
```

### Custom Risk Parameters
```python
# Adjust risk calculation sensitivity in src/config.py
risk_params = {
    'P_0': 1.5,             # Higher base probability
    'alpha': 0.3,           # Slower distance decay
    'current_weight': 0.2,  # Higher current importance
}
```

**Note:** Farm-specific settings (`distance_rings`, `ring_colors`, etc.) should be configured in `wind_farms_config.json`, not in `config.py`.

## Risk Score Methodology

### Risk Calculation Formula
The system uses the following risk calculation formula:
```
Risk = P₀ × (1 + α×Current/10000) × e^(-α×Distance)
```

Where:
- **P₀**: Base probability (configurable)
- **α**: Distance decay factor (configurable); as written, it also scales the current term
- **Current**: Lightning current magnitude in amperes
- **Distance**: Distance from turbine in kilometers
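Here is a direct transcription of the formula for a single strike. Two points are assumptions. First, the current term uses the magnitude `abs(current)`: the **Current** definition above says "magnitude", while raw data such as `"current": -15000` can be negative. Second, the function scores one strike only, because this section does not say how per-strike scores are combined into a turbine score. The `current_weight` key in the Custom Risk Parameters example does not appear in the formula as written, so the sketch does not use it. `strike_risk` is a hypothetical name.

```python
import math


def strike_risk(current_a: float, distance_km: float, p_0: float, alpha: float) -> float:
    """Risk = P0 * (1 + alpha * |Current| / 10000) * exp(-alpha * Distance)."""
    current_term = 1 + alpha * abs(current_a) / 10000  # magnitude of the peak current
    distance_term = math.exp(-alpha * distance_km)     # exponential decay with distance
    return p_0 * current_term * distance_term
```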

### Risk Score Interpretation
The risk score heatmap provides a visual reference for interpreting risk levels:
- **X-axis**: Lightning current magnitude (1,000 to 300,000 amperes)
- **Y-axis**: Distance from turbine (0.1 km to the largest distance ring, scaled dynamically)
- **Color intensity**: Risk score level, shown as a blue-to-red gradient using the palette `#F94144`, `#F3722C`, `#F8961E`, `#F9C74F`, `#90BE6D`, `#43AA8B`, `#577590` (listed from red to blue)
- **Contour curves**: Boundaries at specific risk levels (0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.5)

### API Integration

The system integrates with the Tarla.io API for automated data fetching:

**Endpoints:**
- Lightning data: `https://risk.tarla.io/api/lightning-data/historical/`
- Storm data: `https://risk.tarla.io/api/storm-data/historical/`

**Authentication:**
- The API key is stored in the `.env` file as `API_KEY`
- It is sent as the `x-api-key` header in requests

**Request Format:**
- Query type: `circle` (center + radius)
- Parameters: `centerLatitude`, `centerLongitude`, `radius` (in meters), `startDate`, `endDate`
- Date format: `YYYY-MM-DD`
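Here is a sketch that assembles a request from the documented pieces: the endpoint paths, the `circle` query parameters, the radius in meters, and the `x-api-key` header. `build_request` is a hypothetical helper. It only builds the URL and headers, so it runs offline. Sending the radius as an integer number of meters is an assumption.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

BASE_URL = "https://risk.tarla.io/api"
ENDPOINTS = {
    "lightning": "lightning-data/historical/",
    "storm": "storm-data/historical/",
}


def build_request(kind: str, center_lat: float, center_lng: float, radius_km: float,
                  start_date: str, end_date: str, api_key: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a circle query against one historical endpoint."""
    params = {
        "queryType": "circle",
        "centerLongitude": center_lng,
        "centerLatitude": center_lat,
        "radius": int(round(radius_km * 1000)),  # API radius is in meters (integer assumed)
        "startDate": start_date,  # YYYY-MM-DD
        "endDate": end_date,      # YYYY-MM-DD
    }
    url = f"{BASE_URL}/{ENDPOINTS[kind]}?{urlencode(params)}"
    return url, {"x-api-key": api_key}
```

You could then send the request with `requests.get(url, headers=headers, timeout=30)`, where the timeout mirrors `api_config.timeout_seconds`. With `python-dotenv`, calling `load_dotenv()` and then `os.environ["API_KEY"]` reads the key from `.env`. This README does not document the response schema.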

**Response Handling:**
- Automatically converts API responses to the expected DataFrame format
- Handles empty datasets gracefully
- Validates the data structure before processing

## Troubleshooting

### Common Issues

1. **API Authentication Errors (401 Unauthorized)**
   - Verify that the `.env` file exists with `API_KEY=your_key`
   - Check that the API key is correct and active
   - Ensure the API key is copied in full, including any special characters (e.g., a trailing `==`)

2. **API Timeout Errors**
   - Increase `timeout_seconds` in `api_config`
   - Check network connectivity
   - Verify that the API endpoint is accessible

3. **File Not Found Errors**
   - For batch mode: verify the file paths in `wind_farms_config.json`
   - For single mode: verify the file paths in `src/config.py`
   - Ensure the JSON files exist and are readable

4. **Data Validation Errors**
   - Check that the JSON format matches the required structure
   - Verify that coordinate values are valid numbers
   - Ensure the timestamp format is supported
   - For API data: check that the API response format matches the expected structure

5. **Empty Data / NaT Errors**
   - The system handles empty datasets gracefully
   - Check the API date range; data might not exist for the specified period
   - Verify that the location bounds cover the area of interest
   - Check the logs for API response details

6. **Memory Issues with Large Datasets**
   - Use `separate_by_month.py` to split large files
   - Shorten the analysis period
   - Process farms individually using the `--farm-id` flag

7. **DOCX Generation Errors**
   - Ensure sufficient disk space
   - Check write permissions for the output directory

8. **Risk Score Heatmap Issues**
   - Verify that the `distance_rings` configuration is valid
   - Check that the lightning data contains valid current values
   - Ensure turbine coordinates are properly formatted

9. **Batch Generation Issues**
   - Check `batch_summary_YYYY-MM-DD.json` for detailed error information
   - Verify that all farms have a valid configuration
   - Check `batch_generation_YYYY-MM-DD.log` for detailed logs
   - Use `--list-farms` to verify the farm configuration

### Logging

**Single Report Mode:**
- `lightning_report.log`: Application execution log

**Batch Generation Mode:**
- `batch_generation_YYYY-MM-DD.log`: Batch execution log with per-farm details
- `batch_summary_YYYY-MM-DD.json`: Structured summary of batch processing

Logs include:
- Data loading progress
- API request/response details
- Risk calculation details
- Error messages and stack traces
- Performance metrics
- Farm processing status

## Performance Considerations

- **Large Datasets**: For datasets with more than 100,000 lightning strikes, consider:
  - Using date range filtering
  - Splitting data by month
  - Increasing system memory allocation

- **Optimizations used**:
  - BallTree neighbor queries for CG risk scoring (O(n log n) build; sublinear queries)
  - DBSCAN clustering with the Haversine metric for grouping; O(N^2) fallback maintained
  - Vectorized Haversine distance utilities (array-based)
  - Shared per-group precomputation of distances and ring indices, reused by maps and tables
  - Centralized date/time parsing and formatting
  - Efficient risk score heatmap generation with contour overlay

## Contributing

1. Follow the existing code structure and naming conventions
2. Add appropriate error handling and logging
3. Update configuration options as needed
4. Test with various data formats and sizes
5. Update documentation for new features
6. Maintain consistency with the fixed interval coloring system

## License

This project is proprietary software. All rights reserved.

## Support

For technical support or feature requests, please contact the development team with:
- Detailed error messages
- Sample data (if possible)
- System configuration details
- A description of expected vs. actual behavior

181
README_UTM_Converter.md
Normal file
@ -0,0 +1,181 @@
# UTM ED50 to WGS84 Coordinate Converter

This script converts UTM (Universal Transverse Mercator) coordinates in the ED50 (European Datum 1950) reference system to WGS84 geographic coordinates (latitude/longitude). It supports the standard 6-degree UTM zones and handles both the northern and southern hemispheres.

## Features

- Convert single coordinates interactively
- Batch convert coordinates from CSV files
- Support for all UTM zones (1-60)
- Automatic handling of northern/southern hemispheres
- Error handling and validation
- Detailed conversion statistics

## Installation

1. Install the required dependencies:
```bash
pip install -r utm_converter_requirements.txt
```

Or install manually:
```bash
pip install pyproj pandas
```

## Usage

### Interactive Mode

Run the script in interactive mode to convert single coordinates:

```bash
python utm_ed50_to_wgs84_converter.py --interactive
```

Example session. The WGS84 values shown are placeholders, not real output. Easting 500000 lies on the 27°E central meridian of zone 35, so the actual result for this input is roughly 40.65°N, 27.0°E.
```
UTM ED50 to WGS84 Coordinate Converter
========================================
Enter coordinates (type 'quit' to exit)

Enter easting (meters): 500000
Enter northing (meters): 4500000
Enter UTM zone (1-60): 35
Enter hemisphere (N/S) [default: N]: N

WGS84 Coordinates:
Latitude: 40.12345678°
Longitude: 32.87654321°
----------------------------------------
```

### Batch Conversion from CSV

Convert multiple coordinates from a CSV file:

```bash
python utm_ed50_to_wgs84_converter.py input.csv output.csv
```

#### Input CSV Format

The input CSV should contain the following columns:

| Column | Description | Required | Default |
|--------|-------------|----------|---------|
| `easting` | UTM easting coordinate in meters | Yes | - |
| `northing` | UTM northing coordinate in meters | Yes | - |
| `zone` | UTM zone number (1-60) | Yes | - |
| `northern` | Hemisphere flag (True/False) | No | True |

Example input CSV:
```csv
easting,northing,zone,northern,description
500000,4500000,35,True,Sample point 1
600000,4600000,36,True,Sample point 2
400000,4400000,34,True,Sample point 3
```

#### Output CSV Format

The output CSV will contain all original columns plus:
- `wgs84_lat`: WGS84 latitude in decimal degrees
- `wgs84_lon`: WGS84 longitude in decimal degrees

### Custom Column Names

If your CSV uses different column names, specify them with command-line arguments:

```bash
python utm_ed50_to_wgs84_converter.py input.csv output.csv \
  --easting-col X \
  --northing-col Y \
  --zone-col ZONE \
  --northern-col HEMISPHERE
```

## Technical Details

### Coordinate Systems

- **ED50 (European Datum 1950)**: Historical European geodetic datum, based on the International 1924 ellipsoid
- **WGS84**: World Geodetic System 1984, the current global standard
- **UTM**: Universal Transverse Mercator projection system

### Conversion Process

1. **ED50 UTM → WGS84 UTM**: Transform between datums using pyproj
2. **WGS84 UTM → WGS84 Lat/Lon**: Convert from projected to geographic coordinates

### UTM Zones

The script supports all 60 UTM zones:
- Zones 1-60 cover the globe in 6-degree longitude bands
- Zone 1: 180°W to 174°W
- Zone 60: 174°E to 180°E

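The zone layout above reduces to simple arithmetic. The pure-Python sketch below derives a zone's longitude bounds and its central meridian, which is where easting 500000 m falls. The helper names are hypothetical and not taken from `utm_ed50_to_wgs84_converter.py`. The reverse lookup ignores the Norway/Svalbard zone exceptions.

```python
def utm_zone_bounds(zone: int) -> tuple[float, float]:
    """Return (west, east) longitude bounds in degrees for a standard 6-degree UTM zone."""
    if not 1 <= zone <= 60:
        raise ValueError(f"Invalid UTM zone {zone}: must be between 1 and 60")
    west = -180.0 + 6.0 * (zone - 1)
    return west, west + 6.0


def utm_central_meridian(zone: int) -> float:
    """Longitude in degrees of the zone's central meridian (false easting 500000 m)."""
    west, east = utm_zone_bounds(zone)
    return (west + east) / 2.0


def utm_zone_for_longitude(lon: float) -> int:
    """Standard zone number for a longitude in [-180, 180] (no Norway/Svalbard exceptions)."""
    return min(int((lon + 180.0) // 6.0) + 1, 60)
```

For example, `utm_central_meridian(35)` gives `27.0`, and `utm_zone_bounds(1)` gives `(-180.0, -174.0)`, matching the list above.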
### Accuracy

The conversion accuracy depends on:
- Quality of the original ED50 coordinates
- Geographic location (accuracy varies by region)

For most European locations, results are typically within 1-10 meters.

## Examples

### Example 1: Interactive Conversion

```bash
python utm_ed50_to_wgs84_converter.py --interactive
```

### Example 2: Batch Conversion

```bash
python utm_ed50_to_wgs84_converter.py sample_utm_ed50_data.csv converted_coordinates.csv
```

### Example 3: Custom Column Names

```bash
python utm_ed50_to_wgs84_converter.py data.csv output.csv \
  --easting-col X_COORD \
  --northing-col Y_COORD \
  --zone-col UTM_ZONE
```

## Error Handling

The script handles various error conditions:

- **Invalid UTM zones**: Zone numbers must be between 1 and 60
- **Missing columns**: Reports which required columns are missing
- **Invalid coordinates**: Skips invalid rows and reports warnings
- **File not found**: Clear error messages for missing input files

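The error conditions listed above can be sketched with pandas. This minimal sketch assumes the column names and the `northern` default from the Input CSV Format table. `validate_utm_frame` is a hypothetical helper, not the script's actual function.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

REQUIRED_COLUMNS = ("easting", "northing", "zone")


def validate_utm_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows that are safe to convert, applying the documented defaults."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")

    df = df.copy()
    if "northern" not in df.columns:
        df["northern"] = True  # optional column; defaults to the northern hemisphere

    # Coerce coordinates to numbers; unparsable values become NaN.
    for col in REQUIRED_COLUMNS:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    zone = df["zone"]
    valid = (
        df[list(REQUIRED_COLUMNS)].notna().all(axis=1)
        & zone.between(1, 60)  # NaN compares as False
        & (zone % 1 == 0)      # reject fractional zone numbers
    )
    if (~valid).any():
        logger.warning("Skipping %d invalid row(s): %s", int((~valid).sum()), list(df.index[~valid]))

    out = df[valid].copy()
    out["zone"] = out["zone"].astype(int)
    return out
```

Invalid rows are dropped with a warning rather than aborting the run. This mirrors the "skips invalid rows" behavior described above. A missing required column, by contrast, raises an error immediately.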
## Dependencies

- **pyproj**: Coordinate transformation library
- **pandas**: Data manipulation and CSV handling
- **argparse**: Command-line argument parsing (part of the Python standard library; no installation needed)

## License

This script is provided as-is for educational and practical use.

## Troubleshooting

### Common Issues

1. **"Invalid UTM zone" error**: Ensure zone numbers are between 1 and 60
2. **"Missing required columns" error**: Check your CSV column names, or map them with `--easting-col`, `--northing-col`, `--zone-col` and `--northern-col`
3. **Conversion failures**: Verify coordinate values are numeric
4. **Import errors**: Install the required dependencies with pip

### Getting Help

Run the script with `--help` for command-line options:
```bash
python utm_ed50_to_wgs84_converter.py --help
```
463
batch_generate.py
Normal file
@ -0,0 +1,463 @@
#!/usr/bin/env python3
"""
Batch generate lightning reports for multiple wind farms.
"""

import sys
import json
import argparse
import logging
from datetime import datetime
from pathlib import Path

import pandas as pd
from dotenv import load_dotenv

from src.api.data_fetcher import APIDataFetcher
from src.data.loader import load_turbine_data, load_lightning_data_from_csv
from src.analysis.risk import calculate_turbine_risks
from src.analysis.grouping import create_turbine_groups
from src.reporting.docx import create_docx_report
from src.reporting.filename_utils import farm_local_date_range_from_config, slugify_ascii_underscore
from src.utils import (
    filter_lightning_data_by_date_range,
    format_date_ddmmyyyy,
    format_period_display_for_report,
    normalize_local_time_to_timezone,
)
from src.config import config as global_config

load_dotenv()

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stdout),
        logging.FileHandler(f'batch_generation_{datetime.now().strftime("%Y-%m-%d")}.log')
    ]
)
logger = logging.getLogger(__name__)


def load_wind_farms_config(config_path: str) -> dict:
    """Load wind farms configuration from JSON file."""
    try:
        with open(config_path, 'r', encoding='utf-8') as f:
            config = json.load(f)
        logger.info(f"Loaded configuration from {config_path}")
        return config
    except FileNotFoundError:
        logger.error(f"Configuration file not found: {config_path}")
        raise
    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON in configuration file: {e}")
        raise


def filter_enabled_farms(wind_farms: list) -> tuple:
    """Split farms into (enabled, disabled) lists; a farm without an `enabled` key counts as enabled."""
    enabled = []
    disabled = []

    for farm in wind_farms:
        is_enabled = farm.get('enabled', True)
        if is_enabled:
            enabled.append(farm)
        else:
            disabled.append(farm)

    return enabled, disabled


def get_location_bounds(farm: dict, turbine_df: pd.DataFrame, api_fetcher: APIDataFetcher) -> dict:
    """Get location bounds for API query."""
    location_config = farm['api_params']['location_bounds']

    if location_config['method'] == 'auto':
        max_distance_ring = max(farm['distance_rings'])
        padding_km = location_config.get('padding_km', 5)

        bounds = api_fetcher.calculate_location_bounds(
            turbine_df,
            max_distance_ring,
            padding_km
        )
        return bounds
    else:
        return {
            'center_lat': location_config['center_lat'],
            'center_lng': location_config['center_lng'],
            'radius_km': location_config['radius_km']
        }


def update_global_config(farm: dict, start_date: str = None, end_date: str = None):
    """Update global config with farm-specific settings."""
    global_config.distance_rings = farm.get('distance_rings', global_config.distance_rings)
    global_config.ring_colors = farm.get('ring_colors', global_config.ring_colors)
    # DOCX title is based on the top-level `name` field for the farm.
    global_config.wind_farm_name = farm.get('name', 'Unknown')
    global_config.timezone = farm['report_config'].get('timezone', None)

    # Lightning data source configuration (auto-detected from farm config)
    lightning_source_type = farm.get('lightning_source_type')
    if lightning_source_type:
        global_config.lightning_source_type = lightning_source_type
        if lightning_source_type == 'csv':
            global_config.lightning_csv = farm.get('lightning_csv')
        elif lightning_source_type == 'api':
            global_config.lightning_json = farm.get('lightning_json')

    # Set date range if provided (for reporting)
    if start_date and end_date:
        global_config.analysis_start_date = start_date
        global_config.analysis_end_date = end_date

    # Update grouping params if specified in farm config
    if 'grouping_params' in farm:
        global_config.grouping_params = farm['grouping_params']

    logger.debug(f"Updated global config: distance_rings={global_config.distance_rings}, wind_farm_name={global_config.wind_farm_name}")


def convert_api_response_to_dataframe(records: list, data_type: str = 'lightning') -> pd.DataFrame:
    """
    Convert API response to DataFrame format expected by existing code.

    Args:
        records: List of records from API
        data_type: 'lightning' or 'storm'

    Returns:
        DataFrame in expected format
    """
    if not records:
        if data_type == 'lightning':
            return pd.DataFrame(columns=['lat', 'lng', 'current', 'p_type', 'local_time'])
        else:
            return pd.DataFrame()

    df = pd.DataFrame(records)

    if data_type == 'lightning':
        if 'local_time' not in df.columns and 'timestamp' in df.columns:
            df['local_time'] = pd.to_datetime(df['timestamp'])
        elif 'local_time' in df.columns:
            df['local_time'] = pd.to_datetime(df['local_time'])

        if 'current_abs' not in df.columns and 'current' in df.columns:
            df['current_abs'] = df['current'].abs()

    return df


def process_farm(farm: dict, api_fetcher: APIDataFetcher, config: dict) -> dict:
    """Process a single farm and generate report."""
    farm_id = farm['farm_id']
    farm_name = farm.get('name', farm_id)

    logger.info(f"Processing farm: {farm_id} ({farm_name})")

    try:
        start_time = datetime.now()

        # Update global config with farm-specific settings BEFORE processing
        # (dates will be set later after they're determined)
        update_global_config(farm)

        turbine_file = farm['coordinates_file']
        turbine_df = load_turbine_data(turbine_file)
        logger.info(f"Loaded {len(turbine_df)} turbines")

        location_bounds = get_location_bounds(farm, turbine_df, api_fetcher)

        query_start, query_end = APIDataFetcher.determine_query_date_range(farm, config['api_config'])
        start_date_str = query_start.strftime('%Y-%m-%d')
        end_date_str = query_end.strftime('%Y-%m-%d')

        source_type = farm.get('lightning_source_type', 'api')

        if source_type == 'csv':
            lightning_df = load_lightning_data_from_csv(farm.get('lightning_csv'))
            logger.info(f"Loaded {len(lightning_df)} lightning records from CSV for {farm_id}")
        else:
            logger.info(f"Fetching lightning data from API for period: {start_date_str} to {end_date_str}")
            lightning_records = api_fetcher.fetch_lightning_data(
                center_lat=location_bounds['center_lat'],
                center_lng=location_bounds['center_lng'],
                radius_km=location_bounds['radius_km'],
                start_date=start_date_str,
                end_date=end_date_str
            )
            lightning_df = convert_api_response_to_dataframe(lightning_records, 'lightning')
            logger.info(f"Converted {len(lightning_df)} lightning records to DataFrame")

        if len(lightning_df) == 0:
            logger.warning(f"No lightning data found for {farm_id}")
            lightning_df = pd.DataFrame(columns=['lat', 'lng', 'current', 'p_type', 'local_time', 'current_abs'])

        storm_records = api_fetcher.fetch_storm_data(
            center_lat=location_bounds['center_lat'],
            center_lng=location_bounds['center_lng'],
            radius_km=location_bounds['radius_km'],
            start_date=start_date_str,
            end_date=end_date_str
        )

        date_range_cfg = farm.get('api_params', {}).get('date_range', {})
        start_filter = None
        end_filter = None
        method = date_range_cfg.get('method')

        if source_type != 'csv':
            if method == 'manual':
                start_filter = date_range_cfg.get('start_date')
                end_filter = date_range_cfg.get('end_date')
            else:
                query_range_cfg = date_range_cfg.get('query_range', {})
                start_filter = query_range_cfg.get('start_date')
                end_filter = query_range_cfg.get('end_date')

        if len(lightning_df) > 0 and (start_filter is not None or end_filter is not None):
            lightning_df = filter_lightning_data_by_date_range(lightning_df, start_filter, end_filter)

        farm_tz = farm.get('report_config', {}).get('timezone')
        if len(lightning_df) > 0 and farm_tz:
            lightning_df = normalize_local_time_to_timezone(lightning_df, 'local_time', farm_tz)

        turbine_df = calculate_turbine_risks(turbine_df, lightning_df)

        group_data = create_turbine_groups(turbine_df)
        logger.info(f"Created {group_data['total_groups']} groups")

        # Determine actual dates for report (display strings: DD-MM-YYYY or DD-MM-YYYY HH:MM in local time)
        if source_type == 'csv' and len(lightning_df) > 0:
            local_times = pd.to_datetime(lightning_df['local_time'])
            start_val = local_times.min()
            end_val = local_times.max()
            actual_start = start_val.strftime('%d-%m-%Y %H:%M')
            actual_end = end_val.strftime('%d-%m-%Y %H:%M')
        else:
            if method == 'manual' and date_range_cfg.get('start_date') is not None and date_range_cfg.get('end_date') is not None:
                actual_start, actual_end = format_period_display_for_report(start_filter, end_filter)
                if not actual_start or not actual_end:
                    actual_start = format_date_ddmmyyyy(query_start)
                    actual_end = format_date_ddmmyyyy(query_end)
            elif start_filter is not None and end_filter is not None:
                actual_start, actual_end = format_period_display_for_report(start_filter, end_filter)
                if not actual_start or not actual_end:
                    actual_start = format_date_ddmmyyyy(query_start)
                    actual_end = format_date_ddmmyyyy(query_end)
            else:
                actual_start = format_date_ddmmyyyy(query_start)
                actual_end = format_date_ddmmyyyy(query_end)

        # Update global config with the display dates used in the DOCX report
        update_global_config(farm, actual_start, actual_end)

        output_dir = Path(farm['report_config']['output_directory'])
        output_dir.mkdir(parents=True, exist_ok=True)

        local_range = farm_local_date_range_from_config(farm)
        safe_name = slugify_ascii_underscore(farm.get("name", farm_id))
        docx_filename = (
            f"{safe_name}_{local_range.start_date_yyyy_mm_dd}"
            f"_{local_range.end_date_yyyy_mm_dd}_report.docx"
        )
        docx_path = output_dir / docx_filename

        create_docx_report(
            str(docx_path),
            turbine_df,
            lightning_df,
            storm_data_path=None,
            storm_data_records=storm_records if storm_records else None,
        )

        processing_time = (datetime.now() - start_time).total_seconds()

        logger.info(f"Successfully generated report for {farm_id} in {processing_time:.1f}s")

        return {
            'farm_id': farm_id,
            'name': farm_name,
            'status': 'success',
            'report_path': str(docx_path),
            'docx_path': str(docx_path),
            'pdf_path': None,
            'location': location_bounds,
            'processing_time_seconds': processing_time,
            'lightning_records': len(lightning_df),
            # storm_records may be empty or None; avoid a TypeError after the report is already written
            'storm_records': len(storm_records or [])
        }

    except Exception as e:
        processing_time = (datetime.now() - start_time).total_seconds() if 'start_time' in locals() else 0
        logger.error(f"Failed to process farm {farm_id}: {e}", exc_info=True)

        return {
            'farm_id': farm_id,
            'name': farm_name,
            'status': 'failed',
            'error': str(e),
            'processing_time_seconds': processing_time
        }


def generate_batch_summary(results: list, total_farms: int, enabled_count: int,
                           disabled_count: int, start_time: datetime) -> dict:
    """Generate batch summary report."""
    successful = [r for r in results if r['status'] == 'success']
    failed = [r for r in results if r['status'] == 'failed']
    skipped = disabled_count

    total_time = (datetime.now() - start_time).total_seconds()

    summary = {
        'batch_date': datetime.now().strftime('%Y-%m-%d'),
        'batch_time': datetime.now().strftime('%H:%M:%S'),
        'total_farms': total_farms,
        'enabled_farms': enabled_count,
        'disabled_farms': disabled_count,
        'processed': len(results),
        'successful': len(successful),
        'failed': len(failed),
        'skipped': skipped,
        'processing_time_seconds': total_time,
        'results': results
    }

    return summary


def save_batch_summary(summary: dict, output_dir: str):
    """Save batch summary to JSON file."""
    output_path = Path(output_dir) / f"batch_summary_{summary['batch_date']}.json"
    output_path.parent.mkdir(parents=True, exist_ok=True)

    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(summary, f, indent=2)

    logger.info(f"Batch summary saved to {output_path}")
    return output_path


def list_farms(config: dict):
    """List all farms and their enabled status."""
    print("\nWind Farms Configuration:")
    print("=" * 60)

    for i, farm in enumerate(config['wind_farms'], 1):
        enabled = farm.get('enabled', True)
        status = "✓ Enabled" if enabled else "✗ Disabled"
        farm_id = farm['farm_id']
        name = farm.get('name', 'N/A')

        print(f"{i}. {status}: {farm_id} - {name}")

    enabled, disabled = filter_enabled_farms(config['wind_farms'])
    print(f"\nTotal: {len(config['wind_farms'])} farms ({len(enabled)} enabled, {len(disabled)} disabled)")


def main():
    parser = argparse.ArgumentParser(description='Batch generate lightning reports')
    parser.add_argument('--config', required=True, help='Path to wind_farms_config.json')
    parser.add_argument('--farm-id', help='Process specific farm only')
    parser.add_argument('--force-all', action='store_true',
                        help='Process all farms, ignoring enabled flag')
    parser.add_argument('--force', action='store_true',
                        help='Process even if disabled (use with --farm-id)')
    parser.add_argument('--list-farms', action='store_true',
                        help='List all farms and their enabled status')
    parser.add_argument('--output-dir', help='Override output directory')

    args = parser.parse_args()

    try:
        config = load_wind_farms_config(args.config)

        if args.list_farms:
            list_farms(config)
            return

        api_config = config['api_config']
        api_fetcher = APIDataFetcher(
            base_url=api_config['base_url'],
            timeout=api_config.get('timeout_seconds', 30),
            retry_attempts=api_config.get('retry_attempts', 3)
        )

        wind_farms = config['wind_farms']

        if args.force_all:
            farms_to_process = wind_farms
            logger.info(f"Processing all {len(farms_to_process)} farms (--force-all)")
        else:
            enabled_farms, disabled_farms = filter_enabled_farms(wind_farms)
            farms_to_process = enabled_farms

            if disabled_farms:
                logger.info(f"Skipping {len(disabled_farms)} disabled farms:")
                for farm in disabled_farms:
                    logger.info(f" - {farm['farm_id']}: {farm.get('name', 'N/A')}")

        if args.farm_id:
            # --force lets an explicitly requested farm run even when it is disabled.
            candidates = wind_farms if args.force else farms_to_process
            farms_to_process = [f for f in candidates if f['farm_id'] == args.farm_id]
            if not farms_to_process:
                logger.error(f"Farm '{args.farm_id}' not found or not enabled")
                return

        if not farms_to_process:
            logger.warning("No farms to process")
            return

        if args.output_dir:
            # Apply --output-dir to every selected farm without mutating the loaded config.
            farms_to_process = [
                {**f, 'report_config': {**f.get('report_config', {}), 'output_directory': args.output_dir}}
                for f in farms_to_process
            ]

        logger.info(f"Processing {len(farms_to_process)} farm(s)")
        start_time = datetime.now()

        results = []
        for i, farm in enumerate(farms_to_process, 1):
            logger.info(f"\n[{i}/{len(farms_to_process)}] Processing {farm['farm_id']}...")
            result = process_farm(farm, api_fetcher, config)
            results.append(result)

        enabled_list, disabled_list = filter_enabled_farms(wind_farms)
        summary = generate_batch_summary(
            results,
            len(wind_farms),
            len(enabled_list),
            len(disabled_list),
            start_time
        )

        output_base = args.output_dir or config.get('output_base_directory', 'reports/')
        save_batch_summary(summary, output_base)

        print("\n" + "=" * 60)
        print("Batch Processing Summary")
        print("=" * 60)
        print(f"Total farms: {summary['total_farms']}")
        print(f"Enabled: {summary['enabled_farms']}")
        print(f"Disabled: {summary['disabled_farms']}")
        print(f"Processed: {summary['processed']}")
        print(f"Successful: {summary['successful']}")
        print(f"Failed: {summary['failed']}")
        print(f"Total time: {summary['processing_time_seconds']:.1f}s")
        print("=" * 60)

        if summary['failed'] > 0:
            print("\nFailed farms:")
            for result in [r for r in results if r['status'] == 'failed']:
                print(f" - {result['farm_id']}: {result.get('error', 'Unknown error')}")

    except KeyboardInterrupt:
        logger.info("Batch processing interrupted by user")
        sys.exit(1)
    except Exception as e:
        logger.error(f"Batch processing failed: {e}", exc_info=True)
        sys.exit(1)


if __name__ == '__main__':
    main()
BIN
logo/earth_networks.jpg
Normal file
Binary file not shown.
Size: 75 KiB
BIN
logo/iklim.png
Normal file
Binary file not shown.
Size: 19 KiB
84
main.py
Normal file
@ -0,0 +1,84 @@
import sys
import argparse
import logging
from typing import Any

from src.api.data_fetcher import APIDataFetcher

import batch_generate

# Set up logging.
# batch_generate configures the root logger when it is imported, which would turn this
# basicConfig call into a no-op; force=True (Python 3.8+) replaces those handlers so
# that lightning_report.log is actually written.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stdout),
        logging.FileHandler('lightning_report.log')
    ],
    force=True,
)
logger = logging.getLogger(__name__)


def main():
    """Generate DOCX reports (from wind_farms_config.json)."""
    try:
        parser = argparse.ArgumentParser(description="DOCX lightning report generation (from wind_farms_config.json)")
        parser.add_argument("--config", default="wind_farms_config.json", help="Path to wind_farms_config.json")
        parser.add_argument("--farm-id", default=None, help="farm_id to process (if omitted, process enabled farms)")
        parser.add_argument("--force", action="store_true", help="Process even if farm is disabled")
        args = parser.parse_args()

        logger.info("Starting DOCX report generation...")
        config = batch_generate.load_wind_farms_config(args.config)
        farms = config.get("wind_farms", [])

        api_cfg = config["api_config"]
        api_fetcher = APIDataFetcher(
            base_url=api_cfg["base_url"],
            timeout=api_cfg.get("timeout_seconds", 30),
            retry_attempts=api_cfg.get("retry_attempts", 3),
        )

        if args.farm_id:
            farms_to_process = [f for f in farms if f.get("farm_id") == args.farm_id]
            if not farms_to_process:
                logger.error(f"Farm '{args.farm_id}' not found in {args.config}")
                sys.exit(1)
        else:
            farms_to_process = farms

        results: list[dict[str, Any]] = []
        for idx, farm in enumerate(farms_to_process, 1):
            farm_id = farm.get("farm_id")
            name = farm.get("name", farm_id)
            if not args.force and not farm.get("enabled", True):
                logger.info(f"[{idx}/{len(farms_to_process)}] Skipping disabled farm: {farm_id} ({name})")
                continue

            logger.info(f"[{idx}/{len(farms_to_process)}] Processing farm: {farm_id} ({name})")
            result = batch_generate.process_farm(farm, api_fetcher, config)
            results.append(result)

            if result.get("status") != "success":
                logger.error(f"Report generation failed for {farm_id}: {result.get('error')}")
                # Keep going if doing batch; stop early for single farm.
                if args.farm_id:
                    sys.exit(1)

        for r in results:
            if r.get("status") == "success":
                logger.info(f"✅ DOCX report saved as {r.get('docx_path')}")

        # Only claim success when at least one farm ran and none failed.
        failed = [r for r in results if r.get("status") != "success"]
        if not results:
            logger.warning("No farms were processed (none configured, or all selected farms are disabled; use --force to include disabled farms)")
        elif failed:
            logger.warning(f"Lightning report generation finished with {len(failed)} failed farm(s)")
        else:
            logger.info("Lightning report generation completed successfully!")

    except FileNotFoundError as e:
        logger.error(f"File not found: {e}")
        sys.exit(1)
    except ValueError as e:
        logger.error(f"Data validation error: {e}")
        sys.exit(1)
    except Exception as e:
        logger.error(f"Unexpected error: {e}", exc_info=True)
        sys.exit(1)


if __name__ == '__main__':
    main()
197
n8n_report_branch.json
Normal file
@ -0,0 +1,197 @@
{
"name": "Lightning Report Branch (paste into Lightning_Report_Automatic)",
"nodes": [
{
"parameters": {
"assignments": {
"assignments": [
{ "id": "rpt-cid", "name": "customer_id", "value": "={{ $('Loop Over Items').item.json.id }}", "type": "string" },
{ "id": "rpt-cname", "name": "customer_name", "value": "={{ $('Loop Over Items').item.json.customer_name }}", "type": "string" },
{ "id": "rpt-tz", "name": "timezone", "value": "={{ $('Loop Over Items').item.json.timezone || 'Europe/Istanbul' }}", "type": "string" },
{ "id": "rpt-clat", "name": "centroid_lat", "value": "={{ $('Centroid & Distance Ring calculation').item.json.centroid_latitude }}", "type": "number" },
{ "id": "rpt-clon", "name": "centroid_lon", "value": "={{ $('Centroid & Distance Ring calculation').item.json.centroid_longitude }}", "type": "number" },
{ "id": "rpt-bnd", "name": "boundary_m", "value": "={{ $('Centroid & Distance Ring calculation').item.json.monitoring_boundary_m }}","type": "number" },
{ "id": "rpt-rings", "name": "rings", "value": "={{ $('Centroid & Distance Ring calculation').item.json.rings }}", "type": "object" },
{ "id": "rpt-rcolors", "name": "ring_colors", "value": "={{ [\"#B71C1C\", \"#F94144\", \"#F8961E\", \"#90BE6D\"] }}", "type": "array" },
{ "id": "rpt-ts", "name": "t_start", "value": "={{ $('Logic Gate').item.json.tStart }}", "type": "number" },
{ "id": "rpt-te", "name": "t_end", "value": "={{ $('Logic Gate').item.json.tLast }}", "type": "number" },
{ "id": "rpt-ns", "name": "n_strikes", "value": "={{ $('Logic Gate').item.json.allStrikes.length }}", "type": "number" },
{ "id": "rpt-strikes", "name": "strikes", "value": "={{ $('Logic Gate').item.json.allStrikes }}", "type": "array" },
{ "id": "rpt-turbines", "name": "turbines", "value": "={{ $('Get Customer Wind Turbines').all().map(t => t.json) }}", "type": "array" }
]
},
"options": {}
},
"type": "n8n-nodes-base.set",
"typeVersion": 3.4,
"position": [3440, 480],
"id": "a1a1a1a1-0001-4a01-8a01-000000000001",
"name": "Report: Gather Inputs"
},
{
"parameters": {
"jsCode": "const crypto = require('crypto');\nconst HMAC_SECRET = 'c88f845bd6d520ded507ef6b02efc223019ccf68f41d9070705712d480ba5166';\nconst URI = '/v1/thunderstorms/within';\n\nconst ctx = $('Report: Gather Inputs').item.json;\nconst auth = $('Restore Credentials').all().pop().json;\n\nif (!ctx.t_start || !ctx.t_end) {\n throw new Error('Missing storm timestamps from Logic Gate.');\n}\n\nconst durationSeconds = Math.max(600, Math.floor((ctx.t_end - ctx.t_start) / 1000));\nconst timestamp = Date.now().toString();\n\nconst bodyPayload = {\n latitude: Number(Number(ctx.centroid_lat).toFixed(6)),\n longitude: Number(Number(ctx.centroid_lon).toFixed(6)),\n radius: parseInt(ctx.boundary_m, 10),\n backwardInterval: durationSeconds,\n endTimeEpoch: Number(ctx.t_end),\n intersectsWith: 'THREAT_POLYGON',\n pageNumber: 0,\n pageSize: 100\n};\n\nconst bodyString = JSON.stringify(bodyPayload);\nconst dataToSign = `POST|${URI}|${timestamp}|${bodyString}`;\nconst signature = crypto.createHmac('sha256', HMAC_SECRET).update(dataToSign).digest('hex').toLowerCase();\n\nreturn [{\n json: {\n requestBody: bodyPayload,\n headers: {\n 'X-Signature': signature,\n 'X-Timestamp': timestamp,\n 'X-Nonce': crypto.randomUUID(),\n 'X-Idempotency-Key': crypto.randomUUID(),\n 'Authorization': 'Bearer ' + auth.accessToken,\n 'Content-Type': 'application/json'\n }\n }\n}];"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [3680, 480],
"id": "a2a2a2a2-0002-4a02-8a02-000000000002",
"name": "Report: Calc Thunderstorm Headers"
},
{
"parameters": {
"method": "POST",
"url": "https://api-test.iklim.co/v1/thunderstorms/within",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{ "name": "Authorization", "value": "={{ $json.headers.Authorization }}" },
{ "name": "X-Signature", "value": "={{ $json.headers['X-Signature'] }}" },
{ "name": "X-Timestamp", "value": "={{ $json.headers['X-Timestamp'] }}" },
{ "name": "X-Nonce", "value": "={{ $json.headers['X-Nonce'] }}" },
{ "name": "X-Idempotency-Key", "value": "={{ $json.headers['X-Idempotency-Key'] }}" },
{ "name": "Content-Type", "value": "={{ $json.headers['Content-Type'] }}" },
{ "name": "Accept", "value": "={{ $json.headers['Content-Type'] }}" }
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ $json.requestBody }}",
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.3,
"position": [3920, 480],
"id": "a3a3a3a3-0003-4a03-8a03-000000000003",
"name": "Report: Fetch Thunderstorms",
"onError": "continueRegularOutput"
},
{
"parameters": {
"method": "POST",
"url": "=https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key={{ $env.GEMINI_API_KEY }}",
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ { contents: [ { parts: [ { text: 'Sen bir yildirim risk analisti raportorusun. Asagidaki verileri kullanarak Turkce, 2-3 paragraf, profesyonel bir ozet yaz. Sayilari dogal dilde ver; asiri teknik olma; santral adini kullan; sure, toplam darbe ve yogunluk hakkinda yorum yap.\\n\\nSantral: ' + $('Report: Gather Inputs').item.json.customer_name + '\\nZaman dilimi: ' + $('Report: Gather Inputs').item.json.timezone + '\\nFirtina baslangici (epoch ms): ' + $('Report: Gather Inputs').item.json.t_start + '\\nFirtina bitisi (epoch ms): ' + $('Report: Gather Inputs').item.json.t_end + '\\nToplam yildirim darbesi: ' + $('Report: Gather Inputs').item.json.n_strikes + '\\nIzleme yaricapi (m): ' + $('Report: Gather Inputs').item.json.boundary_m } ] } ] } }}",
|
||||
"options": {}
|
||||
},
|
||||
"type": "n8n-nodes-base.httpRequest",
|
||||
"typeVersion": 4.3,
|
||||
"position": [4160, 480],
|
||||
"id": "a4a4a4a4-0004-4a04-8a04-000000000004",
|
||||
"name": "Report: Gemini Commentary",
|
||||
"onError": "continueRegularOutput"
|
||||
},
|
||||
{
|
||||
"parameters": {
|
||||
"assignments": {
|
||||
"assignments": [
|
||||
{ "id": "bp-cid", "name": "customer_id", "value": "={{ $('Report: Gather Inputs').item.json.customer_id }}", "type": "string" },
|
||||
{ "id": "bp-cname", "name": "customer_name", "value": "={{ $('Report: Gather Inputs').item.json.customer_name }}", "type": "string" },
|
||||
{ "id": "bp-tz", "name": "timezone", "value": "={{ $('Report: Gather Inputs').item.json.timezone }}", "type": "string" },
|
||||
{ "id": "bp-clat", "name": "centroid_lat", "value": "={{ $('Report: Gather Inputs').item.json.centroid_lat }}", "type": "number" },
|
||||
{ "id": "bp-clon", "name": "centroid_lon", "value": "={{ $('Report: Gather Inputs').item.json.centroid_lon }}", "type": "number" },
|
||||
{ "id": "bp-bnd", "name": "boundary_m", "value": "={{ $('Report: Gather Inputs').item.json.boundary_m }}", "type": "number" },
|
||||
{ "id": "bp-rings", "name": "rings", "value": "={{ $('Report: Gather Inputs').item.json.rings }}", "type": "object" },
|
||||
{ "id": "bp-rcolors", "name": "ring_colors", "value": "={{ $('Report: Gather Inputs').item.json.ring_colors }}", "type": "array" },
|
||||
{ "id": "bp-ts", "name": "t_start", "value": "={{ $('Report: Gather Inputs').item.json.t_start }}", "type": "number" },
|
||||
{ "id": "bp-te", "name": "t_end", "value": "={{ $('Report: Gather Inputs').item.json.t_end }}", "type": "number" },
|
||||
{ "id": "bp-ns", "name": "n_strikes", "value": "={{ $('Report: Gather Inputs').item.json.n_strikes }}", "type": "number" },
|
||||
{ "id": "bp-strikes", "name": "strikes", "value": "={{ $('Report: Gather Inputs').item.json.strikes }}", "type": "array" },
|
||||
{ "id": "bp-turbines", "name": "turbines", "value": "={{ $('Report: Gather Inputs').item.json.turbines }}", "type": "array" },
|
||||
{ "id": "bp-gem", "name": "gemini_text", "value": "={{ $('Report: Gemini Commentary').item.json?.candidates?.[0]?.content?.parts?.[0]?.text || '' }}", "type": "string" },
|
||||
{ "id": "bp-storms", "name": "storm_records", "value": "={{ $('Report: Fetch Thunderstorms').item.json?.thunderstorms || $('Report: Fetch Thunderstorms').item.json?.data || [] }}", "type": "array" }
|
||||
]
|
||||
},
|
||||
"options": {}
|
||||
},
|
||||
"type": "n8n-nodes-base.set",
|
||||
"typeVersion": 3.4,
|
||||
"position": [4320, 480],
|
||||
"id": "a8a8a8a8-0008-4a08-8a08-000000000008",
|
||||
"name": "Report: Build Payload"
|
||||
},
|
||||
{
|
||||
"parameters": {
|
||||
"method": "POST",
|
||||
"url": "={{ $env.REPORT_SERVICE_URL || 'http://report-service:8000' }}/generate",
|
||||
"sendHeaders": true,
|
||||
"headerParameters": {
|
||||
"parameters": [
|
||||
{ "name": "Content-Type", "value": "application/json" },
|
||||
{ "name": "Accept", "value": "application/vnd.openxmlformats-officedocument.wordprocessingml.document" }
|
||||
]
|
||||
},
|
||||
"sendBody": true,
|
||||
"specifyBody": "json",
|
||||
"jsonBody": "={{ $json }}",
|
||||
"options": {
|
||||
"timeout": 300000,
|
||||
"response": {
|
||||
"response": {
|
||||
"responseFormat": "file",
|
||||
"outputPropertyName": "report"
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"type": "n8n-nodes-base.httpRequest",
|
||||
"typeVersion": 4.3,
|
||||
"position": [4560, 480],
|
||||
"id": "a5a5a5a5-0005-4a05-8a05-000000000005",
|
||||
"name": "Report: Generate DOCX"
|
||||
},
|
||||
{
|
||||
"parameters": {
|
||||
"resource": "file",
|
||||
"operation": "upload",
|
||||
"binaryData": true,
|
||||
"binaryPropertyName": "report",
|
||||
"channelId": {
|
||||
"__rl": true,
|
||||
"value": "REPLACE_WITH_USER_ID",
|
||||
"mode": "id",
|
||||
"cachedResultName": "DM target user"
|
||||
},
|
||||
"options": {
|
||||
"fileName": "={{ $binary.report.fileName || ($('Report: Build Payload').item.json.customer_name + '_report.docx') }}",
|
||||
"initialComment": "={{ '⚡ ' + $('Report: Build Payload').item.json.customer_name + ' — yeni firtina raporu (' + $('Report: Build Payload').item.json.n_strikes + ' darbe) — rapor ekte' }}"
|
||||
}
|
||||
},
|
||||
"type": "n8n-nodes-base.slack",
|
||||
"typeVersion": 2.4,
|
||||
"position": [4800, 480],
|
||||
"id": "a6a6a6a6-0006-4a06-8a06-000000000006",
|
||||
"name": "Report: Send to User",
|
||||
"credentials": {
|
||||
"slackApi": {
|
||||
"id": "OKgM8VkM05pJl9kU",
|
||||
"name": "Tarla Slack Account"
|
||||
}
|
||||
}
|
||||
}
|
||||
],
|
||||
"pinData": {},
|
||||
"connections": {
|
||||
"Report: Gather Inputs": {
|
||||
"main": [[{ "node": "Report: Calc Thunderstorm Headers", "type": "main", "index": 0 }]]
|
||||
},
|
||||
"Report: Calc Thunderstorm Headers": {
|
||||
"main": [[{ "node": "Report: Fetch Thunderstorms", "type": "main", "index": 0 }]]
|
||||
},
|
||||
"Report: Fetch Thunderstorms": {
|
||||
"main": [[{ "node": "Report: Gemini Commentary", "type": "main", "index": 0 }]]
|
||||
},
|
||||
"Report: Gemini Commentary": {
|
||||
"main": [[{ "node": "Report: Build Payload", "type": "main", "index": 0 }]]
|
||||
},
|
||||
"Report: Build Payload": {
|
||||
"main": [[{ "node": "Report: Generate DOCX", "type": "main", "index": 0 }]]
|
||||
},
|
||||
"Report: Generate DOCX": {
|
||||
"main": [[{ "node": "Report: Send to User", "type": "main", "index": 0 }]]
|
||||
}
|
||||
},
|
||||
"active": false,
|
||||
"settings": { "executionOrder": "v1" }
|
||||
}
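
The `Report: Calc Thunderstorm Headers` node signs each call to `/v1/thunderstorms/within` with HMAC-SHA256 over `POST|<uri>|<timestamp>|<body>`. To debug a rejected signature outside n8n, the same computation can be sketched in Python. This is a sketch under assumptions: that the server verifies exactly the string the Code node builds, and that `json.dumps` with compact separators matches `JSON.stringify` for this payload (insertion-ordered keys, no whitespace, plain numbers). `sign_request` is a hypothetical helper, not part of the repository, and the `Authorization: Bearer <token>` header from the token step is not included.

```python
from __future__ import annotations

import hashlib
import hmac
import json
import time
import uuid


def sign_request(secret: str, uri: str, body: dict, timestamp_ms: int | None = None) -> dict:
    """Recompute the signature headers built by the n8n Code node (sketch)."""
    timestamp = str(timestamp_ms if timestamp_ms is not None else int(time.time() * 1000))
    # Compact separators mimic JSON.stringify; dicts keep insertion order like JS objects.
    body_string = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    data_to_sign = f"POST|{uri}|{timestamp}|{body_string}"
    # hexdigest() is already lowercase, matching .digest('hex').toLowerCase().
    signature = hmac.new(secret.encode("utf-8"), data_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "body_string": body_string,
        "headers": {
            "X-Signature": signature,
            "X-Timestamp": timestamp,
            "X-Nonce": str(uuid.uuid4()),
            "X-Idempotency-Key": str(uuid.uuid4()),
            "Content-Type": "application/json",
        },
    }
```

Send `body_string` as the raw request body (for example `data=` in `requests.post`) rather than re-serializing the dict, so the bytes on the wire are exactly the bytes that were signed.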
BIN
qgis/2025_07_dagpazari_RES.qgz
Normal file
Binary file not shown.
36
report_service/Dockerfile
Normal file
@ -0,0 +1,36 @@
FROM python:3.12-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Matplotlib, Kaleido (Plotly -> PNG) and python-docx transitively need a handful
# of system libs. Installing them here is cheaper than figuring out per-import errors later.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        libfreetype6-dev \
        libjpeg62-turbo-dev \
        libpng-dev \
        libxml2-dev \
        libxslt1-dev \
        zlib1g-dev \
        fonts-dejavu-core \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt ./project-requirements.txt
COPY report_service/requirements.txt ./service-requirements.txt
RUN pip install --no-cache-dir -r project-requirements.txt -r service-requirements.txt

COPY . .

ENV PYTHONPATH=/app
EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8000/health', timeout=3).status == 200 else 1)"

CMD ["uvicorn", "report_service.main:app", "--host", "0.0.0.0", "--port", "8000"]
97
report_service/README.md
Normal file
@ -0,0 +1,97 @@
# Lightning Report Service

Tiny FastAPI wrapper around `create_docx_report()`. It lets the n8n workflow produce reports that are **byte-identical** to the CLI (`batch_generate.py`) output, since both go through the same code path.

## Why this exists

n8n 2.x's self-hosted Python Code node runs in a sandbox that strips common builtins (`hasattr`, `getattr`, etc.) and restricts imports, making it impossible to host the 3,500+ line reporting pipeline in a node. This service runs outside the sandbox and is called over HTTP.

## Endpoints

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/health` | Liveness probe |
| `POST` | `/generate` | Accepts the `Report: Build Payload` JSON, returns a DOCX binary |

### Request body for `/generate`

Mirrors the `Report: Build Payload` Set node (`...` marks elided values). `gemini_text` and `storm_records` are optional:

```json
{
  "customer_name": "Example Wind Farm",
  "timezone": "Europe/Istanbul",
  "centroid_lat": 40.5,
  "centroid_lon": 29.7,
  "boundary_m": 20000,
  "rings": { "r1": 2000, "r2": 4000, "r3": 6000, "r4": 8000 },
  "ring_colors": ["#B71C1C", "#F94144", "#F8961E", "#90BE6D"],
  "t_start": 1729000000000,
  "t_end": 1729010000000,
  "n_strikes": 1234,
  "strikes": [ { "latitude": ..., "longitude": ..., "peakCurrent": ..., "type": "0", "captured": "2026-04-22T13:00:00Z" } ],
  "turbines": [ { "name": "T1", "latitude": ..., "longitude": ..., "unit_power_mwm": 3.6, ... } ],
  "gemini_text": "optional: used verbatim if non-empty; otherwise the service calls Gemini or falls back",
  "storm_records": [ { "cell_polygon_wkt": "POLYGON(...)", "lightning_severity": "medium", "effective_time": "...", "expire_time": "..." } ]
}
```

The adapter is forgiving about column names: it accepts `lat`/`latitude`, `lng`/`longitude`/`lon`, `current`/`peakCurrent`/`peak_current`, `p_type`/`type`, `local_time`/`captured`/`timestamp`.
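
To check a strike record before sending it, the alias lookup can be approximated on a single dict. This sketch copies the lightning aliases and the matching rule from `report_service/adapter.py` (case-insensitive, an exact canonical key takes precedence, otherwise the first matching candidate wins) but works on plain dicts instead of a DataFrame and keeps only the canonical fields. `normalize_strike` is a hypothetical helper, not part of the service.

```python
# Copied from _LIGHTNING_COLUMN_ALIASES in report_service/adapter.py.
STRIKE_ALIASES: dict[str, list[str]] = {
    "lat": ["lat", "latitude"],
    "lng": ["lng", "longitude", "lon", "long"],
    "current": ["current", "peak_current", "peakCurrent", "amplitude", "amp"],
    "p_type": ["p_type", "ptype", "type", "flash_type"],
    "local_time": ["local_time", "localtime", "time", "timestamp", "captured", "datetime", "date_time"],
}


def normalize_strike(record: dict) -> dict:
    """Map one strike dict onto the canonical field names (sketch)."""
    lower_to_actual = {str(k).lower(): k for k in record}
    used: set[str] = set()
    out: dict = {}
    for target, candidates in STRIKE_ALIASES.items():
        if target in record:  # exact canonical key wins
            out[target] = record[target]
            used.add(target)
            continue
        for candidate in candidates:  # otherwise first case-insensitive match
            src = lower_to_actual.get(candidate.lower())
            if src is not None and src not in used:
                out[target] = record[src]
                used.add(src)
                break
    missing = [t for t in STRIKE_ALIASES if t not in out]
    if missing:
        raise ValueError(f"missing required field(s): {missing}")
    return out
```

A record that only carries `lat` and `lng` raises `ValueError`, which is the same condition the service reports as HTTP 422.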

### Response

- `Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document`
- `Content-Disposition: attachment; filename="<slug>_<start>_<end>_report.docx"`, where `<start>` and `<end>` are the storm window in the payload's `timezone`, formatted as `DDMMYYYY_HHMM`
- Headers: `X-Report-Filename`, `X-Report-Customer`, `X-Report-Strikes`
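
A minimal standard-library client for exercising `/generate` without n8n might look like the sketch below. `build_generate_request` and `filename_from_headers` are hypothetical helpers, and the `report.docx` fallback name is an assumption; the request only mirrors the headers the `Report: Generate DOCX` node sends.

```python
import json
import re
import urllib.request

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def build_generate_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /generate request."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": DOCX_MIME},
        method="POST",
    )


def filename_from_headers(headers, default: str = "report.docx") -> str:
    """Prefer X-Report-Filename, then the Content-Disposition filename."""
    name = headers.get("X-Report-Filename")
    if name:
        return name
    match = re.search(r'filename="([^"]+)"', headers.get("Content-Disposition", ""))
    return match.group(1) if match else default
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=300)` and write `resp.read()` to `filename_from_headers(resp.headers)`; 300 s matches the timeout configured on the n8n node.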

## Running locally

```bash
cd lightning_report
python -m pip install -r requirements.txt -r report_service/requirements.txt
uvicorn report_service.main:app --host 0.0.0.0 --port 8000
```

Sanity check:

```bash
curl -sS http://127.0.0.1:8000/health
```

## Running with Docker (alongside n8n)

```bash
cd lightning_report
docker compose -f report_service/docker-compose.yml up --build -d
docker logs -f lightning-report-service
```

The compose file attaches the service to an external network named `n8n`. Check your actual network name with `docker network ls` and adjust the `name:` field if needed. Once attached, the n8n container can reach the service at `http://report-service:8000`.

## Environment variables

| Variable | Default | Purpose |
|---|---|---|
| `LOG_LEVEL` | `INFO` | Log level for the service's own logging (Uvicorn's access/error logs follow its `--log-level` flag) |
| `GEMINI_API_KEY` | _(unset)_ | Only used if n8n doesn't send `gemini_text` in the payload (or sends it empty). If unset, the service falls back to the deterministic commentary in `src/reporting/gemini_commentary.py` |
| `GEMINI_MODEL` | `gemini-1.5-flash` | Only used when the service calls Gemini itself |

## n8n configuration

In the n8n workflow, the `Report: Generate DOCX` HTTP Request node points at:

```
={{ $env.REPORT_SERVICE_URL || 'http://report-service:8000' }}/generate
```

- If n8n and the service share a Docker network, the default hostname works.
- Otherwise, set `REPORT_SERVICE_URL` as an n8n environment variable (e.g. `http://192.168.1.10:8000`).
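
Scripts that call the service outside n8n can resolve the URL the same way. Because the n8n expression uses `||`, an empty `REPORT_SERVICE_URL` also falls back to the default; in Python that is `or`, not a `.get()` default. `resolve_generate_url` is a hypothetical helper for illustration.

```python
import os
from typing import Mapping, Optional

DEFAULT_REPORT_SERVICE_URL = "http://report-service:8000"


def resolve_generate_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Mirror `$env.REPORT_SERVICE_URL || 'http://report-service:8000'` + '/generate'."""
    env = os.environ if env is None else env
    # `or` so that an empty string falls back too, like JavaScript's `||`.
    base = env.get("REPORT_SERVICE_URL") or DEFAULT_REPORT_SERVICE_URL
    return base + "/generate"
```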

Response format is set to `file` with output property `report`, which the downstream Slack node uploads directly.

## Gemini commentary handoff

The existing n8n branch already has a `Report: Gemini Commentary` HTTP Request node that calls Gemini. Its text is forwarded to this service as `gemini_text`. When present, the service skips its own Gemini call and plugs the text straight into `create_docx_report()`'s commentary slot via a scoped monkey-patch on `generate_gemini_paragraph`.
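
The scoped monkey-patch relies on `create_docx_report()` resolving `generate_gemini_paragraph` through the `src.reporting.docx` module globals at call time, so swapping the module attribute redirects the call, and restoring it in a `finally` undoes the swap even when generation fails. The self-contained sketch below demonstrates that pattern on a stand-in module; `fake_docx`, `create_report`, and `override_commentary` are illustrative names, not the service's code.

```python
import types
from contextlib import contextmanager

# Stand-in for src.reporting.docx: exec the source into a real module namespace so
# create_report looks generate_gemini_paragraph up in that module's globals.
fake_docx = types.ModuleType("fake_docx")
exec(
    "def generate_gemini_paragraph(ctx, api_key=None):\n"
    "    return 'computed commentary'\n"
    "\n"
    "def create_report(ctx):\n"
    "    return generate_gemini_paragraph(ctx)\n",
    fake_docx.__dict__,
)


@contextmanager
def override_commentary(module, text):
    """Temporarily replace module.generate_gemini_paragraph with a constant."""
    if not text:  # empty or missing text: leave the real function in place
        yield
        return
    original = module.generate_gemini_paragraph
    module.generate_gemini_paragraph = lambda _ctx, api_key=None: text
    try:
        yield
    finally:
        module.generate_gemini_paragraph = original  # restored even on error
```

`unittest.mock.patch.object(module, "generate_gemini_paragraph", ...)` provides the same save-and-restore behavior. Either way the patch is process-wide, so it is only safe while report generation does not run concurrently within one process.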
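
If you'd rather let the service handle Gemini end-to-end, delete the `Report: Gemini Commentary` node from the n8n workflow, connect `Report: Fetch Thunderstorms` directly to `Report: Build Payload`, and remove the `gemini_text` assignment from `Report: Build Payload`, since its expression references the deleted node and would otherwise fail.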
0
report_service/__init__.py
Normal file
178
report_service/adapter.py
Normal file
@ -0,0 +1,178 @@
"""
Bridge between the n8n `Report: Build Payload` JSON and the existing
`create_docx_report()` entry point.

Keeping this isolated so the rest of the codebase never learns about n8n.
"""
from __future__ import annotations

from typing import Any

import pandas as pd

from src.config import config


_LIGHTNING_COLUMN_ALIASES: dict[str, list[str]] = {
    "lat": ["lat", "latitude"],
    "lng": ["lng", "longitude", "lon", "long"],
    "current": ["current", "peak_current", "peakCurrent", "amplitude", "amp"],
    "p_type": ["p_type", "ptype", "type", "flash_type"],
    "local_time": [
        "local_time",
        "localtime",
        "time",
        "timestamp",
        "captured",
        "datetime",
        "date_time",
    ],
}

_TURBINE_COLUMN_ALIASES: dict[str, list[str]] = {
    "lat": ["lat", "latitude"],
    "lng": ["lng", "longitude", "lon", "long"],
    "name": ["name", "turbine_name", "turbine_id"],
    "unit_power_mwm": ["unit_power_mwm", "power_mwm"],
    "unit_power_mwe": ["unit_power_mwe", "power_mwe"],
    "tower_height_m": ["tower_height_m", "tower_height"],
    "turbine_rotor_blade_diameter": [
        "turbine_rotor_blade_diameter",
        "rotor_diameter",
        "rotor_blade_diameter",
    ],
    "altitude": ["altitude", "elevation"],
}


def _apply_aliases(df: pd.DataFrame, aliases: dict[str, list[str]]) -> pd.DataFrame:
    """Rename columns to their canonical names.

    Matching is case-insensitive. An existing canonical column wins; otherwise
    the first matching candidate is renamed. Unmatched columns are left as-is.
    """
    if df.empty:
        return df
    lower_to_actual = {str(c).lower(): c for c in df.columns}
    rename_map: dict[str, str] = {}
    for target, candidates in aliases.items():
        if target in df.columns:
            continue
        for candidate in candidates:
            src = lower_to_actual.get(candidate.lower())
            if src is not None and src not in rename_map:
                rename_map[src] = target
                break
    return df.rename(columns=rename_map) if rename_map else df


def _build_turbine_df(turbines: list[dict[str, Any]]) -> pd.DataFrame:
    if not turbines:
        return pd.DataFrame(columns=["name", "lat", "lng"])
    df = pd.DataFrame(turbines)
    df = _apply_aliases(df, _TURBINE_COLUMN_ALIASES)

    missing = [c for c in ("lat", "lng") if c not in df.columns]
    if missing:
        raise ValueError(
            f"Turbine payload missing required column(s) after normalization: {missing}"
        )

    df["lat"] = pd.to_numeric(df["lat"], errors="coerce")
    df["lng"] = pd.to_numeric(df["lng"], errors="coerce")
    df = df.dropna(subset=["lat", "lng"]).reset_index(drop=True)

    if "name" not in df.columns:
        df["name"] = [f"T{i + 1}" for i in range(len(df))]

    for optional_col in (
        "unit_power_mwm",
        "unit_power_mwe",
        "tower_height_m",
        "turbine_rotor_blade_diameter",
        "altitude",
    ):
        if optional_col not in df.columns:
            df[optional_col] = "N/A"

    return df


def _build_lightning_df(
    strikes: list[dict[str, Any]],
    timezone_name: str | None,
) -> pd.DataFrame:
    base_columns = ["lat", "lng", "current", "p_type", "local_time", "current_abs"]
    if not strikes:
        return pd.DataFrame(columns=base_columns)

    df = pd.DataFrame(strikes)
    df = _apply_aliases(df, _LIGHTNING_COLUMN_ALIASES)

    missing = [c for c in ("lat", "lng", "current", "p_type", "local_time") if c not in df.columns]
    if missing:
        raise ValueError(
            f"Lightning payload missing required column(s) after normalization: {missing}"
        )

    df["lat"] = pd.to_numeric(df["lat"], errors="coerce")
    df["lng"] = pd.to_numeric(df["lng"], errors="coerce")
    df["current"] = pd.to_numeric(df["current"], errors="coerce")
    df["p_type"] = df["p_type"].astype(str)

    local_time = pd.to_datetime(df["local_time"], errors="coerce", utc=True)
    if timezone_name:
        try:
            local_time = local_time.dt.tz_convert(timezone_name)
        except Exception:
            # Unknown timezone name: keep the timestamps in UTC.
            pass
    df["local_time"] = local_time

    df = df.dropna(subset=["lat", "lng", "local_time"]).reset_index(drop=True)

    if "current_abs" not in df.columns:
        df["current_abs"] = df["current"].abs()

    return df


def _epoch_ms_to_local_str(epoch_ms: Any, timezone_name: str | None) -> str | None:
    """Format epoch milliseconds as 'DD-MM-YYYY HH:MM'; None if missing or invalid."""
    if epoch_ms in (None, "", 0):
        return None
    try:
        ts = pd.to_datetime(int(epoch_ms), unit="ms", utc=True)
        if timezone_name:
            ts = ts.tz_convert(timezone_name)
        return ts.strftime("%d-%m-%Y %H:%M")
    except Exception:
        return None


def apply_farm_config(payload: dict[str, Any]) -> None:
    """
    Mutate the global `src.config.config` singleton per request.

    Mirrors what `batch_generate.update_global_config()` does, so the rest of
    the reporting code can read farm-specific values exactly like it does in CLI mode.
    """
    rings_obj = payload.get("rings") or {}
    ordered_rings = [int(rings_obj[k]) for k in ("r1", "r2", "r3", "r4", "r5") if k in rings_obj]
    if ordered_rings:
        config.distance_rings = ordered_rings

    ring_colors = payload.get("ring_colors")
    if ring_colors:
        config.ring_colors = list(ring_colors)

    config.wind_farm_name = payload.get("customer_name") or config.wind_farm_name or "Wind Farm"
    config.timezone = payload.get("timezone") or config.timezone

    start_label = _epoch_ms_to_local_str(payload.get("t_start"), config.timezone)
    end_label = _epoch_ms_to_local_str(payload.get("t_end"), config.timezone)
    if start_label:
        config.analysis_start_date = start_label
    if end_label:
        config.analysis_end_date = end_label


def build_dataframes(payload: dict[str, Any]) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (turbine_df, lightning_df) ready for `create_docx_report()`."""
    timezone_name = payload.get("timezone") or config.timezone
    turbine_df = _build_turbine_df(payload.get("turbines") or [])
    lightning_df = _build_lightning_df(payload.get("strikes") or [], timezone_name)
    return turbine_df, lightning_df
39
report_service/docker-compose.yml
Normal file
@ -0,0 +1,39 @@
# Example compose snippet. If n8n is already defined in another compose file,
# drop the `report-service` block below into that file (under `services`) and
# attach it to the same network n8n uses.
#
# Build context is the repository root, not the report_service folder.
#
#   cd lightning_report/
#   docker compose -f report_service/docker-compose.yml up --build -d
#
# Then in the n8n HTTP Request node, point at:
#   http://report-service:8000/generate      (same Docker network)
#   or http://<host-ip>:8000/generate        (via the published port 8000)

services:
  report-service:
    build:
      context: ..
      dockerfile: report_service/Dockerfile
    image: lightning-report-service:latest
    container_name: lightning-report-service
    restart: unless-stopped
    environment:
      - LOG_LEVEL=INFO
      # Only needed if n8n does NOT pre-compute gemini_text and you want the
      # service to call Gemini itself.
      - GEMINI_API_KEY=${GEMINI_API_KEY:-}
      - GEMINI_MODEL=${GEMINI_MODEL:-gemini-1.5-flash}
    ports:
      - "8000:8000"
    networks:
      - n8n

networks:
  n8n:
    external: true
    # If your n8n docker network has a different name, change it here
    # (check with: `docker network ls`). You can also remove `external: true`
    # to have compose create a fresh bridge network.
    name: n8n
141
report_service/main.py
Normal file
@ -0,0 +1,141 @@
"""
FastAPI microservice that wraps `create_docx_report()` for use by n8n.

Endpoints:
  - GET  /health    liveness probe
  - POST /generate  accept the `Report: Build Payload` JSON and return a DOCX

Run locally:
    uvicorn report_service.main:app --host 0.0.0.0 --port 8000
"""
from __future__ import annotations

import logging
import os
import tempfile
from contextlib import contextmanager
from typing import Any
from urllib.parse import quote

from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import JSONResponse, Response

from src.reporting import docx as docx_module
from src.reporting.docx import create_docx_report
from src.reporting.filename_utils import slugify_ascii_underscore

from report_service.adapter import apply_farm_config, build_dataframes

logging.basicConfig(
    # logging only accepts upper-case level names, so normalize e.g. "info".
    level=os.getenv("LOG_LEVEL", "INFO").upper(),
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("report_service")

app = FastAPI(title="Lightning Report Service", version="1.0.0")

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


@contextmanager
def _override_gemini_commentary(override_text: str | None):
    """
    If n8n already called Gemini and forwarded the text, short-circuit
    `generate_gemini_paragraph` so the downstream report uses it verbatim.

    Restores the original function on exit even if the request fails.
    """
    if not override_text:
        yield
        return

    original = docx_module.generate_gemini_paragraph
    docx_module.generate_gemini_paragraph = lambda _ctx, api_key=None: override_text
    try:
        yield
    finally:
        docx_module.generate_gemini_paragraph = original


def _build_filename(payload: dict[str, Any]) -> str:
    safe_name = slugify_ascii_underscore(payload.get("customer_name") or "report")
    from src.config import config

    start = (config.analysis_start_date or "").replace(" ", "_").replace(":", "").replace("-", "")
    end = (config.analysis_end_date or "").replace(" ", "_").replace(":", "").replace("-", "")
    parts = [safe_name]
    if start:
        parts.append(start)
    if end:
        parts.append(end)
    parts.append("report.docx")
    return "_".join(parts)


@app.get("/health")
def health() -> JSONResponse:
    return JSONResponse({"ok": True, "service": "lightning-report", "version": app.version})


@app.post("/generate")
async def generate(request: Request) -> Response:
    try:
        payload: dict[str, Any] = await request.json()
    except Exception as exc:
        raise HTTPException(status_code=400, detail=f"Invalid JSON body: {exc}") from exc

    if not isinstance(payload, dict):
        raise HTTPException(status_code=400, detail="Request body must be a JSON object")

    customer_name = payload.get("customer_name") or "<unknown>"
    n_strikes = int(payload.get("n_strikes") or 0)
    logger.info(
        "Generating report for customer=%s n_strikes=%s n_turbines=%s",
        customer_name,
        n_strikes,
        len(payload.get("turbines") or []),
    )

    try:
        apply_farm_config(payload)
        turbine_df, lightning_df = build_dataframes(payload)
    except ValueError as exc:
        logger.warning("Payload validation failed: %s", exc)
        raise HTTPException(status_code=422, detail=str(exc)) from exc

    storm_records = payload.get("storm_records") or None
    filename = _build_filename(payload)

    tmp_fd, tmp_path = tempfile.mkstemp(suffix=".docx")
    os.close(tmp_fd)

    try:
        with _override_gemini_commentary(payload.get("gemini_text")):
            create_docx_report(
                tmp_path,
                turbine_df,
                lightning_df,
                storm_data_path=None,
                storm_data_records=storm_records,
            )
        with open(tmp_path, "rb") as fh:
            data = fh.read()
    except Exception as exc:
        logger.exception("Report generation failed for %s", customer_name)
        raise HTTPException(status_code=500, detail=f"Report generation failed: {exc}") from exc
    finally:
        try:
            os.unlink(tmp_path)
        except OSError:
            pass

    logger.info("Generated %s (%d bytes) for %s", filename, len(data), customer_name)
    return Response(
        content=data,
        media_type=DOCX_MIME,
        headers={
            "Content-Disposition": f'attachment; filename="{filename}"',
            "X-Report-Filename": filename,
            # Header values are encoded as latin-1; percent-encode the customer
            # name so characters such as "ğ" or "ı" don't raise UnicodeEncodeError.
            "X-Report-Customer": quote(str(customer_name)),
            "X-Report-Strikes": str(n_strikes),
        },
    )
2
report_service/requirements.txt
Normal file
@ -0,0 +1,2 @@
fastapi>=0.115.0
uvicorn[standard]>=0.30.0
10
requirements.txt
Normal file
@ -0,0 +1,10 @@
pandas>=1.5.0
numpy>=1.21.0
plotly>=5.15.0
kaleido>=0.2.1
scikit-learn>=1.3.0
requests>=2.31.0
python-dotenv>=1.0.0
python-docx>=1.1.2
matplotlib>=3.8.0
google-generativeai
100
separate_by_month.py
Normal file
@ -0,0 +1,100 @@
#!/usr/bin/env python3

import json
import sys
from pathlib import Path
from datetime import datetime
from typing import Dict, Optional
import logging
from collections import defaultdict

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


def separate_json_by_month(input_file_path: str, output_dir: Optional[str] = None) -> Dict[str, str]:
    """
    Separate JSON file into smaller files based on months in creation_time.

    Args:
        input_file_path: Path to the input JSON file
        output_dir: Directory to save separated files (optional)

    Returns:
        Dictionary mapping month to output file path
    """
    try:
        logger.info(f"Reading JSON file: {input_file_path}")

        with open(input_file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        logger.info(f"Successfully read {len(data)} records")

        if output_dir is None:
            input_path = Path(input_file_path)
            output_dir = input_path.parent / f"{input_path.stem}_separated"

        output_path = Path(output_dir)
        output_path.mkdir(parents=True, exist_ok=True)

        month_data = defaultdict(list)

        for record in data:
            try:
                creation_time = record['creation_time']
                date_obj = datetime.strptime(creation_time[:10], '%Y-%m-%d')
                month_key = date_obj.strftime('%Y-%m')
                month_data[month_key].append(record)
            except (KeyError, TypeError, ValueError) as e:
                # Missing, null or malformed creation_time: skip the record.
                logger.warning(f"Skipping record with invalid creation_time: {e}")
                continue

        output_files = {}

        for month, records in month_data.items():
            output_file = output_path / f"firtina_sorgulama_{month}.json"

            logger.info(f"Writing {len(records)} records for {month} to {output_file}")

            with open(output_file, 'w', encoding='utf-8') as f:
                json.dump(records, f, indent=2, ensure_ascii=False, default=str)

            file_size = output_file.stat().st_size / 1024
            logger.info(f"Created {output_file} ({file_size:.2f} KB)")
            output_files[month] = str(output_file)

        logger.info(f"Separation completed. Created {len(output_files)} files in {output_path}")

        return output_files

    except FileNotFoundError:
        logger.error(f"JSON file not found: {input_file_path}")
        raise
    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON format: {e}")
        raise
    except Exception as e:
        logger.error(f"Error separating JSON by month: {str(e)}")
        raise


def main():
    """Main function to handle command line execution."""
    if len(sys.argv) < 2:
        print("Usage: python separate_by_month.py <json_file_path> [output_directory]")
        sys.exit(1)

    input_file_path = sys.argv[1]
    output_dir = sys.argv[2] if len(sys.argv) > 2 else None

    try:
        result_files = separate_json_by_month(input_file_path, output_dir)
        print("Separation completed successfully!")
        print("Created files:")
        for month, file_path in result_files.items():
            print(f"  {month}: {file_path}")
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
51
src/analysis/geospatial.py
Normal file
@ -0,0 +1,51 @@
import numpy as np


def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between (lat1, lon1) and (lat2, lon2), given in degrees."""
    R = 6371000  # mean Earth radius in metres
    lat1_rad = np.radians(lat1)
    lat2_rad = np.radians(lat2)
    delta_lat = np.radians(lat2 - lat1)
    delta_lon = np.radians(lon2 - lon1)
    a = np.sin(delta_lat / 2)**2 + np.cos(lat1_rad) * np.cos(lat2_rad) * np.sin(delta_lon / 2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    return R * c


def haversine_km(lon1, lat1, lon2, lat2):
    """
    Great-circle distance in kilometres.

    Note the (lon, lat) argument order, which differs from the other helpers
    in this module.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6371 * c
    return km


def create_circle_points(center_lat, center_lon, radius_m, num_points=50):
    """
    Return (lats, lons) of num_points + 1 points on a circle of radius_m metres
    around (center_lat, center_lon), using the spherical destination-point
    formula. The last point (bearing 2*pi) repeats the first, closing the ring.
    """
    R = 6371000
    # These do not depend on the bearing, so compute them once
    lat1_rad = np.radians(center_lat)
    lon1_rad = np.radians(center_lon)
    angular_distance = radius_m / R

    lats, lons = [], []
    for i in range(num_points + 1):
        bearing = 2 * np.pi * i / num_points
        lat2_rad = np.arcsin(
            np.sin(lat1_rad) * np.cos(angular_distance) +
            np.cos(lat1_rad) * np.sin(angular_distance) * np.cos(bearing)
        )
        lon2_rad = lon1_rad + np.arctan2(
            np.sin(bearing) * np.sin(angular_distance) * np.cos(lat1_rad),
            np.cos(angular_distance) - np.sin(lat1_rad) * np.sin(lat2_rad)
        )
        lats.append(np.degrees(lat2_rad))
        lons.append(np.degrees(lon2_rad))
    return lats, lons


def haversine_distance_vectorized(lat0, lon0, lats, lons):
    """Distances in metres from one point (lat0, lon0) to arrays of points (lats, lons), in degrees."""
    R = 6371000.0
    lat0r = np.radians(lat0)
    lon0r = np.radians(lon0)
    latr = np.radians(lats)
    lonr = np.radians(lons)
    dlat = latr - lat0r
    dlon = lonr - lon0r
    a = np.sin(dlat / 2.0)**2 + np.cos(lat0r) * np.cos(latr) * np.sin(dlon / 2.0)**2
    return 2.0 * R * np.arcsin(np.sqrt(a))
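As a sanity check of the geometry above, the sketch below re-implements the two formulas standalone (it does not import `src.analysis.geospatial`; the helper names `haversine_m` and `destination_point` are illustrative). One degree of latitude along a meridian should come out to R·π/180 ≈ 111.195 km with R = 6,371 km, and every point produced by the destination-point formula used in `create_circle_points` should measure back to `radius_m` from the centre with haversine. The centre coordinates are arbitrary example values.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # same mean Earth radius the module uses


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; inputs in degrees, scalars or arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def destination_point(lat, lon, bearing_rad, distance_m):
    """Point reached from (lat, lon) after distance_m along bearing_rad (same formula as create_circle_points)."""
    lat1, lon1 = np.radians(lat), np.radians(lon)
    d = distance_m / EARTH_RADIUS_M  # angular distance in radians
    lat2 = np.arcsin(np.sin(lat1) * np.cos(d) + np.cos(lat1) * np.sin(d) * np.cos(bearing_rad))
    lon2 = lon1 + np.arctan2(np.sin(bearing_rad) * np.sin(d) * np.cos(lat1),
                             np.cos(d) - np.sin(lat1) * np.sin(lat2))
    return np.degrees(lat2), np.degrees(lon2)


# One degree of latitude along a meridian equals R * pi / 180
one_degree_m = haversine_m(0.0, 0.0, 1.0, 0.0)
print(f"1 deg latitude = {one_degree_m / 1000:.3f} km")

# Eight points 10 km around an example centre, measured back to the centre
center_lat, center_lon = 39.0, 27.0
bearings = np.linspace(0, 2 * np.pi, 8, endpoint=False)
ring_lats, ring_lons = destination_point(center_lat, center_lon, bearings, 10000.0)
back = haversine_m(center_lat, center_lon, ring_lats, ring_lons)
print(np.round(back, 6))
```

Because the destination-point formula is exact on a sphere, the round-trip distances differ from 10,000 m only by floating-point error.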
93
src/analysis/grouping.py
Normal file
@ -0,0 +1,93 @@
import pandas as pd
import numpy as np
from typing import List, Tuple, Dict, Optional
from src.analysis.geospatial import haversine_distance
from src.utils import get_grouping_radius_m
from src.config import config
import logging

logger = logging.getLogger(__name__)


def group_turbines_by_proximity(turbine_df: pd.DataFrame, max_distance_m: Optional[int] = None) -> List[List[int]]:
    """
    Group turbines based on proximity within max_distance_m meters.
    Returns list of groups, where each group is a list of positional turbine indices.

    The primary path uses DBSCAN (min_samples=1, haversine metric), which chains
    neighbours transitively: two turbines share a group if they are linked by a
    sequence of hops of at most max_distance_m. The fallback (used if
    scikit-learn is unavailable or DBSCAN fails) only adds turbines within
    max_distance_m of each group's seed turbine, so it can split chains that
    DBSCAN would merge.
    """
    if max_distance_m is None:
        max_distance_m = get_grouping_radius_m()
    try:
        from sklearn.cluster import DBSCAN
        # The haversine metric expects [lat, lon] in radians; eps is an angle in radians
        coords = np.radians(turbine_df[['lat', 'lng']].values)
        eps_rad = (max_distance_m / 1000.0) / 6371.0
        clustering = DBSCAN(eps=eps_rad, min_samples=1, metric='haversine').fit(coords)
        labels = clustering.labels_
        groups = []
        for label in sorted(set(labels)):
            idxs = np.where(labels == label)[0].tolist()
            groups.append(idxs)
        return groups
    except Exception:
        # Greedy O(n^2) fallback: each ungrouped turbine seeds a group and absorbs
        # every ungrouped turbine within max_distance_m of the seed
        n_turbines = len(turbine_df)
        groups = []
        assigned = set()
        for i in range(n_turbines):
            if i in assigned:
                continue
            current_group = [i]
            assigned.add(i)
            for j in range(i + 1, n_turbines):
                if j in assigned:
                    continue
                distance = haversine_distance(
                    turbine_df.iloc[i]['lat'], turbine_df.iloc[i]['lng'],
                    turbine_df.iloc[j]['lat'], turbine_df.iloc[j]['lng']
                )
                if distance <= max_distance_m:
                    current_group.append(j)
                    assigned.add(j)
            groups.append(current_group)
        return groups


def calculate_group_centroid(turbine_df: pd.DataFrame, group_indices: List[int]) -> Tuple[float, float]:
    """
    Calculate the centroid (center point) of a group of turbines.
    Returns (lat, lng) of the centroid as the arithmetic mean of the coordinates,
    a good approximation when the group spans a small area.
    Returns (0.0, 0.0) for an empty group.
    """
    if not group_indices:
        return 0.0, 0.0

    lats = turbine_df.iloc[group_indices]['lat'].values
    lngs = turbine_df.iloc[group_indices]['lng'].values

    centroid_lat = np.mean(lats)
    centroid_lng = np.mean(lngs)

    return centroid_lat, centroid_lng


def create_turbine_groups(turbine_df: pd.DataFrame) -> Dict:
    """
    Create turbine groups and calculate their centroids.
    Returns a dictionary with group information.
    """
    groups = group_turbines_by_proximity(turbine_df)

    group_data = []
    for i, group_indices in enumerate(groups):
        centroid_lat, centroid_lng = calculate_group_centroid(turbine_df, group_indices)

        group_info = {
            'group_id': i,
            'turbine_indices': group_indices,
            'centroid_lat': centroid_lat,
            'centroid_lng': centroid_lng,
            'turbine_count': len(group_indices),
            'is_single': len(group_indices) == 1
        }
        group_data.append(group_info)

    return {
        'groups': group_data,
        'total_groups': len(groups),
        'total_turbines': len(turbine_df)
    }
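In `group_turbines_by_proximity`, DBSCAN with `min_samples=1` makes every turbine a core point, so its groups are the connected components of the "within `max_distance_m`" graph, whereas the greedy fallback only compares each turbine against its group's seed. The standalone sketch below (hypothetical helpers `single_linkage_groups` and `greedy_seed_groups`, with a local haversine instead of the project's imports) builds those connected components with a small union-find, which should match the DBSCAN grouping up to group order without needing scikit-learn, and contrasts it with the greedy rule on three illustrative turbines spaced 800 m apart with a 1,000 m radius.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def single_linkage_groups(points: List[Tuple[float, float]], max_distance_m: float) -> List[List[int]]:
    """Connected components of the 'within max_distance_m' graph, via union-find."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_m(*points[i], *points[j]) <= max_distance_m:
                parent[find(j)] = find(i)  # merge the two components

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def greedy_seed_groups(points: List[Tuple[float, float]], max_distance_m: float) -> List[List[int]]:
    """Same rule as the fallback: join a turbine only if it is near the group's seed."""
    groups, assigned = [], set()
    for i in range(len(points)):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, len(points)):
            if j not in assigned and haversine_m(*points[i], *points[j]) <= max_distance_m:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups


# Three turbines on a north-south line, 800 m apart (illustrative coordinates)
deg_per_m = 180.0 / (math.pi * EARTH_RADIUS_M)
turbines = [(39.0 + k * 800 * deg_per_m, 27.0) for k in range(3)]

print(single_linkage_groups(turbines, 1000))  # -> [[0, 1, 2]]
print(greedy_seed_groups(turbines, 1000))     # -> [[0, 1], [2]]
```

Turbines 0 and 2 are 1,600 m apart, so only the transitive rule places all three in one group.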
363
src/analysis/histogram.py
Normal file
@ -0,0 +1,363 @@
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from collections import defaultdict
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from src.analysis.geospatial import haversine_distance, haversine_distance_vectorized
from src.config import config
from src.utils import get_analysis_radius_m
import logging

logger = logging.getLogger(__name__)


def filter_lightning_by_distance(lightning_df, centroid_lat, centroid_lng):
    """
    Filter lightning data to events within the analysis radius
    (get_analysis_radius_m()) of the given centroid.
    """
    max_distance = get_analysis_radius_m()
    if len(lightning_df) == 0:
        return lightning_df.copy()
    dists = haversine_distance_vectorized(
        centroid_lat,
        centroid_lng,
        lightning_df['lat'].values,
        lightning_df['lng'].values,
    )
    mask = dists <= max_distance
    return lightning_df.loc[mask].copy()


def find_activity_periods(df, min_gap_minutes=30, min_events_per_period=10):
    """
    Find concentrated activity periods based on time gaps.

    Events are sorted by local_time and split wherever two consecutive events
    are more than min_gap_minutes apart; segments with fewer than
    min_events_per_period events are discarded.
    """
    df_sorted = df.copy()
    if df_sorted['local_time'].dtype == 'object':
        df_sorted['local_time'] = pd.to_datetime(df_sorted['local_time'])
    df_sorted = df_sorted.sort_values('local_time').reset_index(drop=True)

    df_sorted['time_diff'] = df_sorted['local_time'].diff()
    gap = timedelta(minutes=min_gap_minutes)
    # After reset_index the index labels equal row positions, so they can be used with iloc
    break_positions = df_sorted.index[df_sorted['time_diff'] > gap].tolist()

    periods = []
    start_pos = 0
    for break_pos in break_positions:
        period_slice = df_sorted.iloc[start_pos:break_pos]
        if len(period_slice) >= min_events_per_period:
            periods.append({
                'start_time': period_slice['local_time'].iloc[0],
                'end_time': period_slice['local_time'].iloc[-1],
                'data': period_slice,
                'event_count': len(period_slice)
            })
        start_pos = break_pos

    last_slice = df_sorted.iloc[start_pos:]
    if len(last_slice) >= min_events_per_period:
        periods.append({
            'start_time': last_slice['local_time'].iloc[0],
            'end_time': last_slice['local_time'].iloc[-1],
            'data': last_slice,
            'event_count': len(last_slice)
        })

    return periods


def create_minute_counts(period_data):
    """
    Create minute-by-minute counts of lightning events per p_type.

    Returns a DataFrame indexed by every minute between the first and last
    event, with one column per p_type ('0' = cloud-to-ground, '1' = intercloud).
    """
    # Ensure local_time is datetime
    period_data = period_data.copy()
    if period_data['local_time'].dtype == 'object':
        period_data['local_time'] = pd.to_datetime(period_data['local_time'])

    # Callers look up the string columns '0' and '1', so normalise p_type to str
    # (the rest of the package also compares p_type.astype(str))
    period_data['p_type'] = period_data['p_type'].astype(str)

    # Floor timestamps to the start of their minute
    period_data['minute_rounded'] = period_data['local_time'].dt.floor('min')

    # Count events per minute for each p_type
    minute_counts = period_data.groupby(['minute_rounded', 'p_type']).size().unstack(fill_value=0)

    # Create complete minute range
    start_minute = period_data['minute_rounded'].min()
    end_minute = period_data['minute_rounded'].max()
    minute_range = pd.date_range(start=start_minute, end=end_minute, freq='min')

    # Reindex to include all minutes (fill missing with 0)
    minute_counts = minute_counts.reindex(minute_range, fill_value=0)

    return minute_counts


def find_peak_sub_periods(period_data, window_minutes=3):
    """
    Find peak sub-periods within an activity period: runs of minutes whose
    centred rolling mean (window_minutes wide) exceeds the mean plus one
    standard deviation of the per-minute totals.
    """
    minute_counts = create_minute_counts(period_data)

    # Calculate total events per minute
    if '0' in minute_counts.columns and '1' in minute_counts.columns:
        total_per_minute = minute_counts['0'] + minute_counts['1']
    elif '0' in minute_counts.columns:
        total_per_minute = minute_counts['0']
    elif '1' in minute_counts.columns:
        total_per_minute = minute_counts['1']
    else:
        return []

    # Use rolling window to find peak periods
    rolling_mean = total_per_minute.rolling(window=window_minutes, center=True).mean()
    mean_activity = total_per_minute.mean()
    std_activity = total_per_minute.std()

    # Find periods above mean + 1 standard deviation
    threshold = mean_activity + std_activity
    peak_mask = rolling_mean > threshold

    # Group consecutive peak minutes
    peak_periods = []
    in_peak = False
    start_time = None

    for i, (time, is_peak) in enumerate(zip(total_per_minute.index, peak_mask)):
        if is_peak and not in_peak:
            start_time = time
            in_peak = True
        elif not is_peak and in_peak:
            end_time = total_per_minute.index[i - 1]
            peak_periods.append({
                'start': start_time,
                'end': end_time,
                'peak_rate': rolling_mean.loc[start_time:end_time].max()
            })
            in_peak = False

    # Handle case where peak period extends to the end
    if in_peak:
        peak_periods.append({
            'start': start_time,
            'end': total_per_minute.index[-1],
            'peak_rate': rolling_mean.loc[start_time:].max()
        })

    return peak_periods


def _build_histogram_figure_for_periods(periods_chunk, max_distance_km):
    """
    Build a Plotly figure for a subset of activity periods.

    Args:
        periods_chunk: List of period dictionaries to plot
        max_distance_km: Maximum analysis distance in km (accepted for a title;
            not currently used in the figure)

    Returns:
        Plotly figure object, or None if periods_chunk is empty
    """
    n_periods = len(periods_chunk)

    if n_periods == 0:
        return None

    # Determine layout: single column if <= 3 periods, otherwise 2 columns
    if n_periods <= 3:
        n_cols = 1
    else:
        n_cols = 2

    n_rows = (n_periods + n_cols - 1) // n_cols
    # Cap n_rows at 3 (at most 3 rows × 2 columns = 6 periods per figure)
    n_rows = min(n_rows, 3)

    # Periods that do not fit in the grid are dropped, so
    # max_periods_per_figure in the config should not exceed 6
    if n_periods > n_rows * n_cols:
        periods_chunk = periods_chunk[:n_rows * n_cols]
        n_periods = len(periods_chunk)

    # Calculate spacing
    base_row_height = 350
    if n_rows > 1:
        vertical_spacing = 0.28  # larger gap between rows
    else:
        vertical_spacing = 0.15
    horizontal_spacing = 0.1 if n_cols == 2 else 0.08

    def format_period_title(i, p):
        start = p['start_time']
        end = p['end_time']
        event_count = p['event_count']

        # Format date
        date_str = start.strftime('%d-%m-%Y')

        # Show only times for same-day periods, full timestamps otherwise
        if start.date() == end.date():
            period_str = f"{start.strftime('%H:%M')}-{end.strftime('%H:%M')}"
        else:
            period_str = f"{start.strftime('%d-%m-%Y %H:%M')}-{end.strftime('%d-%m-%Y %H:%M')}"

        # Use line breaks to fit within histogram width
        return f"Date: {date_str}<br>Period: {period_str}<br>Total lightnings: {event_count}"

    fig = make_subplots(
        rows=n_rows,
        cols=n_cols,
        subplot_titles=[format_period_title(i, p) for i, p in enumerate(periods_chunk)],
        vertical_spacing=vertical_spacing,
        horizontal_spacing=horizontal_spacing,
        specs=[[{"secondary_y": False} for _ in range(n_cols)] for _ in range(n_rows)]
    )

    colors = {'0': '#FF6B6B', '1': '#4ECDC4'}  # Red for cloud-to-ground, teal for intercloud

    for period_idx, period in enumerate(periods_chunk):
        row = (period_idx // n_cols) + 1
        col = (period_idx % n_cols) + 1

        minute_counts = create_minute_counts(period['data'])

        # Create time labels (minutes from start)
        start_time = minute_counts.index[0]
        minutes_from_start = [(t - start_time).total_seconds() / 60 for t in minute_counts.index]

        # Add bars for each p_type
        for p_type in ['0', '1']:
            if p_type in minute_counts.columns:
                counts = minute_counts[p_type].values

                p_type_name = "Cloud-to-Ground" if p_type == '0' else "Intercloud"

                fig.add_trace(
                    go.Bar(
                        x=minutes_from_start,
                        y=counts,
                        name=p_type_name if period_idx == 0 else None,
                        marker_color=colors[p_type],
                        opacity=0.7,
                        showlegend=(period_idx == 0),
                        legendgroup=f"p_type_{p_type}",
                        hovertemplate=f"{p_type_name}<br>Minute: %{{x:.0f}}<br>Count: %{{y}}<extra></extra>"
                    ),
                    row=row, col=col
                )

        # Find and highlight peak sub-periods
        peak_periods = find_peak_sub_periods(period['data'])
        for peak in peak_periods:
            peak_start_minutes = (peak['start'] - start_time).total_seconds() / 60
            peak_end_minutes = (peak['end'] - start_time).total_seconds() / 60

            fig.add_vrect(
                x0=peak_start_minutes, x1=peak_end_minutes,
                fillcolor="yellow", opacity=0.2,
                line_width=0,
                row=row, col=col
            )

        # Update axes for this subplot
        fig.update_xaxes(
            title_text="Minutes from start",
            title_standoff=6,
            row=row,
            col=col,
            tickfont=dict(size=18),
            title_font=dict(size=22),
        )
        # Y-axis labels only on leftmost column
        if col == 1:
            fig.update_yaxes(
                title_text="Lightning Count",
                row=row,
                col=col,
                tickfont=dict(size=18),
                title_font=dict(size=22),
            )
        else:
            fig.update_yaxes(
                title_text="",
                row=row,
                col=col,
                tickfont=dict(size=16),
                title_font=dict(size=22),
            )

    # Calculate figure height
    figure_height = base_row_height * n_rows

    # Update subplot title font size (subplot titles are rendered as annotations)
    fig.update_annotations(font=dict(size=22))

    fig.update_layout(
        height=figure_height,
        width=config.histogram_layout['plot_width'],
        barmode='stack',
        showlegend=True,
        font=dict(size=18),
        margin=dict(t=110, b=100, l=75, r=40),
        legend=dict(
            orientation="h",
            y=-0.1,
            x=0.5,
            xanchor="center",
            font=dict(size=18),
            title_font=dict(size=20),
        )
    )

    # Store metadata for render function
    fig.layout.meta = {'n_rows': n_rows, 'n_cols': n_cols}

    return fig


def create_lightning_histogram_pages(lightning_df, centroid_lat, centroid_lng):
    """
    Create multiple histogram figures for pagination when there are many activity periods.

    Args:
        lightning_df: DataFrame containing lightning data
        centroid_lat: Center latitude
        centroid_lng: Center longitude

    Returns:
        List of Plotly figure objects (empty list if no periods found)
    """
    # Filter lightning data by distance
    filtered_df = filter_lightning_by_distance(lightning_df, centroid_lat, centroid_lng)

    if len(filtered_df) == 0:
        return []

    # Find ALL activity periods (no truncation)
    periods = find_activity_periods(
        filtered_df,
        min_gap_minutes=config.histogram_params['min_gap_minutes'],
        min_events_per_period=config.histogram_params['min_events_per_period']
    )

    if len(periods) == 0:
        return []

    # Get max periods per figure from config
    max_periods_per_figure = config.histogram_params.get('max_periods_per_figure', 6)

    # Get max distance for title
    max_distance_km = max(config.distance_rings) / 1000

    # Split periods into chunks
    figures = []
    for i in range(0, len(periods), max_periods_per_figure):
        chunk = periods[i:i + max_periods_per_figure]
        fig = _build_histogram_figure_for_periods(chunk, max_distance_km)
        if fig:
            figures.append(fig)

    return figures


def create_lightning_histogram(lightning_df, centroid_lat, centroid_lng):
    """
    Create lightning activity histogram for the PDF report.

    This is a backward-compatible wrapper that returns the first page.
    For multi-page support, use create_lightning_histogram_pages instead.
    """
    pages = create_lightning_histogram_pages(lightning_df, centroid_lat, centroid_lng)
    return pages[0] if pages else None
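`find_activity_periods` splits the time-sorted strikes wherever two consecutive events are more than `min_gap_minutes` apart and keeps only segments with at least `min_events_per_period` events. The same segmentation can be written as a cumulative sum over the gap flags, which makes it easy to check the rule on toy data. The sketch below is standalone; the function name `segment_by_gap` and the sample timestamps are illustrative, not part of the project.

```python
import pandas as pd


def segment_by_gap(times: pd.Series, min_gap_minutes: int = 30, min_events: int = 10):
    """Return (start, end, count) for each run of events whose internal gaps are <= min_gap_minutes."""
    t = pd.to_datetime(times).sort_values().reset_index(drop=True)
    # A new segment starts wherever the gap to the previous event exceeds the threshold
    # (the first diff is NaT, which compares as False)
    new_segment = t.diff() > pd.Timedelta(minutes=min_gap_minutes)
    segment_id = new_segment.cumsum()
    periods = []
    for _, seg in t.groupby(segment_id):
        if len(seg) >= min_events:
            periods.append((seg.iloc[0], seg.iloc[-1], len(seg)))
    return periods


# Two bursts separated by a two-hour lull, plus one isolated late strike
times = pd.Series(
    list(pd.date_range("2025-06-01 14:00", periods=12, freq="2min"))
    + list(pd.date_range("2025-06-01 16:30", periods=15, freq="1min"))
    + [pd.Timestamp("2025-06-01 20:00")]
)
for start, end, count in segment_by_gap(times):
    print(start, end, count)
```

The isolated 20:00 strike forms its own one-event segment and is dropped by the `min_events` filter, just as a short segment would be in `find_activity_periods`.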
153
src/analysis/risk.py
Normal file
@ -0,0 +1,153 @@
import numpy as np
import pandas as pd
from typing import Tuple
from .geospatial import haversine_distance
from ..config import config
import logging

logger = logging.getLogger(__name__)


def calculate_distance_matrix(turbine_coords: np.ndarray, lightning_coords: np.ndarray) -> np.ndarray:
    """
    Calculate the haversine distance matrix (in km) between turbines and
    lightning strikes using vectorized operations.

    Args:
        turbine_coords: Array of turbine coordinates (lat, lng) in degrees
        lightning_coords: Array of lightning coordinates (lat, lng) in degrees

    Returns:
        Distance matrix in km with shape (n_turbines, n_lightning)
    """
    # Extract coordinates
    turbine_lats = turbine_coords[:, 0]
    turbine_lons = turbine_coords[:, 1]
    lightning_lats = lightning_coords[:, 0]
    lightning_lons = lightning_coords[:, 1]

    # Broadcasting computes all pairwise differences at once, shape (n_lightning, n_turbines),
    # instead of nested Python loops
    lat_diff = lightning_lats[:, np.newaxis] - turbine_lats[np.newaxis, :]
    lon_diff = lightning_lons[:, np.newaxis] - turbine_lons[np.newaxis, :]

    # Haversine formula components
    lat_diff_rad = np.radians(lat_diff)
    lon_diff_rad = np.radians(lon_diff)
    turbine_lats_rad = np.radians(turbine_lats)[np.newaxis, :]
    lightning_lats_rad = np.radians(lightning_lats)[:, np.newaxis]

    a = (np.sin(lat_diff_rad / 2)**2 +
         np.cos(turbine_lats_rad) * np.cos(lightning_lats_rad) * np.sin(lon_diff_rad / 2)**2)
    c = 2 * np.arcsin(np.sqrt(a))

    # Convert to kilometers
    distances_km = 6371 * c

    return distances_km.T  # Transpose to get (n_turbines, n_lightning)


def calculate_turbine_risks_vectorized(turbine_df: pd.DataFrame, lightning_df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate turbine risks using a dense distance matrix.

    Each cloud-to-ground strike contributes
    P_0 * (1 + current_weight * |current| / 10000) * exp(-alpha * distance_km),
    summed over all strikes (no radius cutoff). Memory grows with
    n_turbines * n_lightning.

    Args:
        turbine_df: DataFrame containing turbine data
        lightning_df: DataFrame containing lightning data

    Returns:
        DataFrame with added risk_score and risk_log (= log10(risk_score + 1)) columns
    """
    logger.info("Starting vectorized risk calculation...")

    # Filter for cloud-to-ground lightning only
    cg_lightning_df = lightning_df[lightning_df['p_type'].astype(str) == '0'].copy()

    if len(cg_lightning_df) == 0:
        logger.warning("No cloud-to-ground lightning found")
        turbine_df = turbine_df.copy()
        turbine_df['risk_score'] = 0.0
        turbine_df['risk_log'] = 0.0
        return turbine_df

    # Extract coordinates as numpy arrays
    turbine_coords = turbine_df[['lat', 'lng']].values
    lightning_coords = cg_lightning_df[['lat', 'lng']].values

    # Calculate distance matrix
    logger.info(f"Calculating distance matrix for {len(turbine_df)} turbines and {len(cg_lightning_df)} lightning strikes...")
    distance_matrix = calculate_distance_matrix(turbine_coords, lightning_coords)

    # Get risk parameters
    P_0 = config.risk_params['P_0']
    alpha = config.risk_params['alpha']
    current_weight = config.risk_params['current_weight']

    # Calculate current magnitudes
    current_magnitudes = np.abs(cg_lightning_df['current'].values)

    # Calculate risk matrix using vectorized operations
    # Shape: (n_turbines, n_lightning)
    current_factor = 1 + current_weight * current_magnitudes / 10000
    distance_factor = np.exp(-alpha * distance_matrix)
    risk_matrix = P_0 * current_factor[np.newaxis, :] * distance_factor

    # Sum risks for each turbine
    turbine_risks = np.sum(risk_matrix, axis=1)

    # Add risk columns to DataFrame
    turbine_df = turbine_df.copy()
    turbine_df['risk_score'] = turbine_risks
    turbine_df['risk_log'] = np.log10(turbine_risks + 1)

    logger.info(f"Risk calculation completed. Risk range: {turbine_risks.min():.2f} - {turbine_risks.max():.2f}")

    return turbine_df


def calculate_turbine_risks(turbine_df: pd.DataFrame, lightning_df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate turbine risks (entry point kept for backward compatibility).

    Uses a scikit-learn BallTree radius query so that only cloud-to-ground
    strikes within max(config.distance_rings) of each turbine contribute.
    If that fails (e.g. scikit-learn is not installed), falls back to
    calculate_turbine_risks_vectorized, which sums over all strikes without
    a radius cutoff and can therefore give higher scores.
    """
    try:
        from sklearn.neighbors import BallTree
        logger.info("Starting BallTree-based risk calculation...")
        cg_lightning_df = lightning_df[lightning_df['p_type'].astype(str) == '0'].copy()
        if len(cg_lightning_df) == 0:
            result_df = turbine_df.copy()
            result_df['risk_score'] = 0.0
            result_df['risk_log'] = 0.0
            return result_df

        earth_km = 6371.0
        max_radius_km = max(config.distance_rings) / 1000.0
        radius_rad = max_radius_km / earth_km

        # The haversine metric expects [lat, lon] in radians and returns angular distances
        lightning_coords_rad = np.radians(cg_lightning_df[['lat', 'lng']].values)
        tree = BallTree(lightning_coords_rad, metric='haversine')

        turbine_coords_rad = np.radians(turbine_df[['lat', 'lng']].values)
        ind_list, dist_list = tree.query_radius(turbine_coords_rad, r=radius_rad, return_distance=True, sort_results=True)

        P_0 = config.risk_params['P_0']
        alpha = config.risk_params['alpha']
        current_weight = config.risk_params['current_weight']
        currents = np.abs(cg_lightning_df['current'].values)

        risk_scores = np.zeros(len(turbine_df), dtype=float)
        for i in range(len(turbine_df)):
            idxs = ind_list[i]
            if idxs.size == 0:
                continue
            dists_km = dist_list[i] * earth_km
            current_mag = currents[idxs]
            current_factor = 1.0 + current_weight * current_mag / 10000.0
            distance_factor = np.exp(-alpha * dists_km)
            risk_scores[i] = np.sum(P_0 * current_factor * distance_factor)

        result_df = turbine_df.copy()
        result_df['risk_score'] = risk_scores
        result_df['risk_log'] = np.log10(risk_scores + 1.0)
        logger.info("BallTree-based risk calculation completed")
        return result_df
    except Exception as e:
        logger.warning(f"Falling back to vectorized matrix risk calculation due to: {e}")
        return calculate_turbine_risks_vectorized(turbine_df, lightning_df)
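Both risk functions sum, for each turbine, `P_0 * (1 + current_weight * |current| / 10000) * exp(-alpha * d_km)` over cloud-to-ground strikes; the BallTree path only includes strikes within `max(config.distance_rings)`. The sketch below evaluates that sum for a single turbine with plain NumPy and shows how the radius cutoff changes the score. The parameter values (`P_0 = 1.0`, `alpha = 0.5`, `current_weight = 1.0`), the 10 km cutoff and the strike list are made up for illustration; the project reads the real values from `config.risk_params` and `config.distance_rings`.

```python
import numpy as np

# Illustrative parameters; the project reads these from config.risk_params
P_0, ALPHA, CURRENT_WEIGHT = 1.0, 0.5, 1.0


def turbine_risk(dist_km, current, cutoff_km=None) -> float:
    """Sum of P_0 * (1 + w*|I|/10000) * exp(-alpha * d) over strikes, optionally within cutoff_km."""
    dist_km = np.asarray(dist_km, dtype=float)
    current = np.abs(np.asarray(current, dtype=float))
    if cutoff_km is not None:
        keep = dist_km <= cutoff_km
        dist_km, current = dist_km[keep], current[keep]
    terms = P_0 * (1.0 + CURRENT_WEIGHT * current / 10000.0) * np.exp(-ALPHA * dist_km)
    return float(terms.sum())


# Three cloud-to-ground strikes at 0, 2 and 12 km from one turbine
dist_km = [0.0, 2.0, 12.0]
current = [-10000.0, 20000.0, 30000.0]

full = turbine_risk(dist_km, current)                  # no cutoff, like the dense-matrix path
capped = turbine_risk(dist_km, current, cutoff_km=10)  # like the BallTree path with a 10 km outer ring
risk_log = np.log10(full + 1)                          # the risk_log column is log10(score + 1)
print(round(full, 4), round(capped, 4), round(risk_log, 4))
```

With these numbers the 12 km strike adds only 4·e⁻⁶ ≈ 0.0099 to the score, so the cutoff mainly matters for large currents or small `alpha`.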
208
src/analysis/statistics.py
Normal file
@ -0,0 +1,208 @@
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
from datetime import datetime
|
||||
from src.analysis.geospatial import haversine_distance, haversine_distance_vectorized
|
||||
from src.config import config
|
||||
from src.utils import get_analysis_radius_m, parse_period_string_to_datetime
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
def calculate_lightning_statistics(lightning_df, centroid_lat, centroid_lng, start_date=None, end_date=None):
|
||||
"""
|
||||
Calculate lightning statistics for the report with detailed breakdown by distance rings.
|
||||
|
||||
Args:
|
||||
lightning_df: DataFrame containing lightning data
|
||||
centroid_lat: Center latitude
|
||||
centroid_lng: Center longitude
|
||||
start_date: Start date in 'DD-MM-YYYY' format (optional)
|
||||
end_date: End date in 'DD-MM-YYYY' format (optional)
|
||||
"""
|
||||
max_distance = get_analysis_radius_m()
|
||||
|
||||
# Filter lightning data by distance (vectorized)
|
||||
if len(lightning_df) == 0:
|
||||
return {
|
||||
'intercloud_by_day': {},
|
||||
'cloud_to_ground_by_day': {},
|
||||
'total_lightning_per_km2': 0,
|
||||
'daily_lightning_per_km2': 0,
|
||||
'period_days': 0.0,
|
||||
'total_events': 0,
|
||||
'area_km2': 0,
|
||||
'max_distance_km': max_distance / 1000,
|
||||
'lightning_by_distance_rings': {},
|
||||
'daily_lightning_by_rings': {}
|
||||
}
|
||||
|
||||
dists_all = haversine_distance_vectorized(
|
||||
centroid_lat,
|
||||
centroid_lng,
|
||||
lightning_df['lat'].values,
|
||||
lightning_df['lng'].values,
|
||||
)
|
||||
mask = dists_all <= max_distance
|
||||
filtered_df = lightning_df.loc[mask].copy()
|
||||
|
||||
if len(filtered_df) == 0:
|
||||
return {
|
||||
'intercloud_by_day': {},
|
||||
'cloud_to_ground_by_day': {},
|
||||
'total_lightning_per_km2': 0,
|
||||
'daily_lightning_per_km2': 0,
|
||||
'period_days': 0.0,
|
||||
'total_events': 0,
|
||||
'area_km2': 0,
|
||||
'max_distance_km': max_distance / 1000,
|
||||
'lightning_by_distance_rings': {},
|
||||
'daily_lightning_by_rings': {}
|
||||
}
|
||||
|
||||
# Convert local_time to datetime if needed
|
||||
if filtered_df['local_time'].dtype == 'object':
|
||||
filtered_df['local_time'] = pd.to_datetime(filtered_df['local_time'])
|
||||
|
||||
# Add distance column
|
||||
filtered_df['distance_km'] = (dists_all[mask] / 1000)
|
||||
|
||||
# Add date column
|
||||
filtered_df['date'] = filtered_df['local_time'].dt.date
|
||||
|
||||
# 1. Inter-cloud lightnings by day (outermost ring only)
|
||||
intercloud_df = filtered_df[filtered_df['p_type'].astype(str) == '1'].copy()
|
||||
intercloud_by_day = {}
|
||||
if len(intercloud_df) > 0:
|
||||
intercloud_by_day = intercloud_df['date'].value_counts().to_dict()
|
||||
# Convert date objects to strings for JSON serialization
|
||||
intercloud_by_day = {date.strftime("%d-%m-%Y"): count for date, count in intercloud_by_day.items()}
|
||||
|
||||
# 2. Cloud-to-ground lightnings by day (outermost ring only)
|
||||
cloud_to_ground_df = filtered_df[filtered_df['p_type'].astype(str) == '0'].copy()
|
||||
cloud_to_ground_by_day = {}
|
||||
if len(cloud_to_ground_df) > 0:
|
||||
cloud_to_ground_by_day = cloud_to_ground_df['date'].value_counts().to_dict()
|
||||
# Convert date objects to strings for JSON serialization
|
||||
cloud_to_ground_by_day = {date.strftime("%d-%m-%Y"): count for date, count in cloud_to_ground_by_day.items()}
|
||||
|
||||
# 3. Lightning counts by distance rings (ranges instead of cumulative)
|
||||
lightning_by_distance_rings = {}
|
||||
for i, ring_distance in enumerate(config.distance_rings):
|
||||
ring_km = ring_distance / 1000
|
||||
|
||||
# Define distance range for this ring
|
||||
if i == 0:
|
||||
# First ring: 0 to ring_km
|
||||
min_distance = 0
|
||||
ring_max_distance = ring_km
|
||||
ring_name = f"0-{ring_km:.1f}km"
|
||||
else:
|
||||
# Other rings: previous ring to current ring
|
||||
prev_ring_km = config.distance_rings[i-1] / 1000
|
||||
min_distance = prev_ring_km
|
||||
ring_max_distance = ring_km
|
||||
ring_name = f"{prev_ring_km:.1f}-{ring_km:.1f}km"
|
||||
|
||||
# Count lightning within this distance range
|
||||
ring_lightning = filtered_df[
|
||||
(filtered_df['distance_km'] > min_distance) &
|
||||
(filtered_df['distance_km'] <= ring_max_distance)
|
||||
]
|
||||
|
||||
# Count by type
|
||||
intercloud_count = len(ring_lightning[ring_lightning['p_type'].astype(str) == '1'])
|
||||
cloud_to_ground_count = len(ring_lightning[ring_lightning['p_type'].astype(str) == '0'])
|
||||
total_count = len(ring_lightning)
|
||||
|
||||
lightning_by_distance_rings[ring_name] = {
|
||||
'intercloud': intercloud_count,
|
||||
'cloud_to_ground': cloud_to_ground_count,
|
||||
'total': total_count
|
||||
}
|
||||
|
||||
# 4. Daily lightning breakdown by distance rings
|
||||
daily_lightning_by_rings = {}
|
||||
|
||||
# Get unique dates
|
||||
unique_dates = sorted(filtered_df['date'].unique())
|
||||
|
||||
for date in unique_dates:
|
||||
date_str = date.strftime("%d-%m-%Y")
|
||||
daily_lightning_by_rings[date_str] = {}
|
||||
|
||||
# Get lightning for this date
|
||||
daily_lightning = filtered_df[filtered_df['date'] == date]
|
||||
|
||||
for i, ring_distance in enumerate(config.distance_rings):
|
||||
ring_km = ring_distance / 1000
|
||||
|
||||
# Define distance range for this ring
|
||||
if i == 0:
|
||||
# First ring: 0 to ring_km
|
||||
min_distance = 0
|
||||
ring_max_distance = ring_km
|
||||
ring_name = f"0-{ring_km:.1f}km"
|
||||
else:
|
||||
# Other rings: previous ring to current ring
|
||||
prev_ring_km = config.distance_rings[i-1] / 1000
|
||||
min_distance = prev_ring_km
|
||||
ring_max_distance = ring_km
|
||||
ring_name = f"{prev_ring_km:.1f}-{ring_km:.1f}km"
|
||||
|
||||
# Count lightning within this distance range for this date
|
||||
ring_lightning = daily_lightning[
|
||||
(daily_lightning['distance_km'] > min_distance) &
|
||||
(daily_lightning['distance_km'] <= ring_max_distance)
|
||||
]
|
||||
|
||||
# Count by type
|
||||
intercloud_count = len(ring_lightning[ring_lightning['p_type'].astype(str) == '1'])
|
||||
cloud_to_ground_count = len(ring_lightning[ring_lightning['p_type'].astype(str) == '0'])
|
||||
total_count = len(ring_lightning)
|
||||
|
||||
daily_lightning_by_rings[date_str][ring_name] = {
|
||||
            'intercloud': intercloud_count,
            'cloud_to_ground': cloud_to_ground_count,
            'total': total_count
        }

    # 5. Calculate area and lightning density (outermost ring)
    area_km2 = np.pi * (max_distance / 1000) ** 2
    total_events = len(filtered_df)
    total_lightning_per_km2 = total_events / area_km2 if area_km2 > 0 else 0

    if start_date and end_date:
        start_dt = parse_period_string_to_datetime(start_date)
        end_dt = parse_period_string_to_datetime(end_date)
        if start_dt is not None and end_dt is not None:
            delta = end_dt - start_dt
            period_seconds = delta.total_seconds()
            period_days = period_seconds / 86400.0 if period_seconds > 0 else 1.0
            logger.info(f"Analysis period duration: {period_days:.4f} days ({start_date} to {end_date})")
        else:
            period_days = 1.0
            if len(filtered_df) > 0:
                first_date = filtered_df['local_time'].min()
                period_days = float(pd.Timestamp(first_date.year, first_date.month, 1).days_in_month)
            logger.warning(f"Failed to parse date range ({start_date}, {end_date}). Using period_days={period_days}")
    elif len(filtered_df) > 0:
        first_date = filtered_df['local_time'].min()
        period_days = float(pd.Timestamp(first_date.year, first_date.month, 1).days_in_month)
        logger.info(f"Using month as period: {period_days} days")
    else:
        period_days = 1.0

    daily_lightning_per_km2 = total_lightning_per_km2 / period_days if period_days > 0 else 0

    return {
        'intercloud_by_day': intercloud_by_day,
        'cloud_to_ground_by_day': cloud_to_ground_by_day,
        'total_lightning_per_km2': total_lightning_per_km2,
        'daily_lightning_per_km2': daily_lightning_per_km2,
        'period_days': period_days,
        'total_events': total_events,
        'area_km2': area_km2,
        'max_distance_km': max_distance / 1000,
        'lightning_by_distance_rings': lightning_by_distance_rings,
        'daily_lightning_by_rings': daily_lightning_by_rings
    }
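The density figures returned above come down to a few lines of arithmetic. The following is a minimal, standalone sketch of that arithmetic. It assumes, as the code above does, that the outermost ring radius is given in metres and the period in days. `lightning_density` is a hypothetical helper and is not part of the module.

```python
import math


def lightning_density(total_events: int, max_distance_m: float, period_days: float) -> dict:
    """Hypothetical helper mirroring the density arithmetic in the function above."""
    # Area of the disc bounded by the outermost ring, in km^2.
    area_km2 = math.pi * (max_distance_m / 1000) ** 2
    # Flashes per km^2 over the whole analysis window (guarded like the original).
    total_per_km2 = total_events / area_km2 if area_km2 > 0 else 0
    # Normalise to a daily rate; a non-positive period yields 0 as in the original.
    daily_per_km2 = total_per_km2 / period_days if period_days > 0 else 0
    return {
        "area_km2": area_km2,
        "total_lightning_per_km2": total_per_km2,
        "daily_lightning_per_km2": daily_per_km2,
    }


# Example: 628 events inside a 10 km outer ring over a 31-day month.
result = lightning_density(628, 10_000, 31.0)
```

With a 10 km outer ring the area is about 314.16 km². That makes 628 events roughly 2.0 flashes per km² in total, or about 0.064 per km² per day over 31 days.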
275
src/api/data_fetcher.py
Normal file
@ -0,0 +1,275 @@
import os
import requests
import json
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple
import logging
from dotenv import load_dotenv
import pandas as pd
import re

load_dotenv()

logger = logging.getLogger(__name__)


class APIDataFetcher:
    def __init__(self, base_url: str, timeout: int = 30, retry_attempts: int = 3):
        self.base_url = base_url.rstrip('/')
        self.api_key = os.getenv('API_KEY')
        self.timeout = timeout
        self.retry_attempts = retry_attempts

        if not self.api_key:
            raise ValueError("API_KEY not found in .env file")

        self.session = requests.Session()
        self.session.headers.update({
            'Content-Type': 'application/json',
            'x-api-key': self.api_key
        })

    def _make_request(self, endpoint: str, params: Dict) -> Dict:
        """Make API request with retry logic."""
        url = f"{self.base_url}{endpoint}"

        # API key is already in headers (x-api-key)
        for attempt in range(self.retry_attempts):
            try:
                response = self.session.get(url, params=params, timeout=self.timeout)
                response.raise_for_status()
            except requests.exceptions.Timeout:
                logger.warning(f"API request timeout (attempt {attempt + 1}/{self.retry_attempts})")
                if attempt == self.retry_attempts - 1:
                    raise
                continue
            except requests.exceptions.RequestException as e:
                logger.error(f"API request failed: {e}")
                if attempt == self.retry_attempts - 1:
                    raise
                continue

            # Parse outside the network try-block: since requests 2.27,
            # response.json() raises requests.exceptions.JSONDecodeError, which
            # also subclasses RequestException and would otherwise be swallowed
            # and retried by the branch above. Every JSON decode error is a
            # ValueError; a malformed body is not retried.
            try:
                return response.json()
            except ValueError as e:
                logger.error(f"Failed to parse API response: {e}")
                raise

        raise Exception("API request failed after all retries")

    def fetch_lightning_data(
        self,
        center_lat: float,
        center_lng: float,
        radius_km: float,
        start_date: str,
        end_date: str
    ) -> List[Dict]:
        """
        Fetch lightning data from API.

        Args:
            center_lat: Center latitude
            center_lng: Center longitude
            radius_km: Radius in kilometers (converted to meters for API)
            start_date: Start date in YYYY-MM-DD format
            end_date: End date in YYYY-MM-DD format

        Returns:
            List of lightning records
        """
        endpoint = "/lightning-data/historical/"

        radius_m = int(radius_km * 1000)

        params = {
            'queryType': 'circle',
            'centerLongitude': center_lng,
            'centerLatitude': center_lat,
            'radius': radius_m,
            'startDate': start_date,
            'endDate': end_date
        }

        logger.info(f"Fetching lightning data: center=({center_lat}, {center_lng}), radius={radius_km}km, dates={start_date} to {end_date}")

        try:
            data = self._make_request(endpoint, params)

            if isinstance(data, dict) and 'data' in data:
                records = data['data']
            elif isinstance(data, list):
                records = data
            else:
                records = []

            logger.info(f"Fetched {len(records)} lightning records")
            return records

        except Exception as e:
            logger.error(f"Failed to fetch lightning data: {e}")
            raise

    def fetch_storm_data(
        self,
        center_lat: float,
        center_lng: float,
        radius_km: float,
        start_date: str,
        end_date: str
    ) -> List[Dict]:
        """
        Fetch storm data from API.

        Args:
            center_lat: Center latitude
            center_lng: Center longitude
            radius_km: Radius in kilometers (converted to meters for API)
            start_date: Start date in YYYY-MM-DD format
            end_date: End date in YYYY-MM-DD format

        Returns:
            List of storm records
        """
        endpoint = "/storm-data/historical/"

        radius_m = int(radius_km * 1000)

        params = {
            'queryType': 'circle',
            'centerLongitude': center_lng,
            'centerLatitude': center_lat,
            'radius': radius_m,
            'startDate': start_date,
            'endDate': end_date
        }

        logger.info(f"Fetching storm data: center=({center_lat}, {center_lng}), radius={radius_km}km, dates={start_date} to {end_date}")

        try:
            data = self._make_request(endpoint, params)

            if isinstance(data, dict) and 'data' in data:
                records = data['data']
            elif isinstance(data, list):
                records = data
            else:
                records = []

            logger.info(f"Fetched {len(records)} storm records")
            return records

        except Exception as e:
            logger.error(f"Failed to fetch storm data: {e}")
            raise

    def calculate_location_bounds(
        self,
        turbine_df: pd.DataFrame,
        max_distance_ring_m: int,
        padding_km: float = 5
    ) -> Dict[str, float]:
        """
        Auto-calculate center + radius from turbine coordinates.

        Args:
            turbine_df: DataFrame with 'lat' and 'lng' columns
            max_distance_ring_m: Maximum distance ring in meters
            padding_km: Additional padding in kilometers

        Returns:
            Dict with center_lat, center_lng, radius_km
        """
        from src.analysis.geospatial import haversine_distance

        centroid_lat = turbine_df['lat'].mean()
        centroid_lng = turbine_df['lng'].mean()

        max_distance_from_centroid = 0
        for _, turbine in turbine_df.iterrows():
            distance = haversine_distance(
                centroid_lat, centroid_lng,
                turbine['lat'], turbine['lng']
            )
            max_distance_from_centroid = max(max_distance_from_centroid, distance)

        radius_km = (
            (max_distance_from_centroid / 1000)
            + (max_distance_ring_m / 1000)
            + padding_km
        )

        return {
            "center_lat": centroid_lat,
            "center_lng": centroid_lng,
            "radius_km": radius_km
        }

    @staticmethod
    def determine_query_date_range(
        farm_config: Dict,
        api_config: Dict
    ) -> Tuple[datetime, datetime]:
        """
        Determine date range for API query.

        Args:
            farm_config: Farm configuration dict
            api_config: API configuration dict

        Returns:
            Tuple of (start_date, end_date) as datetime objects
        """
        date_config = farm_config['api_params']['date_range']

        def _parse_config_date(value: str) -> datetime:
            value_str = str(value).strip()
            if not value_str:
                raise ValueError("Empty date value")

            if re.fullmatch(r"\d{2}-\d{2}-\d{4}", value_str):
                return datetime.strptime(value_str, '%d-%m-%Y')

            ts = pd.to_datetime(value_str, errors='raise')
            if isinstance(ts, pd.Timestamp):
                if ts.tzinfo is not None:
                    ts = ts.tz_convert('UTC').tz_localize(None)
                return ts.to_pydatetime()

            raise ValueError(f"Unsupported date value: {value_str}")

        if date_config['method'] == 'manual':
            start_date = _parse_config_date(date_config['start_date'])
            end_date = _parse_config_date(date_config['end_date'])
            return start_date, end_date

        today = datetime.now()
        query_range = date_config.get('query_range', {})
        method = query_range.get('method', api_config.get('default_query_range', {}).get('method', 'current_month'))

        if method == 'current_month':
            start_date = today.replace(day=1)
            end_date = today

        elif method == 'last_month':
            # The day before the 1st of the current month is the last day of the
            # previous month; this also rolls January back into December.
            end_date = today.replace(day=1) - timedelta(days=1)
            start_date = end_date.replace(day=1)

        elif method == 'days_back':
            days = query_range.get('days', 30)
            start_date = today - timedelta(days=days)
            end_date = today

        elif method == 'custom':
            start_date = _parse_config_date(query_range['start_date'])
            end_date = _parse_config_date(query_range['end_date'])

        else:
            start_date = today.replace(day=1)
            end_date = today

        return start_date, end_date

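The `last_month` window needs only one piece of date arithmetic: step back one day from the 1st of the current month, then snap to the 1st of that month. The following is a minimal, standalone sketch on `datetime.date` values. `previous_month_range` is an illustrative name and does not exist in the module.

```python
from datetime import date, timedelta
from typing import Tuple


def previous_month_range(today: date) -> Tuple[date, date]:
    """Return (first_day, last_day) of the calendar month before `today`."""
    # The day before the 1st of this month is always the last day of the
    # previous month, so no month arithmetic or January special case is needed.
    last_day = today.replace(day=1) - timedelta(days=1)
    # Snap back to the 1st of that same month.
    first_day = last_day.replace(day=1)
    return first_day, last_day
```

Because `datetime` subclasses `date` and supports the same `replace`, the helper also works on `datetime` values unchanged; the time of day is carried along.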
92
src/config.py
Normal file
@ -0,0 +1,92 @@
from dataclasses import dataclass
from typing import List, Dict, Any, Optional
import os


@dataclass
class Config:
    """
    Centralized configuration for global/default settings.

    IMPORTANT: Farm-specific settings (distance_rings, ring_colors, wind_farm_name,
    file paths, date ranges) are managed in wind_farms_config.json and set dynamically
    by batch_generate.py. The values below are only fallback defaults for backward
    compatibility and should NOT be configured here.
    """

    # Farm-specific settings (set by batch processing, DO NOT configure here)
    distance_rings: List[int] = None
    ring_colors: List[str] = None
    wind_farm_name: str = None
    analysis_start_date: Optional[str] = None
    analysis_end_date: Optional[str] = None
    timezone: Optional[str] = None

    # Lightning data source configuration
    # By default, lightning data is expected from API (JSON output).
    # When lightning_source_type is set to "csv", lightning_csv should
    # point to a CSV file that can be loaded instead.
    lightning_source_type: str = "api"
    lightning_json: Optional[str] = None
    lightning_csv: Optional[str] = None

    # Risk calculation parameters (global defaults)
    risk_params: Dict[str, float] = None

    # Histogram parameters (global defaults)
    histogram_params: Dict[str, Any] = None

    # Grouping parameters (global defaults, can be overridden per-farm)
    grouping_params: Dict[str, Any] = None

    # Centralized histogram layout configuration (global defaults)
    histogram_layout: Dict[str, Any] = None

    def __post_init__(self):
        """Set default values for global settings."""
        # Fallback defaults for farm-specific settings (only used if not set by batch processing)
        # DO NOT configure these - they come from wind_farms_config.json
        if self.distance_rings is None:
            self.distance_rings = [1000, 2000, 3000, 4000, 10000]

        if self.ring_colors is None:
            self.ring_colors = ['purple', 'red', 'orange', 'coral', 'green']

        if self.risk_params is None:
            self.risk_params = {
                'P_0': 1.0,
                'alpha': 0.5,
                'current_weight': 0.1
            }

        if self.histogram_params is None:
            self.histogram_params = {
                'min_gap_minutes': 30,
                'min_events_per_period': 10,
                'max_periods': 8,
                'max_periods_per_figure': 6,
                'height_per_row': 200,
                'width': 800
            }

        if self.histogram_layout is None:
            self.histogram_layout = {
                'plot_width': 800,
                'plot_height_per_row': 250,
                'pdf_width_ratio': 0.95,
                'pdf_aspect_ratio': 0.75,
                'pdf_top_margin': 140,
                'pdf_left_margin': 40,
                'image_scale': 2,
                'image_engine': 'kaleido'
            }

        if self.grouping_params is None:
            self.grouping_params = {
                'max_distance_m': None,
                'distance_ring_index': 4,
                'min_group_size': 1,
                'max_group_size': 50
            }


# Global configuration instance
config = Config()
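`Config` uses `None` defaults that `__post_init__` fills in, because a dataclass field cannot take a mutable default such as a list directly. One alternative worth considering is `dataclasses.field(default_factory=...)`, combined with `dataclasses.replace` for per-farm overrides. It is sketched here on a hypothetical, trimmed-down `FarmSettings` class, not on the real `Config`.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class FarmSettings:
    """Hypothetical, trimmed-down stand-in for Config (illustration only)."""
    # default_factory builds a fresh list per instance, like the None +
    # __post_init__ pattern, but keeps the annotation honest (never None).
    distance_rings: List[int] = field(default_factory=lambda: [1000, 2000, 3000, 4000, 10000])
    ring_colors: List[str] = field(default_factory=lambda: ['purple', 'red', 'orange', 'coral', 'green'])
    timezone: Optional[str] = None


defaults = FarmSettings()
# Per-farm override that leaves the defaults object untouched. Note that
# replace() reuses unspecified field values by reference (ring_colors here).
farm = replace(defaults, distance_rings=[500, 1000, 5000], timezone="Europe/Istanbul")
```

This differs from the repository's approach, where `batch_generate.py` reportedly sets attributes on the shared `config` instance. Whether an immutable per-farm copy fits better depends on how the rest of the pipeline reads `config`.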
205
src/data/loader.py
Normal file
@ -0,0 +1,205 @@
import logging
from typing import Dict, Any

import pandas as pd

from ..config import config
from ..utils import (
    ensure_datetime_column,
    filter_lightning_data_by_date_range,
    load_json_data,
    normalize_local_time_to_timezone,
    validate_lightning_data,
    validate_turbine_data,
)

logger = logging.getLogger(__name__)


def _rename_columns_for_lightning(df: pd.DataFrame) -> pd.DataFrame:
    required = {
        "lat": ["lat", "latitude"],
        "lng": ["lng", "lon", "longitude", "long"],
        "current": ["current", "amplitude", "amp", "peak_current"],
        "p_type": ["p_type", "ptype", "type", "flash_type"],
        "local_time": ["local_time", "time", "time_utc", "timestamp", "datetime", "date_time", "localtime"],
    }

    columns_lower = {col.lower(): col for col in df.columns}
    rename_map: Dict[str, str] = {}

    for target, candidates in required.items():
        found_source = None
        for candidate in candidates:
            source = columns_lower.get(candidate.lower())
            if source is not None:
                found_source = source
                break
        if found_source is not None and found_source != target:
            rename_map[found_source] = target

    if rename_map:
        df = df.rename(columns=rename_map)

    missing = [col for col in ["lat", "lng", "current", "p_type", "local_time"] if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required lightning columns in CSV: {missing}")

    return df


def load_lightning_data_from_csv(csv_path: str | None = None) -> pd.DataFrame:
    """
    Load and validate lightning data from a CSV file.

    Args:
        csv_path: Path to lightning CSV file. If None, uses config.lightning_csv.

    Returns:
        DataFrame containing lightning data
    """
    if csv_path is None:
        csv_path = getattr(config, "lightning_csv", None)
        if csv_path is None:
            raise ValueError("lightning_csv path must be provided (not available in config)")

    logger.info(f"Loading lightning data from CSV {csv_path}")

    try:
        df = pd.read_csv(csv_path)

        if len(df) == 0:
            logger.warning(
                "No lightning records found in CSV - creating empty DataFrame with required columns",
            )
            df = pd.DataFrame(columns=["lat", "lng", "current", "p_type", "local_time"])
        else:
            df = _rename_columns_for_lightning(df)

        if not validate_lightning_data(df):
            raise ValueError("Lightning data validation failed for CSV source")

        df = ensure_datetime_column(df, "local_time")

        if "current_abs" not in df.columns and len(df) > 0:
            df["current_abs"] = df["current"].abs()
        elif "current_abs" not in df.columns and len(df) == 0:
            df["current_abs"] = pd.Series(dtype="float64")

        start_date = getattr(config, "analysis_start_date", None)
        end_date = getattr(config, "analysis_end_date", None)

        if start_date and end_date:
            df = filter_lightning_data_by_date_range(df, start_date, end_date)

        farm_tz = getattr(config, "timezone", None)
        if len(df) > 0 and farm_tz:
            df = normalize_local_time_to_timezone(df, "local_time", farm_tz)

        logger.info(f"Successfully loaded {len(df)} lightning records from CSV")
        return df

    except Exception as e:
        logger.error(f"Failed to load lightning data from CSV: {e}")
        raise


def load_lightning_data(json_path: str = None) -> pd.DataFrame:
    """
    Load and validate lightning data from configured source.

    If config.lightning_source_type == "csv" and no explicit json_path is provided,
    data will be loaded from CSV. Otherwise, API/JSON loading is used.
    """
    if json_path is None:
        source_type = getattr(config, "lightning_source_type", "api")
        if source_type == "csv":
            return load_lightning_data_from_csv()

        json_path = getattr(config, "lightning_json", None)
        if json_path is None:
            raise ValueError("lightning_json path must be provided (not available in config)")

    logger.info(f"Loading lightning data from JSON {json_path}")

    try:
        data = load_json_data(json_path)

        if isinstance(data, dict) and len(data) > 0:
            if "data" in data and isinstance(data["data"], list):
                records = data["data"]
            else:
                records = list(data.values())[0]
        else:
            records = data

        df = pd.DataFrame(records)

        if len(df) == 0:
            logger.warning("No lightning records found in data - creating empty DataFrame with required columns")
            df = pd.DataFrame(columns=["lat", "lng", "current", "p_type", "local_time"])

        if not validate_lightning_data(df):
            raise ValueError("Lightning data validation failed")

        df = ensure_datetime_column(df, "local_time")

        if "current_abs" not in df.columns and len(df) > 0:
            df["current_abs"] = df["current"].abs()
        elif "current_abs" not in df.columns and len(df) == 0:
            df["current_abs"] = pd.Series(dtype="float64")

        start_date = getattr(config, "analysis_start_date", None)
        end_date = getattr(config, "analysis_end_date", None)

        if start_date and end_date:
            df = filter_lightning_data_by_date_range(df, start_date, end_date)

        farm_tz = getattr(config, "timezone", None)
        if len(df) > 0 and farm_tz:
            df = normalize_local_time_to_timezone(df, "local_time", farm_tz)

        logger.info(f"Successfully loaded {len(df)} lightning records")
        return df

    except Exception as e:
        logger.error(f"Failed to load lightning data: {e}")
        raise


def load_turbine_data(json_path: str = None) -> pd.DataFrame:
    """
    Load and validate turbine data from JSON file.

    Args:
        json_path: Path to turbine JSON file. If None, uses config default.

    Returns:
        DataFrame containing turbine data

    Raises:
        ValueError: If data validation fails
    """
    if json_path is None:
        json_path = getattr(config, 'turbine_json', None)
        if json_path is None:
            raise ValueError("turbine_json path must be provided (not available in config)")

    logger.info(f"Loading turbine data from {json_path}")

    try:
        # Load JSON data
        data = load_json_data(json_path)

        # Create DataFrame
        df = pd.DataFrame(data)

        # Validate data
        if not validate_turbine_data(df):
            raise ValueError("Turbine data validation failed")

        logger.info(f"Successfully loaded {len(df)} turbine records")
        return df

    except Exception as e:
        logger.error(f"Failed to load turbine data: {e}")
        raise
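The alias lookup in `_rename_columns_for_lightning` does not actually depend on pandas. The following pure-Python sketch applies the same matching rule: case-insensitive matching, where the first alias listed for a target wins and names already in canonical form are left alone. It is useful for unit-testing the alias table without building a DataFrame. `build_rename_map` and `ALIASES` are hypothetical names; the alias lists are copied from the loader above.

```python
from typing import Dict, Iterable, List

# Alias lists copied from _rename_columns_for_lightning.
ALIASES: Dict[str, List[str]] = {
    "lat": ["lat", "latitude"],
    "lng": ["lng", "lon", "longitude", "long"],
    "current": ["current", "amplitude", "amp", "peak_current"],
    "p_type": ["p_type", "ptype", "type", "flash_type"],
    "local_time": ["local_time", "time", "time_utc", "timestamp", "datetime", "date_time", "localtime"],
}


def build_rename_map(columns: Iterable[str], aliases: Dict[str, List[str]] = ALIASES) -> Dict[str, str]:
    """Map source column names to canonical names, matching case-insensitively."""
    columns_lower = {col.lower(): col for col in columns}
    rename_map: Dict[str, str] = {}
    for target, candidates in aliases.items():
        # The first alias present in the input wins, in the order listed.
        for candidate in candidates:
            source = columns_lower.get(candidate.lower())
            if source is not None:
                if source != target:
                    rename_map[source] = target
                break
    return rename_map
```

One consequence of "first alias wins": a CSV that has both a `time` and a `timestamp` column will use `time` as `local_time`.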
1114
src/reporting/docx.py
Normal file
File diff suppressed because it is too large
236
src/reporting/docx_sections.py
Normal file
@ -0,0 +1,236 @@
from __future__ import annotations

from typing import Any

import numpy as np
import pandas as pd
import plotly.graph_objects as go

from src.config import config
from src.reporting.precompute import precompute_group_distances_and_rings


def _build_current_vs_distance_chart(
    lightning_df: pd.DataFrame,
    dists_km: np.ndarray,
    mask_within: np.ndarray,
    lightning_type_filter: str,
    title: str,
    fig_width: int,
    fig_height: int,
) -> go.Figure | None:
    if lightning_type_filter == "cg":
        type_mask = lightning_df["p_type"].astype(str) == "0"
    else:
        type_mask = lightning_df["p_type"].astype(str) != "0"

    combined_mask = mask_within & type_mask
    if combined_mask.sum() == 0:
        return None

    subset = lightning_df.loc[combined_mask].copy()
    distances = dists_km[combined_mask]
    currents = subset["current"].values.astype(float)

    time_series = pd.to_datetime(subset["local_time"])
    time_dt = time_series.sort_values()

    rings_km = np.array(config.distance_rings, dtype=float) / 1000.0
    ring_colors_cfg = getattr(config, "ring_colors", None) or []
    ring_indices = np.searchsorted(rings_km, distances, side="left").astype(int)
    ring_indices = np.clip(ring_indices, 0, len(rings_km) - 1)

    ring_names: list[str] = []
    for i in range(len(rings_km)):
        if i == 0:
            ring_names.append(f"0-{rings_km[0]:.1f} km")
        else:
            ring_names.append(f"{rings_km[i - 1]:.1f}-{rings_km[i]:.1f} km")

    t_min = time_dt.min()
    t_max = time_dt.max()
    tick_vals = pd.date_range(t_min, t_max, periods=4)
    tick_text = [t.strftime("%d-%m-%Y %H:%M") for t in tick_vals]

    fig = go.Figure()
    for i in range(len(rings_km)):
        mask_ring = ring_indices == i
        if mask_ring.sum() == 0:
            continue
        color = ring_colors_cfg[i] if i < len(ring_colors_cfg) else "gray"
        r_times = time_series.values[mask_ring]
        r_currents = currents[mask_ring]
        r_dists = distances[mask_ring]
        r_time_labels = pd.to_datetime(r_times).strftime("%d-%m-%Y %H:%M").values
        fig.add_trace(
            go.Scatter(
                x=r_times,
                y=r_currents,
                mode="markers",
                name=ring_names[i],
                marker=dict(size=10, opacity=0.8, color=color),
                customdata=np.column_stack((r_dists, r_time_labels)),
                hovertemplate=(
                    "Time: %{customdata[1]}<br>"
                    "Current: %{y:.0f} A<br>"
                    "Distance: %{customdata[0]} km<br>"
                    "Ring: " + ring_names[i] + "<br>"
                    "<extra></extra>"
                ),
            )
        )

    fig.update_layout(
        font=dict(size=16),
        title=dict(text=title, x=0.5, font=dict(size=22)),
        xaxis_title="Time",
        yaxis_title="Current (A)",
        plot_bgcolor="white",
        paper_bgcolor="white",
        xaxis=dict(
            showgrid=True,
            gridcolor="lightgray",
            zeroline=False,
            tickvals=tick_vals,
            ticktext=tick_text,
            tickangle=-25,
            tickfont=dict(size=22),
            title_font=dict(size=28),
        ),
        yaxis=dict(
            showgrid=True,
            gridcolor="lightgray",
            zeroline=False,
            tickfont=dict(size=22),
            title_font=dict(size=28),
        ),
        legend=dict(
            title="Distance Ring",
            orientation="h",
            x=0.5,
            xanchor="center",
            y=-0.28,
            yanchor="top",
            bgcolor="rgba(255,255,255,0.8)",
            bordercolor="black",
            borderwidth=1,
            font=dict(size=20),
            title_font=dict(size=24),
        ),
        width=fig_width,
        height=fig_height,
        margin=dict(l=70, r=40, t=50, b=130),
    )

    return fig


def build_group_lightning_table_data(
    centroid_lat: float, centroid_lng: float, lightning_df: pd.DataFrame
) -> tuple[list[list[str]], list[str]]:
    pre = precompute_group_distances_and_rings(
        centroid_lat, centroid_lng, lightning_df, config.distance_rings
    )
    rows: list[list[str]] = []
    row_colors: list[str] = []
    outermost_km = max(config.distance_rings) / 1000.0
    rings_km = [r / 1000.0 for r in config.distance_rings]

    for i, rec in enumerate(lightning_df.itertuples(index=False)):
        proximity = float(pre["dists_km"][i])
        if proximity > outermost_km:
            continue
        ri = int(pre["ring_idx"][i])
        if ri >= len(rings_km):
            continue
        color = config.ring_colors[ri]
        try:
            from src.utils import format_datetime_ddmmyyyy_hhmmss

            dt_val = (
                rec.local_time
                if not isinstance(rec.local_time, str)
                else pd.to_datetime(str(rec.local_time)[:19])
            )
            local_time = format_datetime_ddmmyyyy_hhmmss(pd.to_datetime(dt_val))
        except Exception:
            local_time = str(getattr(rec, "local_time", ""))[:19]

        lightning_type = "cloud-to-ground" if str(rec.p_type) == "0" else "intercloud"
        height_val = getattr(rec, "ic_height", "")
        if height_val == "":
            height_val = getattr(rec, "height", "")

        rows.append(
            [
                "",
                local_time,
                f"{rec.lat:.5f}",
                f"{rec.lng:.5f}",
                str(rec.current),
                str(height_val),
                lightning_type,
                f"{proximity:.2f}",
            ]
        )
        row_colors.append(color)

    sorted_data = sorted(
        zip(rows, row_colors),
        key=lambda x: (0 if x[0][6] == "cloud-to-ground" else 1, float(x[0][7])),
    )
    if sorted_data:
        rows, row_colors = zip(*sorted_data)
        rows = list(rows)
        row_colors = list(row_colors)
        for idx, row in enumerate(rows):
            row[0] = str(idx + 1)
    else:
        rows, row_colors = [], []

    header = [
        "#",
        "Time (Local)",
        "Lat",
        "Lng",
        "Current (amps)",
        "Height (m)",
        "Lightning Type",
        "Proximity (km)",
    ]
    return [header] + rows, ["lightgrey"] + row_colors


def build_risk_table_data(
    turbine_df: pd.DataFrame, group_info: dict[str, Any]
) -> tuple[list[list[str]] | None, list[str] | None]:
    if "risk_log" not in turbine_df.columns:
        return None, None

    group_turbines = turbine_df.iloc[group_info["turbine_indices"]]
    rows: list[list[str]] = []
    row_colors: list[str] = []

    from src.utils import get_risk_definition_by_fixed_intervals, get_turbine_color_by_fixed_intervals

    for _, turbine in group_turbines.iterrows():
        risk_log = float(turbine.get("risk_log", 0) or 0)
        color = get_turbine_color_by_fixed_intervals(risk_log)
        rows.append(
            [
                str(turbine.get("name", "N/A")),
                f"{risk_log:.2f}",
                str(get_risk_definition_by_fixed_intervals(risk_log)),
            ]
        )
        row_colors.append(str(color))

    sorted_data = sorted(zip(rows, row_colors), key=lambda x: float(x[0][1]), reverse=True)
    if sorted_data:
        rows, row_colors = zip(*sorted_data)
    else:
        rows, row_colors = [], []

    header = ["Turbine Name", "Log Risk", "Risk Definition"]
    return [header] + list(rows), ["lightgrey"] + list(row_colors)
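The chart code assigns each flash to a ring with `np.searchsorted(..., side="left")` and then `np.clip`. The standard-library `bisect_left` follows the same convention: a distance lying exactly on a boundary belongs to the inner ring. The following dependency-free sketch reproduces the assignment and the legend labels. The helper names `assign_ring` and `ring_labels` are hypothetical.

```python
from bisect import bisect_left
from typing import List


def assign_ring(distance_km: float, rings_km: List[float]) -> int:
    """Index of the ring containing distance_km; rings_km are ascending outer radii."""
    # bisect_left matches np.searchsorted(..., side="left"): the upper bound
    # of each ring is inclusive, so 1.0 km falls in the 0-1.0 km ring.
    idx = bisect_left(rings_km, distance_km)
    # Mirror np.clip(..., 0, len(rings_km) - 1): anything beyond the
    # outermost ring is folded into the last ring.
    return min(idx, len(rings_km) - 1)


def ring_labels(rings_km: List[float]) -> List[str]:
    """Legend labels in the same format as the chart code."""
    labels = [f"0-{rings_km[0]:.1f} km"]
    labels += [f"{rings_km[i - 1]:.1f}-{rings_km[i]:.1f} km" for i in range(1, len(rings_km))]
    return labels
```

In the chart the clip is presumably rarely exercised, since `mask_within` is expected to have removed flashes beyond the outermost ring already. `build_group_lightning_table_data` performs that filtering explicitly with its `outermost_km` check.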
133
src/reporting/filename_utils.py
Normal file
@ -0,0 +1,133 @@
from __future__ import annotations

import re
import unicodedata
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional
from zoneinfo import ZoneInfo

import pandas as pd


_DD_MM_YYYY_RE = re.compile(r"^\d{2}-\d{2}-\d{4}$")


def slugify_ascii_underscore(value: str) -> str:
    """
    Convert `value` into an ASCII-only slug suitable for filenames.

    Rules:
    - spaces -> underscore
    - diacritics are stripped via NFKD (e.g. 'ğ' -> 'g'); non-ASCII characters
      without an ASCII decomposition (e.g. Turkish dotless 'ı') are dropped
    - any remaining run of characters outside [A-Za-z0-9._-] becomes one underscore
    - collapse consecutive underscores
    - trim leading/trailing underscores
    """
    if value is None:
        return "report"

    s = str(value).strip()
    if not s:
        return "report"

    s = s.replace(" ", "_")
    # Remove diacritics while preserving base ASCII letters (e.g. 'ğ' -> 'g').
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    s = re.sub(r"[^A-Za-z0-9._-]+", "_", s)
    s = re.sub(r"_+", "_", s).strip("_")
    return s or "report"


@dataclass(frozen=True)
class FarmLocalDateRange:
    start_date_yyyy_mm_dd: str
    end_date_yyyy_mm_dd: str


def _parse_date_value_local(value: str, tz: Optional[ZoneInfo]) -> date:
|
||||
"""
|
||||
Parse a date-like string from config into a local `date` in the given timezone.
|
||||
|
||||
Config supported formats:
|
||||
- DD-MM-YYYY (treated as already-local calendar date)
|
||||
- ISO datetime (converted to tz, then local calendar date extracted)
|
||||
"""
|
||||
v = str(value).strip()
|
||||
|
||||
if _DD_MM_YYYY_RE.match(v):
|
||||
# Local calendar date, no timezone conversion needed.
|
||||
dt = datetime.strptime(v, "%d-%m-%Y")
|
||||
return dt.date()
|
||||
|
||||
# ISO datetime path: interpret as UTC and convert to target timezone.
|
||||
# `utc=True` yields tz-aware timestamps.
|
||||
ts = pd.to_datetime(v, utc=True, errors="raise")
|
||||
if tz:
|
||||
ts = ts.tz_convert(tz)
|
||||
return ts.date()
|
||||
|
||||
|
||||
def farm_local_date_range_from_config(farm: dict) -> FarmLocalDateRange:
|
||||
"""
|
||||
Compute (start_date, end_date) for filename naming from `wind_farms_config.json`,
|
||||
using the farm timezone and interpreting configured date values as local.
|
||||
"""
|
||||
tz_name = farm.get("report_config", {}).get("timezone")
|
||||
tz = ZoneInfo(tz_name) if tz_name else None
|
||||
|
||||
date_range_cfg = farm.get("api_params", {}).get("date_range", {})
|
||||
method = str(date_range_cfg.get("method", "auto")).lower()
|
||||
|
||||
if method == "manual":
|
||||
start_val = date_range_cfg.get("start_date")
|
||||
        end_val = date_range_cfg.get("end_date")
        if not start_val or not end_val:
            raise ValueError("Manual date_range requires start_date and end_date")
        start_dt = _parse_date_value_local(str(start_val), tz)
        end_dt = _parse_date_value_local(str(end_val), tz)
        return FarmLocalDateRange(
            start_date_yyyy_mm_dd=start_dt.strftime("%Y-%m-%d"),
            end_date_yyyy_mm_dd=end_dt.strftime("%Y-%m-%d"),
        )

    # Auto mode: compute a local date range based on `query_range.method`.
    query_range_cfg = date_range_cfg.get("query_range", {}) or {}
    query_method = str(query_range_cfg.get("method", "current_month")).lower()

    now_local = datetime.now(tz).date() if tz else datetime.now().date()

    if query_method == "current_month":
        start_dt = date(now_local.year, now_local.month, 1)
        end_dt = now_local
    elif query_method == "last_month":
        if now_local.month == 1:
            prev_year = now_local.year - 1
            prev_month = 12
        else:
            prev_year = now_local.year
            prev_month = now_local.month - 1
        start_dt = date(prev_year, prev_month, 1)
        # Last day of the previous month is the day before the 1st of the current month.
        end_dt = date(now_local.year, now_local.month, 1) - timedelta(days=1)
    elif query_method == "days_back":
        days = int(query_range_cfg.get("days", 30))
        start_dt = now_local - timedelta(days=days)
        end_dt = now_local
    elif query_method == "custom":
        start_val = query_range_cfg.get("start_date")
        end_val = query_range_cfg.get("end_date")
        if not start_val or not end_val:
            raise ValueError("Auto date_range.custom requires query_range.start_date and end_date")
        start_dt = _parse_date_value_local(str(start_val), tz)
        end_dt = _parse_date_value_local(str(end_val), tz)
    else:
        # Fallback: treat any unrecognized method as current_month.
        start_dt = date(now_local.year, now_local.month, 1)
        end_dt = now_local

    return FarmLocalDateRange(
        start_date_yyyy_mm_dd=start_dt.strftime("%Y-%m-%d"),
        end_date_yyyy_mm_dd=end_dt.strftime("%Y-%m-%d"),
    )
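A minimal, standalone sketch of the auto-mode branches above, with `today` passed in explicitly so results are reproducible. The name `auto_date_range` is hypothetical; it returns plain `date` objects instead of `FarmLocalDateRange` and omits the `custom` branch and timezone handling.

```python
from datetime import date, timedelta


def auto_date_range(method: str, today: date, days: int = 30) -> tuple[date, date]:
    """Hypothetical helper mirroring the auto-mode query methods (no custom/tz handling)."""
    first_of_this_month = date(today.year, today.month, 1)
    if method == "last_month":
        # The day before the 1st of this month is the last day of the previous month.
        end = first_of_this_month - timedelta(days=1)
        return date(end.year, end.month, 1), end
    if method == "days_back":
        return today - timedelta(days=days), today
    # "current_month" and any unrecognized method: month-to-date.
    return first_of_this_month, today


# January rolls back into December of the previous year.
print(auto_date_range("last_month", date(2025, 1, 15)))
# Leap-year February ends on the 29th.
print(auto_date_range("last_month", date(2024, 3, 10)))
```

Both year boundaries and leap years fall out of the "1st of this month minus one day" step without any month-length table.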
209
src/reporting/gemini_commentary.py
Normal file
@ -0,0 +1,209 @@
from __future__ import annotations

import os
from typing import Any

from src.utils import get_risk_definition_by_fixed_intervals


def build_gemini_prompt(context: dict[str, Any]) -> str:
    analysis_period = context.get("analysis_period", "N/A")
    analysis_radius_km = context.get("analysis_radius_km", None)
    total_events = context.get("total_events", None)
    total_lightning_per_km2 = context.get("total_lightning_per_km2", None)
    turbine_count = context.get("turbine_count", None)
    is_single_turbine_report = context.get("is_single_turbine_report", None)

    top_rings = context.get("top_rings", [])  # list of (ring_name, total, cg_count, ic_count)
    max_risk_log = context.get("max_risk_log", None)
    max_risk_definition = context.get("max_risk_definition", None)
    top_turbine_name = context.get("top_turbine_name", "N/A")
    top_turbine_risk_log = context.get("top_turbine_risk_log", None)

    storm_summary = context.get("storm_summary")
    storm_over_turbine = context.get("storm_over_turbine")
    storm_near_turbine_count = context.get("storm_near_turbine_count")
    storm_closest_distance_km = context.get("storm_closest_distance_km")
    storm_over_threshold_km = context.get("storm_over_threshold_km", 1.0)

    ring_lines: list[str] = []
    for ring in top_rings[:3]:
        try:
            ring_name, total, cg_count, ic_count = ring
        except Exception:
            continue
        ring_lines.append(f"- {ring_name}: total={total}, cloud-to-ground={cg_count}, intercloud={ic_count}")

    storm_lines: list[str] = []
    if isinstance(storm_summary, dict) and storm_summary:
        total_cells = storm_summary.get("total_cells", 0)
        severity_counts = storm_summary.get("severity_counts", {}) or {}
        storm_lines.append(f"- total_cells={total_cells}")
        for severity, count in severity_counts.items():
            storm_lines.append(f"- {severity}_cells={count}")

    return (
        "You are generating a single neutral, factual commentary paragraph for a lightning activity report.\n"
        "Write exactly 3-4 sentences.\n"
        "Do not invent any numbers. Only use the values provided.\n"
        "\n"
        "Risk explanation requirements (must be reflected in the paragraph):\n"
        "- Turbine risk increases for cloud-to-ground strikes with larger current magnitude.\n"
        "- Turbine risk decays exponentially with increasing distance from the turbine.\n"
        "- The turbine risk score is the sum of per-strike contributions (then log-transformed for visualization/heatmaps/tables).\n"
        "\n"
        "Context:\n"
        f"- analysis_period: {analysis_period}\n"
        f"- analysis_radius_km: {analysis_radius_km}\n"
        f"- total_events: {total_events}\n"
        f"- total_lightning_per_km2: {total_lightning_per_km2}\n"
        f"- turbine_count: {turbine_count}\n"
        f"- is_single_turbine_report: {is_single_turbine_report}\n"
        f"- top_rings:\n{chr(10).join(ring_lines) if ring_lines else '- N/A'}\n"
        f"- max_risk_log: {max_risk_log}\n"
        f"- max_risk_definition: {max_risk_definition}\n"
        f"- top_turbine_name: {top_turbine_name}\n"
        f"- top_turbine_risk_log: {top_turbine_risk_log}\n"
        f"- storm_over_turbine: {storm_over_turbine}\n"
        f"- storm_near_turbine_count: {storm_near_turbine_count}\n"
        f"- storm_closest_distance_km: {storm_closest_distance_km}\n"
        f"- storm_over_threshold_km: {storm_over_threshold_km}\n"
        + (f"\n- storm_summary:\n{chr(10).join(storm_lines)}" if storm_lines else "\n- storm_summary: not available")
        + "\n\n"
        "Requirements for the paragraph:\n"
        "- Mention one key takeaway from the ring distribution (e.g., where totals are highest).\n"
        "- Mention the overall lightning density (events/km²).\n"
        "- If is_single_turbine_report is true: start the sentence mentioning the highest-risk turbine as \"For {top_turbine_name}, ...\"; avoid wording like \"Within the analyzed area\" and avoid verbs like \"was identified\".\n"
        "- If is_single_turbine_report is false: you may use wording like \"Within the analyzed area, {top_turbine_name} was identified ...\".\n"
        "- Mention the specific turbine name with the highest risk score (top_turbine_name) verbatim.\n"
        "- Mention the risk category for that turbine using max_risk_definition.\n"
        "- Do not refer only to the category; always associate the risk with top_turbine_name.\n"
        "- Mention storm-cell interaction with the turbine when storm information is available:\n"
        "  - If storm_over_turbine is true: say that storm cells were very close to/over the turbine (based on centroid distance <= storm_over_threshold_km).\n"
        "  - If storm_over_turbine is false and storm_closest_distance_km is provided: say the closest storm cell centroid came within storm_closest_distance_km km of the turbine.\n"
        "- If storm_summary is available, mention total storm cells and at least one severity count.\n"
        "- Round numeric values as follows (use the rounded values you are given, avoid long decimals):\n"
        "  - lightning density to 3 decimals (events/km²)\n"
        "  - log-transformed risk score(s) to 2 decimals\n"
        "  - distances (analysis_radius_km, storm_closest_distance_km) to 1 decimal (km)\n"
        "  - counts to integers\n"
        "- Keep tone analytic and non-alarmist.\n"
        "\n"
        "Output:\n"
        "One paragraph only (no bullet points, no headings)."
    )
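The prompt asks the model to round the values it receives, which leaves rounding to the model. One option is to round the context before calling `build_gemini_prompt`. The helper below is a hypothetical sketch: the key names are the ones read from `context` above, and the precisions follow the prompt's own rules (density to 3 decimals, log risk to 2, distances to 1, counts to integers).

```python
from typing import Any

# Precision per context key, taken from the rounding rules stated in the prompt.
ROUNDING_DIGITS = {
    "total_lightning_per_km2": 3,
    "max_risk_log": 2,
    "top_turbine_risk_log": 2,
    "analysis_radius_km": 1,
    "storm_closest_distance_km": 1,
    "storm_over_threshold_km": 1,
}
COUNT_KEYS = ("total_events", "turbine_count", "storm_near_turbine_count")


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def round_context_for_prompt(context: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `context` with numeric values rounded; other values are left alone."""
    rounded = dict(context)
    for key, digits in ROUNDING_DIGITS.items():
        if _is_number(rounded.get(key)):
            rounded[key] = round(float(rounded[key]), digits)
    for key in COUNT_KEYS:
        if _is_number(rounded.get(key)):
            rounded[key] = int(round(rounded[key]))
    return rounded


ctx = {"total_lightning_per_km2": 0.123456, "max_risk_log": 1.23789, "total_events": 42.0, "top_turbine_name": "T-07"}
print(round_context_for_prompt(ctx))
```

Usage would be `build_gemini_prompt(round_context_for_prompt(context))`. The input dict is not mutated, so the same context can still feed the unrounded tables.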


def fallback_commentary(context: dict[str, Any]) -> str:
    analysis_period = context.get("analysis_period", "N/A")
    analysis_radius_km = context.get("analysis_radius_km", None)
    total_events = context.get("total_events", None)
    total_lightning_per_km2 = context.get("total_lightning_per_km2", None)
    turbine_count = context.get("turbine_count", None)
    is_single_turbine_report = context.get("is_single_turbine_report", None)
    top_rings = context.get("top_rings", [])
    max_risk_definition = context.get("max_risk_definition", "N/A")
    top_turbine_name = context.get("top_turbine_name", "N/A")
    top_turbine_risk_log = context.get("top_turbine_risk_log", None)
    storm_over_turbine = context.get("storm_over_turbine")
    storm_closest_distance_km = context.get("storm_closest_distance_km")
    storm_over_threshold_km = context.get("storm_over_threshold_km", 1.0)
    storm_near_turbine_count = context.get("storm_near_turbine_count")

    # top_rings is used in the order supplied by the caller: the first entry is reported
    # as the largest contribution, the last as additional activity.
    outermost_ring = top_rings[-1] if top_rings else None
    best_ring = top_rings[0] if top_rings else None

    def _format_ring(ring: Any) -> str:
        try:
            ring_name, total, cg_count, ic_count = ring
            return f"{ring_name} (total={total}, cloud-to-ground={cg_count}, intercloud={ic_count})"
        except Exception:
            return "N/A"

    best_ring_txt = _format_ring(best_ring)
    outer_ring_txt = _format_ring(outermost_ring)

    storm_summary = context.get("storm_summary") or {}
    storm_line = ""
    if isinstance(storm_summary, dict) and storm_summary:
        total_cells = storm_summary.get("total_cells", 0)
        severity_counts = storm_summary.get("severity_counts", {}) or {}
        if severity_counts:
            # Mention the severity class with the most cells
            severity = max(severity_counts.items(), key=lambda kv: kv[1])[0]
            count = severity_counts.get(severity, 0)
            storm_line = f"Storm data indicates {total_cells} storm cells, with the highest share in {severity} ({count} cells)."
        else:
            storm_line = f"Storm data indicates {total_cells} storm cells."

    density_txt = (
        f"{total_lightning_per_km2:.3f} events/km²" if isinstance(total_lightning_per_km2, (int, float)) else str(total_lightning_per_km2)
    )

    # The leading space belongs to the optional phrase, so a missing radius leaves no stray space.
    radius_txt = f" within {analysis_radius_km:.1f} km" if isinstance(analysis_radius_km, (int, float)) else ""
    events_txt = f"{total_events} total lightning events" if total_events is not None else "N/A"

    # Only mention a second ring when it differs from the first (avoids repeating a single ring).
    ring_sentence = f"The largest contributions are concentrated in {best_ring_txt}"
    if outermost_ring is not None and outermost_ring is not best_ring:
        ring_sentence += f", with additional activity also present in {outer_ring_txt}"

    paragraph_intro = (
        f"For {analysis_period}, the dataset contains {events_txt}{radius_txt}, corresponding to an overall lightning density of {density_txt}. "
        f"{ring_sentence}. "
    )

    turbine_sentence = (
        f"For {top_turbine_name}, the log-transformed risk score is the highest in this report and falls in the {max_risk_definition} category. "
        if is_single_turbine_report
        else f"Within the analyzed area, the turbine with the highest log-transformed risk score is {top_turbine_name}, which falls in the {max_risk_definition} category. "
    )

    method_sentence = (
        "This indicates the turbine was exposed to a combination of closer cloud-to-ground strikes and stronger current magnitudes. "
        "In the risk model, each cloud-to-ground strike contributes more when it is near the turbine and when |I| is larger, and contributions decrease exponentially with distance; the turbine risk score is the sum over all included strikes (with a log transform used for visualization). "
    )

    if storm_over_turbine:
        storm_interaction_sentence = (
            f"Storm interaction: storm-cell centroids came within {storm_over_threshold_km:.1f} km of the turbine (count={storm_near_turbine_count}). "
        )
    elif isinstance(storm_closest_distance_km, (int, float)):
        storm_interaction_sentence = (
            f"Storm interaction: the closest storm-cell centroid came within {storm_closest_distance_km:.1f} km of the turbine. "
        )
    else:
        storm_interaction_sentence = ""

    storm_severity_sentence = storm_line if storm_line else "Storm severity distribution is not available for this report."
    paragraph = (paragraph_intro + turbine_sentence + method_sentence + storm_interaction_sentence + storm_severity_sentence).strip()
    return paragraph


def generate_gemini_paragraph(context: dict[str, Any], api_key: str | None = None) -> str:
    api_key_final = api_key or os.getenv("GEMINI_API_KEY")
    if not api_key_final:
        return fallback_commentary(context)

    model_name = os.getenv("GEMINI_MODEL", "gemini-1.5-flash")

    prompt = build_gemini_prompt(context)

    try:
        import google.generativeai as genai

        genai.configure(api_key=api_key_final)
        model = genai.GenerativeModel(model_name)

        # Keep output short and low-variance (a low temperature reduces, but does not
        # eliminate, run-to-run variation)
        resp = model.generate_content(
            prompt,
            generation_config={
                "temperature": 0.2,
                "max_output_tokens": 220,
            },
        )

        # resp.text can raise (e.g. when the response was blocked); that falls through
        # to the except below and yields the fallback text.
        text = getattr(resp, "text", None) or ""
        text = str(text).strip()
        if not text:
            return fallback_commentary(context)
        return text
    except Exception:
        return fallback_commentary(context)
86
src/reporting/precompute.py
Normal file
@ -0,0 +1,86 @@
import numpy as np
import pandas as pd
from typing import List, Tuple, Optional, Dict, Any
from src.analysis.geospatial import haversine_distance_vectorized
from src.utils import format_datetime_ddmmyyyy_hhmmss

def build_lightning_table_rows(
    centroid_lat: float,
    centroid_lng: float,
    lightning_df: pd.DataFrame,
    distance_rings_m: List[int],
    ring_colors: List[str],
) -> Tuple[List[List[str]], List[str]]:
    rows: List[List[str]] = []
    row_colors: List[str] = []

    if len(lightning_df) == 0:
        return rows, row_colors

    outermost_km = max(distance_rings_m) / 1000.0
    rings_km = np.array(distance_rings_m, dtype=float) / 1000.0

    dists_km = haversine_distance_vectorized(
        centroid_lat, centroid_lng,
        lightning_df['lat'].values, lightning_df['lng'].values,
    ) / 1000.0

    for i, rec in enumerate(lightning_df.itertuples(index=False)):
        proximity = dists_km[i]
        if proximity > outermost_km:
            continue

        ring_idx = int(np.searchsorted(rings_km, proximity, side='left'))
        if ring_idx >= len(rings_km):
            continue
        color = ring_colors[ring_idx]

        try:
            dt_val = rec.local_time if not isinstance(rec.local_time, str) else pd.to_datetime(str(rec.local_time)[:19])
            local_time = format_datetime_ddmmyyyy_hhmmss(pd.to_datetime(dt_val))
        except Exception:
            local_time = str(getattr(rec, 'local_time', ''))[:19]

        lightning_type = "cloud-to-ground" if str(rec.p_type) == '0' else "intercloud"
        rows.append([
            "",
            local_time,
            f"{rec.lat:.5f}",
            f"{rec.lng:.5f}",
            str(rec.current),
            str(getattr(rec, 'height', '')),
            lightning_type,
            f"{proximity:.2f}"
        ])
        row_colors.append(color)

    return rows, row_colors
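`haversine_distance_vectorized` is imported from `src.analysis.geospatial` and is not part of this diff. Both call sites divide its result by 1000, which suggests it returns metres. A minimal NumPy sketch under that assumption (spherical Earth with an assumed mean radius of 6,371,000 m; the real implementation may use a different radius or argument order). `haversine_m` is a hypothetical name.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius


def haversine_m(lat1: float, lng1: float, lats2: np.ndarray, lngs2: np.ndarray) -> np.ndarray:
    """Great-circle distance in metres from one point to an array of points (inputs in degrees)."""
    phi1 = np.radians(lat1)
    phi2 = np.radians(np.asarray(lats2, dtype=float))
    dphi = phi2 - phi1
    dlmb = np.radians(np.asarray(lngs2, dtype=float) - lng1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    # Clip guards against tiny floating-point excursions above 1.
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


d_km = haversine_m(40.0, 29.0, np.array([40.0, 41.0]), np.array([29.0, 29.0])) / 1000.0
print(d_km)  # same point -> 0 km; one degree of latitude -> about 111.2 km
```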


def precompute_group_distances_and_rings(
    centroid_lat: float,
    centroid_lng: float,
    lightning_df: pd.DataFrame,
    distance_rings_m: List[int],
) -> Dict[str, Any]:
    if len(lightning_df) == 0:
        return {
            'dists_km': np.array([], dtype=float),
            'ring_idx': np.array([], dtype=int),
            'mask_within': np.array([], dtype=bool),
        }
    rings_km = np.array(distance_rings_m, dtype=float) / 1000.0
    dists_km = haversine_distance_vectorized(
        centroid_lat, centroid_lng,
        lightning_df['lat'].values, lightning_df['lng'].values,
    ) / 1000.0
    mask_within = dists_km <= rings_km[-1]
    # searchsorted returns len(rings_km) for distances beyond the last ring (no clamping
    # is applied), so callers should combine ring_idx with mask_within.
    ring_idx = np.searchsorted(rings_km, dists_km, side='left')
    return {
        'dists_km': dists_km,
        'ring_idx': ring_idx,
        'mask_within': mask_within,
    }
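Ring assignment in both functions relies on `np.searchsorted(..., side='left')` over ascending ring radii: a distance exactly on a boundary belongs to the inner ring, and any distance beyond the last radius gets index `len(rings_km)`. That is why `mask_within` (or the explicit `continue` in the table builder) is needed. A small check with made-up ring radii:

```python
import numpy as np

rings_km = np.array([1.0, 3.0, 5.0])  # illustrative ring radii, must be ascending
dists_km = np.array([0.4, 1.0, 1.01, 5.0, 7.2])

ring_idx = np.searchsorted(rings_km, dists_km, side="left")
mask_within = dists_km <= rings_km[-1]

print(ring_idx)     # [0 0 1 2 3]: 1.0 stays in ring 0, 7.2 falls past the last ring
print(mask_within)  # [ True  True  True  True False]
```

With `side="right"` a strike at exactly 1.0 km would move to ring 1 instead, so the choice fixes which ring owns each boundary.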
480
src/utils.py
Normal file
@ -0,0 +1,480 @@
import json
import pandas as pd
import numpy as np
from datetime import datetime
from typing import Union, Dict, Any, Optional, List
from zoneinfo import ZoneInfo
import logging
import re

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

def format_date_ddmmyyyy(dt: datetime) -> str:
    return dt.strftime('%d-%m-%Y')

def format_datetime_ddmmyyyy_hhmmss(dt: datetime) -> str:
    return dt.strftime('%d-%m-%Y %H:%M:%S')

def format_datetime_ddmmyyyy_hhmm(dt: datetime) -> str:
    return dt.strftime('%d-%m-%Y %H:%M')

def get_utc_offset_label(timezone_name: Optional[str]) -> Optional[str]:
    if not timezone_name:
        return None
    try:
        tz = ZoneInfo(timezone_name)
        # Uses the offset in effect right now, so the label follows daylight saving time.
        dt = datetime.now(tz)
        offset = dt.utcoffset()
        if offset is None:
            return None
        # Work in signed whole minutes: floor-dividing negative seconds by 3600 would turn
        # UTC-03:30 into "UTC-4", and half-hour zones would lose their minutes.
        total_minutes = int(offset.total_seconds()) // 60
        sign = "+" if total_minutes >= 0 else "-"
        hours, minutes = divmod(abs(total_minutes), 60)
        if minutes == 0:
            return f"UTC{sign}{hours}"
        return f"UTC{sign}{hours}:{minutes:02d}"
    except Exception:
        return None
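For reference when reading `get_utc_offset_label`: computing hours with floor division (`total_seconds // 3600`) rounds negative half-hour offsets toward negative infinity (UTC-03:30 becomes -4) and drops the minutes of zones such as UTC+05:30. The hypothetical `format_offset` below works in signed whole minutes instead; using fixed `timedelta` offsets keeps it independent of the tz database.

```python
from datetime import timedelta


def format_offset(offset: timedelta) -> str:
    """Format a UTC offset as 'UTC+H' or 'UTC+H:MM', keeping the sign separate from the magnitude."""
    total_minutes = int(offset.total_seconds()) // 60
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"UTC{sign}{hours}" if minutes == 0 else f"UTC{sign}{hours}:{minutes:02d}"


naive = int(timedelta(hours=-3, minutes=-30).total_seconds()) // 3600
print(naive)                                            # -4, not -3
print(format_offset(timedelta(hours=-3, minutes=-30)))  # UTC-3:30
print(format_offset(timedelta(hours=5, minutes=30)))    # UTC+5:30
print(format_offset(timedelta(hours=3)))                # UTC+3
```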

def now_in_timezone(timezone_name: Optional[str]) -> datetime:
    if not timezone_name:
        return datetime.now()
    try:
        return datetime.now(ZoneInfo(timezone_name))
    except Exception:
        return datetime.now()

def format_datetime_to_local_display(value: Optional[str], timezone_name: Optional[str] = None) -> str:
    if not value or str(value).strip() == '' or str(value).strip().upper() == 'N/A':
        return 'N/A'
    s = str(value).strip()
    try:
        ts = pd.to_datetime(s, utc=True)
        if timezone_name:
            ts = ts.tz_convert(ZoneInfo(timezone_name)).tz_localize(None)
        else:
            ts = ts.to_pydatetime().replace(tzinfo=None)
        dt = ts.to_pydatetime() if hasattr(ts, 'to_pydatetime') else ts
        return dt.strftime('%d-%m-%Y %H:%M')
    except Exception:
        return s[:19] if len(s) >= 19 else s

def parse_period_string_to_datetime(value: Optional[str]) -> Optional[datetime]:
    if value is None:
        return None
    value_str = str(value).strip()
    if not value_str:
        return None
    try:
        if re.fullmatch(r"\d{2}-\d{2}-\d{4}", value_str):
            return datetime.strptime(value_str, '%d-%m-%Y')
        if re.match(r"\d{2}-\d{2}-\d{4}\s+\d", value_str):
            try:
                return datetime.strptime(value_str[:19], '%d-%m-%Y %H:%M:%S')
            except ValueError:
                try:
                    return datetime.strptime(value_str[:16], '%d-%m-%Y %H:%M')
                except ValueError:
                    pass
        ts = pd.to_datetime(value_str, errors='raise')
        if isinstance(ts, pd.Timestamp):
            if ts.tzinfo is not None:
                ts = ts.tz_convert('UTC').tz_localize(None)
            return ts.to_pydatetime()
    except Exception as e:
        logger.debug(f"parse_period_string_to_datetime failed for {value_str}: {e}")
        return None
    return None

def normalize_local_time_to_timezone(
    df: pd.DataFrame,
    column: str,
    timezone_name: Optional[str],
) -> pd.DataFrame:
    if len(df) == 0 or not timezone_name:
        return df
    tz = ZoneInfo(timezone_name)
    df = df.copy()
    # Values without an explicit offset are interpreted as UTC before conversion.
    df[column] = pd.to_datetime(df[column], utc=True, errors='coerce')
    df = df[~df[column].isna()].copy()
    if len(df) == 0:
        return df
    df[column] = df[column].dt.tz_convert(tz).dt.tz_localize(None)
    return df

def format_period_display_for_report(start_value: Optional[str], end_value: Optional[str]) -> tuple[str, str]:
    def _format_one(val: Optional[str]) -> str:
        if not val or not str(val).strip():
            return ""
        s = str(val).strip()
        try:
            if re.fullmatch(r"\d{2}-\d{2}-\d{4}", s):
                return s
            if "T" in s or "Z" in s:
                ts = pd.to_datetime(s, utc=True)
                # astimezone(None) converts to the host machine's local timezone,
                # not the wind farm's timezone.
                local_dt = ts.to_pydatetime().astimezone(None)
                return local_dt.strftime('%d-%m-%Y %H:%M')
            ts = pd.to_datetime(s, errors='raise')
            if isinstance(ts, pd.Timestamp):
                if ts.tzinfo is not None:
                    ts = ts.tz_localize(None)
                dt = ts.to_pydatetime()
                return dt.strftime('%d-%m-%Y %H:%M')
            return s
        except Exception:
            return s
    start_display = _format_one(start_value) if start_value else ""
    end_display = _format_one(end_value) if end_value else ""
    return start_display, end_display

def get_grouping_radius_m() -> int:
    from .config import config
    rings = config.distance_rings or []
    grouping = config.grouping_params or {}
    max_distance_m = grouping.get('max_distance_m')
    if isinstance(max_distance_m, (int, float)) and max_distance_m > 0:
        return int(max_distance_m)
    ring_index = grouping.get('distance_ring_index', 2)
    if isinstance(ring_index, int) and 0 <= ring_index < len(rings):
        return int(rings[ring_index])
    return int(rings[2] if len(rings) > 2 else (max(rings) if rings else 0))

def get_analysis_radius_m() -> int:
    from .config import config
    rings = config.distance_rings or []
    grouping = config.grouping_params or {}
    ring_index = grouping.get('distance_ring_index')
    if isinstance(ring_index, int) and 0 <= ring_index < len(rings):
        return int(rings[ring_index])
    return int(max(rings) if rings else 0)

def get_turbine_color_by_fixed_intervals(risk_log_value: float) -> str:
    """
    Get turbine color based on fixed risk score intervals.
    Uses consistent color coding across all groups and tables.

    Args:
        risk_log_value: Log-transformed risk score

    Returns:
        Color string for the turbine
    """
    # Fixed risk intervals (upper bounds exclusive). Bands below 1.2 use the palette
    # F94144, F3722C, F8961E, F9C74F, 90BE6D, 43AA8B, 577590; the two highest bands
    # extend it with darker reds (D32F2F, B71C1C).
    if risk_log_value < 0.1:
        return '#577590'
    elif risk_log_value < 0.2:
        return '#43AA8B'
    elif risk_log_value < 0.4:
        return '#90BE6D'
    elif risk_log_value < 0.6:
        return '#F9C74F'
    elif risk_log_value < 0.8:
        return '#F8961E'
    elif risk_log_value < 1.0:
        return '#F3722C'
    elif risk_log_value < 1.2:
        return '#F94144'
    elif risk_log_value < 1.4:
        return '#D32F2F'
    else:
        return '#B71C1C'

def get_risk_definition_by_fixed_intervals(risk_log_value: float) -> str:
    if risk_log_value < 0.1:
        return 'Very Low Risk'
    elif risk_log_value < 0.2:
        return 'Low Risk'
    elif risk_log_value < 0.4:
        return 'Med-Low Risk'
    elif risk_log_value < 0.6:
        return 'Medium Risk'
    elif risk_log_value < 0.8:
        return 'Med-High Risk'
    elif risk_log_value < 1.0:
        return 'High Risk'
    elif risk_log_value < 1.2:
        return 'Very High Risk'
    elif risk_log_value < 1.4:
        return 'Critical Risk'
    else:
        return 'Maximum Risk'
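The two fixed-interval functions above encode the same thresholds twice as `if/elif` chains. A table-driven equivalent using the standard-library `bisect` module is sketched below. It reproduces the half-open intervals (a value equal to a threshold falls in the higher band). Keeping labels and colours in one table is a design suggestion, not what the module currently does; `risk_band` is a hypothetical name.

```python
from bisect import bisect_right

# Exclusive upper bounds of each band, copied from the thresholds above.
THRESHOLDS = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
BANDS = [  # (label, colour); one more entry than THRESHOLDS for the open-ended top band
    ("Very Low Risk", "#577590"),
    ("Low Risk", "#43AA8B"),
    ("Med-Low Risk", "#90BE6D"),
    ("Medium Risk", "#F9C74F"),
    ("Med-High Risk", "#F8961E"),
    ("High Risk", "#F3722C"),
    ("Very High Risk", "#F94144"),
    ("Critical Risk", "#D32F2F"),
    ("Maximum Risk", "#B71C1C"),
]


def risk_band(risk_log_value: float) -> tuple[str, str]:
    # bisect_right counts thresholds <= value, which is exactly the band index.
    return BANDS[bisect_right(THRESHOLDS, risk_log_value)]


print(risk_band(0.05))  # ('Very Low Risk', '#577590')
print(risk_band(0.1))   # ('Low Risk', '#43AA8B')
print(risk_band(1.4))   # ('Maximum Risk', '#B71C1C')
```

A single table means a threshold change cannot leave the label and the colour out of sync.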

def get_turbine_colors_by_fixed_intervals(risk_log_values: List[float]) -> List[str]:
    """
    Get turbine colors for a list of risk scores based on fixed intervals.

    Args:
        risk_log_values: List of log-transformed risk scores

    Returns:
        List of color strings for the turbines
    """
    return [get_turbine_color_by_fixed_intervals(risk_log) for risk_log in risk_log_values]

def safe_datetime_conversion(time_str: Union[str, datetime, None]) -> Optional[datetime]:
    """
    Safely convert string to datetime with error handling.

    Args:
        time_str: String representation of datetime; datetime (and pandas
            Timestamp) inputs are returned unchanged

    Returns:
        datetime object or None if conversion fails
    """
    if time_str is None or pd.isna(time_str):
        return None
    # Check for datetimes before slicing: text[:19] below only works on strings.
    if isinstance(time_str, datetime):
        return time_str
    text = str(time_str).strip()
    if not text:
        return None

    # Try explicit formats on the first 19 characters, so fractional seconds and
    # UTC offsets are ignored; '%Y-%m-%d' matches date-only input.
    formats = [
        '%Y-%m-%d %H:%M:%S',
        '%Y-%m-%dT%H:%M:%S',
        '%Y-%m-%d'
    ]

    for fmt in formats:
        try:
            return datetime.strptime(text[:19], fmt)
        except ValueError:
            continue

    # Try pandas parsing as fallback
    parsed = None
    try:
        parsed = pd.to_datetime(text, errors='coerce')
    except Exception:
        parsed = None
    if isinstance(parsed, pd.Timestamp) and not pd.isna(parsed):
        return parsed.to_pydatetime()

    logger.error(f"Failed to convert datetime: {time_str}")
    return None

def load_json_data(file_path: str) -> Dict[str, Any]:
    """
    Generic JSON loader with error handling.

    Args:
        file_path: Path to JSON file

    Returns:
        Dictionary containing JSON data

    Raises:
        FileNotFoundError: If file doesn't exist
        json.JSONDecodeError: If JSON is invalid
    """
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        logger.info(f"Successfully loaded JSON data from {file_path}")
        return data
    except FileNotFoundError:
        logger.error(f"File not found: {file_path}")
        raise
    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON in {file_path}: {e}")
        raise
    except Exception as e:
        logger.error(f"Unexpected error loading {file_path}: {e}")
        raise

def filter_lightning_data_by_date_range(lightning_df: pd.DataFrame, start_date: Optional[str] = None, end_date: Optional[str] = None) -> pd.DataFrame:
    """
    Filter lightning data by date range.

    Args:
        lightning_df: DataFrame containing lightning data with 'local_time' column
        start_date: Start as 'DD-MM-YYYY' or an ISO datetime string, or None for no lower bound
        end_date: End as 'DD-MM-YYYY' (extended to 23:59:59 of that day) or an ISO
            datetime string, or None for no upper bound

    Returns:
        Filtered DataFrame containing only lightning data within the specified date range;
        the input is returned unfiltered if a supplied bound cannot be parsed
    """
    if start_date is None and end_date is None:
        return lightning_df

    def _parse_flexible_datetime(value: Optional[str], is_end: bool = False) -> Optional[datetime]:
        if value is None:
            return None

        value_str = str(value).strip()
        if not value_str:
            return None

        try:
            if re.fullmatch(r"\d{2}-\d{2}-\d{4}", value_str):
                dt = datetime.strptime(value_str, '%d-%m-%Y')
                if is_end:
                    dt = dt.replace(hour=23, minute=59, second=59)
                return dt

            ts = pd.to_datetime(value_str, errors='raise')
            if isinstance(ts, pd.Timestamp):
                if ts.tzinfo is not None:
                    # Offset-aware bounds are compared as naive UTC wall time.
                    ts = ts.tz_convert('UTC').tz_localize(None)
                return ts.to_pydatetime()
        except Exception as e:
            logger.error(f"Invalid datetime value: {value_str}. Error: {e}")
            return None

        return None

    df = lightning_df.copy()
    if df['local_time'].dtype == 'object':
        df['local_time'] = pd.to_datetime(df['local_time'])

    if df['local_time'].dt.tz is not None:
        # Drops the timezone but keeps each value's local wall-clock time.
        df['local_time'] = df['local_time'].dt.tz_localize(None)

    start_dt = _parse_flexible_datetime(start_date, is_end=False)
    end_dt = _parse_flexible_datetime(end_date, is_end=True)

    if start_date and start_dt is None:
        logger.error(f"Invalid start_date value: {start_date}. Expected 'DD-MM-YYYY' or ISO datetime string.")
        return lightning_df

    if end_date and end_dt is None:
        logger.error(f"Invalid end_date value: {end_date}. Expected 'DD-MM-YYYY' or ISO datetime string.")
        return lightning_df

    # Apply date filtering
    if start_dt and end_dt:
        mask = (df['local_time'] >= start_dt) & (df['local_time'] <= end_dt)
        filtered_df = df[mask]
        logger.info(f"Filtered lightning data from {len(lightning_df)} to {len(filtered_df)} records ({start_date} to {end_date})")
        return filtered_df

    if start_dt:
        mask = df['local_time'] >= start_dt
        filtered_df = df[mask]
        logger.info(f"Filtered lightning data from {len(lightning_df)} to {len(filtered_df)} records (from {start_date})")
        return filtered_df

    if end_dt:
        mask = df['local_time'] <= end_dt
        filtered_df = df[mask]
        logger.info(f"Filtered lightning data from {len(lightning_df)} to {len(filtered_df)} records (until {end_date})")
        return filtered_df

    return df
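When `end_date` is a plain `DD-MM-YYYY` date, the filter widens it to 23:59:59 and compares with `<=`. If timestamps carry fractional seconds, an event at 23:59:59.5 on the last day falls outside that bound. The hypothetical `filter_by_day_range` below shows a half-open alternative (`< next midnight`) on the date-only path; it ignores the ISO and timezone handling of the real function and uses made-up timestamps.

```python
from datetime import datetime, timedelta

import pandas as pd


def filter_by_day_range(df: pd.DataFrame, start_ddmmyyyy: str, end_ddmmyyyy: str) -> pd.DataFrame:
    start = datetime.strptime(start_ddmmyyyy, "%d-%m-%Y")
    # Half-open upper bound: keep everything before midnight after the end day.
    end_exclusive = datetime.strptime(end_ddmmyyyy, "%d-%m-%Y") + timedelta(days=1)
    times = pd.to_datetime(df["local_time"])
    return df[(times >= start) & (times < end_exclusive)]


events = pd.DataFrame({"local_time": pd.to_datetime([
    datetime(2025, 4, 1, 0, 0, 0),
    datetime(2025, 4, 30, 23, 59, 59, 500000),
    datetime(2025, 5, 1, 0, 0, 0),
])})

print(filter_by_day_range(events, "01-04-2025", "30-04-2025"))
# The closed 23:59:59 bound used above would drop the sub-second event:
print((events["local_time"] <= datetime(2025, 4, 30, 23, 59, 59)).tolist())
```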

def validate_lightning_data(df: pd.DataFrame) -> bool:
    """
    Validate lightning data structure and content.

    Args:
        df: Lightning DataFrame

    Returns:
        True if valid, False otherwise
    """
    required_columns = ['lat', 'lng', 'current', 'p_type', 'local_time']

    # Handle empty dataset gracefully
    if len(df) == 0:
        logger.warning("Lightning dataset is empty - this is acceptable for analysis")
        return True

    # Check required columns
    missing_columns = [col for col in required_columns if col not in df.columns]
    if missing_columns:
        logger.error(f"Missing required columns: {missing_columns}")
        return False

    # Check data types
    if not pd.api.types.is_numeric_dtype(df['lat']):
        logger.error("Latitude column must be numeric")
        return False

    if not pd.api.types.is_numeric_dtype(df['lng']):
        logger.error("Longitude column must be numeric")
        return False

    if not pd.api.types.is_numeric_dtype(df['current']):
        logger.error("Current column must be numeric")
        return False

    # Check coordinate ranges
    if not (df['lat'].between(-90, 90).all()):
        logger.error("Latitude values must be between -90 and 90")
        return False

    if not (df['lng'].between(-180, 180).all()):
        logger.error("Longitude values must be between -180 and 180")
        return False

    # Check p_type values (0 = cloud-to-ground, 1 = intercloud). Values are compared as
    # strings, so 0/1 stored as int or str pass; invalid values only produce a warning.
    invalid_p_types = df[~df['p_type'].astype(str).isin(['0', '1'])]
    if len(invalid_p_types) > 0:
        logger.warning(f"Found {len(invalid_p_types)} invalid p_type values")

    logger.info(f"Lightning data validation passed: {len(df)} records")
    return True
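Both `validate_lightning_data` and `build_lightning_table_rows` compare `p_type` as a string against `'0'`/`'1'`. If the column arrives as floats (for example after a merge that introduced NaN), `astype(str)` yields `'0.0'` and `'1.0'`: this check flags them as invalid and the table builder labels them intercloud. A normalisation sketch, assuming such float input can occur; `normalize_p_type` is a hypothetical helper, not part of the module.

```python
import pandas as pd


def normalize_p_type(series: pd.Series) -> pd.Series:
    """Map 0/1 stored as int, float or string to the strings '0'/'1'; anything else becomes NA."""
    numeric = pd.to_numeric(series, errors="coerce")
    valid = numeric.isin([0, 1])
    # Nullable Int64 drops the '.0' before converting to pandas' string dtype.
    return numeric.where(valid).astype("Int64").astype("string")


raw = pd.Series([0.0, 1.0, "0", 1, 2, None])
print(raw.astype(str).iloc[:2].tolist())  # the float entries become '0.0' and '1.0'
print(normalize_p_type(raw).tolist())
```

After normalisation, `== "0"` identifies cloud-to-ground strikes regardless of how the column was stored.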

def validate_turbine_data(df: pd.DataFrame) -> bool:
    """
    Validate turbine data structure and content.

    Args:
        df: Turbine DataFrame

    Returns:
        True if valid, False otherwise
    """
    required_columns = ['lat', 'lng', 'name']

    # Check required columns
    missing_columns = [col for col in required_columns if col not in df.columns]
    if missing_columns:
        logger.error(f"Missing required columns: {missing_columns}")
        return False

    # Check data types
    if not pd.api.types.is_numeric_dtype(df['lat']):
        logger.error("Latitude column must be numeric")
        return False

    if not pd.api.types.is_numeric_dtype(df['lng']):
        logger.error("Longitude column must be numeric")
        return False

    # Check coordinate ranges
    if not (df['lat'].between(-90, 90).all()):
        logger.error("Latitude values must be between -90 and 90")
        return False

    if not (df['lng'].between(-180, 180).all()):
        logger.error("Longitude values must be between -180 and 180")
        return False

    logger.info(f"Turbine data validation passed: {len(df)} records")
    return True

def ensure_datetime_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """
    Ensure a column contains datetime objects.

    Args:
        df: DataFrame
        column: Column name to convert

    Returns:
        DataFrame with converted datetime column
    """
    # Handle empty DataFrame
    if len(df) == 0:
        return df

    if df[column].dtype == 'object':
        df = df.copy()
        df[column] = pd.to_datetime(df[column], errors='coerce')
        logger.info(f"Converted {column} to datetime")
    return df
1111
src/visualization/maps.py
Normal file
File diff suppressed because it is too large
665
src/visualization/storm_cells.py
Normal file
@ -0,0 +1,665 @@
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
import pandas as pd
from datetime import datetime
import json
from typing import List, Dict, Tuple, Any
import re
from collections import defaultdict
from zoneinfo import ZoneInfo

from src.analysis.geospatial import haversine_distance
from src.config import config
from src.utils import parse_period_string_to_datetime


def format_datetime_for_display(datetime_str: str) -> str:
    """
    Format a datetime string from 'YYYY-MM-DD HH:MM:SS' to 'DD-MM-YYYY HH:MM:SS'.

    Args:
        datetime_str: Datetime string in 'YYYY-MM-DD HH:MM:SS' format

    Returns:
        The string in 'DD-MM-YYYY HH:MM:SS' format, or the input unchanged if it
        is empty, 'N/A', or not in the expected format
    """
    try:
        if datetime_str and datetime_str != 'N/A':
            dt = datetime.strptime(datetime_str, '%Y-%m-%d %H:%M:%S')
            return dt.strftime('%d-%m-%Y %H:%M:%S')
        return datetime_str
    except (ValueError, TypeError):
        return datetime_str


def parse_wkt_linestring(wkt_string: str) -> List[Tuple[float, float]]:
    """
    Parse WKT LINESTRING format to extract coordinates.

    Args:
        wkt_string: WKT string in format "LINESTRING(lon1 lat1, lon2 lat2, ...)"

    Returns:
        List of (longitude, latitude) tuples; empty list if the string is empty or malformed
    """
    if not wkt_string:
        return []

    try:
        # Drop the geometry keyword and surrounding parentheses, e.g.
        # "LINESTRING(lon1 lat1, lon2 lat2)" -> "lon1 lat1, lon2 lat2"
        coords_str = re.sub(r'^\s*[A-Za-z]*\s*\(+|\)+\s*$', '', wkt_string)

        coordinates = []
        for pair in coords_str.split(","):
            lon, lat = pair.split()
            coordinates.append((float(lon), float(lat)))

        return coordinates
    except (ValueError, TypeError) as e:
        print(f"Error parsing WKT: {e}")
        return []


def group_storm_data_by_day(storm_data: List[Dict]) -> Dict[str, List[Dict]]:
    """
    Group storm data by day.

    Args:
        storm_data: List of storm cell dictionaries

    Returns:
        Dictionary keyed by date ('DD-MM-YYYY') with the list of storms for that day
    """
    daily_storms = defaultdict(list)

    for storm in storm_data:
        # Use the first available timestamp field
        time_field = storm.get('effective_time') or storm.get('creation_time') or storm.get('expire_time', '')
        if not time_field:
            continue
        try:
            # Both '2025-08-29T15:08:45.002Z' and '2024-06-22 11:13:00' start with 'YYYY-MM-DD'
            storm_date = datetime.strptime(time_field[:10], '%Y-%m-%d').strftime('%d-%m-%Y')
        except (ValueError, TypeError):
            continue
        daily_storms[storm_date].append(storm)

    return dict(daily_storms)


def group_storm_data_by_month(storm_data: List[Dict]) -> Dict[str, List[Dict]]:
    """
    Group storm data by month.

    Args:
        storm_data: List of storm cell dictionaries

    Returns:
        Dictionary keyed by month ('YYYY-MM') with the list of storms for that month
    """
    monthly_storms = defaultdict(list)

    for storm in storm_data:
        # Use the first available timestamp field
        time_field = storm.get('effective_time') or storm.get('creation_time') or storm.get('expire_time', '')
        if not time_field:
            continue
        try:
            # Both '2025-08-29T15:08:45.002Z' and '2024-06-22 11:13:00' start with 'YYYY-MM-DD'
            storm_month = datetime.strptime(time_field[:10], '%Y-%m-%d').strftime('%Y-%m')
        except (ValueError, TypeError):
            continue
        monthly_storms[storm_month].append(storm)

    return dict(monthly_storms)


def create_storm_cells_coordinate_plane(storm_data: List[Dict], turbine_df: pd.DataFrame = None, center_lat: float = None, center_lon: float = None) -> go.Figure:
    """
    Create a coordinate plane visualization showing storm cells with turbines and distance rings.

    Args:
        storm_data: List of storm cell dictionaries with WKT data
        turbine_df: DataFrame containing turbine locations with 'lat' and 'lng' columns
            (optional 'name' and 'risk_log' columns are used for labels and colors)
        center_lat: Center latitude for map (optional)
        center_lon: Center longitude for map (optional)

    Returns:
        Plotly figure with storm cells, turbines, and distance rings in coordinate plane view
    """
    # Calculate center if not provided
    if center_lat is None or center_lon is None:
        if turbine_df is not None and not turbine_df.empty:
            # Use turbine centroid as center
            center_lat = turbine_df['lat'].mean()
            center_lon = turbine_df['lng'].mean()
        else:
            # Use the mean of all storm cell vertices as center
            all_lats = []
            all_lons = []
            for storm in storm_data:
                coords = parse_wkt_linestring(storm.get('cell_polygon_wkt', ''))
                if coords:
                    lons, lats = zip(*coords)
                    all_lats.extend(lats)
                    all_lons.extend(lons)

            if all_lats and all_lons:
                center_lat = np.mean(all_lats)
                center_lon = np.mean(all_lons)
            else:
                center_lat, center_lon = 36.8, 33.8  # Default location in southern Turkey

    fig = go.Figure()

    # Add distance rings and turbines if turbine data is provided
    if turbine_df is not None and not turbine_df.empty:
        from src.analysis.geospatial import create_circle_points

        # Distance rings centered on (center_lat, center_lon)
        for radius, color in zip(config.distance_rings, config.ring_colors):
            circle_lats, circle_lons = create_circle_points(center_lat, center_lon, radius)
            fig.add_trace(go.Scatter(
                x=circle_lons,  # X-axis = Longitude
                y=circle_lats,  # Y-axis = Latitude
                mode='lines',
                line=dict(color=color, width=2),
                opacity=0.6,
                name=f'{radius/1000:.0f}km Ring',
                showlegend=True
            ))

        # Color turbines by risk using fixed intervals, if risk values are available
        if 'risk_log' in turbine_df.columns:
            from src.utils import get_turbine_colors_by_fixed_intervals
            turbine_colors = get_turbine_colors_by_fixed_intervals(turbine_df['risk_log'].tolist())
        else:
            turbine_colors = ['red'] * len(turbine_df)

        # Turbine labels are optional; validation only requires 'lat' and 'lng'
        turbine_names = turbine_df['name'].tolist() if 'name' in turbine_df.columns else None

        fig.add_trace(go.Scatter(
            x=turbine_df['lng'],  # X-axis = Longitude
            y=turbine_df['lat'],  # Y-axis = Latitude
            mode='markers+text',
            marker=dict(
                size=30,
                color=turbine_colors,
                symbol='triangle-down',
                opacity=0.8,
                line=dict(color='black', width=1)
            ),
            text=turbine_names,
            textfont=dict(size=12, color='black'),
            textposition='middle center',
            name='Wind Turbines',
            showlegend=True,
            hovertemplate=(
                "<b>Wind Turbine</b><br>"
                "Name: %{text}<br>"
                "Lat: %{y:.5f}<br>"
                "Lng: %{x:.5f}<br>"
                "<extra></extra>"
            )
        ))

    # Storm cells are colored by lightning severity; exact duplicate polygons are
    # shifted slightly so overlapping cells remain visible
    severity_colors = {}
    seen_polygon_key_to_count: Dict[Tuple, int] = {}
    offset_deg_lon = 0.003
    offset_deg_lat = 0.002

    for i, storm in enumerate(storm_data):
        wkt_string = storm.get('cell_polygon_wkt', '')
        if not wkt_string:
            continue

        coords = parse_wkt_linestring(wkt_string)
        if not coords:
            continue

        # Offset each repeated copy of an identical polygon (compared at 1e-6 degree precision)
        polygon_key = tuple((round(lon, 6), round(lat, 6)) for lon, lat in coords)
        duplicate_index = seen_polygon_key_to_count.get(polygon_key, 0)
        seen_polygon_key_to_count[polygon_key] = duplicate_index + 1

        if duplicate_index > 0:
            off_lon = offset_deg_lon * duplicate_index
            off_lat = offset_deg_lat * duplicate_index
            coords = [(lon + off_lon, lat + off_lat) for lon, lat in coords]

        lons, lats = zip(*coords)

        # `or 'Unknown'` also covers an explicit None severity value
        severity = (storm.get('lightning_severity') or 'Unknown').lower()
        if severity == 'high':
            color = 'purple'
        elif severity == 'medium':
            color = 'orange'
        elif severity == 'low':
            color = 'green'
        else:
            color = 'gray'

        # Track severity colors for legend
        severity_colors[severity] = color

        # Add storm cell boundary
        fig.add_trace(go.Scatter(
            x=lons,  # X-axis = Longitude
            y=lats,  # Y-axis = Latitude
            mode='lines',
            line=dict(color=color, width=3),
            opacity=1,
            showlegend=False,  # Individual storms are not listed in the legend
            hovertemplate=(
                f"<b>Storm Cell {i+1}</b><br>"
                f"Severity: {storm.get('lightning_severity', 'Unknown')}<br>"
                f"Effective: {format_datetime_for_display(storm.get('effective_time', 'N/A'))}<br>"
                f"Expire: {format_datetime_for_display(storm.get('expire_time', 'N/A'))}<br>"
                f"Direction: {storm.get('direction', 'N/A')}°<br>"
                f"Speed: {storm.get('speed', 'N/A')} km/h<br>"
                "<extra></extra>"
            )
        ))

    # One invisible trace per severity so the legend shows each color once
    for severity, color in severity_colors.items():
        fig.add_trace(go.Scatter(
            x=[None],
            y=[None],
            mode='lines',
            line=dict(color=color, width=3),
            name=f"{severity.title()} Severity",
            showlegend=True,
            hoverinfo='skip'  # No hover for legend entries
        ))

    # Axis limits: 1.5x the largest ring radius around the center, converting
    # meters to degrees at ~111 km per degree (no cos(latitude) correction for longitude)
    max_radius_deg = max(config.distance_rings) / 111000

    lat_min = center_lat - max_radius_deg * 1.5
    lat_max = center_lat + max_radius_deg * 1.5
    lon_min = center_lon - max_radius_deg * 1.5
    lon_max = center_lon + max_radius_deg * 1.5

    fig.update_layout(
        font=dict(size=18),
        title=dict(text='Storm Cells - Coordinate Plane View', font=dict(size=28)),
        xaxis_title='Longitude',
        yaxis_title='Latitude',
        xaxis=dict(
            range=[lon_min, lon_max],
            showgrid=True,
            gridwidth=1,
            gridcolor='lightgray',
            zeroline=False,
            tickfont=dict(size=22),
            title_font=dict(size=28),
        ),
        yaxis=dict(
            range=[lat_min, lat_max],
            showgrid=True,
            gridwidth=1,
            gridcolor='lightgray',
            zeroline=False,
            tickfont=dict(size=22),
            title_font=dict(size=28),
        ),
        plot_bgcolor='white',
        paper_bgcolor='white',
        showlegend=True,
        legend=dict(
            title=dict(text='Legend', font=dict(size=24)),
            font=dict(size=20),
            orientation='h',
            x=0.5,
            xanchor='center',
            y=-0.18,
            yanchor='top',
            bgcolor='rgba(255, 255, 255, 0.8)',
            bordercolor='black',
            borderwidth=1,
        ),
        width=800,
        height=900,
        margin=dict(l=70, r=40, t=50, b=130),
    )

    return fig


def create_storm_cells_map(storm_data: List[Dict], turbine_df: pd.DataFrame = None, center_lat: float = None, center_lon: float = None) -> go.Figure:
    """
    Create a map of storm cells from the storm (fırtına) data, with turbines and distance rings.
    Rendered as a coordinate plane view instead of a Mapbox map.

    Args:
        storm_data: List of storm cell dictionaries with WKT data
        turbine_df: DataFrame containing turbine locations with 'lat' and 'lng' columns
        center_lat: Center latitude for map (optional)
        center_lon: Center longitude for map (optional)

    Returns:
        Plotly figure with storm cells, turbines, and distance rings in coordinate plane view
    """
    return create_storm_cells_coordinate_plane(storm_data, turbine_df, center_lat, center_lon)


def create_daily_storm_maps(storm_data: List[Dict], max_maps_per_page: int = 2) -> List[go.Figure]:
    """
    Create storm maps for the days that have storms, grouped into pages.

    Args:
        storm_data: List of storm cell dictionaries
        max_maps_per_page: Maximum number of days per page

    Returns:
        List of Plotly figures, one per page. Each figure is the map of the first
        day on that page; combining several days into one page is left to PDF generation.
    """
    daily_storms = group_storm_data_by_day(storm_data)

    if not daily_storms:
        return []

    # Sort days chronologically; 'DD-MM-YYYY' keys do not sort correctly as plain text
    sorted_days = sorted(daily_storms.keys(), key=lambda d: datetime.strptime(d, '%d-%m-%Y'))

    figures = []
    current_fig = None
    maps_in_current_fig = 0

    for day in sorted_days:
        # Create map for this day
        day_fig = create_storm_cells_map(daily_storms[day])
        day_fig.update_layout(
            title=f"Storm Cells - {day}",
            title_x=0.5,
            title_font_size=16
        )

        # The first day on a page becomes the page figure; further days on the
        # same page are not merged here (layout is handled in PDF generation)
        if maps_in_current_fig == 0:
            current_fig = day_fig

        maps_in_current_fig += 1

        # Page is full: store it and start a new one
        if maps_in_current_fig >= max_maps_per_page:
            figures.append(current_fig)
            current_fig = None
            maps_in_current_fig = 0

    # Add the last, partially filled page
    if current_fig is not None:
        figures.append(current_fig)

    return figures


def create_monthly_storm_maps(storm_data: List[Dict]) -> Dict[str, go.Figure]:
    """
    Create a separate map for each month with storms.

    Args:
        storm_data: List of storm cell dictionaries

    Returns:
        Dictionary with month ('YYYY-MM') as key and Plotly figure as value
    """
    monthly_storms = group_storm_data_by_month(storm_data)

    monthly_figures = {}

    # 'YYYY-MM' keys sort chronologically as text
    for month, month_storms in sorted(monthly_storms.items()):
        if month_storms:
            fig = create_storm_cells_map(month_storms)
            fig.update_layout(
                title=f"Storm Cells - {month}",
                title_x=0.5,
                title_font_size=16
            )
            monthly_figures[month] = fig

    return monthly_figures


def load_storm_data_from_json(json_file_path: str) -> List[Dict]:
    """
    Load storm data from a JSON file.

    Recognized layouts:
        - {"data": {storm_id: [records] or record, ...}}  (flattened into one list)
        - {"data": [records]}
        - [records]

    Args:
        json_file_path: Path to the JSON file

    Returns:
        List of storm cell dictionaries (empty list if the file cannot be read or parsed)
    """
    try:
        with open(json_file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        if isinstance(data, dict):
            # {"success": ..., "data": {storm_id: records}}: flatten into one list
            if "data" in data and isinstance(data["data"], dict):
                storm_list = []
                for storm_id, storm_records in data["data"].items():
                    if isinstance(storm_records, list):
                        storm_list.extend(storm_records)
                    else:
                        storm_list.append(storm_records)
                return storm_list

            # {"data": [records]}
            elif "data" in data and isinstance(data["data"], list):
                return data["data"]

            # Any other dict layout is returned unchanged
            else:
                return data

        # Top-level list of records
        elif isinstance(data, list):
            return data

        return []

    except Exception as e:
        print(f"Error loading storm data: {e}")
        return []


def filter_storm_data_by_date_range(storm_data: List[Dict], start_date: str, end_date: str) -> List[Dict]:
    """
    Filter storm data by date range (inclusive).

    start_date/end_date can be 'DD-MM-YYYY', 'DD-MM-YYYY HH:MM', or ISO (e.g. 2026-01-22T07:00:00Z).
    A date-only end_date is extended to 23:59:59 so the whole day is included.
    Storm timestamps are converted to the farm's local timezone (config.timezone) for comparison.
    If either bound cannot be parsed, the data is returned unfiltered.
    """
    try:
        start_dt = parse_period_string_to_datetime(start_date)
        end_dt = parse_period_string_to_datetime(end_date)
        if start_dt is None or end_dt is None:
            return storm_data

        # Date-only end bound: include the whole day
        if re.fullmatch(r"\d{2}-\d{2}-\d{4}", str(end_date).strip()):
            end_dt = end_dt.replace(hour=23, minute=59, second=59)

        tz_name = getattr(config, 'timezone', None)
        tz = ZoneInfo(tz_name) if tz_name else None

        filtered_data = []
        for storm in storm_data:
            time_field = storm.get('effective_time') or storm.get('creation_time') or storm.get('expire_time', '')
            if time_field:
                try:
                    # Parse as UTC (naive timestamps are assumed UTC), convert to local
                    # time, then drop tzinfo so it compares with start_dt/end_dt
                    storm_ts = pd.to_datetime(time_field, utc=True)
                    if tz is not None:
                        storm_ts = storm_ts.tz_convert(tz).tz_localize(None)
                        storm_dt = storm_ts.to_pydatetime()
                    else:
                        storm_dt = storm_ts.to_pydatetime().replace(tzinfo=None)
                    if start_dt <= storm_dt <= end_dt:
                        filtered_data.append(storm)
                except Exception:
                    continue
        return filtered_data

    except Exception as e:
        print(f"Error filtering storm data: {e}")
        return storm_data


def filter_storm_data_by_turbine_proximity(storm_data: List[Dict], turbine_df: pd.DataFrame, max_distance_km: float = None) -> List[Dict]:
    """
    Filter storm data to only include storms within the specified distance of any turbine.

    A storm is kept if at least one vertex of its boundary lies within
    max_distance_km of at least one turbine.

    Args:
        storm_data: List of storm cell dictionaries
        turbine_df: DataFrame containing turbine locations with 'lat' and 'lng' columns
        max_distance_km: Maximum distance in kilometers. If None, uses the farthest distance ring from config.

    Returns:
        Filtered list of storm cell dictionaries
    """
    if max_distance_km is None:
        # Use the farthest distance ring from config (meters -> km)
        max_distance_km = max(config.distance_rings) / 1000

    print(f"🌩️ Filtering storm cells within {max_distance_km} km of turbine locations...")

    # Extract turbine coordinates once instead of iterating the DataFrame for every vertex
    turbine_coords = list(zip(turbine_df['lat'], turbine_df['lng']))

    filtered_storms = []

    for storm in storm_data:
        wkt_string = storm.get('cell_polygon_wkt', '')
        if not wkt_string:
            continue

        coords = parse_wkt_linestring(wkt_string)
        if not coords:
            continue

        # Distance in km (haversine_distance result / 1000); any() stops at the
        # first vertex/turbine pair within range
        storm_within_range = any(
            haversine_distance(turbine_lat, turbine_lon, storm_lat, storm_lon) / 1000 <= max_distance_km
            for storm_lon, storm_lat in coords
            for turbine_lat, turbine_lon in turbine_coords
        )

        if storm_within_range:
            filtered_storms.append(storm)

    print(f"🌩️ Filtered from {len(storm_data)} to {len(filtered_storms)} storm cells within {max_distance_km} km")
    return filtered_storms


def calculate_storm_cell_centroid(wkt_string: str) -> Tuple[float, float]:
    """
    Calculate the centroid of a storm cell from WKT coordinates.

    This is the plain average of the boundary vertices, not an area-weighted
    polygon centroid; a closing vertex that repeats the first point is counted twice.

    Args:
        wkt_string: WKT string representing the storm cell boundary

    Returns:
        Tuple of (latitude, longitude) for the centroid, or None if the WKT cannot be parsed
    """
    coords = parse_wkt_linestring(wkt_string)
    if not coords:
        return None

    # Simple average of all vertices
    lons, lats = zip(*coords)
    centroid_lat = np.mean(lats)
    centroid_lon = np.mean(lons)

    return centroid_lat, centroid_lon


def create_storm_cells_summary(storm_data: List[Dict]) -> Dict[str, Any]:
    """
    Create a summary of storm cells data.

    Args:
        storm_data: List of storm cell dictionaries

    Returns:
        Dictionary with the total count, counts per severity, average direction
        (degrees) and speed, date range ('DD-MM-YYYY'), and number of cells per day
    """
    if not storm_data:
        return {}

    # Count by severity
    severity_counts = {}
    total_cells = len(storm_data)
    for storm in storm_data:
        severity = storm.get('lightning_severity', 'Unknown')
        severity_counts[severity] = severity_counts.get(severity, 0) + 1

    # Average direction and speed
    directions = [storm['direction'] for storm in storm_data if storm.get('direction') is not None]
    speeds = [storm['speed'] for storm in storm_data if storm.get('speed') is not None]

    # Directions are compass bearings, so use a circular mean: a plain average
    # of 350° and 10° would give 180° instead of 0°
    if directions:
        radians = np.deg2rad(directions)
        avg_direction = float(np.rad2deg(np.arctan2(np.mean(np.sin(radians)), np.mean(np.cos(radians)))) % 360)
    else:
        avg_direction = 0
    avg_speed = np.mean(speeds) if speeds else 0

    # Date range from the first available timestamp of each storm
    time_fields = []
    for storm in storm_data:
        time_field = storm.get('effective_time') or storm.get('creation_time') or storm.get('expire_time', '')
        if time_field:
            time_fields.append(time_field)

    if time_fields:
        try:
            # Both ISO ('...T...Z') and 'YYYY-MM-DD HH:MM:SS' start with 'YYYY-MM-DD'
            dates = [datetime.strptime(tf[:10], '%Y-%m-%d') for tf in time_fields]
            start_date = min(dates).strftime('%d-%m-%Y')
            end_date = max(dates).strftime('%d-%m-%Y')
        except (ValueError, TypeError):
            start_date = end_date = "Unknown"
    else:
        start_date = end_date = "Unknown"

    # Daily breakdown
    daily_storms = group_storm_data_by_day(storm_data)
    daily_summary = {day: len(storms) for day, storms in daily_storms.items()}

    return {
        'total_cells': total_cells,
        'severity_counts': severity_counts,
        'avg_direction': avg_direction,
        'avg_speed': avg_speed,
        'date_range': {'start': start_date, 'end': end_date},
        'daily_breakdown': daily_summary
    }
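`filter_storm_data_by_turbine_proximity` keeps a storm cell if any vertex of its WKT boundary lies within the outermost distance ring of any turbine. Below is a self-contained sketch of that rule. It assumes `haversine_distance` returns meters on a spherical Earth of radius 6,371 km: the `/ 1000` in the code points to meters, but the radius is an assumption. The helper names `haversine_m`, `parse_linestring` and `storms_near_turbines` are illustrative only.

```python
import math
import re
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def parse_linestring(wkt: str) -> List[Tuple[float, float]]:
    """Parse 'LINESTRING(lon lat, ...)' into (lon, lat) tuples; [] on bad input."""
    body = re.sub(r"^\s*[A-Za-z]*\s*\(+|\)+\s*$", "", wkt or "")
    try:
        return [(float(lon), float(lat)) for lon, lat in (p.split() for p in body.split(","))]
    except ValueError:
        return []


def storms_near_turbines(storms: List[Dict], turbines: List[Tuple[float, float]],
                         max_distance_km: float) -> List[Dict]:
    """Keep storms with at least one boundary vertex within max_distance_km of a turbine."""
    limit_m = max_distance_km * 1000
    kept = []
    for storm in storms:
        coords = parse_linestring(storm.get("cell_polygon_wkt", ""))
        if any(haversine_m(t_lat, t_lon, lat, lon) <= limit_m
               for lon, lat in coords for t_lat, t_lon in turbines):
            kept.append(storm)
    return kept


turbines = [(36.771944, 33.405000)]  # (lat, lng) of turbine T1 in the test data
storms = [
    {"id": "near", "cell_polygon_wkt": "LINESTRING(33.41 36.78, 33.42 36.79)"},
    {"id": "far", "cell_polygon_wkt": "LINESTRING(35.00 38.00, 35.10 38.10)"},
]
print([s["id"] for s in storms_near_turbines(storms, turbines, 20)])  # ['near']
```

Like the repository function, this tests vertices only. A large cell whose edge passes close to a turbine between two distant vertices would not be kept.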
77
test_data/dagpazari_RES_coordinates.json
Normal file
@ -0,0 +1,77 @@
[
  {
    "name": "T1",
    "lat": 36.771944,
    "lng": 33.405000
  },
  {
    "name": "T2",
    "lat": 36.770278,
    "lng": 33.407500
  },
  {
    "name": "T3",
    "lat": 36.768611,
    "lng": 33.410000
  },
  {
    "name": "T4",
    "lat": 36.766389,
    "lng": 33.412500
  },
  {
    "name": "T5",
    "lat": 36.764167,
    "lng": 33.414722
  },
  {
    "name": "T6",
    "lat": 36.762778,
    "lng": 33.417500
  },
  {
    "name": "T7",
    "lat": 36.772778,
    "lng": 33.411667
  },
  {
    "name": "T8",
    "lat": 36.770556,
    "lng": 33.414167
  },
  {
    "name": "T9",
    "lat": 36.761667,
    "lng": 33.420000
  },
  {
    "name": "T10",
    "lat": 36.766944,
    "lng": 33.419167
  },
  {
    "name": "T11",
    "lat": 36.765833,
    "lng": 33.421667
  },
  {
    "name": "T12",
    "lat": 36.773889,
    "lng": 33.416389
  },
  {
    "name": "T13",
    "lat": 36.774722,
    "lng": 33.420556
  },
  {
    "name": "T14",
    "lat": 36.773333,
    "lng": 33.401111
  },
  {
    "name": "T15",
    "lat": 36.759722,
    "lng": 33.422778
  }
]
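When no center is passed, `create_storm_cells_coordinate_plane` centers the plot on the mean turbine position. It pads both axes by 1.5x the largest ring radius, converting meters to degrees at a flat 111 km per degree. A degree of longitude is only about 111 km × cos(latitude), so at these turbines' latitude (about 36.8° N) the same radius spans more degrees east-west than north-south. The sketch below computes both extents for three of the turbines above. The 20 km radius is an assumed example value, not read from `config.distance_rings`.

```python
import json
import math

# Three turbine records copied from test_data/dagpazari_RES_coordinates.json
turbines = json.loads("""[
  {"name": "T1", "lat": 36.771944, "lng": 33.405000},
  {"name": "T9", "lat": 36.761667, "lng": 33.420000},
  {"name": "T13", "lat": 36.774722, "lng": 33.420556}
]""")

METERS_PER_DEGREE = 111_000  # same approximation as the plotting code
ring_radius_m = 20_000       # assumed example ring radius

# Centroid = plain mean of turbine positions
center_lat = sum(t["lat"] for t in turbines) / len(turbines)
center_lon = sum(t["lng"] for t in turbines) / len(turbines)

# Half-width of the view in degrees of latitude (1.5x padding, as in the plot)
lat_half_span = ring_radius_m / METERS_PER_DEGREE * 1.5
# Longitude degrees shrink with latitude, so the same distance needs more of them
lon_half_span = lat_half_span / math.cos(math.radians(center_lat))

print(f"center: {center_lat:.5f}, {center_lon:.5f}")
print(f"lat range: {center_lat - lat_half_span:.4f} .. {center_lat + lat_half_span:.4f}")
print(f"lon range: {center_lon - lon_half_span:.4f} .. {center_lon + lon_half_span:.4f}")
```

With equal degree spans on both axes, the circle drawn for a ring covers less of the longitude axis than its true east-west distance would suggest. Whether that matters depends on how the figure is sized in the report.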
834
test_data/deneme.pdf
Normal file
File diff suppressed because one or more lines are too long
1260
test_data/firtina_sorgulama_2024_05.json
Normal file
File diff suppressed because it is too large
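The storm query export above holds storm-cell records. `group_storm_data_by_day` keys them as `DD-MM-YYYY` strings, and these do not sort chronologically as plain text: `02-06-2024` sorts before `31-05-2024`. The stdlib sketch below groups records by the first available timestamp (`effective_time`, then `creation_time`, then `expire_time`, as in storm_cells.py) and orders the keys by parsed date. The sample records are made up for illustration, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def group_by_day(storms: List[Dict]) -> Dict[str, List[Dict]]:
    """Group storm records by 'DD-MM-YYYY', using the first timestamp field present."""
    groups = defaultdict(list)
    for storm in storms:
        ts = storm.get("effective_time") or storm.get("creation_time") or storm.get("expire_time") or ""
        try:
            # Both 'YYYY-MM-DDTHH:MM:SS.fffZ' and 'YYYY-MM-DD HH:MM:SS' start with the date
            day = datetime.strptime(ts[:10], "%Y-%m-%d").strftime("%d-%m-%Y")
        except ValueError:
            continue  # skip records without a parseable date
        groups[day].append(storm)
    return dict(groups)


def chronological_days(groups: Dict[str, List[Dict]]) -> List[str]:
    """Sort 'DD-MM-YYYY' keys by date rather than as text."""
    return sorted(groups, key=lambda d: datetime.strptime(d, "%d-%m-%Y"))


storms = [
    {"id": 1, "effective_time": "2024-06-02T10:15:00.000Z"},
    {"id": 2, "effective_time": "2024-05-31 18:40:00"},
    {"id": 3, "creation_time": "2024-05-31T19:05:00Z"},
    {"id": 4, "effective_time": "not a date"},
]
groups = group_by_day(storms)
print(sorted(groups))              # ['02-06-2024', '31-05-2024'] (text order)
print(chronological_days(groups))  # ['31-05-2024', '02-06-2024']
```

An alternative design is to key the groups by `datetime.date` and format them only for display, which makes the ordering correct by construction.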
593
test_data/test_report.pdf
Normal file
File diff suppressed because one or more lines are too long
138610
test_data/yildirim_simsek_sorgulama_2024_05.json
Normal file
File diff suppressed because it is too large
2
utm_converter_requirements.txt
Normal file
@ -0,0 +1,2 @@
pyproj>=3.0.0
pandas>=1.3.0
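`pyproj` is used by `utm_ed50_to_wgs84_converter.py` to reproject ED50 UTM coordinates, which it does by building a PROJ string per zone. Southern-hemisphere UTM zones need the `+south` flag (false northing of 10,000,000 m). The stdlib sketch below builds the ED50 and WGS84 UTM PROJ strings per zone and hemisphere, reusing the converter's `+towgs84=-87,-98,-121,0,0,0,0` three-parameter shift, and adds the standard zone-from-longitude formula. The helper names are hypothetical; the resulting strings would be passed to `pyproj.Proj` as in the converter.

```python
def utm_zone_for_longitude(lon: float) -> int:
    """Standard 6-degree UTM zone for a longitude (ignores the Norway/Svalbard exceptions)."""
    return min(int((lon + 180) // 6) + 1, 60)


def utm_proj_string(zone: int, northern: bool = True, datum: str = "ED50") -> str:
    """Build a PROJ string for a UTM zone (hypothetical helper).

    ED50 uses the International 1924 ellipsoid plus a 3-parameter shift to WGS84;
    southern zones get +south so northings use the 10,000,000 m false northing.
    """
    if not 1 <= zone <= 60:
        raise ValueError(f"Invalid UTM zone: {zone}. Must be between 1 and 60.")
    parts = ["+proj=utm", f"+zone={zone}"]
    if not northern:
        parts.append("+south")
    if datum == "ED50":
        parts += ["+ellps=intl", "+towgs84=-87,-98,-121,0,0,0,0"]
    elif datum == "WGS84":
        parts += ["+ellps=WGS84", "+datum=WGS84"]
    else:
        raise ValueError(f"Unsupported datum: {datum}")
    parts += ["+units=m", "+no_defs"]
    return " ".join(parts)


zone = utm_zone_for_longitude(33.405)  # turbine T1 in the test data -> 36
print(utm_proj_string(zone))
# +proj=utm +zone=36 +ellps=intl +towgs84=-87,-98,-121,0,0,0,0 +units=m +no_defs
print(utm_proj_string(zone, northern=False, datum="WGS84"))
# +proj=utm +zone=36 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs
```

For the northern hemisphere, the ED50 string is identical to the one the converter builds, so only the `+south` branch changes behavior.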
262
utm_ed50_to_wgs84_converter.py
Normal file
@ -0,0 +1,262 @@
#!/usr/bin/env python3
"""
UTM ED50 to WGS84 Coordinate Converter

This script converts UTM coordinates in 6-degree zones from the ED50 (European Datum 1950)
reference system to WGS84 latitude/longitude.

Requirements:
    pip install pyproj pandas

Usage:
    python utm_ed50_to_wgs84_converter.py input_file.csv output_file.csv
    python utm_ed50_to_wgs84_converter.py --interactive
"""

import argparse
import pandas as pd
import pyproj
from typing import Tuple, List, Optional
import sys


class UTMED50ToWGS84Converter:
    """Converter for UTM ED50 coordinates to WGS84."""

    def __init__(self):
        """Initialize caches for ED50 and WGS84 UTM projections, keyed by (zone, northern)."""
        self.ed50_utm_projections = {}
        self.wgs84_utm_projections = {}

    def get_ed50_utm_projection(self, zone: int, northern: bool = True) -> pyproj.Proj:
        """Get (and cache) the ED50 UTM projection for a specific zone and hemisphere."""
        key = (zone, northern)
        if key not in self.ed50_utm_projections:
            # Southern zones need +south (10,000,000 m false northing)
            hemisphere = '' if northern else ' +south'
            # International 1924 ellipsoid with a 3-parameter ED50 -> WGS84 datum shift
            proj_string = f"+proj=utm +zone={zone}{hemisphere} +ellps=intl +towgs84=-87,-98,-121,0,0,0,0 +units=m +no_defs"
            self.ed50_utm_projections[key] = pyproj.Proj(proj_string)
        return self.ed50_utm_projections[key]

    def get_wgs84_utm_projection(self, zone: int, northern: bool = True) -> pyproj.Proj:
        """Get (and cache) the WGS84 UTM projection for a specific zone and hemisphere."""
        key = (zone, northern)
        if key not in self.wgs84_utm_projections:
            # Southern zones need +south (10,000,000 m false northing)
            hemisphere = '' if northern else ' +south'
            proj_string = f"+proj=utm +zone={zone}{hemisphere} +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
            self.wgs84_utm_projections[key] = pyproj.Proj(proj_string)
        return self.wgs84_utm_projections[key]

    def convert_single_point(self, easting: float, northing: float, zone: int,
                             northern: bool = True) -> Tuple[float, float]:
        """
        Convert a single UTM ED50 point to WGS84 lat/lon.

        Args:
            easting: UTM easting coordinate in meters
            northing: UTM northing coordinate in meters
            zone: UTM zone number (1-60)
            northern: True if in northern hemisphere, False if southern

        Returns:
            Tuple of (latitude, longitude) in WGS84 decimal degrees
        """
        if not (1 <= zone <= 60):
            raise ValueError(f"Invalid UTM zone: {zone}. Must be between 1 and 60.")

        # ED50 UTM projection for this zone and hemisphere (cached)
        ed50_proj = self.get_ed50_utm_projection(zone, northern)

        # WGS84 geographic (lon/lat) target
        wgs84_latlon = pyproj.Proj('+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs')

        # always_xy=True pins the axis order: (easting, northing) in, (lon, lat) out
        transformer = pyproj.Transformer.from_proj(ed50_proj, wgs84_latlon, always_xy=True)

        wgs84_lon, wgs84_lat = transformer.transform(easting, northing)

        return wgs84_lat, wgs84_lon

    def convert_dataframe(self, df: pd.DataFrame, easting_col: str, northing_col: str,
                          zone_col: str, northern_col: Optional[str] = None,
                          northern_default: bool = True) -> pd.DataFrame:
        """
        Convert UTM ED50 coordinates in a DataFrame to WGS84.

        Args:
            df: Input DataFrame with UTM coordinates
            easting_col: Column name for easting coordinates
            northing_col: Column name for northing coordinates
            zone_col: Column name for UTM zone
            northern_col: Column name for northern hemisphere flag (optional)
            northern_default: Default value for northern hemisphere if column not provided

        Returns:
            Copy of the DataFrame with added 'wgs84_lat' and 'wgs84_lon' columns
            (None for rows that could not be converted)
        """
        result_df = df.copy()
        result_df['wgs84_lat'] = None
        result_df['wgs84_lon'] = None

        for idx, row in df.iterrows():
            try:
                easting = float(row[easting_col])
                northing = float(row[northing_col])
                zone = int(row[zone_col])

                # Per-row hemisphere flag if the column exists, otherwise the default
                if northern_col and northern_col in row:
                    northern = bool(row[northern_col])
                else:
                    northern = northern_default

                lat, lon = self.convert_single_point(easting, northing, zone, northern)
                result_df.at[idx, 'wgs84_lat'] = lat
                result_df.at[idx, 'wgs84_lon'] = lon

            except (ValueError, KeyError) as e:
                print(f"Warning: Could not convert row {idx}: {e}")
                result_df.at[idx, 'wgs84_lat'] = None
                result_df.at[idx, 'wgs84_lon'] = None

        return result_df


def interactive_mode():
    """Run the converter in interactive mode."""
    converter = UTMED50ToWGS84Converter()

    print("UTM ED50 to WGS84 Coordinate Converter")
    print("=" * 40)
    print("Enter coordinates (type 'quit' to exit)")
    print()

    while True:
        try:
            # Get input
            easting_input = input("Enter easting (meters): ").strip()
            if easting_input.lower() == 'quit':
                break

            northing_input = input("Enter northing (meters): ").strip()
            if northing_input.lower() == 'quit':
                break

            zone_input = input("Enter UTM zone (1-60): ").strip()
            if zone_input.lower() == 'quit':
                break

            hemisphere_input = input("Enter hemisphere (N/S) [default: N]: ").strip().upper()
            if hemisphere_input == 'QUIT':
                break

            # Parse inputs; anything other than 'S' (including empty input) means northern
            easting = float(easting_input)
            northing = float(northing_input)
            zone = int(zone_input)
            northern = hemisphere_input != 'S'

            # Convert
            lat, lon = converter.convert_single_point(easting, northing, zone, northern)

            print("\nWGS84 Coordinates:")
            print(f"Latitude: {lat:.8f}°")
            print(f"Longitude: {lon:.8f}°")
            print("-" * 40)

        except ValueError as e:
            print(f"Error: {e}")
            print("Please enter valid numeric values.")
        except (KeyboardInterrupt, EOFError):
            print("\nExiting...")
            break


def main():
|
||||
"""Main function to handle command line arguments and file processing."""
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Convert UTM ED50 coordinates to WGS84 format",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
# Interactive mode
|
||||
python utm_ed50_to_wgs84_converter.py --interactive
|
||||
|
||||
# Convert CSV file
|
||||
python utm_ed50_to_wgs84_converter.py input.csv output.csv
|
||||
|
||||
# Convert with custom column names
|
||||
python utm_ed50_to_wgs84_converter.py input.csv output.csv --easting-col X --northing-col Y --zone-col ZONE
|
||||
"""
|
||||
)
|
||||
|
||||
parser.add_argument('input_file', nargs='?', help='Input CSV file with UTM coordinates')
|
||||
parser.add_argument('output_file', nargs='?', help='Output CSV file for WGS84 coordinates')
|
||||
parser.add_argument('--interactive', '-i', action='store_true',
|
||||
help='Run in interactive mode')
|
||||
parser.add_argument('--easting-col', default='easting',
|
||||
help='Column name for easting coordinates (default: easting)')
|
||||
parser.add_argument('--northing-col', default='northing',
|
||||
help='Column name for northing coordinates (default: northing)')
|
||||
parser.add_argument('--zone-col', default='zone',
|
||||
help='Column name for UTM zone (default: zone)')
|
||||
parser.add_argument('--northern-col',
|
||||
help='Column name for northern hemisphere flag (optional)')
|
||||
parser.add_argument('--northern-default', action=argparse.BooleanOptionalAction, default=True,
|
||||
help='Assume the northern hemisphere when no hemisphere column is given; use --no-northern-default for southern')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if args.interactive:
|
||||
interactive_mode()
|
||||
return
|
||||
|
||||
if not args.input_file or not args.output_file:
|
||||
parser.error("Both input_file and output_file are required when not in interactive mode")
|
||||
|
||||
try:
|
||||
# Read input file
|
||||
print(f"Reading input file: {args.input_file}")
|
||||
df = pd.read_csv(args.input_file)
|
||||
|
||||
# Check required columns
|
||||
required_cols = [args.easting_col, args.northing_col, args.zone_col]
|
||||
missing_cols = [col for col in required_cols if col not in df.columns]
|
||||
if missing_cols:
|
||||
print(f"Error: Missing required columns: {missing_cols}")
|
||||
print(f"Available columns: {list(df.columns)}")
|
||||
sys.exit(1)
|
||||
|
||||
# Convert coordinates
|
||||
print("Converting coordinates...")
|
||||
converter = UTMED50ToWGS84Converter()
|
||||
result_df = converter.convert_dataframe(
|
||||
df,
|
||||
args.easting_col,
|
||||
args.northing_col,
|
||||
args.zone_col,
|
||||
args.northern_col,
|
||||
args.northern_default
|
||||
)
|
||||
|
||||
# Save output
|
||||
print(f"Saving output file: {args.output_file}")
|
||||
result_df.to_csv(args.output_file, index=False)
|
||||
|
||||
# Print summary
|
||||
total_rows = len(result_df)
|
||||
successful_conversions = result_df['wgs84_lat'].notna().sum()
|
||||
print("\nConversion complete!")
|
||||
print(f"Total rows: {total_rows}")
|
||||
print(f"Successful conversions: {successful_conversions}")
|
||||
print(f"Failed conversions: {total_rows - successful_conversions}")
|
||||
|
||||
except FileNotFoundError:
|
||||
print(f"Error: Input file '{args.input_file}' not found")
|
||||
sys.exit(1)
|
||||
except Exception as e:
|
||||
print(f"Error: {e}")
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
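Declaring an option with `action='store_true'` and `default=True` produces `True` whether or not the flag is passed, so such an option can never select a southern-hemisphere default. A minimal sketch of a two-way `--northern-default` flag, assuming Python 3.9+ for `argparse.BooleanOptionalAction`; `build_parser` and `parse_northern_default` are hypothetical helpers that reproduce only this one option:

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser containing only the hemisphere-default option."""
    parser = argparse.ArgumentParser(description="Hemisphere flag demo")
    # BooleanOptionalAction registers both --northern-default and
    # --no-northern-default; passing neither keeps default=True.
    parser.add_argument(
        "--northern-default",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Assume the northern hemisphere when no hemisphere column is given",
    )
    return parser


def parse_northern_default(argv: Optional[List[str]] = None) -> bool:
    """Return the parsed hemisphere default for the given argument list."""
    return build_parser().parse_args(argv).northern_default


if __name__ == "__main__":
    print(parse_northern_default([]))                         # True
    print(parse_northern_default(["--no-northern-default"]))  # False
```

The same `action=` argument can replace `'store_true'` in either converter without changing the `args.northern_default` attribute name that `convert_dataframe` receives.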
340
utm_ed50_to_wgs84_converter_enhanced.py
Normal file
@ -0,0 +1,340 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Enhanced UTM ED50 to WGS84 Coordinate Converter
|
||||
|
||||
This script converts UTM coordinates in 6-degree zones from ED50 (European Datum 1950)
|
||||
reference system to WGS84 format. Supports both CSV and JSON input files.
|
||||
|
||||
Requirements:
|
||||
pip install pyproj pandas
|
||||
|
||||
Usage:
|
||||
python utm_ed50_to_wgs84_converter_enhanced.py input_file.csv output_file.csv
|
||||
python utm_ed50_to_wgs84_converter_enhanced.py input_file.json output_file.json
|
||||
python utm_ed50_to_wgs84_converter_enhanced.py --interactive
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import pandas as pd
|
||||
import pyproj
|
||||
import json
|
||||
import os
|
||||
from typing import Tuple, List, Optional, Union
|
||||
import sys
|
||||
|
||||
|
||||
class UTMED50ToWGS84Converter:
|
||||
"""Converter for UTM ED50 coordinates to WGS84."""
|
||||
|
||||
def __init__(self):
|
||||
"""Initialize the converter with ED50 and WGS84 projections."""
|
||||
self.ed50_utm_projections = {}
|
||||
self.wgs84_utm_projections = {}
|
||||
|
||||
def get_ed50_utm_projection(self, zone: int, northern: bool = True) -> pyproj.Proj:
|
||||
"""Get ED50 UTM projection for a specific zone."""
|
||||
key = (zone, northern)
|
||||
if key not in self.ed50_utm_projections:
|
||||
south = '' if northern else ' +south'  # PROJ applies the 10,000,000 m southern false northing only with +south
|
||||
proj_string = f"+proj=utm +zone={zone}{south} +ellps=intl +towgs84=-87,-98,-121,0,0,0,0 +units=m +no_defs"
|
||||
self.ed50_utm_projections[key] = pyproj.Proj(proj_string)
|
||||
return self.ed50_utm_projections[key]
|
||||
|
||||
def get_wgs84_utm_projection(self, zone: int, northern: bool = True) -> pyproj.Proj:
|
||||
"""Get WGS84 UTM projection for a specific zone."""
|
||||
key = (zone, northern)
|
||||
if key not in self.wgs84_utm_projections:
|
||||
south = '' if northern else ' +south'  # PROJ applies the 10,000,000 m southern false northing only with +south
|
||||
proj_string = f"+proj=utm +zone={zone}{south} +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
|
||||
self.wgs84_utm_projections[key] = pyproj.Proj(proj_string)
|
||||
return self.wgs84_utm_projections[key]
|
||||
|
||||
def convert_single_point(self, easting: float, northing: float, zone: int,
|
||||
northern: bool = True) -> Tuple[float, float]:
|
||||
"""
|
||||
Convert a single UTM ED50 point to WGS84 lat/lon.
|
||||
|
||||
Args:
|
||||
easting: UTM easting coordinate in meters
|
||||
northing: UTM northing coordinate in meters
|
||||
zone: UTM zone number (1-60)
|
||||
northern: True if in northern hemisphere, False if southern
|
||||
|
||||
Returns:
|
||||
Tuple of (latitude, longitude) in WGS84 decimal degrees
|
||||
"""
|
||||
if not (1 <= zone <= 60):
|
||||
raise ValueError(f"Invalid UTM zone: {zone}. Must be between 1 and 60.")
|
||||
|
||||
# Create ED50 UTM projection
|
||||
ed50_proj = self.get_ed50_utm_projection(zone, northern)
|
||||
|
||||
# Create WGS84 lat/lon projection
|
||||
wgs84_latlon = pyproj.Proj('+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs')
|
||||
|
||||
# Create transformer from ED50 UTM to WGS84 lat/lon; always_xy pins the axis order to (easting, northing) -> (lon, lat)
|
||||
transformer = pyproj.Transformer.from_proj(ed50_proj, wgs84_latlon, always_xy=True)
|
||||
|
||||
# Transform coordinates directly to WGS84 lat/lon
|
||||
wgs84_lon, wgs84_lat = transformer.transform(easting, northing)
|
||||
|
||||
return wgs84_lat, wgs84_lon
|
||||
|
||||
def convert_dataframe(self, df: pd.DataFrame, easting_col: str, northing_col: str,
|
||||
zone_col: str, northern_col: Optional[str] = None,
|
||||
northern_default: bool = True) -> pd.DataFrame:
|
||||
"""
|
||||
Convert UTM ED50 coordinates in a DataFrame to WGS84.
|
||||
|
||||
Args:
|
||||
df: Input DataFrame with UTM coordinates
|
||||
easting_col: Column name for easting coordinates
|
||||
northing_col: Column name for northing coordinates
|
||||
zone_col: Column name for UTM zone
|
||||
northern_col: Column name for northern hemisphere flag (optional)
|
||||
northern_default: Default value for northern hemisphere if column not provided
|
||||
|
||||
Returns:
|
||||
DataFrame with additional lat/lng columns and renamed description column
|
||||
"""
|
||||
result_df = df.copy()
|
||||
result_df['lat'] = None
|
||||
result_df['lng'] = None
|
||||
|
||||
# Rename description column to name if it exists
|
||||
if 'description' in result_df.columns:
|
||||
result_df = result_df.rename(columns={'description': 'name'})
|
||||
|
||||
for idx, row in df.iterrows():
|
||||
try:
|
||||
easting = float(row[easting_col])
|
||||
northing = float(row[northing_col])
|
||||
zone = int(row[zone_col])
|
||||
|
||||
if northern_col and northern_col in row:
|
||||
northern = bool(row[northern_col])
|
||||
else:
|
||||
northern = northern_default
|
||||
|
||||
lat, lon = self.convert_single_point(easting, northing, zone, northern)
|
||||
result_df.at[idx, 'lat'] = lat
|
||||
result_df.at[idx, 'lng'] = lon
|
||||
|
||||
except (ValueError, KeyError) as e:
|
||||
print(f"Warning: Could not convert row {idx}: {e}")
|
||||
result_df.at[idx, 'lat'] = None
|
||||
result_df.at[idx, 'lng'] = None
|
||||
|
||||
return result_df
|
||||
|
||||
|
||||
def detect_file_format(file_path: str) -> str:
|
||||
"""Detect if file is CSV or JSON based on extension and content."""
|
||||
_, ext = os.path.splitext(file_path.lower())
|
||||
|
||||
if ext == '.csv':
|
||||
return 'csv'
|
||||
elif ext == '.json':
|
||||
return 'json'
|
||||
else:
|
||||
# Try to detect by reading first few lines
|
||||
try:
|
||||
with open(file_path, 'r', encoding='utf-8') as f:
|
||||
first_line = f.readline().strip()
|
||||
if first_line.startswith('[') or first_line.startswith('{'):
|
||||
return 'json'
|
||||
else:
|
||||
return 'csv'
|
||||
except (OSError, UnicodeDecodeError):
|
||||
return 'csv' # Default to CSV
|
||||
|
||||
|
||||
def load_data(file_path: str, csv_separator: str = ';', decimal_separator: str = ',') -> pd.DataFrame:
|
||||
"""Load data from CSV or JSON file."""
|
||||
file_format = detect_file_format(file_path)
|
||||
|
||||
if file_format == 'json':
|
||||
with open(file_path, 'r', encoding='utf-8') as f:
|
||||
data = json.load(f)
|
||||
|
||||
if isinstance(data, list):
|
||||
return pd.DataFrame(data)
|
||||
elif isinstance(data, dict):
|
||||
return pd.DataFrame([data])
|
||||
else:
|
||||
raise ValueError("JSON file must contain a list or dictionary")
|
||||
|
||||
else: # CSV
|
||||
return pd.read_csv(file_path, sep=csv_separator, decimal=decimal_separator)
|
||||
|
||||
|
||||
def save_data(df: pd.DataFrame, file_path: str, file_format: Optional[str] = None, csv_separator: str = ';', decimal_separator: str = ','):
|
||||
"""Save data to CSV or JSON file."""
|
||||
if file_format is None:
|
||||
file_format = detect_file_format(file_path)
|
||||
|
||||
if file_format == 'json':
|
||||
# Convert DataFrame to list of dictionaries
|
||||
data = df.to_dict('records')
|
||||
with open(file_path, 'w', encoding='utf-8') as f:
|
||||
json.dump(data, f, indent=2, ensure_ascii=False)
|
||||
|
||||
else: # CSV
|
||||
df.to_csv(file_path, index=False, sep=csv_separator, decimal=decimal_separator)
|
||||
|
||||
|
||||
def interactive_mode():
|
||||
"""Run the converter in interactive mode."""
|
||||
converter = UTMED50ToWGS84Converter()
|
||||
|
||||
print("UTM ED50 to WGS84 Coordinate Converter")
|
||||
print("=" * 40)
|
||||
print("Enter coordinates (type 'quit' to exit)")
|
||||
print()
|
||||
|
||||
while True:
|
||||
try:
|
||||
# Get input
|
||||
easting_input = input("Enter easting (meters): ").strip()
|
||||
if easting_input.lower() == 'quit':
|
||||
break
|
||||
|
||||
northing_input = input("Enter northing (meters): ").strip()
|
||||
if northing_input.lower() == 'quit':
|
||||
break
|
||||
|
||||
zone_input = input("Enter UTM zone (1-60): ").strip()
|
||||
if zone_input.lower() == 'quit':
|
||||
break
|
||||
|
||||
hemisphere_input = input("Enter hemisphere (N/S) [default: N]: ").strip().upper()
|
||||
if hemisphere_input == 'QUIT':
|
||||
break
|
||||
|
||||
# Parse inputs
|
||||
easting = float(easting_input)
|
||||
northing = float(northing_input)
|
||||
zone = int(zone_input)
|
||||
northern = hemisphere_input != 'S' if hemisphere_input else True
|
||||
|
||||
# Convert
|
||||
lat, lon = converter.convert_single_point(easting, northing, zone, northern)
|
||||
|
||||
print("\nWGS84 Coordinates:")
|
||||
print(f"Latitude: {lat:.8f}°")
|
||||
print(f"Longitude: {lon:.8f}°")
|
||||
print("-" * 40)
|
||||
|
||||
except ValueError as e:
|
||||
print(f"Error: {e}")
|
||||
print("Please enter valid numeric values.")
|
||||
except KeyboardInterrupt:
|
||||
print("\nExiting...")
|
||||
break
|
||||
|
||||
|
||||
def main():
|
||||
"""Main function to handle command line arguments and file processing."""
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Convert UTM ED50 coordinates to WGS84 format (supports CSV and JSON)",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
# Interactive mode
|
||||
python utm_ed50_to_wgs84_converter_enhanced.py --interactive
|
||||
|
||||
# Convert CSV file (semicolon-separated by default)
|
||||
python utm_ed50_to_wgs84_converter_enhanced.py input.csv output.csv
|
||||
|
||||
# Convert CSV file with comma separator
|
||||
python utm_ed50_to_wgs84_converter_enhanced.py input.csv output.csv --separator ,
|
||||
|
||||
# Convert JSON file
|
||||
python utm_ed50_to_wgs84_converter_enhanced.py input.json output.json
|
||||
|
||||
# Convert with custom column names
|
||||
python utm_ed50_to_wgs84_converter_enhanced.py input.csv output.csv --easting-col X --northing-col Y --zone-col ZONE
|
||||
|
||||
# Using full paths
|
||||
python /full/path/to/script.py /full/path/to/input.csv /full/path/to/output.csv
|
||||
"""
|
||||
)
|
||||
|
||||
parser.add_argument('input_file', nargs='?', help='Input file with UTM coordinates (CSV or JSON)')
|
||||
parser.add_argument('output_file', nargs='?', help='Output file for WGS84 coordinates (CSV or JSON)')
|
||||
parser.add_argument('--interactive', '-i', action='store_true',
|
||||
help='Run in interactive mode')
|
||||
parser.add_argument('--easting-col', default='easting',
|
||||
help='Column name for easting coordinates (default: easting)')
|
||||
parser.add_argument('--northing-col', default='northing',
|
||||
help='Column name for northing coordinates (default: northing)')
|
||||
parser.add_argument('--zone-col', default='zone',
|
||||
help='Column name for UTM zone (default: zone)')
|
||||
parser.add_argument('--northern-col',
|
||||
help='Column name for northern hemisphere flag (optional)')
|
||||
parser.add_argument('--northern-default', action=argparse.BooleanOptionalAction, default=True,
|
||||
help='Assume the northern hemisphere when no hemisphere column is given; use --no-northern-default for southern')
|
||||
parser.add_argument('--separator', '-s', default=';',
|
||||
help='CSV separator character (default: ;)')
|
||||
parser.add_argument('--decimal', '-d', default=',',
|
||||
help='CSV decimal separator character (default: ,)')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
if args.interactive:
|
||||
interactive_mode()
|
||||
return
|
||||
|
||||
if not args.input_file or not args.output_file:
|
||||
parser.error("Both input_file and output_file are required when not in interactive mode")
|
||||
|
||||
try:
|
||||
# Check if input file exists
|
||||
if not os.path.exists(args.input_file):
|
||||
print(f"Error: Input file '{args.input_file}' not found")
|
||||
print(f"Current working directory: {os.getcwd()}")
|
||||
sys.exit(1)
|
||||
|
||||
# Load data
|
||||
print(f"Reading input file: {args.input_file}")
|
||||
df = load_data(args.input_file, args.separator, args.decimal)
|
||||
|
||||
# Check required columns
|
||||
required_cols = [args.easting_col, args.northing_col, args.zone_col]
|
||||
missing_cols = [col for col in required_cols if col not in df.columns]
|
||||
if missing_cols:
|
||||
print(f"Error: Missing required columns: {missing_cols}")
|
||||
print(f"Available columns: {list(df.columns)}")
|
||||
sys.exit(1)
|
||||
|
||||
# Convert coordinates
|
||||
print("Converting coordinates...")
|
||||
converter = UTMED50ToWGS84Converter()
|
||||
result_df = converter.convert_dataframe(
|
||||
df,
|
||||
args.easting_col,
|
||||
args.northing_col,
|
||||
args.zone_col,
|
||||
args.northern_col,
|
||||
args.northern_default
|
||||
)
|
||||
|
||||
# Save output
|
||||
print(f"Saving output file: {args.output_file}")
|
||||
save_data(result_df, args.output_file, csv_separator=args.separator, decimal_separator=args.decimal)
|
||||
|
||||
# Print summary
|
||||
total_rows = len(result_df)
|
||||
successful_conversions = result_df['lat'].notna().sum()
|
||||
print("\nConversion complete!")
|
||||
print(f"Total rows: {total_rows}")
|
||||
print(f"Successful conversions: {successful_conversions}")
|
||||
print(f"Failed conversions: {total_rows - successful_conversions}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error: {e}")
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
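The `+towgs84=-87,-98,-121,0,0,0,0` term in `get_ed50_utm_projection` asks PROJ for a three-parameter geocentric translation: an ED50 position on the International 1924 ellipsoid is converted to Earth-centred Cartesian (ECEF) coordinates, shifted by (-87, -98, -121) m, and converted back to latitude/longitude on the WGS84 ellipsoid. The four trailing zeros are the rotation and scale terms, unused here. The sketch below reproduces only that datum step in pure Python so it can be checked without pyproj; it omits the UTM inverse projection, the function names are hypothetical, and PROJ remains the reference implementation.

```python
import math
from typing import Tuple

Ellipsoid = Tuple[float, float]  # (semi-major axis a in metres, flattening f)

INTL_1924: Ellipsoid = (6378388.0, 1 / 297.0)        # ellps=intl, used by ED50
WGS84: Ellipsoid = (6378137.0, 1 / 298.257223563)

# Translation taken from the +towgs84 string (metres); rotations and scale are zero.
ED50_TO_WGS84_SHIFT = (-87.0, -98.0, -121.0)


def geodetic_to_ecef(lat_deg: float, lon_deg: float, h: float,
                     ellps: Ellipsoid) -> Tuple[float, float, float]:
    """Convert geodetic latitude/longitude/height to ECEF X, Y, Z."""
    a, f = ellps
    e2 = f * (2 - f)  # first eccentricity squared
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + h) * math.sin(lat)
    return x, y, z


def ecef_to_geodetic(x: float, y: float, z: float, ellps: Ellipsoid,
                     iterations: int = 10) -> Tuple[float, float, float]:
    """Convert ECEF X, Y, Z back to geodetic coordinates by fixed-point iteration.

    Converges quickly away from the poles, which covers the farms in this config.
    """
    a, f = ellps
    e2 = f * (2 - f)
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - e2))  # initial guess assumes h = 0
    h = 0.0
    for _ in range(iterations):
        n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - e2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h


def ed50_to_wgs84(lat_deg: float, lon_deg: float,
                  h: float = 0.0) -> Tuple[float, float, float]:
    """Apply the 3-parameter ED50 -> WGS84 geocentric translation."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, h, INTL_1924)
    dx, dy, dz = ED50_TO_WGS84_SHIFT
    return ecef_to_geodetic(x + dx, y + dy, z + dz, WGS84)


if __name__ == "__main__":
    print(ed50_to_wgs84(40.0, 30.0))
```

For a point in western Turkey the result moves by roughly a hundred metres, which is why skipping the datum shift (treating ED50 eastings/northings as WGS84 UTM) is a visible error at the 2 to 10 km ring radii used in the reports.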
184
wind_farms_config.json
Normal file
@ -0,0 +1,184 @@
|
||||
{
|
||||
"api_config": {
|
||||
"base_url": "https://risk.tarla.io/api",
|
||||
"timeout_seconds": 60,
|
||||
"retry_attempts": 3,
|
||||
"default_query_range": {
|
||||
"method": "current_month"
|
||||
}
|
||||
},
|
||||
"output_base_directory": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/reports/",
|
||||
"default_padding_km": 5,
|
||||
"wind_farms": [
|
||||
{
|
||||
"farm_id": "dagpazari_RES",
|
||||
"name": "Dağpazarı RES",
|
||||
"enabled": true,
|
||||
"coordinates_file": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/coordinates/dagpazari_RES_coordinates.json",
|
||||
"distance_rings": [2000, 4000, 6000, 8000],
|
||||
"ring_colors": ["#B71C1C", "#F94144", "#F8961E", "#90BE6D"],
|
||||
"lightning_source_type": "api",
|
||||
"api_params": {
|
||||
"location_bounds": {
|
||||
"method": "auto",
|
||||
"padding_km": 5
|
||||
},
|
||||
"date_range": {
|
||||
"method": "manual",
|
||||
"start_date": "01-02-2026",
|
||||
"end_date": "28-02-2026"
|
||||
}
|
||||
},
|
||||
"report_config": {
|
||||
"output_directory": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/reports/dagpazari_RES/",
|
||||
"timezone": "Europe/Istanbul"
|
||||
}
|
||||
},
|
||||
{
|
||||
"farm_id": "boreas_enez_RES",
|
||||
"name": "Boreas Enez RES",
|
||||
"enabled": false,
|
||||
"coordinates_file": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/coordinates/boreas_enez_RES_coordinates.json",
|
||||
"distance_rings": [4000, 6000, 8000, 10000],
|
||||
"ring_colors": ["#B71C1C", "#F94144", "#F8961E", "#90BE6D"],
|
||||
"lightning_source_type": "api",
|
||||
"api_params": {
|
||||
"location_bounds": {
|
||||
"method": "auto",
|
||||
"padding_km": 5
|
||||
},
|
||||
"date_range": {
|
||||
"method": "manual",
|
||||
"start_date": "08-12-2025",
|
||||
"end_date": "23-03-2026"
|
||||
}
|
||||
},
|
||||
"report_config": {
|
||||
"output_directory": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/reports/boreas_enez_RES/",
|
||||
"timezone": "Europe/Istanbul"
|
||||
}
|
||||
},
|
||||
{
|
||||
"farm_id": "maslaktepe_RES",
|
||||
"name": "Maslaktepe RES",
|
||||
"enabled": false,
|
||||
"coordinates_file": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/coordinates/maslaktepe_RES_coordinates.json",
|
||||
"distance_rings": [4000, 6000, 8000, 10000],
|
||||
"ring_colors": ["#B71C1C", "#F94144", "#F8961E", "#90BE6D"],
|
||||
"lightning_source_type": "api",
|
||||
"api_params": {
|
||||
"location_bounds": {
|
||||
"method": "auto",
|
||||
"padding_km": 5
|
||||
},
|
||||
"date_range": {
|
||||
"method": "manual",
|
||||
"start_date": "08-12-2025",
|
||||
"end_date": "23-03-2026"
|
||||
}
|
||||
},
|
||||
"report_config": {
|
||||
"output_directory": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/reports/maslaktepe_RES/",
|
||||
"timezone": "Europe/Istanbul"
|
||||
}
|
||||
},
|
||||
{
|
||||
"farm_id": "Susurluk_RES",
|
||||
"name": "Susurluk RES",
|
||||
"enabled": false,
|
||||
"coordinates_file": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/coordinates/susurluk_RES_coordinates.json",
|
||||
"distance_rings": [1000, 2000, 3000, 4000, 10000],
|
||||
"ring_colors": ["purple", "red", "orange", "coral", "green"],
|
||||
"lightning_source_type": "api",
|
||||
"api_params": {
|
||||
"location_bounds": {
|
||||
"method": "auto",
|
||||
"padding_km": 5
|
||||
},
|
||||
"date_range": {
|
||||
"method": "manual",
|
||||
"start_date": "05-11-2025",
|
||||
"end_date": "08-12-2025"
|
||||
}
|
||||
},
|
||||
"report_config": {
|
||||
"output_directory": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/reports/susurluk_RES/",
|
||||
"timezone": "Europe/Istanbul"
|
||||
}
|
||||
},
|
||||
{
|
||||
"farm_id": "benlikuyu_GES",
|
||||
"name": "Benlikuyu GES",
|
||||
"enabled": false,
|
||||
"coordinates_file": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/coordinates/benlikuyu_GES_coordinates.json",
|
||||
"distance_rings": [1000, 2000, 3000, 4000, 10000],
|
||||
"ring_colors": ["purple", "red", "orange", "coral", "green"],
|
||||
"lightning_source_type": "api",
|
||||
"api_params": {
|
||||
"location_bounds": {
|
||||
"method": "auto",
|
||||
"padding_km": 5
|
||||
},
|
||||
"date_range": {
|
||||
"method": "manual",
|
||||
"start_date": "08-09-2025",
|
||||
"end_date": "08-12-2025"
|
||||
}
|
||||
},
|
||||
"report_config": {
|
||||
"output_directory": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/reports/benlikuyu_GES/",
|
||||
"timezone": "Europe/Istanbul"
|
||||
}
|
||||
},
|
||||
{
|
||||
"farm_id": "SOKE-01",
|
||||
"name": "SOKE-01",
|
||||
"enabled": false,
|
||||
"coordinates_file": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/coordinates/SOKE-01.json",
|
||||
"distance_rings": [1000, 2000, 4000, 8000, 10000],
|
||||
"ring_colors": ["#D62828", "#F77F00", "#FCBF49", "#90BE6D", "#4D96FF"],
|
||||
"lightning_source_type": "api",
|
||||
"api_params": {
|
||||
"location_bounds": {
|
||||
"method": "auto",
|
||||
"padding_km": 0.5
|
||||
},
|
||||
"date_range": {
|
||||
"method": "manual",
|
||||
"start_date": "2026-01-22T07:00:00Z",
|
||||
"end_date": "2026-01-22T08:00:00Z"
|
||||
}
|
||||
},
|
||||
"report_config": {
|
||||
"output_directory": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/reports/SOKE-01/",
|
||||
"timezone": "Europe/Istanbul"
|
||||
}
|
||||
},
|
||||
{
|
||||
"farm_id": "Wind Farm Krnovo",
|
||||
"name": "Wind Farm Krnovo",
|
||||
"enabled": false,
|
||||
"coordinates_file": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/coordinates/wind_farm_krnovo.json",
|
||||
"distance_rings": [2000, 4000, 8000, 20000, 30000],
|
||||
"ring_colors": ["#D62828", "#F77F00", "#FCBF49", "#90BE6D", "#4D96FF"],
|
||||
"lightning_source_type": "csv",
|
||||
"lightning_csv": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/lightnings/26_02_19_karadag.csv",
|
||||
"api_params": {
|
||||
"location_bounds": {
|
||||
"method": "auto",
|
||||
"padding_km": 0.5
|
||||
},
|
||||
"date_range": {
|
||||
"method": "manual",
|
||||
"start_date": "19-02-2026",
|
||||
"end_date": "20-02-2026"
|
||||
}
|
||||
},
|
||||
"report_config": {
|
||||
"output_directory": "/Users/erdemerikci/Drive'ım/ERIKTRONIK/iklimco/Rapor/reports/wind_farm_krnovo/",
|
||||
"timezone": "Europe/Podgorica"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
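The `date_range` entries in `wind_farms_config.json` mix two layouts: day-first dates such as `"01-02-2026"` / `"28-02-2026"` and, for SOKE-01, ISO 8601 UTC timestamps such as `"2026-01-22T07:00:00Z"`. How the batch generator interprets them is not part of this file. The sketch below is a hypothetical `parse_config_date` helper that normalises both layouts to timezone-aware datetimes, assuming day-month-year order for the short form (confirmed by `"28-02-2026"`) and midnight in a caller-supplied zone.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def parse_config_date(value: str, tz: tzinfo = timezone.utc) -> datetime:
    """Parse a config date string into a timezone-aware datetime.

    Hypothetical helper: "DD-MM-YYYY" is read as midnight in `tz`;
    "YYYY-MM-DDTHH:MM:SSZ" is read as UTC regardless of `tz`.
    """
    text = value.strip()
    if text.endswith("Z"):
        # strptime handles the literal "Z"; datetime.fromisoformat only
        # accepts it from Python 3.11 onward.
        parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    # Day-first short form, e.g. "28-02-2026".
    parsed = datetime.strptime(text, "%d-%m-%Y")
    return parsed.replace(tzinfo=tz)


if __name__ == "__main__":
    # Turkey uses a fixed UTC+03:00 offset, matching "Europe/Istanbul" here.
    istanbul = timezone(timedelta(hours=3))
    print(parse_config_date("01-02-2026", istanbul).isoformat())
    print(parse_config_date("2026-01-22T07:00:00Z").isoformat())
```

Passing the farm's `report_config.timezone` (for example via `zoneinfo.ZoneInfo`) instead of a fixed offset would also cover `Europe/Podgorica`, which observes daylight saving time.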