Environment_Monitoring

Author	SHA1	Message	Date
Murat ÖZDEMİR	c290882492	fix(monitoring): add missing conditions array to DNS monitors. Uptime Kuma 1.23+ evaluates monitor.conditions.length internally. While HTTP monitors seem to bypass this check safely if conditions is null, DNS monitors crash the NodeJS backend with 'Cannot read properties of null (reading length)' if conditions is not explicitly initialized as an empty array.	2026-06-26 23:54:20 +03:00
Murat ÖZDEMİR	a7ecfc4b2d	fix(monitoring): add missing url property to DNS monitors. The Node.js backend of Uptime Kuma 2.4.0 seems to crash on DNS monitors with 'Cannot read properties of null (reading length)' if the 'url' field is not explicitly set, because the API defaults it to null instead of 'https://' like the UI does.	2026-06-26 23:46:08 +03:00
Murat ÖZDEMİR	8a056a381b	fix(monitoring): prevent Vault crash and DNS null error - Vault: Wrap resp.json() in a try-except block to prevent JSONDecodeError when hitting an HTML error page (e.g. 502/503). This prevents the entire agent from crashing and missing heartbeats. - Uptime Kuma DNS: Explicitly set dns_resolve_server to 1.1.1.1 in Python API payload to prevent Uptime Kuma backend from crashing on null properties.	2026-06-26 23:23:02 +03:00
Murat ÖZDEMİR	b49ca276f0	fix(monitoring): support existing monitor updates and vault nodes - setup_uptime_kuma: Use api.edit_monitor to update existing monitors with new configuration instead of skipping them. - setup_uptime_kuma: Add port and accepted_statuscodes to DNS monitors to prevent NodeJS null reading errors in Kuma. - http.py: Parse VAULT_HOSTS environment variable for Vault cluster nodes instead of hardcoding 'vault'.	2026-06-26 23:07:37 +03:00
Murat ÖZDEMİR	b73ae4e5fb	revert(health-agent): revert ping monitors back to PING type	2026-06-26 21:55:42 +03:00
Murat ÖZDEMİR	94e6b57c52	fix(health-agent): check all 3 patroni node configs on storagebox; switch ping monitors to TCP port 22 (ICMP blocked from Docker)	2026-06-26 21:54:49 +03:00
Murat ÖZDEMİR	2827b227d5	fix(health-agent): fix notification param name and type — notificationIDList expects a list of IDs not a dict	2026-06-26 21:23:45 +03:00
Murat ÖZDEMİR	e4acd0e57b	fix(health-agent): skip uk_tokens.yml write when tokens dict is empty to prevent setup skip loop	2026-06-26 21:10:10 +03:00
Murat ÖZDEMİR	8b10653ff4	fix(health-agent): fix ping maxretries param and status page group lookup. Fix ping monitor creation error ('max_retries' is not a valid uptime-kuma-api param; correct name is 'maxretries'). Fix status pages never linking groups: re-fetching get_monitors() after add_monitor() races with WebSocket delivery so newly created groups are missing; use group_map populated in Section 1 directly instead.	2026-06-26 21:07:11 +03:00
Murat ÖZDEMİR	bc8b3d0934	refactor: convert all monitor names to Title Case and update health-agent digest	2026-06-26 20:47:31 +03:00
Murat ÖZDEMİR	0ef4f0b6f8	refactor: rename iklimco-monitoring stack to monitoring	2026-06-26 19:24:01 +03:00
Murat ÖZDEMİR	58d5c24f41	feat(health-agent): add CI/CD pipeline, Uptime Kuma setup, and runtime configuration. Deploy workflows: - Integrate health-agent build (test) and image promotion (prod) into monitoring stack workflows - Add storagebox download of health-agent runtime (.env.monitoring.health-agent-runtime → health-agent/.env) and setup (.env.monitoring.health-agent-setup → health-agent/.env.setup) env files - Add "Run Uptime Kuma Setup" step: runs setup_uptime_kuma.py inside the built image only when uk_tokens.yml is missing, writes tokens to HEALTH_AGENT_CONFIG_GENERATED_DIR (/mnt/storagebox/monitoring/uk_generated) - Add health-agent/** and health-agent/deploy/prod.env path triggers to test and prod workflows respectively - Add HARBOR_CI_TOKEN login and HARBOR_PULL_TOKEN login before stack deploy in both workflows - Source health-agent/.env before docker stack deploy to expose HEALTH_AGENT_CONFIG_GENERATED_DIR Dockerfile: - Copy config/ and scripts/ into image so setup_uptime_kuma.py can run inside the container setup_uptime_kuma.py: - Load .env and .env.setup automatically via python-dotenv (no manual export needed) - Write uk_tokens.yml to config/generated/ (aligned with container volume mount) Health checks: - PATRONI_HOSTS and VAULT_HOSTS are now configurable via env vars (comma-separated host:port); no code change needed when node count changes - REDIS_SENTINEL_HOSTS now correctly parses host:port format; default updated to redis-sentinel:26379 - Fix NameError in check_patroni_cluster() caused by leftover node variable after loop refactor - Remove verify_ssl=False from Vault check; vault.iklim.co has a valid certificate Ops: - Add ops/build-and-push-health-agent.sh for manual bypass of CI pipeline - Add health-agent/deploy/prod.env template for prod image promotion manifest Project structure: - Move .env.example and .env.setup.example to health-agent/env-example/ (root .gitignore excludes health-agent/.env*) - Add root .gitignore: excludes uk_tokens.yml, __pycache__, .venv, and env files - Remove health-agent/.gitignore (superseded by root .gitignore)	2026-06-26 18:45:17 +03:00
Murat ÖZDEMİR	f742bfdd11	feat(health-agent): add monitors.yml with env-aware node IP mapping from Ansible inventory	2026-06-25 18:59:14 +03:00
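
The four DNS monitor fixes above (c290882492, a7ecfc4b2d, 8a056a381b, b49ca276f0) all come down to sending no null fields to the Uptime Kuma backend. A minimal sketch of such a payload follows. The function name build_dns_monitor_payload is hypothetical, and the port and accepted_statuscodes values are assumptions, since the commit messages name those fields but not their values. Whether a given uptime-kuma-api version accepts every key is also an assumption, so the sketch only builds a plain dict and does not call the library.

```python
from typing import Any


def build_dns_monitor_payload(name: str, hostname: str) -> dict[str, Any]:
    """Build DNS monitor settings with every crash-prone field set explicitly.

    Hypothetical helper: the real setup_uptime_kuma.py may be structured
    differently.
    """
    return {
        "type": "dns",
        "name": name,
        "hostname": hostname,
        # 8a056a381b: set the resolver explicitly instead of leaving it null.
        "dns_resolve_server": "1.1.1.1",
        # b49ca276f0: port and accepted_statuscodes are sent as well.
        # Both values below are assumptions, not taken from the commits.
        "port": 53,
        "accepted_statuscodes": ["200-299"],
        # a7ecfc4b2d: the API defaults url to null; the UI uses 'https://'.
        "url": "https://",
        # c290882492: conditions must be an empty list, never null.
        "conditions": [],
    }


payload = build_dns_monitor_payload("Example DNS", "example.com")
```

If the client in use accepts these keys, the dict could be passed as keyword arguments, for example api.add_monitor(**payload) for a new monitor or api.edit_monitor(monitor_id, **payload) for an existing one.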

13 Commits
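
Two of the health-check changes listed above lend themselves to a short illustration: parsing a comma-separated host:port variable such as VAULT_HOSTS (58d5c24f41, b49ca276f0) and guarding resp.json() against HTML error pages (8a056a381b). The sketch below is an assumption about how this could look, not the code in http.py. The names parse_hosts, safe_json, and FakeResponse are hypothetical, and the fallback value 'vault' with port 8200 is assumed from the commit text and Vault's default port.

```python
import json
import os


def parse_hosts(raw: str, default_port: int) -> list[tuple[str, int]]:
    """Parse 'host:port,host:port' into (host, port) tuples.

    Entries without a port get default_port. IPv6 literals are not handled.
    """
    hosts = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries and trailing commas
        host, sep, port = item.rpartition(":")
        if sep:
            hosts.append((host, int(port)))
        else:
            hosts.append((item, default_port))
    return hosts


def safe_json(resp):
    """Return the decoded JSON body, or None if the body is not JSON.

    json.JSONDecodeError is a ValueError subclass, so an HTML 502/503 page
    ends up here instead of crashing the caller.
    """
    try:
        return resp.json()
    except ValueError:
        return None


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object, for demonstration."""

    def __init__(self, text: str):
        self.text = text

    def json(self):
        return json.loads(self.text)


# Assumed default: a single node named 'vault' on Vault's default port.
vault_hosts = parse_hosts(os.environ.get("VAULT_HOSTS", "vault"), 8200)
body = safe_json(FakeResponse("<html>502 Bad Gateway</html>"))  # None
```

Returning None rather than raising lets the agent mark that one node as unhealthy and still send its heartbeat, which is the behaviour the commit message describes.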