13 Commits

Author SHA1 Message Date
c290882492 fix(monitoring): add missing conditions array to DNS monitors
Uptime Kuma 1.23+ evaluates monitor.conditions.length internally.
HTTP monitors appear to tolerate a null conditions value, but DNS monitors
crash the Node.js backend with "Cannot read properties of null (reading 'length')"
unless conditions is explicitly initialized as an empty array.
2026-06-26 23:54:20 +03:00
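
The null-safety fix in the commit above can be sketched as a small payload builder. This is an illustration only: `build_dns_monitor` and `DNS_SAFE_DEFAULTS` are hypothetical names, and the `port` and `accepted_statuscodes` values are assumptions, since the log does not say which values were used. The other defaults are the ones named in the commit messages on this page.

```python
from typing import Any

# Defaults for fields the commit messages report as unsafe to leave null.
# The port and accepted_statuscodes values are assumptions; the log does
# not state which values were used.
DNS_SAFE_DEFAULTS: dict[str, Any] = {
    "conditions": [],
    "url": "https://",
    "dns_resolve_server": "1.1.1.1",
    "port": 53,
    "accepted_statuscodes": ["200-299"],
}


def build_dns_monitor(name: str, hostname: str, **overrides: Any) -> dict[str, Any]:
    """Return DNS monitor fields with no None values for the guarded keys."""
    payload: dict[str, Any] = {"type": "dns", "name": name, "hostname": hostname}
    for key, default in DNS_SAFE_DEFAULTS.items():
        value = overrides.pop(key, None)
        if value is None:
            # Copy list defaults so callers cannot mutate the shared template.
            value = list(default) if isinstance(default, list) else default
        payload[key] = value
    # Any remaining keyword arguments are passed through unchanged.
    payload.update(overrides)
    return payload
```

An explicit `conditions=None` is replaced by an empty list, so the payload never carries a null for a guarded key.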
a7ecfc4b2d fix(monitoring): add missing url property to DNS monitors
The Node.js backend of Uptime Kuma 2.4.0 seems to crash on DNS monitors with 'Cannot read properties of null (reading length)' if the 'url' field is not explicitly set, because the API defaults it to null instead of 'https://' like the UI does.
2026-06-26 23:46:08 +03:00
8a056a381b fix(monitoring): prevent Vault crash and DNS null error
- Vault: Wrap resp.json() in a try-except block to prevent JSONDecodeError when hitting an HTML error page (e.g. 502/503). This prevents the entire agent from crashing and missing heartbeats.
- Uptime Kuma DNS: Explicitly set dns_resolve_server to 1.1.1.1 in Python API payload to prevent Uptime Kuma backend from crashing on null properties.
2026-06-26 23:23:02 +03:00
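
The Vault half of the commit above (guarding the JSON decode) might look roughly like the following. This is a sketch, not the agent's code: `parse_json_body` and `vault_status` are hypothetical helpers, the status labels are invented for illustration, and the body is taken as a plain string so the example runs without an HTTP client.

```python
import json
from typing import Any, Optional


def parse_json_body(body: str) -> Optional[dict[str, Any]]:
    """Return the decoded JSON object, or None if the body is not a JSON object."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # An HTML error page from a proxy (e.g. 502/503) lands here.
        return None
    return data if isinstance(data, dict) else None


def vault_status(status_code: int, body: str) -> str:
    """Map a response to a coarse health label (labels are illustrative)."""
    data = parse_json_body(body)
    if data is None:
        return "unreachable"
    if data.get("sealed"):
        return "sealed"
    return "ok" if status_code == 200 else "degraded"
```

With `requests`, wrapping `resp.json()` in `try`/`except ValueError` covers the same case, because the decode errors it raises are `ValueError` subclasses.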
b49ca276f0 fix(monitoring): support existing monitor updates and vault nodes
- setup_uptime_kuma: Use api.edit_monitor to update existing monitors with new configuration instead of skipping them.
- setup_uptime_kuma: Add port and accepted_statuscodes to DNS monitors to prevent Node.js null-read errors in Kuma.
- http.py: Parse VAULT_HOSTS environment variable for Vault cluster nodes instead of hardcoding 'vault'.
2026-06-26 23:07:37 +03:00
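
Parsing a comma-separated host list such as VAULT_HOSTS, as the commit above describes, can be sketched as below. The function name `parse_hosts` and the fallback value `vault:8200` are assumptions made for the example; the log only says the previous code hardcoded 'vault'. IPv6 literals are not handled.

```python
import os


def parse_hosts(raw: str, default_port: int) -> list[tuple[str, int]]:
    """Parse 'host1:port1,host2' into [(host, port), ...]."""
    hosts: list[tuple[str, int]] = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and blank entries
        host, sep, port = entry.rpartition(":")
        if sep:
            hosts.append((host, int(port)))
        else:
            # No colon present: the whole entry is the host name.
            hosts.append((entry, default_port))
    return hosts


# The fallback "vault:8200" is an assumption, not taken from the repository.
VAULT_HOSTS = parse_hosts(os.environ.get("VAULT_HOSTS", "vault:8200"), 8200)
```

Because the list comes from the environment, adding or removing a cluster node only requires changing the variable, not the code.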
b73ae4e5fb revert(health-agent): revert ping monitors back to PING type 2026-06-26 21:55:42 +03:00
94e6b57c52 fix(health-agent): check all 3 patroni node configs on storagebox; switch ping monitors to TCP port 22 (ICMP blocked from Docker) 2026-06-26 21:54:49 +03:00
2827b227d5 fix(health-agent): fix notification param name and type — notificationIDList expects a list of IDs not a dict 2026-06-26 21:23:45 +03:00
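
The type fix in the commit above (a list of IDs rather than a dict) can be illustrated with a small normalizer. `to_notification_id_list` is a hypothetical helper, and the dict shape it accepts (notification ID mapped to a boolean) is an assumption, since the log does not show what the original dict looked like.

```python
from typing import Iterable, Mapping, Union


def to_notification_id_list(
    value: Union[Mapping[int, bool], Iterable[int], None],
) -> list[int]:
    """Return a sorted list of notification IDs from a dict, iterable, or None."""
    if value is None:
        return []
    if isinstance(value, Mapping):
        # Assumed shape: {notification_id: enabled}; keep only enabled IDs.
        return sorted(int(k) for k, enabled in value.items() if enabled)
    # Deduplicate plain iterables before sorting.
    return sorted({int(v) for v in value})
```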
e4acd0e57b fix(health-agent): skip uk_tokens.yml write when tokens dict is empty to prevent setup skip loop 2026-06-26 21:10:10 +03:00
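
The guard in the commit above matters because setup only runs when uk_tokens.yml is missing: an empty file would cause every later run to skip setup. A minimal sketch, assuming a hypothetical `write_tokens` helper; it emits JSON only to stay within the standard library, whereas the real script presumably uses a YAML library.

```python
import json
from pathlib import Path


def write_tokens(tokens: dict[str, str], path: Path) -> bool:
    """Write tokens to path; return False and leave path untouched if empty."""
    if not tokens:
        # Nothing to persist: do not create the file, so the next run
        # still sees it as missing and retries setup.
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(tokens, indent=2) + "\n", encoding="utf-8")
    return True
```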
8b10653ff4 fix(health-agent): fix ping maxretries param and status page group lookup
- Fix ping monitor creation error: 'max_retries' is not a valid uptime-kuma-api param; the correct name is 'maxretries'.
- Fix status pages never linking groups: re-fetching get_monitors() after add_monitor() races with WebSocket delivery, so newly created groups are missing. Use the group_map populated in Section 1 directly instead.
2026-06-26 21:07:11 +03:00
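
The group lookup fix in the commit above (recording IDs at creation time instead of re-fetching the monitor list) can be sketched as follows. `create_groups`, `MonitorApi`, and `FakeApi` are hypothetical names, and the sketch assumes the add_monitor() response carries the new ID under a "monitorID" key; the fake client exists only so the example runs without a server.

```python
from typing import Any, Protocol


class MonitorApi(Protocol):
    def add_monitor(self, **kwargs: Any) -> dict[str, Any]: ...


def create_groups(api: MonitorApi, names: list[str]) -> dict[str, int]:
    """Create one group monitor per name and return {name: monitor ID}."""
    group_map: dict[str, int] = {}
    for name in names:
        result = api.add_monitor(type="group", name=name)
        # Assumes the response carries the new ID under "monitorID".
        group_map[name] = result["monitorID"]
    return group_map


class FakeApi:
    """Stand-in used only to exercise the sketch without a server."""

    def __init__(self) -> None:
        self._next_id = 1

    def add_monitor(self, **kwargs: Any) -> dict[str, Any]:
        monitor_id = self._next_id
        self._next_id += 1
        return {"msg": "Added Successfully.", "monitorID": monitor_id}
```

Later steps, such as building status pages, can then read group IDs from the returned map without depending on when the server pushes its updated monitor list.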
bc8b3d0934 refactor: convert all monitor names to Title Case and update health-agent digest 2026-06-26 20:47:31 +03:00
0ef4f0b6f8 refactor: rename iklimco-monitoring stack to monitoring 2026-06-26 19:24:01 +03:00
58d5c24f41 feat(health-agent): add CI/CD pipeline, Uptime Kuma setup, and runtime configuration
Deploy workflows:
- Integrate health-agent build (test) and image promotion (prod) into monitoring stack workflows
- Add storagebox download of health-agent runtime (.env.monitoring.health-agent-runtime → health-agent/.env) and setup (.env.monitoring.health-agent-setup → health-agent/.env.setup) env files
- Add "Run Uptime Kuma Setup" step: runs setup_uptime_kuma.py inside the built image only when uk_tokens.yml is missing, writes tokens to HEALTH_AGENT_CONFIG_GENERATED_DIR (/mnt/storagebox/monitoring/uk_generated)
- Add health-agent/** and health-agent/deploy/prod.env path triggers to test and prod workflows respectively
- Add HARBOR_CI_TOKEN login and HARBOR_PULL_TOKEN login before stack deploy in both workflows
- Source health-agent/.env before docker stack deploy to expose HEALTH_AGENT_CONFIG_GENERATED_DIR

Dockerfile:
- Copy config/ and scripts/ into image so setup_uptime_kuma.py can run inside the container

setup_uptime_kuma.py:
- Load .env and .env.setup automatically via python-dotenv (no manual export needed)
- Write uk_tokens.yml to config/generated/ (aligned with container volume mount)

Health checks:
- PATRONI_HOSTS and VAULT_HOSTS are now configurable via env vars (comma-separated host:port); no code change needed when node count changes
- REDIS_SENTINEL_HOSTS now correctly parses host:port format; default updated to redis-sentinel:26379
- Fix NameError in check_patroni_cluster() caused by leftover node variable after loop refactor
- Remove verify_ssl=False from Vault check; vault.iklim.co has a valid certificate

Ops:
- Add ops/build-and-push-health-agent.sh for manual bypass of CI pipeline
- Add health-agent/deploy/prod.env template for prod image promotion manifest

Project structure:
- Move .env.example and .env.setup.example to health-agent/env-example/ (root .gitignore excludes health-agent/.env*)
- Add root .gitignore: excludes uk_tokens.yml, __pycache__, .venv, and env files
- Remove health-agent/.gitignore (superseded by root .gitignore)
2026-06-26 18:45:17 +03:00
f742bfdd11 feat(health-agent): add monitors.yml with env-aware node IP mapping from Ansible inventory 2026-06-25 18:59:14 +03:00