Environment_Monitoring

Author	SHA1	Message	Date
Murat ÖZDEMİR	b49ca276f0	fix(monitoring): support existing monitor updates and vault nodes - setup_uptime_kuma: Use api.edit_monitor to update existing monitors with new configuration instead of skipping them. - setup_uptime_kuma: Add port and accepted_statuscodes to DNS monitors to prevent NodeJS null reading errors in Kuma. - http.py: Parse VAULT_HOSTS environment variable for Vault cluster nodes instead of hardcoding 'vault'.	2026-06-26 23:07:37 +03:00
Murat ÖZDEMİR	2a482ce4df	health-agent redeploy with new image (CI: Deploy Environment Monitoring to Production Environment / deploy (push), successful in 16s)	2026-06-26 22:53:35 +03:00
Murat ÖZDEMİR	969c4a2301	fix(monitoring): resolve health-agent bugs and flapping monitors - Vault flapping: Fix resp evaluation on HTTP 429 - Storagebox block: Move mount check to a daemon thread - Push monitors: Increase interval to 75s and restore 60s sleep - Redis Sentinel: Fix authentication in sentinel_kwargs - Ext Https Api: Update URL to /health	2026-06-26 22:51:15 +03:00
Murat ÖZDEMİR	b73ae4e5fb	revert(health-agent): revert ping monitors back to PING type	2026-06-26 21:55:42 +03:00
Murat ÖZDEMİR	94e6b57c52	fix(health-agent): check all 3 patroni node configs on storagebox; switch ping monitors to TCP port 22 (ICMP blocked from Docker)	2026-06-26 21:54:49 +03:00
Murat ÖZDEMİR	fa7ed41063	fix(health-agent): reload uk_tokens.yml on every push call instead of caching at startup	2026-06-26 21:35:44 +03:00
Murat ÖZDEMİR	0551b01c64	health-agent redeploy with new image (CI: Deploy Environment Monitoring to Production Environment / deploy (push), successful in 16s)	2026-06-26 21:27:14 +03:00
Murat ÖZDEMİR	2827b227d5	fix(health-agent): fix notification param name and type — notificationIDList expects a list of IDs not a dict	2026-06-26 21:23:45 +03:00
Murat ÖZDEMİR	a5fc058978	health-agent redeploy with new image (CI: Deploy Environment Monitoring to Production Environment / deploy (push), failing after 15s)	2026-06-26 21:15:06 +03:00
Murat ÖZDEMİR	e4acd0e57b	fix(health-agent): skip uk_tokens.yml write when tokens dict is empty to prevent setup skip loop	2026-06-26 21:10:10 +03:00
Murat ÖZDEMİR	8b10653ff4	fix(health-agent): fix ping maxretries param and status page group lookup Fix ping monitor creation error ('max_retries' is not a valid uptime-kuma-api param; correct name is 'maxretries'). Fix status pages never linking groups: re-fetching get_monitors() after add_monitor() races with WebSocket delivery so newly created groups are missing; use group_map populated in Section 1 directly instead.	2026-06-26 21:07:11 +03:00
Murat ÖZDEMİR	95dd439a34	health-agent redeploy with new image (CI: Deploy Environment Monitoring to Production Environment / deploy (push), successful in 36s)	2026-06-26 20:53:59 +03:00
Murat ÖZDEMİR	3c2e872bf4	refactor(health-agent): rename monitor keys to Title Case With Space Update all hardcoded push monitor names in check files to match the new Title Case With Space format in monitors.yml. The uk_tokens.yml keys are derived from monitor names so the push() calls must match exactly.	2026-06-26 20:52:35 +03:00
Murat ÖZDEMİR	bc8b3d0934	refactor: convert all monitor names to Title Case and update health-agent digest	2026-06-26 20:47:31 +03:00
Murat ÖZDEMİR	d51c073556	fix(health-agent): fix uk_tokens.yml load race and LogRecord msg conflict - config.py: Replace exists()+open() with try/except open() to avoid TOCTOU race on SSHFS mounts where stat can succeed but open can fail with FileNotFoundError. - uptime_kuma.py: Rename msg key to push_msg in logger extra dicts. Python LogRecord reserves the msg field; passing it in extra raises ValueError which was being silently swallowed by the except block, masking successful pushes as errors.	2026-06-26 20:37:42 +03:00
Murat ÖZDEMİR	8d5fe55b14	health-agent redeploy with new image (CI: Deploy Environment Monitoring to Production Environment / deploy (push), successful in 31s)	2026-06-26 20:06:48 +03:00
Murat ÖZDEMİR	0ef4f0b6f8	refactor: rename iklimco-monitoring stack to monitoring	2026-06-26 19:24:01 +03:00
Murat ÖZDEMİR	58d5c24f41	feat(health-agent): add CI/CD pipeline, Uptime Kuma setup, and runtime configuration (CI: Deploy Environment Monitoring to Production Environment / deploy (push), failing after 10s) Deploy workflows: - Integrate health-agent build (test) and image promotion (prod) into monitoring stack workflows - Add storagebox download of health-agent runtime (.env.monitoring.health-agent-runtime → health-agent/.env) and setup (.env.monitoring.health-agent-setup → health-agent/.env.setup) env files - Add "Run Uptime Kuma Setup" step: runs setup_uptime_kuma.py inside the built image only when uk_tokens.yml is missing, writes tokens to HEALTH_AGENT_CONFIG_GENERATED_DIR (/mnt/storagebox/monitoring/uk_generated) - Add health-agent/** and health-agent/deploy/prod.env path triggers to test and prod workflows respectively - Add HARBOR_CI_TOKEN login and HARBOR_PULL_TOKEN login before stack deploy in both workflows - Source health-agent/.env before docker stack deploy to expose HEALTH_AGENT_CONFIG_GENERATED_DIR Dockerfile: - Copy config/ and scripts/ into image so setup_uptime_kuma.py can run inside the container setup_uptime_kuma.py: - Load .env and .env.setup automatically via python-dotenv (no manual export needed) - Write uk_tokens.yml to config/generated/ (aligned with container volume mount) Health checks: - PATRONI_HOSTS and VAULT_HOSTS are now configurable via env vars (comma-separated host:port); no code change needed when node count changes - REDIS_SENTINEL_HOSTS now correctly parses host:port format; default updated to redis-sentinel:26379 - Fix NameError in check_patroni_cluster() caused by leftover node variable after loop refactor - Remove verify_ssl=False from Vault check; vault.iklim.co has a valid certificate Ops: - Add ops/build-and-push-health-agent.sh for manual bypass of CI pipeline - Add health-agent/deploy/prod.env template for prod image promotion manifest Project structure: - Move .env.example and .env.setup.example to health-agent/env-example/ (root .gitignore excludes health-agent/.env*) - Add root .gitignore: excludes uk_tokens.yml, __pycache__, .venv, and env files - Remove health-agent/.gitignore (superseded by root .gitignore)	2026-06-26 18:45:17 +03:00
Murat ÖZDEMİR	062d3ff90d	docs(health-agent): document --once and --dry-run flags in README	2026-06-26 16:47:53 +03:00
Murat ÖZDEMİR	7ab186b961	feat(health-agent): add --once and --dry-run flags to main.py	2026-06-26 16:43:21 +03:00
Murat ÖZDEMİR	28d726d2d8	fix(health-agent): correct Uptime Kuma URLs in example env files	2026-06-25 20:55:58 +03:00
Murat ÖZDEMİR	07a364b2bc	fix(health-agent): correct UK_URL placeholder in .env.setup.example	2026-06-25 20:54:28 +03:00
Murat ÖZDEMİR	208f4768b9	chore(health-agent): switch to uptime-kuma-api-v2, fix .env.setup.example credentials	2026-06-25 20:50:23 +03:00
Murat ÖZDEMİR	72a91072fb	feat(health-agent): add README, workflows, and translate monitors.yml to English - Add health-agent README with architecture, config, and deployment docs - Add deploy-monitoring-test.yml workflow (mirrors prod, test-runner, test storagebox paths) - Add health-agent service to docker-stack-monitoring.yml - Add .env.example with all runtime variables and .gitignore for generated files - Add config/generated/.gitkeep to track empty generated directory - Translate all Turkish group names and status page titles in monitors.yml to English - Remove users.yml.example (Dozzle was removed in previous commit)	2026-06-25 19:20:25 +03:00
Murat ÖZDEMİR	f742bfdd11	feat(health-agent): add monitors.yml with env-aware node IP mapping from Ansible inventory	2026-06-25 18:59:14 +03:00

25 Commits
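
Commit d51c073556 describes two fixes: opening uk_tokens.yml directly instead of checking exists() first, and renaming the msg key in logger extra dicts to push_msg. The sketch below illustrates both patterns under stated assumptions. It is not the repository's code: read_tokens_file and extra_is_accepted are hypothetical helper names, and YAML parsing is left out to keep the example stdlib-only. One caveat: in CPython, as far as I know, the reserved-key clash surfaces as a KeyError, so the sketch catches both KeyError and the ValueError named in the commit message.

```python
import logging
from pathlib import Path
from typing import Optional


def read_tokens_file(path: Path) -> Optional[str]:
    """Return the raw file contents, or None if the file cannot be found.

    Hypothetical helper: opening directly avoids the exists()+open() race,
    where the existence check passes but the later open() still fails.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except FileNotFoundError:
        return None


def extra_is_accepted(extra: dict) -> bool:
    """Report whether logging accepts `extra` without a reserved-key clash.

    Hypothetical helper: builds a LogRecord the same way Logger.info() would,
    without emitting anything.
    """
    logger = logging.getLogger("health-agent-sketch")
    try:
        logger.makeRecord(
            logger.name, logging.INFO, "sketch.py", 0, "push ok", (), None,
            extra=extra,
        )
    except (KeyError, ValueError):
        # Keys such as "msg" and "message" already exist on LogRecord.
        return False
    return True
```

With these helpers, extra_is_accepted({"msg": "ok"}) is rejected while extra_is_accepted({"push_msg": "ok"}) passes, which matches the rename the commit describes. A broad except around the logging call would hide this error, which is how the commit message explains successful pushes being reported as failures.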