Environment_Monitoring

Author	SHA1	Message	Date
Murat ÖZDEMİR	30fe75d383	health-agent redeploy with new image [CI: deploy (push) successful in 20s]	2026-06-26 23:56:44 +03:00
Murat ÖZDEMİR	c290882492	fix(monitoring): add missing conditions array to DNS monitors. Uptime Kuma 1.23+ evaluates monitor.conditions.length internally. While HTTP monitors seem to bypass this check safely if conditions is null, DNS monitors crash the NodeJS backend with 'Cannot read properties of null (reading length)' if conditions is not explicitly initialized as an empty array.	2026-06-26 23:54:20 +03:00
Murat ÖZDEMİR	6f3bf6cef1	health-agent redeploy with new image [CI: deploy (push) successful in 18s]	2026-06-26 23:48:58 +03:00
Murat ÖZDEMİR	a7ecfc4b2d	fix(monitoring): add missing url property to DNS monitors. The Node.js backend of Uptime Kuma 2.4.0 seems to crash on DNS monitors with 'Cannot read properties of null (reading length)' if the 'url' field is not explicitly set, because the API defaults it to null instead of 'https://' like the UI does.	2026-06-26 23:46:08 +03:00
Murat ÖZDEMİR	c1cda0b38a	health-agent redeploy with new image [CI: deploy (push) successful in 20s]	2026-06-26 23:31:39 +03:00
Murat ÖZDEMİR	8a056a381b	fix(monitoring): prevent Vault crash and DNS null error - Vault: Wrap resp.json() in a try-except block to prevent JSONDecodeError when hitting an HTML error page (e.g. 502/503). This prevents the entire agent from crashing and missing heartbeats. - Uptime Kuma DNS: Explicitly set dns_resolve_server to 1.1.1.1 in Python API payload to prevent Uptime Kuma backend from crashing on null properties.	2026-06-26 23:23:02 +03:00
Murat ÖZDEMİR	475eb762b9	health-agent redeploy with new image [CI: deploy (push) successful in 17s]	2026-06-26 23:13:38 +03:00
Murat ÖZDEMİR	b49ca276f0	fix(monitoring): support existing monitor updates and vault nodes - setup_uptime_kuma: Use api.edit_monitor to update existing monitors with new configuration instead of skipping them. - setup_uptime_kuma: Add port and accepted_statuscodes to DNS monitors to prevent NodeJS null reading errors in Kuma. - http.py: Parse VAULT_HOSTS environment variable for Vault cluster nodes instead of hardcoding 'vault'.	2026-06-26 23:07:37 +03:00
Murat ÖZDEMİR	2a482ce4df	health-agent redeploy with new image [CI: deploy (push) successful in 16s]	2026-06-26 22:53:35 +03:00
Murat ÖZDEMİR	969c4a2301	fix(monitoring): resolve health-agent bugs and flapping monitors - Vault flapping: Fix resp evaluation on HTTP 429 - Storagebox block: Move mount check to a daemon thread - Push monitors: Increase interval to 75s and restore 60s sleep - Redis Sentinel: Fix authentication in sentinel_kwargs - Ext Https Api: Update URL to /health	2026-06-26 22:51:15 +03:00
Murat ÖZDEMİR	b73ae4e5fb	revert(health-agent): revert ping monitors back to PING type	2026-06-26 21:55:42 +03:00
Murat ÖZDEMİR	94e6b57c52	fix(health-agent): check all 3 patroni node configs on storagebox; switch ping monitors to TCP port 22 (ICMP blocked from Docker)	2026-06-26 21:54:49 +03:00
Murat ÖZDEMİR	fa7ed41063	fix(health-agent): reload uk_tokens.yml on every push call instead of caching at startup	2026-06-26 21:35:44 +03:00
Murat ÖZDEMİR	0551b01c64	health-agent redeploy with new image [CI: deploy (push) successful in 16s]	2026-06-26 21:27:14 +03:00
Murat ÖZDEMİR	2827b227d5	fix(health-agent): fix notification param name and type — notificationIDList expects a list of IDs not a dict	2026-06-26 21:23:45 +03:00
Murat ÖZDEMİR	a5fc058978	health-agent redeploy with new image [CI: deploy (push) failing after 15s]	2026-06-26 21:15:06 +03:00
Murat ÖZDEMİR	e4acd0e57b	fix(health-agent): skip uk_tokens.yml write when tokens dict is empty to prevent setup skip loop	2026-06-26 21:10:10 +03:00
Murat ÖZDEMİR	8b10653ff4	fix(health-agent): fix ping maxretries param and status page group lookup. Fix ping monitor creation error ('max_retries' is not a valid uptime-kuma-api param; correct name is 'maxretries'). Fix status pages never linking groups: re-fetching get_monitors() after add_monitor() races with WebSocket delivery so newly created groups are missing; use group_map populated in Section 1 directly instead.	2026-06-26 21:07:11 +03:00
Murat ÖZDEMİR	95dd439a34	health-agent redeploy with new image [CI: deploy (push) successful in 36s]	2026-06-26 20:53:59 +03:00
Murat ÖZDEMİR	3c2e872bf4	refactor(health-agent): rename monitor keys to Title Case With Space. Update all hardcoded push monitor names in check files to match the new Title Case With Space format in monitors.yml. The uk_tokens.yml keys are derived from monitor names so the push() calls must match exactly.	2026-06-26 20:52:35 +03:00
Murat ÖZDEMİR	bc8b3d0934	refactor: convert all monitor names to Title Case and update health-agent digest	2026-06-26 20:47:31 +03:00
Murat ÖZDEMİR	d51c073556	fix(health-agent): fix uk_tokens.yml load race and LogRecord msg conflict - config.py: Replace exists()+open() with try/except open() to avoid TOCTOU race on SSHFS mounts where stat can succeed but open can fail with FileNotFoundError. - uptime_kuma.py: Rename msg key to push_msg in logger extra dicts. Python LogRecord reserves the msg field; passing it in extra raises ValueError which was being silently swallowed by the except block, masking successful pushes as errors.	2026-06-26 20:37:42 +03:00
Murat ÖZDEMİR	8d5fe55b14	health-agent redeploy with new image [CI: deploy (push) successful in 31s]	2026-06-26 20:06:48 +03:00
Murat ÖZDEMİR	9fbc74d498	fix(workflow): use -s flag to trigger Uptime Kuma setup on empty uk_tokens.yml. The previous ! -f check skipped setup when uk_tokens.yml existed but was empty (0 bytes). Switching to ! -s triggers setup whenever the file is missing or empty. [CI: deploy (push) successful in 27s]	2026-06-26 19:39:48 +03:00
Murat ÖZDEMİR	0ef4f0b6f8	refactor: rename iklimco-monitoring stack to monitoring	2026-06-26 19:24:01 +03:00
Murat ÖZDEMİR	344ab4ac13	ci(workflow): remove redundant paths-ignore filter, gitignore already excludes those paths [CI: deploy (push) successful in 36s]	2026-06-26 18:56:13 +03:00
Murat ÖZDEMİR	656968823b	ci(workflow): replace paths filter with paths-ignore to trigger on any change except .venv and __pycache__	2026-06-26 18:55:41 +03:00
Murat ÖZDEMİR	e812a3b454	Merge branch 'main' into prod-env	2026-06-26 18:51:44 +03:00
Murat ÖZDEMİR	07b8db8de2	fix(common-functions): add no-op to empty refresh_calculated_env_vars to fix bash syntax error	2026-06-26 18:51:19 +03:00
Murat ÖZDEMİR	8347b7e25d	fix(common-functions): add no-op to empty refresh_calculated_env_vars to fix bash syntax error	2026-06-26 18:50:31 +03:00
Murat ÖZDEMİR	58d5c24f41	feat(health-agent): add CI/CD pipeline, Uptime Kuma setup, and runtime configuration. Deploy workflows: - Integrate health-agent build (test) and image promotion (prod) into monitoring stack workflows - Add storagebox download of health-agent runtime (.env.monitoring.health-agent-runtime → health-agent/.env) and setup (.env.monitoring.health-agent-setup → health-agent/.env.setup) env files - Add "Run Uptime Kuma Setup" step: runs setup_uptime_kuma.py inside the built image only when uk_tokens.yml is missing, writes tokens to HEALTH_AGENT_CONFIG_GENERATED_DIR (/mnt/storagebox/monitoring/uk_generated) - Add health-agent/** and health-agent/deploy/prod.env path triggers to test and prod workflows respectively - Add HARBOR_CI_TOKEN login and HARBOR_PULL_TOKEN login before stack deploy in both workflows - Source health-agent/.env before docker stack deploy to expose HEALTH_AGENT_CONFIG_GENERATED_DIR Dockerfile: - Copy config/ and scripts/ into image so setup_uptime_kuma.py can run inside the container setup_uptime_kuma.py: - Load .env and .env.setup automatically via python-dotenv (no manual export needed) - Write uk_tokens.yml to config/generated/ (aligned with container volume mount) Health checks: - PATRONI_HOSTS and VAULT_HOSTS are now configurable via env vars (comma-separated host:port); no code change needed when node count changes - REDIS_SENTINEL_HOSTS now correctly parses host:port format; default updated to redis-sentinel:26379 - Fix NameError in check_patroni_cluster() caused by leftover node variable after loop refactor - Remove verify_ssl=False from Vault check; vault.iklim.co has a valid certificate Ops: - Add ops/build-and-push-health-agent.sh for manual bypass of CI pipeline - Add health-agent/deploy/prod.env template for prod image promotion manifest Project structure: - Move .env.example and .env.setup.example to health-agent/env-example/ (root .gitignore excludes health-agent/.env*) - Add root .gitignore: excludes uk_tokens.yml, __pycache__, .venv, and env files - Remove health-agent/.gitignore (superseded by root .gitignore) [CI: deploy (push) failing after 10s]	2026-06-26 18:45:17 +03:00
Murat ÖZDEMİR	062d3ff90d	docs(health-agent): document --once and --dry-run flags in README	2026-06-26 16:47:53 +03:00
Murat ÖZDEMİR	7ab186b961	feat(health-agent): add --once and --dry-run flags to main.py	2026-06-26 16:43:21 +03:00
Murat ÖZDEMİR	c49616ac10	refactor(workflow): use source_env_file and require_env_file from common-functions-base.sh	2026-06-26 14:01:46 +03:00
Murat ÖZDEMİR	6fc9ff45aa	feat(workflow): add common-functions-base.sh and replace echo with log_message	2026-06-26 13:59:18 +03:00
Murat ÖZDEMİR	28d726d2d8	fix(health-agent): correct Uptime Kuma URLs in example env files	2026-06-25 20:55:58 +03:00
Murat ÖZDEMİR	07a364b2bc	fix(health-agent): correct UK_URL placeholder in .env.setup.example	2026-06-25 20:54:28 +03:00
Murat ÖZDEMİR	208f4768b9	chore(health-agent): switch to uptime-kuma-api-v2, fix .env.setup.example credentials	2026-06-25 20:50:23 +03:00
Murat ÖZDEMİR	21965d4183	fix(workflow): remove unnecessary concurrency block from test monitoring workflow	2026-06-25 19:22:19 +03:00
Murat ÖZDEMİR	72a91072fb	feat(health-agent): add README, workflows, and translate monitors.yml to English - Add health-agent README with architecture, config, and deployment docs - Add deploy-monitoring-test.yml workflow (mirrors prod, test-runner, test storagebox paths) - Add health-agent service to docker-stack-monitoring.yml - Add .env.example with all runtime variables and .gitignore for generated files - Add config/generated/.gitkeep to track empty generated directory - Translate all Turkish group names and status page titles in monitors.yml to English - Remove users.yml.example (Dozzle was removed in previous commit)	2026-06-25 19:20:25 +03:00
Murat ÖZDEMİR	f742bfdd11	feat(health-agent): add monitors.yml with env-aware node IP mapping from Ansible inventory	2026-06-25 18:59:14 +03:00
Murat ÖZDEMİR	a2e8997711	fix(workflow): correct file paths for standalone repo context. Paths filter and stack/swag references used the Environment_Monitoring/ prefix, which only makes sense in the main repo context. Since this workflow runs inside the Environment_Monitoring repo itself, all paths are relative to the repo root.	2026-06-25 17:19:55 +03:00
Murat ÖZDEMİR	735d957dfa	feat(monitoring): replace Dozzle with full observability stack Replace the single-purpose Dozzle log viewer with a comprehensive monitoring stack covering metrics, container telemetry, and persistent log aggregation. Stack changes (docker-stack-service.yml -> docker-stack-monitoring.yml): - remove Dozzle service and dozzle_users Docker secret - add Portainer CE + portainer-agent (Swarm management UI) - add node-exporter (global) — host CPU, memory, disk, network metrics - add cAdvisor (global) — per-container resource usage metrics - add Loki (replicated, service node) — persistent log storage, 31-day retention - add Promtail (global) — Docker service discovery; ships logs with service, stack, container, and project labels; sends to Loki - rename stack to iklimco-monitoring; add loki-vl persistent volume Workflow (.gitea/workflows/deploy-prod.yml -> deploy-monitoring-prod.yml): - rename file and add paths filter (Environment_Monitoring/*) - remove Dozzle secret creation and auth handling - add IMAGE_LOKI / IMAGE_PROMTAIL; clean up legacy dozzle_users Docker secret - update SWAG step to loop swag/site-confs/.conf.tpl (portainer only) - remove DOZZLE_SUBDOMAIN; remove dozzle DNS record; keep portainer DNS - replace "Wait for Dozzle" with "Wait for Loki" SWAG: - remove swag/dozzle.conf.tpl (Dozzle no longer in stack) - add swag/site-confs/portainer.conf.tpl (moved from main repo template dir; monitoring stack manages its own SWAG configs independently) - remove init/apisix-dozzle.sh (superseded by SWAG reverse proxy) README: - rewrite in Turkish; document Portainer, node-exporter, cAdvisor, Loki, Promtail - add Grafana log viewing guide: datasource setup, label filter table, LogQL examples, metric-log correlation workflow, adding log panels to dashboards Requires IMAGE_LOKI and IMAGE_PROMTAIL to be defined in .env and corresponding custom images (build/loki/, build/promtail/) pushed to Harbor.	2026-06-24 21:21:02 +03:00
Murat ÖZDEMİR	94dc1d2fe3	add docker-stack-service.yml, init scripts, and configuration files	2026-06-18 19:19:12 +03:00
Murat ÖZDEMİR	446e761eb2	first commit	2026-06-18 19:18:31 +03:00
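
Commit d51c073556 renames the `msg` key to `push_msg` in logger `extra` dicts because `msg` collides with a field Python's `LogRecord` already owns. The sketch below reproduces that collision in isolation; it is not the repository's uptime_kuma.py. The logger name and the `log_push` helper are hypothetical. Note that current CPython reports the collision as a `KeyError`, so that is what the sketch catches.

```python
import logging

# Hypothetical logger; the NullHandler keeps the sketch silent when run.
logger = logging.getLogger("health_agent_sketch")
logger.addHandler(logging.NullHandler())
logger.propagate = False


def log_push(monitor: str, text: str, key: str = "push_msg") -> bool:
    """Log a push result; return False if `extra` collides with a LogRecord field."""
    try:
        # Every key in `extra` is copied onto the LogRecord. Names the record
        # already uses (msg, args, name, levelname, ...) are rejected.
        logger.warning("push sent", extra={"monitor": monitor, key: text})
        return True
    except KeyError:
        # A broad `except Exception` around the push would hide this and
        # make a successful push look like a failure.
        return False
```

Calling `log_push("Vault", "OK")` succeeds, while `log_push("Vault", "OK", key="msg")` hits the reserved name and returns `False`.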

45 Commits
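
Commits 58d5c24f41 and b49ca276f0 describe PATRONI_HOSTS and VAULT_HOSTS as comma-separated host:port environment variables. A minimal sketch of such parsing follows, assuming a hypothetical `parse_hosts` helper and a fallback port for entries without one; neither is confirmed by the log, and the actual http.py may behave differently.

```python
def parse_hosts(raw: str, default_port: int) -> list[tuple[str, int]]:
    """Split 'host:port,host:port' into (host, port) pairs.

    Entries without a port fall back to default_port (an assumption made
    for this sketch). Blank entries are skipped.
    """
    hosts: list[tuple[str, int]] = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        # rpartition splits on the last colon, so 'vault-1:8200' yields
        # ('vault-1', ':', '8200') and a bare 'vault' yields ('', '', 'vault').
        host, sep, port = item.rpartition(":")
        if sep:
            hosts.append((host, int(port)))
        else:
            hosts.append((item, default_port))
    return hosts
```

With this shape, changing the node count only requires editing the environment variable, for example `parse_hosts(os.environ.get("VAULT_HOSTS", "vault"), 8200)` after importing `os`; the default values here are illustrative.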