- Database nodes now join the Docker Swarm as workers with `role=db` labels, allowing Swarm to manage their dedicated services.
- The `docker-stack-infra.yml` has been updated for production to focus solely on application-level infrastructure components.
- Dedicated database services (PostgreSQL, MongoDB, Patroni-etcd) are now explicitly deployed in separate Swarm stacks on `iklim-db-XX` nodes.
- Standardizes node naming conventions (`iklim-app-XX`, `iklim-db-XX`) across the production roadmap documentation.
- Clarifies that the `etcd` service within `docker-stack-infra.yml` is exclusively for APISIX configuration, distinct from Patroni's etcd cluster.
09 — Verification Checklist (Prod)
Context
Run this checklist after a successful prod pipeline deployment.
1 — Swarm cluster health
docker node ls
Expected: 3 managers (Leader + 2 Reachable) for iklim-app-01/02/03, 3 workers (Ready) for iklim-db-01/02/03.
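The expected node layout can also be checked mechanically. The following is a minimal sketch, not part of the pipeline: `summarize_nodes` is a hypothetical helper that assumes tab-separated `docker node ls` output (hostname, status, manager status) on stdin and counts only nodes in the `Ready` state.

```shell
# Summarize `docker node ls` output read from stdin.
# Assumed input lines: HOSTNAME<TAB>STATUS<TAB>MANAGER_STATUS
# (MANAGER_STATUS is empty for workers). Nodes that are not Ready are ignored.
summarize_nodes() {
  awk -F '\t' '
    $2 != "Ready" { next }
    $1 ~ /^iklim-app-/ && ($3 == "Leader" || $3 == "Reachable") {
      managers++
      if ($3 == "Leader") leaders++
    }
    $1 ~ /^iklim-db-/ && $3 == "" { workers++ }
    END { printf "managers=%d leaders=%d workers=%d\n", managers, leaders, workers }
  '
}

# Usage on a manager node:
#   docker node ls --format '{{.Hostname}}\t{{.Status}}\t{{.ManagerStatus}}' | summarize_nodes
```

A healthy cluster as described above should report `managers=3 leaders=1 workers=3`.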
docker service ls --filter label=project=co.iklim
All services show REPLICAS X/X (target met).
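To avoid reading the REPLICAS column by eye, a small filter can list only the services that miss their target. This is a sketch under the assumption that the input has one `NAME REPLICAS` pair per line; `unmet_replicas` is a hypothetical helper name.

```shell
# Print every service whose running replica count differs from its target.
# Assumed input lines: NAME REPLICAS, where REPLICAS looks like "2/2".
unmet_replicas() {
  awk '{
    split($2, r, "/")          # r[1] = running, r[2] = desired
    if (r[1] != r[2]) print $1, $2
  }'
}

# Usage:
#   docker service ls --filter label=project=co.iklim \
#     --format '{{.Name}} {{.Replicas}}' | unmet_replicas
```

Empty output means every listed service has met its replica target.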
2 — SWAG cert is valid
docker exec $(docker ps -q -f name=iklimco_swag) certbot certificates
Expected: *.iklim.co, VALID: XX days (Let's Encrypt, not the old manual cert).
TLS check from outside:
echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
| openssl x509 -noout -subject -issuer -dates
Expected: subject CN=*.iklim.co, issuer Let's Encrypt, notAfter later than 2026-07-15 (the Let's Encrypt cert is being served, not the old manual cert that expires on that date).
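The date comparison can be scripted rather than done by eye. The sketch below assumes GNU `date` (for the `-d` option); `expires_after` is a hypothetical helper that takes a `notAfter=` line as printed by `openssl x509` and a cutoff date.

```shell
# Succeed if the certificate's notAfter date is later than a cutoff date.
# $1: a line such as "notAfter=Oct  1 12:00:00 2026 GMT"
# $2: cutoff date, e.g. 2026-07-15
# Assumes GNU date; returns 2 if either date cannot be parsed.
expires_after() {
  not_after=${1#notAfter=}
  end_epoch=$(date -u -d "$not_after" +%s) || return 2
  cutoff_epoch=$(date -u -d "$2" +%s) || return 2
  [ "$end_epoch" -gt "$cutoff_epoch" ]
}

# Usage:
#   line=$(echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
#     | openssl x509 -noout -enddate)
#   expires_after "$line" 2026-07-15 && echo "cert outlives the old manual cert"
```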
3 — Public API
curl -si https://api.iklim.co/health
Expected: HTTP 2xx, no TLS errors.
4 — IP restriction working
From a non-whitelisted IP:
curl -si https://grafana.iklim.co
curl -si https://apigw.iklim.co
curl -si https://rabbitmq.iklim.co
Expected: HTTP 403 for all three.
From a whitelisted IP (78.187.87.109 or 95.70.151.248):
curl -si https://grafana.iklim.co # HTTP 200 Grafana
curl -si https://apigw.iklim.co # HTTP 200 APISIX Dashboard
curl -si https://rabbitmq.iklim.co # HTTP 200 RabbitMQ Management
5 — Vault not reachable externally
# From outside — must fail
curl -sk --connect-timeout 5 https://<iklim-app-01-public-ip>:8200/v1/sys/health
# Expected: connection refused or timeout
# From inside overlay — must succeed
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
curl -sk https://vault.iklim.co:8200/v1/sys/health
# Expected: {"sealed":false,...}
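The inside-overlay check can be turned into a pass/fail test on the `sealed` field. This is a minimal sketch that uses a plain `grep` on the JSON body instead of a JSON parser; `vault_unsealed` is a hypothetical helper name.

```shell
# Succeed if the Vault health JSON on stdin reports "sealed": false.
vault_unsealed() {
  grep -Eq '"sealed"[[:space:]]*:[[:space:]]*false'
}

# Usage:
#   docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
#     curl -sk https://vault.iklim.co:8200/v1/sys/health | vault_unsealed \
#     && echo "vault unsealed" || echo "vault sealed or unreachable"
```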
6 — cert-reloader watching
docker service logs iklimco_cert-reloader --tail 5
Expected: [cert-reloader] started, no errors.
7 — No unexpected published ports
docker service ls --format "{{.Name}}\t{{.Ports}}" \
--filter label=project=co.iklim
Only iklimco_swag should show *:80->80/tcp, *:443->443/tcp.
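The same rule can be expressed as a filter over the tab-separated output of the command above. This is a sketch; `unexpected_ports` is a hypothetical helper, and the extra published port in the usage note is invented purely for illustration.

```shell
# Print services, other than the allowed one, that publish any port.
# Assumed input lines: NAME<TAB>PORTS (PORTS is empty when nothing is published).
# $1: the only service allowed to publish ports (default: iklimco_swag).
unexpected_ports() {
  awk -F '\t' -v allowed="${1:-iklimco_swag}" \
    '$2 != "" && $1 != allowed { print $1 ": " $2 }'
}

# Usage:
#   docker service ls --format "{{.Name}}\t{{.Ports}}" \
#     --filter label=project=co.iklim | unexpected_ports
```

Empty output means only `iklimco_swag` publishes ports. A line such as `iklimco_cert-reloader: *:9000->9000/tcp` would indicate an unexpected exposure.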
8 — DB nodes running correct services
# Patroni (PostgreSQL HA) stack
docker stack services iklim-patroni
docker service ps iklim-patroni_patroni-01
docker service ps iklim-patroni_patroni-02
docker service ps iklim-patroni_patroni-03
# etcd cluster (for Patroni)
docker stack services iklim-db-etcd
# MongoDB replica set
docker stack services iklim-db
docker service ps iklim-db_mongodb
All tasks should run on iklim-db-01, iklim-db-02, or iklim-db-03, consistent with the role=db placement constraint.
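Placement can be verified by filtering task listings for anything scheduled outside the DB nodes. The sketch below assumes one `TASK_NAME NODE` pair per line; `tasks_off_db_nodes` is a hypothetical helper name.

```shell
# Print tasks scheduled on a node other than iklim-db-01, -02, or -03.
# Assumed input lines: TASK_NAME NODE
tasks_off_db_nodes() {
  awk '$2 !~ /^iklim-db-0[1-3]$/ { print $1 " on " $2 }'
}

# Usage (repeat for each database service):
#   docker service ps iklim-db_mongodb --filter desired-state=running \
#     --format '{{.Name}} {{.Node}}' | tasks_off_db_nodes
```

Empty output means every running task of that service sits on a DB node.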
9 — APISIX replicas
docker service ps iklimco_apisix
Expected: 2 tasks, both Running, on different nodes.
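Both conditions (task count and distinct nodes) can be checked in one pass. This is a sketch, assuming each input line starts with the node name followed by the current state; `spread_ok` is a hypothetical helper name.

```shell
# Succeed if exactly N tasks are Running and they sit on N distinct nodes.
# Assumed input lines: NODE STATE..., e.g. "iklim-app-01 Running 2 hours ago".
# $1: expected replica count (default: 2).
spread_ok() {
  awk -v want="${1:-2}" '
    $2 == "Running" { tasks++; if (!seen[$1]++) nodes++ }
    END { exit !(tasks == want && nodes == want) }
  '
}

# Usage:
#   docker service ps iklimco_apisix --filter desired-state=running \
#     --format '{{.Node}} {{.CurrentState}}' | spread_ok 2 \
#     && echo "APISIX spread OK" || echo "APISIX spread NOT OK"
```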
10 — fail2ban active
docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status
Expected: multiple jails listed.
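"Multiple jails" can be made concrete by extracting the jail count. The sketch below assumes the usual `Number of jail:` line in `fail2ban-client status` output; `jail_count` is a hypothetical helper name.

```shell
# Print the jail count from `fail2ban-client status` output on stdin.
# Assumes a line of the form "|- Number of jail:<TAB>N".
jail_count() {
  awk -F ':' '/Number of jail/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Usage:
#   n=$(docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status | jail_count)
#   [ "${n:-0}" -gt 1 ] && echo "fail2ban active with $n jails"
```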
11 — Microservice health (post-deploy)
After microservices are deployed (separate pipeline), verify via the public API:
curl -si "https://api.iklim.co/v1/weather/current?lat=39&lon=35"
Expected: valid JSON weather response.
⚠️ Old cert expiry reminder
The manually managed *.iklim.co cert expires on 2026-07-15.
SWAG's Let's Encrypt cert auto-renews every ~60 days.
Once the first SWAG cert is confirmed valid, the manual cert in storagebox is no longer used
and can be archived.