- Database nodes now join the Docker Swarm as workers with `role=db` labels, allowing Swarm to manage their dedicated services.
- The `docker-stack-infra.yml` has been updated for production to focus solely on application-level infrastructure components.
- Dedicated database services (PostgreSQL, MongoDB, Patroni-etcd) are now explicitly deployed in separate Swarm stacks on `iklim-db-XX` nodes.
- Standardizes node naming conventions (`iklim-app-XX`, `iklim-db-XX`) across the production roadmap documentation.
- Clarifies that the `etcd` service within `docker-stack-infra.yml` is exclusively for APISIX configuration, distinct from Patroni's etcd cluster.
# 09 — Verification Checklist (Prod)

## Context
Run after a successful prod pipeline deployment.

## 1 — Swarm cluster health

```bash
docker node ls
```
Expected: 3 managers (`Leader` + 2 `Reachable`) for `iklim-app-01/02/03`, 3 workers (`Ready`) for `iklim-db-01/02/03`.

```bash
docker service ls --filter label=project=co.iklim
```
Expected: all services show `REPLICAS X/X` (target met).

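The `X/X` comparison can be scripted rather than read by eye. The following is a minimal sketch, not part of the pipeline: `unmet_replicas` is a hypothetical helper name, and it assumes each input line has the form `NAME CURRENT/DESIRED`.

```bash
# Reads "NAME CURRENT/DESIRED" lines on stdin and prints every service
# whose running replica count differs from the desired count.
# Returns non-zero if at least one such service is found.
unmet_replicas() {
  awk '
    {
      split($2, r, "/")                 # r[1] = running, r[2] = desired
      if (r[1] != r[2]) { print $1 " " $2; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage on a manager node:
#   docker service ls --filter label=project=co.iklim \
#     --format "{{.Name}} {{.Replicas}}" | unmet_replicas
```

No output and exit status 0 mean every listed service has met its target.
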
## 2 — SWAG cert is valid

```bash
docker exec $(docker ps -q -f name=iklimco_swag) certbot certificates
```
Expected: `*.iklim.co`, `VALID: XX days` (Let's Encrypt, not the old manual cert).

TLS check from outside:
```bash
echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
  | openssl x509 -noout -subject -dates
```
Expected: `CN=*.iklim.co` and a `notAfter` date later than 2026-07-15 (the Let's Encrypt cert is being served, not the expiring manual one).

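The `notAfter` comparison can also be automated. This is a sketch under stated assumptions: `expires_after` is a hypothetical helper, it expects the `openssl x509 -noout -dates` output on stdin, and it relies on GNU `date -d` to parse the timestamp.

```bash
# Succeeds if the certificate's notAfter date is later than the cutoff.
# $1: cutoff date, e.g. 2026-07-15. Assumes GNU date for parsing.
expires_after() {
  local cutoff="$1" not_after
  not_after=$(sed -n 's/^notAfter=//p')
  [ -n "$not_after" ] || return 2      # no notAfter line found on stdin
  [ "$(date -u -d "$not_after" +%s)" -gt "$(date -u -d "$cutoff" +%s)" ]
}

# Usage:
#   echo | openssl s_client -connect api.iklim.co:443 \
#     -servername api.iklim.co 2>/dev/null \
#     | openssl x509 -noout -dates | expires_after 2026-07-15
```

Exit status 0 means the served cert outlives the old manual one; status 2 means no certificate dates were read at all.
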
## 3 — Public API

```bash
curl -si https://api.iklim.co/health
```
Expected: HTTP 2xx, no TLS errors.

## 4 — IP restriction working

From a non-whitelisted IP:
```bash
curl -si https://grafana.iklim.co
curl -si https://apigw.iklim.co
curl -si https://rabbitmq.iklim.co
```
Expected for all three: HTTP 403.

From a whitelisted IP (78.187.87.109 or 95.70.151.248):
```bash
curl -si https://grafana.iklim.co   # HTTP 200 Grafana
curl -si https://apigw.iklim.co     # HTTP 200 APISIX Dashboard
curl -si https://rabbitmq.iklim.co  # HTTP 200 RabbitMQ Management
```

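Both passes over the three hosts can be wrapped in a small loop. The helper names `http_code` and `check_hosts` below are hypothetical, and the sketch assumes only the status code matters, not the response body.

```bash
# Prints the HTTP status code for a URL ("000" if the connection fails).
http_code() {
  curl -s -o /dev/null --connect-timeout 5 -w '%{http_code}' "$1" || true
}

# check_hosts EXPECTED HOST...
# Compares each host's status code with EXPECTED; non-zero on any mismatch.
check_hosts() {
  local expected="$1" host code rc=0
  shift
  for host in "$@"; do
    code=$(http_code "https://$host")
    if [ "$code" = "$expected" ]; then
      echo "ok   $host $code"
    else
      echo "FAIL $host $code (expected $expected)"
      rc=1
    fi
  done
  return "$rc"
}

# Usage from a non-whitelisted IP:
#   check_hosts 403 grafana.iklim.co apigw.iklim.co rabbitmq.iklim.co
# Usage from a whitelisted IP:
#   check_hosts 200 grafana.iklim.co apigw.iklim.co rabbitmq.iklim.co
```
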
## 5 — Vault not reachable externally

```bash
# From outside — must fail
curl -sk --connect-timeout 5 https://<iklim-app-01-public-ip>:8200/v1/sys/health
# Expected: connection refused or timeout
```

```bash
# From inside overlay — must succeed
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
  curl -sk https://vault.iklim.co:8200/v1/sys/health
# Expected: {"sealed":false,...}
```

## 6 — cert-reloader watching

```bash
docker service logs iklimco_cert-reloader --tail 5
```
Expected: `[cert-reloader] started`, no errors.

## 7 — No unexpected published ports

```bash
docker service ls --format "{{.Name}}\t{{.Ports}}" \
  --filter label=project=co.iklim
```
Only `iklimco_swag` should show `*:80->80/tcp, *:443->443/tcp`.

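To flag violations automatically, the tab-separated listing can be filtered. This is a minimal sketch: `unexpected_ports` is a hypothetical helper, and it assumes the `Name<TAB>Ports` format produced by the command above.

```bash
# Reads "NAME<TAB>PORTS" lines on stdin and prints every service other
# than iklimco_swag that publishes at least one port.
# Returns non-zero if any such service is found.
unexpected_ports() {
  awk -F '\t' '
    $1 != "iklimco_swag" && $2 != "" { print; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage:
#   docker service ls --format "{{.Name}}\t{{.Ports}}" \
#     --filter label=project=co.iklim | unexpected_ports
```

The sketch only checks which services publish ports; confirming that `iklimco_swag` exposes exactly 80 and 443 still needs a look at its own line.
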
## 8 — DB nodes running correct services

```bash
# Patroni (PostgreSQL HA) stack
docker stack services iklim-patroni
docker service ps iklim-patroni_patroni-01
docker service ps iklim-patroni_patroni-02
docker service ps iklim-patroni_patroni-03

# etcd cluster (for Patroni)
docker stack services iklim-db-etcd

# MongoDB replica set
docker stack services iklim-db
docker service ps iklim-db_mongodb
```

All tasks should show node names matching `iklim-db-01`, `iklim-db-02`, or `iklim-db-03`, consistent with the `role=db` placement constraint.

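The node-name check can be reduced to a filter over the `NODE` column. The sketch below uses a hypothetical helper, `all_on_db_nodes`, and assumes exactly the three node names listed above.

```bash
# Reads one node name per line on stdin, prints any name that is not
# iklim-db-01, iklim-db-02, or iklim-db-03, and returns non-zero if
# at least one such name is found.
all_on_db_nodes() {
  ! grep -Ev '^iklim-db-0[1-3]$'
}

# Usage:
#   docker service ps iklim-db_mongodb \
#     --filter desired-state=running --format '{{.Node}}' | all_on_db_nodes
#
# The label itself can be inspected per node:
#   docker node inspect --format '{{ index .Spec.Labels "role" }}' iklim-db-01
```
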
## 9 — APISIX replicas

```bash
docker service ps iklimco_apisix
```
Expected: 2 tasks, both `Running`, on different nodes.

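The "different nodes" condition is easy to overlook in the table output. A minimal sketch of the check, assuming one node name per running task on stdin (`spread_ok` is a hypothetical helper name):

```bash
# spread_ok WANT
# Succeeds only if stdin lists exactly WANT tasks and each task runs on
# a distinct node.
spread_ok() {
  awk -v want="$1" '
    NF { total++; if (!seen[$1]++) distinct++ }
    END { exit ((total == want && distinct == want) ? 0 : 1) }
  '
}

# Usage:
#   docker service ps iklimco_apisix \
#     --filter desired-state=running --format '{{.Node}}' | spread_ok 2
```
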
## 10 — fail2ban active

```bash
docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status
```
Expected: multiple jails listed.

## 11 — Microservice health (post-deploy)

After microservices are deployed (separate pipeline), verify via the public API:
```bash
# Quote the URL so the shell does not treat "&" as a background operator
curl -si "https://api.iklim.co/v1/weather/current?lat=39&lon=35"
```
Expected: valid JSON weather response.

## ⚠️ Old cert expiry reminder
The manually managed `*.iklim.co` cert expires **2026-07-15**.
SWAG's Let's Encrypt cert auto-renews every ~60 days.
Once the first SWAG cert is confirmed valid, the manual cert in storagebox is no longer used
and can be archived.