Corrects six documentation files to match the actual deployed pipeline behavior and align test/prod approaches where they share the same code.

prod-env/02-godaddy-credentials.md
- Step 1: correct secret file from .env.secrets.shared to .env.secrets.swag; add clarifying note that .env.secrets.shared holds AppRole/DB secrets and must not be used for GoDaddy credentials.
- Step 4: document that GoDaddy A records are now managed automatically by the pipeline's 'Update DNS Records' step via the GoDaddy API; reference the Gitea variable PROD_FLOATING_IP that must be set once.

prod-env/08-deploy-pipeline-update.md
- Add Step 2 documenting the new 'Update DNS Records' pipeline step (GoDaddy API, idempotent check-before-update, requires jq and vars.PROD_FLOATING_IP).
- Renumber subsequent steps 3-8 to accommodate the new step.
- Fix DB hostnames in Step 7 (Run Database Init Scripts) from iklimco_postgresql/iklimco_mongodb to postgresql/mongodb, matching how Swarm overlay DNS resolves service names inside iklimco-net.
- Update context block: correct DB hostname description, replace outdated storagebox path note with env-var approach, list new steps.
- Update final step order to 24 steps including the DNS step and Release Deploy Lock; mark Wait for etcd as NEW.

prod-env/09-verify.md
- Insert check #2 for the precipitation image directory (/mnt/storagebox/precipitation/images) and iklimco_image-data volume bind mount, mirroring the equivalent check in test-env/08-verify.md.
- Renumber all subsequent checks (3-12) to maintain sequential ordering.

test-env/03-infra-stack-changes.md
- Update SWAG service volume snippet: replace hardcoded paths (swag-vl:/config, /opt/iklimco/swag/dns-conf, /opt/iklimco/swag/site-confs) with env-var forms (${SWAG_CONFIG_DIR:-swag-vl}, ${SWAG_DNS_CONF_DIR:-...}, ${SWAG_SITE_CONFS_DIR:-...}) to match docker-stack-infra.yml.
- Update cert-reloader volume snippet: replace swag-vl and /opt/iklimco/ssl with ${SWAG_CONFIG_DIR:-swag-vl} and ${SWAG_CERT_DIR:-/opt/iklimco/ssl}, enabling StorageBox override in prod without changing the base file.

test-env/04-swag-nginx-configs.md
- Replace RESTRICTED_IP_1/RESTRICTED_IP_2 individual env vars with RESTRICTED_IPS (comma-separated CIDR list) in the required-vars section, matching env-test/.env and the actual pipeline.
- Update all three IP-restricted template examples (apigw, rabbitmq, grafana) from allow ${RESTRICTED_IP_1}; allow ${RESTRICTED_IP_2}; to ${RESTRICTED_IPS_BLOCK}, matching the actual .conf.tpl files in the repo.
- Rewrite the deploy step section to match the real pipeline: docker run alpine for file writing, RESTRICTED_IPS_BLOCK generation via sed, and envsubst with explicit SWAG_VARS filter to protect nginx $upstream_* vars.

test-env/07-deploy-pipeline-update.md
- Step 2 (Prepare SWAG Directories): replace sudo-tee approach with the actual docker-run-alpine method used in deploy-test.yml; add nginx reload block; update notes to reflect RESTRICTED_IPS_BLOCK generation.
- Step 4 (Re-order): correct step numbering to match actual pipeline (21 steps); mark 'Wait for etcd' as already present in pipeline rather than a new addition; add Bootstrap Vault TLS Placeholder which was missing from the documented order.
148 lines
3.8 KiB
Markdown
# 09 — Verification Checklist (Prod)

## Context
Run after a successful prod pipeline deployment.

## 1 — Swarm cluster health

```bash
docker node ls
```
Expected: 3 managers (`Leader` + 2 `Reachable`) for `iklim-app-01/02/03`, 3 workers (`Ready`) for `iklim-db-01/02/03`.

```bash
docker service ls --filter label=project=co.iklim
```
All services show `REPLICAS X/X` (target met).

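With many services, comparing the `REPLICAS` column by eye is error-prone. The sketch below is one possible way to list only the services that have not reached their target; the helper name `unmet_replicas` is hypothetical and not part of the pipeline.

```bash
# Hypothetical helper: reads "<service> <current>/<target>" lines on stdin
# and prints every service whose current replica count differs from its target.
unmet_replicas() {
  awk '{
    split($2, r, "/")            # r[1] = running replicas, r[2] = desired replicas
    if (r[1] != r[2]) print $1
  }'
}

# Usage on a manager node (same label filter as the check above):
#   docker service ls --filter label=project=co.iklim \
#     --format '{{.Name}} {{.Replicas}}' | unmet_replicas
```

Empty output means every listed service has met its replica target.
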
## 2 — Precipitation image directory exists

```bash
ls -ld /mnt/storagebox/precipitation/images
```

Expected: the directory exists. It must be created before `iklimco_precipitation-service` is deployed.

```bash
docker volume inspect iklimco_image-data
```

Expected: `Options.device` is `/mnt/storagebox/precipitation/images`.

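To avoid reading the full JSON, the device can be extracted with a Go template and compared directly. This is a minimal sketch; `volume_device_ok` is a hypothetical helper, and it assumes the volume exposes a `device` key under `Options` as described above.

```bash
# Hypothetical helper: compares the device reported for a volume with the
# expected bind path. Prints OK or a MISMATCH line; returns non-zero on mismatch.
volume_device_ok() {
  local actual="$1" expected="$2"
  if [ "$actual" = "$expected" ]; then
    echo "OK"
  else
    echo "MISMATCH: got '${actual}', want '${expected}'"
    return 1
  fi
}

# Usage:
#   dev="$(docker volume inspect --format '{{ index .Options "device" }}' iklimco_image-data)"
#   volume_device_ok "$dev" /mnt/storagebox/precipitation/images
```
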
## 3 — SWAG cert is valid

```bash
docker exec $(docker ps -q -f name=iklimco_swag) certbot certificates
```
Expected: `*.iklim.co`, `VALID: XX days` (Let's Encrypt, not the old manual cert).

TLS check from outside:
```bash
echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
  | openssl x509 -noout -subject -dates
```
Expected: `CN=*.iklim.co` and a `notAfter` date later than 2026-07-15 (the served cert is the Let's Encrypt one, not the expiring manual cert).

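The date comparison can be scripted rather than done by eye. The sketch below assumes GNU `date` (for `-d`) and takes the `notAfter=...` line printed by `openssl x509 -noout -enddate`; the function name `expires_after` is hypothetical.

```bash
# Hypothetical helper: succeeds when the certificate end date is strictly
# later than the cutoff date.
#   $1: a line such as "notAfter=Oct 13 12:00:00 2026 GMT"
#   $2: cutoff date, e.g. 2026-07-15 (interpreted as midnight UTC)
# Assumes GNU date; returns 2 if either date cannot be parsed.
expires_after() {
  local end="${1#notAfter=}" cutoff="$2"
  local end_s cutoff_s
  end_s="$(date -u -d "$end" +%s)" || return 2
  cutoff_s="$(date -u -d "$cutoff" +%s)" || return 2
  [ "$end_s" -gt "$cutoff_s" ]
}

# Usage:
#   line="$(echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
#     | openssl x509 -noout -enddate)"
#   expires_after "$line" 2026-07-15 && echo "new cert in use"
```
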
## 4 — Public API

```bash
curl -si https://api.iklim.co/health
```
Expected: HTTP 2xx, no TLS errors.

## 5 — IP restriction working

From a non-whitelisted IP:
```bash
curl -si https://grafana.iklim.co
curl -si https://apigw.iklim.co
curl -si https://rabbitmq.iklim.co
```
Expected: HTTP 403 for all three.

From a whitelisted IP (78.187.87.109 or 95.70.151.248):
```bash
curl -si https://grafana.iklim.co   # HTTP 200 Grafana
curl -si https://apigw.iklim.co     # HTTP 200 APISIX Dashboard
curl -si https://rabbitmq.iklim.co  # HTTP 200 RabbitMQ Management
```

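Both runs check the same three hosts against a single expected status, so they can be looped. This is a sketch only; `http_status` and `check_status` are hypothetical helper names, and the 5-second connect timeout is an assumption.

```bash
# Hypothetical helper: prints the HTTP status code for a URL.
# curl reports "000" when the request itself fails (DNS, TLS, timeout).
http_status() {
  curl -s -o /dev/null --connect-timeout 5 -w '%{http_code}' "$1"
}

# check_status <expected-code> <url>...
# Prints one PASS/FAIL line per URL; returns non-zero if any URL failed.
check_status() {
  local expected="$1" rc=0 url code
  shift
  for url in "$@"; do
    code="$(http_status "$url")"
    if [ "$code" = "$expected" ]; then
      echo "PASS ${url} ${code}"
    else
      echo "FAIL ${url} ${code} (expected ${expected})"
      rc=1
    fi
  done
  return "$rc"
}

# Usage from a non-whitelisted IP (use 200 instead of 403 from a whitelisted one):
#   check_status 403 https://grafana.iklim.co https://apigw.iklim.co https://rabbitmq.iklim.co
```
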
## 6 — Vault not reachable externally

```bash
# From outside — must fail
curl -sk --connect-timeout 5 https://<iklim-app-01-public-ip>:8200/v1/sys/health
# Expected: connection refused or timeout
```

```bash
# From inside overlay — must succeed
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
  curl -sk https://vault.iklim.co:8200/v1/sys/health
# Expected: {"sealed":false,...}
```

## 7 — cert-reloader watching

```bash
docker service logs iklimco_cert-reloader --tail 5
```
Expected: `[cert-reloader] started`, no errors.

## 8 — No unexpected published ports

```bash
docker service ls --format "{{.Name}}\t{{.Ports}}" \
  --filter label=project=co.iklim
```
Only `iklimco_swag` should show `*:80->80/tcp, *:443->443/tcp`.

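The rule "only `iklimco_swag` publishes ports" can be turned into a filter that prints offenders. A minimal sketch, with `unexpected_ports` as a hypothetical helper; it assumes service names contain no whitespace and that services without published ports print an empty ports field.

```bash
# Hypothetical helper: reads "<service> <ports...>" lines on stdin and prints
# every service, other than the allowed one, that publishes at least one port.
unexpected_ports() {
  local allowed="$1"
  # NF > 1 means something follows the service name, i.e. a published port.
  awk -v allowed="$allowed" 'NF > 1 && $1 != allowed { print $1 }'
}

# Usage:
#   docker service ls --filter label=project=co.iklim \
#     --format '{{.Name}} {{.Ports}}' | unexpected_ports iklimco_swag
```

Empty output means no service other than `iklimco_swag` publishes a port.
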
## 9 — DB nodes running correct services

```bash
# Patroni (PostgreSQL HA) stack
docker stack services iklim-patroni
docker service ps iklim-patroni_patroni-01
docker service ps iklim-patroni_patroni-02
docker service ps iklim-patroni_patroni-03

# etcd cluster (for Patroni)
docker stack services iklim-etcd

# MongoDB replica set
docker stack services iklim-db
docker service ps iklim-db_mongodb-01
docker service ps iklim-db_mongodb-02
docker service ps iklim-db_mongodb-03
```

All tasks should be scheduled on `iklim-db-01`, `iklim-db-02`, or `iklim-db-03`, as enforced by the placement constraint `role=db`.

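Checking the `NODE` column across six `docker service ps` outputs can be reduced to a filter that prints only misplaced tasks. This is a sketch under the assumption that the DB nodes are named exactly `iklim-db-01` to `iklim-db-03`; `off_db_nodes` is a hypothetical helper.

```bash
# Hypothetical helper: reads "<task> <node>" lines on stdin and prints every
# task that is running on a node other than iklim-db-01, -02, or -03.
off_db_nodes() {
  awk '$2 !~ /^iklim-db-0[1-3]$/ { print $1 " on " $2 }'
}

# Usage (repeat per service, or loop over the service names listed above):
#   docker service ps iklim-db_mongodb-01 --filter desired-state=running \
#     --format '{{.Name}} {{.Node}}' | off_db_nodes
```

Empty output means every running task sits on a DB node.
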
## 10 — APISIX replicas

```bash
docker service ps iklimco_apisix
```
Expected: 3 tasks, all `Running`, on different nodes.

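"On different nodes" can be confirmed by counting distinct node names among the running tasks. A minimal sketch; `distinct_nodes` is a hypothetical helper.

```bash
# Hypothetical helper: reads one node name per line on stdin and prints the
# number of distinct, non-empty names.
distinct_nodes() {
  sort -u | grep -c .
}

# Usage (a result of 3 means the three tasks are spread over three nodes):
#   docker service ps iklimco_apisix --filter desired-state=running \
#     --format '{{.Node}}' | distinct_nodes
```
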
## 11 — fail2ban active

```bash
docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status
```
Expected: multiple jails listed.

## 12 — Microservice health (post-deploy)

After microservices are deployed (separate pipeline), verify via the public API:
```bash
# Quote the URL: an unquoted & would background curl and drop the lon parameter.
curl -si "https://api.iklim.co/v1/weather/current?lat=39&lon=35"
```
Expected: valid JSON weather response.

## ⚠️ Old cert expiry reminder
The manually managed `*.iklim.co` cert expires **2026-07-15**.
SWAG's Let's Encrypt cert auto-renews every ~60 days.
Once the first SWAG cert is confirmed valid, the manual cert in storagebox is no longer used
and can be archived.