This commit introduces a reordered and renumbered set of setup documentation files to better reflect the deployment stages for both test and production environments. Key changes include:

* A new `setup-vs-roadmap-map.md` file to provide a clear mapping between roadmap tasks and their corresponding setup phases.
* Significantly expanded Ansible bootstrap documentation for both test and production, detailing Docker, Swarm, security hardening, and StorageBox SSH key management roles.
* Formalized database Docker and Swarm cluster setup instructions for test and production, including explicit steps for Swarm worker integration of DB nodes.
* Updated roadmap documentation (`roadmap/prod-env/*`) to align with the refined setup, incorporating correct private IP addresses for Swarm joins, new node labels, and floating IP usage for GoDaddy DNS records.

# 09 — Verification Checklist (Prod)

## Context
Run after a successful prod pipeline deployment.

## 1 — Swarm cluster health

```bash
docker node ls
```
Expected: 3 managers (`Leader` + 2 `Reachable`) for `iklim-app-01/02/03`, 3 workers (`Ready`) for `iklim-db-01/02/03`.

```bash
docker service ls --filter label=project=co.iklim
```
Expected: every service shows `REPLICAS X/X` (running count equals the target).

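The `X/X` comparison can be automated rather than checked by eye. The sketch below is one possible helper, not part of the pipeline: the function name `unmet_replicas` is hypothetical, and it assumes its input is one `NAME REPLICAS` pair per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "NAME REPLICAS" lines on stdin
# (for example "iklimco_swag 1/1") and prints every service whose
# running count differs from its target. Prints nothing when all match.
unmet_replicas() {
  awk '{
    split($2, r, "/")              # r[1] = running, r[2] = desired
    if (r[1] != r[2]) print $1, $2
  }'
}
```

To use it, pipe in `docker service ls --filter label=project=co.iklim --format '{{.Name}} {{.Replicas}}'`; empty output means every target is met.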
## 2 — SWAG cert is valid

```bash
docker exec $(docker ps -q -f name=iklimco_swag) certbot certificates
```
Expected: `*.iklim.co`, `VALID: XX days` (Let's Encrypt, not the old manual cert).

TLS check from outside:
```bash
echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
  | openssl x509 -noout -subject -dates
```
Expected: `CN=*.iklim.co` and a `notAfter` later than 2026-07-15 (the Let's Encrypt cert, not the expiring manual one).

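The expiry date alone does not prove which CA issued the cert. A small sketch of an issuer check follows; the function name `issued_by_lets_encrypt` is hypothetical, and it assumes the issuer line contains the organization name `Let's Encrypt`, as `openssl x509 -noout -issuer` prints for Let's Encrypt intermediates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the issuer line read from
# stdin names Let's Encrypt, and fails otherwise.
issued_by_lets_encrypt() {
  grep -q "Let's Encrypt"
}
```

To use it, run the same `openssl s_client` pipeline as above but end it with `openssl x509 -noout -issuer | issued_by_lets_encrypt && echo ok`.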
## 3 — Public API

```bash
curl -si https://api.iklim.co/health
```
Expected: HTTP 2xx, no TLS errors.

## 4 — IP restriction working

From a non-whitelisted IP:
```bash
curl -si https://grafana.iklim.co
curl -si https://apigw.iklim.co
curl -si https://rabbitmq.iklim.co
```
Expected: HTTP 403 for all three.

From a whitelisted IP (78.187.87.109 or 95.70.151.248):
```bash
curl -si https://grafana.iklim.co    # HTTP 200 Grafana
curl -si https://apigw.iklim.co      # HTTP 200 APISIX Dashboard
curl -si https://rabbitmq.iklim.co   # HTTP 200 RabbitMQ Management
```

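When running these checks repeatedly, it can help to reduce each `curl -si` response to its status code. This is a minimal sketch with a hypothetical function name `http_status`; it assumes the first line of the input is the HTTP status line, which holds for `curl -si` without `-L`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the status code from the first line of
# `curl -si` output read on stdin. Carriage returns are stripped first
# because HTTP header lines end in CRLF.
http_status() {
  tr -d '\r' | awk 'NR == 1 { print $2 }'
}
```

For example, `curl -si https://grafana.iklim.co | http_status` should print `403` from a non-whitelisted IP and `200` from a whitelisted one.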
## 5 — Vault not reachable externally

```bash
# From outside — must fail
curl -sk --connect-timeout 5 https://<iklim-app-01-public-ip>:8200/v1/sys/health
# Expected: connection refused or timeout
```

```bash
# From inside overlay — must succeed
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
  curl -sk https://vault.iklim.co:8200/v1/sys/health
# Expected: {"sealed":false,...}
```

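The inside-overlay check can be turned into a pass/fail result by testing the `sealed` field of the health response. The sketch below uses a hypothetical function name `vault_unsealed` and a plain pattern match instead of a JSON parser, so it assumes the response is the flat JSON object shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the JSON read from stdin
# contains "sealed": false, and fails otherwise.
vault_unsealed() {
  grep -Eq '"sealed"[[:space:]]*:[[:space:]]*false'
}
```

To use it, append `| vault_unsealed && echo unsealed` to the `docker exec ... curl` command that runs inside the overlay network.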
## 6 — cert-reloader watching

```bash
docker service logs iklimco_cert-reloader --tail 5
```
Expected: `[cert-reloader] started`, no errors.

## 7 — No unexpected published ports

```bash
docker service ls --format "{{.Name}}\t{{.Ports}}" \
  --filter label=project=co.iklim
```
Only `iklimco_swag` should show `*:80->80/tcp, *:443->443/tcp`.

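The port listing above can be filtered so that only violations are printed. This is a sketch under the assumption that the input is the tab-separated `NAME<TAB>PORTS` output of the command above; the function name `unexpected_ports` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads tab-separated "NAME<TAB>PORTS" lines on
# stdin and prints every service, other than the allowed one, that
# publishes at least one port. The allowed service defaults to
# iklimco_swag and can be overridden with the first argument.
unexpected_ports() {
  local allowed="${1:-iklimco_swag}"
  awk -F '\t' -v allowed="$allowed" \
    '$2 != "" && $1 != allowed { print $1 }'
}
```

Piping the `docker service ls` command above into `unexpected_ports` should produce no output on a correctly configured cluster.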
## 8 — DB nodes running correct services

```bash
# Patroni (PostgreSQL HA) stack
docker stack services iklim-patroni
docker service ps iklim-patroni_patroni-01
docker service ps iklim-patroni_patroni-02
docker service ps iklim-patroni_patroni-03

# etcd cluster (for Patroni)
docker stack services iklim-etcd

# MongoDB replica set
docker stack services iklim-db
docker service ps iklim-db_mongodb-01
docker service ps iklim-db_mongodb-02
docker service ps iklim-db_mongodb-03
```

All tasks should be running on `iklim-db-01`, `iklim-db-02`, or `iklim-db-03`, as required by the `role=db` placement constraint.

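Node placement can also be checked mechanically. The following sketch assumes one node name per line on stdin, for example from `docker service ps --filter desired-state=running --format '{{.Node}}' <service>`; the function name `off_db_nodes` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads node names on stdin and prints any that are
# not iklim-db-01, iklim-db-02, or iklim-db-03. `|| true` keeps the exit
# status at 0 when grep finds nothing to print.
off_db_nodes() {
  grep -Ev '^iklim-db-0[1-3]$' || true
}
```

Empty output means every task of that service sits on a DB node.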
## 9 — APISIX replicas

```bash
docker service ps iklimco_apisix
```
Expected: 2 tasks, both `Running`, on different nodes.

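The "on different nodes" condition can be verified by counting the distinct nodes that host a running task. This is a sketch with a hypothetical function name `running_node_count`; it assumes each input line starts with the node name followed by the current state, as produced by `--format '{{.Node}} {{.CurrentState}}'`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "NODE STATE ..." lines on stdin (for example
# "iklim-app-01 Running 3 hours ago") and prints how many distinct nodes
# host at least one task whose state is Running.
running_node_count() {
  awk '$2 == "Running" && !seen[$1]++ { n++ } END { print n + 0 }'
}
```

For example, `docker service ps iklimco_apisix --format '{{.Node}} {{.CurrentState}}' | running_node_count` should print `2` when both replicas run on separate nodes.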
## 10 — fail2ban active

```bash
docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status
```
Expected: multiple jails listed.

## 11 — Microservice health (post-deploy)

After microservices are deployed (separate pipeline), verify via the public API:
```bash
# Quote the URL so the shell does not treat `&` as a background operator
curl -si 'https://api.iklim.co/v1/weather/current?lat=39&lon=35'
```
Expected: valid JSON weather response.

## ⚠️ Old cert expiry reminder
The manually managed `*.iklim.co` cert expires **2026-07-15**.
SWAG's Let's Encrypt cert auto-renews every ~60 days.
Once the first SWAG cert is confirmed valid, the manual cert in the StorageBox is no longer used
and can be archived.