01 — Docker Swarm Init (Prod — Multi-Node)
Context
- Repo: `iklim.coroot`
- Environment: prod
- Topology:
  - 3 × app nodes (`iklim-app-01/02/03`): all act as Swarm managers AND app workers (Raft quorum: 1 can fail)
  - 3 × DB nodes (`iklim-db-01/02/03`): join Swarm as workers with the `role=db` label; DB services are placed exclusively on them
- Sizing: app nodes are `cpx42`, DB nodes are `cpx32`; see `../../hetzner-sizing-report.md`
- All 6 nodes are in the same private network.
- Pipeline trigger: push to `prod-env` branch → Gitea runner on `prod-runner` (first app node).
- App Swarm managers: all 3 app nodes are manager-eligible and carry app workloads (no dedicated worker-only app nodes).
Node labeling plan
| Node | Role | Swarm role | Labels |
|---|---|---|---|
| `iklim-app-01` | API services, SWAG, Vault | Manager + Worker | `type=service` |
| `iklim-app-02` | API services replicas | Manager + Worker | `type=service` |
| `iklim-app-03` | API services replicas | Manager + Worker | `type=service` |
| `iklim-db-01` | PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-02` | PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-03` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |
Step 1 — Init Swarm on iklim-app-01 (the prod-runner node)
MANAGER_IP=$(hostname -I | awk '{print $1}')
# -x matches the whole line, so the state "inactive" is not mistaken for "active"
if ! docker info --format '{{.Swarm.LocalNodeState}}' | grep -qx "active"; then
  docker swarm init --advertise-addr "$MANAGER_IP"
  echo "Swarm initialized on $MANAGER_IP"
else
  echo "Swarm already active"
fi
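
`hostname -I` prints every address configured on the host, and its first entry is not guaranteed to be the private one. A minimal sketch of selecting the advertise address by subnet prefix is shown here; `pick_private_ip` is a hypothetical helper, and the `10.20.10.` prefix is an assumption inferred from the join address used in the later steps.

```shell
# Hypothetical helper: print the first address that starts with the given prefix.
# Usage: pick_private_ip PREFIX ADDR [ADDR...]
pick_private_ip() {
  prefix="$1"
  shift
  for addr in "$@"; do
    case "$addr" in
      "$prefix"*)
        printf '%s\n' "$addr"
        return 0
        ;;
    esac
  done
  # No address matched the prefix.
  return 1
}

# Possible use on iklim-app-01 (the subnet prefix is an assumption):
#   MANAGER_IP=$(pick_private_ip "10.20.10." $(hostname -I))
```

If no address matches, the function returns non-zero, so a calling script can stop before running `docker swarm init` with an empty address.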
Step 2 — Get manager join token
docker swarm join-token manager # for iklim-app-02, iklim-app-03
Save this token — needed on iklim-app-02 and iklim-app-03.
Step 3 — Join iklim-app-02 and iklim-app-03 as managers
SSH into iklim-app-02 and iklim-app-03, run:
docker swarm join --token <MANAGER_TOKEN> 10.20.10.11:2377
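
The join can be guarded the same way as the init in Step 1, so that re-running it on a node that is already a member does nothing. The following is a sketch only; `join_swarm_if_needed` is a hypothetical name, and the token and address are passed in by the caller.

```shell
# Hypothetical helper: join the swarm only if this node is not already a member.
# Usage: join_swarm_if_needed TOKEN MANAGER_ADDR
join_swarm_if_needed() {
  token="$1"
  manager_addr="$2"
  state=$(docker info --format '{{.Swarm.LocalNodeState}}')
  # Exact comparison: "inactive" must not be treated as "active".
  if [ "$state" = "active" ]; then
    echo "already in a swarm"
  else
    docker swarm join --token "$token" "$manager_addr"
  fi
}

# Possible use on iklim-app-02 or iklim-app-03:
#   join_swarm_if_needed "<MANAGER_TOKEN>" 10.20.10.11:2377
```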
Step 4 — Label app nodes
On iklim-app-01, after iklim-app-02 and iklim-app-03 have joined:
for node in iklim-app-01 iklim-app-02 iklim-app-03; do
docker node update --label-add type=service "$node"
done
Step 5 — Join DB nodes as Swarm workers
Get the worker join token on iklim-app-01:
docker swarm join-token worker
SSH into each DB node and join:
docker swarm join --token <WORKER_TOKEN> 10.20.10.11:2377
Then label them on iklim-app-01:
for node in iklim-db-01 iklim-db-02 iklim-db-03; do
docker node update --label-add role=db "$node"
done
DB nodes are Swarm workers only; they never become managers. DB services are pinned to them via the
`node.labels.role == db` placement constraint. See `08-prod-db-cluster-kurulum.md` for DB stack deployment.
Step 6 — Verify
docker node ls
Expected: 6 nodes — 3 with MANAGER STATUS = Leader or Reachable, 3 workers with Ready.
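
The expected node counts can also be checked mechanically. This sketch assumes the node list is produced with `--format '{{.Hostname}} {{.Status}} {{.ManagerStatus}}'`; `check_swarm_nodes` is a hypothetical helper that reads that output on stdin.

```shell
# Hypothetical helper: read "HOSTNAME STATUS MANAGER_STATUS" lines on stdin and
# succeed only when 3 Ready managers and 3 Ready workers are present.
check_swarm_nodes() {
  awk '
    $2 == "Ready" && ($3 == "Leader" || $3 == "Reachable") { managers++ }
    $2 == "Ready" && $3 == ""                              { workers++ }
    END {
      printf "managers=%d workers=%d\n", managers, workers
      exit !(managers == 3 && workers == 3)
    }'
}

# Possible use on iklim-app-01:
#   docker node ls --format '{{.Hostname}} {{.Status}} {{.ManagerStatus}}' | check_swarm_nodes
```

Workers have an empty manager status, which is why the third field is compared against the empty string.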
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
Expected: map[type:service] for app nodes, map[role:db] for DB nodes.
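
To check a single label without reading the whole map, the Go template `index` function can be used in the `--format` string. The helper below is a sketch; `node_has_label` is a hypothetical name.

```shell
# Hypothetical helper: succeed only if NODE carries label KEY with value WANT.
# Usage: node_has_label NODE KEY WANT
node_has_label() {
  node="$1"
  key="$2"
  want="$3"
  # A missing label yields an empty string, which fails the comparison below.
  got=$(docker node inspect "$node" --format "{{index .Spec.Labels \"$key\"}}")
  [ "$got" = "$want" ]
}

# Possible use on iklim-app-01:
#   for n in iklim-db-01 iklim-db-02 iklim-db-03; do
#     node_has_label "$n" role db || echo "label missing on $n"
#   done
```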
Step 7 — Confirm init/swarm-init.sh multi-node awareness
The script is idempotent (skips init if already active). Verify:
grep -n "swarm init\|swarm join" init/swarm-init.sh
The prod pipeline runs on iklim-app-01 only. iklim-app-02/03 are joined via Ansible (swarm role),
not via the Gitea pipeline.
Placement constraints used in docker-stack-infra.yml
| Constraint | Resolves to |
|---|---|
| `node.role == manager` | iklim-app-01, iklim-app-02, iklim-app-03 |
| `node.labels.type == service` | iklim-app-01, iklim-app-02, iklim-app-03 |
| `node.labels.role == db` | iklim-db-01, iklim-db-02, iklim-db-03 |
SWAG, Vault, cert-reloader: pinned to node.role == manager.
Microservices: no constraint (distributed across all app nodes by Swarm scheduler).
DB services (Patroni, etcd, MongoDB): pinned to node.labels.role == db in separate DB stacks.
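
Which nodes a label constraint would resolve to can be cross-checked from a manager with the `node.label` filter of `docker node ls`. A small sketch follows; `nodes_with_label` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: list hostnames of nodes that carry the given swarm label.
# Usage: nodes_with_label KEY=VALUE
nodes_with_label() {
  docker node ls --filter "node.label=$1" --format '{{.Hostname}}'
}

# Possible use on iklim-app-01:
#   nodes_with_label role=db        # nodes matching node.labels.role == db
#   nodes_with_label type=service   # nodes matching node.labels.type == service
```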