Environment_Infrastructure/roadmap/prod-env/01-swarm-init-multinode.md
Murat ÖZDEMİR bf8f011e43 Restructure setup documentation and refine environment bootstrapping
This commit introduces a reordered and renumbered set of setup documentation files to better reflect the deployment stages for both test and production environments.

Key changes include:
*   A new `setup-vs-roadmap-map.md` file to provide a clear mapping between roadmap tasks and their corresponding setup phases.
*   Significantly expanded Ansible bootstrap documentation for both test and production, detailing Docker, Swarm, security hardening, and StorageBox SSH key management roles.
*   Formalized database Docker and Swarm cluster setup instructions for test and production, including explicit steps for Swarm worker integration of DB nodes.
*   Updated roadmap documentation (`roadmap/prod-env/*`) to align with the refined setup, incorporating correct private IP addresses for Swarm joins, new node labels, and floating IP usage for GoDaddy DNS records.
2026-05-11 17:47:30 +03:00


01 — Docker Swarm Init (Prod — Multi-Node)

Context

  • Repo: iklim.co root
  • Environment: prod
  • Topology:
    • 3 × app nodes (iklim-app-01/02/03) — all act as Swarm managers AND app workers (Raft quorum: 1 can fail)
    • 3 × DB nodes (iklim-db-01/02/03) — join Swarm as workers with role=db label; DB services are placed exclusively on them
  • Sizing: app nodes are cpx42, DB nodes are cpx32; see ../../hetzner-sizing-report.md
  • All 6 nodes are in the same private network.
  • Pipeline trigger: push to the prod-env branch → Gitea runner on prod-runner (the first app node, iklim-app-01).
  • App Swarm managers: all 3 app nodes are managers and also carry app workloads (there are no dedicated worker-only app nodes).
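The "1 can fail" figure follows from Raft's majority rule. N managers need floor(N/2) + 1 of them reachable to keep quorum, so they tolerate floor((N - 1)/2) manager failures. The small shell sketch below shows that arithmetic. The function names are illustrative and are not part of the repo.

```bash
# Raft majority arithmetic for a Swarm manager set of size N (illustrative helpers).
# raft_quorum N    -> managers that must stay reachable to keep quorum
# raft_tolerance N -> managers that can fail without losing quorum
raft_quorum()    { echo $(( $1 / 2 + 1 )); }
raft_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 3 5; do
  echo "managers=$n quorum=$(raft_quorum "$n") tolerated_failures=$(raft_tolerance "$n")"
done
```

The same arithmetic shows why a fourth manager would not add fault tolerance. raft_tolerance 4 is still 1, while the quorum rises to 3.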

Node labeling plan

| Node | Role | Swarm role | Labels |
|---|---|---|---|
| iklim-app-01 | API services, SWAG, Vault | Manager + Worker | type=service |
| iklim-app-02 | API services replicas | Manager + Worker | type=service |
| iklim-app-03 | API services replicas | Manager + Worker | type=service |
| iklim-db-01 | PostgreSQL (Patroni), etcd | Worker | role=db |
| iklim-db-02 | PostgreSQL (Patroni), etcd | Worker | role=db |
| iklim-db-03 | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | role=db |

Step 1 — Init Swarm on iklim-app-01 (the prod-runner node)

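```bash
# Advertise the private-network IP of iklim-app-01 (10.20.10.11, the address the
# join commands below target). `hostname -I` does not guarantee address order,
# so its first entry may be the public IP.
MANAGER_IP="10.20.10.11"
STATE=$(docker info --format '{{.Swarm.LocalNodeState}}')
# Compare exactly: `grep -q "active"` would also match "inactive" and skip init.
if [ "$STATE" = "active" ]; then
  echo "Swarm already active"
else
  docker swarm init --advertise-addr "$MANAGER_IP"
  echo "Swarm initialized on $MANAGER_IP"
fi
```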
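After Step 1 has run, you can confirm that the node actually advertises its private address. A minimal sketch follows. The function name is hypothetical. It assumes 10.20.10.11 is iklim-app-01's private IP, which is the address the join commands in this guide use.

```bash
# Check that this node is in an active Swarm and advertises the expected address.
# check_swarm_addr is a hypothetical helper; pass the expected private IP.
check_swarm_addr() {
  local expected="$1" state addr
  state=$(docker info --format '{{.Swarm.LocalNodeState}}')
  if [ "$state" != "active" ]; then
    echo "swarm not active (state: $state)" >&2
    return 1
  fi
  # .Swarm.NodeAddr is the address this node advertises to the rest of the Swarm.
  addr=$(docker info --format '{{.Swarm.NodeAddr}}')
  if [ "$addr" != "$expected" ]; then
    echo "advertised $addr, expected $expected" >&2
    return 1
  fi
  echo "ok: advertising $addr"
}
```

Run it on iklim-app-01 as check_swarm_addr 10.20.10.11. A non-zero exit means the Swarm advertises a different address from the private one the rest of this guide assumes.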

Step 2 — Get manager join token

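```bash
docker swarm join-token manager  # for iklim-app-02, iklim-app-03
```

Save this token; it is needed on iklim-app-02 and iklim-app-03. Treat it as a secret: anyone who has it can join a node to the Swarm as a manager.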

Step 3 — Join iklim-app-02 and iklim-app-03 as managers

SSH into iklim-app-02 and iklim-app-03 and run:

```bash
docker swarm join --token <MANAGER_TOKEN> 10.20.10.11:2377
```
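Steps 2 and 3, and the worker join in Step 5, can be scripted from iklim-app-01. The sketch below makes two assumptions: passwordless SSH works from iklim-app-01 to each node by hostname, and the remote user can run docker without sudo. join_nodes is a hypothetical helper and is not part of init/swarm-init.sh.

```bash
# Join remote nodes to the Swarm over SSH, skipping nodes that are already members.
# Run on a manager. Usage: join_nodes <manager|worker> <manager_addr:port> <node>...
join_nodes() {
  local role="$1" manager="$2" token node state
  shift 2
  # -q prints only the token instead of the full join command.
  token=$(docker swarm join-token -q "$role") || return 1
  for node in "$@"; do
    state=$(ssh "$node" "docker info --format '{{.Swarm.LocalNodeState}}'")
    if [ "$state" = "active" ]; then
      echo "$node: already in a swarm, skipping"
      continue
    fi
    ssh "$node" "docker swarm join --token $token $manager" || return 1
    echo "$node: joined as $role"
  done
}
```

For example, run join_nodes manager 10.20.10.11:2377 iklim-app-02 iklim-app-03, and later join_nodes worker 10.20.10.11:2377 iklim-db-01 iklim-db-02 iklim-db-03. The skip check only looks at LocalNodeState, so a node that is active in a different Swarm is skipped too. The node count in Step 6 would then come out short.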

Step 4 — Label app nodes

On iklim-app-01, after iklim-app-02 and iklim-app-03 have joined:

```bash
for node in iklim-app-01 iklim-app-02 iklim-app-03; do
  docker node update --label-add type=service "$node"
done
```

Step 5 — Join DB nodes as Swarm workers

Get the worker join token on iklim-app-01:

```bash
docker swarm join-token worker
```

SSH into each DB node and join:

```bash
docker swarm join --token <WORKER_TOKEN> 10.20.10.11:2377
```

Then label them on iklim-app-01:

```bash
for node in iklim-db-01 iklim-db-02 iklim-db-03; do
  docker node update --label-add role=db "$node"
done
```

DB nodes are Swarm workers only; they never become managers. DB services are pinned to them with the node.labels.role == db placement constraint. See 08-prod-db-cluster-kurulum.md for the DB stack deployment.
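Before deploying the DB stacks, you can confirm both properties stated above for each DB node: the worker role and the role=db label. The sketch below does this. check_db_nodes is a hypothetical helper; run it on any manager.

```bash
# Verify that each given node is a Swarm worker carrying the role=db label.
check_db_nodes() {
  local node spec rc=0
  for node in "$@"; do
    # .Spec.Role is "worker" or "manager"; index yields an empty string for a missing label.
    spec=$(docker node inspect --format '{{.Spec.Role}} {{index .Spec.Labels "role"}}' "$node")
    if [ "$spec" = "worker db" ]; then
      echo "$node: ok"
    else
      echo "$node: unexpected role/label: '$spec'" >&2
      rc=1
    fi
  done
  return $rc
}
```

Usage: check_db_nodes iklim-db-01 iklim-db-02 iklim-db-03.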

Step 6 — Verify

```bash
docker node ls
```

Expected: 6 nodes, all with STATUS = Ready. The 3 app nodes show MANAGER STATUS = Leader (exactly one node) or Reachable. The 3 DB nodes are workers, so their MANAGER STATUS column is empty.

```bash
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
```

Expected: map[type:service] for app nodes, map[role:db] for DB nodes.
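The node-list check in Step 6 can be reduced to a single pass/fail command, for example at the end of a bootstrap run. A minimal sketch follows. verify_cluster is a hypothetical helper that reads the same fields docker node ls displays.

```bash
# Compare `docker node ls` with the expected shape: every node Ready,
# the expected number of managers, and exactly one Leader.
# Usage: verify_cluster <expected_nodes> <expected_managers>
verify_cluster() {
  local want_nodes="$1" want_managers="$2" counts
  # Fields: hostname, status, manager status (empty for workers).
  counts=$(docker node ls --format '{{.Hostname}} {{.Status}} {{.ManagerStatus}}' | awk '
    { n++ }
    $2 == "Ready" { r++ }
    $3 == "Leader" || $3 == "Reachable" { m++ }
    $3 == "Leader" { l++ }
    END { printf "%d %d %d %d\n", n, r, m, l }')
  set -- $counts
  echo "nodes=$1 ready=$2 managers=$3 leaders=$4"
  [ "$1" -eq "$want_nodes" ] && [ "$2" -eq "$want_nodes" ] \
    && [ "$3" -eq "$want_managers" ] && [ "$4" -eq 1 ]
}
```

For this topology, run verify_cluster 6 3.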

Step 7 — Confirm init/swarm-init.sh multi-node awareness

The script is idempotent (skips init if already active). Verify:

```bash
grep -n "swarm init\|swarm join" init/swarm-init.sh
```

The prod pipeline runs on iklim-app-01 only. iklim-app-02/03 are joined via Ansible (swarm role), not via the Gitea pipeline.
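Stack deployment only works against a Swarm manager, so a pipeline step can fail fast when its runner host is not one. The sketch below shows such a guard. require_swarm_manager is hypothetical, and where to call it (for example, at the top of the deploy step) is an assumption.

```bash
# Fail fast if this host cannot issue Swarm control commands such as stack deploy.
# .Swarm.ControlAvailable is "true" only on manager nodes.
require_swarm_manager() {
  local control
  control=$(docker info --format '{{.Swarm.ControlAvailable}}')
  if [ "$control" != "true" ]; then
    echo "${HOSTNAME:-this host}: not a Swarm manager; refusing to deploy" >&2
    return 1
  fi
}
```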

Placement constraints used in docker-stack-infra.yml

| Constraint | Resolves to |
|---|---|
| node.role == manager | iklim-app-01, iklim-app-02, iklim-app-03 |
| node.labels.type == service | iklim-app-01, iklim-app-02, iklim-app-03 |
| node.labels.role == db | iklim-db-01, iklim-db-02, iklim-db-03 |

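SWAG, Vault, cert-reloader: pinned to node.role == manager. DB services (Patroni, etcd, MongoDB): pinned to node.labels.role == db in separate DB stacks. Microservices: no constraint, so the Swarm scheduler may place their tasks on any node, including the three DB workers. To keep them on the app nodes, constrain them to node.labels.type == service (or node.labels.role != db).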
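The minimal stack fragment below shows how the three constraint patterns from the placement table are written, printed from a shell heredoc. The service and image names are placeholders and do not reflect the real docker-stack-infra.yml. The placeholder API service uses node.labels.type == service, which keeps its tasks off the DB workers.

```bash
# Print an illustrative stack fragment; all service and image names are placeholders.
render_placement_example() {
  cat <<'EOF'
version: "3.8"
services:
  swag:
    image: example/swag:latest
    deploy:
      placement:
        constraints:
          - node.role == manager
  example-api:
    image: example/api:latest
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
  example-db:
    image: example/db:latest
    deploy:
      placement:
        constraints:
          - node.labels.role == db
EOF
}
```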