01 — Docker Swarm Init (Prod — Multi-Node)
Context
- Repo: iklim.coroot
- Environment: prod
- Topology:
  - 3 × app nodes (iklim-app-01/02/03): all act as Swarm managers AND app workers with type=service label (Raft quorum: 1 can fail)
  - 3 × DB nodes (iklim-db-01/02/03): join Swarm as workers with role=db label; DB services are placed exclusively on them
- Sizing: app nodes are cpx42, DB nodes are cpx32; see ../../hetzner-sizing-report.md
- All 6 nodes are in the same private network.
- Pipeline trigger: push to prod-env branch → Gitea runner on prod-runner (first app node).
- App Swarm managers: 3 nodes all manager-eligible and carry app workloads with type=service label (no dedicated worker-only app nodes).
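The "1 can fail" figure follows from Raft majority arithmetic: with N managers, floor(N/2) + 1 must stay reachable, so floor((N-1)/2) can be lost. A minimal sketch of that arithmetic follows; the function names are illustrative and do not come from the repo.

```shell
# Hypothetical helpers (not from the repo): Raft majority arithmetic
# for a Swarm with N manager nodes.

# Managers that must stay reachable to keep a quorum: floor(N/2) + 1.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Managers that can be lost without losing quorum: floor((N-1)/2).
raft_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 3 5; do
  echo "managers=$n quorum=$(raft_quorum "$n") can_fail=$(raft_fault_tolerance "$n")"
done
```

For the 3-manager topology here, this gives a quorum of 2 and a tolerance of 1 failed manager.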
Node labeling plan
| Node | Role | Swarm role | Labels |
|---|---|---|---|
| iklim-app-01 | API services, SWAG, Vault | Manager + Worker | type=service |
| iklim-app-02 | API services replicas | Manager + Worker | type=service |
| iklim-app-03 | API services replicas | Manager + Worker | type=service |
| iklim-db-01 | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | role=db, db-index=01 |
| iklim-db-02 | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | role=db, db-index=02 |
| iklim-db-03 | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | role=db, db-index=03 |
Label scheme rationale
App nodes carry type=service, DB nodes carry role=db. The two different label keys are not an inconsistency — they operate on different semantic planes:
- type=service: "this node carries service workload"; determines which node group microservices and infrastructure services (APISIX, Vault, RabbitMQ, Redis, SWAG, etc.) are scheduled on.
- role=db: "this node is a database node"; pins PostgreSQL (Patroni) and MongoDB exclusively to DB nodes.
Docker Swarm's built-in node.role property (manager / worker) does not conflict with the custom node.labels.role label — the placement constraint syntax distinguishes them explicitly:
node.role == manager ← Swarm built-in (manager/worker distinction)
node.labels.type == service ← custom label (app node workload target)
node.labels.role == db ← custom label (DB node workload target)
This scheme is applied consistently across the current prod stack (docker-stack-infra_db-prod.yml), the separate Vault stack (docker-stack-vault.yml), and microservice stack definitions. The test environment uses the same type=service label on its service node, so both environments share the same constraint syntax.
node.role == worker is intentionally not used anywhere. DB nodes are Swarm workers, but targeting them via node.role == worker would also match any future worker-only app nodes. The explicit node.labels.role == db label provides precise, unambiguous targeting regardless of Swarm role.
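Before deploying a stack, it can help to check which nodes each of these selectors actually resolves to. The sketch below relies on the `node.label=` and `role=` filters of `docker node ls`; the function names are hypothetical and not part of the repo.

```shell
# Hypothetical helpers (not from the repo): list the hostnames that a
# label-based or role-based selector resolves to.

# Custom node label, e.g. type=service or role=db.
# Usage: nodes_with_label <key> <value>
nodes_with_label() {
  docker node ls --filter "node.label=$1=$2" --format '{{.Hostname}}'
}

# Built-in Swarm role (manager/worker), for comparison with the
# custom "role" label.
nodes_with_swarm_role() {
  docker node ls --filter "role=$1" --format '{{.Hostname}}'
}

# Example calls (run on a manager node):
#   nodes_with_label type service    # app nodes
#   nodes_with_label role db         # DB nodes
#   nodes_with_swarm_role manager    # Raft members
```

Keeping the two lookups in separate functions mirrors the distinction drawn above between node.role and node.labels.role.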
Automation Note
IMPORTANT: The Swarm initialization, join-token handling, and node labeling steps listed below are no longer performed manually. They are run automatically by Environment_Infrastructure/ansible/prod/prod-bootstrap.yml and the shared swarm role. The manual bash commands here are kept only for reference, information, and troubleshooting.
Labeling happens in two stages:
- The shared swarm role adds the type=service label to app nodes and the role=db label to DB nodes.
- The prod playbook adds the db-index=01/02/03 labels to the DB nodes, running from iklim-app-01.
Step 1 — Init Swarm on iklim-app-01 (the prod-runner node)
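# Use the first address reported by hostname -I as the advertise address.
MANAGER_IP=$(hostname -I | awk '{print $1}')
# Compare exactly: a substring match on "active" would also match "inactive".
if [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" != "active" ]; then
  docker swarm init --advertise-addr "$MANAGER_IP"
  echo "Swarm initialized on $MANAGER_IP"
else
  echo "Swarm already active"
fi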
Step 2 — Get manager join token
docker swarm join-token manager # for iklim-app-02, iklim-app-03
Save this token — needed on iklim-app-02 and iklim-app-03.
Step 3 — Join iklim-app-02 and iklim-app-03 as managers
SSH into iklim-app-02 and iklim-app-03, run:
docker swarm join --token <MANAGER_TOKEN> 10.20.10.11:2377
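Steps 2 and 3 can be combined when troubleshooting by hand. The sketch below uses `docker swarm join-token -q`, which prints only the token. It assumes it runs on iklim-app-01 and that SSH from there to the other nodes works, which the source does not state. The helper name is hypothetical.

```shell
# Sketch only: assumes it runs on iklim-app-01 and that SSH from there
# to the other nodes works (an assumption, not stated in the source).
MANAGER_ADDR="10.20.10.11:2377"   # address used in the join command above

# Print the join command for a given role ("manager" or "worker").
# -q makes join-token print only the token.
join_command() {
  token=$(docker swarm join-token -q "$1") || return 1
  printf 'docker swarm join --token %s %s\n' "$token" "$MANAGER_ADDR"
}

# Example (not executed here):
#   cmd=$(join_command manager)
#   for host in iklim-app-02 iklim-app-03; do ssh "$host" "$cmd"; done
```

The same helper called with `worker` would produce the command needed for the DB nodes.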
Step 4 — Label app nodes
On iklim-app-01, after iklim-app-02 and iklim-app-03 have joined:
for node in iklim-app-01 iklim-app-02 iklim-app-03; do
  docker node update --label-add type=service "$node"
done
Step 5 — Join DB nodes as Swarm workers
Get the worker join token on iklim-app-01:
docker swarm join-token worker
SSH into each DB node and join:
docker swarm join --token <WORKER_TOKEN> 10.20.10.11:2377
Then label them on iklim-app-01:
docker node update --label-add role=db iklim-db-01
docker node update --label-add role=db iklim-db-02
docker node update --label-add role=db iklim-db-03
docker node update --label-add db-index=01 iklim-db-01
docker node update --label-add db-index=02 iklim-db-02
docker node update --label-add db-index=03 iklim-db-03
DB nodes are Swarm workers only; they never become managers. DB services are pinned to them via the node.labels.role == db placement constraint. See 08-prod-db-cluster-setup.md for DB stack deployment.
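The six label commands above can be collapsed into a loop that derives db-index from the hostname suffix. This is a minimal sketch that assumes the iklim-db-NN naming shown in this document; both function names are hypothetical.

```shell
# Sketch: derive db-index from the hostname suffix instead of
# repeating one command per node. Assumes the iklim-db-NN naming.

# iklim-db-02 -> 02 (strip everything up to the last hyphen).
db_index_for() {
  printf '%s\n' "${1##*-}"
}

# Apply both DB labels to every node passed as an argument.
label_db_nodes() {
  for node in "$@"; do
    docker node update \
      --label-add role=db \
      --label-add "db-index=$(db_index_for "$node")" \
      "$node" || return 1
  done
}

# Example (run on iklim-app-01):
#   label_db_nodes iklim-db-01 iklim-db-02 iklim-db-03
```

Re-running the loop sets the same label values again, so it is safe to repeat after a partial failure.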
Step 6 — Verify
docker node ls
Expected: 6 nodes, all with STATUS Ready. The 3 app nodes show MANAGER STATUS Leader or Reachable; the 3 DB nodes are workers and show an empty MANAGER STATUS.
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
Expected: map[type:service] for app nodes, map[db-index:01 role:db] (and likewise 02, 03) for DB nodes.
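The manager/worker split can also be checked mechanically. The sketch below counts nodes from "hostname manager-status" lines, as produced by the `--format` string in the example comment. The function name is hypothetical.

```shell
# Sketch: summarize the manager / non-manager split.
# Reads "hostname manager-status" lines on stdin; the manager-status
# column is empty for workers.
summarize_nodes() {
  awk '
    $2 == "Leader" || $2 == "Reachable" { managers++; next }
    NF >= 1                             { others++ }
    END { printf "managers=%d others=%d\n", managers, others }
  '
}

# Example (run on a manager node):
#   docker node ls --format '{{.Hostname}} {{.ManagerStatus}}' | summarize_nodes
#   # a healthy instance of this topology yields: managers=3 others=3
```

A manager reported as Unreachable is counted under "others", so any result other than managers=3 is worth investigating.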
Step 7 — Confirm init/swarm-init.sh multi-node awareness
The script is idempotent (skips init if already active). Verify:
grep -n "swarm init\|swarm join" init/swarm-init.sh
The prod pipeline runs on iklim-app-01 only. iklim-app-02/03 are joined via Ansible (swarm role), not via the Gitea pipeline.
Placement Constraints Used in Current Prod Stacks
| Constraint | Resolves to | Services |
|---|---|---|
| node.hostname == iklim-app-01 | iklim-app-01 only | SWAG, cert-reloader |
| node.labels.type == service | iklim-app-01, iklim-app-02, iklim-app-03 | Vault, Redis, RabbitMQ, APISIX, Prometheus, Grafana, SWAG support services |
| node.hostname == iklim-db-01/02/03 | specific DB node | Patroni, MongoDB, and etcd services pinned per node in docker-stack-infra_db-prod.yml |
| node.labels.role == db | iklim-db-01, iklim-db-02, iklim-db-03 | Generic DB node identity; retained for operations and compatibility |
SWAG and cert-reloader are pinned to iklim-app-01 (the Floating IP node) because SWAG must match the public entry point. Vault is deployed by docker-stack-vault.yml across service nodes and reads certificates from /opt/iklimco/ssl. Microservices are distributed by the Swarm scheduler across app nodes. DB services are defined in docker-stack-infra_db-prod.yml and pinned to DB nodes by hostname constraints.
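To confirm that a deployed service actually carries the constraint listed in the table, its spec can be inspected. A minimal sketch follows. The helper name is hypothetical, and the service name in the example is a placeholder because the stack name used at deploy time is not given here.

```shell
# Sketch: check that a deployed service carries an expected placement
# constraint. Service names depend on the stack name used at deploy
# time, so the name in the example below is a placeholder.
# Usage: has_constraint <service> <constraint>
has_constraint() {
  docker service inspect \
    --format '{{range .Spec.TaskTemplate.Placement.Constraints}}{{println .}}{{end}}' \
    "$1" | grep -Fxq -- "$2"
}

# Example (run on a manager node; <stack>_swag is a placeholder):
#   has_constraint "<stack>_swag" "node.hostname == iklim-app-01" \
#     && echo "pinned" || echo "constraint missing"
```

The match is on the whole constraint string, so spacing around `==` must match what the stack file declares.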
Historical / Superseded by Setup
Older notes that referred to docker-stack-infra.yml, docker-stack-infra.prod.yml, or docker-stack-db.prod.yml as the active prod deployment model are superseded by ../../setup/08-prod-db-cluster-setup.md and ../../setup/09-prod-runner-ha-and-swarm.md.