# 01 — Docker Swarm Init (Prod — Multi-Node)

## Context

- **Repo:** `iklim.co` root
- **Environment:** prod
- **Topology:**
  - 3 × app nodes (`iklim-app-01/02/03`) — all act as **Swarm managers AND app workers** with `type=service` label (Raft quorum: 1 can fail)
  - 3 × DB nodes (`iklim-db-01/02/03`) — join Swarm as **workers** with `role=db` label; DB services are placed exclusively on them
- **Sizing:** app nodes are `cpx42`, DB nodes are `cpx32`; see `../../hetzner-sizing-report.md`
- All 6 nodes are in the same private network.
- Pipeline trigger: push to `prod-env` branch → Gitea runner on `prod-runner` (first app node).
- App Swarm managers: 3 nodes all manager-eligible and carry app workloads with `type=service` label (no dedicated worker-only app nodes).

## Node labeling plan

| Node | Role | Swarm role | Labels |
|------|------|------------|--------|
| `iklim-app-01` | API services, SWAG, Vault | Manager + Worker | `type=service` |
| `iklim-app-02` | API services replicas | Manager + Worker | `type=service` |
| `iklim-app-03` | API services replicas | Manager + Worker | `type=service` |
| `iklim-db-01` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db`, `db-index=01` |
| `iklim-db-02` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db`, `db-index=02` |
| `iklim-db-03` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db`, `db-index=03` |

### Label scheme rationale

App nodes carry `type=service`, DB nodes carry `role=db`. The two different label keys are not an inconsistency — they operate on different semantic planes:

- **`type=service`** — "this node carries service workload"; determines which node group microservices and infrastructure services (APISIX, Vault, RabbitMQ, Redis, SWAG, etc.) are scheduled on.
- **`role=db`** — "this node is a database node"; pins PostgreSQL (Patroni) and MongoDB exclusively to DB nodes.
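
The labeling plan above can be read as a lookup from hostname to labels. The following is a minimal sketch of that lookup, not the logic of the Ansible `swarm` role: the function name `labels_for_node` is hypothetical, and it assumes hostnames follow the `iklim-app-NN` / `iklim-db-NN` pattern shown in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): print the labels the plan
# assigns to a node, one per line, based only on its hostname.
labels_for_node() {
  local node="$1"
  case "$node" in
    iklim-app-[0-9][0-9])
      # App nodes: service workload target
      echo "type=service"
      ;;
    iklim-db-[0-9][0-9])
      # DB nodes: generic DB identity plus a per-node index
      echo "role=db"
      # db-index is assumed to be the two-digit hostname suffix
      echo "db-index=${node##*-}"
      ;;
    *)
      echo "unknown node: $node" >&2
      return 1
      ;;
  esac
}
```

On a manager, each printed line could be passed to `docker node update --label-add <label> <node>`; the helper itself never calls Docker, so it can be checked offline.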

Docker Swarm's **built-in** `node.role` property (`manager` / `worker`) does **not** conflict with the custom `node.labels.role` label. The placement constraint syntax distinguishes them explicitly:

```
node.role == manager          ← Swarm built-in (manager/worker distinction)
node.labels.type == service   ← custom label (app node workload target)
node.labels.role == db        ← custom label (DB node workload target)
```

This scheme is applied consistently across the current prod stack (`docker-stack-infra_db-prod.yml`), the separate Vault stack (`docker-stack-vault.yml`), and microservice stack definitions. The test environment uses the same `type=service` label on its service node, so both environments share the same constraint syntax.

`node.role == worker` is intentionally not used anywhere. DB nodes are Swarm workers, but targeting them via `node.role == worker` would also match any future worker-only app nodes. The explicit `node.labels.role == db` label provides precise, unambiguous targeting regardless of Swarm role.

## Automation Note

**IMPORTANT:** The Swarm initialization, join-token handling, and node labeling steps listed below are no longer performed manually. They are run automatically by `Environment_Infrastructure/ansible/prod/prod-bootstrap.yml` and the shared `swarm` role. The manual bash commands here are kept only for reference, information, and troubleshooting.

Labeling happens in two stages:

- The shared `swarm` role adds the `type=service` label to app nodes and the `role=db` label to DB nodes.
- The prod playbook adds the `db-index=01/02/03` labels to the DB nodes via `iklim-app-01`.

## Step 1 — Init Swarm on iklim-app-01 (the prod-runner node)

```bash
MANAGER_IP=$(hostname -I | awk '{print $1}')

# Compare the state exactly: a substring match on "active" would also match "inactive".
if [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" != "active" ]; then
  docker swarm init --advertise-addr "$MANAGER_IP"
  echo "Swarm initialized on $MANAGER_IP"
else
  echo "Swarm already active"
fi
```

## Step 2 — Get manager join token

```bash
docker swarm join-token manager   # for iklim-app-02, iklim-app-03
```

Save this token; it is needed on iklim-app-02 and iklim-app-03.

## Step 3 — Join iklim-app-02 and iklim-app-03 as managers

SSH into iklim-app-02 and iklim-app-03, then run:

```bash
# <MANAGER_JOIN_TOKEN> is a placeholder for the token printed in Step 2
docker swarm join --token <MANAGER_JOIN_TOKEN> 10.20.10.11:2377
```

## Step 4 — Label app nodes

On iklim-app-01, after iklim-app-02 and iklim-app-03 have joined:

```bash
for node in iklim-app-01 iklim-app-02 iklim-app-03; do
  docker node update --label-add type=service "$node"
done
```

## Step 5 — Join DB nodes as Swarm workers

Get the worker join token on iklim-app-01:

```bash
docker swarm join-token worker
```

SSH into each DB node and join:

```bash
# <WORKER_JOIN_TOKEN> is a placeholder for the token printed by the command above
docker swarm join --token <WORKER_JOIN_TOKEN> 10.20.10.11:2377
```

Then label them on iklim-app-01:

```bash
docker node update --label-add role=db iklim-db-01
docker node update --label-add role=db iklim-db-02
docker node update --label-add role=db iklim-db-03
docker node update --label-add db-index=01 iklim-db-01
docker node update --label-add db-index=02 iklim-db-02
docker node update --label-add db-index=03 iklim-db-03
```

> DB nodes are Swarm **workers** only — they never become managers.
> DB services are pinned to them via `node.labels.role == db` placement constraint.
> DB services are deployed by the current root production stack `docker-stack-infra_db-prod.yml`.

## Step 6 — Verify

```bash
docker node ls
```

Expected: 6 nodes, of which 3 show `MANAGER STATUS` = `Leader` or `Reachable` and 3 are workers with status `Ready`.

```bash
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
```

Expected: `map[type:service]` for app nodes; `map[db-index:01 role:db]` for `iklim-db-01`, and the same with the matching `db-index` for the other DB nodes.
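
The node-count expectation from Step 6 can also be checked mechanically. The following is a minimal sketch under stated assumptions: `check_topology` is a hypothetical function name, and it expects `hostname|status|manager_status` lines on stdin, which is an input format chosen here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical checker (not part of the repo): reads lines of the form
#   hostname|status|manager_status
# and succeeds only if exactly 3 nodes are Leader/Reachable managers
# and exactly 3 nodes are Ready workers (empty manager status).
check_topology() {
  awk -F'|' '
    $3 == "Leader" || $3 == "Reachable" { managers++; next }
    $3 == "" && $2 == "Ready"           { workers++ }
    END {
      printf "managers=%d workers=%d\n", managers, workers
      # Exit 0 only when both counts match the expected topology
      exit !(managers == 3 && workers == 3)
    }
  '
}
```

On a manager node it could be fed with `docker node ls --format '{{.Hostname}}|{{.Status}}|{{.ManagerStatus}}' | check_topology`. Nodes in any other state (for example an unreachable manager or a worker that is down) are counted in neither group, so the check fails rather than passing silently.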

## Step 7 — Confirm `init/swarm-init.sh` multi-node awareness

The script is idempotent (skips init if already active). Verify:

```bash
grep -n "swarm init\|swarm join" init/swarm-init.sh
```

The prod pipeline runs on iklim-app-01 only. iklim-app-02/03 are joined via Ansible (`swarm` role), not via the Gitea pipeline.

## Placement Constraints Used in Current Prod Stacks

| Constraint | Resolves to | Services |
|------------|-------------|----------|
| `node.hostname == iklim-app-01` | iklim-app-01 only | SWAG, cert-reloader |
| `node.labels.type == service` | iklim-app-01, iklim-app-02, iklim-app-03 | Vault, Redis, RabbitMQ, APISIX, Prometheus, Grafana, SWAG support services |
| `node.hostname == iklim-db-01/02/03` | specific DB node | Patroni, MongoDB, and etcd services pinned per node in `docker-stack-infra_db-prod.yml` |
| `node.labels.role == db` | iklim-db-01, iklim-db-02, iklim-db-03 | Generic DB node identity; retained for operations and compatibility |

SWAG and cert-reloader are pinned to `iklim-app-01` (the Floating IP node) because SWAG must match the public entry point. Vault is deployed by `docker-stack-vault.yml` across service nodes and reads certificates from `/opt/iklimco/ssl`. Microservices are distributed by the Swarm scheduler across app nodes. DB services are defined in `docker-stack-infra_db-prod.yml` and pinned to DB nodes by hostname constraints.

## Historical / Superseded by Setup

Older notes that referred to `docker-stack-infra.yml`, `docker-stack-infra.prod.yml`, or `docker-stack-db.prod.yml` as the active prod deployment model are superseded by the current root production stack and workflow model.