# 01 — Docker Swarm Init (Prod — Multi-Node)

## Context

- **Repo:** `iklim.co` root
- **Environment:** prod
- **Topology:**
  - 3 × service nodes — all act as **Swarm managers AND app workers** (Raft quorum: 1 can fail)
  - 3 × DB nodes — **NOT part of Docker Swarm** (separate DB cluster, out of scope)
- All 6 nodes are in the same private network.
- Pipeline trigger: push to `prod-env` branch → Gitea runner on `prod-runner` (first service node).
- Swarm has 3 nodes total; all are manager-eligible and carry workloads (no dedicated worker-only nodes).

## Node labeling plan

| Node | Role | Swarm role | Labels |
|------|------|------------|--------|
| service-1 | API services, SWAG, Vault | Manager + Worker | `type=service` |
| service-2 | API services replicas | Manager + Worker | `type=service` |
| service-3 | API services replicas | Manager + Worker | `type=service` |

> DB nodes (`db-1/2/3`) are **not part of Docker Swarm**. They run as a separate cluster
> and are provisioned independently. No Swarm join or label step applies to them.

## Step 1 — Init Swarm on service-1 (the prod-runner node)

```bash
MANAGER_IP=$(hostname -I | awk '{print $1}')
if ! docker info --format '{{.Swarm.LocalNodeState}}' | grep -q "active"; then
  docker swarm init --advertise-addr "$MANAGER_IP"
  echo "✅ Swarm initialized on $MANAGER_IP"
else
  echo "ℹ️ Swarm already active"
fi
```

## Step 2 — Get manager join token

```bash
docker swarm join-token manager   # for service-2, service-3
```

Save this token — it is needed on service-2 and service-3.

## Step 3 — Join service-2 and service-3 as managers

SSH into service-2 and service-3, then run:

```bash
docker swarm join --token <MANAGER_TOKEN> <MANAGER_IP>:2377
```

## Step 4 — Label all Swarm nodes

On service-1, after service-2 and service-3 have joined:

```bash
for node in service-1 service-2 service-3; do
  docker node update --label-add type=service "$node"
done
```

> Replace `service-1`, etc. with the actual node hostnames shown in `docker node ls`.
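Steps 2 to 4 can be collapsed into a single script run from service-1. This is a hedged sketch, not the repo's actual tooling: the hostnames (`service-2`, `service-3`) and passwordless SSH between nodes are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: join remaining managers and label all Swarm nodes from service-1.
# Assumes: service-2/service-3 hostnames resolve, and SSH keys are in place.
set -euo pipefail

join_and_label() {
  local manager_ip=$1; shift
  local token
  # --quiet prints only the raw join token, suitable for scripting
  token=$(docker swarm join-token -q manager)
  for host in "$@"; do
    # Each remote node joins as a manager on the standard Swarm port 2377
    ssh "$host" "docker swarm join --token $token $manager_ip:2377"
  done
  # Label every node, including service-1 itself, for placement constraints
  for node in service-1 "$@"; do
    docker node update --label-add type=service "$node"
  done
}

# Example invocation (run on service-1):
# join_and_label "$(hostname -I | awk '{print $1}')" service-2 service-3
```

The function takes the manager IP first and the remaining hostnames after it, so the same sketch works if more service nodes are added later.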
> DB nodes are not in Swarm — no join or label step for them.

## Step 5 — Verify

```bash
docker node ls
```

Expected: 3 nodes, all with `MANAGER STATUS` = `Leader` or `Reachable`. All 3 nodes remain at `AVAILABILITY = Active` (not drained), so they also carry workloads.

```bash
docker node inspect service-1 --format '{{.Spec.Labels}}'
```

Expected: `map[type:service]`.

## Step 6 — Confirm `init/swarm-init.sh` multi-node awareness

The script is idempotent (it skips init if Swarm is already active). Verify:

```bash
grep -n "swarm init\|swarm join" init/swarm-init.sh
```

The prod pipeline runs on service-1 only. service-2/3 are joined via Ansible (`swarm` role), not via the Gitea pipeline.

## Placement constraints used in `docker-stack-infra.yml`

| Constraint | Resolves to |
|------------|-------------|
| `node.role == manager` | service-1, service-2, service-3 |
| `node.labels.type == service` | service-1, service-2, service-3 |

SWAG, Vault, cert-reloader: pinned to `node.role == manager`. Microservices: no constraint (distributed across all 3 service nodes by the Swarm scheduler).

> `node.labels.type == db` constraint is **not used** — DB nodes are not in Swarm.
> PostgreSQL and MongoDB run outside Swarm as a separately managed cluster.
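As a rough illustration of how these constraints sit in a stack file, the fragment below uses standard Compose `deploy.placement.constraints` syntax. The service names and images are illustrative assumptions, not the contents of the real `docker-stack-infra.yml`.

```yaml
# Illustrative fragment only — service names and images are assumptions.
version: "3.8"
services:
  swag:
    image: lscr.io/linuxserver/swag:latest   # placeholder image
    deploy:
      placement:
        constraints:
          - node.role == manager             # resolves to service-1/2/3
  api:
    image: registry.example/api:latest       # placeholder image
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service      # matches the labels from Step 4
```

Since every node is both a manager and labeled `type=service`, the two constraints currently resolve to the same set; the label-based one is what would keep app services off any future manager-only or DB-adjacent node.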