Environment_Infrastructure/roadmap/prod-env/01-swarm-init-multinode.md
Murat ÖZDEMİR 76f87aa2f9 Integrate DB nodes into Swarm and refine prod service deployment
- Database nodes now join the Docker Swarm as workers with `role=db` labels, allowing Swarm to manage their dedicated services.
- The `docker-stack-infra.yml` has been updated for production to focus solely on application-level infrastructure components.
- Dedicated database services (PostgreSQL, MongoDB, Patroni-etcd) are now explicitly deployed in separate Swarm stacks on `iklim-db-XX` nodes.
- Standardizes node naming conventions (`iklim-app-XX`, `iklim-db-XX`) across the production roadmap documentation.
- Clarifies that the `etcd` service within `docker-stack-infra.yml` is exclusively for APISIX configuration, distinct from Patroni's etcd cluster.
2026-05-11 14:53:21 +03:00

# 01 — Docker Swarm Init (Prod — Multi-Node)
## Context
- **Repo:** `iklim.co` root
- **Environment:** prod
- **Topology:**
  - 3 × app nodes (`iklim-app-01/02/03`): all act as **Swarm managers and app workers** (Raft quorum: 1 manager can fail)
  - 3 × DB nodes (`iklim-db-01/02/03`): join Swarm as **workers** with the `role=db` label; DB services are placed exclusively on them
- **Sizing:** app nodes are `cpx42`, DB nodes are `cpx32`; see `../../hetzner-sizing-report.md`
- All 6 nodes are in the same private network.
- Pipeline trigger: push to `prod-env` branch → Gitea runner on `prod-runner` (first app node).
- App Swarm managers: all 3 app nodes are manager-eligible and also carry app workloads (there are no dedicated worker-only app nodes).
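The "1 can fail" figure follows from Raft's majority rule: with N managers, a majority of N/2 + 1 (integer division) must stay reachable. A minimal sketch of that arithmetic; the function names are illustrative and not part of the repo:

```bash
#!/usr/bin/env bash
# Raft majority arithmetic for N Swarm managers (illustrative helpers).

# Number of managers that must be reachable to keep quorum.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of managers that can be lost while quorum is kept.
raft_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 3 5; do
  echo "managers=$n quorum=$(raft_quorum "$n") tolerated_failures=$(raft_fault_tolerance "$n")"
done
```

With the 3 managers in this topology, the quorum is 2 and one manager can be lost. An even manager count adds no tolerance (4 managers still tolerate only 1 failure).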
## Node labeling plan
| Node | Role | Swarm role | Labels |
|------|------|------------|--------|
| `iklim-app-01` | API services, SWAG, Vault | Manager + Worker | `type=service` |
| `iklim-app-02` | API services replicas | Manager + Worker | `type=service` |
| `iklim-app-03` | API services replicas | Manager + Worker | `type=service` |
| `iklim-db-01` | PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-02` | PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-03` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |
## Step 1 — Init Swarm on iklim-app-01 (the prod-runner node)
```bash
MANAGER_IP=$(hostname -I | awk '{print $1}')
# Compare the state exactly: a plain grep for "active" would also match "inactive".
if [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" != "active" ]; then
  docker swarm init --advertise-addr "$MANAGER_IP"
  echo "Swarm initialized on $MANAGER_IP"
else
  echo "Swarm already active"
fi
```
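`hostname -I` lists every address on the host, and the first one is not guaranteed to be on the private network. A hedged sketch of a helper that picks the first address with a given prefix; `pick_private_ip` is a hypothetical name, and the `10.10.10.` prefix is an assumption taken from the join address used in the later steps:

```bash
# Hypothetical helper: print the first address that starts with the given prefix.
# Arguments: <prefix> <addr> [<addr> ...]
pick_private_ip() {
  local prefix="$1" addr
  shift
  for addr in "$@"; do
    case "$addr" in
      "$prefix"*) echo "$addr"; return 0 ;;
    esac
  done
  return 1   # no address on the private network was found
}

# Usage (word splitting of the hostname -I output is intentional):
#   MANAGER_IP=$(pick_private_ip "10.10.10." $(hostname -I)) || exit 1
```

The non-zero return code lets the caller stop before `docker swarm init` advertises an address that the other nodes cannot reach.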
## Step 2 — Get manager join token
```bash
docker swarm join-token manager # for iklim-app-02, iklim-app-03
```
Save this token; it is needed on iklim-app-02 and iklim-app-03.
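For scripted use, the `-q` (`--quiet`) flag of `docker swarm join-token` prints only the token instead of the full join command. A minimal sketch; `get_join_token` is an illustrative wrapper, not something defined in the repo:

```bash
# Illustrative wrapper: print only the join token for the given role.
get_join_token() {
  local role="$1"
  case "$role" in
    manager|worker) docker swarm join-token -q "$role" ;;
    *) echo "usage: get_join_token manager|worker" >&2; return 2 ;;
  esac
}

# Usage on iklim-app-01:
#   MANAGER_TOKEN=$(get_join_token manager)
#   WORKER_TOKEN=$(get_join_token worker)
```

The same wrapper covers the worker token needed for the DB nodes in Step 5.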
## Step 3 — Join iklim-app-02 and iklim-app-03 as managers
SSH into iklim-app-02 and iklim-app-03 and run:
```bash
docker swarm join --token <MANAGER_TOKEN> 10.10.10.11:2377
```
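Running `docker swarm join` on a node that is already a member returns an error, so a guarded variant is easier to re-run. A sketch under the assumption that the token and manager address are passed in by the caller; `join_swarm_once` is a hypothetical name:

```bash
# Print this node's Swarm state ("active", "inactive", "pending", ...).
swarm_state() {
  docker info --format '{{.Swarm.LocalNodeState}}'
}

# Hypothetical helper: join only if this node is not already a Swarm member.
# Arguments: <join-token> <manager-address:port>
join_swarm_once() {
  local token="$1" manager_addr="$2"
  if [ "$(swarm_state)" = "active" ]; then
    echo "already a swarm member; skipping join"
    return 0
  fi
  docker swarm join --token "$token" "$manager_addr"
}

# Usage on iklim-app-02 / iklim-app-03:
#   join_swarm_once "<MANAGER_TOKEN>" 10.10.10.11:2377
```

The state is compared for exact equality, so `inactive` is not mistaken for `active`.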
## Step 4 — Label app nodes
On iklim-app-01, after iklim-app-02 and iklim-app-03 have joined:
```bash
for node in iklim-app-01 iklim-app-02 iklim-app-03; do
  docker node update --label-add type=service "$node"
done
```
## Step 5 — Join DB nodes as Swarm workers
Get the worker join token on iklim-app-01:
```bash
docker swarm join-token worker
```
SSH into each DB node and join:
```bash
docker swarm join --token <WORKER_TOKEN> 10.10.10.11:2377
```
Then label them on iklim-app-01:
```bash
for node in iklim-db-01 iklim-db-02 iklim-db-03; do
  docker node update --label-add role=db "$node"
done
```
> DB nodes are Swarm **workers** only — they never become managers.
> DB services are pinned to them via `node.labels.role == db` placement constraint.
> See `08-prod-db-cluster-kurulum.md` for DB stack deployment.
## Step 6 — Verify
```bash
docker node ls
```
Expected: 6 nodes, all with `STATUS` = `Ready`. The 3 app nodes show `MANAGER STATUS` = `Leader` or `Reachable`; the 3 DB workers have an empty `MANAGER STATUS`.
```bash
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
```
Expected: `map[type:service]` for app nodes, `map[role:db]` for DB nodes.
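The node-count check can also be scripted rather than read off the table. A minimal sketch that counts managers and workers from `docker node ls` output; `count_swarm_roles` is an illustrative name, and it relies on the `ManagerStatus` field being empty for worker nodes:

```bash
# Illustrative helper: read "<hostname> <manager-status>" lines on stdin and
# print how many nodes are managers (status present) and workers (status empty).
count_swarm_roles() {
  awk 'NF >= 2 { managers++ }
       NF == 1 { workers++ }
       END { printf "managers=%d workers=%d\n", managers, workers }'
}

# Usage on a manager node:
#   docker node ls --format '{{.Hostname}} {{.ManagerStatus}}' | count_swarm_roles
```

For the topology described here, the expected result is `managers=3 workers=3`.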
## Step 7 — Confirm `init/swarm-init.sh` multi-node awareness
The script is idempotent (skips init if already active). Verify:
```bash
grep -n "swarm init\|swarm join" init/swarm-init.sh
```
The prod pipeline runs on iklim-app-01 only. iklim-app-02/03 are joined via Ansible (`swarm` role),
not via the Gitea pipeline.
## Placement constraints used in `docker-stack-infra.yml`
| Constraint | Resolves to |
|------------|-------------|
| `node.role == manager` | iklim-app-01, iklim-app-02, iklim-app-03 |
| `node.labels.type == service` | iklim-app-01, iklim-app-02, iklim-app-03 |
| `node.labels.role == db` | iklim-db-01, iklim-db-02, iklim-db-03 |
- SWAG, Vault, cert-reloader: pinned to `node.role == manager`.
- Microservices: no constraint. Swarm may schedule an unconstrained service on any active node, including the DB workers; adding `node.labels.type == service` would keep them on the app nodes.
- DB services (Patroni, etcd, MongoDB): pinned to `node.labels.role == db` in separate DB stacks.
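To check which nodes a constraint would resolve to, the `--filter` option of `docker node ls` accepts `role=` and `node.label=` filters. A hedged sketch with illustrative function names, meant to be run on a manager node:

```bash
# Illustrative helpers: list hostnames matching a Swarm role or a node label.

# Argument: manager | worker
nodes_with_role() {
  docker node ls --filter "role=$1" --format '{{.Hostname}}'
}

# Argument: <key>=<value>, e.g. role=db or type=service
nodes_with_label() {
  docker node ls --filter "node.label=$1" --format '{{.Hostname}}'
}

# Usage:
#   nodes_with_role manager        # should match node.role == manager
#   nodes_with_label type=service  # should match node.labels.type == service
#   nodes_with_label role=db       # should match node.labels.role == db
```

Comparing this output against the table is a quick way to catch a node that joined the Swarm but was never labeled.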