- Database nodes now join the Docker Swarm as workers with `role=db` labels, allowing Swarm to manage their dedicated services.
- The `docker-stack-infra.yml` has been updated for production to focus solely on application-level infrastructure components.
- Dedicated database services (PostgreSQL, MongoDB, Patroni-etcd) are now explicitly deployed in separate Swarm stacks on `iklim-db-XX` nodes.
- Standardizes node naming conventions (`iklim-app-XX`, `iklim-db-XX`) across the production roadmap documentation.
- Clarifies that the `etcd` service within `docker-stack-infra.yml` is exclusively for APISIX configuration, distinct from Patroni's etcd cluster.

# 01 — Docker Swarm Init (Prod — Multi-Node)

## Context
- **Repo:** `iklim.co` root
- **Environment:** prod
- **Topology:**
  - 3 × app nodes (`iklim-app-01/02/03`): all act as **Swarm managers AND app workers** (a Raft quorum of 3 managers tolerates the loss of 1)
  - 3 × DB nodes (`iklim-db-01/02/03`): join the Swarm as **workers** with the `role=db` label; DB services are placed exclusively on them
- **Sizing:** app nodes are `cpx42`, DB nodes are `cpx32`; see `../../hetzner-sizing-report.md`
- All 6 nodes are in the same private network.
- **Pipeline trigger:** push to the `prod-env` branch → Gitea runner on `prod-runner` (the first app node).
- **App Swarm managers:** all 3 app nodes are manager-eligible and also carry app workloads (there are no dedicated worker-only app nodes).

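The fault-tolerance figure in the topology follows from Raft majority arithmetic: with N managers, a majority of floor(N/2) + 1 must stay reachable, so floor((N-1)/2) managers can be lost. A minimal sketch of that calculation; the function names are illustrative and not part of the repo:

```bash
# Majority (quorum) size for a Swarm with N managers.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of managers that can be lost while a majority remains.
raft_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

raft_quorum 3            # prints 2
raft_fault_tolerance 3   # prints 1
```

With the 3 app nodes as managers, losing a second manager would leave the cluster without a majority until one of them returns.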
## Node labeling plan

| Node | Role | Swarm role | Labels |
|------|------|------------|--------|
| `iklim-app-01` | API services, SWAG, Vault | Manager + Worker | `type=service` |
| `iklim-app-02` | API services replicas | Manager + Worker | `type=service` |
| `iklim-app-03` | API services replicas | Manager + Worker | `type=service` |
| `iklim-db-01` | PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-02` | PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-03` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |

## Step 1 — Init Swarm on iklim-app-01 (the prod-runner node)

```bash
# Assumes the first address reported by `hostname -I` is the private-network address.
MANAGER_IP=$(hostname -I | awk '{print $1}')
# Compare the state exactly: a substring match on "active" would also match "inactive".
if [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" != "active" ]; then
  docker swarm init --advertise-addr "$MANAGER_IP"
  echo "Swarm initialized on $MANAGER_IP"
else
  echo "Swarm already active"
fi
```

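`LocalNodeState` can report `inactive`, which contains the substring `active`, so the idempotency guard is safest as an exact string comparison. A small sketch, assuming a hypothetical helper named `swarm_state_is_active` that takes the state string as an argument so it can be exercised without a Docker daemon:

```bash
# Returns 0 only when the reported state is exactly "active".
swarm_state_is_active() {
  [ "$1" = "active" ]
}

# Example with literal states (no Docker daemon required):
for state in active inactive; do
  if swarm_state_is_active "$state"; then
    echo "$state: skip init"
  else
    echo "$state: run init"
  fi
done
```

On a real node the argument would come from `docker info --format '{{.Swarm.LocalNodeState}}'`.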
## Step 2 — Get manager join token

```bash
docker swarm join-token manager   # for iklim-app-02, iklim-app-03
```

Save this token; it is needed on iklim-app-02 and iklim-app-03.

## Step 3 — Join iklim-app-02 and iklim-app-03 as managers

SSH into iklim-app-02 and iklim-app-03 and run:

```bash
docker swarm join --token <MANAGER_TOKEN> 10.10.10.11:2377
```

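To avoid copying the token by hand, the join command can be assembled from the token and the manager address. The sketch below assumes a hypothetical helper `build_join_cmd`; 2377 is Swarm's default cluster-management port, and `docker swarm join-token -q` prints only the token:

```bash
# Prints the join command for a given token and manager address.
# The port argument is optional and defaults to 2377.
build_join_cmd() {
  local token="$1" addr="$2" port="${3:-2377}"
  echo "docker swarm join --token ${token} ${addr}:${port}"
}

# On iklim-app-01 (requires an initialized Swarm):
#   TOKEN=$(docker swarm join-token -q manager)
#   build_join_cmd "$TOKEN" 10.10.10.11
build_join_cmd '<MANAGER_TOKEN>' 10.10.10.11
```

The same helper works for the worker token; only the value passed as the first argument changes.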
## Step 4 — Label app nodes

On iklim-app-01, after iklim-app-02 and iklim-app-03 have joined:

```bash
for node in iklim-app-01 iklim-app-02 iklim-app-03; do
  docker node update --label-add type=service "$node"
done
```

## Step 5 — Join DB nodes as Swarm workers

Get the worker join token on iklim-app-01:

```bash
docker swarm join-token worker
```

SSH into each DB node and join:

```bash
docker swarm join --token <WORKER_TOKEN> 10.10.10.11:2377
```

Then label them on iklim-app-01:

```bash
for node in iklim-db-01 iklim-db-02 iklim-db-03; do
  docker node update --label-add role=db "$node"
done
```

> DB nodes are Swarm **workers** only; they never become managers.
> DB services are pinned to them via the `node.labels.role == db` placement constraint.
> See `08-prod-db-cluster-kurulum.md` for DB stack deployment.

## Step 6 — Verify

```bash
docker node ls
```

Expected: 6 nodes. The 3 managers show `Leader` or `Reachable` under `MANAGER STATUS`; the 3 workers show `Ready` under `STATUS` and an empty `MANAGER STATUS`.

```bash
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
```

Expected: `map[type:service]` for app nodes, `map[role:db]` for DB nodes.

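The expected node counts can also be checked mechanically. The sketch below assumes a hypothetical helper `count_roles` that reads `hostname|status|manager-status` lines, the shape produced by `docker node ls --format '{{.Hostname}}|{{.Status}}|{{.ManagerStatus}}'`; the sample input is illustrative, not captured output:

```bash
# Counts managers (non-empty manager status) and workers (empty) from stdin.
count_roles() {
  awk -F'|' '
    $3 != "" { managers++ }
    $3 == "" { workers++ }
    END { printf "managers=%d workers=%d\n", managers, workers }
  '
}

# Illustrative sample matching the topology in this document.
printf '%s|%s|%s\n' \
  iklim-app-01 Ready Leader \
  iklim-app-02 Ready Reachable \
  iklim-app-03 Ready Reachable \
  iklim-db-01 Ready '' \
  iklim-db-02 Ready '' \
  iklim-db-03 Ready '' | count_roles
```

For the sample above the helper prints `managers=3 workers=3`; any other result on the real cluster suggests a node has not joined or joined with the wrong token.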
## Step 7 — Confirm `init/swarm-init.sh` multi-node awareness

The script is idempotent (it skips init if the Swarm is already active). Verify:

```bash
grep -n "swarm init\|swarm join" init/swarm-init.sh
```

The prod pipeline runs on iklim-app-01 only. iklim-app-02/03 are joined via Ansible (`swarm` role), not via the Gitea pipeline.

## Placement constraints used in `docker-stack-infra.yml`

| Constraint | Resolves to |
|------------|-------------|
| `node.role == manager` | iklim-app-01, iklim-app-02, iklim-app-03 |
| `node.labels.type == service` | iklim-app-01, iklim-app-02, iklim-app-03 |
| `node.labels.role == db` | iklim-db-01, iklim-db-02, iklim-db-03 |

- SWAG, Vault, cert-reloader: pinned to `node.role == manager`.
- Microservices: no constraint (distributed across all app nodes by Swarm scheduler).
- DB services (Patroni, etcd, MongoDB): pinned to `node.labels.role == db` in separate DB stacks.

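How a label constraint resolves to a set of nodes can be sketched as a filter over node labels. The helper `nodes_with_label` below is hypothetical; it reads `hostname key=value` lines, and the sample data mirrors the labeling plan in this document rather than real `docker node inspect` output:

```bash
# Prints hostnames whose label list contains the given key=value pair.
nodes_with_label() {
  awk -v want="$1" '{
    for (i = 2; i <= NF; i++) if ($i == want) { print $1; break }
  }'
}

# Sample data taken from the node labeling plan (not live cluster output).
labels='iklim-app-01 type=service
iklim-app-02 type=service
iklim-app-03 type=service
iklim-db-01 role=db
iklim-db-02 role=db
iklim-db-03 role=db'

echo "$labels" | nodes_with_label role=db
```

For the sample data this lists the three `iklim-db-XX` hostnames, matching the last row of the table.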