# 01 — Docker Swarm Init (Prod — Multi-Node)

## Context

- **Repo:** `iklim.co` root
- **Environment:** prod
- **Topology:**
  - 3 × service nodes — all act as **Swarm managers AND app workers** (Raft quorum: 1 can fail)
  - 3 × DB nodes — **NOT part of Docker Swarm** (separate DB cluster, out of scope)
  - All 6 nodes are in the same private network.
- Pipeline trigger: push to `prod-env` branch → Gitea runner on `prod-runner` (first service node).
- Swarm has 3 nodes total; all are manager-eligible and carry workloads (no dedicated worker-only nodes).

## Node labeling plan

| Node | Role | Swarm role | Labels |
|------|------|------------|--------|
| service-1 | API services, SWAG, Vault | Manager + Worker | `type=service` |
| service-2 | API service replicas | Manager + Worker | `type=service` |
| service-3 | API service replicas | Manager + Worker | `type=service` |

> DB nodes (`db-1/2/3`) are **not part of Docker Swarm**. They run as a separate cluster
> and are provisioned independently. No Swarm join or label step applies to them.

## Step 1 — Init Swarm on service-1 (the prod-runner node)

```bash
MANAGER_IP=$(hostname -I | awk '{print $1}')

# -x matches the whole line: a bare grep for "active" would also match "inactive"
if ! docker info --format '{{.Swarm.LocalNodeState}}' | grep -qx "active"; then
  docker swarm init --advertise-addr "$MANAGER_IP"
  echo "✅ Swarm initialized on $MANAGER_IP"
else
  echo "ℹ️ Swarm already active"
fi
```
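On a multi-homed host, `hostname -I` lists every address, so field `$1` is not guaranteed to be the private-network one. A hedged sketch that selects by subnet prefix instead (the `10.` prefix and the fallback address are assumptions; substitute your actual private subnet):

```bash
# Pick the private address by prefix rather than trusting field $1.
# "10." is an assumed private-subnet prefix; 192.0.2.10 is a documentation
# placeholder used only as a fallback when `hostname -I` is unavailable.
ADDRS=$(hostname -I 2>/dev/null || echo "192.0.2.10")
MANAGER_IP=$(echo "$ADDRS" | tr ' ' '\n' | grep -m1 '^10\.' || echo "$ADDRS" | awk '{print $1}')
echo "Candidate advertise address: $MANAGER_IP"
```

Pinning `--advertise-addr` to the private subnet keeps Raft and overlay traffic off any public interface.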
## Step 2 — Get manager join token

```bash
docker swarm join-token manager   # for service-2, service-3
```

Save this token — needed on service-2 and service-3.
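For scripting, the token can be captured non-interactively: `docker swarm join-token -q manager` prints only the token string. A hedged sketch, guarded so it degrades gracefully when run off-cluster:

```bash
# Capture the manager join token for use in automation. The guard checks
# that docker exists and this node is an active Swarm member first.
if command -v docker >/dev/null 2>&1 \
   && [ "$(docker info --format '{{.Swarm.LocalNodeState}}' 2>/dev/null)" = "active" ]; then
  MANAGER_TOKEN=$(docker swarm join-token -q manager)
else
  MANAGER_TOKEN=""
  echo "Not on an active Swarm manager; run this on service-1 after Step 1" >&2
fi
```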
## Step 3 — Join service-2 and service-3 as managers

SSH into service-2 and service-3, run:

```bash
docker swarm join --token <MANAGER_TOKEN> <service-1-ip>:2377
```
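The join can reuse the same idempotency guard as Step 1. A hedged dry-run sketch (it prints the command instead of executing it; `<MANAGER_TOKEN>` and `<service-1-ip>` are the placeholders from above):

```bash
# Dry-run join wrapper: skips if this node is already in a Swarm.
MANAGER_TOKEN="${MANAGER_TOKEN:-<MANAGER_TOKEN>}"
SERVICE1_IP="${SERVICE1_IP:-<service-1-ip>}"
JOIN_CMD="docker swarm join --token $MANAGER_TOKEN $SERVICE1_IP:2377"

state=$(docker info --format '{{.Swarm.LocalNodeState}}' 2>/dev/null || echo unknown)
if [ "$state" = "active" ]; then
  echo "Already in a Swarm; nothing to do"
else
  echo "Would run: $JOIN_CMD"   # swap the echo for the real command once verified
fi
```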
## Step 4 — Label all Swarm nodes

On service-1, after service-2 and service-3 have joined:

```bash
for node in service-1 service-2 service-3; do
  docker node update --label-add type=service "$node"
done
```

> Replace `service-1`, etc. with actual node hostnames shown in `docker node ls`.
> DB nodes are not in Swarm — no join or label step for them.
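To see the loop's effect across all nodes at once, a hedged sketch using documented `docker node` template fields, guarded so it no-ops off-cluster:

```bash
# Print each node's hostname alongside its labels.
if command -v docker >/dev/null 2>&1; then
  NODES=$(docker node ls --format '{{.Hostname}}' 2>/dev/null || true)
else
  NODES=""
fi
for n in $NODES; do
  docker node inspect "$n" --format '{{.Description.Hostname}}: {{.Spec.Labels}}'
done
[ -n "$NODES" ] || echo "No Swarm nodes visible from here; run this on service-1"
```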
## Step 5 — Verify

```bash
docker node ls
```

Expected: 3 nodes, all with `MANAGER STATUS` = `Leader` or `Reachable`.
All 3 nodes remain in `AVAILABILITY=Active` (not drained) so they also carry workloads.

```bash
docker node inspect service-1 --format '{{.Spec.Labels}}'
```

Expected: `map[type:service]`.
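The manager count can also be checked mechanically. A hedged sketch: with a 3-manager Raft cluster, anything under 3 reachable managers means reduced failure tolerance (at 2, no further manager may fail):

```bash
# Count reachable managers via the documented ManagerStatus format field.
if command -v docker >/dev/null 2>&1; then
  managers=$(docker node ls --format '{{.ManagerStatus}}' 2>/dev/null | grep -cE 'Leader|Reachable' || true)
else
  managers=0
fi
managers=${managers:-0}
if [ "$managers" -eq 3 ]; then
  echo "Quorum OK: 3/3 managers reachable"
else
  echo "Check quorum: $managers/3 managers reachable (run on service-1)"
fi
```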
## Step 6 — Confirm `init/swarm-init.sh` multi-node awareness

The script is idempotent (skips init if already active). Verify:

```bash
grep -n "swarm init\|swarm join" init/swarm-init.sh
```

The prod pipeline runs on service-1 only. service-2/3 are joined via Ansible (`swarm` role),
not via the Gitea pipeline.

## Placement constraints used in `docker-stack-infra.yml`

| Constraint | Resolves to |
|------------|-------------|
| `node.role == manager` | service-1, service-2, service-3 |
| `node.labels.type == service` | service-1, service-2, service-3 |

SWAG, Vault, cert-reloader: pinned to `node.role == manager`.
Microservices: no constraint (distributed across all 3 service nodes by Swarm scheduler).

> `node.labels.type == db` constraint is **not used** — DB nodes are not in Swarm.
> PostgreSQL and MongoDB run outside Swarm as a separately managed cluster.
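A minimal sketch of how these constraints might appear in the stack file, assuming Compose file format 3.x; service names and image references are illustrative placeholders, not the repo's actual values:

```yaml
# Hypothetical excerpt in the shape of docker-stack-infra.yml.
version: "3.8"

services:
  swag:
    image: lscr.io/linuxserver/swag:latest        # placeholder image/tag
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager                  # pins to service-1/2/3

  some-api:
    image: registry.example.com/iklim/api:latest  # placeholder
    deploy:
      replicas: 3                                 # spread across the 3 service nodes
      # no placement constraint: the scheduler may use any service node
```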