Environment_Infrastructure/roadmap/prod-env/01-swarm-init-multinode.md
Murat ÖZDEMİR bf8f011e43 Restructure setup documentation and refine environment bootstrapping
This commit introduces a reordered and renumbered set of setup documentation files to better reflect the deployment stages for both test and production environments.

Key changes include:
*   A new `setup-vs-roadmap-map.md` file to provide a clear mapping between roadmap tasks and their corresponding setup phases.
*   Significantly expanded Ansible bootstrap documentation for both test and production, detailing Docker, Swarm, security hardening, and StorageBox SSH key management roles.
*   Formalized database Docker and Swarm cluster setup instructions for test and production, including explicit steps for Swarm worker integration of DB nodes.
*   Updated roadmap documentation (`roadmap/prod-env/*`) to align with the refined setup, incorporating correct private IP addresses for Swarm joins, new node labels, and floating IP usage for GoDaddy DNS records.
2026-05-11 17:47:30 +03:00


# 01 — Docker Swarm Init (Prod — Multi-Node)
## Context
- **Repo:** `iklim.co` root
- **Environment:** prod
- **Topology:**
  - 3 × app nodes (`iklim-app-01/02/03`): all act as **Swarm managers and app workers** (Raft quorum: 1 can fail)
  - 3 × DB nodes (`iklim-db-01/02/03`): join Swarm as **workers** with the `role=db` label; DB services are placed exclusively on them
- **Sizing:** app nodes are `cpx42`, DB nodes are `cpx32`; see `../../hetzner-sizing-report.md`
- All 6 nodes are in the same private network.
- Pipeline trigger: push to `prod-env` branch → Gitea runner on `prod-runner` (first app node).
- App Swarm managers: all 3 app nodes are manager-eligible and also carry app workloads (there are no dedicated worker-only app nodes).
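
The "1 can fail" figure follows from Raft majority arithmetic. A minimal sketch of that calculation, where `raft_quorum` and `raft_tolerated_failures` are hypothetical helper names used only for illustration:

```bash
# Raft needs a majority of managers (floor(N/2) + 1) to stay available,
# so a cluster of N managers tolerates floor((N - 1) / 2) manager failures.
raft_quorum() { echo $(( $1 / 2 + 1 )); }
raft_tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

raft_quorum 3              # prints 2
raft_tolerated_failures 3  # prints 1
```

With 3 managers, losing a second manager drops the cluster below quorum, at which point Swarm management operations stop working until quorum is restored.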
## Node labeling plan
| Node | Role | Swarm role | Labels |
|------|------|------------|--------|
| `iklim-app-01` | API services, SWAG, Vault | Manager + Worker | `type=service` |
| `iklim-app-02` | API services replicas | Manager + Worker | `type=service` |
| `iklim-app-03` | API services replicas | Manager + Worker | `type=service` |
| `iklim-db-01` | PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-02` | PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-03` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |
## Step 1 — Init Swarm on iklim-app-01 (the prod-runner node)
```bash
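# Assumes the first address reported by `hostname -I` is the node's private IP.
MANAGER_IP=$(hostname -I | awk '{print $1}')
# Compare exactly: a plain `grep -q "active"` would also match "inactive".
if [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" != "active" ]; then
    docker swarm init --advertise-addr "$MANAGER_IP"
    echo "Swarm initialized on $MANAGER_IP"
else
    echo "Swarm already active"
fi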
```
## Step 2 — Get manager join token
```bash
docker swarm join-token manager # for iklim-app-02, iklim-app-03
```
Save this token; it is needed on iklim-app-02 and iklim-app-03.
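
For scripted use it may be easier to capture only the token rather than copy it from the full help output. A small sketch, assuming it runs on a manager node; `get_join_token` is a hypothetical helper name, and `-q` is the quiet flag of `docker swarm join-token`:

```bash
# Print only the join token for the given role ("manager" or "worker").
get_join_token() {
    local role="$1"
    case "$role" in
        manager|worker) docker swarm join-token -q "$role" ;;
        *) echo "usage: get_join_token manager|worker" >&2; return 1 ;;
    esac
}

# Usage on iklim-app-01:
#   MANAGER_TOKEN=$(get_join_token manager)
```

Treat the captured value as a secret: anyone holding a manager token and network access to port 2377 can join the cluster as a manager.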
## Step 3 — Join iklim-app-02 and iklim-app-03 as managers
SSH into iklim-app-02 and iklim-app-03 and run the following (`10.20.10.11` is the private address of the existing manager):
```bash
docker swarm join --token <MANAGER_TOKEN> 10.20.10.11:2377
```
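
If the join is run by hand more than once, a guard avoids the error Docker returns when the node is already a swarm member. A minimal sketch, assuming the token and manager address are passed in as arguments; `join_swarm_if_needed` is a hypothetical helper name:

```bash
# Join the swarm only when this node is not already an active member.
join_swarm_if_needed() {
    local token="$1" manager_addr="$2"
    local state
    state=$(docker info --format '{{.Swarm.LocalNodeState}}')
    # Exact comparison, so "inactive" is not mistaken for "active".
    if [ "$state" = "active" ]; then
        echo "Already part of a swarm"
    else
        docker swarm join --token "$token" "$manager_addr"
    fi
}

# Usage:
#   join_swarm_if_needed "$MANAGER_TOKEN" 10.20.10.11:2377
```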
## Step 4 — Label app nodes
On iklim-app-01, after iklim-app-02 and iklim-app-03 have joined:
```bash
for node in iklim-app-01 iklim-app-02 iklim-app-03; do
    docker node update --label-add type=service "$node"
done
```
## Step 5 — Join DB nodes as Swarm workers
Get the worker join token on iklim-app-01:
```bash
docker swarm join-token worker
```
SSH into each DB node and join:
```bash
docker swarm join --token <WORKER_TOKEN> 10.20.10.11:2377
```
Then label them on iklim-app-01:
```bash
for node in iklim-db-01 iklim-db-02 iklim-db-03; do
    docker node update --label-add role=db "$node"
done
```
> DB nodes are Swarm **workers** only — they never become managers.
> DB services are pinned to them via `node.labels.role == db` placement constraint.
> See `08-prod-db-cluster-kurulum.md` for DB stack deployment.
## Step 6 — Verify
```bash
docker node ls
```
Expected: 6 nodes, all with `STATUS` = `Ready`. The 3 app nodes show `MANAGER STATUS` = `Leader` (exactly one) or `Reachable`; the 3 DB nodes have an empty `MANAGER STATUS`.
```bash
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
```
Expected: `map[type:service]` for app nodes, `map[role:db]` for DB nodes.
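
The node-count check can also be scripted instead of read by eye. A rough sketch, assuming the 3 manager + 3 worker layout described in this document; `check_topology` is a hypothetical helper that reads `hostname manager-status` lines on stdin:

```bash
# Count healthy managers and plain workers; fail unless there are 3 of each.
check_topology() {
    awk '
        # Managers report Leader or Reachable in the second field.
        $2 == "Leader" || $2 == "Reachable" { managers++; next }
        # Workers have an empty manager status, leaving only the hostname.
        NF == 1 { workers++ }
        END {
            printf "managers=%d workers=%d\n", managers, workers
            exit !(managers == 3 && workers == 3)
        }'
}

# Usage on a manager node:
#   docker node ls --format '{{.Hostname}} {{.ManagerStatus}}' | check_topology
```

A manager in the `Unreachable` state is counted as neither, so the check fails and points at a quorum problem.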
## Step 7 — Confirm `init/swarm-init.sh` multi-node awareness
The script is idempotent (skips init if already active). Verify:
```bash
grep -n "swarm init\|swarm join" init/swarm-init.sh
```
The prod pipeline runs on iklim-app-01 only. iklim-app-02/03 are joined via Ansible (`swarm` role),
not via the Gitea pipeline.
## Placement constraints used in `docker-stack-infra.yml`
| Constraint | Resolves to |
|------------|-------------|
| `node.role == manager` | iklim-app-01, iklim-app-02, iklim-app-03 |
| `node.labels.type == service` | iklim-app-01, iklim-app-02, iklim-app-03 |
| `node.labels.role == db` | iklim-db-01, iklim-db-02, iklim-db-03 |
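
- SWAG, Vault, cert-reloader: pinned to `node.role == manager`.
- Microservices: no constraint. The Swarm scheduler spreads them across eligible nodes; because an unconstrained service can also be scheduled onto the DB workers, add `node.labels.type == service` if they must stay on the app nodes.
- DB services (Patroni, etcd, MongoDB): pinned to `node.labels.role == db` in separate DB stacks.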
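
To confirm which nodes a label constraint resolves to, the node list can be filtered by label. A short sketch, assuming it runs on a manager node; `nodes_with_label` is a hypothetical helper name, and `node.label` is the `docker node ls` filter for labels set with `docker node update --label-add`:

```bash
# List the hostnames of nodes carrying the given node label (key=value).
nodes_with_label() {
    docker node ls --filter "node.label=$1" --format '{{.Hostname}}'
}

# Usage:
#   nodes_with_label role=db        # should list the three iklim-db-* nodes
#   nodes_with_label type=service   # should list the three iklim-app-* nodes
```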