Murat ÖZDEMİR 76f87aa2f9 Integrate DB nodes into Swarm and refine prod service deployment
- Database nodes now join the Docker Swarm as workers with `role=db` labels, allowing Swarm to manage their dedicated services.
- The `docker-stack-infra.yml` has been updated for production to focus solely on application-level infrastructure components.
- Dedicated database services (PostgreSQL, MongoDB, Patroni-etcd) are now explicitly deployed in separate Swarm stacks on `iklim-db-XX` nodes.
- Standardizes node naming conventions (`iklim-app-XX`, `iklim-db-XX`) across the production roadmap documentation.
- Clarifies that the `etcd` service within `docker-stack-infra.yml` is exclusively for APISIX configuration, distinct from Patroni's etcd cluster.
2026-05-11 14:53:21 +03:00

# 07 — Vault: Initial Single Instance + Raft Cluster Migration Plan (Prod)
## Context
Vault starts as a single instance on the manager node (iklim-app-01) for the initial prod launch.
This matches the current `docker-stack-infra.yml` configuration (file storage, single replica).
A Raft HA cluster is planned for a later phase.
## Phase 1 — Initial prod launch (current)
- **Replicas:** 1
- **Storage:** file (`/vault/file`) on iklim-app-01
- **Placement:** `node.role == manager` (iklim-app-01)
- **Cert:** from `/opt/iklimco/ssl/` (populated by cert-reloader from SWAG volume)
- **TLS:** `VAULT_LOCAL_CONFIG` unchanged — `api_addr: https://vault.iklim.co:8200`

No changes to the `vault` service in `docker-stack-infra.yml` are required for Phase 1.
## Phase 2 — Vault Raft Cluster (future)
### What changes
- **Replicas:** 3 (one per service node)
- **Storage:** Raft integrated (replaces file storage)
- **Placement:** `node.labels.type == service` (all 3 service nodes)
- **Cert distribution:** cert-reloader SSH-copies renewed cert to iklim-app-02, iklim-app-03
### Prerequisites before migration
- [ ] All 3 service nodes are running and labeled `type=service`
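- [ ] Vault data backed up from Phase 1 (Phase 1 uses file storage, so archive the directory behind `/vault/file` while Vault is stopped; `vault operator raft snapshot save` only works once Raft storage is active)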
- [ ] SSH key created for cert-reloader to reach iklim-app-02 and iklim-app-03
- [ ] SSH key stored as Docker secret `cert_reloader_ssh_key`
- [ ] `/opt/iklimco/ssl/` directory exists on iklim-app-02 and iklim-app-03
- [ ] Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes)
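The directory and backup items in the checklist above can be scripted. The following is a minimal sketch, not part of the existing tooling: `missing_dirs` and `backup_file_storage` are hypothetical helper names, and the host directory that backs `/vault/file` in Phase 1 is not stated in this plan, so it is passed in as a parameter.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every path that is not an existing directory.
# Returns non-zero if at least one path is missing.
missing_dirs() {
  local rc=0 dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      printf '%s\n' "$dir"
      rc=1
    fi
  done
  return "$rc"
}

# Hypothetical helper: archive a file-storage directory into a tarball.
# Assumes Vault is stopped so the files on disk are consistent.
#   $1 = storage directory on the host, $2 = target .tgz file
backup_file_storage() {
  local src="$1" dest="$2"
  tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
}
```

On each node, `missing_dirs /opt/iklimco/ssl /opt/iklimco/vault/data` should print nothing once the prerequisites are met.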
### Vault service update for Raft
```yaml
vault:
  # ... (image, secrets, healthcheck unchanged)
  environment:
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
      "cluster_addr":"https://{{ .Node.Hostname }}:8201",
      "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
      "listener":[{"tcp":{"address":"0.0.0.0:8200",
      "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
      "tls_key_file":"/vault/certs/STAR.iklim.co_key.txt"}}],
      "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file  # host path per node
    - /opt/iklimco/ssl:/vault/certs:ro
  deploy:
    mode: replicated
    replicas: 3
    placement:
      # Without this cap Swarm may schedule two replicas onto the same node
      max_replicas_per_node: 1
      constraints:
        - node.labels.type == service
```
> `{{ .Node.Hostname }}` is a Docker Swarm Go template placeholder that expands to the hostname
> of the node running the task, so each Vault instance gets a unique `node_id` and `cluster_addr`.
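Because Phase 1 keeps its data in file storage, that data has to be converted before the Raft-configured service can use it. One possible route is Vault's offline `vault operator migrate` command. The sketch below only renders a migration config; `render_migrate_config` is a hypothetical helper, and the paths, node ID, and cluster address in the usage line are assumptions to adapt. The source and destination are assumed to be separate directories.

```bash
#!/usr/bin/env bash
# Sketch: emit a config file for `vault operator migrate`.
#   $1 = directory holding the Phase 1 file storage (source)
#   $2 = directory that will hold the Raft data (destination)
#   $3 = Raft node_id for this node
#   $4 = cluster address of this node
render_migrate_config() {
  cat <<EOF
storage_source "file" {
  path = "$1"
}

storage_destination "raft" {
  path    = "$2"
  node_id = "$3"
}

cluster_addr = "$4"
EOF
}
```

With Vault stopped, a run could look like `render_migrate_config <phase1-dir> <raft-dir> iklim-app-01 https://iklim-app-01:8201 > migrate.hcl`, followed by `vault operator migrate -config=migrate.hcl`. The node ID should match the value that `{{ .Node.Hostname }}` produces on that node.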
### Raft join procedure (after deploying 3-replica Vault)
Only the leader needs to be bootstrapped; others join via `vault operator raft join`:
```bash
# On the primary Vault (iklim-app-01 container):
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
# Unseal if needed
docker exec -it "$VAULT_CTR" vault operator unseal
# Check Raft peers
docker exec "$VAULT_CTR" vault operator raft list-peers
```
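On the iklim-app-02 and iklim-app-03 containers, join the cluster and then unseal the new node:
```bash
# The address must reach the API of the current leader
docker exec -it <vault-on-iklim-app-02> vault operator raft join \
  https://vault.iklim.co:8200
# A newly joined node starts sealed; unseal it with the existing unseal keys
docker exec -it <vault-on-iklim-app-02> vault operator unseal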
```
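After both joins, the peer list on the leader should show three voters. As a rough check, the output of `vault operator raft list-peers` can be counted with a small filter. This is a sketch that assumes the default table layout (a header row, a dashed separator row, then `Node`, `Address`, `State`, `Voter` columns); `count_voters` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Count rows whose fourth column (Voter) is "true".
# Skips the first two lines (header and dashed separator).
count_voters() {
  awk 'NR > 2 && $4 == "true" { n++ } END { print n + 0 }'
}
```

For example, `docker exec "$VAULT_CTR" vault operator raft list-peers | count_voters` is expected to print `3` once the cluster is complete.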
### cert-reloader update for Raft
Update the cert-reloader command in `docker-stack-infra.yml` to SSH-copy the cert
to iklim-app-02 and iklim-app-03 after renewal:
```bash
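# After copying to local /opt/iklimco/ssl/:
ssh -i /run/secrets/cert_reloader_ssh_key iklim-app-02 \
  "cat > /opt/iklimco/ssl/STAR.iklim.co.full.crt" < /opt/iklimco/ssl/STAR.iklim.co.full.crt
# (repeat for iklim-app-03 and for the private key, STAR.iklim.co_key.txt)
# Restart the Vault tasks so they load the renewed cert;
# each restarted node comes up sealed unless auto-unseal is configured
docker service update --force iklimco_vault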
```
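The "repeat for each node and file" step can be folded into a loop. The sketch below is one possible shape, not the actual cert-reloader command: `push_file` and the overridable `SSH_CMD` variable are hypothetical names, and the remote path is assumed to be identical to the local one.

```bash
#!/usr/bin/env bash
# SSH_CMD is overridable so the loop can be exercised without a network.
SSH_CMD="${SSH_CMD:-ssh -i /run/secrets/cert_reloader_ssh_key}"

# Copy one file to the same path on every listed host.
#   $1 = file path, remaining arguments = hostnames
push_file() {
  local file="$1" host
  shift
  for host in "$@"; do
    # `cat > path` on the remote side writes stdin to the target file
    $SSH_CMD "$host" "cat > '$file'" < "$file" || return 1
  done
}
```

Usage would be `push_file /opt/iklimco/ssl/STAR.iklim.co.full.crt iklim-app-02 iklim-app-03`, then the same call for `STAR.iklim.co_key.txt`, before forcing the service update. Stopping at the first failed copy keeps Vault from being restarted with certs that differ between nodes.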
Add the Docker secret to the cert-reloader service:
```yaml
secrets:
- cert_reloader_ssh_key
```
## Reference
- Vault Raft storage docs: https://developer.hashicorp.com/vault/docs/configuration/storage/raft
- Vault Swarm setup: https://manjit28.medium.com/setting-up-a-secure-and-highly-available-hashicorp-vault-cluster-for-secrets-and-certificates-0ce01a370582