07 — Vault: Initial Single Instance + Raft Cluster Migration Plan (Prod)
Context
Vault starts as a single instance on the manager node (iklim-app-01) for the initial prod launch.
This matches the current docker-stack-infra.yml configuration (file storage, single replica).
A Raft HA cluster is planned for a later phase.
Phase 1 — Initial prod launch (current)
- Replicas: 1
- Storage: file (/vault/file) on iklim-app-01
- Placement: node.role == manager (iklim-app-01)
- Cert: from /opt/iklimco/ssl/ (populated by cert-reloader from SWAG volume)
- TLS: VAULT_LOCAL_CONFIG unchanged, api_addr: https://vault.iklim.co:8200
No changes to docker-stack-infra.yml vault service for Phase 1.
Phase 2 — Vault Raft Cluster (future)
What changes
- Replicas: 3 (one per service node)
- Storage: Raft integrated (replaces file storage)
- Placement: node.labels.type == service (all 3 service nodes)
- Cert distribution: cert-reloader SSH-copies the renewed cert to iklim-app-02 and iklim-app-03
Prerequisites before migration
- All 3 service nodes are running and labeled type=service
- Vault data backed up from Phase 1. Phase 1 uses file storage, and vault operator raft snapshot save only works against Raft storage, so take a file-level copy of /vault/file while Vault is stopped; Raft snapshots become available after the migration
- SSH key created for cert-reloader to reach iklim-app-02 and iklim-app-03
- SSH key stored as Docker secret cert_reloader_ssh_key
- /opt/iklimco/ssl/ directory exists on iklim-app-02 and iklim-app-03
- Vault data directory /opt/iklimco/vault/data/ exists on all 3 nodes (host path volumes)
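The labeling, key, secret, and directory prerequisites above could be prepared from the manager roughly as follows. This is a minimal sketch, not the project's actual tooling: the KEY_FILE path is hypothetical, SSH access from the manager to the other nodes is assumed, and Swarm node names are assumed to equal the hostnames. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: prepare the Phase 2 prerequisites from the Swarm manager.
# Assumptions (not in the plan above): the KEY_FILE path, SSH access from
# the manager to the other nodes, and Swarm node names equal to hostnames.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands
KEY_FILE="${KEY_FILE:-/root/.ssh/cert_reloader_ed25519}"   # hypothetical path
ALL_NODES="iklim-app-01 iklim-app-02 iklim-app-03"
REMOTE_NODES="iklim-app-02 iklim-app-03"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

prepare_prerequisites() {
  # Label every service node so the placement constraint matches.
  for node in $ALL_NODES; do
    run docker node update --label-add type=service "$node"
  done
  # Dedicated key pair for cert-reloader, stored as a Docker secret.
  run ssh-keygen -t ed25519 -N "" -f "$KEY_FILE"
  run docker secret create cert_reloader_ssh_key "$KEY_FILE"
  # Host directories behind the bind mounts, locally and on the other nodes.
  run mkdir -p /opt/iklimco/ssl /opt/iklimco/vault/data
  for node in $REMOTE_NODES; do
    run ssh "$node" mkdir -p /opt/iklimco/ssl /opt/iklimco/vault/data
  done
}

prepare_prerequisites
```

Running it with DRY_RUN=0 executes the commands. The public half of the key still has to be added to authorized_keys on iklim-app-02 and iklim-app-03 (for example with ssh-copy-id) before cert-reloader can use it.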
Vault service update for Raft
vault:
  # ... (image, secrets, healthcheck unchanged)
  environment:
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
      "cluster_addr":"https://{{ .Node.Hostname }}:8201",
      "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
      "listener":[{"tcp":{"address":"0.0.0.0:8200",
      "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
      "tls_key_file":"/vault/certs/STAR.iklim.co_key.txt"}}],
      "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file  # host path per node
    - /opt/iklimco/ssl:/vault/certs:ro
  deploy:
    mode: replicated
    replicas: 3
    placement:
      # Enforce one Vault per node so replicas never share a host data directory
      # (needs compose file format 3.8 or later).
      max_replicas_per_node: 1
      constraints:
        - node.labels.type == service
{{ .Node.Hostname }} is Docker Swarm's Go template for the node hostname; it gives each Vault instance a unique node_id.
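To see what each replica ends up with, the expansion can be emulated locally. In this sketch sed stands in for Swarm's template engine and only the two templated settings are shown, so it illustrates the result rather than how Swarm renders it.

```shell
#!/bin/sh
# Sketch: emulate how {{ .Node.Hostname }} expands for each replica.
# sed stands in for Swarm's template engine; illustrative only.
set -eu

TEMPLATE='"cluster_addr":"https://{{ .Node.Hostname }}:8201","storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}}'

# Replace every occurrence of the placeholder with the given hostname.
render_for_node() {
  printf '%s\n' "$TEMPLATE" | sed "s/{{ \.Node\.Hostname }}/$1/g"
}

for node in iklim-app-01 iklim-app-02 iklim-app-03; do
  render_for_node "$node"
done
```

Each output line carries a different node_id and cluster_addr, which is what Raft needs to tell the three peers apart.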
Raft join procedure (after deploying 3-replica Vault)
Only the leader needs to be bootstrapped; others join via vault operator raft join:
# On the primary Vault (iklim-app-01 container):
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
# Unseal if needed
docker exec -it "$VAULT_CTR" vault operator unseal
# Check Raft peers
docker exec "$VAULT_CTR" vault operator raft list-peers
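On the iklim-app-02 and iklim-app-03 containers (a joined node stays sealed until it is unsealed with the same unseal keys):
docker exec -it <vault-on-iklim-app-02> vault operator raft join \
  https://vault.iklim.co:8200
docker exec -it <vault-on-iklim-app-02> vault operator unseal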
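The follower steps can be driven from the manager in one loop. This is a sketch under assumptions: SSH access from the manager to each node, the same iklimco_vault container name filter as above, and vault.iklim.co:8200 reaching the current leader. It prints the commands by default instead of running them.

```shell
#!/bin/sh
# Sketch: join both follower nodes to the Raft cluster and unseal them.
# Assumptions: SSH access from the manager, the iklimco_vault name filter,
# and vault.iklim.co:8200 reaching the current leader.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands
LEADER_API="https://vault.iklim.co:8200"
FOLLOWERS="iklim-app-02 iklim-app-03"
# Kept literal here; the remote shell expands it to that node's container ID.
REMOTE_CTR='$(docker ps -q -f name=iklimco_vault)'

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

join_followers() {
  for node in $FOLLOWERS; do
    # -t allocates a terminal, which docker exec -it needs.
    run ssh -t "$node" "docker exec -it $REMOTE_CTR vault operator raft join $LEADER_API"
    # A joined node stays sealed until it receives the unseal keys.
    run ssh -t "$node" "docker exec -it $REMOTE_CTR vault operator unseal"
  done
}

join_followers
```

The unseal command prompts for one key share per invocation, so with DRY_RUN=0 it has to be repeated on each node until the unseal threshold is reached. vault operator raft list-peers on the leader should then show all three nodes.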
cert-reloader update for Raft
Update the cert-reloader command in docker-stack-infra.yml to SSH-copy the cert
to iklim-app-02 and iklim-app-03 after renewal:
# After copying to local /opt/iklimco/ssl/:
ssh -i /run/secrets/cert_reloader_ssh_key iklim-app-02 \
  "cp /dev/stdin /opt/iklimco/ssl/STAR.iklim.co.full.crt" < /opt/iklimco/ssl/STAR.iklim.co.full.crt
# (repeat for iklim-app-03 and privkey)
docker service update --force iklimco_vault
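The per-node, per-file repetition can be folded into a loop. The sketch below is one possible shape, not the actual cert-reloader command: it assumes the key file is named as in tls_key_file above, and it swaps the cp /dev/stdin approach for a write-then-rename so a reader never sees a half-written file. By default it only prints what it would copy.

```shell
#!/bin/sh
# Sketch: push the renewed certificate and key to the other Vault nodes.
# Assumption: the key file name matches tls_key_file in the Vault config.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only describe the actions
SSH_KEY=/run/secrets/cert_reloader_ssh_key
SSL_DIR=/opt/iklimco/ssl
TARGETS="iklim-app-02 iklim-app-03"
FILES="STAR.iklim.co.full.crt STAR.iklim.co_key.txt"

# Copy one file to one node over SSH.
copy_file() {
  dest_node="$1"
  name="$2"
  if [ "$DRY_RUN" = "1" ]; then
    printf 'copy %s to %s:%s\n' "$SSL_DIR/$name" "$dest_node" "$SSL_DIR/$name"
  else
    # Write to a temporary name, then rename into place.
    ssh -i "$SSH_KEY" "$dest_node" \
      "cat > '$SSL_DIR/$name.tmp' && mv '$SSL_DIR/$name.tmp' '$SSL_DIR/$name'" \
      < "$SSL_DIR/$name"
  fi
}

distribute_certs() {
  for node in $TARGETS; do
    for file in $FILES; do
      copy_file "$node" "$file"
    done
  done
}

distribute_certs
```

With DRY_RUN=0 the copies run for real; the forced service update shown above would still follow as the last step.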
Add Docker secret to cert-reloader:
secrets:
  - cert_reloader_ssh_key