07 — Vault: Initial Single Instance + Raft Cluster Migration Plan (Prod)
Context
Vault starts as a single instance on the manager node (service-1) for the initial prod launch.
This matches the current docker-stack-infra.yml configuration (file storage, single replica).
A Raft HA cluster is planned for a later phase (Phase 2 below).
Phase 1 — Initial prod launch (current)
- Replicas: 1
- Storage: file (`/vault/file`) on service-1
- Placement: `node.role == manager` (service-1)
- Cert: from `/opt/iklimco/ssl/` (populated by cert-reloader from the SWAG volume)
- TLS: `VAULT_LOCAL_CONFIG` unchanged (`api_addr: https://vault.iklim.co:8200`)
Phase 1 requires no changes to the vault service in docker-stack-infra.yml.
Phase 2 — Vault Raft Cluster (future)
What changes
- Replicas: 3 (one per service node)
- Storage: integrated Raft storage (replaces file storage)
- Placement: `node.labels.type == service` (all 3 service nodes)
- Cert distribution: cert-reloader SSH-copies the renewed cert to service-2 and service-3
Prerequisites before migration
- All 3 service nodes are running and labeled `type=service`
- Vault data backed up from Phase 1. `vault operator raft snapshot save` only works once Raft is the storage backend, so for the Phase 1 file backend stop Vault and archive the `/vault/file` data, then move it into Raft with `vault operator migrate`
- SSH key created for cert-reloader to reach service-2 and service-3
- SSH key stored as Docker secret `cert_reloader_ssh_key`
- `/opt/iklimco/ssl/` directory exists on service-2 and service-3
- Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes)
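The prerequisites above can be scripted roughly as follows. This is a minimal sketch, not part of the current stack: the function names are hypothetical, it assumes it runs on service-1 (the manager) with operator SSH access to all three nodes by hostname, and the host directory backing Phase 1's `/vault/file` is passed in as an argument because it depends on how the Phase 1 volume was defined.

```sh
# Hypothetical helpers for the Phase 2 prerequisites (run on service-1, the manager).
# Assumption: the operator can SSH to service-1..3 by hostname.
SERVICE_NODES="service-1 service-2 service-3"

# Label every service node so the `node.labels.type == service` constraint matches.
label_service_nodes() {
  local node
  for node in $SERVICE_NODES; do
    docker node update --label-add type=service "$node" || return 1
  done
}

# Create the cert and Raft data directories on every node
# (mkdir -p is a no-op where they already exist).
create_node_dirs() {
  local node
  for node in $SERVICE_NODES; do
    ssh "$node" "mkdir -p /opt/iklimco/ssl /opt/iklimco/vault/data" || return 1
  done
}

# Archive the Phase 1 file storage into the current directory and print the
# archive name. Stop Vault first (e.g. `docker service scale iklimco_vault=0`)
# so the files do not change while they are copied.
# $1: host directory that backs /vault/file in Phase 1 (depends on the setup)
backup_file_storage() {
  local src=$1 out
  out="vault-file-backup-$(date +%Y%m%d%H%M%S).tar.gz"
  tar -czf "$out" -C "$src" . || return 1
  echo "$out"
}

# Store the cert-reloader private key as a Docker secret.
# $1: path to the private key file (placeholder)
create_ssh_secret() {
  docker secret create cert_reloader_ssh_key "$1"
}
```

A typical order would be `label_service_nodes`, `create_node_dirs`, then `backup_file_storage <phase1-data-dir>` with Vault scaled to 0, and `create_ssh_secret <private-key-file>`; both arguments are placeholders.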
Vault service update for Raft
```yaml
vault:
  # ... (image, secrets, healthcheck unchanged)
  environment:
    # cluster_addr: each node's hostname must be reachable on 8201 from the other Vault tasks
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
      "cluster_addr":"https://{{ .Node.Hostname }}:8201",
      "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
      "listener":[{"tcp":{"address":"0.0.0.0:8200",
      "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
      "tls_key_file":"/vault/certs/STAR.iklim.co_key.txt"}}],
      "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file  # host path per node
    - /opt/iklimco/ssl:/vault/certs:ro
  deploy:
    mode: replicated
    replicas: 3
    placement:
      max_replicas_per_node: 1  # enforce one replica per node (compose file format 3.8+)
      constraints:
        - node.labels.type == service
```
`{{ .Node.Hostname }}` is a Docker Swarm service template that expands to the hostname of the node running the task, so each Vault instance gets a unique `node_id` and its own `cluster_addr`.
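Moving the Phase 1 data into Raft is a storage migration rather than a snapshot restore. The sketch below assumes Vault's `vault operator migrate` command is used for it; the helper name is hypothetical, and the paths are placeholders for wherever a one-off migration container mounts the Phase 1 data and `/opt/iklimco/vault/data`. The generated `node_id` and `cluster_addr` match what the template produces on service-1.

```sh
# Hypothetical helper: write a `vault operator migrate` config that copies the
# Phase 1 file storage into Raft storage for the first Raft node.
# $1: output file
# $2: path of the Phase 1 file data as seen by the migration container
# $3: path of the new Raft data dir as seen by the migration container
# $4: node id, default service-1 (must match {{ .Node.Hostname }} on that node)
write_migrate_config() {
  local out=$1 src=$2 dst=$3 node_id=${4:-service-1}
  cat > "$out" <<EOF
storage_source "file" {
  path = "$src"
}

storage_destination "raft" {
  path = "$dst"
  node_id = "$node_id"
}

# cluster_addr is required when the destination is Raft
cluster_addr = "https://$node_id:8201"
EOF
}
```

With Vault stopped, `write_migrate_config migrate.hcl <old-file-path> /vault/file` followed by `vault operator migrate -config=migrate.hcl` inside the migration container would populate service-1's Raft data before the 3-replica service is deployed.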
Raft join procedure (after deploying 3-replica Vault)
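Only the leader (service-1) needs to be bootstrapped: it is initialized, or started on the migrated data, and unsealed. service-2 and service-3 then join it with `vault operator raft join` and, because this config uses the default Shamir seal, each must be unsealed after joining.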
```sh
# On the primary Vault (service-1 container):
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
# Unseal if needed
docker exec -it "$VAULT_CTR" vault operator unseal
# Check Raft peers
docker exec "$VAULT_CTR" vault operator raft list-peers
```
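On service-2 and service-3, join from each node's local Vault container, then unseal it:
```sh
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
docker exec -it "$VAULT_CTR" vault operator raft join https://vault.iklim.co:8200
# A joined node starts sealed; unseal it with the cluster's unseal keys
docker exec -it "$VAULT_CTR" vault operator unseal
```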
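To confirm the joins worked, a check can wait until the local Vault reports all three peers. This is a sketch under assumptions: the function names are hypothetical, the `vault` CLI in the container must already have `VAULT_ADDR` and a token allowed to read the Raft configuration (`list-peers` is an authenticated call), and it parses the default table output of `vault operator raft list-peers` by skipping its two header rows.

```sh
# Hypothetical check: count Raft peers as reported by the local Vault container.
# Assumes VAULT_ADDR and VAULT_TOKEN are usable inside the container.
raft_peer_count() {
  local ctr
  ctr=$(docker ps -q -f name=iklimco_vault) || return 1
  # Default table output: a header row and a dashed row, then one row per peer.
  docker exec "$ctr" vault operator raft list-peers |
    awk 'NR > 2 && NF { n++ } END { print n + 0 }'
}

# Poll until at least $1 peers (default 3) are listed, trying $2 times
# (default 30) with 2 seconds between attempts.
wait_for_raft_peers() {
  local want=${1:-3} tries=${2:-30} i=1 n=0
  while [ "$i" -le "$tries" ]; do
    n=$(raft_peer_count)
    if [ "${n:-0}" -ge "$want" ]; then
      echo "raft peers: $n"
      return 0
    fi
    [ "$i" -lt "$tries" ] && sleep 2
    i=$((i + 1))
  done
  echo "timed out waiting for $want raft peers (have ${n:-0})" >&2
  return 1
}
```

Running `wait_for_raft_peers 3` on service-1 after both joins would print the peer count once service-2 and service-3 appear, or fail after about a minute.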
cert-reloader update for Raft
Update the cert-reloader command in docker-stack-infra.yml so that, after renewal, it also SSH-copies the cert and key to service-2 and service-3:
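```sh
# After copying to local /opt/iklimco/ssl/:
ssh -i /run/secrets/cert_reloader_ssh_key service-2 \
  "cp /dev/stdin /opt/iklimco/ssl/STAR.iklim.co.full.crt" < /opt/iklimco/ssl/STAR.iklim.co.full.crt
# (repeat for service-3 and for the private key, STAR.iklim.co_key.txt)
# Restart Vault so every replica loads the new cert. Restarted replicas come up
# sealed (no auto-unseal is configured) and must be unsealed again.
docker service update --force iklimco_vault
```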
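Add the Docker secret to cert-reloader with mode 0400, since ssh refuses a private key that other users can read and the default secret mode is 0444:
```yaml
secrets:
  - source: cert_reloader_ssh_key
    mode: 0400
```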
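The "repeat for service-3 and privkey" step can be written as a loop. A minimal sketch, assuming the cert-reloader shell can run it, that both remote hosts resolve as `service-2` and `service-3`, and that the key is accepted for a user allowed to write `/opt/iklimco/ssl/`; `push_certs` and the variable names are hypothetical, and `StrictHostKeyChecking=accept-new` needs OpenSSH 7.6 or later. Each file is written under a temporary name and then renamed, so Vault never reads a half-copied cert or key.

```sh
# Hypothetical cert-reloader helper: copy the renewed cert and key from the local
# SSL_DIR to the same path on the other service nodes.
SSL_DIR=/opt/iklimco/ssl
SSH_KEY=/run/secrets/cert_reloader_ssh_key
CERT_FILES="STAR.iklim.co.full.crt STAR.iklim.co_key.txt"
REMOTE_NODES="service-2 service-3"

push_certs() {
  local node f
  for node in $REMOTE_NODES; do
    for f in $CERT_FILES; do
      # BatchMode fails instead of prompting; accept-new records an unknown
      # host key on first contact but rejects a changed one.
      ssh -i "$SSH_KEY" -o BatchMode=yes -o StrictHostKeyChecking=accept-new "$node" \
        "cat > '$SSL_DIR/$f.tmp' && mv '$SSL_DIR/$f.tmp' '$SSL_DIR/$f'" \
        < "$SSL_DIR/$f" || return 1
    done
  done
}
```

In the cert-reloader command this would run as `push_certs && docker service update --force iklimco_vault`, so Vault is only restarted once every node has the new files.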