2026-05-09 16:26:06 +03:00

07 — Vault: Initial Single Instance + Raft Cluster Migration Plan (Prod)

Context

Vault starts as a single instance on the manager node (service-1) for the initial prod launch. This matches the current docker-stack-infra.yml configuration (file storage, single replica).

A Raft HA cluster is planned for a later phase (Phase 2 below).

Phase 1 — Initial prod launch (current)

  • Replicas: 1
  • Storage: file (/vault/file) on service-1
  • Placement: node.role == manager (service-1)
  • Cert: from /opt/iklimco/ssl/ (populated by cert-reloader from SWAG volume)
  • TLS: VAULT_LOCAL_CONFIG unchanged — api_addr: https://vault.iklim.co:8200

Phase 1 requires no changes to the vault service in docker-stack-infra.yml.

Phase 2 — Vault Raft Cluster (future)

What changes

  • Replicas: 3 (one per service node)
  • Storage: Raft integrated (replaces file storage)
  • Placement: node.labels.type == service (all 3 service nodes)
  • Cert distribution: cert-reloader SSH-copies renewed cert to service-2, service-3

Prerequisites before migration

  • All 3 service nodes are running and labeled type=service
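  • Vault data backed up from Phase 1. Phase 1 uses file storage, so vault operator raft snapshot save is not available yet: archive /vault/file while Vault is stopped, and use Raft snapshots only once the cluster runs on Raft storage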
  • SSH key created for cert-reloader to reach service-2 and service-3
  • SSH key stored as Docker secret cert_reloader_ssh_key
  • /opt/iklimco/ssl/ directory exists on service-2 and service-3
  • Vault data directory /opt/iklimco/vault/data/ exists on all 3 nodes (host path volumes)
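
The prerequisites above could be prepared with a script along these lines. This is a minimal sketch, not a tested procedure: it assumes it runs on service-1 with working SSH access to service-2 and service-3, that the Swarm node names equal the hostnames, and that KEY_FILE, DRY_RUN, and the helper names run and prepare_prereqs are hypothetical. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: run from service-1 (the manager). Prints commands unless DRY_RUN=0.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                          # 1 = print only, 0 = execute
KEY_FILE="${KEY_FILE:-./cert_reloader_ssh_key}"  # hypothetical local path for the new key

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

prepare_prereqs() {
  local node
  # Label all three service nodes so the placement constraint matches them.
  for node in service-1 service-2 service-3; do
    run docker node update --label-add type=service "$node"
  done
  # Host directories: locally on service-1, over SSH on the other two nodes.
  run mkdir -p /opt/iklimco/vault/data
  for node in service-2 service-3; do
    run ssh "$node" mkdir -p /opt/iklimco/ssl /opt/iklimco/vault/data
  done
  # Dedicated key pair for cert-reloader, authorized on both remote nodes.
  run ssh-keygen -t ed25519 -N "" -f "$KEY_FILE"
  for node in service-2 service-3; do
    run ssh-copy-id -i "$KEY_FILE.pub" "$node"
  done
  # Store the private key as the Docker secret named in the plan.
  run docker secret create cert_reloader_ssh_key "$KEY_FILE"
}

prepare_prereqs
```

Review the printed commands first, then rerun with DRY_RUN=0 to apply them.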

Vault service update for Raft

vault:
  # ... (image, secrets, healthcheck unchanged)
  environment:
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
       "cluster_addr":"https://{{ .Node.Hostname }}:8201",
       "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
       "listener":[{"tcp":{"address":"0.0.0.0:8200",
         "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
         "tls_key_file":"/vault/certs/STAR.iklim.co_key.txt"}}],
       "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file    # host path per node
    - /opt/iklimco/ssl:/vault/certs:ro
  deploy:
    mode: replicated
    replicas: 3
    placement:
      max_replicas_per_node: 1    # one Vault per node; requires Compose file format 3.8+
      constraints:
        - node.labels.type == service

{{ .Node.Hostname }} is a Docker Swarm Go template placeholder that expands to the hostname of the node running the task, so each Vault instance gets a unique node_id and cluster_addr.
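
To make the substitution concrete, the following sketch mimics what Swarm does to the templated parts of VAULT_LOCAL_CONFIG for each node. The function render_for_node is hypothetical and only imitates the expansion with plain string replacement; it does not call Docker.

```shell
#!/usr/bin/env bash
# Sketch only: imitates Swarm's {{ .Node.Hostname }} expansion for illustration.
set -euo pipefail

render_for_node() {
  local hostname="$1"
  local placeholder='{{ .Node.Hostname }}'
  # Only the templated fields from the service definition are shown here.
  local template='{"cluster_addr":"https://{{ .Node.Hostname }}:8201","storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}}}'
  # Replace every occurrence of the placeholder with the node's hostname.
  printf '%s\n' "${template//"$placeholder"/$hostname}"
}

for node in service-1 service-2 service-3; do
  render_for_node "$node"
done
```

On a running stack, the value that a task actually received can be checked with docker exec "$VAULT_CTR" printenv VAULT_LOCAL_CONFIG, assuming the image ships printenv.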

Raft join procedure (after deploying 3-replica Vault)

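Only the first node (service-1) needs to be bootstrapped; the other two join it via vault operator raft join. Existing Phase 1 data in file storage is not converted automatically: it has to be moved with vault operator migrate, or the cluster initialized fresh, before the others join.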

# On the primary Vault (service-1 container):
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)

# Unseal if needed
docker exec -it "$VAULT_CTR" vault operator unseal

# Check Raft peers
docker exec "$VAULT_CTR" vault operator raft list-peers

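On service-2 and service-3, run the join against the local Vault container, then unseal it with the same unseal keys as the leader. This assumes vault.iklim.co resolves to the active node on service-1:

docker exec -it <vault-on-service-2> vault operator raft join \
  https://vault.iklim.co:8200
docker exec -it <vault-on-service-2> vault operator unseal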
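
The follower steps can be wrapped in a small script that is run once on service-2 and once on service-3. This is a sketch under assumptions: DRY_RUN, LEADER_API, run, and join_and_unseal are hypothetical names, the container ID is passed as the first argument, and the seal is assumed to be the default Shamir type, so the unseal step may need repeating until the key threshold is met.

```shell
#!/usr/bin/env bash
# Sketch only: run on each follower node with the local Vault container ID.
# Prints commands unless DRY_RUN=0.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
LEADER_API="${LEADER_API:-https://vault.iklim.co:8200}"  # api_addr from the plan

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

join_and_unseal() {
  local ctr="$1"
  # Join the Raft cluster through the leader's API address.
  run docker exec -it "$ctr" vault operator raft join "$LEADER_API"
  # Unseal the joined node (prompts for one unseal key per invocation).
  run docker exec -it "$ctr" vault operator unseal
}

join_and_unseal "${1:-<vault-container-id>}"
```

Afterwards, vault operator raft list-peers on the leader should list all three node IDs.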

cert-reloader update for Raft

Update the cert-reloader command in docker-stack-infra.yml so that, after each renewal, it copies the certificate and private key to service-2 and service-3 over SSH:

# After copying to local /opt/iklimco/ssl/:
ssh -i /run/secrets/cert_reloader_ssh_key service-2 \
  "cat > /opt/iklimco/ssl/STAR.iklim.co.full.crt" < /opt/iklimco/ssl/STAR.iklim.co.full.crt
# (repeat for service-3 and for the private key, STAR.iklim.co_key.txt)
# Restart the Vault tasks so they load the new cert; restarted instances
# come up sealed unless auto-unseal is configured.
docker service update --force iklimco_vault
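
The "repeat for service-3 and privkey" step can be written as a loop. The sketch below is an assumption-laden illustration: push_file, distribute_certs, DRY_RUN, SSH_KEY, and SSL_DIR are hypothetical names, and the file names are taken from the tls_cert_file and tls_key_file settings above. Each file is written to a temporary name and renamed, so Vault never reads a half-written file.

```shell
#!/usr/bin/env bash
# Sketch only: copy the renewed cert and key to service-2 and service-3.
# Prints commands unless DRY_RUN=0.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
SSH_KEY="${SSH_KEY:-/run/secrets/cert_reloader_ssh_key}"
SSL_DIR="${SSL_DIR:-/opt/iklimco/ssl}"

push_file() {
  local node="$1" file="$2"
  # Write to a temp file on the remote side, then rename it into place.
  local remote_cmd="cat > '$file.tmp' && mv '$file.tmp' '$file'"
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ ssh -i %s %s "%s" < %s\n' "$SSH_KEY" "$node" "$remote_cmd" "$file"
  else
    ssh -i "$SSH_KEY" "$node" "$remote_cmd" < "$file"
  fi
}

distribute_certs() {
  local node name
  for node in service-2 service-3; do
    for name in STAR.iklim.co.full.crt STAR.iklim.co_key.txt; do
      push_file "$node" "$SSL_DIR/$name"
    done
  done
}

distribute_certs
```

The docker service update --force iklimco_vault step would follow only after all four copies succeed.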

Add Docker secret to cert-reloader:

secrets:
  - cert_reloader_ssh_key

Reference