Murat ÖZDEMİR 8780c7c05e docs(db): implement direct cluster access strategy for production
- Updated roadmap (03-infra-stack-changes.md) to deprecate database proxies in prod.
- Detailed direct subnet access via WireGuard for production developers.
- Provided multi-host connection parameters for Patroni and MongoDB Replica Sets in setup guide (08-prod-db-cluster-kurulum.md).
- Added environment comparison table to developer access guide.
2026-05-18 14:25:26 +03:00


07 — Vault: 3-Node Raft Cluster (Prod)

Context

Test ran a single Vault instance (file storage, 1 replica on the manager node). Prod skips that single-instance phase and starts directly as a 3-node Raft HA cluster.

Vault service configuration

  • Replicas: 3 (one per service node)
  • Storage: Raft integrated storage
  • Placement: node.labels.type == service (all 3 app nodes)
  • Cert distribution: no SSH needed. All nodes mount StorageBox; cert-reloader writes to SWAG_CERT_DIR=/mnt/storagebox/ssl and Vault reads from that path on every node.

Prerequisites

  • All 3 service nodes are running and labeled type=service
  • /mnt/storagebox/ssl/ directory is mounted and accessible on all 3 app nodes
  • Vault data directory /opt/iklimco/vault/data/ exists on all 3 nodes (host path volumes)
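
The two path prerequisites above can be checked per node with a small preflight function. This is a minimal sketch: the function name check_vault_prereqs is hypothetical, and the defaults are simply the paths listed above.

```shell
# Preflight check for one node; run it on each of the three service nodes.
# The function name is hypothetical; the default paths come from the list above.
check_vault_prereqs() {
  cert_dir="${1:-/mnt/storagebox/ssl}"
  data_dir="${2:-/opt/iklimco/vault/data}"
  rc=0
  # The cert directory must be readable, the Raft data directory writable.
  [ -d "$cert_dir" ] && [ -r "$cert_dir" ] || { echo "missing or unreadable: $cert_dir" >&2; rc=1; }
  [ -d "$data_dir" ] && [ -w "$data_dir" ] || { echo "missing or not writable: $data_dir" >&2; rc=1; }
  return "$rc"
}

# Usage: check_vault_prereqs && echo "prerequisites OK"
```

The node label itself is set from a manager with docker node update --label-add type=service <node>.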

Vault service YAML (docker-stack-infra.prod.yml overlay)

vault:
  # ... (image, secrets, healthcheck unchanged from base)
  environment:
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
       "cluster_addr":"https://{{ .Node.Hostname }}:8201",
       "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
       "listener":[{"tcp":{"address":"0.0.0.0:8200",
         "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
         "tls_key_file":"/vault/certs/STAR.iklim.co_key.pem"}}],
       "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file    # host path per node
    - ${SWAG_CERT_DIR}:/vault/certs:ro   # StorageBox — shared across all nodes, no SSH distribution needed
  deploy:
    mode: replicated
    replicas: 3
    placement:
      max_replicas_per_node: 1
      constraints:
        - node.labels.type == service

{{ .Node.Hostname }} is a Docker Swarm Go template placeholder. Swarm replaces it with the hostname of the node the task is scheduled on, so each Vault instance gets its own node_id and cluster_addr.
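
To make the substitution concrete, the sketch below mimics it for one node with sed. It is an illustration only: Swarm does the real rendering when it creates the task, the function name render_for_node is hypothetical, and the template is a shortened excerpt of the config above.

```shell
# Mimic Swarm's substitution of {{ .Node.Hostname }} for a single node.
# Illustration only; the template is a shortened excerpt of VAULT_LOCAL_CONFIG.
render_for_node() {
  hostname="$1"
  template='{"cluster_addr":"https://{{ .Node.Hostname }}:8201","storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}}}'
  printf '%s\n' "$template" | sed "s/{{ \.Node\.Hostname }}/${hostname}/g"
}

render_for_node iklim-app-02
# prints {"cluster_addr":"https://iklim-app-02:8201","storage":{"raft":{"path":"/vault/file","node_id":"iklim-app-02"}}}
```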

Raft initialization procedure (first deploy)

Step 1 — Deploy the stack

docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco

All 3 Vault containers start sealed and uninitialized. The node on which vault operator init is run in Step 2 becomes the initial Raft leader.
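
Before moving on, it can help to confirm that all three tasks were actually scheduled. A minimal sketch, assuming the service is named iklimco_vault (stack iklimco plus service vault); the function name is hypothetical:

```shell
# Count Vault tasks whose current state is "Running".
# Assumes the Swarm service name is iklimco_vault.
vault_running_count() {
  docker service ps iklimco_vault \
    --filter desired-state=running \
    --format '{{.Node}} {{.CurrentState}}' | grep -c ' Running'
}

# Usage (on a manager node):
#   [ "$(vault_running_count)" -eq 3 ] && echo "all replicas up"
```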

Step 2 — Initialize Vault on the leader (iklim-app-01)

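# Run on iklim-app-01: docker ps only lists containers on the local node.
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
# Prints the unseal key(s) and the initial root token once; they cannot be retrieved later.
docker exec -it "$VAULT_CTR" vault operator init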

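Save the unseal keys and root token securely. Note that vault operator init generates 5 key shares with a threshold of 3 by default, so a single vault_unseal_key secret is only enough to unseal if init was run with -key-shares=1 -key-threshold=1. Store the unseal key as a Docker secret: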

echo -n "<unseal-key>" | docker secret create vault_unseal_key -

Step 3 — Unseal the leader

docker exec -it "$VAULT_CTR" vault operator unseal

The healthcheck auto-unseals on subsequent restarts via the vault_unseal_key secret.
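
The actual healthcheck lives in the base stack file and is not reproduced here. As a rough sketch of the idea, assuming the secret is mounted at Docker's default path /run/secrets/vault_unseal_key and that one key share is enough, an unseal step could look like this (the function name is hypothetical):

```shell
# Sketch of an auto-unseal step; not the project's actual healthcheck.
# vault status exits 0 when unsealed, 2 when sealed, 1 on error.
unseal_if_sealed() {
  key_file="${1:-/run/secrets/vault_unseal_key}"
  rc=0
  vault status >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) return 0 ;;   # already unsealed, nothing to do
    2) vault operator unseal "$(cat "$key_file")" >/dev/null ;;   # sealed: submit the key
    *) return 1 ;;   # Vault unreachable or returned an error
  esac
}
```

Passing the key as an argument exposes it in the container's process list for a moment; that trade-off is an assumption of this sketch, not a statement about the real healthcheck.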

Step 4 — Join remaining nodes to the Raft cluster

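Run the join inside the Vault containers on iklim-app-02 and iklim-app-03. The address passed to raft join must reach the initialized, unsealed leader: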

docker exec -it <vault-on-iklim-app-02> vault operator raft join \
  https://vault.iklim.co:8200

docker exec -it <vault-on-iklim-app-03> vault operator raft join \
  https://vault.iklim.co:8200

Unseal each node after joining:

docker exec -it <vault-on-iklim-app-02> vault operator unseal
docker exec -it <vault-on-iklim-app-03> vault operator unseal

Step 5 — Verify cluster

docker exec "$VAULT_CTR" vault operator raft list-peers

Expected: 3 peers, one leader and two followers.
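
The expectation can also be checked mechanically. The sketch below reads the table printed by list-peers on stdin and relies on the State value being the third column; the function name check_raft_peers is hypothetical.

```shell
# Succeed only when the list-peers table on stdin shows
# exactly one leader and two followers (State is column 3).
check_raft_peers() {
  awk '$3 == "leader" { l++ } $3 == "follower" { f++ }
       END { exit !(l == 1 && f == 2) }'
}

# Usage (requires a valid Vault token inside the container):
#   docker exec "$VAULT_CTR" vault operator raft list-peers | check_raft_peers && echo "cluster OK"
```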

cert-reloader — no additional changes needed for Raft

cert-reloader writes the cert to SWAG_CERT_DIR=/mnt/storagebox/ssl. Since StorageBox is mounted on all app nodes, every Vault instance already sees the same path.

The cert renewal flow works unchanged with Raft:

cert changed → copy to /mnt/storagebox/ssl/ → docker service update --force iklimco_vault
Vault (3 replicas) restart → each auto-unseals via healthcheck
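
The flow above can be sketched as a single function. cert-reloader's real implementation is not shown in this document, so the function name, its arguments, and the byte-comparison used to detect a change are all assumptions.

```shell
# Sketch of the renewal flow; not the actual cert-reloader code.
reload_cert_if_changed() {
  src="$1"                               # freshly issued certificate file
  dst_dir="${2:-/mnt/storagebox/ssl}"    # shared StorageBox directory
  dst="$dst_dir/$(basename "$src")"
  # Nothing to do when the stored copy is byte-identical.
  if cmp -s "$src" "$dst"; then
    return 0
  fi
  cp "$src" "$dst" || return 1
  # Force a restart of the Vault tasks so they load the new cert.
  docker service update --force iklimco_vault
}
```

Skipping the service update when nothing changed avoids restarting and re-unsealing all three replicas unnecessarily.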

Reference