# 07 — Vault: 3-Node Raft Cluster (Prod)

## Context

Vault starts directly as a 3-node Raft cluster in prod; the single-instance phase used in test is skipped. Test ran a single Vault instance (file storage, 1 replica on the manager node), whereas prod goes straight to Raft HA.

## Vault service configuration

- **Replicas:** 3 (one per service node)
- **Storage:** Raft integrated storage
- **Placement:** `node.labels.type == service` (all 3 app nodes)
- **Cert distribution:** No SSH needed. All nodes mount StorageBox, cert-reloader writes to `SWAG_CERT_DIR=/mnt/storagebox/ssl`, and Vault reads from that path on every node.

### Prerequisites

- [ ] All 3 service nodes are running and labeled `type=service`
- [ ] `/mnt/storagebox/ssl/` directory is mounted and accessible on all 3 app nodes
- [ ] Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes)

### Vault service YAML (docker-stack-infra.prod.yml overlay)

```yaml
vault:
  # ... (image, secrets, healthcheck unchanged from base)
  environment:
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
      "cluster_addr":"https://{{ .Node.Hostname }}:8201",
      "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
      "listener":[{"tcp":{"address":"0.0.0.0:8200",
      "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
      "tls_key_file":"/vault/certs/STAR.iklim.co_key.pem"}}],
      "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file   # host path per node
    - ${SWAG_CERT_DIR}:/vault/certs:ro      # StorageBox: shared across all nodes, no SSH distribution needed
  deploy:
    mode: replicated
    replicas: 3
    placement:
      max_replicas_per_node: 1
      constraints:
        - node.labels.type == service
```

> `{{ .Node.Hostname }}` is Docker Swarm's Go template for the node hostname. It
> gives each Vault instance a unique `node_id` and `cluster_addr`.
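To illustrate how the template keeps the three replicas distinct, the sketch below mimics the substitution in plain Bash. It is illustrative only: Swarm performs the real expansion when it creates each task. `render_raft_fragment` is a hypothetical helper, and the hostnames `iklim-app-01` to `iklim-app-03` are assumed to match the actual Swarm node hostnames.

```bash
#!/usr/bin/env bash
# Illustrative only: imitates how Swarm fills in {{ .Node.Hostname }} per task.
# render_raft_fragment is a hypothetical helper, not part of Docker or Vault.
render_raft_fragment() {
  local hostname="$1"
  local placeholder='{{ .Node.Hostname }}'
  local template='"cluster_addr":"https://{{ .Node.Hostname }}:8201","storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}}'
  # Replace every occurrence of the placeholder with the given hostname.
  printf '%s\n' "${template//$placeholder/$hostname}"
}

# Assumed hostnames; substitute the real node names of the cluster.
for node in iklim-app-01 iklim-app-02 iklim-app-03; do
  render_raft_fragment "$node"
done
```

Each rendered fragment carries a different `node_id` and `cluster_addr`, which is what lets three replicas of one service definition form a single Raft cluster.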
## Raft initialization procedure (first deploy)

### Step 1 — Deploy the stack

```bash
docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
```

All 3 Vault containers start uninitialized and sealed. The node on which `vault operator init` is run becomes the initial leader.

### Step 2 — Initialize Vault on the leader (iklim-app-01)

```bash
# Run on iklim-app-01: docker ps only lists containers on the local node.
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
docker exec -it "$VAULT_CTR" vault operator init
```

Save the unseal keys and root token securely. Store the unseal key as a Docker secret:

```bash
# <unseal-key> is a placeholder for a key printed by `vault operator init`.
echo -n "<unseal-key>" | docker secret create vault_unseal_key -
```

### Step 3 — Unseal the leader

```bash
# Prompts for the unseal key.
docker exec -it "$VAULT_CTR" vault operator unseal
```

The healthcheck auto-unseals on subsequent restarts via the `vault_unseal_key` secret.

### Step 4 — Join remaining nodes to the Raft cluster

Run each command on the node that hosts the container (iklim-app-02 and iklim-app-03). `<vault-ctr-app-02>` and `<vault-ctr-app-03>` are placeholders for the local Vault container IDs:

```bash
docker exec -it <vault-ctr-app-02> vault operator raft join \
  https://vault.iklim.co:8200
docker exec -it <vault-ctr-app-03> vault operator raft join \
  https://vault.iklim.co:8200
```

Unseal each node after joining:

```bash
docker exec -it <vault-ctr-app-02> vault operator unseal
docker exec -it <vault-ctr-app-03> vault operator unseal
```

### Step 5 — Verify cluster

```bash
docker exec "$VAULT_CTR" vault operator raft list-peers
```

Expected: 3 peers, one `leader`, two `follower`.

## cert-reloader — no additional changes needed for Raft

cert-reloader writes the cert to `SWAG_CERT_DIR=/mnt/storagebox/ssl`. Since StorageBox is mounted on all app nodes, every Vault instance already sees the same path.

The cert renewal flow works unchanged with Raft:

```
cert changed → copy to /mnt/storagebox/ssl/ → docker service update --force iklimco_vault
                                              Vault (3 replicas) restart → each auto-unseals via healthcheck
```

## Reference

- Vault Raft storage docs: https://developer.hashicorp.com/vault/docs/configuration/storage/raft
- Vault Swarm setup: https://manjit28.medium.com/setting-up-a-secure-and-highly-available-hashicorp-vault-cluster-for-secrets-and-certificates-0ce01a370582
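
The Step 5 check (3 peers, one `leader`, two `follower`) can also be scripted. The sketch below is one possible approach, not part of the original procedure: `check_raft_peers` is a hypothetical helper that reads the `list-peers` table on stdin, and it assumes the peer state appears in the third column of that table.

```bash
#!/usr/bin/env bash
# check_raft_peers is a hypothetical helper: it reads the table printed by
# `vault operator raft list-peers` on stdin and succeeds only when it finds
# exactly one leader and two followers (state assumed to be column 3).
check_raft_peers() {
  awk '
    $3 == "leader"   { leaders++ }
    $3 == "follower" { followers++ }
    END {
      printf "leaders=%d followers=%d\n", leaders, followers
      # Exit status 0 only for the expected 1 leader + 2 followers layout.
      exit !(leaders == 1 && followers == 2)
    }
  '
}
```

A possible invocation, reusing `VAULT_CTR` from Step 2, is `docker exec "$VAULT_CTR" vault operator raft list-peers | check_raft_peers`; a non-zero exit status then signals that the cluster does not yet have the expected shape.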