# 07 - Vault: Initial Single Instance + Raft Cluster Migration Plan (Prod)

## Context

Vault starts as a single instance on the manager node (iklim-app-01) for the initial prod launch. This matches the current `docker-stack-infra.yml` configuration (file storage, single replica). A Raft HA cluster is planned for a later phase.

## Phase 1: Initial prod launch (current)

- **Replicas:** 1
- **Storage:** file (`/vault/file`) on iklim-app-01
- **Placement:** `node.role == manager` (iklim-app-01)
- **Cert:** from `/opt/iklimco/ssl/` (populated by cert-reloader from the SWAG volume)
- **TLS:** `VAULT_LOCAL_CONFIG` unchanged (`api_addr: https://vault.iklim.co:8200`)

No changes to the vault service in `docker-stack-infra.yml` for Phase 1.

## Phase 2: Vault Raft Cluster (future)

### What changes

- **Replicas:** 3 (one per service node)
- **Storage:** Raft integrated (replaces file storage). Switching the storage stanza does not carry data over by itself; the Phase 1 data has to be moved first, for example with `vault operator migrate`.
- **Placement:** `node.labels.type == service` (all 3 service nodes)
- **Cert distribution:** cert-reloader SSH-copies the renewed cert to iklim-app-02 and iklim-app-03

### Prerequisites before migration

- [ ] All 3 service nodes are running and labeled `type=service`
- [ ] Vault data backed up from Phase 1. Phase 1 uses file storage, so the backup is a copy of the file storage directory (`/vault/file`) taken while Vault is stopped; `vault operator raft snapshot save` only works once Raft storage is in use.
- [ ] SSH key created for cert-reloader to reach iklim-app-02 and iklim-app-03
- [ ] SSH key stored as Docker secret `cert_reloader_ssh_key`
- [ ] `/opt/iklimco/ssl/` directory exists on iklim-app-02 and iklim-app-03
- [ ] Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes)

### Vault service update for Raft

```yaml
vault:
  # ... (image, secrets, healthcheck unchanged)
  environment:
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
      "cluster_addr":"https://{{ .Node.Hostname }}:8201",
      "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
      "listener":[{"tcp":{"address":"0.0.0.0:8200",
      "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
      "tls_key_file":"/vault/certs/STAR.iklim.co_key.txt"}}],
      "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file  # host path per node
    - /opt/iklimco/ssl:/vault/certs:ro
  deploy:
    mode: replicated
    replicas: 3
    placement:
      # Enforces "one per service node"; needs Compose file format 3.8+
      max_replicas_per_node: 1
      constraints:
        - node.labels.type == service
```

> `{{ .Node.Hostname }}` is Docker Swarm's Go template for the node hostname;
> it gives each Vault instance a unique `node_id`.

### Raft join procedure (after deploying 3-replica Vault)

Only the first node needs to be bootstrapped; the other two join it via `vault operator raft join`:

```bash
# On the primary Vault (iklim-app-01 container):
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)

# Unseal if needed
docker exec -it "$VAULT_CTR" vault operator unseal

# Check Raft peers
docker exec "$VAULT_CTR" vault operator raft list-peers
```

On iklim-app-02 and iklim-app-03, run the join against the local Vault container. This assumes `https://vault.iklim.co:8200` reaches the already-initialized node; with a Shamir seal, each joining node must then be unsealed with the same unseal keys.

```bash
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
docker exec -it "$VAULT_CTR" vault operator raft join \
  https://vault.iklim.co:8200
```

### cert-reloader update for Raft

Update the cert-reloader command in `docker-stack-infra.yml` to SSH-copy the cert to iklim-app-02 and iklim-app-03 after renewal:

```bash
# After copying to local /opt/iklimco/ssl/:
ssh -i /run/secrets/cert_reloader_ssh_key iklim-app-02 \
  "cp /dev/stdin /opt/iklimco/ssl/STAR.iklim.co.full.crt" < /opt/iklimco/ssl/STAR.iklim.co.full.crt
# (repeat for iklim-app-03 and for the private key STAR.iklim.co_key.txt)

docker service update --force iklimco_vault
```

Add the Docker secret to cert-reloader:

```yaml
secrets:
  - cert_reloader_ssh_key
```

## Reference

- Vault Raft storage docs: https://developer.hashicorp.com/vault/docs/configuration/storage/raft
- Vault Swarm setup: https://manjit28.medium.com/setting-up-a-secure-and-highly-available-hashicorp-vault-cluster-for-secrets-and-certificates-0ce01a370582
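
## Appendix: cert distribution loop (sketch)

The per-node copy described under "cert-reloader update for Raft" can be sketched as one loop over nodes and files. This is a minimal sketch, not the actual cert-reloader command: the function name `distribute_certs`, the `DRY_RUN` switch, the `SSL_DIR` and `SSH_KEY` overrides, and the write-to-temp-then-rename step are assumptions added for illustration. The hostnames, file names, paths, and secret name are taken from the plan above.

```bash
#!/usr/bin/env bash
# Sketch only: push the renewed cert and key to the other Vault nodes.
# distribute_certs, DRY_RUN, SSL_DIR and SSH_KEY are hypothetical names.
set -euo pipefail

SSL_DIR="${SSL_DIR:-/opt/iklimco/ssl}"
SSH_KEY="${SSH_KEY:-/run/secrets/cert_reloader_ssh_key}"
NODES=(iklim-app-02 iklim-app-03)
FILES=(STAR.iklim.co.full.crt STAR.iklim.co_key.txt)

# Copy every file in FILES to every node in NODES.
# With DRY_RUN=1 it only prints the planned copies and opens no connection.
distribute_certs() {
  local node file
  for node in "${NODES[@]}"; do
    for file in "${FILES[@]}"; do
      if [[ "${DRY_RUN:-0}" == "1" ]]; then
        echo "would copy ${SSL_DIR}/${file} to ${node}:${SSL_DIR}/${file}"
      else
        # Assumption: write to a temp name, then rename, so a reader on the
        # remote node never sees a half-written file.
        ssh -i "$SSH_KEY" "$node" \
          "cat > '${SSL_DIR}/${file}.tmp' && mv '${SSL_DIR}/${file}.tmp' '${SSL_DIR}/${file}'" \
          < "${SSL_DIR}/${file}"
      fi
    done
  done
}
```

Calling `DRY_RUN=1 distribute_certs` lists the planned copies without touching the network. After a real run, `docker service update --force iklimco_vault` would still be needed so the Vault tasks pick up the new files.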