Updates README.md with the new Shamir-based auto-unseal process using Docker secrets,
eliminating the need for a separate transit vault.
Adds `failover_scenarios.md` to detail the Vault cluster's resilience under
various failure conditions.
Translates `vault-bootstrap.sh` messages and step titles from Turkish to English,
and aligns its execution flow with the updated documentation.
Root cause: Docker Swarm assigns a new random container ID as $HOSTNAME on every
task restart, making node_id, api_addr, and cluster_addr change with each restart.
Vault could not recognize its own Raft data → cluster never reformed after restart.
Fixes:
- docker-stack-vault.yml: add hostname: "vault-{{.Task.Slot}}.iklim.co" so each
replica gets a stable, slot-based hostname covered by the *.iklim.co wildcard cert.
Replace STABLE_ID/NODE_ID_PLACEHOLDER logic with a single HOSTNAME_PLACEHOLDER sed.
Replace single unseal attempt with a retry loop (90×2s) so peer nodes unseal as
soon as they join Raft, without needing external intervention.
- vault-bootstrap.sh: add ADIM 6b — after rolling restart, wait for Raft leader to
unseal, wait for all peers to join Raft (vault operator raft list-peers), then
attempt explicit per-peer unseal via overlay network (best-effort).
ADIM 4 early-exit now fires N requests to the shared alias; all must return
Sealed: false before declaring the cluster healthy.
ADIM 7 polls up to 4 minutes via check_cluster_unsealed (9 shared-alias requests)
and retries peer unseal on each iteration.
- deploy-prod.yml: health check now fires 9 requests to the shared alias; all must
return Sealed: false (single-node check was masking partially-sealed clusters).
Remove vault-transit service entirely. Each vault node now auto-unseals at
startup by reading the Shamir unseal key from a Docker secret managed by
vault-bootstrap.sh. Eliminates the transit token expiry failure mode and
removes the vault_transit node-pinning requirement.
Changes:
- docker-stack-vault.yml: remove vault-transit service, vault_transit_config,
vault-transit-data-vl, transit_master_token / vault_transit_unseal_key
secrets; add vault_unseal_key secret; rewrite vault entrypoint to background
start + poll + auto-unseal loop
- vault-template-v1.json, vault-template-v2.json: remove seal.transit block
- vault-template-transit.json: deleted (vault-transit is gone)
- vault-bootstrap.sh: full rewrite — node-agnostic run_vault() helper (docker
exec fallback to docker run over overlay network), 7-step Shamir flow with
SKIP_DEPLOY support and early-exit when vault is already healthy
- deploy-prod.yml: replace BE-Forecast deploy with vault stack deploy +
bootstrap (SKIP_DEPLOY=true) + cluster health check