4 Commits

Author SHA1 Message Date
483bd40cc4 docs(vault): Document Shamir HA unseal and localize bootstrap script
Updates README.md with the new Shamir-based auto-unseal process using Docker secrets,
eliminating the need for a separate transit vault.

Adds `failover_scenarios.md` to detail the Vault cluster's resilience under
various failure conditions.

Translates `vault-bootstrap.sh` messages and step titles from Turkish to English,
and aligns its execution flow with the updated documentation.
2026-06-10 19:04:48 +03:00
392a015b8d fix(vault): Stable Raft cluster formation and reliable multi-node unseal on Docker Swarm
Root cause: Docker Swarm assigns a new random container ID as $HOSTNAME on every
task restart, so node_id, api_addr, and cluster_addr changed with each restart.
Vault could not recognize its own Raft data, so the cluster never re-formed after a
restart.

Fixes:
- docker-stack-vault.yml: add hostname: "vault-{{.Task.Slot}}.iklim.co" so each
  replica gets a stable, slot-based hostname covered by the *.iklim.co wildcard cert.
  Replace STABLE_ID/NODE_ID_PLACEHOLDER logic with a single HOSTNAME_PLACEHOLDER sed.
  Replace single unseal attempt with a retry loop (90×2s) so peer nodes unseal as
  soon as they join Raft, without needing external intervention.
- vault-bootstrap.sh: add step 6b (ADIM 6b): after the rolling restart, wait for the
  Raft leader to unseal, wait for all peers to join Raft (vault operator raft
  list-peers), then attempt an explicit per-peer unseal via the overlay network
  (best-effort).
  The step 4 (ADIM 4) early exit now fires N requests to the shared alias; all must
  return Sealed: false before the cluster is declared healthy.
  Step 7 (ADIM 7) polls for up to 4 minutes via check_cluster_unsealed (9 shared-alias
  requests) and retries the peer unseal on each iteration.
- deploy-prod.yml: health check now fires 9 requests to the shared alias; all must
  return Sealed: false (single-node check was masking partially-sealed clusters).
2026-06-10 18:17:59 +03:00
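
The hostname substitution and the "all requests must report unsealed" check described in 392a015b8d could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the repository: the function names `render_config`, `all_unsealed`, and `wait_until_unsealed` are hypothetical, and the probe command is passed in as an argument because the commit does not show how the shared alias is queried. Only `HOSTNAME_PLACEHOLDER` and the `Sealed: false` criterion come from the commit message.

```sh
#!/bin/sh
# Sketch only. Function names are hypothetical; HOSTNAME_PLACEHOLDER is the
# token named in the commit message.

# Print the template with HOSTNAME_PLACEHOLDER replaced by a stable hostname.
# $1 = template path, $2 = hostname (e.g. the slot-based "$HOSTNAME")
render_config() {
  sed "s/HOSTNAME_PLACEHOLDER/$2/g" "$1"
}

# Run the probe command N times; succeed only if every run reports an
# unsealed node. Because a shared alias may route each request to a different
# replica, a single passing probe does not prove the whole cluster is unsealed.
# $1 = number of probes, remaining args = probe command (assumption: it prints
# `vault status`-style output containing a "Sealed ... false" line)
all_unsealed() {
  au_attempts="$1"; shift
  au_i=0
  while [ "$au_i" -lt "$au_attempts" ]; do
    # `vault status` exits non-zero when sealed, so ignore the exit code
    # and inspect the output instead.
    au_out="$("$@" 2>/dev/null)" || true
    printf '%s\n' "$au_out" | grep -Eq '^Sealed:?[[:space:]]+false' || return 1
    au_i=$((au_i + 1))
  done
  return 0
}

# Repeat all_unsealed until it passes or the retry budget is used up.
# $1 = retries, $2 = delay in seconds, $3 = probes per retry, rest = probe command
wait_until_unsealed() {
  wu_retries="$1"; wu_delay="$2"; wu_attempts="$3"; shift 3
  wu_n=0
  while [ "$wu_n" -lt "$wu_retries" ]; do
    all_unsealed "$wu_attempts" "$@" && return 0
    sleep "$wu_delay"
    wu_n=$((wu_n + 1))
  done
  return 1
}
```

For example, `wait_until_unsealed 24 10 9 vault status` would poll for about 4 minutes with 9 probes per round, assuming `VAULT_ADDR` points at the shared alias. The 24 × 10 s split is an assumption; the commit only states the 4-minute ceiling and the 9 requests.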
508363fc75 refactor(vault): Replace transit auto-unseal with Shamir + Docker secret
Remove vault-transit service entirely. Each vault node now auto-unseals at
startup by reading the Shamir unseal key from a Docker secret managed by
vault-bootstrap.sh. Eliminates the transit token expiry failure mode and
removes the vault_transit node-pinning requirement.

Changes:
- docker-stack-vault.yml: remove vault-transit service, vault_transit_config,
  vault-transit-data-vl, transit_master_token / vault_transit_unseal_key
  secrets; add vault_unseal_key secret; rewrite vault entrypoint to background
  start + poll + auto-unseal loop
- vault-template-v1.json, vault-template-v2.json: remove seal.transit block
- vault-template-transit.json: deleted (vault-transit is gone)
- vault-bootstrap.sh: full rewrite — node-agnostic run_vault() helper (docker
  exec fallback to docker run over overlay network), 7-step Shamir flow with
  SKIP_DEPLOY support and early-exit when vault is already healthy
- deploy-prod.yml: replace BE-Forecast deploy with vault stack deploy +
  bootstrap (SKIP_DEPLOY=true) + cluster health check
2026-06-10 13:37:32 +03:00
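
As a rough illustration of the "background start + poll + auto-unseal loop" entrypoint described in 508363fc75, the sketch below starts the server in the background, polls `vault status`, and submits the Shamir key while the node reports sealed. The function names `unseal_when_ready` and `run_entrypoint` are hypothetical, the config path is supplied by the caller, and `/run/secrets/vault_unseal_key` assumes the `vault_unseal_key` secret is mounted at Docker's default location. The 90 × 2 s retry budget is borrowed from the later commit 392a015b8d. The logic relies on `vault status` exiting with 0 when unsealed, 2 when sealed, and 1 on error.

```sh
#!/bin/sh
# Sketch only. Function names and paths are assumptions, not repository code.

# Poll the local node and submit the unseal key while it reports sealed.
# $1 = file holding the unseal key, $2 = retries, $3 = delay in seconds
unseal_when_ready() {
  uw_key_file="$1"; uw_retries="$2"; uw_delay="$3"
  uw_n=0
  while [ "$uw_n" -lt "$uw_retries" ]; do
    uw_rc=0
    vault status >/dev/null 2>&1 || uw_rc=$?
    case "$uw_rc" in
      0) return 0 ;;  # already unsealed, nothing left to do
      2)              # sealed: try the key; a node that has not yet joined
                      # Raft will reject it, so failures are ignored and retried
         vault operator unseal "$(cat "$uw_key_file")" >/dev/null 2>&1 || true ;;
      # any other code: the API is not reachable yet, so just wait and retry
    esac
    sleep "$uw_delay"
    uw_n=$((uw_n + 1))
  done
  return 1
}

# Entrypoint shape: background start, unseal attempt, then stay attached to
# the server process so the container lives as long as Vault does.
# $1 = path to the rendered Vault config file
run_entrypoint() {
  vault server -config="$1" &
  ep_pid=$!
  unseal_when_ready /run/secrets/vault_unseal_key 90 2 \
    || echo "auto-unseal did not complete within the retry budget" >&2
  wait "$ep_pid"
}
```

Note that passing the key as a command-line argument makes it briefly visible in the container's process list; whether that is acceptable depends on who can inspect the container.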
bf81b6ebee feat: initialize vault transit auto-unseal documentation and configs
- Added comprehensive step-by-step guide in README.md for Vault Transit auto-unseal setup.
- Included Docker Swarm stack definition (docker-stack-vault.yml).
- Added Vault configuration templates and bootstrap scripts.
- Configured Gitea workflows for the VaultTest environment.
2026-05-27 01:48:30 +03:00