VaultTest/.gitea/workflows/deploy-prod.yml
Murat ÖZDEMİR 392a015b8d fix(vault): Stable Raft cluster formation and reliable multi-node unseal on Docker Swarm
Root cause: Docker Swarm assigns a new random container ID as $HOSTNAME on every
task restart, making node_id, api_addr, and cluster_addr change with each restart.
Vault could not recognize its own Raft data → cluster never reformed after restart.

Fixes:
- docker-stack-vault.yml: add hostname: "vault-{{.Task.Slot}}.iklim.co" so each
  replica gets a stable, slot-based hostname covered by the *.iklim.co wildcard cert.
  Replace STABLE_ID/NODE_ID_PLACEHOLDER logic with a single HOSTNAME_PLACEHOLDER sed.
  Replace single unseal attempt with a retry loop (90×2s) so peer nodes unseal as
  soon as they join Raft, without needing external intervention.
- vault-bootstrap.sh: add ADIM 6b — after rolling restart, wait for Raft leader to
  unseal, wait for all peers to join Raft (vault operator raft list-peers), then
  attempt explicit per-peer unseal via overlay network (best-effort).
  ADIM 4 early-exit now fires N requests to the shared alias; all must return
  Sealed: false before declaring the cluster healthy.
  ADIM 7 polls up to 4 minutes via check_cluster_unsealed (9 shared-alias requests)
  and retries peer unseal on each iteration.
- deploy-prod.yml: health check now fires 9 requests to the shared alias; all must
  return Sealed: false (single-node check was masking partially-sealed clusters).
2026-06-10 18:17:59 +03:00

57 lines
1.8 KiB
YAML

name: Deploy Vault Stack to Production
on:
push:
branches:
- prod-env
concurrency:
group: vault-prod-deploy
cancel-in-progress: false
jobs:
deploy:
runs-on: prod-runner
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Connect Runner to Overlay Network
run: docker network connect iklimco-net $(hostname) || true
- name: Ensure vault_unseal_key placeholder exists
run: |
docker secret ls --format '{{.Name}}' | grep -q '^vault_unseal_key' || \
echo "bootstrap" | docker secret create vault_unseal_key -
- name: Deploy Vault Stack
run: |
docker stack deploy \
--with-registry-auth \
-c docker-stack-vault.yml \
iklimco
- name: Run Bootstrap
env:
SKIP_DEPLOY: "true"
run: bash vault-bootstrap.sh
- name: Verify Vault Cluster Health
run: |
# Fire 9 requests to the shared alias (load-balanced across all 3 nodes).
# Every request must return Sealed: false — one healthy node is not enough.
SEALED_COUNT=0
for i in $(seq 1 9); do
SEALED=$(docker run --rm --network iklimco-net hashicorp/vault:2.0.1 \
sh -c "VAULT_ADDR=https://vault.iklim.co:8200 VAULT_SKIP_VERIFY=true vault status 2>/dev/null" \
| awk '/^Sealed/{print $2}' || echo "true")
[ "$SEALED" = "true" ] && SEALED_COUNT=$((SEALED_COUNT+1))
sleep 1
done
if [ "$SEALED_COUNT" -eq 0 ]; then
echo "Vault cluster is fully unsealed and healthy (9/9 checks passed)"
else
echo "ERROR: $SEALED_COUNT/9 checks returned sealed or unreachable"
exit 1
fi