# 07 — Vault: Initial Single Instance + Raft Cluster Migration Plan (Prod)
## Context
Vault starts as a single instance on the manager node (service-1) for the initial prod launch.
This matches the current `docker-stack-infra.yml` configuration (file storage, single replica).
A Raft HA cluster is planned for a later phase.
## Phase 1 — Initial prod launch (current)
- **Replicas:** 1
- **Storage:** file (`/vault/file`) on service-1
- **Placement:** `node.role == manager` (service-1)
- **Cert:** from `/opt/iklimco/ssl/` (populated by cert-reloader from SWAG volume)
- **TLS:** `VAULT_LOCAL_CONFIG` unchanged — `api_addr: https://vault.iklim.co:8200`

No changes to the `vault` service in `docker-stack-infra.yml` are needed for Phase 1.
## Phase 2 — Vault Raft Cluster (future)
### What changes
- **Replicas:** 3 (one per service node)
- **Storage:** Raft integrated (replaces file storage)
- **Placement:** `node.labels.type == service` (all 3 service nodes)
- **Cert distribution:** cert-reloader SSH-copies the renewed cert to service-2 and service-3
### Prerequisites before migration
- [ ] All 3 service nodes are running and labeled `type=service`
- [ ] Vault data backed up from Phase 1 (archive the file storage directory while Vault is stopped; `vault operator raft snapshot save` only works once Raft storage is in use)
- [ ] SSH key created for cert-reloader to reach service-2 and service-3
- [ ] SSH key stored as Docker secret `cert_reloader_ssh_key`
- [ ] `/opt/iklimco/ssl/` directory exists on service-2 and service-3
- [ ] Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes)
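
A few of the checklist items above could be scripted roughly as follows. This is a minimal sketch, not the project's actual tooling: the function names, the archive location, and the key file path are hypothetical, and the Phase 1 data directory has to be passed in because its host-side path is not stated here.

```shell
# Hypothetical helpers for the prerequisite checklist (names are assumptions).

# Archive the Phase 1 file-storage directory. Run this while Vault is stopped
# so the files do not change during the copy.
backup_vault_files() {
  local src_dir="$1" archive="$2"
  tar -czf "$archive" -C "$src_dir" .
}

# Label each given node so that `node.labels.type == service` matches it.
label_service_nodes() {
  local node
  for node in "$@"; do
    docker node update --label-add type=service "$node"
  done
}

# Generate a dedicated key pair for cert-reloader and store the private key
# as the Docker secret named in the checklist.
create_cert_reloader_secret() {
  local key_file="$1"
  ssh-keygen -t ed25519 -N "" -C cert-reloader -f "$key_file"
  docker secret create cert_reloader_ssh_key "$key_file"
}
```

For example, `label_service_nodes service-1 service-2 service-3` would be run on a manager node. The public half of the generated key still has to be added to `authorized_keys` on service-2 and service-3.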
### Vault service update for Raft
```yaml
vault:
  # ... (image, secrets, healthcheck unchanged)
  environment:
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
      "cluster_addr":"https://{{ .Node.Hostname }}:8201",
      "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
      "listener":[{"tcp":{"address":"0.0.0.0:8200",
      "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
      "tls_key_file":"/vault/certs/STAR.iklim.co_key.txt"}}],
      "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file  # host path per node
    - /opt/iklimco/ssl:/vault/certs:ro
  deploy:
    mode: replicated
    replicas: 3
    placement:
      constraints:
        - node.labels.type == service
```
> Docker Swarm expands the Go template `{{ .Node.Hostname }}` to the hostname of the node each task runs on,
> so every Vault instance gets a unique `node_id` and its own `cluster_addr`.
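
To make the substitution concrete, the sketch below prints the two templated fields as each replica would see them. It assumes the node hostnames are literally `service-1`, `service-2`, and `service-3` (the real hostnames may differ), and `render_vault_node_settings` is a hypothetical helper used only for illustration.

```shell
# Mimic what Swarm's {{ .Node.Hostname }} expansion yields for one node.
# Only the two templated fields of VAULT_LOCAL_CONFIG are shown.
render_vault_node_settings() {
  local node_hostname="$1"
  printf '{"cluster_addr":"https://%s:8201","node_id":"%s"}\n' \
    "$node_hostname" "$node_hostname"
}

# Assumed hostnames; replace with the output of `docker node ls`.
for node in service-1 service-2 service-3; do
  render_vault_node_settings "$node"
done
```

Each rendered `cluster_addr` must be resolvable and reachable on port 8201 from the other two replicas, otherwise the Raft peers cannot talk to each other.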
### Raft join procedure (after deploying 3-replica Vault)
Only the leader needs to be bootstrapped; others join via `vault operator raft join`:
```bash
# On the primary Vault (service-1 container):
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
# Unseal if needed
docker exec -it "$VAULT_CTR" vault operator unseal
# Check Raft peers
docker exec "$VAULT_CTR" vault operator raft list-peers
```
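On the service-2 and service-3 containers, join each node and then unseal it. The join address must reach the already-initialized node on service-1:
```bash
docker exec -it <vault-on-service-2> vault operator raft join \
  https://vault.iklim.co:8200
# With Shamir unseal keys, the joined node must then be unsealed.
docker exec -it <vault-on-service-2> vault operator unseal
```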
### cert-reloader update for Raft
Update the cert-reloader command in `docker-stack-infra.yml` to SSH-copy the cert
to service-2 and service-3 after renewal:
```bash
# After copying to local /opt/iklimco/ssl/:
ssh -i /run/secrets/cert_reloader_ssh_key service-2 \
  "cp /dev/stdin /opt/iklimco/ssl/STAR.iklim.co.full.crt" < /opt/iklimco/ssl/STAR.iklim.co.full.crt
# (repeat for service-3 and privkey)
# Restarts every Vault task; each restarted node comes back sealed unless auto-unseal is configured.
docker service update --force iklimco_vault
```
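
The "repeat for service-3 and privkey" step could be written as a loop over nodes and files. The sketch below assumes the key file is `STAR.iklim.co_key.txt` (the name used in the Vault listener config), that the SSL directory has the same path locally and on the remote nodes, and that the SSH user and host key handling are already configured. `distribute_certs` is a hypothetical name, and it stops at the first failed copy so that Vault is not restarted with a partially distributed cert.

```shell
# Copy the cert and key from the local SSL directory to the same path on
# every given node. $1 is the SSL directory, remaining args are node names.
distribute_certs() {
  local ssl_dir="$1" node file
  shift
  for node in "$@"; do
    for file in STAR.iklim.co.full.crt STAR.iklim.co_key.txt; do
      # The file is streamed over stdin and written by `cat` on the remote side.
      ssh -i /run/secrets/cert_reloader_ssh_key "$node" \
        "cat > $ssl_dir/$file" < "$ssl_dir/$file" || return 1
    done
  done
}

# Example call (commented out so that sourcing this file has no side effects):
# distribute_certs /opt/iklimco/ssl service-2 service-3 \
#   && docker service update --force iklimco_vault
```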
Add the Docker secret to the cert-reloader service:
```yaml
secrets:
- cert_reloader_ssh_key
```
## Reference
- Vault Raft storage docs: https://developer.hashicorp.com/vault/docs/configuration/storage/raft
- Vault Swarm setup: https://manjit28.medium.com/setting-up-a-secure-and-highly-available-hashicorp-vault-cluster-for-secrets-and-certificates-0ce01a370582