# 07 — Vault: Initial Single Instance + Raft Cluster Migration Plan (Prod)

## Context
Vault starts as a single instance on the manager node (iklim-app-01) for the initial prod launch.
This matches the current `docker-stack-infra.yml` configuration (file storage, single replica).

A Raft HA cluster is planned for a later phase.

## Phase 1 — Initial prod launch (current)

- **Replicas:** 1
- **Storage:** file (`/vault/file`) on iklim-app-01
- **Placement:** `node.role == manager` (iklim-app-01)
- **Cert:** read from `/opt/iklimco/ssl/` (populated by cert-reloader from the SWAG volume)
- **TLS:** `VAULT_LOCAL_CONFIG` unchanged, with `api_addr: https://vault.iklim.co:8200`

No changes to the `vault` service in `docker-stack-infra.yml` are needed for Phase 1.

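Phase 1 keeps all Vault data in file storage on a single node, so a backup amounts to archiving that directory. The following is a minimal sketch under stated assumptions: the function name `backup_vault_files` is hypothetical, and both paths are parameters because this document does not state which host path backs `/vault/file` in Phase 1.

```bash
# Sketch: archive a Vault file-storage directory into a timestamped tarball.
# Both arguments are assumptions; pass the real host path behind /vault/file.
backup_vault_files() {
  local src_dir="$1" dest_dir="$2"
  local stamp archive
  [ -d "$src_dir" ] || { echo "missing source: $src_dir" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  archive="$dest_dir/vault-file-$stamp.tar.gz"
  # -C keeps paths inside the archive relative to the storage directory.
  tar -czf "$archive" -C "$src_dir" . || return 1
  # Print the archive path so callers can record or copy it.
  echo "$archive"
}
```

For a consistent copy it is safer to stop Vault first (for example by scaling `iklimco_vault` to zero) and start it again once the archive is written. The data stays encrypted at rest, so the unseal keys are still needed to use a restored copy.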
## Phase 2 — Vault Raft Cluster (future)

### What changes
- **Replicas:** 3 (one per service node)
- **Storage:** integrated Raft storage (replaces file storage)
- **Placement:** `node.labels.type == service` (all 3 service nodes)
- **Cert distribution:** cert-reloader copies the renewed cert over SSH to iklim-app-02 and iklim-app-03

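Three replicas is the smallest Raft cluster size that survives the loss of one node, because Raft needs a strict majority of members to stay available. The arithmetic can be sketched as follows; the function names are illustrative and not part of any tool.

```bash
# Quorum is a strict majority of the cluster members.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a majority remains.
raft_failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 5; do
  echo "nodes=$n quorum=$(raft_quorum "$n") tolerated_failures=$(raft_failure_tolerance "$n")"
done
```

With 3 nodes the quorum is 2, so one node can be down (for maintenance or a restart) without the cluster losing availability.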
### Prerequisites before migration
- [ ] All 3 service nodes are running and labeled `type=service`
- [ ] Vault data from Phase 1 backed up (archive the file storage directory; `vault operator raft snapshot save` only works once Raft is the storage backend)
- [ ] SSH key created for cert-reloader to reach iklim-app-02 and iklim-app-03
- [ ] SSH key stored as Docker secret `cert_reloader_ssh_key`
- [ ] `/opt/iklimco/ssl/` directory exists on iklim-app-02 and iklim-app-03
- [ ] Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes)

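Several of the checklist items above map directly to Docker and SSH commands. The sketch below prints those commands by default so they can be reviewed first, and executes them only when `DRY_RUN=0`. It assumes it runs on a Swarm manager with SSH access to every node; the key file path and the helper names `run` and `prepare_phase2` are hypothetical.

```bash
# Sketch: print (or, with DRY_RUN=0, execute) the Phase 2 prerequisite commands.
# Assumes a Swarm manager shell with SSH access to all three nodes.
DRY_RUN="${DRY_RUN:-1}"
NODES="iklim-app-01 iklim-app-02 iklim-app-03"
SSH_KEY_FILE="${SSH_KEY_FILE:-./cert_reloader_id_ed25519}"  # hypothetical path

run() {
  # In dry-run mode only echo the command line; otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

prepare_phase2() {
  local node
  for node in $NODES; do
    # Label the node so the placement constraint matches it.
    run docker node update --label-add type=service "$node"
    # mkdir -p is harmless where the directories already exist.
    run ssh "$node" mkdir -p /opt/iklimco/ssl /opt/iklimco/vault/data
  done
  # Store the existing private key as a Swarm secret for cert-reloader.
  run docker secret create cert_reloader_ssh_key "$SSH_KEY_FILE"
}
```

Calling `prepare_phase2` in the default dry-run mode lists seven commands: two per node plus the secret creation. The SSH key itself is assumed to exist already.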
### Vault service update for Raft

```yaml
vault:
  # ... (image, secrets, healthcheck unchanged)
  environment:
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
      "cluster_addr":"https://{{ .Node.Hostname }}:8201",
      "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
      "listener":[{"tcp":{"address":"0.0.0.0:8200",
      "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
      "tls_key_file":"/vault/certs/STAR.iklim.co_key.txt"}}],
      "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file  # host path per node
    - /opt/iklimco/ssl:/vault/certs:ro
  deploy:
    mode: replicated
    replicas: 3
    placement:
      # Without this cap, Swarm may schedule two replicas on the same node
      # (requires Compose file format 3.8+)
      max_replicas_per_node: 1
      constraints:
        - node.labels.type == service
```

> `{{ .Node.Hostname }}` is a Docker Swarm Go template placeholder that resolves to the node hostname,
> which gives each Vault instance a unique `node_id`.

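To make the template substitution concrete, the sketch below renders the hostname-dependent part of `VAULT_LOCAL_CONFIG` the way a replica on a given node would see it. The function `render_raft_config` is hypothetical and only imitates the substitution with `printf`; Swarm itself does the real substitution at container start. The listener and lease keys are omitted for brevity.

```bash
# Sketch: show the config after the node hostname has been substituted.
# Only the hostname-dependent keys are reproduced here.
render_raft_config() {
  local hostname="$1"
  printf '{"api_addr":"https://vault.iklim.co:8200",'
  printf '"cluster_addr":"https://%s:8201",' "$hostname"
  printf '"storage":{"raft":{"path":"/vault/file","node_id":"%s"}}}\n' "$hostname"
}

render_raft_config iklim-app-02
```

For `iklim-app-02` this prints a JSON object whose `cluster_addr` is `https://iklim-app-02:8201` and whose `node_id` is `iklim-app-02`. On a running replica, `docker exec "$VAULT_CTR" printenv VAULT_LOCAL_CONFIG` shows the actual rendered value.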
### Raft join procedure (after deploying 3-replica Vault)

Only the leader needs to be bootstrapped; the other nodes join via `vault operator raft join`:

```bash
# On the primary Vault (iklim-app-01 container):
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)

# Unseal if needed
docker exec -it "$VAULT_CTR" vault operator unseal

# Check Raft peers
docker exec "$VAULT_CTR" vault operator raft list-peers
```

On the iklim-app-02 and iklim-app-03 containers:
```bash
docker exec -it <vault-on-iklim-app-02> vault operator raft join \
  https://vault.iklim.co:8200
# A joined node stays sealed until it is unsealed with the existing unseal keys
docker exec -it <vault-on-iklim-app-02> vault operator unseal
```

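After the joins, the peer list can be checked automatically instead of by eye. This is a minimal sketch that assumes the default table output of `vault operator raft list-peers` (two header lines followed by one row per peer); the function names are hypothetical.

```bash
# Sketch: count Raft peers from the table printed by
# `vault operator raft list-peers` (two header lines, then one row per peer).
count_raft_peers() {
  awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Succeeds only when the expected number of peers has joined.
raft_cluster_complete() {
  local expected="$1" actual
  actual="$(count_raft_peers)"
  [ "$actual" -eq "$expected" ]
}
```

A possible use on the leader node is `docker exec "$VAULT_CTR" vault operator raft list-peers | raft_cluster_complete 3 && echo "all peers joined"`. If the output format differs in the deployed Vault version, the JSON output (`-format=json`) is a sturdier input to parse.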
### cert-reloader update for Raft

Update the cert-reloader command in `docker-stack-infra.yml` to copy the renewed cert and private key
over SSH to iklim-app-02 and iklim-app-03 after renewal:

```bash
# After copying to local /opt/iklimco/ssl/:
for node in iklim-app-02 iklim-app-03; do
  for f in STAR.iklim.co.full.crt STAR.iklim.co_key.txt; do
    ssh -i /run/secrets/cert_reloader_ssh_key "$node" \
      "cat > /opt/iklimco/ssl/$f" < "/opt/iklimco/ssl/$f"
  done
done
# Restarts every replica; each comes back sealed unless auto-unseal is configured
docker service update --force iklimco_vault
```

Add the Docker secret to the cert-reloader service:
```yaml
secrets:
  - cert_reloader_ssh_key
```

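Because a forced service update restarts Vault, it may be worth running the copy-and-restart step only when the certificate actually changed. The sketch below shows one way to detect that with a byte-wise comparison; `cert_changed` is a hypothetical helper, and the path of the renewed cert inside the SWAG volume is left as a parameter because this document does not specify it.

```bash
# Sketch: report whether a renewed certificate differs from the deployed copy,
# so the forced service update only runs when something actually changed.
cert_changed() {
  local new_cert="$1" deployed_cert="$2"
  # A missing deployed copy counts as changed.
  [ -f "$deployed_cert" ] || return 0
  # cmp -s exits 0 when the files are byte-identical.
  if cmp -s "$new_cert" "$deployed_cert"; then
    return 1
  fi
  return 0
}
```

In the cert-reloader command this could wrap the existing steps, for example `if cert_changed "$NEW_CERT" /opt/iklimco/ssl/STAR.iklim.co.full.crt; then ...; fi`, where `NEW_CERT` stands for the renewed file from the SWAG volume.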
## Reference
- Vault Raft storage docs: https://developer.hashicorp.com/vault/docs/configuration/storage/raft
- Vault Swarm setup: https://manjit28.medium.com/setting-up-a-secure-and-highly-available-hashicorp-vault-cluster-for-secrets-and-certificates-0ce01a370582