Environment_Infrastructure/roadmap/prod-env/03-infra-stack-changes.md
2026-05-09 16:26:06 +03:00

# 03 — docker-stack-infra.yml Changes (Prod)
## Context
- **File:** `docker-stack-infra.yml` (repo root — shared between test and prod)
- All changes from `test-env-setup/03-infra-stack-changes.md` apply here identically.
- **Additional prod-specific changes:**
  - PostgreSQL and MongoDB placement constraints point to `type=db` nodes.
  - Microservices have no constraint (distributed across service nodes by Swarm).
  - Replica counts for stateless services are increased.
## Step 1 — Apply all test-env changes first
Follow every step in `test-env-setup/03-infra-stack-changes.md`:
- Add `swag` service
- Add `cert-reloader` service
- Remove published ports for vault, apisix, rabbitmq, prometheus, grafana, apisix-dashboard
- Add `swag-vl` volume
## Step 2 — Update PostgreSQL placement constraint
Change `postgres` service placement to use the `type=db` label:
```yaml
# CHANGE in postgres service:
placement:
  constraints:
    - node.labels.type == db
```
## Step 3 — Update MongoDB placement constraint
```yaml
# CHANGE in mongo service:
placement:
  constraints:
    - node.labels.type == db
```
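
Both constraints above only match nodes that actually carry the `type=db` label; if no node has it, Swarm leaves the tasks pending. A minimal sketch for labeling and checking nodes, assuming a hypothetical hostname such as `db-1` (the function names are illustrative and not part of the repo):

```bash
# Hypothetical helpers; run them on a Swarm manager node.

# Add the label type=db to the node whose hostname is passed as $1.
label_db_node() {
  docker node update --label-add type=db "$1"
}

# Print the hostnames of all nodes that carry the label type=db.
db_nodes() {
  docker node ls --filter node.label=type=db --format '{{.Hostname}}'
}
```

For example, `label_db_node db-1` followed by `db_nodes` should list `db-1` before the stack is deployed.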
## Step 4 — Pin Vault to manager node (initial prod — single instance)
Vault starts as a single instance pinned to the manager node.
Raft cluster migration is handled separately in `07-vault-raft-plan.md`.
```yaml
# Vault placement stays as:
placement:
  constraints:
    - node.role == manager
```
## Step 5 — Increase APISIX replicas for prod
```yaml
# CHANGE in apisix service deploy block:
mode: replicated
replicas: 2 # was 1
```
APISIX keeps its configuration in etcd and is otherwise stateless, so running multiple replicas is safe.
Swarm load-balances SWAG's requests across the APISIX replicas through the service's virtual IP (VIP).
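
One way to confirm that both replicas came up after the change is to read the REPLICAS column from `docker service ls`. The sketch below assumes the stack is deployed under the hypothetical name `infra`, which would make the service name `infra_apisix`; both helper names are illustrative:

```bash
# Print the REPLICAS value (e.g. "2/2") for the exactly named service in $1.
# The name filter of `docker service ls` also matches prefixes, which would
# include apisix-dashboard, so the exact match is done with awk instead.
service_replicas() {
  docker service ls --format '{{.Name}} {{.Replicas}}' |
    awk -v name="$1" '$1 == name { print $2 }'
}

# Succeed only if running == desired in a "running/desired" string.
replicas_converged() {
  local replicas="${1%% *}"   # drop anything after the first space
  [ -n "$replicas" ] && [ "${replicas%%/*}" = "${replicas##*/}" ]
}
```

Usage, under the same naming assumption: `replicas_converged "$(service_replicas infra_apisix)" && echo "apisix converged"`.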
## Step 6 — etcd: 3-node cluster for prod
For prod, etcd should run as a 3-node cluster, the smallest size that keeps Raft quorum when one member fails.
The current single-instance etcd definition would need to be replaced with a 3-node
StatefulSet-style setup (one stable identity per member), using either separate
service definitions or a dedicated `docker-stack-etcd.yml`.
> **Scope note:** etcd clustering for prod is complex and out of scope for initial launch.
> Deploy with single etcd for initial prod launch. Add etcd clustering as a follow-up task.
> Track in: `Technical Debt/TODO.md`
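
The choice of three members follows from Raft's majority rule: a cluster of `n` members needs `n / 2 + 1` (integer division) of them to agree, so it tolerates `n` minus that many failures. A small arithmetic sketch (the function names are illustrative):

```bash
# Majority needed for a Raft cluster of $1 members.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a majority remains.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "members=$n quorum=$(quorum_size "$n") tolerates=$(fault_tolerance "$n")"
done
```

A single etcd instance and a 2-member cluster both tolerate zero failures, which is why the follow-up task targets three members rather than two.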
## Step 7 — Verify the complete file
After all edits, validate the YAML:
```bash
docker stack config -c docker-stack-infra.yml > /dev/null && echo "✅ YAML valid"
```
If the command prints no errors and echoes the success message, the file is valid.
## Placement summary for prod
| Service | Placement |
|---------|-----------|
| swag | `node.role == manager` |
| cert-reloader | `node.role == manager` |
| vault | `node.role == manager` |
| apisix (2 replicas) | no constraint (any node) |
| apisix-dashboard | no constraint |
| postgres | `node.labels.type == db` |
| mongo | `node.labels.type == db` |
| redis | `node.role == manager` |
| rabbitmq | `node.role == manager` |
| etcd | `node.role == manager` |
| prometheus | `node.role == manager` |
| grafana | `node.role == manager` |
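
After deployment, the table can be compared with the constraints Swarm actually recorded for each service. A minimal sketch, assuming the stack was deployed under the hypothetical name `infra` (the function name is illustrative):

```bash
# Print "service<TAB>constraints" for every service in the stack named in $1.
# Constraints are read from the service spec as JSON.
placement_report() {
  local stack="$1" svc
  docker stack services "$stack" --format '{{.Name}}' |
    while read -r svc; do
      printf '%s\t%s\n' "$svc" \
        "$(docker service inspect "$svc" \
            --format '{{json .Spec.TaskTemplate.Placement.Constraints}}')"
    done
}
```

Running `placement_report infra` on a manager node prints one line per service; the `postgres` and `mongo` lines should show `node.labels.type == db`.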