Environment_Infrastructure/roadmap/prod-env/03-infra-stack-changes.md
2026-05-09 16:26:06 +03:00

03 — docker-stack-infra.yml Changes (Prod)

Context

  • File: docker-stack-infra.yml (repo root — shared between test and prod)
  • All changes from test-env-setup/03-infra-stack-changes.md apply here identically.
  • Additional prod-specific changes:
    • PostgreSQL and MongoDB placement constraints point to type=db nodes.
    • Microservices have no placement constraint; Swarm spreads them across the available nodes (without a constraint, that can include manager and db nodes).
    • Replica counts for stateless services are increased.
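The type=db constraint only matches nodes that already carry that label, so the label has to exist before the stack is deployed. The following is a minimal sketch of how the labels could be applied, assuming two database nodes with the hypothetical hostnames db-node-1 and db-node-2. The DRY_RUN and DB_NODES variables are illustrative and not part of the existing setup. By default the script only prints the commands it would run.

```shell
# Sketch: add the type=db label that `node.labels.type == db` matches on.
# Assumptions: db-node-1 / db-node-2 are hypothetical hostnames; run on a manager.
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print commands, 0 = run them
DB_NODES="${DB_NODES:-db-node-1 db-node-2}"  # hypothetical node hostnames

label_cmd() {
  # Print the command that labels node $1 as a database node.
  printf 'docker node update --label-add type=db %s\n' "$1"
}

label_db_nodes() {
  # Label (or, in dry-run mode, just print the command for) every node in DB_NODES.
  for node in $DB_NODES; do
    if [ "$DRY_RUN" = "1" ]; then
      label_cmd "$node"
    else
      docker node update --label-add type=db "$node"
    fi
  done
}

label_db_nodes
```

Running it with DRY_RUN=0 applies the labels. Afterwards, `docker node ls --filter node.label=type=db` should list the labelled nodes.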

Step 1 — Apply all test-env changes first

Follow every step in test-env-setup/03-infra-stack-changes.md:

  • Add swag service
  • Add cert-reloader service
  • Remove published ports for vault, apisix, rabbitmq, prometheus, grafana, apisix-dashboard
  • Add swag-vl volume

Step 2 — Update PostgreSQL placement constraint

Change postgres service placement to use the type=db label:

# CHANGE in postgres service:
      placement:
        constraints:
          - node.labels.type == db

Step 3 — Update MongoDB placement constraint

Change the mongo service placement to the same type=db label:

# CHANGE in mongo service:
      placement:
        constraints:
          - node.labels.type == db

Step 4 — Pin Vault to manager node (initial prod — single instance)

Vault starts as a single instance pinned to the manager node. Raft cluster migration is handled separately in 07-vault-raft-plan.md.

# Vault placement stays as:
      placement:
        constraints:
          - node.role == manager

Step 5 — Increase APISIX replicas for prod

# CHANGE in apisix service deploy block:
      mode: replicated
      replicas: 2      # was 1

APISIX is stateless (its configuration lives in etcd), so running multiple replicas is safe. Swarm load-balances SWAG's requests across the APISIX replicas through the service's virtual IP (VIP).
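After the deploy it can be useful to confirm that both replicas actually came up. A minimal sketch, assuming the stack is deployed under the hypothetical name infra (so Swarm names the service infra_apisix). The STACK and RUN_CHECK variables are illustrative, and the script only queries Swarm when RUN_CHECK=1 is set.

```shell
# Sketch: confirm that both APISIX replicas are running after the deploy.
# Assumption: the stack name "infra" is hypothetical; adjust STACK as needed.
STACK="${STACK:-infra}"

replicas_ready() {
  # $1 is a REPLICAS value such as "2/2"; succeed when running == desired.
  running=${1%%/*}
  desired=${1#*/}
  desired=${desired%% *}        # drop a trailing note like "(max 1 per node)"
  [ -n "$desired" ] && [ "$desired" != "0" ] && [ "$running" = "$desired" ]
}

service_replicas() {
  # Print the REPLICAS column for the service named exactly $1
  # (an exact match avoids also picking up the apisix-dashboard service).
  docker service ls --format '{{.Name}} {{.Replicas}}' |
    awk -v name="$1" '$1 == name { print $2 }'
}

# Only query Swarm when explicitly asked to.
if [ "${RUN_CHECK:-0}" = "1" ]; then
  state=$(service_replicas "${STACK}_apisix")
  if replicas_ready "$state"; then
    echo "apisix replicas ready: $state"
  else
    echo "apisix replicas not ready: ${state:-service not found}" >&2
  fi
fi
```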

Step 6 — etcd: 3-node cluster for prod

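For prod, etcd should run as a 3-node cluster. Raft needs a majority of members to agree (2 of 3), so three members is the smallest cluster that survives the loss of one node. Swarm has no StatefulSet equivalent, so the current single-instance etcd definition would need to be replaced with three separately defined services, each with a stable hostname and its own volume, either in this file or in a dedicated docker-stack-etcd.yml.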

Scope note: etcd clustering is complex and out of scope for the initial launch. Deploy a single etcd instance for the initial prod launch and add clustering as a follow-up task, tracked in Technical Debt/TODO.md.
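The quorum arithmetic behind the 3-node recommendation can be sketched as follows. This uses only the standard majority rule for Raft (quorum = floor(n/2) + 1). The function names are illustrative.

```shell
# Sketch: Raft majority arithmetic for different etcd cluster sizes.
quorum() {
  # Smallest number of members that forms a majority of $1 members.
  echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
  # Members that can be lost while a majority is still reachable.
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(tolerated_failures "$n")"
done
# members=1 quorum=1 tolerated_failures=0
# members=2 quorum=2 tolerated_failures=0
# members=3 quorum=2 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

A 2-member cluster tolerates no more failures than a single instance, which is why the follow-up task targets three members rather than two.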

Step 7 — Verify the complete file

After all edits, validate the YAML:

docker stack config -c docker-stack-infra.yml > /dev/null && echo "✅ YAML valid"

If the command exits without errors and prints the "YAML valid" line, the stack file is valid.
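If the check is run from a deploy script or CI job, it may help to wrap it so callers can rely on the exit code. A minimal sketch; the validate_stack function name is illustrative and not part of the existing tooling.

```shell
# Sketch: wrap the validation so scripts can branch on the exit status.
validate_stack() {
  # $1 = path to the stack file; returns non-zero if it is missing or invalid.
  file=$1
  if [ ! -f "$file" ]; then
    echo "stack file not found: $file" >&2
    return 1
  fi
  if docker stack config -c "$file" > /dev/null; then
    echo "valid: $file"
  else
    echo "invalid: $file" >&2
    return 1
  fi
}

# Usage (from the repo root):
#   validate_stack docker-stack-infra.yml && echo "safe to deploy"
```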

Placement summary for prod

Service               Placement
swag                  node.role == manager
cert-reloader         node.role == manager
vault                 node.role == manager
apisix (2 replicas)   no constraint (any node)
apisix-dashboard      no constraint
postgres              node.labels.type == db
mongo                 node.labels.type == db
redis                 node.role == manager
rabbitmq              node.role == manager
etcd                  node.role == manager
prometheus            node.role == manager
grafana               node.role == manager
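The summary above can be compared against what is actually deployed. A minimal sketch, assuming the hypothetical stack name infra. The expected_constraint lookup mirrors the table, and show_placement prints the deployed placement next to it for manual comparison. The STACK and RUN_CHECK variables are illustrative, and Swarm is only queried when RUN_CHECK=1 is set.

```shell
# Sketch: print expected vs deployed placement for each infra service.
# Assumption: the stack name "infra" is hypothetical; adjust STACK as needed.
STACK="${STACK:-infra}"

expected_constraint() {
  # Echo the constraint the summary table lists for service $1
  # (empty for unconstrained services); fail for unknown services.
  case "$1" in
    postgres|mongo) echo 'node.labels.type == db' ;;
    apisix|apisix-dashboard) echo '' ;;
    swag|cert-reloader|vault|redis|rabbitmq|etcd|prometheus|grafana)
      echo 'node.role == manager' ;;
    *) return 1 ;;
  esac
}

show_placement() {
  # Print the expected constraint and the deployed placement spec for $1.
  printf '%s expected=[%s] deployed=' "$1" "$(expected_constraint "$1")"
  docker service inspect \
    --format '{{json .Spec.TaskTemplate.Placement}}' "${STACK}_$1"
}

# Only query Swarm when explicitly asked to (run on a manager node).
if [ "${RUN_CHECK:-0}" = "1" ]; then
  for svc in swag cert-reloader vault apisix apisix-dashboard postgres \
             mongo redis rabbitmq etcd prometheus grafana; do
    show_placement "$svc"
  done
fi
```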