Environment_Infrastructure/roadmap/prod-env/03-infra-stack-changes.md
Murat ÖZDEMİR 67dc2986dd docs(infra): restructure and update infrastructure setup documentation
- Anglicized setup and facts markdown file names for better consistency.

- Updated 01-swarm-init-multinode.md to highlight Ansible automation of Swarm initialization and labeling.

- Overhauled 03-infra-stack-changes.md to describe the single monolithic file strategy and reflect current Redis, RabbitMQ, and etcd cluster configurations.

- Fixed minor overrides and typos in Patroni templates and Ansible bootstrap documents.

- Restructured README and roadmap mapping to align with the renamed setup documents.
2026-06-15 16:42:18 +03:00

03 — Production Infrastructure and DB Stack Model

Context

This document records the production infrastructure target that the current setup runbooks now implement. Those runbooks are the execution source; the old base-plus-prod overlay model no longer applies.

Current references:

  • Setup source: ../../setup/08-prod-db-cluster-setup.md and ../../setup/09-prod-runner-ha-and-swarm.md
  • Main infra and DB stack: root docker-stack-infra_db-prod.yml
  • Vault stack: root docker-stack-vault.yml
  • Vault bootstrap: root init/vault/vault-bootstrap.sh, called through init-infra-prod.sh

Current Stack Strategy

Production is split across two stack files:

  • docker-stack-infra_db-prod.yml: APISIX, APISIX Dashboard, SWAG, cert services, Redis/Sentinel, RabbitMQ, Prometheus, Grafana, Patroni/PostgreSQL, MongoDB, and etcd.
  • docker-stack-vault.yml: Vault Raft cluster only.

The previous docker-stack-infra.yml + docker-stack-infra.prod.yml overlay strategy is superseded for production. Do not create or deploy docker-stack-infra.prod.yml for the current prod environment.

Placement Boundary

docker-stack-infra_db-prod.yml intentionally mixes infrastructure and DB services in one stack. The boundary that matters is node placement, not the stack file:

  • DB/cluster services run on iklim-db-*: Patroni/PostgreSQL, MongoDB, and etcd.
  • App/service-node infrastructure runs on iklim-app-* with node.labels.type == service: Redis, Redis Sentinel, RabbitMQ, APISIX, APISIX Dashboard, SWAG, cert-reloader/cert-distributor, Prometheus, and Grafana.
  • Redis and RabbitMQ are not DB-node host-mode services. They stay on the overlay network unless explicitly exposed by the stack or SWAG/APISIX.

DB services that require direct cluster traffic publish host-mode ports where the current stack defines them. Redis and RabbitMQ must not be changed to host-mode just because they live in the same stack file.
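The placement boundary can be sketched as a stack fragment. This is an illustration under stated assumptions, not an excerpt from docker-stack-infra_db-prod.yml: the service names, image tags, network name, the iklim-db-01 hostname constraint, and the choice of MongoDB as the host-mode example are all hypothetical. Only the node.labels.type == service constraint and the overlay-only rule for Redis come from this document.

```yaml
# Illustrative fragment only; names, images, and ports are assumptions.
version: "3.8"

services:
  # DB-node service: pinned to a DB host and published in host mode,
  # so cluster peers reach it on the node's own address.
  mongodb-01:
    image: mongo:7                          # assumed image tag
    networks:
      - infra
    ports:
      - target: 27017
        published: 27017
        protocol: tcp
        mode: host                          # bypasses the ingress routing mesh
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == iklim-db-01    # hypothetical hostname constraint

  # Service-node infrastructure: overlay network only.
  # There is deliberately no "ports:" key here.
  redis:
    image: redis:7                          # assumed image tag
    networks:
      - infra
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.type == service     # label named in this document

networks:
  infra:                                    # hypothetical network name
    driver: overlay
```

Sharing one stack file does not change either rule: the DB service keeps its host-mode port, and Redis stays reachable only over the overlay network.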

Current Production Services

| Area | Current model |
| --- | --- |
| APISIX | 3 replicas on service nodes; config stored in etcd with /apisix prefix |
| Redis | Sentinel model on service nodes; overlay-only |
| RabbitMQ | 3-node service-node cluster; management exposed through SWAG, restricted by IP |
| Vault | Separate 3-node Raft stack via docker-stack-vault.yml |
| PostgreSQL | 3-node Patroni cluster on DB nodes |
| MongoDB | 3-node replica set on DB nodes |
| etcd | 3-node cluster on DB nodes, shared by Patroni and APISIX |
| Prometheus | Single instance; local Docker volume |
| Grafana | Single instance; StorageBox-backed data path |

Monitoring Persistence

Prometheus TSDB remains on a local Docker volume because StorageBox/DAVFS is not suitable for Prometheus WAL and compaction I/O.

Grafana stores its data under /mnt/storagebox/grafana/data, set through GRAFANA_DATA_DIR, so dashboards, plugins, and the SQLite database survive when the service is manually moved between service nodes.
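The two persistence choices might look like the following in a stack file. This is a minimal sketch, not the actual stack definition: the image names and the prometheus_data volume name are assumptions. The container paths /prometheus and /var/lib/grafana are the default data directories of the upstream prom/prometheus and grafana/grafana images, and GRAFANA_DATA_DIR is expected to be set in the deploying shell (for example to /mnt/storagebox/grafana/data).

```yaml
# Illustrative fragment only; image and volume names are assumptions.
version: "3.8"

services:
  prometheus:
    image: prom/prometheus                  # assumed image
    volumes:
      # Named local volume: TSDB WAL and compaction stay on local disk.
      - prometheus_data:/prometheus
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.type == service

  grafana:
    image: grafana/grafana                  # assumed image
    volumes:
      # Bind mount onto the StorageBox path, substituted from the
      # environment at deploy time.
      - ${GRAFANA_DATA_DIR}:/var/lib/grafana
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.type == service

volumes:
  prometheus_data:                          # hypothetical volume name
```

Because the Prometheus volume is local to one node, its history does not follow the service to another node. The Grafana bind mount does, provided the StorageBox path is mounted on every service node.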

APISIX and etcd

APISIX uses the DB-node etcd cluster through overlay DNS aliases such as etcd-01, etcd-02, and etcd-03. Patroni and APISIX use different etcd prefixes, so their data does not collide.
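The prefix separation can be illustrated with two configuration fragments, shown here as separate YAML documents. Treat this as a sketch under assumptions: the http scheme, port 2379, the Patroni scope name, the Patroni namespace (its default, /service/), the use of the etcd3 section, and Patroni addressing etcd by the same aliases are not confirmed by this document. The etcd-01 to etcd-03 aliases and the /apisix prefix are.

```yaml
# APISIX config.yaml (fragment, APISIX 3.x layout)
deployment:
  role: traditional
  role_traditional:
    config_provider: etcd
  etcd:
    host:
      - "http://etcd-01:2379"   # overlay DNS alias; scheme and port assumed
      - "http://etcd-02:2379"
      - "http://etcd-03:2379"
    prefix: "/apisix"           # all APISIX keys live under this prefix
---
# patroni.yml (fragment); every value below is an assumption
scope: iklim-pg                 # hypothetical cluster name
namespace: /service/            # Patroni's default namespace
etcd3:
  hosts:
    - etcd-01:2379
    - etcd-02:2379
    - etcd-03:2379
```

With values like these, APISIX keys sit under /apisix/ and Patroni keys under /service/iklim-pg/, so the two clients share the cluster without overlapping key ranges.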

The firewall rule that allows etcd client traffic from the app subnet to the DB subnet is part of the current production firewall model. See ../../setup/06-prod-terraform-iac.md.

Redis and RabbitMQ

Redis/Sentinel and RabbitMQ are service-node infrastructure. Their placement follows node.labels.type == service.

RabbitMQ-related private firewall rules belong to the app/service-node firewall model. Redis and Sentinel do not publish host-mode ports in the current prod stack and do not require Hetzner firewall openings.

Historical / Superseded by Setup

The following earlier roadmap ideas are retained only as historical context:

  • Creating docker-stack-infra.prod.yml as a prod overlay.
  • Deploying prod with docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco.
  • Keeping Vault inside the prod infra overlay with /opt/iklimco/vault/data host-path storage.
  • Treating PostgreSQL/MongoDB as separate DB stacks such as docker-stack-db.prod.yml.
  • Validating a prod merge with docker stack config -c docker-stack-infra.yml -c docker-stack-infra.prod.yml.

For current execution, use the setup runbooks and root stack files listed in the Context section.