# 03 — Production Infrastructure and DB Stack Model

## Context

This document records the production infrastructure target that is now implemented by the current setup runbooks. The execution source is no longer the old base-plus-prod overlay model.

Current references:

- Main infra and DB stack: root `docker-stack-infra_db-prod.yml`
- Vault stack: root `docker-stack-vault.yml`
- Vault bootstrap: root `init/vault/vault-bootstrap.sh`, called through `init-infra-prod.sh`

## Current Stack Strategy

Production uses a split stack model:

- `docker-stack-infra_db-prod.yml`: APISIX, APISIX Dashboard, SWAG, cert services, Redis/Sentinel, RabbitMQ, Prometheus, Grafana, Patroni/PostgreSQL, MongoDB, and etcd.
- `docker-stack-vault.yml`: Vault Raft cluster only.

The previous `docker-stack-infra.yml` + `docker-stack-infra.prod.yml` overlay strategy is superseded for production. Do not create or deploy `docker-stack-infra.prod.yml` for the current prod environment.

## Placement Boundary

`docker-stack-infra_db-prod.yml` is intentionally a mixed stack. The placement model is the important boundary:

- DB/cluster services run on `iklim-db-*`: Patroni/PostgreSQL, MongoDB, and etcd.
- App/service-node infrastructure runs on `iklim-app-*` with `node.labels.type == service`: Redis, Redis Sentinel, RabbitMQ, APISIX, APISIX Dashboard, SWAG, cert-reloader/cert-distributor, Prometheus, and Grafana.
- Redis and RabbitMQ are not DB-node host-mode services. They stay on the overlay network unless explicitly exposed by the stack or SWAG/APISIX.

DB services that require direct cluster traffic publish host-mode ports where the current stack defines them. Redis and RabbitMQ must not be changed to host-mode just because they live in the same stack file.
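The placement boundary above could look roughly like the following stack-file fragment. This is a sketch only, not a copy of `docker-stack-infra_db-prod.yml`: the service names, the `infra` network name, the `node.hostname` constraint for the DB node, and the published etcd client port are assumptions chosen for illustration. Only the `node.labels.type == service` constraint and the `iklim-db-*` naming come from this document.

```yaml
# Illustrative fragment; image, command, and volumes are omitted on purpose.
services:
  redis:
    # Service-node infrastructure: overlay network only, no published ports.
    networks:
      - infra                               # hypothetical overlay network name
    deploy:
      placement:
        constraints:
          - node.labels.type == service

  etcd-01:
    # DB-node cluster service pinned to one DB host.
    networks:
      - infra
    ports:
      - target: 2379                        # assumed etcd client port
        published: 2379
        protocol: tcp
        mode: host                          # host-mode publish, bypasses the routing mesh
    deploy:
      placement:
        constraints:
          - node.hostname == iklim-db-01    # hypothetical hostname matching iklim-db-*

networks:
  infra:
    driver: overlay
```

The point of the sketch is that host-mode publishing is a per-service decision in the `ports` section, so a service sharing the file with etcd or Patroni does not inherit it.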
## Current Production Services

| Area | Current model |
| --- | --- |
| APISIX | 3 replicas on service nodes; config stored in etcd with `/apisix` prefix |
| Redis | Sentinel model on service nodes; overlay-only |
| RabbitMQ | 3-node service-node cluster; management exposed through SWAG, restricted by IP |
| Vault | Separate 3-node Raft stack via `docker-stack-vault.yml` |
| PostgreSQL | 3-node Patroni cluster on DB nodes |
| MongoDB | 3-node replica set on DB nodes |
| etcd | 3-node cluster on DB nodes, shared by Patroni and APISIX |
| Prometheus | Single instance; local Docker volume |
| Grafana | Single instance; StorageBox-backed data path |

## Monitoring Persistence

Prometheus TSDB remains on a local Docker volume because StorageBox/DAVFS is not suitable for Prometheus WAL and compaction I/O.

Grafana uses `/mnt/storagebox/grafana/data` through `GRAFANA_DATA_DIR` so dashboards, plugins, and the SQLite database survive manual service movement between service nodes.

## APISIX and etcd

APISIX uses the DB-node etcd cluster through overlay DNS aliases such as `etcd-01`, `etcd-02`, and `etcd-03`. Patroni and APISIX use different etcd prefixes, so their data does not collide.

The app subnet to DB subnet firewall rule for etcd client traffic is part of the current production firewall model in `terraform/hetzner/prod/firewall.tf`.

## Redis and RabbitMQ

Redis/Sentinel and RabbitMQ are service-node infrastructure. Their placement follows `node.labels.type == service`.

RabbitMQ-related private firewall rules belong to the app/service-node firewall model. Redis and Sentinel do not publish host-mode ports in the current prod stack and do not require Hetzner firewall openings.

## Historical / Superseded by Setup

The following earlier roadmap ideas are retained only as historical context:

- Creating `docker-stack-infra.prod.yml` as a prod overlay.
- Deploying prod with `docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco`.
- Keeping Vault inside the prod infra overlay with `/opt/iklimco/vault/data` host-path storage.
- Treating PostgreSQL/MongoDB as separate DB stacks such as `docker-stack-db.prod.yml`.
- Validating a prod merge with `docker stack config -c docker-stack-infra.yml -c docker-stack-infra.prod.yml`.

For current execution, use the setup runbooks and root stack files listed in the Context section.