# 03 — Production Infrastructure and DB Stack Model

## Context

This document records the production infrastructure target that is now implemented by the current setup runbooks. The execution source is no longer the old base-plus-prod overlay model.

Current references:

- Main infra and DB stack: root `docker-stack-infra_db-prod.yml`
- Vault stack: root `docker-stack-vault.yml`
- Vault bootstrap: root `init/vault/vault-bootstrap.sh`, called through `init-infra-prod.sh`

## Current Stack Strategy

Production uses a split stack model:

- `docker-stack-infra_db-prod.yml`: APISIX, APISIX Dashboard, SWAG, cert services, Redis/Sentinel, RabbitMQ, Prometheus, Grafana, Patroni/PostgreSQL, MongoDB, and etcd.
- `docker-stack-vault.yml`: Vault Raft cluster only.

The previous `docker-stack-infra.yml` + `docker-stack-infra.prod.yml` overlay strategy is superseded for production. Do not create or deploy `docker-stack-infra.prod.yml` for the current prod environment.

## Placement Boundary

`docker-stack-infra_db-prod.yml` is intentionally a mixed stack. The placement model is the important boundary:

- DB/cluster services run on `iklim-db-*`: Patroni/PostgreSQL, MongoDB, and etcd.
- App/service-node infrastructure runs on `iklim-app-*` with `node.labels.type == service`: Redis, Redis Sentinel, RabbitMQ, APISIX, APISIX Dashboard, SWAG, cert-reloader/cert-distributor, Prometheus, and Grafana.
- Redis and RabbitMQ are not DB-node host-mode services. They stay on the overlay network unless explicitly exposed by the stack or SWAG/APISIX.

DB services that require direct cluster traffic publish host-mode ports where the current stack defines them. Redis and RabbitMQ must not be changed to host-mode just because they live in the same stack file.

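The placement boundary can be expressed as a small check. The following is a minimal sketch and not part of the repository: it assumes the stack file has already been parsed into a Python dict that follows the Compose layout (`deploy.placement.constraints` and long-syntax `ports` entries with a `mode` key). The service names (`redis`, `redis-sentinel`, `rabbitmq`, `patroni`) and the port numbers in the sample are hypothetical placeholders.

```python
# Hypothetical helper; service names and sample data are illustrative only.
SERVICE_NODE_CONSTRAINT = "node.labels.type == service"
OVERLAY_ONLY = {"redis", "redis-sentinel", "rabbitmq"}  # assumed service names


def host_mode_ports(service: dict) -> list:
    """Return long-syntax port entries that are published with mode: host."""
    return [
        port for port in service.get("ports", [])
        if isinstance(port, dict) and port.get("mode") == "host"
    ]


def placement_violations(stack: dict) -> list:
    """List violations of the placement boundary for overlay-only services."""
    problems = []
    for name, svc in stack.get("services", {}).items():
        if name not in OVERLAY_ONLY:
            continue  # DB services may use host-mode ports; not checked here
        constraints = (
            svc.get("deploy", {}).get("placement", {}).get("constraints", [])
        )
        if SERVICE_NODE_CONSTRAINT not in constraints:
            problems.append(f"{name}: missing constraint {SERVICE_NODE_CONSTRAINT!r}")
        if host_mode_ports(svc):
            problems.append(f"{name}: must not publish host-mode ports")
    return problems


sample_stack = {
    "services": {
        "redis": {
            "deploy": {"placement": {"constraints": [SERVICE_NODE_CONSTRAINT]}},
        },
        "rabbitmq": {
            "deploy": {"placement": {"constraints": [SERVICE_NODE_CONSTRAINT]}},
            # A host-mode port here is exactly what the rule above forbids.
            "ports": [{"target": 5672, "published": 5672, "mode": "host"}],
        },
        "patroni": {
            "ports": [{"target": 5432, "published": 5432, "mode": "host"}],
        },
    }
}

violations = placement_violations(sample_stack)
```

With the sample data, only the `rabbitmq` entry is reported, because it publishes a host-mode port; the `patroni` entry is ignored since DB services are allowed to do so.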
## Current Production Services

| Area | Current model |
| --- | --- |
| APISIX | 3 replicas on service nodes; config stored in etcd with `/apisix` prefix |
| Redis | Sentinel model on service nodes; overlay-only |
| RabbitMQ | 3-node service-node cluster; management exposed through SWAG, restricted by IP |
| Vault | Separate 3-node Raft stack via `docker-stack-vault.yml` |
| PostgreSQL | 3-node Patroni cluster on DB nodes |
| MongoDB | 3-node replica set on DB nodes |
| etcd | 3-node cluster on DB nodes, shared by Patroni and APISIX |
| Prometheus | Single instance; local Docker volume |
| Grafana | Single instance; StorageBox-backed data path |

## Monitoring Persistence

Prometheus TSDB remains on a local Docker volume because StorageBox/DAVFS is not suitable for Prometheus WAL and compaction I/O.

Grafana uses `/mnt/storagebox/grafana/data` through `GRAFANA_DATA_DIR` so dashboards, plugins, and the SQLite database survive manual service movement between service nodes.

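The persistence split can be sketched as a simple path check. This is an illustration under assumptions: the StorageBox mount root `/mnt/storagebox` is inferred from the Grafana path above, the helper names are hypothetical, and the Prometheus path is only an example of where a local Docker volume might live.

```python
from pathlib import PurePosixPath

STORAGEBOX_ROOT = PurePosixPath("/mnt/storagebox")  # inferred from the Grafana path


def on_storagebox(path: str) -> bool:
    """True if the path is the StorageBox mount root or lies below it."""
    p = PurePosixPath(path)
    return p == STORAGEBOX_ROOT or STORAGEBOX_ROOT in p.parents


def grafana_data_dir(env: dict) -> str:
    """Resolve the Grafana data path from GRAFANA_DATA_DIR, with the documented value as fallback."""
    return env.get("GRAFANA_DATA_DIR", "/mnt/storagebox/grafana/data")


# Hypothetical local Docker volume path for the Prometheus TSDB.
prometheus_dir = "/var/lib/docker/volumes/prometheus_data/_data"

grafana_ok = on_storagebox(grafana_data_dir({}))   # Grafana should be StorageBox-backed
prometheus_ok = not on_storagebox(prometheus_dir)  # Prometheus should stay local
```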
## APISIX and etcd

APISIX uses the DB-node etcd cluster through overlay DNS aliases such as `etcd-01`, `etcd-02`, and `etcd-03`. Patroni and APISIX use different etcd prefixes, so their data does not collide.

The firewall rule that allows etcd client traffic from the app subnet to the DB subnet is part of the current production firewall model in `terraform/hetzner/prod/firewall.tf`.

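To illustrate why distinct prefixes keep the two consumers apart on a shared etcd cluster, here is a minimal sketch. Only the `/apisix` prefix comes from this document. The Patroni prefix `/service` is an assumption (it is Patroni's default namespace, but the value used in production is not stated here), and the example keys are made up.

```python
def normalize(prefix: str) -> str:
    """Normalize an etcd key or prefix to the form '/a/b/' for safe comparison."""
    return "/" + prefix.strip("/") + "/"


def prefixes_collide(a: str, b: str) -> bool:
    """True if one prefix contains the other, so their key ranges overlap."""
    na, nb = normalize(a), normalize(b)
    return na.startswith(nb) or nb.startswith(na)


APISIX_PREFIX = "/apisix"    # from the services table in this document
PATRONI_PREFIX = "/service"  # assumption: Patroni's default namespace


def owner_of(key: str) -> str:
    """Attribute a key to the consumer whose prefix contains it."""
    normalized = normalize(key)
    if normalized.startswith(normalize(APISIX_PREFIX)):
        return "apisix"
    if normalized.startswith(normalize(PATRONI_PREFIX)):
        return "patroni"
    return "unknown"
```

Comparing normalized prefixes with a trailing slash avoids false matches between sibling names such as `/apisix` and `/apisix-staging`.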
## Redis and RabbitMQ

Redis/Sentinel and RabbitMQ are service-node infrastructure. Their placement follows `node.labels.type == service`.

RabbitMQ-related private firewall rules belong to the app/service-node firewall model. Redis and Sentinel do not publish host-mode ports in the current prod stack and do not require Hetzner firewall openings.

## Historical / Superseded by Setup

The following earlier roadmap ideas are retained only as historical context:

- Creating `docker-stack-infra.prod.yml` as a prod overlay.
- Deploying prod with `docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco`.
- Keeping Vault inside the prod infra overlay with `/opt/iklimco/vault/data` host-path storage.
- Treating PostgreSQL/MongoDB as separate DB stacks such as `docker-stack-db.prod.yml`.
- Validating a prod merge with `docker stack config -c docker-stack-infra.yml -c docker-stack-infra.prod.yml`.

For current execution, use the setup runbooks and root stack files listed in the Context section.