# 03 — docker-stack-infra.yml Changes (Prod)

## Context

### File strategy — overlay approach

Prod-specific service changes are **not written directly** into `docker-stack-infra.yml`; they are kept in a separate overlay file:

| File | Usage |
|------|-------|
| `docker-stack-infra.yml` | Base — works as-is for test |
| `docker-stack-infra.prod.yml` | Prod overlay — additional services and overrides |

```bash
# Test deploy:
docker stack deploy -c docker-stack-infra.yml iklimco

# Prod deploy (Swarm merges both files):
docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
```

Docker Swarm merge rule: if the same service name appears in both files, the overlay wins (deploy, environment, etc.); services only present in the overlay are added.

### Prod-specific changes summary
- APISIX: 1 → 3 replicas (overlay override)
- Redis: single-instance → Sentinel cluster — 1 master + 2 replicas + 3 sentinels (overlay adds new services)
- RabbitMQ: 1 → 3-node Erlang cluster (overlay override + env)
- Vault: 1 → 3-node Raft cluster (overlay override) — see `07-vault-raft-plan.md`
- No separate APISIX etcd: Patroni etcd is shared (`/apisix` prefix)
- `init/apisix-core/init.sh`: when `PROFILE=prod`, rate limit `policy:local` → `policy:redis`

### swag-vl volume — not used in prod, not defined in overlay

Test-env Step 9 adds the `swag-vl` named volume to the base file. In prod, SWAG mounts the StorageBox via the `${SWAG_CONFIG_DIR}` env var, so no service uses this volume. There is no need to remove it in the overlay: Swarm does not create volumes that no service references, so the definition is harmless.

No `swag-vl` definition is made in `docker-stack-infra.prod.yml`.

### Monitoring Persistence (StorageBox)

Prometheus and Grafana run as single instances. To ensure monitoring data and dashboards survive a node failover (moving from `iklim-app-01` to another node), their data is stored on the shared StorageBox:
- **Prometheus:** `/mnt/storagebox/prometheus/data`
- **Grafana:** `/mnt/storagebox/grafana/data`

These paths are mounted via env vars (`PROMETHEUS_DATA_DIR`, `GRAFANA_DATA_DIR`) with named-volume fallbacks for test. See Step 8 for implementation details.

**Note:** PostgreSQL and MongoDB are not in `docker-stack-infra.yml`. They run in separate stacks on DB nodes (`iklim-db` and `iklim-patroni`). See `08-prod-db-cluster-kurulum.md`.

## Step 1 — Apply all test-env changes first

Follow every step in `test-env/03-infra-stack-changes.md`:
- Add `swag` service
- Add `cert-reloader` service
- Remove published ports for vault, apisix, rabbitmq, prometheus, grafana, apisix-dashboard
- Add `swag-vl` volume

## Step 2 — Vault: 3-node Raft cluster (prod)

Vault starts directly with 3 replicas; the Phase 1 single-instance stage is skipped in prod.
See `07-vault-raft-plan.md` Phase 2 for detailed setup steps.

```yaml
vault:
  deploy:
    mode: replicated
    replicas: 3
    placement:
      constraints:
        - node.labels.type == service
```

## Step 3 — APISIX: 3 replicas + init.sh rate limit update (prod overlay)

Add to `docker-stack-infra.prod.yml`:

```yaml
# docker-stack-infra.prod.yml
services:
  apisix:
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service

  apisix-dashboard:
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
```

APISIX and apisix-dashboard are stateless (config lives in Patroni etcd) — 3 replicas is safe.
Swarm distributes SWAG requests to APISIX replicas via VIP (IPVS round-robin).

### init.sh — rate limit policy:redis (prod)

With `policy:local`, each APISIX instance counts independently, so with 3 replicas the global limit effectively becomes 3× (up to 900 requests per 60 s instead of the configured 300).
Switch to `policy:redis` for `PROFILE=prod`.

Update the global rate limit block in `init/apisix-core/init.sh`:

```bash
if [[ "$PROFILE" != "dev" ]]; then
  if [[ "$PROFILE" == "prod" ]]; then
    RATE_POLICY="redis"
    # Inside single quotes no backslash escaping is needed; only the password is expanded.
    RATE_REDIS=',"redis_host":"redis","redis_port":6379,"redis_password":"'"$REDIS_PASSWORD"'"'
  else
    RATE_POLICY="local"
    RATE_REDIS=""
  fi

  call_api "global rate limit" -X PUT "$APISIX_ADMIN_URL/global_rules/1" \
    -H "X-API-KEY: $API_KEY" -H "Content-Type: application/json" \
    -d '{"plugins":{"limit-count":{"count":300,"time_window":60,"key_type":"var","key":"remote_addr","rejected_code":429,"policy":"'"$RATE_POLICY"'"'"$RATE_REDIS"'}}}'
fi
```

> APISIX's `limit-count` plugin does not natively support Redis Sentinel; `policy:redis` works with a single endpoint.
> The `redis` service name stays constant within Swarm. During Sentinel failover (~10-30 s) rate limiting may be
> temporarily inconsistent; this brief disruption is acceptable. Microservices use Spring Data Redis Sentinel natively.

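
The quoting in the rate limit block is easy to get wrong, so it can help to build the payload in a small helper and inspect it before it is sent to the Admin API. The following is a minimal sketch, not part of `init.sh`: the function name `build_limit_count_payload` and its argument order are hypothetical, and it assumes the password contains no characters that need JSON escaping (true for the hex string produced by `openssl rand -hex 32`).

```bash
# Hypothetical helper (not in init.sh): prints the limit-count payload
# for a profile so the JSON can be checked before calling the Admin API.
#   $1 = profile, $2 = Redis password (prod only), $3 = Redis host (prod only)
build_limit_count_payload() {
  local profile="$1" redis_password="${2:-}" redis_host="${3:-}"
  local policy="local" redis_part=""
  if [ "$profile" = "prod" ]; then
    policy="redis"
    redis_part=',"redis_host":"'"$redis_host"'","redis_port":6379,"redis_password":"'"$redis_password"'"'
  fi
  # The variable parts are passed as printf arguments, never as the format string.
  printf '{"plugins":{"limit-count":{"count":300,"time_window":60,"key_type":"var","key":"remote_addr","rejected_code":429,"policy":"%s"%s}}}\n' \
    "$policy" "$redis_part"
}

# Usage:
#   build_limit_count_payload test
#   build_limit_count_payload prod "$REDIS_PASSWORD" redis
```

Printing the payload first makes stray backslashes or unbalanced quotes visible immediately, instead of surfacing later as a rejected Admin API request.
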
## Step 4 — etcd: Separate APISIX etcd removed — Patroni etcd shared

The standalone `etcd` service in `docker-stack-infra.yml` is **not used in prod**. An overlay cannot remove a service, so it stays in the base stack as an idle container.
APISIX uses the 3-node Patroni etcd cluster running on DB nodes, via the `/apisix` prefix.

### Why consolidated?
- A standalone single-instance etcd was a SPOF for APISIX.
- Patroni etcd is already 3-node HA — APISIX gets a more reliable config store.
- etcd supports prefix-based namespacing; Patroni uses `/service/`, APISIX uses `/apisix/` — no collision.

### APISIX etcd connection configuration

Update the etcd endpoints in the APISIX service in `docker-stack-infra.yml` to point to DB nodes:

```yaml
apisix:
  environment:
    APISIX_STAND_ALONE: "false"
  # via apisix/conf/config.yaml or environment:
  # etcd:
  #   host:
  #     - "http://iklim-db-01:2379"
  #     - "http://iklim-db-02:2379"
  #     - "http://iklim-db-03:2379"
  #   prefix: "/apisix"
```

The preferred method is mounting `config.yaml` via a Docker config or volume:

```yaml
# config/apisix/config.yaml
etcd:
  host:
    - "http://iklim-db-01:2379"
    - "http://iklim-db-02:2379"
    - "http://iklim-db-03:2379"
  prefix: "/apisix"
  timeout: 30
```

### Firewall requirement

etcd access from app nodes to DB nodes must be open:

```bash
# Each app node → each db node, port 2379
# If inside Hetzner private network it may be open by default;
# verify there are no ufw/firewalld rules blocking it:
nc -zv iklim-db-01 2379
```

> **Note:** Docker Compose overlay files can only add/override services, not remove them. The standalone `etcd` service remains in the base stack and runs as an idle container in prod — APISIX connects to Patroni etcd instead (via config.yaml in the prod overlay). This is harmless; etcd uses negligible resources with no active clients.

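
Since every app node must reach every DB node, the single `nc` probe can be wrapped in a loop. This is a sketch only: `check_etcd_reachable` is a hypothetical helper, and it assumes the installed netcat supports `-z` (probe without sending data) and `-w` (timeout in seconds).

```bash
# Hypothetical helper: probes TCP port 2379 on each host given as an argument.
# Returns non-zero if at least one endpoint is unreachable.
check_etcd_reachable() {
  local failed=0 host
  for host in "$@"; do
    if nc -z -w 2 "$host" 2379 >/dev/null 2>&1; then
      echo "ok   $host:2379"
    else
      echo "FAIL $host:2379"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (run on each app node):
#   check_etcd_reachable iklim-db-01 iklim-db-02 iklim-db-03
```

A successful TCP connect only shows that the port is open; it does not confirm that etcd itself is healthy.
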
## Step 5 — Redis: Sentinel cluster (prod overlay)

Redis runs as a single instance in test. In prod, Sentinel provides HA.
Bitnami images are used — all configuration is done via env vars, no separate `.conf` file needed.

### Prerequisites

```bash
# Create Docker secret for Redis password:
openssl rand -hex 32 | docker secret create redis_password -
```

### Topology

```
iklim-app-01: redis-master    (1 replica, pinned to app-01)
iklim-app-02: redis-replica   (1 replica, pinned to app-02)
iklim-app-03: redis-replica   (1 replica, pinned to app-03)
iklim-app-01: redis-sentinel  ┐
iklim-app-02: redis-sentinel  ├─ 3 replicas, spread across all app nodes
iklim-app-03: redis-sentinel  ┘
```

### docker-stack-infra.prod.yml — Redis services

The existing `redis` service is overridden in the prod overlay as **master**; `redis-replica` and `redis-sentinel` are added as new services. The service name (`redis`) remains unchanged so the APISIX connection config does not need updating.

```yaml
# docker-stack-infra.prod.yml
services:
  redis:            # override base single-instance redis → master
    image: bitnamisecure/redis:latest
    environment:
      # Quoted so YAML keeps the string "no" instead of parsing a boolean
      ALLOW_EMPTY_PASSWORD: "no"
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      REDIS_REPLICATION_MODE: master
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == iklim-app-01
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim

  redis-replica:
    image: bitnamisecure/redis:latest
    environment:
      ALLOW_EMPTY_PASSWORD: "no"
      REDIS_REPLICATION_MODE: slave
      REDIS_MASTER_HOST: redis
      REDIS_MASTER_PORT_NUMBER: "6379"
      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
      REDIS_PASSWORD: ${REDIS_PASSWORD}
    deploy:
      mode: replicated
      replicas: 2
      placement:
        constraints:
          - node.labels.type == service
        preferences:
          - spread: node.hostname
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim

  redis-sentinel:
    image: bitnamisecure/redis-sentinel:latest
    environment:
      REDIS_SENTINEL_MASTER_NAME: mymaster
      REDIS_MASTER_HOST: redis
      REDIS_MASTER_PORT_NUMBER: "6379"
      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
      REDIS_SENTINEL_QUORUM: "2"
      REDIS_SENTINEL_DOWN_AFTER_MILLISECONDS: "5000"
      REDIS_SENTINEL_FAILOVER_TIMEOUT: "10000"
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
        preferences:
          - spread: node.hostname
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim
```

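
The Sentinel sizing can be checked with a little arithmetic. The quorum (`REDIS_SENTINEL_QUORUM`) is how many sentinels must agree that the master is down, while the failover itself has to be authorized by a majority of all sentinels. The sketch below, with hypothetical function names, computes that majority and the number of sentinel failures the deployment can absorb.

```bash
# Majority of N sentinels: floor(N / 2) + 1.
sentinel_majority() {
  echo $(( $1 / 2 + 1 ))
}

# Sentinel failures that can be absorbed while a majority is still reachable.
sentinel_tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

sentinel_majority 3            # prints 2
sentinel_tolerated_failures 3  # prints 1
```

With 3 sentinels the majority is 2, which matches the configured quorum of 2, so losing one app node still leaves enough sentinels both to detect the failure and to authorize the failover.
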
### Microservice connection (Spring Data Redis)

Microservices must use a Sentinel-aware connection:

```yaml
# application-prod.yml
spring:
  data:
    redis:
      sentinel:
        master: mymaster
        nodes:
          - redis-sentinel:26379
      password: ${REDIS_PASSWORD}
```

### Verification

```bash
# Query master identity:
docker exec $(docker ps -q -f name=iklimco_redis-sentinel | head -1) \
  redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
```

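
For scripted checks, the reply of `SENTINEL get-master-addr-by-name` can be reduced to a single `host:port` string. This is a sketch with a hypothetical helper name; it assumes `redis-cli` is writing to a pipe, in which case it prints the address and the port as two plain lines.

```bash
# Hypothetical helper: reads the two-line reply (address, then port)
# from stdin and prints it as host:port. Fails on an empty reply.
format_master_addr() {
  local host="" port=""
  read -r host || [ -n "$host" ] || return 1
  read -r port || [ -n "$port" ] || return 1
  echo "${host}:${port}"
}

# Usage:
#   docker exec "$(docker ps -q -f name=iklimco_redis-sentinel | head -1)" \
#     redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster | format_master_addr
```

Comparing this value before and after a failover test shows whether the sentinels actually promoted a different node.
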
## Step 6 — RabbitMQ: 3-node Erlang cluster (prod overlay)

RabbitMQ runs as a 3-node cluster with one instance per app node.

### Prerequisites

```bash
# Create Docker secret for Erlang cookie (must be identical on all nodes):
openssl rand -hex 32 | docker secret create rabbitmq_erlang_cookie -
```

### docker-stack-infra.prod.yml — RabbitMQ override

```yaml
# docker-stack-infra.prod.yml (add alongside redis services)
services:
  rabbitmq:
    image: rabbitmq:3-management
    hostname: "rabbitmq-{{.Node.Hostname}}"
    environment:
      RABBITMQ_ERLANG_COOKIE_FILE: /run/secrets/rabbitmq_erlang_cookie
      RABBITMQ_USE_LONGNAME: "true"
      RABBITMQ_NODENAME: "rabbit@rabbitmq-{{.Node.Hostname}}"
    secrets:
      - rabbitmq_erlang_cookie
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
      update_config:
        parallelism: 1
        order: stop-first
      labels:
        project: co.iklim

secrets:
  rabbitmq_erlang_cookie:
    external: true
```

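
The `{{.Node.Hostname}}` template is resolved by Swarm per task, so each replica gets a node name derived from the host it runs on. The sketch below uses a hypothetical helper, `rabbit_node_name`, to list the names the three app nodes would produce; these are the names `rabbitmqctl` expects when one node refers to another.

```bash
# Hypothetical helper: the node name that the
# "rabbit@rabbitmq-{{.Node.Hostname}}" template yields for a Swarm node.
rabbit_node_name() {
  echo "rabbit@rabbitmq-$1"
}

for node in iklim-app-01 iklim-app-02 iklim-app-03; do
  rabbit_node_name "$node"
done
# prints:
#   rabbit@rabbitmq-iklim-app-01
#   rabbit@rabbitmq-iklim-app-02
#   rabbit@rabbitmq-iklim-app-03
```
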
### Cluster join procedure (first setup)

RabbitMQ nodes do not form a cluster automatically; manual join is required after first start:

```bash
# Find the RabbitMQ container on iklim-app-02:
CTR=$(docker ps -q -f name=iklimco_rabbitmq)

# Stop, join, start:
docker exec "$CTR" rabbitmqctl stop_app
docker exec "$CTR" rabbitmqctl join_cluster rabbit@rabbitmq-iklim-app-01
docker exec "$CTR" rabbitmqctl start_app

# Repeat for iklim-app-03
```

```bash
# Verify cluster status (from any node):
docker exec "$CTR" rabbitmqctl cluster_status
```

> **HA policy:** After the cluster is formed, set quorum queues as the default:
> ```bash
> docker exec "$CTR" rabbitmqctl set_policy ha-all ".*" \
>   '{"queue-type":"quorum"}' --apply-to queues
> ```

## Step 7 — Create `docker-stack-infra.prod.yml`

Create this file in the repo root alongside `docker-stack-infra.yml`. It combines all prod-specific overrides from Steps 2–6:

```yaml
# docker-stack-infra.prod.yml
# Prod overlay — deploy with:
#   docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco

services:

  vault:
    environment:
      VAULT_LOCAL_CONFIG: >-
        {"api_addr":"https://vault.iklim.co:8200",
        "cluster_addr":"https://{{ .Node.Hostname }}:8201",
        "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
        "listener":[{"tcp":{"address":"0.0.0.0:8200",
        "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
        "tls_key_file":"/vault/certs/STAR.iklim.co_key.pem"}}],
        "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
    volumes:
      - /opt/iklimco/vault/data:/vault/file
      - /mnt/storagebox/ssl:/vault/certs:ro
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service

  apisix:
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service

  apisix-dashboard:
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service

  redis:
    image: bitnamisecure/redis:latest
    environment:
      # Quoted so YAML keeps the string "no" instead of parsing a boolean
      ALLOW_EMPTY_PASSWORD: "no"
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      REDIS_REPLICATION_MODE: master
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == iklim-app-01
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim

  redis-replica:
    image: bitnamisecure/redis:latest
    environment:
      ALLOW_EMPTY_PASSWORD: "no"
      REDIS_REPLICATION_MODE: slave
      REDIS_MASTER_HOST: redis
      REDIS_MASTER_PORT_NUMBER: "6379"
      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
      REDIS_PASSWORD: ${REDIS_PASSWORD}
    deploy:
      mode: replicated
      replicas: 2
      placement:
        constraints:
          - node.labels.type == service
        preferences:
          - spread: node.hostname
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim

  redis-sentinel:
    image: bitnamisecure/redis-sentinel:latest
    environment:
      REDIS_SENTINEL_MASTER_NAME: mymaster
      REDIS_MASTER_HOST: redis
      REDIS_MASTER_PORT_NUMBER: "6379"
      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
      REDIS_SENTINEL_QUORUM: "2"
      REDIS_SENTINEL_DOWN_AFTER_MILLISECONDS: "5000"
      REDIS_SENTINEL_FAILOVER_TIMEOUT: "10000"
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
        preferences:
          - spread: node.hostname
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim

  rabbitmq:
    image: rabbitmq:3-management
    hostname: "rabbitmq-{{.Node.Hostname}}"
    environment:
      RABBITMQ_ERLANG_COOKIE_FILE: /run/secrets/rabbitmq_erlang_cookie
      RABBITMQ_USE_LONGNAME: "true"
      RABBITMQ_NODENAME: "rabbit@rabbitmq-{{.Node.Hostname}}"
    secrets:
      - rabbitmq_erlang_cookie
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
      update_config:
        parallelism: 1
        order: stop-first
      labels:
        project: co.iklim

secrets:
  rabbitmq_erlang_cookie:
    external: true
```

## Step 8 — Monitoring Data Persistence (StorageBox)

Prometheus and Grafana run as single instances. Without persistent storage, data is lost on node failover. This step mounts their data directories from the StorageBox shared filesystem.

**Changes already applied to `docker-stack-infra.yml`:**

```yaml
prometheus:
  volumes:
    - ${PROMETHEUS_DATA_DIR:-prometheus-vl}:/prometheus

grafana:
  volumes:
    - ${GRAFANA_DATA_DIR:-grafana-vl}:/var/lib/grafana
```

Test uses the named Docker volume fallbacks (`prometheus-vl`, `grafana-vl`) — no test env change needed.

**Add to `prod/secrets/iklim.co/.env.prod` on storagebox** (already in `env-prod/.env`):

```bash
PROMETHEUS_DATA_DIR=/mnt/storagebox/prometheus/data
GRAFANA_DATA_DIR=/mnt/storagebox/grafana/data
```

**Create directories on StorageBox before first prod deploy:**

```bash
mkdir -p /mnt/storagebox/prometheus/data /mnt/storagebox/grafana/data
```

> Grafana writes its SQLite database and dashboard JSON to `/var/lib/grafana`.
> Prometheus writes its TSDB to `/prometheus`. Both directories must exist before the stack starts.

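
Because both directories must exist before the stack starts, a small pre-deploy check can catch a missing or read-only mount early. This is a sketch only: `check_data_dirs` is a hypothetical helper, and it tests writability for the user running the script, which is not necessarily the user the containers run as.

```bash
# Hypothetical pre-deploy check: every directory passed as an argument
# must exist and be writable by the current user.
check_data_dirs() {
  local dir failed=0
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "ok      $dir"
    else
      echo "missing $dir"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_data_dirs "$PROMETHEUS_DATA_DIR" "$GRAFANA_DATA_DIR"
```

Running it on the node that holds the StorageBox mount, just before `docker stack deploy`, is enough to tell an unmounted share apart from a correctly prepared one.
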
## Step 9 — Verify

```bash
# Base file must be valid on its own (test deploy):
docker stack config -c docker-stack-infra.yml > /dev/null && echo "base OK"

# Prod merge must be valid:
docker stack config -c docker-stack-infra.yml -c docker-stack-infra.prod.yml > /dev/null && echo "prod merge OK"
```

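
For use in a pipeline, the two checks can be combined into one function that fails on the first invalid configuration. This is a minimal sketch with a hypothetical function name, `validate_stack_files`; it relies only on the `docker stack config` invocations shown above and assumes it is run from the repo root.

```bash
# Hypothetical wrapper around the two checks; returns non-zero on the first failure.
validate_stack_files() {
  local base="docker-stack-infra.yml" overlay="docker-stack-infra.prod.yml"
  if ! docker stack config -c "$base" >/dev/null; then
    echo "base invalid" >&2
    return 1
  fi
  if ! docker stack config -c "$base" -c "$overlay" >/dev/null; then
    echo "prod merge invalid" >&2
    return 1
  fi
  echo "base and prod merge OK"
}

# Usage:
#   validate_stack_files && docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
```
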
## Placement and Replica Summary — prod

| Service | File | Replicas | Placement | HA Note |
|---------|------|----------|-----------|---------|
| swag | base | 1 | `node.hostname == iklim-app-01` | No clustering support; Floating IP pinned to node |
| cert-reloader | base | 1 | `node.hostname == iklim-app-01` | Cron-style task; duplicate would be problematic |
| vault | prod overlay | 3 | `node.labels.type == service` | Raft cluster — see `07-vault-raft-plan.md` |
| apisix | prod overlay | 3 | `node.labels.type == service` | Stateless; config in Patroni etcd; rate limit policy:redis |
| apisix-dashboard | prod overlay | 3 | `node.labels.type == service` | Stateless; reads from etcd |
| redis (master) | prod overlay | 1 | `node.hostname == iklim-app-01` | Sentinel cluster master |
| redis-replica | prod overlay | 2 | `node.labels.type == service` | Sentinel replica; spread:hostname |
| redis-sentinel | prod overlay | 3 | `node.labels.type == service` | Quorum=2; failover automatic |
| rabbitmq | prod overlay | 3 | `node.labels.type == service` | Erlang cluster; quorum queues |
| etcd | base | 1 | `node.labels.type == service` | Idle in prod — APISIX uses Patroni etcd; standalone service remains in base stack |
| prometheus | base | 1 | `node.labels.type == service` | No native HA; Thanos is overkill at this scale |
| grafana | base | 1 | `node.labels.type == service` | Not critical |

> PostgreSQL and MongoDB run in separate DB stacks on `iklim-db-*` nodes. See `08-prod-db-cluster-kurulum.md`.
> etcd: 3-node cluster on DB nodes — APISIX shares it via `/apisix` prefix.