Environment_Infrastructure/setup/08-prod-db-cluster-setup.md
Murat ÖZDEMİR 67dc2986dd docs(infra): restructure and update infrastructure setup documentation
- Anglicized setup and facts markdown file names for better consistency.

- Updated 01-swarm-init-multinode.md to highlight Ansible automation of Swarm initialization and labeling.

- Overhauled 03-infra-stack-changes.md to describe the single monolithic file strategy and reflect current Redis, RabbitMQ, and etcd cluster configurations.

- Fixed minor overrides and typos in Patroni templates and Ansible bootstrap documents.

- Restructured README and roadmap mapping to align with the renamed setup documents.
2026-06-15 16:42:18 +03:00

22 KiB
Raw Permalink Blame History

08 - Prod DB Cluster Setup (Swarm)

The purpose of this phase is to add the three DB nodes to Docker Swarm as workers and configure the MongoDB replica set and the PostgreSQL high-availability setup managed with Patroni + etcd.

07-prod-ansible-bootstrap.md must be completed on all DB nodes.

Architecture

iklim-app-01/02/03  (Swarm manager'lar, 10.20.10.11/12/13)
        |
        |-- iklimco-net (overlay)
        |
iklim-db-01  (Swarm worker, 10.20.20.11)
    mongodb-01   [rs0 member 0 — preferred primary]
    etcd-01      [etcd cluster member]
    patroni-01   [Patroni + PostgreSQL — first primary candidate]

iklim-db-02  (Swarm worker, 10.20.20.12)
    mongodb-02   [rs0 member 1]
    etcd-02      [etcd cluster member]
    patroni-02   [Patroni + PostgreSQL — standby]

iklim-db-03  (Swarm worker, 10.20.20.13)
    mongodb-03   [rs0 member 2]
    etcd-03      [etcd cluster member]
    patroni-03   [Patroni + PostgreSQL — standby]

DB containers discover each other through overlay DNS aliases (mongodb-01, etcd-01, patroni-01, etc.) on the shared iklimco-net overlay network. Patroni/PostgreSQL, MongoDB, and etcd are the DB/cluster services covered by this document; they publish their cluster ports in host mode so replication traffic goes directly through the Hetzner private network while overlay DNS resolves service names correctly.

The current prod DB services are defined in the root docker-stack-infra_db-prod.yml stack file. That stack also contains non-DB infrastructure services such as Redis, Redis Sentinel, and RabbitMQ. Those services are intentionally different: they run on node.labels.type == service app/service nodes, do not publish host-mode ports in this stack, and communicate through the iklimco-net overlay network only. Do not generalize the DB host-mode rule to Redis or RabbitMQ.

1. Firewall Update

Verify that the following rules exist in terraform/hetzner/prod/firewall.tf; if any are missing, add them and run terraform apply.

Inside hcloud_firewall.app, from the DB subnet to Swarm ports:

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "2377"
  source_ips  = [local.db_subnet_cidr]
  description = "Docker Swarm control plane from DB subnet"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "7946"
  source_ips  = [local.db_subnet_cidr]
  description = "Docker Swarm node discovery (TCP) from DB subnet"
}

rule {
  direction   = "in"
  protocol    = "udp"
  port        = "7946"
  source_ips  = [local.db_subnet_cidr]
  description = "Docker Swarm node discovery (UDP) from DB subnet"
}

rule {
  direction   = "in"
  protocol    = "udp"
  port        = "4789"
  source_ips  = [local.db_subnet_cidr]
  description = "Docker Swarm VXLAN overlay from DB subnet"
}

Inside hcloud_firewall.db, from the app subnet to Swarm ports + overlay, and etcd/Patroni traffic inside the DB subnet:

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "2377"
  source_ips  = [local.app_subnet_cidr]
  description = "Docker Swarm control plane from app subnet"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "7946"
  source_ips  = [local.app_subnet_cidr]
  description = "Docker Swarm node discovery (TCP) from app subnet"
}

rule {
  direction   = "in"
  protocol    = "udp"
  port        = "7946"
  source_ips  = [local.app_subnet_cidr]
  description = "Docker Swarm node discovery (UDP) from app subnet"
}

rule {
  direction   = "in"
  protocol    = "udp"
  port        = "4789"
  source_ips  = [local.app_subnet_cidr]
  description = "Docker Swarm VXLAN overlay from app subnet"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "2379"
  source_ips  = [local.db_subnet_cidr]
  description = "etcd client port within DB subnet"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "2379"
  source_ips  = [local.app_subnet_cidr]
  description = "etcd client port from app subnet (APISIX connects to Patroni etcd)"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "2380"
  source_ips  = [local.db_subnet_cidr]
  description = "etcd peer port within DB subnet"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "8008"
  source_ips  = [local.db_subnet_cidr]
  description = "Patroni REST API within DB subnet"
}
cd terraform/hetzner/prod
terraform plan
terraform apply

2. Add DB Nodes to Swarm

This is handled by Environment_Infrastructure/ansible/prod/prod-bootstrap.yml through the swarm role. The role initializes Swarm on iklim-app-01, joins iklim-app-02/03 as managers, joins iklim-db-01/02/03 as workers, and labels DB nodes.

Manual equivalent, kept for troubleshooting only:

Swarm manager'lardan birinde (iklim-app-01) join token al:

docker swarm join-token worker

Her DB node'unda (iklim-db-01, iklim-db-02, iklim-db-03):

docker swarm join --token <TOKEN> 10.20.10.11:2377

Label the nodes on iklim-app-01. In automation this is split into two phases:

  • the shared swarm role adds role=db to DB nodes;
  • the prod-specific prod-bootstrap.yml play adds db-index=01/02/03.

Manual equivalent:

docker node update --label-add role=db iklim-db-01
docker node update --label-add role=db iklim-db-02
docker node update --label-add role=db iklim-db-03

docker node update --label-add db-index=01 iklim-db-01
docker node update --label-add db-index=02 iklim-db-02
docker node update --label-add db-index=03 iklim-db-03

docker node ls

3. StorageBox Directory Structure

DB data is stored on local DB-node paths prepared by Ansible:

/opt/iklimco/db/mongodb
/opt/iklimco/db/postgresql

Configuration files are placed on StorageBox. On each DB node, where /mnt/storagebox must already be mounted:

# On iklim-db-01:
mkdir -p /mnt/storagebox/db/mongodb-01/config
mkdir -p /mnt/storagebox/db/postgresql-01/config

# On iklim-db-02:
mkdir -p /mnt/storagebox/db/mongodb-02/config
mkdir -p /mnt/storagebox/db/postgresql-02/config

# On iklim-db-03:
mkdir -p /mnt/storagebox/db/mongodb-03/config
mkdir -p /mnt/storagebox/db/postgresql-03/config

Config files (mongod.conf, patroni.yml) and the MongoDB replica set key are deployed by the Ansible db_stack role into these directories. etcd uses Docker named volumes (etcd-01-data, etcd-02-data, etcd-03-data) from docker-stack-infra_db-prod.yml.

4. MongoDB Replica Set

mongod.conf

Her DB node'unda /mnt/storagebox/db/mongodb-0X/config/mongod.conf (Ansible db_stack rolü tarafından deploy edilir):

net:
  port: 27017
storage:
  engine: "wiredTiger"
  dbPath: "/data/db"
  directoryPerDB: true
systemLog:
  verbosity: 0
  timeStampFormat: "iso8601-local"
  destination: file
  path: "/data/log/mongo.log"
  logAppend: true
  logRotate: rename
replication:
  replSetName: "rs0"
security:
  authorization: enabled
  keyFile: "/data/configdb/rs-auth.key"

Replica Set Auth Key

The same key file must exist on all DB nodes. In the current production setup, this is automated by ansible/prod/roles/db_stack/tasks/db_node.yml:

  • iklim-db-01 generates /mnt/storagebox/db/mongodb-01/config/rs-auth.key if it is missing.
  • the same key content is copied to /mnt/storagebox/db/mongodb-02/config/rs-auth.key and /mnt/storagebox/db/mongodb-03/config/rs-auth.key;
  • permissions are set to 0400.

Manual recovery equivalent, kept only for troubleshooting:

openssl rand -base64 756 > /mnt/storagebox/db/mongodb-01/config/rs-auth.key
chmod 400 /mnt/storagebox/db/mongodb-01/config/rs-auth.key

cat /mnt/storagebox/db/mongodb-01/config/rs-auth.key \
  > /mnt/storagebox/db/mongodb-02/config/rs-auth.key
cat /mnt/storagebox/db/mongodb-01/config/rs-auth.key \
  > /mnt/storagebox/db/mongodb-03/config/rs-auth.key

chmod 400 /mnt/storagebox/db/mongodb-0{2,3}/config/rs-auth.key

Stack File — MongoDB

MongoDB services are defined in docker-stack-infra_db-prod.yml (repo root). Each service uses a local DB-node bind mount for data and a StorageBox bind mount for config:

mongodb-01:
  image: ${IMAGE_MONGODB}
  environment:
    MONGO_INITDB_ROOT_USERNAME: "${DATABASE_MONGODB_ROOT_USER}"
    MONGO_INITDB_ROOT_PASSWORD: "${DATABASE_MONGODB_ROOT_PASSWD}"
  volumes:
    - /opt/iklimco/db/mongodb:/data/db
    - /mnt/storagebox/db/mongodb-01/config:/data/configdb
  networks:
    iklimco-net:
      aliases:
        - mongodb-01
  ports:
    - target: 27017
      published: 27017
      protocol: tcp
      mode: host
  deploy:
    replicas: 1
    placement:
      max_replicas_per_node: 1
      constraints:
        - node.hostname == iklim-db-01

The same pattern is repeated for mongodb-02 and mongodb-03, with node-specific StorageBox config paths and placement constraints.

Replica Set Initialization

Replica set initialization is handled by the root prod workflow step Initialize MongoDB Replica Set. The workflow:

  1. Connects to the first host from DATABASE_MONGODB_HOST.
  2. Runs rs.initiate() if the replica set is uninitialized.
  3. Checks current members if the replica set already exists.
  4. Runs rs.add() through the primary if hosts from DATABASE_MONGODB_HOST are missing.

Manual equivalent, kept for troubleshooting only:

# On iklim-app-01 (overlay network erişimi için):
docker run --rm -it --network iklimco-net mongo:8.3.2 \
  mongosh "mongodb://mongo-root:${DATABASE_MONGODB_ROOT_PASSWD}@mongodb-01/admin"

# Inside mongosh:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-01:27017", priority: 2 },
    { _id: 1, host: "mongodb-02:27017", priority: 1 },
    { _id: 2, host: "mongodb-03:27017", priority: 1 }
  ]
})

# Status check:
rs.status()

The replica set is ready when "stateStr": "PRIMARY" and two "SECONDARY" entries are visible.

5. PostgreSQL — Patroni + etcd

Patroni coordinates PostgreSQL primary/standby roles through etcd. If the primary goes down, one of the other nodes automatically wins the election and becomes primary. The Swarm service restarts the container; Patroni continues from where it left off.

5.1 Custom Image (Patroni + PostGIS)

Patroni is installed on top of the postgis/postgis:18-3.6 image. This image is pushed to Harbor and used in docker-stack-infra_db-prod.yml via ${CUSTOM_IMAGE_REGISTRY}${IMAGE_PATRONI}.

build/patroni-postgis/Dockerfile:

FROM postgis/postgis:18-3.6

USER root

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3-pip \
    python3-dev \
    gcc \
    libpq-dev \
    && pip3 install --no-cache-dir 'patroni[etcd3]' \
    && apt-get purge -y gcc python3-dev \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*

USER postgres

ENTRYPOINT ["patroni", "/etc/patroni/patroni.yml"]

Build and push is done with ops/push-harbor-custom-images.sh:

cd /path/to/repo
bash ops/push-harbor-custom-images.sh

Or manually:

cd build/patroni-postgis
docker build -t registry.tarla.io/iklimco/custom-patroni-postgis:18-3.6 .
echo "$HARBOR_CI_TOKEN" | docker login registry.tarla.io -u robot-ci-push-iklimco --password-stdin
docker push registry.tarla.io/iklimco/custom-patroni-postgis:18-3.6

5.2 etcd Cluster

etcd services are defined in docker-stack-infra_db-prod.yml. Each service uses a named Docker volume for data and has an overlay DNS alias. Environment variables reference peer URLs by alias, not by hardcoded IP:

etcd-01:
  image: ${IMAGE_ETCD}
  environment:
    ALLOW_NONE_AUTHENTICATION: "no"
    ETCD_NAME: etcd-01
    ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd-01:2380
    ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
    ETCD_ADVERTISE_CLIENT_URLS: http://etcd-01:2379
    ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379
    ETCD_INITIAL_CLUSTER: "etcd-01=http://etcd-01:2380,etcd-02=http://etcd-02:2380,etcd-03=http://etcd-03:2380"
    ETCD_INITIAL_CLUSTER_STATE: new
    ETCD_INITIAL_CLUSTER_TOKEN: iklimco-etcd-prod
    ETCD_ROOT_PASSWORD: "${ETCD_ROOT_PASSWORD}"
  volumes:
    - etcd-01-data:/bitnami/etcd/data
  networks:
    iklimco-net:
      aliases:
        - etcd-01
  deploy:
    replicas: 1
    placement:
      max_replicas_per_node: 1
      constraints:
        - node.hostname == iklim-db-01

APISIX etcd usage: In prod, APISIX shares this etcd cluster with the /apisix prefix. Patroni uses the /service/ prefix and APISIX uses the /apisix/ prefix — no collision. The overlay DNS names (etcd-01:2379, etcd-02:2379, etcd-03:2379) are reachable from app nodes via the iklimco-net overlay. Therefore, the app subnet → DB nodes port 2379 firewall rule is mandatory; it was added in Section 1.

Important: ETCD_INITIAL_CLUSTER_STATE is currently defined in docker-stack-infra_db-prod.yml. When changing etcd cluster membership, do not blindly expand ETCD_INITIAL_CLUSTER on a running cluster; add members through etcd membership operations first.

5.3 Patroni Configuration

patroni.yml is generated per-node by the Ansible db_stack role from templates/patroni.yml.j2 using inventory_hostname (e.g., iklim-db-01). The generated file uses overlay DNS aliases for all addresses.

Generated output — Node 01 (/mnt/storagebox/db/postgresql-01/config/patroni.yml):

scope: iklim-postgres
namespace: /db/
name: postgresql-01

restapi:
  listen: 0.0.0.0:8008
  connect_address: patroni-01:8008

etcd3:
  hosts:
    - etcd-01:2379
    - etcd-02:2379
    - etcd-03:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_size: 512
        max_wal_senders: 5
        max_replication_slots: 5
        shared_preload_libraries: 'pg_stat_statements'
        pg_stat_statements.track: 'all'

  initdb:
    - encoding: UTF8
    - data-checksums

  pg_hba:
    - host replication replicator 10.20.20.0/24 scram-sha-256
    - host all all 10.20.10.0/24 scram-sha-256
    - host all all 10.20.20.0/24 scram-sha-256

  users:
    postgres:
      password: "${DATABASE_POSTGRES_ROOT_PASSWD}"
      options:
        - superuser

postgresql:
  listen: 0.0.0.0:5432
  connect_address: patroni-01:5432
  data_dir: /var/lib/postgresql/data/pgdata
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: "${DATABASE_POSTGRES_REPLICATOR_PASSWORD}"
    superuser:
      username: postgres
      password: "${DATABASE_POSTGRES_ROOT_PASSWD}"
  parameters:
    unix_socket_directories: "/var/run/postgresql"

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

For Node 02 and 03, only name, restapi.connect_address, and postgresql.connect_address differ (postgresql-02/patroni-02:8008/patroni-02:5432, etc.).

5.4 Stack File — Patroni

Patroni services are defined in docker-stack-infra_db-prod.yml. Each service uses the custom image, a local DB-node bind mount for data, a StorageBox bind mount for the config file, and overlay DNS aliases:

patroni-01:
  image: ${CUSTOM_IMAGE_REGISTRY}${IMAGE_PATRONI}
  environment:
    POSTGRES_USER: "${DATABASE_POSTGRES_ROOT_USER}"
    POSTGRES_PASSWORD: "${DATABASE_POSTGRES_ROOT_PASSWD}"
    REPLICATOR_PASSWORD: "${DATABASE_POSTGRES_REPLICATOR_PASSWORD}"
    ETCD_ROOT_PASSWORD: "${ETCD_ROOT_PASSWORD}"
    TZ: "Europe/Istanbul"
  volumes:
    - /opt/iklimco/db/postgresql:/var/lib/postgresql/data
    - /mnt/storagebox/db/postgresql-01/config/patroni.yml:/etc/patroni/patroni.yml:ro
  networks:
    iklimco-net:
      aliases:
        - patroni-01
  ports:
    - target: 5432
      published: 5432
      protocol: tcp
      mode: host
    - target: 8008
      published: 8008
      protocol: tcp
      mode: host
  deploy:
    replicas: 1
    placement:
      max_replicas_per_node: 1
      constraints:
        - node.hostname == iklim-db-01

The same pattern is repeated for patroni-02 and patroni-03, with node-specific StorageBox config paths and placement constraints.

5.5 Status Check

# On iklim-app-01 — Patroni cluster status:
docker exec -it $(docker ps -q -f name=iklimco_patroni-01 | head -1) \
  patronictl -c /etc/patroni/patroni.yml list

Expected output: one Leader row and two Replica rows, all with the State column set to running.

# etcd cluster health (from app node via overlay):
docker run --rm --network iklimco-net alpine \
  sh -c "wget -qO- http://etcd-01:2379/health && \
         wget -qO- http://etcd-02:2379/health && \
         wget -qO- http://etcd-03:2379/health"
# Find the current primary:
docker exec -it $(docker ps -q -f name=iklimco_patroni-01 | head -1) \
  patronictl -c /etc/patroni/patroni.yml topology

6. Deploy

All DB services (etcd, MongoDB, Patroni) are in the current root prod stack docker-stack-infra_db-prod.yml. Normal deployment is done by .gitea/workflows/deploy-prod.yml, not by running a separate DB stack manually.

.env File

DB stack password variables (DATABASE_POSTGRES_ROOT_PASSWD, DATABASE_POSTGRES_REPLICATOR_PASSWORD, DATABASE_MONGODB_ROOT_PASSWD, ETCD_ROOT_PASSWORD) are stored in prod/secrets/iklim.co/.env.secrets.shared on StorageBox. The workflow fetches this file automatically.

scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.secrets.shared \
  /tmp/.env.secrets.shared
chmod 600 /tmp/.env.secrets.shared

Deploy Steps

The root prod workflow deploys the stack with:

docker stack deploy \
  --with-registry-auth \
  --resolve-image changed \
  -c docker-stack-infra_db-prod.yml \
  iklimco

After the stack deploy, the workflow waits for etcd, initializes APISIX, initializes the MongoDB replica set, and runs PostgreSQL/MongoDB init scripts.

DB Node Placement Check

docker service ps iklimco_etcd-01
docker service ps iklimco_mongodb-01
docker service ps iklimco_patroni-01

All tasks must run on the expected iklim-db-* nodes.

MongoDB Replica Set Initialization

Handled by the workflow. Manual form for troubleshooting:

# From iklim-app-01 via overlay network:
docker run --rm -it --network iklimco-net mongo:8.3.2 \
  mongosh "mongodb://mongo-root:${DATABASE_MONGODB_ROOT_PASSWD}@mongodb-01/admin"

# Inside mongosh:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-01:27017", priority: 2 },
    { _id: 1, host: "mongodb-02:27017", priority: 1 },
    { _id: 2, host: "mongodb-03:27017", priority: 1 }
  ]
})

7. Access from App Services

App containers connect to DB services through the iklimco-net overlay network by overlay DNS name. Because the iklimco stack shares the iklimco-net external network, service names and aliases are resolved through overlay DNS.

MongoDB Replica Set Connection String

Variables in StorageBox prod/secrets/iklim.co/.env:

DATABASE_MONGODB_HOST=mongodb-01:27017,mongodb-02:27017,mongodb-03:27017
DATABASE_MONGODB_PARAMS=replicaSet=rs0&readPreference=secondaryPreferred&authSource=admin

Microservice URI through overlay DNS:

mongodb://<user>:<password>@mongodb-01:27017,mongodb-02:27017,mongodb-03:27017/<db>?replicaSet=rs0&readPreference=secondaryPreferred&authSource=admin

For direct testing, from outside the overlay with private IP: mongodb://mongo-root:<PASSWORD>@10.20.20.11:27017,10.20.20.12:27017,10.20.20.13:27017/admin?replicaSet=rs0&authSource=admin

PostgreSQL — Patroni

Variables in StorageBox prod/secrets/iklim.co/.env:

DATABASE_POSTGRES_HOST=patroni-01:5432,patroni-02:5432,patroni-03:5432
DATABASE_POSTGRES_PARAMS=targetServerType=preferSecondary&loadBalanceHosts=true

Patroni manages whichever node is primary at any moment. The JDBC/libpq driver automatically selects primary/secondary through the targetServerType parameter in the multi-host list:

# Write — goes to primary (libpq URI):
postgresql://<user>@patroni-01:5432,patroni-02:5432,patroni-03:5432/<db>?targetServerType=primary

# Read (load balancing):
postgresql://<user>@patroni-01:5432,patroni-02:5432,patroni-03:5432/<db>?targetServerType=preferSecondary&loadBalanceHosts=true

For direct testing, from outside the overlay with private IP: postgresql://postgres@10.20.20.11:5432,10.20.20.12:5432,10.20.20.13:5432/postgres?targetServerType=primary

Patroni REST API

Patroni exposes an HTTP endpoint on port 8008. This endpoint can be used with HAProxy or a similar load balancer to route to the primary automatically:

# Primary check (HTTP 200 = primary, HTTP 503 = replica):
curl -s http://patroni-01:8008/primary

8. Geliştirici ve Ofis Erişimi (Production)

Prod cluster yapısında pg-proxy veya mongo-proxy kullanılmaz. Ofis bilgisayarından erişim için doğrudan DB subnet'i hedef alınır.

WireGuard Ayarı

Ofis bilgisayarındaki .conf dosyasında AllowedIPs güncellenmelidir: AllowedIPs = 10.8.0.1/32, 10.20.20.0/24

Bağlantı Parametreleri (Multi-Host)

Modern veritabanı araçları (DBeaver, Compass vb.) küme farkındalıklı bağlantı kurmalıdır:

Veritabanı Host Listesi Port Kritik Parametre
PostgreSQL 10.20.20.11, 10.20.20.12, 10.20.20.13 5432 targetServerType=primary
MongoDB 10.20.20.11, 10.20.20.12, 10.20.20.13 27017 replicaSet=rs0

Acceptance Criteria

  • docker stack services iklimco — etcd-01/02/03, mongodb-01/02/03, patroni-01/02/03 are visible and all target replicas are healthy
  • docker service ps iklimco_patroni-01/02/03 — each task runs on its expected iklim-db-* node
  • docker service ps iklimco_mongodb-01/02/03 — each task runs on its expected iklim-db-* node
  • docker service ps iklimco_etcd-01/02/03 — each task runs on its expected iklim-db-* node
  • patronictl list — 1 Leader, 2 Replica, all running
  • etcd health endpoint returns "health":"true" on all three nodes via overlay
  • rs.status() — 1 PRIMARY, 2 SECONDARY
  • MongoDB and PostgreSQL are reachable from app nodes.
  • Ports 5432, 27017, 2379, 2380, and 8008 are closed from the public internet.
  • When a DB node is restarted, Patroni performs automatic election and a new primary is selected.
  • During Patroni primary transition, the old primary rejoins as standby; there is no split-brain.