# 08 - Prod DB Cluster Setup (Swarm)

The purpose of this phase is to add the three DB nodes to Docker Swarm as workers and to configure the MongoDB replica set and the PostgreSQL high-availability setup managed by Patroni + etcd.

`07-prod-ansible-bootstrap.md` must be completed on all DB nodes.

## Architecture

```
iklim-app-01/02/03 (Swarm managers, 10.20.10.11/12/13)
  |
  |-- iklimco-net (overlay)
  |
iklim-db-01 (Swarm worker, 10.20.20.11)
  mongodb-01  [rs0 member 0, preferred primary]
  etcd-01     [etcd cluster member]
  patroni-01  [Patroni + PostgreSQL, first primary candidate]

iklim-db-02 (Swarm worker, 10.20.20.12)
  mongodb-02  [rs0 member 1]
  etcd-02     [etcd cluster member]
  patroni-02  [Patroni + PostgreSQL, standby]

iklim-db-03 (Swarm worker, 10.20.20.13)
  mongodb-03  [rs0 member 2]
  etcd-03     [etcd cluster member]
  patroni-03  [Patroni + PostgreSQL, standby]
```

DB containers discover each other through **overlay DNS aliases** (`mongodb-01`, `etcd-01`, `patroni-01`, etc.) on the shared `iklimco-net` overlay network. Each service publishes its port in `host` mode so replication traffic goes directly through the Hetzner private network while the overlay DNS resolves service names correctly. All containers are defined in the single `docker-stack-db.prod.yml` stack file at the repo root.

## 1. Firewall Update

Verify that the following rules exist in `terraform/hetzner/prod/firewall.tf`; if any are missing, add them and run `terraform apply`.

Inside `hcloud_firewall.app`, from the DB subnet to Swarm ports:

```hcl
rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "2377"
  source_ips  = [local.db_subnet_cidr]
  description = "Docker Swarm control plane from DB subnet"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "7946"
  source_ips  = [local.db_subnet_cidr]
  description = "Docker Swarm node discovery (TCP) from DB subnet"
}

rule {
  direction   = "in"
  protocol    = "udp"
  port        = "7946"
  source_ips  = [local.db_subnet_cidr]
  description = "Docker Swarm node discovery (UDP) from DB subnet"
}

rule {
  direction   = "in"
  protocol    = "udp"
  port        = "4789"
  source_ips  = [local.db_subnet_cidr]
  description = "Docker Swarm VXLAN overlay from DB subnet"
}
```

Inside `hcloud_firewall.db`, from the app subnet to Swarm ports + overlay, and etcd/Patroni traffic inside the DB subnet:

```hcl
rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "2377"
  source_ips  = [local.app_subnet_cidr]
  description = "Docker Swarm control plane from app subnet"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "7946"
  source_ips  = [local.app_subnet_cidr]
  description = "Docker Swarm node discovery (TCP) from app subnet"
}

rule {
  direction   = "in"
  protocol    = "udp"
  port        = "7946"
  source_ips  = [local.app_subnet_cidr]
  description = "Docker Swarm node discovery (UDP) from app subnet"
}

rule {
  direction   = "in"
  protocol    = "udp"
  port        = "4789"
  source_ips  = [local.app_subnet_cidr]
  description = "Docker Swarm VXLAN overlay from app subnet"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "2379"
  source_ips  = [local.db_subnet_cidr]
  description = "etcd client port within DB subnet"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "2379"
  source_ips  = [local.app_subnet_cidr]
  description = "etcd client port from app subnet (APISIX connects to Patroni etcd)"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "2380"
  source_ips  = [local.db_subnet_cidr]
  description = "etcd peer port within DB subnet"
}

rule {
  direction   = "in"
  protocol    = "tcp"
  port        = "8008"
  source_ips  = [local.db_subnet_cidr]
  description = "Patroni REST API within DB subnet"
}
```

```bash
cd terraform/hetzner/prod
terraform plan
terraform apply
```

## 2. Add DB Nodes to Swarm

Get the worker join token **on one of the Swarm managers** (iklim-app-01):

```bash
docker swarm join-token worker
```

**On each DB node** (iklim-db-01, iklim-db-02, iklim-db-03):

```bash
docker swarm join --token <TOKEN> 10.20.10.11:2377
```

Label the nodes **on iklim-app-01**:

```bash
docker node update --label-add role=db --label-add db-index=01 iklim-db-01
docker node update --label-add role=db --label-add db-index=02 iklim-db-02
docker node update --label-add role=db --label-add db-index=03 iklim-db-03

docker node ls
```

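`docker node ls` does not print labels, so a quick check that they were applied can be useful. The following is a minimal sketch to run on a manager node; the function names `node_has_label` and `check_db_labels` are hypothetical helpers, not part of the repo.

```bash
# Hypothetical helper: succeed if NODE carries label KEY with value WANT.
# "index" is used in the template because the key "db-index" contains a hyphen.
node_has_label() {
  node="$1"; key="$2"; want="$3"
  got=$(docker node inspect --format "{{ index .Spec.Labels \"$key\" }}" "$node") || return 1
  [ "$got" = "$want" ]
}

# Check role=db and db-index=NN on all three DB nodes.
check_db_labels() {
  for idx in 01 02 03; do
    node_has_label "iklim-db-$idx" role db &&
      node_has_label "iklim-db-$idx" db-index "$idx" ||
      { echo "label mismatch on iklim-db-$idx" >&2; return 1; }
  done
  echo "all DB node labels OK"
}
```

Calling `check_db_labels` on iklim-app-01 stops at the first node whose labels differ from the commands above.
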
## 3. StorageBox Directory Structure

DB data and logs are stored on **local Docker named volumes**, because WAL writes and compaction need local-disk performance. Only config files are placed on StorageBox. Create the config directories on each DB node; `/mnt/storagebox` must already be mounted:

```bash
# On iklim-db-01:
mkdir -p /mnt/storagebox/db/mongodb-01/config
mkdir -p /mnt/storagebox/db/postgresql-01/config

# On iklim-db-02:
mkdir -p /mnt/storagebox/db/mongodb-02/config
mkdir -p /mnt/storagebox/db/postgresql-02/config

# On iklim-db-03:
mkdir -p /mnt/storagebox/db/mongodb-03/config
mkdir -p /mnt/storagebox/db/postgresql-03/config
```

Config files (`mongod.conf`, `patroni.yml`) are deployed by the Ansible `db_stack` role into these directories. Named Docker volumes (`mongodb-01-data`, `etcd-01-data`, `postgresql-01-data`, etc.) are created automatically by the stack deploy.

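The per-node commands differ only in the node index, so they can be folded into one function. This is a sketch under that assumption; `create_db_config_dirs` and the `STORAGEBOX_ROOT` override are hypothetical names introduced here for illustration (the override only exists to allow a dry run outside `/mnt/storagebox`).

```bash
# Sketch: create the StorageBox config directories for one DB node index.
# Prints each directory it ensured; mkdir -p makes the call idempotent.
create_db_config_dirs() {
  idx="$1"                                    # "01", "02" or "03"
  root="${STORAGEBOX_ROOT:-/mnt/storagebox}"  # hypothetical override for dry runs
  for svc in mongodb postgresql; do
    mkdir -p "$root/db/$svc-$idx/config" || return 1
    echo "$root/db/$svc-$idx/config"
  done
}
```

On iklim-db-02, for example, `create_db_config_dirs 02` would create the same two directories as the manual commands for that node.
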
## 4. MongoDB Replica Set

### mongod.conf

On each DB node, `/mnt/storagebox/db/mongodb-0X/config/mongod.conf` (deployed by the Ansible `db_stack` role):

```yaml
net:
  port: 27017
storage:
  engine: "wiredTiger"
  dbPath: "/data/db"
  directoryPerDB: true
systemLog:
  verbosity: 0
  timeStampFormat: "iso8601-local"
  destination: file
  path: "/data/log/mongo.log"
  logAppend: true
  logRotate: rename
replication:
  replSetName: "rs0"
security:
  authorization: enabled
  keyFile: "/data/configdb/rs-auth.key"
```

### Replica Set Auth Key

The **same** key file must exist on all DB nodes:

```bash
# Create on iklim-db-01:
openssl rand -base64 756 > /mnt/storagebox/db/mongodb-01/config/rs-auth.key
chmod 400 /mnt/storagebox/db/mongodb-01/config/rs-auth.key

# Copy the same content to the other nodes:
cat /mnt/storagebox/db/mongodb-01/config/rs-auth.key \
  > /mnt/storagebox/db/mongodb-02/config/rs-auth.key
cat /mnt/storagebox/db/mongodb-01/config/rs-auth.key \
  > /mnt/storagebox/db/mongodb-03/config/rs-auth.key

chmod 400 /mnt/storagebox/db/mongodb-0{2,3}/config/rs-auth.key
```

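Because replica set members reject each other when their key files differ, it can help to compare the three copies before deploying. A minimal sketch, assuming the paths above; `rs_keys_match` and the `KEY_ROOT` override are hypothetical names used only for this example.

```bash
# Sketch: succeed only if all three rs-auth.key files are byte-identical.
rs_keys_match() {
  root="${KEY_ROOT:-/mnt/storagebox/db}"   # hypothetical override for testing
  ref="$root/mongodb-01/config/rs-auth.key"
  [ -s "$ref" ] || { echo "missing or empty: $ref" >&2; return 1; }
  for idx in 02 03; do
    # cmp -s is silent and returns non-zero on any difference or missing file
    cmp -s "$ref" "$root/mongodb-$idx/config/rs-auth.key" ||
      { echo "key differs on mongodb-$idx" >&2; return 1; }
  done
  echo "rs-auth.key identical on all three nodes"
}
```
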
### Stack File - MongoDB

MongoDB services are defined in `docker-stack-db.prod.yml` (repo root). Each service uses a named Docker volume for data and log, and a StorageBox bind mount for config:

```yaml
mongodb-01:
  image: mongo:8
  volumes:
    - mongodb-01-data:/data/db
    - mongodb-01-log:/data/log
    - /mnt/storagebox/db/mongodb-01/config:/data/configdb
  networks:
    iklimco-net:
      aliases:
        - mongodb-01
  ports:
    - target: 27017
      published: 27017
      protocol: tcp
      mode: host
  deploy:
    replicas: 1
    placement:
      max_replicas_per_node: 1
      constraints:
        - node.hostname == iklim-db-01
```

Volumes `mongodb-01-data`, `mongodb-01-log`, etc. are declared at the bottom of `docker-stack-db.prod.yml` and are created automatically on first deploy.

### Replica Set Initialization

Run **once** after the stack is deployed:

```bash
# On iklim-app-01 (for overlay network access):
docker run --rm -it --network iklimco-net mongo:8 \
  mongosh "mongodb://mongo-root:${MONGO_ROOT_PASSWORD}@mongodb-01/admin"

# Inside mongosh:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-01:27017", priority: 2 },
    { _id: 1, host: "mongodb-02:27017", priority: 1 },
    { _id: 2, host: "mongodb-03:27017", priority: 1 }
  ]
})

# Status check:
rs.status()
```

The replica set is ready when one member reports `"stateStr": "PRIMARY"` and the other two report `"stateStr": "SECONDARY"`.

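The readiness rule (one PRIMARY, two SECONDARY) can also be checked non-interactively. The sketch below assumes one member state per line on standard input; `rs_is_ready` is a hypothetical helper name.

```bash
# Sketch: read one member state per line on stdin and succeed only for
# exactly 1 PRIMARY and 2 SECONDARY members.
rs_is_ready() {
  states=$(cat)
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
  primaries=$(printf '%s\n' "$states" | grep -c '^PRIMARY$' || true)
  secondaries=$(printf '%s\n' "$states" | grep -c '^SECONDARY$' || true)
  [ "$primaries" -eq 1 ] && [ "$secondaries" -eq 2 ]
}
```

One way to feed it, assuming the same connection string as above, is to pipe `mongosh --quiet "<uri>" --eval 'rs.status().members.forEach(m => print(m.stateStr))'` into `rs_is_ready`.
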
## 5. PostgreSQL - Patroni + etcd

Patroni coordinates PostgreSQL primary/standby roles through etcd. If the primary goes down, one of the other nodes automatically wins the election and becomes primary. The Swarm service restarts the container; Patroni continues from where it left off.

### 5.1 Custom Image (Patroni + PostGIS)

Patroni is installed on top of the `postgis/postgis:17-3.5` image. This image is pushed to Harbor and used in the stack.

`build/patroni-postgis/Dockerfile`:

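```dockerfile
FROM postgis/postgis:17-3.5

USER root

# --break-system-packages: assuming the base image is Debian bookworm or newer,
# the system Python is marked as externally managed (PEP 668) and a plain
# "pip3 install" is refused.
# psycopg2 is the PostgreSQL driver Patroni needs at runtime; it is compiled
# here against libpq-dev, which is why gcc and python3-dev are installed first.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3-pip \
        python3-dev \
        gcc \
        libpq-dev \
    && pip3 install --no-cache-dir --break-system-packages 'patroni[etcd3]' psycopg2 \
    && apt-get purge -y gcc python3-dev \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*

USER postgres

ENTRYPOINT ["patroni", "/etc/patroni/patroni.yml"]
```
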
Build and push are done with `ops/push-harbor-custom-images.sh`:

```bash
cd /path/to/repo
bash ops/push-harbor-custom-images.sh
```

Or manually:

```bash
cd build/patroni-postgis
docker build -t registry.tarla.io/iklimco/custom-patroni-postgis:17-3.5 .
echo "$HARBOR_CI_TOKEN" | docker login registry.tarla.io -u robot-ci-push-iklimco --password-stdin
docker push registry.tarla.io/iklimco/custom-patroni-postgis:17-3.5
```

### 5.2 etcd Cluster

etcd services are defined in `docker-stack-db.prod.yml`. Each service uses a named Docker volume for data and has an overlay DNS alias. Environment variables reference peer URLs by alias, not by hardcoded IP:

```yaml
etcd-01:
  image: bitnami/etcd:3
  environment:
    ALLOW_NONE_AUTHENTICATION: "yes"
    ETCD_NAME: etcd-01
    ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd-01:2380
    ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
    ETCD_ADVERTISE_CLIENT_URLS: http://etcd-01:2379
    ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379
    ETCD_INITIAL_CLUSTER: "etcd-01=http://etcd-01:2380,etcd-02=http://etcd-02:2380,etcd-03=http://etcd-03:2380"
    ETCD_INITIAL_CLUSTER_STATE: new
    ETCD_INITIAL_CLUSTER_TOKEN: iklimco-etcd-prod
  volumes:
    - etcd-01-data:/bitnami/etcd/data
  networks:
    iklimco-net:
      aliases:
        - etcd-01
  deploy:
    replicas: 1
    placement:
      max_replicas_per_node: 1
      constraints:
        - node.hostname == iklim-db-01
```

**APISIX etcd usage:** In prod, APISIX shares this etcd cluster and stores its keys under the `/apisix` prefix. Patroni stores its keys under the `namespace` configured in `patroni.yml` (`/db/` in Section 5.3), so the two key spaces do not collide. The overlay DNS names (`etcd-01:2379`, `etcd-02:2379`, `etcd-03:2379`) are reachable from app nodes via the `iklimco-net` overlay. Therefore, the app subnet → DB nodes port 2379 firewall rule is mandatory; it was added in Section 1.

**Important:** `ETCD_INITIAL_CLUSTER_STATE` must be `new` on the first deploy and `existing` on all later deploys. The deploy steps in Section 6 detect this automatically; no manual update is required.

### 5.3 Patroni Configuration

`patroni.yml` is generated per-node by the Ansible `db_stack` role from `templates/patroni.yml.j2` using `inventory_hostname` (e.g., `iklim-db-01`). The generated file uses overlay DNS aliases for all addresses.

**Generated output - Node 01** (`/mnt/storagebox/db/postgresql-01/config/patroni.yml`):

```yaml
scope: iklim-postgres
namespace: /db/
name: postgresql-01

restapi:
  listen: 0.0.0.0:8008
  connect_address: patroni-01:8008

etcd3:
  hosts:
    - etcd-01:2379
    - etcd-02:2379
    - etcd-03:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_size: 512
        max_wal_senders: 5
        max_replication_slots: 5
        shared_preload_libraries: 'pg_stat_statements'
        pg_stat_statements.track: 'all'

  initdb:
    - encoding: UTF8
    - data-checksums

  pg_hba:
    - host replication replicator 10.20.20.0/24 scram-sha-256
    - host all all 10.20.10.0/24 scram-sha-256
    - host all all 10.20.20.0/24 scram-sha-256

  users:
    postgres:
      password: "${POSTGRES_PASSWORD}"
      options:
        - superuser

postgresql:
  listen: 0.0.0.0:5432
  connect_address: patroni-01:5432
  data_dir: /var/lib/postgresql/data/pgdata
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: "${REPLICATOR_PASSWORD}"
    superuser:
      username: postgres
      password: "${POSTGRES_PASSWORD}"
  parameters:
    unix_socket_directories: "/var/run/postgresql"

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
```

For Node 02 and 03, only `name`, `restapi.connect_address`, and `postgresql.connect_address` differ (`postgresql-02`/`patroni-02:8008`/`patroni-02:5432`, etc.).

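Since the three generated files should differ only in those node-specific values, one way to catch template drift is to mask the node index and compare the results. This is a sketch, not part of the `db_stack` role; `normalize_patroni_yml`, `patroni_configs_consistent`, and the `CONFIG_ROOT` override are hypothetical names.

```bash
# Replace the node index in "postgresql-0N" / "patroni-0N" with a placeholder.
# etcd-0N entries are left untouched because they are identical on every node.
normalize_patroni_yml() {
  sed -E 's/(postgresql|patroni)-0[1-3]/\1-NN/g' "$1"
}

# Succeed only if node 02 and 03 match node 01 after normalization.
patroni_configs_consistent() {
  root="${CONFIG_ROOT:-/mnt/storagebox/db}"   # hypothetical override for testing
  ref=$(normalize_patroni_yml "$root/postgresql-01/config/patroni.yml") || return 1
  for idx in 02 03; do
    cur=$(normalize_patroni_yml "$root/postgresql-$idx/config/patroni.yml") || return 1
    [ "$cur" = "$ref" ] ||
      { echo "unexpected difference in postgresql-$idx/config/patroni.yml" >&2; return 1; }
  done
}
```
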
### 5.4 Stack File - Patroni

Patroni services are defined in `docker-stack-db.prod.yml`. Each service uses the custom image, a named Docker volume for data, a StorageBox bind mount for the config file, and overlay DNS aliases:

```yaml
patroni-01:
  image: registry.tarla.io/iklimco/custom-patroni-postgis:17-3.5
  environment:
    POSTGRES_PASSWORD: "${POSTGRES_PASSWORD}"
    REPLICATOR_PASSWORD: "${REPLICATOR_PASSWORD}"
    TZ: "Europe/Istanbul"
  volumes:
    - postgresql-01-data:/var/lib/postgresql/data
    - /mnt/storagebox/db/postgresql-01/config/patroni.yml:/etc/patroni/patroni.yml:ro
  networks:
    iklimco-net:
      aliases:
        - patroni-01
  ports:
    - target: 5432
      published: 5432
      protocol: tcp
      mode: host
    - target: 8008
      published: 8008
      protocol: tcp
      mode: host
  deploy:
    replicas: 1
    placement:
      max_replicas_per_node: 1
      constraints:
        - node.hostname == iklim-db-01
```

Volumes `postgresql-01-data`, `postgresql-02-data`, `postgresql-03-data` are declared at the bottom of `docker-stack-db.prod.yml` and created automatically on first deploy.

### 5.5 Status Check

```bash
# On iklim-db-01 (docker ps only lists containers on the local node),
# Patroni cluster status:
docker exec -it $(docker ps -q -f name=iklim-db_patroni-01 | head -1) \
  patronictl -c /etc/patroni/patroni.yml list
```

Expected output: one `Leader` row and two `Replica` rows. The `State` column should read `running`; recent Patroni versions may report healthy replicas as `streaming` instead.

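The expected output can be turned into a pass/fail check by parsing the table that `patronictl list` prints. The sketch below assumes the default table layout with the columns Member, Host, Role, State; `patroni_cluster_ok` is a hypothetical helper name.

```bash
# Sketch: read "patronictl list" table output on stdin and succeed only for
# 1 Leader (running) and 2 Replicas (running or streaming).
patroni_cluster_ok() {
  awk -F'|' '
    # Rows start with "|", so $2=Member, $3=Host, $4=Role, $5=State.
    { role = $4; state = $5; gsub(/ /, "", role); gsub(/ /, "", state) }
    role == "Leader"  && state == "running" { leaders++ }
    role == "Replica" && (state == "running" || state == "streaming") { replicas++ }
    END { exit !(leaders == 1 && replicas == 2) }
  '
}
```

Piping the `patronictl ... list` command above into `patroni_cluster_ok` gives an exit status that a deploy script could act on; `-it` should be dropped from `docker exec` when the output is piped.
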
```bash
# etcd cluster health (from app node via overlay):
docker run --rm --network iklimco-net alpine \
  sh -c "wget -qO- http://etcd-01:2379/health && \
         wget -qO- http://etcd-02:2379/health && \
         wget -qO- http://etcd-03:2379/health"
```

```bash
# Find the current primary (run on iklim-db-01, where the patroni-01 container lives):
docker exec -it $(docker ps -q -f name=iklim-db_patroni-01 | head -1) \
  patronictl -c /etc/patroni/patroni.yml topology
```

## 6. Deploy

All DB services (etcd, MongoDB, Patroni) are in the single `docker-stack-db.prod.yml` stack. Deploy from `iklim-app-01` in the repo working directory.

### .env File

The `/opt/iklimco/stacks/.env` file is stored on StorageBox as `prod/secrets/iklim.co/.env.stacks`. Fetch it once before the first deploy:

```bash
scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.stacks \
  /opt/iklimco/stacks/.env
chmod 600 /opt/iklimco/stacks/.env
```

File content (`/opt/iklimco/stacks/.env`, not committed to the repo):

```env
DATABASE_POSTGRES_ROOT_USER=postgres
POSTGRES_PASSWORD=<strong-password>
REPLICATOR_PASSWORD=<strong-password>
MONGO_ROOT_PASSWORD=<strong-password>
```

### Deploy Steps

```bash
# On iklim-app-01, in the repo working directory:
export $(cat /opt/iklimco/stacks/.env | xargs)

# Automatic ETCD_INITIAL_CLUSTER_STATE detection:
DEPLOY_FILE="docker-stack-db.prod.yml"
if docker service ls --filter name=iklim-db_etcd-01 -q 2>/dev/null | grep -q .; then
  echo "ℹ️ etcd services already exist, deploying with 'existing'..."
  DEPLOY_FILE=$(mktemp /tmp/docker-stack-db.XXXXXX.yml)
  sed "s/ETCD_INITIAL_CLUSTER_STATE: new/ETCD_INITIAL_CLUSTER_STATE: existing/g" \
    docker-stack-db.prod.yml > "$DEPLOY_FILE"
else
  echo "ℹ️ First deploy, using 'new' state..."
fi

docker stack deploy \
  --with-registry-auth \
  -c "$DEPLOY_FILE" \
  iklim-db

# Remove the temporary stack file if one was generated:
[ "$DEPLOY_FILE" != "docker-stack-db.prod.yml" ] && rm -f "$DEPLOY_FILE"

# Wait for etcd cluster to be ready (18 attempts x 10s = 3 minutes):
echo "⏳ waiting for etcd..."
for i in $(seq 1 18); do
  if docker run --rm --network iklimco-net alpine \
       sh -c "wget -qO- http://etcd-01:2379/health 2>/dev/null | grep -q '\"health\":\"true\"'"; then
    echo "✅ etcd ready"
    break
  fi
  [ "$i" -eq 18 ] && echo "❌ etcd timeout" && exit 1
  echo "   attempt $i/18, waiting 10s..."
  sleep 10
done

docker stack services iklim-db
```

### DB Node Placement Check

```bash
docker service ps iklim-db_etcd-01
docker service ps iklim-db_mongodb-01
docker service ps iklim-db_patroni-01
```

All tasks must run on the expected `iklim-db-*` nodes.

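Checking all nine services by eye is tedious, so the rule "service `-NN` runs on `iklim-db-NN`" can be scripted. This is a minimal sketch for a manager node; `service_node` and `check_placement` are hypothetical helper names.

```bash
# Print the node of the running task of a service (first match only).
service_node() {
  docker service ps --filter desired-state=running --format '{{.Node}}' "$1" | head -n 1
}

# Verify that every iklim-db_<svc>-NN task runs on iklim-db-NN.
check_placement() {
  for svc in etcd mongodb patroni; do
    for idx in 01 02 03; do
      node=$(service_node "iklim-db_$svc-$idx")
      [ "$node" = "iklim-db-$idx" ] ||
        { echo "iklim-db_$svc-$idx runs on '$node', expected iklim-db-$idx" >&2; return 1; }
    done
  done
  echo "placement OK"
}
```
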
### MongoDB Replica Set Initialization

Run once after the stack is deployed:

```bash
# From iklim-app-01 via overlay network:
docker run --rm -it --network iklimco-net mongo:8 \
  mongosh "mongodb://mongo-root:${MONGO_ROOT_PASSWORD}@mongodb-01/admin"

# Inside mongosh:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-01:27017", priority: 2 },
    { _id: 1, host: "mongodb-02:27017", priority: 1 },
    { _id: 2, host: "mongodb-03:27017", priority: 1 }
  ]
})
```

## 7. Access from App Services

App containers connect to DB services through the `iklimco-net` overlay network by **overlay DNS name**. Because the `iklim-db` stack shares the `iklimco-net` external network, service names and aliases are resolved through overlay DNS.

### MongoDB Replica Set Connection String

Variables in `env-prod/.env`:

```bash
DATABASE_MONGODB_HOST=mongodb-01:27017,mongodb-02:27017,mongodb-03:27017
DATABASE_MONGODB_PARAMS=replicaSet=rs0&readPreference=secondaryPreferred&authSource=admin
```

Microservice URI through overlay DNS:

```
mongodb://<user>:<password>@mongodb-01:27017,mongodb-02:27017,mongodb-03:27017/<db>?replicaSet=rs0&readPreference=secondaryPreferred&authSource=admin
```

> For direct testing, from outside the overlay with private IP:
> `mongodb://mongo-root:<PASSWORD>@10.20.20.11:27017,10.20.20.12:27017,10.20.20.13:27017/admin?replicaSet=rs0&authSource=admin`

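To make the relationship between the two variables and the final URI explicit, the assembly can be sketched as a small function. `mongo_uri` is a hypothetical helper; the defaults mirror the values shown above, and the password is assumed to be already percent-encoded if it contains reserved characters.

```bash
# Sketch: assemble the replica set URI from user, password and database name.
# DATABASE_MONGODB_HOST / DATABASE_MONGODB_PARAMS are read from the environment
# and fall back to the values documented above.
mongo_uri() {
  user="$1"; pass="$2"; db="$3"
  hosts="${DATABASE_MONGODB_HOST:-mongodb-01:27017,mongodb-02:27017,mongodb-03:27017}"
  params="${DATABASE_MONGODB_PARAMS:-replicaSet=rs0&readPreference=secondaryPreferred&authSource=admin}"
  printf 'mongodb://%s:%s@%s/%s?%s\n' "$user" "$pass" "$hosts" "$db" "$params"
}
```
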
### PostgreSQL - Patroni

Variables in `env-prod/.env`:

```bash
DATABASE_POSTGRES_HOST=patroni-01:5432,patroni-02:5432,patroni-03:5432
DATABASE_POSTGRES_PARAMS=targetServerType=preferSecondary&loadBalanceHosts=true
```

Patroni decides which node is primary at any moment. Given a multi-host list, the client driver picks a suitable node: the PostgreSQL JDBC driver uses the `targetServerType` parameter, while libpq-based clients use `target_session_attrs` instead:

```
# Write, goes to the primary (JDBC URL):
jdbc:postgresql://patroni-01:5432,patroni-02:5432,patroni-03:5432/<db>?targetServerType=primary

# Read with load balancing (JDBC URL):
jdbc:postgresql://patroni-01:5432,patroni-02:5432,patroni-03:5432/<db>?targetServerType=preferSecondary&loadBalanceHosts=true
```

> For direct testing, from outside the overlay with private IP (libpq URI, e.g. for `psql`):
> `postgresql://postgres@10.20.20.11:5432,10.20.20.12:5432,10.20.20.13:5432/postgres?target_session_attrs=read-write`

### Patroni REST API

Patroni exposes an HTTP endpoint on port 8008. This endpoint can be used with HAProxy or a similar load balancer to route to the primary automatically:

```bash
# Primary check (HTTP 200 = primary, HTTP 503 = replica); prints only the status code:
curl -s -o /dev/null -w '%{http_code}\n' http://patroni-01:8008/primary
```

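The same endpoint can be polled across all three members to find the current primary without `patronictl`. A minimal sketch, assuming it runs somewhere the `patroni-0N` aliases resolve (for example a container attached to `iklimco-net`) and `curl` is installed; `patroni_primary` is a hypothetical helper name.

```bash
# Sketch: print the alias of the member whose /primary endpoint returns 200.
patroni_primary() {
  for idx in 01 02 03; do
    # On connection errors curl exits non-zero; treat that as "not primary".
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 \
             "http://patroni-$idx:8008/primary") || code=000
    if [ "$code" = "200" ]; then
      echo "patroni-$idx"
      return 0
    fi
  done
  echo "no primary found" >&2
  return 1
}
```
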
## 8. Developer and Office Access (Production)

The prod cluster does **not** use `pg-proxy` or `mongo-proxy`. For access from an office computer, target the DB subnet directly.

### WireGuard Setting
Update `AllowedIPs` in the `.conf` file on the office computer:
`AllowedIPs = 10.8.0.1/32, 10.20.20.0/24`

### Connection Parameters (Multi-Host)
Modern database tools (DBeaver, Compass, etc.) should connect in a cluster-aware way:

| Database | Host List | Port | Critical Parameter |
| :--- | :--- | :--- | :--- |
| **PostgreSQL** | `10.20.20.11, 10.20.20.12, 10.20.20.13` | `5432` | `targetServerType=primary` |
| **MongoDB** | `10.20.20.11, 10.20.20.12, 10.20.20.13` | `27017` | `replicaSet=rs0` |

## Acceptance Criteria

- `docker stack services iklim-db`: 9 services visible (etcd-01/02/03, mongodb-01/02/03, patroni-01/02/03), all `1/1`
- `docker service ps iklim-db_patroni-01/02/03`: each task runs on its expected `iklim-db-*` node
- `docker service ps iklim-db_mongodb-01/02/03`: each task runs on its expected `iklim-db-*` node
- `docker service ps iklim-db_etcd-01/02/03`: each task runs on its expected `iklim-db-*` node
- `patronictl list`: 1 `Leader`, 2 `Replica`, all `running`
- etcd health endpoint returns `"health":"true"` on all three nodes via overlay
- `rs.status()`: 1 PRIMARY, 2 SECONDARY
- MongoDB and PostgreSQL are reachable from app nodes.
- Ports `5432`, `27017`, `2379`, `2380`, and `8008` are closed from the public internet.
- When a DB node is restarted, Patroni performs automatic election and a new primary is selected.
- During Patroni primary transition, the old primary rejoins as standby; there is no split-brain.
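
The "closed from the public internet" criterion can be probed from a machine outside the Hetzner private network. The sketch below assumes a netcat that supports `-z` and `-w` (as OpenBSD netcat and ncat do); `ports_closed` is a hypothetical helper, and the host argument is a placeholder for a DB node's public address, which this document does not specify.

```bash
# Sketch: succeed only if none of the DB ports accepts a TCP connection.
ports_closed() {
  host="$1"
  for port in 5432 27017 2379 2380 8008; do
    # -z: connect without sending data, -w 3: give up after 3 seconds
    if nc -z -w 3 "$host" "$port" 2>/dev/null; then
      echo "port $port is OPEN on $host" >&2
      return 1
    fi
  done
  echo "all DB ports closed on $host"
}
```

A timeout and a refused connection both count as closed here, so a pass only shows that no TCP handshake completed from the probing machine.
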