docs: update production roadmap for HA Vault and shared storage

- Refactor production setup documentation to reflect a 3-node Vault Raft cluster starting from launch.
- Update all paths to use StorageBox mounts for shared state (SWAG config, TLS certs, Monitoring data).
- Switch Nginx configuration convention from proxy-confs to site-confs to align with SWAG's auto-include behavior.
- Standardize TLS private key extensions to .pem.
- Update node failover and recovery facts to include monitoring services.
- Align deployment pipeline instructions with the latest environment variable-driven approach.
Murat ÖZDEMİR 2026-05-16 16:18:21 +03:00
parent f4b7f49968
commit 5ddba7eba4
17 changed files with 743 additions and 231 deletions

@ -77,31 +77,34 @@ Prod ortamında birden fazla manager node (en az 3) çalıştırılır. Tek mana
--- ---
# Prod — SWAG Failover # Prod — Monitoring & SWAG Failover
SWAG is not cluster-native; it always runs as a single instance and is pinned to `iklim-app-01` (the Floating IP node). When `iklim-app-01` goes down, SWAG and cert-reloader stop as well; DNS and HTTPS access is lost. Swarm quorum continues with 2 managers; microservices and Vault move to other nodes. SWAG, cert-reloader, Prometheus, and Grafana are not cluster-native (replicated); they always run as single instances and are pinned to `iklim-app-01` (the Floating IP node) by default. When `iklim-app-01` goes down, these services stop; DNS/HTTPS access and monitoring are lost. Swarm quorum continues with 2 managers; microservices and Vault move to other nodes.
Because the SWAG configuration (`/config`, including the letsencrypt certificates) is kept on the StorageBox (`SWAG_CONFIG_DIR=/mnt/storagebox/prod/swag/config`), manual failover is quick. The data and configuration of all these services are kept on the StorageBox:
- **SWAG:** `/mnt/storagebox/swag/config`
- **SSL:** `/mnt/storagebox/ssl`
- **Prometheus:** `/mnt/storagebox/prometheus/data`
- **Grafana:** `/mnt/storagebox/grafana/data`
## Prod Scenario: `iklim-app-01` Is Down ## Prod Scenario: `iklim-app-01` Is Down
### 1. Move SWAG to Another Node ### 1. Move the Services to Another Node
SWAG and cert-reloader must be moved together. Prometheus and Grafana can be moved independently or at the same time.
```bash ```bash
# On iklim-app-02 or iklim-app-03 (an active manager): # On iklim-app-02 or iklim-app-03 (an active manager):
docker service update \
--constraint-add "node.hostname == iklim-app-02" \
--constraint-rm "node.hostname == iklim-app-01" \
iklimco_swag
docker service update \ # Move SWAG & cert-reloader
--constraint-add "node.hostname == iklim-app-02" \ docker service update --constraint-add "node.hostname == iklim-app-02" --constraint-rm "node.hostname == iklim-app-01" iklimco_swag
--constraint-rm "node.hostname == iklim-app-01" \ docker service update --constraint-add "node.hostname == iklim-app-02" --constraint-rm "node.hostname == iklim-app-01" iklimco_cert-reloader
iklimco_cert-reloader
# Move Prometheus & Grafana
docker service update --constraint-add "node.hostname == iklim-app-02" --constraint-rm "node.hostname == iklim-app-01" iklimco_prometheus
docker service update --constraint-add "node.hostname == iklim-app-02" --constraint-rm "node.hostname == iklim-app-01" iklimco_grafana
``` ```
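The four `docker service update` calls above differ only in the service name, so they can be generated from one loop. The following is a minimal sketch, assuming the service and node names used in this document; the `move_cmds` helper and its print-only behaviour are illustrative and not part of the repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the constraint swap for each pinned service.
# Usage: move_cmds <from-node> <to-node> <service>...
move_cmds() {
  local from="$1" to="$2" svc
  shift 2
  for svc in "$@"; do
    printf 'docker service update --constraint-add "node.hostname == %s" --constraint-rm "node.hostname == %s" %s\n' \
      "$to" "$from" "$svc"
  done
}

# Review the printed commands first; pipe them to `bash` on an active manager to apply.
move_cmds iklim-app-01 iklim-app-02 \
  iklimco_swag iklimco_cert-reloader iklimco_prometheus iklimco_grafana
```

Printing instead of executing keeps the failover step reviewable; swapping the first two arguments produces the commands for moving everything back.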
SWAG finds the existing letsencrypt certificates on the StorageBox and does not request new ones. cert-reloader starts on the new node and writes to `/mnt/storagebox/prod/ssl`.
### 2. Move the Floating IP to the New Node ### 2. Move the Floating IP to the New Node
**Via CLI:** **Via CLI:**
@ -118,31 +121,24 @@ hcloud floating-ip assign <floating-ip-id> <iklim-app-02-server-id>
4. Open the **⋮** (three-dot) menu to the right of the `iklim-prod-app-fip` row → **Reassign**. 4. Open the **⋮** (three-dot) menu to the right of the `iklim-prod-app-fip` row → **Reassign**.
5. Select **`iklim-app-02`** from the list that opens → click the **Reassign** button. 5. Select **`iklim-app-02`** from the list that opens → click the **Reassign** button.
Because the DNS A record already points to the Floating IP, no additional DNS change is needed.
### 3. Verify ### 3. Verify
```bash ```bash
docker service ps iklimco_swag docker service ls | grep -E 'swag|cert-reloader|prometheus|grafana'
docker service ps iklimco_cert-reloader
curl -si https://api.iklim.co/health curl -si https://api.iklim.co/health
``` ```
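`docker service ls` shows a `REPLICAS` column such as `1/1`, which still has to be read by eye. As a hedged sketch, the check can be reduced to "print only the services that have not converged"; the `not_converged` helper is hypothetical and assumes input lines of the form `<name> <running>/<desired>`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<name> <running>/<desired>" lines on stdin and
# print every service whose running count differs from the desired count.
not_converged() {
  local name replicas
  while read -r name replicas _; do
    [ -n "$name" ] || continue
    if [ "${replicas%%/*}" != "${replicas##*/}" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Intended use on a manager node (assumes the Docker CLI is available):
#   docker service ls --format '{{.Name}} {{.Replicas}}' \
#     | grep -E 'swag|cert-reloader|prometheus|grafana' | not_converged
```

Empty output means all four services report their desired replica count.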
### When `iklim-app-01` Comes Back ### When `iklim-app-01` Comes Back
After the node rejoins the Swarm, move the services back to `iklim-app-01` and reassign the Floating IP: After the node rejoins the Swarm, you can move all services back to `iklim-app-01` and reassign the Floating IP.
```bash ```bash
docker service update \ # Move the services back
--constraint-add "node.hostname == iklim-app-01" \ for svc in iklimco_swag iklimco_cert-reloader iklimco_prometheus iklimco_grafana; do
--constraint-rm "node.hostname == iklim-app-02" \ docker service update --constraint-add "node.hostname == iklim-app-01" --constraint-rm "node.hostname == iklim-app-02" $svc
iklimco_swag done
docker service update \
--constraint-add "node.hostname == iklim-app-01" \
--constraint-rm "node.hostname == iklim-app-02" \
iklimco_cert-reloader
# Reassign the Floating IP to iklim-app-01
hcloud floating-ip assign <floating-ip-id> <iklim-app-01-server-id> hcloud floating-ip assign <floating-ip-id> <iklim-app-01-server-id>
``` ```
@ -153,4 +149,5 @@ hcloud floating-ip assign <floating-ip-id> <iklim-app-01-server-id>
| Swarm quorum | Automatic; 2 managers are enough | | Swarm quorum | Automatic; 2 managers are enough |
| Vault, microservices | Automatic; rescheduled onto another node via the `node.labels.type == service` constraint | | Vault, microservices | Automatic; rescheduled onto another node via the `node.labels.type == service` constraint |
| SWAG, cert-reloader | Manual: `docker service update --constraint-*` + Floating IP move | | SWAG, cert-reloader | Manual: `docker service update --constraint-*` + Floating IP move |
| TLS certificates | On the StorageBox; the failover node has immediate access, no re-issuance needed | | Prometheus, Grafana | Manual: `docker service update --constraint-*` |
| Data & config | On the StorageBox; the failover node has immediate access, no data loss |
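The "2 managers are enough" entry follows from Raft majority arithmetic, which Swarm managers use: a group of N members needs floor(N/2) + 1 of them reachable. A small sketch of that calculation (the function names are illustrative):

```bash
#!/usr/bin/env bash
# Majority needed for a Raft group of N members: floor(N/2) + 1.
raft_quorum() { echo $(( $1 / 2 + 1 )); }
# Members that can fail while the group keeps its majority: N - quorum.
raft_tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 3 5; do
  echo "managers=$n quorum=$(raft_quorum "$n") tolerated_failures=$(raft_tolerated_failures "$n")"
done
```

With 3 managers the quorum is 2, so losing `iklim-app-01` alone leaves the Swarm operational; losing a second manager would not.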

@ -135,7 +135,7 @@ not via the Gitea pipeline.
| Constraint | Resolves to | Services | | Constraint | Resolves to | Services |
|------------|-------------|----------| |------------|-------------|----------|
| `node.hostname == iklim-app-01` | iklim-app-01 only | SWAG, cert-reloader | | `node.hostname == iklim-app-01` | iklim-app-01 only | SWAG, cert-reloader |
| `node.labels.type == service` | iklim-app-01, iklim-app-02, iklim-app-03 | Vault, Redis, RabbitMQ, APISIX, Prometheus, Grafana, etcd | | `node.labels.type == service` | iklim-app-01, iklim-app-02, iklim-app-03 | Vault, Redis, RabbitMQ, APISIX, Prometheus, Grafana, etcd (idle in prod — APISIX uses Patroni etcd) |
| `node.labels.role == db` | iklim-db-01, iklim-db-02, iklim-db-03 | PostgreSQL, MongoDB, pg-proxy, mongo-proxy | | `node.labels.role == db` | iklim-db-01, iklim-db-02, iklim-db-03 | PostgreSQL, MongoDB, pg-proxy, mongo-proxy |
SWAG and cert-reloader are pinned to `iklim-app-01` (the Floating IP node) because SWAG does not support clustering and must match the public entry point. Vault floats across all service nodes; its TLS cert is read from StorageBox (`/mnt/storagebox/prod/ssl`) so it is available on whichever node Vault is scheduled on. Microservices carry no placement constraint and are distributed by the Swarm scheduler across all app nodes. DB services are pinned to DB nodes via separate DB stacks. SWAG and cert-reloader are pinned to `iklim-app-01` (the Floating IP node) because SWAG does not support clustering and must match the public entry point. Vault floats across all service nodes; its TLS cert is read from StorageBox (`/mnt/storagebox/ssl`) so it is available on whichever node Vault is scheduled on. Microservices carry no placement constraint and are distributed by the Swarm scheduler across all app nodes. DB services are pinned to DB nodes via separate DB stacks.

@ -35,15 +35,16 @@ No additional action needed in the repo.
The deploy pipeline (see `08-deploy-pipeline-update.md`) runs on iklim-app-01: The deploy pipeline (see `08-deploy-pipeline-update.md`) runs on iklim-app-01:
```bash ```bash
mkdir -p /opt/iklimco/swag/dns-conf set -a; . ./.env; set +a
envsubst < swag/dns-conf/godaddy.ini.tpl > /opt/iklimco/swag/dns-conf/godaddy.ini mkdir -p "$SWAG_DNS_CONF_DIR"
chmod 600 /opt/iklimco/swag/dns-conf/godaddy.ini envsubst < swag/dns-conf/godaddy.ini.tpl > "$SWAG_DNS_CONF_DIR/godaddy.ini"
chmod 600 "$SWAG_DNS_CONF_DIR/godaddy.ini"
``` ```
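Because the new commands depend on `SWAG_DNS_CONF_DIR` coming from `.env`, an unset or empty value would make `mkdir -p ""` fail and would expand the output path to `/godaddy.ini`. A minimal guard, sketched under the assumption that the pipeline runs under bash; `require_vars` is a hypothetical helper, not an existing pipeline function.

```bash
#!/usr/bin/env bash
# Hypothetical guard: fail early if any named variable is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Intended use in the pipeline, right after sourcing .env:
#   require_vars SWAG_DNS_CONF_DIR || exit 1
```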
## Step 4 — GoDaddy A records for prod subdomains ## Step 4 — GoDaddy A records for prod subdomains
In GoDaddy DNS panel for `iklim.co`, add/update A records pointing to the **Floating IP** (`iklim-prod-app-fip`). In GoDaddy DNS panel for `iklim.co`, add/update A records pointing to the **Floating IP** (`iklim-prod-app-fip`).
Floating IP değerini almak için: `terraform output prod_floating_ip` To get the Floating IP value: `terraform output prod_floating_ip`
| Record | Value | | Record | Value |
|--------|-------| |--------|-------|
@ -52,8 +53,8 @@ Floating IP değerini almak için: `terraform output prod_floating_ip`
| `rabbitmq` | `<iklim-prod-app-fip>` | | `rabbitmq` | `<iklim-prod-app-fip>` |
| `grafana` | `<iklim-prod-app-fip>` | | `grafana` | `<iklim-prod-app-fip>` |
> Floating IP `iklim-app-01`'e atanmıştır (`06-prod-terraform-iaac.md` → `floating_ip.tf`). > The Floating IP is assigned to `iklim-app-01` (`06-prod-terraform-iaac.md` → `floating_ip.tf`).
> Failover gerekirse Floating IP başka bir app node'una taşınabilir; DNS değişmez. > If failover is needed, the Floating IP can be reassigned to another app node; DNS does not change.
## Notes ## Notes
- Test and prod SWAG instances both obtain `*.iklim.co` independently from Let's Encrypt. - Test and prod SWAG instances both obtain `*.iklim.co` independently from Let's Encrypt.

@ -1,13 +1,49 @@
# 03 — docker-stack-infra.yml Changes (Prod) # 03 — docker-stack-infra.yml Changes (Prod)
## Context ## Context
- **File:** `docker-stack-infra.yml` (repo root — shared between test and prod)
- All changes from `test-env/03-infra-stack-changes.md` apply here identically. ### File strategy — overlay approach
- **Additional prod-specific changes:**
- Microservices have no constraint (distributed across app nodes by Swarm). Prod-specific service changes are **not written directly** into `docker-stack-infra.yml`; they are kept in a separate overlay file:
- Replica counts for stateless services are increased.
- **Note:** PostgreSQL and MongoDB are **not** in `docker-stack-infra.yml` for prod. They run on | File | Usage |
dedicated DB nodes in separate stacks (`iklim-db` and `iklim-patroni`). See `08-prod-db-cluster-kurulum.md`. |------|-------|
| `docker-stack-infra.yml` | Base — works as-is for test |
| `docker-stack-infra.prod.yml` | Prod overlay — additional services and overrides |
```bash
# Test deploy:
docker stack deploy -c docker-stack-infra.yml iklimco
# Prod deploy (Swarm merges both files):
docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
```
Docker Swarm merge rule: if the same service name appears in both files, the overlay wins (deploy, environment, etc.); services only present in the overlay are added.
### Prod-specific changes summary
- APISIX: 1 → 3 replicas (overlay override)
- Redis: single-instance → Sentinel cluster — 1 master + 2 replicas + 3 sentinels (overlay adds new services)
- RabbitMQ: 1 → 3-node Erlang cluster (overlay override + env)
- Vault: 1 → 3-node Raft cluster (overlay override) — see `07-vault-raft-plan.md`
- No separate APISIX etcd: Patroni etcd is shared (`/apisix` prefix)
- `init/apisix-core/init.sh`: when `PROFILE=prod`, rate limit `policy:local` → `policy:redis`
### swag-vl volume — not used in prod, not defined in overlay
Test-env Step 9 adds the `swag-vl` named volume to the base file. In prod, SWAG mounts the StorageBox via the `${SWAG_CONFIG_DIR}` env var, so no service uses this volume. There is no need to remove it in the overlay: Swarm does not create a volume that no service references, so the definition is harmless.
No `swag-vl` definition is made in `docker-stack-infra.prod.yml`.
### Monitoring Persistence (StorageBox)
Prometheus and Grafana run as single instances. To ensure monitoring data and dashboards survive a node failover (moving from `iklim-app-01` to another node), their data is stored on the shared StorageBox:
- **Prometheus:** `/mnt/storagebox/prometheus/data`
- **Grafana:** `/mnt/storagebox/grafana/data`
These paths are mounted via env vars (`PROMETHEUS_DATA_DIR`, `GRAFANA_DATA_DIR`) with named-volume fallbacks for test. See Step 8 for implementation details.
**Note:** PostgreSQL and MongoDB are not in `docker-stack-infra.yml`. They run in separate stacks on DB nodes (`iklim-db` and `iklim-patroni`). See `08-prod-db-cluster-kurulum.md`.
## Step 1 — Apply all test-env changes first ## Step 1 — Apply all test-env changes first
@ -17,69 +53,515 @@ Follow every step in `test-env/03-infra-stack-changes.md`:
- Remove published ports for vault, apisix, rabbitmq, prometheus, grafana, apisix-dashboard - Remove published ports for vault, apisix, rabbitmq, prometheus, grafana, apisix-dashboard
- Add `swag-vl` volume - Add `swag-vl` volume
## Step 2 — Pin Vault to manager node (initial prod — single instance) ## Step 2 — Vault: 3-node Raft cluster (prod)
Vault starts as a single instance pinned to the manager node. Vault starts directly with 3 replicas; the Phase 1 single-instance stage is skipped in prod.
Raft cluster migration is handled separately in `07-vault-raft-plan.md`. See `07-vault-raft-plan.md` Phase 2 for detailed setup steps.
```yaml ```yaml
# Vault placement stays as: vault:
placement: deploy:
constraints: mode: replicated
- node.role == manager replicas: 3
placement:
constraints:
- node.labels.type == service
``` ```
## Step 3 — Increase APISIX replicas for prod ## Step 3 — APISIX: 3 replicas + init.sh rate limit update (prod overlay)
Add to `docker-stack-infra.prod.yml`:
```yaml ```yaml
# CHANGE in apisix service deploy block: # docker-stack-infra.prod.yml
services:
apisix:
deploy:
mode: replicated mode: replicated
replicas: 2 # was 1 replicas: 3
```
APISIX is stateless (config in etcd) — multiple replicas are safe.
Swarm load-balances SWAG's requests across APISIX replicas via VIP.
## Step 4 — etcd: single instance in docker-stack-infra.yml (APISIX config store only)
The `etcd` service in `docker-stack-infra.yml` is used exclusively by APISIX as its configuration
store. It runs as a single instance on a manager node and is separate from the etcd cluster used by
Patroni for PostgreSQL HA.
```yaml
# etcd placement stays as:
placement: placement:
constraints: constraints:
- node.role == manager - node.labels.type == service
apisix-dashboard:
deploy:
mode: replicated
replicas: 3
placement:
constraints:
- node.labels.type == service
``` ```
> The 3-node etcd cluster for Patroni/PostgreSQL HA is deployed separately via `08-prod-db-cluster-kurulum.md` APISIX and apisix-dashboard are stateless (config lives in Patroni etcd) — 3 replicas is safe.
> on the dedicated DB nodes. These are two independent etcd deployments with different purposes. Swarm distributes SWAG requests to APISIX replicas via VIP (IPVS round-robin).
## Step 5 — Verify the complete file ### init.sh — rate limit policy:redis (prod)
After all edits, validate the YAML: With `policy:local`, each APISIX instance counts independently → the global limit effectively becomes 3× with 3 replicas.
Switch to `policy:redis` for `PROFILE=prod`.
Update the global rate limit block in `init/apisix-core/init.sh`:
```bash ```bash
docker stack config -c docker-stack-infra.yml > /dev/null && echo "YAML valid" if [[ "$PROFILE" != "dev" ]]; then
  if [[ "$PROFILE" == "prod" ]]; then
    RATE_POLICY="redis"
    # JSON fragment appended after the policy field; double quotes are literal inside single quotes.
    RATE_REDIS=',"redis_host":"redis-master","redis_port":6379,"redis_password":"'"$REDIS_PASSWORD"'"'
  else
    RATE_POLICY="local"
    RATE_REDIS=""
  fi
  call_api "global rate limit" -X PUT "$APISIX_ADMIN_URL/global_rules/1" \
    -H "X-API-KEY: $API_KEY" -H "Content-Type: application/json" \
    -d '{"plugins":{"limit-count":{"count":300,"time_window":60,"key_type":"var","key":"remote_addr","rejected_code":429,"policy":"'"$RATE_POLICY"'"'"$RATE_REDIS"'}}}'
``` ```
No output errors = valid. > APISIX's `limit-count` plugin does not natively support Redis Sentinel; `policy:redis` works with a single endpoint.
> The `redis-master` service name stays constant within Swarm — during Sentinel failover (~10-30 s) rate limiting may be
> temporarily inconsistent; this brief disruption is acceptable. Microservices use Spring Data Redis Sentinel natively.
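With `policy:local` and 3 replicas, the worst-case global ceiling is 3 × 300 = 900 requests per 60 s, which is why prod switches to a shared Redis counter. Building the request body by hand-concatenating quotes is easy to get wrong, so here is a hedged sketch that assembles the same body with `printf`. The `rate_limit_payload` helper is hypothetical, and it assumes the password contains no characters that need JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the limit-count global rule body.
# Assumption: the password has no characters that require JSON escaping.
rate_limit_payload() {
  local policy="$1" redis_host="${2:-}" redis_password="${3:-}" extra=""
  if [ "$policy" = "redis" ]; then
    extra=$(printf ',"redis_host":"%s","redis_port":6379,"redis_password":"%s"' \
      "$redis_host" "$redis_password")
  fi
  printf '{"plugins":{"limit-count":{"count":300,"time_window":60,"key_type":"var","key":"remote_addr","rejected_code":429,"policy":"%s"%s}}}\n' \
    "$policy" "$extra"
}

rate_limit_payload local
rate_limit_payload redis redis-master example-password
```

The printed body can be inspected (or piped through a JSON validator) before it is passed to the Admin API call.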
## Placement summary for prod (docker-stack-infra.yml only) ## Step 4 — etcd: Separate APISIX etcd removed — Patroni etcd shared
| Service | Placement | The standalone `etcd` service in `docker-stack-infra.yml` is **not used in prod**; it stays in the base stack and sits idle (see the note at the end of this step).
|---------|-----------| APISIX uses the 3-node Patroni etcd cluster running on DB nodes, via the `/apisix` prefix.
| swag | `node.role == manager` |
| cert-reloader | `node.role == manager` |
| vault | `node.role == manager` |
| apisix (2 replicas) | no constraint (distributed across app nodes) |
| apisix-dashboard | no constraint |
| redis | `node.role == manager` |
| rabbitmq | `node.role == manager` |
| etcd (APISIX store) | `node.role == manager` |
| prometheus | `node.role == manager` |
| grafana | `node.role == manager` |
> PostgreSQL and MongoDB are deployed in separate DB stacks on `iklim-db-*` nodes. ### Why consolidated?
> See `08-prod-db-cluster-kurulum.md` for those stacks. - A standalone single-instance etcd was a SPOF for APISIX.
- Patroni etcd is already 3-node HA — APISIX gets a more reliable config store.
- etcd supports prefix-based namespacing; Patroni uses `/service/`, APISIX uses `/apisix/` — no collision.
### APISIX etcd connection configuration
Update the etcd endpoints in the APISIX service in `docker-stack-infra.yml` to point to DB nodes:
```yaml
apisix:
  environment:
    APISIX_STAND_ALONE: "false"
  # via apisix/conf/config.yaml or environment:
  # etcd:
  #   host:
  #     - "http://iklim-db-01:2379"
  #     - "http://iklim-db-02:2379"
  #     - "http://iklim-db-03:2379"
  #   prefix: "/apisix"
```
The preferred method is mounting `config.yaml` via a Docker config or volume:
```yaml
# config/apisix/config.yaml
etcd:
  host:
    - "http://iklim-db-01:2379"
    - "http://iklim-db-02:2379"
    - "http://iklim-db-03:2379"
  prefix: "/apisix"
  timeout: 30
```
### Firewall requirement
etcd access from app nodes to DB nodes must be open:
```bash
# Each app node → each db node, port 2379
# If inside Hetzner private network it may be open by default;
# verify there are no ufw/firewalld rules blocking it:
nc -zv iklim-db-01 2379
```
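The single `nc` probe above covers one DB node; APISIX needs all three endpoints reachable from every app node. A minimal sketch that loops over them, assuming the hostnames and port from this document and that `nc` is installed; both helper names are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the etcd hosts named in this document.
etcd_hosts() {
  local i
  for i in 01 02 03; do
    printf 'iklim-db-%s\n' "$i"
  done
}

# Probe port 2379 on each host with a 3-second timeout; non-zero exit if any fails.
check_etcd_endpoints() {
  local host failed=0
  for host in $(etcd_hosts); do
    if nc -z -w 3 "$host" 2379 2>/dev/null; then
      echo "ok   $host:2379"
    else
      echo "FAIL $host:2379"
      failed=1
    fi
  done
  return "$failed"
}

# Run on each app node:
#   check_etcd_endpoints
```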
> **Note:** Docker Compose overlay files can only add/override services, not remove them. The standalone `etcd` service remains in the base stack and runs as an idle container in prod — APISIX connects to Patroni etcd instead (via config.yaml in the prod overlay). This is harmless; etcd uses negligible resources with no active clients.
## Step 5 — Redis: Sentinel cluster (prod overlay)
Redis runs as a single instance in test. In prod, Sentinel provides HA.
Bitnami images are used — all configuration is done via env vars, no separate `.conf` file needed.
### Prerequisites
```bash
# Create Docker secret for Redis password:
openssl rand -hex 32 | docker secret create redis_password -
```
### Topology
```
iklim-app-01: redis-master (1 replica, pinned to app-01)
iklim-app-02: redis-replica (1 replica, pinned to app-02)
iklim-app-03: redis-replica (1 replica, pinned to app-03)
iklim-app-01: redis-sentinel ┐
iklim-app-02: redis-sentinel ├─ 3 replicas, spread across all app nodes
iklim-app-03: redis-sentinel ┘
```
### docker-stack-infra.prod.yml — Redis services
The existing `redis` service is overridden in the prod overlay as **master**; `redis-replica` and `redis-sentinel` are added as new services. The service name (`redis`) remains unchanged so the APISIX connection config does not need updating.
```yaml
# docker-stack-infra.prod.yml
services:
  redis: # override base single-instance redis → master
    image: bitnamisecure/redis:latest
    environment:
      ALLOW_EMPTY_PASSWORD: "no" # quoted: a bare no is parsed as a YAML boolean
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      REDIS_REPLICATION_MODE: master
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == iklim-app-01
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim
  redis-replica:
    image: bitnamisecure/redis:latest
    environment:
      ALLOW_EMPTY_PASSWORD: "no"
      REDIS_REPLICATION_MODE: slave
      REDIS_MASTER_HOST: redis
      REDIS_MASTER_PORT_NUMBER: "6379"
      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
      REDIS_PASSWORD: ${REDIS_PASSWORD}
    deploy:
      mode: replicated
      replicas: 2
      placement:
        constraints:
          - node.labels.type == service
        preferences:
          - spread: node.hostname
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim
  redis-sentinel:
    image: bitnamisecure/redis-sentinel:latest
    environment:
      REDIS_SENTINEL_MASTER_NAME: mymaster
      REDIS_MASTER_HOST: redis
      REDIS_MASTER_PORT_NUMBER: "6379"
      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
      REDIS_SENTINEL_QUORUM: "2"
      REDIS_SENTINEL_DOWN_AFTER_MILLISECONDS: "5000"
      REDIS_SENTINEL_FAILOVER_TIMEOUT: "10000"
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
        preferences:
          - spread: node.hostname
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim
```
### Microservice connection (Spring Data Redis)
Microservices must use a Sentinel-aware connection:
```yaml
# application-prod.yml
spring:
  data:
    redis:
      sentinel:
        master: mymaster
        nodes:
          - redis-sentinel:26379
      password: ${REDIS_PASSWORD}
```
### Verification
```bash
# Query master identity:
docker exec $(docker ps -q -f name=iklimco_redis-sentinel | head -1) \
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
```
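When run non-interactively, `redis-cli` prints the reply to `SENTINEL get-master-addr-by-name` as two lines: the master's IP, then its port. For comparing the master before and after a failover test, it can help to fold that into a single `ip:port` string. A small sketch with a hypothetical `master_addr` helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: join the two-line reply (IP, then port) into "ip:port".
master_addr() {
  local ip port
  read -r ip || return 1
  read -r port || [ -n "$port" ] || return 1
  printf '%s:%s\n' "$ip" "$port"
}

# Intended use on a node running a sentinel task (assumes the Docker CLI):
#   docker exec "$(docker ps -q -f name=iklimco_redis-sentinel | head -1)" \
#     redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster | master_addr
```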
## Step 6 — RabbitMQ: 3-node Erlang cluster (prod overlay)
RabbitMQ runs as a 3-node cluster with one instance per app node.
### Prerequisites
```bash
# Create Docker secret for Erlang cookie (must be identical on all nodes):
openssl rand -hex 32 | docker secret create rabbitmq_erlang_cookie -
```
### docker-stack-infra.prod.yml — RabbitMQ override
```yaml
# docker-stack-infra.prod.yml (add alongside redis services)
services:
  rabbitmq:
    image: rabbitmq:3-management
    hostname: "rabbitmq-{{.Node.Hostname}}"
    environment:
      RABBITMQ_ERLANG_COOKIE_FILE: /run/secrets/rabbitmq_erlang_cookie
      RABBITMQ_USE_LONGNAME: "true"
      RABBITMQ_NODENAME: "rabbit@rabbitmq-{{.Node.Hostname}}"
    secrets:
      - rabbitmq_erlang_cookie
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
      update_config:
        parallelism: 1
        order: stop-first
      labels:
        project: co.iklim
secrets:
  rabbitmq_erlang_cookie:
    external: true
```
### Cluster join procedure (first setup)
RabbitMQ nodes do not form a cluster automatically; manual join is required after first start:
```bash
# Find the RabbitMQ container on iklim-app-02:
CTR=$(docker ps -q -f name=iklimco_rabbitmq)
# Stop, join, start:
docker exec "$CTR" rabbitmqctl stop_app
docker exec "$CTR" rabbitmqctl join_cluster rabbit@rabbitmq-iklim-app-01
docker exec "$CTR" rabbitmqctl start_app
# Repeat for iklim-app-03
```
```bash
# Verify cluster status (from any node):
docker exec "$CTR" rabbitmqctl cluster_status
```
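The join target in the procedure above is derived from the seed node's Swarm hostname, following the `RABBITMQ_NODENAME` pattern in the overlay. As a sketch, the three `rabbitmqctl` steps can be generated from that hostname so the node name is not typed by hand; both helpers are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the RabbitMQ node name from a Swarm hostname,
# matching the RABBITMQ_NODENAME pattern used in the overlay.
rabbit_node() { printf 'rabbit@rabbitmq-%s\n' "$1"; }

# Print the join sequence to run inside the container on a joining node.
join_cmds() {
  local seed="$1"
  printf 'rabbitmqctl stop_app\n'
  printf 'rabbitmqctl join_cluster %s\n' "$(rabbit_node "$seed")"
  printf 'rabbitmqctl start_app\n'
}

join_cmds iklim-app-01
```

Each printed line would be run via `docker exec "$CTR" ...` on `iklim-app-02` and then on `iklim-app-03`.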
> **HA policy:** After the cluster is formed, set quorum queues as the default:
> ```bash
> docker exec "$CTR" rabbitmqctl set_policy ha-all ".*" \
> '{"queue-type":"quorum"}' --apply-to queues
> ```
## Step 7 — Create `docker-stack-infra.prod.yml`
Create this file in the repo root alongside `docker-stack-infra.yml`. It combines all prod-specific overrides from Steps 2 to 6:
```yaml
# docker-stack-infra.prod.yml
# Prod overlay, deploy with:
# docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
services:
  vault:
    environment:
      VAULT_LOCAL_CONFIG: >-
        {"api_addr":"https://vault.iklim.co:8200",
        "cluster_addr":"https://{{ .Node.Hostname }}:8201",
        "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
        "listener":[{"tcp":{"address":"0.0.0.0:8200",
        "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
        "tls_key_file":"/vault/certs/STAR.iklim.co_key.pem"}}],
        "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
    volumes:
      - /opt/iklimco/vault/data:/vault/file
      - /mnt/storagebox/ssl:/vault/certs:ro
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
  apisix:
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
  apisix-dashboard:
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
  redis:
    image: bitnamisecure/redis:latest
    environment:
      ALLOW_EMPTY_PASSWORD: "no" # quoted: a bare no is parsed as a YAML boolean
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      REDIS_REPLICATION_MODE: master
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == iklim-app-01
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim
  redis-replica:
    image: bitnamisecure/redis:latest
    environment:
      ALLOW_EMPTY_PASSWORD: "no"
      REDIS_REPLICATION_MODE: slave
      REDIS_MASTER_HOST: redis
      REDIS_MASTER_PORT_NUMBER: "6379"
      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
      REDIS_PASSWORD: ${REDIS_PASSWORD}
    deploy:
      mode: replicated
      replicas: 2
      placement:
        constraints:
          - node.labels.type == service
        preferences:
          - spread: node.hostname
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim
  redis-sentinel:
    image: bitnamisecure/redis-sentinel:latest
    environment:
      REDIS_SENTINEL_MASTER_NAME: mymaster
      REDIS_MASTER_HOST: redis
      REDIS_MASTER_PORT_NUMBER: "6379"
      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
      REDIS_SENTINEL_QUORUM: "2"
      REDIS_SENTINEL_DOWN_AFTER_MILLISECONDS: "5000"
      REDIS_SENTINEL_FAILOVER_TIMEOUT: "10000"
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
        preferences:
          - spread: node.hostname
      restart_policy:
        condition: any
        delay: 5s
      labels:
        project: co.iklim
  rabbitmq:
    image: rabbitmq:3-management
    hostname: "rabbitmq-{{.Node.Hostname}}"
    environment:
      RABBITMQ_ERLANG_COOKIE_FILE: /run/secrets/rabbitmq_erlang_cookie
      RABBITMQ_USE_LONGNAME: "true"
      RABBITMQ_NODENAME: "rabbit@rabbitmq-{{.Node.Hostname}}"
    secrets:
      - rabbitmq_erlang_cookie
    deploy:
      mode: replicated
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service
      update_config:
        parallelism: 1
        order: stop-first
      labels:
        project: co.iklim
secrets:
  rabbitmq_erlang_cookie:
    external: true
```
## Step 8 — Monitoring Data Persistence (StorageBox)
Prometheus and Grafana run as single instances. Without persistent storage, data is lost on node failover. This step mounts their data directories from the StorageBox shared filesystem.
**Changes already applied to `docker-stack-infra.yml`:**
```yaml
prometheus:
  volumes:
    - ${PROMETHEUS_DATA_DIR:-prometheus-vl}:/prometheus
grafana:
  volumes:
    - ${GRAFANA_DATA_DIR:-grafana-vl}:/var/lib/grafana
```
Test uses the named Docker volume fallbacks (`prometheus-vl`, `grafana-vl`) — no test env change needed.
**Add to `prod/secrets/iklim.co/.env.prod` on storagebox** (already in `env-prod/.env`):
```bash
PROMETHEUS_DATA_DIR=/mnt/storagebox/prometheus/data
GRAFANA_DATA_DIR=/mnt/storagebox/grafana/data
```
**Create directories on StorageBox before first prod deploy:**
```bash
mkdir -p /mnt/storagebox/prometheus/data /mnt/storagebox/grafana/data
```
> Grafana writes its SQLite database and dashboard JSON to `/var/lib/grafana`.
> Prometheus writes its TSDB to `/prometheus`. Both directories must exist before the stack starts.
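Since both directories must exist before the stack starts, a pre-deploy check can create them and confirm they are writable. The sketch below is an assumption-laden illustration: `ensure_writable_dirs` is a hypothetical helper, and it only verifies write access for the invoking user, while the containers may run under a different UID.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy check: create each directory if needed and confirm
# the current user can write to it (container UIDs are not checked here).
ensure_writable_dirs() {
  local dir probe
  for dir in "$@"; do
    mkdir -p "$dir" || return 1
    probe="$dir/.write-test.$$"
    if ! { : > "$probe"; } 2>/dev/null; then
      echo "not writable: $dir" >&2
      return 1
    fi
    rm -f "$probe"
  done
}

# Intended use on an app node with the StorageBox mounted:
#   ensure_writable_dirs "$PROMETHEUS_DATA_DIR" "$GRAFANA_DATA_DIR"
```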
## Step 9 — Verify
```bash
# Base file must be valid on its own (test deploy):
docker stack config -c docker-stack-infra.yml > /dev/null && echo "base OK"
# Prod merge must be valid:
docker stack config -c docker-stack-infra.yml -c docker-stack-infra.prod.yml > /dev/null && echo "prod merge OK"
```
## Placement and Replica Summary — prod
| Service | File | Replicas | Placement | HA Note |
|---------|------|----------|-----------|---------|
| swag | base | 1 | `node.hostname == iklim-app-01` | No clustering support; Floating IP pinned to node |
| cert-reloader | base | 1 | `node.hostname == iklim-app-01` | Cron-style task; duplicate would be problematic |
| vault | prod overlay | 3 | `node.labels.type == service` | Raft cluster — see `07-vault-raft-plan.md` |
| apisix | prod overlay | 3 | `node.labels.type == service` | Stateless; config in Patroni etcd; rate limit policy:redis |
| apisix-dashboard | prod overlay | 3 | `node.labels.type == service` | Stateless; reads from etcd |
| redis (master) | prod overlay | 1 | `node.hostname == iklim-app-01` | Sentinel cluster master |
| redis-replica | prod overlay | 2 | `node.labels.type == service` | Sentinel replica; spread:hostname |
| redis-sentinel | prod overlay | 3 | `node.labels.type == service` | Quorum=2; failover automatic |
| rabbitmq | prod overlay | 3 | `node.labels.type == service` | Erlang cluster; quorum queues |
| etcd | base | 1 | `node.labels.type == service` | Idle in prod — APISIX uses Patroni etcd; standalone service remains in base stack |
| prometheus | base | 1 | `node.labels.type == service` | No native HA; Thanos is overkill at this scale |
| grafana | base | 1 | `node.labels.type == service` | Not critical |
> PostgreSQL and MongoDB run in separate DB stacks on `iklim-db-*` nodes. See `08-prod-db-cluster-kurulum.md`.
> etcd: 3-node cluster on DB nodes — APISIX shares it via `/apisix` prefix.

@ -1,7 +1,7 @@
# 04 — SWAG Nginx Proxy Configs (Prod) # 04 — SWAG Nginx Proxy Configs (Prod)
## Context ## Context
Same template files as test (`swag/proxy-confs/*.conf.tpl`), different env vars. Same template files as test (`swag/site-confs/*.conf.tpl`), different env vars.
The pipeline processes templates with prod-specific subdomain values. The pipeline processes templates with prod-specific subdomain values.
## Required env vars (in `.env` on storagebox `prod/secrets/iklim.co/.env.prod`) ## Required env vars (in `.env` on storagebox `prod/secrets/iklim.co/.env.prod`)
@ -14,20 +14,22 @@ GRAFANA_SUBDOMAIN=grafana.iklim.co
RESTRICTED_IP_1=78.187.87.109 RESTRICTED_IP_1=78.187.87.109
RESTRICTED_IP_2=95.70.151.248 RESTRICTED_IP_2=95.70.151.248
# SWAG storage paths — StorageBox so certs are accessible from any app node # SWAG storage paths — StorageBox is mounted on all app nodes, shared filesystem
# cert-reloader writes here; Vault reads from here on any manager node # cert-reloader writes here; Vault reads from this path on every node — no SSH distribution needed
SWAG_CERT_DIR=/mnt/storagebox/prod/ssl SWAG_CERT_DIR=/mnt/storagebox/ssl
# SWAG full config dir (includes letsencrypt state) — enables clean node failover # SWAG config dirs on StorageBox — all three survive node failover without pipeline re-run
SWAG_CONFIG_DIR=/mnt/storagebox/prod/swag/config SWAG_CONFIG_DIR=/mnt/storagebox/swag/config
SWAG_DNS_CONF_DIR=/mnt/storagebox/swag/dns-conf
SWAG_SITE_CONFS_DIR=/mnt/storagebox/swag/site-confs
``` ```
## Template files (already created in test step 04) ## Template files (already created in test step 04)
- `swag/site-confs/default.conf` - `swag/site-confs/default.conf`
- `swag/proxy-confs/api.conf.tpl` - `swag/site-confs/api.conf.tpl`
- `swag/proxy-confs/apigw.conf.tpl` - `swag/site-confs/apigw.conf.tpl`
- `swag/proxy-confs/rabbitmq.conf.tpl` - `swag/site-confs/rabbitmq.conf.tpl`
- `swag/proxy-confs/grafana.conf.tpl` - `swag/site-confs/grafana.conf.tpl`
No new files to create — the same templates work for both environments. No new files to create — the same templates work for both environments.
@ -38,25 +40,25 @@ set -a; . ./.env; set +a
export RESTRICTED_IP_1="78.187.87.109" export RESTRICTED_IP_1="78.187.87.109"
export RESTRICTED_IP_2="95.70.151.248" export RESTRICTED_IP_2="95.70.151.248"
sudo mkdir -p /opt/iklimco/swag/proxy-confs /opt/iklimco/swag/site-confs mkdir -p "$SWAG_DNS_CONF_DIR" "$SWAG_SITE_CONFS_DIR"
for tpl in swag/proxy-confs/*.conf.tpl; do for tpl in swag/site-confs/*.conf.tpl; do
out="/opt/iklimco/swag/proxy-confs/$(basename "${tpl%.tpl}")" out="$SWAG_SITE_CONFS_DIR/$(basename "${tpl%.tpl}")"
envsubst < "$tpl" | sudo tee "$out" > /dev/null envsubst < "$tpl" | sudo tee "$out" > /dev/null
echo "✅ $out" echo "✅ $out"
done done
sudo cp swag/site-confs/default.conf /opt/iklimco/swag/site-confs/default.conf sudo cp swag/site-confs/default.conf "$SWAG_SITE_CONFS_DIR/default.conf"
``` ```
With `API_SUBDOMAIN=api.iklim.co`, the output file `/opt/iklimco/swag/proxy-confs/api.conf` With `API_SUBDOMAIN=api.iklim.co`, the output file `$SWAG_SITE_CONFS_DIR/api.conf`
will contain `server_name api.iklim.co;` — correct for prod. (`/mnt/storagebox/swag/site-confs/api.conf`) will contain `server_name api.iklim.co;` — correct for prod.
## Verification ## Verification
After deploy, on iklim-app-01: After deploy, on iklim-app-01:
```bash ```bash
cat /opt/iklimco/swag/proxy-confs/api.conf | grep server_name cat /mnt/storagebox/swag/site-confs/api.conf | grep server_name
``` ```
Expected: `server_name api.iklim.co;` Expected: `server_name api.iklim.co;`
@ -74,4 +76,4 @@ Expected: APISIX response with valid `*.iklim.co` cert.
- `Prometheus` is intentionally NOT exposed via SWAG. Access it via Grafana - `Prometheus` is intentionally NOT exposed via SWAG. Access it via Grafana
(internal connection: `http://prometheus:9090`) or SSH tunnel. (internal connection: `http://prometheus:9090`) or SSH tunnel.
- If additional restricted-access subdomains are needed in the future, create a new - If additional restricted-access subdomains are needed in the future, create a new
`swag/proxy-confs/<name>.conf.tpl` following the same pattern. `swag/site-confs/<name>.conf.tpl` following the same pattern.


@ -20,11 +20,23 @@ Changes made for test already apply to prod.
## Prod-specific note ## Prod-specific note
APISIX runs with `replicas: 2` in prod. Both replicas receive the same configuration APISIX runs with `replicas: 3` in prod — this value is defined in the `docker-stack-infra.prod.yml` overlay (not in the base `docker-stack-infra.yml`). All replicas read the same configuration from Patroni etcd (`/apisix` prefix) — a single `init` run is sufficient.
from etcd — no additional steps needed beyond the single init run.
The `init/apisix-core/init.sh` is called once (from the pipeline) and configures the ```bash
shared etcd state that all APISIX instances read from. # Prod deploy:
docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
```
`init/apisix-core/init.sh` is run once by the pipeline and writes the etcd state that all APISIX instances read.
## SWAG → APISIX load distribution
SWAG connects to APISIX via `proxy_pass http://apisix:9080;` — using the service name directly.
No additional upstream or load balancer configuration is needed on the SWAG side.
**How it works:** Docker Swarm resolves the `apisix` service name to a VIP (Virtual IP).
Swarm's internal IPVS load balancer automatically distributes incoming connections across the 3 replicas
in round-robin. SWAG is unaware of this mechanism; it happens transparently at the overlay network layer.
## Verification ## Verification


@ -1,41 +1,45 @@
# 06 — cert-reloader Sidecar Service (Prod) # 06 — cert-reloader Sidecar Service (Prod)
## Context ## Context
Same service definition as test (see `test-env-setup/06-cert-reloader.md`). Service definition is identical to test (see `test-env-setup/06-cert-reloader.md`).
Prod-specific consideration: Vault is single-instance on the manager node (same as SWAG), In prod, Vault runs as a 3-node Raft cluster; cert distribution is handled via the StorageBox shared mount — no SSH required.
so the cert copy to `/opt/iklimco/ssl/` works without cross-node distribution.
When Vault is expanded to a 3-node Raft cluster (see `07-vault-raft-plan.md`), the ## Prod flow (3-node Vault Raft)
cert-reloader must be updated to distribute the cert to the other Vault nodes.
## Current behavior (single-Vault prod)
``` ```
SWAG (manager) renews cert → swag-vl SWAG renews cert → writes to SWAG_CONFIG_DIR (/mnt/storagebox/swag/config)
cert-reloader (manager) detects change → copies to /opt/iklimco/ssl/ → reloads Vault cert-reloader detects MD5 change
Vault (manager) reads /opt/iklimco/ssl/ → serves new cert → copies to /mnt/storagebox/ssl/ (shared across all app nodes)
→ docker service update --force iklimco_vault
Vault (3 replicas) restarts
→ each instance has /mnt/storagebox/ssl/ mounted → reads the new cert
→ healthcheck checks sealed status every 30 seconds
→ if sealed: reads vault_unseal_key Docker secret and auto-unseals
``` ```
No cross-node distribution needed. No SSH distribution, additional secrets, or cert-reloader script changes are needed.
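The change-detection step in the flow above can be sketched in isolation. This is a minimal illustration run against a throwaway file, not the cert-reloader script itself; `cert_changed` is a hypothetical helper name.

```bash
# Hypothetical helper: prints the current MD5 of a file and
# returns 0 only when it differs from the previously seen hash.
# $1 = certificate file, $2 = last known hash (may be empty)
cert_changed() {
  curr=$(md5sum "$1" | cut -d' ' -f1)
  printf '%s\n' "$curr"
  [ "$curr" != "$2" ]
}

# Demo on a temporary file instead of a real certificate.
demo=$(mktemp)
echo "cert v1" > "$demo"
last=$(cert_changed "$demo" "")   # first pass: an empty last hash always counts as changed
cert_changed "$demo" "$last" >/dev/null || echo "unchanged, nothing to do"
echo "cert v2" > "$demo"
cert_changed "$demo" "$last" >/dev/null && echo "changed, copy and restart Vault"
rm -f "$demo"
```

Starting with an empty last hash means the first pass after a cert-reloader restart always triggers a copy and a Vault restart, which keeps the shared directory in sync even if a renewal happened while the sidecar was down.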
## Future behavior (3-node Vault Raft — see step 07) ## Auto-unseal mechanism
When Vault runs on iklim-app-01, iklim-app-02, iklim-app-03: The Vault healthcheck is already implemented in `docker-stack-infra.yml`:
``` ```yaml
cert-reloader detects cert change healthcheck:
→ copies cert to /opt/iklimco/ssl/ on iklim-app-01 (local) test:
→ SSH copy to iklim-app-02:/opt/iklimco/ssl/ - "CMD"
→ SSH copy to iklim-app-03:/opt/iklimco/ssl/ - "sh"
→ docker service update --force iklimco_vault (restarts all 3 replicas) - "-c"
- >-
vault status -format=json 2>/dev/null | grep -q '"sealed":false' ||
vault operator unseal $$(cat /run/secrets/vault_unseal_key 2>/dev/null)
interval: 30s
timeout: 10s
start_period: 15s
retries: 5
``` ```
This requires: Each Vault container runs its own healthcheck independently — all 3 replicas unseal separately.
- An SSH key that cert-reloader can use to reach iklim-app-02 and iklim-app-03 The cert renewal → restart → auto-unseal chain requires no manual intervention.
- That key mounted as a Docker secret into cert-reloader
- Known_hosts for iklim-app-02 and iklim-app-03 pre-configured
Script update for this phase is tracked in `07-vault-raft-plan.md`.
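The healthcheck's decision reduces to one question: does the status JSON report the node as unsealed? Below is a minimal sketch of that check on its own; `is_unsealed` is a hypothetical helper, not something defined in the stack file. It allows optional whitespace after the colon, on the assumption that the CLI may emit indented JSON, in which case a pattern with no space after the colon would not match.

```bash
# Hypothetical helper: reads `vault status -format=json` output on stdin and
# returns 0 when the node reports itself unsealed.
# The pattern tolerates optional whitespace after the colon (assumption:
# the JSON may be pretty-printed).
is_unsealed() {
  grep -Eq '"sealed":[[:space:]]*false'
}

# Illustrative inputs, not captured from a real cluster.
printf '{\n  "sealed": false\n}\n' | is_unsealed && echo "unsealed"
printf '{"sealed":true}\n' | is_unsealed || echo "sealed, unseal needed"
```

A whitespace-tolerant pattern is a cheap safeguard: it behaves the same on compact JSON and still works if the output format is indented.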
## Verification ## Verification
@ -54,4 +58,4 @@ docker exec $(docker ps -q -f name=iklimco_vault) \
| openssl x509 -noout -dates' | openssl x509 -noout -dates'
``` ```
`notAfter` should match the cert in `/opt/iklimco/ssl/STAR.iklim.co.full.crt`. `notAfter` should match the cert in `/mnt/storagebox/ssl/STAR.iklim.co.full.crt`.


@ -1,42 +1,28 @@
# 07 — Vault: Initial Single Instance + Raft Cluster Migration Plan (Prod) # 07 — Vault: 3-Node Raft Cluster (Prod)
## Context ## Context
Vault starts as a single instance on the manager node (iklim-app-01) for the initial prod launch. Vault starts directly as a 3-node Raft cluster in prod. The single-instance phase used in test is skipped.
This matches the current `docker-stack-infra.yml` configuration (file storage, single replica).
Raft HA cluster is planned for a later phase. Test used a single Vault instance (file storage, 1 replica on the manager node). Prod goes straight to Raft HA.
## Phase 1 — Initial prod launch (current) ## Vault service configuration
- **Replicas:** 1
- **Storage:** file (`/vault/file`) on iklim-app-01
- **Placement:** `node.role == manager` (iklim-app-01)
- **Cert:** from `/opt/iklimco/ssl/` (populated by cert-reloader from SWAG volume)
- **TLS:** `VAULT_LOCAL_CONFIG` unchanged — `api_addr: https://vault.iklim.co:8200`
No changes to `docker-stack-infra.yml` vault service for Phase 1.
## Phase 2 — Vault Raft Cluster (future)
### What changes
- **Replicas:** 3 (one per service node) - **Replicas:** 3 (one per service node)
- **Storage:** Raft integrated (replaces file storage) - **Storage:** Raft integrated storage
- **Placement:** `node.labels.type == service` (all 3 service nodes) - **Placement:** `node.labels.type == service` (all 3 app nodes)
- **Cert distribution:** cert-reloader SSH-copies renewed cert to iklim-app-02, iklim-app-03 - **Cert distribution:** No SSH needed — all nodes mount StorageBox, cert-reloader writes to `SWAG_CERT_DIR=/mnt/storagebox/ssl`, Vault reads from that path on every node
### Prerequisites
### Prerequisites before migration
- [ ] All 3 service nodes are running and labeled `type=service` - [ ] All 3 service nodes are running and labeled `type=service`
- [ ] Vault data backed up from Phase 1 (snapshot via `vault operator raft snapshot save`) - [ ] `/mnt/storagebox/ssl/` directory is mounted and accessible on all 3 app nodes
- [ ] SSH key created for cert-reloader to reach iklim-app-02 and iklim-app-03
- [ ] SSH key stored as Docker secret `cert_reloader_ssh_key`
- [ ] `/opt/iklimco/ssl/` directory exists on iklim-app-02 and iklim-app-03
- [ ] Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes) - [ ] Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes)
### Vault service update for Raft ### Vault service YAML (docker-stack-infra.prod.yml overlay)
```yaml ```yaml
vault: vault:
# ... (image, secrets, healthcheck unchanged) # ... (image, secrets, healthcheck unchanged from base)
environment: environment:
VAULT_LOCAL_CONFIG: >- VAULT_LOCAL_CONFIG: >-
{"api_addr":"https://vault.iklim.co:8200", {"api_addr":"https://vault.iklim.co:8200",
@ -44,11 +30,11 @@ vault:
"storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}}, "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
"listener":[{"tcp":{"address":"0.0.0.0:8200", "listener":[{"tcp":{"address":"0.0.0.0:8200",
"tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt", "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
"tls_key_file":"/vault/certs/STAR.iklim.co_key.txt"}}], "tls_key_file":"/vault/certs/STAR.iklim.co_key.pem"}}],
"default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true} "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
volumes: volumes:
- /opt/iklimco/vault/data:/vault/file # host path per node - /opt/iklimco/vault/data:/vault/file # host path per node
- /opt/iklimco/ssl:/vault/certs:ro - /mnt/storagebox/ssl:/vault/certs:ro # StorageBox — shared across all nodes, no SSH distribution needed
deploy: deploy:
mode: replicated mode: replicated
replicas: 3 replicas: 3
@ -60,44 +46,73 @@ vault:
> `{{ .Node.Hostname }}` is Docker Swarm's Go template for the node hostname — > `{{ .Node.Hostname }}` is Docker Swarm's Go template for the node hostname —
> gives each Vault instance a unique `node_id`. > gives each Vault instance a unique `node_id`.
### Raft join procedure (after deploying 3-replica Vault) ## Raft initialization procedure (first deploy)
Only the leader needs to be bootstrapped; others join via `vault operator raft join`: ### Step 1 — Deploy the stack
```bash ```bash
# On the primary Vault (iklim-app-01 container): docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
# Unseal if needed
docker exec -it "$VAULT_CTR" vault operator unseal
# Check Raft peers
docker exec "$VAULT_CTR" vault operator raft list-peers
``` ```
All 3 Vault containers start. Only the first one to initialize becomes the leader.
### Step 2 — Initialize Vault on the leader (iklim-app-01)
```bash
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
docker exec -it "$VAULT_CTR" vault operator init
```
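Save the unseal keys and root token securely. Store the unseal key as a Docker secret; because the healthcheck passes a single key to `vault operator unseal`, this assumes Vault was initialized with one key share (for example `vault operator init -key-shares=1 -key-threshold=1`):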
```bash
echo -n "<unseal-key>" | docker secret create vault_unseal_key -
```
### Step 3 — Unseal the leader
```bash
docker exec -it "$VAULT_CTR" vault operator unseal
```
The healthcheck auto-unseals on subsequent restarts via the `vault_unseal_key` secret.
### Step 4 — Join remaining nodes to the Raft cluster
On iklim-app-02 and iklim-app-03 containers: On iklim-app-02 and iklim-app-03 containers:
```bash ```bash
docker exec -it <vault-on-iklim-app-02> vault operator raft join \ docker exec -it <vault-on-iklim-app-02> vault operator raft join \
https://vault.iklim.co:8200 https://vault.iklim.co:8200
docker exec -it <vault-on-iklim-app-03> vault operator raft join \
https://vault.iklim.co:8200
``` ```
### cert-reloader update for Raft Unseal each node after joining:
Update the cert-reloader command in `docker-stack-infra.yml` to SSH-copy the cert
to iklim-app-02 and iklim-app-03 after renewal:
```bash ```bash
# After copying to local /opt/iklimco/ssl/: docker exec -it <vault-on-iklim-app-02> vault operator unseal
ssh -i /run/secrets/cert_reloader_ssh_key iklim-app-02 \ docker exec -it <vault-on-iklim-app-03> vault operator unseal
"cp /dev/stdin /opt/iklimco/ssl/STAR.iklim.co.full.crt" < /opt/iklimco/ssl/STAR.iklim.co.full.crt
# (repeat for iklim-app-03 and privkey)
docker service update --force iklimco_vault
``` ```
Add Docker secret to cert-reloader: ### Step 5 — Verify cluster
```yaml
secrets: ```bash
- cert_reloader_ssh_key docker exec "$VAULT_CTR" vault operator raft list-peers
```
Expected: 3 peers, one `leader`, two `follower`.
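The expected peer layout can also be asserted mechanically. This is a sketch under assumptions: `check_raft_peers` is a hypothetical helper, and it assumes the table output of `vault operator raft list-peers` has the peer state in the third column. The sample addresses are illustrative.

```bash
# Hypothetical helper: reads `vault operator raft list-peers` table output
# on stdin and succeeds only with exactly 1 leader and 2 followers.
# Assumes the State column is the third whitespace-separated field.
check_raft_peers() {
  awk '
    $3 == "leader"   { leaders++ }
    $3 == "follower" { followers++ }
    END { exit !(leaders == 1 && followers == 2) }
  '
}

# Illustrative sample, not captured from a real cluster.
check_raft_peers <<'EOF' && echo "cluster healthy"
Node            Address           State       Voter
----            -------           -----       -----
iklim-app-01    10.0.1.11:8201    leader      true
iklim-app-02    10.0.1.12:8201    follower    true
iklim-app-03    10.0.1.13:8201    follower    true
EOF
```

In practice the helper would read from `docker exec "$VAULT_CTR" vault operator raft list-peers` instead of a here-document.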
## cert-reloader — no additional changes needed for Raft
cert-reloader writes the cert to `SWAG_CERT_DIR=/mnt/storagebox/ssl`.
Since StorageBox is mounted on all app nodes, every Vault instance already sees the same path.
The cert renewal flow works unchanged with Raft:
```
cert changed → copy to /mnt/storagebox/ssl/ → docker service update --force iklimco_vault
Vault (3 replicas) restart → each auto-unseals via healthcheck
``` ```
## Reference ## Reference


@ -14,13 +14,13 @@
```yaml ```yaml
# DELETE from "Initialize Servers" step: # DELETE from "Initialize Servers" step:
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co.full.crt ./STAR.iklim.co.full.crt scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co.full.crt ./STAR.iklim.co.full.crt
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co_key.txt ./STAR.iklim.co_key.txt scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co_key.pem ./STAR.iklim.co_key.pem
``` ```
Also remove from `Prepare Init Files`: Also remove from `Prepare Init Files`:
```yaml ```yaml
# DELETE or make conditional: # DELETE or make conditional:
sudo cp STAR.iklim.co.full.crt STAR.iklim.co_key.txt /opt/iklimco/ssl/ sudo cp STAR.iklim.co.full.crt STAR.iklim.co_key.pem /opt/iklimco/ssl/
``` ```
## Step 2 — Add `Prepare SWAG Directories` step ## Step 2 — Add `Prepare SWAG Directories` step
@ -32,27 +32,26 @@ Insert **before** `Bootstrap Vault TLS Placeholder`:
run: | run: |
set -a; . ./.env; . ./.env.secrets.swag; set +a set -a; . ./.env; . ./.env.secrets.swag; set +a
docker run --rm -v /opt/iklimco/swag:/output alpine \ mkdir -p "$SWAG_CONFIG_DIR" "$SWAG_DNS_CONF_DIR" "$SWAG_SITE_CONFS_DIR"
mkdir -p /output/dns-conf /output/proxy-confs /output/site-confs
envsubst < swag/dns-conf/godaddy.ini.tpl | docker run --rm -i \ envsubst < swag/dns-conf/godaddy.ini.tpl | docker run --rm -i \
-v /opt/iklimco/swag/dns-conf:/output \ -v "${SWAG_DNS_CONF_DIR}:/output" \
alpine sh -c "cat > /output/godaddy.ini && chmod 600 /output/godaddy.ini" alpine sh -c "cat > /output/godaddy.ini && chmod 600 /output/godaddy.ini"
echo "✅ godaddy.ini written" echo "✅ godaddy.ini written"
export RESTRICTED_IPS_BLOCK="$(echo "$RESTRICTED_IPS" | tr ',' '\n' | sed 's|.*| allow &;|')" export RESTRICTED_IPS_BLOCK="$(echo "$RESTRICTED_IPS" | tr ',' '\n' | sed 's|.*| allow &;|')"
SWAG_VARS='${API_SUBDOMAIN}${APIGW_SUBDOMAIN}${GRAFANA_SUBDOMAIN}${RABBITMQ_SUBDOMAIN}${RESTRICTED_IPS_BLOCK}' SWAG_VARS='${API_SUBDOMAIN}${APIGW_SUBDOMAIN}${GRAFANA_SUBDOMAIN}${RABBITMQ_SUBDOMAIN}${RESTRICTED_IPS_BLOCK}'
for tpl in swag/proxy-confs/*.conf.tpl; do for tpl in swag/site-confs/*.conf.tpl; do
fname=$(basename "${tpl%.tpl}") fname=$(basename "${tpl%.tpl}")
envsubst "$SWAG_VARS" < "$tpl" | docker run --rm -i \ envsubst "$SWAG_VARS" < "$tpl" | docker run --rm -i \
-v /opt/iklimco/swag/site-confs:/output \ -v "${SWAG_SITE_CONFS_DIR}:/output" \
alpine sh -c "cat > /output/${fname}" alpine sh -c "cat > /output/${fname}"
echo "✅ ${fname}" echo "✅ ${fname}"
done done
cat swag/site-confs/default.conf | docker run --rm -i \ cat swag/site-confs/default.conf | docker run --rm -i \
-v /opt/iklimco/swag/site-confs:/output \ -v "${SWAG_SITE_CONFS_DIR}:/output" \
alpine sh -c "cat > /output/default.conf" alpine sh -c "cat > /output/default.conf"
echo "✅ SWAG directories ready" echo "✅ SWAG directories ready"
@ -89,6 +88,8 @@ APISIX reads its entire configuration from etcd; init script will fail silently
done done
``` ```
> **Note:** In prod, the standalone `etcd` service from `docker-stack-infra.yml` still runs (Docker Compose overlay files cannot remove services). APISIX currently uses this etcd; the Patroni etcd migration happens via `docker-stack-infra.prod.yml`. The `http://etcd:2379/health` check targets this standalone service and is correct for the current setup.
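A health wait of this kind is a bounded retry loop around a probe command. A minimal generic sketch follows; `wait_for` is a hypothetical helper and the attempt count and delay shown in the comment are illustrative, not values taken from the pipeline.

```bash
# Hypothetical helper: retries a command until it succeeds.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run
wait_for() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# In a pipeline step this might wrap the health probe, for example:
#   wait_for 24 5 curl -fsS http://etcd:2379/health
wait_for 3 0 true && echo "ready"
```

Returning non-zero after the last attempt lets the step fail loudly instead of continuing against an etcd that never became healthy.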
## Step 4 — Add `Run APISIX Init` step ## Step 4 — Add `Run APISIX Init` step
Insert **after** `Wait for etcd` and **before** `Bootstrap SWAG Certificate`. Insert **after** `Wait for etcd` and **before** `Bootstrap SWAG Certificate`.
@ -112,7 +113,7 @@ Insert **after** `Wait for etcd` and **before** `Bootstrap SWAG Certificate`.
> **Prod-specific:** `SPRING_PROFILES_ACTIVE=prod` — test pipeline uses `test`. > **Prod-specific:** `SPRING_PROFILES_ACTIVE=prod` — test pipeline uses `test`.
> `APISIX_ADMIN_KEY` is sourced from `.env.secrets.shared`. > `APISIX_ADMIN_KEY` is sourced from `.env.secrets.shared`.
> The init script is idempotent (PUT semantics); safe to re-run on subsequent deploys. > The init script is idempotent (PUT semantics); safe to re-run on subsequent deploys.
> With `replicas: 2` in prod, both APISIX instances read the same etcd state — no per-replica init needed. > With `replicas: 3` in prod, all APISIX instances read the same etcd state — no per-replica init needed.
## Step 5 — Add `Bootstrap SWAG Certificate` step ## Step 5 — Add `Bootstrap SWAG Certificate` step
@ -121,6 +122,7 @@ Insert **after** `Run APISIX Init`:
```yaml ```yaml
- name: Bootstrap SWAG Certificate - name: Bootstrap SWAG Certificate
run: | run: |
set -a; . ./.env; set +a
echo "Waiting for SWAG container to start..." echo "Waiting for SWAG container to start..."
SWAG_CTR="" SWAG_CTR=""
for i in $(seq 1 24); do for i in $(seq 1 24); do
@ -152,12 +154,12 @@ Insert **after** `Run APISIX Init`:
fi fi
docker exec "$SWAG_CTR" cat "$CERT_PATH" | \ docker exec "$SWAG_CTR" cat "$CERT_PATH" | \
docker run --rm -i -v /opt/iklimco/ssl:/output alpine \ docker run --rm -i -v "${SWAG_CERT_DIR}:/output" alpine \
sh -c "cat > /output/STAR.iklim.co.full.crt && chmod 644 /output/STAR.iklim.co.full.crt" sh -c "cat > /output/STAR.iklim.co.full.crt && chmod 644 /output/STAR.iklim.co.full.crt"
docker exec "$SWAG_CTR" cat "/config/etc/letsencrypt/live/iklim.co/privkey.pem" | \ docker exec "$SWAG_CTR" cat "/config/etc/letsencrypt/live/iklim.co/privkey.pem" | \
docker run --rm -i -v /opt/iklimco/ssl:/output alpine \ docker run --rm -i -v "${SWAG_CERT_DIR}:/output" alpine \
sh -c "cat > /output/STAR.iklim.co_key.txt && chmod 644 /output/STAR.iklim.co_key.txt" sh -c "cat > /output/STAR.iklim.co_key.pem && chmod 644 /output/STAR.iklim.co_key.pem"
echo "✅ Cert bootstrapped to /opt/iklimco/ssl/" echo "✅ Cert bootstrapped to ${SWAG_CERT_DIR}/"
working-directory: /workspace/iklim.co working-directory: /workspace/iklim.co
``` ```
@ -201,7 +203,7 @@ Insert **after** `Bootstrap SWAG Certificate` and **before** `Review Environment
working-directory: /workspace/iklim.co working-directory: /workspace/iklim.co
``` ```
> **Prod-specific:** DB hostnames are `iklimco_postgresql` ve `iklimco_mongodb` (Swarm VIP service names). > **Prod-specific:** DB hostnames are `iklimco_postgresql` and `iklimco_mongodb` (Swarm VIP service names).
> Test pipeline uses `postgresql` / `mongodb` (unqualified aliases within the same stack). > Test pipeline uses `postgresql` / `mongodb` (unqualified aliases within the same stack).
> SQL and JS files are generated by `Prepare Init Files` step via `init_postgresql` / `init_mongodb` functions in `common-functions.sh`. > SQL and JS files are generated by `Prepare Init Files` step via `init_postgresql` / `init_mongodb` functions in `common-functions.sh`.
> Step is idempotent — scripts use `CREATE IF NOT EXISTS` / `createCollection` semantics. > Step is idempotent — scripts use `CREATE IF NOT EXISTS` / `createCollection` semantics.


@ -109,7 +109,7 @@ All tasks should show node names matching `iklim-db-01`, `iklim-db-02`, or `ikli
```bash ```bash
docker service ps iklimco_apisix docker service ps iklimco_apisix
``` ```
Expected: 2 tasks, both `Running`, on different nodes. Expected: 3 tasks, all `Running`, on different nodes.
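The "all running, on different nodes" expectation can be checked without reading the table by eye. This is a sketch; `check_spread` is a hypothetical helper, and the node names in the sample are illustrative.

```bash
# Hypothetical helper: reads "<node> <state...>" lines on stdin, as produced by
#   docker service ps --filter desired-state=running \
#     --format '{{.Node}} {{.CurrentState}}' iklimco_apisix
# and succeeds when there are exactly $1 running tasks, each on a different node.
check_spread() {
  awk -v want="$1" '
    $2 == "Running" { running++; if (!seen[$1]++) nodes++ }
    END { exit !(running == want && nodes == want) }
  '
}

# Illustrative sample, not captured from a real cluster.
check_spread 3 <<'EOF' && echo "3 replicas on 3 nodes"
iklim-app-01 Running 5 minutes ago
iklim-app-02 Running 5 minutes ago
iklim-app-03 Running 4 minutes ago
EOF
```

Counting distinct nodes separately from running tasks catches the case where all replicas are up but two of them landed on the same host.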
## 10 — fail2ban active ## 10 — fail2ban active


@ -47,7 +47,6 @@ Add after the `apisix-dashboard` service block:
volumes: volumes:
- swag-vl:/config - swag-vl:/config
- /opt/iklimco/swag/dns-conf:/config/dns-conf:ro - /opt/iklimco/swag/dns-conf:/config/dns-conf:ro
- /opt/iklimco/swag/proxy-confs:/config/nginx/proxy-confs:ro
- /opt/iklimco/swag/site-confs:/config/nginx/site-confs:ro - /opt/iklimco/swag/site-confs:/config/nginx/site-confs:ro
ports: ports:
- target: 80 - target: 80
@ -90,18 +89,18 @@ Add after the `swag` service block:
LAST_HASH="" LAST_HASH=""
echo "[cert-reloader] started" echo "[cert-reloader] started"
while true; do while true; do
sleep 3600
if [ -f "$$CERT_DIR/fullchain.pem" ]; then if [ -f "$$CERT_DIR/fullchain.pem" ]; then
CURR=$$(md5sum "$$CERT_DIR/fullchain.pem" | cut -d' ' -f1) CURR=$$(md5sum "$$CERT_DIR/fullchain.pem" | cut -d' ' -f1)
if [ "$$CURR" != "$$LAST_HASH" ]; then if [ "$$CURR" != "$$LAST_HASH" ]; then
echo "[cert-reloader] cert changed — copying and reloading Vault" echo "[cert-reloader] cert changed — copying and reloading Vault"
cp "$$CERT_DIR/fullchain.pem" "$$HOST_DIR/STAR.iklim.co.full.crt" cp "$$CERT_DIR/fullchain.pem" "$$HOST_DIR/STAR.iklim.co.full.crt"
cp "$$CERT_DIR/privkey.pem" "$$HOST_DIR/STAR.iklim.co_key.txt" cp "$$CERT_DIR/privkey.pem" "$$HOST_DIR/STAR.iklim.co_key.pem"
docker service update --force iklimco_vault docker service update --force iklimco_vault
LAST_HASH="$$CURR" LAST_HASH="$$CURR"
echo "[cert-reloader] done" echo "[cert-reloader] done"
fi fi
fi fi
sleep 3600
done done
deploy: deploy:
mode: replicated mode: replicated


@ -1,9 +1,7 @@
# 04 — SWAG Nginx Proxy Configs (Test) # 04 — SWAG Nginx Proxy Configs (Test)
## Context ## Context
SWAG reads nginx configs from bind-mounted directories: SWAG nginx auto-includes only `site-confs/*.conf`. All proxy config templates live in `swag/site-confs/` in the repo and are rendered to `/opt/iklimco/swag/site-confs/` on the host at deploy time.
- `/config/nginx/proxy-confs/``swag/proxy-confs/` in repo, deployed to `/opt/iklimco/swag/proxy-confs/`
- `/config/nginx/site-confs/``swag/site-confs/` in repo, deployed to `/opt/iklimco/swag/site-confs/`
Templates use `${VAR}` placeholders processed with `envsubst` at deploy time. Templates use `${VAR}` placeholders processed with `envsubst` at deploy time.
@ -40,7 +38,7 @@ server {
} }
``` ```
### `swag/proxy-confs/api.conf.tpl` ### `swag/site-confs/api.conf.tpl`
Public API gateway — no IP restriction. Public API gateway — no IP restriction.
```nginx ```nginx
@ -65,7 +63,7 @@ server {
} }
``` ```
### `swag/proxy-confs/apigw.conf.tpl` ### `swag/site-confs/apigw.conf.tpl`
APISIX Dashboard — IP restricted. APISIX Dashboard — IP restricted.
```nginx ```nginx
@ -94,7 +92,7 @@ server {
} }
``` ```
### `swag/proxy-confs/rabbitmq.conf.tpl` ### `swag/site-confs/rabbitmq.conf.tpl`
RabbitMQ Management UI — IP restricted. RabbitMQ Management UI — IP restricted.
```nginx ```nginx
@ -123,7 +121,7 @@ server {
} }
``` ```
### `swag/proxy-confs/grafana.conf.tpl` ### `swag/site-confs/grafana.conf.tpl`
Grafana — IP restricted. Grafana — IP restricted.
```nginx ```nginx
@ -156,14 +154,14 @@ server {
```bash ```bash
# Process templates and write to host # Process templates and write to host
mkdir -p /opt/iklimco/swag/proxy-confs /opt/iklimco/swag/site-confs mkdir -p /opt/iklimco/swag/site-confs
set -a; . ./.env; set +a set -a; . ./.env; set +a
export RESTRICTED_IP_1="78.187.87.109" export RESTRICTED_IP_1="78.187.87.109"
export RESTRICTED_IP_2="95.70.151.248" export RESTRICTED_IP_2="95.70.151.248"
for tpl in swag/proxy-confs/*.conf.tpl; do for tpl in swag/site-confs/*.conf.tpl; do
out="/opt/iklimco/swag/proxy-confs/$(basename "${tpl%.tpl}")" out="/opt/iklimco/swag/site-confs/$(basename "${tpl%.tpl}")"
envsubst < "$tpl" > "$out" envsubst < "$tpl" > "$out"
echo "✅ $out" echo "✅ $out"
done done


@ -13,10 +13,10 @@ Locate and **delete** this entire block:
```bash ```bash
# DELETE THIS BLOCK: # DELETE THIS BLOCK:
if [[ "$PROFILE" == "test" || "$PROFILE" == "prod" ]]; then if [[ "$PROFILE" == "test" || "$PROFILE" == "prod" ]]; then
if [[ -f "STAR.iklim.co.full.crt" && -f "STAR.iklim.co_key.txt" ]]; then if [[ -f "STAR.iklim.co.full.crt" && -f "STAR.iklim.co_key.pem" ]]; then
call_api "ssl iklim.co" -X PUT "$APISIX_ADMIN_URL/ssls/1" \ call_api "ssl iklim.co" -X PUT "$APISIX_ADMIN_URL/ssls/1" \
-H "X-API-KEY: $API_KEY" -H "Content-Type: application/json" \ -H "X-API-KEY: $API_KEY" -H "Content-Type: application/json" \
-d '{"cert":"'"$(cat STAR.iklim.co.full.crt)"'","key":"'"$(cat STAR.iklim.co_key.txt)"'","snis":["*.iklim.co"]}' -d '{"cert":"'"$(cat STAR.iklim.co.full.crt)"'","key":"'"$(cat STAR.iklim.co_key.pem)"'","snis":["*.iklim.co"]}'
else else
echo "iklim.co ssl certificates not found!" echo "iklim.co ssl certificates not found!"
fi fi


@ -56,7 +56,7 @@ CERT="$SWAG_VOL/etc/letsencrypt/live/iklim.co/fullchain.pem"
if [ -f "$CERT" ]; then if [ -f "$CERT" ]; then
cp "$CERT" /opt/iklimco/ssl/STAR.iklim.co.full.crt cp "$CERT" /opt/iklimco/ssl/STAR.iklim.co.full.crt
KEYF="$SWAG_VOL/etc/letsencrypt/live/iklim.co/privkey.pem" KEYF="$SWAG_VOL/etc/letsencrypt/live/iklim.co/privkey.pem"
cp "$KEYF" /opt/iklimco/ssl/STAR.iklim.co_key.txt cp "$KEYF" /opt/iklimco/ssl/STAR.iklim.co_key.pem
docker service update --force iklimco_vault docker service update --force iklimco_vault
echo "✅ Manual reload triggered" echo "✅ Manual reload triggered"
else else


@ -4,7 +4,7 @@
- **File:** `.gitea/workflows/deploy-test.yml` - **File:** `.gitea/workflows/deploy-test.yml`
- Changes: - Changes:
1. Remove manual `scp STAR.iklim.co.full.crt` steps (SWAG now owns cert lifecycle). 1. Remove manual `scp STAR.iklim.co.full.crt` steps (SWAG now owns cert lifecycle).
2. Add SWAG host directories preparation (dns-conf, nginx proxy-confs). 2. Add SWAG host directories preparation (dns-conf, nginx site-confs).
3. Add cert bootstrap step: on first deploy, wait for SWAG to obtain cert, then copy 3. Add cert bootstrap step: on first deploy, wait for SWAG to obtain cert, then copy
to `/opt/iklimco/ssl/` so Vault can start. to `/opt/iklimco/ssl/` so Vault can start.
4. Ensure `GODADDY_KEY` and `GODADDY_SECRET` are available from `.env.secrets.swag`. 4. Ensure `GODADDY_KEY` and `GODADDY_SECRET` are available from `.env.secrets.swag`.
@ -16,15 +16,15 @@
```yaml ```yaml
# DELETE these two lines from the "Initialize Servers" step: # DELETE these two lines from the "Initialize Servers" step:
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:test/app/iklim.co/ssl/STAR.iklim.co.full.crt ./STAR.iklim.co.full.crt scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:test/app/iklim.co/ssl/STAR.iklim.co.full.crt ./STAR.iklim.co.full.crt
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:test/app/iklim.co/ssl/STAR.iklim.co_key.txt ./STAR.iklim.co_key.txt scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:test/app/iklim.co/ssl/STAR.iklim.co_key.pem ./STAR.iklim.co_key.pem
``` ```
Also remove any references to `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.txt` in Also remove any references to `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.pem` in
the `Prepare Init Files` step's `sudo cp` commands: the `Prepare Init Files` step's `sudo cp` commands:
```yaml ```yaml
# DELETE or make conditional: # DELETE or make conditional:
sudo cp STAR.iklim.co.full.crt STAR.iklim.co_key.txt /opt/iklimco/ssl/ 2>/dev/null || true sudo cp STAR.iklim.co.full.crt STAR.iklim.co_key.pem /opt/iklimco/ssl/ 2>/dev/null || true
``` ```
## Step 2 — Add `Prepare SWAG Directories` step ## Step 2 — Add `Prepare SWAG Directories` step
@ -42,14 +42,14 @@ Insert this step **before** `Deploy Swarm Stack`:
sudo chmod 600 /opt/iklimco/swag/dns-conf/godaddy.ini sudo chmod 600 /opt/iklimco/swag/dns-conf/godaddy.ini
echo "✅ godaddy.ini written" echo "✅ godaddy.ini written"
# Nginx proxy conf files # Nginx site conf files
sudo mkdir -p /opt/iklimco/swag/proxy-confs /opt/iklimco/swag/site-confs sudo mkdir -p /opt/iklimco/swag/site-confs
export RESTRICTED_IP_1="78.187.87.109" export RESTRICTED_IP_1="78.187.87.109"
export RESTRICTED_IP_2="95.70.151.248" export RESTRICTED_IP_2="95.70.151.248"
for tpl in swag/proxy-confs/*.conf.tpl; do for tpl in swag/site-confs/*.conf.tpl; do
out="/opt/iklimco/swag/proxy-confs/$(basename "${tpl%.tpl}")" out="/opt/iklimco/swag/site-confs/$(basename "${tpl%.tpl}")"
envsubst < "$tpl" | sudo tee "$out" > /dev/null envsubst < "$tpl" | sudo tee "$out" > /dev/null
echo "✅ $out" echo "✅ $out"
done done
@ -105,7 +105,7 @@ Vault being accessible (e.g., `Provision Vault AppRole IDs`):
docker exec "$SWAG_CTR" cat "$CERT_PATH" | \ docker exec "$SWAG_CTR" cat "$CERT_PATH" | \
sudo tee /opt/iklimco/ssl/STAR.iklim.co.full.crt > /dev/null sudo tee /opt/iklimco/ssl/STAR.iklim.co.full.crt > /dev/null
docker exec "$SWAG_CTR" cat "/config/etc/letsencrypt/live/iklim.co/privkey.pem" | \ docker exec "$SWAG_CTR" cat "/config/etc/letsencrypt/live/iklim.co/privkey.pem" | \
sudo tee /opt/iklimco/ssl/STAR.iklim.co_key.txt > /dev/null sudo tee /opt/iklimco/ssl/STAR.iklim.co_key.pem > /dev/null
echo "✅ Cert bootstrapped to /opt/iklimco/ssl/" echo "✅ Cert bootstrapped to /opt/iklimco/ssl/"
working-directory: /workspace/iklim.co working-directory: /workspace/iklim.co
``` ```


@ -119,7 +119,7 @@ Expected: `[cert-reloader] started` — no errors.
VAULT_CTR=$(docker ps -q -f name=iklimco_vault) VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
docker exec "$VAULT_CTR" ls /vault/certs/ docker exec "$VAULT_CTR" ls /vault/certs/
``` ```
Expected: `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.txt`. Expected: `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.pem`.
## 10 — fail2ban is active (SWAG) ## 10 — fail2ban is active (SWAG)


@ -21,7 +21,7 @@ shows which of the Terraform/Ansible setup stages covers it
| `act_runner` systemd installation | **Ansible `05-test-runner-ve-deploy-onkosullari.md`** → `act_runner` role (`test-app-post-stack.yml`) | | `act_runner` systemd installation | **Ansible `05-test-runner-ve-deploy-onkosullari.md`** → `act_runner` role (`test-app-post-stack.yml`) |
| Uploading GoDaddy credentials to the storagebox | **Stays manual**: secret management, outside Terraform/Ansible | | Uploading GoDaddy credentials to the storagebox | **Stays manual**: secret management, outside Terraform/Ansible |
| `docker-stack-infra.yml` port removal + SWAG/cert-reloader addition | **Pipeline `deploy-test.yml`** + **repo change** → `roadmap/test-env/03` | | `docker-stack-infra.yml` port removal + SWAG/cert-reloader addition | **Pipeline `deploy-test.yml`** + **repo change** → `roadmap/test-env/03` |
| SWAG nginx proxy confs (`swag/proxy-confs/*.conf.tpl`) | **Delivered in the repo** → `roadmap/test-env/04` | | SWAG nginx proxy confs (`swag/site-confs/*.conf.tpl`) | **Delivered in the repo** → `roadmap/test-env/04` |
| Removing the APISIX SSL cert upload block (`init/apisix-core/init.sh`) | **Repo change** → `roadmap/test-env/05` | | Removing the APISIX SSL cert upload block (`init/apisix-core/init.sh`) | **Repo change** → `roadmap/test-env/05` |
| cert-reloader sidecar service | **Added to `docker-stack-infra.yml`**: `roadmap/test-env/06` | | cert-reloader sidecar service | **Added to `docker-stack-infra.yml`**: `roadmap/test-env/06` |
| Pipeline update: Prepare SWAG Dirs + Bootstrap SWAG Cert + Run DB Init | **`deploy-test.yml`**: `roadmap/test-env/07` | | Pipeline update: Prepare SWAG Dirs + Bootstrap SWAG Cert + Run DB Init | **`deploy-test.yml`**: `roadmap/test-env/07` |
@@ -49,7 +49,7 @@ shows which of the Terraform/Ansible setup stages handles it
 | 3× `act_runner` systemd (HA runner) | **Ansible `09-prod-runner-ha-ve-swarm.md`**: `act_runner` role |
 | Uploading GoDaddy credentials to the StorageBox | **Stays manual**: secret management, outside Terraform/Ansible |
 | `docker-stack-infra.yml`: remove ports + add SWAG/cert-reloader | **Repo change**: `roadmap/prod-env/03` |
-| SWAG nginx proxy confs (`swag/proxy-confs/*.conf.tpl`) | **Delivered in the repo**: `roadmap/prod-env/04` |
+| SWAG nginx proxy confs (`swag/site-confs/*.conf.tpl`) | **Delivered in the repo**: `roadmap/prod-env/04` |
 | Remove the APISIX SSL cert loading block (`init/apisix-core/init.sh`) | **Repo change**: `roadmap/prod-env/05` |
 | cert-reloader sidecar service | **Added to `docker-stack-infra.yml`**: `roadmap/prod-env/06` |
 | Vault Raft cluster migration plan | **Manual / later phase**: `roadmap/prod-env/07` |