docs(roadmap): update various roadmap docs to align with latest infrastructure setup

- Synchronized swarm initialization, pipeline update, and certificate reloader instructions with the new monolithic stack logic and Ansible roles.
Murat ÖZDEMİR 2026-06-15 16:48:04 +03:00
parent 67dc2986dd
commit 67f4c10c93
10 changed files with 285 additions and 531 deletions


@ -37,12 +37,17 @@ node.labels.type == service ← custom label (app node workload target)
node.labels.role == db ← custom label (DB node workload target)
```
This scheme is applied consistently across `docker-stack-infra.yml` and all 10 microservice `docker-stack-service.yml` files. The test environment uses the same `type=service` label on its single node, so both environments share the same constraint syntax.
This scheme is applied consistently across the current prod stack (`docker-stack-infra_db-prod.yml`), the separate Vault stack (`docker-stack-vault.yml`), and microservice stack definitions. The test environment uses the same `type=service` label on its service node, so both environments share the same constraint syntax.
`node.role == worker` is intentionally not used anywhere. DB nodes are Swarm workers, but targeting them via `node.role == worker` would also match any future worker-only app nodes. The explicit `node.labels.role == db` label provides precise, unambiguous targeting regardless of Swarm role.
## Automation Note
**IMPORTANT:** All of the Swarm initialization, join token, and node labeling processes listed below are no longer performed manually. These operations are carried out **fully automatically** by `Environment_Infrastructure/ansible/prod/prod-bootstrap.yml` and the shared `swarm` role. The manual bash commands here are kept only for reference, information, and troubleshooting purposes.
**IMPORTANT:** All of the Swarm initialization, join token, and node labeling processes listed below are no longer performed manually. These operations are carried out automatically by `Environment_Infrastructure/ansible/prod/prod-bootstrap.yml` and the shared `swarm` role. The manual bash commands here are kept only for reference, information, and troubleshooting purposes.
Labeling happens in two stages:
- The shared `swarm` role adds the `type=service` label to app nodes and the `role=db` label to DB nodes.
- The prod playbook adds the `db-index=01/02/03` label to DB nodes via `iklim-app-01`.
## Step 1 — Init Swarm on iklim-app-01 (the prod-runner node)
@ -98,9 +103,13 @@ docker swarm join --token <WORKER_TOKEN> 10.20.10.11:2377
Then label them on iklim-app-01:
```bash
docker node update --label-add role=db --label-add db-index=01 iklim-db-01
docker node update --label-add role=db --label-add db-index=02 iklim-db-02
docker node update --label-add role=db --label-add db-index=03 iklim-db-03
docker node update --label-add role=db iklim-db-01
docker node update --label-add role=db iklim-db-02
docker node update --label-add role=db iklim-db-03
docker node update --label-add db-index=01 iklim-db-01
docker node update --label-add db-index=02 iklim-db-02
docker node update --label-add db-index=03 iklim-db-03
```
> DB nodes are Swarm **workers** only — they never become managers.
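To confirm that the labels landed as intended, something like the following could be run on a manager node. This is a minimal sketch: `show_node_labels` is a hypothetical helper name, not part of the repository, and it only wraps `docker node inspect`.

```bash
# Hypothetical helper (not in the repo): print the Swarm labels of each node given as an argument.
show_node_labels() {
  for node in "$@"; do
    printf '%s: ' "$node"
    docker node inspect --format '{{ .Spec.Labels }}' "$node"
  done
}

# Usage on a manager node, with the DB hostnames used above:
# show_node_labels iklim-db-01 iklim-db-02 iklim-db-03
```

Each DB node should list both `role` and `db-index`; a node missing either label would not match the corresponding placement constraint.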
@ -130,15 +139,19 @@ The script is idempotent (skips init if already active). Verify:
grep -n "swarm init\|swarm join" init/swarm-init.sh
```
The prod pipeline runs on iklim-app-01 only. iklim-app-02/03 are joined via Ansible (`swarm` role),
not via the Gitea pipeline.
The prod pipeline runs on iklim-app-01 only. iklim-app-02/03 are joined via Ansible (`swarm` role), not via the Gitea pipeline.
## Placement constraints used in `docker-stack-infra.yml`
## Placement Constraints Used in Current Prod Stacks
| Constraint | Resolves to | Services |
|------------|-------------|----------|
| `node.hostname == iklim-app-01` | iklim-app-01 only | SWAG, cert-reloader |
| `node.labels.type == service` | iklim-app-01, iklim-app-02, iklim-app-03 | Vault, Redis, RabbitMQ, APISIX, Prometheus, Grafana, etcd (idle in prod — APISIX uses Patroni etcd) |
| `node.labels.role == db` | iklim-db-01, iklim-db-02, iklim-db-03 | PostgreSQL (Patroni), MongoDB, etcd (via `docker-stack-db.prod.yml`) |
| `node.labels.type == service` | iklim-app-01, iklim-app-02, iklim-app-03 | Vault, Redis, RabbitMQ, APISIX, Prometheus, Grafana, SWAG support services |
| `node.hostname == iklim-db-01/02/03` | specific DB node | Patroni, MongoDB, and etcd services pinned per node in `docker-stack-infra_db-prod.yml` |
| `node.labels.role == db` | iklim-db-01, iklim-db-02, iklim-db-03 | Generic DB node identity; retained for operations and compatibility |
SWAG and cert-reloader are pinned to `iklim-app-01` (the Floating IP node) because SWAG does not support clustering and must match the public entry point. Vault floats across all service nodes; its TLS cert is read from StorageBox (`/mnt/storagebox/ssl`) so it is available on whichever node Vault is scheduled on. Microservices carry no placement constraint and are distributed by the Swarm scheduler across all app nodes. DB services are pinned to DB nodes via separate DB stacks.
SWAG and cert-reloader are pinned to `iklim-app-01` (the Floating IP node) because SWAG must match the public entry point. Vault is deployed by `docker-stack-vault.yml` across service nodes and reads certificates from `/opt/iklimco/ssl`. Microservices are distributed by the Swarm scheduler across app nodes. DB services are defined in `docker-stack-infra_db-prod.yml` and pinned to DB nodes by hostname constraints.
## Historical / Superseded by Setup
Older notes that referred to `docker-stack-infra.yml`, `docker-stack-infra.prod.yml`, or `docker-stack-db.prod.yml` as the active prod deployment model are superseded by `../../setup/08-prod-db-cluster-setup.md` and `../../setup/09-prod-runner-ha-and-swarm.md`.


@ -1,7 +1,7 @@
# 02 — GoDaddy DNS Credentials for SWAG (Prod)
## Context
Identical to test-env-setup/02, except the storagebox path is `prod/` instead of `test/`.
Same credential model as `../test-env/02-godaddy-credentials.md`, except the StorageBox path is `prod/` instead of `test/`.
## ⚠️ Security — Rotate credentials before use
@ -30,24 +30,22 @@ GODADDY_SECRET=<your-new-api-secret>
## Step 2 — Repo template file
Same file as test: `template/swag/dns-conf/godaddy.ini.tpl` (already created in test step 02).
No additional action needed in the repo.
Same file as test: `template/swag/dns-conf/godaddy.ini.tpl` (already created in test step 02). No additional action needed in the repo.
## Step 3 — (Handled by pipeline) Write credentials file on prod host
## Step 3 — (Handled by pipeline) Write credentials file on prod StorageBox path
The deploy pipeline (see `08-deploy-pipeline-update.md`) runs on iklim-app-01:
```bash
set -a; . ./.env; set +a
mkdir -p "$SWAG_CONFIG_DIR/dns-conf"
envsubst < template/swag/dns-conf/godaddy.ini.tpl > "$SWAG_CONFIG_DIR/dns-conf/godaddy.ini"
chmod 600 "$SWAG_CONFIG_DIR/dns-conf/godaddy.ini"
mkdir -p "$SWAG_DNS_CONFIG_DIR"
envsubst < template/swag/dns-conf/godaddy.ini.tpl > "$SWAG_DNS_CONFIG_DIR/godaddy.ini"
chmod 600 "$SWAG_DNS_CONFIG_DIR/godaddy.ini"
```
## Step 4 — GoDaddy A records for prod subdomains (handled by pipeline)
The deploy pipeline's **Update DNS Records** step automatically manages A records via GoDaddy API.
It reads the Floating IP from the Gitea variable `vars.PROD_FLOATING_IP` — set this once in Gitea project settings.
The deploy pipeline's **Update DNS Records** step automatically manages A records via GoDaddy API. It reads the Floating IP from the Gitea variable `vars.PROD_FLOATING_IP` — set this once in Gitea project settings.
To get the Floating IP: `terraform output prod_floating_ip`
@ -64,6 +62,5 @@ Logic: for each record, pipeline queries the current value via GoDaddy API. If a
> If failover is needed, the Floating IP can be reassigned to another app node; DNS does not change.
## Notes
- Test and prod SWAG instances both obtain `*.iklim.co` independently from Let's Encrypt.
There is no conflict — they use the same domain, different servers.
- Test and prod SWAG instances both obtain `*.iklim.co` independently from Let's Encrypt. There is no conflict — they use the same domain, different servers.
- `DNSPROPAGATION=90` handles GoDaddy's typical 30-90s propagation delay.


@ -1,10 +1,12 @@
# 04 — SWAG Nginx Proxy Configs (Prod)
## Context
Same template files as test (`template/swag/site-confs/*.conf.tpl`), different env vars.
The pipeline processes templates with prod-specific subdomain values.
## Required env vars (in `.env` on storagebox `prod/secrets/iklim.co/.env.prod`)
Production uses the same SWAG template files as test, with production subdomain values and StorageBox-backed output directories. The current setup source is `../../setup/09-prod-runner-ha-and-swarm.md`.
## Required Environment Variables
The production env file is `prod/secrets/iklim.co/.env` on StorageBox.
```bash
API_SUBDOMAIN=api.iklim.co
@ -13,65 +15,47 @@ RABBITMQ_SUBDOMAIN=rabbitmq.iklim.co
GRAFANA_SUBDOMAIN=grafana.iklim.co
RESTRICTED_IPS="78.187.87.109/32,95.70.151.248/32"
# SWAG storage paths — StorageBox is mounted on all app nodes, shared filesystem
# cert-reloader writes here; Vault reads from this path on every node — no SSH distribution needed
SWAG_CERT_DIR=/mnt/storagebox/ssl
# SWAG config dirs on StorageBox — all three survive node failover without pipeline re-run
SWAG_CONFIG_DIR=/mnt/storagebox/swag/config
SWAG_DNS_CONFIG_DIR=/mnt/storagebox/swag/dns-conf
SWAG_SITE_CONFS_DIR=/mnt/storagebox/swag/site-confs
SWAG_PROXY_CONFS_DIR=/mnt/storagebox/swag/proxy-confs
```
## Template files (already created in test step 04)
## Template Files
The shared templates live under root `template/swag/`:
- `template/swag/dns-conf/godaddy.ini.tpl`
- `template/swag/site-confs/default.conf`
- `template/swag/site-confs/api.conf.tpl`
- `template/swag/site-confs/apigw.conf.tpl`
- `template/swag/site-confs/rabbitmq.conf.tpl`
- `template/swag/site-confs/grafana.conf.tpl`
No new files to create — the same templates work for both environments.
## Deploy Behavior
## Deploy step (handled by pipeline — see `08-deploy-pipeline-update.md`)
The production workflow renders:
```bash
set -a; . ./.env; set +a
export RESTRICTED_IPS_BLOCK="$(echo "$RESTRICTED_IPS" | tr ',' '\n' | sed 's|.*| allow &;|')"
- GoDaddy DNS credentials into `$SWAG_DNS_CONFIG_DIR/godaddy.ini`.
- SWAG site configs into `$SWAG_SITE_CONFS_DIR`.
- Optional proxy configs into `$SWAG_PROXY_CONFS_DIR` when templates exist.
mkdir -p "$SWAG_SITE_CONFS_DIR"
SWAG_VARS='${API_SUBDOMAIN}${APIGW_SUBDOMAIN}${GRAFANA_SUBDOMAIN}${RABBITMQ_SUBDOMAIN}${RESTRICTED_IPS_BLOCK}'
for tpl in template/swag/site-confs/*.conf.tpl; do
out="$SWAG_SITE_CONFS_DIR/$(basename "${tpl%.tpl}")"
envsubst "$SWAG_VARS" < "$tpl" | sudo tee "$out" > /dev/null
echo "✅ $out"
done
sudo cp template/swag/site-confs/default.conf "$SWAG_SITE_CONFS_DIR/default.conf"
```
With `API_SUBDOMAIN=api.iklim.co`, the output file `$SWAG_SITE_CONFS_DIR/api.conf`
(`/mnt/storagebox/swag/site-confs/api.conf`) will contain `server_name api.iklim.co;` — correct for prod.
Because StorageBox is mounted on the service nodes, files rendered by the runner are visible to SWAG regardless of which service node runs the container.
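To make the role of `RESTRICTED_IPS` concrete, the sketch below shows how a comma-separated CIDR list can be expanded into one nginx `allow` line per entry. It assumes the workflow still performs the expansion the way the earlier inline script did; `restricted_ips_block` is a hypothetical function name used only for illustration.

```bash
# Hypothetical helper: turn a comma-separated CIDR list into nginx "allow" lines.
restricted_ips_block() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's|.*| allow &;|'
}

# Value taken from the required environment variables listed earlier.
RESTRICTED_IPS="78.187.87.109/32,95.70.151.248/32"
restricted_ips_block "$RESTRICTED_IPS"
```

The result is what a site template would receive through `${RESTRICTED_IPS_BLOCK}`, so adding an address only requires editing the env file, not the templates.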
## Verification
After deploy, on iklim-app-01:
```bash
cat /mnt/storagebox/swag/site-confs/api.conf | grep server_name
```
Expected: `server_name api.iklim.co;`
```bash
docker exec $(docker ps -q -f name=iklimco_swag) nginx -t
```
Expected: `syntax is ok`
```bash
docker exec $(docker ps -q -f name=iklimco_swag | head -1) nginx -t
curl -si https://api.iklim.co/health
```
Expected: APISIX response with valid `*.iklim.co` cert.
## Notes
- `Prometheus` is intentionally NOT exposed via SWAG. Access it via Grafana
(internal connection: `http://prometheus:9090`) or SSH tunnel.
- If additional restricted-access subdomains are needed in the future, create a new
`template/swag/site-confs/<name>.conf.tpl` following the same pattern.
Expected:
- `server_name api.iklim.co;`
- Nginx config syntax is valid.
- Public API returns an APISIX response with a valid `*.iklim.co` certificate.
## Historical / Superseded by Setup
The previous `SWAG_CONFIG_DIR=/mnt/storagebox/swag/config` and `.env.prod` references are superseded. Use the split `SWAG_DNS_CONFIG_DIR`, `SWAG_SITE_CONFS_DIR`, and `SWAG_PROXY_CONFS_DIR` variables from the current setup.


@ -1,48 +1,45 @@
# 05 — APISIX: Remove SSL / Configure Trusted Proxy (Prod)
## Context
Identical to `test-env-setup/05-apisix-remove-ssl.md`.
The same `init/apisix-core/init.sh` and custom APISIX image are used for both environments.
Changes made for test already apply to prod.
The same `init/apisix-core/init.sh` and custom APISIX image are used for test and prod. TLS terminates at SWAG; APISIX receives plain HTTP over the `iklimco-net` overlay network.
## Checklist
- [ ] `ssls/1` PUT block removed from `init/apisix-core/init.sh`
- [ ] `dev` SSL block removed or confirmed non-impactful for prod
- [ ] Custom APISIX image (`custom-apisix:3.12.0`) `template/apisix-core/config.yaml.template` contains
`real_ip_header`, `real_ip_recursive`, and `set_real_ip_from` (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`)
- [ ] New image built and pushed to Harbor if config.yaml.template was changed:
```bash
bash ops/push-harbor-custom-images.sh
```
- `ssls/1` PUT block is removed from `init/apisix-core/init.sh`.
- The dev-only SSL block is removed or confirmed to be non-impactful for prod.
- The custom APISIX image includes trusted proxy settings in `template/apisix-core/config.yaml.template`: `real_ip_header`, `real_ip_recursive`, and `set_real_ip_from` for private ranges.
- The custom image is pushed to Harbor when the APISIX config template changes.
## Prod-specific note
## Current Prod Model
APISIX runs with `replicas: 3` in prod — this value is defined in the `docker-stack-infra.prod.yml` overlay (not in the base `docker-stack-infra.yml`). All replicas read the same configuration from Patroni etcd (`/apisix` prefix) — a single `init` run is sufficient.
APISIX runs with 3 replicas in `docker-stack-infra_db-prod.yml`. All replicas read configuration from the shared DB-node etcd cluster with the `/apisix` prefix, so the pipeline runs `init/apisix-core/init.sh` once.
Production deployment uses:
```bash
# Prod deploy:
docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
docker stack deploy --with-registry-auth -c docker-stack-infra_db-prod.yml iklimco
```
`init/apisix-core/init.sh` is run once by the pipeline and writes the etcd state that all APISIX instances read.
## SWAG to APISIX Load Distribution
## SWAG → APISIX load distribution
SWAG connects to APISIX through the service name:
SWAG connects to APISIX via `proxy_pass http://apisix:9080;` — using the service name directly.
No additional upstream or load balancer configuration is needed on the SWAG side.
```nginx
proxy_pass http://apisix:9080;
```
**How it works:** Docker Swarm resolves the `apisix` service name to a VIP (Virtual IP).
Swarm's internal IPVS load balancer automatically distributes incoming connections across the 3 replicas
in round-robin. SWAG is unaware of this mechanism; it happens transparently at the overlay network layer.
Docker Swarm resolves `apisix` to the service VIP and distributes requests across APISIX replicas. SWAG does not need a separate upstream list for APISIX.
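One way to see the VIP that the `apisix` name resolves to is to inspect the service endpoint on a manager node. The helper name `service_vips` below is hypothetical; the service name `iklimco_apisix` is assumed from the stack name used in this document.

```bash
# Hypothetical helper: print the virtual IPs Swarm assigned to a service, as JSON.
service_vips() {
  docker service inspect --format '{{ json .Endpoint.VirtualIPs }}' "$1"
}

# Usage on a manager node:
# service_vips iklimco_apisix
```

The output lists one address per attached network; the entry on `iklimco-net` is the address SWAG reaches when it proxies to `apisix:9080`.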
## Verification
```bash
# From a whitelisted IP, make a request and check real IP in APISIX logs
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
tail -5 /usr/local/apisix/logs/access.log
```
Client IP should appear in the log, not SWAG's internal overlay IP.
## Historical / Superseded by Setup
The old prod overlay command `docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco` is superseded by `docker-stack-infra_db-prod.yml`.


@ -1,61 +1,54 @@
# 06 — cert-reloader Sidecar Service (Prod)
# 06 — Certificate Renewal and Vault Reload Flow (Prod)
## Context
Service definition is identical to test (see `test-env-setup/06-cert-reloader.md`).
In prod, Vault runs as a 3-node Raft cluster; cert distribution is handled via the StorageBox shared mount — no SSH required.
## Prod flow (3-node Vault Raft)
The production certificate flow is implemented in the current infra stack and setup runbooks. See `../../setup/09-prod-runner-ha-and-swarm.md`.
```
SWAG renews cert → writes to SWAG_CONFIG_DIR (/mnt/storagebox/swag/config)
cert-reloader detects MD5 change
→ copies to /mnt/storagebox/ssl/ (shared across all app nodes)
→ docker service update --force iklimco_vault
Vault (3 replicas) restarts
→ each instance has /mnt/storagebox/ssl/ mounted → reads the new cert
→ healthcheck checks sealed status every 30 seconds
→ if sealed: reads vault_unseal_key Docker secret and auto-unseals
## Current Flow
```text
SWAG renews the certificate inside its persistent config volume
cert-reloader detects the MD5 change
-> copies STAR.iklim.co.full.crt and STAR.iklim.co_key.pem to /mnt/storagebox/ssl
cert-distributor syncs those files to /opt/iklimco/ssl on service nodes
-> forces iklimco_vault to restart
Vault reads /opt/iklimco/ssl through /vault/certs
Vault entrypoint retry-unseal loop reads vault_unseal_key and unseals each replica
```
No SSH distribution, additional secrets, or cert-reloader script changes are needed.
No SSH certificate distribution is required in prod.
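The "detects the MD5 change" step can be pictured with the sketch below. It is not the actual cert-reloader script: `cert_changed` and the paths in the usage comment are assumptions made for illustration, and the only technique shown is comparing MD5 digests of the source and the previously copied file.

```bash
# Hypothetical sketch: succeed (exit 0) when the source file differs from the copy.
cert_changed() {
  src="$1"; dst="$2"
  [ -f "$dst" ] || return 0                      # no previous copy: treat as changed
  src_sum=$(md5sum < "$src" | cut -d' ' -f1)
  dst_sum=$(md5sum < "$dst" | cut -d' ' -f1)
  [ "$src_sum" != "$dst_sum" ]
}

# Usage (paths are placeholders):
# if cert_changed /config/fullchain.pem /mnt/storagebox/ssl/STAR.iklim.co.full.crt; then
#   echo "certificate changed, copying"
# fi
```

Comparing digests rather than timestamps avoids a copy and a Vault restart when SWAG rewrites an identical file.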
## Auto-unseal mechanism
## Vault Unseal Model
The Vault healthcheck is already implemented in `docker-stack-infra.yml`:
Vault auto-unseal is not implemented as the old Docker healthcheck snippet in the prod roadmap anymore. The current `docker-stack-vault.yml` and Vault entrypoint logic handle retry-unseal with the `vault_unseal_key` Docker secret.
```yaml
healthcheck:
test:
- "CMD"
- "sh"
- "-c"
- >-
vault status -format=json 2>/dev/null | grep -q '"sealed":false' ||
vault operator unseal $$(cat /run/secrets/vault_unseal_key 2>/dev/null)
interval: 30s
timeout: 10s
start_period: 15s
retries: 5
```
Each Vault container runs its own healthcheck independently — all 3 replicas unseal separately.
The cert renewal → restart → auto-unseal chain requires no manual intervention.
The `vault_unseal_key` secret is created/rotated by `init/vault/vault-bootstrap.sh` during bootstrap.
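For manual troubleshooting, the sealed state can be checked with a small filter over `vault status -format=json`. This is a sketch under assumptions: `is_unsealed` is a hypothetical helper, and the pattern deliberately tolerates whitespace after the colon in case the JSON is pretty-printed.

```bash
# Hypothetical helper: succeed when the JSON on stdin reports an unsealed Vault.
# The pattern accepts optional whitespace between the colon and the value.
is_unsealed() {
  grep -Eq '"sealed":[[:space:]]*false'
}

# Usage inside a Vault container:
# vault status -format=json | is_unsealed && echo "unsealed"
```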
## Verification
```bash
docker service ps iklimco_cert-reloader
docker service ps iklimco_cert-distributor
docker service logs iklimco_cert-reloader --tail 20
docker service ps iklimco_vault
```
Expected: `[cert-reloader] started`, no error lines.
Expected:
- `cert-reloader` is running.
- `cert-distributor` is running.
- Vault service restarts cleanly after certificate renewal.
- Vault remains unsealed.
Confirm Vault sees the current certificate:
Confirm Vault cert is current after SWAG renewal:
```bash
# Check cert expiry on Vault's TLS endpoint from inside the overlay
docker exec $(docker ps -q -f name=iklimco_vault) \
sh -c 'echo | openssl s_client -connect vault.iklim.co:8200 2>/dev/null \
| openssl x509 -noout -dates'
docker exec $(docker ps -q -f name=iklimco_vault | head -1) \
sh -c 'echo | openssl s_client -connect vault.iklim.co:8200 2>/dev/null | openssl x509 -noout -dates'
```
`notAfter` should match the cert in `/mnt/storagebox/ssl/STAR.iklim.co.full.crt`.
`notAfter` should match the certificate distributed through `/opt/iklimco/ssl`.
## Historical / Superseded by Setup
The earlier plan that said “service definition is identical to test” and relied on a Vault healthcheck command is superseded. Prod now has a separate Vault stack, cert-distributor, and retry-unseal behavior.


@ -1,315 +1,96 @@
# 08 — Deploy Pipeline Update (Prod)
# 08 — Production Deploy Pipeline Model
## Context
- **File:** `.gitea/workflows/deploy-prod.yml`
- Same changes as test pipeline (`test-env-setup/07-deploy-pipeline-update.md`),
adapted for prod paths and prod runner.
- **Prod-specific differences from test:**
- `SPRING_PROFILES_ACTIVE=prod` (not `test`) in Run APISIX Init
- DB hostnames: `postgresql`, `mongodb` (Swarm overlay DNS — same as test)
- Storagebox paths via env vars (`SWAG_CERT_DIR`, `SWAG_CONFIG_DIR`, etc.) instead of local host paths
- Extra steps: Update DNS Records (GoDaddy API), Wait for etcd
## Step 1 — Remove manual cert scp lines from `Initialize Workspace`
The production deploy pipeline is no longer a pending set of step additions. The current source of truth is the root `.gitea/workflows/deploy-prod.yml`, with the operational explanation in `../../setup/09-prod-runner-ha-and-swarm.md` and root `prod_env-ci_dc-pipeline.md`.
```yaml
# DELETE from "Initialize Servers" step:
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co.full.crt ./STAR.iklim.co.full.crt
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co_key.pem ./STAR.iklim.co_key.pem
```
## Current Pipeline Order
Also remove from `Prepare Init Files`:
```yaml
# DELETE or make conditional:
sudo cp STAR.iklim.co.full.crt STAR.iklim.co_key.pem /opt/iklimco/ssl/
```
The current root production workflow runs in this order:
## Step 2 — Add `Update DNS Records` step
| # | Step | Note |
| --- | --- | --- |
| 1 | Checkout Branch | |
| 2 | Prepare Folders | |
| 3 | Set up SSH Key and Add to known_hosts | |
| 4 | Update Apt Repository and Install Required Tools | `gettext tree jq`; `jq` is required for the GoDaddy DNS API |
| 5 | Fetch Prod Env From Storagebox | Fetch `.env` and `.env.secrets.shared` |
| 6 | Fetch Service Secret Files | Fetch `.env.secrets.<svc>` and `.env.secrets.swag` |
| 7 | Prepare Database Init Files | Render PostgreSQL/MongoDB init templates |
| 8 | Docker Login to Harbor | |
| 9 | Prepare SWAG Directories | Render `dns-conf` and `site-confs`; reload node-local SWAG if present |
| 10 | Bootstrap Vault TLS Placeholder | Creates a temporary cert only if missing |
| 11 | Create Infrastructure Docker Secrets | Creates `rabbitmq_erlang_cookie` if missing |
| 12 | Deploy Swarm Stacks | Deploys `docker-stack-infra_db-prod.yml` |
| 13 | Connect Runner to Overlay Network | Connects the job container to `iklimco-net` |
| 14 | Initialize Production Infrastructure | Runs `init-infra-prod.sh`; this triggers Vault bootstrap and RabbitMQ setup |
| 15 | Wait for Infrastructure Services | Waits for `iklimco_vault` and `iklimco_rabbitmq` |
| 16 | Provision Vault AppRole IDs and Docker Secrets | Downloads service `vault-files`, runs `init/provision-all-services.sh` |
| 17 | Upload Updated Secrets to Storagebox | Uploads `.env.secrets.*` and `.env` |
| 18 | Wait for etcd | Waits for etcd health |
| 19 | Run APISIX Init | `SPRING_PROFILES_ACTIVE=prod` |
| 20 | Bootstrap SWAG Certificate | Waits for SWAG and cert-reloader output in `SWAG_CERT_DIR` |
| 21 | Initialize MongoDB Replica Set | Runs `rs.initiate()` or missing-member `rs.add()` |
| 22 | Run Database Init Scripts | Patroni primary + MongoDB replica set; SQL and JS init |
| 23 | Update DNS Records | GoDaddy API; `api`, `apigw`, `rabbitmq`, and `grafana` A records |
| 24 | Review Environment | |
Insert **after** `Docker Login to Harbor` and **before** `Prepare SWAG Directories`.
All production deploy workflows must share `concurrency.group: prod-deploy` so infra and microservice deploys cannot overlap.
```yaml
- name: Update DNS Records
run: |
set -a; . ./.env; . ./.env.secrets.swag; set +a
FLOATING_IP="${{ vars.PROD_FLOATING_IP }}"
DOMAIN="iklim.co"
## Current Environment Files
for record in api apigw rabbitmq grafana; do
CURRENT=$(curl -s \
-H "Authorization: sso-key ${GODADDY_KEY}:${GODADDY_SECRET}" \
"https://api.godaddy.com/v1/domains/${DOMAIN}/records/A/${record}" \
2>/dev/null | jq -r '.[0].data // empty' 2>/dev/null || true)
The production StorageBox env file is `prod/secrets/iklim.co/.env`. The old `.env.prod` name is superseded.
if [ "$CURRENT" = "$FLOATING_IP" ]; then
echo "✅ ${record}.${DOMAIN} → ${FLOATING_IP} (exists, skipping)"
else
curl -sf -X PUT \
-H "Authorization: sso-key ${GODADDY_KEY}:${GODADDY_SECRET}" \
-H "Content-Type: application/json" \
"https://api.godaddy.com/v1/domains/${DOMAIN}/records/A/${record}" \
-d "[{\"data\":\"${FLOATING_IP}\",\"ttl\":600}]"
echo "✅ ${record}.${DOMAIN} → ${FLOATING_IP} (added/updated)"
fi
done
working-directory: /workspace/iklim.co
```
> `GODADDY_KEY` and `GODADDY_SECRET` are read from `.env.secrets.swag`.
> `PROD_FLOATING_IP` must be defined as a Gitea project variable (`terraform output prod_floating_ip`).
> `jq` is required — it must have been added to the `Update Apt Repository` step: `apt-get install -y gettext tree jq`.
> Runs on every deploy; existing and correct records are skipped (idempotent).
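The compare-then-PUT logic of this step can be isolated into two small functions, which makes the idempotency easier to reason about. Both names are hypothetical, and `203.0.113.10` is a documentation-range placeholder rather than the real Floating IP.

```bash
# Hypothetical helpers mirroring the step's decision and request body.
dns_action() {
  # $1 = value currently returned by the API (may be empty), $2 = desired Floating IP
  if [ "$1" = "$2" ]; then echo "skip"; else echo "update"; fi
}

a_record_payload() {
  # $1 = IP address, $2 = TTL in seconds
  printf '[{"data":"%s","ttl":%s}]\n' "$1" "$2"
}

dns_action "203.0.113.10" "203.0.113.10"   # record already correct
dns_action "" "203.0.113.10"               # record missing
a_record_payload "203.0.113.10" 600
```

An empty current value (record missing or API error) falls into the update branch, which matches the `// empty` fallback in the `jq` expression above.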
## Step 3 — Add `Prepare SWAG Directories` step
Insert **before** `Bootstrap Vault TLS Placeholder`:
```yaml
- name: Prepare SWAG Directories
run: |
set -a; . ./.env; . ./.env.secrets.swag; set +a
mkdir -p "$SWAG_CONFIG_DIR/dns-conf" "$SWAG_SITE_CONFS_DIR"
envsubst < template/swag/dns-conf/godaddy.ini.tpl | docker run --rm -i \
-v "${SWAG_CONFIG_DIR}/dns-conf:/output" \
alpine sh -c "cat > /output/godaddy.ini && chmod 600 /output/godaddy.ini"
echo "✅ godaddy.ini written"
export RESTRICTED_IPS_BLOCK="$(echo "$RESTRICTED_IPS" | tr ',' '\n' | sed 's|.*| allow &;|')"
SWAG_VARS='${API_SUBDOMAIN}${APIGW_SUBDOMAIN}${GRAFANA_SUBDOMAIN}${RABBITMQ_SUBDOMAIN}${RESTRICTED_IPS_BLOCK}'
for tpl in template/swag/site-confs/*.conf.tpl; do
fname=$(basename "${tpl%.tpl}")
envsubst "$SWAG_VARS" < "$tpl" | docker run --rm -i \
-v "${SWAG_SITE_CONFS_DIR}:/output" \
alpine sh -c "cat > /output/${fname}"
echo "✅ ${fname}"
done
cat template/swag/site-confs/default.conf | docker run --rm -i \
-v "${SWAG_SITE_CONFS_DIR}:/output" \
alpine sh -c "cat > /output/default.conf"
echo "✅ SWAG directories ready"
SWAG_CTR=$(docker ps -q -f name=iklimco_swag 2>/dev/null | head -1)
if [ -n "$SWAG_CTR" ]; then
docker exec "$SWAG_CTR" nginx -t && docker exec "$SWAG_CTR" nginx -s reload
echo "✅ SWAG nginx reloaded"
fi
working-directory: /workspace/iklim.co
```
> `.env` is sourced first so `API_SUBDOMAIN=api.iklim.co` (prod values) are used.
> Ensure these vars are in `prod/secrets/iklim.co/.env.prod` on storagebox.
## Step 4 — Add `Wait for etcd` step
Insert **after** `Deploy Swarm Stack` and **before** `Run APISIX Init`.
APISIX reads its entire configuration from etcd; init script will fail silently if etcd is not ready.
```yaml
- name: Wait for etcd
run: |
echo "⏳ Waiting for Patroni etcd..."
for i in $(seq 1 30); do
if docker run --rm --network iklimco-net alpine \
sh -c "wget -qO- http://etcd:2379/health 2>/dev/null | grep -q '\"health\":\"true\"'"; then
echo "✅ Patroni etcd ready"
break
fi
[ "$i" -eq 30 ] && echo "❌ Patroni etcd did not become ready in time" && exit 1
echo " attempt $i/30 — waiting 5s..."
sleep 5
done
```
> **Note:** In prod, APISIX uses the 3-node Patroni etcd cluster on DB nodes (`etcd/02/03:2379`) via the `/apisix` prefix — resolved through `iklimco-net` overlay DNS aliases defined in `docker-stack-db.prod.yml`. The standalone `etcd` service from the base stack is disabled (`replicas: 0` in the prod overlay) and removed from the service list by a post-deploy step. This step waits for Patroni etcd (`etcd:2379`) to be healthy before running the APISIX init script.
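The bounded polling pattern used by this step recurs in several other steps (APISIX, SWAG, databases). A reusable form might look like the sketch below; `wait_for` and `etcd_healthy` are hypothetical names and do not exist in the repository.

```bash
# Hypothetical helper: run a command up to N times with a fixed delay between tries.
# Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "  attempt $i/$attempts failed" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage, assuming etcd_healthy wraps the docker run health probe from the step:
# wait_for 30 5 etcd_healthy || { echo "etcd did not become ready in time"; exit 1; }
```

Unlike an unbounded `until` loop, this form always terminates, so a stuck dependency fails the job instead of hanging the runner.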
## Step 5 — Add `Run APISIX Init` step
Insert **after** `Wait for etcd` and **before** `Bootstrap SWAG Certificate`.
```yaml
- name: Run APISIX Init
run: |
set -a; . ./.env; . ./.env.secrets.shared; set +a
echo "⏳ Waiting for Swarm APISIX..."
until curl -sf -o /dev/null \
-H "X-API-KEY: ${APISIX_ADMIN_KEY}" \
"http://apisix:9180/apisix/admin/upstreams" 2>/dev/null; do
sleep 5
done
export SPRING_PROFILES_ACTIVE=prod
/bin/bash init/apisix-core/init.sh
echo "✅ APISIX routes configured"
working-directory: /workspace/iklim.co
```
> **Prod-specific:** `SPRING_PROFILES_ACTIVE=prod` — test pipeline uses `test`.
> `APISIX_ADMIN_KEY` is sourced from `.env.secrets.shared`.
> The init script is idempotent (PUT semantics); safe to re-run on subsequent deploys.
> With `replicas: 3` in prod, all APISIX instances read the same etcd state — no per-replica init needed.
## Step 6 — Add `Bootstrap SWAG Certificate` step
Insert **after** `Run APISIX Init`:
```yaml
- name: Bootstrap SWAG Certificate
run: |
set -a; . ./.env; set +a
echo "Waiting for SWAG container to start..."
SWAG_CTR=""
for i in $(seq 1 24); do
SWAG_CTR=$(docker ps -q -f name=iklimco_swag 2>/dev/null | head -1)
[ -n "$SWAG_CTR" ] && break
sleep 10
done
if [ -z "$SWAG_CTR" ]; then
echo "❌ SWAG container did not start"
exit 1
fi
CERT_PATH="/config/etc/letsencrypt/live/iklim.co/fullchain.pem"
echo "Waiting for cert (up to 10 min)..."
for i in $(seq 1 20); do
if docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
echo "✅ Cert obtained"
break
fi
echo " attempt $i/20 — waiting 30s..."
sleep 30
done
if ! docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
echo "❌ SWAG did not obtain cert. Logs:"
docker service logs iklimco_swag --tail 50
exit 1
fi
docker exec "$SWAG_CTR" cat "$CERT_PATH" | \
docker run --rm -i -v "${SWAG_CERT_DIR}:/output" alpine \
sh -c "cat > /output/STAR.iklim.co.full.crt && chmod 644 /output/STAR.iklim.co.full.crt"
docker exec "$SWAG_CTR" cat "/config/etc/letsencrypt/live/iklim.co/privkey.pem" | \
docker run --rm -i -v "${SWAG_CERT_DIR}:/output" alpine \
sh -c "cat > /output/STAR.iklim.co_key.pem && chmod 644 /output/STAR.iklim.co_key.pem"
echo "✅ Cert bootstrapped to ${SWAG_CERT_DIR}/"
working-directory: /workspace/iklim.co
```
## Step 7 — Add `Run Database Init Scripts` step
Insert **after** `Bootstrap SWAG Certificate` and **before** `Review Environment`.
```yaml
- name: Run Database Init Scripts
run: |
set -a; . ./.env; . ./.env.secrets.shared; set +a
echo "⏳ Waiting for PostgreSQL..."
until docker run --rm --network iklimco-net \
-e PGPASSWORD="${DATABASE_POSTGRES_ROOT_PASSWD}" \
postgis/postgis:18-3.6 \
pg_isready -h postgresql -U "${DATABASE_POSTGRES_ROOT_USER}" -q 2>/dev/null; do
sleep 5
done
for sql_file in $(ls ./init/postgresql/*.sql 2>/dev/null | sort); do
echo "▶ $(basename "$sql_file")"
docker run --rm -i --network iklimco-net \
-e PGPASSWORD="${DATABASE_POSTGRES_ROOT_PASSWD}" \
postgis/postgis:18-3.6 \
psql -h postgresql -U "${DATABASE_POSTGRES_ROOT_USER}" < "$sql_file"
done
echo "⏳ Waiting for MongoDB..."
until docker run --rm --network iklimco-net mongo:8.3.2 \
mongosh "mongodb://${DATABASE_MONGODB_ROOT_USER}:${DATABASE_MONGODB_ROOT_PASSWD}@mongodb/admin" \
--eval "db.runCommand({ping:1})" --quiet 2>/dev/null; do
sleep 5
done
for js_file in $(ls ./init/mongodb/*.js 2>/dev/null | sort); do
echo "▶ $(basename "$js_file")"
docker run --rm -i --network iklimco-net mongo:8.3.2 \
mongosh "mongodb://${DATABASE_MONGODB_ROOT_USER}:${DATABASE_MONGODB_ROOT_PASSWD}@mongodb/admin" \
--quiet < "$js_file"
done
echo "✅ Database init scripts completed"
working-directory: /workspace/iklim.co
```
> **Prod-specific:** DB hostnames are `postgresql` and `mongodb` (Swarm VIP service names).
> Test pipeline uses `postgresql` / `mongodb` (unqualified aliases within the same stack).
> SQL and JS files are generated by `Prepare Init Files` step via `init_postgresql` / `init_mongodb` functions in `common-functions-prod.sh`.
> Step is idempotent — scripts use `CREATE IF NOT EXISTS` / `createCollection` semantics.
## Step 8 — Microservice prod deploy overlay
Each microservice has its own `docker-stack-service.prod.yml` overlay file. This file contains prod-specific `replicas: 3` and `max_replicas_per_node: 1` settings.
In microservice deploy pipelines (`deploy-prod.yml`), the `docker stack deploy` command should be:
Current SWAG-related variables include:
```bash
docker stack deploy \
-c BE-<ServiceName>/docker-stack-service.yml \
-c BE-<ServiceName>/docker-stack-service.prod.yml \
iklimco
SWAG_CERT_DIR=/mnt/storagebox/ssl
SWAG_DNS_CONFIG_DIR=/mnt/storagebox/swag/dns-conf
SWAG_SITE_CONFS_DIR=/mnt/storagebox/swag/site-confs
SWAG_PROXY_CONFS_DIR=/mnt/storagebox/swag/proxy-confs
```
For example, for `BE-Authentication`:
## Current Stack Deployment
The pipeline deploys the current production infra/DB stack:
```bash
docker stack deploy \
-c BE-Authentication/docker-stack-service.yml \
-c BE-Authentication/docker-stack-service.prod.yml \
iklimco
docker stack deploy --with-registry-auth -c docker-stack-infra_db-prod.yml iklimco
```
> When a new microservice is added, `BE-<ServiceName>/docker-stack-service.prod.yml` must be created and the pipeline must include this overlay.
## Step 9 — Ensure subdomain env vars are in prod `.env`
Add to `prod/secrets/iklim.co/.env.prod` on storagebox:
Vault is not part of that stack. Vault is deployed and bootstrapped by `init/vault/vault-bootstrap.sh` through `init-infra-prod.sh` using:
```bash
API_SUBDOMAIN=api.iklim.co
APIGW_SUBDOMAIN=apigw.iklim.co
RABBITMQ_SUBDOMAIN=rabbitmq.iklim.co
GRAFANA_SUBDOMAIN=grafana.iklim.co
docker stack deploy --with-registry-auth -c docker-stack-vault.yml iklimco
```
## Step 10 — Final step order for prod pipeline
## Database Initialization
To prevent concurrent deploys, a Gitea Actions `concurrency` block is added per pipeline:
MongoDB replica set initialization is a dedicated workflow step. It runs `rs.initiate()` when the replica set is uninitialized and `rs.add()` when members from `DATABASE_MONGODB_HOST` are missing.
```yaml
concurrency:
group: prod-deploy
cancel-in-progress: false
```
Database init scripts run after Patroni primary and MongoDB replica set readiness. PostgreSQL uses the multi-host Patroni connection with `target_session_attrs=read-write`; MongoDB uses the replica set host list from `DATABASE_MONGODB_HOST`.
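As a rough illustration of the multi-host PostgreSQL connection mentioned above, the sketch below builds a libpq URI with `target_session_attrs=read-write`, so the client only settles on the current Patroni primary. The function name `build_pg_uri` and the example host names are assumptions for illustration; they are not taken from the actual workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the real pipeline): build a multi-host
# libpq URI that only accepts a read-write (primary) server.
build_pg_uri() {
  local hosts="$1" port="$2" user="$3" db="$4"
  local IFS=',' host_list="" host
  # Split the comma-separated host list and append the port to each entry.
  for host in $hosts; do
    host_list+="${host_list:+,}${host}:${port}"
  done
  printf 'postgresql://%s@%s/%s?target_session_attrs=read-write\n' \
    "$user" "$host_list" "$db"
}

# Example (assumed host names; the password is supplied through PGPASSWORD):
#   psql "$(build_pg_uri 'patroni-01,patroni-02,patroni-03' 5432 "$DATABASE_POSTGRES_ROOT_USER" postgres)"
```

With this form, libpq tries each host in turn and skips replicas, so the init scripts do not need to know which node currently holds the primary role.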
With `cancel-in-progress: false`, a new run waits in the queue until the previous one finishes; Gitea UI shows it as "queued" and does not return an error.
## Microservice Deploy Model
1. Checkout Branch
2. Prepare Folders
3. Set up SSH Key and Add to known_hosts
4. Update Apt Repository and Install Required Tools (`gettext tree jq`)
5. Fetch Service Secret Files
6. Initialize Workspace ← cert scp lines removed
7. Upload Updated Secrets to Storagebox
8. Provision Vault AppRole IDs and Docker Secrets
9. Upload Updated Env to Storagebox
10. Prepare Init Files ← cert copy lines removed
11. Initialize Docker Swarm
12. Docker Login to Harbor
13. **Update DNS Records** ← NEW (GoDaddy API, idempotent)
14. **Prepare SWAG Directories** ← NEW (`$SWAG_CONFIG_DIR/dns-conf`; renders nginx conf templates)
15. Bootstrap Vault TLS Placeholder
16. Deploy Swarm Stack
17. **Wait for etcd** ← NEW (Patroni etcd `etcd:2379` overlay DNS)
18. **Run APISIX Init** ← NEW (`SPRING_PROFILES_ACTIVE=prod`)
19. **Bootstrap SWAG Certificate** ← NEW
20. **Run Database Init Scripts** ← NEW (`postgresql`, `mongodb`)
21. Review Environment
Prod microservice workflows do not use a separate `docker-stack-service.prod.yml` overlay anymore.
The current model is:
- read `deploy/prod.env`;
- promote the tested Harbor digest to the stable prod tag;
- call `swarm_service_update` with `deploy/docker-stack-service.yml`;
- use `docker service update` with `--update-order start-first` and rollback behavior for existing services.
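The update step in the list above might look roughly like the following sketch. It is not the real `swarm_service_update` helper, whose implementation is not shown here; the function name `update_service_image`, the `DRY_RUN` switch, and the example image reference are assumptions for illustration. Only standard `docker service update` flags are used.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (the real swarm_service_update helper may differ):
# roll an existing service to a new image, starting the new task before
# stopping the old one and rolling back if the update fails.
update_service_image() {
  local service="$1" image="$2"
  local cmd=(docker service update
    --with-registry-auth
    --update-order start-first
    --update-failure-action rollback
    --image "$image"
    "$service")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example (assumed service and image names):
#   DRY_RUN=1 update_service_image iklimco_authentication harbor.example/iklimco/authentication:prod
```

`start-first` keeps the old task serving traffic until the new one is running, which is why it pairs naturally with an automatic rollback on failure.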
## Historical / Superseded by Setup
The following earlier plan items are superseded:
- Removing cert `scp` lines from an `Initialize Workspace` step as a live action; those lines are already gone.
- Creating prod deploy steps around `docker-stack-infra.yml` + `docker-stack-infra.prod.yml`.
- Waiting for a legacy `etcd:2379` service from a base stack.
- Using `docker-stack-db.prod.yml` as the DB stack reference.
- Writing SWAG DNS files through `SWAG_CONFIG_DIR/dns-conf`.
- Storing prod env in `prod/secrets/iklim.co/.env.prod`.
- Deploying microservices with `docker-stack-service.yml` plus `docker-stack-service.prod.yml`.
Keep this file as a roadmap summary. For exact commands, use the root workflow and `../../setup/09-prod-runner-ha-and-swarm.md`.

@ -1,147 +1,158 @@
# 09 — Verification Checklist (Prod)
## Context
Run after a successful prod pipeline deployment.
## 1 — Swarm cluster health
Run these checks after a successful production pipeline deployment. The current setup source is `../../setup/09-prod-runner-ha-and-swarm.md`.
## 1 — Swarm Cluster Health
```bash
docker node ls
```
Expected: 3 managers (`Leader` + 2 `Reachable`) for `iklim-app-01/02/03`, 3 workers (`Ready`) for `iklim-db-01/02/03`.
Expected: 3 managers (`Leader` + 2 `Reachable`) for `iklim-app-01/02/03`, and 3 workers (`Ready`) for `iklim-db-01/02/03`.
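The node-count expectation can also be checked mechanically. The sketch below is one possible way to do it; the function name `check_node_counts` and the `|`-separated format string are assumptions, not part of the existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical check: read "hostname|status|manager-status" lines on stdin
# and succeed only when 3 ready managers and 3 ready workers are present.
check_node_counts() {
  local managers=0 workers=0 host status manager_status
  while IFS='|' read -r host status manager_status; do
    # Nodes that are not Ready count as neither manager nor worker.
    [ "$status" = "Ready" ] || continue
    case "$manager_status" in
      Leader|Reachable) managers=$((managers + 1)) ;;
      "") workers=$((workers + 1)) ;;
    esac
  done
  printf 'managers=%d workers=%d\n' "$managers" "$workers"
  [ "$managers" -eq 3 ] && [ "$workers" -eq 3 ]
}

# Example:
#   docker node ls --format '{{.Hostname}}|{{.Status}}|{{.ManagerStatus}}' | check_node_counts
```

A non-zero exit status means the cluster does not match the expected 3 + 3 layout.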
```bash
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
```
Expected: app nodes have `type=service`; DB nodes have `role=db` and `db-index=01/02/03`.
## 2 — Infra, DB, and Vault Services
```bash
docker service ls --filter label=project=co.iklim
docker service ps iklimco_vault
docker service ps iklimco_rabbitmq
docker service ps iklimco_apisix
```
All services show `REPLICAS X/X` (target met).
## 2 — Precipitation image directory exists
Expected: all current services show their desired replica counts.
Vault is deployed by `docker-stack-vault.yml`; the main infra and DB services are deployed by `docker-stack-infra_db-prod.yml`.
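To spot services whose replica target is not met without reading the table by eye, a small filter such as the following could be used. The function name `find_unmet_replicas` is an assumption for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical filter: read "name replicas" lines on stdin (for example
# "iklimco_apisix 3/3") and print only services where running != desired.
find_unmet_replicas() {
  local name replicas _rest running desired
  while read -r name replicas _rest; do
    running="${replicas%%/*}"   # part before the slash
    desired="${replicas##*/}"   # part after the slash
    if [ "$running" != "$desired" ]; then
      printf '%s %s\n' "$name" "$replicas"
    fi
  done
}

# Example:
#   docker service ls --filter label=project=co.iklim \
#     --format '{{.Name}} {{.Replicas}}' | find_unmet_replicas
```

Empty output means every listed service has reached its desired replica count.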
## 3 — DB Node Placement
```bash
ls -ld /mnt/storagebox/precipitation/images
docker service ps iklimco_patroni-01
docker service ps iklimco_patroni-02
docker service ps iklimco_patroni-03
docker service ps iklimco_mongodb-01
docker service ps iklimco_mongodb-02
docker service ps iklimco_mongodb-03
docker service ps iklimco_etcd-01
docker service ps iklimco_etcd-02
docker service ps iklimco_etcd-03
```
Expected: directory exists. This must be created before `iklimco_precipitation-service` is deployed.
Expected: tasks run on their matching `iklim-db-0X` hostnames according to the stack placement constraints.
## 4 — Service-Node Infrastructure Placement
```bash
docker volume inspect iklimco_image-data
docker service ps iklimco_redis
docker service ps iklimco_redis-sentinel
docker service ps iklimco_rabbitmq
docker service ps iklimco_swag
docker service ps iklimco_cert-reloader
docker service ps iklimco_cert-distributor
```
Expected: `Options.device` is `/mnt/storagebox/precipitation/images`.
Expected: Redis, Sentinel, RabbitMQ, SWAG, and cert services run on app/service nodes, not DB nodes.
## 3 — SWAG cert is valid
## 5 — SWAG Certificate Is Valid
```bash
docker exec $(docker ps -q -f name=iklimco_swag) certbot certificates
docker exec $(docker ps -q -f name=iklimco_swag | head -1) certbot certificates
```
Expected: `*.iklim.co`, `VALID: XX days` (Let's Encrypt, not the old manual cert).
Expected: certificate for `*.iklim.co`, valid and issued by Let's Encrypt.
TLS check from outside:
```bash
echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
| openssl x509 -noout -subject -dates
```
Expected: `CN=*.iklim.co`, `notAfter` > 2026-07-15 (cert is Let's Encrypt, not expiring old one).
## 4 — Public API
Expected: `CN=*.iklim.co` and a current `notAfter` date.
## 6 — Public API and Restricted Subdomains
```bash
curl -si https://api.iklim.co/health
```
HTTP 2xx, no TLS errors.
## 5 — IP restriction working
Expected: HTTP 2xx or an APISIX response, with no TLS error.
From a non-whitelisted IP:
```bash
curl -si https://grafana.iklim.co
curl -si https://apigw.iklim.co
curl -si https://rabbitmq.iklim.co
```
All expected: HTTP 403.
From whitelisted IP (78.187.87.109 or 95.70.151.248):
Expected: HTTP 403.
From a whitelisted IP:
```bash
curl -si https://grafana.iklim.co # HTTP 200 Grafana
curl -si https://apigw.iklim.co # HTTP 200 APISIX Dashboard
curl -si https://rabbitmq.iklim.co # HTTP 200 RabbitMQ Management
curl -si https://grafana.iklim.co
curl -si https://apigw.iklim.co
curl -si https://rabbitmq.iklim.co
```
## 6 — Vault not reachable externally
Expected: HTTP 200 or the expected login/management page.
## 7 — Vault Is Not Publicly Reachable
From outside:
```bash
# From outside — must fail
curl -sk --connect-timeout 5 https://<iklim-app-01-public-ip>:8200/v1/sys/health
# Expected: connection refused or timeout
```
Expected: connection refused or timeout.
From inside overlay:
```bash
# From inside overlay — must succeed
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
curl -sk https://vault.iklim.co:8200/v1/sys/health
# Expected: {"sealed":false,...}
```
## 7 — cert-reloader watching
Expected: JSON response with `"sealed":false`.
## 8 — Certificate Reload Chain
```bash
docker service logs iklimco_cert-reloader --tail 5
docker service logs iklimco_cert-reloader --tail 10
docker service ps iklimco_cert-distributor
docker exec $(docker ps -q -f name=iklimco_vault | head -1) ls /vault/certs/
```
Expected: `[cert-reloader] started`, no errors.
## 8 — No unexpected published ports
Expected: cert-reloader has no errors, cert-distributor is running, and Vault sees `STAR.iklim.co.full.crt` plus `STAR.iklim.co_key.pem`.
## 9 — No Unexpected Published Ports
```bash
docker service ls --format "{{.Name}}\t{{.Ports}}" \
--filter label=project=co.iklim
```
Only `iklimco_swag` should show `*:80->80/tcp, *:443->443/tcp`.
## 9 — DB nodes running correct services
```bash
# Patroni (PostgreSQL HA) stack
docker stack services iklim-patroni
docker service ps iklim-patroni_patroni-01
docker service ps iklim-patroni_patroni-02
docker service ps iklim-patroni_patroni-03
# etcd cluster (for Patroni)
docker stack services iklim-etcd
# MongoDB replica set
docker stack services iklimco
docker service ps iklimco_mongodb-01
docker service ps iklimco_mongodb-02
docker service ps iklimco_mongodb-03
docker service ls --format "{{.Name}}\t{{.Ports}}" --filter label=project=co.iklim
```
All tasks should show node names matching `iklim-db-01`, `iklim-db-02`, or `iklim-db-03` with placement constraint `role=db`.
Expected: only services intentionally published by the stack expose ports. Redis and RabbitMQ must not appear as DB-node host-mode services.
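The published-port check can be narrowed to an explicit allowlist. The sketch below assumes a hypothetical helper `list_unexpected_ports` and a comma-separated allowlist of service names; which services belong on that list is a decision for the stack owner, not something this example defines.

```bash
#!/usr/bin/env bash
# Hypothetical filter: read tab-separated "name<TAB>ports" lines on stdin
# and print services that publish ports but are not on the allowlist.
list_unexpected_ports() {
  local allowed="$1" name ports
  while IFS=$'\t' read -r name ports; do
    # Services without published ports are always fine.
    [ -n "$ports" ] || continue
    case ",$allowed," in
      *",$name,"*) ;;                               # explicitly allowed
      *) printf '%s %s\n' "$name" "$ports" ;;
    esac
  done
}

# Example (allowlist is an assumption):
#   docker service ls --format "{{.Name}}\t{{.Ports}}" \
#     --filter label=project=co.iklim | list_unexpected_ports iklimco_swag
```

Any line printed by the filter is a service worth inspecting before the deployment is signed off.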
## 10 — APISIX replicas
## 10 — Microservice Health
```bash
docker service ps iklimco_apisix
```
Expected: 3 tasks, all `Running`, on different nodes.
After microservices are deployed by their separate production workflows:
## 11 — fail2ban active
```bash
docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status
```
Expected: multiple jails listed.
## 12 — Microservice health (post-deploy)
After microservices are deployed (separate pipeline), verify via the public API:
```bash
curl -si 'https://api.iklim.co/v1/weather/current?lat=39&lon=35'
```
Expected: valid JSON weather response.
## ⚠️ Old cert expiry reminder
The manually managed `*.iklim.co` cert expires **2026-07-15**.
SWAG's Let's Encrypt cert auto-renews every ~60 days.
After first SWAG cert is confirmed valid, the manual cert in storagebox can be archived
and is no longer used.
Expected: valid JSON response.
## Historical / Superseded by Setup
Older verification snippets that used `iklim-patroni`, `iklim-etcd`, or separate DB stack names are superseded. Current prod DB services are part of the `iklimco` stack deployed from `docker-stack-infra_db-prod.yml`.

@ -2,9 +2,7 @@
## Context
- **File:** `docker-stack-infra.yml` (repo root)
- **Goal:** Add SWAG as TLS-terminating reverse proxy; remove all published ports from internal
services (they become reachable only via SWAG through the `iklimco-net` overlay network);
remove Vault's external port entirely.
- **Goal:** Add SWAG as TLS-terminating reverse proxy; remove all published ports from internal services (they become reachable only via SWAG through the `iklimco-net` overlay network); remove Vault's external port entirely.
## Changes Summary
@ -46,7 +44,7 @@ Add after the `apisix-dashboard` service block:
- DNSPROPAGATION=90
volumes:
- ${SWAG_CONFIG_DIR:-swag-vl}:/config
- ${SWAG_DNS_CONF_DIR:-/opt/iklimco/swag/dns-conf}:/config/dns-conf
- ${SWAG_DNS_CONFIG_DIR:-/opt/iklimco/swag/dns-conf}:/config/dns-conf
- ${SWAG_SITE_CONFS_DIR:-/opt/iklimco/swag/site-confs}:/config/nginx/site-confs
ports:
- target: 80
@ -130,8 +128,7 @@ Find the `vault` service `ports:` block and **delete it entirely**:
mode: host
```
Vault remains reachable within `iklimco-net` via the overlay alias `vault.iklim.co:8200`.
The `VAULT_LOCAL_CONFIG` `api_addr` and `networks.default.aliases` entries stay unchanged.
Vault remains reachable within `iklimco-net` via the overlay alias `vault.iklim.co:8200`. The `VAULT_LOCAL_CONFIG` `api_addr` and `networks.default.aliases` entries stay unchanged.
## Step 4 — Remove `apisix` published ports
@ -154,8 +151,7 @@ Find the `apisix` service `ports:` block and **delete it entirely**:
mode: host
```
APISIX admin API (9180) access: use `docker exec` or SSH tunnel.
APISIX is reachable from SWAG via `http://apisix:9080` on the overlay network.
APISIX admin API (9180) access: use `docker exec` or SSH tunnel. APISIX is reachable from SWAG via `http://apisix:9080` on the overlay network.
## Step 5 — Remove `apisix-dashboard` published port

@ -1,10 +1,8 @@
# 06 — cert-reloader Sidecar Service (Test)
## Context
- **Purpose:** Watches SWAG's certificate volume for changes; copies renewed certs to
`/opt/iklimco/ssl/` on the host; forces Vault to reload its TLS cert.
- **Replaces:** `ops/vault-reload-after-swag-renewal.sh` (which was designed for manual use).
The sidecar automates this after every SWAG renewal.
- **Purpose:** Watches SWAG's certificate volume for changes; copies renewed certs to `/opt/iklimco/ssl/` on the host; forces Vault to reload its TLS cert.
- **Replaces:** `ops/vault-reload-after-swag-renewal.sh` (which was designed for manual use). The sidecar automates this after every SWAG renewal.
- **Runs on:** manager node (same node as SWAG and Vault, ensuring volume + socket access).
## How it works
@ -22,16 +20,13 @@ Vault restarts
## Step 1 — Service definition (already in `03-infra-stack-changes.md`)
The `cert-reloader` service is added to `docker-stack-infra.yml` as documented in step 03.
No separate action needed here beyond that file change.
The `cert-reloader` service is added to `docker-stack-infra.yml` as documented in step 03. No separate action needed here beyond that file change.
## Step 2 — Ensure `/opt/iklimco/ssl/` exists on the host
The `Prepare Init Files` step in the pipeline already creates this directory and copies
the initial cert. The cert-reloader handles subsequent renewals.
The `Prepare Init Files` step in the pipeline already creates this directory and copies the initial cert. The cert-reloader handles subsequent renewals.
On first deploy, the bootstrap cert (copied during pipeline init) is used until SWAG
obtains its first Let's Encrypt cert (see `07-deploy-pipeline-update.md`).
On first deploy, the bootstrap cert (copied during pipeline init) is used until SWAG obtains its first Let's Encrypt cert (see `07-deploy-pipeline-update.md`).
## Step 3 — Verify cert-reloader is running
@ -65,15 +60,9 @@ fi
```
## Notes
- Docker socket (`/var/run/docker.sock`) is mounted into cert-reloader — this is intentional
and necessary. The service is pinned to manager and is minimal (`docker:27-cli` image).
- cert-reloader checks every 3600s (1 hour). Let's Encrypt certs renew every ~60 days;
the 1-hour check window is more than sufficient.
- If Vault restarts (due to cert reload), it may need to be **unsealed** automatically.
Vault's healthcheck in `docker-stack-infra.yml` already handles auto-unseal via the
`vault_unseal_key` Docker secret. Verify this works after a cert reload.
- Docker socket (`/var/run/docker.sock`) is mounted into cert-reloader — this is intentional and necessary. The service is pinned to manager and is minimal (`docker:27-cli` image).
- cert-reloader checks every 3600s (1 hour). Let's Encrypt certs renew every ~60 days; the 1-hour check window is more than sufficient.
- If Vault restarts (due to a cert reload), it comes back sealed and must be unsealed again. Vault's healthcheck in `docker-stack-infra.yml` already handles this automatically via the `vault_unseal_key` Docker secret. Verify that this works after a cert reload.
## Future — Multi-node Vault (prod)
When Vault runs as a 3-node Raft cluster on different physical machines,
cert-reloader must also SSH-copy the cert to the other nodes' `/opt/iklimco/ssl/`.
This is handled in `prod-env-setup/06-cert-reloader.md`.
Production no longer requires SSH-copy based certificate distribution. The current prod model uses StorageBox plus `cert-distributor` to sync certificates to `/opt/iklimco/ssl` on service nodes. See `../prod-env/06-cert-reloader.md`.

@ -19,8 +19,7 @@
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:test/app/iklim.co/ssl/STAR.iklim.co_key.pem ./STAR.iklim.co_key.pem
```
Also remove any references to `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.pem` in
the `Prepare Init Files` step's `sudo cp` commands:
Also remove any references to `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.pem` in the `Prepare Init Files` step's `sudo cp` commands:
```yaml
# DELETE or make conditional:
@ -78,8 +77,7 @@ Insert this step **before** `Deploy Swarm Stack`:
## Step 3 — Add `Bootstrap SWAG Certificate` step
Insert this step **after** `Deploy Swarm Stack` and **before** any step that depends on
Vault being accessible (e.g., `Provision Vault AppRole IDs`):
Insert this step **after** `Deploy Swarm Stack` and **before** any step that depends on Vault being accessible (e.g., `Provision Vault AppRole IDs`):
```yaml
- name: Bootstrap SWAG Certificate
@ -163,11 +161,6 @@ Final step order in the pipeline:
> move step 16 before step 8. Adjust based on observed behavior.
## Notes
- `.env` must contain the subdomain env vars added in step 04. Add them to storagebox
`test/secrets/iklim.co/.env` before the first deploy.
- `RESTRICTED_IP_1` and `RESTRICTED_IP_2` are hardcoded in the pipeline step above.
Move to `.env` if they change often.
- Precipitation service expects its image-data bind mount at
`/mnt/storagebox/precipitation/images`. This directory is provisioned by the
test Ansible bootstrap through `storagebox_managed_directories`; do not rely on
the deploy pipeline to create it.
- `.env` must contain the subdomain env vars added in step 04. Add them to storagebox `test/secrets/iklim.co/.env` before the first deploy.
- `RESTRICTED_IPS` should be kept as a comma-separated CIDR list in `.env`, then rendered into nginx `allow` directives by the pipeline.
- Precipitation service expects its image-data bind mount at `/mnt/storagebox/precipitation/images`. This directory is provisioned by the test Ansible bootstrap through `storagebox_managed_directories`; do not rely on the deploy pipeline to create it.
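The `RESTRICTED_IPS` note above could be implemented roughly as follows. This is a minimal sketch: the function name `render_allow_directives` and the trailing `deny all;` line are assumptions, and the actual pipeline step may render the nginx snippet differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a comma-separated CIDR list into nginx
# "allow" directives, followed by a catch-all "deny all;" (assumed).
render_allow_directives() {
  local list="$1" entry
  local -a entries
  IFS=',' read -ra entries <<< "$list"
  for entry in "${entries[@]}"; do
    # Trim leading and trailing whitespace around each entry.
    entry="${entry#"${entry%%[![:space:]]*}"}"
    entry="${entry%"${entry##*[![:space:]]}"}"
    if [ -n "$entry" ]; then
      printf 'allow %s;\n' "$entry"
    fi
  done
  printf 'deny all;\n'
}

# Example:
#   render_allow_directives "$RESTRICTED_IPS" > restricted-ips.conf
```

Keeping the list in `.env` means an address change only requires updating one variable and re-running the pipeline.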