initial commit

Murat ÖZDEMİR 2026-05-09 16:26:06 +03:00
parent 93412c9868
commit 81c38e8d39
27 changed files with 3032 additions and 0 deletions

.gitignore

@ -0,0 +1,48 @@
# Terraform local/runtime files
.terraform/
*.tfstate
*.tfstate.*
crash.log
crash.*.log
override.tf
override.tf.json
*_override.tf
*_override.tf.json
# Terraform secret variable files
*.tfvars
*.tfvars.json
terraform.tfvars
terraform.tfvars.json
# Ansible local/runtime files
*.retry
.ansible/
ansible-vault-password*
vault-password*
# Secret material
.env
.env.*
!.env.example
secrets/
secret/
*.pem
*.key
id_rsa
id_rsa.pub
id_ed25519
id_ed25519.pub
*_private_key
*_private_key.pub
# Gitea runner tokens/config generated with secrets
act_runner.token
gitea-runner-registration-token*
runner-registration-token*
runner-config.secret.yaml
# OS/editor noise
.DS_Store
*.swp
*.swo


@ -0,0 +1,101 @@
# 01 — Docker Swarm Init (Prod — Multi-Node)
## Context
- **Repo:** `iklim.co` root
- **Environment:** prod
- **Topology:**
- 3 × service nodes — all act as **Swarm managers AND app workers** (Raft quorum: 1 can fail)
- 3 × DB nodes — **NOT part of Docker Swarm** (separate DB cluster, out of scope)
- All 6 nodes are in the same private network.
- Pipeline trigger: push to `prod-env` branch → Gitea runner on `prod-runner` (first service node).
- Swarm has 3 nodes total; all are manager-eligible and carry workloads (no dedicated worker-only nodes).
## Node labeling plan
| Node | Role | Swarm role | Labels |
|------|------|------------|--------|
| service-1 | API services, SWAG, Vault | Manager + Worker | `type=service` |
| service-2 | API services replicas | Manager + Worker | `type=service` |
| service-3 | API services replicas | Manager + Worker | `type=service` |
> DB nodes (`db-1/2/3`) are **not part of Docker Swarm**. They run as a separate cluster
> and are provisioned independently. No Swarm join or label step applies to them.
## Step 1 — Init Swarm on service-1 (the prod-runner node)
```bash
MANAGER_IP=$(hostname -I | awk '{print $1}')
if ! docker info --format '{{.Swarm.LocalNodeState}}' | grep -q "active"; then
  docker swarm init --advertise-addr "$MANAGER_IP"
  echo "✅ Swarm initialized on $MANAGER_IP"
else
  echo "Swarm already active"
fi
```
## Step 2 — Get manager join token
```bash
docker swarm join-token manager # for service-2, service-3
```
Save this token — needed on service-2 and service-3.
## Step 3 — Join service-2 and service-3 as managers
SSH into service-2 and service-3, run:
```bash
docker swarm join --token <MANAGER_TOKEN> <service-1-ip>:2377
```
## Step 4 — Label all Swarm nodes
On service-1, after service-2 and service-3 have joined:
```bash
for node in service-1 service-2 service-3; do
  docker node update --label-add type=service "$node"
done
```
> Replace `service-1`, etc. with actual node hostnames shown in `docker node ls`.
> DB nodes are not in Swarm — no join or label step for them.
## Step 5 — Verify
```bash
docker node ls
```
Expected: 3 nodes, all with `MANAGER STATUS` = `Leader` or `Reachable`.
All 3 nodes remain in `AVAILABILITY=Active` (not drained) so they also carry workloads.
```bash
docker node inspect service-1 --format '{{.Spec.Labels}}'
```
Expected: `map[type:service]`.
## Step 6 — Confirm `init/swarm-init.sh` multi-node awareness
The script is idempotent (skips init if already active). Verify:
```bash
grep -n "swarm init\|swarm join" init/swarm-init.sh
```
The prod pipeline runs on service-1 only. service-2/3 are joined via Ansible (`swarm` role),
not via the Gitea pipeline.
## Placement constraints used in `docker-stack-infra.yml`
| Constraint | Resolves to |
|------------|-------------|
| `node.role == manager` | service-1, service-2, service-3 |
| `node.labels.type == service` | service-1, service-2, service-3 |
SWAG, Vault, cert-reloader: pinned to `node.role == manager`.
Microservices: no constraint (distributed across all 3 service nodes by Swarm scheduler).
> `node.labels.type == db` constraint is **not used** — DB nodes are not in Swarm.
> PostgreSQL and MongoDB run outside Swarm as a separately managed cluster.
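
To confirm that these constraints will actually resolve once all three nodes have joined, it can help to print each node's hostname, role, and labels from any manager (a small sketch using standard `docker node` commands):
```bash
# list every Swarm node with its hostname, role, and labels
docker node ls -q | xargs docker node inspect \
  --format '{{.Description.Hostname}}  role={{.Spec.Role}}  labels={{.Spec.Labels}}'
```
All three service nodes should report `role=manager` and `labels=map[type:service]`.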


@ -0,0 +1,63 @@
# 02 — GoDaddy DNS Credentials for SWAG (Prod)
## Context
Identical to test-env-setup/02, except the storagebox path is `prod/` instead of `test/`.
## ⚠️ Security — Rotate credentials before use
If credentials were shared in any chat log, Slack message, or email, **revoke them immediately**:
1. Go to: https://developer.godaddy.com/keys
2. Revoke the exposed key
3. Create a new Production key pair
**Never commit credentials to the repository.**
## Step 1 — Add credentials to storagebox `.env.secrets.shared` (prod path)
Open the file at storagebox path:
```
prod/secrets/iklim.co/.env.secrets.shared
```
Add:
```bash
GODADDY_KEY=<your-new-api-key>
GODADDY_SECRET=<your-new-api-secret>
```
## Step 2 — Repo template file
Same file as test: `swag/dns-conf/godaddy.ini.tpl` (already created in test step 02).
No additional action needed in the repo.
## Step 3 — (Handled by pipeline) Write credentials file on prod host
The deploy pipeline (see `08-deploy-pipeline-update.md`) runs on service-1:
```bash
mkdir -p /opt/iklimco/swag/dns-conf
envsubst < swag/dns-conf/godaddy.ini.tpl > /opt/iklimco/swag/dns-conf/godaddy.ini
chmod 600 /opt/iklimco/swag/dns-conf/godaddy.ini
```
## Step 4 — GoDaddy A records for prod subdomains
In GoDaddy DNS panel for `iklim.co`, add/update A records pointing to service-1's public IP:
| Record | Value |
|--------|-------|
| `api` | `<service-1-public-ip>` |
| `apigw` | `<service-1-public-ip>` |
| `rabbitmq` | `<service-1-public-ip>` |
| `grafana` | `<service-1-public-ip>` |
> Swarm's routing mesh means any node IP would work, but service-1 is the designated
> entry point (runs SWAG). Using a single IP keeps DNS simple.
>
> For HA: add a load balancer or use Hetzner's floating IP in front of the 3 service nodes.
> DNS then points to the floating IP. This is a future improvement.
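
After the records are saved, a quick sanity check from any machine (assuming `dig` is installed) is to confirm that each subdomain resolves to service-1's public IP:
```bash
# each name should return service-1's public IP
for sub in api apigw rabbitmq grafana; do
  printf '%-22s -> ' "$sub.iklim.co"
  dig +short "$sub.iklim.co" A
done
```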
## Notes
- Test and prod SWAG instances both obtain `*.iklim.co` independently from Let's Encrypt.
There is no conflict — they use the same domain, different servers.
- `DNSPROPAGATION=90` handles GoDaddy's typical 30-90s propagation delay.


@ -0,0 +1,98 @@
# 03 — docker-stack-infra.yml Changes (Prod)
## Context
- **File:** `docker-stack-infra.yml` (repo root — shared between test and prod)
- All changes from `test-env-setup/03-infra-stack-changes.md` apply here identically.
- **Additional prod-specific changes:**
- PostgreSQL and MongoDB placement constraints point to `type=db` nodes.
- Microservices have no constraint (distributed across service nodes by Swarm).
- Replica counts for stateless services are increased.
## Step 1 — Apply all test-env changes first
Follow every step in `test-env-setup/03-infra-stack-changes.md`:
- Add `swag` service
- Add `cert-reloader` service
- Remove published ports for vault, apisix, rabbitmq, prometheus, grafana, apisix-dashboard
- Add `swag-vl` volume
## Step 2 — Update PostgreSQL placement constraint
Change `postgres` service placement to use the `type=db` label:
```yaml
# CHANGE in postgres service:
placement:
  constraints:
    - node.labels.type == db
```
## Step 3 — Update MongoDB placement constraint
```yaml
# CHANGE in mongo service:
placement:
  constraints:
    - node.labels.type == db
```
## Step 4 — Pin Vault to manager node (initial prod — single instance)
Vault starts as a single instance pinned to the manager node.
Raft cluster migration is handled separately in `07-vault-raft-plan.md`.
```yaml
# Vault placement stays as:
placement:
  constraints:
    - node.role == manager
```
## Step 5 — Increase APISIX replicas for prod
```yaml
# CHANGE in apisix service deploy block:
mode: replicated
replicas: 2 # was 1
```
APISIX is stateless (config in etcd) — multiple replicas are safe.
Swarm load-balances SWAG's requests across APISIX replicas via VIP.
## Step 6 — etcd: 3-node cluster for prod
For prod, etcd should run as a 3-node cluster (minimum for Raft quorum).
The current single-instance etcd definition needs to be replaced with a 3-node
StatefulSet-style setup using separate service definitions or a dedicated
`docker-stack-etcd.yml`.
> **Scope note:** etcd clustering for prod is complex and out of scope for initial launch.
> Deploy with single etcd for initial prod launch. Add etcd clustering as a follow-up task.
> Track in: `Technical Debt/TODO.md`
## Step 7 — Verify the complete file
After all edits, validate the YAML:
```bash
docker stack config -c docker-stack-infra.yml > /dev/null && echo "✅ YAML valid"
```
If the command prints no errors, the file is valid.
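
As an extra check that the port removals from the test-env steps carried over, grepping the file for leftover `published:` entries should only return SWAG's 80/443:
```bash
# only SWAG's 80 and 443 should remain published
grep -n "published:" docker-stack-infra.yml
```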
## Placement summary for prod
| Service | Placement |
|---------|-----------|
| swag | `node.role == manager` |
| cert-reloader | `node.role == manager` |
| vault | `node.role == manager` |
| apisix (2 replicas) | no constraint (any node) |
| apisix-dashboard | no constraint |
| postgres | `node.labels.type == db` |
| mongo | `node.labels.type == db` |
| redis | `node.role == manager` |
| rabbitmq | `node.role == manager` |
| etcd | `node.role == manager` |
| prometheus | `node.role == manager` |
| grafana | `node.role == manager` |


@ -0,0 +1,71 @@
# 04 — SWAG Nginx Proxy Configs (Prod)
## Context
Same template files as test (`swag/proxy-confs/*.conf.tpl`), different env vars.
The pipeline processes templates with prod-specific subdomain values.
## Required env vars (in `.env` on storagebox `prod/secrets/iklim.co/.env.prod`)
```bash
API_SUBDOMAIN=api.iklim.co
APIGW_SUBDOMAIN=apigw.iklim.co
RABBITMQ_SUBDOMAIN=rabbitmq.iklim.co
GRAFANA_SUBDOMAIN=grafana.iklim.co
RESTRICTED_IP_1=78.187.87.109
RESTRICTED_IP_2=95.70.151.248
```
## Template files (already created in test step 04)
- `swag/site-confs/default.conf`
- `swag/proxy-confs/api.conf.tpl`
- `swag/proxy-confs/apigw.conf.tpl`
- `swag/proxy-confs/rabbitmq.conf.tpl`
- `swag/proxy-confs/grafana.conf.tpl`
No new files to create — the same templates work for both environments.
## Deploy step (handled by pipeline — see `08-deploy-pipeline-update.md`)
```bash
set -a; . ./.env; set +a
export RESTRICTED_IP_1="78.187.87.109"
export RESTRICTED_IP_2="95.70.151.248"
sudo mkdir -p /opt/iklimco/swag/proxy-confs /opt/iklimco/swag/site-confs
for tpl in swag/proxy-confs/*.conf.tpl; do
  out="/opt/iklimco/swag/proxy-confs/$(basename "${tpl%.tpl}")"
  # restrict envsubst to the template variables so nginx's own $variables survive
  envsubst '${API_SUBDOMAIN} ${APIGW_SUBDOMAIN} ${RABBITMQ_SUBDOMAIN} ${GRAFANA_SUBDOMAIN} ${RESTRICTED_IP_1} ${RESTRICTED_IP_2}' < "$tpl" | sudo tee "$out" > /dev/null
  echo "✅ $out"
done
sudo cp swag/site-confs/default.conf /opt/iklimco/swag/site-confs/default.conf
```
With `API_SUBDOMAIN=api.iklim.co`, the output file `/opt/iklimco/swag/proxy-confs/api.conf`
will contain `server_name api.iklim.co;` — correct for prod.
## Verification
After deploy, on service-1:
```bash
cat /opt/iklimco/swag/proxy-confs/api.conf | grep server_name
```
Expected: `server_name api.iklim.co;`
```bash
docker exec $(docker ps -q -f name=iklimco_swag) nginx -t
```
Expected: `syntax is ok`
```bash
curl -si https://api.iklim.co/health
```
Expected: APISIX response with valid `*.iklim.co` cert.
## Notes
- `Prometheus` is intentionally NOT exposed via SWAG. Access it via Grafana
  (internal connection: `http://prometheus:9090`) or an SSH tunnel; a quick overlay-side check is sketched after these notes.
- If additional restricted-access subdomains are needed in the future, create a new
`swag/proxy-confs/<name>.conf.tpl` following the same pattern.
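
If Prometheus needs to be reached without Grafana, one option is to query it from a container that is already on the overlay network. The sketch below reuses the APISIX container, assuming its image ships `curl` (the verification steps elsewhere in these docs make the same assumption):
```bash
# query Prometheus' health endpoint from inside the overlay network
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
  curl -s http://prometheus:9090/-/healthy
```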


@ -0,0 +1,37 @@
# 05 — APISIX: Remove SSL / Configure Trusted Proxy (Prod)
## Context
Identical to `test-env-setup/05-apisix-remove-ssl.md`.
The same `init/apisix-core/init.sh` and custom APISIX image are used for both environments.
Changes made for test already apply to prod.
## Checklist
- [ ] `ssls/1` PUT block removed from `init/apisix-core/init.sh`
- [ ] `dev` SSL block removed or confirmed non-impactful for prod
- [ ] Custom APISIX image (`custom-apisix:3.12.0`) config.yaml contains `real_ip_header`
and `set_real_ip_from` for overlay CIDR (`10.0.0.0/8`)
- [ ] New image built and pushed to Harbor if config.yaml was changed:
```bash
docker build -t registry.tarla.io/iklimco/custom-apisix:3.12.0 .
docker push registry.tarla.io/iklimco/custom-apisix:3.12.0
```
## Prod-specific note
APISIX runs with `replicas: 2` in prod. Both replicas receive the same configuration
from etcd — no additional steps needed beyond the single init run.
The `init/apisix-core/init.sh` is called once (from the pipeline) and configures the
shared etcd state that all APISIX instances read from.
## Verification
```bash
# From a whitelisted IP, make a request and check real IP in APISIX logs
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
tail -5 /usr/local/apisix/logs/access.log
```
Client IP should appear in the log, not SWAG's internal overlay IP.


@ -0,0 +1,57 @@
# 06 — cert-reloader Sidecar Service (Prod)
## Context
Same service definition as test (see `test-env-setup/06-cert-reloader.md`).
Prod-specific consideration: Vault is single-instance on the manager node (same as SWAG),
so the cert copy to `/opt/iklimco/ssl/` works without cross-node distribution.
When Vault is expanded to a 3-node Raft cluster (see `07-vault-raft-plan.md`), the
cert-reloader must be updated to distribute the cert to the other Vault nodes.
## Current behavior (single-Vault prod)
```
SWAG (manager) renews cert → swag-vl
cert-reloader (manager) detects change → copies to /opt/iklimco/ssl/ → reloads Vault
Vault (manager) reads /opt/iklimco/ssl/ → serves new cert
```
No cross-node distribution needed.
## Future behavior (3-node Vault Raft — see step 07)
When Vault runs on service-1, service-2, service-3:
```
cert-reloader detects cert change
→ copies cert to /opt/iklimco/ssl/ on service-1 (local)
→ SSH copy to service-2:/opt/iklimco/ssl/
→ SSH copy to service-3:/opt/iklimco/ssl/
→ docker service update --force iklimco_vault (restarts all 3 replicas)
```
This requires:
- An SSH key that cert-reloader can use to reach service-2 and service-3
- That key mounted as a Docker secret into cert-reloader
- Known_hosts for service-2 and service-3 pre-configured
Script update for this phase is tracked in `07-vault-raft-plan.md`.
## Verification
```bash
docker service ps iklimco_cert-reloader
docker service logs iklimco_cert-reloader --tail 20
```
Expected: `[cert-reloader] started`, no error lines.
Confirm Vault cert is current after SWAG renewal:
```bash
# Check cert expiry on Vault's TLS endpoint from inside the overlay
docker exec $(docker ps -q -f name=iklimco_vault) \
sh -c 'echo | openssl s_client -connect vault.iklim.co:8200 2>/dev/null \
| openssl x509 -noout -dates'
```
`notAfter` should match the cert in `/opt/iklimco/ssl/STAR.iklim.co.full.crt`.
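
To compare against the file on the host, openssl can print its expiry directly:
```bash
# expiry of the cert file that cert-reloader maintains on the host
openssl x509 -in /opt/iklimco/ssl/STAR.iklim.co.full.crt -noout -enddate
```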


@ -0,0 +1,105 @@
# 07 — Vault: Initial Single Instance + Raft Cluster Migration Plan (Prod)
## Context
Vault starts as a single instance on the manager node (service-1) for the initial prod launch.
This matches the current `docker-stack-infra.yml` configuration (file storage, single replica).
Raft HA cluster is planned for a later phase.
## Phase 1 — Initial prod launch (current)
- **Replicas:** 1
- **Storage:** file (`/vault/file`) on service-1
- **Placement:** `node.role == manager` (service-1)
- **Cert:** from `/opt/iklimco/ssl/` (populated by cert-reloader from SWAG volume)
- **TLS:** `VAULT_LOCAL_CONFIG` unchanged — `api_addr: https://vault.iklim.co:8200`
No changes to `docker-stack-infra.yml` vault service for Phase 1.
## Phase 2 — Vault Raft Cluster (future)
### What changes
- **Replicas:** 3 (one per service node)
- **Storage:** Raft integrated (replaces file storage)
- **Placement:** `node.labels.type == service` (all 3 service nodes)
- **Cert distribution:** cert-reloader SSH-copies renewed cert to service-2, service-3
### Prerequisites before migration
- [ ] All 3 service nodes are running and labeled `type=service`
- [ ] Vault data backed up from Phase 1 (with file storage, archive the data directory; `vault operator raft snapshot save` only applies once Raft storage is in use; see the backup sketch after this list)
- [ ] SSH key created for cert-reloader to reach service-2 and service-3
- [ ] SSH key stored as Docker secret `cert_reloader_ssh_key`
- [ ] `/opt/iklimco/ssl/` directory exists on service-2 and service-3
- [ ] Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes)
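
Since Phase 1 uses the file storage backend, the pre-migration backup can be as simple as archiving the data directory from inside the container. A minimal sketch, assuming the data is mounted at `/vault/file` as in the current stack definition:
```bash
# archive Vault's file-storage data and pull it onto the runner host
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
docker exec "$VAULT_CTR" tar cf /tmp/vault-file-backup.tar -C /vault file
docker cp "$VAULT_CTR":/tmp/vault-file-backup.tar "./vault-file-backup-$(date +%F).tar"
```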
### Vault service update for Raft
```yaml
vault:
  # ... (image, secrets, healthcheck unchanged)
  environment:
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
      "cluster_addr":"https://{{ .Node.Hostname }}:8201",
      "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
      "listener":[{"tcp":{"address":"0.0.0.0:8200",
      "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
      "tls_key_file":"/vault/certs/STAR.iklim.co_key.txt"}}],
      "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file   # host path per node
    - /opt/iklimco/ssl:/vault/certs:ro
  deploy:
    mode: replicated
    replicas: 3
    placement:
      constraints:
        - node.labels.type == service
```
> `{{ .Node.Hostname }}` is Docker Swarm's Go template for the node hostname —
> gives each Vault instance a unique `node_id`.
### Raft join procedure (after deploying 3-replica Vault)
Only the leader needs to be bootstrapped; others join via `vault operator raft join`:
```bash
# On the primary Vault (service-1 container):
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
# Unseal if needed
docker exec -it "$VAULT_CTR" vault operator unseal
# Check Raft peers
docker exec "$VAULT_CTR" vault operator raft list-peers
```
On service-2 and service-3 containers:
```bash
docker exec -it <vault-on-service-2> vault operator raft join \
https://vault.iklim.co:8200
```
### cert-reloader update for Raft
Update the cert-reloader command in `docker-stack-infra.yml` to SSH-copy the cert
to service-2 and service-3 after renewal:
```bash
# After copying to local /opt/iklimco/ssl/:
ssh -i /run/secrets/cert_reloader_ssh_key service-2 \
"cp /dev/stdin /opt/iklimco/ssl/STAR.iklim.co.full.crt" < /opt/iklimco/ssl/STAR.iklim.co.full.crt
# (repeat for service-3 and privkey)
docker service update --force iklimco_vault
```
Add Docker secret to cert-reloader:
```yaml
secrets:
- cert_reloader_ssh_key
```
## Reference
- Vault Raft storage docs: https://developer.hashicorp.com/vault/docs/configuration/storage/raft
- Vault Swarm setup: https://manjit28.medium.com/setting-up-a-secure-and-highly-available-hashicorp-vault-cluster-for-secrets-and-certificates-0ce01a370582


@ -0,0 +1,130 @@
# 08 — Deploy Pipeline Update (Prod)
## Context
- **File:** `.gitea/workflows/deploy-prod.yml`
- Same changes as test pipeline (`test-env-setup/07-deploy-pipeline-update.md`),
adapted for prod paths and prod runner.
## Step 1 — Remove manual cert scp lines from `Initialize Servers`
```yaml
# DELETE from "Initialize Servers" step:
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co.full.crt ./STAR.iklim.co.full.crt
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co_key.txt ./STAR.iklim.co_key.txt
```
Also remove from `Prepare Init Files`:
```yaml
# DELETE or make conditional:
sudo cp STAR.iklim.co.full.crt STAR.iklim.co_key.txt /opt/iklimco/ssl/
```
## Step 2 — Add `Prepare SWAG Directories` step
Insert **before** `Deploy Swarm Stack`:
```yaml
- name: Prepare SWAG Directories
  run: |
    set -a; . ./.env; . ./.env.secrets.shared; set +a
    sudo mkdir -p /opt/iklimco/swag/dns-conf
    envsubst < swag/dns-conf/godaddy.ini.tpl | sudo tee /opt/iklimco/swag/dns-conf/godaddy.ini > /dev/null
    sudo chmod 600 /opt/iklimco/swag/dns-conf/godaddy.ini
    echo "✅ godaddy.ini written"
    sudo mkdir -p /opt/iklimco/swag/proxy-confs /opt/iklimco/swag/site-confs
    export RESTRICTED_IP_1="78.187.87.109"
    export RESTRICTED_IP_2="95.70.151.248"
    for tpl in swag/proxy-confs/*.conf.tpl; do
      out="/opt/iklimco/swag/proxy-confs/$(basename "${tpl%.tpl}")"
      # limit envsubst to the template variables so nginx's own $variables are left intact
      envsubst '${API_SUBDOMAIN} ${APIGW_SUBDOMAIN} ${RABBITMQ_SUBDOMAIN} ${GRAFANA_SUBDOMAIN} ${RESTRICTED_IP_1} ${RESTRICTED_IP_2}' < "$tpl" | sudo tee "$out" > /dev/null
      echo "✅ $out"
    done
    sudo cp swag/site-confs/default.conf /opt/iklimco/swag/site-confs/default.conf
    echo "✅ SWAG directories ready"
  working-directory: /workspace/iklim.co
```
> `.env` is sourced first so the prod values (e.g. `API_SUBDOMAIN=api.iklim.co`) are used.
> Ensure these vars are in `prod/secrets/iklim.co/.env.prod` on storagebox.
## Step 3 — Add `Bootstrap SWAG Certificate` step
Insert **after** `Deploy Swarm Stack`:
```yaml
- name: Bootstrap SWAG Certificate
  run: |
    echo "Waiting for SWAG container to start..."
    SWAG_CTR=""
    for i in $(seq 1 24); do
      SWAG_CTR=$(docker ps -q -f name=iklimco_swag 2>/dev/null | head -1)
      [ -n "$SWAG_CTR" ] && break
      sleep 10
    done
    if [ -z "$SWAG_CTR" ]; then
      echo "❌ SWAG container did not start"
      exit 1
    fi
    CERT_PATH="/config/etc/letsencrypt/live/iklim.co/fullchain.pem"
    echo "Waiting for cert (up to 10 min)..."
    for i in $(seq 1 20); do
      if docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
        echo "✅ Cert obtained"
        break
      fi
      echo " attempt $i/20 — waiting 30s..."
      sleep 30
    done
    if ! docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
      echo "❌ SWAG did not obtain cert. Logs:"
      docker service logs iklimco_swag --tail 50
      exit 1
    fi
    sudo mkdir -p /opt/iklimco/ssl
    docker exec "$SWAG_CTR" cat "$CERT_PATH" | \
      sudo tee /opt/iklimco/ssl/STAR.iklim.co.full.crt > /dev/null
    docker exec "$SWAG_CTR" cat "/config/etc/letsencrypt/live/iklim.co/privkey.pem" | \
      sudo tee /opt/iklimco/ssl/STAR.iklim.co_key.txt > /dev/null
    echo "✅ Cert bootstrapped to /opt/iklimco/ssl/"
  working-directory: /workspace/iklim.co
```
## Step 4 — Ensure subdomain env vars are in prod `.env`
Add to `prod/secrets/iklim.co/.env.prod` on storagebox:
```bash
API_SUBDOMAIN=api.iklim.co
APIGW_SUBDOMAIN=apigw.iklim.co
RABBITMQ_SUBDOMAIN=rabbitmq.iklim.co
GRAFANA_SUBDOMAIN=grafana.iklim.co
```
## Step 5 — Final step order for prod pipeline
1. Checkout Branch
2. Prepare Folders
3. Set up SSH Key
4. Install Required Tools
5. Fetch Service Secret Files
6. Initialize Servers ← cert scp lines removed
7. Upload Updated Secrets to Storagebox
8. Provision Vault AppRole IDs and Docker Secrets
9. Upload Updated Env to Storagebox
10. Prepare Init Files ← cert copy lines removed
11. Initialize Docker Swarm
12. Stop Docker Compose Services
13. Docker Login to Harbor
14. **Prepare SWAG Directories** ← NEW
15. Deploy Swarm Stack
16. **Bootstrap SWAG Certificate** ← NEW
17. Review Environment


@ -0,0 +1,120 @@
# 09 — Verification Checklist (Prod)
## Context
Run after a successful prod pipeline deployment.
## 1 — Swarm cluster health
```bash
docker node ls
```
Expected: 3 managers (`Leader` + 2 `Reachable`), 3 workers (`Ready`).
```bash
docker service ls --filter label=project=co.iklim
```
All services show `REPLICAS X/X` (target met).
## 2 — SWAG cert is valid
```bash
docker exec $(docker ps -q -f name=iklimco_swag) certbot certificates
```
Expected: `*.iklim.co`, `VALID: XX days` (Let's Encrypt, not the old manual cert).
TLS check from outside:
```bash
echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
| openssl x509 -noout -subject -dates
```
Expected: `CN=*.iklim.co`, `notAfter` > 2026-07-15 (i.e., the new Let's Encrypt cert, not the old manual one that expires on that date).
## 3 — Public API
```bash
curl -si https://api.iklim.co/health
```
HTTP 2xx, no TLS errors.
## 4 — IP restriction working
From a non-whitelisted IP:
```bash
curl -si https://grafana.iklim.co
curl -si https://apigw.iklim.co
curl -si https://rabbitmq.iklim.co
```
All expected: HTTP 403.
From whitelisted IP (78.187.87.109 or 95.70.151.248):
```bash
curl -si https://grafana.iklim.co # HTTP 200 Grafana
curl -si https://apigw.iklim.co # HTTP 200 APISIX Dashboard
curl -si https://rabbitmq.iklim.co # HTTP 200 RabbitMQ Management
```
## 5 — Vault not reachable externally
```bash
# From outside — must fail
curl -sk --connect-timeout 5 https://<service-1-public-ip>:8200/v1/sys/health
# Expected: connection refused or timeout
```
```bash
# From inside overlay — must succeed
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
curl -sk https://vault.iklim.co:8200/v1/sys/health
# Expected: {"sealed":false,...}
```
## 6 — cert-reloader watching
```bash
docker service logs iklimco_cert-reloader --tail 5
```
Expected: `[cert-reloader] started`, no errors.
## 7 — No unexpected published ports
```bash
docker service ls --format "{{.Name}}\t{{.Ports}}" \
--filter label=project=co.iklim
```
Only `iklimco_swag` should show `*:80->80/tcp, *:443->443/tcp`.
## 8 — DB nodes running correct services
```bash
docker service ps iklimco_postgres
docker service ps iklimco_mongo
```
Tasks should show node names matching `db-1`, `db-2`, or `db-3`.
## 9 — APISIX replicas
```bash
docker service ps iklimco_apisix
```
Expected: 2 tasks, both `Running`, on different nodes.
## 10 — fail2ban active
```bash
docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status
```
Expected: multiple jails listed.
## 11 — Microservice health (post-deploy)
After microservices are deployed (separate pipeline), verify via the public API:
```bash
curl -si "https://api.iklim.co/v1/weather/current?lat=39&lon=35"
```
Expected: valid JSON weather response.
## ⚠️ Old cert expiry reminder
The manually managed `*.iklim.co` cert expires **2026-07-15**.
SWAG's Let's Encrypt cert auto-renews every ~60 days.
After the first SWAG cert is confirmed valid, the manual cert on the storagebox can be archived;
it is no longer used.


@ -0,0 +1,74 @@
# 01 — Docker Swarm Init (Test)
## Context
- **Repo:** `iklim.co` root
- **Environment:** test
- **Server:** single node — same machine is both Swarm manager and worker
- Pipeline trigger: push to `test-env` branch → Gitea runner executes directly on the test server
- `init/swarm-init.sh` already exists in the repo and is called by the pipeline
## Prerequisites
- Docker Engine installed on test server
- User running the pipeline has Docker access (group `docker` or root)
## Step 1 — Verify / update `init/swarm-init.sh`
Check that the script handles idempotent init:
```bash
grep -n "swarm init" init/swarm-init.sh
```
The script must contain logic similar to:
```bash
if ! docker info --format '{{.Swarm.LocalNodeState}}' | grep -q "active"; then
  docker swarm init --advertise-addr $(hostname -I | awk '{print $1}')
  echo "✅ Swarm initialized"
else
  echo "Swarm already active, skipping init"
fi
```
If this guard is missing, add it. Without it, the step fails on second deploy.
## Step 2 — Run via pipeline
The pipeline step `Initialize Docker Swarm` in `.gitea/workflows/deploy-test.yml` already calls:
```bash
/bin/bash init/swarm-init.sh
```
No manual action needed after the script is correct.
## Step 3 — Apply node label
The `type=service` label is required for placement constraints in `docker-stack-infra.yml`.
Run once after Swarm init (Ansible handles this in automated setup):
```bash
docker node update --label-add type=service $(docker node ls -q)
```
## Step 4 — Verify
SSH into the test server and run:
```bash
docker node ls
```
Expected: one node, `STATUS=Ready`, `AVAILABILITY=Active`, `MANAGER STATUS=Leader`.
```bash
docker node inspect self --format '{{.Spec.Labels}}'
```
Expected: `map[type:service]`.
## Notes
- Single-node Swarm: node is simultaneously manager and worker (`AVAILABILITY=Active`, not drained).
- Placement constraints `node.role == manager` and `node.labels.type == service` both resolve to this machine.
- No worker-join or manager-join steps needed for test.
- Docker Swarm overlay network `iklimco-net` is created automatically on the first `docker stack deploy`; a quick check is shown below.
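
A minimal way to confirm the overlay network exists after the first deploy:
```bash
# the stack's overlay network should appear in this list
docker network ls --filter driver=overlay --format '{{.Name}}\t{{.Scope}}'
```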


@ -0,0 +1,73 @@
# 02 — GoDaddy DNS Credentials for SWAG (Test)
## Context
SWAG uses certbot with `certbot-dns-godaddy` plugin to obtain and auto-renew the
`*.iklim.co` wildcard certificate via DNS-01 challenge.
GoDaddy API credentials must be available at deploy time.
## ⚠️ Security — Rotate credentials before use
If credentials were shared in any chat log, Slack message, or email, **revoke them immediately**:
1. Go to: https://developer.godaddy.com/keys
2. Revoke the exposed key
3. Create a new Production key pair
4. Use the new Key + Secret everywhere below
**Never commit credentials to the repository.**
## Step 1 — Add credentials to storagebox `.env.secrets.swag`
Open (or create) the file at storagebox path:
```
test/secrets/iklim.co/.env.secrets.swag
```
Add:
```bash
GODADDY_KEY=<your-new-api-key>
GODADDY_SECRET=<your-new-api-secret>
```
These are fetched by the deploy pipeline's `Fetch Service Secret Files` step and sourced into the environment before further steps run.
## Step 2 — Template file in the repo
`swag/dns-conf/godaddy.ini.tpl` already exists in the repository root:
```ini
dns_godaddy_key = ${GODADDY_KEY}
dns_godaddy_secret = ${GODADDY_SECRET}
```
This template is processed at deploy time (Step 07) with `envsubst`.
## Step 3 — (Handled by pipeline) Write the actual credentials file on the host
The deploy pipeline (see `07-deploy-pipeline-update.md`) runs:
```bash
mkdir -p /opt/iklimco/swag/dns-conf
envsubst < swag/dns-conf/godaddy.ini.tpl > /opt/iklimco/swag/dns-conf/godaddy.ini
chmod 600 /opt/iklimco/swag/dns-conf/godaddy.ini
```
`GODADDY_KEY` and `GODADDY_SECRET` are already in the environment (sourced from `.env.secrets.swag`).
The file is bind-mounted into the SWAG container at `/config/dns-conf/godaddy.ini` (read-only).
## Step 4 — Verify (after SWAG is deployed)
Inside the SWAG container:
```bash
docker exec $(docker ps -q -f name=iklimco_swag) cat /config/dns-conf/godaddy.ini
```
Expected output: file with real key/secret values, not `${...}` placeholders.
## Notes
- `DNSPROPAGATION=90` is configured in SWAG's environment — GoDaddy DNS changes can take up to 90s.
- SWAG stores the obtained cert at `/config/etc/letsencrypt/live/iklim.co/` inside the container
(persisted in the `swag-vl` Docker named volume).
- cert-reloader service watches this volume and copies renewed certs to `/opt/iklimco/ssl/`
for Vault (see `06-cert-reloader.md`).


@ -0,0 +1,241 @@
# 03 — docker-stack-infra.yml Changes (Test)
## Context
- **File:** `docker-stack-infra.yml` (repo root)
- **Goal:** Add SWAG as TLS-terminating reverse proxy; remove all published ports from internal
services (they become reachable only via SWAG through the `iklimco-net` overlay network);
remove Vault's external port entirely.
## Changes Summary
| Service | Before | After |
|---------|--------|-------|
| **swag** | does not exist | add: ports 80+443, manager-pinned |
| **cert-reloader** | does not exist | add: manager-pinned, Docker socket |
| **vault** | publishes 8200 | no published port |
| **apisix** | publishes 8080, 8443, 9180 | no published ports |
| **rabbitmq** | publishes 5672, 15672, 61613, 15674 | no published ports |
| **prometheus** | publishes 9090 | no published port |
| **grafana** | publishes 3000 | no published port |
| **apisix-dashboard** | publishes 9000 | no published port |
> **RabbitMQ STOMP note:** Ports 61613 (STOMP) and 15674 (WebSocket STOMP) are removed because
> APISIX already proxies WebSocket STOMP to RabbitMQ via the overlay network. Verify that
> APISIX has a stream/WebSocket route for STOMP before removing these if external clients
> connect to STOMP directly (not via APISIX).
## Step 1 — Add `swag` service
Add after the `apisix-dashboard` service block:
```yaml
swag:
  image: lscr.io/linuxserver/swag:latest
  cap_add:
    - NET_ADMIN
  environment:
    - PUID=1000
    - PGID=1000
    - TZ=Europe/Istanbul
    - URL=iklim.co
    - SUBDOMAINS=wildcard
    - VALIDATION=dns
    - DNSPLUGIN=godaddy
    - ONLY_SUBDOMAINS=false
    - EMAIL=muratozdemir@tarla.io
    - DNSPROPAGATION=90
  volumes:
    - swag-vl:/config
    - /opt/iklimco/swag/dns-conf:/config/dns-conf:ro
    - /opt/iklimco/swag/proxy-confs:/config/nginx/proxy-confs:ro
    - /opt/iklimco/swag/site-confs:/config/nginx/site-confs:ro
  ports:
    - target: 80
      published: 80
      protocol: tcp
      mode: host
    - target: 443
      published: 443
      protocol: tcp
      mode: host
  deploy:
    mode: replicated
    replicas: 1
    placement:
      constraints:
        - node.role == manager
    restart_policy:
      condition: on-failure
      delay: 5s
    labels:
      project: co.iklim
```
## Step 2 — Add `cert-reloader` service
Add after the `swag` service block:
```yaml
cert-reloader:
  image: docker:27-cli
  volumes:
    - swag-vl:/swag-config:ro
    - /opt/iklimco/ssl:/host-ssl
    - /var/run/docker.sock:/var/run/docker.sock
  entrypoint: ["/bin/sh", "-c"]
  command:
    - |
      CERT_DIR=/swag-config/etc/letsencrypt/live/iklim.co
      HOST_DIR=/host-ssl
      LAST_HASH=""
      echo "[cert-reloader] started"
      while true; do
        sleep 3600
        if [ -f "$$CERT_DIR/fullchain.pem" ]; then
          CURR=$$(md5sum "$$CERT_DIR/fullchain.pem" | cut -d' ' -f1)
          if [ "$$CURR" != "$$LAST_HASH" ]; then
            echo "[cert-reloader] cert changed — copying and reloading Vault"
            cp "$$CERT_DIR/fullchain.pem" "$$HOST_DIR/STAR.iklim.co.full.crt"
            cp "$$CERT_DIR/privkey.pem" "$$HOST_DIR/STAR.iklim.co_key.txt"
            docker service update --force iklimco_vault
            LAST_HASH="$$CURR"
            echo "[cert-reloader] done"
          fi
        fi
      done
  deploy:
    mode: replicated
    replicas: 1
    placement:
      constraints:
        - node.role == manager
    restart_policy:
      condition: on-failure
      delay: 10s
    labels:
      project: co.iklim
```
> `$$` is required in Docker Swarm YAML to escape `$` and prevent host-side variable expansion.
## Step 3 — Remove `vault` published port
Find the `vault` service `ports:` block and **delete it entirely**:
```yaml
# DELETE this entire block from vault service:
ports:
  - target: 8200
    published: 8200
    protocol: tcp
    mode: host
```
Vault remains reachable within `iklimco-net` via the overlay alias `vault.iklim.co:8200`.
The `VAULT_LOCAL_CONFIG` `api_addr` and `networks.default.aliases` entries stay unchanged.
## Step 4 — Remove `apisix` published ports
Find the `apisix` service `ports:` block and **delete it entirely**:
```yaml
# DELETE this entire block from apisix service:
ports:
  - target: 9080
    published: 8080
    protocol: tcp
    mode: host
  - target: 9443
    published: 8443
    protocol: tcp
    mode: host
  - target: 9180
    published: 9180
    protocol: tcp
    mode: host
```
APISIX admin API (9180) access: use `docker exec` or SSH tunnel.
APISIX is reachable from SWAG via `http://apisix:9080` on the overlay network.
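
For the admin API specifically, a `docker exec` one-liner can stand in for the removed published port. A sketch, assuming the admin key is exported as `$API_KEY` (the same variable `init/apisix-core/init.sh` uses) and that the image ships `curl`:
```bash
# hit the APISIX admin API from inside the container (no published port needed)
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
  curl -s http://127.0.0.1:9180/apisix/admin/routes -H "X-API-KEY: $API_KEY"
```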
## Step 5 — Remove `apisix-dashboard` published port
```yaml
# DELETE from apisix-dashboard:
ports:
  - target: 9000
    published: 9000
    protocol: tcp
    mode: host
```
## Step 6 — Remove `rabbitmq` published ports
```yaml
# DELETE from rabbitmq:
ports:
  - target: 5672
    published: 5672
    protocol: tcp
    mode: host
  - target: 15672
    published: 15672
    protocol: tcp
    mode: host
  - target: 61613
    published: 61613
    protocol: tcp
    mode: host
  - target: 15674
    published: 15674
    protocol: tcp
    mode: host
```
## Step 7 — Remove `prometheus` published port
```yaml
# DELETE from prometheus:
ports:
  - target: 9090
    published: 9090
    protocol: tcp
    mode: host
```
## Step 8 — Remove `grafana` published port
```yaml
# DELETE from grafana:
ports:
  - target: 3000
    published: 3000
    protocol: tcp
    mode: host
```
## Step 9 — Add `swag-vl` volume
In the `volumes:` section at the bottom of the file, add:
```yaml
swag-vl:
  labels:
    project: co.iklim
```
## Verification
After deploy:
```bash
docker service ls --filter label=project=co.iklim
```
Confirm `iklimco_swag` and `iklimco_cert-reloader` appear in the list.
```bash
docker service ps iklimco_swag
docker service ps iklimco_cert-reloader
```
Both should show `Running`.
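
To double-check that a given service really lost its published port (Vault being the most important one), the service endpoint can be inspected directly:
```bash
# should print "null" once the vault ports block is gone
docker service inspect iklimco_vault --format '{{json .Endpoint.Ports}}'
```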


@ -0,0 +1,193 @@
# 04 — SWAG Nginx Proxy Configs (Test)
## Context
SWAG reads nginx configs from bind-mounted directories:
- `/config/nginx/proxy-confs/` ← `swag/proxy-confs/` in the repo, deployed to `/opt/iklimco/swag/proxy-confs/`
- `/config/nginx/site-confs/` ← `swag/site-confs/` in the repo, deployed to `/opt/iklimco/swag/site-confs/`
Templates use `${VAR}` placeholders processed with `envsubst` at deploy time.
## Required env vars (in `.env` on storagebox `test/secrets/iklim.co/.env`)
```bash
API_SUBDOMAIN=api-test.iklim.co
APIGW_SUBDOMAIN=apigw-test.iklim.co
RABBITMQ_SUBDOMAIN=rabbitmq-test.iklim.co
GRAFANA_SUBDOMAIN=grafana-test.iklim.co
RESTRICTED_IP_1=78.187.87.109
RESTRICTED_IP_2=95.70.151.248
```
## Files to create
### `swag/site-confs/default.conf`
Default catch-all: HTTP→HTTPS redirect + 444 for unknown HTTPS hosts.
```nginx
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name _;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2 default_server;
    listen [::]:443 ssl http2 default_server;
    server_name _;
    include /config/nginx/ssl.conf;
    return 444;
}
```
### `swag/proxy-confs/api.conf.tpl`
Public API gateway — no IP restriction.
```nginx
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name ${API_SUBDOMAIN};

    include /config/nginx/ssl.conf;
    include /config/nginx/resolver.conf;

    client_max_body_size 50m;

    location / {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app apisix;
        set $upstream_port 9080;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
    }
}
```
### `swag/proxy-confs/apigw.conf.tpl`
APISIX Dashboard — IP restricted.
```nginx
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name ${APIGW_SUBDOMAIN};

    include /config/nginx/ssl.conf;
    include /config/nginx/resolver.conf;

    client_max_body_size 0;

    location / {
        allow ${RESTRICTED_IP_1};
        allow ${RESTRICTED_IP_2};
        deny all;
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app apisix-dashboard;
        set $upstream_port 9000;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
    }
}
```
### `swag/proxy-confs/rabbitmq.conf.tpl`
RabbitMQ Management UI — IP restricted.
```nginx
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name ${RABBITMQ_SUBDOMAIN};

    include /config/nginx/ssl.conf;
    include /config/nginx/resolver.conf;

    client_max_body_size 0;

    location / {
        allow ${RESTRICTED_IP_1};
        allow ${RESTRICTED_IP_2};
        deny all;
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app rabbitmq;
        set $upstream_port 15672;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
    }
}
```
### `swag/proxy-confs/grafana.conf.tpl`
Grafana — IP restricted.
```nginx
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name ${GRAFANA_SUBDOMAIN};

    include /config/nginx/ssl.conf;
    include /config/nginx/resolver.conf;

    client_max_body_size 0;

    location / {
        allow ${RESTRICTED_IP_1};
        allow ${RESTRICTED_IP_2};
        deny all;
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app grafana;
        set $upstream_port 3000;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
    }
}
```
## Deploy step (handled by pipeline — see `07-deploy-pipeline-update.md`)
```bash
# Process templates and write to host
mkdir -p /opt/iklimco/swag/proxy-confs /opt/iklimco/swag/site-confs
set -a; . ./.env; set +a
export RESTRICTED_IP_1="78.187.87.109"
export RESTRICTED_IP_2="95.70.151.248"
for tpl in swag/proxy-confs/*.conf.tpl; do
  out="/opt/iklimco/swag/proxy-confs/$(basename "${tpl%.tpl}")"
  # restrict envsubst to the template variables so nginx's own $variables survive
  envsubst '${API_SUBDOMAIN} ${APIGW_SUBDOMAIN} ${RABBITMQ_SUBDOMAIN} ${GRAFANA_SUBDOMAIN} ${RESTRICTED_IP_1} ${RESTRICTED_IP_2}' < "$tpl" > "$out"
  echo "✅ $out"
done
cp swag/site-confs/default.conf /opt/iklimco/swag/site-confs/default.conf
```
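
A quick post-deploy sanity check is to make sure no `${...}` placeholders survived `envsubst` (nginx's own `$variables` are brace-free, so they will not match):
```bash
# any hit here means a template variable was missing from the environment
grep -Rn '\${' /opt/iklimco/swag/proxy-confs/ \
  && echo "⚠️ unexpanded placeholders found" \
  || echo "✅ all placeholders expanded"
```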
## Verification
After deploy, check SWAG nginx config is valid:
```bash
docker exec $(docker ps -q -f name=iklimco_swag) nginx -t
```
Check subdomains resolve (from outside the server):
```bash
curl -sk https://api-test.iklim.co/health # expects APISIX response
curl -sk https://grafana-test.iklim.co # expects 403 Forbidden (wrong IP)
```
## Notes
- `include /config/nginx/resolver.conf` enables dynamic upstream resolution via Docker DNS —
required for overlay service names like `apisix`, `grafana`, etc.
- SWAG's `proxy.conf` already sets `X-Real-IP`, `X-Forwarded-For`, `X-Forwarded-Proto` and
WebSocket upgrade headers. No manual addition needed.
- `*.iklim.co` cert covers both `api.iklim.co` and `api-test.iklim.co` subdomains —
both test and prod servers can independently obtain and use it.


@ -0,0 +1,86 @@
# 05 — APISIX: Remove SSL / Configure Trusted Proxy (Test)
## Context
- **File:** `init/apisix-core/init.sh`
- SWAG now terminates TLS. APISIX receives plain HTTP from SWAG via the overlay network.
- The `ssls/1` cert upload is no longer needed.
- APISIX must trust SWAG's `X-Real-IP` header to see real client IPs (for rate limiting, fail2ban).
## Step 1 — Remove the SSL cert upload block from `init/apisix-core/init.sh`
Locate and **delete** this entire block:
```bash
# DELETE THIS BLOCK:
if [[ "$PROFILE" == "test" || "$PROFILE" == "prod" ]]; then
if [[ -f "STAR.iklim.co.full.crt" && -f "STAR.iklim.co_key.txt" ]]; then
call_api "ssl iklim.co" -X PUT "$APISIX_ADMIN_URL/ssls/1" \
-H "X-API-KEY: $API_KEY" -H "Content-Type: application/json" \
-d '{"cert":"'"$(cat STAR.iklim.co.full.crt)"'","key":"'"$(cat STAR.iklim.co_key.txt)"'","snis":["*.iklim.co"]}'
else
echo "iklim.co ssl certificates not found!"
fi
fi
```
Also delete the `dev` SSL block if it only serves the `ssls/1` endpoint:
```bash
# DELETE THIS BLOCK (if only used for cert upload):
if [[ "$PROFILE" == "dev" ]]; then
if [[ -f "localhost.crt" && -f "localhost.key" ]]; then
call_api "ssl dev" -X PUT "$APISIX_ADMIN_URL/ssls/1" \
-H "X-API-KEY: $API_KEY" -H "Content-Type: application/json" \
-d '{"cert":"'"$(cat localhost.crt)"'","key":"'"$(cat localhost.key)"'","snis":["localhost"]}'
else
echo "localhost ssl certificates not found!"
fi
fi
```
> If the `dev` block is still needed for local development, keep it but ensure it does not
> affect test/prod behavior.
## Step 2 — APISIX trusted proxy configuration (custom image)
APISIX's custom image (`registry.tarla.io/iklimco/custom-apisix:3.12.0`) includes a
`config.yaml`. That config must set real IP headers so APISIX sees real client IPs, not
SWAG's overlay IP.
Locate the APISIX `config.yaml` in the custom image build source and ensure it contains:
```yaml
nginx_config:
  http:
    real_ip_header: "X-Real-IP"
    real_ip_recursive: "on"
    set_real_ip_from:
      - "10.0.0.0/8"
      - "172.16.0.0/12"
      - "192.168.0.0/16"
```
Docker Swarm overlay networks use `10.x.x.x` addressing. These CIDR ranges cover all
typical overlay subnet allocations.
If the custom image config does not have these, add them and rebuild+push the image to Harbor
before deploying.
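
After deploying the rebuilt image, a quick way to confirm the directives made it into the generated nginx config is to grep it inside a running replica (the path below assumes the stock APISIX layout under `/usr/local/apisix`):
```bash
# the generated nginx.conf should contain the real_ip directives
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
  grep -E "real_ip_header|real_ip_recursive|set_real_ip_from" /usr/local/apisix/conf/nginx.conf
```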
## Step 3 — Remove APISIX TLS upstream configs (if any)
If any APISIX upstream in `init/apisix-core/init.sh` uses `scheme: https` pointing to
backend microservices, change to `scheme: http`. Backends are internal HTTP-only.
The `apisix:9443` HTTPS listener is gone; APISIX only listens on `9080` (HTTP).
## Verification
After deploy, confirm APISIX receives real client IPs:
```bash
# From a machine with known IP, make a request to api-test.iklim.co
# Then check APISIX access log
docker exec $(docker ps -q -f name=iklimco_apisix) \
tail -20 /usr/local/apisix/logs/access.log
```
The IP in the log should be the actual client IP, not SWAG's overlay IP (`10.x.x.x`).


@ -0,0 +1,79 @@
# 06 — cert-reloader Sidecar Service (Test)
## Context
- **Purpose:** Watches SWAG's certificate volume for changes; copies renewed certs to
`/opt/iklimco/ssl/` on the host; forces Vault to reload its TLS cert.
- **Replaces:** `ops/vault-reload-after-swag-renewal.sh` (which was designed for manual use).
The sidecar automates this after every SWAG renewal.
- **Runs on:** manager node (same node as SWAG and Vault, ensuring volume + socket access).
## How it works
```
SWAG renews cert
→ writes new fullchain.pem to swag-vl:/config/etc/letsencrypt/live/iklim.co/
cert-reloader wakes every 3600s
→ detects MD5 change on fullchain.pem
→ copies fullchain.pem + privkey.pem to /opt/iklimco/ssl/ (host bind mount)
→ docker service update --force iklimco_vault
Vault restarts
→ reads new cert from /opt/iklimco/ssl/ (already mounted as /vault/certs)
```
## Step 1 — Service definition (already in `03-infra-stack-changes.md`)
The `cert-reloader` service is added to `docker-stack-infra.yml` as documented in step 03.
No separate action needed here beyond that file change.
## Step 2 — Ensure `/opt/iklimco/ssl/` exists on the host
The `Prepare Init Files` step in the pipeline already creates this directory and copies
the initial cert. The cert-reloader handles subsequent renewals.
On first deploy, the bootstrap cert (copied during pipeline init) is used until SWAG
obtains its first Let's Encrypt cert (see `07-deploy-pipeline-update.md`).
## Step 3 — Verify cert-reloader is running
```bash
docker service ps iklimco_cert-reloader
docker service logs iklimco_cert-reloader --tail 20
```
Expected log on startup:
```
[cert-reloader] started
```
## Step 4 — Trigger a manual test (optional, for verification)
Force a cert copy and Vault reload without waiting for renewal:
```bash
SWAG_VOL=$(docker volume inspect iklimco_swag-vl --format '{{.Mountpoint}}')
CERT="$SWAG_VOL/etc/letsencrypt/live/iklim.co/fullchain.pem"
if [ -f "$CERT" ]; then
cp "$CERT" /opt/iklimco/ssl/STAR.iklim.co.full.crt
KEYF="$SWAG_VOL/etc/letsencrypt/live/iklim.co/privkey.pem"
cp "$KEYF" /opt/iklimco/ssl/STAR.iklim.co_key.txt
docker service update --force iklimco_vault
echo "✅ Manual reload triggered"
else
echo "⚠️ Cert not yet obtained by SWAG"
fi
```
## Notes
- Docker socket (`/var/run/docker.sock`) is mounted into cert-reloader — this is intentional
and necessary. The service is pinned to manager and is minimal (`docker:27-cli` image).
- cert-reloader checks every 3600s (1 hour). Let's Encrypt certs renew every ~60 days;
the 1-hour check window is more than sufficient.
- If Vault restarts (due to a cert reload), it may need to be **unsealed** again.
  Vault's healthcheck in `docker-stack-infra.yml` already handles auto-unseal via the
  `vault_unseal_key` Docker secret. Verify this after a cert reload (a quick check is sketched below).
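
A minimal post-reload check, assuming `VAULT_ADDR` is not already set inside the container:
```bash
# confirm Vault came back unsealed after the forced service update
docker exec -e VAULT_ADDR=https://127.0.0.1:8200 -e VAULT_SKIP_VERIFY=1 \
  $(docker ps -q -f name=iklimco_vault) vault status
```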
## Future — Multi-node Vault (prod)
When Vault runs as a 3-node Raft cluster on different physical machines,
cert-reloader must also SSH-copy the cert to the other nodes' `/opt/iklimco/ssl/`.
This is handled in `prod-env-setup/06-cert-reloader.md`.


@ -0,0 +1,151 @@
# 07 — Deploy Pipeline Update (Test)
## Context
- **File:** `.gitea/workflows/deploy-test.yml`
- Changes:
1. Remove manual `scp STAR.iklim.co.full.crt` steps (SWAG now owns cert lifecycle).
2. Add SWAG host directories preparation (dns-conf, nginx proxy-confs).
3. Add cert bootstrap step: on first deploy, wait for SWAG to obtain cert, then copy
to `/opt/iklimco/ssl/` so Vault can start.
4. Ensure `GODADDY_KEY` and `GODADDY_SECRET` are available from `.env.secrets.swag`.
## Step 1 — Update `Initialize Servers` step
**Remove** the two `scp` lines that copy the TLS cert files:
```yaml
# DELETE these two lines from the "Initialize Servers" step:
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:test/app/iklim.co/ssl/STAR.iklim.co.full.crt ./STAR.iklim.co.full.crt
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:test/app/iklim.co/ssl/STAR.iklim.co_key.txt ./STAR.iklim.co_key.txt
```
Also remove any references to `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.txt` in
the `Prepare Init Files` step's `sudo cp` commands:
```yaml
# DELETE or make conditional:
sudo cp STAR.iklim.co.full.crt STAR.iklim.co_key.txt /opt/iklimco/ssl/ 2>/dev/null || true
```
## Step 2 — Add `Prepare SWAG Directories` step
Insert this step **before** `Deploy Swarm Stack`:
```yaml
- name: Prepare SWAG Directories
  run: |
    set -a; . ./.env; . ./.env.secrets.swag; set +a
    # GoDaddy credentials file
    sudo mkdir -p /opt/iklimco/swag/dns-conf
    envsubst < swag/dns-conf/godaddy.ini.tpl | sudo tee /opt/iklimco/swag/dns-conf/godaddy.ini > /dev/null
    sudo chmod 600 /opt/iklimco/swag/dns-conf/godaddy.ini
    echo "✅ godaddy.ini written"
    # Nginx proxy conf files
    sudo mkdir -p /opt/iklimco/swag/proxy-confs /opt/iklimco/swag/site-confs
    export RESTRICTED_IP_1="78.187.87.109"
    export RESTRICTED_IP_2="95.70.151.248"
    for tpl in swag/proxy-confs/*.conf.tpl; do
      out="/opt/iklimco/swag/proxy-confs/$(basename "${tpl%.tpl}")"
      # limit envsubst to the template variables so nginx's own $variables are left intact
      envsubst '${API_SUBDOMAIN} ${APIGW_SUBDOMAIN} ${RABBITMQ_SUBDOMAIN} ${GRAFANA_SUBDOMAIN} ${RESTRICTED_IP_1} ${RESTRICTED_IP_2}' < "$tpl" | sudo tee "$out" > /dev/null
      echo "✅ $out"
    done
    sudo cp swag/site-confs/default.conf /opt/iklimco/swag/site-confs/default.conf
    echo "✅ SWAG directories ready"
  working-directory: /workspace/iklim.co
```
> `GODADDY_KEY` and `GODADDY_SECRET` must be present in `.env.secrets.swag` (see step 02).
> `API_SUBDOMAIN`, `APIGW_SUBDOMAIN`, etc. must be in `.env` (see step 04).
## Step 3 — Add `Bootstrap SWAG Certificate` step
Insert this step **after** `Deploy Swarm Stack` and **before** any step that depends on
Vault being accessible (e.g., `Provision Vault AppRole IDs`):
```yaml
- name: Bootstrap SWAG Certificate
  run: |
    echo "Waiting for SWAG container to start..."
    SWAG_CTR=""
    for i in $(seq 1 24); do
      SWAG_CTR=$(docker ps -q -f name=iklimco_swag 2>/dev/null | head -1)
      [ -n "$SWAG_CTR" ] && break
      sleep 10
    done
    if [ -z "$SWAG_CTR" ]; then
      echo "❌ SWAG container did not start in time"
      exit 1
    fi
    CERT_PATH="/config/etc/letsencrypt/live/iklim.co/fullchain.pem"
    echo "Waiting for SWAG to obtain Let's Encrypt cert (up to 10 min)..."
    for i in $(seq 1 20); do
      if docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
        echo "✅ Cert obtained by SWAG"
        break
      fi
      echo " attempt $i/20 — waiting 30s..."
      sleep 30
    done
    if ! docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
      echo "❌ SWAG did not obtain cert in time. Check logs:"
      docker service logs iklimco_swag --tail 50
      exit 1
    fi
    # Copy cert to host for Vault bootstrap
    sudo mkdir -p /opt/iklimco/ssl
    docker exec "$SWAG_CTR" cat "$CERT_PATH" | \
      sudo tee /opt/iklimco/ssl/STAR.iklim.co.full.crt > /dev/null
    docker exec "$SWAG_CTR" cat "/config/etc/letsencrypt/live/iklim.co/privkey.pem" | \
      sudo tee /opt/iklimco/ssl/STAR.iklim.co_key.txt > /dev/null
    echo "✅ Cert bootstrapped to /opt/iklimco/ssl/"
  working-directory: /workspace/iklim.co
```
> **First deploy only:** SWAG contacts Let's Encrypt via GoDaddy DNS challenge.
> This step waits up to 10 minutes. On subsequent deploys the cert is already in
> `swag-vl` (persisted volume) and SWAG starts immediately — wait loop exits fast.
## Step 4 — Re-order steps
Final step order in the pipeline:
1. Checkout Branch
2. Prepare Folders
3. Set up SSH Key
4. Update Apt / Install Tools
5. Fetch Service Secret Files
6. Initialize Servers
7. Upload Updated Secrets to Storagebox
8. Provision Vault AppRole IDs and Docker Secrets
9. Upload Updated Env to Storagebox
10. Prepare Init Files ← `sudo cp STAR.iklim.co.*.crt` lines removed
11. Initialize Docker Swarm
12. Stop Docker Compose Services
13. Docker Login to Harbor
14. **Prepare SWAG Directories** ← NEW
15. Deploy Swarm Stack
16. **Bootstrap SWAG Certificate** ← NEW
17. Review Environment
> Step 8 (Provision Vault) runs before SWAG because it creates Docker secrets and
> AppRole IDs — Vault must be reachable for this. On re-deploys, Vault is already
> running with the previous cert. On first deploy, step 16 handles the cert wait before
> any further Vault interaction is needed post-deploy.
>
> If Vault provisioning (step 8) fails on first deploy because Vault has no cert yet,
> move step 16 before step 8. Adjust based on observed behavior.
## Notes
- `.env` must contain the subdomain env vars added in step 04. Add them to storagebox
`test/secrets/iklim.co/.env` before the first deploy.
- `RESTRICTED_IP_1` and `RESTRICTED_IP_2` are hardcoded in the pipeline step above.
Move to `.env` if they change often.


@ -0,0 +1,125 @@
# 08 — Verification Checklist (Test)
## Context
Run these checks after a successful pipeline deployment to the test environment.
## 1 — Swarm services are up
```bash
docker service ls --filter label=project=co.iklim
```
All services should show `REPLICAS 1/1`.
```bash
docker service ps iklimco_swag
docker service ps iklimco_cert-reloader
docker service ps iklimco_vault
docker service ps iklimco_apisix
```
No tasks in `Failed` or `Rejected` state.
## 2 — SWAG obtained the cert
```bash
docker exec $(docker ps -q -f name=iklimco_swag) \
certbot certificates
```
Expected: certificate for `*.iklim.co`, `VALID: XX days`.
```bash
docker exec $(docker ps -q -f name=iklimco_swag) \
ls /config/etc/letsencrypt/live/iklim.co/
```
Expected: `fullchain.pem`, `privkey.pem`, `cert.pem`, `chain.pem`.
## 3 — Nginx config is valid
```bash
docker exec $(docker ps -q -f name=iklimco_swag) nginx -t
```
Expected: `syntax is ok` and `test is successful`.
## 4 — Public API endpoint
```bash
curl -si https://api-test.iklim.co/health
```
Expected: HTTP 2xx or APISIX response (not a cert error, not a 502).
TLS cert check:
```bash
echo | openssl s_client -connect api-test.iklim.co:443 -servername api-test.iklim.co 2>/dev/null \
| openssl x509 -noout -subject -dates
```
Expected: `subject=CN=*.iklim.co`, dates valid, `notAfter` > today.
## 5 — IP-restricted subdomains block non-whitelisted IPs
From a non-whitelisted IP:
```bash
curl -si https://grafana-test.iklim.co
```
Expected: HTTP 403.
From a whitelisted IP (78.187.87.109 or 95.70.151.248):
```bash
curl -si https://grafana-test.iklim.co
```
Expected: HTTP 200 (Grafana login page).
## 6 — Vault is reachable internally (not externally)
From outside the server:
```bash
curl -sk https://vault.iklim.co:8200/v1/sys/health
# or
curl -sk https://<server-public-ip>:8200/v1/sys/health
```
Expected: **connection refused** or **timeout** — Vault must not be reachable externally.
From inside the Swarm (exec into any service container):
```bash
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
curl -sk https://vault.iklim.co:8200/v1/sys/health
```
Expected: JSON response `{"sealed":false,...}`.
## 7 — cert-reloader is watching
```bash
docker service logs iklimco_cert-reloader --tail 10
```
Expected: `[cert-reloader] started` — no errors.
## 8 — Vault cert path is correct
```bash
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
docker exec "$VAULT_CTR" ls /vault/certs/
```
Expected: `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.txt`.
## 9 — fail2ban is active (SWAG)
```bash
docker exec $(docker ps -q -f name=iklimco_swag) \
fail2ban-client status
```
Expected: list of jails including `nginx-http-auth`, `nginx-botsearch`, etc.
## 10 — No services have published unexpected ports
```bash
docker service ls --format "{{.Name}}\t{{.Ports}}" \
--filter label=project=co.iklim
```
Only `iklimco_swag` should have published ports (`*:80->80`, `*:443->443`).
All other services should show empty ports column.

setup-vs-roadmap-map.md

@ -0,0 +1,55 @@
# Setup Phases — Roadmap Mapping Table
This table shows which Terraform/Ansible setup phase covers each roadmap step in the
`roadmap/test-env` and `roadmap/prod-env` folders.
## TEST environment
| Roadmap step | Where it is handled |
| --- | --- |
| Hetzner firewall (only 22/80/443) | **Terraform `01-test-terraform-iaac.md`** → `firewall.tf` |
| Server creation (`test-swarm-01`, `test-db-01`) | **Terraform `01-test-terraform-iaac.md`** → `servers.tf` |
| Private network + placement group | **Terraform `01-test-terraform-iaac.md`** → `network.tf`, `placement.tf` |
| Docker Engine installation | **Ansible `02-test-ansible-bootstrap.md`** → `docker` role |
| Security hardening (SSH, UFW, fail2ban) | **Ansible `02-test-ansible-bootstrap.md`** → `hardening` role |
| Docker Swarm init (`init/swarm-init.sh`) | **Ansible `02-test-ansible-bootstrap.md`** → `swarm` role (the pipeline script keeps running idempotently) |
| `type=service` node label | **Ansible `02-test-ansible-bootstrap.md`** → `swarm` role |
| `/opt/iklimco/...` directories | **Ansible `02-test-ansible-bootstrap.md`** → `node_dirs` role |
| `act_runner` systemd installation | **Ansible `03-test-runner-ve-deploy-onkosullari.md`** → `gitea_runner` role |
| Uploading GoDaddy credentials to the storagebox | **Remains manual** — secret management, outside Terraform/Ansible |
## PROD environment
| Roadmap step | Where it is handled |
| --- | --- |
| Creating 6 servers (3 swarm + 3 db) | **Terraform `04-prod-terraform-iaac.md`** → `servers.tf` |
| Private network + 2 placement groups | **Terraform `04-prod-terraform-iaac.md`** → `network.tf`, `placement.tf` |
| Firewall (only 22/80/443 public) | **Terraform `04-prod-terraform-iaac.md`** → `firewall.tf` |
| Docker Engine installation (`prod-swarm-*`) | **Ansible `05-prod-ansible-bootstrap.md`** → `docker` role |
| Security hardening (all nodes) | **Ansible `05-prod-ansible-bootstrap.md`** → `hardening` role |
| Swarm init (`prod-swarm-01`) | **Ansible `05-prod-ansible-bootstrap.md`** → `swarm` role |
| Manager join (`prod-swarm-02`, `prod-swarm-03`) | **Ansible `05-prod-ansible-bootstrap.md`** → `swarm` role |
| `type=service` node label (3 swarm nodes) | **Ansible `05-prod-ansible-bootstrap.md`** → `swarm` role |
| `/opt/iklimco/...` directories | **Ansible `05-prod-ansible-bootstrap.md`** → `node_dirs` role |
| 3× `act_runner` systemd (HA runners) | **Ansible `06-prod-runner-ha-ve-swarm.md`** → `gitea_runner` role |
| Uploading GoDaddy credentials to the storagebox | **Remains manual** — secret management, outside Terraform/Ansible |
| Joining DB nodes to Swarm | **Out of scope** — the DB cluster is managed separately |
## Klasör yapısı
```
Environment_Infrastructure/
setup/ ← Terraform + Ansible aşama dokümanları
00-genel-yol-haritasi.md
01-test-terraform-iaac.md
02-test-ansible-bootstrap.md
03-test-runner-ve-deploy-onkosullari.md
04-prod-terraform-iaac.md
05-prod-ansible-bootstrap.md
06-prod-runner-ha-ve-swarm.md
07-private-network-port-matrisi.md
roadmap/
test-env/ ← Test ortamı Roadmap adımları
prod-env/ ← Prod Roadmap adımları
setup-vs-technical-debt-map.md ← Bu dosya
```

# 00 - General Roadmap
This file is the main context for agents that will build the test/prod infrastructure on Hetzner Cloud with Terraform and Ansible in the `Environment_Infrastructure` repo. Each stage file is written to be usable on its own; this document is still the overall decision record.
## Goal
The iklim.co infrastructure will be built on two separate Hetzner Cloud Projects:
- `test` Hetzner Cloud Project
- `prod` Hetzner Cloud Project
This separation is treated as mandatory. API tokens, networks, firewalls, placement groups, servers, cost, and the risk of accidental deletion stay isolated per environment.
## Terraform and Ansible Responsibility Boundary
Terraform creates only the IaaS resources:
- Hetzner Cloud servers
- Private network and subnets
- Firewall
- SSH key
- Placement groups
- Optional volumes, floating IPs, load balancers, or DNS records
- Ansible inventory output
Ansible prepares the resulting Linux machines:
- Linux base packages
- Security hardening
- Docker Engine installation
- Docker Swarm init/join
- Gitea Actions `act_runner` systemd installation
- Shared directories and deploy prerequisites
No Docker, Swarm, runner, or application deployment is done inside Terraform. No Hetzner Cloud resources are created inside Ansible.
## Environment Topologies
### Test
The test environment uses the minimum topology:
| Node | Role | Note |
| --- | --- | --- |
| `test-swarm-01` | Swarm manager + app worker + Gitea runner | CI/CD test deploys run through this node |
| `test-db-01` | DB node | The DB stack is installed manually; it is not installed via Gitea CI/CD |
Terraform/Ansible take the test DB setup only as far as machine and OS preparation. Installing the PostgreSQL/MongoDB cluster is outside this stage.
### Prod
The prod environment uses an HA topology:
| Node group | Count | Role |
| --- | ---: | --- |
| `prod-swarm-*` | 3 | Each is a Swarm manager + app worker |
| `prod-db-*` | 3 | DB cluster nodes |
The prod DB stack is installed manually; it is not installed via Gitea CI/CD. Terraform prepares the DB machines and the network/firewall rules, Ansible applies OS hardening and base dependencies.
## Public Port Policy
The only ports open to the public internet are:
- `22/tcp` SSH, only from admin IP/CIDR sources
- `80/tcp` HTTP
- `443/tcp` HTTPS
Vault `8200/tcp` will not be exposed to the public internet. Vault must be reachable only from the private network or from inside the Docker overlay network.
Some services in the existing application stack files may publish host ports. Because the Hetzner Cloud firewall blocks public ingress, these ports must remain unreachable from the public internet. In the long run, however, the stack manifests should also be simplified to comply with this policy.
## Private Network Policy
The detailed matrix of ports that must be open inside the private network is in `07-private-network-port-matrisi.md`. Agents must treat that file as the source of truth when writing firewall or Ansible UFW rules.
## Gitea Actions Runner Decision
`act_runner` will not run as a Docker container, and the Docker socket will not be mounted into a container.
Preferred setup:
- `act_runner` is installed as a Linux systemd service.
- A dedicated `gitea-runner` user is created for the runner.
- CI/CD jobs may create containers when needed; this requires Docker CLI/daemon access on the runner host.
- Because docker group membership grants near-root privileges, only trusted Gitea repos/jobs should use these runner labels.
For prod HA, `act_runner` is installed on all 3 Swarm manager nodes, not on a single machine. If one manager/runner is lost, the pipelines can keep running. Runner labels must be both shared and node-specific:
- Shared: `prod-runner`
- Node-specific: `prod-swarm-01`, `prod-swarm-02`, `prod-swarm-03`
A single runner is enough for test:
- Shared: `test-runner`
- Node-specific: `test-swarm-01`
## Deploy Lock Decision
In prod, 3 runners are required for HA; however, they can run more than one deploy job at the same time. Prod deploys must therefore be serialized with an automatic lock on the StorageBox.
Lock files/directories are never created manually. The lock is taken with an atomic `mkdir` at the start of the workflow and released with `rmdir` when the deploy finishes.
Suggested StorageBox paths:
```text
prod/locks/prod-deploy.lock
prod/locks/prod-infra.lock
prod/locks/services/<service-name>.lock
```
The simplest and safest model to start with is a single global prod deploy lock:
```text
prod/locks/prod-deploy.lock
```
This model serializes all prod deploys. A per-service lock model can be adopted later if needed.
Example flow:
```bash
# take the lock; the atomic mkdir fails if the lock already exists
ssh storagebox 'mkdir -p prod/locks && mkdir prod/locks/prod-deploy.lock'
# deploy steps
ssh storagebox 'rmdir prod/locks/prod-deploy.lock'
```
Because `mkdir` is atomic, the command fails if the lock already exists; in that case the job must wait or exit with a clean error. Even if the workflow fails, the cleanup step must still try to remove the lock. To detect stale locks, a timestamp, the runner name, and workflow information can be written inside the lock directory.
## Hetzner Physical Host Separation
Hetzner Cloud does not offer direct cabinet selection. To avoid landing on the same physical host, a `Placement Group` is used. A `spread` placement group aims to place the cloud servers in the group on different physical hosts.
Constraints:
- A spread placement group reduces the impact of a single physical host failure.
- It gives no guarantee against a wider failure within the same datacenter or location.
- For location-level disaster recovery, a multi-location/region distribution must be designed later.
- According to the Hetzner documentation, a spread placement group is limited to at most 10 servers.
At least two placement groups are recommended for prod:
- `prod-swarm-spread`: 3 Swarm manager/app nodes
- `prod-db-spread`: 3 DB nodes
Optional for test:
- `test-spread`: `test-swarm-01` and `test-db-01`
Sources:
- Hetzner Terraform provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest
- Hetzner Networks: https://docs.hetzner.com/cloud/networks/overview/
- Hetzner Firewalls: https://docs.hetzner.com/cloud/firewalls/overview
- Hetzner Placement Groups: https://docs.hetzner.com/cloud/placement-groups/overview
- Docker Swarm overlay ports: https://docs.docker.com/engine/network/drivers/overlay/
- Gitea act_runner: https://docs.gitea.com/usage/actions/act-runner

# 01 - Test Terraform IaC
The goal of this stage is to create the minimum IaaS resources inside the test Hetzner Cloud Project with Terraform. This document is written so that it can be applied on its own.
## Scope
In the test environment Terraform creates:
- Private network: `iklim-test-net`
- Subnets:
  - App/Swarm subnet: `10.10.10.0/24`
  - DB subnet: `10.10.20.0/24`
- Firewall:
  - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
  - Private ingress: the test rules in `07-private-network-port-matrisi.md`
- SSH key
- Placement group: `test-spread`
- Servers:
  - `test-swarm-01`
  - `test-db-01`
- Ansible inventory output
Terraform does not install any DB software. The DB node is prepared only at the machine, network, and firewall level.
## Suggested File Layout
```text
terraform/
  hetzner/
    test/
      versions.tf
      providers.tf
      variables.tf
      locals.tf
      network.tf
      firewall.tf
      placement.tf
      servers.tf
      outputs.tf
      terraform.tfvars.example
```
`terraform.tfvars` is never committed. It must be ignored via `.gitignore`.
## Variables
Minimum variables:
```hcl
hcloud_token = "secret"
environment = "test"
location = "fsn1"
image = "ubuntu-24.04"
server_type_swarm = "cx32"
server_type_db = "cx42"
admin_ssh_public_key_path = "~/.ssh/id_ed25519.pub"
admin_allowed_cidrs = ["X.X.X.X/32"]
```
Start with a single `location`. Multi-region/location disaster recovery is out of scope at this stage; it should be added to the document later.
## Server Roles
| Server | Private IP | Role |
| --- | --- | --- |
| `test-swarm-01` | `10.10.10.11` | Swarm manager + app worker + Gitea runner |
| `test-db-01` | `10.10.20.11` | DB node prepared for manual DB installation |
Private IPs must be defined statically in Terraform so that the Ansible inventory and the firewall rules stay deterministic.
## Firewall Rules
Public ingress:
| Port | Source | Target |
| --- | --- | --- |
| `22/tcp` | `admin_allowed_cidrs` | All test nodes |
| `80/tcp` | `0.0.0.0/0`, `::/0` | `test-swarm-01` |
| `443/tcp` | `0.0.0.0/0`, `::/0` | `test-swarm-01` |
`8200/tcp`, `5432/tcp`, `27017/tcp`, `5672/tcp`, `15672/tcp`, `6379/tcp`, `2379/tcp`, `9180/tcp`, `9090/tcp`, and `3000/tcp` are not opened for public ingress.
For private ingress, `07-private-network-port-matrisi.md` is the source of truth.
## Placement Group
The `test-spread` placement group uses `type = "spread"`. Since test has two servers, this group aims to place `test-swarm-01` and `test-db-01` on different physical hosts.
Note: a spread placement group is not a guarantee of different cabinets or locations; it reduces the impact of a single physical host failure.
## Expected Terraform Outputs
`outputs.tf` must produce at least the following:
```hcl
output "ansible_inventory_yaml" {
  sensitive = false
}
output "test_private_ips" {
  sensitive = false
}
output "test_public_ips" {
  sensitive = false
}
```
The inventory output can later be written to `ansible/inventory/generated/test.yml`. If the inventory file contains no secrets it may be committed; if it contains secrets or tokens it must not be committed.
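As an illustration only, a minimal sketch of that export step, assuming the output names above and the `generated/test.yml` path (relative paths depend on the actual repo layout):

```bash
#!/usr/bin/env bash
# Render the Ansible inventory from Terraform outputs (run inside terraform/hetzner/test/).
set -euo pipefail

terraform output -raw ansible_inventory_yaml > ../../../ansible/inventory/generated/test.yml

# Optional sanity check, if Ansible is available locally, before using the file.
ansible-inventory -i ../../../ansible/inventory/generated/test.yml --list > /dev/null \
  && echo "inventory OK"
```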
## Acceptance Criteria
- `terraform plan` runs with the test Hetzner Project token only.
- `terraform apply` creates 2 servers.
- The two servers can reach each other over the private network.
- Only `22`, `80`, `443` are open from the public internet at the firewall level.
- Vault `8200` stays closed to the public.
- Terraform state is not committed to the repo.

# 02 - Test Ansible Bootstrap
The goal of this stage is to get the test machines created by Terraform ready in terms of Linux, hardening, Docker, and Swarm. Installing DB software is outside this stage.
## Target Machines
| Host | Role |
| --- | --- |
| `test-swarm-01` | Swarm manager + app worker |
| `test-db-01` | OS-hardened DB node prepared for manual DB installation |
## Suggested File Layout
```text
ansible/
  ansible.cfg
  inventory/
    generated/
      test.yml
  group_vars/
    all.yml
    test.yml
  playbooks/
    test-bootstrap.yml
  roles/
    base/
    hardening/
    docker/
    swarm/
    node_dirs/
```
## Base Role
Applied to all test nodes:
- `apt update`
- base packages:
  - `curl`
  - `wget`
  - `git`
  - `jq`
  - `unzip`
  - `ca-certificates`
  - `gnupg`
  - `lsb-release`
  - `ufw`
  - `fail2ban`
  - `chrony`
  - `python3`
  - `python3-pip`
- timezone: `Europe/Istanbul`
- hostname configuration
- controlled reboot if the system requires one
## Security Hardening Role
Applied to all test nodes; a host-level sketch follows below:
- SSH password login is disabled.
- Root SSH login is disabled.
- Only SSH key login remains.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- The `fail2ban` SSH jail is enabled.
- `unattended-upgrades` is enabled.
- UFW defaults:
  - incoming: deny
  - outgoing: allow
- Public SSH is opened only from the admin CIDR.
Note: Docker's iptables rules can interact with UFW. The Hetzner Cloud firewall is treated as the primary outer perimeter; UFW is used as a second layer on the host.
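For illustration only, a minimal sketch (in shell form, not Ansible tasks) of the host state this role is expected to converge to; the admin CIDR value is a placeholder:

```bash
#!/usr/bin/env bash
# Hardening sketch for a test node: key-only SSH and a deny-by-default UFW posture.
set -euo pipefail
ADMIN_CIDR="X.X.X.X/32"   # placeholder, use the real admin IP/CIDR

# sshd: key-only auth, no root login, conservative limits
install -m 0644 /dev/stdin /etc/ssh/sshd_config.d/99-hardening.conf <<EOF
PasswordAuthentication no
PermitRootLogin no
PermitEmptyPasswords no
MaxAuthTries 3
EOF
systemctl reload ssh

# UFW: deny incoming, allow outgoing, SSH only from the admin CIDR
ufw default deny incoming
ufw default allow outgoing
ufw allow from "$ADMIN_CIDR" to any port 22 proto tcp
ufw --force enable
```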
## Docker Role
Required only on `test-swarm-01`. On `test-db-01` it can be kept optional, depending on the manual DB installation strategy.
Docker is installed from the official Docker apt repository:
- Docker GPG key
- Docker apt source
- packages:
  - `docker-ce`
  - `docker-ce-cli`
  - `containerd.io`
  - `docker-buildx-plugin`
  - `docker-compose-plugin`
- Docker service enabled + started
The Docker convenience script is not used. The package repository path is preferred for a production-like test environment.
## Swarm Role
On `test-swarm-01` (a shell sketch of these steps follows the list):
- `docker swarm init`
  - advertise addr: `10.10.10.11`
  - data path addr: `10.10.10.11`
- overlay network:
  - `iklimco-net`
  - driver: `overlay`
  - attachable: `true`
- The node is labeled with `type=service`:
```bash
docker node update --label-add type=service test-swarm-01
```
- The node stays at `AVAILABILITY=Active` (it is not drained); the single node is both manager and worker.
Since test is a single-node Swarm, no join token is used.
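A minimal sketch of what the role boils down to, assuming the IP plan above; the real role should stay idempotent the same way `init/swarm-init.sh` does:

```bash
#!/usr/bin/env bash
# Swarm role sketch for test-swarm-01: init the single-node Swarm, create the overlay
# network, and label the node. Safe to re-run.
set -euo pipefail
NODE_IP="10.10.10.11"

if [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" != "active" ]; then
  docker swarm init --advertise-addr "$NODE_IP" --data-path-addr "$NODE_IP"
fi

# Attachable overlay network shared by the application stacks
docker network inspect iklimco-net >/dev/null 2>&1 || \
  docker network create --driver overlay --attachable iklimco-net

docker node update --label-add type=service test-swarm-01
```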
## Node Directory Role
Deploy prerequisites on `test-swarm-01`:
```text
/opt/iklimco
/opt/iklimco/ssl
/opt/iklimco/init
/opt/iklimco/init/postgresql
/opt/iklimco/init/mongodb
```
Minimum directories on the DB node for the manual DB installation:
```text
/opt/iklimco
/opt/iklimco/db
/opt/iklimco/backup
```
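The same layout, condensed into shell form as a sketch; the role itself would use Ansible `file` tasks, and ownership/modes are left to the role:

```bash
# On test-swarm-01
mkdir -p /opt/iklimco/{ssl,init/postgresql,init/mongodb}
# On test-db-01
mkdir -p /opt/iklimco/{db,backup}
```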
## Acceptance Criteria
- `ansible -i inventory/generated/test.yml all -m ping` succeeds.
- `docker info` works on `test-swarm-01`.
- Swarm is active on `test-swarm-01`; the node is `AVAILABILITY=Active` (not drained).
- `iklimco-net` appears in `docker network ls`.
- The output of `docker node inspect test-swarm-01 --format '{{.Spec.Labels}}'` contains `map[type:service]`.
- No public DB port is open on `test-db-01`.
- Public ports are limited to `22`, `80`, `443` at the Hetzner firewall + UFW level.

# 03 - Test Runner and Deploy Prerequisites
The goal of this stage is to install the Gitea Actions runner as a systemd service in the test environment and to prepare the host prerequisites the existing test CI/CD pipelines need in order to run.
## Runner Placement
A single runner is enough in the test environment:
| Host | Runner |
| --- | --- |
| `test-swarm-01` | `act_runner` systemd service |
The runner will not run as a Docker container. `/var/run/docker.sock` will not be mounted into a runner container.
## Why a Systemd Runner
In the current CI/CD flow, the Gitea jobs do their own preparation and then run the deploy commands. Running the runner as a Docker container would require mounting the Docker socket, and that model adds extra privilege risk. The systemd runner model has no socket mount; however, since the runner uses Docker on the host, the runner user's Docker access is still treated as high privilege.
Therefore:
- The runner is used only for the trusted Gitea instance/repos.
- The runner token is stored in Ansible Vault or as a CI secret.
- The runner config and token are never committed to the repo.
## Runner User
The Ansible `gitea_runner` role:
- creates the `gitea-runner` system user.
- the user's shell can be `/bin/bash` if needed.
- the user is added to the `docker` group if it will use Docker.
- home directory: `/var/lib/gitea-runner`
- config directory: `/etc/gitea-act-runner`
Because docker group membership grants near-root privileges, this is accepted as a deliberate decision.
## Runner Binary Installation
Installation steps (a hedged sketch follows the list):
1. Download the `act_runner` Linux amd64 binary.
2. Place it as `/usr/local/bin/act_runner`.
3. Make it executable.
4. Generate the config or write it from a template.
5. Register the runner.
6. Enable + start the systemd unit.
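A minimal sketch of steps 4-6, assuming the paths and labels defined in this document; the instance URL and token handling are placeholders, and the exact `act_runner` flags should be double-checked against the Gitea act_runner documentation linked in `00-genel-yol-haritasi.md`:

```bash
#!/usr/bin/env bash
# act_runner setup sketch (run as root on test-swarm-01). URL and token are placeholders.
set -euo pipefail
GITEA_URL="https://gitea.example.com"                   # placeholder
RUNNER_TOKEN="$(cat /root/runner-registration-token)"   # never commit this file

# 4. Generate a default config and adjust it afterwards
mkdir -p /etc/gitea-act-runner
act_runner generate-config > /etc/gitea-act-runner/config.yaml

# 5. Register against the Gitea instance with the test labels
cd /var/lib/gitea-runner
sudo -u gitea-runner act_runner register \
  --no-interactive \
  --instance "$GITEA_URL" \
  --token "$RUNNER_TOKEN" \
  --name test-swarm-01 \
  --labels "test-runner,test-swarm-01,ubuntu-24.04,docker,swarm-manager" \
  --config /etc/gitea-act-runner/config.yaml

# 6. Enable + start the unit (the unit file itself is written by the Ansible role)
systemctl enable --now gitea-act-runner.service
```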
## Runner Label Policy
Test runner labels:
```text
test-runner
test-swarm-01
ubuntu-24.04
docker
swarm-manager
```
In the existing workflows, the `runs-on` value for test must be aligned with one of these labels. The old `ubuntu-latest` value may not be enough to match a self-hosted Gitea runner; this must be clarified against the Gitea Actions label configuration.
## Deploy Prerequisites
Required on `test-swarm-01` for the test deploy pipelines:
- Docker Engine
- Docker Compose plugin
- Git
- curl
- jq
- gettext/envsubst
- tree
- ssh/scp client
- Harbor registry access
- StorageBox access
- Access to the Gitea repo
- Swarm manager privileges
- `iklimco-net` overlay network
CI/CD will not set up the DB infrastructure. The test DB node will be ready; the DB software and the cluster/manual setup are separate.
## Deploy Lock Note
Since the test environment has a single runner, no deploy race between runners is expected.
Still, back-to-back pushes to the same branch or manual re-runs can cause deploys of the same service to overlap.
A lock is not mandatory for test; if the same habit as prod is desired, the following StorageBox paths can be used:
```text
test/locks/test-deploy.lock
test/locks/services/<service-name>.lock
```
The lock is never created manually. The workflow takes it with an atomic `mkdir` at the start and releases it with `rmdir` at the end.
## Secret Requirements
Secrets needed for the runner installation and the pipelines:
- Gitea runner registration token
- Harbor username/password or token
- StorageBox credentials
- SSH deploy key
- No Hetzner token is needed; it is used only in the Terraform stage
These secrets are never written into the repo.
## Acceptance Criteria
- `systemctl status gitea-act-runner` shows active.
- The test runner shows as online in the Gitea UI.
- The runner labels match the test workflows' `runs-on`.
- A simple test workflow runs on the runner.
- If a runner job can run Docker commands, the deploy prerequisite is met.
- `8200/tcp` is not open to the public internet.

# 04 - Prod Terraform IaC
The goal of this stage is to create the HA-oriented IaaS resources inside the prod Hetzner Cloud Project with Terraform. This document can be handed to the prod Terraform agent on its own.
## Scope
In the prod environment Terraform creates:
- Private network: `iklim-prod-net`
- Subnets:
  - App/Swarm subnet: `10.20.10.0/24`
  - DB subnet: `10.20.20.0/24`
- Firewall:
  - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
  - Private ingress: the prod rules in `07-private-network-port-matrisi.md`
- SSH key
- Placement groups:
  - `prod-swarm-spread`
  - `prod-db-spread`
- Servers:
  - `prod-swarm-01`
  - `prod-swarm-02`
  - `prod-swarm-03`
  - `prod-db-01`
  - `prod-db-02`
  - `prod-db-03`
- Ansible inventory output
The DB cluster software is not installed with Terraform. The DB nodes are prepared only at the machine, network, and firewall level.
## Suggested File Layout
```text
terraform/
  hetzner/
    prod/
      versions.tf
      providers.tf
      variables.tf
      locals.tf
      network.tf
      firewall.tf
      placement.tf
      servers.tf
      outputs.tf
      terraform.tfvars.example
```
`terraform.tfvars`, state files, and tokens are never committed to the repo.
## Variables
Minimum variables:
```hcl
hcloud_token = "secret"
environment = "prod"
location = "fsn1"
image = "ubuntu-24.04"
server_type_swarm = "cx42"
server_type_db = "cx52"
admin_ssh_public_key_path = "~/.ssh/id_ed25519.pub"
admin_allowed_cidrs = ["X.X.X.X/32"]
```
Server types may change with capacity needs. This document defines the topology and security decisions; sizing can be revised later.
## Server Roles and Private IP Plan
| Server | Private IP | Role |
| --- | --- | --- |
| `prod-swarm-01` | `10.20.10.11` | Swarm manager + app worker + runner |
| `prod-swarm-02` | `10.20.10.12` | Swarm manager + app worker + runner |
| `prod-swarm-03` | `10.20.10.13` | Swarm manager + app worker + runner |
| `prod-db-01` | `10.20.20.11` | Manual DB cluster node |
| `prod-db-02` | `10.20.20.12` | Manual DB cluster node |
| `prod-db-03` | `10.20.20.13` | Manual DB cluster node |
Private IPs must be defined statically.
## Placement Group Decision
Two separate spread placement groups for prod:
```text
prod-swarm-spread: prod-swarm-01/02/03
prod-db-spread:    prod-db-01/02/03
```
This way the Swarm quorum nodes are spread across different physical hosts among themselves, and the DB nodes are spread across different physical hosts among themselves.
Notes:
- Hetzner does not offer direct cabinet selection.
- A spread placement group targets different physical hosts.
- Multi-location/region disaster recovery is out of scope at this stage.
- When scale grows, multi-location DR must be designed separately.
## Public Firewall
Public ingress:
| Port | Source | Target |
| --- | --- | --- |
| `22/tcp` | `admin_allowed_cidrs` | All prod nodes |
| `80/tcp` | `0.0.0.0/0`, `::/0` | Prod gateway entrypoint |
| `443/tcp` | `0.0.0.0/0`, `::/0` | Prod gateway entrypoint |
The following ports are not opened publicly in prod:
- `8200/tcp` Vault
- `5432/tcp` PostgreSQL
- `27017/tcp` MongoDB
- `6379/tcp` Redis
- `5672/tcp`, `15672/tcp`, `61613/tcp`, `15674/tcp` RabbitMQ
- `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` Docker Swarm
- `9180/tcp` APISIX Admin API
- `9090/tcp` Prometheus
- `3000/tcp` Grafana
If these services are needed, they can be reached over the private network, a VPN, a bastion, or an extra rule restricted to the admin CIDR. The default public policy stays closed. A small external spot-check sketch follows.
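As a quick, hedged verification from a machine outside the admin CIDR (the public IP is a placeholder), the policy above can be spot-checked with `nc`:

```bash
#!/usr/bin/env bash
# Spot-check public exposure from OUTSIDE the admin CIDR. PUBLIC_IP is a placeholder.
PUBLIC_IP="203.0.113.10"

for port in 80 443; do
  nc -z -w3 "$PUBLIC_IP" "$port" && echo "open (expected): $port" || echo "CLOSED (unexpected): $port"
done

# 22 should also appear closed here, because this host is not in admin_allowed_cidrs
for port in 22 8200 5432 27017 6379 9090 3000 9180; do
  nc -z -w3 "$PUBLIC_IP" "$port" && echo "OPEN (unexpected): $port" || echo "closed (expected): $port"
done
```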
## Acceptance Criteria
- `terraform plan` runs with the prod Hetzner Project token only.
- 6 servers are created.
- The Swarm nodes are in the `prod-swarm-spread` placement group.
- The DB nodes are in the `prod-db-spread` placement group.
- The public firewall allows ingress only on `22`, `80`, `443`.
- The private firewall matches `07-private-network-port-matrisi.md`.
- Terraform state and secret tfvars are not committed.

# 05 - Prod Ansible Bootstrap
The goal of this stage is to get the prod machines created by Terraform ready in terms of Linux, security hardening, Docker, and Swarm. The DB cluster software is installed manually; on the DB nodes this playbook does only the OS and base security preparation.
## Target Machines
| Host | Role |
| --- | --- |
| `prod-swarm-01` | Swarm manager + app worker |
| `prod-swarm-02` | Swarm manager + app worker |
| `prod-swarm-03` | Swarm manager + app worker |
| `prod-db-01` | Manual DB cluster node |
| `prod-db-02` | Manual DB cluster node |
| `prod-db-03` | Manual DB cluster node |
## Suggested File Layout
```text
ansible/
  ansible.cfg
  inventory/
    generated/
      prod.yml
  group_vars/
    all.yml
    prod.yml
  playbooks/
    prod-bootstrap.yml
  roles/
    base/
    hardening/
    docker/
    swarm/
    node_dirs/
```
## Base Role
Applied to all prod nodes:
- Package cache update
- Base packages:
  - `curl`
  - `wget`
  - `git`
  - `jq`
  - `unzip`
  - `ca-certificates`
  - `gnupg`
  - `lsb-release`
  - `ufw`
  - `fail2ban`
  - `chrony`
  - `python3`
  - `python3-pip`
- timezone: `Europe/Istanbul`
- hostname configuration
- chrony/NTP active
## Security Hardening Role
Applied to all prod nodes:
- SSH password auth is disabled.
- Root SSH login is disabled.
- Only SSH key auth remains.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- `fail2ban` is enabled.
- `unattended-upgrades` is enabled.
- UFW defaults: incoming deny, outgoing allow.
- SSH is opened only from the admin CIDR.
- DB ports are not opened publicly.
The Hetzner Cloud Firewall is treated as the primary perimeter. UFW is the second defensive layer on the host.
## Docker Role
Required only on the `prod-swarm-*` nodes.
Packages to install:
- `docker-ce`
- `docker-ce-cli`
- `containerd.io`
- `docker-buildx-plugin`
- `docker-compose-plugin`
Installation uses the official Docker apt repository. The convenience script is not used.
Docker is not mandatory on the DB nodes. If the manual DB installation strategy turns out to be container based, that should be covered later in a separate DB document.
## Swarm Role
The prod Swarm is set up with 3 managers (a shell sketch of the full sequence follows this section):
1. `docker swarm init` on `prod-swarm-01`
2. Advertise/data path addr: `10.20.10.11`
3. Obtain the manager join token.
4. `prod-swarm-02` and `prod-swarm-03` join as managers.
5. Create the overlay network:
   - `iklimco-net`
   - driver: `overlay`
   - attachable: `true`
6. All 3 nodes are labeled with `type=service`:
```bash
for node in prod-swarm-01 prod-swarm-02 prod-swarm-03; do
  docker node update --label-add type=service "$node"
done
```
7. No node is drained. All 3 nodes stay at `AVAILABILITY=Active`; they run as both managers and app workers.
> The DB nodes (`prod-db-*`) are not joined to Swarm. The DB cluster is managed separately.
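Condensed into shell form, a minimal sketch of steps 1-5, assuming the IP plan above; the Ansible role would split this across hosts and transport the join token securely between them:

```bash
#!/usr/bin/env bash
# Prod Swarm bootstrap sketch. Run the init/network part on prod-swarm-01 and the
# join part on prod-swarm-02/03.
set -euo pipefail

# --- on prod-swarm-01 ---
docker swarm init --advertise-addr 10.20.10.11 --data-path-addr 10.20.10.11
MANAGER_TOKEN=$(docker swarm join-token -q manager)
docker network create --driver overlay --attachable iklimco-net

# --- on prod-swarm-02 and prod-swarm-03 (token delivered by the role, not copy-pasted) ---
docker swarm join --token "$MANAGER_TOKEN" 10.20.10.11:2377
```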
## Node Directory Role
On all `prod-swarm-*` nodes:
```text
/opt/iklimco
/opt/iklimco/ssl
/opt/iklimco/init
/opt/iklimco/init/postgresql
/opt/iklimco/init/mongodb
```
On the DB nodes, for the manual DB installation:
```text
/opt/iklimco
/opt/iklimco/db
/opt/iklimco/backup
```
## Acceptance Criteria
- `ansible -i inventory/generated/prod.yml all -m ping` succeeds.
- The 3 Swarm nodes show up as managers in `docker node ls`; all are `AVAILABILITY=Active`.
- Manager quorum is in place (3 managers, 1 loss tolerated).
- The `iklimco-net` overlay network exists.
- The output of `docker node inspect prod-swarm-01 --format '{{.Spec.Labels}}'` contains `map[type:service]`.
- The DB nodes do not appear in the `docker node ls` output.
- The public firewall allows ingress only on `22`, `80`, `443`.
- The DB nodes do not expose any public DB port.
- This playbook does not install any DB software.

# 06 - Prod Runner HA and Swarm Deploy Model
The goal of this stage is to set up the Gitea Actions runners for HA in prod and to define the prerequisites for distributing services across the 3 Swarm nodes.
## Runner Count
A single runner is functionally enough, but it is not HA. Since the prod target is HA, `act_runner` is installed as a systemd service on all 3 Swarm manager nodes:
| Host | Runner |
| --- | --- |
| `prod-swarm-01` | `act_runner` systemd |
| `prod-swarm-02` | `act_runner` systemd |
| `prod-swarm-03` | `act_runner` systemd |
In this model, if any manager/runner is lost, the remaining runners can pick up the pipeline jobs.
## Runner Installation Model
The runner does not run as a Docker container. There is no Docker socket mount.
Installation:
- `gitea-runner` system user
- `/usr/local/bin/act_runner`
- `/etc/gitea-act-runner/config.yaml`
- `/var/lib/gitea-runner`
- `gitea-act-runner.service`
If the runner jobs use the Docker CLI for deploys, the `gitea-runner` user needs access to the Docker daemon. Docker group membership is treated as near-root privilege; only trusted repos/jobs should use these runner labels.
## Runner Label Policy
Shared labels on all prod runners:
```text
prod-runner
docker
swarm-manager
ubuntu-24.04
```
Node-specific labels:
```text
prod-swarm-01
prod-swarm-02
prod-swarm-03
```
If the existing prod workflows use `runs-on: prod-runner`, any of the 3 runners can take the job. When a job must be pinned to a specific node, the node-specific label is used.
## Deploy Race Risk
With more than one runner, more than one deploy job can run at the same time. That is good for HA, but it creates a race risk on shared resources.
Risky areas:
- Concurrent `docker stack deploy` against the same stack
- Concurrent `docker service update` on the same service
- Concurrent updates to the same `.env` or manifest file on the StorageBox
- The root infrastructure pipeline and a microservice deploy pipeline running at the same time
Required measures:
- The prod root infrastructure deploy must run manually/with approval.
- A prod deploy for the same service must not be triggered more than once at a time.
- Prod deploy workflows must use an automatic deploy lock on the StorageBox.
## StorageBox Deploy Lock Model
Because prod has 3 runners, the deploy lock is treated as mandatory. The lock is not kept on the local filesystem, because the runners run on different machines and cannot see each other's `/tmp` or `/var/lock` directories.
The lock location is the StorageBox:
```text
prod/locks/prod-deploy.lock
prod/locks/prod-infra.lock
prod/locks/services/<service-name>.lock
```
Starting model:
```text
prod/locks/prod-deploy.lock
```
This single global lock serializes all prod deploys and is the least complex model.
If deploy times grow later, a per-service lock can be adopted.
The lock file/directory is never created manually. The workflow takes the lock with an atomic `mkdir` at the start and releases it with `rmdir` at the end.
Example:
```bash
LOCK_DIR="prod/locks/prod-deploy.lock"
LOCK_META="owner.txt"

# take the lock atomically; mkdir fails if another deploy holds it
ssh "$STORAGEBOX_SSH" "mkdir -p prod/locks && mkdir '$LOCK_DIR'"
# record who holds the lock, for stale-lock triage
ssh "$STORAGEBOX_SSH" "printf '%s\n' 'runner=${GITEA_RUNNER_NAME:-unknown}' 'run=${GITHUB_RUN_ID:-unknown}' 'created_at=$(date -u +%FT%TZ)' > '$LOCK_DIR/$LOCK_META'"
# deploy steps
ssh "$STORAGEBOX_SSH" "rm -f '$LOCK_DIR/$LOCK_META' && rmdir '$LOCK_DIR'"
```
Behavior (a cleanup-safe sketch follows this list):
- If `mkdir '$LOCK_DIR'` succeeds, the lock has been taken.
- If `mkdir '$LOCK_DIR'` fails, another deploy is assumed to be running.
- Even if the job fails, the cleanup step must still run `rm/rmdir`.
- Stale lock cleanup must be manual/approved; forced automatic removal should not be used in the first phase.
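One hedged way to make the cleanup survive failures inside a single script step is a `trap`; `STORAGEBOX_SSH` and `LOCK_DIR` are the same placeholders as in the example above:

```bash
#!/usr/bin/env bash
# Take the global prod deploy lock, release it on any exit, fail cleanly if it is held.
set -euo pipefail
LOCK_DIR="prod/locks/prod-deploy.lock"   # STORAGEBOX_SSH comes from the workflow environment

if ! ssh "$STORAGEBOX_SSH" "mkdir -p prod/locks && mkdir '$LOCK_DIR'"; then
  echo "another prod deploy holds $LOCK_DIR, aborting" >&2
  exit 1
fi

# Installed only after we own the lock, so a failed acquisition never removes
# someone else's lock directory. Removes our lock dir and any metadata inside it.
cleanup() { ssh "$STORAGEBOX_SSH" "rm -rf '$LOCK_DIR'"; }
trap cleanup EXIT

# ... docker stack deploy / docker service update steps ...
```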
Lock levels:
| Lock | Purpose |
| --- | --- |
| `prod/locks/prod-deploy.lock` | First phase: global lock for all prod deploys |
| `prod/locks/prod-infra.lock` | Later: separate the root infra deploy from microservice deploys |
| `prod/locks/services/<service-name>.lock` | Later: move to per-service parallel deploys |
## Swarm Service Distribution
Since all 3 prod nodes are manager + app worker, services can be spread across the 3 nodes.
For application services, the `docker-stack-service.yml` deploy settings can later be revised along these principles (a small sketch follows the list):
- `replicas: 3` for stateless services
- `placement` selects only app-capable nodes
- `update_config` is set up for rolling updates
- `restart_policy` stays enabled
- Stateful services are not replicated across app workers; the stateful layer lives separately on the DB nodes
In the current state of the repo, the microservice stack files are deployed per service. This document defines the runner and Swarm prerequisites for the prod HA target; the replica count of each microservice is a separate application deploy refactor.
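For illustration, the same intent expressed imperatively against an already-deployed service; `iklimco_example-api` is a hypothetical service name:

```bash
# Spread a hypothetical stateless service over the 3 labeled nodes with a rolling update policy
docker service update \
  --replicas 3 \
  --constraint-add 'node.labels.type == service' \
  --update-parallelism 1 \
  --update-delay 10s \
  iklimco_example-api
```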
## Gateway and Public Traffic
The public internet should enter the gateway layer only on `80/tcp` and `443/tcp`.
The existing stack files may publish APISIX on `8080/8443`. Since only `80/443` are open on the prod public firewall in the target architecture, one of two options must be chosen:
1. The APISIX/SWAG host-published ports are aligned with `80/443`.
2. A Hetzner Load Balancer or reverse proxy accepts `80/443` and forwards to the Swarm gateway ports over the private network.
This decision is separate from the Terraform/Ansible bootstrap; it requires a revision of the application infrastructure manifests.
## Acceptance Criteria
- The 3 prod runners show as online in the Gitea UI.
- Every runner has the `prod-runner` label.
- Any of the runners can run a simple Docker command.
- `docker node ls` shows 3 managers.
- When one runner/node is shut down, another runner can take new jobs.
- The prod workflows use the global `prod/locks/prod-deploy.lock` lock on the StorageBox.
- The lock is managed automatically by the workflow with `mkdir/rmdir`, not manually.
- Public ingress is limited to `22`, `80`, `443`.

# 07 - Private Network Port Matrix
This file defines the ports that must be open inside the Hetzner private network for the test and prod environments. The only ports open to the public internet are `22/tcp`, `80/tcp`, `443/tcp`. Vault `8200/tcp` is never opened publicly.
This matrix must be treated as the source of truth for the Terraform Hetzner firewall and the Ansible UFW rules.
## Network Plan
### Test
| Subnet | CIDR | Purpose |
| --- | --- | --- |
| App/Swarm | `10.10.10.0/24` | `test-swarm-01` |
| DB | `10.10.20.0/24` | `test-db-01` |
### Prod
| Subnet | CIDR | Purpose |
| --- | --- | --- |
| App/Swarm | `10.20.10.0/24` | `prod-swarm-01/02/03` |
| DB | `10.20.20.0/24` | `prod-db-01/02/03` |
## Public Ingress Standard
Public ingress for all environments:
| Port | Protocol | Source | Target | Purpose |
| --- | --- | --- | --- | --- |
| `22` | TCP | Admin IP/CIDR | All nodes | SSH management |
| `80` | TCP | Internet | Gateway entrypoint | HTTP / ACME redirect |
| `443` | TCP | Internet | Gateway entrypoint | HTTPS |
Critical ports that are never opened publicly:
| Port | Service |
| --- | --- |
| `8200/tcp` | Vault |
| `5432/tcp` | PostgreSQL |
| `27017/tcp` | MongoDB |
| `6379/tcp` | Redis |
| `5672/tcp`, `15672/tcp`, `61613/tcp`, `15674/tcp` | RabbitMQ |
| `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` | Docker Swarm |
| `9180/tcp` | APISIX Admin API |
| `9090/tcp` | Prometheus |
| `3000/tcp` | Grafana |
## Docker Swarm Private Ports
Mandatory ports between Docker Swarm nodes:
| Port | Protocol | Source | Target | Description |
| --- | --- | --- | --- | --- |
| `2377` | TCP | Swarm nodes | Swarm manager nodes | Swarm control plane / join |
| `7946` | TCP | All Swarm nodes | All Swarm nodes | Node discovery / gossip |
| `7946` | UDP | All Swarm nodes | All Swarm nodes | Node discovery / gossip |
| `4789` | UDP | All Swarm nodes | All Swarm nodes | Overlay VXLAN data path |
In test these ports are effectively needed only by a single Swarm node, but they can still be defined within the app subnet to make adding workers easier later.
In prod these ports must be open between all `prod-swarm-*` nodes inside the `10.20.10.0/24` app/swarm subnet (a host-level UFW sketch follows).
Source: Docker overlay network documentation, https://docs.docker.com/engine/network/drivers/overlay/
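As an illustration of the host-level second layer only (the Hetzner firewall rules express the same intent in Terraform), a hedged UFW sketch for a `prod-swarm-*` node:

```bash
#!/usr/bin/env bash
# UFW rules on a prod-swarm-* node: allow Swarm control/gossip/VXLAN only from the app subnet.
set -euo pipefail
APP_SUBNET="10.20.10.0/24"

ufw allow from "$APP_SUBNET" to any port 2377 proto tcp   # control plane / join
ufw allow from "$APP_SUBNET" to any port 7946 proto tcp   # node discovery / gossip
ufw allow from "$APP_SUBNET" to any port 7946 proto udp   # node discovery / gossip
ufw allow from "$APP_SUBNET" to any port 4789 proto udp   # overlay VXLAN data path
```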
## Application and Infra Service Private Ports
These ports are never opened publicly. Access is allowed only from the required sources inside the private network or the Docker overlay.
| Port | Protocol | Service | Source | Target | Note |
| --- | --- | --- | --- | --- | --- |
| `8200` | TCP | Vault API/UI | Swarm app nodes / runner | Vault service/node | Closed publicly. Runtime services must reach Vault over private/overlay |
| `6379` | TCP | Redis | Swarm app nodes | Redis service/node | Closed publicly |
| `5672` | TCP | RabbitMQ AMQP | Swarm app nodes | RabbitMQ service/node | Closed publicly |
| `15672` | TCP | RabbitMQ Management | Admin CIDR or private ops | RabbitMQ service/node | Closed publicly; preferably VPN/bastion |
| `61613` | TCP | RabbitMQ STOMP | App nodes that need it | RabbitMQ service/node | Closed publicly |
| `15674` | TCP | RabbitMQ Web STOMP | App/gateway nodes that need it | RabbitMQ service/node | Closed publicly |
| `2379` | TCP | etcd client | APISIX service/node | etcd service/node | Closed publicly |
| `2380` | TCP | etcd peer | etcd cluster nodes | etcd cluster nodes | Not needed with a single replica; required once it becomes a cluster |
| `9180` | TCP | APISIX Admin API | Admin CIDR or private ops | APISIX service/node | Closed publicly |
| `9090` | TCP | Prometheus UI/API | Admin CIDR or private ops | Prometheus service/node | Closed publicly |
| `3000` | TCP | Grafana UI | Admin CIDR or private ops | Grafana service/node | Closed publicly |
The existing `docker-stack-infra.yml` may publish some services in host mode. Even if the Hetzner firewall blocks public ingress, this table decides the private ingress policy.
## DB Node Ports
Because the DB infrastructure is installed manually, the exact cluster technology is outside this document. The default ports for the firewall are still listed below.
### PostgreSQL / PostGIS
| Port | Protocol | Source | Target | Note |
| --- | --- | --- | --- | --- |
| `5432` | TCP | App/Swarm subnet | PostgreSQL node/cluster endpoint | Application DB connections |
| `5432` | TCP | DB subnet | PostgreSQL nodes | Streaming replication may use the same port |
If Patroni is used, extra ports must be pinned down later in the DB runbook:
| Port | Protocol | Purpose |
| --- | --- | --- |
| `8008` | TCP | Patroni REST API |
| `2379-2380` | TCP | etcd client/peer if etcd is used for Patroni |
| `5000-5001` | TCP | If HAProxy or a similar DB endpoint is used |
These extra ports are opened only once the corresponding technology is chosen.
### MongoDB
| Port | Protocol | Source | Target | Note |
| --- | --- | --- | --- | --- |
| `27017` | TCP | App/Swarm subnet | MongoDB node/replica set endpoint | Application DB connections |
| `27017` | TCP | DB subnet | MongoDB replica set nodes | Replica set internal traffic |
If sharding is introduced later, additional MongoDB roles such as `27018/27019` may come up; they are not opened at this stage.
## Test Private Rules
Minimum for the test environment:
| Source | Target | Ports |
| --- | --- | --- |
| `10.10.10.0/24` | `10.10.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` |
| `10.10.10.0/24` | `10.10.20.0/24` | `5432/tcp`, `27017/tcp` |
| `10.10.10.0/24` | `10.10.10.0/24` | `8200/tcp`, `6379/tcp`, `5672/tcp`, `61613/tcp`, `15674/tcp` |
| Admin CIDR or VPN | `10.10.10.0/24` | `15672/tcp`, `9180/tcp`, `9090/tcp`, `3000/tcp` |
Because test has a single DB node, the PostgreSQL/MongoDB replication ports inside the DB subnet may not be actively used.
## Prod Private Rules
Minimum for the prod environment:
| Source | Target | Ports |
| --- | --- | --- |
| `10.20.10.0/24` | `10.20.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` |
| `10.20.10.0/24` | `10.20.20.0/24` | `5432/tcp`, `27017/tcp` |
| `10.20.20.0/24` | `10.20.20.0/24` | `5432/tcp`, `27017/tcp` |
| `10.20.10.0/24` | `10.20.10.0/24` | `8200/tcp`, `6379/tcp`, `5672/tcp`, `61613/tcp`, `15674/tcp`, `2379/tcp` |
| Admin CIDR or VPN | `10.20.10.0/24` | `15672/tcp`, `9180/tcp`, `9090/tcp`, `3000/tcp` |
If Patroni, HAProxy, Mongo sharding, or a separate monitoring agent architecture is chosen, extra ports must be added to this matrix in a controlled way.
## Acceptance Criteria
- The public firewall does not open `8200/tcp`.
- DB ports are not publicly open.
- Swarm ports are open only inside the private app/swarm subnet.
- The app/Swarm subnet reaches the DB subnet only on the required DB ports.
- The DB subnet is not given broad access back into the app subnet.
- Admin UI ports are restricted to the admin CIDR/VPN/private ops instead of being public.