docs(db): implement direct cluster access strategy for production

- Updated roadmap (03-infra-stack-changes.md) to deprecate database proxies in prod.
- Detailed direct subnet access via WireGuard for production developers.
- Provided multi-host connection parameters for Patroni and MongoDB Replica Sets in setup guide (08-prod-db-cluster-kurulum.md).
- Added environment comparison table to developer access guide.
Murat ÖZDEMİR 2026-05-18 14:25:26 +03:00
parent 2202e92c2c
commit 8780c7c05e
18 changed files with 1546 additions and 1218 deletions

294
README.md
View File

@ -1,268 +1,64 @@
# 🌍 Server Environments and Infrastructure
# 🌍 iklim.co Infrastructure and Server Management
Infrastructure-as-code and operational runbook repository on Hetzner Cloud for the `iklim.co` test and prod environments.
This repository holds the Infrastructure-as-Code (IaC) assets, technical guides, and operational standards needed to set up, manage, and modernize the **test** and **production** environments of the `iklim.co` project.
This repository covers:
- 🧱 Terraform resources for the Hetzner infrastructure (`test` and `prod`)
- 🤖 Ansible bootstrap playbooks, shared roles, and inventory targets
- 📚 End-to-end setup guides and roadmap documents
- 📊 Sizing/cost analysis and reference material
Infrastructure management covers resource provisioning with Terraform on Hetzner Cloud, operating system configuration with Ansible, and building the microservice architecture on Docker Swarm.
## 🎯 Scope
---
The main goal is a standard, documented infrastructure provisioning process with clearly defined responsibility boundaries:
## 📂 Repository Structure and Main Sections
- 🧱 **Terraform**: creates the cloud infrastructure (servers, private networks, firewalls, placement groups, floating IPs, SSH key registration, inventory output)
- 🤖 **Ansible**: manages OS preparation, security hardening, Docker/Swarm, runner installation, and StorageBox mounts through the playbooks and shared roles in this repository
- 🚀 **Application/stack deployment**: handled by the deployment workflows and stack manifests referenced in the roadmap documents
The documentation and code assets in this repository fall into five main categories:
## 📌 Current Repository Status
### 1. 🛣️ Roadmap (`roadmap/`)
Contains the **business requirements, technical goals, and step-by-step implementation plans** needed to build the environments (test and prod) from scratch or to update the existing setup.
- Strategic documentation for major infrastructure changes (e.g. Redis Sentinel migration, APISIX configuration, RabbitMQ Quorum Queues).
- [roadmap/test-env/](./roadmap/test-env/) - Test environment requirements and plans.
- [roadmap/prod-env/](./roadmap/prod-env/) - Production environment HA (High Availability) and reliability plans.
This repository currently consists mainly of:
### 2. 🛠️ Setup (`setup/`)
The **implementation documents** used to physically bring up the infrastructure. This section is used to manage:
- **Terraform:** Creating cloud resources (Server, Network, Firewall).
- **Ansible:** Operating system preparation, security hardening, Docker/Swarm installation.
- **CI/CD:** Creating and updating deployment workflows (Gitea Actions) and stack manifests.
- E.g.: [setup/06-prod-terraform-iaac.md](./setup/06-prod-terraform-iaac.md), [setup/07-prod-ansible-bootstrap.md](./setup/07-prod-ansible-bootstrap.md)
- 🧱 Ready-to-use Terraform code:
- `terraform/hetzner/test`
- `terraform/hetzner/prod`
- 🤖 Ansible automation assets for both environments:
- `ansible/test/test-bootstrap.yml`
- `ansible/prod/prod-bootstrap.yml`
- `ansible/roles/*`
- `ansible/test/group_vars/*` and `ansible/prod/group_vars/*`
- 📦 Inventory outputs and target paths:
- `ansible/test/inventory/generated/test.yml` (tracked example)
- `ansible/prod/inventory/generated/prod.yml` (expected output path)
- 📘 Detailed setup phases:
- `setup/00-genel-yol-haritasi.md` through `setup/09-prod-runner-ha-ve-swarm.md`
- 🛣️ Environment roadmap steps:
- `roadmap/test-env/*`
- `roadmap/prod-env/*`
- 📈 Capacity planning and reference graphs:
- `hetzner-sizing-report.md`
- `test-app-graphs.png`
- `test-db-graphs.png`
### 3. 🗺️ Setup vs Roadmap Matrix (`setup-vs-roadmap-map.md`)
Explains the relationship between the **Roadmap** documents, which are written from the requirements, and the **Setup** documents that implement those requirements technically.
- A mapping matrix showing which roadmap step is implemented by which setup document.
- See [setup-vs-roadmap-map.md](./setup-vs-roadmap-map.md) for details.
## 🧭 Target Environment Topology
### 4. 📊 Hetzner Sizing Report (`hetzner-sizing-report.md`)
Describes the **Hetzner server types, CPU/RAM capacities, and cost/performance analyses** chosen for the iklim infrastructure services (API Gateway, Microservices, Databases, Broker).
- The main reference point for capacity planning before an environment is set up.
- See [hetzner-sizing-report.md](./hetzner-sizing-report.md).
### 🧪 Test
### 5. 💡 Facts (`facts/`)
Documents that capture **the actual current state of the system (source of truth) and the critical technical details to know** once the environment setups are complete.
- Answers the question "How does the system work right now?"
- [facts/firewall.md](./facts/firewall.md): Active firewall rules and port matrix.
- [facts/swarm-node-recovery-swag-failover.md](./facts/swarm-node-recovery-swag-failover.md): Manual intervention and recovery procedures when a node goes down.
| Node | Role | Private IP | Recommended Type |
| --- | --- | --- | --- |
| `iklim-app-01` | Swarm manager + app worker + test runner | `10.10.10.11` | `cpx42` |
| `iklim-db-01` | DB host (manual/stack-based DB setup path) | `10.10.20.11` | `cpx42` |
### 🏭 Production
| Node | Role | Private IP | Recommended Type |
| --- | --- | --- | --- |
| `iklim-app-01` | Swarm manager + app worker + runner (primary FIP target) | `10.20.10.11` | `cpx42` |
| `iklim-app-02` | Swarm manager + app worker + runner | `10.20.10.12` | `cpx42` |
| `iklim-app-03` | Swarm manager + app worker + runner | `10.20.10.13` | `cpx42` |
| `iklim-db-01` | DB cluster node | `10.20.20.11` | `cpx32` |
| `iklim-db-02` | DB cluster node | `10.20.20.12` | `cpx32` |
| `iklim-db-03` | DB cluster node | `10.20.20.13` | `cpx32` |
## 🔐 Security and Network Baseline
Key decisions reflected in Terraform and the setup documents:
- Test and prod are isolated from each other with separate Hetzner Cloud projects and tokens.
- Public inbound traffic is limited to:
- `22/tcp` (admin CIDRs only)
- `80/tcp`
- `443/tcp`
- Critical services are reachable only on the private network (for example Vault `8200`, PostgreSQL `5432`, MongoDB `27017`, internal observability and broker ports).
- Placement groups are used for the host spread strategy.
- `prevent_destroy = true` is enabled on server resources to guard against accidental deletion.
- Terraform state and secret files must not be committed.
See also:
- [[facts/firewall.md]]: per-tool summary of all firewall rules
- [[setup/01-private-network-port-matrisi.md]]
- `terraform/hetzner/test/firewall.tf`
- `terraform/hetzner/prod/firewall.tf`
## 🗂️ Repository Structure
```text
Environment_Infrastructure/
├── ansible/
│   ├── prod/
│   │   ├── ansible.cfg
│   │   ├── group_vars/
│   │   └── prod-bootstrap.yml
│   ├── roles/
│   │   ├── base/
│   │   ├── docker/
│   │   ├── hardening/
│   │   ├── node_dirs/
│   │   ├── storagebox/
│   │   ├── storagebox_ssh_key/
│   │   └── swarm/
│   ├── test/
│   │   ├── ansible.cfg
│   │   ├── group_vars/
│   │   ├── inventory/
│   │   │   └── generated/
│   │   │       └── test.yml
│   │   └── test-bootstrap.yml
│   └── requirements.yml
├── roadmap/
│   ├── test-env/
│   └── prod-env/
├── setup/
│   ├── 00-genel-yol-haritasi.md
│   ├── 01-private-network-port-matrisi.md
│   ├── 02-test-terraform-iaac.md
│   ├── 03-test-ansible-bootstrap.md
│   ├── 04-test-db-docker-kurulum.md
│   ├── 05-test-runner-ve-deploy-onkosullari.md
│   ├── 06-prod-terraform-iaac.md
│   ├── 07-prod-ansible-bootstrap.md
│   ├── 08-prod-db-cluster-kurulum.md
│   └── 09-prod-runner-ha-ve-swarm.md
├── terraform/
│   └── hetzner/
│       ├── test/
│       └── prod/
├── facts/
│   └── firewall.md
├── hetzner-sizing-report.md
├── setup-vs-roadmap-map.md
├── test-app-graphs.png
└── test-db-graphs.png
```
## ✅ Prerequisites
- Terraform `>= 1.6`
- Hetzner Cloud account and one API token per environment
- SSH key pair (the public key path is used in Terraform variables)
- Linux/macOS shell tools (`bash`, `cp`, `sed`, or a preferred text editor)
- Needed in later phases: Ansible, Docker, Gitea/Harbor/StorageBox access
## 🛠️ Terraform Usage
### 1) 🧪 Test Infrastructure
```bash
cd terraform/hetzner/test
cp terraform.tfvars.example terraform.tfvars
```
Edit the values in `terraform.tfvars`:
- `hcloud_token`
- `admin_allowed_cidrs`
- optional overrides (`location`, image, server types, key path)
Then run:
```bash
terraform init
terraform plan
terraform apply
mkdir -p ../../../ansible/test/inventory/generated
terraform output -raw ansible_inventory_yaml > ../../../ansible/test/inventory/generated/test.yml
```
### 2) 🏭 Production Infrastructure
```bash
cd terraform/hetzner/prod
cp terraform.tfvars.example terraform.tfvars
```
Edit the values in `terraform.tfvars`:
- `hcloud_token` (prod token)
- `admin_allowed_cidrs`
- optional overrides
Then run:
```bash
terraform init
terraform plan
terraform apply
mkdir -p ../../../ansible/prod/inventory/generated
terraform output -raw ansible_inventory_yaml > ../../../ansible/prod/inventory/generated/prod.yml
```
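Before handing the exported inventory to Ansible, it may help to confirm that the file exists and is not empty. The following is a minimal sketch and not part of the repository: the `require_inventory` helper name is hypothetical, and only the inventory path is taken from the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption, not in the repo): fail early when the
# exported inventory file is missing or empty.
require_inventory() {
  local path="$1"
  # -s is true only if the file exists and has a size greater than zero
  if [ ! -s "$path" ]; then
    echo "inventory missing or empty: $path" >&2
    return 1
  fi
  echo "inventory OK: $path"
}

# Example call, using the prod path from the step above:
# require_inventory ../../../ansible/prod/inventory/generated/prod.yml
```

A check like this catches the case where `terraform output` failed but the shell redirection still created an empty file.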
---
## 🧱 Setup Flow (Canonical Order)
Use the setup documents in this order:
When setting up an environment from scratch or making a major update, follow this order:
1. `setup/00-genel-yol-haritasi.md`: general decisions and boundaries
2. `setup/01-private-network-port-matrisi.md`: private/public port policy
3. `setup/02-test-terraform-iaac.md`: test Terraform phase
4. `setup/03-test-ansible-bootstrap.md`: test OS/bootstrap/hardening
5. `setup/04-test-db-docker-kurulum.md`: test DB stack setup (on Swarm)
6. `setup/05-test-runner-ve-deploy-onkosullari.md`: test runner and deploy prerequisites
7. `setup/06-prod-terraform-iaac.md`: prod Terraform phase
8. `setup/07-prod-ansible-bootstrap.md`: prod OS/bootstrap/hardening
9. `setup/08-prod-db-cluster-kurulum.md`: prod DB cluster stack (MongoDB + Patroni/etcd)
10. `setup/09-prod-runner-ha-ve-swarm.md`: prod runner HA and deploy lock model
1. **Analysis:** Determine resource needs with [hetzner-sizing-report.md](./hetzner-sizing-report.md).
2. **Planning:** Review the relevant environment documents under `roadmap/` to understand the planned changes.
3. **Alignment:** Use [setup-vs-roadmap-map.md](./setup-vs-roadmap-map.md) to pin down which setup documents apply.
4. **Implementation:** Follow the `setup/` documents in order (00 through 09) to run the Terraform and Ansible processes.
5. **Verification:** After setup, use the `facts/` documents as the reference for how the system operates.
## 🛣️ Roadmap Documents
---
The roadmap folders track integration work for Swarm stacks, SWAG, APISIX, pipeline updates, and verification checklists:
## ✅ Prerequisites and Tools
- `roadmap/test-env/*`
- `roadmap/prod-env/*`
- **Terraform >= 1.6**: Infrastructure provisioning.
- **Ansible**: Configuration management.
- **Hetzner Cloud API Token**: Per-environment authorization.
- **SSH Key**: A key pair registered on the system for server access.
These documents occasionally reference files from related repositories (for example, workflow and stack files in the main application repository). They should be treated as implementation guides aligned with this infrastructure baseline.
## 💰 Sizing and Cost Summary
Reference: `hetzner-sizing-report.md`
Recommended baseline:
- **Test:** `2 x cpx42` (app + db)
- **Prod:** `3 x cpx42` (app) + `3 x cpx32` (db)
Approximate monthly total from the report:
- Test: `$59.98`
- Prod: `$139.44`
- Total: `$199.42`
## 🔑 Secrets and State Management
Must never be committed:
- `terraform.tfvars`, `*.tfvars`, `*.tfstate`, `.terraform/`
- private keys, certificates, `.env` secrets
- runner tokens and vault password files
See the `.gitignore` file for the enforced patterns.
Recommended:
- keep runtime secrets in secure secret stores or encrypted vault files
- keep generated runtime artifacts out of version control unless they have been explicitly sanitized
## ⚠️ Known Gaps / Notes
- `ansible/prod/inventory/generated/prod.yml` is an expected output path; it may not exist until it is generated.
- Some roadmap steps target files in the broader `iklim.co` application repository, not only this repository.
## ✅ Quick Verification Checklist
After Terraform apply:
- servers are created with the expected names and private IPs
- the floating IP exists and is attached
- firewalls open only the intended public ports
- placement groups are assigned
- the generated inventory YAML is exported to `ansible/{test,prod}/inventory/generated/*.yml`
After the bootstrap/deploy phases:
- Swarm state and labels match the documentation
- DB access is possible only from the private network
- Vault/API gateways comply with the public/private access rules
- Runner and deploy lock behavior matches the environment policy
## 🔗 References
- Hetzner Terraform Provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest
- Hetzner Networks: https://docs.hetzner.com/cloud/networks/overview/
- Hetzner Firewalls: https://docs.hetzner.com/cloud/firewalls/overview
- Hetzner Placement Groups: https://docs.hetzner.com/cloud/placement-groups/overview
- Docker Swarm overlay networking: https://docs.docker.com/engine/network/drivers/overlay/
- Gitea act_runner: https://docs.gitea.com/usage/actions/act-runner
---
*iklim.co Infrastructure Team - 2026*

Binary file not shown (size after change: 1.5 MiB).

View File

@ -18,9 +18,9 @@
| `iklim-app-01` | API services, SWAG, Vault | Manager + Worker | `type=service` |
| `iklim-app-02` | API services replicas | Manager + Worker | `type=service` |
| `iklim-app-03` | API services replicas | Manager + Worker | `type=service` |
| `iklim-db-01` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-02` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-03` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-01` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db`, `db-index=01` |
| `iklim-db-02` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db`, `db-index=02` |
| `iklim-db-03` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db`, `db-index=03` |
### Label scheme rationale
@ -95,9 +95,9 @@ docker swarm join --token <WORKER_TOKEN> 10.20.10.11:2377
Then label them on iklim-app-01:
```bash
for node in iklim-db-01 iklim-db-02 iklim-db-03; do
docker node update --label-add role=db "$node"
done
docker node update --label-add role=db --label-add db-index=01 iklim-db-01
docker node update --label-add role=db --label-add db-index=02 iklim-db-02
docker node update --label-add role=db --label-add db-index=03 iklim-db-03
```
> DB nodes are Swarm **workers** only — they never become managers.
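Because the `db-index` value matches the numeric suffix of each hostname, the three label commands could also be generated rather than typed out. The sketch below is only an illustration under that assumption: `label_cmd` is a hypothetical helper, and it prints the commands (dry run) instead of executing them.

```bash
#!/usr/bin/env bash
# Sketch (assumption): derive db-index from the hostname suffix and print
# the docker command instead of running it, so it can be reviewed first.
label_cmd() {
  local node="$1"
  local idx="${node##*-}"   # strip everything up to the last "-": iklim-db-02 -> 02
  echo "docker node update --label-add role=db --label-add db-index=${idx} ${node}"
}

for node in iklim-db-01 iklim-db-02 iklim-db-03; do
  label_cmd "$node"
done
```

Piping the output to `sh` on `iklim-app-01` would apply the labels; printing first keeps the step easy to audit.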
@ -117,7 +117,7 @@ docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
```
Expected: `map[type:service]` for app nodes, `map[role:db]` for DB nodes.
Expected: `map[type:service]` for app nodes, `map[db-index:01 role:db]` (and so on) for DB nodes.
## Step 7 — Confirm `init/swarm-init.sh` multi-node awareness

View File

@ -39,9 +39,9 @@ The deploy pipeline (see `08-deploy-pipeline-update.md`) runs on iklim-app-01:
```bash
set -a; . ./.env; set +a
mkdir -p "$SWAG_DNS_CONF_DIR"
envsubst < swag/dns-conf/godaddy.ini.tpl > "$SWAG_DNS_CONF_DIR/godaddy.ini"
chmod 600 "$SWAG_DNS_CONF_DIR/godaddy.ini"
mkdir -p "$SWAG_CONFIG_DIR/dns-conf"
envsubst < swag/dns-conf/godaddy.ini.tpl > "$SWAG_CONFIG_DIR/dns-conf/godaddy.ini"
chmod 600 "$SWAG_CONFIG_DIR/dns-conf/godaddy.ini"
```
## Step 4 — GoDaddy A records for prod subdomains (handled by pipeline)

View File

@ -35,13 +35,13 @@ Test-env Step 9 adds the `swag-vl` named volume to the base file. In prod, SWAG
No `swag-vl` definition is made in `docker-stack-infra.prod.yml`.
### Monitoring Persistence (StorageBox)
### Monitoring Persistence
Prometheus and Grafana run as single instances. To ensure monitoring data and dashboards survive a node failover (moving from `iklim-app-01` to another node), their data is stored on the shared StorageBox:
- **Prometheus:** `/mnt/storagebox/prometheus/data`
- **Grafana:** `/mnt/storagebox/grafana/data`
Prometheus and Grafana run as single instances, but their storage profiles are different:
- **Prometheus:** keep TSDB on a local Docker volume (`prometheus-vl`). Prometheus local storage should not run on StorageBox/DAVFS because of filesystem semantics and WAL/compaction I/O.
- **Grafana:** keep `/var/lib/grafana` on StorageBox (`/mnt/storagebox/grafana/data`) so dashboards, plugins, and the SQLite database are available if the single active instance is manually moved to another node.
These paths are mounted via env vars (`PROMETHEUS_DATA_DIR`, `GRAFANA_DATA_DIR`) with named-volume fallbacks for test. See Step 8 for implementation details.
Grafana uses the `GRAFANA_DATA_DIR` env var with a named-volume fallback for test. Prometheus continues to use the named Docker volume. See Step 9 for implementation details.
**Note:** PostgreSQL and MongoDB are not in `docker-stack-infra.yml`. They run in separate stacks on DB nodes (`iklim-db` and `iklim-patroni`). See `08-prod-db-cluster-kurulum.md`.
@ -225,6 +225,7 @@ nc -zv iklim-db-01 2379
## Step 5 — Redis: Sentinel cluster (prod overlay)
Redis runs as a single instance in test. In prod, Sentinel provides HA.
![[redis-sentinel-vs-cluster.png]]
Bitnami images are used — all configuration is done via env vars, no separate `.conf` file needed.
### Prerequisites
@ -461,16 +462,13 @@ Consistent hashing by `remote_addr` requires APISIX to see the actual client IP,
> **DNS Note:** For `chash` to work with node-specific names, the RabbitMQ service must have network aliases configured for each node (e.g., `rabbitmq-{{.Node.Hostname}}`) as shown in Step 6.
Update `template/apisix-core/config.yaml.template`:
In the `config.yaml` inside the custom APISIX image (`custom-apisix:3.12.0`):
```yaml
nginx_config:
http:
real_ip_header: "X-Forwarded-For"
real_ip_from:
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
real_ip_header: "X-Real-IP"
set_real_ip_from: "10.0.0.0/8"
```
## Step 8 — Create `docker-stack-infra.prod.yml`
@ -496,7 +494,7 @@ services:
"default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
volumes:
- /opt/iklimco/vault/data:/vault/file
- /mnt/storagebox/ssl:/vault/certs:ro
- ${SWAG_CERT_DIR}:/vault/certs:ro
deploy:
mode: replicated
replicas: 3
@ -534,7 +532,7 @@ services:
replicas: 1
placement:
constraints:
- node.hostname == iklim-app-01
- node.labels.type == service
restart_policy:
condition: any
delay: 5s
@ -625,41 +623,40 @@ networks:
external: true
```
## Step 8 — Monitoring Data Persistence (StorageBox)
## Step 9 — Monitoring Data Persistence
Prometheus and Grafana run as single instances. Without persistent storage, data is lost on node failover. This step mounts their data directories from the StorageBox shared filesystem.
Prometheus and Grafana run as single instances. Grafana data is placed on the StorageBox shared filesystem for manual failover. Prometheus TSDB stays on a local Docker volume because DAVFS/StorageBox is not suitable for Prometheus WAL and compaction I/O.
**Changes already applied to `docker-stack-infra.yml`:**
```yaml
prometheus:
volumes:
- ${PROMETHEUS_DATA_DIR:-prometheus-vl}:/prometheus
- prometheus-vl:/prometheus
grafana:
volumes:
- ${GRAFANA_DATA_DIR:-grafana-vl}:/var/lib/grafana
```
Test uses the named Docker volume fallbacks (`prometheus-vl`, `grafana-vl`) — no test env change needed.
Test uses the named Docker volume fallback (`grafana-vl`) for Grafana, and Prometheus always uses the named Docker volume (`prometheus-vl`) — no test env change needed.
**Add to `prod/secrets/iklim.co/.env.prod` on storagebox** (already in `env-prod/.env`):
```bash
PROMETHEUS_DATA_DIR=/mnt/storagebox/prometheus/data
GRAFANA_DATA_DIR=/mnt/storagebox/grafana/data
```
**Create directories on StorageBox before first prod deploy:**
```bash
mkdir -p /mnt/storagebox/prometheus/data /mnt/storagebox/grafana/data
mkdir -p /mnt/storagebox/grafana/data
```
> Grafana writes its SQLite database and dashboard JSON to `/var/lib/grafana`.
> Prometheus writes its TSDB to `/prometheus`. Both directories must exist before the stack starts.
> Prometheus writes its TSDB to `/prometheus` on the local `prometheus-vl` Docker volume; it is not shared between nodes.
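A pre-deploy check along the following lines could confirm that the Grafana directory is in place before the stack starts. This is a sketch under stated assumptions: `check_grafana_dir` is a hypothetical helper that is not part of the pipeline, and it only mirrors the `${GRAFANA_DATA_DIR:-grafana-vl}` fallback rule shown above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy check (assumption, not part of the pipeline).
check_grafana_dir() {
  local dir="${GRAFANA_DATA_DIR:-}"
  # Unset or empty: the stack file falls back to the named volume.
  if [ -z "$dir" ]; then
    echo "GRAFANA_DATA_DIR unset: named volume fallback (grafana-vl) applies"
    return 0
  fi
  # Set: the bind-mount source must already exist and be writable.
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "grafana data dir OK: $dir"
  else
    echo "grafana data dir missing or not writable: $dir" >&2
    return 1
  fi
}
```

Prometheus needs no equivalent check, since its named volume is created by Docker on the node where the task runs.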
## Step 9 — Verify
## Step 10 — Verify
```bash
# Base file must be valid on its own (test deploy):
@ -669,6 +666,18 @@ docker stack config -c docker-stack-infra.yml > /dev/null && echo "base OK"
docker stack config -c docker-stack-infra.yml -c docker-stack-infra.prod.yml > /dev/null && echo "prod merge OK"
```
## Step 11 — Database Proxies and Developer Access
In the production environment, the `pg-proxy` and `mongo-proxy` services (socat-based) defined in the base `docker-stack-infra.yml` are **deprecated and will not be used**.
### Rationale
- **Leader Tracking:** Simple L4 proxies (socat) cannot track the Patroni Leader or MongoDB Primary. They point to a single service VIP, so during a failover a connection may land on a read-only replica.
- **HA Connection Strings:** Modern DB drivers (JDBC, libpq, MongoClient) support multi-host connection strings, which provide native failover and load balancing without an intermediate proxy.
### Developer Access Strategy
- **Direct Subnet Access:** Developers connect via WireGuard directly to the DB subnet (`10.20.20.0/24`).
- **No Translation:** Instead of mapping ports like `15432`, the standard ports (`5432`, `27017`) are used across all cluster nodes.
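As an illustration of the multi-host approach, the sketch below builds connection URIs that list all three DB nodes. The node IPs are the prod DB addresses in the `10.20.20.0/24` subnet; the user `app_user`, database `app_db`, and replica set name `rs0` are placeholders assumed for the example, not values from this repository.

```bash
#!/usr/bin/env bash
# Sketch: build multi-host connection URIs for the prod DB nodes.
# app_user, app_db, and rs0 are placeholder names (assumptions).
DB_NODES="10.20.20.11 10.20.20.12 10.20.20.13"

# Join every node with the given port into "ip:port,ip:port,...".
join_hosts() {
  local port="$1" out="" ip
  for ip in $DB_NODES; do
    out="${out:+$out,}${ip}:${port}"
  done
  echo "$out"
}

# libpq: target_session_attrs=read-write makes the client skip read-only
# replicas and settle on the writable node (the Patroni leader).
PG_URI="postgresql://app_user@$(join_hosts 5432)/app_db?target_session_attrs=read-write"

# MongoDB: with replicaSet set, the driver discovers the current primary.
MONGO_URI="mongodb://$(join_hosts 27017)/app_db?replicaSet=rs0"

echo "$PG_URI"
echo "$MONGO_URI"
```

With URIs of this shape, failover handling stays in the driver, which is the behavior the socat proxies could not provide.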
## Placement and Replica Summary — prod
| Service | File | Replicas | Placement | HA Note |

View File

@ -11,15 +11,13 @@ API_SUBDOMAIN=api.iklim.co
APIGW_SUBDOMAIN=apigw.iklim.co
RABBITMQ_SUBDOMAIN=rabbitmq.iklim.co
GRAFANA_SUBDOMAIN=grafana.iklim.co
RESTRICTED_IP_1=78.187.87.109
RESTRICTED_IP_2=95.70.151.248
RESTRICTED_IPS="78.187.87.109/32,95.70.151.248/32"
# SWAG storage paths — StorageBox is mounted on all app nodes, shared filesystem
# cert-reloader writes here; Vault reads from this path on every node — no SSH distribution needed
SWAG_CERT_DIR=/mnt/storagebox/ssl
# SWAG config dirs on StorageBox — all three survive node failover without pipeline re-run
SWAG_CONFIG_DIR=/mnt/storagebox/swag/config
SWAG_DNS_CONF_DIR=/mnt/storagebox/swag/dns-conf
SWAG_SITE_CONFS_DIR=/mnt/storagebox/swag/site-confs
```
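The single comma-separated `RESTRICTED_IPS` value can be expanded into one nginx `allow` directive per entry. The following is a minimal sketch of that expansion, assuming no leading indentation is wanted in the generated lines; the variable name `RESTRICTED_IPS_BLOCK` is used here only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: expand a comma-separated CIDR list into nginx allow directives.
RESTRICTED_IPS="78.187.87.109/32,95.70.151.248/32"

# tr puts each entry on its own line; sed wraps each line as "allow <cidr>;".
RESTRICTED_IPS_BLOCK="$(echo "$RESTRICTED_IPS" | tr ',' '\n' | sed 's|.*|allow &;|')"

echo "$RESTRICTED_IPS_BLOCK"
# prints:
# allow 78.187.87.109/32;
# allow 95.70.151.248/32;
```

Keeping the list in one variable means adding an address is a single edit to `.env` rather than a new `RESTRICTED_IP_N` variable plus a template change.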
@ -37,14 +35,14 @@ No new files to create — the same templates work for both environments.
```bash
set -a; . ./.env; set +a
export RESTRICTED_IP_1="78.187.87.109"
export RESTRICTED_IP_2="95.70.151.248"
export RESTRICTED_IPS_BLOCK="$(echo "$RESTRICTED_IPS" | tr ',' '\n' | sed 's|.*| allow &;|')"
mkdir -p "$SWAG_DNS_CONF_DIR" "$SWAG_SITE_CONFS_DIR"
mkdir -p "$SWAG_SITE_CONFS_DIR"
SWAG_VARS='${API_SUBDOMAIN}${APIGW_SUBDOMAIN}${GRAFANA_SUBDOMAIN}${RABBITMQ_SUBDOMAIN}${RESTRICTED_IPS_BLOCK}'
for tpl in swag/site-confs/*.conf.tpl; do
out="$SWAG_SITE_CONFS_DIR/$(basename "${tpl%.tpl}")"
envsubst < "$tpl" | sudo tee "$out" > /dev/null
envsubst "$SWAG_VARS" < "$tpl" | sudo tee "$out" > /dev/null
echo "✅ $out"
done

View File

@ -34,11 +34,12 @@ vault:
"default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
volumes:
- /opt/iklimco/vault/data:/vault/file # host path per node
- /mnt/storagebox/ssl:/vault/certs:ro # StorageBox — shared across all nodes, no SSH distribution needed
- ${SWAG_CERT_DIR}:/vault/certs:ro # StorageBox — shared across all nodes, no SSH distribution needed
deploy:
mode: replicated
replicas: 3
placement:
max_replicas_per_node: 1
constraints:
- node.labels.type == service
```

View File

@ -10,7 +10,7 @@
- Storagebox paths via env vars (`SWAG_CERT_DIR`, `SWAG_CONFIG_DIR`, etc.) instead of local host paths
- Extra steps: Update DNS Records (GoDaddy API), Wait for etcd
## Step 1 — Remove manual cert scp lines from `Initialize Servers`
## Step 1 — Remove manual cert scp lines from `Initialize Workspace`
```yaml
# DELETE from "Initialize Servers" step:
@ -42,23 +42,23 @@ Insert **after** `Docker Login to Harbor` and **before** `Prepare SWAG Directori
2>/dev/null | jq -r '.[0].data // empty' 2>/dev/null || true)
if [ "$CURRENT" = "$FLOATING_IP" ]; then
echo "✅ ${record}.${DOMAIN} → ${FLOATING_IP} (mevcut, atlanıyor)"
echo "✅ ${record}.${DOMAIN} → ${FLOATING_IP} (exists, skipping)"
else
curl -sf -X PUT \
-H "Authorization: sso-key ${GODADDY_KEY}:${GODADDY_SECRET}" \
-H "Content-Type: application/json" \
"https://api.godaddy.com/v1/domains/${DOMAIN}/records/A/${record}" \
-d "[{\"data\":\"${FLOATING_IP}\",\"ttl\":600}]"
echo "✅ ${record}.${DOMAIN} → ${FLOATING_IP} (eklendi/güncellendi)"
echo "✅ ${record}.${DOMAIN} → ${FLOATING_IP} (added/updated)"
fi
done
working-directory: /workspace/iklim.co
```
> `GODADDY_KEY` ve `GODADDY_SECRET` `.env.secrets.swag`'dan okunur.
> `PROD_FLOATING_IP` Gitea project variable olarak tanımlanmalı (`terraform output prod_floating_ip`).
> `jq` gereklidir — `Update Apt Repository` adımına eklenmiş olmalı: `apt-get install -y gettext tree jq`.
> Her deploy'da çalışır; mevcut ve doğru kayıtlar atlanır (idempotent).
> `GODADDY_KEY` and `GODADDY_SECRET` are read from `.env.secrets.swag`.
> `PROD_FLOATING_IP` must be defined as a Gitea project variable (`terraform output prod_floating_ip`).
> `jq` is required — it must have been added to the `Update Apt Repository` step: `apt-get install -y gettext tree jq`.
> Runs on every deploy; existing and correct records are skipped (idempotent).
## Step 3 — Add `Prepare SWAG Directories` step
@ -69,10 +69,10 @@ Insert **before** `Bootstrap Vault TLS Placeholder`:
run: |
set -a; . ./.env; . ./.env.secrets.swag; set +a
mkdir -p "$SWAG_CONFIG_DIR" "$SWAG_DNS_CONF_DIR" "$SWAG_SITE_CONFS_DIR"
mkdir -p "$SWAG_CONFIG_DIR/dns-conf" "$SWAG_SITE_CONFS_DIR"
envsubst < swag/dns-conf/godaddy.ini.tpl | docker run --rm -i \
-v "${SWAG_DNS_CONF_DIR}:/output" \
-v "${SWAG_CONFIG_DIR}/dns-conf:/output" \
alpine sh -c "cat > /output/godaddy.ini && chmod 600 /output/godaddy.ini"
echo "✅ godaddy.ini written"
@ -112,20 +112,20 @@ APISIX reads its entire configuration from etcd; init script will fail silently
```yaml
- name: Wait for etcd
run: |
echo "⏳ Waiting for etcd..."
echo "⏳ Waiting for Patroni etcd..."
for i in $(seq 1 30); do
if docker run --rm --network iklimco-net alpine \
sh -c "wget -qO- http://etcd:2379/health 2>/dev/null | grep -q '\"health\":\"true\"'"; then
echo "✅ etcd ready"
sh -c "wget -qO- http://iklim-db-01:2379/health 2>/dev/null | grep -q '\"health\":\"true\"'"; then
echo "✅ Patroni etcd ready"
break
fi
[ "$i" -eq 30 ] && echo "❌ etcd did not become ready in time" && exit 1
[ "$i" -eq 30 ] && echo "❌ Patroni etcd did not become ready in time" && exit 1
echo " attempt $i/30 — waiting 5s..."
sleep 5
done
```
> **Note:** In prod, the standalone `etcd` service from `docker-stack-infra.yml` still runs (Docker Compose overlay files cannot remove services). APISIX currently uses this etcd; the Patroni etcd migration happens via `docker-stack-infra.prod.yml`. The `http://etcd:2379/health` check targets this standalone service and is correct for the current setup.
> **Note:** In prod, APISIX uses the 3-node Patroni etcd cluster on DB nodes (`iklim-db-01/02/03:2379`) via the `/apisix` prefix — configured in `config.yaml` mounted by the prod overlay. The standalone `etcd` service from the base stack remains idle. This step waits for Patroni etcd (`iklim-db-01:2379`) to be healthy before running the APISIX init script.
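The wait step follows a common bounded-retry pattern: probe, sleep, and give up after a fixed number of attempts. A generic version is sketched below for reference; `wait_for` and the `RETRY_DELAY` variable are hypothetical names, and the probe is whatever command is passed as arguments.

```bash
#!/usr/bin/env bash
# Sketch (assumption): generic bounded retry loop.
# Usage: wait_for <attempts> <probe command...>
# RETRY_DELAY is a hypothetical knob; it defaults to the 5s used in the step.
wait_for() {
  local attempts="$1"; shift
  local i
  for i in $(seq 1 "$attempts"); do
    if "$@"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts failed"
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "${RETRY_DELAY:-5}"
  done
  echo "not ready after $attempts attempts" >&2
  return 1
}

# Example (commented out, needs the overlay network): probe each etcd member.
# for n in iklim-db-01 iklim-db-02 iklim-db-03; do
#   wait_for 30 sh -c "wget -qO- http://$n:2379/health | grep -q '\"health\":\"true\"'"
# done
```

Probing all three members instead of only `iklim-db-01` would avoid a false failure when that single node is down but the cluster still has quorum.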
## Step 5 — Add `Run APISIX Init` step
@ -247,9 +247,9 @@ Insert **after** `Bootstrap SWAG Certificate` and **before** `Review Environment
## Step 8 — Microservice prod deploy overlay
Her mikroservisin kendi `docker-stack-service.prod.yml` overlay dosyası vardır. Bu dosya prod'a özgü `replicas: 3` ve `max_replicas_per_node: 1` ayarlarını içerir.
Each microservice has its own `docker-stack-service.prod.yml` overlay file. This file contains prod-specific `replicas: 3` and `max_replicas_per_node: 1` settings.
Mikroservis deploy pipeline'larında (`deploy-prod.yml`) `docker stack deploy` komutu şu şekilde olmalı:
In microservice deploy pipelines (`deploy-prod.yml`), the `docker stack deploy` command should be:
```bash
docker stack deploy \
@ -258,7 +258,7 @@ docker stack deploy \
iklimco
```
Örneğin `BE-Authentication` için:
For example, for `BE-Authentication`:
```bash
docker stack deploy \
@ -267,7 +267,7 @@ docker stack deploy \
iklimco
```
> Yeni bir mikroservis eklendiğinde `BE-<ServiceName>/docker-stack-service.prod.yml` dosyasının oluşturulması ve pipeline'ın bu overlay'i içermesi zorunludur.
> When a new microservice is added, `BE-<ServiceName>/docker-stack-service.prod.yml` must be created and the pipeline must include this overlay.
## Step 9 — Ensure subdomain env vars are in prod `.env`
@ -282,27 +282,34 @@ GRAFANA_SUBDOMAIN=grafana.iklim.co
## Step 10 — Final step order for prod pipeline
1. Acquire Deploy Lock
2. Checkout Branch
3. Prepare Folders
4. Set up SSH Key and Add to known_hosts
5. Update Apt Repository and Install Required Tools (`gettext tree jq`)
6. Fetch Service Secret Files
7. Initialize Servers ← cert scp lines removed
8. Upload Updated Secrets to Storagebox
9. Provision Vault AppRole IDs and Docker Secrets
10. Upload Updated Env to Storagebox
11. Prepare Init Files ← cert copy lines removed
12. Initialize Docker Swarm
13. Stop Docker Compose Services
14. Docker Login to Harbor
15. **Update DNS Records** ← NEW (GoDaddy API, idempotent)
16. **Prepare SWAG Directories** ← NEW
17. Bootstrap Vault TLS Placeholder
18. Deploy Swarm Stack
19. **Wait for etcd** ← NEW
20. **Run APISIX Init** ← NEW (`SPRING_PROFILES_ACTIVE=prod`)
21. **Bootstrap SWAG Certificate** ← NEW
22. **Run Database Init Scripts** ← NEW (`postgresql`, `mongodb`)
23. Review Environment
24. Release Deploy Lock
To prevent concurrent deploys, a Gitea Actions `concurrency` block is added per pipeline:
```yaml
concurrency:
group: prod-deploy
cancel-in-progress: false
```
With `cancel-in-progress: false`, a new run waits in the queue until the previous one finishes; the Gitea UI shows it as "queued" rather than failing with an error.
1. Checkout Branch
2. Prepare Folders
3. Set up SSH Key and Add to known_hosts
4. Update Apt Repository and Install Required Tools (`gettext tree jq`)
5. Fetch Service Secret Files
6. Initialize Workspace ← cert scp lines removed
7. Upload Updated Secrets to Storagebox
8. Provision Vault AppRole IDs and Docker Secrets
9. Upload Updated Env to Storagebox
10. Prepare Init Files ← cert copy lines removed
11. Initialize Docker Swarm
12. Docker Login to Harbor
13. **Update DNS Records** ← NEW (GoDaddy API, idempotent)
14. **Prepare SWAG Directories** ← NEW (`$SWAG_CONFIG_DIR/dns-conf`; renders nginx conf templates)
15. Bootstrap Vault TLS Placeholder
16. Deploy Swarm Stack
17. **Wait for etcd** ← NEW (Patroni etcd `iklim-db-01:2379`)
18. **Run APISIX Init** ← NEW (`SPRING_PROFILES_ACTIVE=prod`)
19. **Bootstrap SWAG Certificate** ← NEW
20. **Run Database Init Scripts** ← NEW (`postgresql`, `mongodb`)
21. Review Environment

View File

@ -1,157 +1,137 @@
# 00 - Genel Yol Haritasi
# 00 - General Roadmap
Bu dosya, `Environment_Infrastructure` reposunda Terraform ve Ansible ile Hetzner Cloud uzerinde test/prod altyapisini kuracak ajanlar icin ana baglamdir. Her asama dosyasi kendi basina yeterli olacak sekilde yazilmistir; yine de bu dokuman genel karar kaydidir.
This file is the main context for agents that will set up the test/prod infrastructure on Hetzner Cloud with Terraform and Ansible in the `Environment_Infrastructure` repo. Each phase file is written to be self-sufficient; nevertheless, this document is the general decision record.
## Goal
The Iklim.co infrastructure will be set up on two separate Hetzner Cloud Projects:
- `test` Hetzner Cloud Project
- `prod` Hetzner Cloud Project
This separation is considered mandatory. API tokens, networks, firewalls, placement groups, servers, costs, and accidental deletion risks are separated by environment.
## Terraform and Ansible Responsibility Boundary
Terraform creates only IaaS resources:
- Hetzner Cloud server
- Private network and subnet
- Firewall
- SSH key
- Placement group
- Optional volume, floating IP, load balancer, or DNS record
- Ansible inventory output
Ansible prepares the created Linux machines:
- Linux base packages
- Security hardening
- Docker Engine installation
- Docker Swarm init/join
- Gitea Actions `act_runner` systemd installation
- Shared directories and deploy prerequisites
Docker, Swarm, runner, or application deployment will not be done inside Terraform. Hetzner Cloud resources will not be created inside Ansible.
## Environment Topologies
### Test
Minimum topology for the test environment:
| Node | Role | Note |
| --- | --- | --- |
| `iklim-app-01` | Swarm manager + app worker + Gitea runner | CI/CD test deploy runs through this node |
| `iklim-db-01` | DB node | DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD |
The test DB setup is brought only up to machine and OS preparation with Terraform/Ansible. PostgreSQL/MongoDB cluster installation is outside this phase.
### Prod
HA topology for the prod environment:
| Node group | Count | Role |
| --- | ---: | --- |
| `iklim-app-*` | 3 | Each one is a Swarm manager + app worker |
| `iklim-db-*` | 3 | DB cluster nodes |
Prod DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD. Terraform prepares the DB machines and network/firewall rules; Ansible installs OS hardening and base dependencies.
## Public Port Policy
Ports open to the public internet are only:
- `22/tcp` SSH, only from admin IP/CIDR sources
- `80/tcp` HTTP
- `443/tcp` HTTPS
`8200/tcp` Vault will not be opened to the public internet. Vault must be reachable only from the private network or Docker overlay.
`docker-stack-infra.yml` has been aligned with this policy: only the SWAG service publishes ports 80/443; all other services such as Vault, APISIX, RabbitMQ, Prometheus, and Grafana are reachable only through the `iklimco-net` overlay.
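The policy above lends itself to a mechanical check. The following is a minimal Python sketch, assuming a hypothetical mapping of service names to published host ports (it does not parse the real `docker-stack-infra.yml`). It reports every port published by a service other than SWAG, and every SWAG port outside 80/443.

```python
# Hypothetical snapshot of published ports per service; real values would
# have to be extracted from docker-stack-infra.yml.
ALLOWED_PUBLISHED = {"swag": {80, 443}}


def policy_violations(published: dict[str, set[int]]) -> list[str]:
    """Return human-readable violations of the public port policy."""
    violations = []
    for service, ports in sorted(published.items()):
        allowed = ALLOWED_PUBLISHED.get(service, set())
        # Anything published beyond the allowed set is a violation.
        for port in sorted(ports - allowed):
            violations.append(f"{service} publishes {port}/tcp")
    return violations


if __name__ == "__main__":
    stack = {"swag": {80, 443}, "vault": set(), "grafana": set()}
    print(policy_violations(stack))
    print(policy_violations({**stack, "vault": {8200}}))
```

A check like this could run in CI before the stack is deployed, so that an accidentally published Vault or Grafana port fails the pipeline instead of reaching the host.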
## Private Network Policy
The detailed matrix of ports that must be opened inside the private network is in `01-private-network-port-matrisi.md`. Agents must treat that file as the source when writing firewall or Ansible UFW rules.
## Gitea Actions Runner Decision
`act_runner` will not run as a Docker container, and the Docker socket will not be mounted into a container.
Preferred installation:
- `act_runner` is installed as a Linux systemd service.
- A separate `gitea-runner` user is created for the runner.
- CI/CD jobs can create containers when needed; for this, the runner host needs Docker CLI/daemon access.
- Because Docker group membership grants permissions close to root level, only trusted Gitea repos/jobs should use these runner labels.
For prod HA, `act_runner` will be installed not on a single machine but on all 3 Swarm manager nodes. This allows pipelines to continue when one manager/runner is lost. Runner labels must be both shared and node-specific:
- Shared: `prod-runner`
- Node specific: `iklim-app-01`, `iklim-app-02`, `iklim-app-03`
For test, a single runner is enough:
- Shared: `test-runner`
- Node specific: `iklim-app-01`
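The labelling scheme can be sketched in a few lines of Python. This is an illustration only: the `NODES` mapping and the `runner_labels` helper are hypothetical names, and how the labels are actually passed to `act_runner` during registration is left to the Ansible role.

```python
# Runner hosts per environment, as listed in this roadmap.
NODES = {
    "prod": ["iklim-app-01", "iklim-app-02", "iklim-app-03"],
    "test": ["iklim-app-01"],
}


def runner_labels(environment: str) -> dict[str, list[str]]:
    """Map each runner host to its shared label plus its node-specific label."""
    shared = f"{environment}-runner"
    return {node: [shared, node] for node in NODES[environment]}


if __name__ == "__main__":
    for node, labels in runner_labels("prod").items():
        print(node, labels)
```

With this layout a job that targets the shared label can be picked up by any healthy runner, while a job that targets a node label is pinned to one host.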
## Deploy Lock Decision
In the prod environment, 3 runners are required for HA; however, they can run more than one deploy job at the same time. For this reason, prod deploy operations must be serialized with an automatic lock on the StorageBox.
Lock files/directories will not be created manually. They will be created with an atomic `mkdir` at the start of the workflow and removed with `rmdir` when the deploy finishes.
Recommended StorageBox paths:
```text
prod/locks/prod-deploy.lock
prod/locks/prod-infra.lock
prod/locks/services/<service-name>.lock
```
To start with, the simplest and safest model is a single global prod deploy lock:
```text
prod/locks/prod-deploy.lock
```
This model queues all prod deploys. A per-service lock model can be adopted later if needed.
Example flow:
```bash
ssh storagebox 'mkdir -p prod/locks && mkdir prod/locks/prod-deploy.lock'
# deploy operations
ssh storagebox 'rmdir prod/locks/prod-deploy.lock'
```
Because `mkdir` is atomic, the command fails if the lock already exists; in that case the job must wait or exit with a clean error. Even if the workflow fails, the cleanup step must try to remove the lock. To detect stale locks, a timestamp, the runner name, and workflow information can be written inside the lock directory.
## Deploy Serialization Decision
Because of the 3-runner HA model in prod, multiple deploy jobs can run at the same time. Gitea Actions `concurrency` is used to prevent concurrent deploys; a StorageBox-based lock mechanism is not required.
```yaml
concurrency:
  group: prod-deploy
  cancel-in-progress: false
```
With `cancel-in-progress: false`, a new run in the same group is queued by Gitea until the previous one finishes; it appears as "queued" in the UI and is not shown as an error. All prod deploy workflows, including infrastructure and microservices, must use the same `group: prod-deploy` value so infra deploy and microservice deploy cannot overlap.
## Hetzner Physical Host Separation
Hetzner Cloud does not allow direct cabinet selection. `Placement Group` is used for the requirement of avoiding the same physical host. A placement group of type `spread` aims to place the cloud servers in the group on different physical hosts.
Constraints:
- A spread placement group reduces the impact of a single physical host failure.
- It does not guarantee protection against a wider failure inside the same datacenter or location.
- For location-level disaster recovery, a different location/region distribution must be designed later.
- According to Hetzner documentation, there is a maximum limit of 10 servers per spread placement group.
At least two placement groups are recommended for prod:
- `iklim-prod-app-spread`: 3 Swarm manager/app nodes
- `iklim-prod-db-spread`: 3 DB nodes
Optional for test:
- `iklim-test-spread`: `iklim-app-01` and `iklim-db-01`
Sources:
- Hetzner Terraform provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest
- Hetzner Networks: https://docs.hetzner.com/cloud/networks/overview/
- Hetzner Firewalls: https://docs.hetzner.com/cloud/firewalls/overview
- Hetzner Placement Groups: https://docs.hetzner.com/cloud/placement-groups/overview
- Docker Swarm overlay ports: https://docs.docker.com/engine/network/drivers/overlay/
- Gitea act_runner: https://docs.gitea.com/usage/actions/act-runner


@ -1,39 +1,39 @@
# 07 - Private Network Port Matrix
This file defines the ports that must be opened inside the Hetzner private network for test and prod environments. Ports open to the public internet will only be `22/tcp`, `80/tcp`, and `443/tcp`. Vault `8200/tcp` will not be opened publicly.
This matrix must be treated as the source for Terraform Hetzner firewall and Ansible UFW rules.
## Network Plan
### Test
| Subnet | CIDR | Purpose |
| --- | --- | --- |
| App/Swarm | `10.10.10.0/24` | `iklim-app-01` |
| DB | `10.10.20.0/24` | `test-db-01` |
### Prod
| Subnet | CIDR | Purpose |
| --- | --- | --- |
| App/Swarm | `10.20.10.0/24` | `iklim-app-01/02/03` |
| DB | `10.20.20.0/24` | `prod-db-01/02/03` |
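The network plan can be sanity-checked with Python's standard `ipaddress` module. The sketch below is not part of the repo; it only confirms that the four CIDR blocks listed above do not overlap and resolves which subnet a given address falls into. The sample address in the usage section is hypothetical.

```python
import ipaddress

# CIDR blocks transcribed from the network plan tables above.
SUBNETS = {
    ("test", "app"): "10.10.10.0/24",
    ("test", "db"): "10.10.20.0/24",
    ("prod", "app"): "10.20.10.0/24",
    ("prod", "db"): "10.20.20.0/24",
}


def overlapping_pairs(subnets):
    """Return every pair of subnet keys whose CIDR ranges overlap."""
    nets = {key: ipaddress.ip_network(cidr) for key, cidr in subnets.items()}
    keys = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


def subnet_of(address, subnets):
    """Return the (environment, role) key containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for key, cidr in subnets.items():
        if ip in ipaddress.ip_network(cidr):
            return key
    return None


if __name__ == "__main__":
    print(overlapping_pairs(SUBNETS))
    print(subnet_of("10.20.20.11", SUBNETS))  # hypothetical prod DB address
```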
## Public Ingress Standard
Public ingress for all environments:
| Port | Protocol | Source | Target | Requirement |
| --- | --- | --- | --- | --- |
| `22` | TCP | Admin IP/CIDR | All nodes | SSH management |
| `80` | TCP | Internet | `iklim-app-01` (gateway) | HTTP / ACME redirect |
| `443` | TCP | Internet | `iklim-app-01` (gateway) | HTTPS |
| `51820` | UDP | `0.0.0.0/0`, `::/0` | `iklim-db-01` (DB node) | WireGuard VPN — authentication with cryptographic key |
Critical ports that will not be opened publicly:
| Port | Service |
| --- | --- |
| `8200/tcp` | Vault |
| `5432/tcp` | PostgreSQL |
@ -45,120 +45,119 @@ Critical ports that will not be opened publicly:
| `9090/tcp` | Prometheus |
| `3000/tcp` | Grafana |
## Docker Swarm Private Ports
Required ports between Docker Swarm nodes:
| Port | Protocol | Source | Target | Description |
| --- | --- | --- | --- | --- |
| `2377` | TCP | Swarm nodes | Swarm manager nodes | Swarm control plane / join |
| `7946` | TCP | All Swarm nodes | All Swarm nodes | Node discovery / gossip |
| `7946` | UDP | All Swarm nodes | All Swarm nodes | Node discovery / gossip |
| `4789` | UDP | All Swarm nodes | All Swarm nodes | Overlay VXLAN data path |
In test, these ports are effectively required for a single Swarm node, but they can be defined inside the app subnet to make adding workers easier later.
In prod, these ports must be open between all `iklim-app-*` nodes inside the `10.20.10.0/24` app/swarm subnet.
Source: Docker overlay network documentation, https://docs.docker.com/engine/network/drivers/overlay/
## Application and Infra Service Private Ports
These ports will not be opened publicly. Access will be allowed only from required sources inside the private network or Docker overlay.
| Port | Protocol | Service | Source | Target | Note |
| --- | --- | --- | --- | --- | --- |
| `8200` | TCP | Vault API/UI | Swarm app nodes / runner | Vault service/node | Public closed. Runtime services must reach Vault through private/overlay |
| `6379` | TCP | Redis | Swarm app nodes | Redis service/node | Public closed |
| `5672` | TCP | RabbitMQ AMQP | Swarm app nodes | RabbitMQ service/node | Public closed |
| `15672` | TCP | RabbitMQ Management | Admin CIDR or private ops | RabbitMQ service/node | Public closed; preferably VPN/bastion |
| `61613` | TCP | RabbitMQ STOMP | Required app nodes | RabbitMQ service/node | Public closed |
| `15674` | TCP | RabbitMQ Web STOMP | Required app/gateway nodes | RabbitMQ service/node | Public closed |
| `2379` | TCP | etcd client | APISIX service/node | etcd service/node | Public closed |
| `2380` | TCP | etcd peer | etcd cluster nodes | etcd cluster nodes | May not be needed for a single replica; required if clustered |
| `9180` | TCP | APISIX Admin API | Admin CIDR or private ops | APISIX service/node | Public closed |
| `9090` | TCP | Prometheus UI/API | Admin CIDR or private ops | Prometheus service/node | Public closed |
| `3000` | TCP | Grafana UI | Admin CIDR or private ops | Grafana service/node | Public closed |
`docker-stack-infra.yml` has been updated so that only the SWAG service publishes ports 80/443 in host mode. All other services contain no published ports; access is provided only through the `iklimco-net` overlay. This table remains the source for private ingress decisions.
## DB Node Ports
Because DB infrastructure will be installed manually, the exact cluster technology is outside this document. Still, the default ports for firewall purposes are below.
### PostgreSQL / PostGIS (Patroni + etcd)
The prod environment uses PostgreSQL managed with Patroni + etcd. In the test environment, replication and HA ports are not required because there is a single node.
| Port | Protocol | Source | Target | Note |
| --- | --- | --- | --- | --- |
| `5432` | TCP | App/Swarm subnet | PostgreSQL nodes (Patroni-managed) | Application JDBC — connects to all nodes, driver finds the primary |
| `5432` | TCP | DB subnet | PostgreSQL nodes | Patroni replication (pg_basebackup and WAL streaming) |
| `8008` | TCP | DB subnet | PostgreSQL nodes | Patroni REST API — leader election, health check |
| `2379` | TCP | DB subnet | etcd nodes | etcd client — Patroni -> etcd access |
| `2380` | TCP | DB subnet | etcd nodes | etcd peer — raft protocol inside the etcd cluster |
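The first row says the application connects to all nodes and the driver finds the primary. With the PostgreSQL JDBC driver this is typically expressed as a multi-host URL combined with the `targetServerType=primary` parameter. The Python sketch below only assembles such a URL; the host addresses and the database name are hypothetical, and the parameters actually used by the project are those in the cluster setup guide, which is not shown here.

```python
def patroni_jdbc_url(hosts, database, port=5432):
    """Build a multi-host PostgreSQL JDBC URL that asks for the primary node."""
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    # targetServerType=primary makes the driver skip replicas when connecting.
    return f"jdbc:postgresql://{host_list}/{database}?targetServerType=primary"


if __name__ == "__main__":
    # Hypothetical addresses inside the prod DB subnet 10.20.20.0/24.
    print(patroni_jdbc_url(["10.20.20.11", "10.20.20.12", "10.20.20.13"], "appdb"))
```

Listing every node is the reason `5432/tcp` must be open from the app subnet to all PostgreSQL nodes rather than to a single address: after a Patroni failover the driver needs to reach whichever node has become the leader.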
### MongoDB
| Port | Protocol | Source | Target | Note |
| --- | --- | --- | --- | --- |
| `27017` | TCP | App/Swarm subnet | MongoDB node/replica set endpoint | Application DB connection |
| `27017` | TCP | DB subnet | MongoDB replica set nodes | Replica set internal traffic |
If sharding is added later, additional MongoDB roles such as `27018/27019` may come up; they will not be opened at this stage.
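For the replica set case, clients usually receive a connection string that lists every member and names the replica set. The following Python sketch builds one in the standard `mongodb://` format; the host addresses, the database name, and the replica set name `rs0` are assumptions made for illustration.

```python
def mongo_replica_set_uri(hosts, database, replica_set, port=27017):
    """Build a standard MongoDB connection string listing all replica set members."""
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{host_list}/{database}?replicaSet={replica_set}"


if __name__ == "__main__":
    # Hypothetical members inside the prod DB subnet 10.20.20.0/24.
    members = ["10.20.20.11", "10.20.20.12", "10.20.20.13"]
    print(mongo_replica_set_uri(members, "appdb", "rs0"))
```

Credentials and TLS options are deliberately left out; they would normally be supplied through secrets rather than hard-coded into the URI.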
## Test Private Rules
Minimum for the test environment:
| Source | Target | Ports |
| --- | --- | --- |
| `10.10.10.0/24` | `10.10.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` |
| `10.10.10.0/24` | `10.10.20.0/24` | `5432/tcp`, `27017/tcp` |
| `10.10.10.0/24` | `10.10.10.0/24` | `8200/tcp`, `6379/tcp`, `5672/tcp`, `61613/tcp`, `15674/tcp` |
| Admin CIDR or VPN | `10.10.10.0/24` | `9000/tcp`, `15672/tcp`, `9180/tcp`, `9090/tcp`, `3000/tcp` |
Because the DB node is single-node in test, PostgreSQL/MongoDB replication ports inside the DB subnet may not be actively used.
## Prod Private Rules
Minimum for the prod environment, including Patroni + etcd:
App subnet (swarm firewall) — traffic inside itself:
| Source | Target | Ports |
| --- | --- | --- |
| `10.20.10.0/24` | `10.20.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` (Swarm) |
| `10.20.10.0/24` | `10.20.10.0/24` | `8200/tcp`, `6379/tcp`, `5672/tcp`, `61613/tcp`, `15674/tcp`, `2379/tcp` (application services) |
| Admin CIDR or VPN | `10.20.10.0/24` | `15672/tcp`, `9180/tcp`, `9090/tcp`, `3000/tcp` |
App -> DB traffic (there is no related rule in the swarm firewall; it is allowed in the db firewall):
| Source | Target | Ports |
| --- | --- | --- |
| `10.20.10.0/24` | `10.20.20.0/24` | `5432/tcp`, `27017/tcp` (DB access) |
| `10.20.10.0/24` | `10.20.20.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` (Swarm — DB worker join) |
DB subnet (db firewall) — traffic between DB nodes:
| Source | Target | Ports |
| --- | --- | --- |
| `10.20.20.0/24` | `10.20.20.0/24` | `5432/tcp`, `27017/tcp` (DB replication) |
| `10.20.20.0/24` | `10.20.20.0/24` | `2379/tcp`, `2380/tcp` (etcd client/peer) |
| `10.20.20.0/24` | `10.20.20.0/24` | `8008/tcp` (Patroni REST API) |
DB -> App traffic (allowed in the swarm firewall):
| Source | Target | Ports |
| --- | --- | --- |
| `10.20.20.0/24` | `10.20.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` (Swarm — manager ports) |
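The prod tables can be read as an allow-list: traffic is permitted only when a row matches source subnet, target subnet, and port. The sketch below encodes those rows in Python and answers "is this flow allowed?". It is a reading aid rather than the actual Terraform or UFW rule set, the sample addresses are hypothetical, and the Admin CIDR/VPN row is left out because its source range is not given in this document.

```python
import ipaddress

APP, DB = "10.20.10.0/24", "10.20.20.0/24"
SWARM = {"2377/tcp", "7946/tcp", "7946/udp", "4789/udp"}

# (source CIDR, target CIDR, allowed ports), transcribed from the prod tables.
RULES = [
    (APP, APP, SWARM | {"8200/tcp", "6379/tcp", "5672/tcp",
                        "61613/tcp", "15674/tcp", "2379/tcp"}),
    (APP, DB, SWARM | {"5432/tcp", "27017/tcp"}),
    (DB, DB, {"5432/tcp", "27017/tcp", "2379/tcp", "2380/tcp", "8008/tcp"}),
    (DB, APP, SWARM),
]


def is_allowed(src, dst, port):
    """Return True if some rule permits traffic from src to dst on port."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return any(
        s in ipaddress.ip_network(source)
        and d in ipaddress.ip_network(target)
        and port in ports
        for source, target, ports in RULES
    )


if __name__ == "__main__":
    print(is_allowed("10.20.10.11", "10.20.20.11", "5432/tcp"))  # app to DB
    print(is_allowed("10.20.20.11", "10.20.10.11", "5432/tcp"))  # DB to app
```

Encoding the matrix this way makes the asymmetry explicit: the app subnet may reach the DB ports, but the DB subnet may reach the app subnet only on the Swarm ports.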
## Acceptance Criteria
- The public firewall does not open `8200/tcp`.
- DB ports are not open publicly.
- Swarm ports are open only inside the private app/swarm subnet.
- The App/Swarm subnet reaches the DB subnet only through required DB ports.
- The DB subnet is not opened to the app subnet with broad permissions.
- Admin UI ports are restricted through admin CIDR/VPN/private ops instead of public access.


@ -1,29 +1,29 @@
# 02 - Test Terraform IaC
The purpose of this phase is to create the minimum IaaS resources inside the test Hetzner Cloud Project with Terraform. This document is written so it can be applied on its own.
## Scope
Terraform creates the following in the test environment:
- Private network: `iklim-test-net`
- Subnets:
  - App/Swarm subnet: `10.10.10.0/24`
  - DB subnet: `10.10.20.0/24`
- Firewall:
  - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
  - Private ingress: test rules in `01-private-network-port-matrisi.md`
- SSH key
- Placement group: `iklim-test-spread`
- Floating IP: stable IPv4 for the swarm entry point
- Server:
  - `iklim-app-01`
  - `iklim-db-01`
- Ansible inventory output
Terraform does not install DB software. The DB node is prepared only at the machine, network, and firewall level.
## Recommended File Structure
```text
terraform/
@ -42,11 +42,11 @@ terraform/
terraform.tfvars.example
```
`terraform.tfvars` will not be committed. It must be ignored in `.gitignore`.
## Variables
Minimum variables:
```hcl
hcloud_token = "secret"
@ -58,67 +58,64 @@ admin_ssh_public_key_path = "~/.ssh/id_rsa.pub"
admin_allowed_cidrs = ["X.X.X.X/32"]
```
The `environment` constant is in `locals.tf`; it is not overridden with `tfvars`.
Start with a single location for `location`. Disaster recovery across different regions/locations is outside the scope of this stage and must be added to the document later.
The server type decision is based on the current test environment metrics in `../hetzner-sizing-report.md`. Because 10 microservices and infrastructure services run together on the test app node, `cpx32` was considered risky in terms of RAM. `cpx42` is also recommended for the test DB node because of single-node CPU spike risk.
## Server Roles
| Server | Private IP | Role |
| --- | --- | --- |
| `iklim-app-01` | `10.10.10.11` | Swarm manager + app worker + Gitea runner |
| `iklim-db-01` | `10.10.20.11` | DB node prepared for manual DB installation |
Private IPs must be statically defined inside Terraform. Ansible inventory and firewall rules remain deterministic.
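Because the addresses are fixed, the inventory can be derived from them without querying the cloud API. In this repo the inventory comes from a Terraform output; the Python sketch below only illustrates what a deterministic INI-style Ansible inventory could look like. The group names `swarm_managers` and `db_nodes`, and the choice of the private IP as `ansible_host`, are assumptions.

```python
# Static private IPs from the server roles table; group names are hypothetical.
HOSTS = {
    "swarm_managers": {"iklim-app-01": "10.10.10.11"},
    "db_nodes": {"iklim-db-01": "10.10.20.11"},
}


def render_inventory(groups):
    """Render an INI-style Ansible inventory with stable group and host order."""
    lines = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        for name, ip in sorted(groups[group].items()):
            lines.append(f"{name} ansible_host={ip}")
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_inventory(HOSTS), end="")
```

Sorting groups and hosts keeps the rendered file byte-identical between runs, so a changed inventory in a diff always reflects a real change in the topology.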
## Recommended Resources and Cost
| Server | Role | Server Type | CPU | RAM | SSD | Monthly |
| --- | --- | --- | ---: | ---: | ---: | ---: |
| `iklim-app-01` | Swarm manager + app worker + Gitea runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
| `iklim-db-01` | PostgreSQL/PostGIS + MongoDB node | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
| **Total** | 2 servers | | **16 vCPU** | **32 GB** | **640 GB** | **$59.98** |
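The totals row can be reproduced from the per-server figures. The short Python sketch below uses `Decimal` for the prices so the monthly sum is exact; the dictionary keys are illustrative names, while the numbers are taken from the table above.

```python
from decimal import Decimal

# Per-server figures from the table above (both nodes are cpx42).
SERVERS = [
    {"name": "iklim-app-01", "vcpu": 8, "ram_gb": 16, "ssd_gb": 320,
     "monthly_usd": Decimal("29.99")},
    {"name": "iklim-db-01", "vcpu": 8, "ram_gb": 16, "ssd_gb": 320,
     "monthly_usd": Decimal("29.99")},
]


def totals(servers):
    """Sum capacity and monthly cost across all servers."""
    keys = ("vcpu", "ram_gb", "ssd_gb", "monthly_usd")
    return {key: sum(server[key] for server in servers) for key in keys}


if __name__ == "__main__":
    print(totals(SERVERS))
```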
## Firewall Rules
Public ingress:
| Port | Source | Target |
| --- | --- | --- |
| `22/tcp` | `admin_allowed_cidrs` | All test nodes |
| `80/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-01` |
| `443/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-01` |
For public ingress, `8200/tcp`, `5432/tcp`, `27017/tcp`, `5672/tcp`, `15672/tcp`, `6379/tcp`, `2379/tcp`, `9000/tcp`, `9180/tcp`, `9090/tcp`, and `3000/tcp` will not be opened.
### App (swarm) Firewall — Private Ingress
Sourced from the app subnet (`iklim-app-01`):
| Port | Service | Access method |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | From app subnet |
| `7946/tcp,udp` | Docker Swarm node discovery | From app subnet |
| `4789/udp` | Docker Swarm VXLAN overlay | From app subnet |
| `8200/tcp` | Vault | Docker overlay / private network |
| `6379/tcp` | Redis | From app subnet |
| `5672/tcp` | RabbitMQ AMQP | From app subnet |
| `61613/tcp` | RabbitMQ STOMP | From app subnet |
| `15674/tcp` | RabbitMQ Web STOMP | From app subnet |
| `15672/tcp` | RabbitMQ Management | From app subnet; external access through SWAG `443` — IP restricted |
| `9000/tcp` | APISIX Dashboard | From app subnet; external access through SWAG `443` — IP restricted |
| `9180/tcp` | APISIX Admin API | From app subnet, including Docker overlay |
| `9090/tcp` | Prometheus | From app subnet; external access through SWAG `443` — IP restricted |
| `3000/tcp` | Grafana | From app subnet; external access through SWAG `443` — IP restricted |
Sourced from the DB subnet, because `iklim-db-01` joins Swarm as a worker:
| Port | Service | Source |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | `10.10.20.0/24` |
| `7946/tcp,udp` | Docker Swarm node discovery | `10.10.20.0/24` |
@ -126,30 +123,29 @@ DB subnet kaynakli (`iklim-db-01` Swarm'a worker olarak katildigi icin):
### DB Firewall — Private Ingress
| Port | Service | Source |
| --- | --- | --- |
| `22/tcp` | SSH | `admin_allowed_cidrs` |
| `51820/udp` | WireGuard VPN | `0.0.0.0/0`, `::/0`; authenticated with a cryptographic key |
| `5432/tcp` | PostgreSQL | `10.10.10.0/24` (app subnet) |
| `27017/tcp` | MongoDB | `10.10.10.0/24` (app subnet) |
| `2377/tcp` | Docker Swarm control plane | `10.10.10.0/24` (app subnet) |
| `7946/tcp,udp` | Docker Swarm node discovery | `10.10.10.0/24` (app subnet) |
| `4789/udp` | Docker Swarm VXLAN overlay | `10.10.10.0/24` (app subnet) |
IP restriction is enforced in the SWAG nginx configuration, not in the Hetzner firewall. None of these ports is opened to the public internet, not even for `admin_allowed_cidrs` sources.
For other private ingress rules, `01-private-network-port-matrisi.md` will be used as the source.
## Placement Group
The `iklim-test-spread` placement group will be `type = "spread"`. Because there are two servers in test, this group aims to distribute the `iklim-app-01` and `iklim-db-01` machines across different physical hosts.
Note: A spread placement group is not a guarantee of a different cabinet or location; it reduces the impact of a single physical host failure.
## Terraform Output Expectations
`outputs.tf` must produce at least the following information:
```hcl
output "ansible_inventory_yaml" {
@ -169,53 +165,45 @@ output "test_floating_ip" {
}
```
The inventory output can later be written to `ansible/inventory/generated/test.yml`. If the inventory file contains no secrets, it can be committed; if it contains secrets or tokens, it will not be committed.
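
As a rough illustration, the generated inventory might look like the sketch below. The group names (`app`, `db`), the `private_ip` variable, and the placeholder public addresses are assumptions for illustration; only the host names and the manager's private address `10.10.10.11` come from this documentation.

```yaml
# Hypothetical shape of ansible/inventory/generated/test.yml.
# Group names, private_ip, and the ansible_host placeholders are assumptions.
all:
  children:
    app:
      hosts:
        iklim-app-01:
          ansible_host: 203.0.113.10   # placeholder public IP (documentation range)
          private_ip: 10.10.10.11      # hypothetical variable name
    db:
      hosts:
        iklim-db-01:
          ansible_host: 203.0.113.20   # placeholder public IP (documentation range)
```

A file of this shape holds no tokens or passwords, so under the rule above it could be committed.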
## Lifecycle and Resize Policy
### `server_type` Change (Resize)
Changing `server_type` does **not** trigger Terraform destroy+create. The `hcloud` provider supports this natively: it stops the server, calls the Hetzner Resize API, and starts it again. Update the value in `terraform.tfvars` and run `terraform apply`.
There is downtime, because the server stops and starts, but disk, installed software, and Docker volumes are preserved. No `ignore_changes` or manual step is required.
### Which Changes Force Server Recreation?
| Changed field | Behavior | Note |
| --- | --- | --- |
| `server_type` | In-place resize (provider native) | `terraform apply` is enough |
| `hcloud_server_network` | Only attachment is updated | Because a separate resource is used |
| `hcloud_firewall_attachment` | Only attachment is updated | Because a separate resource is used |
| `placement_group_id` | Hetzner API does not allow changing it -> destroy+create | Do not change |
| `image` | Disk image changes -> destroy+create | Do not change |
| `location` | Cannot be moved to another datacenter -> destroy+create | Do not change |
### Network and Firewall Attachment Separation
The `network` block and `firewall_ids` are not embedded inside `hcloud_server`. Instead, separate resources are defined:
- `hcloud_server_network` — private IP assignment
- `hcloud_firewall_attachment` — firewall relationship
In embedded definitions, some provider versions interpret changes in these fields as server recreation. When separate resources are used, only the attachment is updated and the server is left untouched.
### `prevent_destroy` Protection
Each server gets `lifecycle { prevent_destroy = true }`. While this block exists, Terraform cannot delete the server under any condition and fails during the plan phase. To intentionally delete a server, temporarily remove the lifecycle block first.
## Acceptance Criteria
- `terraform plan` works only with the test Hetzner Project token.
- 2 servers are created after `terraform apply`.
- The two servers can reach each other through the private network.
- Only `22`, `80`, and `443` are open at firewall level from the public internet.
- Vault `8200` remains closed from the public internet.
- Terraform state is not committed to the repo.


@ -1,12 +1,12 @@
# 03 - Test Ansible Bootstrap
The purpose of this phase is to get the test machines created by Terraform ready in terms of Linux setup, hardening, Docker, and Swarm. DB software installation is outside this phase.
## Ansible Installation
Ansible must be installed on the control machine, meaning your own computer. No agent is installed on target servers; SSH access is enough.
### Installation by Operating System
- **Ubuntu / Debian:**
```bash
@ -18,7 +18,7 @@ Ansible, kontrol makinesinde (kendi bilgisayarınızda) yüklü olmalıdır. Hed
pipx install --include-deps ansible
```
> Note: The `sudo apt install ansible` command may install old Ansible packages on some Ubuntu/Debian versions. Therefore, the `pipx` method should be preferred for using an up-to-date Ansible version.
- **Fedora / Rocky Linux / RHEL:**
```bash
@ -35,71 +35,71 @@ Ansible, kontrol makinesinde (kendi bilgisayarınızda) yüklü olmalıdır. Hed
brew install ansible
```
- **With Python `pipx`, on any platform:**
```bash
pipx install --include-deps ansible
```
### Additional Python Dependencies
`passlib` is required on the control machine for the `password_hash` filter:
```bash
pipx inject ansible passlib
```
> If you installed with `pip`: `pip install passlib`
### Verify the Installation
Whichever method you used to install it, use the following commands to verify that the installation succeeded:
```bash
# Check the Ansible version and configuration paths
ansible --version
# Check which location the Ansible binary is running from
which -a ansible
```
## Running Ansible Commands
All commands must be run from the `ansible/test/` directory. `ansible.cfg` automatically defines the inventory and `roles_path`.
### 0. Install Required Collections Once During Initial Setup
```bash
ansible-galaxy collection install -r ../requirements.yml
```
### 1. Connection Test (Ping)
```bash
ansible all -m ping
```
### 2. Run the Bootstrap Playbook
```bash
ansible-playbook test-bootstrap.yml --ask-vault-pass
```
*Note: The `--ask-vault-pass` parameter asks for the Ansible Vault password; the StorageBox password is decrypted this way.*
### 3. Run Only a Specific Role (Tags)
```bash
ansible-playbook test-bootstrap.yml --tags "hardening" --ask-vault-pass
```
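
For `--tags` to select a single role, the playbook has to attach a tag to it. The excerpt below is a minimal sketch of how `test-bootstrap.yml` might do that; the role names come from the file tree in this guide, but every tag name other than `hardening` is an assumption, and the real playbook may be organized differently.

```yaml
# Hypothetical excerpt of test-bootstrap.yml; the real playbook may differ.
- name: Bootstrap test nodes
  hosts: all
  become: true
  roles:
    - role: base
      tags: ["base"]          # assumed tag name
    - role: hardening
      tags: ["hardening"]     # selected by: --tags "hardening"
    - role: storagebox
      tags: ["storagebox"]    # assumed tag name
```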
## Target Machines
| Host | Role |
| --- | --- |
| `iklim-app-01` | Swarm manager + app worker |
| `iklim-db-01` | OS-hardened DB node for manual DB installation |
## Recommended File Structure
```text
ansible/
@ -114,13 +114,13 @@ ansible/
vault.yml
host_vars/
iklim-app-01/
vars.yml # Host-specific variables such as floating IP
vault.yml
iklim-db-01/
vault.yml
test-bootstrap.yml
test-app-post-stack.yml # act_runner installation
test-db-post-stack.yml # db_stack + wireguard installation
roles/
base/
hardening/
@ -129,18 +129,18 @@ ansible/
node_dirs/
storagebox/
storagebox_ssh_key/
db_stack/ # DB directory and configuration preparation
wireguard/ # WireGuard VPN service (DB node)
act_runner/ # Gitea act_runner installation (app node)
```
## Base Role
Applied to all test nodes:
- `dnf update`
- `epel-release`: installed first as a separate task; `fail2ban`, `davfs2`, `htop`, and `btop` depend on this repo
- base packages, after `epel-release` is active:
- `curl`
- `wget`
- `git`
@ -148,57 +148,57 @@ Tüm test node'larına uygulanır:
- `tar`
- `unzip`
- `bash-completion`
- `gettext`: provides `envsubst`, which is required in CI/CD deploy pipelines
- `tree`
- `ca-certificates`
- `fail2ban`
- `chrony`
- `python3`
- `python3-pip`
- `python3-passlib`: for the `password_hash` filter (EPEL)
- `htop`: interactive process monitoring (EPEL)
- `btop`: resource monitor with a graphical interface (EPEL)
- timezone: `Europe/Istanbul`
- hostname setup
- keyboard layout: `trq` (Turkish Q)
- controlled reboot if the system requires a reboot
- **Hetzner Floating IP systemd service** (`hetzner-floating-ip`): if `hetzner_floating_ip` is defined in `host_vars`, the IP address is added to `eth0` and automatically restored on reboot (`ip addr replace`)
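
The ordering described above (EPEL first, then packages) and the floating IP unit could be sketched as the following tasks. This is an illustration rather than the role's actual content: the shortened package list, the use of the `community.general` collection, the unit layout, and the `/32` prefix (which assumes an IPv4 floating IP) are all assumptions.

```yaml
# Sketch of base-role ordering; package list shortened for brevity.
- name: Install epel-release first so EPEL packages resolve
  ansible.builtin.dnf:
    name: epel-release
    state: present

- name: Install base packages (after EPEL is active)
  ansible.builtin.dnf:
    name:
      - curl
      - git
      - fail2ban
      - chrony
      - htop
      - btop
    state: present

- name: Set timezone
  community.general.timezone:   # assumes the community.general collection
    name: Europe/Istanbul

- name: Install the floating IP unit when the host defines one
  ansible.builtin.copy:
    dest: /etc/systemd/system/hetzner-floating-ip.service
    mode: "0644"
    content: |
      [Unit]
      Description=Attach Hetzner Floating IP to eth0
      After=network-online.target
      Wants=network-online.target

      [Service]
      Type=oneshot
      RemainAfterExit=yes
      # "replace" is idempotent: it adds the address or leaves it in place
      ExecStart=/usr/sbin/ip addr replace {{ hetzner_floating_ip }}/32 dev eth0

      [Install]
      WantedBy=multi-user.target
  when: hetzner_floating_ip is defined

- name: Enable and start the floating IP unit
  ansible.builtin.systemd:
    name: hetzner-floating-ip
    enabled: true
    state: started
    daemon_reload: true
  when: hetzner_floating_ip is defined
```

Because the unit is enabled, the address is re-applied on every boot, which matches the "automatically restored on reboot" behavior described above.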
## Security Hardening Role
Applied to all test nodes:
- SSH password login is disabled.
- Root SSH login is disabled.
- Only SSH key login remains.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- The `fail2ban` SSH jail is enabled.
- Automatic security updates are enabled with `dnf-automatic`.
- The `iklim` system user is created and added to the `wheel` group; the password is read from vault.
- `firewalld` default:
- incoming: deny (drop zone)
- outgoing: allow
- The SSH rule is first written as a rich rule to the `drop` zone, then the default zone is set to `drop`; this removes the lockout risk.
- Public SSH is opened only from the admin CIDR.
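
The SSH settings and the "rule first, zone switch second" ordering from the list above can be sketched as follows. This is a minimal illustration, assuming the `ansible.posix` collection, IPv4-only entries in `admin_allowed_cidrs`, and a handler named `Restart sshd` that is hypothetical here.

```yaml
# Sketch only; assumes IPv4 entries in admin_allowed_cidrs and ansible.posix.
- name: Enforce key-only SSH settings
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: "^#?\\s*{{ item.key }}\\s"
    line: "{{ item.key }} {{ item.value }}"
    validate: "/usr/sbin/sshd -t -f %s"   # reject a broken config before saving
  loop:
    - { key: PasswordAuthentication, value: "no" }
    - { key: PermitRootLogin, value: "no" }
    - { key: PermitEmptyPasswords, value: "no" }
    - { key: MaxAuthTries, value: "3" }
  notify: Restart sshd   # hypothetical handler name

- name: Allow SSH from admin CIDRs in the drop zone first
  ansible.posix.firewalld:
    zone: drop
    rich_rule: 'rule family="ipv4" source address="{{ item }}" service name="ssh" accept'
    permanent: true
    immediate: true
    state: enabled
  loop: "{{ admin_allowed_cidrs }}"

- name: Read the current default zone
  ansible.builtin.command: firewall-cmd --get-default-zone
  register: current_zone
  changed_when: false

- name: Switch the default zone to drop only after the SSH rule exists
  ansible.builtin.command: firewall-cmd --set-default-zone=drop
  when: current_zone.stdout | trim != 'drop'
```

Running the zone switch last means an interrupted play leaves the host on its previous default zone with SSH still reachable, rather than on `drop` without an allow rule.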
### SELinux Decision
Rocky Linux 10 ships with SELinux in enforcing mode. Decision: **disabled**.
Rationale:
- Hetzner Cloud firewall (external perimeter) + firewalld (host) provide two layers of network security.
- The Docker + davfs2 + firewalld combination requires additional policy and volume label management in SELinux enforcing mode.
- It was also disabled on the Utils VPS, so consistency is preserved.
```bash
# Inside /etc/selinux/config:
SELINUX=disabled
# The change becomes active after reboot
reboot
```
In Ansible:
```yaml
- name: Disable SELinux
@ -211,9 +211,9 @@ Ansible'da:
when: selinux_change.changed
```
### fail2ban Configuration
Content of `/etc/fail2ban/jail.local`:
```ini
[DEFAULT]
@ -228,50 +228,49 @@ backend = systemd
enabled = true
```
- `bantime`: a ban lasts 6 hours
- `findtime`: failed logins are counted within a 5-minute window
- `maxretry`: 5 failed logins within `findtime` trigger a ban
- `ignoreip`: exempts admin CIDRs from bans
In Ansible, the `admin_allowed_cidrs` list is converted to a space-separated string and written to the template.
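
The list-to-string conversion can be done with the Jinja2 `join` filter. The task below is a sketch that inlines the file content instead of using a separate template file; the localhost entries in `ignoreip`, the values expressed in seconds, and the handler name are assumptions.

```yaml
# Sketch: inline content instead of a template file, for brevity.
# The localhost entries in ignoreip are an assumption.
- name: Render /etc/fail2ban/jail.local
  ansible.builtin.copy:
    dest: /etc/fail2ban/jail.local
    mode: "0644"
    content: |
      [DEFAULT]
      # 6 hours and 5 minutes, expressed in seconds
      bantime = 21600
      findtime = 300
      maxretry = 5
      backend = systemd
      ignoreip = 127.0.0.1/8 ::1 {{ admin_allowed_cidrs | join(' ') }}

      [sshd]
      enabled = true
  notify: Restart fail2ban   # hypothetical handler name
```

With `admin_allowed_cidrs: ["198.51.100.0/24", "203.0.113.7/32"]`, the rendered line would read `ignoreip = 127.0.0.1/8 ::1 198.51.100.0/24 203.0.113.7/32`.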
Note: Docker iptables rules may interact with firewalld. The Hetzner Cloud firewall is considered the actual external perimeter; firewalld is used as a second layer inside the host.
## Docker Role
Required on both nodes (`iklim-app-01` and `iklim-db-01`). Because the DB node will join the network as a Swarm Worker, Docker Engine must be installed on both machines.
Docker is installed through the official Docker dnf repository:
- Docker GPG key + dnf repository (`https://download.docker.com/linux/rhel/docker-ce.repo`)
- packages:
- `docker-ce`
- `docker-ce-cli`
- `containerd.io`
- `docker-buildx-plugin`
- `docker-compose-plugin`
- Docker service enabled + started
The Docker convenience script will not be used. The package repository path is preferred for a production-like test environment.
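
A repository-based install along these lines might look like the tasks below. This is a sketch of the approach described above, not the role's actual task file; it assumes the play runs with `become: true`.

```yaml
# Sketch of the repository-based install; assumes root privileges (become).
- name: Add the official Docker dnf repository
  ansible.builtin.get_url:
    url: https://download.docker.com/linux/rhel/docker-ce.repo
    dest: /etc/yum.repos.d/docker-ce.repo
    mode: "0644"

- name: Install Docker Engine packages
  ansible.builtin.dnf:
    name:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-buildx-plugin
      - docker-compose-plugin
    state: present

- name: Enable and start Docker
  ansible.builtin.systemd:
    name: docker
    enabled: true
    state: started
```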
## Swarm Role
- Swarm is initialized on `iklim-app-01`, which becomes the manager.
- `iklim-db-01` joins as a Swarm worker, for overlay network access.
- advertise addr: `10.10.10.11` (manager)
- overlay network:
- `iklimco-net`
- driver: `overlay`
- attachable: `true`
- Node labels:
- `iklim-app-01`: `type=service`; all infra and application services are deployed to this node
- `iklim-db-01`: `role=db`; PostgreSQL and MongoDB services are deployed to this node
- `iklim-app-01` acts as both manager and worker (availability `Active`).
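
The Swarm steps above could be expressed with the `community.docker` collection roughly as follows. This is a sketch under assumptions: that the collection and the Docker SDK for Python are available, that both hosts run in the same play, and that the registered variable name `swarm_init` is free to use.

```yaml
# Sketch; assumes the community.docker collection and the Docker SDK for
# Python on the targets. The variable name swarm_init is illustrative.
- name: Initialize Swarm on the manager
  community.docker.docker_swarm:
    state: present
    advertise_addr: 10.10.10.11
  register: swarm_init
  when: inventory_hostname == 'iklim-app-01'

- name: Join the DB node as a worker
  community.docker.docker_swarm:
    state: join
    join_token: "{{ hostvars['iklim-app-01'].swarm_init.swarm_facts.JoinTokens.Worker }}"
    remote_addrs: ["10.10.10.11:2377"]
  when: inventory_hostname == 'iklim-db-01'

- name: Create the attachable overlay network
  community.docker.docker_network:
    name: iklimco-net
    driver: overlay
    attachable: true
  when: inventory_hostname == 'iklim-app-01'

- name: Label nodes for placement constraints
  community.docker.docker_node:
    hostname: "{{ item.host }}"
    labels: "{{ item.labels }}"
  loop:
    - { host: iklim-app-01, labels: { type: service } }
    - { host: iklim-db-01, labels: { role: db } }
  when: inventory_hostname == 'iklim-app-01'
```

Network creation and labeling run only on the manager, because worker nodes cannot modify cluster state.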
## Node Directory Role
Deploy prerequisites on `iklim-app-01`:
```text
/opt/iklimco
@ -282,7 +281,7 @@ Docker convenience script kullanılmayacak. Production benzeri test ortamı içi
/opt/iklimco/stacks
```
Minimum for manual DB installation on the DB node:
```text
/opt/iklimco
@ -292,24 +291,24 @@ DB node üzerinde manuel DB kurulumu için minimum:
## StorageBox DAVFS Mount Role
Applied to both nodes (`iklim-app-01` and `iklim-db-01`).
### Purpose
Mounts Hetzner StorageBox as `/mnt/storagebox` through the WebDAV (DAVFS) protocol. Docker volumes are connected to this directory to provide data persistence and backups.
### Test Environment Sub-Account
| Parameter | Variable | Value |
| --- | --- | --- |
| Main account | `storagebox_account` | `u469968` |
| Sub-account | `storagebox_user` | `u469968-sub4` |
| WebDAV URL | `storagebox_url` | `https://u469968-sub4.your-storagebox.de/` |
| Mount point | `storagebox_mount_point` | `/mnt/storagebox` |
### Role Variables
All variables are defined in `group_vars/all/vars.yml`:
```yaml
storagebox_account: "u469968"
@ -322,24 +321,24 @@ storagebox_managed_directories:
mode: "0755"
```
In prod, the suffix changes from `sub4` to `sub5`.
Passwords are stored encrypted with Ansible Vault inside `group_vars/all/vault.yml`:
```bash
ansible-vault edit group_vars/all/vault.yml
```
Content of `vault.yml`:
```yaml
vault_storagebox_password: "SUB_ACCOUNT_PASSWORD"
vault_iklim_password: "IKLIM_USER_PASSWORD"
```
### Steps
1. **Install davfs2**
```yaml
- name: Install davfs2
@ -348,7 +347,7 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
state: present
```
2. **Credentials file** (`/etc/davfs2/secrets`)
```yaml
- name: Configure davfs2 secrets
@ -361,7 +360,7 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
group: root
```
3. **Create mount point**
```yaml
- name: Create mount point
@ -371,7 +370,7 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
mode: "0755"
```
4. **fstab entry**
```yaml
- name: Add fstab entry
@ -383,7 +382,7 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
state: present
```
5. **Mount**
```yaml
- name: Mount StorageBox
@ -392,7 +391,7 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
creates: "{{ storagebox_mount_point }}/.mounted_marker"
```
A marker file can be written to the directory to confirm mount success:
```yaml
- name: Write mount marker
@ -401,12 +400,9 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
dest: "{{ storagebox_mount_point }}/.mounted_marker"
```
6. **Create service bind mount directories**
In the test environment, the precipitation service's `image-data` volume is bind mounted on the host to `/mnt/storagebox/precipitation/images`. The directory is created by Ansible after StorageBox is mounted and left with `0755` permissions.
```yaml
- name: Create managed StorageBox directories
@ -419,27 +415,25 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
loop: "{{ storagebox_managed_directories | default([]) }}"
```
### Notes
- The `davfs2` package is in the EPEL repository; the base role already installs `epel-release`.
- StorageBox passwords are never added to the repository as plaintext; Ansible Vault is mandatory.
- The mount point is automatically mounted after the network is ready on reboot, thanks to the `_netdev` flag.
- Docker Swarm services use service directories under StorageBox as bind mounts.
- The precipitation service's test environment image directory must be `/mnt/storagebox/precipitation/images`; this path must exactly match the `device` value in `BE-Precipitation/docker-stack-service.yml`.
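
To illustrate the path-matching requirement, a stack file can declare a local volume whose `device` option points at the StorageBox directory. The excerpt below is hypothetical: only the volume name `image-data` and the device path come from this guide, while the service name, image, and container path are placeholders.

```yaml
# Hypothetical excerpt of a stack file; only the volume name image-data and
# the device path come from this guide.
services:
  precipitation:
    image: registry.example.com/precipitation:latest   # placeholder image
    volumes:
      - image-data:/app/images                         # container path is an assumption

volumes:
  image-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/storagebox/precipitation/images
```

With a bind-backed local volume like this, the `device` directory must already exist on the host, which is why Ansible creates it after mounting StorageBox.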
## StorageBox SSH Key Role
Applied to both nodes (`iklim-app-01` and `iklim-db-01`).
### Purpose
An ed25519 SSH key pair is generated on the server and uploaded to the StorageBox main account. This allows CI/CD pipelines to use the `STORAGEBOX_SSH_PRIV` Gitea secret for passwordless access.
### Steps
1. **SSH key generation**
```yaml
- name: Generate SSH key for StorageBox
@ -451,39 +445,39 @@ Bu sayede CI/CD pipeline'ları `STORAGEBOX_SSH_PRIV` Gitea secret'ini kullanarak
ssh_key_comment: "{{ inventory_hostname }}-storagebox"
```
2. **Upload the public key to StorageBox**
This step is done manually and requires the password the first time:
```bash
cat /root/.ssh/id_ed25519_storagebox.pub | ssh -p23 u469968-sub4@u469968-sub4.your-storagebox.de install-ssh-key
```
Subsequent access works without a password:
```bash
sftp -P23 u469968-sub4@u469968-sub4.your-storagebox.de
```
3. **Add private and public keys to Gitea**
Gitea -> Organization Settings -> Actions -> Secrets:
| Secret Name | Value |
| --- | --- |
| `STORAGEBOX_SSH_PRIV` | Contents of `/root/.ssh/id_ed25519_storagebox` |
| `STORAGEBOX_SSH_PUB` | Contents of `/root/.ssh/id_ed25519_storagebox.pub` |
To get the key contents:
```bash
cat /root/.ssh/id_ed25519_storagebox
cat /root/.ssh/id_ed25519_storagebox.pub
```
### Notes
- A separate key is generated for each server; all public keys are uploaded to the StorageBox main account.
- The private key is never committed to the repo; it is stored only as a Gitea secret.
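
As an illustration of how a pipeline might consume the secret, the workflow excerpt below writes the key to a temporary file and uploads one file over SFTP. It is a sketch, not an existing workflow in this repo: the job name, runner label, key file path, and `backup.tar.gz` are all hypothetical.

```yaml
# Hypothetical Gitea Actions workflow; job, runner label, and file names
# are illustrative.
on: workflow_dispatch

jobs:
  upload:
    runs-on: ubuntu-latest
    steps:
      - name: Upload a file to StorageBox over SFTP
        env:
          STORAGEBOX_SSH_PRIV: ${{ secrets.STORAGEBOX_SSH_PRIV }}
        run: |
          # Create the key file with 0600 permissions before writing to it
          install -m 600 /dev/null ~/.ssh_storagebox_key
          printf '%s\n' "$STORAGEBOX_SSH_PRIV" > ~/.ssh_storagebox_key
          # "-b -" makes sftp read its batch commands from standard input
          printf 'put backup.tar.gz\n' | sftp -P 23 -i ~/.ssh_storagebox_key \
            -o StrictHostKeyChecking=accept-new -b - \
            u469968-sub4@u469968-sub4.your-storagebox.de
          rm -f ~/.ssh_storagebox_key
```

Passing the secret through `env` rather than interpolating it into the script keeps it out of the rendered command line.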
## Acceptance Criteria


@ -1,90 +1,90 @@
# 04 - Test DB Docker Installation (Swarm Worker)
The purpose of this phase is to add the `iklim-db-01` node to Swarm as a worker and run PostgreSQL and MongoDB as Swarm services.
## Architecture Decision
The roadmap states that DBs will be installed "manually". In the test environment, this "manual" process will be implemented by starting the DBs as Docker containers on the **Swarm Worker**, instead of installing them directly on the operating system.
The installation has **two phases:**
1. **Preparation (Ansible):** The `test-db-post-stack.yml` playbook sets up DB directories, the `mongod.conf` configuration, and the WireGuard VPN service.
2. **Deploy (Gitea CI/CD):** The `deploy-test.yml` workflow deploys PostgreSQL and MongoDB services to Swarm through `docker-stack-infra.yml`.
**Why?**
1. **Ease of management:** Version upgrades and configuration management are much faster with Docker.
2. **Overlay network:** Application services (`iklim-app-01`) can access the DBs through the isolated `iklimco-net` overlay network. Note that overlay data traffic is encrypted only if the network is created with the `encrypted` option.
3. **Data persistence:** Data is stored in Docker named volumes on `iklim-db-01`. StorageBox is used only for backups.
## Prerequisites
- `03-test-ansible-bootstrap.md` must be completed on both nodes.
- Docker must be installed on `iklim-db-01`; the Bootstrap role does this.
- `vault_postgres_root_user`, `vault_postgres_password`, `vault_mongo_root_user`, and `vault_mongo_root_password` must be defined in the Ansible vault.
## 1. Firewall Update
Rules must be added to `terraform/hetzner/test/firewall.tf` so `iklim-db-01` can join Swarm and accept application traffic.
### Swarm Communication (App Subnet <-> DB Subnet)
For Swarm management, ports `2377/tcp`, `7946/tcp,udp`, and `4789/udp` must be open in both directions between the two subnets.
### DB Access (App Subnet -> DB Subnet)
- **PostgreSQL:** `5432/tcp`
- **MongoDB:** `27017/tcp`
After making the update:
```bash
cd terraform/hetzner/test
terraform apply
```
## 2. Vault Update
```bash
cd ansible/test
ansible-vault edit group_vars/all/vault.yml
```
Add these variables:
```yaml
vault_postgres_root_user: "postgres"
vault_postgres_password: "STRONG_PASSWORD"
vault_mongo_root_user: "mongoadmin"
vault_mongo_root_password: "STRONG_PASSWORD"
```
## 3. Installation with Ansible
```bash
cd ansible/test
ansible-playbook -i inventory/generated/test.yml test-db-post-stack.yml --ask-vault-pass
```
**What does the playbook do?**
On `iklim-db-01`, through the `db_stack` and `wireguard` roles:
- Creates the `/opt/iklimco/db/mongodb/config/` directory
- Places the `mongod.conf` file
- Installs and configures the WireGuard VPN server (`51820/udp`)
> Deploying DB services (PostgreSQL, MongoDB) to Swarm is the responsibility of the Gitea CI/CD workflow (`deploy-test.yml`), not Ansible. This workflow deploys all services at once through `docker-stack-infra.yml`.
## 4. Volume ve Veri Yapısı
## 4. Volume and Data Structure
DB verileri `iklim-db-01` üzerindeki Docker named volume'larında tutulur:
DB data is stored in Docker named volumes on `iklim-db-01`:
| Volume | İçerik |
| Volume | Content |
|---|---|
| `iklim-db_postgresql_data` | PostgreSQL veri dosyaları |
| `iklim-db_mongodb_data` | MongoDB veri dosyaları |
| `iklim-db_postgresql_data` | PostgreSQL data files |
| `iklim-db_mongodb_data` | MongoDB data files |
MongoDB logs are written to stdout and can be followed with `docker logs`. Configuration: `/opt/iklimco/db/mongodb/config/mongod.conf`
> StorageBox is **not used** for DB data. It is used only in the backup strategy.
## 5. Acceptance Criteria
- `iklim-db-01` appears as Ready and Active in the output of `docker node ls`.
- `docker stack services iklim-db` shows both services with 1/1 replicas.
- The application node can reach the databases through the `iklim-db_postgresql` and `iklim-db_mongodb` DNS names.
- Data in the named volumes survives a reboot; confirm the volumes with `docker volume ls`.
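The first two criteria can be checked mechanically. The sketch below assumes the Docker CLI output is piped in through `--format` templates; the function names `node_is_ready` and `all_services_one_of_one` are hypothetical helpers, not part of the repository.

```bash
# Reads lines of "<hostname> <status> <availability>" on stdin and succeeds
# only if the given node is listed as Ready and Active.
node_is_ready() {
  local node="$1"
  awk -v n="$node" '$1 == n && $2 == "Ready" && $3 == "Active" { ok = 1 }
                    END { exit ok ? 0 : 1 }'
}

# Reads lines of "<service> <running>/<desired>" on stdin and succeeds only
# if at least one service is listed and every one reports 1/1 replicas.
all_services_one_of_one() {
  awk '$2 != "1/1" { bad = 1 } END { exit (NR > 0 && !bad) ? 0 : 1 }'
}

# Intended use on a Swarm manager (not executed here):
#   docker node ls --format '{{.Hostname}} {{.Status}} {{.Availability}}' \
#     | node_is_ready iklim-db-01
#   docker stack services iklim-db --format '{{.Name}} {{.Replicas}}' \
#     | all_services_one_of_one
```

Keeping the parsing separate from the `docker` calls means the checks can be exercised with sample text before they are pointed at a live cluster.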

@ -1,31 +1,31 @@
# 05 - Test Runner and Deploy Prerequisites
This phase installs the Gitea Actions runner (`act_runner`) as a systemd service in the test environment and prepares the environment in which CI/CD pipelines run.
## Runner Placement
A single runner is used in the test environment for cost and simplicity:
| Host | Service Name | System User | Labels |
| --- | --- | --- | --- |
| `iklim-app-01` | `gitea-act-runner` | `gitea-runner` | `ubuntu-latest`, `ubuntu-22.04`, `ubuntu-20.04`, `test-runner` |
## 1. Runner User and Permissions
The runner must be able to run Docker commands on the host.
```bash
# Create the user
sudo useradd -m -s /bin/bash gitea-runner
# Add the user to the docker group
sudo usermod -aG docker gitea-runner
```
## 2. act_runner Installation
### Installation
Installation and registration are automated with Ansible (`test-app-post-stack.yml`). If a manual installation is required:
```bash
wget -O act_runner https://dl.gitea.com/act_runner/0.2.12/act_runner-0.2.12-linux-amd64
@ -33,9 +33,9 @@ sudo mv act_runner /usr/local/bin/
sudo chmod +x /usr/local/bin/act_runner
```
### Registration
Get the **Registration Token** from the Gitea UI (Organization -> Settings -> Actions -> Runners) and add it to the vault:
```yaml
# group_vars/all/vault.yml
@ -47,11 +47,11 @@ cd Environment_Infrastructure/ansible/test
ansible-playbook test-app-post-stack.yml --vault-password-file=.vault_pass
```
## 3. Systemd Service and Configuration
The service is managed by Ansible. The service file is `/etc/systemd/system/gitea-act-runner.service`, and the configuration is `/etc/gitea-act-runner/config.yaml`.
Critical parts of the configuration:
```yaml
runner:
@ -62,58 +62,58 @@ runner:
- "test-runner:docker://ubuntu:22.04"
container:
  network: "iklimco-net" # Access to DB services over the overlay network
  options: "-v /var/run/docker.sock:/var/run/docker.sock" # Needed to run Docker commands
```
Status check:
```bash
sudo systemctl status gitea-act-runner
sudo journalctl -u gitea-act-runner -f
```
## 4. Deploy Prerequisites
The following tools must be installed on `iklim-app-01` for the pipeline to deploy successfully:
- `docker-ce` and `docker-compose-plugin`
- `gettext` (for the `envsubst` command)
- `jq`
- `git`
## 5. Gitea Organization Secrets
The following secrets must be defined at the Gitea Organization level for the pipelines to run:
| Secret | Description |
| --- | --- |
| `STORAGEBOX_SSH_PRIV` | StorageBox SSH private key |
| `STORAGEBOX_SSH_PUB` | StorageBox SSH public key |
| `HARBOR_CI_TOKEN` | Token of the `robot-ci-push-iklimco` robot account (build + push) |
| `HARBOR_PULL_TOKEN` | Token of the `robot-swarm-pull-iklimco` robot account (Swarm deploy pull) |
| `REPO_ACCESS_TOKEN` | Access to private Gitea repos (checkout of BE-Commons and similar) |
## 6. Custom Image Build and Harbor Push
`docker-stack-infra.yml` and the microservice stacks use private images under `registry.tarla.io/iklimco/`. These images are built and pushed to the registry by the `ops/push-harbor-custom-images.sh` script.
The APISIX config files (`build/apisix-core/config.yaml`, `build/apisix-dashboard/conf.yaml`) are generated from the templates under `template/` with `envsubst`. `push-harbor-custom-images.sh` performs this generation itself; temporary files are removed automatically when the build finishes.
**Design note:** The APISIX admin key is not baked into the image. The template uses `${{APISIX_ADMIN_KEY}}` (double curly braces); APISIX reads the value from the Docker service environment variable when the container starts. This allows one image to be used for both test and prod.
### Steps
```bash
# 1. Log in to Harbor
docker login registry.tarla.io -u robot-ci-push-iklimco
# 2. Build and push the images; the script generates the env values and config files itself
bash ops/push-harbor-custom-images.sh
```
## Acceptance Criteria
1. The runner labeled `test-runner` appears as **Idle** (green) on the Gitea Runners page.
2. A workflow using `runs-on: test-runner` is triggered successfully.
3. The job container can access the Docker daemon and the `iklimco-net` overlay network.
4. The `8200/tcp` (Vault) port is closed to the public internet.
5. The `registry.tarla.io/iklimco/custom-apisix`, `custom-apisix-dashboard`, and `custom-prometheus` images exist in Harbor and can be pulled.

@ -1,23 +1,23 @@
# 06 - Prod Terraform IaC
The purpose of this phase is to create HA-focused IaaS resources in the prod Hetzner Cloud Project with Terraform. This document can be handed to the prod Terraform agent on its own.
## Scope
Terraform creates the following in the prod environment:
- Private network: `iklim-prod-net`
- Subnets:
  - App/Swarm subnet: `10.20.10.0/24`
  - DB subnet: `10.20.20.0/24`
- Firewall:
  - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
  - Private ingress: the prod rules in `01-private-network-port-matrisi.md`
- SSH key
- Placement groups:
  - `iklim-prod-app-spread`
  - `iklim-prod-db-spread`
- Floating IP: stable IPv4 for the app entry point, assigned to `iklim-app-01`
- Servers:
  - `iklim-app-01`
  - `iklim-app-02`
@ -27,16 +27,16 @@ Terraform prod ortaminda sunlari olusturur:
  - `iklim-db-03`
- Ansible inventory output
DB cluster software is not installed with Terraform. DB nodes are prepared only at the machine, network, and firewall level.
## Version Requirements
```text
Terraform >= 1.6
hcloud provider ~> 1.49
```
## Recommended File Structure
```text
terraform/
@ -55,13 +55,13 @@ terraform/
terraform.tfvars.example
```
`terraform.tfvars`, state files, and tokens must not be committed to the repo.
## Variables
The `environment` constant is defined in `locals.tf`; it is not overridden through `tfvars`.
Minimum variables:
```hcl
hcloud_token = "secret"
@ -73,29 +73,24 @@ admin_ssh_public_key_path = "~/.ssh/id_ed25519.pub"
admin_allowed_cidrs = ["X.X.X.X/32"]
```
The server types were chosen based on the current test environment metrics in `../hetzner-sizing-report.md` and the prod cluster topology. `cpx42` is recommended for prod app nodes because of Java microservice memory pressure, and the more economical `cpx32` is recommended for prod DB nodes because the cluster starts with 3 nodes. Once capacity needs are validated with metrics, nodes can be added or resized in place.
## Server Roles and Private IP Plan
| Server | Private IP | Role |
| --- | --- | --- |
| `iklim-app-01` | `10.20.10.11` | Swarm manager + app worker + runner (primary, holds the floating IP) |
| `iklim-app-02` | `10.20.10.12` | Swarm manager + app worker + runner |
| `iklim-app-03` | `10.20.10.13` | Swarm manager + app worker + runner |
| `iklim-db-01` | `10.20.20.11` | Manual DB cluster node |
| `iklim-db-02` | `10.20.20.12` | Manual DB cluster node |
| `iklim-db-03` | `10.20.20.13` | Manual DB cluster node |
Private IPs are statically defined in `locals.tf` as the `swarm_private_ips` and `db_private_ips` maps. The server list is derived from these maps with `for_each`.
## Recommended Resources and Cost
| Server | Role | Server Type | vCPU | RAM | SSD | Monthly (USD) |
| --- | --- | --- | ---: | ---: | ---: | ---: |
| `iklim-app-01` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
| `iklim-app-02` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
@ -103,41 +98,41 @@ Private IP'ler `locals.tf` icinde `swarm_private_ips` ve `db_private_ips` map'le
| `iklim-db-01` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| `iklim-db-02` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| `iklim-db-03` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| **Total** | 6 servers | | **36 vCPU** | **72 GB** | **1,440 GB** | **$139.44** |
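The monthly total can be re-derived from the per-node prices in the table (three `cpx42` nodes at $29.99 and three `cpx32` nodes at $16.49). A minimal sketch, with `monthly_total` as a hypothetical helper name:

```bash
# Reads "<count> <unit price>" pairs on stdin and prints the summed cost
# with two decimals.
monthly_total() {
  awk '{ sum += $1 * $2 } END { printf "%.2f\n", sum }'
}

# 3 x cpx42 at 29.99 plus 3 x cpx32 at 16.49
printf '3 29.99\n3 16.49\n' | monthly_total   # prints 139.44
```

Changing a count or price in the input shows how adding a node or resizing a server type would move the total.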
## Placement Group Decision
Prod uses two separate spread placement groups:
```text
iklim-prod-app-spread: iklim-app-01/02/03
iklim-prod-db-spread: iklim-db-01/02/03
```
This aims to place the Swarm quorum nodes on different physical hosts from each other, and likewise the DB nodes on different physical hosts from each other.
Notes:
- Hetzner does not offer direct cabinet selection.
- A spread placement group targets different physical hosts.
- Disaster recovery across different locations/regions is outside the scope of this phase.
- Multi-location DR should be designed separately once scale grows.
## Floating IP
An IPv4 floating IP named `iklim-prod-app-fip` is created and assigned to `iklim-app-01`. The DNS A record points to this IP. If failover is needed, the floating IP can be moved to another app node.
## Public Firewall
Public ingress:
| Port | Source | Target |
| --- | --- | --- |
| `22/tcp` | `admin_allowed_cidrs` | All prod nodes |
| `80/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-*` (through the floating IP) |
| `443/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-*` (through the floating IP) |
The following ports must not be opened publicly in prod:
- `8200/tcp` Vault
- `5432/tcp` PostgreSQL
@ -153,27 +148,27 @@ Prod'da su portlar public acilmayacak:
### App (swarm) Firewall — Private Ingress
Traffic originating from the app subnet (`10.20.10.0/24`):
| Port | Service | Access method |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | From app subnet |
| `7946/tcp,udp` | Docker Swarm node discovery | From app subnet |
| `4789/udp` | Docker Swarm VXLAN overlay | From app subnet |
| `8200/tcp` | Vault | Docker overlay / private network |
| `6379/tcp` | Redis | From app subnet |
| `5672/tcp` | RabbitMQ AMQP | From app subnet |
| `61613/tcp` | RabbitMQ STOMP | From app subnet |
| `15674/tcp` | RabbitMQ Web STOMP | From app subnet |
| `15672/tcp` | RabbitMQ Management | Behind SWAG on `443`, IP restricted |
| `9000/tcp` | APISIX Dashboard | Behind SWAG on `443`, IP restricted |
| `9180/tcp` | APISIX Admin API | Reached only by the Dashboard over the Docker overlay |
| `9090/tcp` | Prometheus | Behind SWAG on `443`, IP restricted |
| `3000/tcp` | Grafana | Behind SWAG on `443`, IP restricted |
Traffic originating from the DB subnet, because the `iklim-db-*` nodes join Swarm as workers:
| Port | Service | Source |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | `10.20.20.0/24` |
| `7946/tcp,udp` | Docker Swarm node discovery | `10.20.20.0/24` |
@ -181,150 +176,146 @@ DB subnet kaynakli (`iklim-db-*` node'lari Swarm'a worker olarak katildigi icin)
### DB Firewall — Private Ingress
Admin access:
| Port | Service | Source |
| --- | --- | --- |
| `22/tcp` | SSH | `admin_allowed_cidrs` |
Traffic originating from the app subnet (`10.20.10.0/24`):
| Port | Service | Note |
| --- | --- | --- |
| `5432/tcp` | PostgreSQL (Patroni primary) | App subnet access |
| `27017/tcp` | MongoDB replica set endpoint | App subnet access |
| `2379/tcp` | etcd client (Patroni + APISIX) | App subnet access |
| `2377/tcp` | Docker Swarm control plane | From app subnet |
| `7946/tcp,udp` | Docker Swarm node discovery | From app subnet |
| `4789/udp` | Docker Swarm VXLAN overlay | From app subnet |
Mutual access inside the DB subnet (`10.20.20.0/24`):
| Port | Service | Note |
| --- | --- | --- |
| `5432/tcp` | PostgreSQL Patroni replication | Between DB nodes |
| `27017/tcp` | MongoDB replica set internal | Between DB nodes |
| `2379/tcp` | etcd client | Patroni -> etcd access |
| `2380/tcp` | etcd peer | etcd cluster internal |
| `8008/tcp` | Patroni REST API | Patroni leader election and health check |
The IP restrictions noted in the app firewall table are enforced in the SWAG nginx configuration, not in the Hetzner firewall.
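The DB firewall tables can be read as a simple lookup from source subnet and port to allow or deny. The sketch below encodes only the rows listed in the two DB private-ingress tables; the function name `db_ingress_allowed` and the `app`/`db` labels are assumptions of this example, and the real source of truth remains the Terraform firewall definition.

```bash
# Usage: db_ingress_allowed <app|db> <port/proto>
# Exit status 0 means the tables above allow the traffic, 1 means they do not.
db_ingress_allowed() {
  local src="$1" rule="$2"
  case "$src:$rule" in
    # From the app subnet (10.20.10.0/24): database clients, etcd client, Swarm
    app:5432/tcp|app:27017/tcp|app:2379/tcp) return 0 ;;
    app:2377/tcp|app:7946/tcp|app:7946/udp|app:4789/udp) return 0 ;;
    # Inside the DB subnet (10.20.20.0/24): replication, etcd, Patroni REST API
    db:5432/tcp|db:27017/tcp|db:2379/tcp|db:2380/tcp|db:8008/tcp) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `db_ingress_allowed app 8008/tcp` fails while `db_ingress_allowed db 8008/tcp` succeeds, which matches the intent that the etcd peer and Patroni REST ports stay inside the DB subnet.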
## Outputs
The following values are available from `terraform output` after `terraform apply`:
| Output | Description |
| --- | --- |
| `ansible_inventory_yaml` | Ansible inventory YAML, written to `ansible/inventory/generated/prod.yml` |
| `prod_private_ips` | Private IP map of all nodes, with `swarm` and `db` subkeys |
| `prod_public_ips` | Public IPv4 map of all nodes |
| `prod_floating_ip` | Floating IP address for the Swarm entry point; the DNS A record points to this IP |
To extract the Ansible inventory:
```bash
terraform output -raw ansible_inventory_yaml > \
../../ansible/inventory/generated/prod.yml
```
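With a plain `>` redirect, the shell truncates the target before `terraform output` runs, so a failed command leaves an empty inventory behind. A more defensive variant is sketched below; the helper name `write_if_nonempty` is hypothetical.

```bash
# Write stdin to a temporary file first and replace the target only if the
# content is non-empty, so a failed command cannot truncate the inventory.
write_if_nonempty() {
  local target="$1" tmp
  tmp="$(mktemp "${target}.XXXXXX")" || return 1
  cat > "$tmp"
  if [ -s "$tmp" ]; then
    mv "$tmp" "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Intended use from the prod Terraform directory (not executed here):
#   terraform output -raw ansible_inventory_yaml |
#     write_if_nonempty ../../ansible/inventory/generated/prod.yml
```

The temporary file is created next to the target so that the final `mv` stays on the same filesystem.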
## Lifecycle and Resize Policy
### `server_type` Change (Resize)
Changing `server_type` does **not** trigger a Terraform destroy+create. The `hcloud` provider supports this natively: it stops the server, calls the Hetzner Resize API, and starts the server again. Update the value in `terraform.tfvars` and run `terraform apply`.
The resize causes downtime because the server stops and starts, but the disk, installed software, and Docker volumes are preserved. No `ignore_changes` setting or manual step is required.
### Which Changes Force Server Recreation?
| Changed field | Behavior | Note |
| --- | --- | --- |
| `server_type` | In-place resize (provider native) | `terraform apply` is enough |
| `hcloud_server_network` | Only the attachment is updated | Because a separate resource is used |
| `hcloud_firewall_attachment` | Only the attachment is updated | Because a separate resource is used |
| `placement_group_id` | Hetzner API does not allow changing it -> destroy+create | Do not change |
| `image` | Disk image changes -> destroy+create | Do not change |
| `location` | Cannot be moved to another datacenter -> destroy+create | Do not change |
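Before applying, the plan text can be scanned for the resource header lines Terraform prints when it intends to replace or destroy something. A minimal sketch, with `plan_destructive_lines` as a hypothetical helper name:

```bash
# Reads human-readable `terraform plan` output on stdin and prints every
# resource header line that announces a replacement or a destroy.
# Exit status is 0 when such a line exists, 1 when none is found.
plan_destructive_lines() {
  grep -E '^[[:space:]]*# .* (must be replaced|will be destroyed)'
}

# Intended use (not executed here):
#   terraform plan -no-color | plan_destructive_lines
```

An empty result with exit status 1 suggests the plan contains only in-place changes such as a resize; any printed line deserves a closer look before `terraform apply`.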
### Network and Firewall Attachment Separation
The `network` block and `firewall_ids` are not embedded inside `hcloud_server`. Instead, separate resources are defined:
- `hcloud_server_network`: private IP assignment, one per node through `for_each`
- `hcloud_firewall_attachment`: firewall association, using the server list derived with `for_each`
### `prevent_destroy` Protection
Each server gets `lifecycle { prevent_destroy = true }`. To delete a server intentionally, temporarily remove the lifecycle block first.
## How to Run
### Preparation
**1. Create tfvars (once):**
```bash
cd Environment_Infrastructure/terraform/hetzner/prod
cp terraform.tfvars.example terraform.tfvars
# Fill terraform.tfvars with real values
# (hcloud_token, admin_allowed_cidrs, etc.)
```
`terraform.tfvars` is not committed; `.gitignore` excludes it.
**2. Install the provider (once):**
```bash
terraform init
```
### First Apply
```bash
# Show what will be created; this makes no changes
terraform plan
# Review, approve, and create
terraform apply
```
After `apply`, 6 servers, 2 firewalls, 1 floating IP, and the network resources are visible in Hetzner.
### Get the Ansible Inventory
```bash
terraform output -raw ansible_inventory_yaml > \
../../ansible/inventory/generated/prod.yml
```
### Gitea Variable: `PROD_FLOATING_IP`
The deploy pipeline needs this variable to manage DNS records automatically. It is set once after `terraform apply`:
```bash
terraform output prod_floating_ip
```
Add the resulting IP address in Gitea -> project settings -> **Variables** under the name `PROD_FLOATING_IP`. The pipeline reads it as `vars.PROD_FLOATING_IP` and updates the GoDaddy A records idempotently.
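Because `terraform output <name>` prints string values wrapped in double quotes unless `-raw` is passed, it can be worth validating the value before it is stored as a Gitea variable. A minimal sketch, with `is_ipv4` as a hypothetical helper name:

```bash
# Succeeds only for a dotted-quad IPv4 address: four numeric parts of at most
# three digits each, every part in the range 0..255.
is_ipv4() {
  printf '%s\n' "$1" | awk -F'[.]' '
    NF != 4 { exit 1 }
    {
      for (i = 1; i <= 4; i++)
        if ($i !~ /^[0-9]+$/ || length($i) > 3 || $i > 255) exit 1
    }'
}

# Intended use (not executed here):
#   is_ipv4 "$(terraform output -raw prod_floating_ip)" && echo "looks valid"
```

A value that still carries the surrounding quotes fails the check, which catches the most likely copy mistake.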
### Resize (Change Server Type)
Change the `server_type_swarm` or `server_type_db` value in `terraform.tfvars`, then apply:
```bash
terraform apply
```
The server is stopped, the Hetzner Resize API is called, and the server is started again. The disk and Docker volumes are preserved. Expect downtime.
### Server Deletion (Forced)
Because `prevent_destroy = true` is set, a normal `terraform destroy` fails. First, temporarily remove the `lifecycle` block in `servers.tf`:
```hcl
# lifecycle {
@ -332,26 +323,26 @@ Sunucu durdurulur, Hetzner Resize API cagirilir, yeniden baslatilir. Disk ve Doc
# }
```
Then:
```bash
# Single quotes keep the shell from stripping the inner double quotes
terraform destroy -target='hcloud_server.swarm["iklim-app-01"]'
```
After completing the operation, add the lifecycle block back.
### State Management
Local state (`terraform.tfstate`) is used for now. The state file is not committed to the repo. If more than one person works on the infrastructure, remote state in Hetzner Object Storage or HCP Terraform should be used.
## Acceptance Criteria
- `terraform plan` runs only with the prod Hetzner Project token.
- 6 servers are created: `iklim-app-01/02/03` and `iklim-db-01/02/03`.
- Swarm nodes are in the `iklim-prod-app-spread` placement group.
- DB nodes are in the `iklim-prod-db-spread` placement group.
- The public firewall allows ingress only on `22`, `80`, and `443`.
- The private firewall matches `01-private-network-port-matrisi.md`.
- DB replication ports are reachable only from the DB subnet.
- The floating IP is created and assigned to `iklim-app-01`.
- Terraform state and secret tfvars are not committed.

@ -1,12 +1,12 @@
# 07 - Prod Ansible Bootstrap
The purpose of this phase is to prepare the prod machines created by Terraform in terms of Linux setup, security hardening, Docker, and Swarm. DB cluster software is not installed by this playbook; however, DB nodes join Swarm as workers.
## Ansible Installation
Ansible must be installed on the control machine (your own computer). No agent is installed on the target servers; SSH access is enough.
### Installation by Operating System
- **Ubuntu / Debian:**
```bash
@ -17,6 +17,7 @@ Ansible, kontrol makinesinde (kendi bilgisayarınızda) yüklü olmalıdır. Hed
export PATH="$HOME/.local/bin:$PATH"
pipx install --include-deps ansible
pipx install ansible-lint
```
- **Fedora / Rocky Linux / RHEL:**
@ -27,6 +28,7 @@ Ansible, kontrol makinesinde (kendi bilgisayarınızda) yüklü olmalıdır. Hed
export PATH="$HOME/.local/bin:$PATH"
pipx install --include-deps ansible
pipx install ansible-lint
```
- **macOS (Homebrew):**
@ -34,75 +36,76 @@ Ansible, kontrol makinesinde (kendi bilgisayarınızda) yüklü olmalıdır. Hed
brew install ansible
```
- **With pipx, on any platform:**
```bash
pipx install --include-deps ansible
pipx install ansible-lint
```
### Additional Python Dependencies
`passlib` is required on the control machine for the `password_hash` filter:
```bash
pipx inject ansible passlib
```
> If you installed with `pip`: `pip install passlib`
### Verify the Installation
Whichever installation method you used, run the following commands to verify that the installation succeeded:
```bash
# Ansible versiyonunu ve yapılandırma yollarını kontrol edin
# Check the Ansible version and configuration paths
ansible --version
# Ansible binarysinin hangi konumdan çalıştığını kontrol edin
# Check which location the Ansible binary is running from
which -a ansible
```
## Ansible Komutlarını Çalıştırma
## Running Ansible Commands
Tüm komutlar `ansible/prod/` dizininden çalıştırılmalıdır. `ansible.cfg` inventory ve roles_path'i otomatik olarak tanımlar.
All commands must be run from the `ansible/prod/` directory. `ansible.cfg` automatically defines the inventory and `roles_path`.
### 0. Gerekli Collection'ları Kur (İlk kurulumda bir kez)
### 0. Install Required Collections Once During Initial Setup
```bash
ansible-galaxy collection install -r ../requirements.yml
```
### 1. Bağlantı Testi (Ping)
### 1. Connection Test (Ping)
```bash
ansible all -m ping
```
### 2. Bootstrap Playbook'unu Çalıştırma
### 2. Run the Bootstrap Playbook
```bash
ansible-playbook prod-bootstrap.yml --ask-vault-pass
```
*Not: `--ask-vault-pass` parametresi Ansible Vault parolasını sorar; StorageBox şifresi bu şekilde çözülür.*
*Note: The `--ask-vault-pass` parameter asks for the Ansible Vault password; the StorageBox password is decrypted this way.*
### 3. Sadece Belirli Bir Rolü Çalıştırma (Tags)
### 3. Run Only a Specific Role (Tags)
```bash
ansible-playbook prod-bootstrap.yml --tags "hardening" --ask-vault-pass
```
## Hedef Makineler
## Target Machines
| Host | Rol |
| Host | Role |
| --- | --- |
| `iklim-app-01` | Swarm manager + app worker |
| `iklim-app-02` | Swarm manager + app worker |
| `iklim-app-03` | Swarm manager + app worker |
| `iklim-db-01` | Manuel DB cluster node |
| `iklim-db-02` | Manuel DB cluster node |
| `iklim-db-03` | Manuel DB cluster node |
| `iklim-db-01` | Manual DB cluster node |
| `iklim-db-02` | Manual DB cluster node |
| `iklim-db-03` | Manual DB cluster node |
## Önerilen Dosya Yapısı
## Recommended File Structure
```text
ansible/
@ -130,11 +133,11 @@ ansible/
## Base Role
Tüm prod node'larına uygulanır:
Applied to all prod nodes:
- Paket cache update
- `epel-release`ayrı task olarak önce kurulur; `fail2ban`, `davfs2`, `htop`, `btop` bu repoya bağımlı
- temel paketler (`epel-release` aktif olduktan sonra):
- Package cache update
- `epel-release`: installed first as a separate task; `fail2ban`, `davfs2`, `htop`, and `btop` depend on this repo
- base packages, after `epel-release` is active:
- `curl`
- `wget`
- `git`
@ -142,45 +145,45 @@ Tüm prod node'larına uygulanır:
- `tar`
- `unzip`
- `bash-completion`
- `gettext`envsubst için; CI/CD deploy pipeline'larında gerekli
- `gettext`: required for `envsubst` in CI/CD deploy pipelines
- `tree`
- `ca-certificates`
- `fail2ban`
- `chrony`
- `python3`
- `python3-pip`
- `python3-passlib``password_hash` filtresi için (EPEL)
- `htop` — interaktif proses izleme (EPEL)
- `btop`kaynak monitörü, grafik arayüz (EPEL)
- `python3-passlib`: for the `password_hash` filter (EPEL)
- `htop`: interactive process monitoring (EPEL)
- `btop`: resource monitor with a graphical interface (EPEL)
- timezone: `Europe/Istanbul`
- hostname ayarı
- klavye düzeni: `trq` (Türkçe Q)
- chrony/NTP aktif
- hostname setup
- keyboard layout: `trq` (Turkish Q)
- chrony/NTP active
## Security Hardening Role
Tüm prod node'larına uygulanır:
Applied to all prod nodes:
- SSH password auth kapatılır.
- Root SSH login kapatılır.
- Sadece SSH key auth kalır.
- SSH password auth is disabled.
- Root SSH login is disabled.
- Only SSH key auth remains.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- `fail2ban` aktif edilir.
- `dnf-automatic` ile otomatik güvenlik güncelleştirmeleri aktif edilir.
- `iklim` sistem kullanıcısı oluşturulur; `wheel` grubuna eklenir (şifre vault'tan alınır).
- `fail2ban` is enabled.
- Automatic security updates are enabled with `dnf-automatic`.
- The `iklim` system user is created and added to the `wheel` group; the password is read from vault.
- `firewalld` default: incoming deny (drop zone), outgoing allow.
- SSH kuralı önce `drop` zone'a rich rule olarak yazılır, ardından default zone `drop` yapılır.
- SSH sadece admin CIDR'dan açılır.
- DB portları public açılmaz.
- The SSH rule is first written as a rich rule to the `drop` zone, then the default zone is set to `drop`.
- SSH is opened only from the admin CIDR.
- DB ports are not opened publicly.
Hetzner Cloud Firewall asıl perimeter kabul edilir. firewalld host üzerinde ikinci savunma katmanıdır.
The Hetzner Cloud Firewall is considered the actual perimeter. firewalld is the second defense layer on the host.
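The SSH items in the list above can be spot-checked against a config file. This is a minimal sketch, not part of the role: the function name is hypothetical, and mapping "password auth disabled" and "root login disabled" to `PasswordAuthentication no` and `PermitRootLogin no` is an assumption about how the role writes them.

```bash
# Sketch: report which expected hardening directives are present in an
# sshd config file. Directive values mirror the list above (partly assumed).
check_sshd_hardening() {
  local config_file=$1 expected key value status=0
  for expected in \
    "PasswordAuthentication no" \
    "PermitRootLogin no" \
    "PermitEmptyPasswords no" \
    "MaxAuthTries 3"
  do
    key=${expected% *}     # text before the last space
    value=${expected#* }   # text after the first space
    # Match the directive at line start, case-insensitively.
    if grep -Eiq "^[[:space:]]*${key}[[:space:]]+${value}[[:space:]]*$" "$config_file"; then
      echo "ok: $expected"
    else
      echo "missing: $expected"
      status=1
    fi
  done
  return "$status"
}
```

A possible call on a node would be `check_sshd_hardening /etc/ssh/sshd_config`. Settings placed in drop-in files are not covered by this sketch.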
## Docker Role
Tüm prod node'larında (hem app hem db) zorunludur. DB node'ları Swarm Worker olarak ağa dahil olacağı için Docker Engine her makinede kurulu olmalıdır.
Required on all prod nodes, both app and db. Because DB nodes join the network as Swarm Workers, Docker Engine must be installed on every machine.
Kurulacak paketler:
Packages to install:
- `docker-ce`
- `docker-ce-cli`
@ -188,33 +191,36 @@ Kurulacak paketler:
- `docker-buildx-plugin`
- `docker-compose-plugin`
Kurulum resmi Docker dnf repository üzerinden yapılacak (`https://download.docker.com/linux/rhel/docker-ce.repo`).
Installation will be done through the official Docker dnf repository (`https://download.docker.com/linux/rhel/docker-ce.repo`).
## Swarm Role
Prod Swarm 3 manager ile kurulacak:
Prod Swarm will be set up with 3 managers:
1. `iklim-app-01` üzerinde `docker swarm init` (Advertise/data path addr: `10.20.10.11`)
2. `iklim-app-02` ve `iklim-app-03` manager olarak join olur.
3. `iklim-db-01/02/03` worker olarak join olur.
4. Overlay network oluşturulur: `iklimco-net`
5. Node etiketleri:
1. `docker swarm init` on `iklim-app-01` (Advertise/data path addr: `10.20.10.11`)
2. `iklim-app-02` and `iklim-app-03` join as managers.
3. `iklim-db-01/02/03` join as workers.
4. Overlay network is created: `iklimco-net`
5. Node labels:
- `iklim-app-*` -> `type=service`
- `iklim-db-*` -> `role=db`, `db-index=01/02/03` (Patroni node koordinasyonu için)
6. Tüm node'lar `AVAILABILITY=Active` kalır.
- `iklim-db-*` -> `role=db`, `db-index=01/02/03`, for Patroni node coordination
6. All nodes remain `AVAILABILITY=Active`.
`db-index` etiketleri prod-bootstrap.yml içinde ayrı bir play ile `iklim-app-01` üzerinden eklenir (swarm role tarafından değil).
The `db-index` labels are added through `iklim-app-01` in a separate play inside `prod-bootstrap.yml`, not by the swarm role.
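The choice of 3 managers follows from Raft majority arithmetic: a cluster of N managers needs a majority of N/2 + 1 (integer division) and tolerates (N - 1)/2 failures. The helper names below are hypothetical; the sketch only shows the arithmetic.

```bash
# Majority (quorum) size required for N Swarm managers.
swarm_quorum() {
  local managers=$1
  echo $(( managers / 2 + 1 ))
}

# Number of manager failures the cluster survives while keeping quorum.
swarm_fault_tolerance() {
  local managers=$1
  echo $(( (managers - 1) / 2 ))
}
```

For 3 managers this gives a quorum of 2 and a tolerance of 1. A fourth manager would raise the quorum to 3 without raising the tolerance, which is why odd counts are preferred.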
## Node Directory Role
Tüm `iklim-app-*` node'larında:
On all `iklim-app-*` nodes:
```text
/opt/iklimco/ssl
/opt/iklimco/init
/opt/iklimco/stacks
/opt/iklimco/vault/data
```
DB node'larında:
`/opt/iklimco/vault/data` is the host path volume of the Vault Raft node; it must be created separately on every app node. Swarm does not manage this directory as an overlay volume; if it is missing, the Vault container will not start.
On DB nodes:
```text
/opt/iklimco/db
/opt/iklimco/backup
@ -222,23 +228,23 @@ DB node'larında:
## StorageBox DAVFS Mount Role
Her node'a uygulanır (tüm `iklim-app-*` ve `iklim-db-*`).
Applied to every node, all `iklim-app-*` and `iklim-db-*`.
### Prod Sub-Account
| Parametre | Değişken | Değer |
| Parameter | Variable | Value |
| --- | --- | --- |
| Ana hesap | `storagebox_account` | `u469968` |
| Main account | `storagebox_account` | `u469968` |
| Sub-account | `storagebox_user` | `u469968-sub5` |
| WebDAV URL | `storagebox_url` | `https://u469968-sub5.your-storagebox.de/` |
| Mount point | `storagebox_mount_point` | `/mnt/storagebox` |
## StorageBox SSH Key Role
Her node'a uygulanır. Sunucu üzerinde `/root/.ssh/id_ed25519_storagebox` ed25519 anahtar çifti üretilir. Üretilen public key'in StorageBox ana hesabına yüklenmesi (SSH authorized_keys) ayrı bir manuel adımdır:
Applied to every node. The `/root/.ssh/id_ed25519_storagebox` ed25519 key pair is generated on the server. Uploading the generated public key to the StorageBox main account (SSH authorized_keys) is a separate manual step:
```bash
# Her node için:
# For each node:
cat /root/.ssh/id_ed25519_storagebox.pub | \
ssh -p 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de \
"cat >> .ssh/authorized_keys"
@ -246,15 +252,15 @@ cat /root/.ssh/id_ed25519_storagebox.pub | \
## Act Runner Role
`iklim-app-*` node'larına uygulanır. Her app node'a Gitea Act Runner kurulur ve systemd servisi olarak başlatılır. Prod ortamında 3 app node üzerinde runner çalışır; deploy pipeline bu runner'lardan herhangi birinde tetiklenebilir.
Applied to `iklim-app-*` nodes. Gitea Act Runner is installed on each app node and started as a systemd service. In prod, the runner runs on 3 app nodes; the deploy pipeline can be triggered on any of these runners.
## DB Stack Role
`iklim-db-*` node'larına uygulanır. MongoDB için `/opt/iklimco/db/mongodb/config/` dizinini ve `mongod.conf` dosyasını oluşturur. `group_vars/prod.yml` içinde tanımlı `mongodb_replset_name: "rs0"` değişkeniyle `mongod.conf` replicaSet ve keyFile bloklarını otomatik içerir.
Applied to `iklim-db-*` nodes. On each DB node, it creates `/opt/iklimco/db` and `/opt/iklimco/backup` directories, as well as a local reference directory for MongoDB. The actual production configuration, including node-specific `mongod.conf`, replica set auth key, Patroni, and etcd configurations, is set up on StorageBox at `/mnt/storagebox/prod/db/mongodb-0X/config/`, `/mnt/storagebox/prod/db/postgresql-0X/config/`, and `/mnt/storagebox/prod/db/etcd-0X/data/` in the `08-prod-db-cluster-kurulum.md` step.
## /opt/iklimco/stacks/.env
DB cluster stack'lerinin gerektirdiği şifre değişkenleri `/opt/iklimco/stacks/.env` dosyasında saklanır. Bu dosya StorageBox'ta `prod/secrets/iklim.co/.env.stacks` olarak tutulur. İlk deploy öncesinde `iklim-app-01` üzerinde aşağıdaki komutla çekilir:
Password variables required by the DB cluster stacks are stored in the `/opt/iklimco/stacks/.env` file. This file is stored on StorageBox as `prod/secrets/iklim.co/.env.stacks`. Before the first deploy, it is fetched on `iklim-app-01` with the following command:
```bash
scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.stacks \
@ -262,36 +268,58 @@ scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.
chmod 600 /opt/iklimco/stacks/.env
```
## Swarm Kurulum Doğrulaması
## StorageBox Directory Structure
Bootstrap sonrası aşağıdaki komutlarla Swarm durumu kontrol edilir:
After Ansible bootstrap is completed and before the infra stack is deployed, create the following directories on `iklim-app-01`; StorageBox must be mounted:
```bash
# 6 node: 3 manager (Leader/Reachable), 3 worker (Ready)
# SWAG certificate and configuration directories
mkdir -p /mnt/storagebox/ssl
mkdir -p /mnt/storagebox/swag/config
mkdir -p /mnt/storagebox/swag/site-confs
# Monitoring data directories; Grafana on StorageBox, Prometheus on local volume
mkdir -p /mnt/storagebox/grafana/data
mkdir -p /mnt/storagebox/prometheus/data
# Image directory for the precipitation service
mkdir -p /mnt/storagebox/precipitation/images
```
These directories match the `SWAG_CERT_DIR`, `SWAG_CONFIG_DIR`, `SWAG_SITE_CONFS_DIR`, `GRAFANA_DATA_DIR`, and `PROMETHEUS_DATA_DIR` variables in `env-prod/.env`. Because StorageBox is mounted at the same `/mnt/storagebox` path on all app nodes, the directories need to be created only once and are then visible from every node.
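Before deploying the infra stack, it can help to confirm that none of the directories is missing. A minimal sketch, assuming the mount point `/mnt/storagebox` as the default base; the function name is hypothetical.

```bash
# Print every expected StorageBox directory that does not exist under the base path.
missing_storagebox_dirs() {
  local base=${1:-/mnt/storagebox}
  local dir
  for dir in ssl swag/config swag/site-confs grafana/data prometheus/data precipitation/images; do
    [ -d "$base/$dir" ] || echo "$base/$dir"
  done
}
```

An empty output from `missing_storagebox_dirs` means all six directories are in place.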
## Swarm Setup Verification
After bootstrap, check the Swarm status with the following commands:
```bash
# 6 nodes: 3 managers (Leader/Reachable), 3 workers (Ready)
docker node ls
# App node etiketi
# App node label
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
# Beklenen: map[type:service]
# Expected: map[type:service]
# DB node etiketi
# DB node label
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
# Beklenen: map[db-index:01 role:db]
# Expected: map[db-index:01 role:db]
# swarm-init.sh idempotency — zaten aktif Swarm'da tekrar init denemez
# swarm-init.sh idempotency: the script must not attempt init again on an already active Swarm
grep -n "swarm init\|swarm join" init/swarm-init.sh
```
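The "3 managers, 3 workers" expectation can also be checked mechanically. The sketch below reads `hostname managerstatus` pairs from stdin, so it can be fed from `docker node ls --format '{{.Hostname}} {{.ManagerStatus}}'`; the function name is hypothetical.

```bash
# Count healthy managers and workers from "hostname managerstatus" lines on stdin.
# Workers have an empty ManagerStatus, so their lines contain one field only.
summarize_swarm_nodes() {
  awk '
    $2 == "Leader" || $2 == "Reachable" { managers++ }
    NF == 1                             { workers++ }
    END { printf "managers=%d workers=%d\n", managers, workers }
  '
}
```

On a healthy prod Swarm, `docker node ls --format '{{.Hostname}} {{.ManagerStatus}}' | summarize_swarm_nodes` should print `managers=3 workers=3`.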
## Kabul Kriterleri
## Acceptance Criteria
- `ansible all -m ping` başarılı olur.
- 3 Swarm manager node `docker node ls` içinde Leader/Reachable görünür.
- 3 DB node `docker node ls` içinde Worker olarak görünür.
- Manager quorum sağlanır (3 manager, 1 kayıp tolere edilir).
- `iklimco-net` overlay network vardır.
- Node etiketleri (`type=service`, `role=db`, `db-index=01/02/03`) inspect ile doğrulanır.
- `swarm-init.sh` aktif Swarm'da tekrar init denemez (idempotent).
- Her node'da `/mnt/storagebox` mount edilmiştir.
- Her app node'da Gitea Act Runner servisi çalışmaktadır.
- DB node'larında `/opt/iklimco/db/mongodb/config/mongod.conf` oluşturulmuştur ve `replSetName: rs0` içermektedir.
- Public firewall sadece `22`, `80`, `443` ingress'e izin verir.
- `ansible all -m ping` succeeds.
- 3 Swarm manager nodes appear as Leader/Reachable in `docker node ls`.
- 3 DB nodes appear as Workers in `docker node ls`.
- Manager quorum holds: with 3 managers, the loss of 1 is tolerated.
- The `iklimco-net` overlay network exists.
- Node labels (`type=service`, `role=db`, `db-index=01/02/03`) are verified with inspect.
- `swarm-init.sh` does not attempt init again in an active Swarm; it is idempotent.
- `/mnt/storagebox` is mounted on every node.
- The `/opt/iklimco/vault/data` directory exists on every app node.
- The `ssl`, `swag/config`, `swag/site-confs`, `grafana/data`, `prometheus/data`, and `precipitation/images` directories exist on StorageBox.
- The Gitea Act Runner service is running on every app node.
- `/opt/iklimco/db` and `/opt/iklimco/backup` directories exist on DB nodes. Node-specific `mongod.conf` and other DB configurations are created on StorageBox (`/mnt/storagebox/prod/db/...`) in the `08-prod-db-cluster-kurulum.md` step.
- Public firewall allows only `22`, `80`, and `443` ingress.

@ -1,10 +1,10 @@
# 08 - Prod DB Cluster Kurulumu (Swarm)
# 08 - Prod DB Cluster Setup (Swarm)
Bu aşamanın amacı üç DB node'unu Docker Swarm'a worker olarak eklemek, MongoDB replica set ve Patroni + etcd ile yönetilen PostgreSQL yüksek erişilebilirlik konfigürasyonunu yapmaktır.
The purpose of this phase is to add the three DB nodes to Docker Swarm as workers and configure the MongoDB replica set and the PostgreSQL high-availability setup managed with Patroni + etcd.
`07-prod-ansible-bootstrap.md` tüm DB node'larında tamamlanmış olmalıdır.
`07-prod-ansible-bootstrap.md` must be completed on all DB nodes.
## Mimari
## Architecture
```
iklim-app-01/02/03 (Swarm managers, 10.20.10.11/12/13)
@ -14,7 +14,7 @@ iklim-app-01/02/03 (Swarm manager'lar, 10.20.10.11/12/13)
iklim-db-01 (Swarm worker, 10.20.20.11)
mongodb-01 [rs0 member 0 — preferred primary]
etcd-01 [etcd cluster member]
patroni-01 [Patroni + PostgreSQL — ilk primary adayı]
patroni-01 [Patroni + PostgreSQL — first primary candidate]
iklim-db-02 (Swarm worker, 10.20.20.12)
mongodb-02 [rs0 member 1]
@ -27,13 +27,13 @@ iklim-db-03 (Swarm worker, 10.20.20.13)
patroni-03 [Patroni + PostgreSQL — standby]
```
DB container'ları birbirlerini overlay DNS adıyla değil, **Hetzner private IP üzerinden** tanıyor. Bu nedenle her servis portunu `host` modda yayımlar; replikasyon ve etcd trafiği doğrudan private network üzerinden gecer. Hetzner Cloud firewall ve prod `db` firewall zaten bu portlara izin vermektedir.
DB containers discover each other through **Hetzner private IPs**, not overlay DNS names. Therefore, each service publishes its port in `host` mode; replication and etcd traffic goes directly through the private network. The Hetzner Cloud firewall and the prod `db` firewall already allow these ports.
## 1. Firewall Güncellemesi
## 1. Firewall Update
`terraform/hetzner/prod/firewall.tf` dosyasında aşağıdaki kuralların mevcut olduğunu doğrula; eksik varsa ekle ve `terraform apply` çalıştır.
Verify that the following rules exist in `terraform/hetzner/prod/firewall.tf`; if any are missing, add them and run `terraform apply`.
`hcloud_firewall.swarm` içinde (DB subnet'ten Swarm portlarına):
Inside `hcloud_firewall.swarm`, from the DB subnet to Swarm ports:
```hcl
rule {
@ -69,7 +69,7 @@ rule {
}
```
`hcloud_firewall.db` içinde (app subnet'ten Swarm portlarına + overlay; DB subnet içi etcd/Patroni trafiği):
Inside `hcloud_firewall.db`, from the app subnet to Swarm ports + overlay, and etcd/Patroni traffic inside the DB subnet:
```hcl
rule {
@ -112,6 +112,14 @@ rule {
description = "etcd client port within DB subnet"
}
rule {
direction = "in"
protocol = "tcp"
port = "2379"
source_ips = [local.app_subnet_cidr]
description = "etcd client port from app subnet (APISIX connects to Patroni etcd)"
}
rule {
direction = "in"
protocol = "tcp"
@ -135,7 +143,7 @@ terraform plan
terraform apply
```
## 2. DB Node'larını Swarm'a Ekleme
## 2. Add DB Nodes to Swarm
**On one of the Swarm managers** (iklim-app-01), get the join token:
@ -149,7 +157,7 @@ docker swarm join-token worker
docker swarm join --token <TOKEN> 10.20.10.11:2377
```
**iklim-app-01 üzerinde** node'ları etiketle:
Label the nodes **on iklim-app-01**:
```bash
docker node update --label-add role=db --label-add db-index=01 iklim-db-01
@ -159,22 +167,22 @@ docker node update --label-add role=db --label-add db-index=03 iklim-db-03
docker node ls
```
## 3. StorageBox Dizin Yapısı
## 3. StorageBox Directory Structure
Her DB node'unda (`/mnt/storagebox` zaten mount edilmiş olmalı):
On each DB node, where `/mnt/storagebox` must already be mounted:
```bash
# iklim-db-01 üzerinde:
# On iklim-db-01:
mkdir -p /mnt/storagebox/prod/db/mongodb-01/{data,log,config}
mkdir -p /mnt/storagebox/prod/db/postgresql-01/{data,config}
mkdir -p /mnt/storagebox/prod/db/etcd-01/data
# iklim-db-02 üzerinde:
# On iklim-db-02:
mkdir -p /mnt/storagebox/prod/db/mongodb-02/{data,log,config}
mkdir -p /mnt/storagebox/prod/db/postgresql-02/{data,config}
mkdir -p /mnt/storagebox/prod/db/etcd-02/data
# iklim-db-03 üzerinde:
# On iklim-db-03:
mkdir -p /mnt/storagebox/prod/db/mongodb-03/{data,log,config}
mkdir -p /mnt/storagebox/prod/db/postgresql-03/{data,config}
mkdir -p /mnt/storagebox/prod/db/etcd-03/data
@ -209,14 +217,14 @@ security:
### Replica Set Auth Key
Tüm DB node'larında **aynı** key dosyası olmalıdır:
The **same** key file must exist on all DB nodes:
```bash
# iklim-db-01 üzerinde oluştur:
# Create on iklim-db-01:
openssl rand -base64 756 > /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key
chmod 400 /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key
# Aynı içeriği diğer node'lara kopyala:
# Copy the same content to the other nodes:
cat /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key \
> /mnt/storagebox/prod/db/mongodb-02/config/rs-auth.key
cat /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key \
@ -225,7 +233,7 @@ cat /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key \
chmod 400 /mnt/storagebox/prod/db/mongodb-0{2,3}/config/rs-auth.key
```
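Since MongoDB members must share the same key, it is worth confirming that the three copies are byte-identical after the copy step. A minimal sketch with a hypothetical helper name:

```bash
# Compare the first key file against every other file given.
# Prints "identical" on success, or the first differing path and returns 1.
keyfiles_identical() {
  local first=$1 other
  shift
  for other in "$@"; do
    cmp -s "$first" "$other" || { echo "differs: $other"; return 1; }
  done
  echo "identical"
}
```

A possible call: `keyfiles_identical /mnt/storagebox/prod/db/mongodb-0{1,2,3}/config/rs-auth.key`.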
### Stack Dosyası — MongoDB
### Stack File — MongoDB
`/opt/iklimco/stacks/prod-db-mongo.yml`:
@ -313,16 +321,16 @@ services:
condition: on-failure
```
### Replica Set Başlangıç
### Replica Set Initialization
Stack deploy edildikten sonra **bir kez** çalıştırılır:
Run **once** after the stack is deployed:
```bash
# iklim-db-01 üzerinde:
# On iklim-db-01:
docker exec -it $(docker ps -q -f name=iklim-db_mongodb-01) mongosh \
-u mongo-root -p "${MONGO_ROOT_PASSWORD}" --authenticationDatabase admin
# mongosh içinde:
# Inside mongosh:
rs.initiate({
_id: "rs0",
members: [
@ -332,19 +340,19 @@ rs.initiate({
]
})
# Durum kontrol:
# Status check:
rs.status()
```
`"stateStr": "PRIMARY"` ve iki `"SECONDARY"` görülünce replica set hazırdır.
The replica set is ready when `"stateStr": "PRIMARY"` and two `"SECONDARY"` entries are visible.
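The readiness rule (one `PRIMARY`, two `SECONDARY`) can be scripted. The sketch assumes member lines of the form `name STATE` on stdin, for example produced inside the container with `mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name + " " + m.stateStr))'`; that invocation and the function name are assumptions, not taken from the repository.

```bash
# Read "member STATE" lines on stdin; succeed only with 1 PRIMARY and 2 SECONDARY.
replica_set_ready() {
  awk '
    $2 == "PRIMARY"   { primary++ }
    $2 == "SECONDARY" { secondary++ }
    END {
      if (primary == 1 && secondary == 2) { print "ready"; exit 0 }
      print "not ready"; exit 1
    }
  '
}
```

The exit status makes the function usable as a condition in a wait loop during deploy.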
## 5. PostgreSQL — Patroni + etcd
Patroni, PostgreSQL primary/standby rollerini etcd üzerinden koordine eder. Primary düşerse diğer node'lardan biri otomatik olarak seçim kazanır ve primary olur. Swarm servisi container'ı yeniden başlatır; Patroni kaldığı yerden devam eder.
Patroni coordinates PostgreSQL primary/standby roles through etcd. If the primary goes down, one of the other nodes automatically wins the election and becomes primary. The Swarm service restarts the container; Patroni continues from where it left off.
### 5.1 Özel Image (Patroni + PostGIS)
### 5.1 Custom Image (Patroni + PostGIS)
`postgis/postgis:17-3.5` imajı üzerine Patroni kurulur. Bu imaj Harbor'a push edilip stack'te kullanılır.
Patroni is installed on top of the `postgis/postgis:17-3.5` image. This image is pushed to Harbor and used in the stack.
`Environment_Infrastructure/docker/patroni-postgis/Dockerfile`:
@ -368,7 +376,7 @@ USER postgres
ENTRYPOINT ["patroni", "/etc/patroni/patroni.yml"]
```
Build ve push (`ops/push-harbor-custom-images.sh` ile yapılır veya aşağıdaki komutları çalıştır):
Build and push; this is done with `ops/push-harbor-custom-images.sh`, or run the commands below:
```bash
cd Environment_Infrastructure/docker/patroni-postgis
@ -377,9 +385,9 @@ echo "$HARBOR_CI_TOKEN" | docker login registry.tarla.io -u robot-ci-push-iklimc
docker push registry.tarla.io/iklimco/patroni-postgis:17-3.5
```
### 5.2 etcd Kümesi
### 5.2 etcd Cluster
#### Stack Dosyası — etcd
#### Stack File — etcd
`/opt/iklimco/stacks/prod-db-etcd.yml`:
@ -491,11 +499,13 @@ services:
condition: on-failure
```
**Önemli:** `ETCD_INITIAL_CLUSTER_STATE` değeri ilk deploy'da `new`, sonraki tüm deploy'larda `existing` olmalıdır. Yanlış değer bırakılırsa data dizini sıfırlanır. Aşağıdaki Section 6'daki deploy adımları bu durumu otomatik tespit eder — manuel güncelleme gerekmez.
**APISIX etcd usage:** In prod, APISIX shares this etcd cluster with the `/apisix` prefix. Patroni uses the `/service/` prefix and APISIX uses the `/apisix/` prefix, so there is no collision. APISIX configuration is managed by the `config.yaml` file in the `docker-stack-infra.prod.yml` overlay; the connection is made to `http://iklim-db-01:2379,http://iklim-db-02:2379,http://iklim-db-03:2379`. Therefore, the app subnet -> DB nodes port 2379 firewall rule is mandatory; it was added in Section 1.
### 5.3 Patroni Konfigürasyonu
**Important:** `ETCD_INITIAL_CLUSTER_STATE` must be `new` on the first deploy and `existing` on all later deploys. If the wrong value is left in place, the data directory is reset. The deploy steps in Section 6 below detect this automatically; no manual update is required.
Her node için ayrı bir `patroni.yml` dosyası oluşturulur. Farklılıklar yalnızca `name` ve `connect_address` alanlarındadır.
### 5.3 Patroni Configuration
A separate `patroni.yml` file is created for each node. The only differences are the `name` and `connect_address` fields.
**Node 01** — `/mnt/storagebox/prod/db/postgresql-01/config/patroni.yml`:
@ -570,7 +580,7 @@ tags:
**Node 02** — `/mnt/storagebox/prod/db/postgresql-02/config/patroni.yml`:
Node 01 ile aynı içerik, yalnızca şu alanlar farklı:
Same content as Node 01; only the following fields differ:
```yaml
name: postgresql-02
@ -596,7 +606,7 @@ postgresql:
data_dir: /var/lib/postgresql/data/pgdata
```
### 5.4 Stack Dosyası — Patroni
### 5.4 Stack File — Patroni
`/opt/iklimco/stacks/prod-db-patroni.yml`:
@ -696,66 +706,66 @@ services:
condition: on-failure
```
### 5.5 Durum Kontrolü
### 5.5 Status Check
```bash
# Herhangi bir DB node'unda:
# On any DB node:
docker exec -it $(docker ps -q -f name=iklim-patroni_patroni-01) \
patronictl -c /etc/patroni/patroni.yml list
```
Beklenen çıktı: bir `Leader` ve iki `Replica` satırı, hepsinin `State` sütunu `running`.
Expected output: one `Leader` row and two `Replica` rows, all with the `State` column set to `running`.
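The expected-output rule can be turned into a check over the `patronictl list` table. This sketch assumes the table columns are ordered Member, Host, Role, State and that replicas report `running`; the layout and state names may differ between Patroni releases, so treat the parsing as illustrative. The function name is hypothetical.

```bash
# Parse the pipe-delimited patronictl table on stdin.
# Succeeds only with exactly 1 running Leader and 2 running Replicas.
patroni_cluster_healthy() {
  awk -F'|' '
    NF >= 5 {
      role = $4; state = $5
      gsub(/ /, "", role); gsub(/ /, "", state)
      if (role == "Leader"  && state == "running") leaders++
      if (role == "Replica" && state == "running") replicas++
    }
    END {
      if (leaders == 1 && replicas == 2) { print "healthy"; exit 0 }
      print "unhealthy"; exit 1
    }
  '
}
```

The output of the `patronictl ... list` command can be piped straight into `patroni_cluster_healthy`.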
```bash
# etcd cluster sağlığı:
# etcd cluster health:
docker exec -it $(docker ps -q -f name=iklim-etcd_etcd-01) \
etcdctl endpoint health \
--endpoints=http://10.20.20.11:2379,http://10.20.20.12:2379,http://10.20.20.13:2379
```
```bash
# Mevcut primary'i öğren:
# Find the current primary:
docker exec -it $(docker ps -q -f name=iklim-patroni_patroni-01) \
patronictl -c /etc/patroni/patroni.yml topology
```
## 6. Deploy
Sıra önemlidir: önce etcd, ardından MongoDB ve Patroni stack'leri.
Order matters: etcd first, then the MongoDB and Patroni stacks.
### .env Dosyası
### .env File
`/opt/iklimco/stacks/.env` dosyası StorageBox'ta `prod/secrets/iklim.co/.env.stacks` olarak saklanır. İlk kez oluşturulurken güçlü şifrelerle doldurulup StorageBox'a yüklenir; sonraki deploy'larda buradan çekilir:
The `/opt/iklimco/stacks/.env` file is stored on StorageBox as `prod/secrets/iklim.co/.env.stacks`. When it is created the first time, it is filled with strong passwords and uploaded to StorageBox; later deploys fetch it from there:
```bash
# iklim-app-01 üzerinde (bir kez):
# On iklim-app-01, once:
scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.stacks \
/opt/iklimco/stacks/.env
chmod 600 /opt/iklimco/stacks/.env
```
Dosya içeriği (`/opt/iklimco/stacks/.env`, repo'ya commit edilmez):
File content (`/opt/iklimco/stacks/.env`, not committed to the repo):
```env
DATABASE_POSTGRES_ROOT_USER=postgres
POSTGRES_PASSWORD=<güçlü-şifre>
REPLICATOR_PASSWORD=<güçlü-şifre>
MONGO_ROOT_PASSWORD=<güçlü-şifre>
POSTGRES_PASSWORD=<strong-password>
REPLICATOR_PASSWORD=<strong-password>
MONGO_ROOT_PASSWORD=<strong-password>
```
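Because the deploy step exports this file into the shell, a missing or empty variable only surfaces later as a failing service. A minimal pre-flight sketch, with a hypothetical function name, that lists required keys lacking a non-empty value:

```bash
# Print each required key that is absent or has an empty value in the env file.
missing_env_keys() {
  local env_file=$1 key
  for key in DATABASE_POSTGRES_ROOT_USER POSTGRES_PASSWORD REPLICATOR_PASSWORD MONGO_ROOT_PASSWORD; do
    grep -Eq "^${key}=.+" "$env_file" || echo "$key"
  done
}
```

An empty output from `missing_env_keys /opt/iklimco/stacks/.env` means all four variables are set.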
### Deploy Adımları
### Deploy Steps
```bash
# iklim-app-01 üzerinde (Swarm manager):
# On iklim-app-01 (Swarm manager):
export $(cat /opt/iklimco/stacks/.env | xargs)
# ETCD_INITIAL_CLUSTER_STATE otomatik tespiti — ilk deploy'da 'new', sonrakinde 'existing'
# Automatic ETCD_INITIAL_CLUSTER_STATE detection — 'new' on first deploy, 'existing' afterwards
ETCD_STATE="new"
if docker service ls --filter name=iklim-etcd -q 2>/dev/null | grep -q .; then
echo " etcd servisleri mevcut, 'existing' state kullanılıyor..."
echo " etcd services exist, using 'existing' state..."
ETCD_STATE="existing"
else
echo " İlk deploy, 'new' state kullanılıyor..."
echo " First deploy, using 'new' state..."
fi
sed -i \
"s/ETCD_INITIAL_CLUSTER_STATE: new/ETCD_INITIAL_CLUSTER_STATE: ${ETCD_STATE}/g; \
@ -769,14 +779,14 @@ docker stack deploy \
--with-registry-auth \
iklim-etcd
# etcd cluster'ın kurulmasını bekle:
# Wait for the etcd cluster to be ready:
echo "⏳ waiting for etcd..."
for i in $(seq 1 18); do
if docker exec $(docker ps -q -f name=iklim-etcd_etcd-01 | head -1) \
etcdctl endpoint health \
--endpoints=http://10.20.20.11:2379,http://10.20.20.12:2379,http://10.20.20.13:2379 \
2>/dev/null | grep -q "is healthy"; then
echo "✅ etcd hazır"
echo "✅ etcd ready"
break
fi
[ "$i" -eq 18 ] && echo "❌ etcd timeout" && exit 1
@ -801,15 +811,15 @@ docker stack services iklim-db
docker stack services iklim-patroni
```
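The etcd wait loop in the steps above follows a general pattern: run a health command up to N times with a pause in between. A reusable sketch of that pattern, with a hypothetical function name:

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs the command until it succeeds or the attempts are used up.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only between attempts, not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1
}
```

The same helper could wrap other readiness checks, such as waiting for the Patroni or MongoDB services, by passing a different command.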
### MongoDB Replica Set Başlatma
### MongoDB Replica Set Initialization
MongoDB stack deploy edildikten sonra bir kez çalıştırılır:
Run once after the MongoDB stack is deployed:
```bash
docker exec -it $(docker ps -q -f name=iklim-db_mongodb-01) mongosh \
-u mongo-root -p "${MONGO_ROOT_PASSWORD}" --authenticationDatabase admin
# mongosh içinde:
# Inside mongosh:
rs.initiate({
_id: "rs0",
members: [
@ -820,46 +830,88 @@ rs.initiate({
})
```
## 7. App Servislerinden Erişim
## 7. Access from App Services
App containers connect to DB services through the `iklimco-net` overlay network **by Swarm DNS name**. Because the MongoDB stack (`iklim-db`) and Patroni stack (`iklim-patroni`) share the `iklimco-net` external network, service names are resolved through overlay DNS.
### MongoDB Replica Set Connection String
Variables in `env-prod/.env`:
```bash
DATABASE_MONGODB_HOST=mongodb-01:27017,mongodb-02:27017,mongodb-03:27017
DATABASE_MONGODB_PARAMS=replicaSet=rs0&readPreference=secondaryPreferred&authSource=admin
```
mongodb://mongo-root:<SIFRE>@10.20.20.11:27017,10.20.20.12:27017,10.20.20.13:27017/<db>?replicaSet=rs0&authSource=admin
Microservice URI through overlay DNS:
```
mongodb://<user>:<password>@mongodb-01:27017,mongodb-02:27017,mongodb-03:27017/<db>?replicaSet=rs0&readPreference=secondaryPreferred&authSource=admin
```
> For direct testing, from outside the overlay with private IP:
> `mongodb://mongo-root:<PASSWORD>@10.20.20.11:27017,10.20.20.12:27017,10.20.20.13:27017/admin?replicaSet=rs0&authSource=admin`
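The microservice URI is a straightforward combination of the two `env-prod/.env` variables. The sketch below shows the assembly; the function name and its arguments are hypothetical, and the password is assumed to be already percent-encoded if it contains reserved characters.

```bash
# Values as listed for env-prod/.env above.
DATABASE_MONGODB_HOST="mongodb-01:27017,mongodb-02:27017,mongodb-03:27017"
DATABASE_MONGODB_PARAMS="replicaSet=rs0&readPreference=secondaryPreferred&authSource=admin"

# Build a replica set connection string: mongodb_uri USER PASSWORD DATABASE
mongodb_uri() {
  local user=$1 password=$2 database=$3
  printf 'mongodb://%s:%s@%s/%s?%s\n' \
    "$user" "$password" "$DATABASE_MONGODB_HOST" "$database" "$DATABASE_MONGODB_PARAMS"
}
```

Listing all three members means the driver can discover the current primary even when one host is down.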
### PostgreSQL — Patroni
Patroni her an primary olan node'u yönetir. Uygulama katmanı tüm üç IP'yi vererek primary'e yazabilir, secondary'den okuyabilir:
Variables in `env-prod/.env`:
```
# Yazma — sadece primary kabul eder:
jdbc:postgresql://10.20.20.11:5432,10.20.20.12:5432,10.20.20.13:5432/iklimdb?targetServerType=primary
# Okuma (yük dengeleme):
jdbc:postgresql://10.20.20.11:5432,10.20.20.12:5432,10.20.20.13:5432/iklimdb?targetServerType=preferSecondary
```bash
DATABASE_POSTGRES_HOST=patroni-01:5432,patroni-02:5432,patroni-03:5432
DATABASE_POSTGRES_PARAMS=targetServerType=preferSecondary&loadBalanceHosts=true
```
PostgreSQL JDBC sürücüsü `targetServerType=primary` ile bağlanmaya çalışacağı tüm node'lara bağlanır ve primary olanı otomatik bulur.
Patroni manages whichever node is primary at any moment. The PostgreSQL JDBC driver (pgJDBC) picks the primary or a secondary from the multi-host list according to the `targetServerType` parameter. `targetServerType` and `loadBalanceHosts` are pgJDBC parameters; libpq-based clients such as `psql` do not accept them and use `target_session_attrs` instead:
```
# Write: goes to the primary (pgJDBC URL):
jdbc:postgresql://patroni-01:5432,patroni-02:5432,patroni-03:5432/<db>?targetServerType=primary
# Read (load balancing across secondaries):
jdbc:postgresql://patroni-01:5432,patroni-02:5432,patroni-03:5432/<db>?targetServerType=preferSecondary&loadBalanceHosts=true
```
> For direct testing with a libpq client such as `psql`, from outside the overlay with private IP:
> `postgresql://postgres@10.20.20.11:5432,10.20.20.12:5432,10.20.20.13:5432/postgres?target_session_attrs=read-write`
With `targetServerType=primary`, the JDBC driver tries the listed hosts in order and uses the first one that reports itself as the primary.
### Patroni REST API
Patroni, 8008 portundan HTTP endpoint sunar. Bu endpoint HAProxy veya benzeri bir load balancer ile kullanılarak primary'i otomatik yönlendirme sağlanabilir:
Patroni exposes an HTTP endpoint on port 8008. This endpoint can be used with HAProxy or a similar load balancer to route to the primary automatically:
```bash
# Primary kontrolü (HTTP 200 = primary, HTTP 503 = replica):
# Primary check (HTTP 200 = primary, HTTP 503 = replica):
curl -s http://10.20.20.11:8008/primary
```
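The status-code convention of the `/primary` endpoint can be wrapped in a small helper. In this sketch the function names are hypothetical and the probe loop is an untested illustration; `curl` reports `000` when a node is unreachable, which maps to `unknown`.

```bash
# Map the HTTP status of GET /primary to a role name.
patroni_role_from_status() {
  case $1 in
    200) echo primary ;;
    503) echo replica ;;
    *)   echo unknown ;;
  esac
}

# Probe each DB node over the private network and print "ip role".
print_patroni_roles() {
  local ip code
  for ip in 10.20.20.11 10.20.20.12 10.20.20.13; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "http://$ip:8008/primary" || true)
    echo "$ip $(patroni_role_from_status "$code")"
  done
}
```

Exactly one node should report `primary`; a load balancer health check applies the same rule per backend.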
## Kabul Kriterleri
## 8. Developer and Office Access (Production)
- `docker stack services iklim-etcd` — üç servis `1/1`
- `docker stack services iklim-db` — üç MongoDB servisi `1/1`
- `docker stack services iklim-patroni` — üç Patroni servisi `1/1`
- `patronictl list` — 1 `Leader`, 2 `Replica`, hepsi `running`
- `etcdctl endpoint health` — üç endpoint `healthy`
The prod cluster setup does **not** use `pg-proxy` or `mongo-proxy`. Access from an office computer targets the DB subnet directly.
### WireGuard Setting
`AllowedIPs` must be updated in the `.conf` file on the office computer:
`AllowedIPs = 10.8.0.1/32, 10.20.20.0/24`
### Connection Parameters (Multi-Host)
Modern database tools (DBeaver, Compass, etc.) should establish cluster-aware connections:
| Database | Host List | Port | Critical Parameter |
| :--- | :--- | :--- | :--- |
| **PostgreSQL** | `10.20.20.11, 10.20.20.12, 10.20.20.13` | `5432` | `targetServerType=primary` |
| **MongoDB** | `10.20.20.11, 10.20.20.12, 10.20.20.13` | `27017` | `replicaSet=rs0` |
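Whether a DB node is routed through the tunnel depends on its address falling inside one of the `AllowedIPs` ranges. The sketch below checks an IPv4 address against a CIDR block with integer arithmetic; both function names are hypothetical.

```bash
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP CIDR: succeeds when IP lies inside the CIDR block.
in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

With the value above, `in_cidr 10.20.20.11 10.20.20.0/24` succeeds, while an app subnet address such as `10.20.10.11` does not match and would not be routed through the tunnel.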
## Acceptance Criteria
- `docker stack services iklim-etcd` — three services `1/1`
- `docker stack services iklim-db` — three MongoDB services `1/1`
- `docker stack services iklim-patroni` — three Patroni services `1/1`
- In the output of `docker service ps iklim-patroni_patroni-01`, `patroni-02`, and `patroni-03`, every task runs on an `iklim-db-*` node through the `role=db` placement constraint.
- In the output of `docker service ps iklim-db_mongodb-01`, `mongodb-02`, and `mongodb-03`, every task runs on an `iklim-db-*` node.
- In the output of `docker service ps iklim-etcd_etcd-01`, `etcd-02`, and `etcd-03`, every task runs on an `iklim-db-*` node.
- `patronictl list` — 1 `Leader`, 2 `Replica`, all `running`
- `etcdctl endpoint health` — three endpoints `healthy`
- `rs.status()` — 1 PRIMARY, 2 SECONDARY
- MongoDB and PostgreSQL are reachable from the app nodes.
- Ports `5432`, `27017`, `2379`, `2380`, and `8008` are closed to the public internet.
- When a DB node is restarted, Patroni runs an automatic election and a new primary is selected.
- During a Patroni primary transition, the old primary rejoins as a standby; there is no split-brain.


@ -1,10 +1,10 @@
# 09 - Prod Runner HA and Swarm Deploy Model
The purpose of this phase is to set up Gitea Actions runners in prod so that they run in HA mode, and to define the prerequisites for distributing services across 3 nodes on Swarm.
## Runner Count
A single runner is functionally sufficient, but it is not HA. Because the prod target is HA, `act_runner` will be installed as a systemd service on all 3 Swarm manager nodes:
| Host | Runner |
| --- | --- |
@ -12,13 +12,13 @@ Tek runner fonksiyonel olarak yeterlidir, ancak HA degildir. Prod hedefi HA oldu
| `iklim-app-02` | `act_runner` systemd |
| `iklim-app-03` | `act_runner` systemd |
In this model, if any manager/runner is lost, the other runners can pick up pipeline jobs.
## Runner Installation Model
The runner will not run as a Docker container, and no Docker socket is mounted.
Installation:
- `gitea-runner` sistem kullanicisi
- `/usr/local/bin/act_runner`
@ -26,11 +26,11 @@ Kurulum:
- `/var/lib/gitea-runner`
- `gitea-act-runner.service`
If runner jobs use the Docker CLI for deploys, the `gitea-runner` user needs access to the Docker daemon. Docker group membership is effectively close to root-level permission, so only trusted repos/jobs should use these runner labels.
## Runner Label Policy
Shared labels on all prod runners:
```text
prod-runner
@ -39,7 +39,7 @@ swarm-manager
ubuntu-24.04
```
Node-specific labels:
```text
iklim-app-01
@ -47,30 +47,30 @@ iklim-app-02
iklim-app-03
```
If existing prod workflows use `runs-on: prod-runner`, any of the 3 runners can pick up the job. To pin a job to a specific node, use a node-specific label.
## Deploy Race Risk
With more than one runner, multiple deploy jobs can run at the same time. This is good for HA, but it creates a risk of races on shared resources.
Risk areas:
- Concurrent `docker stack deploy` on the same stack
- Concurrent `docker service update` for the same service
- Concurrent updates to the same `.env` or manifest file on StorageBox
- Root infrastructure pipeline and microservice deploy pipeline running at the same time
Required measures:
- Prod root infrastructure deploy should run manually or with approval.
- Prod deploy for the same service must not be triggered more than once at the same time.
- All prod deploy workflows are queued with the Gitea Actions `concurrency: group: prod-deploy` block; concurrent execution is prevented by Gitea.
## Prerequisites: StorageBox Secrets
Before the deploy pipeline runs, the following files must exist on StorageBox. These files are not created automatically; they are created manually during the initial setup.
### SWAG / GoDaddy Credentials
```
prod/secrets/iklim.co/.env.secrets.swag
@ -81,111 +81,596 @@ GODADDY_KEY=<api-key>
GODADDY_SECRET=<api-secret>
```
For the GoDaddy API key, go to https://developer.godaddy.com/keys and create a **Production** key. If an existing key is known to have been shared in any chat, Slack, or email, revoke it and create a new one before use.
> `.env.secrets.swag` contains only SWAG/GoDaddy credentials.
> `.env.secrets.shared` contains AppRole IDs, DB passwords, and other runtime secrets. Do not mix these two files.
### Gitea `PROD_FLOATING_IP` Variable
For DNS automation, `PROD_FLOATING_IP` must be defined as a Gitea project variable. See the "Gitea Variable: PROD_FLOATING_IP" step in `06-prod-terraform-iaac.md`.
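To illustrate how `PROD_FLOATING_IP` might feed the DNS step, here is a hedged sketch that replaces an A record through the GoDaddy domains API. It assumes `GODADDY_KEY` and `GODADDY_SECRET` are loaded from `.env.secrets.swag`, that `PROD_FLOATING_IP` is exported by the workflow, and that a TTL of 600 seconds is acceptable; the function names are hypothetical and the real pipeline step may differ (for example, it uses `jq`).

```bash
# Sketch: point prod A records at the floating IP through the GoDaddy API.
# Function names and the TTL are assumptions, not the pipeline's actual code.

# Build the JSON body for a single A record.
godaddy_a_record_payload() {
  printf '[{"data":"%s","ttl":%s}]' "$1" "$2"
}

# Replace the A record for one subdomain label (api, apigw, ...) of iklim.co.
# PUT overwrites the record set, so repeated runs converge on the same state.
update_a_record() {
  name="$1"
  curl -fsS -X PUT \
    -H "Authorization: sso-key ${GODADDY_KEY}:${GODADDY_SECRET}" \
    -H "Content-Type: application/json" \
    -d "$(godaddy_a_record_payload "$PROD_FLOATING_IP" 600)" \
    "https://api.godaddy.com/v1/domains/iklim.co/records/A/${name}"
}

# Usage (not executed here):
#   for name in api apigw rabbitmq grafana; do update_a_record "$name"; done
```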
## StorageBox Deploy Lock Model
Because prod has 3 runners, a deploy lock is considered mandatory. The lock is not kept on the local file system, because the runners run on different machines and cannot see each other's `/tmp` or `/var/lock` directories.
The lock location is StorageBox:
```text
prod/locks/prod-deploy.lock
prod/locks/prod-infra.lock
prod/locks/services/<service-name>.lock
```
Initial model:
```text
prod/locks/prod-deploy.lock
```
This single global lock serializes all prod deploys and is the least complex model. If deploy times grow later, the model can move to per-service locks.
The lock file/directory is not created manually. The lock is acquired with an atomic `mkdir` at the start of the workflow and released with `rmdir` at the end.
Example:
```bash
LOCK_DIR="prod/locks/prod-deploy.lock"
LOCK_META="owner.txt"
ssh "$STORAGEBOX_SSH" "mkdir -p prod/locks && mkdir '$LOCK_DIR'"
ssh "$STORAGEBOX_SSH" "printf '%s\n' 'runner=${GITEA_RUNNER_NAME:-unknown}' 'run=${GITHUB_RUN_ID:-unknown}' 'created_at=$(date -u +%FT%TZ)' > '$LOCK_DIR/$LOCK_META'"
# deploy operations
ssh "$STORAGEBOX_SSH" "rm -f '$LOCK_DIR/$LOCK_META' && rmdir '$LOCK_DIR'"
```
Behavior:
- If `mkdir '$LOCK_DIR'` succeeds, the lock has been acquired.
- If `mkdir '$LOCK_DIR'` fails, another deploy is assumed to be running.
- Even if the job fails, the cleanup step must run `rm/rmdir`.
- Stale lock cleanup must be manual/approved; automatic forced deletion should not be applied in the first phase.
Lock levels:
| Lock | Purpose |
| --- | --- |
| `prod/locks/prod-deploy.lock` | First phase: global lock for all prod deploys |
| `prod/locks/prod-infra.lock` | Later: to separate root infra deploy from microservice deploys |
| `prod/locks/services/<service-name>.lock` | Later: to move to per-service parallel deploys |
### Docker Secrets
Before the infra stack is deployed, the following Docker secrets must be created on `iklim-app-01`. These secrets are referenced by `docker-stack-infra.prod.yml`; if they do not exist, stack deploy fails.
```bash
# Redis password, used by Redis master, replica, and sentinel:
openssl rand -hex 32 | docker secret create redis_password -
# RabbitMQ Erlang cluster cookie; must be the same on all RabbitMQ nodes:
openssl rand -hex 32 | docker secret create rabbitmq_erlang_cookie -
```
> The `vault_unseal_key` secret is created after Vault is started for the first time; see `roadmap/prod-env/07-vault-raft-plan.md` Step 3. It is not required for the first infra stack deploy; it is only needed once the health check is triggered.
>
> This secret is also used during Vault restarts triggered by cert-reloader: when `cert-reloader` detects a certificate change, it runs `docker service update --force iklimco_vault`; while Vault containers restart, they read from the `vault_unseal_key` Docker secret and automatically unseal. If the secret is missing, Vault remains sealed after every certificate renewal.
Verify secrets:
```bash
docker secret ls
# redis_password and rabbitmq_erlang_cookie rows must appear
```
### SWAG Nginx Configuration Templates
Before the deploy pipeline runs, the following template files must exist in the repo:
- `swag/site-confs/default.conf`
- `swag/site-confs/api.conf.tpl`
- `swag/site-confs/apigw.conf.tpl`
- `swag/site-confs/rabbitmq.conf.tpl`
- `swag/site-confs/grafana.conf.tpl`
These files are created in the test environment (`test-env/04-swag-nginx-configs.md`); they are not created separately for prod. Template files are shared by both environments; prod-specific values are injected with environment variables during deploy.
Verify that the `prod/secrets/iklim.co/.env.prod` file on StorageBox contains the following variables:
```bash
API_SUBDOMAIN=api.iklim.co
APIGW_SUBDOMAIN=apigw.iklim.co
RABBITMQ_SUBDOMAIN=rabbitmq.iklim.co
GRAFANA_SUBDOMAIN=grafana.iklim.co
RESTRICTED_IPS="78.187.87.109/32,95.70.151.248/32"
SWAG_CERT_DIR=/mnt/storagebox/ssl
SWAG_CONFIG_DIR=/mnt/storagebox/swag/config
SWAG_SITE_CONFS_DIR=/mnt/storagebox/swag/site-confs
```
The pipeline sources these variables and renders the template files into the `$SWAG_SITE_CONFS_DIR` (`/mnt/storagebox/swag/site-confs`) directory. Because StorageBox is mounted on all app nodes as a shared mount, SWAG containers on other nodes see the same files even if the configuration is rendered on a single runner. Detail: `roadmap/prod-env/04-swag-nginx-configs.md`.
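The rendering step could look roughly like the sketch below. It assumes the templates use `${VAR}` placeholders and that `envsubst` (from the `gettext` package the pipeline installs) does the substitution; the function names and the exact variable list are illustrative, not the pipeline's actual code.

```bash
# Sketch: render *.conf.tpl templates into SWAG_SITE_CONFS_DIR with envsubst.
# Function names and the variable list are assumptions.

# Derive the output file name by dropping the .tpl suffix.
conf_name_for_template() {
  basename "$1" .tpl
}

render_site_confs() {
  src_dir="$1"   # for example swag/site-confs
  for tpl in "$src_dir"/*.conf.tpl; do
    [ -e "$tpl" ] || continue
    # Restrict substitution to known variables so nginx's own $variables stay intact.
    envsubst '${API_SUBDOMAIN} ${APIGW_SUBDOMAIN} ${RABBITMQ_SUBDOMAIN} ${GRAFANA_SUBDOMAIN}' \
      < "$tpl" > "${SWAG_SITE_CONFS_DIR}/$(conf_name_for_template "$tpl")"
  done
}
```

Passing an explicit variable list to `envsubst` is a deliberate choice here: without it, every `$name` in the template, including nginx variables such as `$host`, would be replaced.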
### APISIX Configuration
The following prerequisites must be satisfied before deploy.
#### init.sh SSL Block
The `ssls/1` PUT block and the `dev` SSL block inside `init/apisix-core/init.sh` must be removed. This change is made in the test environment (`test-env/05-apisix-remove-ssl.md`); the same `init.sh` file is also used in prod, so no separate change is required for prod.
#### Custom APISIX Image
The prod stack uses the `registry.tarla.io/iklimco/custom-apisix:3.12.0` image. This image's `config.yaml` must contain real IP header configuration for the overlay CIDR:
```yaml
nginx_config:
  http:
    real_ip_header: "X-Real-IP"
    set_real_ip_from: "10.0.0.0/8"
```
`set_real_ip_from: 10.0.0.0/8` covers all container addresses in the Swarm overlay network; this skips SWAG's internal overlay IP and writes the real client IP to APISIX access logs.
If the image requires a rebuild because `config.yaml` changed:
```bash
docker build -t registry.tarla.io/iklimco/custom-apisix:3.12.0 .
docker push registry.tarla.io/iklimco/custom-apisix:3.12.0
```
During deploy, `init/apisix-core/init.sh` is run once by the pipeline. It writes the APISIX configuration to Patroni etcd with the `/apisix` prefix; the 3 replicas in prod all read this shared etcd state, so no separate init per replica is required. Detail: `roadmap/prod-env/05-apisix-remove-ssl.md`.
## Deploy Serialization with Gitea Concurrency
Because 3 runners run in prod, more than one deploy job can be triggered at the same time. Instead of a StorageBox-based `mkdir/rmdir` lock mechanism, the Gitea Actions `concurrency` feature is used.
Add the following block to the pipeline file (`deploy-prod.yml`):
```yaml
concurrency:
  group: prod-deploy
  cancel-in-progress: false
```
With `cancel-in-progress: false`, a new run in the same group is queued by Gitea until the previous one finishes. It appears as "queued" in the UI and is not shown as an error. There is no stale lock risk: even if the runner crashes or the job is canceled, Gitea handles state management.
All prod deploy workflows, including infra and microservices, must use the same `group: prod-deploy` value so infra deploy and microservice deploy cannot overlap.
## Deploy Pipeline
The table below lists the full step order of the prod deploy pipeline defined in `.gitea/workflows/deploy-prod.yml`. Steps marked with `*` are prod-specific and do not exist in the test pipeline.
| # | Step | Note |
| --- | --- | --- |
| 1 | Checkout Branch | |
| 2 | Prepare Folders | |
| 3 | Set up SSH Key and Add to known_hosts | |
| 4 | Update Apt Repository and Install Required Tools | `gettext tree jq`; `jq` is required for the GoDaddy DNS API |
| 5 | Fetch Service Secret Files | Fetch `.env.secrets.*` from StorageBox |
| 6 | Initialize Workspace | Fetch `.env` and `.env.secrets.shared` from StorageBox; run `init-base.sh` |
| 7 | Upload Updated Secrets to Storagebox | |
| 8 | Provision Vault AppRole IDs and Docker Secrets | |
| 9 | Upload Updated Env to Storagebox | |
| 10 | Prepare Init Files | Cert copy lines removed |
| 11 | Initialize Docker Swarm | |
| 12 | Docker Login to Harbor | |
| 13 | **Update DNS Records** * | GoDaddy API; `api/apigw/rabbitmq/grafana` A records; idempotent |
| 14 | **Prepare SWAG Directories** * | `$SWAG_CONFIG_DIR/dns-conf`; renders nginx conf templates; reloads running SWAG |
| 15 | Bootstrap Vault TLS Placeholder | |
| 16 | Deploy Swarm Stack | base + prod overlay together |
| 17 | **Wait for etcd** * | Waits until Patroni etcd (`iklim-db-01:2379`) is healthy |
| 18 | **Run APISIX Init** * | `SPRING_PROFILES_ACTIVE=prod`; idempotent; writes to etcd |
| 19 | **Bootstrap SWAG Certificate** * | Waits for SWAG to obtain the cert; copies it to `SWAG_CERT_DIR` |
| 20 | **Run Database Init Scripts** * | `postgresql`/`mongodb` Swarm VIP; SQL+JS init; idempotent |
| 21 | Review Environment | |
### Removal of Cert Scp Lines
Lines removed from the `Initialize Workspace` step:
```yaml
# REMOVED — manual cert copy with scp is no longer required:
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co.full.crt ./STAR.iklim.co.full.crt
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co_key.pem ./STAR.iklim.co_key.pem
```
Line also removed from the `Prepare Init Files` step:
```yaml
# REMOVED:
sudo cp STAR.iklim.co.full.crt STAR.iklim.co_key.pem /opt/iklimco/ssl/
```
The certificate is now obtained by SWAG from Let's Encrypt and written to the `SWAG_CERT_DIR` (`/mnt/storagebox/ssl/`) directory in the `Bootstrap SWAG Certificate` step. Later renewals are handled automatically by cert-reloader.
### Bootstrap SWAG Certificate (Step 19)
On the first deploy, SWAG obtains the Let's Encrypt certificate with the GoDaddy DNS-01 challenge. This step waits for SWAG to obtain the certificate, for up to 10 minutes, and then copies it to the `SWAG_CERT_DIR` directory:
```yaml
- name: Bootstrap SWAG Certificate
  run: |
    set -a; . ./.env; set +a
    echo "Waiting for SWAG container to start..."
    SWAG_CTR=""
    for i in $(seq 1 24); do
      SWAG_CTR=$(docker ps -q -f name=iklimco_swag 2>/dev/null | head -1)
      [ -n "$SWAG_CTR" ] && break
      sleep 10
    done
    if [ -z "$SWAG_CTR" ]; then
      echo "❌ SWAG container did not start"
      exit 1
    fi
    CERT_PATH="/config/etc/letsencrypt/live/iklim.co/fullchain.pem"
    echo "Waiting for cert (up to 10 min)..."
    for i in $(seq 1 20); do
      if docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
        echo "✅ Cert obtained"
        break
      fi
      echo "  attempt $i/20 — waiting 30s..."
      sleep 30
    done
    if ! docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
      echo "❌ SWAG did not obtain cert. Logs:"
      docker service logs iklimco_swag --tail 50
      exit 1
    fi
    docker exec "$SWAG_CTR" cat "$CERT_PATH" | \
      docker run --rm -i -v "${SWAG_CERT_DIR}:/output" alpine \
      sh -c "cat > /output/STAR.iklim.co.full.crt && chmod 644 /output/STAR.iklim.co.full.crt"
    docker exec "$SWAG_CTR" cat "/config/etc/letsencrypt/live/iklim.co/privkey.pem" | \
      docker run --rm -i -v "${SWAG_CERT_DIR}:/output" alpine \
      sh -c "cat > /output/STAR.iklim.co_key.pem && chmod 644 /output/STAR.iklim.co_key.pem"
    echo "✅ Cert bootstrapped to ${SWAG_CERT_DIR}/"
  working-directory: /workspace/iklim.co
```
After this step, certificate files exist inside `SWAG_CERT_DIR` (`/mnt/storagebox/ssl/`); Vault TLS reads these files. Later renewals are handled automatically by cert-reloader. When the pipeline runs again, this step only waits for the SWAG container to be ready; certificate issuance is managed by SWAG/cert-reloader within Let's Encrypt's 90-day cycle.
### Run Database Init Scripts (Step 20)
PostgreSQL and MongoDB init scripts run through Swarm overlay DNS service names (`postgresql`, `mongodb`):
```yaml
- name: Run Database Init Scripts
  run: |
    set -a; . ./.env; . ./.env.secrets.shared; set +a
    echo "⏳ Waiting for PostgreSQL..."
    until docker run --rm --network iklimco-net \
        -e PGPASSWORD="${DATABASE_POSTGRES_ROOT_PASSWD}" \
        postgis/postgis:17-3.5 \
        pg_isready -h postgresql -U "${DATABASE_POSTGRES_ROOT_USER}" -q 2>/dev/null; do
      sleep 5
    done
    for sql_file in $(ls ./init/postgresql/*.sql 2>/dev/null | sort); do
      echo "▶ $(basename "$sql_file")"
      docker run --rm -i --network iklimco-net \
        -e PGPASSWORD="${DATABASE_POSTGRES_ROOT_PASSWD}" \
        postgis/postgis:17-3.5 \
        psql -h postgresql -U "${DATABASE_POSTGRES_ROOT_USER}" < "$sql_file"
    done
    echo "⏳ Waiting for MongoDB..."
    until docker run --rm --network iklimco-net mongo:8 \
        mongosh "mongodb://${DATABASE_MONGODB_ROOT_USER}:${DATABASE_MONGODB_ROOT_PASSWD}@mongodb/admin" \
        --eval "db.runCommand({ping:1})" --quiet 2>/dev/null; do
      sleep 5
    done
    for js_file in $(ls ./init/mongodb/*.js 2>/dev/null | sort); do
      echo "▶ $(basename "$js_file")"
      docker run --rm -i --network iklimco-net mongo:8 \
        mongosh "mongodb://${DATABASE_MONGODB_ROOT_USER}:${DATABASE_MONGODB_ROOT_PASSWD}@mongodb/admin" \
        --quiet < "$js_file"
    done
    echo "✅ Database init scripts completed"
  working-directory: /workspace/iklim.co
```
- `postgresql` and `mongodb`: Swarm VIP service names, resolved on the `iklimco-net` overlay; Patroni primary automatic routing happens at VIP level
- SQL files `./init/postgresql/*.sql` and JS files `./init/mongodb/*.js` are created in the `Prepare Init Files` step by the `init_postgresql`/`init_mongodb` functions in `common-functions.sh`
- Idempotent: `CREATE IF NOT EXISTS` / `createCollection` semantics; runs safely again on later deploys
## Swarm Service Distribution
In prod, all 3 app nodes are manager + app worker, so services can be distributed across 3 nodes.
### Microservices
Each microservice has two stack files:
| File | Content | Environment |
| --- | --- | --- |
| `BE-<Service>/docker-stack-service.yml` | Base definitions, `replicas: 1` | Test + Prod |
| `BE-<Service>/docker-stack-service.prod.yml` | `replicas: 3`, `max_replicas_per_node: 1` | Prod only |
Prod deploy command:
```bash
docker stack deploy \
  -c BE-<Service>/docker-stack-service.yml \
  -c BE-<Service>/docker-stack-service.prod.yml \
iklimco
```
`max_replicas_per_node: 1` is mandatory; without it, when the number of available Swarm nodes drops below the replica count, Swarm places more than one replica on the same node.
### Infra Services
`docker-stack-infra.yml` (base) and `docker-stack-infra.prod.yml` (overlay) are deployed together. The overlay overrides services such as Vault, APISIX, RabbitMQ, and Redis Sentinel with `replicas: 3` and `max_replicas_per_node: 1`. Detail: `Environment_Infrastructure/roadmap/prod-env/03-infra-stack-changes.md`.
#### cert-reloader and Vault Auto-Unseal
The `cert-reloader` sidecar service runs as `replicas: 1` inside the infra stack. It detects the Let's Encrypt certificate renewed by SWAG and distributes it to Vault. Because prod uses the shared StorageBox mount, SSH-based distribution is not required.
Certificate renewal flow:
```
SWAG renews the certificate -> writes it to SWAG_CONFIG_DIR (/mnt/storagebox/swag/config)
cert-reloader detects the MD5 change
-> copies it to /mnt/storagebox/ssl/ directory (common mount on all app nodes)
-> runs docker service update --force iklimco_vault
Vault (3 replicas) restarts
-> each instance reads the new certificate from the /mnt/storagebox/ssl/ mount
-> healthcheck checks sealed status every 30 seconds
-> if sealed: reads from the vault_unseal_key Docker secret and automatically unseals
```
The auto-unseal mechanism is provided by the Vault healthcheck inside `docker-stack-infra.yml`:
```yaml
healthcheck:
  test:
    - "CMD"
    - "sh"
    - "-c"
    - >-
      vault status -format=json 2>/dev/null | grep -q '"sealed": *false' ||
      vault operator unseal $$(cat /run/secrets/vault_unseal_key 2>/dev/null)
  interval: 30s
  timeout: 10s
  start_period: 15s
  retries: 5
```
The 3 replicas run their own healthchecks independently; all of them unseal separately. The certificate renewal -> restart -> auto-unseal chain requires no manual intervention. Detail: `roadmap/prod-env/06-cert-reloader.md`.
#### Vault Raft Configuration
Vault is defined as 3 replicas with Raft storage in the `docker-stack-infra.prod.yml` overlay:
```yaml
vault:
  environment:
    VAULT_LOCAL_CONFIG: >-
      {"api_addr":"https://vault.iklim.co:8200",
      "cluster_addr":"https://{{ .Node.Hostname }}:8201",
      "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
      "listener":[{"tcp":{"address":"0.0.0.0:8200",
      "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
      "tls_key_file":"/vault/certs/STAR.iklim.co_key.pem"}}],
      "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
    - /opt/iklimco/vault/data:/vault/file # separate host path on each node, created with Ansible
    - ${SWAG_CERT_DIR}:/vault/certs:ro # StorageBox shared mount, all nodes see the same path
  deploy:
    mode: replicated
    replicas: 3
    placement:
      max_replicas_per_node: 1
      constraints:
        - node.labels.type == service
```
`{{ .Node.Hostname }}` is a Docker Swarm Go template; it gives each Vault instance a unique `node_id` and `cluster_addr`. Because `/opt/iklimco/vault/data` is a host path volume, it is not an overlay volume; it must be created separately on each app node during Ansible bootstrap. See `07-prod-ansible-bootstrap.md` — Node Directory Role. Detail: `roadmap/prod-env/07-vault-raft-plan.md`.
## Vault Raft Cluster Initial Setup
After the infra stack is deployed for the first time, the Vault Raft cluster is initialized manually once. These steps are not repeated on every deploy; they are applied only during initial setup.
### Step 1 — Stack Deploy
```bash
docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
```
3 Vault containers start. The first initialized node becomes the leader.
### Step 2 — Vault Initialize (iklim-app-01)
```bash
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
docker exec -it "$VAULT_CTR" vault operator init
```
Store the unseal keys and root token from the output securely. Save the unseal key as a Docker secret:
```bash
echo -n "<unseal-key>" | docker secret create vault_unseal_key -
```
> After this step, the `vault_unseal_key` secret exists. During later certificate renewals, cert-reloader restarts Vault; the healthcheck reads this secret and automatically unseals, so no manual intervention is required.
### Step 3 — Unseal the Leader
```bash
docker exec -it "$VAULT_CTR" vault operator unseal
```
### Step 4 — Join the Other Nodes to the Raft Cluster
The Vault containers on `iklim-app-02` and `iklim-app-03` join the cluster:
```bash
docker exec -it <vault-on-iklim-app-02> vault operator raft join \
https://vault.iklim.co:8200
docker exec -it <vault-on-iklim-app-03> vault operator raft join \
https://vault.iklim.co:8200
```
Each node is also unsealed after it joins:
```bash
docker exec -it <vault-on-iklim-app-02> vault operator unseal
docker exec -it <vault-on-iklim-app-03> vault operator unseal
```
### Step 5 — Verify the Cluster
```bash
docker exec "$VAULT_CTR" vault operator raft list-peers
```
Expected: 3 peers — one `leader`, two `follower`.
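The expected peer layout can also be checked mechanically. The sketch below is a hypothetical helper that reads the `list-peers` table on stdin and counts states; it assumes the state is the third whitespace-separated column and that the first two lines are the header and its underline.

```bash
# Sketch: summarize `vault operator raft list-peers` output as leader/follower counts.
# The helper name is hypothetical; the column layout is an assumption.

raft_peer_summary() {
  awk 'NR > 2 && NF >= 3 { count[$3]++ }
       END { printf "leader=%d follower=%d\n", count["leader"], count["follower"] }'
}
```

For example, `docker exec "$VAULT_CTR" vault operator raft list-peers | raft_peer_summary` should print `leader=1 follower=2` on a healthy 3-node cluster.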
## Gateway and Public Traffic
Public internet enters only through SWAG on `80/tcp` and `443/tcp`. SWAG is pinned to `iklim-app-01`, where the Floating IP is located. APISIX admin ports (`9180`) and other service ports are not opened publicly; SWAG forwards all public traffic to APISIX as a reverse proxy.
### Subdomain Routing
| Subdomain | Target Service | Restriction |
| --- | --- | --- |
| `api.iklim.co` | APISIX `:9080` | Public |
| `apigw.iklim.co` | APISIX Dashboard `:9000` | IP restricted with `RESTRICTED_IPS` |
| `rabbitmq.iklim.co` | RabbitMQ Management `:15672` | IP restricted with `RESTRICTED_IPS` |
| `grafana.iklim.co` | Grafana `:3000` | IP restricted with `RESTRICTED_IPS` |
IP restriction is done with the `RESTRICTED_IPS_BLOCK` nginx allow block derived from the `RESTRICTED_IPS` variable; it is applied in SWAG nginx configuration, not in the Hetzner firewall.
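As an illustration of how `RESTRICTED_IPS` could become an nginx allow block, the sketch below splits the comma-separated value into `allow` lines. The function name and the trailing `deny all;` are assumptions about the rendered block, not a copy of the actual deploy script.

```bash
# Sketch: turn the comma-separated RESTRICTED_IPS value into nginx allow lines.
# The function name and the closing "deny all;" are assumptions.

render_restricted_ips_block() {
  # Split on commas, then restore the previous IFS.
  old_ifs="$IFS"; IFS=','
  for cidr in $1; do
    printf 'allow %s;\n' "$cidr"
  done
  IFS="$old_ifs"
  printf 'deny all;\n'
}
```

With the value from `.env.prod`, `render_restricted_ips_block "$RESTRICTED_IPS"` would emit one `allow` line per CIDR followed by `deny all;`, which matches the HTTP 403 behavior expected for other source addresses.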
### SWAG -> APISIX Load Distribution
SWAG connects to APISIX through the Docker Swarm service name with `proxy_pass http://apisix:9080;`. Swarm resolves the `apisix` service name to a VIP (Virtual IP); the IPVS load balancer distributes incoming connections round-robin across the 3 replicas in prod. No additional upstream or load balancer configuration is required on the SWAG side; load distribution happens transparently at the overlay network layer.
`Prometheus` is intentionally not exposed externally through SWAG. Access uses Grafana, whose internal connection is `http://prometheus:9090`, or an SSH tunnel.
Detail: `Environment_Infrastructure/roadmap/prod-env/04-swag-nginx-configs.md`.
## Post-Deploy Verification
After a successful prod pipeline deploy, run the following checks.
### Swarm Health
```bash
docker node ls
```
Expected: 3 managers (`Leader` + 2 `Reachable`) — `iklim-app-01/02/03`; 3 workers (`Ready`) — `iklim-db-01/02/03`.
```bash
docker service ls --filter label=project=co.iklim
```
All services must show `REPLICAS X/X`, meaning the running replica count matches the target.
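To avoid reading the replica column by eye, a small filter can list only the services that have not reached their target. This is a sketch with a hypothetical helper name; it expects `<name> <running>/<desired>` lines on stdin.

```bash
# Sketch: list services whose running replica count differs from the desired count.
# Expects "<name> <running>/<desired>" lines on stdin, for example from:
#   docker service ls --filter label=project=co.iklim --format '{{.Name}} {{.Replicas}}'

unhealthy_services() {
  awk '{
    split($2, r, "/")
    if (r[1] != r[2]) print $1
  }'
}
```

An empty output means every service has met its replica target; any printed name is a service to inspect with `docker service ps`.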
### Precipitation Image Directory
```bash
ls -ld /mnt/storagebox/precipitation/images
```
The directory must exist; it is required before `iklimco_precipitation-service` is deployed.
```bash
docker volume inspect iklimco_image-data
```
Expected: `Options.device` -> `/mnt/storagebox/precipitation/images`.
### SWAG Certificate
```bash
docker exec $(docker ps -q -f name=iklimco_swag) certbot certificates
```
Expected: `*.iklim.co`, `VALID: XX days` (Let's Encrypt — not the old manual cert).
TLS check from outside:
```bash
echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
| openssl x509 -noout -subject -dates
```
Expected: `CN=*.iklim.co`, `notAfter > 2026-07-15`.
> Warning: The old manual `*.iklim.co` certificate expires on **2026-07-15**. After SWAG's Let's Encrypt certificate is verified for the first time, the old cert on StorageBox can be archived and is no longer used.
### Public API Access
```bash
curl -si https://api.iklim.co/health
```
It must return HTTP 2xx; there must be no TLS error.
### IP Restriction
From a disallowed IP:
```bash
curl -si https://grafana.iklim.co
curl -si https://apigw.iklim.co
curl -si https://rabbitmq.iklim.co
```
All must return HTTP 403.
From an allowed IP (78.187.87.109 or 95.70.151.248):
```bash
curl -si https://grafana.iklim.co # HTTP 200 Grafana
curl -si https://apigw.iklim.co # HTTP 200 APISIX Dashboard
curl -si https://rabbitmq.iklim.co # HTTP 200 RabbitMQ Management
```
### Vault Access Control
Must not be reachable from outside:
```bash
# Expected: connection refused or timeout
curl -sk --connect-timeout 5 https://<iklim-app-01-public-ip>:8200/v1/sys/health
```
Must be reachable from inside the overlay:
```bash
# Expected: {"sealed":false,...}
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
curl -sk https://vault.iklim.co:8200/v1/sys/health
```
### No Unexpected Ports
```bash
docker service ls --format "{{.Name}}\t{{.Ports}}" \
--filter label=project=co.iklim
```
Only `iklimco_swag` -> `*:80->80/tcp, *:443->443/tcp` should publish ports; other services must not publish ports.
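The same rule can be expressed as a filter over the command output. The helper below is a hypothetical sketch that reads tab-separated `<name>` and `<ports>` columns on stdin and reports every service other than `iklimco_swag` that publishes a port.

```bash
# Sketch: report services other than iklimco_swag that publish ports.
# Expects tab-separated "<name>\t<ports>" lines on stdin, as produced by the
# `docker service ls --format` command above. The helper name is hypothetical.

unexpected_published_ports() {
  awk -F '\t' '$1 != "iklimco_swag" && $2 != "" { print $1 ": " $2 }'
}
```

Piping the `docker service ls` command above into `unexpected_published_ports` should print nothing on a correctly configured stack.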
### APISIX Replica Distribution
```bash
docker service ps iklimco_apisix
```
Expected: 3 tasks, all `Running`, on different nodes.
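Whether the tasks really land on different nodes can be checked with a one-line filter. This sketch uses a hypothetical helper name and expects one node name per line on stdin.

```bash
# Sketch: detect nodes that run more than one task of a service.
# Expects one node name per line on stdin, for example from:
#   docker service ps iklimco_apisix --filter desired-state=running --format '{{.Node}}'

nodes_with_multiple_tasks() {
  sort | uniq -d
}
```

An empty result means each running task is on its own node, which is what `max_replicas_per_node: 1` is meant to guarantee.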
### fail2ban (SWAG Container)
```bash
docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status
```
Expected: a list with more than one jail.
### Microservice Health (After Microservices Are Deployed)
After microservices are deployed with a separate pipeline:
```bash
curl -si "https://api.iklim.co/v1/weather/current?lat=39&lon=35"
```
Expected: valid JSON weather response.
## Acceptance Criteria
- 3 prod runners appear online in the Gitea UI.
- Every runner has the `prod-runner` label.
- Any runner can run a simple Docker command.
- `docker node ls` shows 3 managers.
- When one runner/node is shut down, another runner can pick up a new job.
- All prod deploy workflows (`concurrency: group: prod-deploy`) are queued by Gitea; concurrent execution is prevented.
- Public ingress is limited to only `22`, `80`, and `443`.
- `prod/secrets/iklim.co/.env.secrets.swag` exists on StorageBox and contains valid GoDaddy credentials.
- `PROD_FLOATING_IP` project variable is defined in Gitea.
- `redis_password` and `rabbitmq_erlang_cookie` appear in `docker secret ls`.
- The `ssl`, `swag/config`, `swag/site-confs`, `grafana/data`, `prometheus/data`, and `precipitation/images` directories exist on StorageBox; see `07-prod-ansible-bootstrap.md` — StorageBox Directory Structure.
- The `swag/site-confs/default.conf`, `api.conf.tpl`, `apigw.conf.tpl`, `rabbitmq.conf.tpl`, and `grafana.conf.tpl` template files exist in the repo.
- StorageBox `prod/secrets/iklim.co/.env.prod` has correct values for `API_SUBDOMAIN`, `APIGW_SUBDOMAIN`, `RABBITMQ_SUBDOMAIN`, `GRAFANA_SUBDOMAIN`, `RESTRICTED_IPS`, `SWAG_CERT_DIR`, `SWAG_CONFIG_DIR`, and `SWAG_SITE_CONFS_DIR`.
- After the first deploy, `docker exec $(docker ps -q -f name=iklimco_swag) nginx -t` succeeds and returns `syntax is ok`.
- The output of `cat /mnt/storagebox/swag/site-confs/api.conf | grep server_name` contains `server_name api.iklim.co;`.
- The `ssls/1` PUT block does not exist inside `init/apisix-core/init.sh`.
- The `registry.tarla.io/iklimco/custom-apisix:3.12.0` image exists in Harbor and its `config.yaml` contains the `set_real_ip_from: 10.0.0.0/8` setting.
- After the first deploy, the real client IP appears in the APISIX access logs instead of the SWAG overlay IP: `docker exec $(docker ps -q -f name=iklimco_apisix | head -1) tail -5 /usr/local/apisix/logs/access.log`
- `docker service ps iklimco_cert-reloader` shows that the service is running.
- The output of `docker service logs iklimco_cert-reloader --tail 20` contains `[cert-reloader] started` and has no error lines.
- The `notAfter` date of the Vault TLS endpoint certificate matches `/mnt/storagebox/ssl/STAR.iklim.co.full.crt`: `docker exec $(docker ps -q -f name=iklimco_vault | head -1) sh -c 'echo | openssl s_client -connect vault.iklim.co:8200 2>/dev/null | openssl x509 -noout -dates'`
- `vault operator raft list-peers` returns 3 peers: 1 leader, 2 followers.
- The `vault_unseal_key` Docker secret exists and appears in `docker secret ls`.
- All 3 Vault containers are unsealed: `docker exec $(docker ps -q -f name=iklimco_vault | head -1) vault status | grep Sealed` -> `Sealed false`. The command inspects only the container on the current node, so repeat it on each node.
- The first deploy pipeline successfully completes all 21 steps; the `Review Environment` step succeeds.
- After the `Bootstrap SWAG Certificate` step, `ls /mnt/storagebox/ssl/` -> `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.pem` exist.
- The `Run Database Init Scripts` step completes without error; PostgreSQL and MongoDB are healthy and init scripts are applied.
- In the output of `docker service ls --filter label=project=co.iklim`, all infra services show `X/X`.
- `docker volume inspect iklimco_image-data` -> `Options.device=/mnt/storagebox/precipitation/images`.
- `docker exec $(docker ps -q -f name=iklimco_swag) certbot certificates` -> `*.iklim.co` Let's Encrypt certificate is valid; it is not the old manual cert.
- `echo | openssl s_client -connect api.iklim.co:443 2>/dev/null | openssl x509 -noout -subject -dates` -> `CN=*.iklim.co`, `notAfter > 2026-07-15`.
- `curl -si https://api.iklim.co/health` -> HTTP 2xx; no TLS error.
- `https://grafana.iklim.co`, `https://apigw.iklim.co`, `https://rabbitmq.iklim.co` -> HTTP 403 from a disallowed IP and HTTP 200 from an allowed IP.
- `curl --connect-timeout 5 https://<public-ip>:8200` -> connection refused or timeout; Vault is not reachable from outside.
- `docker exec $(docker ps -q -f name=iklimco_apisix | head -1) curl -sk https://vault.iklim.co:8200/v1/sys/health` -> `{"sealed":false,...}`; reachable from inside the overlay.
- `docker service ls --format "{{.Name}}\t{{.Ports}}" --filter label=project=co.iklim` -> only `iklimco_swag` publishes ports.
- `docker service ps iklimco_apisix` -> 3 tasks, `Running`, on different nodes.
- `docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status` -> more than one jail appears.