docs(db): implement direct cluster access strategy for production

- Updated roadmap (03-infra-stack-changes.md) to deprecate database proxies in prod.
- Detailed direct subnet access via WireGuard for production developers.
- Provided multi-host connection parameters for Patroni and MongoDB Replica Sets in setup guide (08-prod-db-cluster-kurulum.md).
- Added environment comparison table to developer access guide.
Murat ÖZDEMİR 2026-05-18 14:25:26 +03:00
parent 2202e92c2c
commit 8780c7c05e
18 changed files with 1546 additions and 1218 deletions
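The commit sends production developers straight to the DB subnet over WireGuard instead of through proxies. A useful local sanity check is whether an address actually falls inside `10.20.20.0/24`, for example before adding a route or a client `AllowedIPs` entry. The Bash sketch below assumes plain dotted-quad IPv4 input without leading zeros. `ip_to_int` and `in_cidr` are illustrative helper names and are not part of the repository.

```bash
#!/usr/bin/env bash
# Illustrative helpers (hypothetical, not part of the repository).

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit 0) if IPv4 address $1 lies inside CIDR $2, e.g. 10.20.20.0/24.
in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  # Network mask with the top $bits bits set, clamped to 32 bits.
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  # Same network if both addresses agree on the masked bits.
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}
```

Usage: `in_cidr 10.20.20.13 10.20.20.0/24 && echo "inside the DB subnet"`. Pure shell arithmetic keeps the check dependency-free. A tool such as `ipcalc` would work just as well where it is installed.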

README.md

@@ -1,268 +1,64 @@
-# 🌍 Server Environments and Infrastructure
-Infrastructure-as-code and operational runbook repository on Hetzner Cloud for the `iklim.co` test and prod environments.
-This repository covers:
-- 🧱 Terraform resources for the Hetzner infrastructure (`test` and `prod`)
-- 🤖 Ansible bootstrap playbooks, shared roles, and inventory targets
-- 📚 End-to-end setup guides and roadmap documents
-- 📊 Sizing/cost analysis and reference material
-## 🎯 Scope
-The core goal is a standard, documented infrastructure provisioning process with clearly defined ownership boundaries:
-- 🧱 **Terraform**: creates the cloud infrastructure (servers, private networks, firewalls, placement groups, floating IPs, SSH key registration, inventory output)
-- 🤖 **Ansible**: handles OS preparation, security hardening, Docker/Swarm, runner setup, and StorageBox mounts through the repository's playbooks and shared roles
-- 🚀 **Application/stack deployment**: handled by the relevant deployment workflows and stack manifests referenced in the roadmap documents
+# 🌍 iklim.co Infrastructure and Server Management
+This repository holds the Infrastructure-as-Code (IaC) assets, technical guides, and operational standards needed to build, manage, and modernize the **test** and **production** environments of the `iklim.co` project.
+Infrastructure management here covers resource provisioning with Terraform on Hetzner Cloud, operating system configuration with Ansible, and building out the microservice architecture on Docker Swarm.
+---
+## 📂 Repository Structure and Main Sections
+The documentation and code assets in this repository fall into five main categories:
-## 📌 Current Repository State
-This repository currently consists mainly of:
-- 🧱 Ready-to-use Terraform code:
-  - `terraform/hetzner/test`
-  - `terraform/hetzner/prod`
-- 🤖 Ansible automation assets for both environments:
-  - `ansible/test/test-bootstrap.yml`
-  - `ansible/prod/prod-bootstrap.yml`
-  - `ansible/roles/*`
-  - `ansible/test/group_vars/*` and `ansible/prod/group_vars/*`
-- 📦 Inventory outputs and target paths:
-  - `ansible/test/inventory/generated/test.yml` (tracked sample)
-  - `ansible/prod/inventory/generated/prod.yml` (expected output path)
-- 📘 Detailed setup stages:
-  - `setup/00-genel-yol-haritasi.md` through `setup/09-prod-runner-ha-ve-swarm.md`
-- 🛣️ Environment roadmap steps:
-  - `roadmap/test-env/*`
-  - `roadmap/prod-env/*`
-- 📈 Capacity planning and reference charts:
-  - `hetzner-sizing-report.md`
-  - `test-app-graphs.png`
-  - `test-db-graphs.png`
+### 1. 🛣️ Roadmap (`roadmap/`)
+Contains the **business requirements, technical goals, and step-by-step implementation plans** needed to build the environments (test and prod) from scratch or to update the existing setup.
+- Strategic documentation for major infrastructure changes (e.g. the Redis Sentinel migration, APISIX configuration, RabbitMQ Quorum Queues).
+- [roadmap/test-env/](./roadmap/test-env/) - Test environment requirements and plans.
+- [roadmap/prod-env/](./roadmap/prod-env/) - Production environment HA (High Availability) and reliability plans.
+### 2. 🛠️ Setup (`setup/`)
+The **implementation documents** used to actually stand up the infrastructure. This section is used to manage:
+- **Terraform:** Provisioning cloud resources (Server, Network, Firewall).
+- **Ansible:** Operating system preparation, security hardening, Docker/Swarm installation.
+- **CI/CD:** Creating/updating deployment workflows (Gitea Actions) and stack manifests.
+- E.g. [setup/06-prod-terraform-iaac.md](./setup/06-prod-terraform-iaac.md), [setup/07-prod-ansible-bootstrap.md](./setup/07-prod-ansible-bootstrap.md)
+### 3. 🗺️ Setup vs Roadmap Matrix (`setup-vs-roadmap-map.md`)
+Explains the relationship between the **Roadmap** documents, written from the requirements, and the **Setup** documents that implement those requirements technically.
+- A mapping matrix showing which setup document implements each roadmap step.
+- See [setup-vs-roadmap-map.md](./setup-vs-roadmap-map.md) for details.
-## 🧭 Target Environment Topology
-### 🧪 Test
-| Node | Role | Private IP | Recommended Type |
-| --- | --- | --- | --- |
-| `iklim-app-01` | Swarm manager + app worker + test runner | `10.10.10.11` | `cpx42` |
-| `iklim-db-01` | DB host (manual/stack-based DB install path) | `10.10.20.11` | `cpx42` |
-### 🏭 Production
-| Node | Role | Private IP | Recommended Type |
-| --- | --- | --- | --- |
-| `iklim-app-01` | Swarm manager + app worker + runner (primary FIP target) | `10.20.10.11` | `cpx42` |
-| `iklim-app-02` | Swarm manager + app worker + runner | `10.20.10.12` | `cpx42` |
-| `iklim-app-03` | Swarm manager + app worker + runner | `10.20.10.13` | `cpx42` |
-| `iklim-db-01` | DB cluster node | `10.20.20.11` | `cpx32` |
-| `iklim-db-02` | DB cluster node | `10.20.20.12` | `cpx32` |
-| `iklim-db-03` | DB cluster node | `10.20.20.13` | `cpx32` |
+### 4. 📊 Hetzner Sizing Report (`hetzner-sizing-report.md`)
+Describes the **Hetzner server types, CPU/RAM capacities, and cost/performance analyses** chosen for the iklim infrastructure services (API Gateway, Microservices, Databases, Broker).
+- The primary reference for capacity planning before an environment is built.
+- See [hetzner-sizing-report.md](./hetzner-sizing-report.md).
+### 5. 💡 Facts (`facts/`)
+Documents written once environment setup is complete, capturing **the system's actual current state (source of truth) and the critical technical details you need to know**.
+- Answers the question "How does the system work right now?"
+- [facts/firewall.md](./facts/firewall.md): Active firewall rules and port matrix.
+- [facts/swarm-node-recovery-swag-failover.md](./facts/swarm-node-recovery-swag-failover.md): Manual intervention and recovery procedures when a node goes down.
+---
-## 🔐 Security and Network Baseline
-Core decisions reflected in Terraform and the setup documents:
-- Test and prod are isolated from each other through separate Hetzner Cloud projects and tokens.
-- Public inbound traffic is limited to:
-  - `22/tcp` (admin CIDRs only)
-  - `80/tcp`
-  - `443/tcp`
-- Critical services are reachable only on the private network (e.g. Vault `8200`, PostgreSQL `5432`, MongoDB `27017`, internal observability and broker ports).
-- Placement groups are used as the host distribution strategy.
-- `prevent_destroy = true` is enabled on server resources to guard against accidental deletion.
-- Terraform state and secret files must not be committed.
-See also:
-- [[facts/firewall.md]]: per-tool summary documentation of all firewall rules
-- [[setup/01-private-network-port-matrisi.md]]
-- `terraform/hetzner/test/firewall.tf`
-- `terraform/hetzner/prod/firewall.tf`
-## 🗂️ Repository Layout
-```text
-Environment_Infrastructure/
-├── ansible/
-│   ├── prod/
-│   │   ├── ansible.cfg
-│   │   ├── group_vars/
-│   │   └── prod-bootstrap.yml
-│   ├── roles/
-│   │   ├── base/
-│   │   ├── docker/
-│   │   ├── hardening/
-│   │   ├── node_dirs/
-│   │   ├── storagebox/
-│   │   ├── storagebox_ssh_key/
-│   │   └── swarm/
-│   ├── test/
-│   │   ├── ansible.cfg
-│   │   ├── group_vars/
-│   │   ├── inventory/
-│   │   │   └── generated/
-│   │   │       └── test.yml
-│   │   └── test-bootstrap.yml
-│   └── requirements.yml
-├── roadmap/
-│   ├── test-env/
-│   └── prod-env/
-├── setup/
-│   ├── 00-genel-yol-haritasi.md
-│   ├── 01-private-network-port-matrisi.md
-│   ├── 02-test-terraform-iaac.md
-│   ├── 03-test-ansible-bootstrap.md
-│   ├── 04-test-db-docker-kurulum.md
-│   ├── 05-test-runner-ve-deploy-onkosullari.md
-│   ├── 06-prod-terraform-iaac.md
-│   ├── 07-prod-ansible-bootstrap.md
-│   ├── 08-prod-db-cluster-kurulum.md
-│   └── 09-prod-runner-ha-ve-swarm.md
-├── terraform/
-│   └── hetzner/
-│       ├── test/
-│       └── prod/
-├── facts/
-│   └── firewall.md
-├── hetzner-sizing-report.md
-├── setup-vs-roadmap-map.md
-├── test-app-graphs.png
-└── test-db-graphs.png
-```
-## ✅ Prerequisites
-- Terraform `>= 1.6`
-- A Hetzner Cloud account and an API token per environment
-- An SSH key pair (the public key path is used in the Terraform variables)
-- Linux/macOS shell tools (`bash`, `cp`, `sed`, or a text editor of your choice)
-- Needed in later stages: Ansible, Docker, and access to Gitea/Harbor/StorageBox
-## 🛠️ Using Terraform
-### 1) 🧪 Test Infrastructure
-```bash
-cd terraform/hetzner/test
-cp terraform.tfvars.example terraform.tfvars
-```
-Edit the values in `terraform.tfvars`:
-- `hcloud_token`
-- `admin_allowed_cidrs`
-- optional overrides (`location`, image, server types, key path)
-Then run:
-```bash
-terraform init
-terraform plan
-terraform apply
-mkdir -p ../../../ansible/test/inventory/generated
-terraform output -raw ansible_inventory_yaml > ../../../ansible/test/inventory/generated/test.yml
-```
-### 2) 🏭 Production Infrastructure
-```bash
-cd terraform/hetzner/prod
-cp terraform.tfvars.example terraform.tfvars
-```
-Edit the values in `terraform.tfvars`:
-- `hcloud_token` (prod token)
-- `admin_allowed_cidrs`
-- optional overrides
-Then run:
-```bash
-terraform init
-terraform plan
-terraform apply
-mkdir -p ../../../ansible/prod/inventory/generated
-terraform output -raw ansible_inventory_yaml > ../../../ansible/prod/inventory/generated/prod.yml
-```
 ## 🧱 Setup Flow (Canonical Order)
-Use the setup documents in this order:
-1. `setup/00-genel-yol-haritasi.md`: overall decisions and boundaries
-2. `setup/01-private-network-port-matrisi.md`: private/public port policy
-3. `setup/02-test-terraform-iaac.md`: test Terraform stage
-4. `setup/03-test-ansible-bootstrap.md`: test OS/bootstrap/hardening
-5. `setup/04-test-db-docker-kurulum.md`: test DB stack setup (on Swarm)
-6. `setup/05-test-runner-ve-deploy-onkosullari.md`: test runner and deploy prerequisites
-7. `setup/06-prod-terraform-iaac.md`: prod Terraform stage
-8. `setup/07-prod-ansible-bootstrap.md`: prod OS/bootstrap/hardening
-9. `setup/08-prod-db-cluster-kurulum.md`: prod DB cluster stack (MongoDB + Patroni/etcd)
-10. `setup/09-prod-runner-ha-ve-swarm.md`: prod runner HA and deploy lock model
-## 🛣️ Roadmap Documents
-The roadmap folders track integration work for Swarm stacks, SWAG, APISIX, pipeline updates, and verification checklists:
-- `roadmap/test-env/*`
-- `roadmap/prod-env/*`
-These documents sometimes reference files in related repositories (for example, the workflow and stack files in the main application repository). Treat them as implementation guides aligned with this infrastructure baseline.
+Follow this order when building an environment from scratch or making a major update:
+1. **Analysis:** Determine resource needs with [hetzner-sizing-report.md](./hetzner-sizing-report.md).
+2. **Planning:** Review the relevant environment documents under `roadmap/` to understand the changes to be made.
+3. **Alignment:** Use [setup-vs-roadmap-map.md](./setup-vs-roadmap-map.md) to pin down which setup documents apply.
+4. **Implementation:** Run the Terraform and Ansible processes by following the `setup/` documents (00 through 09) in order.
+5. **Verification:** After setup, refer to the `facts/` documents for how the system operates.
+---
+## ✅ Prerequisites and Tools
+- **Terraform >= 1.6**: Infrastructure provisioning.
+- **Ansible**: Configuration management.
+- **Hetzner Cloud API Token**: Per-environment authorization.
+- **SSH Key**: The key pair registered for server access.
+---
+*iklim.co Infrastructure Team - 2026*
-## 💰 Sizing and Cost Summary
-Reference: `hetzner-sizing-report.md`
-Recommended baseline:
-- **Test:** `2 x cpx42` (app + db)
-- **Prod:** `3 x cpx42` (app) + `3 x cpx32` (db)
-Approximate monthly totals from the report:
-- Test: `$59.98`
-- Prod: `$139.44`
-- Total: `$199.42`
-## 🔑 Secrets and State Management
-Must never be committed:
-- `terraform.tfvars`, `*.tfvars`, `*.tfstate`, `.terraform/`
-- private keys, certificates, `.env` secrets
-- runner tokens and vault password files
-See `.gitignore` for the required patterns.
-Recommended:
-- keep runtime secrets in secure secret stores or encrypted vault files
-- keep generated runtime artifacts out of version control unless they have been explicitly sanitized
-## ⚠️ Known Gaps / Notes
-- `ansible/prod/inventory/generated/prod.yml` is an expected output path; it may not exist until it has been generated.
-- Some roadmap steps target files in the broader `iklim.co` application repository, not only in this repository.
-## ✅ Quick Verification Checklist
-After Terraform apply:
-- servers are created with the expected names and private IPs
-- the floating IP exists and is attached
-- firewalls open only the intended public ports
-- placement groups are assigned
-- the generated inventory YAML has been exported to `ansible/{test,prod}/inventory/generated/*.yml`
-After the bootstrap/deploy stages:
-- Swarm state and labels match the documentation
-- DB access is possible only from the private network
-- Vault/API gateways follow the public/private access rules
-- runner and deploy lock behavior matches the environment policy
-## 🔗 References
-- Hetzner Terraform Provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest
-- Hetzner Networks: https://docs.hetzner.com/cloud/networks/overview/
-- Hetzner Firewalls: https://docs.hetzner.com/cloud/firewalls/overview
-- Hetzner Placement Groups: https://docs.hetzner.com/cloud/placement-groups/overview
-- Docker Swarm overlay networking: https://docs.docker.com/engine/network/drivers/overlay/
-- Gitea act_runner: https://docs.gitea.com/usage/actions/act-runner
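The quick verification checklist in the previous README required the generated inventory YAML to be exported under `ansible/{test,prod}/inventory/generated/`. Below is a minimal Bash sketch of that check. It assumes the hosts appear by name in the file, because the exact layout of the `ansible_inventory_yaml` output is not shown in this commit. `check_inventory` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify a generated Ansible inventory file exists,
# is non-empty, and mentions every expected host name as a whole word.
# Usage: check_inventory FILE HOST...
check_inventory() {
  local file=$1 host missing=0
  shift
  if [ ! -s "$file" ]; then
    echo "inventory missing or empty: $file" >&2
    return 1
  fi
  for host in "$@"; do
    # -F: literal match, -w: whole word (so iklim-app-01 does not match iklim-app-011)
    if ! grep -qwF -- "$host" "$file"; then
      echo "host not found in $file: $host" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, from the repository root: `check_inventory ansible/prod/inventory/generated/prod.yml iklim-app-0{1,2,3} iklim-db-0{1,2,3}`. Every missing host is reported, not only the first one, so a single run shows the whole gap.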

Binary file not shown (image, 1.5 MiB).

View File

@@ -18,9 +18,9 @@
 | `iklim-app-01` | API services, SWAG, Vault | Manager + Worker | `type=service` |
 | `iklim-app-02` | API services replicas | Manager + Worker | `type=service` |
 | `iklim-app-03` | API services replicas | Manager + Worker | `type=service` |
-| `iklim-db-01` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |
-| `iklim-db-02` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |
-| `iklim-db-03` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |
+| `iklim-db-01` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db`, `db-index=01` |
+| `iklim-db-02` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db`, `db-index=02` |
+| `iklim-db-03` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db`, `db-index=03` |
 ### Label scheme rationale
@@ -95,9 +95,9 @@ docker swarm join --token <WORKER_TOKEN> 10.20.10.11:2377
 Then label them on iklim-app-01:
 ```bash
-for node in iklim-db-01 iklim-db-02 iklim-db-03; do
-  docker node update --label-add role=db "$node"
-done
+docker node update --label-add role=db --label-add db-index=01 iklim-db-01
+docker node update --label-add role=db --label-add db-index=02 iklim-db-02
+docker node update --label-add role=db --label-add db-index=03 iklim-db-03
 ```
 > DB nodes are Swarm **workers** only — they never become managers.
@@ -117,7 +117,7 @@ docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
 docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
 ```
-Expected: `map[type:service]` for app nodes, `map[role:db]` for DB nodes.
+Expected: `map[type:service]` for app nodes, `map[db-index:01 role:db]` (etc.) for DB nodes.
 ## Step 7 — Confirm `init/swarm-init.sh` multi-node awareness
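The expected label output can be checked mechanically on each DB node instead of by eye. A minimal sketch, assuming two things:

- `{{.Spec.Labels}}` prints the `map[key:value ...]` shape shown in the expected output.
- `db-index` is always the two-digit suffix of the hostname.

`check_db_labels` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify a DB node's label map, as printed by
#   docker node inspect NODE --format '{{.Spec.Labels}}'
# contains role:db and a db-index that matches the hostname suffix.
# Usage: check_db_labels NODE LABELS
check_db_labels() {
  local node=$1 labels=$2
  local index=${node##*-}            # iklim-db-02 -> 02
  local inner=${labels#"map["}       # drop the leading "map["
  inner=" ${inner%"]"} "             # drop the trailing "]", pad with spaces
  if [[ $inner != *" role:db "* ]]; then
    echo "$node: missing role:db" >&2
    return 1
  fi
  if [[ $inner != *" db-index:$index "* ]]; then
    echo "$node: expected db-index:$index" >&2
    return 1
  fi
}
```

Usage on a manager node: `for n in iklim-db-01 iklim-db-02 iklim-db-03; do check_db_labels "$n" "$(docker node inspect "$n" --format '{{.Spec.Labels}}')"; done`. The padding with spaces is what lets the match target whole `key:value` entries instead of substrings.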

View File

@@ -39,9 +39,9 @@ The deploy pipeline (see `08-deploy-pipeline-update.md`) runs on iklim-app-01:
 ```bash
 set -a; . ./.env; set +a
-mkdir -p "$SWAG_DNS_CONF_DIR"
-envsubst < swag/dns-conf/godaddy.ini.tpl > "$SWAG_DNS_CONF_DIR/godaddy.ini"
-chmod 600 "$SWAG_DNS_CONF_DIR/godaddy.ini"
+mkdir -p "$SWAG_CONFIG_DIR/dns-conf"
+envsubst < swag/dns-conf/godaddy.ini.tpl > "$SWAG_CONFIG_DIR/dns-conf/godaddy.ini"
+chmod 600 "$SWAG_CONFIG_DIR/dns-conf/godaddy.ini"
 ```
 ## Step 4 — GoDaddy A records for prod subdomains (handled by pipeline)
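The snippet renders `godaddy.ini` first and only then runs `chmod 600`. That leaves a short window in which the credentials file carries the default permissions. Creating the file under a restrictive `umask` closes that window. The sketch below shows this as an alternative and is not the project's script. `write_secret_file` is a hypothetical name. On a network mount such as the StorageBox, effective permissions may be governed by the mount options rather than by `umask`/`chmod`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write stdin to a secret file that is never readable
# by group/others, creating the parent directory if needed.
# Usage: some_command | write_secret_file PATH
write_secret_file() {
  local path=$1
  mkdir -p "$(dirname "$path")"
  # umask 077 applies only inside the subshell, so a newly created file is
  # 0600 from the first byte. umask does not change an existing file's mode,
  # so chmod still runs afterwards as a safety net.
  ( umask 077 && cat > "$path" ) && chmod 600 "$path"
}
```

Usage: `envsubst < swag/dns-conf/godaddy.ini.tpl | write_secret_file "$SWAG_CONFIG_DIR/dns-conf/godaddy.ini"`.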

View File

@@ -35,13 +35,13 @@ Test-env Step 9 adds the `swag-vl` named volume to the base file. In prod, SWAG
 No `swag-vl` definition is made in `docker-stack-infra.prod.yml`.
-### Monitoring Persistence (StorageBox)
-Prometheus and Grafana run as single instances. To ensure monitoring data and dashboards survive a node failover (moving from `iklim-app-01` to another node), their data is stored on the shared StorageBox:
-- **Prometheus:** `/mnt/storagebox/prometheus/data`
-- **Grafana:** `/mnt/storagebox/grafana/data`
-These paths are mounted via env vars (`PROMETHEUS_DATA_DIR`, `GRAFANA_DATA_DIR`) with named-volume fallbacks for test. See Step 8 for implementation details.
+### Monitoring Persistence
+Prometheus and Grafana run as single instances, but their storage profiles are different:
+- **Prometheus:** keep TSDB on a local Docker volume (`prometheus-vl`). Prometheus local storage should not run on StorageBox/DAVFS because of filesystem semantics and WAL/compaction I/O.
+- **Grafana:** keep `/var/lib/grafana` on StorageBox (`/mnt/storagebox/grafana/data`) so dashboards, plugins, and the SQLite database are available if the single active instance is manually moved to another node.
+Grafana uses the `GRAFANA_DATA_DIR` env var with a named-volume fallback for test. Prometheus continues to use the named Docker volume. See Step 9 for implementation details.
 **Note:** PostgreSQL and MongoDB are not in `docker-stack-infra.yml`. They run in separate stacks on DB nodes (`iklim-db` and `iklim-patroni`). See `08-prod-db-cluster-kurulum.md`.
@@ -225,6 +225,7 @@ nc -zv iklim-db-01 2379
 ## Step 5 — Redis: Sentinel cluster (prod overlay)
 Redis runs as a single instance in test. In prod, Sentinel provides HA.
+![[redis-sentinel-vs-cluster.png]]
 Bitnami images are used — all configuration is done via env vars, no separate `.conf` file needed.
 ### Prerequisites
@@ -461,16 +462,13 @@ Consistent hashing by `remote_addr` requires APISIX to see the actual client IP,
 > **DNS Note:** For `chash` to work with node-specific names, the RabbitMQ service must have network aliases configured for each node (e.g., `rabbitmq-{{.Node.Hostname}}`) as shown in Step 6.
-Update `template/apisix-core/config.yaml.template`:
+In the `config.yaml` inside the custom APISIX image (`custom-apisix:3.12.0`):
 ```yaml
 nginx_config:
   http:
-    real_ip_header: "X-Forwarded-For"
-    real_ip_from:
-      - 10.0.0.0/8
-      - 172.16.0.0/12
-      - 192.168.0.0/16
+    real_ip_header: "X-Real-IP"
+    set_real_ip_from: "10.0.0.0/8"
 ```
 ## Step 8 — Create `docker-stack-infra.prod.yml`
@@ -496,7 +494,7 @@ services:
 "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
     volumes:
       - /opt/iklimco/vault/data:/vault/file
-      - /mnt/storagebox/ssl:/vault/certs:ro
+      - ${SWAG_CERT_DIR}:/vault/certs:ro
     deploy:
       mode: replicated
       replicas: 3
@@ -534,7 +532,7 @@ services:
       replicas: 1
       placement:
         constraints:
-          - node.hostname == iklim-app-01
+          - node.labels.type == service
       restart_policy:
         condition: any
         delay: 5s
@@ -625,41 +623,40 @@ networks:
     external: true
 ```
-## Step 8 — Monitoring Data Persistence (StorageBox)
-Prometheus and Grafana run as single instances. Without persistent storage, data is lost on node failover. This step mounts their data directories from the StorageBox shared filesystem.
+## Step 9 — Monitoring Data Persistence
+Prometheus and Grafana run as single instances. Grafana data is placed on the StorageBox shared filesystem for manual failover. Prometheus TSDB stays on a local Docker volume because DAVFS/StorageBox is not suitable for Prometheus WAL and compaction I/O.
 **Changes already applied to `docker-stack-infra.yml`:**
 ```yaml
 prometheus:
   volumes:
-    - ${PROMETHEUS_DATA_DIR:-prometheus-vl}:/prometheus
+    - prometheus-vl:/prometheus
 grafana:
   volumes:
     - ${GRAFANA_DATA_DIR:-grafana-vl}:/var/lib/grafana
 ```
-Test uses the named Docker volume fallbacks (`prometheus-vl`, `grafana-vl`) — no test env change needed.
+Test uses the named Docker volume fallback (`grafana-vl`) for Grafana, and Prometheus always uses the named Docker volume (`prometheus-vl`) — no test env change needed.
 **Add to `prod/secrets/iklim.co/.env.prod` on storagebox** (already in `env-prod/.env`):
 ```bash
-PROMETHEUS_DATA_DIR=/mnt/storagebox/prometheus/data
 GRAFANA_DATA_DIR=/mnt/storagebox/grafana/data
 ```
 **Create directories on StorageBox before first prod deploy:**
 ```bash
-mkdir -p /mnt/storagebox/prometheus/data /mnt/storagebox/grafana/data
+mkdir -p /mnt/storagebox/grafana/data
 ```
 > Grafana writes its SQLite database and dashboard JSON to `/var/lib/grafana`.
-> Prometheus writes its TSDB to `/prometheus`. Both directories must exist before the stack starts.
-## Step 9 — Verify
+> Prometheus writes its TSDB to `/prometheus` on the local `prometheus-vl` Docker volume; it is not shared between nodes.
+## Step 10 — Verify
 ```bash
 # Base file must be valid on its own (test deploy):
@@ -669,6 +666,18 @@ docker stack config -c docker-stack-infra.yml > /dev/null && echo "base OK"
 docker stack config -c docker-stack-infra.yml -c docker-stack-infra.prod.yml > /dev/null && echo "prod merge OK"
 ```
+## Step 9 — Database Proxies and Developer Access
+In the production environment, the `pg-proxy` and `mongo-proxy` services (socat-based) defined in the base `docker-stack-infra.yml` are **deprecated and will not be used**.
+### Rationale
+- **Leader Tracking:** Simple L4 proxies (socat) cannot track the Patroni Leader or MongoDB Primary. They point to a single service VIP, which might lead to a Read-Only replica during failover.
+- **HA Connection Strings:** Modern DB drivers (JDBC, libpq, MongoClient) support multi-host connection strings, which provide native failover and load balancing without an intermediate proxy.
+### Developer Access Strategy
+- **Direct Subnet Access:** Developers connect via WireGuard directly to the DB subnet (`10.20.20.0/24`).
+- **No Translation:** Instead of mapping ports like `15432`, the standard ports (`5432`, `27017`) are used across all cluster nodes.
 ## Placement and Replica Summary — prod
 | Service | File | Replicas | Placement | HA Note |
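The added "Database Proxies and Developer Access" section replaces the socat proxies with multi-host connection strings. A minimal sketch of building them from the three DB node IPs follows.

- The user, database, and replica-set names (`app`, `iklim`, `rs0`) are placeholders.
- Passwords are deliberately left out, so they come from a prompt or `~/.pgpass` rather than from shell history.
- Extra URI options such as `authSource` may be needed, depending on how the MongoDB users are set up.
- `pg_multihost_uri` and `mongo_rs_uri` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: build multi-host connection URIs for the DB nodes.

# libpq URI: hosts are tried in order; target_session_attrs=read-write keeps
# only a session that accepts writes (the current Patroni leader).
# Usage: pg_multihost_uri USER DBNAME HOST...
pg_multihost_uri() {
  local user=$1 db=$2 h hosts=""
  shift 2
  for h in "$@"; do
    hosts+="${hosts:+,}${h}:5432"   # comma only between entries
  done
  printf 'postgresql://%s@%s/%s?target_session_attrs=read-write\n' "$user" "$hosts" "$db"
}

# MongoDB URI: replicaSet makes the driver discover the topology and send
# writes to the current primary.
# Usage: mongo_rs_uri USER DBNAME REPLSET HOST...
mongo_rs_uri() {
  local user=$1 db=$2 rs=$3 h hosts=""
  shift 3
  for h in "$@"; do
    hosts+="${hosts:+,}${h}:27017"
  done
  printf 'mongodb://%s@%s/%s?replicaSet=%s\n' "$user" "$hosts" "$db" "$rs"
}
```

Usage over the WireGuard tunnel: `psql "$(pg_multihost_uri app iklim 10.20.20.11 10.20.20.12 10.20.20.13)"` or `mongosh "$(mongo_rs_uri app iklim rs0 10.20.20.11 10.20.20.12 10.20.20.13)"`. Both clients accept a URI as the connection target.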

View File

@@ -11,15 +11,13 @@ API_SUBDOMAIN=api.iklim.co
 APIGW_SUBDOMAIN=apigw.iklim.co
 RABBITMQ_SUBDOMAIN=rabbitmq.iklim.co
 GRAFANA_SUBDOMAIN=grafana.iklim.co
-RESTRICTED_IP_1=78.187.87.109
-RESTRICTED_IP_2=95.70.151.248
+RESTRICTED_IPS="78.187.87.109/32,95.70.151.248/32"
 # SWAG storage paths — StorageBox is mounted on all app nodes, shared filesystem
 # cert-reloader writes here; Vault reads from this path on every node — no SSH distribution needed
 SWAG_CERT_DIR=/mnt/storagebox/ssl
 # SWAG config dirs on StorageBox — all three survive node failover without pipeline re-run
 SWAG_CONFIG_DIR=/mnt/storagebox/swag/config
-SWAG_DNS_CONF_DIR=/mnt/storagebox/swag/dns-conf
 SWAG_SITE_CONFS_DIR=/mnt/storagebox/swag/site-confs
 ```
@@ -37,14 +35,14 @@ No new files to create — the same templates work for both environments.
 ```bash
 set -a; . ./.env; set +a
-export RESTRICTED_IP_1="78.187.87.109"
-export RESTRICTED_IP_2="95.70.151.248"
-mkdir -p "$SWAG_DNS_CONF_DIR" "$SWAG_SITE_CONFS_DIR"
+export RESTRICTED_IPS_BLOCK="$(echo "$RESTRICTED_IPS" | tr ',' '\n' | sed 's|.*| allow &;|')"
+mkdir -p "$SWAG_SITE_CONFS_DIR"
+SWAG_VARS='${API_SUBDOMAIN}${APIGW_SUBDOMAIN}${GRAFANA_SUBDOMAIN}${RABBITMQ_SUBDOMAIN}${RESTRICTED_IPS_BLOCK}'
 for tpl in swag/site-confs/*.conf.tpl; do
   out="$SWAG_SITE_CONFS_DIR/$(basename "${tpl%.tpl}")"
-  envsubst < "$tpl" | sudo tee "$out" > /dev/null
+  envsubst "$SWAG_VARS" < "$tpl" | sudo tee "$out" > /dev/null
   echo "✅ $out"
 done
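Passing `$SWAG_VARS` as the format argument to `envsubst` limits substitution to the listed variables. Any nginx runtime variables in the templates (such as `$host`) are therefore left intact instead of being expanded to empty strings.

The `tr | sed` one-liner that builds `RESTRICTED_IPS_BLOCK` still has an edge case. A trailing comma in `RESTRICTED_IPS` yields an empty ` allow ;` line, which nginx rejects. A slightly more defensive variant is sketched below, assuming the template only needs one `allow` directive per line. The surrounding template is not shown here, and indentation is omitted because nginx does not require it. `restricted_allow_block` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "ip1,ip2,..." into nginx "allow" lines,
# trimming whitespace and skipping empty entries.
# Usage: restricted_allow_block "$RESTRICTED_IPS"
restricted_allow_block() {
  local -a parts
  local ip
  IFS=',' read -ra parts <<< "$1"   # split on commas only
  for ip in "${parts[@]}"; do
    ip=${ip//[[:space:]]/}          # drop spaces around each entry
    [ -n "$ip" ] || continue        # skip empty entries (e.g. a trailing comma)
    printf 'allow %s;\n' "$ip"
  done
}
```

Usage: `export RESTRICTED_IPS_BLOCK="$(restricted_allow_block "$RESTRICTED_IPS")"` before the `envsubst "$SWAG_VARS"` loop.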

View File

@@ -34,11 +34,12 @@ vault:
 "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
   volumes:
     - /opt/iklimco/vault/data:/vault/file # host path per node
-    - /mnt/storagebox/ssl:/vault/certs:ro # StorageBox — shared across all nodes, no SSH distribution needed
+    - ${SWAG_CERT_DIR}:/vault/certs:ro # StorageBox — shared across all nodes, no SSH distribution needed
   deploy:
     mode: replicated
     replicas: 3
     placement:
+      max_replicas_per_node: 1
       constraints:
         - node.labels.type == service
 ```
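With `max_replicas_per_node: 1`, Swarm should not place two Vault tasks on the same node. If fewer eligible nodes exist than replicas, the surplus tasks stay pending instead. A minimal post-deploy check reads one node name per running task and fails on duplicates or a short count. It assumes the service is named `iklimco_vault`, which is hypothetical; substitute whatever `docker service ls` shows. `assert_spread` is an illustrative name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read node names (one per task) on stdin and fail if
# any node hosts more than one task, or if fewer than EXPECTED tasks are placed.
# Usage: ... | assert_spread EXPECTED
assert_spread() {
  local expected=$1 nodes count dups
  nodes=$(grep -v '^$')              # pending tasks print an empty node name
  count=$(printf '%s\n' "$nodes" | grep -c .)
  dups=$(printf '%s\n' "$nodes" | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "more than one task on: $dups" >&2
    return 1
  fi
  if [ "$count" -ne "$expected" ]; then
    echo "expected $expected placed tasks, found $count" >&2
    return 1
  fi
}
```

Usage: `docker service ps iklimco_vault --filter desired-state=running --format '{{.Node}}' | assert_spread 3`. Counting placed tasks as well as duplicates makes the check also catch the pending-task case described above.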

View File

@@ -10,7 +10,7 @@
 - Storagebox paths via env vars (`SWAG_CERT_DIR`, `SWAG_CONFIG_DIR`, etc.) instead of local host paths
 - Extra steps: Update DNS Records (GoDaddy API), Wait for etcd
-## Step 1 — Remove manual cert scp lines from `Initialize Servers`
+## Step 1 — Remove manual cert scp lines from `Initialize Workspace`
 ```yaml
 # DELETE from "Initialize Servers" step:
@@ -42,23 +42,23 @@ Insert **after** `Docker Login to Harbor` and **before** `Prepare SWAG Directori
         2>/dev/null | jq -r '.[0].data // empty' 2>/dev/null || true)
       if [ "$CURRENT" = "$FLOATING_IP" ]; then
-        echo "✅ ${record}.${DOMAIN} → ${FLOATING_IP} (mevcut, atlanıyor)"
+        echo "✅ ${record}.${DOMAIN} → ${FLOATING_IP} (exists, skipping)"
       else
         curl -sf -X PUT \
           -H "Authorization: sso-key ${GODADDY_KEY}:${GODADDY_SECRET}" \
           -H "Content-Type: application/json" \
           "https://api.godaddy.com/v1/domains/${DOMAIN}/records/A/${record}" \
           -d "[{\"data\":\"${FLOATING_IP}\",\"ttl\":600}]"
-        echo "✅ ${record}.${DOMAIN} → ${FLOATING_IP} (eklendi/güncellendi)"
+        echo "✅ ${record}.${DOMAIN} → ${FLOATING_IP} (added/updated)"
       fi
     done
   working-directory: /workspace/iklim.co
 ```
-> `GODADDY_KEY` ve `GODADDY_SECRET` `.env.secrets.swag`'dan okunur.
-> `PROD_FLOATING_IP` Gitea project variable olarak tanımlanmalı (`terraform output prod_floating_ip`).
-> `jq` gereklidir — `Update Apt Repository` adımına eklenmiş olmalı: `apt-get install -y gettext tree jq`.
-> Her deploy'da çalışır; mevcut ve doğru kayıtlar atlanır (idempotent).
+> `GODADDY_KEY` and `GODADDY_SECRET` are read from `.env.secrets.swag`.
+> `PROD_FLOATING_IP` must be defined as a Gitea project variable (`terraform output prod_floating_ip`).
+> `jq` is required — it must have been added to the `Update Apt Repository` step: `apt-get install -y gettext tree jq`.
+> Runs on every deploy; existing and correct records are skipped (idempotent).
 ## Step 3 — Add `Prepare SWAG Directories` step
@@ -69,10 +69,10 @@ Insert **before** `Bootstrap Vault TLS Placeholder`:
   run: |
     set -a; . ./.env; . ./.env.secrets.swag; set +a
-    mkdir -p "$SWAG_CONFIG_DIR" "$SWAG_DNS_CONF_DIR" "$SWAG_SITE_CONFS_DIR"
+    mkdir -p "$SWAG_CONFIG_DIR/dns-conf" "$SWAG_SITE_CONFS_DIR"
     envsubst < swag/dns-conf/godaddy.ini.tpl | docker run --rm -i \
-      -v "${SWAG_DNS_CONF_DIR}:/output" \
+      -v "${SWAG_CONFIG_DIR}/dns-conf:/output" \
       alpine sh -c "cat > /output/godaddy.ini && chmod 600 /output/godaddy.ini"
     echo "✅ godaddy.ini written"
@@ -112,20 +112,20 @@ APISIX reads its entire configuration from etcd; init script will fail silently
 ```yaml
 - name: Wait for etcd
   run: |
-    echo "⏳ Waiting for etcd..."
+    echo "⏳ Waiting for Patroni etcd..."
     for i in $(seq 1 30); do
       if docker run --rm --network iklimco-net alpine \
-        sh -c "wget -qO- http://etcd:2379/health 2>/dev/null | grep -q '\"health\":\"true\"'"; then
-        echo "✅ etcd ready"
+        sh -c "wget -qO- http://iklim-db-01:2379/health 2>/dev/null | grep -q '\"health\":\"true\"'"; then
+        echo "✅ Patroni etcd ready"
         break
       fi
-      [ "$i" -eq 30 ] && echo "❌ etcd did not become ready in time" && exit 1
+      [ "$i" -eq 30 ] && echo "❌ Patroni etcd did not become ready in time" && exit 1
       echo " attempt $i/30 — waiting 5s..."
       sleep 5
     done
 ```
-> **Note:** In prod, the standalone `etcd` service from `docker-stack-infra.yml` still runs (Docker Compose overlay files cannot remove services). APISIX currently uses this etcd; the Patroni etcd migration happens via `docker-stack-infra.prod.yml`. The `http://etcd:2379/health` check targets this standalone service and is correct for the current setup.
+> **Note:** In prod, APISIX uses the 3-node Patroni etcd cluster on DB nodes (`iklim-db-01/02/03:2379`) via the `/apisix` prefix — configured in `config.yaml` mounted by the prod overlay. The standalone `etcd` service from the base stack remains idle. This step waits for Patroni etcd (`iklim-db-01:2379`) to be healthy before running the APISIX init script.
 ## Step 5 — Add `Run APISIX Init` step
@@ -247,9 +247,9 @@ Insert **after** `Bootstrap SWAG Certificate` and **before** `Review Environment
## Step 8 — Microservice prod deploy overlay ## Step 8 — Microservice prod deploy overlay
Her mikroservisin kendi `docker-stack-service.prod.yml` overlay dosyası vardır. Bu dosya prod'a özgü `replicas: 3` ve `max_replicas_per_node: 1` ayarlarını içerir. Each microservice has its own `docker-stack-service.prod.yml` overlay file. This file contains prod-specific `replicas: 3` and `max_replicas_per_node: 1` settings.
Mikroservis deploy pipeline'larında (`deploy-prod.yml`) `docker stack deploy` komutu şu şekilde olmalı: In microservice deploy pipelines (`deploy-prod.yml`), the `docker stack deploy` command should be:
```bash ```bash
docker stack deploy \ docker stack deploy \
@@ -258,7 +258,7 @@ docker stack deploy \
iklimco iklimco
``` ```
Örneğin `BE-Authentication` için: For example, for `BE-Authentication`:
```bash ```bash
docker stack deploy \ docker stack deploy \
@@ -267,7 +267,7 @@ docker stack deploy \
iklimco iklimco
``` ```
> Yeni bir mikroservis eklendiğinde `BE-<ServiceName>/docker-stack-service.prod.yml` dosyasının oluşturulması ve pipeline'ın bu overlay'i içermesi zorunludur. > When a new microservice is added, `BE-<ServiceName>/docker-stack-service.prod.yml` must be created and the pipeline must include this overlay.
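A service pipeline could catch a missing overlay before `docker stack deploy` runs. Below is a minimal sketch of such a fail-fast check. It assumes the workflow lives at `.gitea/workflows/deploy-prod.yml` inside each `BE-<ServiceName>` repo. That path is an assumption; this guide supplies only the file names.

```bash
#!/usr/bin/env bash
# Sketch only: fail a service pipeline early when the prod overlay is missing
# or is not referenced by the deploy workflow.
# Assumption: the workflow path below; only the file names come from this guide.
check_prod_overlay() {
  local repo=${1:-.}
  local overlay="$repo/docker-stack-service.prod.yml"
  local workflow="$repo/.gitea/workflows/deploy-prod.yml"

  if [ ! -f "$overlay" ]; then
    echo "❌ missing $overlay" >&2
    return 1
  fi
  # The overlay must also be passed to docker stack deploy (-c ...).
  if ! grep -q 'docker-stack-service.prod.yml' "$workflow" 2>/dev/null; then
    echo "❌ $workflow does not reference docker-stack-service.prod.yml" >&2
    return 1
  fi
  echo "✅ prod overlay present and referenced"
}
```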
## Step 9 — Ensure subdomain env vars are in prod `.env` ## Step 9 — Ensure subdomain env vars are in prod `.env`
@@ -282,27 +282,34 @@ GRAFANA_SUBDOMAIN=grafana.iklim.co
## Step 10 — Final step order for prod pipeline ## Step 10 — Final step order for prod pipeline
1. Acquire Deploy Lock To prevent concurrent deploys, a Gitea Actions `concurrency` block is added per pipeline:
2. Checkout Branch
3. Prepare Folders ```yaml
4. Set up SSH Key and Add to known_hosts concurrency:
5. Update Apt Repository and Install Required Tools (`gettext tree jq`) group: prod-deploy
6. Fetch Service Secret Files cancel-in-progress: false
7. Initialize Servers ← cert scp lines removed ```
8. Upload Updated Secrets to Storagebox
9. Provision Vault AppRole IDs and Docker Secrets With `cancel-in-progress: false`, a new run waits in the queue until the previous one finishes; Gitea UI shows it as "queued" and does not return an error.
10. Upload Updated Env to Storagebox
11. Prepare Init Files ← cert copy lines removed 1. Checkout Branch
12. Initialize Docker Swarm 2. Prepare Folders
13. Stop Docker Compose Services 3. Set up SSH Key and Add to known_hosts
14. Docker Login to Harbor 4. Update Apt Repository and Install Required Tools (`gettext tree jq`)
15. **Update DNS Records** ← NEW (GoDaddy API, idempotent) 5. Fetch Service Secret Files
16. **Prepare SWAG Directories** ← NEW 6. Initialize Workspace ← cert scp lines removed
17. Bootstrap Vault TLS Placeholder 7. Upload Updated Secrets to Storagebox
18. Deploy Swarm Stack 8. Provision Vault AppRole IDs and Docker Secrets
19. **Wait for etcd** ← NEW 9. Upload Updated Env to Storagebox
20. **Run APISIX Init** ← NEW (`SPRING_PROFILES_ACTIVE=prod`) 10. Prepare Init Files ← cert copy lines removed
21. **Bootstrap SWAG Certificate** ← NEW 11. Initialize Docker Swarm
22. **Run Database Init Scripts** ← NEW (`postgresql`, `mongodb`) 12. Docker Login to Harbor
23. Review Environment 13. **Update DNS Records** ← NEW (GoDaddy API, idempotent)
24. Release Deploy Lock 14. **Prepare SWAG Directories** ← NEW (`$SWAG_CONFIG_DIR/dns-conf`; renders nginx conf templates)
15. Bootstrap Vault TLS Placeholder
16. Deploy Swarm Stack
17. **Wait for etcd** ← NEW (Patroni etcd `iklim-db-01:2379`)
18. **Run APISIX Init** ← NEW (`SPRING_PROFILES_ACTIVE=prod`)
19. **Bootstrap SWAG Certificate** ← NEW
20. **Run Database Init Scripts** ← NEW (`postgresql`, `mongodb`)
21. Review Environment


@@ -1,157 +1,137 @@
# 00 - Genel Yol Haritasi # 00 - General Roadmap
Bu dosya, `Environment_Infrastructure` reposunda Terraform ve Ansible ile Hetzner Cloud uzerinde test/prod altyapisini kuracak ajanlar icin ana baglamdir. Her asama dosyasi kendi basina yeterli olacak sekilde yazilmistir; yine de bu dokuman genel karar kaydidir. This file is the main context for agents that will set up the test/prod infrastructure on Hetzner Cloud with Terraform and Ansible in the `Environment_Infrastructure` repo. Each phase file is written to be self-sufficient; nevertheless, this document is the general decision record.
## Hedef ## Goal
Iklim.co altyapisi iki ayri Hetzner Cloud Project uzerinde kurulacak: The Iklim.co infrastructure will be set up on two separate Hetzner Cloud Projects:
- `test` Hetzner Cloud Project - `test` Hetzner Cloud Project
- `prod` Hetzner Cloud Project - `prod` Hetzner Cloud Project
Bu ayrim zorunlu kabul edilir. API token, network, firewall, placement group, server, maliyet ve yanlislikla silme riskleri ortam bazinda ayrilmis olur. This separation is considered mandatory. API tokens, networks, firewalls, placement groups, servers, costs, and accidental deletion risks are separated by environment.
## Terraform ve Ansible Sorumluluk Siniri ## Terraform and Ansible Responsibility Boundary
Terraform sadece IaaS kaynaklarini olusturur: Terraform creates only IaaS resources:
- Hetzner Cloud server - Hetzner Cloud server
- Private network ve subnet - Private network and subnet
- Firewall - Firewall
- SSH key - SSH key
- Placement group - Placement group
- Opsiyonel volume, floating IP, load balancer veya DNS kaydi - Optional volume, floating IP, load balancer, or DNS record
- Ansible inventory output - Ansible inventory output
Ansible olusan Linux makineleri hazirlar: Ansible prepares the created Linux machines:
- Linux base paketleri - Linux base packages
- Security hardening - Security hardening
- Docker Engine kurulumu - Docker Engine installation
- Docker Swarm init/join - Docker Swarm init/join
- Gitea Actions `act_runner` systemd kurulumu - Gitea Actions `act_runner` systemd installation
- Ortak klasorler ve deploy on kosullari - Shared directories and deploy prerequisites
Terraform icinde Docker, Swarm, runner veya uygulama deploy'u yapilmayacak. Ansible icinde Hetzner Cloud kaynaklari yaratilmeyecek. Docker, Swarm, runner, or application deployment will not be done inside Terraform. Hetzner Cloud resources will not be created inside Ansible.
## Ortam Topolojileri ## Environment Topologies
### Test ### Test
Test ortami minimum topoloji: Minimum topology for the test environment:
| Node | Rol | Not | | Node | Role | Note |
| --- | --- | --- | | --- | --- | --- |
| `iklim-app-01` | Swarm manager + app worker + Gitea runner | CI/CD test deploy bu node uzerinden calisir | | `iklim-app-01` | Swarm manager + app worker + Gitea runner | CI/CD test deploy runs through this node |
| `iklim-db-01` | DB node | DB altyapisi manuel kurulacak; Gitea CI/CD ile kurulmayacak | | `iklim-db-01` | DB node | DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD |
Test DB kurulumu Terraform/Ansible ile sadece makine ve OS hazirligina kadar getirilir. PostgreSQL/MongoDB cluster kurulumu bu asamanin disindadir. The test DB setup is brought only up to machine and OS preparation with Terraform/Ansible. PostgreSQL/MongoDB cluster installation is outside this phase.
### Prod ### Prod
Prod ortami HA topoloji: HA topology for the prod environment:
| Node grubu | Adet | Rol | | Node group | Count | Role |
| --- | ---: | --- | | --- | ---: | --- |
| `iklim-app-*` | 3 | Her biri Swarm manager + app worker | | `iklim-app-*` | 3 | Each one is a Swarm manager + app worker |
| `iklim-db-*` | 3 | DB cluster node'lari | | `iklim-db-*` | 3 | DB cluster nodes |
Prod DB altyapisi manuel kurulacak; Gitea CI/CD ile kurulmayacak. Terraform DB makinelerini ve network/firewall kurallarini hazirlar, Ansible OS hardening ve temel bagimliliklari kurar. Prod DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD. Terraform prepares the DB machines and network/firewall rules; Ansible installs OS hardening and base dependencies.
## Public Port Politikasi ## Public Port Policy
Public internete acik portlar sadece: Ports open to the public internet are only:
- `22/tcp` SSH, sadece admin IP/CIDR kaynaklarindan - `22/tcp` SSH, only from admin IP/CIDR sources
- `80/tcp` HTTP - `80/tcp` HTTP
- `443/tcp` HTTPS - `443/tcp` HTTPS
`8200/tcp` Vault public internete acilmayacak. Vault sadece private network veya Docker overlay icinden erisilebilir olmalidir. `8200/tcp` Vault will not be opened to the public internet. Vault must be reachable only from the private network or Docker overlay.
`docker-stack-infra.yml` bu politikaya uygun hale getirilmiştir: yalnızca SWAG servisi 80/443 portlarını yayınlar; Vault, APISIX, RabbitMQ, Prometheus, Grafana gibi tüm diğer servisler yalnızca `iklimco-net` overlay üzerinden erişilebilir. `docker-stack-infra.yml` has been aligned with this policy: only the SWAG service publishes ports 80/443; all other services such as Vault, APISIX, RabbitMQ, Prometheus, and Grafana are reachable only through the `iklimco-net` overlay.
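The port policy can be spot-checked on a running manager. Below is a minimal sketch that reads `<service> <ports>` lines and prints every service other than SWAG that publishes a port. It assumes the SWAG service name ends in `_swag` (for example `iklimco_swag`). One possible input is `docker service ls --format '{{.Name}} {{.Ports}}'`. Depending on the Docker version, that column may not list host-mode ports, so treat an empty result as a hint rather than as proof.

```bash
#!/usr/bin/env bash
# Sketch only: list services (other than SWAG) that publish ports.
# Input lines: "<service-name> <ports>"; ports may be empty.
# Assumption: the SWAG service name ends in "_swag" (e.g. iklimco_swag).
find_unexpected_published() {
  local name ports
  while read -r name ports; do
    [ -n "$ports" ] || continue   # nothing published: compliant
    case "$name" in
      *_swag) ;;                  # SWAG is the only allowed publisher (80/443)
      *) echo "$name $ports" ;;
    esac
  done
}
```

Usage on a manager node: `docker service ls --format '{{.Name}} {{.Ports}}' | find_unexpected_published`.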
## Private Network Politikasi ## Private Network Policy
Private network icinde acilmasi gereken portlarin ayrintili matrisi `01-private-network-port-matrisi.md` dosyasindadir. Ajanlar firewall veya Ansible UFW kurali yazarken bu dosyayi kaynak kabul etmelidir. The detailed matrix of ports that must be opened inside the private network is in `01-private-network-port-matrisi.md`. Agents must treat that file as the source when writing firewall or Ansible UFW rules.
## Gitea Actions Runner Karari ## Gitea Actions Runner Decision
`act_runner` Docker container olarak calistirilmayacak ve Docker socket container'a mount edilmeyecek. `act_runner` will not run as a Docker container, and the Docker socket will not be mounted into a container.
Tercih edilen kurulum: Preferred installation:
- `act_runner` Linux systemd servisi olarak kurulur. - `act_runner` is installed as a Linux systemd service.
- Runner icin ayri `gitea-runner` kullanicisi olusturulur. - A separate `gitea-runner` user is created for the runner.
- CI/CD job'lari gerekli oldugunda container olusturabilir; bunun icin runner host uzerinde Docker CLI/daemon erisimi gerekir. - CI/CD jobs can create containers when needed; for this, the runner host needs Docker CLI/daemon access.
- Docker group uyeligi root seviyesine yakin yetki verdigi icin sadece guvenilir Gitea repo/job'lari bu runner label'larini kullanmalidir. - Because Docker group membership grants permissions close to root level, only trusted Gitea repos/jobs should use these runner labels.
Prod HA icin `act_runner` tek makineye degil, 3 Swarm manager node'unun tamamına kurulacaktir. Boylece bir manager/runner kaybedildiginde pipeline calismaya devam edebilir. Runner label'lari hem ortak hem node-spesifik olmalidir: For prod HA, `act_runner` will be installed not on a single machine but on all 3 Swarm manager nodes. This allows pipelines to continue when one manager/runner is lost. Runner labels must be both shared and node-specific:
- Ortak: `prod-runner` - Shared: `prod-runner`
- Node spesifik: `iklim-app-01`, `iklim-app-02`, `iklim-app-03` - Node specific: `iklim-app-01`, `iklim-app-02`, `iklim-app-03`
Test icin tek runner yeterlidir: For test, a single runner is enough:
- Ortak: `test-runner` - Shared: `test-runner`
- Node spesifik: `iklim-app-01` - Node specific: `iklim-app-01`
## Deploy Lock Karari ## Deploy Serialization Decision
Prod ortaminda 3 runner HA icin gereklidir; ancak ayni anda birden fazla deploy job'u calistirabilir. Bu nedenle prod deploy islemleri StorageBox uzerinde otomatik lock ile tekillestirilmelidir. Because of the 3-runner HA model in prod, multiple deploy jobs can run at the same time. Gitea Actions `concurrency` is used to prevent concurrent deploys; a StorageBox-based lock mechanism is not required.
Lock dosyalari/klasorleri manuel olusturulmayacak. Workflow basinda atomik `mkdir` ile olusturulacak, deploy bitince `rmdir` ile silinecek. ```yaml
concurrency:
Onerilen StorageBox path'leri: group: prod-deploy
cancel-in-progress: false
```text
prod/locks/prod-deploy.lock
prod/locks/prod-infra.lock
prod/locks/services/<service-name>.lock
``` ```
Baslangic icin en sade ve guvenli model tek global prod deploy lock'tur: With `cancel-in-progress: false`, a new run in the same group is queued by Gitea until the previous one finishes; it appears as "queued" in the UI and is not shown as an error. All prod deploy workflows, including infrastructure and microservices, must use the same `group: prod-deploy` value so infra deploy and microservice deploy cannot overlap.
```text ## Hetzner Physical Host Separation
prod/locks/prod-deploy.lock
```
Bu model tum prod deploy'lari siraya sokar. Daha sonra ihtiyac olursa servis bazli lock modeline gecilebilir. Hetzner Cloud does not allow direct cabinet selection. `Placement Group` is used for the requirement of avoiding the same physical host. A placement group of type `spread` aims to place the cloud servers in the group on different physical hosts.
Ornek akış: Constraints:
```bash - A spread placement group reduces the impact of a single physical host failure.
ssh storagebox 'mkdir -p prod/locks && mkdir prod/locks/prod-deploy.lock' - It does not guarantee protection against a wider failure inside the same datacenter or location.
# deploy islemleri - For location-level disaster recovery, a different location/region distribution must be designed later.
ssh storagebox 'rmdir prod/locks/prod-deploy.lock' - According to Hetzner documentation, there is a maximum limit of 10 servers per spread placement group.
```
`mkdir` atomik oldugu icin lock zaten varsa komut fail olur; bu durumda job beklemeli veya temiz bir hata ile cikmalidir. Workflow fail olsa bile cleanup adimi lock'u silmeye calismalidir. Eski kalmis lock'lari tespit etmek icin lock klasoru icine timestamp, runner adi ve workflow bilgisi yazilabilir. At least two placement groups are recommended for prod:
## Hetzner Fiziksel Host Ayrimi - `iklim-prod-app-spread`: 3 Swarm manager/app nodes
- `iklim-prod-db-spread`: 3 DB nodes
Hetzner Cloud'da kabinet secimi dogrudan yapilmaz. Ayni fiziksel host'a dusmeme ihtiyaci icin `Placement Group` kullanilir. `spread` tipindeki placement group, gruptaki cloud server'lari farkli fiziksel host'lara yerlestirmeyi hedefler. Optional for test:
Kisitlar: - `iklim-test-spread`: `iklim-app-01` and `iklim-db-01`
- Spread placement group, tek fiziksel host arizasinin etkisini azaltir. Sources:
- Ayni datacenter veya lokasyon icindeki daha genis bir arizaya karsi garanti vermez.
- Lokasyon bazli felaket kurtarma icin ileride farkli lokasyon/region dagilimi tasarlanmalidir.
- Hetzner dokumanina gore spread placement group basina en fazla 10 server limiti vardir.
Prod icin en az iki placement group onerilir:
- `iklim-prod-app-spread`: 3 Swarm manager/app node
- `iklim-prod-db-spread`: 3 DB node
Test icin opsiyonel:
- `iklim-test-spread`: `iklim-app-01` ve `iklim-db-01`
Kaynaklar:
- Hetzner Terraform provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest - Hetzner Terraform provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest
- Hetzner Networks: https://docs.hetzner.com/cloud/networks/overview/ - Hetzner Networks: https://docs.hetzner.com/cloud/networks/overview/
- Hetzner Firewalls: https://docs.hetzner.com/cloud/firewalls/overview - Hetzner Firewalls: https://docs.hetzner.com/cloud/firewalls/overview
- Hetzner Placement Groups: https://docs.hetzner.com/cloud/placement-groups/overview - Hetzner Placement Groups: https://docs.hetzner.com/cloud/placement-groups/overview
- Docker Swarm overlay portlari: https://docs.docker.com/engine/network/drivers/overlay/ - Docker Swarm overlay ports: https://docs.docker.com/engine/network/drivers/overlay/
- Gitea act_runner: https://docs.gitea.com/usage/actions/act-runner - Gitea act_runner: https://docs.gitea.com/usage/actions/act-runner


@@ -1,39 +1,39 @@
# 07 - Private Network Port Matrisi # 07 - Private Network Port Matrix
Bu dosya test ve prod ortamlarinda Hetzner private network icinde acilmasi gereken portlari tanimlar. Public internete acik portlar sadece `22/tcp`, `80/tcp`, `443/tcp` olacaktir. Vault `8200/tcp` public acilmayacak. This file defines the ports that must be opened inside the Hetzner private network for test and prod environments. Ports open to the public internet will only be `22/tcp`, `80/tcp`, and `443/tcp`. Vault `8200/tcp` will not be opened publicly.
Bu matris Terraform Hetzner firewall ve Ansible UFW kurallari icin kaynak kabul edilmelidir. This matrix must be treated as the source for Terraform Hetzner firewall and Ansible UFW rules.
## Network PlanI ## Network Plan
### Test ### Test
| Subnet | CIDR | Amac | | Subnet | CIDR | Purpose |
| --- | --- | --- | | --- | --- | --- |
| App/Swarm | `10.10.10.0/24` | `iklim-app-01` | | App/Swarm | `10.10.10.0/24` | `iklim-app-01` |
| DB | `10.10.20.0/24` | `test-db-01` | | DB | `10.10.20.0/24` | `test-db-01` |
### Prod ### Prod
| Subnet | CIDR | Amac | | Subnet | CIDR | Purpose |
| --- | --- | --- | | --- | --- | --- |
| App/Swarm | `10.20.10.0/24` | `iklim-app-01/02/03` | | App/Swarm | `10.20.10.0/24` | `iklim-app-01/02/03` |
| DB | `10.20.20.0/24` | `prod-db-01/02/03` | | DB | `10.20.20.0/24` | `prod-db-01/02/03` |
## Public Ingress Standardi ## Public Ingress Standard
Tum ortamlar icin public ingress: Public ingress for all environments:
| Port | Protocol | Kaynak | Hedef | Zorunluluk | | Port | Protocol | Source | Target | Requirement |
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
| `22` | TCP | Admin IP/CIDR | Tum node'lar | SSH yonetim | | `22` | TCP | Admin IP/CIDR | All nodes | SSH management |
| `80` | TCP | Internet | `iklim-app-01` (gateway) | HTTP / ACME redirect | | `80` | TCP | Internet | `iklim-app-01` (gateway) | HTTP / ACME redirect |
| `443` | TCP | Internet | `iklim-app-01` (gateway) | HTTPS | | `443` | TCP | Internet | `iklim-app-01` (gateway) | HTTPS |
| `51820` | UDP | `0.0.0.0/0`, `::/0` | `iklim-db-01` (DB node) | WireGuard VPN — DB node yonetim erisimi | | `51820` | UDP | `0.0.0.0/0`, `::/0` | `iklim-db-01` (DB node) | WireGuard VPN — authentication with cryptographic key |
Public olarak acilmayacak kritik portlar: Critical ports that will not be opened publicly:
| Port | Servis | | Port | Service |
| --- | --- | | --- | --- |
| `8200/tcp` | Vault | | `8200/tcp` | Vault |
| `5432/tcp` | PostgreSQL | | `5432/tcp` | PostgreSQL |
@@ -45,120 +45,119 @@ Public olarak acilmayacak kritik portlar:
| `9090/tcp` | Prometheus | | `9090/tcp` | Prometheus |
| `3000/tcp` | Grafana | | `3000/tcp` | Grafana |
## Docker Swarm Private Portlari ## Docker Swarm Private Ports
Docker Swarm node'lari arasinda zorunlu portlar: Required ports between Docker Swarm nodes:
| Port | Protocol | Kaynak | Hedef | Aciklama | | Port | Protocol | Source | Target | Description |
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
| `2377` | TCP | Swarm node'lari | Swarm manager node'lari | Swarm control plane / join | | `2377` | TCP | Swarm nodes | Swarm manager nodes | Swarm control plane / join |
| `7946` | TCP | Tum Swarm node'lari | Tum Swarm node'lari | Node discovery / gossip | | `7946` | TCP | All Swarm nodes | All Swarm nodes | Node discovery / gossip |
| `7946` | UDP | Tum Swarm node'lari | Tum Swarm node'lari | Node discovery / gossip | | `7946` | UDP | All Swarm nodes | All Swarm nodes | Node discovery / gossip |
| `4789` | UDP | Tum Swarm node'lari | Tum Swarm node'lari | Overlay VXLAN data path | | `4789` | UDP | All Swarm nodes | All Swarm nodes | Overlay VXLAN data path |
Testte bu portlar fiilen tek Swarm node icin gerekli olsa da ileride worker eklemeyi kolaylastirmak icin app subnet icinde tanimlanabilir. In test, these ports are effectively required for a single Swarm node, but they can be defined inside the app subnet to make adding workers easier later.
Prod'da `10.20.10.0/24` app/swarm subnet icinde bu portlar tum `iklim-app-*` node'lari arasinda acik olmalidir. In prod, these ports must be open between all `iklim-app-*` nodes inside the `10.20.10.0/24` app/swarm subnet.
Kaynak: Docker overlay network dokumani, https://docs.docker.com/engine/network/drivers/overlay/ Source: Docker overlay network documentation, https://docs.docker.com/engine/network/drivers/overlay/
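The Swarm rows of the matrix map directly onto host firewall rules. Below is a minimal sketch that only prints the commands for review. It assumes UFW is the host firewall. The real rules belong in the Ansible UFW role, for example through the `community.general.ufw` module.

```bash
#!/usr/bin/env bash
# Sketch only: print (do not run) UFW rules for the Docker Swarm ports
# in the matrix, for one source CIDR.
SWARM_PORTS="2377/tcp 7946/tcp 7946/udp 4789/udp"

swarm_ufw_rules() {
  local cidr=$1 rule
  for rule in $SWARM_PORTS; do
    # Split "port/proto" into its two halves.
    echo "ufw allow from $cidr to any port ${rule%/*} proto ${rule#*/}"
  done
}
```

`swarm_ufw_rules 10.20.10.0/24` prints four `ufw allow from 10.20.10.0/24 ...` lines for the prod app subnet. On the app nodes, the same function can be called with `10.20.20.0/24` to cover the DB worker join rows.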
## Uygulama ve Infra Servis Private Portlari ## Application and Infra Service Private Ports
Bu portlar public acilmayacak. Sadece private network veya Docker overlay icinde gerekli kaynaklardan erisime izin verilecek. These ports will not be opened publicly. Access will be allowed only from required sources inside the private network or Docker overlay.
| Port | Protocol | Servis | Kaynak | Hedef | Not | | Port | Protocol | Service | Source | Target | Note |
| --- | --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- | --- |
| `8200` | TCP | Vault API/UI | Swarm app node'lari / runner | Vault service/node | Public kapali. Runtime servisleri Vault'a private/overlay uzerinden erismeli | | `8200` | TCP | Vault API/UI | Swarm app nodes / runner | Vault service/node | Public closed. Runtime services must reach Vault through private/overlay |
| `6379` | TCP | Redis | Swarm app node'lari | Redis service/node | Public kapali | | `6379` | TCP | Redis | Swarm app nodes | Redis service/node | Public closed |
| `5672` | TCP | RabbitMQ AMQP | Swarm app node'lari | RabbitMQ service/node | Public kapali | | `5672` | TCP | RabbitMQ AMQP | Swarm app nodes | RabbitMQ service/node | Public closed |
| `15672` | TCP | RabbitMQ Management | Admin CIDR veya private ops | RabbitMQ service/node | Public kapali; tercihen VPN/bastion | | `15672` | TCP | RabbitMQ Management | Admin CIDR or private ops | RabbitMQ service/node | Public closed; preferably VPN/bastion |
| `61613` | TCP | RabbitMQ STOMP | Gerekli app node'lari | RabbitMQ service/node | Public kapali | | `61613` | TCP | RabbitMQ STOMP | Required app nodes | RabbitMQ service/node | Public closed |
| `15674` | TCP | RabbitMQ Web STOMP | Gerekli app/gateway node'lari | RabbitMQ service/node | Public kapali | | `15674` | TCP | RabbitMQ Web STOMP | Required app/gateway nodes | RabbitMQ service/node | Public closed |
| `2379` | TCP | etcd client | APISIX service/node | etcd service/node | Public kapali | | `2379` | TCP | etcd client | APISIX service/node | etcd service/node | Public closed |
| `2380` | TCP | etcd peer | etcd cluster node'lari | etcd cluster node'lari | Tek replica ise gerekmeyebilir; cluster olursa gerekli | | `2380` | TCP | etcd peer | etcd cluster nodes | etcd cluster nodes | May not be needed for a single replica; required if clustered |
| `9180` | TCP | APISIX Admin API | Admin CIDR veya private ops | APISIX service/node | Public kapali | | `9180` | TCP | APISIX Admin API | Admin CIDR or private ops | APISIX service/node | Public closed |
| `9090` | TCP | Prometheus UI/API | Admin CIDR veya private ops | Prometheus service/node | Public kapali | | `9090` | TCP | Prometheus UI/API | Admin CIDR or private ops | Prometheus service/node | Public closed |
| `3000` | TCP | Grafana UI | Admin CIDR veya private ops | Grafana service/node | Public kapali | | `3000` | TCP | Grafana UI | Admin CIDR or private ops | Grafana service/node | Public closed |
`docker-stack-infra.yml` güncellenmiş olup yalnızca SWAG servisi 80/443 portlarını host mode ile yayınlar. Diğer tüm servisler published port içermez; erişim yalnızca `iklimco-net` overlay üzerinden sağlanır. Private ingress kararları için bu tablo kaynak olmaya devam eder. `docker-stack-infra.yml` has been updated so that only the SWAG service publishes ports 80/443 in host mode. All other services contain no published ports; access is provided only through the `iklimco-net` overlay. This table remains the source for private ingress decisions.
## DB Node Portlari ## DB Node Ports
DB altyapisi manuel kurulacagi icin kesin cluster teknolojisi bu dokumanin disindadir. Yine de firewall icin varsayilan portlar asagidadir. Because DB infrastructure will be installed manually, the exact cluster technology is outside this document. Still, the default ports for firewall purposes are below.
### PostgreSQL / PostGIS (Patroni + etcd) ### PostgreSQL / PostGIS (Patroni + etcd)
Prod ortami Patroni + etcd ile yonetilen PostgreSQL kullanir. Test ortaminda tek node oldugu icin replication ve HA portlari gerekmez. The prod environment uses PostgreSQL managed with Patroni + etcd. In the test environment, replication and HA ports are not required because there is a single node.
| Port | Protocol | Kaynak | Hedef | Not | | Port | Protocol | Source | Target | Note |
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
| `5432` | TCP | App/Swarm subnet | PostgreSQL node'lari (Patroni yonetimli) | Uygulama JDBC — tum node'lara baglanir, driver primary'i bulur | | `5432` | TCP | App/Swarm subnet | PostgreSQL nodes (Patroni-managed) | Application JDBC — connects to all nodes, driver finds the primary |
| `5432` | TCP | DB subnet | PostgreSQL node'lari | Patroni replication (pg_basebackup ve wal streaming) | | `5432` | TCP | DB subnet | PostgreSQL nodes | Patroni replication (pg_basebackup and WAL streaming) |
| `8008` | TCP | DB subnet | PostgreSQL node'lari | Patroni REST API — leader election, saglik kontrolu | | `8008` | TCP | DB subnet | PostgreSQL nodes | Patroni REST API — leader election, health check |
| `2379` | TCP | DB subnet | etcd node'lari | etcd client — Patroni → etcd erisimi | | `2379` | TCP | DB subnet | etcd nodes | etcd client — Patroni -> etcd access |
| `2380` | TCP | DB subnet | etcd node'lari | etcd peer — etcd cluster icindeki raft protokolu | | `2380` | TCP | DB subnet | etcd nodes | etcd peer — raft protocol inside the etcd cluster |
### MongoDB ### MongoDB
| Port | Protocol | Kaynak | Hedef | Not | | Port | Protocol | Source | Target | Note |
| --- | --- | --- | --- | --- | | --- | --- | --- | --- | --- |
| `27017` | TCP | App/Swarm subnet | MongoDB node/replica set endpoint | Uygulama DB baglantisi | | `27017` | TCP | App/Swarm subnet | MongoDB node/replica set endpoint | Application DB connection |
| `27017` | TCP | DB subnet | MongoDB replica set node'lari | Replica set internal trafik | | `27017` | TCP | DB subnet | MongoDB replica set nodes | Replica set internal traffic |
Ileride sharding yapilirsa `27018/27019` gibi ek MongoDB rolleri gundeme gelebilir; bu asamada acilmayacak. If sharding is added later, additional MongoDB roles such as `27018/27019` may come up; they will not be opened at this stage.
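Applications reach both clusters by listing every DB node and letting the driver select the right member. Below is a minimal sketch of the two connection strings. It assumes the hypothetical database name `iklim`, the replica set name `rs0`, and the hostnames `iklim-db-01/02/03`. `targetServerType=primary` is the multi-host option in recent pgjdbc versions, and `replicaSet` is the standard MongoDB connection-string option. Credentials and TLS options are omitted.

```bash
#!/usr/bin/env bash
# Sketch only: build multi-host connection strings for the prod DB nodes.
# Assumptions: database "iklim", replica set "rs0", no credentials/TLS shown.

# Join host names as "h1:port,h2:port,...".
join_hosts() {
  local port=$1 out="" h
  shift
  for h in "$@"; do
    out="${out:+$out,}$h:$port"
  done
  echo "$out"
}

# pgjdbc tries the listed hosts and keeps a connection to the Patroni primary.
pg_jdbc_url() {
  local db=$1
  shift
  echo "jdbc:postgresql://$(join_hosts 5432 "$@")/${db}?targetServerType=primary"
}

# The MongoDB driver discovers the replica set and routes writes to its primary.
mongo_uri() {
  local rs=$1 db=$2
  shift 2
  echo "mongodb://$(join_hosts 27017 "$@")/${db}?replicaSet=${rs}"
}
```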
## Test Private Kurallari ## Test Private Rules
Test ortaminda minimum: Minimum for the test environment:
| Kaynak | Hedef | Portlar | | Source | Target | Ports |
| --- | --- | --- | | --- | --- | --- |
| `10.10.10.0/24` | `10.10.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` | | `10.10.10.0/24` | `10.10.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` |
| `10.10.10.0/24` | `10.10.20.0/24` | `5432/tcp`, `27017/tcp` | | `10.10.10.0/24` | `10.10.20.0/24` | `5432/tcp`, `27017/tcp` |
| `10.10.10.0/24` | `10.10.10.0/24` | `8200/tcp`, `6379/tcp`, `5672/tcp`, `61613/tcp`, `15674/tcp` | | `10.10.10.0/24` | `10.10.10.0/24` | `8200/tcp`, `6379/tcp`, `5672/tcp`, `61613/tcp`, `15674/tcp` |
| Admin CIDR veya VPN | `10.10.10.0/24` | `9000/tcp`, `15672/tcp`, `9180/tcp`, `9090/tcp`, `3000/tcp` | | Admin CIDR or VPN | `10.10.10.0/24` | `9000/tcp`, `15672/tcp`, `9180/tcp`, `9090/tcp`, `3000/tcp` |
Testte DB node tek oldugu icin DB subnet icindeki PostgreSQL/MongoDB replication portlari aktif kullanilmayabilir. Because the DB node is single-node in test, PostgreSQL/MongoDB replication ports inside the DB subnet may not be actively used.
## Prod Private Kurallari ## Prod Private Rules
Prod ortaminda minimum (Patroni + etcd dahil): Minimum for the prod environment, including Patroni + etcd:
App subnet (swarm firewall) — kendi icindeki trafik: App subnet (swarm firewall) — traffic inside itself:
| Kaynak | Hedef | Portlar | | Source | Target | Ports |
| --- | --- | --- | | --- | --- | --- |
| `10.20.10.0/24` | `10.20.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` (Swarm) | | `10.20.10.0/24` | `10.20.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` (Swarm) |
| `10.20.10.0/24` | `10.20.10.0/24` | `8200/tcp`, `6379/tcp`, `5672/tcp`, `61613/tcp`, `15674/tcp`, `2379/tcp` (uygulama servisleri) | | `10.20.10.0/24` | `10.20.10.0/24` | `8200/tcp`, `6379/tcp`, `5672/tcp`, `61613/tcp`, `15674/tcp`, `2379/tcp` (application services) |
| Admin CIDR veya VPN | `10.20.10.0/24` | `15672/tcp`, `9180/tcp`, `9090/tcp`, `3000/tcp` | | Admin CIDR or VPN | `10.20.10.0/24` | `15672/tcp`, `9180/tcp`, `9090/tcp`, `3000/tcp` |
App → DB trafigi (swarm firewall'da ilgili kural bulunmaz; db firewall'da izin verilir): App -> DB traffic (there is no related rule in the swarm firewall; it is allowed in the db firewall):
| Kaynak | Hedef | Portlar | | Source | Target | Ports |
| --- | --- | --- | | --- | --- | --- |
| `10.20.10.0/24` | `10.20.20.0/24` | `5432/tcp`, `27017/tcp` (DB erisimi) | | `10.20.10.0/24` | `10.20.20.0/24` | `5432/tcp`, `27017/tcp` (DB access) |
| `10.20.10.0/24` | `10.20.20.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` (Swarm — DB worker join) | | `10.20.10.0/24` | `10.20.20.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` (Swarm — DB worker join) |
DB subnet (db firewall) — DB node'lari arasi trafik: DB subnet (db firewall) — traffic between DB nodes:
| Kaynak | Hedef | Portlar | | Source | Target | Ports |
| --- | --- | --- | | --- | --- | --- |
| `10.20.20.0/24` | `10.20.20.0/24` | `5432/tcp`, `27017/tcp` (DB replication) | | `10.20.20.0/24` | `10.20.20.0/24` | `5432/tcp`, `27017/tcp` (DB replication) |
| `10.20.20.0/24` | `10.20.20.0/24` | `2379/tcp`, `2380/tcp` (etcd client/peer) | | `10.20.20.0/24` | `10.20.20.0/24` | `2379/tcp`, `2380/tcp` (etcd client/peer) |
| `10.20.20.0/24` | `10.20.20.0/24` | `8008/tcp` (Patroni REST API) | | `10.20.20.0/24` | `10.20.20.0/24` | `8008/tcp` (Patroni REST API) |
DB → App trafigi (swarm firewall'da izin verilir): DB -> App traffic (allowed in the swarm firewall):
| Kaynak | Hedef | Portlar | | Source | Target | Ports |
| --- | --- | --- | | --- | --- | --- |
| `10.20.20.0/24` | `10.20.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` (Swarm — manager portlari) | | `10.20.20.0/24` | `10.20.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` (Swarm — manager ports) |
## Kabul Kriterleri ## Acceptance Criteria
- Public firewall `8200/tcp` acmaz.
- DB portlari public acik degildir.
- Swarm portlari sadece private app/swarm subnet icinde aciktir.
- App/Swarm subnet DB subnet'e sadece gerekli DB portlarindan erisir.
- DB subnet app subnet'e genis yetkiyle acilmaz.
- Admin UI portlari public yerine admin CIDR/VPN/private ops ile sinirlandirilir.
- The public firewall does not open `8200/tcp`.
- DB ports are not open publicly.
- Swarm ports are open only inside the private app/swarm subnet.
- The App/Swarm subnet reaches the DB subnet only through required DB ports.
- The DB subnet is not opened to the app subnet with broad permissions.
- Admin UI ports are restricted through admin CIDR/VPN/private ops instead of public access.


@@ -1,29 +1,29 @@
# 02 - Test Terraform IaC

The purpose of this phase is to create the minimum IaaS resources in the test Hetzner Cloud project with Terraform. This document is written so that it can be applied on its own.

## Scope

Terraform creates the following in the test environment:

- Private network: `iklim-test-net`
- Subnets:
  - App/Swarm subnet: `10.10.10.0/24`
  - DB subnet: `10.10.20.0/24`
- Firewall:
  - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
  - Private ingress: the test rules in `01-private-network-port-matrisi.md`
- SSH key
- Placement group: `iklim-test-spread`
- Floating IP: a stable IPv4 address for the Swarm entry point
- Servers:
  - `iklim-app-01`
  - `iklim-db-01`
- Ansible inventory output

Terraform does not install DB software. The DB node is prepared only at the machine, network, and firewall level.

## Recommended File Structure

```text
terraform/
@ -42,11 +42,11 @@ terraform/
terraform.tfvars.example
```

`terraform.tfvars` must not be committed; add it to `.gitignore`.

## Variables

Minimum variables:

```hcl
hcloud_token = "secret"
@ -58,67 +58,64 @@ admin_ssh_public_key_path = "~/.ssh/id_rsa.pub"
admin_allowed_cidrs = ["X.X.X.X/32"]
```

The `environment` constant is defined in `locals.tf`; it is not overridden through `tfvars`.

Start with a single `location`. Disaster recovery across regions or locations is out of scope for this stage and must be added to this document later.

The server type decision is based on the current test environment metrics in `../hetzner-sizing-report.md`. Because 10 microservices and the infrastructure services run together on the test app node, `cpx32` was judged too risky in terms of RAM. `cpx42` is also recommended for the test DB node because of the CPU spike risk of running everything on a single node.

## Server Roles

| Server | Private IP | Role |
| --- | --- | --- |
| `iklim-app-01` | `10.10.10.11` | Swarm manager + app worker + Gitea runner |
| `iklim-db-01` | `10.10.20.11` | DB node prepared for manual DB installation |

Private IPs must be defined statically in Terraform so that the Ansible inventory and firewall rules stay deterministic.
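
Because the private IPs are fixed, you can check that each one sits in its intended subnet before `terraform apply`. This is a minimal sketch that assumes bash with 64-bit arithmetic and handles IPv4 only. `ip_to_int` and `ip_in_cidr` are illustrative names and are not part of the repo.

```bash
#!/usr/bin/env bash
# Illustrative helpers: check that a static private IP lies inside its subnet (IPv4 only).

# ip_to_int A.B.C.D -> prints the address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_cidr IP NETWORK/PREFIX -> exit status 0 when IP is inside the block
ip_in_cidr() {
  local ip=$1 net=${2%/*} prefix=${2#*/}
  local mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local ip_int net_int
  ip_int=$(ip_to_int "$ip")
  net_int=$(ip_to_int "$net")
  # Same network bits after masking means the address belongs to the subnet.
  (( (ip_int & mask) == (net_int & mask) ))
}

# Values from the Server Roles table:
#   ip_in_cidr 10.10.10.11 10.10.10.0/24 && ip_in_cidr 10.10.20.11 10.10.20.0/24 && echo ok
```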

## Recommended Resources and Cost

| Server | Role | Server Type | CPU | RAM | SSD | Monthly |
| --- | --- | --- | ---: | ---: | ---: | ---: |
| `iklim-app-01` | Swarm manager + app worker + Gitea runner | `cpx42` | 8 vCPU (AMD) | 16 GB | 320 GB | $29.99 |
| `iklim-db-01` | PostgreSQL/PostGIS + MongoDB node | `cpx42` | 8 vCPU (AMD) | 16 GB | 320 GB | $29.99 |
| **Total** | 2 servers | | **16 vCPU** | **32 GB** | **640 GB** | **$59.98** |
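
The total row is simple arithmetic (2 × $29.99 = $59.98), but it can easily drift out of date when a server type changes. Below is a hypothetical awk-based sketch that recomputes the total from the per-server rows. It assumes the rows keep the format above, with the server name in the first cell and the `$` price in the last.

```bash
#!/usr/bin/env bash
# Hypothetical helper: recompute the monthly total from the per-server table rows.
# Assumes rows start with "| `iklim-" and the last cell holds a price like "$29.99".
sum_monthly_cost() {
  awk -F'|' '
    /^[|] `iklim-/ {                 # per-server rows only (skips header and Total row)
      cost = $(NF - 1)               # last cell; the trailing "|" leaves $NF empty
      gsub(/[ $*]/, "", cost)        # drop spaces, "$" and bold markers
      total += cost
    }
    END { printf "%.2f\n", total }'
}

# Example: save the two server rows to rows.md, then run: sum_monthly_cost < rows.md
```
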

## Firewall Rules

Public ingress:

| Port | Source | Target |
| --- | --- | --- |
| `22/tcp` | `admin_allowed_cidrs` | All test nodes |
| `80/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-01` |
| `443/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-01` |

The following ports must not be opened for public ingress: `8200/tcp`, `5432/tcp`, `27017/tcp`, `5672/tcp`, `15672/tcp`, `6379/tcp`, `2379/tcp`, `9000/tcp`, `9180/tcp`, `9090/tcp`, and `3000/tcp`.

### App (Swarm) Firewall: Private Ingress

Sources in the app subnet (`iklim-app-01`):

| Port | Service | Access method |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | From the app subnet |
| `7946/tcp,udp` | Docker Swarm node discovery | From the app subnet |
| `4789/udp` | Docker Swarm VXLAN overlay | From the app subnet |
| `8200/tcp` | Vault | Docker overlay / private network |
| `6379/tcp` | Redis | From the app subnet |
| `5672/tcp` | RabbitMQ AMQP | From the app subnet |
| `61613/tcp` | RabbitMQ STOMP | From the app subnet |
| `15674/tcp` | RabbitMQ Web STOMP | From the app subnet |
| `15672/tcp` | RabbitMQ Management | From the app subnet; external access via SWAG on `443` (IP-restricted) |
| `9000/tcp` | APISIX Dashboard | From the app subnet; external access via SWAG on `443` (IP-restricted) |
| `9180/tcp` | APISIX Admin API | From the app subnet, including the Docker overlay |
| `9090/tcp` | Prometheus | From the app subnet; external access via SWAG on `443` (IP-restricted) |
| `3000/tcp` | Grafana | From the app subnet; external access via SWAG on `443` (IP-restricted) |

Sources in the DB subnet (needed because `iklim-db-01` joins the Swarm as a worker):

| Port | Service | Source |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | `10.10.20.0/24` |
| `7946/tcp,udp` | Docker Swarm node discovery | `10.10.20.0/24` |

@ -126,30 +123,29 @@ DB subnet kaynakli (`iklim-db-01` Swarm'a worker olarak katildigi icin):

### DB Firewall: Private Ingress

| Port | Service | Source |
| --- | --- | --- |
| `22/tcp` | SSH | `admin_allowed_cidrs` |
| `51820/udp` | WireGuard VPN | `0.0.0.0/0`, `::/0` (authenticated with cryptographic keys) |
| `5432/tcp` | PostgreSQL | `10.10.10.0/24` (app subnet) |
| `27017/tcp` | MongoDB | `10.10.10.0/24` (app subnet) |
| `2377/tcp` | Docker Swarm control plane | `10.10.10.0/24` (app subnet) |
| `7946/tcp,udp` | Docker Swarm node discovery | `10.10.10.0/24` (app subnet) |
| `4789/udp` | Docker Swarm VXLAN overlay | `10.10.10.0/24` (app subnet) |

For the admin UIs marked as IP-restricted above, the IP restriction is enforced in the SWAG nginx configuration, not in the Hetzner firewall. None of these ports is opened publicly, not even for the `admin_allowed_cidrs` sources.

For the remaining private ingress rules, `01-private-network-port-matrisi.md` is the reference.

## Placement Group

The `iklim-test-spread` placement group uses `type = "spread"`. Because the test environment has two servers, this group aims to place `iklim-app-01` and `iklim-db-01` on different physical hosts.

Note: A spread placement group does not guarantee a different cabinet or location; it only reduces the impact of a single physical host failure.

## Terraform Output Expectations

`outputs.tf` must produce at least the following:

```hcl
output "ansible_inventory_yaml" {
@ -169,53 +165,45 @@ output "test_floating_ip" {
}
```

The inventory output can later be written to `ansible/inventory/generated/test.yml`. If the inventory file contains no secrets, it can be committed; if it contains secrets or tokens, it must not be committed.
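
You can script both writing the output and guarding against committing secrets. The `terraform output -raw` call, shown as a comment below, is standard Terraform CLI usage for string outputs. `inventory_has_secrets` and its key pattern are assumptions; adapt them to the key names that can actually appear in the generated inventory.

```bash
#!/usr/bin/env bash
# Sketch: block commits of a generated inventory that looks like it contains secrets.
# SECRET_PATTERN is an assumption; adjust it to the real inventory key names.
#
# Writing the file (run inside the test Terraform working directory):
#   terraform output -raw ansible_inventory_yaml \
#     > "$(git rev-parse --show-toplevel)/ansible/inventory/generated/test.yml"

SECRET_PATTERN='(token|password|passwd|secret|private_key)[[:space:]]*:'

# inventory_has_secrets FILE -> exit status 0 when a secret-looking key is found
inventory_has_secrets() {
  grep -Eiq -- "$SECRET_PATTERN" "$1"
}

# Example pre-commit guard:
#   if inventory_has_secrets ansible/inventory/generated/test.yml; then
#     echo "inventory contains secrets; do not commit it" >&2
#     exit 1
#   fi
```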

## Lifecycle and Resize Policy

### `server_type` Change (Resize)

Changing `server_type` does **not** trigger a Terraform destroy+create. The `hcloud` provider handles it natively: it stops the server, calls the Hetzner resize API, and starts the server again. Update the value in `terraform.tfvars` and run `terraform apply`.

The resize causes downtime because the server is stopped and restarted, but the disk, installed software, and Docker volumes are preserved. No `ignore_changes` block or manual step is required.
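
Before applying, you can check that a resize is really planned as an in-place change. This hedged sketch scans the human-readable `terraform plan -no-color` output. In that output, Terraform reports in-place changes as `will be updated in-place` and replacements as `must be replaced`. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: detect plans that would replace or destroy any hcloud_server.
# Reads human-readable `terraform plan -no-color` output on stdin.
plan_forces_server_loss() {
  grep -Eq 'hcloud_server\.[^ ]+ (must be replaced|will be destroyed)'
}

# Example:
#   terraform plan -no-color -out=tfplan | tee plan.txt
#   if plan_forces_server_loss < plan.txt; then
#     echo "plan replaces or destroys a server; aborting" >&2
#     exit 1
#   fi
```

This complements `prevent_destroy`, which already makes destructive plans fail. The explicit check also catches the problem in CI logs before anyone runs `apply`.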

### Which Changes Force Server Recreation?

| Changed field | Behavior | Note |
| --- | --- | --- |
| `server_type` | In-place resize (provider-native) | `terraform apply` is enough |
| `hcloud_server_network` | Only the attachment is updated | Because a separate resource is used |
| `hcloud_firewall_attachment` | Only the attachment is updated | Because a separate resource is used |
| `placement_group_id` | The Hetzner API does not allow changing it → destroy+create | Do not change |
| `image` | The disk image changes → destroy+create | Do not change |
| `location` | Cannot be moved to another datacenter → destroy+create | Do not change |

### Network and Firewall Attachment Separation

The `network` block and `firewall_ids` are not embedded in `hcloud_server`. Separate resources are defined instead:

- `hcloud_server_network`: private IP assignment
- `hcloud_firewall_attachment`: firewall association

With embedded definitions, some provider versions interpret changes to these fields as a server recreation. With separate resources, only the attachment is updated and the server is left untouched.

### `prevent_destroy` Protection

Each server gets `lifecycle { prevent_destroy = true }`. While this block is present, Terraform refuses to delete the server under any circumstances and fails at the plan stage. To delete a server intentionally, first remove the lifecycle block temporarily.

## Acceptance Criteria

- `terraform plan` runs only with the test Hetzner project token.
- 2 servers exist after `terraform apply`.
- The two servers can reach each other over the private network.
- From the public internet, only `22`, `80`, and `443` are open at the firewall level.
- Vault `8200` stays closed to the public internet.
- Terraform state is not committed to the repo.


@ -1,12 +1,12 @@
# 03 - Test Ansible Bootstrap

The purpose of this phase is to get the test machines created by Terraform ready in terms of base Linux setup, hardening, Docker, and Swarm. DB software installation is out of scope for this phase.

## Ansible Installation

Ansible must be installed on the control machine (your own computer). No agent is installed on the target servers; SSH access is enough.

### Installation by Operating System

- **Ubuntu / Debian:**
```bash
@ -18,7 +18,7 @@ Ansible, kontrol makinesinde (kendi bilgisayarınızda) yüklü olmalıdır. Hed
pipx install --include-deps ansible
```

> Note: `sudo apt install ansible` may install an outdated Ansible package on some Ubuntu/Debian releases, so prefer the `pipx` method to get a current Ansible version.

- **Fedora / Rocky Linux / RHEL:**
```bash
@ -35,71 +35,71 @@ Ansible, kontrol makinesinde (kendi bilgisayarınızda) yüklü olmalıdır. Hed
brew install ansible
```

- **With Python pipx (any platform):**
```bash
pipx install --include-deps ansible
```

### Additional Python Dependencies

The `password_hash` filter requires `passlib` on the control machine:

```bash
pipx inject ansible passlib
```

> If you installed with `pip`: `pip install passlib`

### Verifying the Installation

Whichever method you used, run the following commands to confirm that the installation succeeded:

```bash
# Check the Ansible version and configuration paths
ansible --version

# List every location on PATH where an ansible binary is found
which -a ansible
```

## Running Ansible Commands

Run all commands from the `ansible/test/` directory; the `ansible.cfg` there defines the inventory and `roles_path` automatically.

### 0. Install the Required Collections (once, on first setup)

```bash
ansible-galaxy collection install -r ../requirements.yml
```

### 1. Connection Test (Ping)

```bash
ansible all -m ping
```

### 2. Run the Bootstrap Playbook

```bash
ansible-playbook test-bootstrap.yml --ask-vault-pass
```

*Note: `--ask-vault-pass` prompts for the Ansible Vault password, which is used to decrypt vault-encrypted values such as the StorageBox password.*

### 3. Run Only a Specific Role (Tags)

```bash
ansible-playbook test-bootstrap.yml --tags "hardening" --ask-vault-pass
```

## Target Machines

| Host | Role |
| --- | --- |
| `iklim-app-01` | Swarm manager + app worker |
| `iklim-db-01` | OS-hardened DB node for manual DB installation |

## Recommended File Structure

```text
ansible/
@ -114,13 +114,13 @@ ansible/
vault.yml
host_vars/
iklim-app-01/
vars.yml # Host-specific variables such as the floating IP
vault.yml
iklim-db-01/
vault.yml
test-bootstrap.yml
test-app-post-stack.yml # act_runner installation
test-db-post-stack.yml # db_stack + wireguard installation
roles/
base/
hardening/
@ -129,18 +129,18 @@ ansible/
node_dirs/
storagebox/
storagebox_ssh_key/
db_stack/ # DB directory and configuration preparation
wireguard/ # WireGuard VPN service (DB node)
act_runner/ # Gitea act_runner installation (app node)
```

## Base Role

Applied to all test nodes:

- `dnf update`
- `epel-release`: installed first as a separate task; `fail2ban`, `davfs2`, `htop`, and `btop` depend on this repository
- base packages (installed after `epel-release` is enabled):
  - `curl`
  - `wget`
  - `git`
@ -148,57 +148,57 @@ Tüm test node'larına uygulanır:
  - `tar`
  - `unzip`
  - `bash-completion`
  - `gettext`: provides `envsubst`, which the CI/CD deploy pipelines require
  - `tree`
  - `ca-certificates`
  - `fail2ban`
  - `chrony`
  - `python3`
  - `python3-pip`
  - `python3-passlib`: for the `password_hash` filter (EPEL)
  - `htop`: interactive process monitoring (EPEL)
  - `btop`: resource monitor with graphs (EPEL)
- timezone: `Europe/Istanbul`
- hostname configuration
- keyboard layout: `trq` (Turkish Q)
- controlled reboot if the system requires one
- **Hetzner Floating IP systemd service** (`hetzner-floating-ip`): if `hetzner_floating_ip` is defined in `host_vars`, the IP address is added to `eth0` and restored automatically on reboot (`ip addr replace`)

## Security Hardening Role

Applied to all test nodes:

- SSH password login is disabled.
- Root SSH login is disabled.
- Only SSH key login remains allowed.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- The `fail2ban` SSH jail is enabled.
- Automatic security updates are enabled with `dnf-automatic`.
- The `iklim` system user is created and added to the `wheel` group; its password is read from the vault.
- `firewalld` defaults:
  - incoming: deny (drop zone)
  - outgoing: allow
- The SSH rule is first added to the `drop` zone as a rich rule, and only then is the default zone set to `drop`; this ordering removes the lockout risk.
- Public SSH is allowed only from the admin CIDRs.
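
The lockout-safe ordering in the last two items can be expressed as plain `firewall-cmd` calls. This sketch only prints the commands; piping its output to `sudo sh` would apply them. The function name is hypothetical, and the actual role may use the `ansible.posix.firewalld` module instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print firewalld commands in lockout-safe order.
# Arguments: the admin CIDRs allowed to reach SSH.
firewalld_ssh_lockdown_cmds() {
  local cidr family
  for cidr in "$@"; do
    case "$cidr" in *:*) family=ipv6 ;; *) family=ipv4 ;; esac
    # 1) Permanently allow SSH from each admin CIDR in the drop zone first.
    printf "firewall-cmd --permanent --zone=drop --add-rich-rule='rule family=\"%s\" source address=\"%s\" service name=\"ssh\" accept'\n" "$family" "$cidr"
  done
  # 2) Load the permanent rules into the runtime configuration.
  echo "firewall-cmd --reload"
  # 3) Only now make drop the default zone (a runtime and permanent change).
  echo "firewall-cmd --set-default-zone=drop"
}

# Example (documentation address, not a real admin CIDR):
#   firewalld_ssh_lockdown_cmds 203.0.113.10/32
```
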

### SELinux Decision

Rocky Linux 10 ships with SELinux in enforcing mode. Decision: **disabled**.

Rationale:

- The Hetzner Cloud firewall (external perimeter) and firewalld (host) provide two layers of network security.
- The Docker + davfs2 + firewalld combination would require additional policy and volume label management in SELinux enforcing mode.
- SELinux is also disabled on the Utils VPS, which keeps the hosts consistent.

```bash
# Set SELINUX=disabled in /etc/selinux/config
sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

# The change takes effect after a reboot
sudo reboot
```
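
To confirm the setting before the reboot, a small helper can read the configured mode. After the reboot, `getenforce` reports the running state. `selinux_configured_mode` is a hypothetical name. Red Hat's RHEL 9 documentation recommends the `selinux=0` kernel parameter to fully disable SELinux, for example `grubby --update-kernel ALL --args selinux=0`. It explains that `SELINUX=disabled` in the config file only prevents the policy from loading. Verify whether this also applies to Rocky Linux 10.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the SELinux mode configured for the next boot.
# The config path defaults to the standard location; pass another file to test.
selinux_configured_mode() {
  local config=${1:-/etc/selinux/config}
  # Uncommented "SELINUX=" lines only; SELINUXTYPE= does not match. Last one wins.
  sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$config" | tail -n 1
}

# Before the reboot:  selinux_configured_mode    # expected: disabled
# After the reboot:   getenforce                 # expected: Disabled
```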

In Ansible:

```yaml
- name: Disable SELinux
@ -211,9 +211,9 @@ Ansible'da:
when: selinux_change.changed
```

### fail2ban Configuration

Contents of `/etc/fail2ban/jail.local`:

```ini
[DEFAULT]
@ -228,50 +228,49 @@ backend = systemd
enabled = true
```

- `bantime`: 6-hour ban
- `findtime`: 5-minute window
- `maxretry`: 5 failed logins within that window → ban
- `ignoreip`: exempts the admin CIDRs from bans

In Ansible, the `admin_allowed_cidrs` list is converted to a space-separated string and rendered into the template.
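
In the Ansible template, this conversion is presumably a Jinja expression such as `{{ admin_allowed_cidrs | join(' ') }}`; the template itself is not shown here. The bash sketch below performs the same join so you can check the value by hand. Rejecting entries without a prefix length is an extra policy choice of this sketch; fail2ban itself does not require it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the fail2ban ignoreip line from admin CIDRs.
fail2ban_ignoreip_line() {
  local cidr
  for cidr in "$@"; do
    # Extra check in this sketch: require an explicit prefix length so a typo
    # does not silently change which addresses are exempt from bans.
    case "$cidr" in */*) ;; *) echo "not a CIDR: $cidr" >&2; return 1 ;; esac
  done
  local IFS=' '                     # "$*" joins arguments with the first IFS character
  printf 'ignoreip = %s\n' "$*"
}

# Example: fail2ban_ignoreip_line 203.0.113.10/32 198.51.100.0/24
```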

Note: Docker's iptables rules may interact with firewalld. The Hetzner Cloud firewall is treated as the actual external perimeter; firewalld is a second layer inside the host.

## Docker Role

Required on both nodes (`iklim-app-01` and `iklim-db-01`). Because the DB node joins the cluster as a Swarm worker, Docker Engine must be installed on both machines.

Docker is installed from the official Docker dnf repository:

- Docker GPG key + dnf repository (`https://download.docker.com/linux/rhel/docker-ce.repo`)
- packages:
  - `docker-ce`
  - `docker-ce-cli`
  - `containerd.io`
  - `docker-buildx-plugin`
  - `docker-compose-plugin`
- Docker service enabled + started

The Docker convenience script is not used; the package repository method is preferred for a production-like test environment.

## Swarm Role

- `iklim-app-01` is initialized as the Swarm manager.
- `iklim-db-01` joins as a Swarm worker (for overlay network access).
- Advertise address: `10.10.10.11` (manager)
- Overlay network:
  - `iklimco-net`
  - driver: `overlay`
  - attachable: `true`
- Node labels:
  - `iklim-app-01`: `type=service`; all infrastructure and application services are deployed to this node
  - `iklim-db-01`: `role=db`; the PostgreSQL and MongoDB services are deployed to this node
- `iklim-app-01` acts as both manager and worker (availability: Active).
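
The role's end state corresponds roughly to the Docker CLI commands below. The sketch only prints them, so it has no side effects. The helper names are hypothetical, and the real role may use Ansible modules instead of the CLI. The worker token is obtained on the manager with `docker swarm join-token -q worker`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: print Docker CLI commands matching the Swarm role above.
MANAGER_ADDR=10.10.10.11
OVERLAY_NET=iklimco-net

# Run on iklim-app-01.
swarm_manager_cmds() {
  echo "docker swarm init --advertise-addr ${MANAGER_ADDR}"
  echo "docker network create --driver overlay --attachable ${OVERLAY_NET}"
  echo "docker node update --label-add type=service iklim-app-01"
}

# Run on iklim-db-01; $1 is the worker join token from the manager.
swarm_worker_join_cmd() {
  echo "docker swarm join --token $1 ${MANAGER_ADDR}:2377"
}

# Run on the manager after iklim-db-01 has joined (labels need an existing node).
swarm_db_label_cmd() {
  echo "docker node update --label-add role=db iklim-db-01"
}

# Example: swarm_manager_cmds | sh    # executes the manager steps on iklim-app-01
```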

## Node Directory Role

Deployment prerequisites on `iklim-app-01`:

```text
/opt/iklimco
@ -282,7 +281,7 @@ Docker convenience script kullanılmayacak. Production benzeri test ortamı içi
/opt/iklimco/stacks
```

Minimum directories on the DB node for manual DB installation:

```text
/opt/iklimco
```

@ -292,24 +291,24 @@ DB node üzerinde manuel DB kurulumu için minimum:

## StorageBox DAVFS Mount Role

Applied to both nodes (`iklim-app-01` and `iklim-db-01`).

### Purpose

Mounts the Hetzner StorageBox at `/mnt/storagebox` over the WebDAV (DAVFS) protocol. Docker volumes are bound to this directory to provide data persistence and backups.

### Test Environment Sub-Account

| Parameter | Variable | Value |
| --- | --- | --- |
| Main account | `storagebox_account` | `u469968` |
| Sub-account | `storagebox_user` | `u469968-sub4` |
| WebDAV URL | `storagebox_url` | `https://u469968-sub4.your-storagebox.de/` |
| Mount point | `storagebox_mount_point` | `/mnt/storagebox` |

### Role Variables

All variables are defined in `group_vars/all/vars.yml`:

```yaml
storagebox_account: "u469968"
@ -322,24 +321,24 @@ storagebox_managed_directories:
mode: "0755"
```

In prod, the suffix changes from `sub4` to `sub5`.
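
The naming rule can be captured in one place. This minimal sketch makes two assumptions that are not stated explicitly here. First, prod keeps the main account `u469968`. Second, the prod WebDAV URL follows the same pattern as the test URL in the table above. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: derive StorageBox sub-account values per environment.
# test -> sub4 and prod -> sub5 come from this document; the shared main
# account and the URL pattern for prod are assumptions.
STORAGEBOX_ACCOUNT=u469968

storagebox_user() {
  case "$1" in
    test) echo "${STORAGEBOX_ACCOUNT}-sub4" ;;
    prod) echo "${STORAGEBOX_ACCOUNT}-sub5" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

storagebox_url() {
  local user
  user=$(storagebox_user "$1") || return 1
  echo "https://${user}.your-storagebox.de/"
}

# Example: storagebox_url prod
```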

Passwords are stored in `group_vars/all/vault.yml`, encrypted with Ansible Vault:

```bash
ansible-vault edit group_vars/all/vault.yml
```

Decrypted contents of `vault.yml`:

```yaml
vault_storagebox_password: "SUB_ACCOUNT_PASSWORD"
vault_iklim_password: "IKLIM_USER_PASSWORD"
```

### Steps

1. **Install davfs2**
```yaml
- name: Install davfs2
@ -348,7 +347,7 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
state: present
```

2. **Credentials file** (`/etc/davfs2/secrets`)
```yaml
- name: Configure davfs2 secrets
@ -361,7 +360,7 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
group: root
```

3. **Create the mount point**
```yaml
- name: Create mount point
@ -371,7 +370,7 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
mode: "0755"
```

4. **fstab entry**
```yaml
- name: Add fstab entry
@ -383,7 +382,7 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
state: present
```

5. **Mount the StorageBox**
```yaml
- name: Mount StorageBox
@ -392,7 +391,7 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
creates: "{{ storagebox_mount_point }}/.mounted_marker"
```

Mount başarısı için dizine bir marker dosyası yazılabilir: A marker file can be written to the directory to confirm mount success:
```yaml ```yaml
- name: Write mount marker - name: Write mount marker
@ -401,12 +400,9 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
dest: "{{ storagebox_mount_point }}/.mounted_marker" dest: "{{ storagebox_mount_point }}/.mounted_marker"
``` ```
6. **Servis bind mount dizinlerini oluştur** 6. **Create service bind mount directories**
Test ortamında precipitation servisinin `image-data` volume'u host üzerinde In the test environment, the precipitation service's `image-data` volume is bind-mounted from the host directory `/mnt/storagebox/precipitation/images`. Ansible creates this directory after StorageBox is mounted and sets its permissions to `0755`.
`/mnt/storagebox/precipitation/images` dizinine bind mount edilir. Dizin
StorageBox mount edildikten sonra Ansible tarafından oluşturulur ve `0755`
izinle bırakılır.
```yaml ```yaml
- name: Create managed StorageBox directories - name: Create managed StorageBox directories
@ -419,27 +415,25 @@ vault_iklim_password: "IKLIM_KULLANICI_PAROLASI"
loop: "{{ storagebox_managed_directories | default([]) }}" loop: "{{ storagebox_managed_directories | default([]) }}"
``` ```
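For debugging a node by hand, the davfs2 steps above can be mirrored with a few shell helpers. This is a minimal sketch, not the role's actual templates: the helper names are hypothetical, and the fstab options (`rw,_netdev`) are an assumption because the original task's options are not visible in this diff. The helpers only render text, write to a path you choose, and read a mounts table, so they are safe to try anywhere.

```bash
# Hypothetical helpers mirroring the davfs2 role; names are illustrative.

# One line of /etc/davfs2/secrets: "<url-or-mount-point> <user> <password>".
# Values containing spaces or quotes would need escaping; this sketch does not do it.
render_davfs_secret() {
    printf '%s %s %s\n' "$1" "$2" "$3"
}

# Write the secrets line so that only the file owner can read it
# (davfs2 expects /etc/davfs2/secrets to be accessible by root only).
write_davfs_secret() {
    target=$1
    shift
    ( umask 077 && render_davfs_secret "$@" > "$target" )
    chmod 600 "$target"
}

# fstab entry; _netdev defers the mount until the network is up.
# The "rw,_netdev" option set is an assumption, not the role's actual options.
render_davfs_fstab() {
    printf '%s %s davfs rw,_netdev 0 0\n' "$1" "$2"
}

# Succeed if the mount point appears as the second field of a mounts table
# (defaults to /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example, `is_mounted /mnt/storagebox || echo 'StorageBox not mounted'` gives a direct answer without relying on the marker file.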
### Notlar ### Notes
- `davfs2` paketi EPEL repository'sinde bulunur; base role `epel-release`'i zaten kurar. - The `davfs2` package is in the EPEL repository; the base role already installs `epel-release`.
- StorageBox şifreleri asla plaintext olarak repository'e eklenmez; Ansible Vault zorunludur. - StorageBox passwords are never added to the repository as plaintext; Ansible Vault is mandatory.
- Mount noktası reboot'ta `_netdev` flag'ı sayesinde network hazır olduktan sonra otomatik mount edilir. - Thanks to the `_netdev` flag, the share is mounted automatically on reboot once the network is up.
- Docker Swarm servisleri StorageBox altındaki servis dizinlerini bind mount olarak kullanır. - Docker Swarm services use service directories under StorageBox as bind mounts.
- Precipitation servisinin test ortamı image dizini `/mnt/storagebox/precipitation/images` olmalıdır; bu path `BE-Precipitation/docker-stack-service.yml` içindeki `device` değeriyle birebir eşleşmelidir. - The precipitation service's test environment image directory must be `/mnt/storagebox/precipitation/images`; this path must exactly match the `device` value in `BE-Precipitation/docker-stack-service.yml`.
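Because the host path and the `device` value must match exactly, a small pre-deploy check can catch drift between the two repositories. The sketch below assumes the volume is declared with the local driver's bind options (`type: none`, `o: bind`, `device: ...`), a common pattern that is not shown in this document; the function names are hypothetical.

```bash
# Hypothetical drift check: compare the "device:" value in a stack file
# with the directory Ansible creates on the host.
stack_device_path() {
    # Print the first "device:" value with quotes and whitespace removed.
    sed -n 's/^[[:space:]]*device:[[:space:]]*//p' "$1" | head -n 1 | tr -d "\"'" | tr -d '[:space:]'
}

check_device_path() {
    actual=$(stack_device_path "$1")
    if [ "$actual" = "$2" ]; then
        return 0
    fi
    echo "device mismatch: stack has '$actual', expected '$2'" >&2
    return 1
}
```

Usage: `check_device_path BE-Precipitation/docker-stack-service.yml /mnt/storagebox/precipitation/images`.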
## StorageBox SSH Key Role ## StorageBox SSH Key Role
Her iki node'a uygulanır (`iklim-app-01` ve `iklim-db-01`). Applied to both nodes (`iklim-app-01` and `iklim-db-01`).
### Amaç ### Purpose
Sunucu üzerinde ed25519 SSH anahtar çifti üretilir ve StorageBox ana hesabına yüklenir. An ed25519 SSH key pair is generated on the server and uploaded to the StorageBox main account. This allows CI/CD pipelines to use the `STORAGEBOX_SSH_PRIV` Gitea secret for passwordless access.
Bu sayede CI/CD pipeline'ları `STORAGEBOX_SSH_PRIV` Gitea secret'ini kullanarak
şifresiz erişim sağlayabilir.
### Adımlar ### Steps
1. **SSH Key üretimi** 1. **SSH key generation**
```yaml ```yaml
- name: Generate SSH key for StorageBox - name: Generate SSH key for StorageBox
@ -451,39 +445,39 @@ Bu sayede CI/CD pipeline'ları `STORAGEBOX_SSH_PRIV` Gitea secret'ini kullanarak
ssh_key_comment: "{{ inventory_hostname }}-storagebox" ssh_key_comment: "{{ inventory_hostname }}-storagebox"
``` ```
2. **Public key'i StorageBox'a yükle** 2. **Upload the public key to StorageBox**
Bu adım manuel yapılır (ilk kez şifre gerektirir): This step is done manually and requires the password the first time:
```bash ```bash
cat /root/.ssh/id_ed25519_storagebox.pub | ssh -p23 u469968-sub4@u469968-sub4.your-storagebox.de install-ssh-key cat /root/.ssh/id_ed25519_storagebox.pub | ssh -p23 u469968-sub4@u469968-sub4.your-storagebox.de install-ssh-key
``` ```
Sonraki erişimler şifresiz çalışır: Subsequent connections work without a password:
```bash ```bash
sftp -P23 u469968-sub4@u469968-sub4.your-storagebox.de sftp -P23 u469968-sub4@u469968-sub4.your-storagebox.de
``` ```
3. **Private ve public key'leri Gitea'ya ekle** 3. **Add private and public keys to Gitea**
Gitea → Organization Settings → Actions → Secrets: Gitea -> Organization Settings -> Actions -> Secrets:
| Secret Adı | Değer | | Secret Name | Value |
| --- | --- | | --- | --- |
| `STORAGEBOX_SSH_PRIV` | `/root/.ssh/id_ed25519_storagebox` içeriği | | `STORAGEBOX_SSH_PRIV` | Contents of `/root/.ssh/id_ed25519_storagebox` |
| `STORAGEBOX_SSH_PUB` | `/root/.ssh/id_ed25519_storagebox.pub` içeriği | | `STORAGEBOX_SSH_PUB` | Contents of `/root/.ssh/id_ed25519_storagebox.pub` |
Key içeriğini almak için: To get the key contents:
```bash ```bash
cat /root/.ssh/id_ed25519_storagebox cat /root/.ssh/id_ed25519_storagebox
cat /root/.ssh/id_ed25519_storagebox.pub cat /root/.ssh/id_ed25519_storagebox.pub
``` ```
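In a workflow, the `STORAGEBOX_SSH_PRIV` secret has to be written to a file before `ssh` or `sftp` can use it. A minimal sketch, assuming the secret is exposed to the step as an environment variable of the same name; the `storagebox` host alias, the function name and the file layout are illustrative choices, not part of the existing pipelines.

```bash
# Hypothetical CI helper: write the StorageBox key and an ssh_config alias.
# Assumes STORAGEBOX_SSH_PRIV is provided to the step as an environment variable.
setup_storagebox_ssh() {
    ssh_dir=$1    # e.g. "$HOME/.ssh"
    user=$2       # e.g. u469968-sub4
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    # ssh refuses private keys that other users can read; keep a trailing newline.
    ( umask 077 && printf '%s\n' "$STORAGEBOX_SSH_PRIV" > "$ssh_dir/id_ed25519_storagebox" )
    cat >> "$ssh_dir/config" <<EOF
Host storagebox
    HostName $user.your-storagebox.de
    User $user
    Port 23
    IdentityFile $ssh_dir/id_ed25519_storagebox
    IdentitiesOnly yes
EOF
    chmod 600 "$ssh_dir/config"
}
```

After `setup_storagebox_ssh "$HOME/.ssh" u469968-sub4`, a command such as `sftp storagebox` uses port 23 and the dedicated key. A non-interactive job will also need the host key in `known_hosts` (for example via `ssh-keyscan -p 23`), which this sketch does not handle.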
### Notlar ### Notes
- Her sunucu için ayrı key üretilir; tüm public key'ler StorageBox ana hesabına yüklenir. - A separate key is generated for each server; all public keys are uploaded to the StorageBox main account.
- Private key asla repo'ya commit edilmez; yalnızca Gitea secret olarak saklanır. - The private key is never committed to the repo; it is stored only as a Gitea secret.
## Kabul Kriterleri ## Acceptance Criteria


@ -1,90 +1,90 @@
# 04 - Test DB Docker Kurulumu (Swarm Worker) # 04 - Test DB Docker Installation (Swarm Worker)
Bu aşamanın amacı `iklim-db-01` node'unu Swarm'a worker olarak eklemek ve PostgreSQL ile MongoDB'yi Swarm servisi olarak çalıştırmaktır. The purpose of this phase is to add the `iklim-db-01` node to Swarm as a worker and run PostgreSQL and MongoDB as Swarm services.
## Mimari Karar ## Architecture Decision
Yol haritasında DB'lerin "manuel" kurulacağı belirtilmiştir. Test ortamında bu "manuel" süreç, DB'lerin işletim sistemine doğrudan kurulması yerine, **Swarm Worker** üzerinde Docker konteynerleri olarak ayağa kaldırılması şeklinde uygulanacaktır. The roadmap states that DBs will be installed "manually". In the test environment, this "manual" process will be implemented by starting the DBs as Docker containers on the **Swarm Worker**, instead of installing them directly on the operating system.
Kurulum **iki aşamalıdır:** The installation has **two phases:**
1. **Hazırlık (Ansible):** `test-db-post-stack.yml` playbook'u DB dizinlerini, `mongod.conf` konfigürasyonunu ve WireGuard VPN servisini kurar. 1. **Preparation (Ansible):** The `test-db-post-stack.yml` playbook sets up DB directories, the `mongod.conf` configuration, and the WireGuard VPN service.
2. **Deploy (Gitea CI/CD):** `deploy-test.yml` workflow'u `docker-stack-infra.yml` üzerinden PostgreSQL ve MongoDB servislerini Swarm'a deploy eder. 2. **Deploy (Gitea CI/CD):** The `deploy-test.yml` workflow deploys PostgreSQL and MongoDB services to Swarm through `docker-stack-infra.yml`.
**Neden?** **Why?**
1. **Yönetim Kolaylığı:** Docker ile versiyon geçişleri ve konfigürasyon yönetimi çok daha hızlıdır. 1. **Ease of management:** Version upgrades and configuration changes are much faster with Docker.
2. **Overlay Network:** Uygulama servisleri (`iklim-app-01`), DB'lere `iklimco-net` overlay network üzerinden şifreli ve izole bir şekilde erişebilir. 2. **Overlay Network:** Application services (`iklim-app-01`) can access DBs through the `iklimco-net` overlay network in an encrypted and isolated way.
3. **Veri Kalıcılığı:** Veriler `iklim-db-01` üzerindeki Docker named volume'larında saklanır. StorageBox yalnızca backup için kullanılır. 3. **Data persistence:** Data is stored in Docker named volumes on `iklim-db-01`. StorageBox is used only for backups.
## Ön Koşullar ## Prerequisites
- `03-test-ansible-bootstrap.md` her iki node'da tamamlanmış olmalı. - `03-test-ansible-bootstrap.md` must be completed on both nodes.
- Docker `iklim-db-01` üzerinde kurulu olmalı (Bootstrap role bunu yapar). - Docker must be installed on `iklim-db-01`; the Bootstrap role does this.
- Ansible vault'unda `vault_postgres_root_user`, `vault_postgres_password`, `vault_mongo_root_user`, `vault_mongo_root_password` tanımlı olmalı. - `vault_postgres_root_user`, `vault_postgres_password`, `vault_mongo_root_user`, and `vault_mongo_root_password` must be defined in the Ansible vault.
## 1. Firewall Güncellemesi ## 1. Firewall Update
`iklim-db-01`'in Swarm'a katılabilmesi ve uygulama trafiğini kabul etmesi için `terraform/hetzner/test/firewall.tf` dosyasına kurallar eklenmelidir. Rules must be added to `terraform/hetzner/test/firewall.tf` so `iklim-db-01` can join Swarm and accept application traffic.
### Swarm İletişimi (App Subnet <-> DB Subnet) ### Swarm Communication (App Subnet <-> DB Subnet)
Swarm yönetimi için `2377/tcp`, `7946/tcp/udp` ve `4789/udp` portları her iki subnet arasında karşılıklı açık olmalıdır. For Swarm management, ports `2377/tcp`, `7946/tcp/udp`, and `4789/udp` must be open in both directions between the two subnets.
### DB Erişimi (App Subnet -> DB Subnet) ### DB Access (App Subnet -> DB Subnet)
- **PostgreSQL:** `5432/tcp` - **PostgreSQL:** `5432/tcp`
- **MongoDB:** `27017/tcp` - **MongoDB:** `27017/tcp`
Güncellemeyi yaptıktan sonra: After making the update:
```bash ```bash
cd terraform/hetzner/test cd terraform/hetzner/test
terraform apply terraform apply
``` ```
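After `terraform apply`, the DB ports can be spot-checked from the app node. This sketch uses bash's `/dev/tcp` pseudo-device, so it covers TCP only (`4789/udp` and `7946/udp` cannot be tested this way), and the DB node address passed in is a placeholder because the test-environment private IPs are not listed in this section. `2377/tcp` is left out because only the manager listens on it.

```bash
# Hypothetical TCP reachability probe using bash's /dev/tcp.
# Returns 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Probe the ports a worker DB node should accept from the app subnet.
probe_db_ports() {
    for port in 5432 27017 7946; do
        if port_open "$1" "$port"; then
            echo "$port/tcp open"
        else
            echo "$port/tcp closed"
        fi
    done
}
```

Usage from `iklim-app-01`: `probe_db_ports <db-private-ip>`.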
## 2. Vault Güncellemesi ## 2. Vault Update
```bash ```bash
cd ansible/test cd ansible/test
ansible-vault edit group_vars/all/vault.yml ansible-vault edit group_vars/all/vault.yml
``` ```
Şu değişkenleri ekle: Add these variables:
```yaml ```yaml
vault_postgres_root_user: "postgres" vault_postgres_root_user: "postgres"
vault_postgres_password: "GÜÇLÜ_ŞİFRE" vault_postgres_password: "STRONG_PASSWORD"
vault_mongo_root_user: "mongoadmin" vault_mongo_root_user: "mongoadmin"
vault_mongo_root_password: "GÜÇLÜ_ŞİFRE" vault_mongo_root_password: "STRONG_PASSWORD"
``` ```
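The `STRONG_PASSWORD` placeholders need real values before the vault is saved. One way to generate them, shown as a sketch rather than a project convention, is to draw from `/dev/urandom` and keep only alphanumeric characters, which avoids quoting problems in YAML and in files such as the davfs2 secrets file. The 32-character default is an arbitrary assumption.

```bash
# Generate an alphanumeric password of the given length (default 32)
# from the kernel's random source.
gen_password() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-32}"
    echo
}
```

For example, run `gen_password` once per placeholder and paste each result into `ansible-vault edit`.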
## 3. Ansible ile Kurulum ## 3. Installation with Ansible
```bash ```bash
cd ansible/test cd ansible/test
ansible-playbook -i inventory/generated/test.yml test-db-post-stack.yml --ask-vault-pass ansible-playbook -i inventory/generated/test.yml test-db-post-stack.yml --ask-vault-pass
``` ```
**Playbook ne yapar?** **What does the playbook do?**
`iklim-db-01` üzerinde (`db_stack` ve `wireguard` rolleri): On `iklim-db-01`, through the `db_stack` and `wireguard` roles:
- `/opt/iklimco/db/mongodb/config/` dizinini oluşturur - Creates the `/opt/iklimco/db/mongodb/config/` directory
- `mongod.conf` dosyasını yerleştirir - Places the `mongod.conf` file
- WireGuard VPN sunucusunu kurar ve yapılandırır (`51820/udp`) - Installs and configures the WireGuard VPN server (`51820/udp`)
> DB servislerinin (PostgreSQL, MongoDB) Swarm'a deploy edilmesi Ansible'ın değil, Gitea CI/CD workflow'unun (`deploy-test.yml`) sorumluluğundadır. Bu workflow `docker-stack-infra.yml` aracılığıyla tüm servisleri tek seferde deploy eder. > Deploying DB services (PostgreSQL, MongoDB) to Swarm is the responsibility of the Gitea CI/CD workflow (`deploy-test.yml`), not Ansible. This workflow deploys all services at once through `docker-stack-infra.yml`.
## 4. Volume ve Veri Yapısı ## 4. Volume and Data Structure
DB verileri `iklim-db-01` üzerindeki Docker named volume'larında tutulur: DB data is stored in Docker named volumes on `iklim-db-01`:
| Volume | İçerik | | Volume | Content |
|---|---| |---|---|
| `iklim-db_postgresql_data` | PostgreSQL veri dosyaları | | `iklim-db_postgresql_data` | PostgreSQL data files |
| `iklim-db_mongodb_data` | MongoDB veri dosyaları | | `iklim-db_mongodb_data` | MongoDB data files |
MongoDB log'ları stdout'a yazılır (`docker logs` ile izlenir). Konfigürasyon: `/opt/iklimco/db/mongodb/config/mongod.conf` MongoDB logs are written to stdout and can be watched with `docker logs`. Configuration: `/opt/iklimco/db/mongodb/config/mongod.conf`
> StorageBox DB verisi için **kullanılmaz**. Yalnızca backup stratejisinde görev alır. > StorageBox is **not used** for DB data. It only has a role in the backup strategy.
## 5. Kabul Kriterleri ## 5. Acceptance Criteria
- `docker node ls` komutunda `iklim-db-01` Ready ve Active görünür. - `docker node ls` shows `iklim-db-01` as Ready and Active.
- `docker stack services iklim-db` her iki servisi 1/1 replica ile gösterir. - `docker stack services iklim-db` shows both services with 1/1 replicas.
- Uygulama node'undan `iklim-db_postgresql` ve `iklim-db_mongodb` DNS isimleriyle erişim sağlanır. - Services on the application node can reach the databases via the `iklim-db_postgresql` and `iklim-db_mongodb` DNS names.
- Reboot sonrası veriler named volume'lardan korunur (`docker volume ls` ile kontrol). - Data in the named volumes survives a reboot; verify with `docker volume ls`.
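The replica criterion can be checked automatically, for example at the end of a deploy job. This sketch reads `docker stack services` output formatted with the `{{.Name}}` and `{{.Replicas}}` placeholders and fails if any service runs fewer replicas than desired; the function name is illustrative.

```bash
# Read "<name> <running>/<desired>" lines on stdin and fail if any service
# has fewer running replicas than desired.
all_replicas_ready() {
    awk '{
        split($2, r, "/")
        if (r[1] != r[2]) { print "not ready: " $1 " " $2 > "/dev/stderr"; bad = 1 }
    } END { exit bad }'
}
```

Usage: `docker stack services iklim-db --format '{{.Name}} {{.Replicas}}' | all_replicas_ready`.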


@ -1,31 +1,31 @@
# 05 - Test Runner ve Deploy Ön Koşulları # 05 - Test Runner and Deploy Prerequisites
Bu aşamanın amacı test ortamında Gitea Actions runner'ı (`act_runner`) systemd servisi olarak kurmak ve CI/CD pipeline'larının çalışabileceği ortamı hazırlamaktır. The purpose of this phase is to install the Gitea Actions runner (`act_runner`) as a systemd service in the test environment and prepare the environment where CI/CD pipelines can run.
## Runner Yerleşimi ## Runner Placement
Test ortamında maliyet ve basitlik için tek runner kullanılır: A single runner is used in the test environment for cost and simplicity:
| Host | Servis Adı | Sistem Kullanıcısı | Etiketler | | Host | Service Name | System User | Labels |
| --- | --- | --- | --- | | --- | --- | --- | --- |
| `iklim-app-01` | `gitea-act-runner` | `gitea-runner` | `ubuntu-latest`, `ubuntu-22.04`, `ubuntu-20.04`, `test-runner` | | `iklim-app-01` | `gitea-act-runner` | `gitea-runner` | `ubuntu-latest`, `ubuntu-22.04`, `ubuntu-20.04`, `test-runner` |
## 1. Runner Kullanıcısı ve Yetkiler ## 1. Runner User and Permissions
Runner, host üzerinde Docker komutlarını çalıştırabilmelidir. The runner must be able to run Docker commands on the host.
```bash ```bash
# Kullanıcıyı oluştur # Create the user
sudo useradd -m -s /bin/bash gitea-runner sudo useradd -m -s /bin/bash gitea-runner
# Docker grubuna ekle # Add to the Docker group
sudo usermod -aG docker gitea-runner sudo usermod -aG docker gitea-runner
``` ```
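Group changes are only picked up by newly started processes, so a `gitea-act-runner` service that is already running needs a restart after `usermod -aG docker`. A quick membership check, as a sketch with an illustrative function name:

```bash
# Succeed if user $1 is a member of group $2 according to the group database.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}
```

Usage: `user_in_group gitea-runner docker || echo "gitea-runner is not in the docker group"`, followed by `sudo systemctl restart gitea-act-runner` if the group was just added.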
## 2. act_runner Kurulumu ## 2. act_runner Installation
### Kurulum ### Installation
Kurulum ve kayıt Ansible ile otomatik yapılır (`test-app-post-stack.yml`). Manuel kurulum gerekirse: Installation and registration are done automatically with Ansible (`test-app-post-stack.yml`). If manual installation is required:
```bash ```bash
wget -O act_runner https://dl.gitea.com/act_runner/0.2.12/act_runner-0.2.12-linux-amd64 wget -O act_runner https://dl.gitea.com/act_runner/0.2.12/act_runner-0.2.12-linux-amd64
@ -33,9 +33,9 @@ sudo mv act_runner /usr/local/bin/
sudo chmod +x /usr/local/bin/act_runner sudo chmod +x /usr/local/bin/act_runner
``` ```
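Since the binary is downloaded over the network and installed into `/usr/local/bin`, it is worth verifying its checksum before the `mv`. The sketch assumes you obtain the expected SHA-256 out of band, for example from the release page; it does not assume any particular checksum file exists on `dl.gitea.com`.

```bash
# Verify that file $1 has SHA-256 digest $2 (lowercase hex).
verify_sha256() {
    actual=$(sha256sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}
```

Usage: `verify_sha256 act_runner <expected-sha256> && sudo mv act_runner /usr/local/bin/`.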
### Kayıt (Registration) ### Registration
Gitea arayüzünden (Organization → Settings → Actions → Runners) **Registration Token** alın, vault'a ekleyin: Get the **Registration Token** from the Gitea UI (Organization -> Settings -> Actions -> Runners) and add it to the vault:
```yaml ```yaml
# group_vars/all/vault.yml # group_vars/all/vault.yml
@ -47,11 +47,11 @@ cd Environment_Infrastructure/ansible/test
ansible-playbook test-app-post-stack.yml --vault-password-file=.vault_pass ansible-playbook test-app-post-stack.yml --vault-password-file=.vault_pass
``` ```
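If the playbook cannot be used, the runner can be registered non-interactively with `act_runner register` and its `--no-interactive`, `--instance`, `--token`, `--name` and `--labels` options. In this sketch the Gitea URL is a placeholder, the default label is the one visible in the config excerpt below, and the `DRY_RUN` switch is only a convenience for reviewing the command before running it.

```bash
# Build (and optionally run) a non-interactive act_runner registration.
# Arguments: <gitea-url> <registration-token> <runner-name> [labels]
# Set DRY_RUN=1 to print the command instead of running it.
register_runner() {
    labels=${4:-test-runner:docker://ubuntu:22.04}
    ${DRY_RUN:+echo} act_runner register --no-interactive \
        --instance "$1" --token "$2" --name "$3" --labels "$labels"
}
```

Usage: `DRY_RUN=1 register_runner https://<gitea-host> "$TOKEN" iklim-app-01` to review, then run it again without `DRY_RUN`.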
## 3. Systemd Servisi ve Konfigürasyon ## 3. Systemd Service and Configuration
Ansible tarafından yönetilir. Servis dosyası `/etc/systemd/system/gitea-act-runner.service`, konfigürasyon `/etc/gitea-act-runner/config.yaml` konumundadır. Managed by Ansible. The service file is located at `/etc/systemd/system/gitea-act-runner.service`, and the configuration is located at `/etc/gitea-act-runner/config.yaml`.
Konfigürasyonun kritik bölümleri: Critical parts of the configuration:
```yaml ```yaml
runner: runner:
@ -62,58 +62,58 @@ runner:
- "test-runner:docker://ubuntu:22.04" - "test-runner:docker://ubuntu:22.04"
container: container:
network: "iklimco-net" # DB servislerine overlay üzerinden erişim network: "iklimco-net" # Access to DB services through overlay
options: "-v /var/run/docker.sock:/var/run/docker.sock" # Docker komutları için options: "-v /var/run/docker.sock:/var/run/docker.sock" # For Docker commands
``` ```
Durum kontrolü: Status check:
```bash ```bash
sudo systemctl status gitea-act-runner sudo systemctl status gitea-act-runner
sudo journalctl -u gitea-act-runner -f sudo journalctl -u gitea-act-runner -f
``` ```
## 4. Deploy Ön Koşulları ## 4. Deploy Prerequisites
Pipeline'ın `iklim-app-01` üzerinde başarılı deploy yapabilmesi için şu araçların kurulu olması şarttır: The following tools must be installed for the pipeline to deploy successfully on `iklim-app-01`:
- `docker-ce` ve `docker-compose-plugin` - `docker-ce` and `docker-compose-plugin`
- `gettext` (`envsubst` komutu için) - `gettext` for the `envsubst` command
- `jq` - `jq`
- `git` - `git`
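A pipeline can fail fast when one of these tools is missing. The sketch checks each command with `command -v` and checks Compose separately, because `docker-compose-plugin` provides the `docker compose` subcommand rather than a binary on `PATH`; the function names are illustrative.

```bash
# Print each command from the argument list that is not on PATH;
# return non-zero if any is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return $rc
}

deploy_preflight() {
    missing_tools docker envsubst jq git || return 1
    # docker-compose-plugin provides "docker compose", not a separate binary.
    docker compose version >/dev/null 2>&1 || { echo "missing: docker compose plugin"; return 1; }
}
```

Running `deploy_preflight` as the first job step on `iklim-app-01` surfaces a missing prerequisite before any image is pulled.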
## 5. Gitea Organization Secrets ## 5. Gitea Organization Secrets
Pipeline'ların çalışması için Gitea Organization seviyesinde şu secret'lar tanımlanmalıdır: The following secrets must be defined at Gitea Organization level for pipelines to run:
| Secret | Açıklama | | Secret | Description |
| --- | --- | | --- | --- |
| `STORAGEBOX_SSH_PRIV` | StorageBox SSH private key | | `STORAGEBOX_SSH_PRIV` | StorageBox SSH private key |
| `STORAGEBOX_SSH_PUB` | StorageBox SSH public key | | `STORAGEBOX_SSH_PUB` | StorageBox SSH public key |
| `HARBOR_CI_TOKEN` | `robot-ci-push-iklimco` robot hesabı token'ı (build + push) | | `HARBOR_CI_TOKEN` | `robot-ci-push-iklimco` robot account token (build + push) |
| `HARBOR_PULL_TOKEN` | `robot-swarm-pull-iklimco` robot hesabı token'ı (Swarm deploy pull) | | `HARBOR_PULL_TOKEN` | `robot-swarm-pull-iklimco` robot account token (Swarm deploy pull) |
| `REPO_ACCESS_TOKEN` | Gitea private repo erişimi (BE-Commons vb. checkout) | | `REPO_ACCESS_TOKEN` | Gitea private repo access (BE-Commons, etc. checkout) |
## 6. Custom Image Build ve Harbor Push ## 6. Custom Image Build and Harbor Push
`docker-stack-infra.yml` ve mikroservis stack'leri `registry.tarla.io/iklimco/` altındaki özel image'leri kullanır. Bu image'ler `ops/push-harbor-custom-images.sh` scripti ile build edilip registry'ye push edilir. `docker-stack-infra.yml` and microservice stacks use private images under `registry.tarla.io/iklimco/`. These images are built and pushed to the registry with the `ops/push-harbor-custom-images.sh` script.
APISIX config dosyaları (`build/apisix-core/config.yaml`, `build/apisix-dashboard/conf.yaml`) `template/` altındaki şablonlardan `envsubst` ile üretilir. Bu üretimi `push-harbor-custom-images.sh` kendi içinde yapar; build bitince geçici dosyalar otomatik temizlenir. APISIX config files (`build/apisix-core/config.yaml`, `build/apisix-dashboard/conf.yaml`) are generated from templates under `template/` with `envsubst`. `push-harbor-custom-images.sh` performs this generation internally; temporary files are cleaned automatically when the build finishes.
**Tasarım notu:** APISIX admin key image'a bake edilmez. Template'de `${{APISIX_ADMIN_KEY}}` (çift süslü parantez) kullanılır; APISIX bunu container başlarken Docker service ortam değişkeninden okur. Böylece tek image hem test hem prod için kullanılabilir. **Design note:** The APISIX admin key is not baked into the image. The template uses `${{APISIX_ADMIN_KEY}}` (double curly braces); APISIX reads it from the Docker service environment variable when the container starts. This allows one image to be used for both test and prod.
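The key survives the build because `envsubst` only replaces `$NAME` and `${NAME}` references. To `envsubst`, `${{APISIX_ADMIN_KEY}}` is not a variable reference, so the placeholder reaches the image unchanged and APISIX resolves it at startup. A small check, assuming `envsubst` from `gettext` is installed; the template lines are illustrative, not copied from `template/`.

```bash
# Render a one-line template the same way the build script does (envsubst).
render_line() {
    printf '%s\n' "$1" | envsubst
}

# Example with illustrative template content:
#   export APISIX_ADMIN_KEY=secret HOST_NAME=example
#   render_line 'key: ${{APISIX_ADMIN_KEY}}'   # -> key: ${{APISIX_ADMIN_KEY}}
#   render_line 'host: ${HOST_NAME}'           # -> host: example
```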
### Adımlar ### Steps
```bash ```bash
# 1. Harbor'a login ol # 1. Log in to Harbor
docker login registry.tarla.io -u robot-ci-push-iklimco docker login registry.tarla.io -u robot-ci-push-iklimco
# 2. Image'leri build edip push et (env'leri ve config dosyalarını script kendi üretir) # 2. Build and push the images; the script generates envs and config files itself
bash ops/push-harbor-custom-images.sh bash ops/push-harbor-custom-images.sh
``` ```
## Kabul Kriterleri ## Acceptance Criteria
1. Gitea Runners sayfasında `test-runner` etiketli runner **Idle** (yeşil) görünür. 1. The runner labeled `test-runner` appears as **Idle** (green) on the Gitea Runners page.
2. `runs-on: test-runner` kullanan bir workflow başarıyla tetiklenir. 2. A workflow using `runs-on: test-runner` is triggered successfully.
3. Job container'ı Docker daemon'a ve `iklimco-net` overlay network'üne erişebilir. 3. The job container can access the Docker daemon and the `iklimco-net` overlay network.
4. `8200/tcp` (Vault) portu public internete kapalıdır. 4. The `8200/tcp` (Vault) port is closed to the public internet.
5. `registry.tarla.io/iklimco/custom-apisix`, `custom-apisix-dashboard`, `custom-prometheus` image'leri Harbor'da mevcut ve çekilebilir durumda. 5. `registry.tarla.io/iklimco/custom-apisix`, `custom-apisix-dashboard`, and `custom-prometheus` images exist in Harbor and are pullable.


@ -1,23 +1,23 @@
# 06 - Prod Terraform IaC # 06 - Prod Terraform IaC
Bu asamanin amaci prod Hetzner Cloud Project icinde HA odakli IaaS kaynaklarini Terraform ile olusturmaktir. Bu dokuman prod Terraform ajanina tek basina verilebilir. The purpose of this phase is to create HA-focused IaaS resources inside the prod Hetzner Cloud Project with Terraform. This document is self-contained and can be handed to the prod Terraform agent on its own.
## Kapsam ## Scope
Terraform prod ortaminda sunlari olusturur: Terraform creates the following in the prod environment:
- Private network: `iklim-prod-net` - Private network: `iklim-prod-net`
- Subnetler: - Subnets:
- App/Swarm subnet: `10.20.10.0/24` - App/Swarm subnet: `10.20.10.0/24`
- DB subnet: `10.20.20.0/24` - DB subnet: `10.20.20.0/24`
- Firewall: - Firewall:
- Public ingress: sadece `22/tcp`, `80/tcp`, `443/tcp` - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
- Private ingress: `01-private-network-port-matrisi.md` dosyasindaki prod kurallari - Private ingress: prod rules in `01-private-network-port-matrisi.md`
- SSH key - SSH key
- Placement groups: - Placement groups:
- `iklim-prod-app-spread` - `iklim-prod-app-spread`
- `iklim-prod-db-spread` - `iklim-prod-db-spread`
- Floating IP: app entry point icin sabit IPv4 (`iklim-app-01`'e atanir) - Floating IP: stable IPv4 for the app entry point, assigned to `iklim-app-01`
- Servers: - Servers:
- `iklim-app-01` - `iklim-app-01`
- `iklim-app-02` - `iklim-app-02`
@ -27,16 +27,16 @@ Terraform prod ortaminda sunlari olusturur:
- `iklim-db-03` - `iklim-db-03`
- Ansible inventory output - Ansible inventory output
DB cluster yazilimi Terraform ile kurulmayacak. DB node'lari sadece makine, network ve firewall seviyesinde hazirlanacak. DB cluster software will not be installed with Terraform. DB nodes will be prepared only at the machine, network, and firewall level.
## Versiyon Gereksinimleri ## Version Requirements
```text ```text
Terraform >= 1.6 Terraform >= 1.6
hcloud provider ~> 1.49 hcloud provider ~> 1.49
``` ```
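These minimums can be enforced in a wrapper script before Terraform runs. This sketch relies on `sort -V` (version sort) and assumes the first line of `terraform version` looks like `Terraform v1.6.6`; the function names are illustrative.

```bash
# Succeed if version $1 >= version $2 (dotted numeric versions).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract "1.6.6" from a first line such as "Terraform v1.6.6".
terraform_version() {
    terraform version | head -n 1 | sed -n 's/^Terraform v\([0-9.]*\).*/\1/p'
}
```

Usage: `version_ge "$(terraform_version)" 1.6 || { echo "Terraform >= 1.6 required"; exit 1; }`.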
## Onerilen Dosya Yapisi ## Recommended File Structure
```text ```text
terraform/ terraform/
@ -55,13 +55,13 @@ terraform/
terraform.tfvars.example terraform.tfvars.example
``` ```
`terraform.tfvars`, state dosyalari ve token repo'ya commit edilmeyecek. `terraform.tfvars`, state files, and tokens will not be committed to the repo.
## Degiskenler ## Variables
`environment` sabiti `locals.tf` icindedir; `tfvars` ile override edilmez. The `environment` constant is defined in `locals.tf` and is not overridden through `tfvars`.
Minimum degiskenler: Minimum variables:
```hcl ```hcl
hcloud_token = "secret" hcloud_token = "secret"
@ -73,29 +73,24 @@ admin_ssh_public_key_path = "~/.ssh/id_ed25519.pub"
admin_allowed_cidrs = ["X.X.X.X/32"] admin_allowed_cidrs = ["X.X.X.X/32"]
``` ```
Server type karari `../hetzner-sizing-report.md` dokumanindaki mevcut test The server types were chosen based on the current test environment metrics in `../hetzner-sizing-report.md` and the prod cluster topology. `cpx42` is recommended for prod app nodes because of the memory pressure from the Java microservices; the more economical `cpx32` is recommended for prod DB nodes because the cluster starts with 3 nodes. Once capacity needs are confirmed by metrics, nodes can be added or rescaled in place.
ortami metrikleri ve prod cluster topolojisi dikkate alinarak belirlenmistir.
Prod app node'lar icin Java mikroservis bellek baskisi nedeniyle `cpx42`,
prod DB node'lar icin ise 3 node cluster baslangici nedeniyle ekonomik
`cpx32` onerilir. Kapasite ihtiyaci metriklerle dogrulandiginda node ekleme
veya in-place rescale yapilabilir.
## Server Rolleri ve Private IP Plani ## Server Roles and Private IP Plan
| Server | Private IP | Rol | | Server | Private IP | Role |
| --- | --- | --- | | --- | --- | --- |
| `iklim-app-01` | `10.20.10.11` | Swarm manager + app worker + runner (primary, FIP alir) | | `iklim-app-01` | `10.20.10.11` | Swarm manager + app worker + runner; primary, receives FIP |
| `iklim-app-02` | `10.20.10.12` | Swarm manager + app worker + runner | | `iklim-app-02` | `10.20.10.12` | Swarm manager + app worker + runner |
| `iklim-app-03` | `10.20.10.13` | Swarm manager + app worker + runner | | `iklim-app-03` | `10.20.10.13` | Swarm manager + app worker + runner |
| `iklim-db-01` | `10.20.20.11` | Manuel DB cluster node | | `iklim-db-01` | `10.20.20.11` | Manual DB cluster node |
| `iklim-db-02` | `10.20.20.12` | Manuel DB cluster node | | `iklim-db-02` | `10.20.20.12` | Manual DB cluster node |
| `iklim-db-03` | `10.20.20.13` | Manuel DB cluster node | | `iklim-db-03` | `10.20.20.13` | Manual DB cluster node |
Private IP'ler `locals.tf` icinde `swarm_private_ips` ve `db_private_ips` map'leri olarak sabit tanimlanir. Sunucu listesi `for_each` ile bu map'lerden turetilir. Private IPs are statically defined inside `locals.tf` as the `swarm_private_ips` and `db_private_ips` maps. The server list is derived from these maps with `for_each`.
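A quick consistency check that every planned address falls inside its subnet can catch typos in the `locals.tf` maps. The sketch handles only `/24` networks (it compares the first three octets), which matches the two subnets used here; it is not a general CIDR calculator, and the function names are illustrative.

```bash
# Succeed if IPv4 address $1 lies inside /24 network $2 (e.g. 10.20.10.0/24).
in_subnet24() {
    net=${2%/*}
    [ "${2#*/}" = "24" ] || return 2
    [ "${1%.*}" = "${net%.*}" ]
}

# Check the planned private IPs from the table against their subnets.
check_ip_plan() {
    rc=0
    for ip in 10.20.10.11 10.20.10.12 10.20.10.13; do
        in_subnet24 "$ip" 10.20.10.0/24 || { echo "app IP outside subnet: $ip"; rc=1; }
    done
    for ip in 10.20.20.11 10.20.20.12 10.20.20.13; do
        in_subnet24 "$ip" 10.20.20.0/24 || { echo "db IP outside subnet: $ip"; rc=1; }
    done
    return $rc
}
```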
## Onerilen Kaynaklar ve Maliyet ## Recommended Resources and Cost
| Server | Rol | Server Type | CPU | RAM | SSD | Aylik | | Server | Role | Server Type | CPU | RAM | SSD | Monthly |
| --- | --- | --- | ---: | ---: | ---: | ---: | | --- | --- | --- | ---: | ---: | ---: | ---: |
| `iklim-app-01` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 | | `iklim-app-01` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
| `iklim-app-02` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 | | `iklim-app-02` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
@ -103,41 +98,41 @@ Private IP'ler `locals.tf` icinde `swarm_private_ips` ve `db_private_ips` map'le
| `iklim-db-01` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 | | `iklim-db-01` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| `iklim-db-02` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 | | `iklim-db-02` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| `iklim-db-03` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 | | `iklim-db-03` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| **Toplam** | 6 server | | **36 vCPU** | **72 GB** | **1,440 GB** | **$139.44** | | **Total** | 6 servers | | **36 vCPU** | **72 GB** | **1,440 GB** | **$139.44** |
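The totals follow directly from the per-node figures: three `cpx42` nodes at $29.99 and three `cpx32` nodes at $16.49 per month. A worked check with `awk`, using only the numbers from the table:

```bash
# Recompute the table totals from per-type counts and specs.
# Fields: count vcpu ram_gb ssd_gb monthly_usd
plan_totals() {
    printf '%s\n' "3 8 16 320 29.99" "3 4 8 160 16.49" |
        awk '{ cpu += $1*$2; ram += $1*$3; ssd += $1*$4; usd += $1*$5 }
             END { printf "%d vCPU, %d GB RAM, %d GB SSD, $%.2f/month\n", cpu, ram, ssd, usd }'
}
```

`plan_totals` prints `36 vCPU, 72 GB RAM, 1440 GB SSD, $139.44/month`, which matches the **Total** row.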
## Placement Group Karari ## Placement Group Decision
Prod icin iki ayri spread placement group: Two separate spread placement groups for prod:
```text ```text
iklim-prod-app-spread: iklim-app-01/02/03 iklim-prod-app-spread: iklim-app-01/02/03
iklim-prod-db-spread: iklim-db-01/02/03 iklim-prod-db-spread: iklim-db-01/02/03
``` ```
Bu sayede Swarm quorum node'lari kendi aralarinda farkli fiziksel host'lara, DB node'lari da kendi aralarinda farkli fiziksel host'lara yerlestirilmeye calisilir. This aims to place each Swarm quorum node on a different physical host, and likewise each DB node on a different physical host.
Notlar: Notes:
- Hetzner kabinet secimi dogrudan sunmaz. - Hetzner does not provide direct cabinet selection.
- Spread placement group farkli fiziksel host hedefler. - A spread placement group targets different physical hosts.
- Farkli lokasyon/region felaket kurtarma bu asamada konu disidir. - Disaster recovery across different locations/regions is outside the scope of this phase.
- Ileride scale buyudugunde multi-location DR ayri tasarlanmalidir. - Multi-location DR must be designed separately once the scale grows.
## Floating IP ## Floating IP
`iklim-prod-app-fip` adli IPv4 floating IP olusturulur ve `iklim-app-01`'e atanir. DNS A kaydi bu IP'ye yonlendirilir. Failover gerekirse floating IP baska bir app node'una tasinabilir. An IPv4 floating IP named `iklim-prod-app-fip` is created and assigned to `iklim-app-01`. The DNS A record is pointed to this IP. If failover is needed, the floating IP can be moved to another app node.
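Moving the floating IP can be done with the `hcloud` CLI's `floating-ip assign` command. This sketch adds a dry-run switch so the command can be reviewed first; the function name and target server are examples. Hetzner does not configure a floating IP inside the guest OS, so the target node must already have the address configured on its public interface for traffic to be accepted after the move.

```bash
# Reassign the prod floating IP to another app node.
# Set DRY_RUN=1 to print the command instead of running it.
failover_fip() {
    [ -n "$1" ] || { echo "usage: failover_fip <server-name>" >&2; return 2; }
    ${DRY_RUN:+echo} hcloud floating-ip assign iklim-prod-app-fip "$1"
}
```

Usage: `DRY_RUN=1 failover_fip iklim-app-02` to review, then `failover_fip iklim-app-02`.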
## Public Firewall ## Public Firewall
Public ingress: Public ingress:
| Port | Kaynak | Hedef | | Port | Source | Target |
| --- | --- | --- | | --- | --- | --- |
| `22/tcp` | `admin_allowed_cidrs` | Tum prod node'lari | | `22/tcp` | `admin_allowed_cidrs` | All prod nodes |
| `80/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-*` (Floating IP uzerinden) | | `80/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-*` through Floating IP |
| `443/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-*` (Floating IP uzerinden) | | `443/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-*` through Floating IP |
Prod'da su portlar public acilmayacak: The following ports will not be opened publicly in prod:
- `8200/tcp` Vault - `8200/tcp` Vault
- `5432/tcp` PostgreSQL - `5432/tcp` PostgreSQL
@ -153,27 +148,27 @@ Prod'da su portlar public acilmayacak:
### App (swarm) Firewall — Private Ingress ### App (swarm) Firewall — Private Ingress
App subnet kaynakli (`10.20.10.0/24`): Sourced from the app subnet (`10.20.10.0/24`):
| Port | Servis | Erisim yontemi | | Port | Service | Access method |
| --- | --- | --- | | --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | App subnet icinden | | `2377/tcp` | Docker Swarm control plane | From app subnet |
| `7946/tcp,udp` | Docker Swarm node discovery | App subnet icinden | | `7946/tcp,udp` | Docker Swarm node discovery | From app subnet |
| `4789/udp` | Docker Swarm VXLAN overlay | App subnet icinden | | `4789/udp` | Docker Swarm VXLAN overlay | From app subnet |
| `8200/tcp` | Vault | Docker overlay / private network | | `8200/tcp` | Vault | Docker overlay / private network |
| `6379/tcp` | Redis | App subnet icinden | | `6379/tcp` | Redis | From app subnet |
| `5672/tcp` | RabbitMQ AMQP | App subnet icinden | | `5672/tcp` | RabbitMQ AMQP | From app subnet |
| `61613/tcp` | RabbitMQ STOMP | App subnet icinden | | `61613/tcp` | RabbitMQ STOMP | From app subnet |
| `15674/tcp` | RabbitMQ Web STOMP | App subnet icinden | | `15674/tcp` | RabbitMQ Web STOMP | From app subnet |
| `15672/tcp` | RabbitMQ Management | SWAG arkasinda `443` — IP kisitli | | `15672/tcp` | RabbitMQ Management | Behind SWAG `443` — IP restricted |
| `9000/tcp` | APISIX Dashboard | SWAG arkasinda `443` — IP kisitli | | `9000/tcp` | APISIX Dashboard | Behind SWAG `443` — IP restricted |
| `9180/tcp` | APISIX Admin API | Docker overlay icinden sadece Dashboard erisir | | `9180/tcp` | APISIX Admin API | Reachable only by the Dashboard over the Docker overlay |
| `9090/tcp` | Prometheus | SWAG arkasinda `443` — IP kisitli | | `9090/tcp` | Prometheus | Behind SWAG `443` — IP restricted |
| `3000/tcp` | Grafana | SWAG arkasinda `443` — IP kisitli | | `3000/tcp` | Grafana | Behind SWAG `443` — IP restricted |

From the DB subnet, because the `iklim-db-*` nodes join the Swarm as workers:

| Port | Service | Source |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | `10.20.20.0/24` |
| `7946/tcp,udp` | Docker Swarm node discovery | `10.20.20.0/24` |
@ -181,150 +176,146 @@ From the DB subnet, because the `iklim-db-*` nodes join the Swarm as workers:
### DB Firewall — Private Ingress

Admin access:

| Port | Service | Source |
| --- | --- | --- |
| `22/tcp` | SSH | `admin_allowed_cidrs` |

From the app subnet (`10.20.10.0/24`):

| Port | Service | Note |
| --- | --- | --- |
| `5432/tcp` | PostgreSQL (Patroni primary) | App subnet access |
| `27017/tcp` | MongoDB replica set endpoint | App subnet access |
| `2379/tcp` | etcd client (Patroni + APISIX) | App subnet access |
| `2377/tcp` | Docker Swarm control plane | From the app subnet |
| `7946/tcp,udp` | Docker Swarm node discovery | From the app subnet |
| `4789/udp` | Docker Swarm VXLAN overlay | From the app subnet |

Mutual access within the DB subnet (`10.20.20.0/24`):

| Port | Service | Note |
| --- | --- | --- |
| `5432/tcp` | PostgreSQL Patroni replication | Between DB nodes |
| `27017/tcp` | MongoDB replica set internal traffic | Between DB nodes |
| `2379/tcp` | etcd client | Patroni -> etcd |
| `2380/tcp` | etcd peer | etcd cluster internal |
| `8008/tcp` | Patroni REST API | Patroni leader election and health checks |
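
When reading the tables above, it can help to check which subnet a source address actually belongs to, for example before adding a rule. The following minimal Bash sketch assumes only the two prod subnets listed here. `ip_to_int`, `ip_in_cidr`, and `subnet_of` are hypothetical helper names, not scripts from this repository.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Return 0 if IPv4 address $1 lies inside CIDR block $2 (e.g. 10.20.20.0/24).
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# Classify an address against the prod subnets (assumed: app and DB only).
subnet_of() {
  if ip_in_cidr "$1" 10.20.10.0/24; then echo app
  elif ip_in_cidr "$1" 10.20.20.0/24; then echo db
  else echo other
  fi
}
```

For example, `subnet_of 10.20.20.12` prints `db`, which matches the source column used in the DB subnet rules.
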

IP restrictions for the SWAG-fronted dashboards (RabbitMQ Management, APISIX Dashboard, Prometheus, Grafana) are enforced in the SWAG nginx configuration, not in the Hetzner firewall.
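
For illustration only, here is a sketch of how an IP allowlist for one of the SWAG-fronted dashboards could be generated. It relies on nginx's standard `allow`/`deny` directives, which are evaluated in order until the first match. The function name and the include file path in the example comment are hypothetical; neither is taken from the existing SWAG configuration.

```bash
#!/usr/bin/env bash
# Print nginx access rules that allow only the given CIDRs and deny everyone else.
# nginx checks allow/deny rules in order and stops at the first match,
# so the final "deny all" only applies to addresses not listed before it.
render_allowlist() {
  local cidr
  for cidr in "$@"; do
    printf 'allow %s;\n' "$cidr"
  done
  printf 'deny all;\n'
}

# Example (hypothetical path); include the file inside each dashboard's location block:
#   render_allowlist 203.0.113.10/32 198.51.100.0/24 > /path/to/admin-allowlist.inc
```
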

## Outputs

The following values are available after `terraform apply`, or at any time with `terraform output`:

| Output | Description |
| --- | --- |
| `ansible_inventory_yaml` | Ansible inventory YAML, written to `ansible/inventory/generated/prod.yml` |
| `prod_private_ips` | Private IP map of all nodes, with `swarm` and `db` subkeys |
| `prod_public_ips` | Public IPv4 map of all nodes |
| `prod_floating_ip` | Floating IP address of the Swarm entry point; the DNS A records point to this IP |

To export the Ansible inventory (run from `terraform/hetzner/prod`, three levels below the repository root):

```bash
terraform output -raw ansible_inventory_yaml > \
../../../ansible/inventory/generated/prod.yml
```
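
A plain `> prod.yml` redirection truncates the inventory before `terraform output` runs. A failed call (wrong directory, missing state) therefore leaves an empty inventory behind. The sketch below only replaces the file when the command succeeds. `write_if_ok` is a hypothetical helper, not part of the repo.

```bash
#!/usr/bin/env bash
# Run a command and write its stdout to DEST only if the command succeeds.
# The temp file lives in DEST's directory so the final mv is an atomic rename.
write_if_ok() {
  local dest=$1; shift
  local tmp
  tmp=$(mktemp "$(dirname "$dest")/.tmp.XXXXXX") || return 1
  if "$@" > "$tmp"; then
    mv -f "$tmp" "$dest"
  else
    local rc=$?   # exit status of the failed command
    rm -f "$tmp"
    return "$rc"
  fi
}

# Example (from terraform/hetzner/prod):
#   write_if_ok ../../../ansible/inventory/generated/prod.yml \
#     terraform output -raw ansible_inventory_yaml
```
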

## Lifecycle and Resize Policy

### `server_type` Change (Resize)

Changing `server_type` does **not** trigger a Terraform destroy+create. The `hcloud` provider handles it natively: it stops the server, calls the Hetzner Resize API, and starts the server again. Update the value in `terraform.tfvars` and run `terraform apply`.

The resize causes downtime because the server is stopped and restarted, but the disk, installed software, and Docker volumes are preserved. No `ignore_changes` or manual step is required.

### Which Changes Force Server Recreation?

| Changed field | Behavior | Note |
| --- | --- | --- |
| `server_type` | In-place resize (provider native) | `terraform apply` is enough |
| `hcloud_server_network` | Only the attachment is updated | Managed as a separate resource |
| `hcloud_firewall_attachment` | Only the attachment is updated | Managed as a separate resource |
| `placement_group_id` | Hetzner API does not allow changing it -> destroy+create | Do not change |
| `image` | Disk image changes -> destroy+create | Do not change |
| `location` | Server cannot be moved to another datacenter -> destroy+create | Do not change |
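
Some of these changes silently turn into destroy+create, so it can be worth scanning the plan before applying. The rough sketch below assumes Terraform's human-readable plan text (run with `-no-color`), where replaced and destroyed resources are announced with lines such as `# <address> must be replaced`. Servers with `prevent_destroy` already make `plan` fail, so this check mainly catches the other resources. `risky_plan_changes` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Read `terraform plan -no-color` text on stdin and print the address of every
# resource Terraform intends to replace or destroy. Returns 1 if any are found.
risky_plan_changes() {
  local found
  found=$(grep -E '^[[:space:]]*# .* (must be replaced|will be replaced|will be destroyed)' \
    | sed -E 's/^[[:space:]]*# ([^ ]+) .*/\1/')
  [ -z "$found" ] && return 0
  printf '%s\n' "$found"
  return 1
}

# Example:
#   terraform plan -no-color | risky_plan_changes || echo "review replacements before applying"
```
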

### Network and Firewall Attachment Separation

The `network` block and `firewall_ids` are not embedded in `hcloud_server`. Instead, separate resources are defined:

- `hcloud_server_network`: private IP assignment, one per node via `for_each`
- `hcloud_firewall_attachment`: firewall association, using the server list derived from `for_each`

### `prevent_destroy` Protection

Every server has `lifecycle { prevent_destroy = true }`. To delete a server intentionally, first remove the lifecycle block temporarily.

## How to Run

### Preparation

**1. Create `terraform.tfvars` (once):**

```bash
cd Environment_Infrastructure/terraform/hetzner/prod
cp terraform.tfvars.example terraform.tfvars
# Fill terraform.tfvars with real values
# (hcloud_token, admin_allowed_cidrs, etc.)
```

`terraform.tfvars` is never committed; `.gitignore` excludes it.
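
This quick pre-flight check confirms that the variables mentioned above are actually set in `terraform.tfvars` before running `plan`. It is a sketch that only looks for a `key =` assignment line and does not validate values. `missing_tfvars` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print each required variable that has no assignment line in the tfvars file.
# Commented-out lines (starting with #) do not count. Returns 1 if any are missing.
missing_tfvars() {
  local file=$1; shift
  local key rc=0
  for key in "$@"; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}

# Example:
#   missing_tfvars terraform.tfvars hcloud_token admin_allowed_cidrs || echo "fill in terraform.tfvars first"
```
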

**2. Install the provider (once):**

```bash
terraform init
```

### First Apply

```bash
# Show what will be created; makes no changes
terraform plan
# Confirm and create
terraform apply
```

After `apply`, 6 servers, 2 firewalls, 1 Floating IP, and the network resources are visible in Hetzner.
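
To confirm the six servers from outside Terraform, list the server names with the `hcloud` CLI and compare them against the expected set. This sketch assumes the CLI is configured for the prod project. `expected_servers` is a hypothetical helper that reads one name per line.

```bash
#!/usr/bin/env bash
# Compare server names (one per line on stdin) with the six expected prod servers.
# Prints "missing: <name>" for each absent server; returns 1 if any are missing.
expected_servers() {
  local expected="iklim-app-01 iklim-app-02 iklim-app-03 iklim-db-01 iklim-db-02 iklim-db-03"
  local actual name rc=0
  actual=$(cat)
  for name in $expected; do
    if ! grep -qx "$name" <<< "$actual"; then
      echo "missing: $name"
      rc=1
    fi
  done
  return "$rc"
}

# Example:
#   hcloud server list -o noheader -o columns=name | expected_servers
```
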

### Get the Ansible Inventory

```bash
terraform output -raw ansible_inventory_yaml > \
../../../ansible/inventory/generated/prod.yml
```

### Gitea Variable: `PROD_FLOATING_IP`

The deploy pipeline needs this variable to manage DNS records automatically. Set it once after `terraform apply`:

```bash
# -raw prints the bare address, without the quotes a plain `terraform output` adds
terraform output -raw prod_floating_ip
```

Add the printed IP address under Gitea -> project settings -> **Variables** with the name `PROD_FLOATING_IP`. The pipeline reads it via `vars.PROD_FLOATING_IP` and updates the GoDaddy A records idempotently.
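
`terraform output -raw prod_floating_ip` prints the bare address; without `-raw`, Terraform prints it in quotes, and those quotes would end up in the Gitea variable. This small sketch validates the value before it is pasted anywhere. `is_ipv4` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Return 0 if $1 is a dotted-quad IPv4 address with every octet in 0-255.
is_ipv4() {
  local ip=$1 octet
  [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for octet in "${BASH_REMATCH[@]:1}"; do
    (( 10#$octet <= 255 )) || return 1   # 10# avoids octal parsing of e.g. "08"
  done
}

# Example:
#   ip=$(terraform output -raw prod_floating_ip)
#   is_ipv4 "$ip" || echo "unexpected output: $ip" >&2
```
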

### Resize (Change Server Type)

Change the `server_type_swarm` or `server_type_db` value in `terraform.tfvars`, then run:

```bash
terraform apply
```

The server is stopped, the Hetzner Resize API is called, and the server is started again. Disk and Docker volumes are preserved, but there is downtime. Because `server_type_swarm` and `server_type_db` each apply to all three nodes of a group, Terraform may resize those nodes in parallel. Applying one node at a time with `-target` avoids taking the whole group down at once.

### Server Deletion (Forced)

Because `prevent_destroy = true` is set, a normal `terraform destroy` fails. First, temporarily comment out the `lifecycle` block in `servers.tf`:

```hcl
# lifecycle {
@ -332,26 +323,26 @@ The server is stopped, the Hetzner Resize API is called, and the server is started again. Disk and Doc
# }
```

Then:

```bash
# Quote the address so the shell keeps the brackets and the inner double quotes
terraform destroy -target='hcloud_server.swarm["iklim-app-01"]'
```

After the operation completes, restore the lifecycle block.

### State Management

Local state (`terraform.tfstate`) is used for now, and the state file is not committed to the repo. Once more than one person runs Terraform, switch to a remote state backend such as Hetzner Object Storage or HCP Terraform.

## Acceptance Criteria

- `terraform plan` runs only with the prod Hetzner Project token.
- 6 servers are created: `iklim-app-01/02/03`, `iklim-db-01/02/03`.
- Swarm nodes are in the `iklim-prod-app-spread` placement group.
- DB nodes are in the `iklim-prod-db-spread` placement group.
- The public firewall allows only `22`, `80`, and `443` ingress.
- The private firewall rules match `01-private-network-port-matrisi.md`.
- DB replication ports are reachable only from the DB subnet.
- The Floating IP is created and assigned to `iklim-app-01`.
- Terraform state and secret tfvars are not committed.


@ -1,12 +1,12 @@
# 07 - Prod Ansible Bootstrap

This phase prepares the prod machines created by Terraform in terms of Linux setup, security hardening, Docker, and Swarm. The playbook does not install the DB cluster software; however, the DB nodes still join the Swarm as workers.

## Ansible Installation

Ansible must be installed on the control machine (your own computer). No agent is installed on the target servers; SSH access is enough.

### Installation by Operating System

- **Ubuntu / Debian:**
```bash
@ -17,6 +17,7 @@ Ansible must be installed on the control machine (your own computer).
export PATH="$HOME/.local/bin:$PATH"
pipx install --include-deps ansible
pipx install ansible-lint
```
- **Fedora / Rocky Linux / RHEL:**
@ -27,6 +28,7 @@ Ansible must be installed on the control machine (your own computer).
export PATH="$HOME/.local/bin:$PATH"
pipx install --include-deps ansible
pipx install ansible-lint
```
- **macOS (Homebrew):**
@ -34,75 +36,76 @@ Ansible must be installed on the control machine (your own computer).
brew install ansible
```
- **With pipx (any platform):**
```bash
pipx install --include-deps ansible
pipx install ansible-lint
```

### Additional Python Dependencies

The `password_hash` filter needs `passlib` on the control machine:

```bash
pipx inject ansible passlib
```

> If you installed with `pip`: `pip install passlib`

### Verify the Installation

Whichever method you used, verify the installation with:

```bash
# Check the Ansible version and configuration paths
ansible --version
# Check which location the Ansible binary runs from
which -a ansible
```
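
On a fresh control machine it is easy to end up with only part of the toolchain on `PATH`, for example when `~/.local/bin` is missing from `PATH` after a pipx install. This small sketch lists the missing commands. It does not check `passlib`, which lives inside the pipx virtualenv. `missing_cmds` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Print every command from the argument list that is not found on PATH.
# Returns 1 if at least one command is missing.
missing_cmds() {
  local cmd rc=0
  for cmd in "$@"; do
    command -v "$cmd" > /dev/null 2>&1 || { echo "$cmd"; rc=1; }
  done
  return "$rc"
}

# Example:
#   missing_cmds ansible ansible-playbook ansible-galaxy ansible-lint || echo "install the missing tools first"
```
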

## Running Ansible Commands

Run all commands from the `ansible/prod/` directory; its `ansible.cfg` sets the inventory and `roles_path` automatically.

### 0. Install the Required Collections (once, on first setup)

```bash
ansible-galaxy collection install -r ../requirements.yml
```

### 1. Connection Test (Ping)

```bash
ansible all -m ping
```

### 2. Run the Bootstrap Playbook

```bash
ansible-playbook prod-bootstrap.yml --ask-vault-pass
```

*Note: `--ask-vault-pass` prompts for the Ansible Vault password, which decrypts vault-encrypted values such as the StorageBox password.*

### 3. Run Only a Specific Role (Tags)

```bash
ansible-playbook prod-bootstrap.yml --tags "hardening" --ask-vault-pass
```

## Target Machines

| Host | Role |
| --- | --- |
| `iklim-app-01` | Swarm manager + app worker |
| `iklim-app-02` | Swarm manager + app worker |
| `iklim-app-03` | Swarm manager + app worker |
| `iklim-db-01` | Swarm worker; DB cluster node (DB cluster set up manually) |
| `iklim-db-02` | Swarm worker; DB cluster node (DB cluster set up manually) |
| `iklim-db-03` | Swarm worker; DB cluster node (DB cluster set up manually) |

## Recommended File Structure

```text
ansible/
@ -130,11 +133,11 @@ ansible/
## Base Role

Applied to all prod nodes:

- Package cache update
- `epel-release`: installed first as a separate task, because `fail2ban`, `davfs2`, `htop`, and `btop` come from this repo
- Base packages (installed after `epel-release` is active):
  - `curl`
  - `wget`
  - `git`
@ -142,45 +145,45 @@ Applied to all prod nodes:
  - `tar`
  - `unzip`
  - `bash-completion`
  - `gettext`: provides `envsubst`, required by the CI/CD deploy pipelines
  - `tree`
  - `ca-certificates`
  - `fail2ban`
  - `chrony`
  - `python3`
  - `python3-pip`
  - `python3-passlib`: for the `password_hash` filter (EPEL)
  - `htop`: interactive process monitor (EPEL)
  - `btop`: resource monitor with a graphical terminal interface (EPEL)
- Timezone: `Europe/Istanbul`
- Hostname setup
- Keyboard layout: `trq` (Turkish Q)
- chrony/NTP enabled

## Security Hardening Role

Applied to all prod nodes:

- SSH password authentication is disabled.
- Root SSH login is disabled.
- Only SSH key authentication remains.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- `fail2ban` is enabled.
- Automatic security updates are enabled with `dnf-automatic`.
- The `iklim` system user is created and added to the `wheel` group; its password is read from the vault.
- `firewalld` default: incoming deny (`drop` zone), outgoing allow.
- The SSH rule is first added to the `drop` zone as a rich rule, and only then is the default zone switched to `drop`, so SSH access is not cut off mid-run.
- SSH is opened only from the admin CIDR.
- DB ports are not opened publicly.

The Hetzner Cloud Firewall is the primary perimeter; firewalld is a second line of defense on the host.
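
The SSH settings above can be checked on a node with `sshd -T`, which prints the effective server configuration as lowercase `keyword value` lines. This sketch compares that output against the hardening list. `check_sshd_hardening` is a hypothetical helper, and its expected values are taken from the bullets above.

```bash
#!/usr/bin/env bash
# Read `sshd -T` output on stdin and report settings that differ from the
# hardening baseline. Returns 1 if any setting is unexpected or missing.
check_sshd_hardening() {
  local effective want key rc=0
  effective=$(cat)
  for want in "passwordauthentication no" "permitrootlogin no" \
              "permitemptypasswords no" "pubkeyauthentication yes" \
              "maxauthtries 3"; do
    if ! grep -qx "$want" <<< "$effective"; then
      key=${want%% *}
      echo "unexpected: $(grep "^$key " <<< "$effective" || echo "$key <unset>")"
      rc=1
    fi
  done
  return "$rc"
}

# Example on a node:
#   sudo sshd -T | check_sshd_hardening
#   sudo firewall-cmd --get-default-zone   # expected: drop
```
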

## Docker Role

Required on all prod nodes, both app and DB. Because the DB nodes join the Swarm as workers, Docker Engine must be installed on every machine.

Packages to install:

- `docker-ce`
- `docker-ce-cli`
@ -188,33 +191,36 @@ Packages to install:
- `docker-buildx-plugin`
- `docker-compose-plugin`

Installation uses the official Docker dnf repository (`https://download.docker.com/linux/rhel/docker-ce.repo`).

## Swarm Role

The prod Swarm is set up with 3 managers:

1. Run `docker swarm init` on `iklim-app-01` (advertise and data path address: `10.20.10.11`).
2. `iklim-app-02` and `iklim-app-03` join as managers.
3. `iklim-db-01/02/03` join as workers.
4. The `iklimco-net` overlay network is created.
5. Node labels:
   - `iklim-app-*` -> `type=service`
   - `iklim-db-*` -> `role=db`, `db-index=01/02/03` (used for Patroni node coordination)
6. All nodes remain `AVAILABILITY=Active`.

The `db-index` labels are not added by the swarm role; a separate play in `prod-bootstrap.yml` adds them through `iklim-app-01`.
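
For manual inspection or recovery, the labels from step 5 correspond to `docker node update --label-add` calls run on a manager node. This sketch only prints those commands as a dry run. It is not the Ansible play itself, and it assumes the Swarm hostnames match the inventory names. `swarm_label_cmds` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Print (do not run) the docker commands that apply the node labels above.
# Printing keeps this a reviewable dry run.
swarm_label_cmds() {
  local i
  for i in 01 02 03; do
    echo "docker node update --label-add type=service iklim-app-$i"
  done
  for i in 01 02 03; do
    echo "docker node update --label-add role=db --label-add db-index=$i iklim-db-$i"
  done
}

# Example, on a manager node after reviewing the output:
#   swarm_label_cmds | sh
```
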

## Node Directory Role

On all `iklim-app-*` nodes:

```text
/opt/iklimco/ssl
/opt/iklimco/init
/opt/iklimco/stacks
/opt/iklimco/vault/data
```

`/opt/iklimco/vault/data` is the host-path volume of the Vault Raft node, so it must be created on every app node. Swarm does not manage this directory as a volume; if it is missing, the Vault container does not start.

On DB nodes:

```text
/opt/iklimco/db
/opt/iklimco/backup
@ -222,23 +228,23 @@ On DB nodes:
## StorageBox DAVFS Mount Role

Applied to every node (all `iklim-app-*` and `iklim-db-*`).

### Prod Sub-Account

| Parameter | Variable | Value |
| --- | --- | --- |
| Main account | `storagebox_account` | `u469968` |
| Sub-account | `storagebox_user` | `u469968-sub5` |
| WebDAV URL | `storagebox_url` | `https://u469968-sub5.your-storagebox.de/` |
| Mount point | `storagebox_mount_point` | `/mnt/storagebox` |
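
To confirm the mount on a node, search the kernel's mount table for the mount point. This sketch reads `/proc/mounts`, or any file in the same format, which keeps it testable. It prints the source and filesystem type but does not assert a specific type, because the type reported for a davfs2 mount may vary with how it is mounted. `mount_info` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Print "<source> <fstype>" for a mount target found in a /proc/mounts-style
# file (default: /proc/mounts). Returns 1 if the target is not mounted.
mount_info() {
  local target=$1 mounts=${2:-/proc/mounts}
  awk -v t="$target" '$2 == t { print $1, $3; found = 1 } END { exit !found }' "$mounts"
}

# Example on a node:
#   mount_info /mnt/storagebox || echo "StorageBox is not mounted" >&2
```
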

## StorageBox SSH Key Role

Applied to every node. An ed25519 key pair is generated on the server at `/root/.ssh/id_ed25519_storagebox`. Uploading the generated public key to the StorageBox main account (SSH `authorized_keys`) is a separate manual step:

```bash
# For each node:
cat /root/.ssh/id_ed25519_storagebox.pub | \
ssh -p 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de \
"cat >> .ssh/authorized_keys"
@ -246,15 +252,15 @@ cat /root/.ssh/id_ed25519_storagebox.pub | \
## Act Runner Role

Applied to `iklim-app-*` nodes. Gitea Act Runner is installed on each app node and runs as a systemd service. In prod, a runner runs on each of the 3 app nodes, and a deploy pipeline job can be picked up by any of them.

## DB Stack Role

Applied to `iklim-db-*` nodes. On each DB node it creates the `/opt/iklimco/db` and `/opt/iklimco/backup` directories, plus a local reference directory for MongoDB. The actual production configuration is set up on StorageBox during the `08-prod-db-cluster-kurulum.md` step. This covers the node-specific `mongod.conf`, the replica set auth key, and the Patroni and etcd configurations, under `/mnt/storagebox/prod/db/mongodb-0X/config/`, `/mnt/storagebox/prod/db/postgresql-0X/config/`, and `/mnt/storagebox/prod/db/etcd-0X/data/`.

## /opt/iklimco/stacks/.env

The password variables required by the DB cluster stacks are stored in `/opt/iklimco/stacks/.env`. The source copy is kept on StorageBox as `prod/secrets/iklim.co/.env.stacks`. Before the first deploy, fetch it on `iklim-app-01` with:

```bash
scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.stacks \
@ -262,36 +268,58 @@ scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.
chmod 600 /opt/iklimco/stacks/.env
```
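
This small check verifies the result of the two commands above: the file exists, is not empty, and has mode `600`. It is a sketch only and assumes GNU `stat`, as on the RHEL-family nodes. `check_secret_file` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Verify that a secrets file exists, is non-empty, and has mode 600
# (readable and writable by its owner only). Uses GNU stat's -c '%a'.
check_secret_file() {
  local f=$1 mode
  [ -s "$f" ] || { echo "missing or empty: $f"; return 1; }
  mode=$(stat -c '%a' "$f") || return 1
  [ "$mode" = "600" ] || { echo "mode $mode (expected 600): $f"; return 1; }
}

# Example on iklim-app-01:
#   check_secret_file /opt/iklimco/stacks/.env
```
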

## StorageBox Directory Structure

After the Ansible bootstrap is complete and before the infra stack is deployed, create the following directories on `iklim-app-01` (StorageBox must already be mounted):

```bash
# SWAG certificate and configuration directories
mkdir -p /mnt/storagebox/ssl
mkdir -p /mnt/storagebox/swag/config
mkdir -p /mnt/storagebox/swag/site-confs
# Monitoring data directories; Grafana on StorageBox, Prometheus on local volume
mkdir -p /mnt/storagebox/grafana/data
mkdir -p /mnt/storagebox/prometheus/data
# Image directory for the precipitation service
mkdir -p /mnt/storagebox/precipitation/images
```

These directories correspond to the `SWAG_CERT_DIR`, `SWAG_CONFIG_DIR`, `SWAG_SITE_CONFS_DIR`, `GRAFANA_DATA_DIR`, and `PROMETHEUS_DATA_DIR` variables in `env-prod/.env`. StorageBox is mounted at the same `/mnt/storagebox` path on every app node, so the directories only need to be created once and are shared by all nodes.
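
To catch a mismatch between `env-prod/.env` and the directories actually created, resolve and test each path variable. This sketch assumes simple `KEY=value` lines, optionally double-quoted, with no `export` and no variable interpolation. `check_env_dirs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# For each variable name, read its last assignment from a .env file and report
# values that are unset or do not point to an existing directory.
check_env_dirs() {
  local envfile=$1; shift
  local var val rc=0
  for var in "$@"; do
    val=$(sed -n "s/^${var}=//p" "$envfile" | tail -n 1)
    val=${val%\"}; val=${val#\"}   # strip optional surrounding double quotes
    if [ -z "$val" ]; then
      echo "$var: not set"; rc=1
    elif [ ! -d "$val" ]; then
      echo "$var: $val does not exist"; rc=1
    fi
  done
  return "$rc"
}

# Example:
#   check_env_dirs env-prod/.env SWAG_CERT_DIR SWAG_CONFIG_DIR SWAG_SITE_CONFS_DIR \
#     GRAFANA_DATA_DIR PROMETHEUS_DATA_DIR
```
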

## Swarm Setup Verification

After bootstrap, check the Swarm status with the following commands:

```bash
# 6 nodes: 3 managers (Leader/Reachable), 3 workers (Ready)
docker node ls
# App node label
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
# Expected: map[type:service]
# DB node label
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
# Expected: map[db-index:01 role:db]
# swarm-init.sh idempotency: it must not attempt init again on an already active Swarm
grep -n "swarm init\|swarm join" init/swarm-init.sh
```
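
The `docker node ls` check can also be made scriptable with a Go-template `--format`. This sketch expects the three `iklim-app-*` nodes as Leader/Reachable managers and the three `iklim-db-*` nodes as workers, all `Ready` and `Active`. `check_swarm_layout` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Read "<hostname> <status> <availability> [<manager status>]" lines on stdin
# and check the 3-manager / 3-worker layout. Workers have no manager status.
check_swarm_layout() {
  awk '
    $2 != "Ready"  { bad = bad $1 " status " $2 "\n" }
    $3 != "Active" { bad = bad $1 " availability " $3 "\n" }
    $4 ~ /^(Leader|Reachable)$/ { managers++ }
    NF == 3        { workers++ }
    $1 ~ /^iklim-app-/ && $4 !~ /^(Leader|Reachable)$/ { bad = bad $1 " is not a reachable manager\n" }
    $1 ~ /^iklim-db-/ && NF != 3 { bad = bad $1 " should be a worker\n" }
    END {
      if (managers != 3) bad = bad sprintf("expected 3 managers, found %d\n", managers)
      if (workers != 3)  bad = bad sprintf("expected 3 workers, found %d\n", workers)
      printf "%s", bad
      exit (bad != "")
    }'
}

# Example on a manager node:
#   docker node ls --format '{{.Hostname}} {{.Status}} {{.Availability}} {{.ManagerStatus}}' \
#     | check_swarm_layout
```
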

## Acceptance Criteria

- `ansible all -m ping` succeeds.
- The 3 Swarm manager nodes appear as Leader/Reachable in `docker node ls`.
- The 3 DB nodes appear as workers in `docker node ls`.
- Manager quorum holds: 3 managers, tolerating the loss of 1.
- The `iklimco-net` overlay network exists.
- Node labels (`type=service`, `role=db`, `db-index=01/02/03`) are verified with `docker node inspect`.
- `swarm-init.sh` does not attempt init again on an active Swarm (it is idempotent).
- `/mnt/storagebox` is mounted on every node.
- `/opt/iklimco/vault/data` exists on every app node.
- The `ssl`, `swag/config`, `swag/site-confs`, `grafana/data`, `prometheus/data`, and `precipitation/images` directories exist on StorageBox.
- The Gitea Act Runner service is running on every app node.
- `/opt/iklimco/db` and `/opt/iklimco/backup` exist on the DB nodes. Node-specific `mongod.conf` and the other DB configurations are created on StorageBox (`/mnt/storagebox/prod/db/...`) in the `08-prod-db-cluster-kurulum.md` step.
- The public firewall allows only `22`, `80`, and `443` ingress.


@ -1,10 +1,10 @@
# 08 - Prod DB Cluster Setup (Swarm)

This phase adds the three DB nodes to Docker Swarm as workers and configures the MongoDB replica set and the PostgreSQL high-availability setup managed by Patroni + etcd.

`07-prod-ansible-bootstrap.md` must be completed on all DB nodes first.

## Architecture

```
iklim-app-01/02/03 (Swarm managers, 10.20.10.11/12/13)
@ -14,7 +14,7 @@ iklim-app-01/02/03 (Swarm managers, 10.20.10.11/12/13)
iklim-db-01 (Swarm worker, 10.20.20.11) iklim-db-01 (Swarm worker, 10.20.20.11)
mongodb-01 [rs0 member 0 — preferred primary] mongodb-01 [rs0 member 0 — preferred primary]
etcd-01 [etcd cluster member] etcd-01 [etcd cluster member]
patroni-01 [Patroni + PostgreSQL — ilk primary adayı] patroni-01 [Patroni + PostgreSQL — first primary candidate]
iklim-db-02 (Swarm worker, 10.20.20.12) iklim-db-02 (Swarm worker, 10.20.20.12)
mongodb-02 [rs0 member 1] mongodb-02 [rs0 member 1]
@ -27,13 +27,13 @@ iklim-db-03 (Swarm worker, 10.20.20.13)
patroni-03 [Patroni + PostgreSQL — standby] patroni-03 [Patroni + PostgreSQL — standby]
``` ```
DB container'ları birbirlerini overlay DNS adıyla değil, **Hetzner private IP üzerinden** tanıyor. Bu nedenle her servis portunu `host` modda yayımlar; replikasyon ve etcd trafiği doğrudan private network üzerinden gecer. Hetzner Cloud firewall ve prod `db` firewall zaten bu portlara izin vermektedir. DB containers discover each other through **Hetzner private IPs**, not overlay DNS names. Therefore, each service publishes its port in `host` mode; replication and etcd traffic goes directly through the private network. The Hetzner Cloud firewall and the prod `db` firewall already allow these ports.
## 1. Firewall Güncellemesi ## 1. Firewall Update
`terraform/hetzner/prod/firewall.tf` dosyasında aşağıdaki kuralların mevcut olduğunu doğrula; eksik varsa ekle ve `terraform apply` çalıştır. Verify that the following rules exist in `terraform/hetzner/prod/firewall.tf`; if any are missing, add them and run `terraform apply`.
`hcloud_firewall.swarm` içinde (DB subnet'ten Swarm portlarına): Inside `hcloud_firewall.swarm` (Swarm ports opened to the DB subnet):
```hcl ```hcl
rule { rule {
@ -69,7 +69,7 @@ rule {
} }
``` ```
`hcloud_firewall.db` içinde (app subnet'ten Swarm portlarına + overlay; DB subnet içi etcd/Patroni trafiği): Inside `hcloud_firewall.db` (Swarm and overlay ports opened to the app subnet, plus etcd/Patroni traffic within the DB subnet):
```hcl ```hcl
rule { rule {
@ -112,6 +112,14 @@ rule {
description = "etcd client port within DB subnet" description = "etcd client port within DB subnet"
} }
rule {
direction = "in"
protocol = "tcp"
port = "2379"
source_ips = [local.app_subnet_cidr]
description = "etcd client port from app subnet (APISIX connects to Patroni etcd)"
}
rule { rule {
direction = "in" direction = "in"
protocol = "tcp" protocol = "tcp"
@ -135,7 +143,7 @@ terraform plan
terraform apply terraform apply
``` ```
## 2. DB Node'larını Swarm'a Ekleme ## 2. Add DB Nodes to Swarm
**Swarm manager'lardan birinde** (iklim-app-01) join token al: **On one of the Swarm managers** (iklim-app-01), get the worker join token:
@ -149,7 +157,7 @@ docker swarm join-token worker
docker swarm join --token <TOKEN> 10.20.10.11:2377 docker swarm join --token <TOKEN> 10.20.10.11:2377
``` ```
**iklim-app-01 üzerinde** node'ları etiketle: Label the nodes **on iklim-app-01**:
```bash ```bash
docker node update --label-add role=db --label-add db-index=01 iklim-db-01 docker node update --label-add role=db --label-add db-index=01 iklim-db-01
@ -159,22 +167,22 @@ docker node update --label-add role=db --label-add db-index=03 iklim-db-03
docker node ls docker node ls
``` ```
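The three labeling commands follow one pattern, so they can be generated in a loop. The sketch below uses a hypothetical `db_label_commands` function. It only prints the commands; you can review the output and then pipe it to `sh` on iklim-app-01.

```bash
# Hypothetical helper: prints the three `docker node update` commands from this
# section, one per DB node, without executing them.
db_label_commands() {
  local i
  for i in 01 02 03; do
    printf 'docker node update --label-add role=db --label-add db-index=%s iklim-db-%s\n' "$i" "$i"
  done
}
# Usage on iklim-app-01: db_label_commands        # review first
#                        db_label_commands | sh   # then apply
```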
## 3. StorageBox Dizin Yapısı ## 3. StorageBox Directory Structure
Her DB node'unda (`/mnt/storagebox` zaten mount edilmiş olmalı): On each DB node (`/mnt/storagebox` must already be mounted):
```bash ```bash
# iklim-db-01 üzerinde: # On iklim-db-01:
mkdir -p /mnt/storagebox/prod/db/mongodb-01/{data,log,config} mkdir -p /mnt/storagebox/prod/db/mongodb-01/{data,log,config}
mkdir -p /mnt/storagebox/prod/db/postgresql-01/{data,config} mkdir -p /mnt/storagebox/prod/db/postgresql-01/{data,config}
mkdir -p /mnt/storagebox/prod/db/etcd-01/data mkdir -p /mnt/storagebox/prod/db/etcd-01/data
# iklim-db-02 üzerinde: # On iklim-db-02:
mkdir -p /mnt/storagebox/prod/db/mongodb-02/{data,log,config} mkdir -p /mnt/storagebox/prod/db/mongodb-02/{data,log,config}
mkdir -p /mnt/storagebox/prod/db/postgresql-02/{data,config} mkdir -p /mnt/storagebox/prod/db/postgresql-02/{data,config}
mkdir -p /mnt/storagebox/prod/db/etcd-02/data mkdir -p /mnt/storagebox/prod/db/etcd-02/data
# iklim-db-03 üzerinde: # On iklim-db-03:
mkdir -p /mnt/storagebox/prod/db/mongodb-03/{data,log,config} mkdir -p /mnt/storagebox/prod/db/mongodb-03/{data,log,config}
mkdir -p /mnt/storagebox/prod/db/postgresql-03/{data,config} mkdir -p /mnt/storagebox/prod/db/postgresql-03/{data,config}
mkdir -p /mnt/storagebox/prod/db/etcd-03/data mkdir -p /mnt/storagebox/prod/db/etcd-03/data
@ -209,14 +217,14 @@ security:
### Replica Set Auth Key ### Replica Set Auth Key
Tüm DB node'larında **aynı** key dosyası olmalıdır: The **same** key file must exist on all DB nodes:
```bash ```bash
# iklim-db-01 üzerinde oluştur: # Create on iklim-db-01:
openssl rand -base64 756 > /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key openssl rand -base64 756 > /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key
chmod 400 /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key chmod 400 /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key
# Aynı içeriği diğer node'lara kopyala: # Copy the same content to the other nodes:
cat /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key \ cat /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key \
> /mnt/storagebox/prod/db/mongodb-02/config/rs-auth.key > /mnt/storagebox/prod/db/mongodb-02/config/rs-auth.key
cat /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key \ cat /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key \
@ -225,7 +233,7 @@ cat /mnt/storagebox/prod/db/mongodb-01/config/rs-auth.key \
chmod 400 /mnt/storagebox/prod/db/mongodb-0{2,3}/config/rs-auth.key chmod 400 /mnt/storagebox/prod/db/mongodb-0{2,3}/config/rs-auth.key
``` ```
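Replica set members authenticate to each other with the keyfile. A copy that differs by even one byte therefore stops that member from joining `rs0`. Here is a minimal check using a hypothetical `keys_identical` helper built on `cmp -s`:

```bash
# Hypothetical check: succeeds only if every file is byte-identical to the first one.
keys_identical() {
  local first="$1" f
  shift
  for f in "$@"; do
    if ! cmp -s "$first" "$f"; then
      echo "differs: $f" >&2
      return 1
    fi
  done
  return 0
}
# Usage on the StorageBox mount:
#   keys_identical /mnt/storagebox/prod/db/mongodb-0{1,2,3}/config/rs-auth.key
```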
### Stack Dosyası — MongoDB ### Stack File — MongoDB
`/opt/iklimco/stacks/prod-db-mongo.yml`: `/opt/iklimco/stacks/prod-db-mongo.yml`:
@ -313,16 +321,16 @@ services:
condition: on-failure condition: on-failure
``` ```
### Replica Set Başlangıç ### Replica Set Initialization
Stack deploy edildikten sonra **bir kez** çalıştırılır: Run **once** after the stack is deployed:
```bash ```bash
# iklim-db-01 üzerinde: # On iklim-db-01:
docker exec -it $(docker ps -q -f name=iklim-db_mongodb-01) mongosh \ docker exec -it $(docker ps -q -f name=iklim-db_mongodb-01) mongosh \
-u mongo-root -p "${MONGO_ROOT_PASSWORD}" --authenticationDatabase admin -u mongo-root -p "${MONGO_ROOT_PASSWORD}" --authenticationDatabase admin
# mongosh içinde: # Inside mongosh:
rs.initiate({ rs.initiate({
_id: "rs0", _id: "rs0",
members: [ members: [
@ -332,19 +340,19 @@ rs.initiate({
] ]
}) })
# Durum kontrol: # Status check:
rs.status() rs.status()
``` ```
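You can also check the replica set non-interactively, for example from a script, by reducing the member states to one line and evaluating it. The sketch assumes mongosh's `--quiet`/`--eval` options and `print()`, as shown in the comment. `rs_ready` is a hypothetical name.

```bash
# Hypothetical helper: decides whether rs0 is healthy from its member states.
# The state list can be produced with, for example (on iklim-db-01):
#   docker exec "$(docker ps -q -f name=iklim-db_mongodb-01)" mongosh --quiet \
#     -u mongo-root -p "$MONGO_ROOT_PASSWORD" --authenticationDatabase admin \
#     --eval 'print(rs.status().members.map(m => m.stateStr).join(" "))'
rs_ready() {
  local primary=0 secondary=0 state
  # Strip any double quotes, then split the list on whitespace.
  for state in ${1//\"/}; do
    case "$state" in
      PRIMARY) primary=$((primary + 1)) ;;
      SECONDARY) secondary=$((secondary + 1)) ;;
    esac
  done
  # Expect exactly one PRIMARY and two SECONDARY members.
  [ "$primary" -eq 1 ] && [ "$secondary" -eq 2 ]
}
```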
`"stateStr": "PRIMARY"` ve iki `"SECONDARY"` görülünce replica set hazırdır. The replica set is ready when `"stateStr": "PRIMARY"` and two `"SECONDARY"` entries are visible.
## 5. PostgreSQL — Patroni + etcd ## 5. PostgreSQL — Patroni + etcd
Patroni, PostgreSQL primary/standby rollerini etcd üzerinden koordine eder. Primary düşerse diğer node'lardan biri otomatik olarak seçim kazanır ve primary olur. Swarm servisi container'ı yeniden başlatır; Patroni kaldığı yerden devam eder. Patroni coordinates PostgreSQL primary/standby roles through etcd. If the primary goes down, one of the remaining nodes automatically wins the leader election and becomes primary. Swarm then restarts the failed container, and Patroni rejoins the cluster from its existing data directory, as a replica if a new leader was elected in the meantime.
### 5.1 Özel Image (Patroni + PostGIS) ### 5.1 Custom Image (Patroni + PostGIS)
`postgis/postgis:17-3.5` imajı üzerine Patroni kurulur. Bu imaj Harbor'a push edilip stack'te kullanılır. Patroni is installed on top of the `postgis/postgis:17-3.5` image. This image is pushed to Harbor and used in the stack.
`Environment_Infrastructure/docker/patroni-postgis/Dockerfile`: `Environment_Infrastructure/docker/patroni-postgis/Dockerfile`:
@ -368,7 +376,7 @@ USER postgres
ENTRYPOINT ["patroni", "/etc/patroni/patroni.yml"] ENTRYPOINT ["patroni", "/etc/patroni/patroni.yml"]
``` ```
Build ve push (`ops/push-harbor-custom-images.sh` ile yapılır veya aşağıdaki komutları çalıştır): Build and push the image with `ops/push-harbor-custom-images.sh`, or run the commands below manually:
```bash ```bash
cd Environment_Infrastructure/docker/patroni-postgis cd Environment_Infrastructure/docker/patroni-postgis
@ -377,9 +385,9 @@ echo "$HARBOR_CI_TOKEN" | docker login registry.tarla.io -u robot-ci-push-iklimc
docker push registry.tarla.io/iklimco/patroni-postgis:17-3.5 docker push registry.tarla.io/iklimco/patroni-postgis:17-3.5
``` ```
### 5.2 etcd Kümesi ### 5.2 etcd Cluster
#### Stack Dosyası — etcd #### Stack File — etcd
`/opt/iklimco/stacks/prod-db-etcd.yml`: `/opt/iklimco/stacks/prod-db-etcd.yml`:
@ -491,11 +499,13 @@ services:
condition: on-failure condition: on-failure
``` ```
**Önemli:** `ETCD_INITIAL_CLUSTER_STATE` değeri ilk deploy'da `new`, sonraki tüm deploy'larda `existing` olmalıdır. Yanlış değer bırakılırsa data dizini sıfırlanır. Aşağıdaki Section 6'daki deploy adımları bu durumu otomatik tespit eder — manuel güncelleme gerekmez. **APISIX etcd usage:** In prod, APISIX shares this etcd cluster. Patroni stores its keys under the `/service/` prefix and APISIX under `/apisix/`, so the two do not collide. APISIX is configured through the `config.yaml` file in the `docker-stack-infra.prod.yml` overlay, which points it at `http://iklim-db-01:2379,http://iklim-db-02:2379,http://iklim-db-03:2379`. The firewall rule that opens port 2379 from the app subnet to the DB nodes is therefore mandatory; it was added in Section 1.
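The no-collision argument depends on neither key prefix being a prefix of the other. An etcd range read or watch on a prefix returns every key that starts with it. Here is a tiny sketch of that rule, with a hypothetical `prefixes_collide` helper:

```bash
# Hypothetical helper: two etcd key prefixes collide if one is a prefix of the other,
# because a prefix read or watch on the shorter one would also see the other's keys.
prefixes_collide() {
  local a="$1" b="$2"
  [[ "$a" == "$b"* || "$b" == "$a"* ]]
}
# Usage: prefixes_collide /service/ /apisix/ && echo collide || echo "no collision"
```

The trailing slash matters. Under this rule, `/apisix` and `/apisix2` would collide, while `/apisix/` and `/apisix2/` would not.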
### 5.3 Patroni Konfigürasyonu **Important:** `ETCD_INITIAL_CLUSTER_STATE` must be `new` on the first deploy and `existing` on all later deploys. If the wrong value is left in place, the data directory is reset. The deploy steps in Section 6 below detect this automatically; no manual update is required.
Her node için ayrı bir `patroni.yml` dosyası oluşturulur. Farklılıklar yalnızca `name` ve `connect_address` alanlarındadır. ### 5.3 Patroni Configuration
A separate `patroni.yml` file is created for each node. The only differences are the `name` and `connect_address` fields.
**Node 01** — `/mnt/storagebox/prod/db/postgresql-01/config/patroni.yml`: **Node 01** — `/mnt/storagebox/prod/db/postgresql-01/config/patroni.yml`:
@ -570,7 +580,7 @@ tags:
**Node 02** — `/mnt/storagebox/prod/db/postgresql-02/config/patroni.yml`: **Node 02** — `/mnt/storagebox/prod/db/postgresql-02/config/patroni.yml`:
Node 01 ile aynı içerik, yalnızca şu alanlar farklı: Same content as Node 01; only the following fields differ:
```yaml ```yaml
name: postgresql-02 name: postgresql-02
@ -596,7 +606,7 @@ postgresql:
data_dir: /var/lib/postgresql/data/pgdata data_dir: /var/lib/postgresql/data/pgdata
``` ```
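Only `name` and `connect_address` differ between nodes, so the node 02/03 files can be derived from node 01. This sketch makes two assumptions about node 01: `name: postgresql-01` sits on its own line, and `10.20.20.11` appears in its `connect_address` lines. The IP rewrite is limited to `connect_address` lines so the etcd host list stays intact. `render_patroni_cfg` is hypothetical.

```bash
# Hypothetical helper: reads node 01's patroni.yml on stdin and writes the variant
# for node <idx> (2 or 3) on stdout.
render_patroni_cfg() {
  local idx="$1"
  sed -e "s/^name: postgresql-01\$/name: postgresql-0${idx}/" \
      -e "/connect_address/ s/10\.20\.20\.11/10.20.20.1${idx}/"
}
# Usage:
#   render_patroni_cfg 2 < /mnt/storagebox/prod/db/postgresql-01/config/patroni.yml \
#     > /mnt/storagebox/prod/db/postgresql-02/config/patroni.yml
```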
### 5.4 Stack Dosyası — Patroni ### 5.4 Stack File — Patroni
`/opt/iklimco/stacks/prod-db-patroni.yml`: `/opt/iklimco/stacks/prod-db-patroni.yml`:
@ -696,66 +706,66 @@ services:
condition: on-failure condition: on-failure
``` ```
### 5.5 Durum Kontrolü ### 5.5 Status Check
```bash ```bash
# Herhangi bir DB node'unda: # On any DB node:
docker exec -it $(docker ps -q -f name=iklim-patroni_patroni-01) \ docker exec -it $(docker ps -q -f name=iklim-patroni_patroni-01) \
patronictl -c /etc/patroni/patroni.yml list patronictl -c /etc/patroni/patroni.yml list
``` ```
Beklenen çıktı: bir `Leader` ve iki `Replica` satırı, hepsinin `State` sütunu `running`. Expected output: one `Leader` row and two `Replica` rows, all with the `State` column set to `running`.
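For scripted checks, you can parse the table printed by `patronictl list` instead of reading it by eye. The sketch below counts Leader and Replica rows that are up. Some Patroni versions report replicas as `streaming`, so it accepts that as well as `running` in the `State` column. The column layout is an assumption, which is why the parser scans every cell rather than fixed positions. `patroni_healthy` is hypothetical.

```bash
# Hypothetical helper: reads `patronictl list` table output on stdin and succeeds
# only with exactly one running Leader and two running/streaming replicas.
patroni_healthy() {
  awk -F'|' '
    {
      role = ""; state = ""
      for (i = 2; i < NF; i++) {
        f = $i
        gsub(/^ +| +$/, "", f)   # trim the cell
        if (f == "Leader" || f == "Replica" || f == "Sync Standby") role = f
        if (f == "running" || f == "streaming") state = f
      }
      if (role == "Leader" && state != "") leaders++
      else if (role != "" && state != "") replicas++
    }
    END { exit !(leaders == 1 && replicas == 2) }'
}
# Usage: docker exec ... patronictl -c /etc/patroni/patroni.yml list | patroni_healthy
```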
```bash ```bash
# etcd cluster sağlığı: # etcd cluster health:
docker exec -it $(docker ps -q -f name=iklim-etcd_etcd-01) \ docker exec -it $(docker ps -q -f name=iklim-etcd_etcd-01) \
etcdctl endpoint health \ etcdctl endpoint health \
--endpoints=http://10.20.20.11:2379,http://10.20.20.12:2379,http://10.20.20.13:2379 --endpoints=http://10.20.20.11:2379,http://10.20.20.12:2379,http://10.20.20.13:2379
``` ```
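The same health check can gate a script. Here is a sketch with a hypothetical `etcd_all_healthy` helper that counts `is healthy` lines and succeeds only when every expected endpoint reports healthy. Merging stderr with `2>&1` when piping keeps any error lines in the same stream.

```bash
# Hypothetical helper: reads `etcdctl endpoint health` output on stdin and succeeds
# only if the number of "is healthy" lines equals the expected endpoint count.
etcd_all_healthy() {
  local expected="${1:-3}" healthy
  healthy=$(grep -c ' is healthy' || true)   # grep -c prints 0 when nothing matches
  [ "$healthy" -eq "$expected" ]
}
# Usage:
#   docker exec "$(docker ps -q -f name=iklim-etcd_etcd-01)" etcdctl endpoint health \
#     --endpoints=http://10.20.20.11:2379,http://10.20.20.12:2379,http://10.20.20.13:2379 \
#     2>&1 | etcd_all_healthy 3
```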
```bash ```bash
# Mevcut primary'i öğren: # Find the current primary:
docker exec -it $(docker ps -q -f name=iklim-patroni_patroni-01) \ docker exec -it $(docker ps -q -f name=iklim-patroni_patroni-01) \
patronictl -c /etc/patroni/patroni.yml topology patronictl -c /etc/patroni/patroni.yml topology
``` ```
## 6. Deploy ## 6. Deploy
Sıra önemlidir: önce etcd, ardından MongoDB ve Patroni stack'leri. Order matters: etcd first, then the MongoDB and Patroni stacks.
### .env Dosyası ### .env File
`/opt/iklimco/stacks/.env` dosyası StorageBox'ta `prod/secrets/iklim.co/.env.stacks` olarak saklanır. İlk kez oluşturulurken güçlü şifrelerle doldurulup StorageBox'a yüklenir; sonraki deploy'larda buradan çekilir: The `/opt/iklimco/stacks/.env` file is stored on StorageBox as `prod/secrets/iklim.co/.env.stacks`. When it is created the first time, it is filled with strong passwords and uploaded to StorageBox; later deploys fetch it from there:
```bash ```bash
# iklim-app-01 üzerinde (bir kez): # On iklim-app-01, once:
scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.stacks \ scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.stacks \
/opt/iklimco/stacks/.env /opt/iklimco/stacks/.env
chmod 600 /opt/iklimco/stacks/.env chmod 600 /opt/iklimco/stacks/.env
``` ```
Dosya içeriği (`/opt/iklimco/stacks/.env`, repo'ya commit edilmez): File content (`/opt/iklimco/stacks/.env`, not committed to the repo):
```env ```env
DATABASE_POSTGRES_ROOT_USER=postgres DATABASE_POSTGRES_ROOT_USER=postgres
POSTGRES_PASSWORD=<güçlü-şifre> POSTGRES_PASSWORD=<strong-password>
REPLICATOR_PASSWORD=<güçlü-şifre> REPLICATOR_PASSWORD=<strong-password>
MONGO_ROOT_PASSWORD=<güçlü-şifre> MONGO_ROOT_PASSWORD=<strong-password>
``` ```
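Here is a sketch for creating this file the first time, using hypothetical `gen_secret`/`gen_stacks_env` helpers. It uses 64-character hex values read from `/dev/urandom`. Hex avoids quotes, spaces and `$`, so the file is safe to load into a shell and to embed in connection URIs. Upload the result to StorageBox with the same `scp -P 23` target shown above.

```bash
# Hypothetical helper: 32 random bytes rendered as 64 lowercase hex characters.
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: writes a fresh stacks .env with the four expected keys.
gen_stacks_env() {
  local out="$1"
  (
    umask 077   # the new file is readable by its owner only, like chmod 600
    {
      echo "DATABASE_POSTGRES_ROOT_USER=postgres"
      echo "POSTGRES_PASSWORD=$(gen_secret)"
      echo "REPLICATOR_PASSWORD=$(gen_secret)"
      echo "MONGO_ROOT_PASSWORD=$(gen_secret)"
    } > "$out"
  )
}
# Usage: gen_stacks_env /opt/iklimco/stacks/.env
```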
### Deploy Adımları ### Deploy Steps
```bash ```bash
# iklim-app-01 üzerinde (Swarm manager): # On iklim-app-01 (Swarm manager):
export $(cat /opt/iklimco/stacks/.env | xargs) set -a; . /opt/iklimco/stacks/.env; set +a
# ETCD_INITIAL_CLUSTER_STATE otomatik tespiti — ilk deploy'da 'new', sonrakinde 'existing' # Automatic ETCD_INITIAL_CLUSTER_STATE detection — 'new' on first deploy, 'existing' afterwards
ETCD_STATE="new" ETCD_STATE="new"
if docker service ls --filter name=iklim-etcd -q 2>/dev/null | grep -q .; then if docker service ls --filter name=iklim-etcd -q 2>/dev/null | grep -q .; then
echo " etcd servisleri mevcut, 'existing' state kullanılıyor..." echo " etcd services exist, using 'existing' state..."
ETCD_STATE="existing" ETCD_STATE="existing"
else else
echo " İlk deploy, 'new' state kullanılıyor..." echo " First deploy, using 'new' state..."
fi fi
sed -i \ sed -i \
"s/ETCD_INITIAL_CLUSTER_STATE: new/ETCD_INITIAL_CLUSTER_STATE: ${ETCD_STATE}/g; \ "s/ETCD_INITIAL_CLUSTER_STATE: new/ETCD_INITIAL_CLUSTER_STATE: ${ETCD_STATE}/g; \
@ -769,14 +779,14 @@ docker stack deploy \
--with-registry-auth \ --with-registry-auth \
iklim-etcd iklim-etcd
# etcd cluster'ın kurulmasını bekle: # Wait for the etcd cluster to be ready:
echo "⏳ etcd bekleniyor..." echo "⏳ Waiting for etcd..."
for i in $(seq 1 18); do for i in $(seq 1 18); do
if docker exec $(docker ps -q -f name=iklim-etcd_etcd-01 | head -1) \ if docker exec $(docker ps -q -f name=iklim-etcd_etcd-01 | head -1) \
etcdctl endpoint health \ etcdctl endpoint health \
--endpoints=http://10.20.20.11:2379,http://10.20.20.12:2379,http://10.20.20.13:2379 \ --endpoints=http://10.20.20.11:2379,http://10.20.20.12:2379,http://10.20.20.13:2379 \
2>/dev/null | grep -q "is healthy"; then 2>/dev/null | grep -q "is healthy"; then
echo "✅ etcd hazır" echo "✅ etcd ready"
break break
fi fi
[ "$i" -eq 18 ] && echo "❌ etcd timeout" && exit 1 [ "$i" -eq 18 ] && echo "❌ etcd timeout" && exit 1
@ -801,15 +811,15 @@ docker stack services iklim-db
docker stack services iklim-patroni docker stack services iklim-patroni
``` ```
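The detection above looks only at whether the `iklim-etcd` services exist. It does not cover the case where the stack was removed but its data survived. An alternative heuristic looks at the data directory itself, since etcd keeps its state under `<data-dir>/member`. The sketch below uses a hypothetical `etcd_cluster_state` helper and assumes the etcd data path from Section 3 is readable where the script runs.

```bash
# Hypothetical helper: prints "existing" if the etcd data directory already holds
# member state, otherwise "new".
etcd_cluster_state() {
  local data_dir="$1"
  if [ -d "$data_dir/member" ]; then
    echo existing
  else
    echo new
  fi
}
# Usage (assuming the StorageBox path is visible on this node):
#   ETCD_STATE=$(etcd_cluster_state /mnt/storagebox/prod/db/etcd-01/data)
```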
### MongoDB Replica Set Başlatma ### MongoDB Replica Set Initialization
MongoDB stack deploy edildikten sonra bir kez çalıştırılır: Run once after the MongoDB stack is deployed:
```bash ```bash
docker exec -it $(docker ps -q -f name=iklim-db_mongodb-01) mongosh \ docker exec -it $(docker ps -q -f name=iklim-db_mongodb-01) mongosh \
-u mongo-root -p "${MONGO_ROOT_PASSWORD}" --authenticationDatabase admin -u mongo-root -p "${MONGO_ROOT_PASSWORD}" --authenticationDatabase admin
# mongosh içinde: # Inside mongosh:
rs.initiate({ rs.initiate({
_id: "rs0", _id: "rs0",
members: [ members: [
@ -820,46 +830,88 @@ rs.initiate({
}) })
``` ```
## 7. App Servislerinden Erişim ## 7. Access from App Services
App containers connect to DB services through the `iklimco-net` overlay network **by Swarm DNS name**. Because the MongoDB stack (`iklim-db`) and Patroni stack (`iklim-patroni`) share the `iklimco-net` external network, service names are resolved through overlay DNS.
### MongoDB Replica Set Connection String ### MongoDB Replica Set Connection String
Variables in `env-prod/.env`:
```env
DATABASE_MONGODB_HOST=mongodb-01:27017,mongodb-02:27017,mongodb-03:27017
DATABASE_MONGODB_PARAMS=replicaSet=rs0&readPreference=secondaryPreferred&authSource=admin
``` ```
mongodb://mongo-root:<SIFRE>@10.20.20.11:27017,10.20.20.12:27017,10.20.20.13:27017/<db>?replicaSet=rs0&authSource=admin
Microservice URI through overlay DNS:
``` ```
mongodb://<user>:<password>@mongodb-01:27017,mongodb-02:27017,mongodb-03:27017/<db>?replicaSet=rs0&readPreference=secondaryPreferred&authSource=admin
```
> For direct testing, from outside the overlay with private IP:
> `mongodb://mongo-root:<PASSWORD>@10.20.20.11:27017,10.20.20.12:27017,10.20.20.13:27017/admin?replicaSet=rs0&authSource=admin`
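A password containing URI-reserved characters such as `@`, `:` or `/` must be percent-encoded before it goes into the URI. The sketch below uses hypothetical `urlencode` and `mongo_uri` helpers to assemble the overlay-DNS URI from the `env-prod/.env` variables. It works byte-wise and is meant for ASCII input.

```bash
# Hypothetical helper: percent-encodes everything except RFC 3986 unreserved characters.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;   # "'c" yields the character code
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: builds the MongoDB URI from DATABASE_MONGODB_HOST/PARAMS.
mongo_uri() {
  local user="$1" password="$2" db="$3"
  printf 'mongodb://%s:%s@%s/%s?%s\n' "$(urlencode "$user")" "$(urlencode "$password")" \
    "$DATABASE_MONGODB_HOST" "$db" "$DATABASE_MONGODB_PARAMS"
}
```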
### PostgreSQL — Patroni ### PostgreSQL — Patroni
Patroni her an primary olan node'u yönetir. Uygulama katmanı tüm üç IP'yi vererek primary'e yazabilir, secondary'den okuyabilir: Variables in `env-prod/.env`:
``` ```env
# Yazma — sadece primary kabul eder: DATABASE_POSTGRES_HOST=patroni-01:5432,patroni-02:5432,patroni-03:5432
jdbc:postgresql://10.20.20.11:5432,10.20.20.12:5432,10.20.20.13:5432/iklimdb?targetServerType=primary DATABASE_POSTGRES_PARAMS=targetServerType=preferSecondary&loadBalanceHosts=true
# Okuma (yük dengeleme):
jdbc:postgresql://10.20.20.11:5432,10.20.20.12:5432,10.20.20.13:5432/iklimdb?targetServerType=preferSecondary
``` ```
PostgreSQL JDBC sürücüsü `targetServerType=primary` ile bağlanmaya çalışacağı tüm node'lara bağlanır ve primary olanı otomatik bulur. Patroni manages whichever node is primary at any moment. Multi-host-aware drivers pick the primary or a standby from the host list. The PostgreSQL JDBC driver uses `targetServerType`, as in `DATABASE_POSTGRES_PARAMS` above, while libpq-based clients use `target_session_attrs`:
```
# Write, primary only (JDBC URL):
jdbc:postgresql://patroni-01:5432,patroni-02:5432,patroni-03:5432/<db>?targetServerType=primary
# Write, primary only (libpq URI; `primary` needs a PostgreSQL 14+ client):
postgresql://<user>@patroni-01:5432,patroni-02:5432,patroni-03:5432/<db>?target_session_attrs=primary
# Read with load balancing (libpq URI; load_balance_hosts needs a PostgreSQL 16+ client):
postgresql://<user>@patroni-01:5432,patroni-02:5432,patroni-03:5432/<db>?target_session_attrs=prefer-standby&load_balance_hosts=random
```
> For direct testing, from outside the overlay with private IP:
> `postgresql://postgres@10.20.20.11:5432,10.20.20.12:5432,10.20.20.13:5432/postgres?target_session_attrs=primary`
With `targetServerType=primary` (JDBC) or `target_session_attrs=primary` (libpq), the driver tries the listed hosts in order and keeps the first connection to a server that is currently primary.
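Tools built on libpq, such as `psql`, do not understand the JDBC parameter names used in `DATABASE_POSTGRES_PARAMS`. libpq uses `target_session_attrs` instead; its `primary` and `prefer-standby` values need a PostgreSQL 14+ client. It also uses `load_balance_hosts`, which needs PostgreSQL 16+. The sketch below translates the parameters with a hypothetical `jdbc_to_libpq_params` helper. libpq rejects unknown connection parameters, so the helper refuses any parameter it cannot map rather than passing it through.

```bash
# Hypothetical helper: maps JDBC multi-host parameters to libpq URI parameters.
jdbc_to_libpq_params() {
  local IFS='&' kv out=()
  for kv in $1; do   # split the query string on '&'
    case "$kv" in
      targetServerType=primary)         out+=("target_session_attrs=primary") ;;
      targetServerType=secondary)       out+=("target_session_attrs=standby") ;;
      targetServerType=preferSecondary) out+=("target_session_attrs=prefer-standby") ;;
      targetServerType=any)             out+=("target_session_attrs=any") ;;
      loadBalanceHosts=true)            out+=("load_balance_hosts=random") ;;
      loadBalanceHosts=false)           out+=("load_balance_hosts=disable") ;;
      *) echo "no libpq mapping for: $kv" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "${out[*]}"   # joined with '&', the first character of IFS
}
# Usage:
#   psql "postgresql://postgres@10.20.20.11:5432,10.20.20.12:5432,10.20.20.13:5432/postgres?$(jdbc_to_libpq_params 'targetServerType=primary')"
```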
### Patroni REST API ### Patroni REST API
Patroni, 8008 portundan HTTP endpoint sunar. Bu endpoint HAProxy veya benzeri bir load balancer ile kullanılarak primary'i otomatik yönlendirme sağlanabilir: Patroni exposes an HTTP endpoint on port 8008. This endpoint can be used with HAProxy or a similar load balancer to route to the primary automatically:
```bash ```bash
# Primary kontrolü (HTTP 200 = primary, HTTP 503 = replica): # Primary check (HTTP 200 = primary, HTTP 503 = replica):
curl -s http://10.20.20.11:8008/primary curl -s http://10.20.20.11:8008/primary
``` ```
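Building on the `/primary` endpoint, a script can find the current primary by probing each node in turn. The sketch uses hypothetical `probe_primary`/`find_primary` helpers. The probe is a separate function so it can be replaced, for example in tests.

```bash
# Hypothetical probe: prints the HTTP status of Patroni's /primary endpoint
# (000 if the node is unreachable within the timeout).
probe_primary() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 3 "http://$1:8008/primary"
}

# Hypothetical helper: prints the first IP whose /primary endpoint answers 200.
find_primary() {
  local ip
  for ip in "$@"; do
    if [ "$(probe_primary "$ip")" = "200" ]; then
      echo "$ip"
      return 0
    fi
  done
  return 1
}
# Usage: find_primary 10.20.20.11 10.20.20.12 10.20.20.13
```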
## Kabul Kriterleri ## 8. Developer and Office Access (Production)
- `docker stack services iklim-etcd` — üç servis `1/1` In the prod cluster, `pg-proxy` and `mongo-proxy` are **not used**. Office computers reach the databases by targeting the DB subnet directly.
- `docker stack services iklim-db` — üç MongoDB servisi `1/1`
- `docker stack services iklim-patroni` — üç Patroni servisi `1/1` ### WireGuard Configuration
- `patronictl list` — 1 `Leader`, 2 `Replica`, hepsi `running` On the office computer, update `AllowedIPs` in the WireGuard `.conf` file:
- `etcdctl endpoint health` — üç endpoint `healthy` `AllowedIPs = 10.8.0.1/32, 10.20.20.0/24`
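To confirm that the DB node IPs are actually routed through the tunnel, you can check each address against the `AllowedIPs` ranges. The sketch uses hypothetical `ip_to_int`/`ip_in_cidr` helpers for IPv4 and relies on bash's 64-bit arithmetic for the mask.

```bash
# Hypothetical helper: converts a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Hypothetical helper: succeeds if IP lies inside NETWORK/BITS.
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}" mask
  # With 64-bit arithmetic, bits=0 shifts all ones out and yields mask 0.
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}
# Usage: ip_in_cidr 10.20.20.12 10.20.20.0/24 && echo "routed via WireGuard"
```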
### Connection Parameters (Multi-Host)
Database tools such as DBeaver and MongoDB Compass should use cluster-aware (multi-host) connections:
| Database | Host List | Port | Key Parameter |
| :--- | :--- | :--- | :--- |
| **PostgreSQL** | `10.20.20.11, 10.20.20.12, 10.20.20.13` | `5432` | `targetServerType=primary` |
| **MongoDB** | `10.20.20.11, 10.20.20.12, 10.20.20.13` | `27017` | `replicaSet=rs0` |
## Acceptance Criteria
- `docker stack services iklim-etcd` — three services `1/1`
- `docker stack services iklim-db` — three MongoDB services `1/1`
- `docker stack services iklim-patroni` — three Patroni services `1/1`
- In the output of `docker service ps iklim-patroni_patroni-01`, `patroni-02`, and `patroni-03`, every task runs on an `iklim-db-*` node through the `role=db` placement constraint.
- In the output of `docker service ps iklim-db_mongodb-01`, `mongodb-02`, and `mongodb-03`, every task runs on an `iklim-db-*` node.
- In the output of `docker service ps iklim-etcd_etcd-01`, `etcd-02`, and `etcd-03`, every task runs on an `iklim-db-*` node.
- `patronictl list` — 1 `Leader`, 2 `Replica`, all `running`
- `etcdctl endpoint health` — three endpoints `healthy`
- `rs.status()` — 1 PRIMARY, 2 SECONDARY - `rs.status()` — 1 PRIMARY, 2 SECONDARY
- App node'larından MongoDB ve PostgreSQL'e erişim sağlanır - MongoDB and PostgreSQL are reachable from app nodes.
- `5432`, `27017`, `2379`, `2380`, `8008` portları public internet'ten kapalıdır - Ports `5432`, `27017`, `2379`, `2380`, and `8008` are closed from the public internet.
- Bir DB node yeniden başlatıldığında Patroni otomatik seçim yapar, yeni primary belirlenir - When a DB node is restarted, Patroni performs automatic election and a new primary is selected.
- Patroni primary geçişi sırasında eski primary standby olarak re-join olur (split-brain yoktur) - During Patroni primary transition, the old primary rejoins as standby; there is no split-brain.

@ -1,10 +1,10 @@
# 09 - Prod Runner HA ve Swarm Deploy Modeli # 09 - Prod Runner HA and Swarm Deploy Model
Bu asamanin amaci prod ortaminda Gitea Actions runner'lari HA calisacak sekilde kurmak ve Swarm uzerinde servislerin 3 node'a dagitilmasina uygun on kosullari tanimlamaktir. The purpose of this phase is to set up Gitea Actions runners in prod so they run in HA mode and define the prerequisites for distributing services across 3 nodes on Swarm.
## Runner Sayisi ## Runner Count
Tek runner fonksiyonel olarak yeterlidir, ancak HA degildir. Prod hedefi HA oldugu icin `act_runner` 3 Swarm manager node'unun tamamına systemd servisi olarak kurulacak: A single runner is functionally enough, but it is not HA. Because the prod target is HA, `act_runner` will be installed as a systemd service on all 3 Swarm manager nodes:
| Host | Runner | | Host | Runner |
| --- | --- | | --- | --- |
@ -12,13 +12,13 @@ Tek runner fonksiyonel olarak yeterlidir, ancak HA degildir. Prod hedefi HA oldu
| `iklim-app-02` | `act_runner` systemd | | `iklim-app-02` | `act_runner` systemd |
| `iklim-app-03` | `act_runner` systemd | | `iklim-app-03` | `act_runner` systemd |
Bu modelde herhangi bir manager/runner kaybedilirse diger runner'lar pipeline job'larini alabilir. In this model, if any manager/runner is lost, the other runners can pick up pipeline jobs.
## Runner Kurulum Modeli ## Runner Installation Model
Runner Docker container olarak calismayacak. Docker socket mount yok. The runner will not run as a Docker container. There is no Docker socket mount.
Kurulum: Installation:
- `gitea-runner` sistem kullanicisi - `gitea-runner` system user
- `/usr/local/bin/act_runner` - `/usr/local/bin/act_runner`
@ -26,11 +26,11 @@ Kurulum:
- `/var/lib/gitea-runner` - `/var/lib/gitea-runner`
- `gitea-act-runner.service` - `gitea-act-runner.service`
Runner job'lari deploy icin Docker CLI kullanacaksa `gitea-runner` kullanicisinin Docker daemon erisimi gerekir. Docker group uyeligi root seviyesine yakin yetki kabul edilir; sadece guvenilir repo/job'lar bu runner label'larini kullanmalidir. If runner jobs use the Docker CLI for deploys, the `gitea-runner` user needs access to the Docker daemon. Membership in the `docker` group is effectively equivalent to root access, so only trusted repos/jobs should use these runner labels.
## Runner Label PolitikasI ## Runner Label Policy
Tum prod runner'larda ortak label: Shared labels on all prod runners:
```text ```text
prod-runner prod-runner
@ -39,7 +39,7 @@ swarm-manager
ubuntu-24.04 ubuntu-24.04
``` ```
Node-spesifik label'lar: Node-specific labels:
```text ```text
iklim-app-01 iklim-app-01
@ -47,30 +47,30 @@ iklim-app-02
iklim-app-03 iklim-app-03
``` ```
Mevcut prod workflow'lari `runs-on: prod-runner` kullaniyorsa 3 runner'dan herhangi biri job'u alabilir. Belirli bir node'a sabitlemek gerekirse node-spesifik label kullanilir. If existing prod workflows use `runs-on: prod-runner`, any of the 3 runners can pick up the job. If pinning to a specific node is required, use a node-specific label.
## Deploy Yarismasi Riski ## Deploy Race Risk
Birden fazla runner oldugunda ayni anda birden fazla deploy job'u calisabilir. Bu HA icin iyidir ama ortak kaynaklarda yarisma riski yaratabilir. When there is more than one runner, multiple deploy jobs can run at the same time. This is good for HA, but it can cause race conditions on shared resources.
Riskli alanlar: Risk areas:
- Ayni stack uzerinde es zamanli `docker stack deploy` - Concurrent `docker stack deploy` on the same stack
- Ayni servis icin es zamanli `docker service update` - Concurrent `docker service update` for the same service
- StorageBox'ta ayni `.env` veya manifest dosyasinin es zamanli guncellenmesi - Concurrent updates to the same `.env` or manifest file on StorageBox
- Root altyapi pipeline'i ile mikroservis deploy pipeline'inin ayni anda calismasi - Root infrastructure pipeline and microservice deploy pipeline running at the same time
Gerekli onlem: Required measures:
- Prod root altyapi deploy'u manuel/onayli calismali. - Prod root infrastructure deploy should run manually or with approval.
- Ayni servis icin prod deploy ayni anda birden fazla kez tetiklenmemeli. - Prod deploy for the same service must not be triggered more than once at the same time.
- Prod deploy workflow'lari StorageBox uzerinde otomatik deploy lock kullanmalidir. - All prod deploy workflows are queued with the Gitea Actions `concurrency: group: prod-deploy` block; concurrent execution is prevented by Gitea.
## Ön Koşullar — StorageBox Sırları ## Prerequisites — StorageBox Secrets
Deploy pipeline çalışmadan önce aşağıdaki dosyaların StorageBox'ta mevcut olması gerekir. Bu dosyalar otomatik oluşturulmaz; ilk kurulumda elle oluşturulur. Before the deploy pipeline runs, the following files must exist on StorageBox. These files are not created automatically; they are created manually during the initial setup.
### SWAG / GoDaddy Kimlik Bilgileri ### SWAG / GoDaddy Credentials
``` ```
prod/secrets/iklim.co/.env.secrets.swag prod/secrets/iklim.co/.env.secrets.swag
@ -81,111 +81,596 @@ GODADDY_KEY=<api-key>
GODADDY_SECRET=<api-secret> GODADDY_SECRET=<api-secret>
``` ```
GoDaddy API anahtarı için: https://developer.godaddy.com/keys — **Production** key oluştur. Mevcut bir anahtarın herhangi bir chat, Slack veya e-postada paylaşıldığı biliniyorsa kullanmadan önce iptal et ve yenisini oluştur. For the GoDaddy API key: https://developer.godaddy.com/keys — create a **Production** key. If an existing key is known to have been shared in any chat, Slack, or email, revoke it before use and create a new one.
> `.env.secrets.swag` yalnızca SWAG/GoDaddy kimlik bilgilerini içerir. > `.env.secrets.swag` contains only SWAG/GoDaddy credentials.
> `.env.secrets.shared` AppRole ID'leri, DB şifreleri ve diğer çalışma zamanı sırlarını içerir — bu iki dosyayı karıştırma. > `.env.secrets.shared` contains AppRole IDs, DB passwords, and other runtime secrets — do not mix these two files.
### Gitea PROD_FLOATING_IP Değişkeni ### Gitea `PROD_FLOATING_IP` Variable
DNS otomasyonu için `PROD_FLOATING_IP` Gitea project variable olarak tanımlanmış olmalıdır. `06-prod-terraform-iaac.md` → "Gitea Değişkeni: PROD_FLOATING_IP" adımına bak. For DNS automation, `PROD_FLOATING_IP` must be defined as a Gitea project variable. See the "Gitea Variable: PROD_FLOATING_IP" step in `06-prod-terraform-iaac.md`.
## StorageBox Deploy Lock Modeli ### Docker Secrets
Prod'da 3 runner oldugu icin deploy lock zorunlu kabul edilir. Lock lokal dosya Before the infra stack is deployed, the following Docker secrets must be created on `iklim-app-01`. These secrets are referenced by `docker-stack-infra.prod.yml`; if they do not exist, stack deploy fails.
sisteminde tutulmayacak; cunku runner'lar farkli makinelerde calisir ve birbirlerinin
`/tmp` veya `/var/lock` dizinlerini gormez.
Lock konumu StorageBox olacaktir:
```text
prod/locks/prod-deploy.lock
prod/locks/prod-infra.lock
prod/locks/services/<service-name>.lock
```
Baslangic modeli:
```text
prod/locks/prod-deploy.lock
```
Bu tek global lock tum prod deploy'lari siraya sokar ve en az karmasik modeldir.
Ileride deploy sureleri uzarsa servis bazli lock'a gecilebilir.
Lock dosyasi/klasoru manuel olusturulmaz. Workflow basinda atomik `mkdir` ile lock
alinir, workflow sonunda `rmdir` ile lock birakilir.
Ornek:
```bash ```bash
LOCK_DIR="prod/locks/prod-deploy.lock" # Redis password, used by Redis master, replica, and sentinel:
LOCK_META="owner.txt" openssl rand -hex 32 | docker secret create redis_password -
ssh "$STORAGEBOX_SSH" "mkdir -p prod/locks && mkdir '$LOCK_DIR'" # RabbitMQ Erlang cluster cookie; must be the same on all RabbitMQ nodes:
ssh "$STORAGEBOX_SSH" "printf '%s\n' 'runner=${GITEA_RUNNER_NAME:-unknown}' 'run=${GITHUB_RUN_ID:-unknown}' 'created_at=$(date -u +%FT%TZ)' > '$LOCK_DIR/$LOCK_META'" openssl rand -hex 32 | docker secret create rabbitmq_erlang_cookie -
# deploy islemleri
ssh "$STORAGEBOX_SSH" "rm -f '$LOCK_DIR/$LOCK_META' && rmdir '$LOCK_DIR'"
``` ```
Davranis: > The `vault_unseal_key` secret is created after Vault is started for the first time; see `roadmap/prod-env/07-vault-raft-plan.md` Step 3. It is not required for the first infra stack deploy; it is waited for until the health check is triggered.
>
> This secret is also used during Vault restarts triggered by cert-reloader: when `cert-reloader` detects a certificate change, it runs `docker service update --force iklimco_vault`; while Vault containers restart, they read from the `vault_unseal_key` Docker secret and automatically unseal. If the secret is missing, Vault remains sealed after every certificate renewal.
- `mkdir '$LOCK_DIR'` basariliysa lock alinmistir. Verify secrets:
- `mkdir '$LOCK_DIR'` fail olursa baska deploy calisiyor kabul edilir.
- Job fail olsa bile cleanup adimi `rm/rmdir` calistirmalidir.
- Stale lock temizligi manuel/onayli olmalidir; otomatik zorla silme ilk asamada uygulanmamalidir.
Lock seviyesi: ```bash
docker secret ls
# redis_password and rabbitmq_erlang_cookie rows must appear
```
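You can make this check fail loudly in a pipeline. The sketch uses a hypothetical `require_secrets` helper that reads secret names (one per line) from stdin, as produced by `docker secret ls --format '{{.Name}}'`.

```bash
# Hypothetical helper: fails (and names the culprit) if any required secret is missing.
require_secrets() {
  local names missing=0 s
  names=$(cat)   # secret names, one per line
  for s in "$@"; do
    if ! grep -qx -- "$s" <<< "$names"; then
      echo "missing secret: $s" >&2
      missing=1
    fi
  done
  return "$missing"
}
# Usage on a Swarm manager:
#   docker secret ls --format '{{.Name}}' | require_secrets redis_password rabbitmq_erlang_cookie
```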
| Lock | Ne icin | ### SWAG Nginx Configuration Templates
| --- | --- |
| `prod/locks/prod-deploy.lock` | Ilk asama: tum prod deploy'lar icin global lock |
| `prod/locks/prod-infra.lock` | Ileride root infra deploy'u mikroservis deploy'larindan ayirmak icin |
| `prod/locks/services/<service-name>.lock` | Ileride servis bazli paralel deploy'a gecmek icin |
## Swarm Servis Dagilimi Before the deploy pipeline runs, the following template files must exist in the repo:
Prod'da 3 app node da manager + app worker oldugu icin servisler 3 node'a dagitilabilir. - `swag/site-confs/default.conf`
- `swag/site-confs/api.conf.tpl`
- `swag/site-confs/apigw.conf.tpl`
- `swag/site-confs/rabbitmq.conf.tpl`
- `swag/site-confs/grafana.conf.tpl`
These files are created in the test environment (`test-env/04-swag-nginx-configs.md`) and are not created separately for prod. Both environments share the template files; prod-specific values are injected through environment variables during deploy.
Verify that the `prod/secrets/iklim.co/.env.prod` file on StorageBox contains the following variables:
```bash
API_SUBDOMAIN=api.iklim.co
APIGW_SUBDOMAIN=apigw.iklim.co
RABBITMQ_SUBDOMAIN=rabbitmq.iklim.co
GRAFANA_SUBDOMAIN=grafana.iklim.co
RESTRICTED_IPS="78.187.87.109/32,95.70.151.248/32"
SWAG_CERT_DIR=/mnt/storagebox/ssl
SWAG_CONFIG_DIR=/mnt/storagebox/swag/config
SWAG_SITE_CONFS_DIR=/mnt/storagebox/swag/site-confs
```
The pipeline sources these variables and renders the template files into the `$SWAG_SITE_CONFS_DIR` (`/mnt/storagebox/swag/site-confs`) directory. StorageBox is mounted on all app nodes, so the SWAG container sees the rendered files even when the runner that rendered them is on another node. Detail: `roadmap/prod-env/04-swag-nginx-configs.md`.
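Some renderers substitute an unset variable with an empty string; `envsubst` does this. With such a renderer, a missing value only shows up later as a broken nginx config. The guard below is a minimal sketch. It assumes bash and that the variables have already been sourced, for example with `set -a; . ./.env.prod; set +a`, mirroring how the pipeline steps source `.env`. `require_vars` is a hypothetical helper, not part of the repo:

```bash
# Sketch: fail fast when any variable the templates need is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name,
    # or an empty string if it is unset
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (variable list taken from the .env.prod example above):
# require_vars API_SUBDOMAIN APIGW_SUBDOMAIN RABBITMQ_SUBDOMAIN GRAFANA_SUBDOMAIN \
#   RESTRICTED_IPS SWAG_CERT_DIR SWAG_CONFIG_DIR SWAG_SITE_CONFS_DIR || exit 1
```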
### APISIX Configuration
The following prerequisites must be satisfied before deploy.
#### init.sh SSL Block
The `ssls/1` PUT block and the `dev` SSL block inside `init/apisix-core/init.sh` must be removed. This change is made in the test environment (`test-env/05-apisix-remove-ssl.md`); the same `init.sh` file is also used in prod, so no separate change is required for prod.
#### Custom APISIX Image
The prod stack uses the `registry.tarla.io/iklimco/custom-apisix:3.12.0` image. This image's `config.yaml` must contain real IP header configuration for the overlay CIDR:
```yaml
nginx_config:
http:
real_ip_header: "X-Real-IP"
set_real_ip_from: "10.0.0.0/8"
```
`set_real_ip_from: 10.0.0.0/8` covers all container addresses in the Swarm overlay network. APISIX therefore trusts the `X-Real-IP` header on requests arriving from SWAG's overlay address. It writes the real client IP, not SWAG's internal overlay IP, to its access logs.
If the image requires a rebuild because `config.yaml` changed:
```bash
docker build -t registry.tarla.io/iklimco/custom-apisix:3.12.0 .
docker push registry.tarla.io/iklimco/custom-apisix:3.12.0
```
During deploy, the pipeline runs `init/apisix-core/init.sh` once. The script writes the APISIX configuration to Patroni etcd under the `/apisix` prefix. All 3 prod replicas read the same etcd state, so no per-replica init is required. Detail: `roadmap/prod-env/05-apisix-remove-ssl.md`.
## Deploy Serialization with Gitea Concurrency
Because prod has 3 runners, more than one deploy job can be triggered at the same time. Instead of a StorageBox-based `mkdir/rmdir` lock mechanism, the Gitea Actions `concurrency` feature is used.
Add the following block to the pipeline file (`deploy-prod.yml`):
```yaml
concurrency:
group: prod-deploy
cancel-in-progress: false
```
With `cancel-in-progress: false`, a new run in the same group waits until the previous one finishes. Gitea shows it as "queued" in the UI, not as an error. Gitea tracks run state itself, so there is no stale lock to clean up if a runner crashes or a job is canceled.
All prod deploy workflows, including infra and microservices, must use the same `group: prod-deploy` value so infra deploy and microservice deploy cannot overlap.
## Deploy Pipeline
The table below lists the full step order of the prod deploy pipeline in `.gitea/workflows/deploy-prod.yml`. Steps marked with `*` are prod-specific and do not exist in the test pipeline.
| # | Step | Note |
| --- | --- | --- |
| 1 | Checkout Branch | |
| 2 | Prepare Folders | |
| 3 | Set up SSH Key and Add to known_hosts | |
| 4 | Update Apt Repository and Install Required Tools | `gettext tree jq`; `jq` is required for the GoDaddy DNS API |
| 5 | Fetch Service Secret Files | Fetch `.env.secrets.*` from StorageBox |
| 6 | Initialize Workspace | Fetch `.env` and `.env.secrets.shared` from StorageBox; run `init-base.sh` |
| 7 | Upload Updated Secrets to Storagebox | |
| 8 | Provision Vault AppRole IDs and Docker Secrets | |
| 9 | Upload Updated Env to Storagebox | |
| 10 | Prepare Init Files | Cert copy lines removed |
| 11 | Initialize Docker Swarm | |
| 12 | Docker Login to Harbor | |
| 13 | **Update DNS Records** * | GoDaddy API; `api/apigw/rabbitmq/grafana` A records; idempotent |
| 14 | **Prepare SWAG Directories** * | `$SWAG_CONFIG_DIR/dns-conf`; renders nginx conf templates; reloads running SWAG |
| 15 | Bootstrap Vault TLS Placeholder | |
| 16 | Deploy Swarm Stack | base + prod overlay together |
| 17 | **Wait for etcd** * | Waits until Patroni etcd (`iklim-db-01:2379`) is healthy |
| 18 | **Run APISIX Init** * | `SPRING_PROFILES_ACTIVE=prod`; idempotent; writes to etcd |
| 19 | **Bootstrap SWAG Certificate** * | Waits for SWAG to obtain the cert; copies it to `SWAG_CERT_DIR` |
| 20 | **Run Database Init Scripts** * | `postgresql`/`mongodb` Swarm VIP; SQL+JS init; idempotent |
| 21 | Review Environment | |
### Removal of Cert `scp` Lines
Lines removed from the `Initialize Workspace` step:
```yaml
# REMOVED — manual cert copy with scp is no longer required:
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co.full.crt ./STAR.iklim.co.full.crt
scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co_key.pem ./STAR.iklim.co_key.pem
```
The following line was also removed from the `Prepare Init Files` step:
```yaml
# REMOVED:
sudo cp STAR.iklim.co.full.crt STAR.iklim.co_key.pem /opt/iklimco/ssl/
```
The certificate is now obtained by SWAG from Let's Encrypt and written to the `SWAG_CERT_DIR` (`/mnt/storagebox/ssl/`) directory in the `Bootstrap SWAG Certificate` step. Later renewals are handled automatically by cert-reloader.
### Bootstrap SWAG Certificate (Step 19)
On the first deploy, SWAG obtains the Let's Encrypt certificate with the GoDaddy DNS-01 challenge. This step waits for SWAG to obtain the certificate, for up to 10 minutes, and then copies it to the `SWAG_CERT_DIR` directory:
```yaml
- name: Bootstrap SWAG Certificate
run: |
set -a; . ./.env; set +a
echo "Waiting for SWAG container to start..."
SWAG_CTR=""
for i in $(seq 1 24); do
SWAG_CTR=$(docker ps -q -f name=iklimco_swag 2>/dev/null | head -1)
[ -n "$SWAG_CTR" ] && break
sleep 10
done
if [ -z "$SWAG_CTR" ]; then
echo "❌ SWAG container did not start"
exit 1
fi
CERT_PATH="/config/etc/letsencrypt/live/iklim.co/fullchain.pem"
echo "Waiting for cert (up to 10 min)..."
for i in $(seq 1 20); do
if docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
echo "✅ Cert obtained"
break
fi
echo " attempt $i/20 — waiting 30s..."
sleep 30
done
if ! docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
echo "❌ SWAG did not obtain cert. Logs:"
docker service logs iklimco_swag --tail 50
exit 1
fi
docker exec "$SWAG_CTR" cat "$CERT_PATH" | \
docker run --rm -i -v "${SWAG_CERT_DIR}:/output" alpine \
sh -c "cat > /output/STAR.iklim.co.full.crt && chmod 644 /output/STAR.iklim.co.full.crt"
docker exec "$SWAG_CTR" cat "/config/etc/letsencrypt/live/iklim.co/privkey.pem" | \
docker run --rm -i -v "${SWAG_CERT_DIR}:/output" alpine \
sh -c "cat > /output/STAR.iklim.co_key.pem && chmod 644 /output/STAR.iklim.co_key.pem"
echo "✅ Cert bootstrapped to ${SWAG_CERT_DIR}/"
working-directory: /workspace/iklim.co
```
After this step, the certificate files exist in `SWAG_CERT_DIR` (`/mnt/storagebox/ssl/`), where Vault reads them for TLS. On later pipeline runs the certificate already exists, so the step finds it on the first check and re-copies the current files. Renewal within Let's Encrypt's 90-day cycle is handled by SWAG and cert-reloader, not by the pipeline.
### Run Database Init Scripts (Step 20)
PostgreSQL and MongoDB init scripts run through Swarm overlay DNS service names (`postgresql`, `mongodb`):
```yaml
- name: Run Database Init Scripts
run: |
set -a; . ./.env; . ./.env.secrets.shared; set +a
echo "⏳ Waiting for PostgreSQL..."
until docker run --rm --network iklimco-net \
-e PGPASSWORD="${DATABASE_POSTGRES_ROOT_PASSWD}" \
postgis/postgis:17-3.5 \
pg_isready -h postgresql -U "${DATABASE_POSTGRES_ROOT_USER}" -q 2>/dev/null; do
sleep 5
done
for sql_file in $(ls ./init/postgresql/*.sql 2>/dev/null | sort); do
echo "▶ $(basename "$sql_file")"
docker run --rm -i --network iklimco-net \
-e PGPASSWORD="${DATABASE_POSTGRES_ROOT_PASSWD}" \
postgis/postgis:17-3.5 \
psql -h postgresql -U "${DATABASE_POSTGRES_ROOT_USER}" < "$sql_file"
done
echo "⏳ Waiting for MongoDB..."
until docker run --rm --network iklimco-net mongo:8 \
mongosh "mongodb://${DATABASE_MONGODB_ROOT_USER}:${DATABASE_MONGODB_ROOT_PASSWD}@mongodb/admin" \
--eval "db.runCommand({ping:1})" --quiet 2>/dev/null; do
sleep 5
done
for js_file in $(ls ./init/mongodb/*.js 2>/dev/null | sort); do
echo "▶ $(basename "$js_file")"
docker run --rm -i --network iklimco-net mongo:8 \
mongosh "mongodb://${DATABASE_MONGODB_ROOT_USER}:${DATABASE_MONGODB_ROOT_PASSWD}@mongodb/admin" \
--quiet < "$js_file"
done
echo "✅ Database init scripts completed"
working-directory: /workspace/iklim.co
```
- `postgresql` and `mongodb`: Swarm VIP service names, resolved on the `iklimco-net` overlay; Patroni primary automatic routing happens at VIP level
- SQL files `./init/postgresql/*.sql` and JS files `./init/mongodb/*.js` are created in the `Prepare Init Files` step by the `init_postgresql`/`init_mongodb` functions in `common-functions.sh`
- Idempotent: `CREATE IF NOT EXISTS` / `createCollection` semantics; runs safely again on later deploys
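The `until` loops in this step have no upper bound. An unreachable database therefore keeps the job waiting until it is stopped by a timeout or by hand. A bounded retry is one way to cap that wait. The sketch below assumes bash, and `retry` is a hypothetical helper, not something in the repo:

```bash
# Sketch: bounded retry, an alternative to an open-ended `until ...; do sleep 5; done`.
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 after ATTEMPTS failed attempts.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Do not sleep after the final attempt
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage with the same pg_isready call as above, capped at 60 x 5 s (about 5 min):
# retry 60 5 docker run --rm --network iklimco-net \
#   -e PGPASSWORD="${DATABASE_POSTGRES_ROOT_PASSWD}" postgis/postgis:17-3.5 \
#   pg_isready -h postgresql -U "${DATABASE_POSTGRES_ROOT_USER}" -q || exit 1
```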
## Swarm Service Distribution
In prod, all 3 app nodes act as both Swarm managers and app workers, so services can be spread across 3 nodes.
### Microservices
Each microservice has two stack files:
| File | Content | Environment |
| --- | --- | --- |
| `BE-<Service>/docker-stack-service.yml` | Base definitions, `replicas: 1` | Test + Prod |
| `BE-<Service>/docker-stack-service.prod.yml` | `replicas: 3`, `max_replicas_per_node: 1` | Prod only |
Prod deploy command:
```bash
docker stack deploy \
  -c BE-<Service>/docker-stack-service.yml \
  -c BE-<Service>/docker-stack-service.prod.yml \
  iklimco
```
`max_replicas_per_node: 1` is mandatory. Without it, when fewer nodes are available than replicas (for example, after a node failure), Swarm places more than one replica on the same node. With it, the surplus replica stays pending until a node becomes available.
### Infra Services
`docker-stack-infra.yml` (base) and `docker-stack-infra.prod.yml` (overlay) are deployed together. The overlay overrides services such as Vault, APISIX, RabbitMQ, and Redis Sentinel with `replicas: 3` and `max_replicas_per_node: 1`. Detail: `Environment_Infrastructure/roadmap/prod-env/03-infra-stack-changes.md`.
#### cert-reloader and Vault Auto-Unseal
The `cert-reloader` sidecar service runs with `replicas: 1` in the infra stack. It detects the Let's Encrypt certificate renewed by SWAG and distributes it to Vault. Because prod uses the shared StorageBox mount, SSH-based distribution is not required.
Certificate renewal flow:
```
SWAG renews the certificate -> writes it to SWAG_CONFIG_DIR (/mnt/storagebox/swag/config)
  -> cert-reloader detects the MD5 change
  -> copies it to /mnt/storagebox/ssl/ (shared mount on all app nodes)
  -> runs docker service update --force iklimco_vault
Vault (3 replicas) restarts
  -> each instance reads the new certificate from the /mnt/storagebox/ssl/ mount
  -> healthcheck checks the sealed status every 30 seconds
  -> if sealed: reads the key from the vault_unseal_key Docker secret and unseals automatically
```
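This document does not show cert-reloader's actual script. The sketch below illustrates only the "detects the MD5 change" step and assumes bash with coreutils `md5sum`. It compares the certificate's MD5 with the last recorded hash. `cert_changed`, the state file, and the polling loop in the comment are all hypothetical:

```bash
# Sketch: has the certificate changed since the last check?
# cert_changed CERT_FILE STATE_FILE
# Returns 0 (changed) when CERT_FILE's MD5 differs from the hash stored in STATE_FILE,
# and records the new hash. Returns 1 when unchanged or when CERT_FILE is unreadable.
# Note: the very first call (no STATE_FILE yet) counts as a change.
cert_changed() {
  local cert=$1 state=$2 new old=""
  [ -r "$cert" ] || return 1
  new=$(md5sum < "$cert" | cut -d' ' -f1)
  if [ -f "$state" ]; then
    old=$(cat "$state")
  fi
  if [ "$new" = "$old" ]; then
    return 1
  fi
  # Remember the new hash so the next call reports "unchanged"
  printf '%s\n' "$new" > "$state"
  return 0
}

# Hypothetical polling loop:
# while sleep 60; do
#   if cert_changed /mnt/storagebox/swag/config/etc/letsencrypt/live/iklim.co/fullchain.pem /tmp/cert.md5; then
#     # copy the files to /mnt/storagebox/ssl/, then:
#     docker service update --force iklimco_vault
#   fi
# done
```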
The auto-unseal mechanism is provided by the Vault healthcheck inside `docker-stack-infra.yml`:
```yaml
healthcheck:
test:
- "CMD"
- "sh"
- "-c"
- >-
vault status -format=json 2>/dev/null | grep -Eq '"sealed": *false' ||
vault operator unseal $$(cat /run/secrets/vault_unseal_key 2>/dev/null)
interval: 30s
timeout: 10s
start_period: 15s
retries: 5
```
Each of the 3 replicas runs its own healthcheck and unseals itself independently. The certificate renewal -> restart -> auto-unseal chain therefore requires no manual intervention. Detail: `roadmap/prod-env/06-cert-reloader.md`.
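The healthcheck's "is Vault unsealed?" decision can be isolated in a function and tested on its own. The sketch below assumes bash. It also assumes that `vault status -format=json` pretty-prints its output, one field per line with a space after the colon (`"sealed": false`), so the pattern tolerates optional whitespace. `is_unsealed` is a hypothetical name:

```bash
# Sketch: succeed only if the JSON on stdin reports "sealed": false.
# Whitespace around the colon is optional, so compact and pretty-printed JSON both match.
is_unsealed() {
  grep -Eq '"sealed"[[:space:]]*:[[:space:]]*false'
}

# Usage on a node, against the local Vault container:
# docker exec "$VAULT_CTR" vault status -format=json | is_unsealed && echo "unsealed"
```

If the status cannot be read at all (for example because of a connection or TLS error), nothing matches. The healthcheck then falls through to the unseal command.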
#### Vault Raft Configuration
Vault is defined as 3 replicas with Raft storage in the `docker-stack-infra.prod.yml` overlay:
```yaml
vault:
environment:
VAULT_LOCAL_CONFIG: >-
{"api_addr":"https://vault.iklim.co:8200",
"cluster_addr":"https://{{ .Node.Hostname }}:8201",
"storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
"listener":[{"tcp":{"address":"0.0.0.0:8200",
"tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
"tls_key_file":"/vault/certs/STAR.iklim.co_key.pem"}}],
"default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
volumes:
- /opt/iklimco/vault/data:/vault/file # separate host path on each node — created with Ansible
- ${SWAG_CERT_DIR}:/vault/certs:ro # StorageBox shared — all nodes see the same path
deploy:
mode: replicated
replicas: 3
placement:
max_replicas_per_node: 1
constraints:
- node.labels.type == service
```
`{{ .Node.Hostname }}` is a Docker Swarm Go template that gives each Vault instance a unique `node_id` and `cluster_addr`. `/opt/iklimco/vault/data` is a bind-mounted host path, not a volume shared across nodes. It must therefore be created separately on each app node during Ansible bootstrap; see `07-prod-ansible-bootstrap.md` (Node Directory Role). Detail: `roadmap/prod-env/07-vault-raft-plan.md`.
## Vault Raft Cluster Initial Setup
After the infra stack is deployed for the first time, the Vault Raft cluster is initialized manually once. These steps are not repeated on every deploy; they are applied only during initial setup.
### Step 1 — Stack Deploy
```bash
docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
```
3 Vault containers start. The first initialized node becomes the leader.
### Step 2 — Vault Initialize (iklim-app-01)
```bash
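VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
# Single key share: the healthcheck unseals with the one key stored in the
# vault_unseal_key secret, so the threshold must be 1 (default: 5 shares, threshold 3).
docker exec -it "$VAULT_CTR" vault operator init -key-shares=1 -key-threshold=1
```
Store the unseal key and root token from the output securely. Save the unseal key as a Docker secret: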
```bash
echo -n "<unseal-key>" | docker secret create vault_unseal_key -
```
> After this step, the `vault_unseal_key` secret exists. During later certificate renewals, cert-reloader restarts Vault; the healthcheck reads this secret and automatically unseals, so no manual intervention is required.
### Step 3 — Unseal the Leader
```bash
docker exec -it "$VAULT_CTR" vault operator unseal
```
### Step 4 — Join the Other Nodes to the Raft Cluster
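Join the Vault containers on `iklim-app-02` and `iklim-app-03` to the cluster. Run each command on the node that hosts the container, because `docker exec` only reaches containers on the node where it runs: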
```bash
docker exec -it <vault-on-iklim-app-02> vault operator raft join \
https://vault.iklim.co:8200
docker exec -it <vault-on-iklim-app-03> vault operator raft join \
https://vault.iklim.co:8200
```
Each node is also unsealed after it joins:
```bash
docker exec -it <vault-on-iklim-app-02> vault operator unseal
docker exec -it <vault-on-iklim-app-03> vault operator unseal
```
### Step 5 — Verify the Cluster
```bash
docker exec "$VAULT_CTR" vault operator raft list-peers
```
Expected: 3 peers, one in the `leader` state and two in the `follower` state.
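The expected peer table can also be checked automatically. The sketch below assumes bash and the default table layout of `vault operator raft list-peers`: header rows, then one row per peer with the state in the third column. `check_raft_peers` is a hypothetical name. The command itself needs a valid Vault token in the container.

```bash
# Sketch: succeed only if the table on stdin lists exactly 1 leader and 2 followers.
# Header rows ("State", "-----") are counted too but never match leader/follower.
check_raft_peers() {
  awk 'NF >= 3 { count[$3]++ }
       END { exit !(count["leader"] == 1 && count["follower"] == 2) }'
}

# Usage:
# docker exec "$VAULT_CTR" vault operator raft list-peers | check_raft_peers && echo "raft: 1 leader, 2 followers"
```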
## Gateway and Public Traffic
Public internet enters only through SWAG on `80/tcp` and `443/tcp`. SWAG is pinned to `iklim-app-01`, where the Floating IP is located. APISIX admin ports (`9180`) and other service ports are not opened publicly; SWAG forwards all public traffic to APISIX as a reverse proxy.
### Subdomain Routing
| Subdomain | Target Service | Restriction |
| --- | --- | --- |
| `api.iklim.co` | APISIX `:9080` | Public |
| `apigw.iklim.co` | APISIX Dashboard `:9000` | IP restricted with `RESTRICTED_IPS` |
| `rabbitmq.iklim.co` | RabbitMQ Management `:15672` | IP restricted with `RESTRICTED_IPS` |
| `grafana.iklim.co` | Grafana `:3000` | IP restricted with `RESTRICTED_IPS` |
IP restriction is done with the `RESTRICTED_IPS_BLOCK` nginx allow block derived from the `RESTRICTED_IPS` variable; it is applied in SWAG nginx configuration, not in the Hetzner firewall.
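This document does not show how `RESTRICTED_IPS_BLOCK` is derived. The sketch below is minimal and assumes bash. It also assumes the block is a list of nginx `allow` lines followed by `deny all;`. nginx's access module checks these rules in order and stops at the first match, so `deny all;` has to come last. `build_allow_block` is a hypothetical name:

```bash
# Sketch: turn RESTRICTED_IPS ("a/32,b/32") into nginx allow/deny lines.
build_allow_block() {
  local cidr
  local -a cidrs
  IFS=',' read -r -a cidrs <<<"$1"
  for cidr in "${cidrs[@]}"; do
    cidr=${cidr//[[:space:]]/}   # tolerate spaces after commas
    [ -n "$cidr" ] && printf 'allow %s;\n' "$cidr"
  done
  printf 'deny all;\n'
}

# Usage:
# RESTRICTED_IPS_BLOCK=$(build_allow_block "$RESTRICTED_IPS")
```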
### SWAG -> APISIX Load Distribution
SWAG connects to APISIX through the Docker Swarm service name with `proxy_pass http://apisix:9080;`. Swarm resolves the `apisix` service name to a VIP (Virtual IP); the IPVS load balancer distributes incoming connections round-robin across the 3 replicas in prod. No additional upstream or load balancer configuration is required on the SWAG side; load distribution happens transparently at the overlay network layer.
`Prometheus` is intentionally not exposed externally through SWAG. Access uses Grafana, whose internal connection is `http://prometheus:9090`, or an SSH tunnel.
Detail: `Environment_Infrastructure/roadmap/prod-env/04-swag-nginx-configs.md`.
## Post-Deploy Verification
After a successful prod pipeline deploy, run the following checks.
### Swarm Health
```bash
docker node ls
```
Expected: 3 managers (`Leader` + 2 `Reachable`) — `iklim-app-01/02/03`; 3 workers (`Ready`) — `iklim-db-01/02/03`.
```bash
docker service ls --filter label=project=co.iklim
```
Every service must show `X/X` in the `REPLICAS` column, meaning all desired replicas are running.
### Precipitation Image Directory
```bash
ls -ld /mnt/storagebox/precipitation/images
```
The directory must exist; it is required before `iklimco_precipitation-service` is deployed.
```bash
docker volume inspect iklimco_image-data
```
Expected: `Options.device` -> `/mnt/storagebox/precipitation/images`.
### SWAG Certificate
```bash
docker exec $(docker ps -q -f name=iklimco_swag) certbot certificates
```
Expected: `*.iklim.co`, `VALID: XX days` (Let's Encrypt — not the old manual cert).
TLS check from outside:
```bash
echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
| openssl x509 -noout -subject -dates
```
Expected: `CN=*.iklim.co`, `notAfter > 2026-07-15`.
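The date comparison can be scripted rather than read by eye. The sketch below assumes bash and GNU `date` (for `date -d`). It also assumes the input is the `notAfter=...` line printed by `openssl x509 -noout -enddate`. `not_after_later_than` is a hypothetical name:

```bash
# Sketch: succeed if the certificate's notAfter is strictly later than CUTOFF.
# not_after_later_than "notAfter=Aug 16 12:00:00 2026 GMT" 2026-07-15
not_after_later_than() {
  local not_after=${1#notAfter=} cutoff=$2
  # Compare as Unix timestamps (UTC); an unparsable date makes the test fail
  [ "$(date -u -d "$not_after" +%s)" -gt "$(date -u -d "$cutoff" +%s)" ]
}

# Usage:
# na=$(echo | openssl s_client -connect api.iklim.co:443 -servername api.iklim.co 2>/dev/null \
#   | openssl x509 -noout -enddate)
# not_after_later_than "$na" 2026-07-15 && echo "new certificate in place"
```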
> Warning: The old manual `*.iklim.co` certificate expires on **2026-07-15**. After SWAG's Let's Encrypt certificate is verified for the first time, the old cert on StorageBox can be archived and is no longer used.
### Public API Access
```bash
curl -si https://api.iklim.co/health
```
It must return HTTP 2xx; there must be no TLS error.
### IP Restriction
From a disallowed IP:
```bash
curl -si https://grafana.iklim.co
curl -si https://apigw.iklim.co
curl -si https://rabbitmq.iklim.co
```
All must return HTTP 403.
From an allowed IP (78.187.87.109 or 95.70.151.248):
```bash
curl -si https://grafana.iklim.co # HTTP 200 Grafana
curl -si https://apigw.iklim.co # HTTP 200 APISIX Dashboard
curl -si https://rabbitmq.iklim.co # HTTP 200 RabbitMQ Management
```
### Vault Access Control
Must not be reachable from outside:
```bash
# Expected: connection refused or timeout
curl -sk --connect-timeout 5 https://<iklim-app-01-public-ip>:8200/v1/sys/health
```
Must be reachable from inside the overlay:
```bash
# Expected: {"sealed":false,...}
docker exec $(docker ps -q -f name=iklimco_apisix | head -1) \
curl -sk https://vault.iklim.co:8200/v1/sys/health
```
### No Unexpected Ports
```bash
docker service ls --format "{{.Name}}\t{{.Ports}}" \
--filter label=project=co.iklim
```
Only `iklimco_swag` -> `*:80->80/tcp, *:443->443/tcp` should publish ports; other services must not publish ports.
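The tab-separated output of the command above can be filtered automatically. The sketch below assumes bash and awk. `unexpected_published_ports` is a hypothetical name. It prints every service other than `iklimco_swag` that publishes a port and fails if it finds one:

```bash
# Sketch: read "name<TAB>ports" lines on stdin; print offending service names.
# Returns 1 if any service other than iklimco_swag has a non-empty ports column.
unexpected_published_ports() {
  awk -F'\t' '$1 != "iklimco_swag" && $2 != "" { print $1; found = 1 }
              END { exit found }'
}

# Usage:
# docker service ls --format "{{.Name}}\t{{.Ports}}" --filter label=project=co.iklim \
#   | unexpected_published_ports
```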
### APISIX Replica Distribution
```bash
docker service ps iklimco_apisix
```
Expected: 3 tasks, all `Running`, on different nodes.
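The "different nodes" expectation can also be checked automatically. The sketch below assumes bash and awk. It reads node names from `docker service ps --filter desired-state=running --format '{{.Node}}'`, where the filter drops old shut-down tasks. `replicas_on_distinct_nodes` is a hypothetical name:

```bash
# Sketch: succeed only if stdin lists exactly EXPECTED node names, all distinct.
# replicas_on_distinct_nodes EXPECTED_COUNT
replicas_on_distinct_nodes() {
  awk -v want="$1" 'NF { total++; if (!seen[$1]++) unique++ }
                    END { exit !(total == want && unique == want) }'
}

# Usage:
# docker service ps iklimco_apisix --filter desired-state=running --format '{{.Node}}' \
#   | replicas_on_distinct_nodes 3 && echo "APISIX spread across 3 nodes"
```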
### fail2ban (SWAG Container)
```bash
docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status
```
Expected: a list with more than one jail.
### Microservice Health (After Microservices Are Deployed)
After microservices are deployed with a separate pipeline:
```bash
curl -si "https://api.iklim.co/v1/weather/current?lat=39&lon=35"
```
Expected: valid JSON weather response.
## Acceptance Criteria
- 3 prod runners appear online in the Gitea UI.
- Every runner has the `prod-runner` label.
- Any runner can run a simple Docker command.
- `docker node ls` shows 3 managers.
- When one runner/node is shut down, another runner can pick up a new job.
- All prod deploy workflows use `concurrency: group: prod-deploy`, so Gitea queues them and never runs two at the same time.
- Public ingress is limited to `22`, `80`, and `443`.
- `prod/secrets/iklim.co/.env.secrets.swag` exists on StorageBox and contains valid GoDaddy credentials.
- `PROD_FLOATING_IP` project variable is defined in Gitea.
- `redis_password` and `rabbitmq_erlang_cookie` appear in `docker secret ls`.
- The `ssl`, `swag/config`, `swag/site-confs`, `grafana/data`, `prometheus/data`, and `precipitation/images` directories exist on StorageBox; see `07-prod-ansible-bootstrap.md` — StorageBox Directory Structure.
- The `swag/site-confs/default.conf`, `api.conf.tpl`, `apigw.conf.tpl`, `rabbitmq.conf.tpl`, and `grafana.conf.tpl` template files exist in the repo.
- StorageBox `prod/secrets/iklim.co/.env.prod` has correct values for `API_SUBDOMAIN`, `APIGW_SUBDOMAIN`, `RABBITMQ_SUBDOMAIN`, `GRAFANA_SUBDOMAIN`, `RESTRICTED_IPS`, `SWAG_CERT_DIR`, `SWAG_CONFIG_DIR`, and `SWAG_SITE_CONFS_DIR`.
- After the first deploy, `docker exec $(docker ps -q -f name=iklimco_swag) nginx -t` succeeds and returns `syntax is ok`.
- The output of `cat /mnt/storagebox/swag/site-confs/api.conf | grep server_name` contains `server_name api.iklim.co;`.
- The `ssls/1` PUT block does not exist inside `init/apisix-core/init.sh`.
- The `registry.tarla.io/iklimco/custom-apisix:3.12.0` image exists in Harbor and its `config.yaml` contains `set_real_ip_from: 10.0.0.0/8` configuration.
- After the first deploy, real client IP appears in APISIX access logs, not the SWAG overlay IP: `docker exec $(docker ps -q -f name=iklimco_apisix | head -1) tail -5 /usr/local/apisix/logs/access.log`
- `docker service ps iklimco_cert-reloader` shows that the service is running.
- The output of `docker service logs iklimco_cert-reloader --tail 20` contains `[cert-reloader] started` and has no error lines.
- The `notAfter` date of the Vault TLS endpoint certificate matches `/mnt/storagebox/ssl/STAR.iklim.co.full.crt`: `docker exec $(docker ps -q -f name=iklimco_vault | head -1) sh -c 'echo | openssl s_client -connect vault.iklim.co:8200 2>/dev/null | openssl x509 -noout -dates'`
- `vault operator raft list-peers` returns 3 peers: 1 leader, 2 followers.
- The `vault_unseal_key` Docker secret exists and appears in `docker secret ls`.
- None of the 3 Vault containers is sealed. On each app node, `docker exec $(docker ps -q -f name=iklimco_vault | head -1) vault status | grep Sealed` -> `Sealed false`.
- The first deploy pipeline successfully completes all 21 steps; the `Review Environment` step succeeds.
- After the `Bootstrap SWAG Certificate` step, `ls /mnt/storagebox/ssl/` -> `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.pem` exist.
- The `Run Database Init Scripts` step completes without error; PostgreSQL and MongoDB are healthy and init scripts are applied.
- In the output of `docker service ls --filter label=project=co.iklim`, all infra services show `X/X`.
- `docker volume inspect iklimco_image-data` -> `Options.device=/mnt/storagebox/precipitation/images`.
- `docker exec $(docker ps -q -f name=iklimco_swag) certbot certificates` -> `*.iklim.co` Let's Encrypt certificate is valid; it is not the old manual cert.
- `echo | openssl s_client -connect api.iklim.co:443 2>/dev/null | openssl x509 -noout -subject -dates` -> `CN=*.iklim.co`, `notAfter > 2026-07-15`.
- `curl -si https://api.iklim.co/health` -> HTTP 2xx; no TLS error.
- `https://grafana.iklim.co`, `https://apigw.iklim.co`, and `https://rabbitmq.iklim.co` each return HTTP 403 from a disallowed IP and HTTP 200 from an allowed IP.
- `curl --connect-timeout 5 https://<public-ip>:8200` -> connection refused or timeout; Vault is not reachable from outside.
- `docker exec $(docker ps -q -f name=iklimco_apisix | head -1) curl -sk https://vault.iklim.co:8200/v1/sys/health` -> `{"sealed":false,...}`; reachable from inside the overlay.
- `docker service ls --format "{{.Name}}\t{{.Ports}}" --filter label=project=co.iklim` -> only `iklimco_swag` publishes ports.
- `docker service ps iklimco_apisix` -> 3 tasks, `Running`, on different nodes.
- `docker exec $(docker ps -q -f name=iklimco_swag) fail2ban-client status` -> more than one jail appears.