Environment_Infrastructure/setup/06-prod-terraform-iaac.md
Murat ÖZDEMİR bf8f011e43 Restructure setup documentation and refine environment bootstrapping
This commit introduces a reordered and renumbered set of setup documentation files to better reflect the deployment stages for both test and production environments.

Key changes include:
*   A new `setup-vs-roadmap-map.md` file to provide a clear mapping between roadmap tasks and their corresponding setup phases.
*   Significantly expanded Ansible bootstrap documentation for both test and production, detailing Docker, Swarm, security hardening, and StorageBox SSH key management roles.
*   Formalized database Docker and Swarm cluster setup instructions for test and production, including explicit steps for Swarm worker integration of DB nodes.
*   Updated roadmap documentation (`roadmap/prod-env/*`) to align with the refined setup, incorporating correct private IP addresses for Swarm joins, new node labels, and floating IP usage for GoDaddy DNS records.
2026-05-11 17:47:30 +03:00


# 06 - Prod Terraform IaC
The goal of this stage is to create the HA-focused IaaS resources inside the prod Hetzner Cloud project with Terraform. This document can be handed to the prod Terraform agent on its own.
## Scope
Terraform creates the following in the prod environment:
- Private network: `iklim-prod-net`
- Subnets:
  - App/Swarm subnet: `10.20.10.0/24`
  - DB subnet: `10.20.20.0/24`
- Firewall:
  - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
  - Private ingress: the prod rules in `01-private-network-port-matrisi.md`
- SSH key
- Placement groups:
  - `iklim-prod-app-spread`
  - `iklim-prod-db-spread`
- Floating IP: a static IPv4 for the app entry point (assigned to `iklim-app-01`)
- Servers:
  - `iklim-app-01`
  - `iklim-app-02`
  - `iklim-app-03`
  - `iklim-db-01`
  - `iklim-db-02`
  - `iklim-db-03`
- Ansible inventory output

The DB cluster software will not be installed by Terraform. The DB nodes are prepared only at the machine, network, and firewall level.
## Version Requirements
```text
Terraform >= 1.6
hcloud provider ~> 1.49
```
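These pins map directly onto a `versions.tf`; a minimal sketch (the file split follows the layout below, block contents are the standard hcloud provider declaration):

```hcl
# versions.tf — pins Terraform core and the hcloud provider
terraform {
  required_version = ">= 1.6"

  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.49"
    }
  }
}
```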
## Recommended File Layout
```text
terraform/
  hetzner/
    prod/
      versions.tf
      providers.tf
      variables.tf
      locals.tf
      network.tf
      firewall.tf
      placement.tf
      servers.tf
      floating_ip.tf
      outputs.tf
      terraform.tfvars.example
```
`terraform.tfvars`, the state files, and the token are never committed to the repo.
## Variables
The `environment` constant lives in `locals.tf`; it is not overridden via `tfvars`.
Minimum variables:
```hcl
hcloud_token = "secret"
location = "fsn1"
image = "rocky-10"
server_type_swarm = "cpx42"
server_type_db = "cpx32"
admin_ssh_public_key_path = "~/.ssh/id_ed25519.pub"
admin_allowed_cidrs = ["X.X.X.X/32"]
```
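The tfvars entries above imply matching declarations in `variables.tf`. A sketch, assuming the example values double as defaults (names come from the tfvars example; the types and descriptions are my assumptions):

```hcl
# variables.tf — declarations backing the tfvars example above
variable "hcloud_token" {
  type        = string
  sensitive   = true
  description = "Prod Hetzner Cloud project API token"
}

variable "location" {
  type    = string
  default = "fsn1"
}

variable "image" {
  type    = string
  default = "rocky-10"
}

variable "server_type_swarm" {
  type    = string
  default = "cpx42"
}

variable "server_type_db" {
  type    = string
  default = "cpx32"
}

variable "admin_ssh_public_key_path" {
  type    = string
  default = "~/.ssh/id_ed25519.pub"
}

variable "admin_allowed_cidrs" {
  type        = list(string)
  description = "CIDRs allowed to reach 22/tcp and the IP-restricted dashboards"
}
```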
The server type decision is based on the current test environment metrics in `../hetzner-sizing-report.md` and the prod cluster topology. For the prod app nodes, `cpx42` is recommended because of the memory pressure of the Java microservices; for the prod DB nodes, the more economical `cpx32` is recommended as the starting point for a 3-node cluster. Once capacity needs are confirmed with metrics, nodes can be added or an in-place rescale can be performed.
## Server Roles and Private IP Plan
| Server | Private IP | Role |
| --- | --- | --- |
| `iklim-app-01` | `10.20.10.11` | Swarm manager + app worker + runner (primary, holds the Floating IP) |
| `iklim-app-02` | `10.20.10.12` | Swarm manager + app worker + runner |
| `iklim-app-03` | `10.20.10.13` | Swarm manager + app worker + runner |
| `iklim-db-01` | `10.20.20.11` | Manually managed DB cluster node |
| `iklim-db-02` | `10.20.20.12` | Manually managed DB cluster node |
| `iklim-db-03` | `10.20.20.13` | Manually managed DB cluster node |

The private IPs are pinned in `locals.tf` as the `swarm_private_ips` and `db_private_ips` maps. The server list is derived from these maps with `for_each`.
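A sketch of that map-driven layout (the resource names `hcloud_placement_group.app_spread` and `hcloud_ssh_key.admin` are assumptions; only the map names and IPs come from this document):

```hcl
# locals.tf — pinned private IP maps; server names are the map keys
locals {
  environment = "prod"

  swarm_private_ips = {
    "iklim-app-01" = "10.20.10.11"
    "iklim-app-02" = "10.20.10.12"
    "iklim-app-03" = "10.20.10.13"
  }

  db_private_ips = {
    "iklim-db-01" = "10.20.20.11"
    "iklim-db-02" = "10.20.20.12"
    "iklim-db-03" = "10.20.20.13"
  }
}

# servers.tf — server set derived from the map with for_each
resource "hcloud_server" "swarm" {
  for_each = local.swarm_private_ips

  name               = each.key
  server_type        = var.server_type_swarm
  image              = var.image
  location           = var.location
  placement_group_id = hcloud_placement_group.app_spread.id # assumed name
  ssh_keys           = [hcloud_ssh_key.admin.id]            # assumed name

  lifecycle {
    prevent_destroy = true
  }
}
```

An analogous `hcloud_server.db` block over `db_private_ips` with `var.server_type_db` covers the DB nodes.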
## Recommended Resources and Cost
| Server | Role | Server Type | CPU | RAM | SSD | Monthly |
| --- | --- | --- | ---: | ---: | ---: | ---: |
| `iklim-app-01` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
| `iklim-app-02` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
| `iklim-app-03` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
| `iklim-db-01` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| `iklim-db-02` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| `iklim-db-03` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| **Total** | 6 servers | | **36 vCPU** | **72 GB** | **1,440 GB** | **$139.44** |
## Placement Group Decision
Two separate spread placement groups for prod:
```text
iklim-prod-app-spread: iklim-app-01/02/03
iklim-prod-db-spread: iklim-db-01/02/03
```
This way the Swarm quorum nodes are placed on different physical hosts among themselves, and the DB nodes likewise among themselves, on a best-effort basis.
Notes:
- Hetzner does not offer direct rack selection.
- A spread placement group targets different physical hosts.
- Multi-location/region disaster recovery is out of scope at this stage.
- When scale grows, multi-location DR should be designed separately.
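The two groups above reduce to two small resources; a sketch (the Terraform resource names are illustrative, the group names and `spread` type come from this document):

```hcl
# placement.tf — separate spread groups for the app and DB quorums
resource "hcloud_placement_group" "app_spread" {
  name = "iklim-prod-app-spread"
  type = "spread"
}

resource "hcloud_placement_group" "db_spread" {
  name = "iklim-prod-db-spread"
  type = "spread"
}
```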
## Floating IP
An IPv4 floating IP named `iklim-prod-app-fip` is created and assigned to `iklim-app-01`. The DNS A record points at this IP. If failover is needed, the floating IP can be moved to another app node.
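A sketch of the floating IP and its assignment, assuming the server set is named `hcloud_server.swarm` as above (resource names are illustrative):

```hcl
# floating_ip.tf — static entry-point IPv4, initially pinned to iklim-app-01
resource "hcloud_floating_ip" "app" {
  name          = "iklim-prod-app-fip"
  type          = "ipv4"
  home_location = var.location
}

resource "hcloud_floating_ip_assignment" "app" {
  floating_ip_id = hcloud_floating_ip.app.id
  server_id      = hcloud_server.swarm["iklim-app-01"].id
}
```

During a failover the assignment can be retargeted at another app node and applied, without recreating the IP itself.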
## Public Firewall
Public ingress:
| Port | Source | Target |
| --- | --- | --- |
| `22/tcp` | `admin_allowed_cidrs` | All prod nodes |
| `80/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-*` (via the Floating IP) |
| `443/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-*` (via the Floating IP) |

The following ports will not be opened publicly in prod:
- `8200/tcp` Vault
- `5432/tcp` PostgreSQL
- `27017/tcp` MongoDB
- `6379/tcp` Redis
- `5672/tcp`, `15672/tcp`, `61613/tcp`, `15674/tcp` RabbitMQ
- `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` Docker Swarm
- `9180/tcp` APISIX Admin API
- `9090/tcp` Prometheus
- `3000/tcp` Grafana
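Hetzner firewalls are default-deny for ingress, so the public table reduces to three rules; a sketch (the firewall resource name is illustrative):

```hcl
# firewall.tf — public ingress; everything not listed here stays closed
resource "hcloud_firewall" "public" {
  name = "iklim-prod-public" # assumed name

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = var.admin_allowed_cidrs
  }

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "80"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "443"
    source_ips = ["0.0.0.0/0", "::/0"]
  }
}
```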
## Private Firewall
### App (Swarm) Firewall — Private Ingress
Sourced from the app subnet (`10.20.10.0/24`):
| Port | Service | Access method |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | From inside the app subnet |
| `7946/tcp,udp` | Docker Swarm node discovery | From inside the app subnet |
| `4789/udp` | Docker Swarm VXLAN overlay | From inside the app subnet |
| `8200/tcp` | Vault | Docker overlay / private network |
| `6379/tcp` | Redis | From inside the app subnet |
| `5672/tcp` | RabbitMQ AMQP | From inside the app subnet |
| `61613/tcp` | RabbitMQ STOMP | From inside the app subnet |
| `15674/tcp` | RabbitMQ Web STOMP | From inside the app subnet |
| `15672/tcp` | RabbitMQ Management | Behind SWAG on `443` — IP-restricted |
| `9000/tcp` | APISIX Dashboard | Behind SWAG on `443` — IP-restricted |
| `9180/tcp` | APISIX Admin API | Only the Dashboard reaches it, via the Docker overlay |
| `9090/tcp` | Prometheus | Behind SWAG on `443` — IP-restricted |
| `3000/tcp` | Grafana | Behind SWAG on `443` — IP-restricted |

Sourced from the DB subnet (because the `iklim-db-*` nodes join the Swarm as workers):
| Port | Service | Source |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | `10.20.20.0/24` |
| `7946/tcp,udp` | Docker Swarm node discovery | `10.20.20.0/24` |
| `4789/udp` | Docker Swarm VXLAN overlay | `10.20.20.0/24` |
### DB Firewall — Private Ingress
Admin access:
| Port | Service | Source |
| --- | --- | --- |
| `22/tcp` | SSH | `admin_allowed_cidrs` |

Sourced from the app subnet (`10.20.10.0/24`):
| Port | Service | Note |
| --- | --- | --- |
| `5432/tcp` | PostgreSQL (Patroni primary) | App subnet access |
| `27017/tcp` | MongoDB replica set endpoint | App subnet access |
| `2377/tcp` | Docker Swarm control plane | From inside the app subnet |
| `7946/tcp,udp` | Docker Swarm node discovery | From inside the app subnet |
| `4789/udp` | Docker Swarm VXLAN overlay | From inside the app subnet |

Mutual access inside the DB subnet (`10.20.20.0/24`):
| Port | Service | Note |
| --- | --- | --- |
| `5432/tcp` | PostgreSQL Patroni replication | Between DB nodes |
| `27017/tcp` | MongoDB replica set internal | Between DB nodes |
| `2379/tcp` | etcd client | Patroni → etcd access |
| `2380/tcp` | etcd peer | etcd cluster internal |
| `8008/tcp` | Patroni REST API | Patroni leader election and health checks |

IP restriction is enforced in the SWAG nginx configuration, not in the Hetzner firewall.
## Outputs
After `terraform apply`, or via `terraform output`, the following values are available:
| Output | Description |
| --- | --- |
| `ansible_inventory_yaml` | Ansible inventory YAML — written to `ansible/inventory/generated/prod.yml` |
| `prod_private_ips` | Private IP map of all nodes (`swarm` and `db` sub-keys) |
| `prod_public_ips` | Public IPv4 map of all nodes |
| `prod_floating_ip` | Floating IP address of the Swarm entry point (the DNS A record points at this IP) |

To export the Ansible inventory:
```bash
terraform output -raw ansible_inventory_yaml > \
../../ansible/inventory/generated/prod.yml
```
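An `outputs.tf` sketch for the table above. The inventory shape and the `hcloud_server.db` resource name are assumptions; the real group/host layout should follow whatever the Ansible bootstrap docs expect:

```hcl
# outputs.tf — values listed in the outputs table
output "prod_floating_ip" {
  value = hcloud_floating_ip.app.ip_address
}

output "prod_private_ips" {
  value = {
    swarm = local.swarm_private_ips
    db    = local.db_private_ips
  }
}

output "ansible_inventory_yaml" {
  value = yamlencode({
    all = {
      children = {
        swarm = {
          hosts = { for name, s in hcloud_server.swarm : name => { ansible_host = s.ipv4_address } }
        }
        db = {
          hosts = { for name, s in hcloud_server.db : name => { ansible_host = s.ipv4_address } }
        }
      }
    }
  })
}
```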
## Lifecycle and Resize Policy
### Changing server_type (Resizing)
Changing `server_type` does **not** trigger a Terraform destroy+create. The `hcloud` provider supports this natively: it stops the server, calls the Hetzner Resize API, and starts the server again. Update the value in `terraform.tfvars` and run `terraform apply`. There is downtime (the server stops and starts), but the disk, installed software, and Docker volumes are preserved. No `ignore_changes` or manual step is required.
### Which Changes Force a Server Re-Creation?
| Changed field | Behavior | Note |
| --- | --- | --- |
| `server_type` | In-place resize (provider native) | `terraform apply` is enough |
| `hcloud_server_network` | Only the attachment is updated | Because it is a separate resource |
| `hcloud_firewall_attachment` | Only the attachment is updated | Because it is a separate resource |
| `placement_group_id` | Hetzner API does not allow the change → destroy+create | Do not change |
| `image` | Disk image changes → destroy+create | Do not change |
| `location` | Cannot move to another datacenter → destroy+create | Do not change |
### Separating Network and Firewall Attachments
The `network` block and `firewall_ids` are not embedded inside `hcloud_server`. Separate resources are defined instead:
- `hcloud_server_network` — private IP assignment (`for_each` over each node)
- `hcloud_firewall_attachment` — firewall association (`for_each` over the derived server list)
### prevent_destroy Protection
Every server gets `lifecycle { prevent_destroy = true }`. To delete one intentionally, temporarily remove the lifecycle block first.
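A sketch of those standalone attachment resources, assuming the server set and subnet/firewall resource names used elsewhere in this document's examples:

```hcl
# network.tf / firewall.tf — attachments kept out of hcloud_server so they
# can change without forcing a server replacement
resource "hcloud_server_network" "swarm" {
  for_each = hcloud_server.swarm

  server_id = each.value.id
  subnet_id = hcloud_network_subnet.app.id # assumed subnet resource name
  ip        = local.swarm_private_ips[each.key]
}

resource "hcloud_firewall_attachment" "app_private" {
  firewall_id = hcloud_firewall.app_private.id # assumed firewall resource name
  server_ids  = [for s in hcloud_server.swarm : s.id]
}
```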
## How to Run
### Preparation
**1. Create tfvars (once):**
```bash
cd Environment_Infrastructure/terraform/hetzner/prod
cp terraform.tfvars.example terraform.tfvars
# Fill terraform.tfvars with the real values
# (hcloud_token, admin_allowed_cidrs, etc.)
```
`terraform.tfvars` is never committed — it is protected via `.gitignore`.
**2. Install the provider (once):**
```bash
terraform init
```
### First Apply
```bash
# Show what will be created — makes no changes
terraform plan
# Confirm and create
terraform apply
```
After `apply`, 6 servers, 2 firewalls, 1 floating IP, and the network resources appear in Hetzner.
### Fetching the Ansible Inventory
```bash
terraform output -raw ansible_inventory_yaml > \
../../ansible/inventory/generated/prod.yml
```
### Resize (Changing the Server Type)
Change `server_type_swarm` or `server_type_db` in `terraform.tfvars`, then:
```bash
terraform apply
```
The server is stopped, the Hetzner Resize API is called, and the server is restarted. Disk and Docker volumes are preserved. There is downtime.
### Deleting a Server (Forced)
Because `prevent_destroy = true` is set, a plain `terraform destroy` fails. First temporarily comment out the `lifecycle` block in `servers.tf`:
```hcl
# lifecycle {
#   prevent_destroy = true
# }
```
Then (the target address is quoted so the shell does not interpret the brackets):
```bash
terraform destroy -target='hcloud_server.swarm["iklim-app-01"]'
```
After the operation, restore the lifecycle block.
### State Management
Local state is used for now (`terraform.tfstate`). The state file is never committed to the repo. If more than one person on the team works on this, remote state on Hetzner Object Storage or HCP Terraform should be used.
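If remote state is adopted later, Hetzner Object Storage is S3-compatible, so Terraform's `s3` backend can point at it. A hedged sketch: the bucket name and endpoint are placeholders, and the `skip_*` flags are needed because the backend is not AWS:

```hcl
# backend sketch — only relevant once remote state is adopted
terraform {
  backend "s3" {
    bucket = "iklim-terraform-state" # placeholder bucket name
    key    = "prod/terraform.tfstate"
    region = "fsn1"

    endpoints = {
      s3 = "https://fsn1.your-objectstorage.com" # placeholder endpoint
    }

    skip_credentials_validation = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
    skip_metadata_api_check     = true
    skip_s3_checksum            = true
    use_path_style              = true
  }
}
```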
## Acceptance Criteria
- `terraform plan` runs only with the prod Hetzner project token.
- 6 servers are created (`iklim-app-01/02/03`, `iklim-db-01/02/03`).
- The Swarm nodes are in the `iklim-prod-app-spread` placement group.
- The DB nodes are in the `iklim-prod-db-spread` placement group.
- The public firewall allows ingress only on `22`, `80`, `443`.
- The private firewall matches `01-private-network-port-matrisi.md`.
- The DB replication ports are reachable only from the DB subnet.
- The Floating IP is created and assigned to `iklim-app-01`.
- Terraform state and the secret tfvars are never committed.