docs(infra): restructure and update infrastructure setup documentation

- Anglicized setup and facts markdown file names for better consistency.
- Updated 01-swarm-init-multinode.md to highlight Ansible automation of Swarm initialization and labeling.
- Overhauled 03-infra-stack-changes.md to describe the single monolithic file strategy and reflect current Redis, RabbitMQ, and etcd cluster configurations.
- Fixed minor overrides and typos in Patroni templates and Ansible bootstrap documents.
- Restructured README and roadmap mapping to align with the renamed setup documents.
2026-06-15 16:42:18 +03:00
commit 67dc2986dd
parent 1fd752526b
19 changed files with 666 additions and 1188 deletions
--- a/README.md
+++ b/README.md
@@ -1,64 +1,111 @@
-# 🌍 iklim.co Infrastructure and Server Management
+# iklim.co Infrastructure and Server Management

-This repository hosts the Infrastructure-as-Code (IaC) assets, technical guides, and operational standards needed to build, manage, and modernize the **test** and **production** environments of the `iklim.co` project.
+This repository contains the Infrastructure-as-Code assets, setup runbooks, operational facts documents, and planning notes used to provision, configure, operate, and modernize the `iklim.co` test and production environments.

-Infrastructure management covers resource provisioning with Terraform on Hetzner Cloud, operating system configuration with Ansible, and building the microservice architecture on Docker Swarm.
+Infrastructure management covers resource provisioning with Terraform on Hetzner Cloud, operating system and Swarm bootstrap with Ansible, and deployment of infrastructure and application services on Docker Swarm.

---
+## Repository Structure

-## 📂 Repository Structure and Main Sections
+### Terraform (`terraform/`)

-The documentation and code assets in this repository fall into five main categories:
+Terraform defines the Hetzner Cloud resources for the remote test and production environments:

-### 1. 🛣️ Roadmap (`roadmap/`)
-Contains the **business requirements, technical goals, and step-by-step implementation plans** needed to build the environments (test and prod) from scratch or to update the existing setup.
- Strategic documentation for major infrastructure changes (e.g. Redis Sentinel migration, APISIX configuration, RabbitMQ Quorum Queues).
- [roadmap/test-env/](./roadmap/test-env/) - Test environment requirements and plans.
- [roadmap/prod-env/](./roadmap/prod-env/) - Production environment HA (High Availability) and reliability plans.
+- `terraform/hetzner/test`: test servers, network, firewall, Floating IP, placement, and outputs.
+- `terraform/hetzner/prod`: production app/service nodes, DB nodes, private networking, firewalls, placement groups, Floating IP, and outputs.

-### 2. 🛠️ Setup (`setup/`)
-The **implementation documents** used to physically bring up the infrastructure. This section is used to manage:
- **Terraform:** Creating cloud resources (Server, Network, Firewall).
- **Ansible:** Operating system preparation, security hardening, Docker/Swarm installation.
- **CI/CD:** Creating and updating deployment workflows (Gitea Actions) and stack manifests.
- E.g.: [setup/06-prod-terraform-iaac.md](./setup/06-prod-terraform-iaac.md), [setup/07-prod-ansible-bootstrap.md](./setup/07-prod-ansible-bootstrap.md)
+The dev environment is local and Docker Compose based; it is not provisioned by these Terraform stacks.

-### 3. 🗺️ Setup vs Roadmap Matrix (`setup-vs-roadmap-map.md`)
-Explains the relationship between the **Roadmap** documents, which are written from the requirements, and the **Setup** documents that implement those requirements.
- A mapping matrix that shows which roadmap step is implemented by which setup document.
- See [setup-vs-roadmap-map.md](./setup-vs-roadmap-map.md) for details.
+### Ansible (`ansible/`)

-### 4. 📊 Hetzner Sizing Report (`hetzner-sizing-report.md`)
-Describes the **Hetzner server types, CPU/RAM capacities, and cost/performance analyses** selected for the iklim infrastructure services (API Gateway, Microservices, Databases, Broker).
- The primary reference for capacity planning before an environment is set up.
- See [hetzner-sizing-report.md](./hetzner-sizing-report.md).
+Ansible prepares the remote hosts after Terraform provisioning:

-### 5. 💡 Facts (`facts/`)
-Documents that capture the **actual current state of the system (source of truth) and the critical technical details worth knowing** once environment setup is complete.
- They answer the question "How does the system work right now?"
- [facts/firewall.md](./facts/firewall.md): Active firewall rules and port matrix.
- [facts/swarm-node-recovery-swag-failover.md](./facts/swarm-node-recovery-swag-failover.md): Manual intervention and recovery procedures when a node goes down.
+- `ansible/test`: test bootstrap playbooks, inventory, and environment-specific variables.
+- `ansible/prod`: production bootstrap playbooks, inventory, variables, and prod-specific role overrides.
+- `ansible/roles`: shared roles such as `base`, `hardening`, `docker`, `swarm`, `node_dirs`, `storagebox`, `storagebox_ssh_key`, `act_runner`, and the common `db_stack`.

---
+Production uses `roles_path = roles:../roles` in `ansible/prod/ansible.cfg`. As a result, when a prod-local role such as `ansible/prod/roles/db_stack` exists, it takes precedence over the shared role of the same name.
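
The left-to-right search implied by `roles_path = roles:../roles` can be illustrated with a small shell sketch. `resolve_role` is a hypothetical helper written for this note, not an Ansible command; it only mimics the lookup order, assuming paths are resolved relative to the directory that holds `ansible.cfg`.

```bash
#!/usr/bin/env bash
# Sketch only: walk a colon-separated roles_path from left to right and
# print the first entry that contains the requested role directory.
# resolve_role is a hypothetical helper, not part of Ansible.
resolve_role() {
  local role="$1" roles_path="$2" base="${3:-.}" dir
  local IFS=':'
  for dir in $roles_path; do
    if [ -d "$base/$dir/$role" ]; then
      printf '%s\n' "$dir/$role"
      return 0
    fi
  done
  return 1
}
```

With a prod-local `roles/db_stack` present, `resolve_role db_stack "roles:../roles" ansible/prod` would print `roles/db_stack`; a role that exists only in the shared directory resolves to `../roles/<name>`.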

-## 🧱 Setup Flow (Canonical Order)
+### Setup Runbooks (`setup/`)

-Follow this order when building an environment from scratch or performing a major update:
+The setup documents are the canonical implementation runbooks used to bring up environments or apply major infrastructure changes. Current files:

-1.  **Analysis:** Determine resource needs with [hetzner-sizing-report.md](./hetzner-sizing-report.md).
-2.  **Planning:** Review the relevant environment documents under `roadmap/` to understand the planned changes.
-3.  **Alignment:** Use [setup-vs-roadmap-map.md](./setup-vs-roadmap-map.md) to confirm which setup documents you will use.
-4.  **Implementation:** Run the Terraform and Ansible processes by following the `setup/` documents (00 through 09) in order.
-5.  **Verification:** After setup, refer to the `facts/` documents for how the system operates.
+- [setup/00-general-roadmap.md](./setup/00-general-roadmap.md)
+- [setup/01-private-network-port-matrix.md](./setup/01-private-network-port-matrix.md)
+- [setup/02-test-terraform-iac.md](./setup/02-test-terraform-iac.md)
+- [setup/03-test-ansible-bootstrap.md](./setup/03-test-ansible-bootstrap.md)
+- [setup/04-test-db-docker-setup.md](./setup/04-test-db-docker-setup.md)
+- [setup/05-test-runner-and-deploy-prerequisites.md](./setup/05-test-runner-and-deploy-prerequisites.md)
+- [setup/06-prod-terraform-iac.md](./setup/06-prod-terraform-iac.md)
+- [setup/07-prod-ansible-bootstrap.md](./setup/07-prod-ansible-bootstrap.md)
+- [setup/08-prod-db-cluster-setup.md](./setup/08-prod-db-cluster-setup.md)
+- [setup/09-prod-runner-ha-and-swarm.md](./setup/09-prod-runner-ha-and-swarm.md)

---
+These documents explain how Terraform, Ansible, Swarm labels, StorageBox paths, runner prerequisites, DB services, and the production Swarm deploy model work together.

-## ✅ Prerequisites and Tools
+### Roadmap (`roadmap/`)

- **Terraform >= 1.6**: Infrastructure provisioning.
- **Ansible**: Configuration management.
- **Hetzner Cloud API Token**: Per-environment authorization.
- **SSH Key**: A key pair registered on the system for server access.
+The roadmap documents describe the requirements, design goals, and migration plans for test and production changes:

---
-*iklim.co Infrastructure Team - 2026*
+- [roadmap/test-env/](./roadmap/test-env/)
+- [roadmap/prod-env/](./roadmap/prod-env/)
+
+Use the roadmap documents for intent and design context. Use the setup runbooks for the current implementation flow.
+
+### Setup vs Roadmap Map
+
+[setup-vs-roadmap-map.md](./setup-vs-roadmap-map.md) maps roadmap items to the setup documents and implementation areas that realize them.
+
+### Facts (`facts/`)
+
+The facts documents preserve current-state details and operational history:
+
+- [facts/firewall.md](./facts/firewall.md): active firewall and port information.
+- [facts/node-recovery-failover.md](./facts/node-recovery-failover.md): node recovery and failover procedures.
+- [facts/prod-kurulum-gecmisi.md](./facts/prod-kurulum-gecmisi.md): production setup history and current production notes.
+
+Use the facts documents to answer "how does the system work right now?", for historical context, and for post-setup verification.
+
+### Hetzner Sizing Report
+
+[hetzner-sizing-report.md](./hetzner-sizing-report.md) describes server sizing, CPU/RAM choices, and cost/performance considerations for infrastructure services, databases, brokers, and application workloads.
+
+### Confluence Export (`confluence-wiki/`)
+
+`confluence-wiki/` contains wiki-oriented/exported documentation material used when infrastructure notes need to be published or mirrored outside the repository.
+
+## Current Production Model
+
+Production currently uses a split infrastructure model:
+
+- Main infra and DB stack: root `docker-stack-infra_db-prod.yml`.
+- Vault stack: root `docker-stack-vault.yml`.
+- Vault bootstrap: root `init/vault/vault-bootstrap.sh`; called through `init-infra-prod.sh` in the production deploy flow.
+- Production pipeline source of truth: root `.gitea/workflows/deploy-prod.yml` and root `prod_env-ci_dc-pipeline.md`.
+
+`docker-stack-infra_db-prod.yml` is intentionally a mixed stack:
+
+- DB/cluster services such as Patroni/PostgreSQL, MongoDB, and etcd run on the `iklim-db-*` nodes and use host-mode cluster ports where needed.
+- Service-node infrastructure services such as Redis, Redis Sentinel, and RabbitMQ run on app/service nodes with `node.labels.type == service` and stay on the Docker overlay network unless explicitly exposed by the stack or the reverse proxy.
+
+## Canonical Setup Flow
+
+For a new environment or a major infrastructure update:
+
+1. Review [hetzner-sizing-report.md](./hetzner-sizing-report.md).
+2. Review the relevant `roadmap/` documents to understand the design intent.
+3. Check [setup-vs-roadmap-map.md](./setup-vs-roadmap-map.md) to see which setup runbook implements each roadmap item.
+4. Follow the numbered `setup/` runbooks for the target environment in order.
+5. Use the `facts/` documents to verify current behavior, recovery procedures, firewall state, and production history.
+
+## Required Tools
+
+- Terraform `>= 1.6`
+- Ansible
+- A Hetzner Cloud API token for the target environment
+- An authorized SSH key pair for server access
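
The `>= 1.6` requirement can be checked before running any stack. The sketch below is an illustration only: `version_ge` is a hypothetical helper, and it assumes GNU `sort -V` is available on the workstation.

```bash
#!/usr/bin/env bash
# Sketch only: succeed when version $1 is greater than or equal to $2.
# Relies on GNU `sort -V` (version sort); version_ge is a hypothetical helper.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example values; in practice the first argument would come from the
# locally installed Terraform.
if version_ge "1.7.4" "1.6"; then
  echo "terraform version ok"
fi
```

The comparison is plain version ordering; it does not interpret pre-release suffixes.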
+
+## Notes
+
+- The dev environment is local and Docker Compose based; the remote Terraform/Ansible automation targets the test and production environments.
+- Test is a smaller remote environment built on single-node DB/App assumptions.
+- Production is a high-availability remote environment with three app/service nodes and three DB nodes.
--- a/ansible/prod/roles/db_stack/templates/patroni.yml.j2
+++ b/ansible/prod/roles/db_stack/templates/patroni.yml.j2
@@ -15,7 +15,7 @@ etcd3:
    - etcd-02:2379
    - etcd-03:2379
  username: root
-  password: "{{ vault_etcd_root_password }}"
+  password: "${ETCD_ROOT_PASSWORD}"

 bootstrap:
  dcs:
--- a/facts/swarm-node-recovery-swag-failover.md
+++ b/facts/swarm-node-recovery-swag-failover.md
@@ -1,4 +1,4 @@
-# Docker Swarm — Node Recovery
+# Test — Docker Swarm Node Recovery

 The test environment has a single manager (`iklim-app-01`) and a single worker (`iklim-db-01`). The recovery procedure depends on which node was rebuilt.

@@ -32,17 +32,19 @@ DB data is preserved in the named volumes on `iklim-db-01`; nothing is lost.

 The new `iklim-db-01` starts unaware of the Swarm (`inactive`). The manager (`iklim-app-01`) still holds the old dead node record.

+> ⚠️ **Data loss:** Because `iklim-db-01` was rebuilt, all of its named volumes have been deleted. A restore from backup is mandatory before step 3.
+
 ### Solution

 ```bash
-# 1. Ansible bootstrap: the new node joins automatically
-cd ansible/test
-ansible-playbook -i inventory/generated/test.yml test-bootstrap.yml --ask-vault-pass
-
-# 2. On iklim-app-01: clean up the old dead node record
+# 1. On iklim-app-01: clean up the old dead node record (must be done BEFORE bootstrap)
 docker node ls                   # find the old node ID
 docker node rm <eski-node-id>

+# 2. Ansible bootstrap: the new node joins automatically
+cd ansible/test
+ansible-playbook -i inventory/generated/test.yml test-bootstrap.yml --ask-vault-pass
+
 # 3. Redeploy the DB stack (after restoring from backup)
 ansible-playbook -i inventory/generated/test.yml test-db-post-stack.yml --ask-vault-pass
 ```
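
Finding the stale node ID by eye is error-prone when the hostname is reused. As a sketch, the filter below picks only `Down` entries for a given hostname from `docker node ls` output. `down_node_ids` is a hypothetical helper, and the input is assumed to be one `<id> <hostname> <status>` triple per line.

```bash
#!/usr/bin/env bash
# Sketch only: print the IDs of nodes that match a hostname and are Down.
# Expects lines of the form "<id> <hostname> <status>" on stdin.
# down_node_ids is a hypothetical helper.
down_node_ids() {
  local host="$1"
  awk -v h="$host" '$2 == h && $3 == "Down" { print $1 }'
}
```

On the manager it could be fed with `docker node ls --format '{{.ID}} {{.Hostname}} {{.Status}}' | down_node_ids iklim-db-01`, and each printed ID passed to `docker node rm`.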
@@ -68,7 +70,7 @@ ansible-playbook -i inventory/generated/test.yml test-db-post-stack.yml --ask-va
 | Scenario | Manual Step | Is Ansible Enough? |
 |---|---|---|
 | Manager (`iklim-app-01`) dies | `docker swarm leave --force` (on the worker) | Yes, afterwards |
-| Worker (`iklim-db-01`) dies | `docker node rm <id>` (on the manager) | Largely yes |
+| Worker (`iklim-db-01`) dies | `docker node rm <id>` (on the manager, before bootstrap) | No: a backup restore is required |
 | Both die | None | Yes |

 ## Why Prod Does Not Have This Problem
@@ -81,6 +83,8 @@ The prod environment runs multiple manager nodes (at least 3).

 SWAG, cert-reloader, Prometheus, and Grafana are not cluster-native (replicated); each always runs as a single instance and is pinned by default to `iklim-app-01` (the Floating IP node). When `iklim-app-01` goes down, these services stop, and DNS/HTTPS access and monitoring are interrupted. Swarm quorum continues with 2 managers; the microservices and Vault are rescheduled onto other nodes.

+`cert-distributor` is the exception: with `mode: global` it runs on every node where `node.labels.type == service`, and it copies the certificate from the StorageBox to the node-local `/opt/iklimco/ssl` (because of the Vault FUSE mount restriction). When `iklim-app-01` goes down, the `cert-distributor` instances on the other nodes keep running, so no failover is needed.
+
 The data and configuration of all these services are kept on the StorageBox:
 - **SWAG:** `/mnt/storagebox/swag/config`
 - **SSL:** `/mnt/storagebox/ssl`
@@ -91,12 +95,12 @@ The data and configuration of all these services are kept on the StorageBox:

 ### 1. Move the Services to Another Node

-SWAG and cert-reloader must be moved together. Prometheus and Grafana can be moved independently or at the same time.
+SWAG and cert-reloader must be moved together. Prometheus and Grafana can be moved independently or at the same time. `cert-distributor` runs in global mode, so it does not need to be moved.

 ```bash
 # On iklim-app-02 or iklim-app-03 (an active manager):

-# Move SWAG & Cert-Reloader
+# Move SWAG & Cert-Reloader (replicas=1, so expect a short outage during the move)
 docker service update --constraint-add "node.hostname == iklim-app-02" --constraint-rm "node.hostname == iklim-app-01" iklimco_swag
 docker service update --constraint-add "node.hostname == iklim-app-02" --constraint-rm "node.hostname == iklim-app-01" iklimco_cert-reloader

@@ -121,8 +125,12 @@ hcloud floating-ip assign <floating-ip-id> <iklim-app-02-server-id>
 4. Open the **⋮** (three dots) menu to the right of the `iklim-prod-app-fip` row → **Reassign**.
 5. Select **`iklim-app-02`** from the list → click the **Reassign** button.

+> **Note:** After the Floating IP is reassigned in the Hetzner panel, it must also be active on the network interface of `iklim-app-02`. If the Ansible bootstrap handles this configuration, it happens automatically; to be sure, run `ip addr show` and confirm that the Floating IP is bound.
+
 ### 3. Verify

+SWAG startup and the certificate check can take a few seconds; even when the service shows `Running`, the first `curl` may fail. Wait a few seconds and try again.
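
The "wait and try again" advice can be scripted instead of repeated by hand. The following is a minimal sketch: `retry` is a hypothetical helper that re-runs any command up to a fixed number of attempts, and the attempt count and delay are arbitrary choices.

```bash
#!/usr/bin/env bash
# Sketch only: run a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# retry is a hypothetical helper.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

For the health check above this might be called as `retry 10 3 curl -fsS -o /dev/null https://api.iklim.co/health`, where `-f` makes `curl` return a non-zero exit code on HTTP errors.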
+
 ```bash
 docker service ls | grep -E 'swag|cert-reloader|prometheus|grafana'
 curl -si https://api.iklim.co/health
@@ -133,6 +141,9 @@ curl -si https://api.iklim.co/health
 Once the node has rejoined the Swarm, you can move all services back to `iklim-app-01` and reassign the Floating IP.

 ```bash
+# First confirm that the node has actually joined the Swarm (STATUS must be Ready)
+docker node ls
+
 # Move the services back
 for svc in iklimco_swag iklimco_cert-reloader iklimco_prometheus iklimco_grafana; do
   docker service update --constraint-add "node.hostname == iklim-app-01" --constraint-rm "node.hostname == iklim-app-02" $svc
@@ -149,5 +160,62 @@ hcloud floating-ip assign <floating-ip-id> <iklim-app-01-server-id>
 | Swarm quorum | Automatic: 2 managers are enough |
 | Vault, microservices | Automatic: rescheduled onto another node via the `node.labels.type == service` constraint |
 | SWAG, cert-reloader | Manual: `docker service update --constraint-*` + Floating IP move |
+| cert-distributor | Automatic: `mode: global`, already running on all service nodes |
 | Prometheus, Grafana | Manual: `docker service update --constraint-*` |
 | Data & config | On the StorageBox; the failover node has immediate access, no data is lost |
+
+---
+
+# Prod: DB Node Recovery
+
+Each DB node (`iklim-db-01`, `iklim-db-02`, `iklim-db-03`) hosts the same trio of services:
+
+| Node | Services |
+|------|-----------|
+| `iklim-db-01` | `etcd-01`, `patroni-01`, `mongodb-01` |
+| `iklim-db-02` | `etcd-02`, `patroni-02`, `mongodb-02` |
+| `iklim-db-03` | `etcd-03`, `patroni-03`, `mongodb-03` |
+
+## Scenario A: Node Goes Down Temporarily (Volumes Preserved)
+
+etcd, Patroni, and MongoDB are all 3-member HA clusters; 2 nodes are enough for quorum.
+
+| Service | Impact | Automatic Recovery |
+|--------|------|-------------------|
+| etcd | Quorum continues with 2/3 nodes | Rejoins the cluster automatically when the node returns |
+| Patroni | If a replica goes down, the primary continues; if the primary goes down, a new primary is elected through etcd | Rejoins automatically as a replica when the node returns |
+| MongoDB | Quorum continues with 2/3 nodes; a new primary is elected if needed | Catches up from the oplog when the node returns; an initial sync is needed only if it has fallen outside the oplog window |
+
+**No manual step is needed.** Docker Swarm restarts the services automatically via `restart_policy: on-failure`.
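
The "2 of 3" figure follows from the usual majority rule. As a worked sketch (the function names are hypothetical):

```bash
#!/usr/bin/env bash
# Sketch only: majority quorum arithmetic for an N-member cluster.
quorum_size() {         # smallest majority of N members
  echo $(( $1 / 2 + 1 ))
}
tolerated_failures() {  # members that can be lost while a majority remains
  echo $(( ($1 - 1) / 2 ))
}

quorum_size 3          # prints 2
tolerated_failures 3   # prints 1
```

A 3-member cluster therefore survives the loss of one DB node; losing two leaves a single member without quorum.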
+
+## Scenario B: Node Is Rebuilt (Volumes Deleted)
+
+The etcd named volumes are node-local, so they are lost when the node is rebuilt. Patroni and MongoDB heal themselves; etcd requires manual intervention.
+
+```bash
+# From an active etcd container: remove the old member from the cluster
+docker exec -it $(docker ps -q -f name=iklimco_etcd-01) \
+  etcdctl member list --endpoints=http://etcd-01:2379,http://etcd-02:2379,http://etcd-03:2379
+# Take the <member-id> of the rebuilt node from the output:
+docker exec -it $(docker ps -q -f name=iklimco_etcd-01) \
+  etcdctl member remove <member-id> --endpoints=http://etcd-01:2379,http://etcd-02:2379,http://etcd-03:2379
+
+# Restart the services (etcd joins the existing cluster with an empty volume;
+# Patroni clones from the primary automatically with pg_basebackup;
+# MongoDB runs an automatic initial sync from the primary if the hostname is unchanged)
+docker service update --force iklimco_etcd-0N
+docker service update --force iklimco_patroni-0N
+docker service update --force iklimco_mongodb-0N
+```
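
Picking the `<member-id>` out of the `member list` output can be scripted. The sketch below assumes the default comma-separated output of `etcdctl member list` (ID, status, name, peer URLs, client URLs, learner flag); `member_id_by_name` is a hypothetical helper and the sample line in the comment is made up.

```bash
#!/usr/bin/env bash
# Sketch only: print the member ID whose name matches $1.
# Expects `etcdctl member list` default output on stdin, for example:
#   8e9e05c52164694d, started, etcd-02, http://etcd-02:2380, http://etcd-02:2379, false
# member_id_by_name is a hypothetical helper; the ID above is illustrative.
member_id_by_name() {
  awk -F', ' -v n="$1" '$3 == n { print $1 }'
}
```

The printed ID is what `etcdctl member remove` expects as its argument.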
+
+> **If the MongoDB hostname changes:** The replica set configuration still holds the old hostname. In `mongosh`, run `rs.remove("<eski-host>:27017")` followed by `rs.add("<yeni-host>:27017")`.
+
+> **etcd `ETCD_INITIAL_CLUSTER_STATE`:** It is set to `new` in the stack file (for the initial install). In the rebuild scenario, when the Swarm service is updated with `--force`, etcd starts with an empty volume and tries to join the existing cluster in `existing` mode. The Bitnami etcd image detects this automatically; if problems occur, temporarily set `ETCD_INITIAL_CLUSTER_STATE` to `existing` for the affected node in the stack file, redeploy, and then revert it.
+
+## Summary
+
+| Service | Temporary outage | Rebuild |
+|--------|-------------|-----------------|
+| etcd | Automatic | Manual: `member remove` → `service update --force` |
+| Patroni | Automatic | Automatic: clones the primary into the empty data directory |
+| MongoDB | Automatic | Automatic (same hostname); if the hostname changes, `rs.remove` + `rs.add` |
--- a/facts/prod-kurulum-gecmisi.md
+++ b/facts/prod-kurulum-gecmisi.md
@@ -2,6 +2,11 @@

 Prod setup steps and current structure.

+This file preserves the setup history. The main source for the current prod deploy
+flow is the `prod_env-ci_dc-pipeline.md` file in the repo root. The manual deploy
+steps below are kept as initial-setup and troubleshooting history; normal prod
+deploys now run through the root `.gitea/workflows/deploy-prod.yml`.
+
 ## Terraform

 ### Hetzner Cloud Configuration
@@ -166,7 +171,27 @@ ansible-playbook prod-bootstrap.yml \
   --vault-password-file=../.vault_pass
 ```

-## DB Stack Deploy
+## Current Production Deploy Sources
+
+| Area | Current source |
+| --- | --- |
+| Root prod workflow | `.gitea/workflows/deploy-prod.yml` |
+| Detailed CI/CD document | `prod_env-ci_dc-pipeline.md` |
+| Main infra stack | `docker-stack-infra_db-prod.yml` |
+| Vault HA stack | `docker-stack-vault.yml` |
+| Vault bootstrap script | `init/vault/vault-bootstrap.sh` |
+| Prod env and secret files | `prod/secrets/iklim.co/.env`, `.env.secrets.*` |
+
+In the current setup, the old stack files with the `.deleted` suffix no longer exist
+and are not considered in the prod flow. The main infra stack is `docker-stack-infra_db-prod.yml`.
+The Vault stack is not part of that file; it is deployed by `vault-bootstrap.sh`
+using `docker-stack-vault.yml`.
+
+## Historical Manual DB Stack Deploy (2026-05-21)
+
+This section is kept to preserve the history of the first prod DB/infra setup. In the
+current normal flow these steps are not run by hand; the root prod workflow manages the main
+stack deploy, Vault bootstrap, MongoDB replica set init, and the DB init scripts.

 ### Custom Image Build

@@ -174,6 +199,9 @@ ansible-playbook prod-bootstrap.yml \

 ### Stack Deploy

+Historical note: the `docker-stack-db-prod.yml` file in this command block is no longer
+the current stack file. The current main stack is `docker-stack-infra_db-prod.yml`.
+
 ```bash
 # Local → app-01
 scp ./docker-stack-* root@178.104.210.41:/home/iklim/
@@ -198,6 +226,10 @@ history -c && history -w

 ### MongoDB Replica Set Init

+Historical note: in the first setup, `rs.initiate` was issued by hand. In the current root prod
+workflow, the `Initialize MongoDB Replica Set` step runs `rs.initiate()` when no replica set
+exists and runs `rs.add()` on the primary when a member is missing.
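
The decision logic of such an idempotent step can be sketched without touching MongoDB. The helpers below are hypothetical and are not taken from the workflow; they only show how "initiate if absent, otherwise add what is missing" might be derived from the expected and current member lists.

```bash
#!/usr/bin/env bash
# Sketch only: decide which replica set commands would be needed.
# missing_members and rs_plan are hypothetical helpers, not workflow code.

# Print every entry of the space-separated list $1 that is absent from $2.
missing_members() {
  local expected="$1" current="$2" m
  for m in $expected; do
    case " $current " in
      *" $m "*) ;;                  # already a member
      *) printf '%s\n' "$m" ;;
    esac
  done
}

# $1: "yes" if the replica set is already initiated, anything else if not.
# $2: expected members, $3: current members (space-separated host:port).
rs_plan() {
  local m
  if [ "$1" != "yes" ]; then
    echo 'rs.initiate()'
    return 0
  fi
  for m in $(missing_members "$2" "$3"); do
    echo "rs.add(\"$m\")"
  done
}
```

Running the plan twice against an already complete replica set prints nothing, which is the property that makes the step safe to repeat.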
+
 ```bash
 ssh root@<db-01-ip>

@@ -242,10 +274,10 @@ history -c && history -w
 curl -s http://10.20.20.11:8008/cluster | python3 -m json.tool
 ```

-## Current State (2026-05-21)
+## Historical State (2026-05-21)

 | Step                                                    | Status     |
-| --- | --- |
+| ------------------------------------------------------- | ---------- |
 | Terraform: 6 servers, network, firewall, floating IP    | ✅          |
 | Ansible base + hardening + docker + node_dirs           | ✅          |
 | Ansible storagebox + storagebox_ssh_key                 | ✅          |
@@ -256,12 +288,52 @@ curl -s http://10.20.20.11:8008/cluster | python3 -m json.tool
 | DB stack deploy (etcd + MongoDB + Patroni)              | ✅          |
 | MongoDB replica set init (rs0: 1 primary, 2 secondary)  | ✅          |
 | Patroni HA cluster (1 leader, 2 replica, lag=0)         | ✅          |
-| Main infra stack deploy (docker-stack-infra_db-prod.yml) | ⏳ pending |
-| MongoDB rs.initiate (by hand after first deploy) | ⏳ pending |
+| Main infra stack deploy (docker-stack-infra_db-prod.yml) | ✅         |
+| MongoDB rs.initiate (by hand after first deploy)        | ✅          |
 | Deploy pipeline first run                               | ⏳ pending  |

+## Current State (2026-06-15)
+
+| Area | Current state |
+| --- | --- |
+| Prod deploy source document | `prod_env-ci_dc-pipeline.md` |
+| Root prod workflow | `.gitea/workflows/deploy-prod.yml` |
+| Main infra stack | `docker-stack-infra_db-prod.yml` |
+| Vault HA stack | `docker-stack-vault.yml` |
+| Vault deploy method | Bootstrapped/deployed by `init/vault/vault-bootstrap.sh` |
+| Old `.deleted` stack files | Deleted, not part of the current flow |
+| Prod env file | StorageBox `prod/secrets/iklim.co/.env` -> workflow workspace `./.env` |
+| Shared secrets | StorageBox `prod/secrets/iklim.co/.env.secrets.shared` |
+| Service secrets | StorageBox `prod/secrets/iklim.co/.env.secrets.<svc>` |
+| SWAG secrets | StorageBox `prod/secrets/iklim.co/.env.secrets.swag` |
+| MongoDB replica set init | Managed as an automatic/idempotent step inside the workflow |
+| PostgreSQL init | Runs with `./init/postgresql/*.sql` after waiting for the Patroni primary |
+| MongoDB init | Runs with `./init/mongodb/*.js` once the replica set is ready |
+| DNS update | The workflow updates the `api`, `apigw`, `rabbitmq`, `grafana` A records through the GoDaddy API |
+
+In outline, the current prod workflow follows this order:
+
+1. `.env`, `.env.secrets.shared`, the service secret files, and `.env.secrets.swag` are fetched from the StorageBox.
+2. The PostgreSQL and MongoDB init templates are rendered under `./init/postgresql` and `./init/mongodb`.
+3. Harbor pull login is performed.
+4. The SWAG DNS/site config files are prepared.
+5. A temporary TLS placeholder cert for Vault is created if needed.
+6. The `rabbitmq_erlang_cookie` Docker secret is created, or kept if it already exists.
+7. `docker-stack-infra_db-prod.yml` is deployed to the `iklimco` stack.
+8. The runner job container is attached to the `iklimco-net` overlay network.
+9. `init-infra-prod.sh` runs; this script performs the Vault bootstrap and the RabbitMQ prod preparation.
+10. The Vault AppRole ID/Secret ID values and Docker secrets are generated.
+11. The updated `.env` and `.env.secrets.*` files are uploaded to the StorageBox.
+12. etcd, APISIX, the SWAG certificate, the MongoDB replica set, the DB init scripts, and the DNS records are verified/updated.
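
Step 6 ("created, or kept if it already exists") is a common create-if-absent pattern for Docker secrets. The sketch below shows one way it might look; `ensure_secret` is a hypothetical helper and is not taken from the workflow file.

```bash
#!/usr/bin/env bash
# Sketch only: create a Docker secret unless one with that name exists.
# ensure_secret is a hypothetical helper, not copied from deploy-prod.yml.
ensure_secret() {
  local name="$1" value="$2"
  if docker secret inspect "$name" >/dev/null 2>&1; then
    echo "kept"
  else
    # `docker secret create <name> -` reads the secret value from stdin.
    printf '%s' "$value" | docker secret create "$name" - >/dev/null
    echo "created"
  fi
}
```

Keeping an existing secret matters for a value like the Erlang cookie, since all nodes of a running RabbitMQ cluster must share the same cookie.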
+
 ## Important Architecture Notes

+### Main Infra Stack and Vault Separation (2026-06-15)
+
+The current main infra stack is `docker-stack-infra_db-prod.yml`. This stack contains the Redis master/replica/sentinel, RabbitMQ cluster, APISIX, APISIX Dashboard, Prometheus, Grafana, SWAG, cert-reloader, cert-distributor, etcd, Patroni, and MongoDB replica set services.
+
+Vault is not part of the main infra stack. The Vault HA cluster is deployed from `docker-stack-vault.yml` by `init/vault/vault-bootstrap.sh`. The bootstrap flow creates a placeholder `vault_unseal_key`, deploys the `iklimco_vault` service, performs Vault init/unseal, and rotates the Docker secret to the real unseal key.
+
 ### Single Stack Approach (2026-05-26)

 `docker-stack-infra-prod.yml` and `docker-stack-db-prod.yml` were merged into a single file: `docker-stack-infra_db-prod.yml`. Because both files were deployed under the same `iklimco` stack name, service names did not change.
@@ -270,7 +342,9 @@ curl -s http://10.20.20.11:8008/cluster | python3 -m json.tool

 **Network:** `iklimco-net` is now created by the stack (MTU=1400, attachable). The network creation task in the Ansible `swarm` role was removed.

-**MongoDB rs.initiate:** `rs.initiate` must be issued by hand after the first deploy (see the DB Stack Deploy section).
+**MongoDB rs.initiate:** This note belongs to the initial setup period. The current prod workflow
+manages `rs.initiate()` and, when needed, `rs.add()` in the
+`Initialize MongoDB Replica Set` step.

 **If the network is deleted:** Redeploy the stack: `docker stack deploy -c docker-stack-infra_db-prod.yml iklimco`

@@ -278,6 +352,11 @@ curl -s http://10.20.20.11:8008/cluster | python3 -m json.tool

 `iklimco_vault` (the Swarm service name) is used as `retry_join.leader_api_addr`. Because the network is stack-owned, Docker DNS registers this VIP. With `leader_tls_server_name: vault.iklim.co`, the `*.iklim.co` certificate passes TLS verification.

+In the current Vault deploy flow, this setting is applied through `docker-stack-vault.yml` and the
+Vault template files. The Vault stack is not deployed directly in the root workflow;
+it goes through the chain `init-infra-prod.sh` -> `init/vault/init-prod.sh` ->
+`init/vault/vault-bootstrap.sh`.
+
 ### Runner / iklimco-net (2026-05-26)

 The act runner config uses `container.network: "bridge"` (previously `iklimco-net`). In the workflow, the "Connect Runner to Overlay Network" step was moved after "Deploy Swarm Stacks" so that the runner job container can attach to the `iklimco-net` created by the stack.
--- a/roadmap/prod-env/01-swarm-init-multinode.md
+++ b/roadmap/prod-env/01-swarm-init-multinode.md
@@ -41,6 +41,9 @@ This scheme is applied consistently across `docker-stack-infra.yml` and all 10 m

 `node.role == worker` is intentionally not used anywhere. DB nodes are Swarm workers, but targeting them via `node.role == worker` would also match any future worker-only app nodes. The explicit `node.labels.role == db` label provides precise, unambiguous targeting regardless of Swarm role.

+## Automation Note
+**IMPORTANT:** All of the Swarm initialization, join token handling, and node labeling steps listed below are no longer performed manually. They are run **fully automatically** by `Environment_Infrastructure/ansible/prod/prod-bootstrap.yml` and the shared `swarm` role. The manual bash commands here are kept only for reference and troubleshooting.
+
 ## Step 1 — Init Swarm on iklim-app-01 (the prod-runner node)

 ```bash
@@ -102,7 +105,7 @@ docker node update --label-add role=db --label-add db-index=03 iklim-db-03

 > DB nodes are Swarm **workers** only — they never become managers.
 > DB services are pinned to them via `node.labels.role == db` placement constraint.
-> See `08-prod-db-cluster-kurulum.md` for DB stack deployment.
+> See `08-prod-db-cluster-setup.md` for DB stack deployment.

 ## Step 6 — Verify

--- a/roadmap/prod-env/02-godaddy-credentials.md
+++ b/roadmap/prod-env/02-godaddy-credentials.md
@@ -60,7 +60,7 @@ To get the Floating IP: `terraform output prod_floating_ip`

 Logic: for each record, pipeline queries the current value via GoDaddy API. If already correct, it skips. Otherwise it creates/updates the record.

-> The Floating IP is assigned to `iklim-app-01` (`06-prod-terraform-iaac.md` — `floating_ip.tf`).
+> The Floating IP is assigned to `iklim-app-01` (`06-prod-terraform-iac.md` — `floating_ip.tf`).
 > If failover is needed, the Floating IP can be reassigned to another app node; DNS does not change.

 ## Notes
--- a/roadmap/prod-env/03-infra-stack-changes.md
+++ b/roadmap/prod-env/03-infra-stack-changes.md
@@ -1,702 +1,75 @@
-# 03 — docker-stack-infra.yml Changes (Prod)
+# 03 — Production Infrastructure and DB Stack Model

 ## Context

-### File strategy — overlay approach
+This document records the production infrastructure target that is now implemented by the current setup runbooks. The execution source is no longer the old base-plus-prod overlay model.

-Prod-specific service changes are **not written directly** into `docker-stack-infra.yml`; they are kept in a separate overlay file:
+Current references:

-| File | Usage |
-|------|-------|
-| `docker-stack-infra.yml` | Base — works as-is for test |
-| `docker-stack-infra.prod.yml` | Prod overlay — additional services and overrides |
+- Setup source: `../../setup/08-prod-db-cluster-setup.md` and `../../setup/09-prod-runner-ha-and-swarm.md`
+- Main infra and DB stack: root `docker-stack-infra_db-prod.yml`
+- Vault stack: root `docker-stack-vault.yml`
+- Vault bootstrap: root `init/vault/vault-bootstrap.sh`, called through `init-infra-prod.sh`

-```bash
-# Test deploy:
-docker stack deploy -c docker-stack-infra.yml iklimco
+## Current Stack Strategy

-# Prod deploy (Swarm merges both files):
-docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
-```
+Production uses a split stack model:

-Docker Swarm merge rule: if the same service name appears in both files, the overlay wins (deploy, environment, etc.); services only present in the overlay are added.
+- `docker-stack-infra_db-prod.yml`: APISIX, APISIX Dashboard, SWAG, cert services, Redis/Sentinel, RabbitMQ, Prometheus, Grafana, Patroni/PostgreSQL, MongoDB, and etcd.
+- `docker-stack-vault.yml`: Vault Raft cluster only.

-### Prod-specific changes summary
- APISIX: 1 → 3 replicas (overlay override)
- Redis: single-instance → Sentinel cluster — 1 master + 2 replicas + 3 sentinels (overlay adds new services)
- RabbitMQ: 1 → 3-node Erlang cluster (overlay override + env)
- Vault: 1 → 3-node Raft cluster (overlay override) — see `07-vault-raft-plan.md`
- No separate APISIX etcd: Patroni etcd is shared (`/apisix` prefix)
- `init/apisix-core/init.sh`: when `PROFILE=prod`, rate limit `policy:local` → `policy:redis`
+The previous `docker-stack-infra.yml` + `docker-stack-infra.prod.yml` overlay strategy is superseded for production. Do not create or deploy `docker-stack-infra.prod.yml` for the current prod environment.
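
The split model above can be guarded with a small pre-flight check before any deploy. The following is a minimal sketch, not part of the repository: `check_stack_files` is a hypothetical helper, and it assumes the stack files sit in the directory passed as the first argument.

```shell
# Pre-flight check for the split stack model (illustrative sketch).
# check_stack_files is a hypothetical helper; it is not defined in the repo.
check_stack_files() {
  csf_root="${1:-.}"
  csf_failed=0

  # Both current stack files must exist.
  for csf_file in docker-stack-infra_db-prod.yml docker-stack-vault.yml; do
    if [ ! -f "$csf_root/$csf_file" ]; then
      echo "missing: $csf_file" >&2
      csf_failed=1
    fi
  done

  # The superseded prod overlay must not be present.
  if [ -e "$csf_root/docker-stack-infra.prod.yml" ]; then
    echo "superseded overlay present: docker-stack-infra.prod.yml" >&2
    csf_failed=1
  fi

  return "$csf_failed"
}
```

Calling `check_stack_files .` from the repository root would fail fast if someone reintroduces the old overlay file.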

-### swag-vl volume — not used in prod, not defined in overlay
+## Placement Boundary

-Test-env Step 9 adds the `swag-vl` named volume to the base file. In prod, SWAG mounts to the StorageBox via the `${SWAG_CONFIG_DIR}` env var, so this volume is unused by any service. No need to remove it in the overlay — Swarm does not create unused volume definitions, it remains harmless.
+`docker-stack-infra_db-prod.yml` is intentionally a mixed stack. The placement model is the important boundary:

-No `swag-vl` definition is made in `docker-stack-infra.prod.yml`.
+- DB/cluster services run on `iklim-db-*`: Patroni/PostgreSQL, MongoDB, and etcd.
+- App/service-node infrastructure runs on `iklim-app-*` with `node.labels.type == service`: Redis, Redis Sentinel, RabbitMQ, APISIX, APISIX Dashboard, SWAG, cert-reloader/cert-distributor, Prometheus, and Grafana.
+- Redis and RabbitMQ are not DB-node host-mode services. They stay on the overlay network unless explicitly exposed by the stack or SWAG/APISIX.

-### Monitoring Persistence
+DB services that require direct cluster traffic publish host-mode ports where the current stack defines them. Redis and RabbitMQ must not be changed to host-mode just because they live in the same stack file.
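
One way to audit the host-mode boundary is to look at each service's endpoint spec. The sketch below assumes the service names `iklimco_redis` and `iklimco_rabbitmq` (the names in the current stack may differ) and uses a hypothetical helper, `has_host_mode_port`, that only inspects JSON passed on stdin.

```shell
# has_host_mode_port: reads a service's endpoint spec as JSON on stdin and
# succeeds if any port is published with PublishMode "host".
# Hypothetical helper for illustration only.
has_host_mode_port() {
  grep -Eq '"PublishMode"[[:space:]]*:[[:space:]]*"host"'
}

# Usage on a Swarm manager (service names are assumptions):
#   for svc in iklimco_redis iklimco_rabbitmq; do
#     if docker service inspect --format '{{json .Spec.EndpointSpec}}' "$svc" \
#         | has_host_mode_port; then
#       echo "unexpected host-mode port on $svc" >&2
#     fi
#   done
```

Keeping the parsing separate from the `docker` call makes the check easy to exercise with captured output.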

-Prometheus and Grafana run as single instances, but their storage profiles are different:
- **Prometheus:** keep TSDB on a local Docker volume (`prometheus-vl`). Prometheus local storage should not run on StorageBox/DAVFS because of filesystem semantics and WAL/compaction I/O.
- **Grafana:** keep `/var/lib/grafana` on StorageBox (`/mnt/storagebox/grafana/data`) so dashboards, plugins, and the SQLite database are available if the single active instance is manually moved to another node.
+## Current Production Services

-Grafana uses the `GRAFANA_DATA_DIR` env var with a named-volume fallback for test. Prometheus continues to use the named Docker volume. See Step 9 for implementation details.
+| Area | Current model |
+| --- | --- |
+| APISIX | 3 replicas on service nodes; config stored in etcd with `/apisix` prefix |
+| Redis | Sentinel model on service nodes; overlay-only |
+| RabbitMQ | 3-node service-node cluster; management exposed through SWAG, restricted by IP |
+| Vault | Separate 3-node Raft stack via `docker-stack-vault.yml` |
+| PostgreSQL | 3-node Patroni cluster on DB nodes |
+| MongoDB | 3-node replica set on DB nodes |
+| etcd | 3-node cluster on DB nodes, shared by Patroni and APISIX |
+| Prometheus | Single instance; local Docker volume |
+| Grafana | Single instance; StorageBox-backed data path |

-**Note:** PostgreSQL and MongoDB are not in `docker-stack-infra.yml`.  See `08-prod-db-cluster-kurulum.md`.
+## Monitoring Persistence

-## Step 1 — Apply all test-env changes first
+Prometheus TSDB remains on a local Docker volume because StorageBox/DAVFS is not suitable for Prometheus WAL and compaction I/O.

-Follow every step in `test-env/03-infra-stack-changes.md`:
- Add `swag` service
- Add `cert-reloader` service
- Remove published ports for vault, apisix, rabbitmq, prometheus, grafana, apisix-dashboard
- Add `swag-vl` volume
+Grafana uses `/mnt/storagebox/grafana/data` through `GRAFANA_DATA_DIR` so dashboards, plugins, and the SQLite database survive manual service movement between service nodes.

-## Step 2 — Vault: 3-node Raft cluster (prod)
+## APISIX and etcd

-Vault starts directly with 3 replicas; the Phase 1 single-instance stage is skipped in prod.
-See `07-vault-raft-plan.md` Phase 2 for detailed setup steps.
+APISIX uses the DB-node etcd cluster through overlay DNS aliases such as `etcd-01`, `etcd-02`, and `etcd-03`. Patroni and APISIX use different etcd prefixes, so their data does not collide.
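
The prefix separation can be spot-checked by listing keys and grouping them by owner. This is a rough sketch under assumptions: `/apisix` is the prefix named above, `/service` is Patroni's default namespace and may not match this deployment, the plain `http` endpoints are illustrative, and `classify_etcd_key` is a hypothetical helper.

```shell
# classify_etcd_key: prints which consumer owns a key, judged by its prefix.
# /service is Patroni's default namespace (an assumption for this cluster).
classify_etcd_key() {
  case "$1" in
    /apisix/*)  echo apisix ;;
    /service/*) echo patroni ;;
    *)          echo other ;;
  esac
}

# Usage from a container on the overlay network (endpoints are assumptions):
#   etcdctl --endpoints=http://etcd-01:2379,http://etcd-02:2379,http://etcd-03:2379 \
#     get / --prefix --keys-only \
#     | while read -r key; do
#         [ -n "$key" ] && classify_etcd_key "$key"
#       done | sort | uniq -c
```

Any key reported as `other` would be worth a closer look, since only Patroni and APISIX are expected to write to this cluster.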

-```yaml
-vault:
-  deploy:
-    mode: replicated
-    replicas: 3
-    placement:
-      max_replicas_per_node: 1
-      constraints:
-        - node.labels.type == service
-```
+The app subnet to DB subnet firewall rule for etcd client traffic is part of the current production firewall model. See `../../setup/06-prod-terraform-iac.md`.

-## Step 3 — APISIX: 3 replicas + init.sh rate limit update (prod overlay)
+## Redis and RabbitMQ

-Add to `docker-stack-infra.prod.yml`:
+Redis/Sentinel and RabbitMQ are service-node infrastructure. Their placement follows `node.labels.type == service`.

-```yaml
-# docker-stack-infra.prod.yml
-services:
-  apisix:
-    deploy:
-      mode: replicated
-      replicas: 3
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
+RabbitMQ-related private firewall rules belong to the app/service-node firewall model. Redis and Sentinel do not publish host-mode ports in the current prod stack and do not require Hetzner firewall openings.

-  apisix-dashboard:
-    deploy:
-      mode: replicated
-      replicas: 3
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
-```
+## Historical / Superseded by Setup

-APISIX and apisix-dashboard are stateless (config lives in Patroni etcd) — 3 replicas is safe.
-Swarm distributes SWAG requests to APISIX replicas via VIP (IPVS round-robin).
+The following earlier roadmap ideas are retained only as historical context:

-### init.sh — rate limit policy:redis (prod)
+- Creating `docker-stack-infra.prod.yml` as a prod overlay.
+- Deploying prod with `docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco`.
+- Keeping Vault inside the prod infra overlay with `/opt/iklimco/vault/data` host-path storage.
+- Treating PostgreSQL/MongoDB as separate DB stacks such as `docker-stack-db.prod.yml`.
+- Validating a prod merge with `docker stack config -c docker-stack-infra.yml -c docker-stack-infra.prod.yml`.

-With `policy:local`, each APISIX instance counts independently → the global limit effectively becomes 3× with 3 replicas.
-Switch to `policy:redis` for `PROFILE=prod`.
-
-Keep the following APISIX plugin limits in `init/apisix-core/init.sh` for `test/prod` unless stated otherwise:
-
-| Scope | Plugin | Target limit |
-|-------|--------|--------------|
-| WebSocket `/ws` | `limit-conn` | `conn: 5` per `remote_addr` |
-| Auth routes `/v1/auth/*`, `/v1/users/*` | `limit-count` | `count: 12`, `time_window: 60` per `remote_addr` |
-| Global rule | `limit-count` | `count: 60`, `time_window: 60` per `remote_addr` |
-
-Update the rate limit and connection limit blocks in `init/apisix-core/init.sh`.
-
-**1. Define threshold constants at the script header:**
-
-```bash
-GLOBAL_LIMIT_COUNT=60
-GLOBAL_LIMIT_WINDOW=60
-AUTH_LIMIT_COUNT=12
-AUTH_LIMIT_WINDOW=60
-WS_LIMIT_CONN=5
-```
-
-**2. Update WebSocket route plugins (test/prod):**
-
-```bash
-if [[ "$PROFILE" != "dev" ]]; then
-  WS_PLUGINS=',"plugins":{"limit-conn":{"conn":'"$WS_LIMIT_CONN"',"burst":2,"default_conn_delay":0.1,"key":"remote_addr","key_type":"var","rejected_code":429}}'
-else
-  WS_PLUGINS=""
-fi
-```
-
-**3. Update Auth route plugins (test/prod):**
-
-```bash
-if [[ "$PROFILE" != "dev" ]]; then
-  AUTH_LIMIT=',"plugins":{"limit-count":{"count":'"$AUTH_LIMIT_COUNT"',"time_window":'"$AUTH_LIMIT_WINDOW"',"key_type":"var","key":"remote_addr","rejected_code":429,"policy":"local"}}'
-else
-  AUTH_LIMIT=""
-fi
-```
-
-**4. Update Global rate limit rule (test/prod):**
-
-```bash
-if [[ "$PROFILE" != "dev" ]]; then
-  if [[ "$PROFILE" == "prod" ]]; then
-    RATE_POLICY="redis"
-    RATE_REDIS=',"redis_host":"redis","redis_port":6379,"redis_password":"'"$REDIS_PASSWORD"'"'
-  else
-    RATE_POLICY="local"
-    RATE_REDIS=""
-  fi
-
-  call_api "global rate limit" -X PUT "$APISIX_ADMIN_URL/global_rules/1" \
-    -H "X-API-KEY: $API_KEY" -H "Content-Type: application/json" \
-    -d '{"plugins":{"limit-count":{"count":'"$GLOBAL_LIMIT_COUNT"',"time_window":'"$GLOBAL_LIMIT_WINDOW"',"key_type":"var","key":"remote_addr","rejected_code":429,"policy":"'"$RATE_POLICY"'","allow_degradation":true'"$RATE_REDIS"'}}}'
-fi
-```
-
-> APISIX's `limit-count` plugin does not natively support Redis Sentinel; `policy:redis` works with a single endpoint.
-> The `redis` service name stays constant within Swarm overlay DNS. `allow_degradation: true` ensures that if Redis is
-> temporarily unreachable (e.g. Sentinel failover ~10-30 s, or master rescheduling), APISIX passes requests through
-> instead of returning errors — rate limiting is briefly suspended but API access is unaffected.
-> Microservices use Spring Data Redis Sentinel natively and are unaffected by master changes.
-> Docker Swarm has no inter-service anti-affinity; the `redis` master placement relies on Swarm's spread strategy
-> to avoid co-locating with a replica. This is a known limitation — accepted in favour of operational simplicity.
-
-## Step 4 — etcd: Separate APISIX etcd removed — Patroni etcd shared
-
-The standalone `etcd` service in `docker-stack-infra.yml` is **not used in prod and must be disabled** by setting `replicas: 0` in the prod overlay.
-APISIX uses the 3-node Patroni etcd cluster running on DB nodes, via the `/apisix` prefix.
-
-### Why consolidated?
- A standalone single-instance etcd was a SPOF for APISIX.
- Patroni etcd is already 3-node HA — APISIX gets a more reliable config store.
- etcd supports prefix-based namespacing; Patroni uses `/service/`, APISIX uses `/apisix/` — no collision.
-
-### APISIX etcd connection configuration
-
-Update the etcd endpoints in the APISIX service in `docker-stack-infra.yml` to point to DB nodes:
-
-```yaml
-apisix:
-  environment:
-    APISIX_STAND_ALONE: "false"
-  # via apisix/conf/config.yaml or environment:
-  # etcd:
-  #   host:
-  #     - "http://etcd-01:2379"
-  #     - "http://etcd-02:2379"
-  #     - "http://etcd-03:2379"
-  #   prefix: "/apisix"
-```
-
-The preferred method is mounting `config.yaml` via a Docker config or volume. etcd endpoints use **overlay DNS aliases** defined in `docker-stack-db.prod.yml` — `etcd-01`, `etcd-02`, `etcd-03` — which are reachable from app nodes via the `iklimco-net` overlay:
-
-```yaml
-# config/apisix/config.yaml
-etcd:
-  host:
-    - "http://etcd-01:2379"
-    - "http://etcd-02:2379"
-    - "http://etcd-03:2379"
-  prefix: "/apisix"
-  timeout: 30
-```
-
-### Disable standalone etcd in prod overlay
-
-Docker Swarm overlay files cannot delete services from the base stack, but `replicas: 0` stops the container entirely:
-
-```yaml
-# docker-stack-infra.prod.yml
-services:
-  etcd:
-    deploy:
-      replicas: 0
-```
-
-### Firewall requirement
-
-etcd access from app nodes to DB nodes must be open (port 2379, app subnet → DB subnet). Verify from an app node:
-
-```bash
-docker run --rm --network iklimco-net alpine \
-  sh -c "wget -qO- http://etcd-01:2379/health"
-```
-
-## Step 5 — Redis: Sentinel cluster (prod overlay)
-
-Redis runs as a single instance in test. In prod, Sentinel provides HA.
-![[redis-sentinel-vs-cluster.png]]
-Bitnami images are used — all configuration is done via env vars, no separate `.conf` file needed.
-
-### Prerequisites
-
-```bash
-# Create Docker secret for Redis password:
-openssl rand -hex 32 | docker secret create redis_password -
-```
-
-### Topology
-
-```
-any app node: redis           (1 replica, spread by Swarm — not pinned)
-2 app nodes:  redis-replica   (2 replicas, max 1/node, spread across app nodes)
-all app nodes: redis-sentinel (3 replicas, max 1/node, spread across all app nodes)
-```
-
-### docker-stack-infra.prod.yml — Redis services
-
-The existing `redis` service is overridden in the prod overlay as **master**; `redis-replica` and `redis-sentinel` are added as new services. The service name (`redis`) remains unchanged so the APISIX connection config does not need updating.
-
-```yaml
-# docker-stack-infra.prod.yml
-services:
-  redis:                          # override base single-instance redis → master
-    image: bitnamisecure/redis:latest
-    environment:
-      ALLOW_EMPTY_PASSWORD: no
-      REDIS_PASSWORD: ${REDIS_PASSWORD}
-      REDIS_REPLICATION_MODE: master
-    deploy:
-      mode: replicated
-      replicas: 1
-      placement:
-        constraints:
-          - node.labels.type == service
-      restart_policy:
-        condition: any
-        delay: 5s
-    labels:
-      project: co.iklim
-
-  redis-replica:
-    image: bitnamisecure/redis:latest
-    environment:
-      ALLOW_EMPTY_PASSWORD: no
-      REDIS_REPLICATION_MODE: slave
-      REDIS_MASTER_HOST: redis
-      REDIS_MASTER_PORT_NUMBER: "6379"
-      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
-      REDIS_PASSWORD: ${REDIS_PASSWORD}
-    deploy:
-      mode: replicated
-      replicas: 2
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
-        preferences:
-          - spread: node.hostname
-      restart_policy:
-        condition: any
-        delay: 5s
-    labels:
-      project: co.iklim
-
-  redis-sentinel:
-    image: bitnamisecure/redis-sentinel:latest
-    environment:
-      REDIS_SENTINEL_MASTER_NAME: prod-master
-      REDIS_MASTER_HOST: redis
-      REDIS_MASTER_PORT_NUMBER: "6379"
-      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
-      REDIS_SENTINEL_QUORUM: "2"
-      REDIS_SENTINEL_DOWN_AFTER_MILLISECONDS: "5000"
-      REDIS_SENTINEL_FAILOVER_TIMEOUT: "10000"
-    deploy:
-      mode: replicated
-      replicas: 3
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
-        preferences:
-          - spread: node.hostname
-      restart_policy:
-        condition: any
-        delay: 5s
-    labels:
-      project: co.iklim
-```
-
-### Microservice connection (Spring Data Redis)
-
-Microservices must use a Sentinel-aware connection:
-
-```yaml
-# application-prod.yml
-spring:
-  data:
-    redis:
-      sentinel:
-        master: prod-master
-        nodes:
-          - redis-sentinel:26379
-      password: ${REDIS_PASSWORD}
-```
-
-### Verification
-
-```bash
-# Query master identity:
-docker exec $(docker ps -q -f name=iklimco_redis-sentinel | head -1) \
-  redis-cli -p 26379 SENTINEL get-master-addr-by-name prod-master
-```
-
-## Step 6 — RabbitMQ: 3-node Erlang cluster (prod overlay)
-
-RabbitMQ runs as a 3-node cluster with one instance per app node.
-
-### Prerequisites
-
-```bash
-# Create Docker secret for Erlang cookie (must be identical on all nodes):
-openssl rand -hex 32 | docker secret create rabbitmq_erlang_cookie -
-```
-
-### docker-stack-infra.prod.yml — RabbitMQ override
-
-```yaml
-# docker-stack-infra.prod.yml (add alongside redis services)
-services:
-  rabbitmq:
-    image: rabbitmq:3-management
-    hostname: "rabbitmq-{{.Node.Hostname}}"
-    environment:
-      RABBITMQ_ERLANG_COOKIE_FILE: /run/secrets/rabbitmq_erlang_cookie
-      RABBITMQ_USE_LONGNAME: "true"
-      RABBITMQ_NODENAME: "rabbit@rabbitmq-{{.Node.Hostname}}"
-    secrets:
-      - rabbitmq_erlang_cookie
-    networks:
-      iklimco-net:
-        aliases:
-          - "rabbitmq-{{.Node.Hostname}}"
-    deploy:
-      mode: replicated
-      replicas: 3
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
-      update_config:
-        parallelism: 1
-        order: stop-first
-    labels:
-      project: co.iklim
-
-secrets:
-  rabbitmq_erlang_cookie:
-    external: true
-
-networks:
-  iklimco-net:
-    external: true
-```
-
-### Cluster join procedure (first setup)
-
-RabbitMQ nodes do not form a cluster automatically; manual join is required after first start:
-
-```bash
-# Find the RabbitMQ container on iklim-app-02:
-CTR=$(docker ps -q -f name=iklimco_rabbitmq)
-
-# Stop, join, start:
-docker exec "$CTR" rabbitmqctl stop_app
-docker exec "$CTR" rabbitmqctl join_cluster rabbit@rabbitmq-iklim-app-01
-docker exec "$CTR" rabbitmqctl start_app
-
-# Repeat for iklim-app-03
-```
-
-```bash
-# Verify cluster status (from any node):
-docker exec "$CTR" rabbitmqctl cluster_status
-```
-
-> **HA policy:** After the cluster is formed, set quorum queues as the default:
-> ```bash
-> docker exec "$CTR" rabbitmqctl set_policy ha-all ".*" \
->   '{"queue-type":"quorum"}' --apply-to queues
-> ```
-
-## Step 7 — RabbitMQ WebSocket Sticky Sessions (Consistent Hash)
-
-RabbitMQ Web STOMP (over WebSocket) requires a persistent connection. In a 3-node RabbitMQ cluster, if an APISIX instance uses the default Swarm VIP for the `rabbitmq` upstream, it may cause unnecessary inter-node traffic or connection drops if the session doesn't persist on the same node.
-
-To optimize this, we implement **Consistent Hashing (chash)** at the APISIX layer based on the client's IP address (`remote_addr`).
-
-### 1. Update APISIX Upstream Configuration (init.sh)
-
-Update the `rabbitmq` upstream definition in `init/apisix-core/init.sh` to target specific cluster nodes instead of the generic service name, enabling the `chash` algorithm for prod.
-
-```bash
-# Update upstream rabbitmq block in init.sh
-if [[ "$PROFILE" == "prod" ]]; then
-  # Direct node DNS names to bypass Swarm VIP and allow chash to work effectively
-  RABBITMQ_NODES='{"rabbitmq-iklim-app-01:15674":1, "rabbitmq-iklim-app-02:15674":1, "rabbitmq-iklim-app-03:15674":1}'
-  LB_TYPE="chash"
-  HASH_KEY="remote_addr"
-else
-  RABBITMQ_NODES='{"rabbitmq:15674":1}'
-  LB_TYPE="roundrobin"
-  HASH_KEY=""
-fi
-
-call_api "upstream rabbitmq" -X PUT "$APISIX_ADMIN_URL/upstreams/rabbitmq-upstream" \
-  -H "X-API-KEY: $API_KEY" -H "Content-Type: application/json" \
-  -d '{
-    "name": "rabbitmq-upstream",
-    "type": "'"$LB_TYPE"'",
-    "key": "'"$HASH_KEY"'",
-    "nodes": '"$RABBITMQ_NODES"',
-    "timeout": {"connect": 10, "send": 3600, "read": 3600},
-    "scheme": "http",
-    '"$HC"'
-  }'
-```
-
-### 2. Enable Real IP Detection in APISIX
-
-Consistent hashing by `remote_addr` requires APISIX to see the actual client IP, not the internal IP of the SWAG (Nginx) proxy.
-
-> **DNS Note:** For `chash` to work with node-specific names, the RabbitMQ service must have network aliases configured for each node (e.g., `rabbitmq-{{.Node.Hostname}}`) as shown in Step 6.
-
-In the `config.yaml` inside the custom APISIX image (`custom-apisix:3.12.0`):
-
-```yaml
-nginx_config:
-  http:
-    real_ip_header: "X-Real-IP"
-    set_real_ip_from: "10.0.0.0/8"
-```
-
-## Step 8 — Create `docker-stack-infra.prod.yml`
-
-Create this file in the repo root alongside `docker-stack-infra.yml`. It combines all prod-specific overrides from Steps 2–6 (including disabling the standalone `etcd` from Step 4):
-
-```yaml
-# docker-stack-infra.prod.yml
-# Prod overlay — deploy with:
-#   docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
-
-services:
-
-  vault:
-    environment:
-      VAULT_LOCAL_CONFIG: >-
-        {"api_addr":"https://vault.iklim.co:8200",
-         "cluster_addr":"https://{{ .Node.Hostname }}:8201",
-         "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
-         "listener":[{"tcp":{"address":"0.0.0.0:8200",
-           "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
-           "tls_key_file":"/vault/certs/STAR.iklim.co_key.pem"}}],
-         "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
-    volumes:
-      - /opt/iklimco/vault/data:/vault/file
-      - ${SWAG_CERT_DIR}:/vault/certs:ro
-    deploy:
-      mode: replicated
-      replicas: 3
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
-
-  apisix:
-    deploy:
-      mode: replicated
-      replicas: 3
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
-
-  apisix-dashboard:
-    deploy:
-      mode: replicated
-      replicas: 3
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
-
-  redis:
-    image: bitnamisecure/redis:latest
-    environment:
-      ALLOW_EMPTY_PASSWORD: no
-      REDIS_PASSWORD: ${REDIS_PASSWORD}
-      REDIS_REPLICATION_MODE: master
-    deploy:
-      mode: replicated
-      replicas: 1
-      placement:
-        constraints:
-          - node.labels.type == service
-      restart_policy:
-        condition: any
-        delay: 5s
-    labels:
-      project: co.iklim
-
-  redis-replica:
-    image: bitnamisecure/redis:latest
-    environment:
-      ALLOW_EMPTY_PASSWORD: no
-      REDIS_REPLICATION_MODE: slave
-      REDIS_MASTER_HOST: redis
-      REDIS_MASTER_PORT_NUMBER: "6379"
-      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
-      REDIS_PASSWORD: ${REDIS_PASSWORD}
-    deploy:
-      mode: replicated
-      replicas: 2
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
-        preferences:
-          - spread: node.hostname
-      restart_policy:
-        condition: any
-        delay: 5s
-    labels:
-      project: co.iklim
-
-  redis-sentinel:
-    image: bitnamisecure/redis-sentinel:latest
-    environment:
-      REDIS_SENTINEL_MASTER_NAME: prod-master
-      REDIS_MASTER_HOST: redis
-      REDIS_MASTER_PORT_NUMBER: "6379"
-      REDIS_MASTER_PASSWORD: ${REDIS_PASSWORD}
-      REDIS_SENTINEL_QUORUM: "2"
-      REDIS_SENTINEL_DOWN_AFTER_MILLISECONDS: "5000"
-      REDIS_SENTINEL_FAILOVER_TIMEOUT: "10000"
-    deploy:
-      mode: replicated
-      replicas: 3
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
-        preferences:
-          - spread: node.hostname
-      restart_policy:
-        condition: any
-        delay: 5s
-    labels:
-      project: co.iklim
-
-  rabbitmq:
-    image: rabbitmq:3-management
-    hostname: "rabbitmq-{{.Node.Hostname}}"
-    environment:
-      RABBITMQ_ERLANG_COOKIE_FILE: /run/secrets/rabbitmq_erlang_cookie
-      RABBITMQ_USE_LONGNAME: "true"
-      RABBITMQ_NODENAME: "rabbit@rabbitmq-{{.Node.Hostname}}"
-    secrets:
-      - rabbitmq_erlang_cookie
-    networks:
-      iklimco-net:
-        aliases:
-          - "rabbitmq-{{.Node.Hostname}}"
-    deploy:
-      mode: replicated
-      replicas: 3
-      placement:
-        max_replicas_per_node: 1
-        constraints:
-          - node.labels.type == service
-      update_config:
-        parallelism: 1
-        order: stop-first
-    labels:
-      project: co.iklim
-
-secrets:
-  rabbitmq_erlang_cookie:
-    external: true
-
-networks:
-  iklimco-net:
-    external: true
-```
-
-## Step 9 — Monitoring Data Persistence
-
-Prometheus and Grafana run as single instances. Grafana data is placed on the StorageBox shared filesystem for manual failover. Prometheus TSDB stays on a local Docker volume because DAVFS/StorageBox is not suitable for Prometheus WAL and compaction I/O.
-
-**Changes already applied to `docker-stack-infra.yml`:**
-
-```yaml
-prometheus:
-  volumes:
-    - prometheus-vl:/prometheus
-
-grafana:
-  volumes:
-    - ${GRAFANA_DATA_DIR:-grafana-vl}:/var/lib/grafana
-```
-
-Test uses the named Docker volume fallback (`grafana-vl`) for Grafana, and Prometheus always uses the named Docker volume (`prometheus-vl`) — no test env change needed.
-
-**Add to `prod/secrets/iklim.co/.env.prod` on storagebox** (already in `env-prod/.env`):
-
-```bash
-GRAFANA_DATA_DIR=/mnt/storagebox/grafana/data
-```
-
-> `/mnt/storagebox/grafana/data` is created automatically by the Ansible `storagebox` role during bootstrap via the `storagebox_managed_directories` variable. No manual step required.
-
-> Grafana writes its SQLite database and dashboard JSON to `/var/lib/grafana`.
-> Prometheus writes its TSDB to `/prometheus` on the local `prometheus-vl` Docker volume; it is not shared between nodes.
-
-## Step 10 — Verify
-
-```bash
-# Base file must be valid on its own (test deploy):
-docker stack config -c docker-stack-infra.yml > /dev/null && echo "base OK"
-
-# Prod merge must be valid:
-docker stack config -c docker-stack-infra.yml -c docker-stack-infra.prod.yml > /dev/null && echo "prod merge OK"
-```
-
-## Step 11 — Database Proxies and Developer Access
-
-In the production environment, the `pg-proxy` and `mongo-proxy` services (socat-based) defined in the base `docker-stack-infra.yml` are **deprecated and will not be used**.
-
-### Rationale
- **Leader Tracking:** Simple L4 proxies (socat) cannot track the Patroni Leader or MongoDB Primary. They point to a single service VIP, which might lead to a Read-Only replica during failover.
- **HA Connection Strings:** Modern DB drivers (JDBC, libpq, MongoClient) support multi-host connection strings, which provide native failover and load balancing without an intermediate proxy.
-
-### Developer Access Strategy
- **Direct Subnet Access:** Developers connect via WireGuard directly to the DB subnet (`10.20.20.0/24`).
- **No Translation:** Instead of mapping ports like `15432`, the standard ports (`5432`, `27017`) are used across all cluster nodes.
-
-## Placement and Replica Summary — prod
-
-| Service          | File         | Replicas | Placement                                   | HA Note                                                                               |
-| ---------------- | ------------ | -------- | ------------------------------------------- | ------------------------------------------------------------------------------------- |
-| swag             | base         | 1        | `node.hostname == iklim-app-01`             | No clustering support; Floating IP pinned to node                                     |
-| cert-reloader    | base         | 1        | `node.hostname == iklim-app-01`             | Cron-style task; duplicate would be problematic                                       |
-| vault            | prod overlay | 3        | `node.labels.type == service`; max 1/node   | Raft cluster — see `07-vault-raft-plan.md`                                            |
-| apisix           | prod overlay | 3        | `node.labels.type == service`; max 1/node   | Stateless; config in Patroni etcd; rate limit policy:redis                            |
-| apisix-dashboard | prod overlay | 3        | `node.labels.type == service`; max 1/node   | Stateless; reads from etcd                                                            |
-| redis (master)   | prod overlay | 1        | `node.labels.type == service`; Swarm spread | Sentinel cluster master; not pinned — reschedules on node failure                     |
-| redis-replica    | prod overlay | 2        | `node.labels.type == service`; max 1/node   | Sentinel replica; spread:hostname                                                     |
-| redis-sentinel   | prod overlay | 3        | `node.labels.type == service`; max 1/node   | Quorum=2; failover automatic                                                          |
-| rabbitmq         | prod overlay | 3        | `node.labels.type == service`; max 1/node   | Erlang cluster; quorum queues                                                         |
-| prometheus       | base         | 1        | `node.labels.type == service`               | No native HA; Thanos is overkill at this scale                                        |
-| grafana          | base         | 1        | `node.labels.type == service`               | Not critical                                                                          |
-
-> PostgreSQL and MongoDB run in separate DB stacks on `iklimco-*` nodes. See `08-prod-db-cluster-kurulum.md`.
-> etcd: 3-node cluster on DB nodes — APISIX shares it via `/apisix` prefix.
+For current execution, use the setup runbooks and root stack files listed in the Context section.
--- a/roadmap/prod-env/07-vault-raft-plan.md
+++ b/roadmap/prod-env/07-vault-raft-plan.md
@ -1,121 +1,83 @@
-# 07 — Vault: 3-Node Raft Cluster (Prod)
+# 07 — Vault Raft Stack and Bootstrap Automation (Prod)

 ## Context
-Vault starts directly as a 3-node Raft cluster in prod. The single-instance phase used in test is skipped.

-Test used a single Vault instance (file storage, 1 replica on the manager node). Prod goes straight to Raft HA.
+Production Vault is a 3-node Raft cluster, but it is no longer initialized through a manual post-deploy runbook.

-## Vault service configuration
+Current references:

- **Replicas:** 3 (one per service node)
- **Storage:** Raft integrated storage
- **Placement:** `node.labels.type == service` (all 3 app nodes)
- **Cert distribution:** No SSH needed — all nodes mount StorageBox, cert-reloader writes to `SWAG_CERT_DIR=/mnt/storagebox/ssl`, Vault reads from that path on every node
+- Setup source: `../../setup/09-prod-runner-ha-and-swarm.md`
+- Stack file: root `docker-stack-vault.yml`
+- Bootstrap script: root `init/vault/vault-bootstrap.sh`
+- Template: root `init/vault/vault-template-v2.json`

-### Prerequisites
+## Current Model

- [ ] All 3 service nodes are running and labeled `type=service`
- [ ] `/mnt/storagebox/ssl/` directory is mounted and accessible on all 3 app nodes
- [ ] Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes)
+Vault is deployed separately from `docker-stack-infra_db-prod.yml`.

-### Vault service YAML (docker-stack-infra.prod.yml overlay)
+The Vault stack uses:

-```yaml
-vault:
-  # ... (image, secrets, healthcheck unchanged from base)
-  environment:
-    VAULT_LOCAL_CONFIG: >-
-      {"api_addr":"https://vault.iklim.co:8200",
-       "cluster_addr":"https://{{ .Node.Hostname }}:8201",
-       "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
-       "listener":[{"tcp":{"address":"0.0.0.0:8200",
-         "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
-         "tls_key_file":"/vault/certs/STAR.iklim.co_key.pem"}}],
-       "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
-  volumes:
-    - /opt/iklimco/vault/data:/vault/file    # host path per node
-    - ${SWAG_CERT_DIR}:/vault/certs:ro   # StorageBox — shared across all nodes, no SSH distribution needed
-  deploy:
-    mode: replicated
-    replicas: 3
-    placement:
-      max_replicas_per_node: 1
-      constraints:
-        - node.labels.type == service
+- 3 replicas, one per service node when placement allows it.
+- Docker volumes such as `vault-data-vl` and `vault-logs-vl`.
+- `/opt/iklimco/ssl:/vault/certs:ro` for TLS certificates.
+- `iklimco-net` as an external overlay network.
+- `vault_unseal_key` as a Docker secret.
+
+The production workflow calls `init-infra-prod.sh`, which calls `init/vault/vault-bootstrap.sh`. The bootstrap script handles stack deploy, initialization, unseal key secret rotation, peer join, and peer unseal.
+
+## Certificate Flow
+
+Vault does not read TLS certificates directly from `/mnt/storagebox/ssl`.
+
+The current flow is:
+
+```text
+SWAG renews certificate
+cert-reloader copies renewed files to /mnt/storagebox/ssl
+cert-distributor syncs certificate files to /opt/iklimco/ssl on service nodes
+Vault reads /opt/iklimco/ssl through the /vault/certs mount
 ```

-> `{{ .Node.Hostname }}` is Docker Swarm's Go template for the node hostname —
-> gives each Vault instance a unique `node_id`.
+## Bootstrap Flow

-## Raft initialization procedure (first deploy)
+Normal production bootstrap is automated:

-### Step 1 — Deploy the stack
+1. Create or refresh the placeholder `vault_unseal_key` secret when needed.
+2. Deploy `docker-stack-vault.yml`.
+3. Initialize Vault with one key share and a key threshold of one if it is not yet initialized.
+4. Replace the placeholder `vault_unseal_key` secret with the real unseal key.
+5. Unseal the leader.
+6. Join peers to the Raft cluster.
+7. Unseal peers.
+8. Verify Raft peers and service health.
+
+These operations belong to `vault-bootstrap.sh`, not to a manual operator checklist.
+
+## Verification
+
+Use the current setup verification flow:

 ```bash
-docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
+docker service ps iklimco_vault
+docker exec "$(docker ps -q -f name=iklimco_vault | head -1)" vault status
+docker exec "$(docker ps -q -f name=iklimco_vault | head -1)" vault operator raft list-peers
 ```

-All 3 Vault containers start. Only the first one to initialize becomes the leader.
+Expected state:

-### Step 2 — Initialize Vault on the leader (iklim-app-01)
+- Vault service has 3 running tasks.
+- `vault status` reports `Sealed false`.
+- Raft list shows one leader and two followers.

-```bash
-VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
-docker exec -it "$VAULT_CTR" vault operator init
-```
+## Historical / Superseded by Setup

-Save the unseal keys and root token securely. Store the unseal key as a Docker secret:
+The previous manual procedure is superseded:

-```bash
-echo -n "<unseal-key>" | docker secret create vault_unseal_key -
-```
+- Deploying Vault through `docker-stack-infra.yml` + `docker-stack-infra.prod.yml`.
+- Creating `/opt/iklimco/vault/data` host-path directories on each app node.
+- Running `vault operator init` manually.
+- Manually copying/storing unseal keys.
+- Manually running `vault operator raft join` on peers.
+- Manually unsealing each peer after join.

-### Step 3 — Unseal the leader
-
-```bash
-docker exec -it "$VAULT_CTR" vault operator unseal
-```
-
-The healthcheck auto-unseals on subsequent restarts via the `vault_unseal_key` secret.
-
-### Step 4 — Join remaining nodes to the Raft cluster
-
-On iklim-app-02 and iklim-app-03 containers:
-
-```bash
-docker exec -it <vault-on-iklim-app-02> vault operator raft join \
-  https://vault.iklim.co:8200
-
-docker exec -it <vault-on-iklim-app-03> vault operator raft join \
-  https://vault.iklim.co:8200
-```
-
-Unseal each node after joining:
-
-```bash
-docker exec -it <vault-on-iklim-app-02> vault operator unseal
-docker exec -it <vault-on-iklim-app-03> vault operator unseal
-```
-
-### Step 5 — Verify cluster
-
-```bash
-docker exec "$VAULT_CTR" vault operator raft list-peers
-```
-
-Expected: 3 peers, one `leader`, two `follower`.
-
-## cert-reloader — no additional changes needed for Raft
-
-cert-reloader writes the cert to `SWAG_CERT_DIR=/mnt/storagebox/ssl`.
-Since StorageBox is mounted on all app nodes, every Vault instance already sees the same path.
-
-The cert renewal flow works unchanged with Raft:
-```
-cert changed → copy to /mnt/storagebox/ssl/ → docker service update --force iklimco_vault
-Vault (3 replicas) restart → each auto-unseals via healthcheck
-```
-
-## Reference
- Vault Raft storage docs: https://developer.hashicorp.com/vault/docs/configuration/storage/raft
- Vault Swarm setup: https://manjit28.medium.com/setting-up-a-secure-and-highly-available-hashicorp-vault-cluster-for-secrets-and-certificates-0ce01a370582
+Keep those notes only as historical context. For current prod, use `docker-stack-vault.yml` and `init/vault/vault-bootstrap.sh`.
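
The expected Raft state from the Verification section can also be checked mechanically. The following is a minimal sketch, assuming the default table output of `vault operator raft list-peers` (columns Node, Address, State, Voter). The function name `check_raft_peers` and the sample node names and addresses are hypothetical and are not taken from the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `vault operator raft list-peers` table output on stdin
# and succeeds only when it lists exactly one leader and two followers.
check_raft_peers() {
  awk '
    $3 == "leader"   { leaders++ }
    $3 == "follower" { followers++ }
    END { exit !(leaders == 1 && followers == 2) }
  '
}

# Sample output with made-up node names and addresses.
sample_peers='Node       Address          State       Voter
----       -------          -----       -----
vault-1    10.0.0.1:8201    leader      true
vault-2    10.0.0.2:8201    follower    true
vault-3    10.0.0.3:8201    follower    true'

if printf '%s\n' "$sample_peers" | check_raft_peers; then
  echo "raft peers OK"
else
  echo "unexpected raft peer state"
fi
```

On a node that runs a Vault task, the same function could be fed from the real command, for example `docker exec "$(docker ps -q -f name=iklimco_vault | head -1)" vault operator raft list-peers | check_raft_peers`.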
--- a/setup-vs-roadmap-map.md
+++ b/setup-vs-roadmap-map.md
@ -1,24 +1,23 @@
 # Setup Phases - Roadmap Mapping Table

-This table shows, for the roadmap steps in the `roadmap/test-env` and `roadmap/prod-env` folders,
-which Terraform/Ansible setup phase addresses each of them.
+This table shows, for the roadmap steps in the `roadmap/test-env` and `roadmap/prod-env` folders, which Terraform/Ansible setup phase addresses each of them.

 ## TEST environment

 | Roadmap step | Phase that should address it |
 | --- | --- |
-| Hetzner firewall (only 22/80/443) | **Terraform `02-test-terraform-iaac.md`** - `firewall.tf` |
-| Server creation (`iklim-app-01`, `iklim-db-01`) | **Terraform `02-test-terraform-iaac.md`** - `servers.tf` |
-| Private network + placement group (`iklim-test-spread`) | **Terraform `02-test-terraform-iaac.md`** - `network.tf`, `placement.tf` |
-| Floating IP (`iklim-test-app-fip`) | **Terraform `02-test-terraform-iaac.md`** - `floating_ip.tf` |
+| Hetzner firewall (only 22/80/443) | **Terraform `02-test-terraform-iac.md`** - `firewall.tf` |
+| Server creation (`iklim-app-01`, `iklim-db-01`) | **Terraform `02-test-terraform-iac.md`** - `servers.tf` |
+| Private network + placement group (`iklim-test-spread`) | **Terraform `02-test-terraform-iac.md`** - `network.tf`, `placement.tf` |
+| Floating IP (`iklim-test-app-fip`) | **Terraform `02-test-terraform-iac.md`** - `floating_ip.tf` |
 | Docker Engine installation (app + db node) | **Ansible `03-test-ansible-bootstrap.md`** - `docker` role |
 | Security hardening (SSH, firewalld, fail2ban) | **Ansible `03-test-ansible-bootstrap.md`** - `hardening` role |
 | Docker Swarm init + `iklim-db-01` worker join | **Ansible `03-test-ansible-bootstrap.md`** - `swarm` role |
 | `type=service` and `role=db` node labels | **Ansible `03-test-ansible-bootstrap.md`** - `swarm` role |
 | `/opt/iklimco/...` directories | **Ansible `03-test-ansible-bootstrap.md`** - `node_dirs` role |
 | StorageBox DAVFS mount (`u469968-sub4`) | **Ansible `03-test-ansible-bootstrap.md`** - `storagebox` role |
-| DB stack deploy (PostgreSQL + MongoDB on `iklim-db-01`) | **Manual `04-test-db-docker-kurulum.md`** |
-| `act_runner` systemd installation | **Ansible `05-test-runner-ve-deploy-onkosullari.md`** - `act_runner` role (`test-app-post-stack.yml`) |
+| DB stack deploy (PostgreSQL + MongoDB on `iklim-db-01`) | **Manual `04-test-db-docker-setup.md`** |
+| `act_runner` systemd installation | **Ansible `05-test-runner-and-deploy-prerequisites.md`** - `act_runner` role (`test-app-post-stack.yml`) |
 | Uploading GoDaddy credentials to StorageBox | **Remains manual** - secret management, outside Terraform/Ansible |
 | `docker-stack-infra.yml` port removal + adding SWAG/cert-reloader | **Pipeline `deploy-test.yml`** + **repo change** - `roadmap/test-env/03` |
 | SWAG nginx proxy confs (`template/swag/site-confs/*.conf.tpl`) | **Delivered in the repo** - `roadmap/test-env/04` |
@ -31,22 +30,22 @@ which Terraform/Ansible setup phase addresses each of them

 | Roadmap step | Phase that should address it |
 | --- | --- |
-| Creating 6 servers (`iklim-app-01/02/03`, `iklim-db-01/02/03`) | **Terraform `06-prod-terraform-iaac.md`** - `servers.tf` |
-| Private network + 2 placement group | **Terraform `06-prod-terraform-iaac.md`** - `network.tf`, `placement.tf` |
-| Firewall (only 22/80/443 public; private port matrix) | **Terraform `06-prod-terraform-iaac.md`** - `firewall.tf` |
-| Floating IP (`iklim-prod-app-fip`, assigned to `iklim-app-01`) | **Terraform `06-prod-terraform-iaac.md`** - `floating_ip.tf` |
+| Creating 6 servers (`iklim-app-01/02/03`, `iklim-db-01/02/03`) | **Terraform `06-prod-terraform-iac.md`** - `servers.tf` |
+| Private network + 2 placement group | **Terraform `06-prod-terraform-iac.md`** - `network.tf`, `placement.tf` |
+| Firewall (only 22/80/443 public; private port matrix) | **Terraform `06-prod-terraform-iac.md`** - `firewall.tf` |
+| Floating IP (`iklim-prod-app-fip`, assigned to `iklim-app-01`) | **Terraform `06-prod-terraform-iac.md`** - `floating_ip.tf` |
 | Docker Engine installation (all nodes, app and db) | **Ansible `07-prod-ansible-bootstrap.md`** - `docker` role |
 | Security hardening (all nodes) | **Ansible `07-prod-ansible-bootstrap.md`** - `hardening` role |
 | Swarm init (`iklim-app-01`) + manager join (`iklim-app-02/03`) | **Ansible `07-prod-ansible-bootstrap.md`** - `swarm` role |
 | `type=service` node label (3 app nodes) | **Ansible `07-prod-ansible-bootstrap.md`** - `swarm` role |
 | `/opt/iklimco/...` directories + `/opt/iklimco/stacks` | **Ansible `07-prod-ansible-bootstrap.md`** - `node_dirs` role |
 | StorageBox DAVFS mount (`u469968-sub5`) | **Ansible `07-prod-ansible-bootstrap.md`** - `storagebox` role |
-| Join DB nodes to Swarm as workers | **Manual `08-prod-db-cluster-kurulum.md`** - Section 2 |
-| `role=db` node label (3 db nodes) | **Manual `08-prod-db-cluster-kurulum.md`** - Section 2 |
-| etcd cluster deploy (for Patroni) | **Manual `08-prod-db-cluster-kurulum.md`** - Section 5.2 |
-| MongoDB replica set deploy | **Manual `08-prod-db-cluster-kurulum.md`** - Section 4 |
-| Patroni + PostgreSQL HA deploy | **Manual `08-prod-db-cluster-kurulum.md`** - Section 5.4 |
-| 3× `act_runner` systemd (HA runner) | **Ansible `09-prod-runner-ha-ve-swarm.md`** - `act_runner` role |
+| Join DB nodes to Swarm as workers | **Manual `08-prod-db-cluster-setup.md`** - Section 2 |
+| `role=db` node label (3 db nodes) | **Manual `08-prod-db-cluster-setup.md`** - Section 2 |
+| etcd cluster deploy (for Patroni) | **Manual `08-prod-db-cluster-setup.md`** - Section 5.2 |
+| MongoDB replica set deploy | **Manual `08-prod-db-cluster-setup.md`** - Section 4 |
+| Patroni + PostgreSQL HA deploy | **Manual `08-prod-db-cluster-setup.md`** - Section 5.4 |
+| 3× `act_runner` systemd (HA runner) | **Ansible `09-prod-runner-ha-and-swarm.md`** - `act_runner` role |
 | Uploading GoDaddy credentials to StorageBox | **Remains manual** - secret management, outside Terraform/Ansible |
 | `docker-stack-infra.yml` port removal + adding SWAG/cert-reloader | **Repo change** - `roadmap/prod-env/03` |
 | SWAG nginx proxy confs (`template/swag/site-confs/*.conf.tpl`) | **Delivered in the repo** - `roadmap/prod-env/04` |
@ -61,16 +60,16 @@ which Terraform/Ansible setup phase addresses each of them
 ```
 Environment_Infrastructure/
  setup/                              ← Terraform + Ansible phase documents
-    00-genel-yol-haritasi.md
-    01-private-network-port-matrisi.md
-    02-test-terraform-iaac.md
+    00-general-roadmap.md
+    01-private-network-port-matrix.md
+    02-test-terraform-iac.md
    03-test-ansible-bootstrap.md
-    04-test-db-docker-kurulum.md
-    05-test-runner-ve-deploy-onkosullari.md
-    06-prod-terraform-iaac.md
+    04-test-db-docker-setup.md
+    05-test-runner-and-deploy-prerequisites.md
+    06-prod-terraform-iac.md
    07-prod-ansible-bootstrap.md
-    08-prod-db-cluster-kurulum.md
-    09-prod-runner-ha-ve-swarm.md
+    08-prod-db-cluster-setup.md
+    09-prod-runner-ha-and-swarm.md
  roadmap/
    test-env/                         ← Test environment roadmap steps
    prod-env/                         ← Prod roadmap steps
--- a/setup/00-genel-yol-haritasi.md
+++ b/setup/00-genel-yol-haritasi.md
@ -43,9 +43,9 @@ Minimum topology for the test environment:
 | Node | Role | Note |
 | --- | --- | --- |
 | `iklim-app-01` | Swarm manager + app worker + Gitea runner | CI/CD test deploy runs through this node |
-| `iklim-db-01` | DB node | DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD |
+| `iklim-db-01` | DB node / Swarm worker | DB host prerequisites are prepared by Ansible; DB services are deployed as Swarm services by the environment stack/pipeline |

-The test DB setup is brought only up to machine and OS preparation with Terraform/Ansible. PostgreSQL/MongoDB cluster installation is outside this phase.
+Terraform/Ansible prepare the test DB node up to the OS, Docker, Swarm worker membership, config directories, and WireGuard. PostgreSQL/MongoDB runtime services are not installed directly on the OS; they run as Docker Swarm services.

 ### Prod

@ -56,23 +56,25 @@ HA topology for the prod environment:
 | `iklim-app-*` | 3 | Each one is a Swarm manager + app worker |
 | `iklim-db-*` | 3 | DB cluster nodes |

-Prod DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD. Terraform prepares the DB machines and network/firewall rules; Ansible installs OS hardening and base dependencies.
+Prod DB host prerequisites are prepared by Terraform/Ansible. Runtime DB services are part of the current prod Swarm stack: etcd, Patroni/PostgreSQL, and MongoDB replica set are deployed by the prod root pipeline through `docker-stack-infra_db-prod.yml`.

 ## Public Port Policy

-Ports open to the public internet are only:
+Ports open to the public internet are normally only:

 - `22/tcp` SSH, only from admin IP/CIDR sources
 - `80/tcp` HTTP
 - `443/tcp` HTTPS

+Test has one explicit exception: `51820/udp` is opened on the DB node for WireGuard VPN, authenticated cryptographically. Prod currently does not expose `51820/udp` in Terraform.
+
 `8200/tcp` Vault will not be opened to the public internet. Vault must be reachable only from the private network or Docker overlay.

-`docker-stack-infra.yml` has been aligned with this policy: only the SWAG service publishes ports 80/443; all other services such as Vault, APISIX, RabbitMQ, Prometheus, and Grafana are reachable only through the `iklimco-net` overlay.
+Current prod stack behavior is aligned with this policy: `docker-stack-infra_db-prod.yml` publishes public traffic through SWAG on 80/443. Vault is deployed separately by `vault-bootstrap.sh` using `docker-stack-vault.yml`; it is not publicly exposed.

 ## Private Network Policy

-The detailed matrix of ports that must be opened inside the private network is in `01-private-network-port-matrisi.md`. Agents must treat that file as the source when writing firewall or Ansible UFW rules.
+The detailed matrix of ports that must be opened inside the private network is in `01-private-network-port-matrix.md`. Agents must treat that file as the source when writing Terraform Hetzner firewall rules and Ansible `firewalld` rules.

 ## Gitea Actions Runner Decision

--- a/setup/01-private-network-port-matrisi.md
+++ b/setup/01-private-network-port-matrisi.md
@ -1,8 +1,8 @@
-# 07 - Private Network Port Matrix
+# 01 - Private Network Port Matrix

-This file defines the ports that must be opened inside the Hetzner private network for test and prod environments. Ports open to the public internet will only be `22/tcp`, `80/tcp`, and `443/tcp`. Vault `8200/tcp` will not be opened publicly.
+This file defines the ports that must be opened inside the Hetzner private network for test and prod environments. Public ingress is limited to `22/tcp`, `80/tcp`, and `443/tcp`, with one current test-only exception: `51820/udp` is public on the test DB node for WireGuard. Vault `8200/tcp` will not be opened publicly.

-This matrix must be treated as the source for Terraform Hetzner firewall and Ansible UFW rules.
+This matrix must be treated as the source for Terraform Hetzner firewall and Ansible `firewalld` rules.

 ## Network Plan

@ -11,25 +11,25 @@ This matrix must be treated as the source for Terraform Hetzner firewall and Ans
 | Subnet | CIDR | Purpose |
 | --- | --- | --- |
 | App/Swarm | `10.10.10.0/24` | `iklim-app-01` |
-| DB | `10.10.20.0/24` | `test-db-01` |
+| DB | `10.10.20.0/24` | `iklim-db-01` |

 ### Prod

 | Subnet | CIDR | Purpose |
 | --- | --- | --- |
 | App/Swarm | `10.20.10.0/24` | `iklim-app-01/02/03` |
-| DB | `10.20.20.0/24` | `prod-db-01/02/03` |
+| DB | `10.20.20.0/24` | `iklim-db-01/02/03` |

 ## Public Ingress Standard

-Public ingress for all environments:
+Public ingress:

 | Port | Protocol | Source | Target | Requirement |
 | --- | --- | --- | --- | --- |
 | `22` | TCP | Admin IP/CIDR | All nodes | SSH management |
 | `80` | TCP | Internet | `iklim-app-01` (gateway) | HTTP / ACME redirect |
 | `443` | TCP | Internet | `iklim-app-01` (gateway) | HTTPS |
-| `51820` | UDP | `0.0.0.0/0`, `::/0` | `iklim-db-01` (DB node) | WireGuard VPN — authentication with cryptographic key |
+| `51820` | UDP | `0.0.0.0/0`, `::/0` | `iklim-db-01` in test only | WireGuard VPN — authentication with cryptographic key |

 Critical ports that will not be opened publicly:

@ -80,11 +80,11 @@ These ports will not be opened publicly. Access will be allowed only from requir
 | `9090` | TCP | Prometheus UI/API | Admin CIDR or private ops | Prometheus service/node | Public closed |
 | `3000` | TCP | Grafana UI | Admin CIDR or private ops | Grafana service/node | Public closed |

-`docker-stack-infra.yml` has been updated so that only the SWAG service publishes ports 80/443 in host mode. All other services contain no published ports; access is provided only through the `iklimco-net` overlay. This table remains the source for private ingress decisions.
+The current prod root stack is `docker-stack-infra_db-prod.yml`; Vault is deployed separately with `docker-stack-vault.yml` through `vault-bootstrap.sh`. Public traffic is expected to enter through SWAG on 80/443. Private service reachability is provided by the `iklimco-net` overlay and by the explicit host-mode DB/cluster ports listed below.

 ## DB Node Ports

-Because DB infrastructure will be installed manually, the exact cluster technology is outside this document. Still, the default ports for firewall purposes are below.
+DB runtime services are deployed as Docker Swarm services. Prod currently uses Patroni/PostgreSQL, etcd, and a MongoDB replica set in `docker-stack-infra_db-prod.yml`; the required firewall ports are below.

 ### PostgreSQL / PostGIS (Patroni + etcd)

@ -129,7 +129,7 @@ App subnet (swarm firewall) — traffic inside itself:
 | Source | Target | Ports |
 | --- | --- | --- |
 | `10.20.10.0/24` | `10.20.10.0/24` | `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` (Swarm) |
-| `10.20.10.0/24` | `10.20.10.0/24` | `8200/tcp`, `6379/tcp`, `5672/tcp`, `61613/tcp`, `15674/tcp`, `2379/tcp` (application services) |
+| `10.20.10.0/24` | `10.20.10.0/24` | `8200/tcp`, `5672/tcp`, `61613/tcp`, `15674/tcp` (application services) |
 | Admin CIDR or VPN | `10.20.10.0/24` | `15672/tcp`, `9180/tcp`, `9090/tcp`, `3000/tcp` |

 App -> DB traffic (there is no related rule in the swarm firewall; it is allowed in the db firewall):
@ -157,7 +157,7 @@ DB -> App traffic (allowed in the swarm firewall):

 - The public firewall does not open `8200/tcp`.
 - DB ports are not open publicly.
- Swarm ports are open only inside the private app/swarm subnet.
+- Swarm ports are open only between the app/Swarm and DB subnets.
 - The App/Swarm subnet reaches the DB subnet only through required DB ports.
 - The DB subnet is not opened to the app subnet with broad permissions.
 - Admin UI ports are restricted through admin CIDR/VPN/private ops instead of public access.
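
As an illustration of how one matrix row maps to host firewall rules, the sketch below prints (without applying) `firewall-cmd` rich rules for the prod Swarm row (`10.20.10.0/24` to `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp`). It is a minimal dry-run sketch and not the repository's Ansible role: the function name `print_rich_rules` is hypothetical, and using rich rules in the default zone is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints one firewall-cmd rich rule per port/protocol pair.
# Usage: print_rich_rules <source-cidr> <port/proto>...
print_rich_rules() {
  local source_cidr="$1"; shift
  local entry port proto
  for entry in "$@"; do
    port="${entry%/*}"    # text before the slash, e.g. 2377
    proto="${entry#*/}"   # text after the slash, e.g. tcp
    printf '%s\n' "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"${source_cidr}\" port port=\"${port}\" protocol=\"${proto}\" accept'"
  done
}

# Swarm ports inside the prod app subnet, taken from the matrix.
print_rich_rules "10.20.10.0/24" 2377/tcp 7946/tcp 7946/udp 4789/udp
```

Printing the commands first keeps the matrix as the single source and lets the generated rules be reviewed before anything changes on a node.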
--- a/setup/02-test-terraform-iaac.md
+++ b/setup/02-test-terraform-iaac.md
@ -11,8 +11,8 @@ Terraform creates the following in the test environment:
  - App/Swarm subnet: `10.10.10.0/24`
  - DB subnet: `10.10.20.0/24`
 - Firewall:
-  - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
-  - Private ingress: test rules in `01-private-network-port-matrisi.md`
+  - Public ingress: `22/tcp`, `80/tcp`, `443/tcp`, plus test DB WireGuard `51820/udp`
+  - Private ingress: test rules in `01-private-network-port-matrix.md`
 - SSH key
 - Placement group: `iklim-test-spread`
 - Floating IP: stable IPv4 for the swarm entry point
@ -21,7 +21,7 @@ Terraform creates the following in the test environment:
  - `iklim-db-01`
 - Ansible inventory output

-Terraform does not install DB software. The DB node is prepared only at the machine, network, and firewall level.
+Terraform does not install DB software. The DB node is prepared at the machine, network, and firewall level; Ansible later prepares Docker, Swarm worker membership, DB config directories, and WireGuard.

 ## Recommended File Structure

@ -69,7 +69,7 @@ The server type decision is based on the current test environment metrics in `..
 | Server | Private IP | Role |
 | --- | --- | --- |
 | `iklim-app-01` | `10.10.10.11` | Swarm manager + app worker + Gitea runner |
-| `iklim-db-01` | `10.10.20.11` | DB node prepared for manual DB installation |
+| `iklim-db-01` | `10.10.20.11` | DB node / Swarm worker for DB services |

 Private IPs must be statically defined inside Terraform. Ansible inventory and firewall rules remain deterministic.

@ -91,7 +91,7 @@ Public ingress:
 | `80/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-01` |
 | `443/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-01` |

-For public ingress, `8200/tcp`, `5432/tcp`, `27017/tcp`, `5672/tcp`, `15672/tcp`, `6379/tcp`, `2379/tcp`, `9000/tcp`, `9180/tcp`, `9090/tcp`, and `3000/tcp` will not be opened.
+For public ingress, `8200/tcp`, `5432/tcp`, `27017/tcp`, `5672/tcp`, `15672/tcp`, `6379/tcp`, `2379/tcp`, `9000/tcp`, `9180/tcp`, `9090/tcp`, and `3000/tcp` will not be opened. `51820/udp` is the explicit test-only public exception for WireGuard.

 ### App (swarm) Firewall — Private Ingress

@ -133,9 +133,9 @@ Source from DB subnet, because `iklim-db-01` joins Swarm as a worker:
 | `7946/tcp,udp` | Docker Swarm node discovery | `10.10.10.0/24` (app subnet) |
 | `4789/udp` | Docker Swarm VXLAN overlay | `10.10.10.0/24` (app subnet) |

-IP restriction is done in the SWAG nginx configuration, not in the Hetzner firewall. None of these ports are opened publicly from the `admin_allowed_cidrs` source.
+IP restriction is done in the SWAG nginx configuration, not in the Hetzner firewall. None of these management ports are opened at the Hetzner firewall for the `admin_allowed_cidrs` source.

-For other private ingress rules, `01-private-network-port-matrisi.md` will be used as the source.
+For other private ingress rules, `01-private-network-port-matrix.md` will be used as the source.

 ## Placement Group

@ -204,6 +204,6 @@ Each server gets `lifecycle { prevent_destroy = true }`. While this block exists
 - `terraform plan` works only with the test Hetzner Project token.
 - 2 servers are created after `terraform apply`.
 - The two servers can reach each other through the private network.
- Only `22`, `80`, and `443` are open at firewall level from the public internet.
+- Only `22`, `80`, `443`, and test WireGuard `51820/udp` are open at firewall level from the public internet.
 - Vault `8200` remains closed from the public internet.
 - Terraform state is not committed to the repo.
--- a/setup/03-test-ansible-bootstrap.md
+++ b/setup/03-test-ansible-bootstrap.md
@ -97,7 +97,7 @@ ansible-playbook test-bootstrap.yml --tags "hardening" --ask-vault-pass
 | Host | Role |
 | --- | --- |
 | `iklim-app-01` | Swarm manager + app worker |
-| `iklim-db-01` | OS-hardened DB node for manual DB installation |
+| `iklim-db-01` | OS-hardened DB node / Swarm worker for DB services |

 ## Recommended File Structure

@ -281,7 +281,7 @@ Deploy prerequisites on `iklim-app-01`:
 /opt/iklimco/stacks
 ```

-Minimum for manual DB installation on the DB node:
+Minimum DB-node host directories:

 ```text
 /opt/iklimco
@ -391,7 +391,7 @@ vault_iklim_password: "IKLIM_USER_PASSWORD"
       creates: "{{ storagebox_mount_point }}/.mounted_marker"
   ```

-   A marker file can be written to the directory to confirm mount success:
+A marker file can be written to the directory to confirm mount success:

   ```yaml
   - name: Write mount marker
@ -402,7 +402,7 @@ vault_iklim_password: "IKLIM_USER_PASSWORD"

 6. **Create service bind mount directories**

-   In the test environment, the precipitation service's `image-data` volume is bind mounted on the host to `/mnt/storagebox/precipitation/images`. The directory is created by Ansible after StorageBox is mounted and left with `0755` permissions.
+In the test environment, the precipitation service's `image-data` volume is bind mounted on the host to `/mnt/storagebox/precipitation/images`. The directory is created by Ansible after StorageBox is mounted and left with `0755` permissions.

   ```yaml
   - name: Create managed StorageBox directories
@ -447,13 +447,13 @@ An ed25519 SSH key pair is generated on the server and uploaded to the StorageBo

 2. **Upload the public key to StorageBox**

-   This step is done manually and requires the password the first time:
+This step is done manually and requires the password the first time:

   ```bash
   cat /root/.ssh/id_ed25519_storagebox.pub | ssh -p23 u469968-sub4@u469968-sub4.your-storagebox.de install-ssh-key
   ```

-   Later access works passwordlessly:
+Later access works passwordlessly:

   ```bash
   sftp -P23 u469968-sub4@u469968-sub4.your-storagebox.de
@ -461,14 +461,14 @@ An ed25519 SSH key pair is generated on the server and uploaded to the StorageBo

 3. **Add private and public keys to Gitea**

-   Gitea -> Organization Settings -> Actions -> Secrets:
+Gitea -> Organization Settings -> Actions -> Secrets:

   | Secret Name | Value |
   | --- | --- |
   | `STORAGEBOX_SSH_PRIV` | Contents of `/root/.ssh/id_ed25519_storagebox` |
   | `STORAGEBOX_SSH_PUB` | Contents of `/root/.ssh/id_ed25519_storagebox.pub` |

-   To get the key contents:
+To get the key contents:

   ```bash
   cat /root/.ssh/id_ed25519_storagebox
--- a/setup/04-test-db-docker-kurulum.md
+++ b/setup/04-test-db-docker-kurulum.md
@ -1,6 +1,6 @@
-# 04 - Test DB Docker Installation (Swarm Worker)
+# 04 - Test DB Docker Setup (Swarm Worker)

-The purpose of this phase is to add the `iklim-db-01` node to Swarm as a worker and run PostgreSQL and MongoDB as Swarm services.
+The purpose of this phase is to add the `iklim-db-01` node to Swarm as a worker and prepare the host for PostgreSQL and MongoDB Swarm services.

 ## Architecture Decision

@ -8,12 +8,12 @@ The roadmap states that DBs will be installed "manually". In the test environmen

 The installation has **two phases:**
 1. **Preparation (Ansible):** The `test-db-post-stack.yml` playbook sets up DB directories, the `mongod.conf` configuration, and the WireGuard VPN service.
-2. **Deploy (Gitea CI/CD):** The `deploy-test.yml` workflow deploys PostgreSQL and MongoDB services to Swarm through `docker-stack-infra.yml`.
+2. **Deploy (Gitea CI/CD):** The test deploy workflow deploys PostgreSQL and MongoDB services as part of the environment stack.

 **Why?**
 1. **Ease of management:** Version transitions and configuration management are much faster with Docker.
 2. **Overlay Network:** Application services (`iklim-app-01`) can access DBs through the `iklimco-net` overlay network in an encrypted and isolated way.
-3. **Data persistence:** Data is stored in Docker named volumes on `iklim-db-01`. StorageBox is used only for backups.
+3. **Data persistence:** Runtime data is kept on the DB node. StorageBox is used for shared configuration, operational files, and backup-related paths, not as the primary DB data path.

 ## Prerequisites

@ -67,24 +67,21 @@ On `iklim-db-01`, through the `db_stack` and `wireguard` roles:
 - Places the `mongod.conf` file
 - Installs and configures the WireGuard VPN server (`51820/udp`)

-> Deploying DB services (PostgreSQL, MongoDB) to Swarm is the responsibility of the Gitea CI/CD workflow (`deploy-test.yml`), not Ansible. This workflow deploys all services at once through `docker-stack-infra.yml`.
+> Deploying DB services (PostgreSQL, MongoDB) to Swarm is the responsibility of the Gitea CI/CD workflow, not Ansible. The Ansible playbook prepares host directories, configuration, and WireGuard.

 ## 4. Volume and Data Structure

-DB data is stored in Docker named volumes on `iklim-db-01`:
+DB data is stored on `iklim-db-01` through the stack's configured volume or bind-mount layout. The Ansible `db_stack` role prepares MongoDB configuration at:

-| Volume | Content |
-|---|---|
-| `iklim-db_postgresql_data` | PostgreSQL data files |
-| `iklim-db_mongodb_data` | MongoDB data files |
+```text
+/opt/iklimco/db/mongodb/config/mongod.conf
+```

-MongoDB logs are written to stdout and can be watched with `docker logs`. Configuration: `/opt/iklimco/db/mongodb/config/mongod.conf`
-
-> StorageBox is **not used** for DB data. It only has a role in the backup strategy.
+MongoDB logs are written to stdout and can be watched with `docker logs`.

 ## 5. Acceptance Criteria

 - `iklim-db-01` appears as Ready and Active in the `docker node ls` command.
 - `docker stack services iklimco` shows both services with 1/1 replicas.
 - Access from the application node is available through the `iklim-db_postgresql` and `iklim-db_mongodb` DNS names.
- Data is preserved from named volumes after reboot; verify with `docker volume ls`.
+- Data is preserved after reboot according to the stack's configured DB volume/bind-mount layout.
--- a/setup/05-test-runner-and-deploy-prerequisites.md
+++ b/setup/05-test-runner-and-deploy-prerequisites.md
@ -8,7 +8,7 @@ A single runner is used in the test environment for cost and simplicity:

 | Host | Service Name | System User | Labels |
 | --- | --- | --- | --- |
-| `iklim-app-01` | `gitea-act-runner` | `gitea-runner` | `ubuntu-latest`, `ubuntu-22.04`, `ubuntu-20.04`, `test-runner` |
+| `iklim-app-01` | `gitea-act-runner` | `gitea-runner` | `ubuntu-latest`, `ubuntu-22.04`, `ubuntu-20.04`, `test-runner:docker://catthehacker/ubuntu:act-22.04` |

 ## 1. Runner User and Permissions

@ -56,14 +56,15 @@ Critical parts of the configuration:
 ```yaml
 runner:
  labels:
-    - "ubuntu-latest:docker://ubuntu:latest"
-    - "ubuntu-22.04:docker://ubuntu:22.04"
-    - "ubuntu-20.04:docker://ubuntu:20.04"
-    - "test-runner:docker://ubuntu:22.04"
+    - "ubuntu-latest"
+    - "ubuntu-22.04"
+    - "ubuntu-20.04"
+    - "test-runner:docker://catthehacker/ubuntu:act-22.04"

 container:
-  network: "iklimco-net"          # Access to DB services through overlay
-  options: "-v /var/run/docker.sock:/var/run/docker.sock"  # For Docker commands
+  network: "bridge"
+  options: "-v /mnt/storagebox:/mnt/storagebox"
+  docker_host: "unix:///var/run/docker.sock"
 ```

 Status check:
@ -94,7 +95,7 @@ The following secrets must be defined at Gitea Organization level for pipelines

 ## 6. Custom Image Build and Harbor Push

-`docker-stack-infra.yml` and microservice stacks use private images under `registry.tarla.io/iklimco/`. These images are built and pushed to the registry with the `ops/push-harbor-custom-images.sh` script.
+Environment stack files and microservice stacks use private images under `registry.tarla.io/iklimco/`. These images are built and pushed to the registry with the `ops/push-harbor-custom-images.sh` script.

 APISIX config files (`build/apisix-core/config.yaml`, `build/apisix-dashboard/conf.yaml`) are generated from templates under `template/` with `envsubst`. `push-harbor-custom-images.sh` performs this generation internally; temporary files are cleaned automatically when the build finishes.

@ -114,6 +115,6 @@ bash ops/push-harbor-custom-images.sh

 1. The runner labeled `test-runner` appears as **Idle** (green) on the Gitea Runners page.
 2. A workflow using `runs-on: test-runner` is triggered successfully.
-3. The job container can access the Docker daemon and the `iklimco-net` overlay network.
+3. The job can access the Docker daemon through `docker_host`, and deploy workflows connect job containers to `iklimco-net` when overlay access is required.
 4. The `8200/tcp` (Vault) port is closed to the public internet.
 5. `registry.tarla.io/iklimco/custom-apisix`, `custom-apisix-dashboard`, and `custom-prometheus` images exist in Harbor and are pullable.
--- a/setup/06-prod-terraform-iaac.md
+++ b/setup/06-prod-terraform-iaac.md
@ -12,7 +12,7 @@ Terraform creates the following in the prod environment:
  - DB subnet: `10.20.20.0/24`
 - Firewall:
  - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
-  - Private ingress: prod rules in `01-private-network-port-matrisi.md`
+  - Private ingress: prod rules in `01-private-network-port-matrix.md`
 - SSH key
 - Placement groups:
  - `iklim-prod-app-spread`
@ -145,6 +145,13 @@ The following ports will not be opened publicly in prod:

 ## Private Firewall

+Firewall placement follows the Swarm placement model:
+
+- DB/cluster services on `iklim-db-*` nodes: Patroni/PostgreSQL, MongoDB, and etcd.
+- App/service-node infrastructure on `iklim-app-*` nodes: Vault, RabbitMQ, APISIX, Prometheus, Grafana, SWAG, and the Redis/Sentinel services from `docker-stack-infra_db-prod.yml`.
+
+RabbitMQ ports are therefore documented under the app firewall. Redis and Redis Sentinel do not publish host-mode ports in the current prod stack; they stay on the Docker overlay network and do not need Hetzner firewall openings.
+
 ### App (swarm) Firewall — Private Ingress

 Source from app subnet (`10.20.10.0/24`):
@ -340,7 +347,7 @@ Local state is used for now (`terraform.tfstate`). The state file is not committ
 - Swarm nodes are inside the `iklim-prod-app-spread` placement group.
 - DB nodes are inside the `iklim-prod-db-spread` placement group.
 - Public firewall allows only `22`, `80`, and `443` ingress.
-- Private firewall is compatible with `01-private-network-port-matrisi.md`.
+- Private firewall is compatible with `01-private-network-port-matrix.md`.
 - DB replication ports are accessible only from the DB subnet.
 - Floating IP is created and assigned to `iklim-app-01`.
 - Terraform state and secret tfvars are not committed.
--- a/setup/07-prod-ansible-bootstrap.md
+++ b/setup/07-prod-ansible-bootstrap.md
@ -119,6 +119,8 @@ ansible/
        vars.yml
        vault.yml
    prod-bootstrap.yml
+    roles/
+      db_stack/
  roles/
    base/
    hardening/
@ -131,6 +133,8 @@ ansible/
    db_stack/
 ```

+`ansible/prod/ansible.cfg` sets `roles_path = roles:../roles`. Because of that ordering, `ansible/prod/roles/db_stack` is the production-specific role that is used by `prod-bootstrap.yml`; the shared `ansible/roles/db_stack` remains the common fallback/reference implementation. Production DB behavior that writes Patroni, MongoDB, and replica-set auth files to StorageBox belongs to the prod-local role.
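The lookup order described above can be sketched in shell. This only approximates how a `roles_path` list is walked (first match wins) and is not Ansible's actual resolver; `resolve_role` and its arguments are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: walk a colon-separated roles_path relative to a base
# directory and print the first directory that contains the requested role.
resolve_role() {
  local role="$1" roles_path="$2" base="${3:-.}" entry
  local IFS=':'
  for entry in $roles_path; do
    if [ -d "$base/$entry/$role" ]; then
      printf '%s\n' "$base/$entry/$role"
      return 0
    fi
  done
  return 1
}

# Example (assumes the repo layout described above):
#   resolve_role db_stack "roles:../roles" ansible/prod
```

With `roles:../roles`, a role that exists in both places resolves to the prod-local copy, while a role that exists only under `ansible/roles` falls through to the shared one.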
+
 ## Base Role

 Applied to all prod nodes:
@ -200,30 +204,35 @@ Prod Swarm will be set up with 3 managers:
 1. `docker swarm init` on `iklim-app-01` (Advertise/data path addr: `10.20.10.11`)
 2. `iklim-app-02` and `iklim-app-03` join as managers.
 3. `iklim-db-01/02/03` join as workers.
-4. Overlay network is created: `iklimco-net`
+4. `iklimco-net` is not created by the Ansible swarm role. It is created and owned by the Swarm stack (`docker-stack-infra_db-prod.yml`) so Docker embedded DNS works for service VIPs and aliases.
 5. Node labels:
   - `iklim-app-*` -> `type=service`
-   - `iklim-db-*` -> `role=db`, `db-index=01/02/03`, for Patroni node coordination
+   - `iklim-db-*` -> `role=db`
+   - `iklim-db-*` -> `db-index=01/02/03`, for Patroni node coordination
 6. All nodes remain `AVAILABILITY=Active`.

-The `db-index` labels are added through `iklim-app-01` in a separate play inside `prod-bootstrap.yml`, not by the swarm role.
+Labeling is intentionally split across two automation layers:
+
+- The shared `swarm` role adds the generic environment labels: `type=service` on app nodes and `role=db` on DB nodes.
+- The production playbook adds `db-index=01/02/03` through `iklim-app-01` in a separate play inside `prod-bootstrap.yml`.
+
+This split keeps the common Swarm role reusable while letting prod add the Patroni/MongoDB coordination labels it needs.

 ## Node Directory Role

 On all `iklim-app-*` nodes:
 ```text
 /opt/iklimco/ssl
-/opt/iklimco/init
-/opt/iklimco/stacks
-/opt/iklimco/vault/data
 ```

-`/opt/iklimco/vault/data` is the host path volume of the Vault Raft node; it must be created separately on every app node. Swarm does not manage this directory as an overlay volume; if it is missing, the Vault container will not start.
+Vault data is managed by the `docker-stack-vault.yml` stack through Docker volumes. The app nodes need the local SSL directory because `cert-distributor` syncs certificates from StorageBox into `/opt/iklimco/ssl` for Vault.

 On DB nodes:
 ```text
 /opt/iklimco/db
 /opt/iklimco/backup
+/opt/iklimco/db/mongodb
+/opt/iklimco/db/postgresql
 ```

 ## StorageBox DAVFS Mount Role
@ -256,19 +265,22 @@ Applied to `iklim-app-*` nodes. Gitea Act Runner is installed on each app node a

 ## DB Stack Role

-Applied to `iklim-db-*` nodes. On each DB node, it creates `/opt/iklimco/db` and `/opt/iklimco/backup` directories, as well as a local reference directory for MongoDB. The actual production configuration, including node-specific `mongod.conf`, replica set auth key, and Patroni configurations, is set up on StorageBox at `/mnt/storagebox/db/mongodb-0X/config/` and `/mnt/storagebox/db/postgresql-0X/config/` in the `08-prod-db-cluster-kurulum.md` step. etcd data is stored on local Docker named volumes (not StorageBox).
+Applied to `iklim-db-*` nodes. On each DB node, it creates `/opt/iklimco/db`, `/opt/iklimco/backup`, `/opt/iklimco/db/mongodb`, and `/opt/iklimco/db/postgresql`. The production configuration, including node-specific `mongod.conf`, replica set auth key, and Patroni configurations, is deployed by the Ansible `db_stack` role to StorageBox at `/mnt/storagebox/db/mongodb-0X/config/` and `/mnt/storagebox/db/postgresql-0X/config/`. etcd data is stored on local Docker named volumes.

 ## DB Stack Env Variables

-Password variables required by the DB cluster stack (`docker-stack-db.prod.yml`) — `DATABASE_POSTGRES_ROOT_PASSWD`, `DATABASE_POSTGRES_REPLICATOR_PASSWORD`, `DATABASE_MONGODB_ROOT_PASSWD` — are stored in `prod/secrets/iklim.co/.env.secrets.shared` on StorageBox, alongside the other shared secrets. No separate file is needed.
+Password variables required by the prod infra stack (`docker-stack-infra_db-prod.yml`) — including `DATABASE_POSTGRES_ROOT_PASSWD`, `DATABASE_POSTGRES_REPLICATOR_PASSWORD`, `DATABASE_MONGODB_ROOT_PASSWD`, and `ETCD_ROOT_PASSWORD` — are stored in `prod/secrets/iklim.co/.env.secrets.shared` on StorageBox, alongside the other shared secrets. No separate file is needed.

 ## StorageBox Directory Structure

 The `storagebox` Ansible role **automatically** creates the following directories during bootstrap via `storagebox_managed_directories` (`group_vars/all/vars.yml`). No manual step is required:

 - `/mnt/storagebox/ssl` → `SWAG_CERT_DIR`
-- `/mnt/storagebox/swag/config` → `SWAG_CONFIG_DIR`
+- `/mnt/storagebox/swag`
+- `/mnt/storagebox/swag/dns-conf` → `SWAG_DNS_CONFIG_DIR`
 - `/mnt/storagebox/swag/site-confs` → `SWAG_SITE_CONFS_DIR`
+- `/mnt/storagebox/swag/proxy-confs` → `SWAG_PROXY_CONFS_DIR`
+- `/mnt/storagebox/swag/certbot`
 - `/mnt/storagebox/grafana/data` → `GRAFANA_DATA_DIR`
 - `/mnt/storagebox/precipitation/images`

@ -300,12 +312,12 @@ grep -n "swarm init\|swarm join" init/swarm-init.sh
 - 3 Swarm manager nodes appear as Leader/Reachable in `docker node ls`.
 - 3 DB nodes appear as Workers in `docker node ls`.
 - Manager quorum is provided: 3 managers, 1 loss tolerated.
-- The `iklimco-net` overlay network exists.
+- The `iklimco-net` overlay network is created by the Swarm stack after `docker-stack-infra_db-prod.yml` deploy.
 - Node labels (`type=service`, `role=db`, `db-index=01/02/03`) are verified with inspect.
 - `swarm-init.sh` does not attempt init again in an active Swarm; it is idempotent.
 - `/mnt/storagebox` is mounted on every node.
-- The `/opt/iklimco/vault/data` directory exists on every app node.
-- The `ssl`, `swag/config`, `swag/site-confs`, `grafana/data`, and `precipitation/images` directories exist on StorageBox.
+- The `/opt/iklimco/ssl` directory exists on every app node.
+- The `db`, `ssl`, `swag`, `swag/dns-conf`, `swag/site-confs`, `swag/proxy-confs`, `swag/certbot`, `grafana/data`, and `precipitation/images` directories exist on StorageBox.
 - The Gitea Act Runner service is running on every app node.
-- `/opt/iklimco/db` and `/opt/iklimco/backup` directories exist on DB nodes. Node-specific `mongod.conf` and other DB configurations are created on StorageBox (`/mnt/storagebox/db/...`) in the `08-prod-db-cluster-kurulum.md` step.
+- `/opt/iklimco/db` and `/opt/iklimco/backup` directories exist on DB nodes. Node-specific `mongod.conf` and other DB configurations are created on StorageBox (`/mnt/storagebox/db/...`) in the `08-prod-db-cluster-setup.md` step.
 - Public firewall allows only `22`, `80`, and `443` ingress.
--- a/setup/08-prod-db-cluster-kurulum.md
+++ b/setup/08-prod-db-cluster-kurulum.md
@ -27,7 +27,9 @@ iklim-db-03  (Swarm worker, 10.20.20.13)
    patroni-03   [Patroni + PostgreSQL — standby]
 ```

-DB containers discover each other through **overlay DNS aliases** (`mongodb-01`, `etcd-01`, `patroni-01`, etc.) on the shared `iklimco-net` overlay network. Each service publishes its port in `host` mode so replication traffic goes directly through the Hetzner private network while the overlay DNS resolves service names correctly. All containers are defined in the single `docker-stack-db.prod.yml` stack file at the repo root.
+DB containers discover each other through **overlay DNS aliases** (`mongodb-01`, `etcd-01`, `patroni-01`, etc.) on the shared `iklimco-net` overlay network. Patroni/PostgreSQL, MongoDB, and etcd are the DB/cluster services covered by this document; they publish their cluster ports in `host` mode so replication traffic goes directly through the Hetzner private network while overlay DNS resolves service names correctly.
+
+The current prod DB services are defined in the root `docker-stack-infra_db-prod.yml` stack file. That stack also contains non-DB infrastructure services such as Redis, Redis Sentinel, and RabbitMQ. Those services are intentionally different: they run on `node.labels.type == service` app/service nodes, do not publish host-mode ports in this stack, and communicate through the `iklimco-net` overlay network only. Do not generalize the DB host-mode rule to Redis or RabbitMQ.

 ## 1. Firewall Update

@ -145,6 +147,10 @@ terraform apply

 ## 2. Add DB Nodes to Swarm

+This is handled by `Environment_Infrastructure/ansible/prod/prod-bootstrap.yml` through the `swarm` role. The role initializes Swarm on `iklim-app-01`, joins `iklim-app-02/03` as managers, joins `iklim-db-01/02/03` as workers, and labels DB nodes.
+
+Manual equivalent, kept for troubleshooting only:
+
 Get the join token **on one of the Swarm managers** (iklim-app-01):

 ```bash
@ -157,19 +163,35 @@ docker swarm join-token worker
 docker swarm join --token <TOKEN> 10.20.10.11:2377
 ```

-Label the nodes **on iklim-app-01**:
+Label the nodes **on iklim-app-01**. In automation this is split into two phases:
+
+- the shared `swarm` role adds `role=db` to DB nodes;
+- the prod-specific `prod-bootstrap.yml` play adds `db-index=01/02/03`.
+
+Manual equivalent:

 ```bash
-docker node update --label-add role=db --label-add db-index=01 iklim-db-01
-docker node update --label-add role=db --label-add db-index=02 iklim-db-02
-docker node update --label-add role=db --label-add db-index=03 iklim-db-03
+docker node update --label-add role=db iklim-db-01
+docker node update --label-add role=db iklim-db-02
+docker node update --label-add role=db iklim-db-03
+
+docker node update --label-add db-index=01 iklim-db-01
+docker node update --label-add db-index=02 iklim-db-02
+docker node update --label-add db-index=03 iklim-db-03

 docker node ls
 ```

 ## 3. StorageBox Directory Structure

-DB data and logs are stored on **local Docker named volumes** (performance, WAL/compaction requirements). Only config files are placed on StorageBox. On each DB node, where `/mnt/storagebox` must already be mounted:
+DB data is stored on local DB-node paths prepared by Ansible:
+
+```text
+/opt/iklimco/db/mongodb
+/opt/iklimco/db/postgresql
+```
+
+Configuration files are placed on StorageBox. On each DB node, where `/mnt/storagebox` must already be mounted:

 ```bash
 # On iklim-db-01:
@ -185,7 +207,7 @@ mkdir -p /mnt/storagebox/db/mongodb-03/config
 mkdir -p /mnt/storagebox/db/postgresql-03/config
 ```

-Config files (`mongod.conf`, `patroni.yml`) are deployed by the Ansible `db_stack` role into these directories. Named Docker volumes (`mongodb-01-data`, `etcd-01-data`, `postgresql-01-data`, etc.) are created automatically by the stack deploy.
+Config files (`mongod.conf`, `patroni.yml`) and the MongoDB replica set key are deployed by the Ansible `db_stack` role into these directories. etcd uses Docker named volumes (`etcd-01-data`, `etcd-02-data`, `etcd-03-data`) from `docker-stack-infra_db-prod.yml`.

 ## 4. MongoDB Replica Set

@ -216,14 +238,18 @@ security:

 ### Replica Set Auth Key

-The **same** key file must exist on all DB nodes:
+The **same** key file must exist on all DB nodes. In the current production setup, this is automated by `ansible/prod/roles/db_stack/tasks/db_node.yml`:
+
+- `iklim-db-01` generates `/mnt/storagebox/db/mongodb-01/config/rs-auth.key` if it is missing.
+- The same key content is copied to `/mnt/storagebox/db/mongodb-02/config/rs-auth.key` and `/mnt/storagebox/db/mongodb-03/config/rs-auth.key`.
+- Permissions are set to `0400`.
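A quick way to confirm that the three key files really are identical is a byte-for-byte comparison. The sketch below is an assumption about how such a check could look, not part of the `db_stack` role; `keys_match` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical check: compare every given key file against the first one.
# Returns non-zero and names the first file that differs.
keys_match() {
  local first="$1" other
  shift
  for other in "$@"; do
    if ! cmp -s "$first" "$other"; then
      echo "mismatch: $other" >&2
      return 1
    fi
  done
  return 0
}

# Example, run on a node where StorageBox is mounted:
#   keys_match /mnt/storagebox/db/mongodb-0{1,2,3}/config/rs-auth.key
```

MongoDB members refuse to authenticate to each other when key contents differ, so a check like this is useful after any manual recovery.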
+
+Manual recovery equivalent, kept only for troubleshooting:

 ```bash
-# Create on iklim-db-01:
 openssl rand -base64 756 > /mnt/storagebox/db/mongodb-01/config/rs-auth.key
 chmod 400 /mnt/storagebox/db/mongodb-01/config/rs-auth.key

-# Copy the same content to the other nodes:
 cat /mnt/storagebox/db/mongodb-01/config/rs-auth.key \
  > /mnt/storagebox/db/mongodb-02/config/rs-auth.key
 cat /mnt/storagebox/db/mongodb-01/config/rs-auth.key \
@ -234,14 +260,16 @@ chmod 400 /mnt/storagebox/db/mongodb-0{2,3}/config/rs-auth.key

 ### Stack File — MongoDB

-MongoDB services are defined in `docker-stack-db.prod.yml` (repo root). Each service uses a named Docker volume for data and log, and a StorageBox bind mount for config:
+MongoDB services are defined in `docker-stack-infra_db-prod.yml` (repo root). Each service uses a local DB-node bind mount for data and a StorageBox bind mount for config:

 ```yaml
 mongodb-01:
-  image: mongo:8.3.2
+  image: ${IMAGE_MONGODB}
+  environment:
+    MONGO_INITDB_ROOT_USERNAME: "${DATABASE_MONGODB_ROOT_USER}"
+    MONGO_INITDB_ROOT_PASSWORD: "${DATABASE_MONGODB_ROOT_PASSWD}"
  volumes:
-    - mongodb-01-data:/data/db
-    - mongodb-01-log:/data/log
+    - /opt/iklimco/db/mongodb:/data/db
    - /mnt/storagebox/db/mongodb-01/config:/data/configdb
  networks:
    iklimco-net:
@ -260,11 +288,18 @@ mongodb-01:
        - node.hostname == iklim-db-01
 ```

-Volumes `mongodb-01-data`, `mongodb-01-log`, etc. are declared at the bottom of `docker-stack-db.prod.yml` and are created automatically on first deploy.
+The same pattern is repeated for `mongodb-02` and `mongodb-03`, with node-specific StorageBox config paths and placement constraints.

 ### Replica Set Initialization

-Run **once** after the stack is deployed:
+Replica set initialization is handled by the root prod workflow step `Initialize MongoDB Replica Set`. The workflow:
+
+1. Connects to the first host from `DATABASE_MONGODB_HOST`.
+2. Runs `rs.initiate()` if the replica set is uninitialized.
+3. Checks current members if the replica set already exists.
+4. Runs `rs.add()` through the primary if hosts from `DATABASE_MONGODB_HOST` are missing.
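The "missing member" decision in step 4 can be sketched as a small shell function. This is an illustration under assumptions, not the workflow's actual code: `missing_members` is a hypothetical name, and both arguments are assumed to be comma-separated `host:port` lists in the same format as `DATABASE_MONGODB_HOST`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each host from the desired list that is not
# present in the current replica set member list (one host per line).
missing_members() {
  local desired="$1" current="$2" host
  local IFS=','
  for host in $desired; do
    case ",$current," in
      *",$host,"*) ;;                 # already a member, nothing to do
      *) printf '%s\n' "$host" ;;     # candidate for rs.add()
    esac
  done
}

# Example:
#   missing_members "$DATABASE_MONGODB_HOST" "mongodb-01:27017"
```

Each printed host would then be passed to `rs.add()` on the primary; when nothing is printed, the replica set already matches the configured host list.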
+
+Manual equivalent, kept for troubleshooting only:

 ```bash
 # On iklim-app-01 (for overlay network access):
@ -293,7 +328,7 @@ Patroni coordinates PostgreSQL primary/standby roles through etcd. If the primar

 ### 5.1 Custom Image (Patroni + PostGIS)

-Patroni is installed on top of the `postgis/postgis:18-3.6` image. This image is pushed to Harbor and used in the stack.
+Patroni is installed on top of the `postgis/postgis:18-3.6` image. This image is pushed to Harbor and used in `docker-stack-infra_db-prod.yml` via `${CUSTOM_IMAGE_REGISTRY}${IMAGE_PATRONI}`.

 `build/patroni-postgis/Dockerfile`:

@ -335,13 +370,13 @@ docker push registry.tarla.io/iklimco/custom-patroni-postgis:18-3.6

 ### 5.2 etcd Cluster

-etcd services are defined in `docker-stack-db.prod.yml`. Each service uses a named Docker volume for data and has an overlay DNS alias. Environment variables reference peer URLs by alias, not by hardcoded IP:
+etcd services are defined in `docker-stack-infra_db-prod.yml`. Each service uses a named Docker volume for data and has an overlay DNS alias. Environment variables reference peer URLs by alias, not by hardcoded IP:

 ```yaml
 etcd-01:
-  image: bitnami/etcd:3
+  image: ${IMAGE_ETCD}
  environment:
-    ALLOW_NONE_AUTHENTICATION: "yes"
+    ALLOW_NONE_AUTHENTICATION: "no"
    ETCD_NAME: etcd-01
    ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd-01:2380
    ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
@ -350,6 +385,7 @@ etcd-01:
    ETCD_INITIAL_CLUSTER: "etcd-01=http://etcd-01:2380,etcd-02=http://etcd-02:2380,etcd-03=http://etcd-03:2380"
    ETCD_INITIAL_CLUSTER_STATE: new
    ETCD_INITIAL_CLUSTER_TOKEN: iklimco-etcd-prod
+    ETCD_ROOT_PASSWORD: "${ETCD_ROOT_PASSWORD}"
  volumes:
    - etcd-01-data:/bitnami/etcd/data
  networks:
@ -366,7 +402,7 @@ etcd-01:

 **APISIX etcd usage:** In prod, APISIX shares this etcd cluster with the `/apisix` prefix. Patroni uses the `/service/` prefix and APISIX uses the `/apisix/` prefix — no collision. The overlay DNS names (`etcd-01:2379`, `etcd-02:2379`, `etcd-03:2379`) are reachable from app nodes via the `iklimco-net` overlay. Therefore, the app subnet → DB nodes port 2379 firewall rule is mandatory; it was added in Section 1.

-**Important:** `ETCD_INITIAL_CLUSTER_STATE` must be `new` on the first deploy and `existing` on all later deploys. The deploy steps in Section 6 detect this automatically; no manual update is required.
+**Important:** `ETCD_INITIAL_CLUSTER_STATE` is currently defined in `docker-stack-infra_db-prod.yml`. When changing etcd cluster membership, do not blindly expand `ETCD_INITIAL_CLUSTER` on a running cluster; add members through etcd membership operations first.
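The alias-based peer list follows a regular pattern, so it can be generated rather than typed by hand. The helper below is a sketch with a hypothetical name (`initial_cluster`); it only builds the string and does not change a running cluster. For a live cluster, the membership change itself would still be done first with etcd's own tooling (for example `etcdctl member add`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an ETCD_INITIAL_CLUSTER value from member names,
# assuming each member is reachable at http://<name>:2380 via its overlay alias.
initial_cluster() {
  local name out=""
  for name in "$@"; do
    out="${out:+$out,}${name}=http://${name}:2380"
  done
  printf '%s\n' "$out"
}

# Example:
#   initial_cluster etcd-01 etcd-02 etcd-03
```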

 ### 5.3 Patroni Configuration

@ -447,17 +483,19 @@ For Node 02 and 03, only `name`, `restapi.connect_address`, and `postgresql.conn

 ### 5.4 Stack File — Patroni

-Patroni services are defined in `docker-stack-db.prod.yml`. Each service uses the custom image, a named Docker volume for data, a StorageBox bind mount for the config file, and overlay DNS aliases:
+Patroni services are defined in `docker-stack-infra_db-prod.yml`. Each service uses the custom image, a local DB-node bind mount for data, a StorageBox bind mount for the config file, and overlay DNS aliases:

 ```yaml
 patroni-01:
-  image: registry.tarla.io/iklimco/custom-patroni-postgis:18-3.6
+  image: ${CUSTOM_IMAGE_REGISTRY}${IMAGE_PATRONI}
  environment:
-    DATABASE_POSTGRES_ROOT_PASSWD: "${DATABASE_POSTGRES_ROOT_PASSWD}"
-    DATABASE_POSTGRES_REPLICATOR_PASSWORD: "${DATABASE_POSTGRES_REPLICATOR_PASSWORD}"
+    POSTGRES_USER: "${DATABASE_POSTGRES_ROOT_USER}"
+    POSTGRES_PASSWORD: "${DATABASE_POSTGRES_ROOT_PASSWD}"
+    REPLICATOR_PASSWORD: "${DATABASE_POSTGRES_REPLICATOR_PASSWORD}"
+    ETCD_ROOT_PASSWORD: "${ETCD_ROOT_PASSWORD}"
    TZ: "Europe/Istanbul"
  volumes:
-    - postgresql-01-data:/var/lib/postgresql/data
+    - /opt/iklimco/db/postgresql:/var/lib/postgresql/data
    - /mnt/storagebox/db/postgresql-01/config/patroni.yml:/etc/patroni/patroni.yml:ro
  networks:
    iklimco-net:
@ -480,7 +518,7 @@ patroni-01:
        - node.hostname == iklim-db-01
 ```

-Volumes `postgresql-01-data`, `postgresql-02-data`, `postgresql-03-data` are declared at the bottom of `docker-stack-db.prod.yml` and created automatically on first deploy.
+The same pattern is repeated for `patroni-02` and `patroni-03`, with node-specific StorageBox config paths and placement constraints.

 ### 5.5 Status Check

@ -508,11 +546,11 @@ docker exec -it $(docker ps -q -f name=iklimco_patroni-01 | head -1) \

 ## 6. Deploy

-All DB services (etcd, MongoDB, Patroni) are in the single `docker-stack-db.prod.yml` stack. Deploy from `iklim-app-01` in the repo working directory.
+All DB services (etcd, MongoDB, Patroni) are in the current root prod stack `docker-stack-infra_db-prod.yml`. Normal deployment is done by `.gitea/workflows/deploy-prod.yml`, not by running a separate DB stack manually.

 ### .env File

-DB stack password variables (`DATABASE_POSTGRES_ROOT_PASSWD`, `DATABASE_POSTGRES_REPLICATOR_PASSWORD`, `DATABASE_MONGODB_ROOT_PASSWD`) are stored in `prod/secrets/iklim.co/.env.secrets.shared` on StorageBox. Fetch it to `iklim-app-01` before deploy:
+DB stack password variables (`DATABASE_POSTGRES_ROOT_PASSWD`, `DATABASE_POSTGRES_REPLICATOR_PASSWORD`, `DATABASE_MONGODB_ROOT_PASSWD`, `ETCD_ROOT_PASSWORD`) are stored in `prod/secrets/iklim.co/.env.secrets.shared` on StorageBox. The workflow fetches this file automatically; the manual fetch below is kept for troubleshooting only:

 ```bash
 scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.secrets.shared \
@ -522,44 +560,18 @@ chmod 600 /tmp/.env.secrets.shared

 ### Deploy Steps

+The root prod workflow deploys the stack with:
+
 ```bash
-# On iklim-app-01, in the repo working directory:
-set -a; . /tmp/.env.secrets.shared; set +a
-
-# Automatic ETCD_INITIAL_CLUSTER_STATE detection:
-DEPLOY_FILE="docker-stack-db.prod.yml"
-if docker service ls --filter name=iklimco_etcd-01 -q 2>/dev/null | grep -q .; then
-  echo "ℹ️ etcd services exist, deploying with 'existing'..."
-  DEPLOY_FILE=$(mktemp /tmp/docker-stack-db.XXXXXX.yml)
-  sed "s/ETCD_INITIAL_CLUSTER_STATE: new/ETCD_INITIAL_CLUSTER_STATE: existing/g" \
-    docker-stack-db.prod.yml > "$DEPLOY_FILE"
-else
-  echo "ℹ️ First deploy, using 'new' state..."
-fi
-
 docker stack deploy \
  --with-registry-auth \
-  -c "$DEPLOY_FILE" \
+  --resolve-image changed \
+  -c docker-stack-infra_db-prod.yml \
  iklimco
-
-[ "$DEPLOY_FILE" != "docker-stack-db.prod.yml" ] && rm -f "$DEPLOY_FILE"
-
-# Wait for etcd cluster to be ready:
-echo "⏳ Waiting for etcd..."
-for i in $(seq 1 18); do
-  if docker run --rm --network iklimco-net alpine \
-      sh -c "wget -qO- http://etcd-01:2379/health 2>/dev/null | grep -q '\"health\":\"true\"'"; then
-    echo "✅ etcd ready"
-    break
-  fi
-  [ "$i" -eq 18 ] && echo "❌ etcd timeout" && exit 1
-  echo "  attempt $i/18, waiting 10s..."
-  sleep 10
-done
-
-docker stack services iklimco
 ```

+After the stack deploy, the workflow waits for etcd, initializes APISIX, initializes the MongoDB replica set, and runs PostgreSQL/MongoDB init scripts.
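The "wait for etcd" step is essentially a bounded retry loop. The sketch below shows one way to express it; `retry` is a hypothetical helper, and the health probe in the usage comment is borrowed from the earlier manual deploy steps in this document, so it may not match what the workflow runs today.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds between failed attempts. Returns 0 on the first success, 1 otherwise.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (assumed probe, 18 attempts with 10s pauses):
#   etcd_healthy() {
#     docker run --rm --network iklimco-net alpine \
#       sh -c "wget -qO- http://etcd-01:2379/health | grep -q '\"health\":\"true\"'"
#   }
#   retry 18 10 etcd_healthy || { echo "etcd timeout" >&2; exit 1; }
```

Keeping the loop generic lets the same helper gate later steps, such as waiting for Vault or RabbitMQ, with a different probe command.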
+
 ### DB Node Placement Check

 ```bash
@ -572,7 +584,7 @@ All tasks must run on the expected `iklim-db-*` nodes.

 ### MongoDB Replica Set Initialization

-Run once after the stack is deployed:
+Handled by the workflow. Manual form for troubleshooting:

 ```bash
 # From iklim-app-01 via overlay network:
@ -596,7 +608,7 @@ App containers connect to DB services through the `iklimco-net` overlay network

 ### MongoDB Replica Set Connection String

-Variables in `env-prod/.env`:
+Variables in StorageBox `prod/secrets/iklim.co/.env`:

 ```bash
 DATABASE_MONGODB_HOST=mongodb-01:27017,mongodb-02:27017,mongodb-03:27017
@ -613,7 +625,7 @@ mongodb://<user>:<password>@mongodb-01:27017,mongodb-02:27017,mongodb-03:27017/<

 ### PostgreSQL — Patroni

-Variables in `env-prod/.env`:
+Variables in StorageBox `prod/secrets/iklim.co/.env`:

 ```bash
 DATABASE_POSTGRES_HOST=patroni-01:5432,patroni-02:5432,patroni-03:5432
@ -647,8 +659,7 @@ curl -s http://patroni-01:8008/primary
 `pg-proxy` and `mongo-proxy` are **not used** in the prod cluster setup. Access from an office computer targets the DB subnet directly.
 
 ### WireGuard Setting
-`AllowedIPs` must be updated in the `.conf` file on the office computer:
-`AllowedIPs = 10.8.0.1/32, 10.20.20.0/24`
+`AllowedIPs` must be updated in the `.conf` file on the office computer: `AllowedIPs = 10.8.0.1/32, 10.20.20.0/24`
 
 ### Connection Parameters (Multi-Host)
 Modern database tools (DBeaver, Compass, etc.) should connect in a cluster-aware way:
@ -660,7 +671,7 @@ Modern veritabanı araçları (DBeaver, Compass vb.) küme farkındalıklı bağ

 ## Acceptance Criteria

-- `docker stack services iklimco` — 9 services visible (etcd-01/02/03, mongodb-01/02/03, patroni-01/02/03), all `1/1`
+- `docker stack services iklimco` — etcd-01/02/03, mongodb-01/02/03, patroni-01/02/03 are visible and all target replicas are healthy
 - `docker service ps iklimco_patroni-01/02/03` — each task runs on its expected `iklim-db-*` node
 - `docker service ps iklimco_mongodb-01/02/03` — each task runs on its expected `iklim-db-*` node
 - `docker service ps iklimco_etcd-01/02/03` — each task runs on its expected `iklim-db-*` node
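The replica check in the first criterion can be automated by filtering the output of `docker stack services`. The function below is a sketch; `unhealthy_services` is a hypothetical name, and it assumes input lines of the form `<name> <current>/<desired>`.

```bash
#!/usr/bin/env bash
# Hypothetical filter: read "<service> <current>/<desired>" lines on stdin and
# print the names of services whose current replica count differs from desired.
unhealthy_services() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Example, run on a Swarm manager:
#   docker stack services --format '{{.Name}} {{.Replicas}}' iklimco | unhealthy_services
```

An empty result means every listed service has reached its target replica count.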
--- a/setup/09-prod-runner-ha-and-swarm.md
+++ b/setup/09-prod-runner-ha-and-swarm.md
@ -16,7 +16,7 @@ In this model, if any manager/runner is lost, the other runners can pick up pipe

 ## Runner Installation Model

-The runner will not run as a Docker container. There is no Docker socket mount.
+The runner will not run as a Docker container. It runs as a systemd service on the app nodes. Job containers start on Docker `bridge`; deploy workflows connect the job container to `iklimco-net` after the stack creates that network.

 Installation:

@ -33,7 +33,7 @@ If runner jobs use Docker CLI for deploy, the `gitea-runner` user needs access t
 Shared labels on all prod runners:

 ```text
-prod-runner
+prod-runner:docker://catthehacker/ubuntu:act-22.04
 ubuntu-24.04
 ```

@ -86,20 +86,19 @@ For the GoDaddy API key: https://developer.godaddy.com/keys — create a **Produ

 ### Gitea `PROD_FLOATING_IP` Variable

-For DNS automation, `PROD_FLOATING_IP` must be defined as a Gitea project variable. See the "Gitea Variable: PROD_FLOATING_IP" step in `06-prod-terraform-iaac.md`.
+For DNS automation, `PROD_FLOATING_IP` must be defined as a Gitea project variable. See the "Gitea Variable: PROD_FLOATING_IP" step in `06-prod-terraform-iac.md`.

 ### Docker Secrets

-Before the infra stack is deployed, the following Docker secrets must be created on `iklim-app-01`. These secrets are referenced by `docker-stack-infra.prod.yml`; if they do not exist, stack deploy fails.
+Before the infra stack is deployed, `rabbitmq_erlang_cookie` must exist as a Docker secret. The current prod workflow creates it in the `Create Infrastructure Docker Secrets` step if it is missing.

 ```bash
-# RabbitMQ Erlang cluster cookie; must be the same on all RabbitMQ nodes:
+# RabbitMQ Erlang cluster cookie; must be the same on all RabbitMQ nodes.
+# The workflow does this automatically if the secret is missing:
 openssl rand -hex 32 | docker secret create rabbitmq_erlang_cookie -
 ```
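The "create only if missing" behavior can be expressed as a small idempotent function. This is a sketch of the idea rather than the workflow's real step; `ensure_erlang_cookie_secret` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the Docker secret only when it does not exist.
# Safe to run repeatedly; an existing secret is never overwritten.
ensure_erlang_cookie_secret() {
  local name="${1:-rabbitmq_erlang_cookie}"
  if docker secret inspect "$name" >/dev/null 2>&1; then
    echo "secret $name already exists"
    return 0
  fi
  openssl rand -hex 32 | docker secret create "$name" - >/dev/null
  echo "secret $name created"
}

# Example, run on a Swarm manager:
#   ensure_erlang_cookie_secret
```

Leaving an existing secret untouched matters here because all RabbitMQ nodes must share the same cookie; regenerating it on every deploy would break the cluster.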

-> The `vault_unseal_key` secret is created after Vault is started for the first time; see `roadmap/prod-env/07-vault-raft-plan.md` Step 3. It is not required for the first infra stack deploy; it is waited for until the health check is triggered.
->
-> This secret is also used during Vault restarts triggered by cert-reloader: when `cert-reloader` detects a certificate change, it runs `docker service update --force iklimco_vault`; while Vault containers restart, they read from the `vault_unseal_key` Docker secret and automatically unseal. If the secret is missing, Vault remains sealed after every certificate renewal.
+> The `vault_unseal_key` secret is managed by `init/vault/vault-bootstrap.sh`. The bootstrap script creates a placeholder on first deploy, deploys `docker-stack-vault.yml`, initializes/unseals Vault, and rotates the secret to the real unseal key.

 Verify secrets:

@ -120,7 +119,7 @@ Before the deploy pipeline runs, the following template files must exist in the

 These files are created in the test environment (`test-env/04-swag-nginx-configs.md`); they are not created separately for prod. Template files are shared by both environments; prod-specific values are injected with environment variables during deploy.

-Verify that the `prod/secrets/iklim.co/.env.prod` file on StorageBox contains the following variables:
+Verify that the `prod/secrets/iklim.co/.env` file on StorageBox contains the following variables:

 ```bash
 API_SUBDOMAIN=api.iklim.co
@ -129,11 +128,12 @@ RABBITMQ_SUBDOMAIN=rabbitmq.iklim.co
 GRAFANA_SUBDOMAIN=grafana.iklim.co
 RESTRICTED_IPS="78.187.87.109/32,95.70.151.248/32"
 SWAG_CERT_DIR=/mnt/storagebox/ssl
-SWAG_CONFIG_DIR=/mnt/storagebox/swag/config
+SWAG_DNS_CONFIG_DIR=/mnt/storagebox/swag/dns-conf
 SWAG_SITE_CONFS_DIR=/mnt/storagebox/swag/site-confs
+SWAG_PROXY_CONFS_DIR=/mnt/storagebox/swag/proxy-confs
 ```

-The pipeline sources these variables and renders the template files into the `$SWAG_SITE_CONFS_DIR` (`/mnt/storagebox/swag/site-confs`) directory. Because StorageBox is mounted commonly on all app nodes, even if the configuration is created on a single runner, SWAG containers on other nodes access the same files. Detail: `roadmap/prod-env/04-swag-nginx-configs.md`.
+The pipeline sources these variables and renders the template files into the `$SWAG_SITE_CONFS_DIR` (`/mnt/storagebox/swag/site-confs`) directory. Because StorageBox is mounted on every app node, SWAG containers on other nodes read the same files even when the configuration is rendered on a single runner.

 ### APISIX Configuration

@ -194,27 +194,41 @@ All prod deploy workflows, including infra and microservices, must use the same
 | 2 | Prepare Folders | |
 | 3 | Set up SSH Key and Add to known_hosts | |
 | 4 | Update Apt Repository and Install Required Tools | `gettext tree jq` — `jq` is required for the GoDaddy DNS API |
-| 5 | Fetch Service Secret Files | Fetch `.env.secrets.*` from StorageBox |
-| 6 | Initialize Workspace | Fetch `.env` and `.env.secrets.shared` from StorageBox; run `init-infra-dev.sh` |
-| 7 | Upload Updated Secrets to Storagebox | |
-| 8 | Provision Vault AppRole IDs and Docker Secrets | |
-| 9 | Upload Updated Env to Storagebox | |
-| 10 | Prepare Init Files | Cert copy lines removed |
-| 11 | Initialize Docker Swarm | |
-| 12 | Docker Login to Harbor | |
-| 13 | **Update DNS Records** * | GoDaddy API; `api/apigw/rabbitmq/grafana` A records; idempotent |
-| 14 | **Prepare SWAG Directories** * | `$SWAG_CONFIG_DIR/dns-conf`; renders nginx conf templates; reloads running SWAG |
-| 15 | Bootstrap Vault TLS Placeholder | |
-| 16 | Deploy Swarm Stack | base + prod overlay together |
-| 17 | **Wait for etcd** * | Waits until Patroni etcd (`etcd-01:2379`) is healthy |
-| 18 | **Run APISIX Init** * | `SPRING_PROFILES_ACTIVE=prod`; idempotent; writes to etcd |
-| 19 | **Bootstrap SWAG Certificate** * | Waits for SWAG to obtain the cert; copies it to `SWAG_CERT_DIR` |
-| 20 | **Run Database Init Scripts** * | `postgresql`/`mongodb` Swarm VIP; SQL+JS init; idempotent |
-| 21 | Review Environment | |
+| 5 | Fetch Prod Env From Storagebox | Fetch `.env` and `.env.secrets.shared` |
+| 6 | Fetch Service Secret Files | Fetch `.env.secrets.<svc>` and `.env.secrets.swag` |
+| 7 | Prepare Database Init Files | Render PostgreSQL/MongoDB init templates |
+| 8 | Docker Login to Harbor | |
+| 9 | Prepare SWAG Directories | Render `dns-conf` and `site-confs`; reload node-local SWAG if present |
+| 10 | Bootstrap Vault TLS Placeholder | Creates temporary cert only if missing |
+| 11 | Create Infrastructure Docker Secrets | Creates `rabbitmq_erlang_cookie` if missing |
+| 12 | Deploy Swarm Stacks | `docker-stack-infra_db-prod.yml` |
+| 13 | Connect Runner to Overlay Network | Connects job container to `iklimco-net` |
+| 14 | Initialize Production Infrastructure | Runs `init-infra-prod.sh`; this triggers Vault bootstrap and RabbitMQ setup |
+| 15 | Wait for Infrastructure Services | Waits for `iklimco_vault` and `iklimco_rabbitmq` |
+| 16 | Provision Vault AppRole IDs and Docker Secrets | Downloads service `vault-files`, runs `init/provision-all-services.sh` |
+| 17 | Upload Updated Secrets to Storagebox | Uploads `.env.secrets.*` and `.env` |
+| 18 | Wait for etcd | Waits for etcd health |
+| 19 | Run APISIX Init | `SPRING_PROFILES_ACTIVE=prod` |
+| 20 | Bootstrap SWAG Certificate | Waits for SWAG and cert-reloader output in `SWAG_CERT_DIR` |
+| 21 | Initialize MongoDB Replica Set | `rs.initiate()` or missing-member `rs.add()` |
+| 22 | Run Database Init Scripts | Patroni primary + MongoDB replica set; SQL+JS init |
+| 23 | Update DNS Records | GoDaddy API; `api/apigw/rabbitmq/grafana` A records |
+| 24 | Review Environment | |

-### Removal of Cert Scp Lines
+### Stack Placement Boundary

-Lines removed from the `Initialize Workspace` step:
+`docker-stack-infra_db-prod.yml` is intentionally a mixed infrastructure stack. The DB/cluster services in that file are placed on DB nodes and expose host-mode cluster ports:
+
+- Patroni/PostgreSQL, MongoDB, and etcd run on `iklim-db-*` workers.
+
+The service-node infrastructure in the same file remains overlay-only unless a reverse proxy or explicit published port is defined by the stack:
+
+- Redis, Redis Sentinel, and RabbitMQ run on app/service nodes (`node.labels.type == service`).
+- Redis and RabbitMQ must not be treated as DB-node host-mode services.
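
The placement boundary above can be spot-checked from a manager node. The following is a minimal sketch, not part of the workflow: the helper names `service_nodes` and `assert_placement` are hypothetical, and the service names in the usage note are assumptions that must be replaced with the names reported by `docker service ls`.

```shell
# Print the nodes currently running tasks of one Swarm service, one per line.
service_nodes() {
  docker service ps "$1" --filter 'desired-state=running' --format '{{.Node}}' | sort -u
}

# Succeed only if the service has running tasks and every one of them sits
# on a node whose hostname starts with the given prefix.
# $1 = service name (assumed), $2 = hostname prefix such as "iklim-db-"
assert_placement() {
  service=$1; prefix=$2
  nodes=$(service_nodes "$service")
  if [ -z "$nodes" ]; then
    echo "FAIL $service: no running tasks found" >&2
    return 1
  fi
  bad=$(printf '%s\n' "$nodes" | grep -v "^${prefix}" || true)
  if [ -n "$bad" ]; then
    echo "FAIL $service: tasks outside ${prefix}*: $(echo $bad)" >&2
    return 1
  fi
  echo "OK $service"
}
```

For example, `assert_placement iklimco_etcd iklim-db-` and `assert_placement iklimco_redis iklim-app-` would express the two halves of the boundary, assuming those service names and hostname prefixes match the deployed stack.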
+
+### Historical Note: Removed Cert Scp Lines
+
+Older workflow versions copied certificate files manually in an `Initialize Workspace` step. That step no longer exists in the current root prod workflow. The removed lines are kept here only as a historical reference:

 ```yaml
 # REMOVED — manual cert copy with scp is no longer required:
@ -222,7 +236,7 @@ scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebo
 scp -P 23 ${{ vars.STORAGEBOX_USER }}@${{ vars.STORAGEBOX_USER }}.your-storagebox.de:prod/app/iklim.co/ssl/STAR.iklim.co_key.pem ./STAR.iklim.co_key.pem
 ```

-Line also removed from the `Prepare Init Files` step:
+This line was also removed from the old `Prepare Init Files` step:

 ```yaml
 # REMOVED:
@ -231,97 +245,55 @@ sudo cp STAR.iklim.co.full.crt STAR.iklim.co_key.pem /opt/iklimco/ssl/

 The certificate is now obtained by SWAG from Let's Encrypt and written to the `SWAG_CERT_DIR` (`/mnt/storagebox/ssl/`) directory in the `Bootstrap SWAG Certificate` step. Later renewals are handled automatically by cert-reloader.

-### Bootstrap SWAG Certificate (Step 19)
+### Bootstrap SWAG Certificate (Step 20)

-On the first deploy, SWAG obtains the Let's Encrypt certificate with the GoDaddy DNS-01 challenge. This step waits for SWAG to obtain the certificate, for up to 10 minutes, and then copies it to the `SWAG_CERT_DIR` directory:
+On the first deploy, SWAG obtains the Let's Encrypt certificate with the GoDaddy DNS-01 challenge. The current step waits for the Swarm `iklimco_swag` service to be running, then waits for `cert-reloader` to write `STAR.iklim.co.full.crt` to `SWAG_CERT_DIR`.

 ```yaml
 - name: Bootstrap SWAG Certificate
  run: |
    set -a; . ./.env; set +a
-    echo "Waiting for SWAG container to start..."
-    SWAG_CTR=""
-    for i in $(seq 1 24); do
-      SWAG_CTR=$(docker ps -q -f name=iklimco_swag 2>/dev/null | head -1)
-      [ -n "$SWAG_CTR" ] && break
-      sleep 10
-    done
-
-    if [ -z "$SWAG_CTR" ]; then
-      echo "❌ SWAG container did not start"
-      exit 1
-    fi
-
-    CERT_PATH="/config/etc/letsencrypt/live/iklim.co/fullchain.pem"
-    echo "Waiting for cert (up to 10 min)..."
-    for i in $(seq 1 20); do
-      if docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
-        echo "✅ Cert obtained"
-        break
-      fi
-      echo "  attempt $i/20 — waiting 30s..."
-      sleep 30
-    done
-
-    if ! docker exec "$SWAG_CTR" test -f "$CERT_PATH" 2>/dev/null; then
-      echo "❌ SWAG did not obtain cert. Logs:"
-      docker service logs iklimco_swag --tail 50
-      exit 1
-    fi
-
-    docker exec "$SWAG_CTR" cat "$CERT_PATH" | \
-      docker run --rm -i -v "${SWAG_CERT_DIR}:/output" alpine \
-        sh -c "cat > /output/STAR.iklim.co.full.crt && chmod 644 /output/STAR.iklim.co.full.crt"
-    docker exec "$SWAG_CTR" cat "/config/etc/letsencrypt/live/iklim.co/privkey.pem" | \
-      docker run --rm -i -v "${SWAG_CERT_DIR}:/output" alpine \
-        sh -c "cat > /output/STAR.iklim.co_key.pem && chmod 644 /output/STAR.iklim.co_key.pem"
-    echo "✅ Cert bootstrapped to ${SWAG_CERT_DIR}/"
+    echo "Waiting for SWAG service..."
+    docker service ps iklimco_swag --filter 'desired-state=running'
+    echo "Waiting for cert-reloader output in ${SWAG_CERT_DIR}..."
+    docker run --rm -v "${SWAG_CERT_DIR}:/ssl:ro" alpine \
+      test -f /ssl/STAR.iklim.co.full.crt
  working-directory: /workspace/iklim.co
 ```

-After this step, certificate files exist inside `SWAG_CERT_DIR` (`/mnt/storagebox/ssl/`); Vault TLS reads these files. Later renewals are handled automatically by cert-reloader. When the pipeline runs again, this step only waits for the SWAG container to be ready; certificate issuance is managed by SWAG/cert-reloader within Let's Encrypt's 90-day cycle.
+After this step, certificate files exist inside `SWAG_CERT_DIR` (`/mnt/storagebox/ssl/`). `cert-distributor` syncs these files to node-local `/opt/iklimco/ssl`, where Vault reads them. Later renewals are handled automatically by SWAG, cert-reloader, and cert-distributor.
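
The step description talks about waiting, while the abbreviated snippet shows each check only once. A generic retry helper is one way such a wait could be written. This is a sketch under assumptions: `wait_for` is a hypothetical name, and the attempt count and delay in the usage note are illustrative, not values taken from the workflow.

```shell
# Retry a command until it succeeds or the attempts run out.
# usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the command as given; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    echo "  attempt $i/$attempts failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A call such as `wait_for 20 30 docker run --rm -v "${SWAG_CERT_DIR}:/ssl:ro" alpine test -f /ssl/STAR.iklim.co.full.crt` would poll for the certificate file for up to ten minutes and fail the step if it never appears.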

-### Run Database Init Scripts (Step 20)
+### Run Database Init Scripts (Step 22)

-PostgreSQL and MongoDB init scripts run through Swarm overlay DNS service names (`postgresql`, `mongodb`):
+PostgreSQL and MongoDB init scripts run once the Patroni primary and the MongoDB replica set are ready:

 ```yaml
 - name: Run Database Init Scripts
  run: |
    set -a; . ./.env; . ./.env.secrets.shared; set +a

-    echo "⏳ Waiting for PostgreSQL..."
-    until docker run --rm --network iklimco-net \
-      -e PGPASSWORD="${DATABASE_POSTGRES_ROOT_PASSWD}" \
-      postgis/postgis:18-3.6 \
-      pg_isready -h postgresql -U "${DATABASE_POSTGRES_ROOT_USER}" -q 2>/dev/null; do
-      sleep 5
-    done
+    PG_URI="postgresql://${DATABASE_POSTGRES_ROOT_USER}@${DATABASE_POSTGRES_HOST}/postgres?connect_timeout=5&target_session_attrs=read-write"
+    MONGO_URI="mongodb://${DATABASE_MONGODB_ROOT_USER}:${DATABASE_MONGODB_ROOT_PASSWD}@${DATABASE_MONGODB_HOST}/admin?${DATABASE_MONGODB_PARAMS}"
    for sql_file in $(ls ./init/postgresql/*.sql 2>/dev/null | sort); do
      echo "▶ $(basename "$sql_file")"
      docker run --rm -i --network iklimco-net \
        -e PGPASSWORD="${DATABASE_POSTGRES_ROOT_PASSWD}" \
        postgis/postgis:18-3.6 \
-        psql -h postgresql -U "${DATABASE_POSTGRES_ROOT_USER}" < "$sql_file"
+        psql "$PG_URI" < "$sql_file"
    done

-    echo "⏳ Waiting for MongoDB..."
-    until docker run --rm --network iklimco-net mongo:8.3.2 \
-      mongosh "mongodb://${DATABASE_MONGODB_ROOT_USER}:${DATABASE_MONGODB_ROOT_PASSWD}@mongodb/admin" \
-      --eval "db.runCommand({ping:1})" --quiet 2>/dev/null; do
-      sleep 5
-    done
    for js_file in $(ls ./init/mongodb/*.js 2>/dev/null | sort); do
      echo "▶ $(basename "$js_file")"
-      docker run --rm -i --network iklimco-net mongo:8.3.2 \
-        mongosh "mongodb://${DATABASE_MONGODB_ROOT_USER}:${DATABASE_MONGODB_ROOT_PASSWD}@mongodb/admin" \
-        --quiet < "$js_file"
+      docker run --rm -i --network iklimco-net -e MONGO_INIT_URI="$MONGO_URI" "${IMAGE_MONGODB}" \
+        sh -c 'cat > /tmp/init.js && mongosh "$MONGO_INIT_URI" --quiet --file /tmp/init.js' \
+        < "$js_file"
    done
    echo "✅ Database init scripts completed"
  working-directory: /workspace/iklim.co
 ```

- `postgresql` and `mongodb`: Swarm VIP service names, resolved on the `iklimco-net` overlay; Patroni primary automatic routing happens at VIP level
+- `DATABASE_POSTGRES_HOST`: multi-host Patroni target; the workflow uses `target_session_attrs=read-write` to reach the primary
+- `DATABASE_MONGODB_HOST`: MongoDB replica set host list
 - SQL files `./init/postgresql/*.sql` and JS files `./init/mongodb/*.js` are created in the `Prepare Init Files` step by the `init_postgresql`/`init_mongodb` functions in `common-functions-prod.sh`
 - Idempotent: `CREATE IF NOT EXISTS` / `createCollection` semantics; runs safely again on later deploys
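
To illustrate how `target_session_attrs=read-write` selects the Patroni primary from a multi-host list, the sketch below builds the same kind of URI and checks the role of the server that answered. The function names and the host list in the usage note are hypothetical; only the URI parameters come from the workflow snippet above.

```shell
# Build a libpq URI that only accepts a writable (primary) server.
# $1 = user, $2 = comma-separated host:port list, $3 = database (default: postgres)
pg_primary_uri() {
  printf 'postgresql://%s@%s/%s?connect_timeout=5&target_session_attrs=read-write\n' \
    "$1" "$2" "${3:-postgres}"
}

# Report whether the server reached through a URI is the primary.
# pg_is_in_recovery() returns "f" on a primary and "t" on a replica.
is_primary() {
  [ "$(psql "$1" -tAc 'SELECT pg_is_in_recovery()')" = "f" ]
}
```

With an assumed host list, `is_primary "$(pg_primary_uri "$DATABASE_POSTGRES_ROOT_USER" 'db1:5432,db2:5432,db3:5432')"` should succeed whichever host currently holds the leader role, because libpq skips hosts that are not writable.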

@ -331,27 +303,19 @@ In prod, all 3 app nodes are manager + app worker, so services can be distribute

 ### Microservices

-Each microservice has two stack files:
+Prod microservice workflows do not rebuild application images. They read `deploy/prod.env`, promote the tested Harbor digest to a stable prod tag, and call `swarm_service_update` with `deploy/docker-stack-service.yml`.

-| File | Content | Environment |
-| --- | --- | --- |
-| `BE-<Service>/docker-stack-service.yml` | Base definitions, `replicas: 1` | Test + Prod |
-| `BE-<Service>/docker-stack-service.prod.yml` | `replicas: 3`, `max_replicas_per_node: 1` | Prod only |
-
-Prod deploy command:
+On the first deploy, `swarm_service_update` exports `SERVICE_IMAGE` and runs:

 ```bash
-docker stack deploy \
-  -c BE-<Service>/docker-stack-service.yml \
-  -c BE-<Service>/docker-stack-service.prod.yml \
-  iklimco
+docker stack deploy --with-registry-auth -c deploy/docker-stack-service.yml iklimco
 ```

-`max_replicas_per_node: 1` is mandatory; without it, when the Swarm node count is lower than the replica count, Swarm places more than one replica on the same node.
+For existing services it performs `docker service update` with `--update-order start-first` and `--update-failure-action rollback`.
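
The two paths can be pictured as a single branch on whether the service already exists. The sketch below is an illustration only and is not the project's `swarm_service_update` implementation; the function name `deploy_or_update` and its argument order are assumptions.

```shell
# Hypothetical sketch of the first-deploy / update branching.
# $1 = Swarm service name, $2 = image reference, $3 = stack file, $4 = stack name
deploy_or_update() {
  service=$1; image=$2; stack_file=$3; stack=$4
  if docker service inspect "$service" >/dev/null 2>&1; then
    # Existing service: start the new task first, roll back on failure.
    docker service update --with-registry-auth \
      --image "$image" \
      --update-order start-first \
      --update-failure-action rollback \
      "$service"
  else
    # First deploy: the stack file is assumed to reference ${SERVICE_IMAGE}.
    export SERVICE_IMAGE="$image"
    docker stack deploy --with-registry-auth -c "$stack_file" "$stack"
  fi
}
```

Using `start-first` keeps the old task serving traffic until the replacement is running, which is why it pairs naturally with an automatic rollback on failure.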

 ### Infra Services

-`docker-stack-infra.yml` (base) and `docker-stack-infra.prod.yml` (overlay) are deployed together. The overlay overrides services such as Vault, APISIX, RabbitMQ, and Redis Sentinel with `replicas: 3` and `max_replicas_per_node: 1`. Detail: `Environment_Infrastructure/roadmap/prod-env/03-infra-stack-changes.md`.
+The current prod infra stack is `docker-stack-infra_db-prod.yml`. Vault is not inside this stack; it is deployed separately by `vault-bootstrap.sh` using `docker-stack-vault.yml`.

 #### cert-reloader and Vault Auto-Unseal

@ -360,53 +324,28 @@ The `cert-reloader` sidecar service runs as `replicas: 1` inside the infra stack
 Certificate renewal flow:

 ```
-SWAG renews the certificate -> writes it to SWAG_CONFIG_DIR (/mnt/storagebox/swag/config)
+SWAG renews the certificate -> stores it inside the SWAG named volume
 cert-reloader detects the MD5 change
-  -> copies it to /mnt/storagebox/ssl/ directory  (common mount on all app nodes)
+  -> copies it to /mnt/storagebox/ssl/ directory  (StorageBox)
+cert-distributor syncs it to /opt/iklimco/ssl on service nodes
  -> runs docker service update --force iklimco_vault
 Vault (3 replicas) restarts
-  -> each instance reads the new certificate from the /mnt/storagebox/ssl/ mount
-  -> healthcheck checks sealed status every 30 seconds
-  -> if sealed: reads from the vault_unseal_key Docker secret and automatically unseals
+  -> each instance reads the new certificate from /opt/iklimco/ssl
+  -> entrypoint retry-unseal loop reads from the vault_unseal_key Docker secret and unseals
 ```

-The auto-unseal mechanism is provided by the Vault healthcheck inside `docker-stack-infra.yml`:
-
-```yaml
-healthcheck:
-  test:
-    - "CMD"
-    - "sh"
-    - "-c"
-    - >-
-      vault status -format=json 2>/dev/null | grep -q '"sealed":false' ||
-      vault operator unseal $$(cat /run/secrets/vault_unseal_key 2>/dev/null)
-  interval: 30s
-  timeout: 10s
-  start_period: 15s
-  retries: 5
-```
-
-The 3 replicas run their own healthchecks independently; all of them unseal separately. The certificate renewal -> restart -> auto-unseal chain requires no manual intervention. Detail: `roadmap/prod-env/06-cert-reloader.md`.
+The 3 Vault replicas run their own retry-unseal loop independently. The certificate renewal -> distribution -> restart -> unseal chain requires no manual intervention after bootstrap.
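
The "detects the MD5 change" part of the flow can be sketched as a digest comparison between the renewed certificate and the published copy. This is a simplified illustration, not the actual cert-reloader code; the function names are hypothetical and the sketch leaves out the `docker service update --force iklimco_vault` call that follows a change.

```shell
# Print the MD5 digest of a file, or nothing if the file is missing.
file_md5() {
  [ -f "$1" ] && md5sum "$1" | cut -d' ' -f1
}

# Succeed when the source certificate differs from the published copy,
# including the case where no copy has been published yet.
cert_changed() {
  [ "$(file_md5 "$1")" != "$(file_md5 "$2")" ]
}

# Copy the certificate only when it changed and report what happened.
# $1 = renewed certificate, $2 = published copy (for example under /mnt/storagebox/ssl/)
sync_cert() {
  if cert_changed "$1" "$2"; then
    cp "$1" "$2" && echo "updated"
  else
    echo "unchanged"
  fi
}
```

A caller would restart Vault only when `sync_cert` prints `updated`, which avoids needless restarts on every polling cycle.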

 #### Vault Raft Configuration

-Vault is defined as 3 replicas with Raft storage in the `docker-stack-infra.prod.yml` overlay:
+Vault is defined as 3 replicas with Raft storage in `docker-stack-vault.yml`:

 ```yaml
 vault:
-  environment:
-    VAULT_LOCAL_CONFIG: >-
-      {"api_addr":"https://vault.iklim.co:8200",
-       "cluster_addr":"https://{{ .Node.Hostname }}:8201",
-       "storage":{"raft":{"path":"/vault/file","node_id":"{{ .Node.Hostname }}"}},
-       "listener":[{"tcp":{"address":"0.0.0.0:8200",
-         "tls_cert_file":"/vault/certs/STAR.iklim.co.full.crt",
-         "tls_key_file":"/vault/certs/STAR.iklim.co_key.pem"}}],
-       "default_lease_ttl":"168h","max_lease_ttl":"720h","ui":true}
  volumes:
-    - /opt/iklimco/vault/data:/vault/file    # separate host path on each node — created with Ansible
-    - ${SWAG_CERT_DIR}:/vault/certs:ro       # StorageBox shared — all nodes see the same path
+    - vault-data-vl:/vault/file
+    - vault-logs-vl:/vault/logs
+    - /opt/iklimco/ssl:/vault/certs:ro
  deploy:
    mode: replicated
    replicas: 3
@ -416,59 +355,37 @@ vault:
        - node.labels.type == service
 ```

-`{{ .Node.Hostname }}` is a Docker Swarm Go template; it gives each Vault instance a unique `node_id` and `cluster_addr`. Because `/opt/iklimco/vault/data` is a host path volume, it is not an overlay volume; it must be created separately on each app node during Ansible bootstrap. See `07-prod-ansible-bootstrap.md` — Node Directory Role. Detail: `roadmap/prod-env/07-vault-raft-plan.md`.
+The Vault stack uses `vault-template-v2.json`, `vault_unseal_key`, and the `iklimco-net` external network. Bootstrap and unseal are handled by `init/vault/vault-bootstrap.sh`.

 ## Vault Raft Cluster Initial Setup

-After the infra stack is deployed for the first time, the Vault Raft cluster is initialized manually once. These steps are not repeated on every deploy; they are applied only during initial setup.
+Vault Raft cluster setup is no longer a manual post-deploy procedure. It is handled by `init/vault/vault-bootstrap.sh`, called through `init-infra-prod.sh` by the root prod workflow.

 ### Step 1 — Stack Deploy

-```bash
-docker stack deploy -c docker-stack-infra.yml -c docker-stack-infra.prod.yml iklimco
-```
+The bootstrap script deploys:

-3 Vault containers start. The first initialized node becomes the leader.
+```bash
+docker stack deploy --with-registry-auth -c docker-stack-vault.yml iklimco
+```

 ### Step 2 — Vault Initialize (iklim-app-01)

-```bash
-VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
-docker exec -it "$VAULT_CTR" vault operator init
-```
-
-Store the unseal keys and root token from the output securely. Save the unseal key as a Docker secret:
+The script runs `vault operator init -key-shares=1 -key-threshold=1` if Vault is not initialized. It stores bootstrap output under `/tmp/vault-bootstrap/main-vault-init.txt` during the run. It also creates a placeholder `vault_unseal_key` Docker secret:

 ```bash
-echo -n "<unseal-key>" | docker secret create vault_unseal_key -
+echo "bootstrap" | docker secret create vault_unseal_key -
 ```

-> After this step, the `vault_unseal_key` secret exists. During later certificate renewals, cert-reloader restarts Vault; the healthcheck reads this secret and automatically unseals, so no manual intervention is required.
+Then it rotates `vault_unseal_key` to the real unseal key and unseals the leader and peers.

 ### Step 3 — Unseal the Leader

-```bash
-docker exec -it "$VAULT_CTR" vault operator unseal
-```
+No manual unseal command is required in the normal path.

 ### Step 4 — Join the Other Nodes to the Raft Cluster

-The Vault containers on `iklim-app-02` and `iklim-app-03` join the cluster:
-
-```bash
-docker exec -it <vault-on-iklim-app-02> vault operator raft join \
-  https://vault.iklim.co:8200
-
-docker exec -it <vault-on-iklim-app-03> vault operator raft join \
-  https://vault.iklim.co:8200
-```
-
-Each node is also unsealed after it joins:
-
-```bash
-docker exec -it <vault-on-iklim-app-02> vault operator unseal
-docker exec -it <vault-on-iklim-app-03> vault operator unseal
-```
+Peer join and peer unseal are handled by `vault-bootstrap.sh`.

 ### Step 5 — Verify the Cluster

@ -646,20 +563,20 @@ Expected: valid JSON weather response.
 - `rabbitmq_erlang_cookie` appears in `docker secret ls`.
 - The `ssl`, `swag/config`, `swag/site-confs`, `grafana/data`, and `precipitation/images` directories exist on StorageBox; see `07-prod-ansible-bootstrap.md` — StorageBox Directory Structure.
 - The `template/swag/site-confs/default.conf`, `api.conf.tpl`, `apigw.conf.tpl`, `rabbitmq.conf.tpl`, and `grafana.conf.tpl` template files exist in the repo.
- StorageBox `prod/secrets/iklim.co/.env.prod` has correct values for `API_SUBDOMAIN`, `APIGW_SUBDOMAIN`, `RABBITMQ_SUBDOMAIN`, `GRAFANA_SUBDOMAIN`, `RESTRICTED_IPS`, `SWAG_CERT_DIR`, `SWAG_CONFIG_DIR`, and `SWAG_SITE_CONFS_DIR`.
+- StorageBox `prod/secrets/iklim.co/.env` has correct values for `API_SUBDOMAIN`, `APIGW_SUBDOMAIN`, `RABBITMQ_SUBDOMAIN`, `GRAFANA_SUBDOMAIN`, `RESTRICTED_IPS`, `SWAG_CERT_DIR`, `SWAG_DNS_CONFIG_DIR`, `SWAG_SITE_CONFS_DIR`, and `SWAG_PROXY_CONFS_DIR`.
 - After the first deploy, `docker exec $(docker ps -q -f name=iklimco_swag) nginx -t` succeeds and returns `syntax is ok`.
 - The output of `cat /mnt/storagebox/swag/site-confs/api.conf | grep server_name` contains `server_name api.iklim.co;`.
 - The `ssls/1` PUT block does not exist inside `init/apisix-core/init.sh`.
 - The `registry.tarla.io/iklimco/custom-apisix:3.12.0` image exists in Harbor and its `config.yaml` contains `real_ip_header`, `real_ip_recursive`, and `set_real_ip_from` (covering `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`) configuration.
 - After the first deploy, real client IP appears in APISIX access logs, not the SWAG overlay IP: `docker exec $(docker ps -q -f name=iklimco_apisix | head -1) tail -5 /usr/local/apisix/logs/access.log`
 - `docker service ps iklimco_cert-reloader` shows that the service is running.
- `docker service ls` does not contain `iklimco_etcd`, `iklimco_postgresql`, `iklimco_mongodb`, `iklimco_pg-proxy`, or `iklimco_mongo-proxy`; they are removed by the post-deploy step in `deploy-prod.yml` (base stack services superseded by the `iklim-db` stack or deprecated in prod).
+- `docker service ls` contains the current prod infra services from `docker-stack-infra_db-prod.yml` and the separate `iklimco_vault` service from `docker-stack-vault.yml`; deprecated base-stack services such as `iklimco_postgresql`, `iklimco_mongodb`, `iklimco_pg-proxy`, and `iklimco_mongo-proxy` are not present.
 - The output of `docker service logs iklimco_cert-reloader --tail 20` contains `[cert-reloader] started` and has no error lines.
 - The `notAfter` date of the Vault TLS endpoint certificate matches `/mnt/storagebox/ssl/STAR.iklim.co.full.crt`: `docker exec $(docker ps -q -f name=iklimco_vault | head -1) sh -c 'echo | openssl s_client -connect vault.iklim.co:8200 2>/dev/null | openssl x509 -noout -dates'`
 - `vault operator raft list-peers` returns 3 peers: 1 leader, 2 followers.
 - The `vault_unseal_key` Docker secret exists and appears in `docker secret ls`.
 - 3 Vault containers are not sealed: `docker exec $(docker ps -q -f name=iklimco_vault | head -1) vault status | grep Sealed` -> `Sealed  false`.
- The first deploy pipeline successfully completes all 21 steps; the `Review Environment` step succeeds.
+- The first deploy pipeline successfully completes all current root workflow steps; the `Review Environment` step succeeds.
 - After the `Bootstrap SWAG Certificate` step, `ls /mnt/storagebox/ssl/` -> `STAR.iklim.co.full.crt` and `STAR.iklim.co_key.pem` exist.
 - The `Run Database Init Scripts` step completes without error; PostgreSQL and MongoDB are healthy and init scripts are applied.
 - In the output of `docker service ls --filter label=project=co.iklim`, all infra services show `X/X`.
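
The final `X/X` check can be automated by parsing the replica column. The helper below is a small sketch with a hypothetical name; it assumes input lines of the form `name running/desired`, which is what `docker service ls --format '{{.Name}} {{.Replicas}}'` produces.

```shell
# Read "name replicas" lines on stdin and print the services whose running
# count differs from the desired count. Trailing notes such as
# "(max 1 per node)" are ignored because only the second field is parsed.
unconverged_services() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 " " $2 }'
}
```

Piping `docker service ls --filter label=project=co.iklim --format '{{.Name}} {{.Replicas}}'` into `unconverged_services` prints nothing when every service has converged, so an empty result can be used as the pass condition.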