Environment_Infrastructure/setup/00-genel-yol-haritasi.md
Murat ÖZDEMİR 8780c7c05e docs(db): implement direct cluster access strategy for production
2026-05-18 14:25:26 +03:00

00 - General Roadmap

This file is the primary context for agents that will set up the test/prod infrastructure on Hetzner Cloud with Terraform and Ansible in the Environment_Infrastructure repo. Each phase file is written to be self-contained; this document nevertheless serves as the overall decision record.

Goal

The Iklim.co infrastructure will be set up on two separate Hetzner Cloud Projects:

  • test Hetzner Cloud Project
  • prod Hetzner Cloud Project

This separation is mandatory. It isolates API tokens, networks, firewalls, placement groups, servers, costs, and the risk of accidental deletion per environment.

Terraform and Ansible Responsibility Boundary

Terraform creates only IaaS resources:

  • Hetzner Cloud server
  • Private network and subnet
  • Firewall
  • SSH key
  • Placement group
  • Optional volume, floating IP, load balancer, or DNS record
  • Ansible inventory output

Ansible prepares the created Linux machines:

  • Linux base packages
  • Security hardening
  • Docker Engine installation
  • Docker Swarm init/join
  • Gitea Actions act_runner systemd installation
  • Shared directories and deploy prerequisites

Terraform must not perform Docker, Swarm, runner, or application deployment. Ansible must not create Hetzner Cloud resources.

Environment Topologies

Test

Minimum topology for the test environment:

| Node | Role | Note |
|---|---|---|
| iklim-app-01 | Swarm manager + app worker + Gitea runner | CI/CD test deploys run through this node |
| iklim-db-01 | DB node | DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD |

For the test DB node, Terraform/Ansible cover only machine and OS preparation. PostgreSQL/MongoDB cluster installation is outside this phase.

Prod

HA topology for the prod environment:

| Node group | Count | Role |
|---|---|---|
| iklim-app-* | 3 | Each one is a Swarm manager + app worker |
| iklim-db-* | 3 | DB cluster nodes |

Prod DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD. Terraform prepares the DB machines and the network/firewall rules; Ansible applies OS hardening and installs base dependencies.
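
As a rough illustration, the prod topology above could appear in the Ansible inventory that Terraform outputs along the following lines. The group names (swarm_managers, db_nodes), the DB host names derived from the iklim-db-* pattern, and all IP addresses are placeholders chosen for this sketch, not values taken from the repo.

```yaml
# Hypothetical inventory sketch; group names and addresses are placeholders.
all:
  children:
    swarm_managers:            # assumed group name for the 3 manager/app nodes
      hosts:
        iklim-app-01:
          ansible_host: 10.0.1.11   # placeholder private IP
        iklim-app-02:
          ansible_host: 10.0.1.12   # placeholder private IP
        iklim-app-03:
          ansible_host: 10.0.1.13   # placeholder private IP
    db_nodes:                  # assumed group name for the 3 DB cluster nodes
      hosts:
        iklim-db-01:
          ansible_host: 10.0.2.11   # placeholder private IP
        iklim-db-02:
          ansible_host: 10.0.2.12   # placeholder private IP
        iklim-db-03:
          ansible_host: 10.0.2.13   # placeholder private IP
```

Keeping app and DB nodes in separate groups lets Ansible apply Swarm and runner roles only to the app group while the DB group receives just hardening and base packages.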

Public Port Policy

Only the following ports are open to the public internet:

  • 22/tcp SSH, only from admin IP/CIDR sources
  • 80/tcp HTTP
  • 443/tcp HTTPS

Vault (8200/tcp) will not be exposed to the public internet. Vault must be reachable only from the private network or the Docker overlay.
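
A minimal Ansible sketch of the public port policy, assuming the community.general collection is installed and UFW is the host firewall. The host group app_nodes, the variable admin_cidrs, and the sample CIDR are hypothetical. Private-network rules from 01-private-network-port-matrisi.md would need to be added before the final task, otherwise the default-deny policy would also block intra-cluster traffic.

```yaml
# Sketch only: public-facing rules, not the full firewall.
- name: Apply public port policy with UFW
  hosts: app_nodes                 # hypothetical inventory group
  become: true
  vars:
    admin_cidrs:                   # hypothetical variable; replace with real admin sources
      - 203.0.113.10/32            # documentation-range placeholder
  tasks:
    - name: Allow SSH only from admin sources
      community.general.ufw:
        rule: allow
        port: "22"
        proto: tcp
        from_ip: "{{ item }}"
      loop: "{{ admin_cidrs }}"

    - name: Allow HTTP and HTTPS from anywhere
      community.general.ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop: ["80", "443"]

    # Private-network rules from the port matrix belong here, before enabling.

    - name: Deny all other inbound traffic and enable UFW
      community.general.ufw:
        state: enabled
        policy: deny
        direction: incoming
```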

docker-stack-infra.yml has been aligned with this policy: only the SWAG service publishes ports 80/443; all other services such as Vault, APISIX, RabbitMQ, Prometheus, and Grafana are reachable only through the iklimco-net overlay.
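
The shape of that alignment can be sketched as follows. This is not the actual content of docker-stack-infra.yml; the image references and the external-network declaration are assumptions made for illustration.

```yaml
# Hypothetical excerpt; the real docker-stack-infra.yml may differ.
services:
  swag:
    image: lscr.io/linuxserver/swag     # assumed image reference
    ports:                              # the only service that publishes ports
      - target: 80
        published: 80
        protocol: tcp
      - target: 443
        published: 443
        protocol: tcp
    networks:
      - iklimco-net

  vault:
    image: hashicorp/vault              # assumed image reference
    # No "ports:" key: 8200/tcp is reachable only over the overlay network.
    networks:
      - iklimco-net

networks:
  iklimco-net:
    external: true                      # assumes the overlay is created outside this stack
```

Because Vault declares no published ports, other services on iklimco-net reach it by service name on port 8200, while nothing is bound on the host's public interface.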

Private Network Policy

The detailed matrix of ports that must be opened inside the private network is in 01-private-network-port-matrisi.md. Agents must treat that file as the source of truth when writing firewall or Ansible UFW rules.

Gitea Actions Runner Decision

act_runner will not run as a Docker container, and the Docker socket will not be mounted into a container.

Preferred installation:

  • act_runner is installed as a Linux systemd service.
  • A separate gitea-runner user is created for the runner.
  • CI/CD jobs can create containers when needed; for this, the runner host needs Docker CLI/daemon access.
  • Because Docker group membership grants permissions close to root level, only trusted Gitea repos/jobs should use these runner labels.
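
The user and service parts of this installation could be expressed as Ansible tasks roughly like the ones below. The unit name act_runner.service is hypothetical, and the tasks assume the docker group already exists and the systemd unit file has been installed by an earlier step.

```yaml
# Sketch only: assumes Docker is installed and the unit file is already in place.
- name: Create dedicated runner user
  ansible.builtin.user:
    name: gitea-runner
    system: true
    shell: /usr/sbin/nologin
    groups: docker               # grants Docker daemon access (close to root-level permissions)
    append: true

- name: Enable and start the runner service
  ansible.builtin.systemd:
    name: act_runner.service     # hypothetical unit name
    enabled: true
    state: started
    daemon_reload: true
```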

For prod HA, act_runner will be installed on all 3 Swarm manager nodes rather than on a single machine. This allows pipelines to continue when one manager/runner is lost. Each runner must carry both a shared label and a node-specific label:

  • Shared: prod-runner
  • Node specific: iklim-app-01, iklim-app-02, iklim-app-03

For test, a single runner is enough:

  • Shared: test-runner
  • Node specific: iklim-app-01
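
For illustration, the labels for the prod runner on iklim-app-01 might be declared in the act_runner configuration file as shown below. The ":host" suffix is an assumption that jobs execute directly on the runner host; adjust it if a different execution mode is chosen.

```yaml
# Sketch of the labels section in the act_runner config on iklim-app-01 (prod).
runner:
  labels:
    - "prod-runner:host"      # shared label; ":host" assumes jobs run directly on the host
    - "iklim-app-01:host"     # node-specific label
```

A workflow that targets the shared label can be picked up by any of the three prod runners, while a node-specific label pins a job to one machine.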

Deploy Serialization Decision

Because of the 3-runner HA model in prod, multiple deploy jobs can run at the same time. Gitea Actions concurrency is used to prevent concurrent deploys; a StorageBox-based lock mechanism is not required.

concurrency:
  group: prod-deploy
  cancel-in-progress: false

With cancel-in-progress: false, a new run in the same group is queued by Gitea until the previous one finishes; it appears as "queued" in the UI and is not shown as an error. All prod deploy workflows, including infrastructure and microservices, must use the same group: prod-deploy value so infra deploy and microservice deploy cannot overlap.
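
A minimal sketch of a prod deploy workflow that combines the shared runner label with this concurrency group. The workflow name, trigger, checkout step, and stack name are assumptions for illustration only.

```yaml
# Hypothetical workflow; only the concurrency block and runner label come from this document.
name: deploy-prod
on:
  push:
    branches: [main]            # assumed trigger

concurrency:
  group: prod-deploy            # same value in every prod deploy workflow
  cancel-in-progress: false     # later runs wait instead of cancelling the active one

jobs:
  deploy:
    runs-on: prod-runner        # shared label: any of the 3 prod runners may take the job
    steps:
      - uses: actions/checkout@v4
      - name: Deploy infra stack
        run: docker stack deploy -c docker-stack-infra.yml iklim   # hypothetical stack name
```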

Hetzner Physical Host Separation

Hetzner Cloud does not allow selecting a cabinet directly. Placement groups are used instead to keep servers off the same physical host. A placement group of type spread aims to place the cloud servers in the group on different physical hosts.

Constraints:

  • A spread placement group reduces the impact of a single physical host failure.
  • It does not guarantee protection against a wider failure inside the same datacenter or location.
  • For location-level disaster recovery, a different location/region distribution must be designed later.
  • According to Hetzner documentation, there is a maximum limit of 10 servers per spread placement group.

At least two placement groups are recommended for prod:

  • iklim-prod-app-spread: 3 Swarm manager/app nodes
  • iklim-prod-db-spread: 3 DB nodes

Optional for test:

  • iklim-test-spread: iklim-app-01 and iklim-db-01

Sources: