00 - General Roadmap
This file is the main context for agents that will set up the test/prod infrastructure on Hetzner Cloud with Terraform and Ansible in the Environment_Infrastructure repo. Each phase file is written to be self-sufficient; nevertheless, this document is the general decision record.
Goal
The Iklim.co infrastructure will be set up on two separate Hetzner Cloud Projects:
- test Hetzner Cloud Project
- prod Hetzner Cloud Project
This separation is considered mandatory. API tokens, networks, firewalls, placement groups, servers, costs, and accidental deletion risks are separated by environment.
Terraform and Ansible Responsibility Boundary
Terraform creates only IaaS resources:
- Hetzner Cloud server
- Private network and subnet
- Firewall
- SSH key
- Placement group
- Optional volume, floating IP, load balancer, or DNS record
- Ansible inventory output
Ansible prepares the created Linux machines:
- Linux base packages
- Security hardening
- Docker Engine installation
- Docker Swarm init/join
- Gitea Actions act_runner systemd installation
- Shared directories and deploy prerequisites
Docker, Swarm, runner, or application deployment will not be done inside Terraform. Hetzner Cloud resources will not be created inside Ansible.
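As a rough illustration of the Ansible side of this boundary, the tasks below install Docker Engine and initialise a Swarm on the first manager. This is a minimal sketch, not the repo's actual playbook: the swarm_managers host group, the Debian/Ubuntu package names, and the private_ip variable are assumptions, and the community.docker collection must be installed on the control node.

```yaml
# Hypothetical playbook sketch; group names and variables are assumptions.
- name: Prepare Swarm manager nodes
  hosts: swarm_managers
  become: true
  tasks:
    - name: Install Docker Engine and the Docker SDK for Python
      ansible.builtin.apt:
        name:
          - docker.io
          - python3-docker   # required by the community.docker modules
        state: present
        update_cache: true

    - name: Ensure the Docker daemon is running and enabled
      ansible.builtin.systemd:
        name: docker
        state: started
        enabled: true

    - name: Initialise the Swarm on the first manager only
      community.docker.docker_swarm:
        state: present
        advertise_addr: "{{ private_ip }}"   # assumed per-host variable
      when: inventory_hostname == groups['swarm_managers'][0]
```

Nothing in this sketch touches the Hetzner API, which keeps resource creation on the Terraform side.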
Environment Topologies
Test
Minimum topology for the test environment:
| Node | Role | Note |
|---|---|---|
| iklim-app-01 | Swarm manager + app worker + Gitea runner | CI/CD test deploy runs through this node |
| iklim-db-01 | DB node | DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD |
The test DB setup is brought only up to machine and OS preparation with Terraform/Ansible. PostgreSQL/MongoDB cluster installation is outside this phase.
Prod
HA topology for the prod environment:
| Node group | Count | Role |
|---|---|---|
| iklim-app-* | 3 | Each one is a Swarm manager + app worker |
| iklim-db-* | 3 | DB cluster nodes |
Prod DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD. Terraform prepares the DB machines and network/firewall rules; Ansible installs OS hardening and base dependencies.
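The prod topology could be handed from Terraform to Ansible as a YAML inventory along the following lines. This is only a sketch: the group names swarm_managers and db_nodes and all private addresses are placeholders, not values taken from the repo.

```yaml
# Hypothetical prod inventory; group names and addresses are placeholders.
all:
  children:
    swarm_managers:
      hosts:
        iklim-app-01:
          ansible_host: 10.0.1.11
        iklim-app-02:
          ansible_host: 10.0.1.12
        iklim-app-03:
          ansible_host: 10.0.1.13
    db_nodes:
      hosts:
        iklim-db-01:
          ansible_host: 10.0.1.21
        iklim-db-02:
          ansible_host: 10.0.1.22
        iklim-db-03:
          ansible_host: 10.0.1.23
```

Keeping the app and DB nodes in separate groups lets OS hardening target all hosts while Docker and runner roles target only the managers.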
Public Port Policy
Ports open to the public internet are only:
- 22/tcp SSH, only from admin IP/CIDR sources
- 80/tcp HTTP
- 443/tcp HTTPS
8200/tcp Vault will not be opened to the public internet. Vault must be reachable only from the private network or Docker overlay.
docker-stack-infra.yml has been aligned with this policy: only the SWAG service publishes ports 80/443; all other services such as Vault, APISIX, RabbitMQ, Prometheus, and Grafana are reachable only through the iklimco-net overlay.
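The shape of that policy in a stack file might look like the fragment below. It is a sketch rather than the contents of docker-stack-infra.yml: the image references and tags are assumptions, and only two of the services are shown.

```yaml
# Sketch only; image references are assumptions, not the real stack file.
services:
  swag:
    image: lscr.io/linuxserver/swag:latest
    ports:
      - "80:80"     # the only published ports in the stack
      - "443:443"
    networks:
      - iklimco-net

  vault:
    image: hashicorp/vault:latest
    # No ports entry: 8200/tcp stays reachable only on the overlay network.
    networks:
      - iklimco-net

networks:
  iklimco-net:
    driver: overlay
```

Any service that omits a ports entry is reachable only by other services attached to iklimco-net.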
Private Network Policy
The detailed matrix of ports that must be opened inside the private network is in 01-private-network-port-matrisi.md. Agents must treat that file as the source when writing firewall or Ansible UFW rules.
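For orientation, a UFW rule set restricted to the private network could be written as the Ansible task below. This is a hedged sketch: the ports listed are the Swarm ports named in the Docker overlay documentation, the private_cidr variable is hypothetical, and the matrix file remains the authoritative list.

```yaml
# Hypothetical task; private_cidr is an assumed variable, e.g. the subnet CIDR.
- name: Allow Docker Swarm traffic from the private network only
  community.general.ufw:
    rule: allow
    port: "{{ item.port }}"
    proto: "{{ item.proto }}"
    from_ip: "{{ private_cidr }}"
  loop:
    - { port: "2377", proto: tcp }   # cluster management
    - { port: "7946", proto: tcp }   # node-to-node communication
    - { port: "7946", proto: udp }   # node-to-node communication
    - { port: "4789", proto: udp }   # overlay network (VXLAN) traffic
```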
Gitea Actions Runner Decision
act_runner will not run as a Docker container, and the Docker socket will not be mounted into a container.
Preferred installation:
- act_runner is installed as a Linux systemd service.
- A separate gitea-runner user is created for the runner.
- CI/CD jobs can create containers when needed; for this, the runner host needs Docker CLI/daemon access.
- Because Docker group membership grants permissions close to root level, only trusted Gitea repos/jobs should use these runner labels.
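The installation points above can be sketched as Ansible tasks. The unit name act_runner is an assumption, and the sketch presumes the binary and its unit file have already been placed on the host by an earlier step.

```yaml
# Hypothetical tasks; the act_runner unit file is assumed to exist already.
- name: Create the dedicated runner user with Docker access
  ansible.builtin.user:
    name: gitea-runner
    system: true
    groups: docker    # near-root privileges, see the trust note above
    append: true

- name: Enable and start the runner service
  ansible.builtin.systemd:
    name: act_runner
    enabled: true
    state: started
    daemon_reload: true
```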
For prod HA, act_runner is installed on all 3 Swarm manager nodes rather than on a single machine, so pipelines can continue when one manager/runner is lost. Runner labels must be both shared and node-specific:
- Shared: prod-runner
- Node specific: iklim-app-01, iklim-app-02, iklim-app-03
For test, a single runner is enough:
- Shared: test-runner
- Node specific: iklim-app-01
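In the act_runner configuration file, the label pair for one prod node might be declared as below. This is a sketch under assumptions: the host suffix presumes jobs run directly on the runner host rather than in a job container, and the capacity value is illustrative.

```yaml
# Hypothetical config.yaml fragment for the runner on iklim-app-01 (prod).
runner:
  capacity: 1            # illustrative; one job at a time per runner
  labels:
    - "prod-runner:host"     # shared label, matched by any prod runner
    - "iklim-app-01:host"    # node-specific label for pinned jobs
```

A workflow that sets runs-on to prod-runner can then be picked up by any of the three managers, while runs-on: iklim-app-01 pins a job to that node.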
Deploy Serialization Decision
Because of the 3-runner HA model in prod, multiple deploy jobs can run at the same time. Gitea Actions concurrency is used to prevent concurrent deploys; a StorageBox-based lock mechanism is not required.
    concurrency:
      group: prod-deploy
      cancel-in-progress: false
With cancel-in-progress: false, a new run in the same group is queued by Gitea until the previous one finishes; it appears as "queued" in the UI and is not shown as an error. All prod deploy workflows, including infrastructure and microservices, must use the same group: prod-deploy value so infra deploy and microservice deploy cannot overlap.
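Placed in a complete workflow, the concurrency block could look like the following. The workflow name, trigger branch, and stack name iklim-infra are hypothetical; only the concurrency values, the prod-runner label, and the stack file name come from this document.

```yaml
# Hypothetical deploy workflow; trigger and stack name are assumptions.
name: prod-infra-deploy
on:
  push:
    branches:
      - main

concurrency:
  group: prod-deploy          # shared by every prod deploy workflow
  cancel-in-progress: false   # later runs wait instead of cancelling

jobs:
  deploy:
    runs-on: prod-runner
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4
      - name: Deploy the infra stack to the Swarm
        run: docker stack deploy -c docker-stack-infra.yml iklim-infra
```

A microservice deploy workflow would repeat the same concurrency block verbatim so that it serializes against this one.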
Hetzner Physical Host Separation
Hetzner Cloud does not allow direct cabinet selection. Instead, a Placement Group is used to keep servers off the same physical host. A placement group of type spread aims to place the cloud servers in the group on different physical hosts.
Constraints:
- A spread placement group reduces the impact of a single physical host failure.
- It does not guarantee protection against a wider failure inside the same datacenter or location.
- For location-level disaster recovery, a different location/region distribution must be designed later.
- According to Hetzner documentation, there is a maximum limit of 10 servers per spread placement group.
At least two placement groups are recommended for prod:
- iklim-prod-app-spread: 3 Swarm manager/app nodes
- iklim-prod-db-spread: 3 DB nodes
Optional for test:
- iklim-test-spread: iklim-app-01 and iklim-db-01
Sources:
- Hetzner Terraform provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest
- Hetzner Networks: https://docs.hetzner.com/cloud/networks/overview/
- Hetzner Firewalls: https://docs.hetzner.com/cloud/firewalls/overview
- Hetzner Placement Groups: https://docs.hetzner.com/cloud/placement-groups/overview
- Docker Swarm overlay ports: https://docs.docker.com/engine/network/drivers/overlay/
- Gitea act_runner: https://docs.gitea.com/usage/actions/act-runner