# 00 - General Roadmap

This file is the main context for agents that will set up the test/prod infrastructure on Hetzner Cloud with Terraform and Ansible in the `Environment_Infrastructure` repo. Each phase file is written to be self-sufficient; nevertheless, this document is the general decision record.

## Goal

The Iklim.co infrastructure will be set up on two separate Hetzner Cloud Projects:

- `test` Hetzner Cloud Project
- `prod` Hetzner Cloud Project

This separation is considered mandatory. API tokens, networks, firewalls, placement groups, servers, costs, and accidental deletion risks are separated by environment.

## Terraform and Ansible Responsibility Boundary

Terraform creates only IaaS resources:

- Hetzner Cloud server
- Private network and subnet
- Firewall
- SSH key
- Placement group
- Optional volume, floating IP, load balancer, or DNS record
- Ansible inventory output

Ansible prepares the created Linux machines:

- Linux base packages
- Security hardening
- Docker Engine installation
- Docker Swarm init/join
- Gitea Actions `act_runner` systemd installation
- Shared directories and deploy prerequisites

Docker, Swarm, runner, or application deployment will not be done inside Terraform. Hetzner Cloud resources will not be created inside Ansible.

## Environment Topologies

### Test

Minimum topology for the test environment:

| Node | Role | Note |
| --- | --- | --- |
| `iklim-app-01` | Swarm manager + app worker + Gitea runner | CI/CD test deploy runs through this node |
| `iklim-db-01` | DB node / Swarm worker | DB host prerequisites are prepared by Ansible; DB services are deployed as Swarm services by the environment stack/pipeline |

The test DB setup is brought up to OS, Docker, Swarm worker, config directory, and WireGuard preparation with Terraform/Ansible. PostgreSQL/MongoDB runtime services are not installed directly on the OS; they run as Docker Swarm services.

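
As an illustration of the hand-off between the two tools, the test topology above could appear in a YAML Ansible inventory roughly as follows. This is a minimal sketch, not the actual Terraform output: the group names (`swarm_managers`, `swarm_workers`, `gitea_runners`, `db_nodes`) are hypothetical, and the addresses are placeholders from the documentation range `192.0.2.0/24` that the real inventory output would replace.

```yaml
# Sketch of an Ansible inventory for the test environment.
# Group names are assumptions for illustration; only the host names
# come from the topology table.
all:
  children:
    swarm_managers:
      hosts:
        iklim-app-01:
          ansible_host: 192.0.2.10   # placeholder, filled in from Terraform output
    swarm_workers:
      hosts:
        iklim-db-01:
          ansible_host: 192.0.2.11   # placeholder, filled in from Terraform output
    gitea_runners:
      hosts:
        iklim-app-01:                # the single test runner lives on the app node
    db_nodes:
      hosts:
        iklim-db-01:                 # host prerequisites only; DB services run in Swarm
```

Keeping roles as inventory groups lets the same playbooks target one node in test and several in prod without changing the tasks themselves.
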
### Prod

HA topology for the prod environment:

| Node group | Count | Role |
| --- | ---: | --- |
| `iklim-app-*` | 3 | Each one is a Swarm manager + app worker |
| `iklim-db-*` | 3 | DB cluster nodes |

Prod DB host prerequisites are prepared by Terraform/Ansible. Runtime DB services are part of the current prod Swarm stack: etcd, Patroni/PostgreSQL, and MongoDB replica set are deployed by the prod root pipeline through `docker-stack-infra_db-prod.yml`.

## Public Port Policy

Ports open to the public internet are normally only:

- `22/tcp` SSH, only from admin IP/CIDR sources
- `80/tcp` HTTP
- `443/tcp` HTTPS

Test has one explicit exception: `51820/udp` is opened on the DB node for WireGuard VPN, authenticated cryptographically. Prod currently does not expose `51820/udp` in Terraform.

`8200/tcp` Vault will not be opened to the public internet. Vault must be reachable only from the private network or Docker overlay.

Current prod stack behavior is aligned with this policy: `docker-stack-infra_db-prod.yml` publishes public traffic through SWAG on 80/443. Vault is deployed separately by `vault-bootstrap.sh` using `docker-stack-vault.yml`; it is not publicly exposed.

## Private Network Policy

The detailed matrix of ports that must be opened inside the private network is in `01-private-network-port-matrix.md`. Agents must treat that file as the source when writing Terraform Hetzner firewall rules and Ansible `firewalld` rules.

## Gitea Actions Runner Decision

`act_runner` will not run as a Docker container, and the Docker socket will not be mounted into a container.

Preferred installation:

- `act_runner` is installed as a Linux systemd service.
- A separate `gitea-runner` user is created for the runner.
- CI/CD jobs can create containers when needed; for this, the runner host needs Docker CLI/daemon access.
- Because Docker group membership grants permissions close to root level, only trusted Gitea repos/jobs should use these runner labels.

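
The preferred installation above could be expressed as Ansible tasks along these lines. This is a hedged sketch rather than the repo's actual role: it assumes Docker Engine is already installed (so the `docker` group exists) and that a unit file has already been placed on the host, and the unit name `act_runner.service` is a hypothetical choice.

```yaml
# Sketch of runner host preparation; the unit name is an assumption.
- name: Create a dedicated system user for the runner
  ansible.builtin.user:
    name: gitea-runner
    system: true
    create_home: true
    groups: docker        # grants Docker daemon access, close to root level
    append: true          # keep any groups the user already has

- name: Enable and start the runner as a systemd service
  ansible.builtin.systemd:
    name: act_runner.service   # hypothetical unit name
    enabled: true
    state: started
    daemon_reload: true        # pick up a newly installed unit file
```

Running the runner under its own user keeps its files and processes separate from other services, but the `docker` group membership is the reason runner labels should be limited to trusted repos and jobs.
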
For prod HA, `act_runner` will be installed not on a single machine but on all 3 Swarm manager nodes. This allows pipelines to continue when one manager/runner is lost.

Runner labels must be both shared and node-specific:

- Shared: `prod-runner`
- Node specific: `iklim-app-01`, `iklim-app-02`, `iklim-app-03`

For test, a single runner is enough:

- Shared: `test-runner`
- Node specific: `iklim-app-01`

## Deploy Serialization Decision

Because of the 3-runner HA model in prod, multiple deploy jobs can run at the same time. Gitea Actions `concurrency` is used to prevent concurrent deploys; a StorageBox-based lock mechanism is not required.

```yaml
concurrency:
  group: prod-deploy
  cancel-in-progress: false
```

With `cancel-in-progress: false`, a new run in the same group is queued by Gitea until the previous one finishes; it appears as "queued" in the UI and is not shown as an error.

All prod deploy workflows, including infrastructure and microservices, must use the same `group: prod-deploy` value so infra deploy and microservice deploy cannot overlap.

## Hetzner Physical Host Separation

Hetzner Cloud does not allow direct cabinet selection. `Placement Group` is used for the requirement of avoiding the same physical host. A placement group of type `spread` aims to place the cloud servers in the group on different physical hosts.

Constraints:

- A spread placement group reduces the impact of a single physical host failure.
- It does not guarantee protection against a wider failure inside the same datacenter or location.
- For location-level disaster recovery, a different location/region distribution must be designed later.
- According to Hetzner documentation, there is a maximum limit of 10 servers per spread placement group.

At least two placement groups are recommended for prod:

- `iklim-prod-app-spread`: 3 Swarm manager/app nodes
- `iklim-prod-db-spread`: 3 DB nodes

Optional for test:

- `iklim-test-spread`: `iklim-app-01` and `iklim-db-01`

Sources:

- Hetzner Terraform provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest
- Hetzner Networks: https://docs.hetzner.com/cloud/networks/overview/
- Hetzner Firewalls: https://docs.hetzner.com/cloud/firewalls/overview
- Hetzner Placement Groups: https://docs.hetzner.com/cloud/placement-groups/overview
- Docker Swarm overlay ports: https://docs.docker.com/engine/network/drivers/overlay/
- Gitea act_runner: https://docs.gitea.com/usage/actions/act-runner