Environment_Infrastructure/setup/00-genel-yol-haritasi.md
Murat ÖZDEMİR 8780c7c05e docs(db): implement direct cluster access strategy for production
- Updated roadmap (03-infra-stack-changes.md) to deprecate database proxies in prod.
- Detailed direct subnet access via WireGuard for production developers.
- Provided multi-host connection parameters for Patroni and MongoDB Replica Sets in setup guide (08-prod-db-cluster-kurulum.md).
- Added environment comparison table to developer access guide.
2026-05-18 14:25:26 +03:00

# 00 - General Roadmap
This file is the main context for agents that set up the test/prod infrastructure on Hetzner Cloud with Terraform and Ansible in the `Environment_Infrastructure` repo. Each phase file is written to be self-contained; this document nevertheless serves as the overall decision record.
## Goal
The Iklim.co infrastructure will be set up on two separate Hetzner Cloud Projects:
- `test` Hetzner Cloud Project
- `prod` Hetzner Cloud Project
This separation is mandatory. It isolates API tokens, networks, firewalls, placement groups, servers, costs, and the risk of accidental deletion per environment.
## Terraform and Ansible Responsibility Boundary
Terraform creates only IaaS resources:
- Hetzner Cloud server
- Private network and subnet
- Firewall
- SSH key
- Placement group
- Optional volume, floating IP, load balancer, or DNS record
- Ansible inventory output
Ansible prepares the created Linux machines:
- Linux base packages
- Security hardening
- Docker Engine installation
- Docker Swarm init/join
- Gitea Actions `act_runner` systemd installation
- Shared directories and deploy prerequisites
Terraform must not install or deploy Docker, Swarm, the runner, or applications. Ansible must not create Hetzner Cloud resources.
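
The Ansible side of this boundary could look roughly like the play below. It is a minimal sketch, not the repo's actual playbook: the `swarm_managers` group, the `private_ip` variable, and the choice of distribution packages (`docker.io`, `python3-docker`) are assumptions made for illustration, and it presumes the `community.docker` collection is available on the control node.

```yaml
# Sketch only: group name, variable names, and package choices are hypothetical.
- name: Prepare Swarm nodes that Terraform has already created
  hosts: swarm_managers
  become: true
  tasks:
    - name: Install Linux base packages
      ansible.builtin.apt:
        name:
          - ca-certificates
          - curl
        state: present
        update_cache: true

    - name: Install Docker Engine and the Docker SDK for Python
      ansible.builtin.apt:
        name:
          - docker.io        # assumption: distribution package instead of docker-ce
          - python3-docker   # needed on the target by community.docker modules
        state: present

    - name: Initialize the Swarm on the first manager only
      community.docker.docker_swarm:
        state: present
        advertise_addr: "{{ private_ip }}"  # hypothetical per-host variable
      when: inventory_hostname == groups['swarm_managers'][0]
```

Nothing in this play talks to the Hetzner API; it only configures hosts that already exist in the inventory.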
## Environment Topologies
### Test
Minimum topology for the test environment:
| Node | Role | Note |
| --- | --- | --- |
| `iklim-app-01` | Swarm manager + app worker + Gitea runner | CI/CD test deploy runs through this node |
| `iklim-db-01` | DB node | DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD |
For the test DB node, Terraform and Ansible go only as far as machine and OS preparation. PostgreSQL/MongoDB cluster installation is outside this phase.
### Prod
HA topology for the prod environment:
| Node group | Count | Role |
| --- | ---: | --- |
| `iklim-app-*` | 3 | Each one is a Swarm manager + app worker |
| `iklim-db-*` | 3 | DB cluster nodes |
Prod DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD. Terraform prepares the DB machines and the network/firewall rules; Ansible applies OS hardening and installs base dependencies.
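
As an illustration of how the prod topology might appear in the Ansible inventory that Terraform outputs, here is a hedged YAML inventory sketch. The group names, the `iklim-db-01` to `iklim-db-03` host names, and all IP addresses are placeholders assumed for this example, not values taken from the repo.

```yaml
# Hypothetical inventory; group names and IP addresses are placeholders.
all:
  children:
    swarm_managers:
      hosts:
        iklim-app-01:
          ansible_host: 10.0.1.11
        iklim-app-02:
          ansible_host: 10.0.1.12
        iklim-app-03:
          ansible_host: 10.0.1.13
    db_nodes:
      hosts:
        iklim-db-01:
          ansible_host: 10.0.2.11
        iklim-db-02:
          ansible_host: 10.0.2.12
        iklim-db-03:
          ansible_host: 10.0.2.13
```

Keeping app and DB nodes in separate groups lets OS hardening target every host while Docker and Swarm plays target only `swarm_managers`.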
## Public Port Policy
Only the following ports are open to the public internet:
- `22/tcp` SSH, only from admin IP/CIDR sources
- `80/tcp` HTTP
- `443/tcp` HTTPS
Vault (`8200/tcp`) will not be exposed to the public internet. It must be reachable only from the private network or the Docker overlay.
`docker-stack-infra.yml` has been aligned with this policy: only the SWAG service publishes ports 80/443; all other services such as Vault, APISIX, RabbitMQ, Prometheus, and Grafana are reachable only through the `iklimco-net` overlay.
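
The shape of that policy in a stack file can be sketched as follows. This is not the content of `docker-stack-infra.yml`; the image references and the way the network is declared are assumptions, and only the service names, ports, and the `iklimco-net` name come from this document.

```yaml
# Sketch only: images and network declaration are assumptions.
services:
  swag:
    image: lscr.io/linuxserver/swag:latest
    ports:
      - "80:80"     # the only published ports in the stack
      - "443:443"
    networks:
      - iklimco-net

  vault:
    image: hashicorp/vault:latest
    # No "ports" section: 8200/tcp is reachable only over the overlay.
    networks:
      - iklimco-net

networks:
  iklimco-net:
    driver: overlay
```

A service without a `ports` entry is still reachable by name from other services on the same overlay, which is what lets SWAG proxy to internal services.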
## Private Network Policy
The detailed matrix of ports that must be opened inside the private network is in `01-private-network-port-matrisi.md`. Agents must treat that file as the source of truth when writing firewall or Ansible UFW rules.
## Gitea Actions Runner Decision
`act_runner` will not run as a Docker container, and the Docker socket will not be mounted into a container.
Preferred installation:
- `act_runner` is installed as a Linux systemd service.
- A separate `gitea-runner` user is created for the runner.
- CI/CD jobs can create containers when needed; for this, the runner host needs Docker CLI/daemon access.
- Because Docker group membership grants permissions close to root level, only trusted Gitea repos/jobs should use these runner labels.
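
The installation points above could be expressed as Ansible tasks along these lines. This is a sketch under assumptions: it presumes the `act_runner` binary and a systemd unit named `act_runner` have already been placed on the host, and the unit name and login shell are hypothetical.

```yaml
# Sketch only: the unit name "act_runner" and the shell are assumptions.
- name: Create a dedicated user for the runner
  ansible.builtin.user:
    name: gitea-runner
    shell: /bin/bash
    groups: docker   # grants near-root access through the Docker daemon
    append: true

- name: Enable and start the runner as a systemd service
  ansible.builtin.systemd:
    name: act_runner
    enabled: true
    state: started
    daemon_reload: true
```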
For prod HA, `act_runner` will be installed on all 3 Swarm manager nodes rather than on a single machine, so pipelines can continue when one manager/runner is lost. Each runner must carry both a shared label and a node-specific label:
- Shared: `prod-runner`
- Node specific: `iklim-app-01`, `iklim-app-02`, `iklim-app-03`
For test, a single runner is enough:
- Shared: `test-runner`
- Node specific: `iklim-app-01`
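
As a rough illustration, the labels for the prod runner on `iklim-app-01` might be declared in the runner's `config.yaml` as shown below. The `:host` suffix is assumed here so that jobs run directly on the runner host instead of inside a job container; the rest of the configuration file is omitted.

```yaml
# Sketch of the labels section only; other runner settings are omitted.
runner:
  labels:
    - "prod-runner:host"    # shared label, identical on all three prod runners
    - "iklim-app-01:host"   # node-specific label, different on each node
```

A workflow that uses `runs-on: prod-runner` can then be picked up by any of the three runners, while `runs-on: iklim-app-01` pins a job to that node.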
## Deploy Serialization Decision
Because of the 3-runner HA model in prod, multiple deploy jobs can run at the same time. Gitea Actions `concurrency` is used to prevent concurrent deploys; a StorageBox-based lock mechanism is not required.
```yaml
concurrency:
  group: prod-deploy
  cancel-in-progress: false
```
With `cancel-in-progress: false`, a new run in the same group is queued by Gitea until the previous one finishes; it appears as "queued" in the UI and is not shown as an error. All prod deploy workflows, including infrastructure and microservices, must use the same `group: prod-deploy` value so infra deploy and microservice deploy cannot overlap.
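
To show where the `concurrency` block sits in a complete workflow, here is a minimal sketch of a prod deploy workflow. The workflow name, the trigger, the checkout step, and the stack name `iklim-infra` are assumptions for illustration; only the concurrency group, the `prod-runner` label, and the stack file name come from this document.

```yaml
# Sketch only: name, trigger, and stack name are hypothetical.
name: deploy-infra-prod

on:
  push:
    branches:
      - main

concurrency:
  group: prod-deploy          # same value in every prod deploy workflow
  cancel-in-progress: false   # wait for the running deploy instead of cancelling it

jobs:
  deploy:
    runs-on: prod-runner      # any of the three prod runners may pick this up
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Deploy the infrastructure stack
        run: docker stack deploy -c docker-stack-infra.yml iklim-infra
```

A microservice deploy workflow would reuse the same `concurrency` block unchanged, which is what keeps it from overlapping with the infra deploy.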
## Hetzner Physical Host Separation
Hetzner Cloud does not allow direct cabinet selection. To keep servers off the same physical host, a `Placement Group` is used. A placement group of type `spread` aims to place the cloud servers in the group on different physical hosts.
Constraints:
- A spread placement group reduces the impact of a single physical host failure.
- It does not guarantee protection against a wider failure inside the same datacenter or location.
- For location-level disaster recovery, a different location/region distribution must be designed later.
- According to Hetzner documentation, there is a maximum limit of 10 servers per spread placement group.
At least two placement groups are recommended for prod:
- `iklim-prod-app-spread`: 3 Swarm manager/app nodes
- `iklim-prod-db-spread`: 3 DB nodes
Optional for test:
- `iklim-test-spread`: `iklim-app-01` and `iklim-db-01`
Sources:
- Hetzner Terraform provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest
- Hetzner Networks: https://docs.hetzner.com/cloud/networks/overview/
- Hetzner Firewalls: https://docs.hetzner.com/cloud/firewalls/overview
- Hetzner Placement Groups: https://docs.hetzner.com/cloud/placement-groups/overview
- Docker Swarm overlay ports: https://docs.docker.com/engine/network/drivers/overlay/
- Gitea act_runner: https://docs.gitea.com/usage/actions/act-runner