# 00 - General Roadmap

This file is the main context for agents that set up the test/prod infrastructure on Hetzner Cloud with Terraform and Ansible in the `Environment_Infrastructure` repo. Each phase file is written to be self-contained; this document remains the general decision record.

## Goal

The Iklim.co infrastructure will be set up in two separate Hetzner Cloud Projects:

- `test` Hetzner Cloud Project
- `prod` Hetzner Cloud Project

This separation is mandatory. It keeps API tokens, networks, firewalls, placement groups, servers, costs, and the risk of accidental deletion isolated per environment.

## Terraform and Ansible Responsibility Boundary

Terraform creates only IaaS resources:

- Hetzner Cloud server
- Private network and subnet
- Firewall
- SSH key
- Placement group
- Optional volume, floating IP, load balancer, or DNS record
- Ansible inventory output

Ansible prepares the created Linux machines:

- Linux base packages
- Security hardening
- Docker Engine installation
- Docker Swarm init/join
- Gitea Actions `act_runner` systemd installation
- Shared directories and deploy prerequisites

Docker, Swarm, runner, or application deployment will not be done inside Terraform. Hetzner Cloud resources will not be created inside Ansible.

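As a minimal sketch of the Ansible side of this boundary, a top-level playbook could map the responsibilities listed above to roles. The file name, the group names (`app`, `db`), and every role name below are hypothetical and are not taken from the repo. DB hosts receive only the base and hardening roles, because DB software is installed manually.

```yaml
# site.yml (hypothetical file name); group and role names are illustrative assumptions.
- name: Prepare all nodes
  hosts: all
  become: true
  roles:
    - base_packages    # Linux base packages
    - hardening        # security hardening

- name: Prepare app nodes
  hosts: app
  become: true
  roles:
    - docker_engine    # Docker Engine installation
    - swarm            # Docker Swarm init/join
    - act_runner       # Gitea Actions act_runner as a systemd service
    - deploy_prereqs   # shared directories and deploy prerequisites
```

Nothing in this playbook creates Hetzner Cloud resources; it only assumes that the hosts in the inventory already exist.
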
## Environment Topologies

### Test

Minimum topology for the test environment:

| Node | Role | Note |
| --- | --- | --- |
| `iklim-app-01` | Swarm manager + app worker + Gitea runner | CI/CD test deploy runs through this node |
| `iklim-db-01` | DB node | DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD |

Terraform and Ansible take the test DB node only as far as machine and OS preparation. PostgreSQL/MongoDB cluster installation is outside this phase.

### Prod

HA topology for the prod environment:

| Node group | Count | Role |
| --- | ---: | --- |
| `iklim-app-*` | 3 | Each one is a Swarm manager + app worker |
| `iklim-db-*` | 3 | DB cluster nodes |

Prod DB infrastructure will be installed manually; it will not be installed by Gitea CI/CD. Terraform prepares the DB machines and network/firewall rules; Ansible installs OS hardening and base dependencies.

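For orientation, the Ansible inventory that Terraform outputs for this prod topology might have roughly the following shape. This is a sketch under assumptions: the group names `app` and `db` are hypothetical, the `iklim-db-02` and `iklim-db-03` host names are inferred from the `iklim-db-*` pattern, and no addresses are shown because they are only known after Terraform runs.

```yaml
# Hypothetical inventory shape; Terraform would fill in the real connection details.
all:
  children:
    app:
      hosts:
        iklim-app-01:
        iklim-app-02:
        iklim-app-03:
    db:
      hosts:
        iklim-db-01:
        iklim-db-02:
        iklim-db-03:
```
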
## Public Port Policy

Only the following ports are open to the public internet:

- `22/tcp` SSH, only from admin IP/CIDR sources
- `80/tcp` HTTP
- `443/tcp` HTTPS

`8200/tcp` (Vault) will not be opened to the public internet. Vault must be reachable only from the private network or the Docker overlay.

`docker-stack-infra.yml` has been aligned with this policy: only the SWAG service publishes ports 80/443; all other services, such as Vault, APISIX, RabbitMQ, Prometheus, and Grafana, are reachable only through the `iklimco-net` overlay.

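The publishing rule can be illustrated with a stack-file fragment. This is not the actual `docker-stack-infra.yml`; the image references and the way the network is declared are assumptions made for the example.

```yaml
# Illustrative fragment only; not the actual docker-stack-infra.yml.
services:
  swag:
    image: lscr.io/linuxserver/swag   # assumed image reference
    ports:                            # the only service that publishes ports
      - "80:80"
      - "443:443"
    networks:
      - iklimco-net

  vault:
    image: hashicorp/vault            # assumed image reference
    # No "ports:" section: 8200/tcp is reachable only on the overlay.
    networks:
      - iklimco-net

networks:
  iklimco-net:
    driver: overlay
```

A service without a `ports:` section is still reachable by other services on `iklimco-net` under its service name, which is how SWAG can proxy to it without exposing it publicly.
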
## Private Network Policy

The detailed matrix of ports that must be opened inside the private network is in `01-private-network-port-matrisi.md`. Agents must treat that file as the source when writing firewall or Ansible UFW rules.

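As a hedged illustration of what such an Ansible UFW rule could look like, the tasks below open the standard Docker Swarm ports described in the Docker overlay documentation listed under Sources. The port matrix file stays authoritative; `private_cidr` is a hypothetical variable, and the example value in the comment is not taken from this project.

```yaml
# Sketch only: the authoritative port list lives in 01-private-network-port-matrisi.md.
- name: Allow Docker Swarm traffic from the private network
  community.general.ufw:
    rule: allow
    port: "{{ item.port }}"
    proto: "{{ item.proto }}"
    from_ip: "{{ private_cidr }}"    # hypothetical variable, e.g. 10.0.0.0/16
  loop:
    - { port: "2377", proto: tcp }   # cluster management
    - { port: "7946", proto: tcp }   # node-to-node communication
    - { port: "7946", proto: udp }
    - { port: "4789", proto: udp }   # overlay (VXLAN) traffic
```
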
## Gitea Actions Runner Decision

`act_runner` will not run as a Docker container, and the Docker socket will not be mounted into a container.

Preferred installation:

- `act_runner` is installed as a Linux systemd service.
- A separate `gitea-runner` user is created for the runner.
- CI/CD jobs can create containers when needed; for this, the runner host needs Docker CLI/daemon access.
- Because Docker group membership grants near-root privileges, only trusted Gitea repos/jobs should use these runner labels.

For prod HA, `act_runner` will be installed on all 3 Swarm manager nodes rather than on a single machine. This allows pipelines to continue when one manager/runner is lost. Each runner must carry both a shared label and a node-specific label:

- Shared: `prod-runner`
- Node-specific: `iklim-app-01`, `iklim-app-02`, `iklim-app-03`

For test, a single runner is enough:

- Shared: `test-runner`
- Node-specific: `iklim-app-01`

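One way to express these labels is in the runner's `config.yaml`. The fragment below is a sketch for the prod runner on `iklim-app-01`; it assumes the `host` label schema, which matches a systemd installation that runs jobs directly on the machine, and it should be checked against the example config shipped with the installed `act_runner` version.

```yaml
# config.yaml fragment for the runner on iklim-app-01 in prod (illustrative).
runner:
  labels:
    - "prod-runner:host"    # shared label, identical on all three prod runners
    - "iklim-app-01:host"   # node-specific label, differs per node
```

A workflow then targets any available prod runner with `runs-on: prod-runner`, or pins a job to one machine with a node-specific label such as `runs-on: iklim-app-02`.
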
## Deploy Serialization Decision

Because of the 3-runner HA model in prod, multiple deploy jobs could run at the same time. Gitea Actions `concurrency` is used to prevent concurrent deploys; a StorageBox-based lock mechanism is not required.

```yaml
concurrency:
  group: prod-deploy
  cancel-in-progress: false
```

With `cancel-in-progress: false`, Gitea queues a new run in the same group until the previous one finishes; the run appears as "queued" in the UI and is not shown as an error. All prod deploy workflows, including infrastructure and microservices, must use the same `group: prod-deploy` value so that infra and microservice deploys cannot overlap.

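Put together with the shared runner label, a prod deploy workflow might look like the sketch below. The workflow name, trigger branch, and stack name (`iklim-infra`) are hypothetical; only the `concurrency` block, the `prod-runner` label, and the `docker-stack-infra.yml` file name come from this document.

```yaml
# Hypothetical prod deploy workflow; trigger and stack name are assumptions.
name: prod-infra-deploy

on:
  push:
    branches: [main]

concurrency:
  group: prod-deploy            # same value in every prod deploy workflow
  cancel-in-progress: false     # later runs wait instead of cancelling the current one

jobs:
  deploy:
    runs-on: prod-runner        # any of the three prod runners may pick this up
    steps:
      - uses: actions/checkout@v4
      - name: Deploy infrastructure stack
        run: docker stack deploy -c docker-stack-infra.yml iklim-infra
```

Because the group is declared at workflow level, two runs of this workflow, or of any other workflow using `group: prod-deploy`, are serialized regardless of which runner executes them.
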
## Hetzner Physical Host Separation

Hetzner Cloud does not allow selecting a specific cabinet (rack) directly. A `Placement Group` is used instead to keep servers off the same physical host. A placement group of type `spread` aims to place the cloud servers in the group on different physical hosts.

Constraints:

- A spread placement group reduces the impact of a single physical host failure.
- It does not guarantee protection against a wider failure inside the same datacenter or location.
- For location-level disaster recovery, distribution across different locations/regions must be designed later.
- According to Hetzner documentation, a spread placement group can contain at most 10 servers.

At least two placement groups are recommended for prod:

- `iklim-prod-app-spread`: 3 Swarm manager/app nodes
- `iklim-prod-db-spread`: 3 DB nodes

Optional for test:

- `iklim-test-spread`: `iklim-app-01` and `iklim-db-01`

Sources:

- Hetzner Terraform provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest
- Hetzner Networks: https://docs.hetzner.com/cloud/networks/overview/
- Hetzner Firewalls: https://docs.hetzner.com/cloud/firewalls/overview
- Hetzner Placement Groups: https://docs.hetzner.com/cloud/placement-groups/overview
- Docker Swarm overlay ports: https://docs.docker.com/engine/network/drivers/overlay/
- Gitea act_runner: https://docs.gitea.com/usage/actions/act-runner