# 06 - Prod Terraform IaC
The purpose of this phase is to create HA-focused IaaS resources inside the prod Hetzner Cloud Project with Terraform. This document can be given to the prod Terraform agent on its own.
## Scope
Terraform creates the following in the prod environment:

- Private network: `iklim-prod-net`
  - Subnets:
    - App/Swarm subnet: `10.20.10.0/24`
    - DB subnet: `10.20.20.0/24`
- Firewall:
  - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
  - Private ingress: prod rules in `01-private-network-port-matrisi.md`
- SSH key
- Placement groups: `iklim-prod-app-spread`, `iklim-prod-db-spread`
- Floating IP: stable IPv4 for the app entry point, assigned to `iklim-app-01`
- Servers: `iklim-app-01`, `iklim-app-02`, `iklim-app-03`, `iklim-db-01`, `iklim-db-02`, `iklim-db-03`
- Ansible inventory output
DB cluster software will not be installed with Terraform. DB nodes will be prepared only at the machine, network, and firewall level.
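Below is a minimal sketch of `network.tf` for the network and subnets listed above. Several values are assumptions and do not come from the repository: the resource labels (`prod`, `app`, `db`), the parent range `10.20.0.0/16`, and the `eu-central` network zone (the zone that contains `fsn1`).

```hcl
# network.tf (sketch): private network and the two prod subnets.

resource "hcloud_network" "prod" {
  name     = "iklim-prod-net"
  ip_range = "10.20.0.0/16" # assumed parent range; must contain both subnets
}

# App/Swarm subnet: iklim-app-01..03 use 10.20.10.11-13.
resource "hcloud_network_subnet" "app" {
  network_id   = hcloud_network.prod.id
  type         = "cloud"
  network_zone = "eu-central" # assumed: the network zone containing fsn1
  ip_range     = "10.20.10.0/24"
}

# DB subnet: iklim-db-01..03 use 10.20.20.11-13.
resource "hcloud_network_subnet" "db" {
  network_id   = hcloud_network.prod.id
  type         = "cloud"
  network_zone = "eu-central"
  ip_range     = "10.20.20.0/24"
}
```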
## Version Requirements

- Terraform `>= 1.6`
- hcloud provider `~> 1.49`
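These constraints would normally go in `versions.tf`, and `providers.tf` would wire in the token. The sketch below shows both files. It assumes the provider comes from the `hetznercloud/hcloud` registry address.

```hcl
# versions.tf (sketch)
terraform {
  required_version = ">= 1.6"

  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.49"
    }
  }
}

# providers.tf (sketch): the token comes from terraform.tfvars and must
# belong to the prod Hetzner Cloud project.
provider "hcloud" {
  token = var.hcloud_token
}
```

`terraform init` resolves the `~> 1.49` constraint and writes `.terraform.lock.hcl` to pin the exact provider version. That lock file is meant to be committed.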
## Recommended File Structure

```text
terraform/
  hetzner/
    prod/
      versions.tf
      providers.tf
      variables.tf
      locals.tf
      network.tf
      firewall.tf
      placement.tf
      servers.tf
      floating_ip.tf
      outputs.tf
      terraform.tfvars.example
```

`terraform.tfvars`, state files, and tokens are not committed to the repo.
## Variables

The environment constant lives in `locals.tf`; it is not overridden with tfvars.

Minimum variables (`terraform.tfvars`):

```hcl
hcloud_token              = "secret"
location                  = "fsn1"
image                     = "rocky-10"
server_type_app           = "cpx42"
server_type_db            = "cpx32"
admin_ssh_public_key_path = "~/.ssh/id_rsa.pub"
admin_allowed_cidrs       = ["X.X.X.X/32"]
```
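Below is a possible `variables.tf` that matches the tfvars above. The descriptions and the CIDR validation rule are illustrative additions and do not come from the repository. Setting `sensitive = true` keeps the token out of plan output.

```hcl
# variables.tf (sketch)

variable "hcloud_token" {
  description = "API token of the prod Hetzner Cloud project"
  type        = string
  sensitive   = true
}

variable "location" {
  description = "Hetzner location for all prod resources"
  type        = string
}

variable "image" {
  description = "OS image for all servers"
  type        = string
}

variable "server_type_app" {
  description = "Server type for iklim-app-* nodes"
  type        = string
}

variable "server_type_db" {
  description = "Server type for iklim-db-* nodes"
  type        = string
}

variable "admin_ssh_public_key_path" {
  description = "Path to the admin SSH public key uploaded to Hetzner"
  type        = string
}

variable "admin_allowed_cidrs" {
  description = "CIDRs allowed to reach 22/tcp on all prod nodes"
  type        = list(string)

  # Illustrative check: at least one entry, and every entry parses as a CIDR.
  validation {
    condition     = length(var.admin_allowed_cidrs) > 0 && alltrue([for c in var.admin_allowed_cidrs : can(cidrhost(c, 0))])
    error_message = "admin_allowed_cidrs must contain at least one valid CIDR."
  }
}
```

With this rule, the placeholder `X.X.X.X/32` is rejected at plan time, so a real admin CIDR has to be filled in.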
The server types were chosen based on the current test environment metrics in `../hetzner-sizing-report.md` and the prod cluster topology. `cpx42` is recommended for prod app nodes because of Java microservice memory pressure. The more economical `cpx32` is recommended for prod DB nodes because the cluster starts with 3 nodes. Once metrics confirm the real capacity needs, nodes can be added or resized in place.
## Server Roles and Private IP Plan

| Server | Private IP | Role |
|---|---|---|
| iklim-app-01 | 10.20.10.11 | Swarm manager + app worker + runner; primary, receives the FIP |
| iklim-app-02 | 10.20.10.12 | Swarm manager + app worker + runner |
| iklim-app-03 | 10.20.10.13 | Swarm manager + app worker + runner |
| iklim-db-01 | 10.20.20.11 | Manual DB cluster node |
| iklim-db-02 | 10.20.20.12 | Manual DB cluster node |
| iklim-db-03 | 10.20.20.13 | Manual DB cluster node |

Private IPs are statically defined in `locals.tf` as the `app_private_ips` and `db_private_ips` maps. The server list is derived from these maps with `for_each`.
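The sketch below shows those maps. It assumes they are keyed by server name, which matches the `hcloud_server.app["iklim-app-01"]` address used in the Server Deletion section.

```hcl
# locals.tf (sketch)
locals {
  # Environment constant; intentionally not exposed as a tfvars variable.
  environment = "prod"

  # Server name => static private IP. Server resources use for_each over
  # these maps, so the map keys become the server names.
  app_private_ips = {
    "iklim-app-01" = "10.20.10.11"
    "iklim-app-02" = "10.20.10.12"
    "iklim-app-03" = "10.20.10.13"
  }

  db_private_ips = {
    "iklim-db-01" = "10.20.20.11"
    "iklim-db-02" = "10.20.20.12"
    "iklim-db-03" = "10.20.20.13"
  }
}
```

With `for_each`, Terraform addresses each server by its map key. Adding a node means adding one entry, and the existing servers stay untouched. A `count`-based list is different: removing an element can shift indexes and plan replacements for unrelated servers.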
## Recommended Resources and Cost

| Server | Role | Server Type | vCPU | RAM | SSD | Monthly |
|---|---|---|---|---|---|---|
| iklim-app-01 | Swarm manager + app worker + runner | cpx42 | 8 (AMD) | 16 GB | 320 GB | $29.99 |
| iklim-app-02 | Swarm manager + app worker + runner | cpx42 | 8 (AMD) | 16 GB | 320 GB | $29.99 |
| iklim-app-03 | Swarm manager + app worker + runner | cpx42 | 8 (AMD) | 16 GB | 320 GB | $29.99 |
| iklim-db-01 | DB cluster node | cpx32 | 4 (AMD) | 8 GB | 160 GB | $16.49 |
| iklim-db-02 | DB cluster node | cpx32 | 4 (AMD) | 8 GB | 160 GB | $16.49 |
| iklim-db-03 | DB cluster node | cpx32 | 4 (AMD) | 8 GB | 160 GB | $16.49 |
| Total | 6 servers | | 36 | 72 GB | 1,440 GB | $139.44 |
## Placement Group Decision

Two separate spread placement groups for prod:

- `iklim-prod-app-spread`: `iklim-app-01/02/03`
- `iklim-prod-db-spread`: `iklim-db-01/02/03`
This aims to place Swarm quorum nodes on different physical hosts from each other, and DB nodes on different physical hosts from each other.
Notes:
- Hetzner does not provide direct cabinet selection.
- A spread placement group targets different physical hosts.
- Disaster recovery across different locations/regions is outside the scope of this phase.
- Multi-location DR must be designed separately later when scale grows.
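A minimal `placement.tf` sketch with one spread group per tier. The resource labels `app` and `db` are assumptions.

```hcl
# placement.tf (sketch): one spread placement group per tier.
resource "hcloud_placement_group" "app" {
  name = "iklim-prod-app-spread"
  type = "spread"
}

resource "hcloud_placement_group" "db" {
  name = "iklim-prod-db-spread"
  type = "spread"
}
```

Each server then references its group through `placement_group_id`. As the recreation table in the Lifecycle section notes, that value is set at creation and should not be changed afterwards.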
## Floating IP

An IPv4 floating IP named `iklim-prod-app-fip` is created and assigned to `iklim-app-01`. The DNS A record points to this IP. If failover is needed, the floating IP can be moved to another app node.
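Below is a sketch of `floating_ip.tf`. It assumes the app servers are declared as `hcloud_server.app` with `for_each` over server names. It uses a separate `hcloud_floating_ip_assignment` resource so that moving the IP only touches the assignment. This is one possible design; the source does not confirm it.

```hcl
# floating_ip.tf (sketch)
resource "hcloud_floating_ip" "app" {
  name          = "iklim-prod-app-fip"
  type          = "ipv4"
  home_location = var.location
  description   = "Stable IPv4 entry point for the prod app nodes"
}

# Failover: change the key (for example to "iklim-app-02") and apply.
resource "hcloud_floating_ip_assignment" "app" {
  floating_ip_id = hcloud_floating_ip.app.id
  server_id      = hcloud_server.app["iklim-app-01"].id
}
```

Suppose the IP is moved outside Terraform during an incident, for example from the Hetzner Console. The next `terraform plan` would then propose moving it back, unless the key above is updated as well.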
## Public Firewall

Public ingress:

| Port | Source | Target |
|---|---|---|
| 22/tcp | `admin_allowed_cidrs` | All prod nodes |
| 80/tcp | `0.0.0.0/0`, `::/0` | `iklim-app-*` through the Floating IP |
| 443/tcp | `0.0.0.0/0`, `::/0` | `iklim-app-*` through the Floating IP |

The following ports are not opened publicly in prod:

| Port | Service |
|---|---|
| 8200/tcp | Vault |
| 5432/tcp | PostgreSQL |
| 27017/tcp | MongoDB |
| 5672/tcp, 15672/tcp, 61613/tcp, 15674/tcp | RabbitMQ |
| 2377/tcp, 7946/tcp, 7946/udp, 4789/udp | Docker Swarm |
| 9180/tcp | APISIX Admin API |
| 9090/tcp | Prometheus |
| 3000/tcp | Grafana |
## Private Firewall

### App (Swarm) Firewall: Private Ingress

Source: app subnet (`10.20.10.0/24`):

| Port | Service | Access method |
|---|---|---|
| 2377/tcp | Docker Swarm control plane | From app subnet |
| 7946/tcp,udp | Docker Swarm node discovery | From app subnet |
| 4789/udp | Docker Swarm VXLAN overlay | From app subnet |
| 8200/tcp | Vault | Docker overlay / private network |
| 5672/tcp | RabbitMQ AMQP | From app subnet |
| 61613/tcp | RabbitMQ STOMP | From app subnet |
| 15674/tcp | RabbitMQ Web STOMP | From app subnet |
| 15672/tcp | RabbitMQ Management | Behind SWAG 443, IP-restricted |
| 9000/tcp | APISIX Dashboard | Behind SWAG 443, IP-restricted |
| 9180/tcp | APISIX Admin API | Accessed only by the Dashboard over the Docker overlay |
| 9090/tcp | Prometheus | Behind SWAG 443, IP-restricted |
| 3000/tcp | Grafana | Behind SWAG 443, IP-restricted |

Source: DB subnet (`10.20.20.0/24`), because the `iklim-db-*` nodes join Swarm as workers:

| Port | Service | Source |
|---|---|---|
| 2377/tcp | Docker Swarm control plane | 10.20.20.0/24 |
| 7946/tcp,udp | Docker Swarm node discovery | 10.20.20.0/24 |
| 4789/udp | Docker Swarm VXLAN overlay | 10.20.20.0/24 |
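The sketch below shows one way the app firewall could encode the public ingress table and both private ingress tables using `dynamic "rule"` blocks. The firewall name `iklim-prod-app-fw` and the two local lists are hypothetical. Each table row becomes one inbound rule. The SWAG-fronted ports are still opened to the app subnet here, because their IP restriction is enforced in SWAG rather than in the Hetzner firewall.

```hcl
# firewall.tf, app part (sketch)
locals {
  # Hypothetical helper: Swarm ports, opened to both prod subnets.
  app_fw_swarm_ports = [
    { protocol = "tcp", port = "2377" },
    { protocol = "tcp", port = "7946" },
    { protocol = "udp", port = "7946" },
    { protocol = "udp", port = "4789" },
  ]

  # Hypothetical helper: service ports opened to the app subnet only.
  app_fw_service_ports = ["8200", "5672", "61613", "15674", "15672", "9000", "9180", "9090", "3000"]
}

resource "hcloud_firewall" "app" {
  name = "iklim-prod-app-fw" # assumed name

  # Public: SSH from the admin CIDRs only.
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = var.admin_allowed_cidrs
  }

  # Public: HTTP and HTTPS from anywhere.
  dynamic "rule" {
    for_each = ["80", "443"]
    content {
      direction  = "in"
      protocol   = "tcp"
      port       = rule.value
      source_ips = ["0.0.0.0/0", "::/0"]
    }
  }

  # Private: Swarm control plane, discovery, and overlay from app + DB subnets.
  dynamic "rule" {
    for_each = local.app_fw_swarm_ports
    content {
      direction  = "in"
      protocol   = rule.value.protocol
      port       = rule.value.port
      source_ips = ["10.20.10.0/24", "10.20.20.0/24"]
    }
  }

  # Private: Vault, RabbitMQ, APISIX, Prometheus, Grafana from the app subnet.
  dynamic "rule" {
    for_each = local.app_fw_service_ports
    content {
      direction  = "in"
      protocol   = "tcp"
      port       = rule.value
      source_ips = ["10.20.10.0/24"]
    }
  }
}
```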
### DB Firewall: Private Ingress

Admin access:

| Port | Service | Source |
|---|---|---|
| 22/tcp | SSH | `admin_allowed_cidrs` |

Source: app subnet (`10.20.10.0/24`):

| Port | Service | Note |
|---|---|---|
| 5432/tcp | PostgreSQL (Patroni primary) | App subnet access |
| 27017/tcp | MongoDB replica set endpoint | App subnet access |
| 2379/tcp | etcd client (Patroni + APISIX) | App subnet access |
| 2377/tcp | Docker Swarm control plane | From app subnet |
| 7946/tcp,udp | Docker Swarm node discovery | From app subnet |
| 4789/udp | Docker Swarm VXLAN overlay | From app subnet |

Mutual access inside the DB subnet (`10.20.20.0/24`):

| Port | Service | Note |
|---|---|---|
| 5432/tcp | PostgreSQL Patroni replication | Between DB nodes |
| 27017/tcp | MongoDB replica set internal | Between DB nodes |
| 2379/tcp | etcd client | Patroni -> etcd access |
| 2380/tcp | etcd peer | etcd cluster internal |
| 8008/tcp | Patroni REST API | Patroni leader election and health check |
For services behind SWAG, the IP restriction is enforced in the SWAG nginx configuration, not in the Hetzner firewall.
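The DB tables can be written the same way. In this sketch the firewall name is hypothetical. Ports listed for both the app subnet and the DB subnet are merged into one rule with two source CIDRs. The etcd peer port (`2380`) and the Patroni REST port (`8008`) stay restricted to the DB subnet.

```hcl
# firewall.tf, DB part (sketch)
resource "hcloud_firewall" "db" {
  name = "iklim-prod-db-fw" # assumed name

  # Admin SSH.
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = var.admin_allowed_cidrs
  }

  # PostgreSQL, MongoDB, etcd client: app subnet and DB subnet.
  dynamic "rule" {
    for_each = ["5432", "27017", "2379"]
    content {
      direction  = "in"
      protocol   = "tcp"
      port       = rule.value
      source_ips = ["10.20.10.0/24", "10.20.20.0/24"]
    }
  }

  # etcd peer and Patroni REST API: DB subnet only.
  dynamic "rule" {
    for_each = ["2380", "8008"]
    content {
      direction  = "in"
      protocol   = "tcp"
      port       = rule.value
      source_ips = ["10.20.20.0/24"]
    }
  }

  # Swarm traffic from the app subnet (the managers live there).
  dynamic "rule" {
    for_each = [
      { protocol = "tcp", port = "2377" },
      { protocol = "tcp", port = "7946" },
      { protocol = "udp", port = "7946" },
      { protocol = "udp", port = "4789" },
    ]
    content {
      direction  = "in"
      protocol   = rule.value.protocol
      port       = rule.value.port
      source_ips = ["10.20.10.0/24"]
    }
  }
}
```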
## Outputs

The following values are available after `terraform apply`, or at any time with `terraform output`:

| Output | Description |
|---|---|
| `ansible_inventory_yaml` | Ansible inventory YAML; written to `ansible/prod/inventory/generated/prod.yml` |
| `prod_private_ips` | Private IP map of all nodes, with `app` and `db` subkeys |
| `prod_public_ips` | Public IPv4 map of all nodes |
| `prod_floating_ip` | Floating IP address of the Swarm entry point; the DNS A record points to this IP |

To extract the Ansible inventory:

```shell
terraform output -raw ansible_inventory_yaml > \
  ../../../ansible/prod/inventory/generated/prod.yml
```
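Below is a possible `outputs.tf` that produces these four values. The inventory layout is a hypothetical shape, not the repository's actual format: it uses groups `app` and `db` with host variables `ansible_host` and `private_ip`. The sketch also assumes `hcloud_server.app` and `hcloud_server.db` iterate with `for_each` over the private IP maps, and that the floating IP resource is named `hcloud_floating_ip.app`.

```hcl
# outputs.tf (sketch)
output "prod_private_ips" {
  description = "Private IP map of all nodes, with app and db subkeys"
  value = {
    app = local.app_private_ips
    db  = local.db_private_ips
  }
}

output "prod_public_ips" {
  description = "Public IPv4 map of all nodes"
  value = merge(
    { for name, s in hcloud_server.app : name => s.ipv4_address },
    { for name, s in hcloud_server.db : name => s.ipv4_address }
  )
}

output "prod_floating_ip" {
  description = "Floating IP of the Swarm entry point"
  value       = hcloud_floating_ip.app.ip_address
}

output "ansible_inventory_yaml" {
  description = "Ansible inventory in YAML form (hypothetical layout)"
  value = yamlencode({
    all = {
      children = {
        app = {
          hosts = {
            for name, ip in local.app_private_ips : name => {
              ansible_host = hcloud_server.app[name].ipv4_address
              private_ip   = ip
            }
          }
        }
        db = {
          hosts = {
            for name, ip in local.db_private_ips : name => {
              ansible_host = hcloud_server.db[name].ipv4_address
              private_ip   = ip
            }
          }
        }
      }
    }
  })
}
```

Because `yamlencode` returns a string, `terraform output -raw` prints it without surrounding quotes. That is what allows the extraction command to redirect it directly into a YAML file.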
## Lifecycle and Resize Policy

### `server_type` Change (Resize)

Changing `server_type` does not trigger a Terraform destroy+create. The hcloud provider handles it natively: it stops the server, calls the Hetzner resize API, and starts the server again. Update the value in `terraform.tfvars` and run `terraform apply`.

The stop/start causes downtime, but the disk, installed software, and Docker volumes are preserved. No `ignore_changes` or manual step is required.
### Which Changes Force Server Recreation?

| Changed field / resource | Behavior | Note |
|---|---|---|
| `server_type` | In-place resize (provider native) | `terraform apply` is enough |
| `hcloud_server_network` | Only the attachment is updated | A separate resource is used |
| `hcloud_firewall_attachment` | Only the attachment is updated | A separate resource is used |
| `placement_group_id` | The Hetzner API does not allow changing it -> destroy+create | Do not change |
| `image` | Disk image changes -> destroy+create | Do not change |
| `location` | Cannot be moved to another datacenter -> destroy+create | Do not change |
### Network and Firewall Attachment Separation

The `network` block and `firewall_ids` are not embedded in `hcloud_server`. Instead, separate resources are defined:

- `hcloud_server_network`: private IP assignment, one per node via `for_each`
- `hcloud_firewall_attachment`: firewall relationship, using the server list derived with `for_each`
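The sketch below covers the app half of `servers.tf` with this split. The DB half would mirror it, using `local.db_private_ips`, `var.server_type_db`, and the DB placement group, subnet, and firewall. The sketch assumes resources labelled `hcloud_placement_group.app`, `hcloud_network_subnet.app`, and `hcloud_firewall.app` exist elsewhere in the configuration. Those labels and the SSH key name are assumptions.

```hcl
# servers.tf, app part (sketch)
resource "hcloud_ssh_key" "admin" {
  name = "iklim-prod-admin" # assumed name
  # pathexpand resolves the "~" in the tfvars path before reading the file.
  public_key = file(pathexpand(var.admin_ssh_public_key_path))
}

resource "hcloud_server" "app" {
  for_each = local.app_private_ips

  name               = each.key
  server_type        = var.server_type_app
  image              = var.image
  location           = var.location
  ssh_keys           = [hcloud_ssh_key.admin.id]
  placement_group_id = hcloud_placement_group.app.id
  labels = {
    env  = local.environment
    role = "app"
  }

  # No network block and no firewall_ids here: both are separate resources.

  lifecycle {
    prevent_destroy = true
  }
}

# Static private IP per node, taken from the locals map.
resource "hcloud_server_network" "app" {
  for_each = local.app_private_ips

  server_id = hcloud_server.app[each.key].id
  subnet_id = hcloud_network_subnet.app.id
  ip        = each.value
}

# One attachment covering every app server.
resource "hcloud_firewall_attachment" "app" {
  firewall_id = hcloud_firewall.app.id
  server_ids  = [for s in hcloud_server.app : s.id]
}
```

The network and firewall links live in their own resources. Changing them therefore updates only those resources and never plans a server replacement.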
### `prevent_destroy` Protection

Each server has `lifecycle { prevent_destroy = true }`. To intentionally delete a server, first temporarily remove the lifecycle block.
## How to Run

### Preparation

1. Create the tfvars file once:

   ```shell
   cd Environment_Infrastructure/terraform/hetzner/prod
   cp terraform.tfvars.example terraform.tfvars
   # Fill terraform.tfvars with real values
   # (hcloud_token, admin_allowed_cidrs, etc.)
   ```

   `terraform.tfvars` is not committed; it is excluded via `.gitignore`.

2. Install the provider once:

   ```shell
   terraform init
   ```
### First Apply

```shell
# Show what will be created; makes no changes
terraform plan

# Approve and create
terraform apply
```

After apply, 6 servers, 2 firewalls, 1 floating IP, and the network resources are visible in Hetzner.
### Get Ansible Inventory

```shell
terraform output -raw ansible_inventory_yaml > \
  ../../../ansible/prod/inventory/generated/prod.yml
```
### Gitea Variable: PROD_FLOATING_IP

The deploy pipeline needs this variable to manage DNS records automatically. Set it once after `terraform apply`. The `-raw` flag prints the bare address without quotes:

```shell
terraform output -raw prod_floating_ip
```

Add the resulting IP address in Gitea -> project settings -> Variables under the name `PROD_FLOATING_IP`. The pipeline reads it via `vars.PROD_FLOATING_IP` and updates the GoDaddy A records idempotently.
### Resize (Change Server Type)

Change `server_type_app` or `server_type_db` in `terraform.tfvars`, then run:

```shell
terraform apply
```

The server is stopped, the Hetzner resize API is called, and the server is started again. The disk and Docker volumes are preserved, but there is downtime.
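### Server Deletion (Forced)

Because of `prevent_destroy = true`, a normal `terraform destroy` fails. First, temporarily remove (comment out) the lifecycle block in `servers.tf`:

```hcl
# lifecycle {
#   prevent_destroy = true
# }
```

Then run the command below. The single quotes stop the shell from stripping the inner double quotes of the resource address:

```shell
terraform destroy -target='hcloud_server.app["iklim-app-01"]'
```

After the operation is complete, add the lifecycle block back. Unless the server's entry is also removed from `app_private_ips`, the next `terraform apply` will create it again.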
## State Management

Local state (`terraform.tfstate`) is used for now, and the state file is not committed to the repo. Once more than one person runs Terraform, remote state in Hetzner Object Storage or HCP Terraform must be used.
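For the Hetzner Object Storage option, a minimal S3-backend sketch is shown below. The bucket and key names are hypothetical, and the endpoint assumes a bucket in `fsn1`. The `skip_*` flags and `use_path_style` are options the Terraform S3 backend provides for S3-compatible services; check them against the Terraform version in use. State locking is not covered here.

```hcl
# backend.tf (sketch; not part of the current file layout)
terraform {
  backend "s3" {
    bucket = "iklim-prod-tfstate" # hypothetical bucket name
    key    = "hetzner/prod/terraform.tfstate"
    region = "fsn1" # free-form label; AWS region validation is skipped below

    endpoints = {
      s3 = "https://fsn1.your-objectstorage.com" # assumed fsn1 endpoint
    }

    use_path_style              = true
    skip_credentials_validation = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
    skip_metadata_api_check     = true
  }
}
```

Credentials would come from `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in the environment rather than from this file. `terraform init -migrate-state` then copies the existing local state into the bucket.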
Acceptance Criteria
terraform planworks only with the prod Hetzner Project token.- 6 servers are created:
iklim-app-01/02/03,iklim-db-01/02/03. - Swarm nodes are inside the
iklim-prod-app-spreadplacement group. - DB nodes are inside the
iklim-prod-db-spreadplacement group. - Public firewall allows only
22,80, and443ingress. - Private firewall is compatible with
01-private-network-port-matrisi.md. - DB replication ports are accessible only from the DB subnet.
- Floating IP is created and assigned to
iklim-app-01. - Terraform state and secret tfvars are not committed.