# 06 - Prod Terraform IaC

The purpose of this phase is to create HA-focused IaaS resources inside the prod Hetzner Cloud Project with Terraform. This document can be given to the prod Terraform agent on its own.

## Scope

Terraform creates the following in the prod environment:

- Private network: `iklim-prod-net`
- Subnets:
  - App/Swarm subnet: `10.20.10.0/24`
  - DB subnet: `10.20.20.0/24`
- Firewall:
  - Public ingress: only `22/tcp`, `80/tcp`, `443/tcp`
  - Private ingress: prod rules in `01-private-network-port-matrix.md`
- SSH key
- Placement groups:
  - `iklim-prod-app-spread`
  - `iklim-prod-db-spread`
- Floating IP: stable IPv4 for the app entry point, assigned to `iklim-app-01`
- Servers:
  - `iklim-app-01`
  - `iklim-app-02`
  - `iklim-app-03`
  - `iklim-db-01`
  - `iklim-db-02`
  - `iklim-db-03`
- Ansible inventory output

DB cluster software will not be installed with Terraform. DB nodes will be prepared only at the machine, network, and firewall level.

## Version Requirements

```text
Terraform >= 1.6
hcloud provider ~> 1.49
```

## Recommended File Structure

```text
terraform/
  hetzner/
    prod/
      versions.tf
      providers.tf
      variables.tf
      locals.tf
      network.tf
      firewall.tf
      placement.tf
      servers.tf
      floating_ip.tf
      outputs.tf
      terraform.tfvars.example
```

`terraform.tfvars`, state files, and tokens will not be committed to the repo.

## Variables

The `environment` constant is in `locals.tf`; it is not overridden with `tfvars`.

Minimum variables:

```hcl
hcloud_token              = "secret"
location                  = "fsn1"
image                     = "rocky-10"
server_type_app           = "cpx42"
server_type_db            = "cpx32"
admin_ssh_public_key_path = "~/.ssh/id_rsa.pub"
admin_allowed_cidrs       = ["X.X.X.X/32"]
```

The server type decision was made by considering the current test environment metrics in `../hetzner-sizing-report.md` and the prod cluster topology. `cpx42` is recommended for prod app nodes because of Java microservice memory pressure, and the more economical `cpx32` is recommended for prod DB nodes because the cluster starts with 3 nodes.
When capacity needs are validated with metrics, nodes can be added or an in-place resize can be performed.

## Server Roles and Private IP Plan

| Server | Private IP | Role |
| --- | --- | --- |
| `iklim-app-01` | `10.20.10.11` | Swarm manager + app worker + runner; primary, receives FIP |
| `iklim-app-02` | `10.20.10.12` | Swarm manager + app worker + runner |
| `iklim-app-03` | `10.20.10.13` | Swarm manager + app worker + runner |
| `iklim-db-01` | `10.20.20.11` | Manual DB cluster node |
| `iklim-db-02` | `10.20.20.12` | Manual DB cluster node |
| `iklim-db-03` | `10.20.20.13` | Manual DB cluster node |

Private IPs are statically defined in `locals.tf` as the `app_private_ips` and `db_private_ips` maps. The server list is derived from these maps with `for_each`.

## Recommended Resources and Cost

| Server | Role | Server Type | CPU | RAM | SSD | Monthly |
| --- | --- | --- | ---: | ---: | ---: | ---: |
| `iklim-app-01` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
| `iklim-app-02` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
| `iklim-app-03` | Swarm manager + app worker + runner | `cpx42` | 8 AMD | 16 GB | 320 GB | $29.99 |
| `iklim-db-01` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| `iklim-db-02` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| `iklim-db-03` | DB cluster node | `cpx32` | 4 AMD | 8 GB | 160 GB | $16.49 |
| **Total** | 6 servers | | **36 vCPU** | **72 GB** | **1,440 GB** | **$139.44** |

## Placement Group Decision

Prod uses two separate spread placement groups:

```text
iklim-prod-app-spread: iklim-app-01/02/03
iklim-prod-db-spread: iklim-db-01/02/03
```

The goal is to place the Swarm quorum nodes on physical hosts different from each other, and likewise the DB nodes.

Notes:

- Hetzner does not provide direct cabinet selection.
- A spread placement group targets different physical hosts.
- Disaster recovery across different locations/regions is outside the scope of this phase.
- Multi-location DR must be designed separately later when scale grows.

## Floating IP

An IPv4 floating IP named `iklim-prod-app-fip` is created and assigned to `iklim-app-01`. The DNS A record is pointed to this IP. If failover is needed, the floating IP can be moved to another app node.

## Public Firewall

Public ingress:

| Port | Source | Target |
| --- | --- | --- |
| `22/tcp` | `admin_allowed_cidrs` | All prod nodes |
| `80/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-*` through Floating IP |
| `443/tcp` | `0.0.0.0/0`, `::/0` | `iklim-app-*` through Floating IP |

The following ports will not be opened publicly in prod:

- `8200/tcp` Vault
- `5432/tcp` PostgreSQL
- `27017/tcp` MongoDB
- `5672/tcp`, `15672/tcp`, `61613/tcp`, `15674/tcp` RabbitMQ
- `2377/tcp`, `7946/tcp`, `7946/udp`, `4789/udp` Docker Swarm
- `9180/tcp` APISIX Admin API
- `9090/tcp` Prometheus
- `3000/tcp` Grafana

## Private Firewall

Firewall placement follows the Swarm placement model:

- DB/cluster services on `iklim-db-*` nodes: Patroni/PostgreSQL, MongoDB, and etcd.
- App/service-node infrastructure on `iklim-app-*` nodes: Vault, RabbitMQ, APISIX, Prometheus, Grafana, SWAG, and the Redis/Sentinel services from `docker-stack-infra_db-prod.yml`.

RabbitMQ ports are therefore documented under the app firewall. Redis and Redis Sentinel do not publish host-mode ports in the current prod stack; they stay on the Docker overlay network and do not need Hetzner firewall openings.

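
As a reference point, the public ingress table above could be expressed with the `hcloud_firewall` resource roughly as follows. This is a minimal sketch, not the project's actual `firewall.tf`: the resource label `public_example` and the firewall name are hypothetical, and it assumes `admin_allowed_cidrs` is declared as a list of strings in `variables.tf`. How the public rules are split across the prod firewalls is not specified here.

```hcl
# Sketch only: resource label and firewall name are hypothetical.
resource "hcloud_firewall" "public_example" {
  name = "iklim-prod-public-example"

  # SSH is reachable only from the admin CIDRs.
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = var.admin_allowed_cidrs
  }

  # HTTP and HTTPS are open to the internet (IPv4 and IPv6).
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "80"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "443"
    source_ips = ["0.0.0.0/0", "::/0"]
  }
}
```

Because a Hetzner firewall drops inbound traffic that matches no rule, the ports listed as "not opened publicly" need no explicit deny rules in such a sketch.
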
### App (swarm) Firewall - Private Ingress

Source from app subnet (`10.20.10.0/24`):

| Port | Service | Access method |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | From app subnet |
| `7946/tcp,udp` | Docker Swarm node discovery | From app subnet |
| `4789/udp` | Docker Swarm VXLAN overlay | From app subnet |
| `8200/tcp` | Vault | Docker overlay / private network |
| `5672/tcp` | RabbitMQ AMQP | From app subnet |
| `61613/tcp` | RabbitMQ STOMP | From app subnet |
| `15674/tcp` | RabbitMQ Web STOMP | From app subnet |
| `15672/tcp` | RabbitMQ Management | Behind SWAG `443`, IP restricted |
| `9000/tcp` | APISIX Dashboard | Behind SWAG `443`, IP restricted |
| `9180/tcp` | APISIX Admin API | Only Dashboard accesses it from Docker overlay |
| `9090/tcp` | Prometheus | Behind SWAG `443`, IP restricted |
| `3000/tcp` | Grafana | Behind SWAG `443`, IP restricted |

Source from DB subnet, because `iklim-db-*` nodes join Swarm as workers:

| Port | Service | Source |
| --- | --- | --- |
| `2377/tcp` | Docker Swarm control plane | `10.20.20.0/24` |
| `7946/tcp,udp` | Docker Swarm node discovery | `10.20.20.0/24` |
| `4789/udp` | Docker Swarm VXLAN overlay | `10.20.20.0/24` |

### DB Firewall - Private Ingress

Admin access:

| Port | Service | Source |
| --- | --- | --- |
| `22/tcp` | SSH | `admin_allowed_cidrs` |

Source from app subnet (`10.20.10.0/24`):

| Port | Service | Note |
| --- | --- | --- |
| `5432/tcp` | PostgreSQL (Patroni primary) | App subnet access |
| `27017/tcp` | MongoDB replica set endpoint | App subnet access |
| `2379/tcp` | etcd client (Patroni + APISIX) | App subnet access |
| `2377/tcp` | Docker Swarm control plane | From app subnet |
| `7946/tcp,udp` | Docker Swarm node discovery | From app subnet |
| `4789/udp` | Docker Swarm VXLAN overlay | From app subnet |

Mutual access inside the DB subnet (`10.20.20.0/24`):

| Port | Service | Note |
| --- | --- | --- |
| `5432/tcp` | PostgreSQL Patroni replication | Between DB nodes |
| `27017/tcp` | MongoDB replica set internal | Between DB nodes |
| `2379/tcp` | etcd client | Patroni -> etcd access |
| `2380/tcp` | etcd peer | etcd cluster internal |
| `8008/tcp` | Patroni REST API | Patroni leader election and health check |

For the services marked "IP restricted", the restriction is enforced in the SWAG nginx configuration, not in the Hetzner firewall.

## Outputs

The following values are printed at the end of `terraform apply` and can be read again at any time with `terraform output`:

| Output | Description |
| --- | --- |
| `ansible_inventory_yaml` | Ansible inventory YAML, written to `ansible/prod/inventory/generated/prod.yml` |
| `prod_private_ips` | Private IP map of all nodes, with `app` and `db` subkeys |
| `prod_public_ips` | Public IPv4 map of all nodes |
| `prod_floating_ip` | Floating IP address for the Swarm entry point; DNS A record points to this IP |

To extract the Ansible inventory:

```bash
terraform output -raw ansible_inventory_yaml > \
  ../../../ansible/prod/inventory/generated/prod.yml
```

## Lifecycle and Resize Policy

### `server_type` Change (Resize)

Changing `server_type` does **not** trigger Terraform destroy+create. The `hcloud` provider supports this natively: it stops the server, calls the Hetzner Resize API, and starts it again. Update the value in `terraform.tfvars` and run `terraform apply`. There is downtime, because the server stops and starts, but disk, installed software, and Docker volumes are preserved. No `ignore_changes` or manual step is required.

### Which Changes Force Server Recreation?

| Changed field | Behavior | Note |
| --- | --- | --- |
| `server_type` | In-place resize (provider native) | `terraform apply` is enough |
| `hcloud_server_network` | Only attachment is updated | Because a separate resource is used |
| `hcloud_firewall_attachment` | Only attachment is updated | Because a separate resource is used |
| `placement_group_id` | Hetzner API does not allow changing it -> destroy+create | Do not change |
| `image` | Disk image changes -> destroy+create | Do not change |
| `location` | Cannot be moved to another datacenter -> destroy+create | Do not change |

### Network and Firewall Attachment Separation

The `network` block and `firewall_ids` are not embedded inside `hcloud_server`. Instead, separate resources are defined:

- `hcloud_server_network`: private IP assignment, for each node with `for_each`
- `hcloud_firewall_attachment`: firewall relationship, using the server list derived with `for_each`

### `prevent_destroy` Protection

Each server gets `lifecycle { prevent_destroy = true }`. To intentionally delete a server, temporarily remove the lifecycle block first.

## How to Run

### Preparation

**1. Create tfvars once:**

```bash
cd Environment_Infrastructure/terraform/hetzner/prod
cp terraform.tfvars.example terraform.tfvars
# Fill terraform.tfvars with real values
# (hcloud_token, admin_allowed_cidrs, etc.)
```

`terraform.tfvars` is not committed; it is protected with `.gitignore`.

**2. Install the provider once:**

```bash
terraform init
```

### First Apply

```bash
# Show what will be created; do not make changes
terraform plan

# Approve and create
terraform apply
```

After `apply`, 6 servers, 2 firewalls, 1 floating IP, and network resources are visible in Hetzner.

### Get Ansible Inventory

```bash
terraform output -raw ansible_inventory_yaml > \
  ../../../ansible/prod/inventory/generated/prod.yml
```

### Gitea Variable: `PROD_FLOATING_IP`

The deploy pipeline needs this variable to manage DNS records automatically.
It is set once after `terraform apply`:

```bash
terraform output prod_floating_ip
```

Add the resulting IP address in Gitea -> project settings -> **Variables** with the name `PROD_FLOATING_IP`. The pipeline reads it with `vars.PROD_FLOATING_IP` and updates GoDaddy A records idempotently.

### Resize (Change Server Type)

Change the `server_type_app` or `server_type_db` value inside `terraform.tfvars`, then apply:

```bash
terraform apply
```

The server is stopped, the Hetzner Resize API is called, and the server is started again. Disk and Docker volumes are preserved. There is downtime.

### Server Deletion (Forced)

Because `prevent_destroy = true` is set, a normal `terraform destroy` fails. First, temporarily remove the `lifecycle` block inside `servers.tf`:

```hcl
# lifecycle {
#   prevent_destroy = true
# }
```

Then:

```bash
# Single quotes keep the shell from stripping the inner double quotes
terraform destroy -target='hcloud_server.app["iklim-app-01"]'
```

After completing the operation, add the lifecycle block back.

### State Management

Local state is used for now (`terraform.tfstate`). The state file is not committed to the repo. If more than one person works with this configuration, remote state on Hetzner Object Storage or HCP Terraform must be used.

## Acceptance Criteria

- `terraform plan` works only with the prod Hetzner Project token.
- 6 servers are created: `iklim-app-01/02/03`, `iklim-db-01/02/03`.
- Swarm nodes are inside the `iklim-prod-app-spread` placement group.
- DB nodes are inside the `iklim-prod-db-spread` placement group.
- Public firewall allows only `22`, `80`, and `443` ingress.
- Private firewall is compatible with `01-private-network-port-matrix.md`.
- DB replication ports are accessible only from the DB subnet.
- Floating IP is created and assigned to `iklim-app-01`.
- Terraform state and secret tfvars are not committed.
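
Some of these criteria could also be asserted inside the configuration itself with Terraform `check` blocks, which are available in the required Terraform version. The following is a minimal sketch under stated assumptions: `hcloud_server.app` appears in the destroy example above, while `hcloud_server.db` and the file name `checks.tf` are hypothetical and should be adjusted to the real resource names.

```hcl
# Hypothetical file: checks.tf
# Assumes both server resources are created with for_each,
# keyed by server name (hcloud_server.db is an assumed name).

check "prod_server_count" {
  assert {
    condition     = length(hcloud_server.app) + length(hcloud_server.db) == 6
    error_message = "Expected 6 prod servers (3 app + 3 db)."
  }
}

check "prod_app_server_names" {
  assert {
    # No expected app node may be missing from the for_each keys.
    condition = length(setsubtract(
      ["iklim-app-01", "iklim-app-02", "iklim-app-03"],
      keys(hcloud_server.app)
    )) == 0
    error_message = "One or more expected app nodes are missing."
  }
}
```

A failed `check` is reported as a warning during `terraform plan` and `terraform apply` and does not block the run, so it complements the manual review of the criteria rather than replacing it.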