- Configured 'iklimco-net' aliases for RabbitMQ nodes in prod overlay documentation.
- Updated Step 6 and Step 8 stack snippets to include network aliases and definitions.
- Added a technical note to Step 7 explaining DNS requirements for sticky sessions.
- Implemented Consistent Hashing (chash) logic in APISIX upstream configuration for prod.
- Added instructions for real IP detection in APISIX configuration template.
- Documented the bypass of Swarm VIP for better session persistence on RabbitMQ nodes.
Corrects six documentation files to match the actual deployed pipeline
behavior and align test/prod approaches where they share the same code.
prod-env/02-godaddy-credentials.md
- Step 1: correct secret file from .env.secrets.shared to .env.secrets.swag;
add clarifying note that .env.secrets.shared holds AppRole/DB secrets
and must not be used for GoDaddy credentials.
- Step 4: document that GoDaddy A records are now managed automatically
by the pipeline's 'Update DNS Records' step via the GoDaddy API;
reference the Gitea variable PROD_FLOATING_IP that must be set once.
prod-env/08-deploy-pipeline-update.md
- Add Step 2 documenting the new 'Update DNS Records' pipeline step
(GoDaddy API, idempotent check-before-update, requires jq and
vars.PROD_FLOATING_IP).
- Renumber subsequent steps 3-8 to accommodate the new step.
- Fix DB hostnames in Step 7 (Run Database Init Scripts) from
iklimco_postgresql/iklimco_mongodb to postgresql/mongodb, matching
how Swarm overlay DNS resolves service names inside iklimco-net.
- Update context block: correct DB hostname description, replace
outdated storagebox path note with env-var approach, list new steps.
- Update final step order to 24 steps including the DNS step and
Release Deploy Lock; mark Wait for etcd as NEW.
prod-env/09-verify.md
- Insert check #2 for the precipitation image directory
(/mnt/storagebox/precipitation/images) and iklimco_image-data volume
bind mount, mirroring the equivalent check in test-env/08-verify.md.
- Renumber all subsequent checks (3-12) to maintain sequential ordering.
test-env/03-infra-stack-changes.md
- Update SWAG service volume snippet: replace hardcoded paths
(swag-vl:/config, /opt/iklimco/swag/dns-conf, /opt/iklimco/swag/site-confs)
with env-var forms (${SWAG_CONFIG_DIR:-swag-vl}, ${SWAG_DNS_CONF_DIR:-...},
${SWAG_SITE_CONFS_DIR:-...}) to match docker-stack-infra.yml.
- Update cert-reloader volume snippet: replace swag-vl and /opt/iklimco/ssl
with ${SWAG_CONFIG_DIR:-swag-vl} and ${SWAG_CERT_DIR:-/opt/iklimco/ssl},
enabling StorageBox override in prod without changing the base file.
test-env/04-swag-nginx-configs.md
- Replace RESTRICTED_IP_1/RESTRICTED_IP_2 individual env vars with
RESTRICTED_IPS (comma-separated CIDR list) in the required-vars section,
matching env-test/.env and the actual pipeline.
- Update all three IP-restricted template examples (apigw, rabbitmq,
grafana) from allow ${RESTRICTED_IP_1}; allow ${RESTRICTED_IP_2}; to
${RESTRICTED_IPS_BLOCK}, matching the actual .conf.tpl files in the repo.
- Rewrite the deploy step section to match the real pipeline: docker run
alpine for file writing, RESTRICTED_IPS_BLOCK generation via sed, and
envsubst with explicit SWAG_VARS filter to protect nginx $upstream_* vars.
test-env/07-deploy-pipeline-update.md
- Step 2 (Prepare SWAG Directories): replace sudo-tee approach with the
actual docker-run-alpine method used in deploy-test.yml; add nginx
reload block; update notes to reflect RESTRICTED_IPS_BLOCK generation.
- Step 4 (Re-order): correct step numbering to match actual pipeline
(21 steps); mark 'Wait for etcd' as already present in pipeline rather
than a new addition; add Bootstrap Vault TLS Placeholder which was
missing from the documented order.
- Refactor production setup documentation to reflect a 3-node Vault Raft cluster starting from launch.
- Update all paths to use StorageBox mounts for shared state (SWAG config, TLS certs, Monitoring data).
- Switch Nginx configuration convention from proxy-confs to site-confs to align with SWAG's auto-include behavior.
- Standardize TLS private key extensions to .pem.
- Update node failover and recovery facts to include monitoring services.
- Align deployment pipeline instructions with the latest environment variable-driven approach.
Add the Ansible README and expand prod bootstrap coverage for StorageBox keys, DB labels, DB stack configuration, and act runner setup. Update MongoDB configuration for replica set support and refresh prod roadmap/setup documentation for Swarm labels, StorageBox-backed cert paths, and recovery guidance.
- Include missing WireGuard port (51820/udp) in firewall documentation.
- Synchronize PROD DB firewall rules with the latest Patroni/Swarm setup requirements.
- Complete the PROD section of setup-vs-roadmap-map.md to cover all transition steps.
- Clarify that infra services (Vault, RabbitMQ, etc.) are restricted to private/overlay networks.
- 01: Add WireGuard 51820/udp to public ingress table; add 9000/tcp
(APISIX Dashboard) to admin CIDR row in test private rules
- 02: Fix admin_ssh_public_key_path (id_rsa.pub, not id_ed25519.pub);
add WireGuard 51820/udp to DB firewall table; clarify 9000/9180 port
descriptions (app subnet access + SWAG proxy)
- 03: Update file structure with new roles (db_stack, wireguard,
act_runner) and playbooks (test-app/db-post-stack.yml); add floating
IP systemd service to base role description; clarify node labels
- 04: Clarify two-phase deployment (Ansible prepares dirs/config,
Gitea CI/CD deploys stack); add WireGuard setup info
- 05: Add system user column to runner table; fix runner name in
acceptance criteria (iklim-test-app → test-runner)
Add hetzner-floating-ip.service systemd unit to base role so that
the floating IP is bound to eth0 on every boot. The task is
conditional (runs only when hetzner_floating_ip is defined in
host_vars). Add 49.12.116.113 as the floating IP for iklim-app-01
in test host_vars.
The docker role only opened Swarm ports (2377, 7946, 4789).
HTTP and HTTPS were missing, making SWAG unreachable from outside.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add new Ansible role `wireguard` to set up WireGuard VPN server on
DB node with key generation, firewalld rules, and client peer config.
- Introduce `pg-proxy` and `mongo-proxy` socat containers in db_stack
to expose PostgreSQL (15432) and MongoDB (17017) on host ports,
restricted to WireGuard subnet via firewalld.
- Update test environment group_vars with WireGuard client entry for
`murat-inspiron-15-3525`.
- Modify act_runner config: set `docker_host` to unix socket, remove
explicit socket mount from options, and change runner label image to
`catthehacker/ubuntu:act-22.04`.
- Open UDP port 51820 in Hetzner firewall for WireGuard inbound.
- Adjust test-db-post-stack playbook to include wireguard role (tagged).
- Update roadmap document with APISIX init step order.
- Add new Ansible role `wireguard` to set up WireGuard VPN server on
DB node with key generation, firewalld rules, and client peer config.
- Introduce `pg-proxy` and `mongo-proxy` socat containers in db_stack
to expose PostgreSQL (15432) and MongoDB (17017) on host ports,
restricted to WireGuard subnet via firewalld.
- Update test environment group_vars with WireGuard client entry for
`murat-inspiron-15-3525`.
- Modify act_runner config: set `docker_host` to unix socket, remove
explicit socket mount from options, and change runner label image to
`catthehacker/ubuntu:act-22.04`.
- Open UDP port 51820 in Hetzner firewall for WireGuard inbound.
- Adjust test-db-post-stack playbook to include wireguard role (tagged).
- Update roadmap document with APISIX init step order.
Migrates `act_runner` configuration from shell-generated to an Ansible-templated `config.yaml`. This enables:
- Dynamic label provisioning, including `test-runner:docker://ubuntu:22.04`.
- Explicit configuration for joining the `iklimco-net` overlay network.
- Docker socket mounting for CI/CD jobs to interact with the Docker daemon.
Updates `setup/05-test-runner-ve-deploy-onkosullari.md` and other related documentation to reflect the new automated and integrated runner setup.
* Introduces an Ansible role for installing and registering `act_runner` for Gitea Actions.
* Automates PostgreSQL and MongoDB deployment on Docker Swarm in the test environment, leveraging Docker named volumes for data persistence.
* Translates core documentation, including `README.md` and `setup/04-test-db-docker-kurulum.md`, to Turkish.
* Adds comprehensive documentation for firewall architecture (`facts/firewall.md`) and Docker Swarm node recovery (`facts/swarm-node-recovery.md`).
* Enhances security hardening by ensuring `fail2ban` is enabled and streamlining admin SSH key management via Ansible.
* Updates Ansible vault structure to support new secret variables and adds `.vault_pass` to `.gitignore`.
This commit introduces several core configurations and structural improvements:
* **User Management:** Creates a new `iklim` administrative user with a securely hashed password, enabled by `python3-passlib`.
* **System Configuration:** Sets the system keyboard layout to Turkish Q (`trq`).
* **Security Hardening:** Refines firewall rules for SSH using a rich rule and ensures `journald` log limits file creation.
* **Ansible Variable Management:** Restructures `group_vars` by consolidating global variables into `group_vars/all/vars.yml` and sensitive data into a dedicated `group_vars/all/vault.yml`.
* **Ansible Compatibility:** Adds `!unsafe` to a `docker info` shell command to prevent future warnings.
This commit brings the `README.md` and Ansible setup guides (`03-test-ansible-bootstrap.md`, `07-prod-ansible-bootstrap.md`) in sync with the current state of the Ansible automation.
Key updates include:
- Acknowledging the presence of in-repository Ansible playbooks and shared roles.
- Correcting Ansible inventory output paths and Terraform output commands.
- Detailing the new `group_vars/all/{vars.yml,vault.yml}` structure.
- Updating Ansible prerequisites to include `passlib` for password hashing.
- Adding documentation for `iklim` system user creation, keyboard layout, and refined firewall rules.
- Removing outdated "Known Gaps" related to missing Ansible code.
This commit introduces the foundational Ansible playbooks, roles, and configurations for automated provisioning of both production and test environments.
Key capabilities include:
- **Base System Setup:** Common packages, timezone, chrony, and hostname.
- **Security Hardening:** SELinux disable, SSH configuration, `dnf-automatic`, `fail2ban`, `firewalld` setup, and `journald` log limits.
- **Docker & Swarm:** Docker installation and configuration, Docker Swarm initialization/joining for managers and workers, overlay network creation, and node labeling.
- **Storage:** Hetzner StorageBox integration using `davfs2`.
- **Directory Structure:** Creation of application and database-specific directories.
This establishes a comprehensive, automated pipeline for infrastructure deployment and initial configuration.
This commit systematically updates all Terraform configurations, including resources, variables, and labels, to use the more generic `app` designation instead of `swarm`. This improves consistency and decouples the infrastructure naming from a specific container orchestration technology like Docker Swarm.
Adjusts documentation for test and production Ansible bootstrapping to leverage `ansible.cfg`. Commands are now run from specific environment directories (`ansible/test/` or `ansible/prod/`), eliminating the need to explicitly specify inventory and playbook paths.
Also adds an initial step to install required Ansible collections using `ansible-galaxy`.
This commit introduces a reordered and renumbered set of setup documentation files to better reflect the deployment stages for both test and production environments.
Key changes include:
* A new `setup-vs-roadmap-map.md` file to provide a clear mapping between roadmap tasks and their corresponding setup phases.
* Significantly expanded Ansible bootstrap documentation for both test and production, detailing Docker, Swarm, security hardening, and StorageBox SSH key management roles.
* Formalized database Docker and Swarm cluster setup instructions for test and production, including explicit steps for Swarm worker integration of DB nodes.
* Updated roadmap documentation (`roadmap/prod-env/*`) to align with the refined setup, incorporating correct private IP addresses for Swarm joins, new node labels, and floating IP usage for GoDaddy DNS records.
This new README serves as the central documentation for the `iklim.co` Hetzner Cloud environment infrastructure repository. It outlines the project's purpose, scope, and structure, alongside detailed setup guides, security baselines, environment topologies, and usage instructions for Terraform.
Overhaul and expand firewall definitions for both `prod` and `test` environments to enable comprehensive inter-subnet communication.
This includes implementing explicit rules supporting:
- Docker Swarm overlay networks between application and database subnets.
- High-availability database clusters (PostgreSQL replication, MongoDB replica sets, Patroni, etcd).
- Internal access for various infrastructure services (Vault, Redis, RabbitMQ, APISIX, Prometheus, Grafana).
All firewall rule descriptions are standardized in English for improved clarity and consistency.
Additionally, update default `server_type_swarm` and `server_type_db` variables to the recommended `CPX` series for both environments. An initial generated Ansible inventory for the test environment is also added.
- Add `hetzner-sizing-report.md` defining data-driven server type recommendations for test and prod environments.
- Update Terraform configurations to align with the recommended `CPX` server types and refine firewall rules for Docker Swarm and database interactions.
- Introduce comprehensive documentation and stack files for:
- Single-node PostgreSQL/MongoDB deployment on a test DB worker node.
- High-availability 3-node MongoDB replica set and Patroni+etcd PostgreSQL cluster for production.
- Enhance Ansible bootstrap roles with SELinux disabling, fail2ban configuration, and StorageBox SSH key management for CI/CD.
- Reorganize and rename setup documentation files for improved structure and clarity.
- Database nodes now join the Docker Swarm as workers with `role=db` labels, allowing Swarm to manage their dedicated services.
- The `docker-stack-infra.yml` has been updated for production to focus solely on application-level infrastructure components.
- Dedicated database services (PostgreSQL, MongoDB, Patroni-etcd) are now explicitly deployed in separate Swarm stacks on `iklim-db-XX` nodes.
- Standardizes node naming conventions (`iklim-app-XX`, `iklim-db-XX`) across the production roadmap documentation.
- Clarifies that the `etcd` service within `docker-stack-infra.yml` is exclusively for APISIX configuration, distinct from Patroni's etcd cluster.
- This commit introduces the Terraform configuration to provision a production environment on Hetzner Cloud, building on the existing test setup.
- Key improvements and new features include:
* **Multi-node clusters:** Scaling to 3-node Swarm application and database clusters for improved resilience.
* **High availability:** Utilizing a Hetzner Floating IP for the application entry point and `spread` placement groups for fault tolerance across physical hosts.
* **Enhanced network security:** Internal management services (RabbitMQ, APISIX, Prometheus, Grafana) are restricted to the application subnet, expected to be accessed via an internal reverse proxy (SWAG).
* **Internal database replication:** New firewall rules enable PostgreSQL replication and MongoDB replica set traffic within the database subnet.
* **Refined test environment:** Updates to align `test` configuration with the new `prod` structure, including a dedicated floating IP and adjusted firewall rules.
* **Configuration standardization:** Environment-specific details moved to `locals.tf` for clarity, with upgraded server types and migration to Rocky Linux as the base image.
- Updates were also made to the latest version of Terraform to ensure consistency in the documentation
This commit introduces the foundational Infrastructure-as-Code for provisioning a test environment on Hetzner Cloud. It defines server nodes, private networking, comprehensive firewalls, and includes documentation on resource lifecycle and safe configuration practices.