# 07 - Prod Ansible Bootstrap

This phase prepares the prod machines created by Terraform: base Linux setup, security hardening, Docker, and Swarm. DB cluster software is not installed by this playbook; however, DB nodes join Swarm as workers.

## Ansible Installation

Ansible must be installed on the control machine, meaning your own computer. No agent is installed on target servers; SSH access is enough.

### Installation by Operating System

- **Ubuntu / Debian:**
  ```bash
  sudo apt update
  sudo apt install -y pipx python3-venv
  pipx ensurepath
  export PATH="$HOME/.local/bin:$PATH"
  pipx install --include-deps ansible
  pipx install ansible-lint
  ```

- **Fedora / Rocky Linux / RHEL:**
  ```bash
  sudo dnf install -y pipx python3-virtualenv
  pipx ensurepath
  export PATH="$HOME/.local/bin:$PATH"
  pipx install --include-deps ansible
  pipx install ansible-lint
  ```

- **macOS (Homebrew):**
  ```bash
  brew install ansible
  ```

- **With pipx (Python), on any platform:**
  ```bash
  pipx install --include-deps ansible
  pipx install ansible-lint
  ```

### Additional Python Dependencies

`passlib` is required on the control machine for the `password_hash` filter:

```bash
pipx inject ansible passlib
```

> If you installed with `pip`: `pip install passlib`

### Verify the Installation

Whichever installation method you used, verify that it succeeded with the following commands:

```bash
# Check the Ansible version and configuration paths
ansible --version

# Check which location the Ansible binary is running from
which -a ansible
```

## Running Ansible Commands

All commands must be run from the `ansible/prod/` directory. `ansible.cfg` automatically defines the inventory and `roles_path`.

### 0. Install Required Collections Once During Initial Setup

```bash
ansible-galaxy collection install -r ../requirements.yml
```

### 1. Connection Test (Ping)

```bash
ansible all -m ping
```

### 2. Run the Bootstrap Playbook

```bash
ansible-playbook prod-bootstrap.yml --ask-vault-pass
```

*Note: The `--ask-vault-pass` parameter prompts for the Ansible Vault password; the StorageBox password is decrypted this way.*

### 3. Run Only a Specific Role (Tags)

```bash
ansible-playbook prod-bootstrap.yml --tags "hardening" --ask-vault-pass
```

## Target Machines

| Host | Role |
| --- | --- |
| `iklim-app-01` | Swarm manager + app worker |
| `iklim-app-02` | Swarm manager + app worker |
| `iklim-app-03` | Swarm manager + app worker |
| `iklim-db-01` | Manual DB cluster node |
| `iklim-db-02` | Manual DB cluster node |
| `iklim-db-03` | Manual DB cluster node |

## Recommended File Structure

```text
ansible/
  prod/
    ansible.cfg
    inventory/
      generated/
        prod.yml
    group_vars/
      all/
        vars.yml
        vault.yml
    prod-bootstrap.yml
    roles/
      db_stack/
  roles/
    base/
    hardening/
    docker/
    swarm/
    node_dirs/
    storagebox/
    storagebox_ssh_key/
    act_runner/
    db_stack/
```

`ansible/prod/ansible.cfg` sets `roles_path = roles:../roles`. Because of that ordering, `ansible/prod/roles/db_stack` is the production-specific role used by `prod-bootstrap.yml`; the shared `ansible/roles/db_stack` remains the common fallback/reference implementation. Production DB behavior that writes Patroni, MongoDB, and replica-set auth files to StorageBox belongs to the prod-local role.
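
The effect of the `roles_path` ordering can be sketched with a small first-match lookup. This is only an illustration of the search order, not Ansible's own implementation; `resolve_role` is a hypothetical helper name, and the sketch assumes it is run from `ansible/prod/` so that the relative entries resolve the same way.

```bash
# Illustrative only: mimics a first-match lookup over a colon-separated
# roles_path. resolve_role is a hypothetical helper, not an Ansible command.
resolve_role() {
  local role="$1" roles_path="$2" dir
  local IFS=':'
  for dir in $roles_path; do
    # The first directory that contains the role wins.
    if [ -d "$dir/$role" ]; then
      printf '%s\n' "$dir/$role"
      return 0
    fi
  done
  return 1
}

# Example (from ansible/prod/):
#   resolve_role db_stack "roles:../roles"
#   resolve_role base "roles:../roles"
```

With the layout above, the first call would resolve to the prod-local `roles/db_stack`, while `base` exists only in the shared directory and would resolve to `../roles/base`.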
## Base Role

Applied to all prod nodes:

- Package cache update
- `epel-release`: installed first as a separate task; `fail2ban`, `davfs2`, `htop`, and `btop` depend on this repo
- base packages, after `epel-release` is active:
  - `curl`
  - `wget`
  - `git`
  - `jq`
  - `tar`
  - `unzip`
  - `bash-completion`
  - `gettext`: required for envsubst in CI/CD deploy pipelines
  - `tree`
  - `ca-certificates`
  - `fail2ban`
  - `chrony`
  - `python3`
  - `python3-pip`
  - `python3-passlib`: for the `password_hash` filter (EPEL)
  - `htop`: interactive process monitoring (EPEL)
  - `btop`: resource monitor with graphical interface (EPEL)
- timezone: `Europe/Istanbul`
- hostname setup
- keyboard layout: `trq` (Turkish Q)
- chrony/NTP active

## Security Hardening Role

Applied to all prod nodes:

- SSH password auth is disabled.
- Root SSH login via password is disabled (`PermitRootLogin prohibit-password`); key-based root login remains active so Ansible can connect throughout the bootstrap.
- Only SSH key auth remains.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- `fail2ban` is enabled.
- Automatic security updates are enabled with `dnf-automatic`.
- The `iklim` system user is created and added to the `wheel` group; the password is read from vault.
- `firewalld` default: incoming deny (drop zone), outgoing allow.
- The SSH rule is first written as a rich rule to the `drop` zone, then the default zone is set to `drop`.
- SSH is opened only from the admin CIDR.
- DB ports are not opened publicly.

The Hetzner Cloud Firewall is considered the actual perimeter. firewalld is the second defense layer on the host.

## Docker Role

Required on all prod nodes, both app and db. Because DB nodes join the network as Swarm Workers, Docker Engine must be installed on every machine.
Packages to install:

- `docker-ce`
- `docker-ce-cli`
- `containerd.io`
- `docker-buildx-plugin`
- `docker-compose-plugin`

Installation will be done through the official Docker dnf repository (`https://download.docker.com/linux/rhel/docker-ce.repo`).

## Swarm Role

Prod Swarm will be set up with 3 managers:

1. `docker swarm init` on `iklim-app-01` (Advertise/data path addr: `10.20.10.11`)
2. `iklim-app-02` and `iklim-app-03` join as managers.
3. `iklim-db-01/02/03` join as workers.
4. `iklimco-net` is not created by the Ansible swarm role. It is created and owned by the Swarm stack (`docker-stack-infra_db-prod.yml`) so Docker embedded DNS works for service VIPs and aliases.
5. Node labels:
   - `iklim-app-*` -> `type=service`
   - `iklim-db-*` -> `role=db`
   - `iklim-db-*` -> `db-index=01/02/03`, for Patroni node coordination
6. All nodes remain `AVAILABILITY=Active`.

Labeling is intentionally split across two automation layers:

- The shared `swarm` role adds the generic environment labels: `type=service` on app nodes and `role=db` on DB nodes.
- The production playbook adds `db-index=01/02/03` through `iklim-app-01` in a separate play inside `prod-bootstrap.yml`.

This split keeps the common Swarm role reusable while letting prod add the Patroni/MongoDB coordination labels it needs.

## Node Directory Role

On all `iklim-app-*` nodes:

```text
/opt/iklimco/ssl
```

Vault data is managed by the `docker-stack-vault.yml` stack through Docker volumes. The app nodes need the local SSL directory because `cert-distributor` syncs certificates from StorageBox into `/opt/iklimco/ssl` for Vault.

On DB nodes:

```text
/opt/iklimco/db
/opt/iklimco/backup
/opt/iklimco/db/mongodb
/opt/iklimco/db/postgresql
```

## StorageBox DAVFS Mount Role

Applied to every node, all `iklim-app-*` and `iklim-db-*`.
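
A quick way to confirm the mount on a node is to look for the `/mnt/storagebox` mount point in the kernel's mount table. The sketch below is an assumption-laden illustration rather than part of the role: `is_mounted` is a hypothetical helper, and it assumes a Linux host where `/proc/mounts` lists the mount point in the second field.

```bash
# Hypothetical helper: succeeds if the given path appears as a mount point
# (second field) in a mounts table; defaults to /proc/mounts on Linux.
is_mounted() {
  local mount_point="$1" mounts_file="${2:-/proc/mounts}"
  awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}

# Example, on a node:
#   is_mounted /mnt/storagebox && echo "StorageBox is mounted"
```

The same check could be run across the fleet from the control machine with an ad-hoc command such as `ansible all -m ansible.builtin.command -a "mountpoint -q /mnt/storagebox"`, assuming `mountpoint` from util-linux is available on the nodes.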
### Prod Sub-Account

| Parameter | Variable | Value |
| --- | --- | --- |
| Main account | `storagebox_account` | `u469968` |
| Sub-account | `storagebox_user` | `u469968-sub5` |
| WebDAV URL | `storagebox_url` | `https://u469968-sub5.your-storagebox.de/` |
| Mount point | `storagebox_mount_point` | `/mnt/storagebox` |

## StorageBox SSH Key Role

Applied to every node. The `/root/.ssh/id_ed25519_storagebox` ed25519 key pair is generated on the server.

Uploading the generated public key to the StorageBox main account (SSH authorized_keys) is a separate manual step:

```bash
# For each node:
cat /root/.ssh/id_ed25519_storagebox.pub | \
  ssh -p 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de \
  "cat >> .ssh/authorized_keys"
```

## Act Runner Role

Applied to `iklim-app-*` nodes. Gitea Act Runner is installed on each app node and started as a systemd service.

In prod, the runner runs on 3 app nodes; the deploy pipeline can be triggered on any of these runners.

## DB Stack Role

Applied to `iklim-db-*` nodes. On each DB node, it creates `/opt/iklimco/db`, `/opt/iklimco/backup`, `/opt/iklimco/db/mongodb`, and `/opt/iklimco/db/postgresql`.

The production configuration, including node-specific `mongod.conf`, replica set auth key, and Patroni configurations, is deployed by the Ansible `db_stack` role to StorageBox at `/mnt/storagebox/db/mongodb-0X/config/` and `/mnt/storagebox/db/postgresql-0X/config/`. etcd data is stored on local Docker named volumes.

## DB Stack Env Variables

Password variables required by the prod infra stack (`docker-stack-infra_db-prod.yml`), including `DATABASE_POSTGRES_ROOT_PASSWD`, `DATABASE_POSTGRES_REPLICATOR_PASSWORD`, `DATABASE_MONGODB_ROOT_PASSWD`, and `ETCD_ROOT_PASSWORD`, are stored in `prod/secrets/iklim.co/.env.secrets.shared` on StorageBox, alongside the other shared secrets. No separate file is needed.
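
Before deploying the stack, it can help to confirm that the named password variables are actually present in the shared secrets file. The following is a minimal sketch under stated assumptions: `missing_env_keys` is a hypothetical helper, the file is assumed to use plain `KEY=value` lines, and the absolute path in the usage comment is a guess based on the `/mnt/storagebox` mount point. It only checks for the keys and never prints their values.

```bash
# Hypothetical helper: prints each required key that has no KEY=... line
# in the given env file. Assumes a plain KEY=value format.
missing_env_keys() {
  local env_file="$1" key
  shift
  for key in "$@"; do
    grep -q "^${key}=" "$env_file" || printf '%s\n' "$key"
  done
}

# Example (the absolute path is an assumption, not taken from the playbook):
#   missing_env_keys /mnt/storagebox/prod/secrets/iklim.co/.env.secrets.shared \
#     DATABASE_POSTGRES_ROOT_PASSWD DATABASE_POSTGRES_REPLICATOR_PASSWORD \
#     DATABASE_MONGODB_ROOT_PASSWD ETCD_ROOT_PASSWORD
```

Empty output means all listed keys were found. The four names are the ones mentioned above; the stack may require further variables that this list does not cover.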
## StorageBox Directory Structure

The `storagebox` Ansible role **automatically** creates the following directories during bootstrap through `storagebox_managed_directories` (`group_vars/all/vars.yml`). No manual step is required:

- `/mnt/storagebox/ssl` → `SWAG_CERT_DIR`
- `/mnt/storagebox/swag`
- `/mnt/storagebox/swag/dns-conf` → `SWAG_DNS_CONFIG_DIR`
- `/mnt/storagebox/swag/site-confs` → `SWAG_SITE_CONFS_DIR`
- `/mnt/storagebox/swag/proxy-confs` → `SWAG_PROXY_CONFS_DIR`
- `/mnt/storagebox/swag/certbot`
- `/mnt/storagebox/grafana/data` → `GRAFANA_DATA_DIR`
- `/mnt/storagebox/precipitation/images`

Because StorageBox is mounted at `/mnt/storagebox` on all app nodes, the directories are created only once; all nodes share access to them. Prometheus uses a local Docker named volume, not StorageBox.

## Swarm Setup Verification

After bootstrap, check the Swarm status with the following commands:

```bash
# 6 nodes: 3 managers (Leader/Reachable), 3 workers (Ready)
docker node ls

# App node label
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
# Expected: map[type:service]

# DB node label
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
# Expected: map[db-index:01 role:db]

# swarm-init.sh idempotency: do not attempt init again in an already active Swarm
grep -n "swarm init\|swarm join" init/swarm-init.sh
```

## Acceptance Criteria

- `ansible all -m ping` succeeds.
- 3 Swarm manager nodes appear as Leader/Reachable in `docker node ls`.
- 3 DB nodes appear as Workers in `docker node ls`.
- Manager quorum holds: with 3 managers, the loss of 1 manager is tolerated.
- The `iklimco-net` overlay network is created by the Swarm stack after `docker-stack-infra_db-prod.yml` deploy.
- Node labels (`type=service`, `role=db`, `db-index=01/02/03`) are verified with inspect.
- `swarm-init.sh` does not attempt init again in an active Swarm; it is idempotent.
- `/mnt/storagebox` is mounted on every node.
- The `/opt/iklimco/ssl` directory exists on every app node.
- The `db`, `ssl`, `swag`, `swag/dns-conf`, `swag/site-confs`, `swag/proxy-confs`, `swag/certbot`, `grafana/data`, and `precipitation/images` directories exist on StorageBox.
- The Gitea Act Runner service is running on every app node.
- `/opt/iklimco/db` and `/opt/iklimco/backup` directories exist on DB nodes. Node-specific `mongod.conf` and other DB configurations are created on StorageBox (`/mnt/storagebox/db/...`) in the `08-prod-db-cluster-setup.md` step.
- Public firewall allows only `22`, `80`, and `443` ingress.
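
The manager quorum criterion above follows from majority arithmetic: Swarm managers use Raft, which needs a majority of managers to agree. The sketch below works this out in shell; `quorum_size` and `tolerated_failures` are hypothetical helper names introduced only for illustration.

```bash
# Majority needed among N managers: floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Managers that can be lost while a majority remains: N - quorum.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Example:
#   quorum_size 3          # prints 2
#   tolerated_failures 3   # prints 1
```

With 3 managers the quorum is 2, so 1 manager can fail. A fourth manager would raise the quorum to 3 without tolerating any additional failure, which is why odd manager counts are the usual choice.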