# 07 - Prod Ansible Bootstrap

This phase prepares the prod machines created by Terraform: base Linux setup, security hardening, Docker, and Swarm. This playbook does not install DB cluster software; the DB nodes do, however, join Swarm as workers.

## Ansible Installation

Ansible must be installed on the control machine, meaning your own computer. No agent is installed on target servers; SSH access is enough.

### Installation by Operating System

- **Ubuntu / Debian:**
  ```bash
  sudo apt update
  sudo apt install -y pipx python3-venv

  pipx ensurepath
  export PATH="$HOME/.local/bin:$PATH"

  pipx install --include-deps ansible
  pipx install ansible-lint
  ```

- **Fedora / Rocky Linux / RHEL:**
  ```bash
  sudo dnf install -y pipx python3-virtualenv

  pipx ensurepath
  export PATH="$HOME/.local/bin:$PATH"

  pipx install --include-deps ansible
  pipx install ansible-lint
  ```

- **macOS (Homebrew):**
  ```bash
  brew install ansible
  ```

- **With pipx, on any platform (pipx already installed):**
  ```bash
  pipx install --include-deps ansible
  pipx install ansible-lint
  ```

### Additional Python Dependencies

`passlib` is required on the control machine for the `password_hash` filter:

```bash
pipx inject ansible passlib
```

> If you installed with `pip`: `pip install passlib`

### Verify the Installation

Whichever method you used, verify the installation with the following commands:

```bash
# Check the Ansible version and configuration paths
ansible --version

# Check which location the Ansible binary is running from
which -a ansible
```

## Running Ansible Commands

All commands must be run from the `ansible/prod/` directory. The `ansible.cfg` in that directory defines the inventory and `roles_path`, so neither has to be passed on the command line.

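A small guard can make the working-directory requirement explicit before any command runs. The following is a minimal sketch, not part of the repository: `in_prod_dir` is a hypothetical helper, and it only checks for the two files this document names (`ansible.cfg` and `prod-bootstrap.yml`).

```bash
# Hypothetical helper: succeeds only if DIR (default: current directory)
# contains both ansible.cfg and prod-bootstrap.yml.
in_prod_dir() {
  dir="${1:-.}"
  [ -f "$dir/ansible.cfg" ] && [ -f "$dir/prod-bootstrap.yml" ]
}

# Example use at the top of a wrapper script:
#   in_prod_dir || { echo "run this from ansible/prod/" >&2; exit 1; }
```

The check is deliberately shallow; it does not validate the contents of `ansible.cfg`.
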
### 0. Install Required Collections Once During Initial Setup

```bash
ansible-galaxy collection install -r ../requirements.yml
```

### 1. Connection Test (Ping)

```bash
ansible all -m ping
```

### 2. Run the Bootstrap Playbook

```bash
ansible-playbook prod-bootstrap.yml --ask-vault-pass
```

*Note: `--ask-vault-pass` prompts for the Ansible Vault password, which is needed to decrypt the StorageBox password.*

### 3. Run Only a Specific Role (Tags)

```bash
ansible-playbook prod-bootstrap.yml --tags "hardening" --ask-vault-pass
```

## Target Machines

| Host | Role |
| --- | --- |
| `iklim-app-01` | Swarm manager + app worker |
| `iklim-app-02` | Swarm manager + app worker |
| `iklim-app-03` | Swarm manager + app worker |
| `iklim-db-01` | Swarm worker + manual DB cluster node |
| `iklim-db-02` | Swarm worker + manual DB cluster node |
| `iklim-db-03` | Swarm worker + manual DB cluster node |

## Recommended File Structure

```text
ansible/
  prod/
    ansible.cfg
    inventory/
      generated/
        prod.yml
    group_vars/
      all/
        vars.yml
        vault.yml
    prod-bootstrap.yml
  roles/
    base/
    hardening/
    docker/
    swarm/
    node_dirs/
    storagebox/
    storagebox_ssh_key/
    act_runner/
    db_stack/
```

## Base Role

Applied to all prod nodes:

- Package cache update
- `epel-release`: installed first as a separate task; `fail2ban`, `davfs2`, `htop`, and `btop` depend on this repo
- base packages, after `epel-release` is active:
  - `curl`
  - `wget`
  - `git`
  - `jq`
  - `tar`
  - `unzip`
  - `bash-completion`
  - `gettext`: required for `envsubst` in CI/CD deploy pipelines
  - `tree`
  - `ca-certificates`
  - `fail2ban`
  - `chrony`
  - `python3`
  - `python3-pip`
  - `python3-passlib`: for the `password_hash` filter (EPEL)
  - `htop`: interactive process monitoring (EPEL)
  - `btop`: resource monitor with a graphical terminal interface (EPEL)
- timezone: `Europe/Istanbul`
- hostname setup
- keyboard layout: `trq` (Turkish Q)
- chrony/NTP active

## Security Hardening Role

Applied to all prod nodes:

- SSH password auth is disabled.
- Root SSH login via password is disabled (`PermitRootLogin prohibit-password`); key-based root login remains active so Ansible can connect throughout the bootstrap.
- Only SSH key auth remains.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- `fail2ban` is enabled.
- Automatic security updates are enabled with `dnf-automatic`.
- The `iklim` system user is created and added to the `wheel` group; the password is read from vault.
- `firewalld` default: incoming deny (drop zone), outgoing allow.
- The SSH rule is first written as a rich rule to the `drop` zone, then the default zone is set to `drop`.
- SSH is opened only from the admin CIDR.
- DB ports are not opened publicly.

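The SSH items in the list above can be spot-checked on a node after the role has run. The sketch below is an illustration, not the role's own verification: `check_sshd_settings` is a hypothetical helper that reads `keyword value` lines (the format printed by `sshd -T`) and reports any expected setting that is absent. It assumes "SSH password auth is disabled" corresponds to `PasswordAuthentication no`, and it treats the legacy alias `without-password` as equivalent to `prohibit-password`, since some OpenSSH builds may print the alias.

```bash
# Hypothetical helper: reads "keyword value" lines on stdin and prints
# "missing: ..." for each expected hardening setting that is not present.
check_sshd_settings() {
  # Lowercase everything and normalise the legacy alias of prohibit-password.
  config="$(tr '[:upper:]' '[:lower:]' |
    sed 's/^permitrootlogin without-password$/permitrootlogin prohibit-password/')"
  rc=0
  for want in \
    "passwordauthentication no" \
    "permitrootlogin prohibit-password" \
    "permitemptypasswords no" \
    "maxauthtries 3"
  do
    if ! printf '%s\n' "$config" | grep -qxF -- "$want"; then
      echo "missing: $want"
      rc=1
    fi
  done
  return "$rc"
}

# Example use on a node, as root:
#   sshd -T | check_sshd_settings
```

The function exits non-zero when at least one setting is missing, so it can be used directly in a shell condition.
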
The Hetzner Cloud Firewall is treated as the actual perimeter; `firewalld` is the second layer of defense, on the host itself.

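The ordering of the `firewalld` steps matters: if the default zone were switched to `drop` before the SSH rule existed, the Ansible connection itself would be cut off. The sketch below prints the equivalent `firewall-cmd` calls in that order without running them. It is an assumption-laden illustration: `print_firewalld_plan` is a hypothetical helper, the role may well use an Ansible module rather than `firewall-cmd`, the `--reload` step is added here so the permanent rule is active before the zone switch, and `203.0.113.0/24` in the usage line is a documentation-only placeholder for the admin CIDR.

```bash
# Hypothetical helper: prints (does not execute) the firewall-cmd calls,
# SSH allow rule first, default-zone switch last.
print_firewalld_plan() {
  admin_cidr="$1"
  rule="rule family=\"ipv4\" source address=\"$admin_cidr\" service name=\"ssh\" accept"
  printf '%s\n' \
    "firewall-cmd --permanent --zone=drop --add-rich-rule='$rule'" \
    "firewall-cmd --reload" \
    "firewall-cmd --set-default-zone=drop"
}

# Example: review the plan for a placeholder admin CIDR
#   print_firewalld_plan 203.0.113.0/24
```

Printing the plan instead of executing it keeps the sketch safe to try on any machine.
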
## Docker Role

Required on all prod nodes, both app and db. Because the DB nodes join the network as Swarm workers, Docker Engine must be installed on every machine.

Packages to install:

- `docker-ce`
- `docker-ce-cli`
- `containerd.io`
- `docker-buildx-plugin`
- `docker-compose-plugin`

The packages are installed from the official Docker dnf repository (`https://download.docker.com/linux/rhel/docker-ce.repo`).

## Swarm Role

Prod Swarm will be set up with 3 managers:

1. `docker swarm init` on `iklim-app-01` (advertise/data path address: `10.20.10.11`)
2. `iklim-app-02` and `iklim-app-03` join as managers.
3. `iklim-db-01/02/03` join as workers.
4. Overlay network is created: `iklimco-net`
5. Node labels:
   - `iklim-app-*` -> `type=service`
   - `iklim-db-*` -> `role=db`, `db-index=01/02/03`, for Patroni node coordination
6. All nodes remain `AVAILABILITY=Active`.

The `db-index` labels are added through `iklim-app-01` in a separate play inside `prod-bootstrap.yml`, not by the swarm role.

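The label scheme in step 5 maps directly from the hostname. As a hedged illustration (the playbook's actual tasks are not shown in this document), the hypothetical helper `label_command_for` below derives the `docker node update` call for one hostname and prints it rather than running it. It assumes the `db-index` value is simply the hostname's numeric suffix.

```bash
# Hypothetical helper: prints the docker node update call for one hostname.
# Unknown hostnames print nothing and return non-zero.
label_command_for() {
  host="$1"
  case "$host" in
    iklim-app-*)
      echo "docker node update --label-add type=service $host" ;;
    iklim-db-*)
      index="${host##*-}"   # iklim-db-02 -> 02
      echo "docker node update --label-add role=db --label-add db-index=$index $host" ;;
    *)
      return 1 ;;
  esac
}

# Example: show the command for one DB node
#   label_command_for iklim-db-02
```

Node labels can only be changed from a manager, which matches the note above that the `db-index` labels are applied through `iklim-app-01`.
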
## Node Directory Role

On all `iklim-app-*` nodes:

```text
/opt/iklimco/ssl
/opt/iklimco/init
/opt/iklimco/stacks
/opt/iklimco/vault/data
```

`/opt/iklimco/vault/data` is the host-path volume of the Vault Raft node; it must be created separately on every app node. Swarm does not manage this directory as an overlay volume; if it is missing, the Vault container will not start.

On DB nodes:

```text
/opt/iklimco/db
/opt/iklimco/backup
```

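The per-node directory layout above can be expressed as a small idempotent function. This is a sketch under stated assumptions, not the role's implementation: `ensure_node_dirs` and its `ROOT` parameter are hypothetical, with `ROOT` added only so the function can be tried in a scratch directory (on a real node it would be `/`). Ownership and permissions are not handled here.

```bash
# Hypothetical helper: creates the directories for HOSTNAME under ROOT.
# Running it twice is harmless because mkdir -p ignores existing directories.
ensure_node_dirs() {
  root="${1%/}"
  host="$2"
  case "$host" in
    iklim-app-*) dirs="ssl init stacks vault/data" ;;
    iklim-db-*)  dirs="db backup" ;;
    *) echo "unknown host: $host" >&2; return 1 ;;
  esac
  for d in $dirs; do
    mkdir -p "$root/opt/iklimco/$d" || return 1
  done
}

# Example: try the app-node layout in a scratch directory
#   ensure_node_dirs "$(mktemp -d)" iklim-app-01
```
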
## StorageBox DAVFS Mount Role

Applied to every node, all `iklim-app-*` and `iklim-db-*`.

### Prod Sub-Account

| Parameter | Variable | Value |
| --- | --- | --- |
| Main account | `storagebox_account` | `u469968` |
| Sub-account | `storagebox_user` | `u469968-sub5` |
| WebDAV URL | `storagebox_url` | `https://u469968-sub5.your-storagebox.de/` |
| Mount point | `storagebox_mount_point` | `/mnt/storagebox` |

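Whether the mount point from the table is actually mounted can be checked against `/proc/mounts`. The following is a minimal sketch: `is_mounted` is a hypothetical helper, and the optional second argument exists only so the function can be pointed at a sample file instead of the live mount table.

```bash
# Hypothetical helper: succeeds if MOUNT_POINT is listed as a mount target
# (second whitespace-separated field) in MOUNTS_FILE.
is_mounted() {
  mount_point="$1"
  mounts_file="${2:-/proc/mounts}"
  awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}

# Example use on a node:
#   is_mounted /mnt/storagebox || echo "StorageBox is not mounted" >&2
```

This only confirms that something is mounted at the path; it does not verify that the WebDAV endpoint is reachable.
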
## StorageBox SSH Key Role

Applied to every node. The `/root/.ssh/id_ed25519_storagebox` ed25519 key pair is generated on the server. Uploading the generated public key to the StorageBox main account (SSH authorized_keys) is a separate manual step:

```bash
# For each node:
cat /root/.ssh/id_ed25519_storagebox.pub | \
  ssh -p 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de \
  "cat >> .ssh/authorized_keys"
```

## Act Runner Role

Applied to `iklim-app-*` nodes. Gitea Act Runner is installed on each app node and started as a systemd service. In prod, the runner runs on all 3 app nodes; the deploy pipeline can be triggered on any of these runners.

## DB Stack Role

Applied to `iklim-db-*` nodes. On each DB node, the role creates the `/opt/iklimco/db` and `/opt/iklimco/backup` directories, as well as a local reference directory for MongoDB. The actual production configuration (node-specific `mongod.conf`, replica set auth key, and Patroni configurations) is set up on StorageBox at `/mnt/storagebox/db/mongodb-0X/config/` and `/mnt/storagebox/db/postgresql-0X/config/` in the `08-prod-db-cluster-kurulum.md` step. etcd data is stored on local Docker named volumes, not on StorageBox.

## /opt/iklimco/stacks/.env

Password variables required by the DB cluster stacks are stored in the `/opt/iklimco/stacks/.env` file. This file is stored on StorageBox as `prod/secrets/iklim.co/.env.stacks`. Before the first deploy, it is fetched on `iklim-app-01` with the following command:

```bash
scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.stacks \
  /opt/iklimco/stacks/.env
chmod 600 /opt/iklimco/stacks/.env
```

## StorageBox Directory Structure

The `storagebox` Ansible role creates the following directories **automatically** during bootstrap, via `storagebox_managed_directories` (`group_vars/all/vars.yml`). No manual step is required:

- `/mnt/storagebox/ssl` → `SWAG_CERT_DIR`
- `/mnt/storagebox/swag/config` → `SWAG_CONFIG_DIR`
- `/mnt/storagebox/swag/site-confs` → `SWAG_SITE_CONFS_DIR`
- `/mnt/storagebox/grafana/data` → `GRAFANA_DATA_DIR`
- `/mnt/storagebox/precipitation/images`

Because StorageBox is mounted as `/mnt/storagebox` on all app nodes, the directories are created only once and are shared by all nodes. Prometheus uses a local Docker named volume, not StorageBox.

## Swarm Setup Verification

After bootstrap, check the Swarm status with the following commands:

```bash
# 6 nodes: 3 managers (Leader/Reachable), 3 workers (Ready)
docker node ls

# App node label
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
# Expected: map[type:service]

# DB node label
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
# Expected: map[db-index:01 role:db]

# swarm-init.sh idempotency: do not attempt init again in an already active Swarm
grep -n "swarm init\|swarm join" init/swarm-init.sh
```

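The manager and worker counts can also be checked mechanically instead of by eye. The sketch below is illustrative: `count_swarm_roles` is a hypothetical helper that reads `HOSTNAME MANAGER_STATUS` lines, where workers have an empty manager status, and counts only managers reported as `Leader` or `Reachable`.

```bash
# Hypothetical helper: reads "HOSTNAME MANAGER_STATUS" lines on stdin and
# prints "managers=N workers=M". Workers have no manager status field.
count_swarm_roles() {
  awk '
    $2 == "Leader" || $2 == "Reachable" { managers++ }
    NF == 1 { workers++ }
    END { printf "managers=%d workers=%d\n", managers, workers }
  '
}

# Example use on a manager node:
#   docker node ls --format '{{.Hostname}} {{.ManagerStatus}}' | count_swarm_roles
```

For the topology described in this document, the expected result is 3 managers and 3 workers; a manager in the `Unreachable` state is counted in neither group, so a lower manager count is a signal to investigate.
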
## Acceptance Criteria

- `ansible all -m ping` succeeds.
- 3 Swarm manager nodes appear as Leader/Reachable in `docker node ls`.
- 3 DB nodes appear as workers in `docker node ls`.
- Manager quorum holds: with 3 managers, the loss of 1 manager is tolerated.
- The `iklimco-net` overlay network exists.
- Node labels (`type=service`, `role=db`, `db-index=01/02/03`) are verified with inspect.
- `swarm-init.sh` does not attempt init again in an active Swarm; it is idempotent.
- `/mnt/storagebox` is mounted on every node.
- The `/opt/iklimco/vault/data` directory exists on every app node.
- The `ssl`, `swag/config`, `swag/site-confs`, `grafana/data`, and `precipitation/images` directories exist on StorageBox.
- The Gitea Act Runner service is running on every app node.
- `/opt/iklimco/db` and `/opt/iklimco/backup` directories exist on DB nodes. Node-specific `mongod.conf` and other DB configurations are created on StorageBox (`/mnt/storagebox/db/...`) in the `08-prod-db-cluster-kurulum.md` step.
- Public firewall allows only `22`, `80`, and `443` ingress.
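
The quorum criterion follows from Raft majority arithmetic: a cluster of N managers needs floor(N/2) + 1 of them available and therefore tolerates floor((N-1)/2) losses. A minimal sketch of that arithmetic, with hypothetical function names:

```bash
# Majority needed for N managers: floor(N/2) + 1
quorum_size() { echo $(( $1 / 2 + 1 )); }

# Manager losses tolerated with N managers: floor((N-1)/2)
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

# Example for the 3-manager prod Swarm:
#   quorum_size 3          # prints 2
#   tolerated_failures 3   # prints 1
```

The same arithmetic shows why an even manager count adds no resilience: 4 managers still tolerate only 1 loss.
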