Environment_Infrastructure/setup/07-prod-ansible-bootstrap.md
Murat ÖZDEMİR 8780c7c05e docs(db): implement direct cluster access strategy for production
2026-05-18 14:25:26 +03:00


# 07 - Prod Ansible Bootstrap
This phase takes the prod machines created by Terraform and applies the base Linux configuration, security hardening, Docker, and Swarm setup. This playbook does not install the DB cluster software; however, the DB nodes do join the Swarm as workers.
## Ansible Installation
Ansible must be installed on the control machine, meaning your own computer. No agent is installed on target servers; SSH access is enough.
### Installation by Operating System
- **Ubuntu / Debian:**
```bash
sudo apt update
sudo apt install -y pipx python3-venv
pipx ensurepath
export PATH="$HOME/.local/bin:$PATH"
pipx install --include-deps ansible
pipx install ansible-lint
```
- **Fedora / Rocky Linux / RHEL:**
```bash
sudo dnf install -y pipx python3-virtualenv
pipx ensurepath
export PATH="$HOME/.local/bin:$PATH"
pipx install --include-deps ansible
pipx install ansible-lint
```
- **macOS (Homebrew):**
```bash
brew install ansible
```
- **Any platform where `pipx` is already installed:**
```bash
pipx install --include-deps ansible
pipx install ansible-lint
```
### Additional Python Dependencies
`passlib` is required on the control machine for the `password_hash` filter:
```bash
pipx inject ansible passlib
```
> If you installed with `pip`: `pip install passlib`
### Verify the Installation
Regardless of the installation method, verify the result with the following commands:
```bash
# Check the Ansible version and configuration paths
ansible --version
# Check which location the Ansible binary is running from
which -a ansible
```
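The check above can be extended to the other command-line tools this guide relies on. The following is a minimal sketch, not part of the repository: `check_tools` is a hypothetical helper name, and the list of tools to pass in is an assumption based on the commands used later in this document.

```bash
# Report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent tool and
# returns non-zero if at least one tool is absent.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the control machine:
#   check_tools ansible ansible-playbook ansible-galaxy ansible-lint
```

If the function prints nothing and returns 0, every listed tool was found on `PATH`.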
## Running Ansible Commands
All commands must be run from the `ansible/prod/` directory. `ansible.cfg` automatically defines the inventory and `roles_path`.
### 0. Install Required Collections (Once, During Initial Setup)
```bash
ansible-galaxy collection install -r ../requirements.yml
```
### 1. Connection Test (Ping)
```bash
ansible all -m ping
```
### 2. Run the Bootstrap Playbook
```bash
ansible-playbook prod-bootstrap.yml --ask-vault-pass
```
*Note: The `--ask-vault-pass` parameter asks for the Ansible Vault password; the StorageBox password is decrypted this way.*
### 3. Run Only a Specific Role (Tags)
```bash
ansible-playbook prod-bootstrap.yml --tags "hardening" --ask-vault-pass
```
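Before applying a single role, it can help to preview its changes. The sketch below wraps the same command with the standard `ansible-playbook` flags `--check`, `--diff`, and `--limit`. The `preview_role` function and the `ANSIBLE_PLAYBOOK` override variable are hypothetical names introduced for illustration, and not every task in a role necessarily supports check mode.

```bash
# Preview one tagged role in check mode (no changes are applied).
# $1: tag to run, $2: optional host or group to limit to (default: all).
# ANSIBLE_PLAYBOOK can be set to "echo" to print the arguments instead of running them.
preview_role() {
  tag="$1"
  limit="${2:-all}"
  "${ANSIBLE_PLAYBOOK:-ansible-playbook}" prod-bootstrap.yml \
    --tags "$tag" --limit "$limit" --check --diff --ask-vault-pass
}

# Usage, from the ansible/prod/ directory:
#   preview_role hardening
#   preview_role hardening iklim-db-01
```

Once the diff looks as expected, the role can be applied by running the original command without `--check`.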
## Target Machines
| Host | Role |
| --- | --- |
| `iklim-app-01` | Swarm manager + app worker |
| `iklim-app-02` | Swarm manager + app worker |
| `iklim-app-03` | Swarm manager + app worker |
| `iklim-db-01` | Manual DB cluster node + Swarm worker |
| `iklim-db-02` | Manual DB cluster node + Swarm worker |
| `iklim-db-03` | Manual DB cluster node + Swarm worker |
## Recommended File Structure
```text
ansible/
prod/
ansible.cfg
inventory/
generated/
prod.yml
group_vars/
all/
vars.yml
vault.yml
prod-bootstrap.yml
roles/
base/
hardening/
docker/
swarm/
node_dirs/
storagebox/
storagebox_ssh_key/
act_runner/
db_stack/
```
## Base Role
Applied to all prod nodes:
- Package cache update
- `epel-release`: installed first as a separate task, because `fail2ban`, `davfs2`, `htop`, and `btop` depend on this repo
- Base packages, installed after `epel-release` is active:
  - `curl`
  - `wget`
  - `git`
  - `jq`
  - `tar`
  - `unzip`
  - `bash-completion`
  - `gettext`: required for `envsubst` in CI/CD deploy pipelines
  - `tree`
  - `ca-certificates`
  - `fail2ban`
  - `chrony`
  - `python3`
  - `python3-pip`
  - `python3-passlib`: for the `password_hash` filter (EPEL)
  - `htop`: interactive process monitoring (EPEL)
  - `btop`: resource monitor with graphical interface (EPEL)
- Timezone: `Europe/Istanbul`
- Hostname setup
- Keyboard layout: `trq` (Turkish Q)
- chrony/NTP active
## Security Hardening Role
Applied to all prod nodes:
- SSH password auth is disabled.
- Root SSH login is disabled.
- Only SSH key auth remains.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- `fail2ban` is enabled.
- Automatic security updates are enabled with `dnf-automatic`.
- The `iklim` system user is created and added to the `wheel` group; the password is read from vault.
- `firewalld` default: incoming deny (drop zone), outgoing allow.
- The SSH rule is first written as a rich rule to the `drop` zone, then the default zone is set to `drop`.
- SSH is opened only from the admin CIDR.
- DB ports are not opened publicly.

The Hetzner Cloud Firewall is treated as the actual perimeter; firewalld is the second layer of defense, on the host itself.
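
The SSH items in the list above can be spot-checked against the effective sshd configuration, which `sshd -T` prints as lowercase `keyword value` lines. This is a minimal sketch under that assumption; `check_sshd_hardening` is a hypothetical helper, not something the hardening role ships.

```bash
# Compare effective sshd settings (read from stdin) with the hardening list.
# Prints each expected setting that is not present and
# returns non-zero if any is missing.
check_sshd_hardening() {
  cfg=$(cat)
  rc=0
  for want in \
    "passwordauthentication no" \
    "permitrootlogin no" \
    "permitemptypasswords no" \
    "pubkeyauthentication yes" \
    "maxauthtries 3"
  do
    # -x: match the whole line, -i: ignore case, -q: no output
    if ! printf '%s\n' "$cfg" | grep -qix "$want"; then
      echo "mismatch: expected '$want'"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on a node, with root privileges:
#   sshd -T | check_sshd_hardening
```

No output and a zero exit status mean all five settings match.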
## Docker Role
Required on all prod nodes, both app and db. Because the DB nodes join the Swarm as workers, Docker Engine must be installed on every machine.

Packages to install:
- `docker-ce`
- `docker-ce-cli`
- `containerd.io`
- `docker-buildx-plugin`
- `docker-compose-plugin`

Docker is installed from the official Docker dnf repository (`https://download.docker.com/linux/rhel/docker-ce.repo`).
## Swarm Role
The prod Swarm is set up with 3 managers:
1. `docker swarm init` on `iklim-app-01` (Advertise/data path addr: `10.20.10.11`)
2. `iklim-app-02` and `iklim-app-03` join as managers.
3. `iklim-db-01/02/03` join as workers.
4. Overlay network is created: `iklimco-net`
5. Node labels:
   - `iklim-app-*` -> `type=service`
   - `iklim-db-*` -> `role=db`, `db-index=01/02/03`, for Patroni node coordination
6. All nodes remain `AVAILABILITY=Active`.

The `db-index` labels are added through `iklim-app-01` in a separate play inside `prod-bootstrap.yml`, not by the swarm role.
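
For orientation, the steps above correspond roughly to the following Docker CLI calls. This is a hand-written sketch, not the content of the swarm role: the `label_prod_nodes` function and the `DOCKER` override variable are hypothetical, and the role may pass additional options that are not shown here.

```bash
# Manual equivalents of steps 1-4, run on iklim-app-01 unless noted:
#   docker swarm init --advertise-addr 10.20.10.11 --data-path-addr 10.20.10.11
#   docker swarm join-token -q manager   # token used by iklim-app-02 and iklim-app-03
#   docker swarm join-token -q worker    # token used by iklim-db-01/02/03
#   docker network create --driver overlay iklimco-net

# Step 5: apply the node labels from a manager.
# Set DOCKER="echo docker" to print the commands instead of executing them.
label_prod_nodes() {
  for i in 01 02 03; do
    ${DOCKER:-docker} node update --label-add type=service "iklim-app-$i"
    ${DOCKER:-docker} node update --label-add role=db --label-add "db-index=$i" "iklim-db-$i"
  done
}
```

Running `DOCKER="echo docker" label_prod_nodes` prints the six `docker node update` commands, which makes it easy to review them before touching the cluster.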
## Node Directory Role
On all `iklim-app-*` nodes:
```text
/opt/iklimco/ssl
/opt/iklimco/init
/opt/iklimco/stacks
/opt/iklimco/vault/data
```
`/opt/iklimco/vault/data` is the host path that backs the data volume of the Vault Raft node; it must be created separately on every app node. Swarm does not create or manage this directory, so if it is missing, the Vault container will not start.

On DB nodes:
```text
/opt/iklimco/db
/opt/iklimco/backup
```
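The directory layout above can be reproduced by hand as follows. This is a minimal sketch: `make_node_dirs` is a hypothetical helper, the optional root prefix exists only so the function can be tried in a scratch directory, and the ownership and permissions that the role sets are not covered here.

```bash
# Create the per-node directories listed above.
# $1: node type (app or db)
# $2: optional root prefix, for trying the function outside /
make_node_dirs() {
  root="${2:-}"
  case "$1" in
    app) dirs="ssl init stacks vault/data" ;;
    db)  dirs="db backup" ;;
    *)   echo "usage: make_node_dirs app|db [root]" >&2; return 2 ;;
  esac
  for d in $dirs; do
    mkdir -p "$root/opt/iklimco/$d"
  done
}

# Usage:
#   make_node_dirs app            # on an iklim-app-* node
#   make_node_dirs db /tmp/demo   # dry run under /tmp/demo
```

Because `mkdir -p` is a no-op for directories that already exist, the function is safe to run repeatedly.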
## StorageBox DAVFS Mount Role
Applied to every node, all `iklim-app-*` and `iklim-db-*`.
### Prod Sub-Account
| Parameter | Variable | Value |
| --- | --- | --- |
| Main account | `storagebox_account` | `u469968` |
| Sub-account | `storagebox_user` | `u469968-sub5` |
| WebDAV URL | `storagebox_url` | `https://u469968-sub5.your-storagebox.de/` |
| Mount point | `storagebox_mount_point` | `/mnt/storagebox` |
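
A davfs2 mount is normally described by two entries: a line in `/etc/fstab` and a credentials line in `/etc/davfs2/secrets`. The sketch below prints both for the values in the table. The mount options (`rw,_netdev`) are an assumption, the role may use different ones, and `davfs_entries` is a hypothetical helper name.

```bash
# Print the fstab and davfs2 secrets entries for a WebDAV mount.
# The password is passed as an argument only for illustration;
# in this setup it comes from Ansible Vault.
davfs_entries() {
  url="$1"; mount_point="$2"; user="$3"; password="$4"
  # /etc/fstab: <url> <mount point> davfs <options> 0 0
  echo "fstab: $url $mount_point davfs rw,_netdev 0 0"
  # /etc/davfs2/secrets: <url or mount point> <user> <password>; the file must be mode 600
  echo "secrets: $url $user $password"
}

# Usage:
#   davfs_entries https://u469968-sub5.your-storagebox.de/ /mnt/storagebox u469968-sub5 'PASSWORD'
```

With both entries in place, `mount /mnt/storagebox` would mount the share without prompting for credentials.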
## StorageBox SSH Key Role
Applied to every node. An ed25519 key pair is generated on each server at `/root/.ssh/id_ed25519_storagebox`. Uploading the generated public key to the StorageBox main account (SSH `authorized_keys`) is a separate manual step:
```bash
# For each node:
cat /root/.ssh/id_ed25519_storagebox.pub | \
ssh -p 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de \
"cat >> .ssh/authorized_keys"
```
## Act Runner Role
Applied to `iklim-app-*` nodes. Gitea Act Runner is installed on each app node and started as a systemd service. In prod, the runner runs on 3 app nodes; the deploy pipeline can be triggered on any of these runners.
## DB Stack Role
Applied to `iklim-db-*` nodes. On each DB node, the role creates the `/opt/iklimco/db` and `/opt/iklimco/backup` directories, plus a local reference directory for MongoDB. The actual production configuration (node-specific `mongod.conf`, the replica set auth key, and the Patroni and etcd configurations) is set up on StorageBox during the `08-prod-db-cluster-kurulum.md` step, under `/mnt/storagebox/prod/db/mongodb-0X/config/`, `/mnt/storagebox/prod/db/postgresql-0X/config/`, and `/mnt/storagebox/prod/db/etcd-0X/data/`.
## /opt/iklimco/stacks/.env
Password variables required by the DB cluster stacks are stored in the `/opt/iklimco/stacks/.env` file. This file is stored on StorageBox as `prod/secrets/iklim.co/.env.stacks`. Before the first deploy, it is fetched on `iklim-app-01` with the following command:
```bash
scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.stacks \
/opt/iklimco/stacks/.env
chmod 600 /opt/iklimco/stacks/.env
```
## StorageBox Directory Structure
After the Ansible bootstrap completes and before the infra stack is deployed, create the following directories from `iklim-app-01`. StorageBox must already be mounted:
```bash
# SWAG certificate and configuration directories
mkdir -p /mnt/storagebox/ssl
mkdir -p /mnt/storagebox/swag/config
mkdir -p /mnt/storagebox/swag/site-confs
# Monitoring data directories; Grafana on StorageBox, Prometheus on local volume
mkdir -p /mnt/storagebox/grafana/data
mkdir -p /mnt/storagebox/prometheus/data
# Image directory for the precipitation service
mkdir -p /mnt/storagebox/precipitation/images
```
The SWAG and monitoring directories correspond to the `SWAG_CERT_DIR`, `SWAG_CONFIG_DIR`, `SWAG_SITE_CONFS_DIR`, `GRAFANA_DATA_DIR`, and `PROMETHEUS_DATA_DIR` variables in `env-prod/.env`. Because StorageBox is mounted at the same `/mnt/storagebox` path on all app nodes, the directories are created only once and every node sees the same content.
## Swarm Setup Verification
After bootstrap, check the Swarm status with the following commands:
```bash
# 6 nodes: 3 managers (Leader/Reachable), 3 workers (Ready)
docker node ls
# App node label
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
# Expected: map[type:service]
# DB node label
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
# Expected: map[db-index:01 role:db]
# swarm-init.sh idempotency — do not attempt init again in an already active Swarm
grep -n "swarm init\|swarm join" init/swarm-init.sh
```
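The node count from `docker node ls` can also be checked mechanically. The sketch below relies on the `--format` placeholders `{{.Hostname}}` and `{{.ManagerStatus}}`; worker nodes report an empty manager status. `summarise_nodes` is a hypothetical helper name.

```bash
# Count managers and workers from "hostname|manager-status" lines on stdin.
# Managers report Leader or Reachable; workers have an empty status field.
summarise_nodes() {
  awk -F'|' '
    $1 == "" { next }                                   # skip blank lines
    $2 == "Leader" || $2 == "Reachable" { managers++ }
    $2 == ""                            { workers++ }
    END { printf "managers=%d workers=%d\n", managers, workers }
  '
}

# Usage on a manager node:
#   docker node ls --format '{{.Hostname}}|{{.ManagerStatus}}' | summarise_nodes
```

For the six-node layout in this guide, the expected summary is `managers=3 workers=3`. A manager in the `Unreachable` state is counted in neither group, so a lower manager count points at a quorum problem.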
## Acceptance Criteria
- `ansible all -m ping` succeeds.
- 3 Swarm manager nodes appear as Leader/Reachable in `docker node ls`.
- 3 DB nodes appear as Workers in `docker node ls`.
- Manager quorum holds: with 3 managers, the loss of 1 manager is tolerated.
- The `iklimco-net` overlay network exists.
- Node labels (`type=service`, `role=db`, `db-index=01/02/03`) are verified with inspect.
- `swarm-init.sh` does not attempt init again in an active Swarm; it is idempotent.
- `/mnt/storagebox` is mounted on every node.
- The `/opt/iklimco/vault/data` directory exists on every app node.
- The `ssl`, `swag/config`, `swag/site-confs`, `grafana/data`, `prometheus/data`, and `precipitation/images` directories exist on StorageBox.
- The Gitea Act Runner service is running on every app node.
- `/opt/iklimco/db` and `/opt/iklimco/backup` directories exist on DB nodes. Node-specific `mongod.conf` and other DB configurations are created on StorageBox (`/mnt/storagebox/prod/db/...`) in the `08-prod-db-cluster-kurulum.md` step.
- Public firewall allows only `22`, `80`, and `443` ingress.
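
The quorum criterion above follows from Raft majority arithmetic: N managers need a majority of floor(N/2) + 1 to stay writable, so floor((N - 1)/2) managers can fail. A small sketch of that calculation, with `quorum_info` as a hypothetical helper name:

```bash
# Raft quorum arithmetic for N Swarm managers.
quorum_info() {
  n="$1"
  quorum=$(( n / 2 + 1 ))        # majority needed to keep the Swarm writable
  tolerated=$(( (n - 1) / 2 ))   # managers that can fail without losing quorum
  echo "managers=$n quorum=$quorum tolerated=$tolerated"
}

# Usage:
#   quorum_info 3   # prints: managers=3 quorum=2 tolerated=1
```

The same formula shows why an even manager count adds no resilience: `quorum_info 4` still tolerates only one failure.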