# 07 - Prod Ansible Bootstrap

This phase prepares the prod machines created by Terraform: base Linux setup, security hardening, Docker, and Swarm. This playbook does not install the DB cluster software; however, the DB nodes still join Swarm as workers.

## Ansible Installation

Ansible must be installed on the control machine, i.e. your own computer. No agent is needed on the target servers; SSH access is enough.

### Installation by Operating System

- **Ubuntu / Debian:**
```bash
sudo apt update
sudo apt install -y pipx python3-venv

pipx ensurepath
export PATH="$HOME/.local/bin:$PATH"

pipx install --include-deps ansible
pipx install ansible-lint
```

- **Fedora / Rocky Linux / RHEL:**
```bash
sudo dnf install -y pipx python3-virtualenv

pipx ensurepath
export PATH="$HOME/.local/bin:$PATH"

pipx install --include-deps ansible
pipx install ansible-lint
```

- **macOS (Homebrew):**
```bash
brew install ansible
```

- **With pipx, on any platform where Python 3 and pipx are already installed:**
```bash
pipx install --include-deps ansible
pipx install ansible-lint
```

### Additional Python Dependencies

`passlib` is required on the control machine for the `password_hash` filter:

```bash
pipx inject ansible passlib
```

> If you installed Ansible with plain `pip` instead of `pipx`, install `passlib` into the same Python environment: `pip install passlib`

### Verify the Installation

Whichever installation method you used, verify the installation with:

```bash
# Check the Ansible version and configuration paths
ansible --version

# Show every ansible binary on PATH; the first one listed is the one that runs
which -a ansible
```

## Running Ansible Commands

Run all commands from the `ansible/prod/` directory: Ansible picks up `ansible.cfg` from the current working directory, and that file defines the inventory and `roles_path`.

### 0. Install Required Collections (Once, During Initial Setup)

```bash
ansible-galaxy collection install -r ../requirements.yml
```

### 1. Connection Test (Ping)

```bash
ansible all -m ping
```

### 2. Run the Bootstrap Playbook

```bash
ansible-playbook prod-bootstrap.yml --ask-vault-pass
```

*Note: `--ask-vault-pass` prompts for the Ansible Vault password, which is used to decrypt the vault-protected values, such as the StorageBox password and the `iklim` user password.*

### 3. Run Only a Specific Role (Tags)

```bash
ansible-playbook prod-bootstrap.yml --tags "hardening" --ask-vault-pass
```

## Target Machines

| Host | Role |
| --- | --- |
| `iklim-app-01` | Swarm manager + app worker |
| `iklim-app-02` | Swarm manager + app worker |
| `iklim-app-03` | Swarm manager + app worker |
| `iklim-db-01` | Swarm worker + DB cluster node (DB cluster set up manually) |
| `iklim-db-02` | Swarm worker + DB cluster node (DB cluster set up manually) |
| `iklim-db-03` | Swarm worker + DB cluster node (DB cluster set up manually) |

## Recommended File Structure

```text
ansible/
  prod/
    ansible.cfg
    inventory/
      generated/
        prod.yml
      group_vars/
        all/
          vars.yml
          vault.yml
    prod-bootstrap.yml
    roles/
      db_stack/
  roles/
    base/
    hardening/
    docker/
    swarm/
    node_dirs/
    storagebox/
    storagebox_ssh_key/
    act_runner/
    db_stack/
```

`ansible/prod/ansible.cfg` sets `roles_path = roles:../roles`. Because of that ordering, `ansible/prod/roles/db_stack` is the production-specific role that is used by `prod-bootstrap.yml`; the shared `ansible/roles/db_stack` remains the common fallback/reference implementation. Production DB behavior that writes Patroni, MongoDB, and replica-set auth files to StorageBox belongs to the prod-local role.

## Base Role

Applied to all prod nodes:

- Package cache update
- `epel-release`: installed first as a separate task, because `fail2ban`, `davfs2`, `htop`, and `btop` are installed from this repo
- base packages, installed after `epel-release` is active:
  - `curl`
  - `wget`
  - `git`
  - `jq`
  - `tar`
  - `unzip`
  - `bash-completion`
  - `gettext`: provides `envsubst`, which the CI/CD deploy pipelines require
  - `tree`
  - `ca-certificates`
  - `fail2ban`
  - `chrony`
  - `python3`
  - `python3-pip`
  - `python3-passlib`: for the `password_hash` filter (EPEL)
  - `htop`: interactive process monitor (EPEL)
  - `btop`: terminal-based resource monitor (EPEL)
- timezone: `Europe/Istanbul`
- hostname setup
- keyboard layout: `trq` (Turkish Q)
- chrony (NTP) enabled and running

## Security Hardening Role

Applied to all prod nodes:

- SSH password authentication is disabled; only SSH key authentication remains.
- Root SSH login with a password is disabled (`PermitRootLogin prohibit-password`); key-based root login stays enabled so Ansible can keep connecting throughout the bootstrap.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- `fail2ban` is enabled.
- Automatic security updates are enabled with `dnf-automatic`.
- The `iklim` system user is created and added to the `wheel` group; its password is read from the vault.
- `firewalld` default: incoming traffic denied (`drop` zone), outgoing traffic allowed.
- The SSH rule is first written as a rich rule to the `drop` zone, and only then is the default zone set to `drop`, so the SSH connection Ansible is using is not cut off.
- SSH is opened only from the admin CIDR.
- DB ports are not opened publicly.
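
To spot-check the SSH part of this list on a node after the role has run, a small helper can compare the effective daemon settings with the values above. This is a sketch, not part of the hardening role: the function name `check_sshd_hardening` is hypothetical, and it assumes its input comes from `sshd -T`, which prints the effective configuration as lowercase `keyword value` lines. Depending on the OpenSSH version, `sshd -T` may report `prohibit-password` under its older alias `without-password`, so both values are accepted.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the hardening role): compare effective sshd
# settings with the values listed above. Reads `sshd -T` output on stdin
# (one "keyword value" pair per line, keywords in lowercase), prints one
# MISMATCH line per deviation, and returns non-zero if anything differs.
check_sshd_hardening() {
  local cfg entry key accepted actual rc=0
  cfg="$(cat)"
  # Format: keyword|space-separated accepted values.
  # sshd -T may print prohibit-password as its older alias without-password.
  local -a expected=(
    "passwordauthentication|no"
    "permitrootlogin|prohibit-password without-password"
    "permitemptypasswords|no"
    "maxauthtries|3"
  )
  for entry in "${expected[@]}"; do
    key="${entry%%|*}"
    accepted="${entry#*|}"
    # First value reported for this keyword; empty if the keyword is absent.
    actual="$(awk -v k="$key" '$1 == k { print $2; exit }' <<< "$cfg")"
    if [[ -z "$actual" || " $accepted " != *" $actual "* ]]; then
      echo "MISMATCH: $key=${actual:-<unset>} (expected: $accepted)"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on a node (sshd -T needs root):
#   sudo sshd -T | check_sshd_hardening && echo "sshd hardening OK"
```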

The Hetzner Cloud Firewall is the actual perimeter; firewalld is a second layer of defense on the host itself.
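
To confirm on a node that the host-level layer is in place, the rich rules in the `drop` zone can be inspected. The helper below is a hypothetical sketch (`ssh_only_from_cidr` is not part of the role): it assumes the hardening role's SSH rule matches on `service name="ssh"` or `port port="22"` together with a `source address`, which may differ from how the role actually writes the rule, and `ADMIN_CIDR` in the usage comment stands for the real admin CIDR.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the hardening role): check that SSH is
# accepted only from the admin CIDR. Reads rich rules on stdin, one per line,
# as printed by `firewall-cmd --zone=drop --list-rich-rules`.
# Succeeds only if at least one SSH accept rule uses the given source CIDR
# and no SSH accept rule uses any other (or no) source.
ssh_only_from_cidr() {
  local cidr="$1" rule found=0
  while IFS= read -r rule || [[ -n "$rule" ]]; do
    # Skip rules that do not accept traffic.
    [[ "$rule" == *accept* ]] || continue
    # Assumption: SSH rules match by service name or by port 22.
    [[ "$rule" == *'service name="ssh"'* || "$rule" == *'port port="22"'* ]] || continue
    if [[ "$rule" == *"source address=\"$cidr\""* ]]; then
      found=1
    else
      echo "UNEXPECTED SSH RULE: $rule"
      return 1
    fi
  done
  [[ "$found" -eq 1 ]]
}

# Usage on a node (ADMIN_CIDR is a placeholder for the real admin CIDR):
#   [ "$(firewall-cmd --get-default-zone)" = "drop" ] &&
#     firewall-cmd --zone=drop --list-rich-rules | ssh_only_from_cidr "ADMIN_CIDR"
```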

## Docker Role

Applied to all prod nodes, app and DB alike: because the DB nodes join Swarm as workers, Docker Engine must be installed on every machine.

Packages to install:

- `docker-ce`
- `docker-ce-cli`
- `containerd.io`
- `docker-buildx-plugin`
- `docker-compose-plugin`

The packages are installed from the official Docker dnf repository (`https://download.docker.com/linux/rhel/docker-ce.repo`).

## Swarm Role

The prod Swarm is set up with 3 managers:

1. `docker swarm init` runs on `iklim-app-01` (`--advertise-addr` and `--data-path-addr`: `10.20.10.11`).
2. `iklim-app-02` and `iklim-app-03` join as managers.
3. `iklim-db-01/02/03` join as workers.
4. `iklimco-net` is not created by the Ansible swarm role. It is created and owned by the Swarm stack (`docker-stack-infra_db-prod.yml`), so that Docker's embedded DNS works for service VIPs and aliases.
5. Node labels:
   - `iklim-app-*` -> `type=service`
   - `iklim-db-*` -> `role=db`
   - `iklim-db-*` -> `db-index=01/02/03`, for Patroni node coordination
6. All nodes remain `AVAILABILITY=Active`.
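
Three is the smallest manager count that survives the loss of a manager. Swarm managers keep the cluster state consistent with the Raft consensus algorithm, which needs a majority of managers (a quorum) to be available. For N managers, the quorum size q and the number of manager failures f that can be tolerated are:

```math
q(N) = \left\lfloor \frac{N}{2} \right\rfloor + 1, \qquad
f(N) = N - q(N) = \left\lfloor \frac{N-1}{2} \right\rfloor
```

For this layout, q(3) = 2 and f(3) = 1, which is the "1 loss tolerated" in the acceptance criteria. A fourth manager would not help: q(4) = 3, so f(4) is still 1.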

Labeling is intentionally split across two automation layers:

- The shared `swarm` role adds the generic environment labels: `type=service` on app nodes and `role=db` on DB nodes.
- The production playbook adds `db-index=01/02/03` in a separate play inside `prod-bootstrap.yml` that runs on `iklim-app-01`, since node labels can only be changed from a manager.

This split keeps the common Swarm role reusable while letting prod add the Patroni/MongoDB coordination labels it needs.
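
Ansible applies these labels, so no manual step is needed during bootstrap. If they ever have to be checked or reapplied by hand, the equivalent Docker CLI call is `docker node update --label-add KEY=VALUE NODE`, run on a manager. The sketch below only prints those commands for the six nodes; the function name `swarm_label_commands` is hypothetical, and it assumes the hostnames from the Target Machines table and that each DB node's `db-index` equals its hostname suffix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (without running) the docker node update calls
# that reproduce the node labels described above. Review the output, then
# run it on a manager node, for example: swarm_label_commands | bash
swarm_label_commands() {
  local i
  # Generic label from the shared swarm role.
  for i in 01 02 03; do
    echo "docker node update --label-add type=service iklim-app-$i"
  done
  # role=db from the shared swarm role, db-index from prod-bootstrap.yml.
  # Assumption: db-index equals the hostname suffix.
  for i in 01 02 03; do
    echo "docker node update --label-add role=db --label-add db-index=$i iklim-db-$i"
  done
}
```

Printing instead of executing keeps the helper safe to run anywhere and leaves the actual change as an explicit, reviewable step.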

## Node Directory Role

On all `iklim-app-*` nodes:
```text
/opt/iklimco/ssl
```

Vault data is managed by the `docker-stack-vault.yml` stack through Docker volumes. The app nodes need the local SSL directory because `cert-distributor` syncs certificates from StorageBox into `/opt/iklimco/ssl` for Vault.

On DB nodes:
```text
/opt/iklimco/db
/opt/iklimco/backup
/opt/iklimco/db/mongodb
/opt/iklimco/db/postgresql
```

## StorageBox DAVFS Mount Role

Applied to every node: all `iklim-app-*` and `iklim-db-*` nodes.

### Prod Sub-Account

| Parameter | Variable | Value |
| --- | --- | --- |
| Main account | `storagebox_account` | `u469968` |
| Sub-account | `storagebox_user` | `u469968-sub5` |
| WebDAV URL | `storagebox_url` | `https://u469968-sub5.your-storagebox.de/` |
| Mount point | `storagebox_mount_point` | `/mnt/storagebox` |

## StorageBox SSH Key Role

Applied to every node. The role generates the `/root/.ssh/id_ed25519_storagebox` ed25519 key pair on the server. Uploading the generated public key to the StorageBox main account's `.ssh/authorized_keys` is a separate manual step:

```bash
# Run as root on each node (the key pair lives under /root/.ssh).
# STORAGEBOX_USER is a placeholder for the StorageBox account name.
cat /root/.ssh/id_ed25519_storagebox.pub | \
  ssh -p 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de \
  "cat >> .ssh/authorized_keys"
```

## Act Runner Role

Applied to `iklim-app-*` nodes. Gitea Act Runner is installed on each app node and runs as a systemd service. In prod, all 3 app nodes run a runner, so a deploy pipeline can be picked up by any of them.

## DB Stack Role

Applied to `iklim-db-*` nodes. On each DB node, the role creates `/opt/iklimco/db`, `/opt/iklimco/backup`, `/opt/iklimco/db/mongodb`, and `/opt/iklimco/db/postgresql`. The Ansible `db_stack` role also deploys the production configuration (node-specific `mongod.conf`, the replica set auth key, and the Patroni configuration) to StorageBox at `/mnt/storagebox/db/mongodb-0X/config/` and `/mnt/storagebox/db/postgresql-0X/config/`. etcd data is stored in local Docker named volumes.
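
To find the configuration directories that belong to a given DB node, the `0X` in those paths can be derived from the node name. The helper below is a hypothetical sketch (`db_config_dirs` is not part of the role); it assumes that `0X` matches the hostname suffix, i.e. the same value as the node's `db-index` label, and that StorageBox is mounted at `/mnt/storagebox` unless another root is passed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the db_stack role): print the StorageBox
# config directories of one DB node. Assumes the 0X in mongodb-0X and
# postgresql-0X equals the hostname suffix (the node's db-index label).
db_config_dirs() {
  local host="$1" root="${2:-/mnt/storagebox}" idx
  idx="${host##*-}"  # iklim-db-02 -> 02
  if [[ "$host" != iklim-db-* || ! "$idx" =~ ^0[1-3]$ ]]; then
    echo "not a prod DB node: $host" >&2
    return 1
  fi
  echo "$root/db/mongodb-$idx/config/"
  echo "$root/db/postgresql-$idx/config/"
}

# Usage on a DB node (assumes the short hostname is the node name):
#   db_config_dirs "$(hostname -s)" | while IFS= read -r dir; do
#     [ -d "$dir" ] || echo "missing: $dir"
#   done
```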

## DB Stack Env Variables

The password variables required by the prod infra stack (`docker-stack-infra_db-prod.yml`), including `DATABASE_POSTGRES_ROOT_PASSWD`, `DATABASE_POSTGRES_REPLICATOR_PASSWORD`, `DATABASE_MONGODB_ROOT_PASSWD`, and `ETCD_ROOT_PASSWORD`, are stored in `prod/secrets/iklim.co/.env.secrets.shared` on StorageBox, alongside the other shared secrets. No separate file is needed.
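
Before deploying the stack, a quick sanity check can confirm that the shared secrets file actually defines these passwords. The sketch below is hypothetical (`check_db_stack_secrets` is not part of the repository). It checks only the four variables named above, so the list would need extending if the stack requires more, and it assumes the file uses plain `NAME=value` lines. The path in the usage comment assumes the file sits under the `/mnt/storagebox` mount.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm that a dotenv-style secrets file defines the
# DB passwords named above with a non-empty value. Prints the names that are
# missing or empty and returns non-zero if there are any.
# Limitation: a quoted empty value such as NAME="" is not detected.
check_db_stack_secrets() {
  local file="$1" var rc=0
  local -a required=(
    DATABASE_POSTGRES_ROOT_PASSWD
    DATABASE_POSTGRES_REPLICATOR_PASSWORD
    DATABASE_MONGODB_ROOT_PASSWD
    ETCD_ROOT_PASSWORD
  )
  if [[ ! -r "$file" ]]; then
    echo "cannot read: $file" >&2
    return 2
  fi
  for var in "${required[@]}"; do
    # NAME=value, optionally prefixed with "export ", at least one value char.
    if ! grep -Eq "^(export[[:space:]]+)?${var}=.+" "$file"; then
      echo "missing or empty: $var"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (the path assumes the file lives under the /mnt/storagebox mount):
#   check_db_stack_secrets /mnt/storagebox/prod/secrets/iklim.co/.env.secrets.shared
```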

## StorageBox Directory Structure

The `storagebox` Ansible role creates the following directories **automatically** during bootstrap, driven by `storagebox_managed_directories` (`group_vars/all/vars.yml`). No manual step is required:

- `/mnt/storagebox/ssl` → `SWAG_CERT_DIR`
- `/mnt/storagebox/swag`
- `/mnt/storagebox/swag/dns-conf` → `SWAG_DNS_CONFIG_DIR`
- `/mnt/storagebox/swag/site-confs` → `SWAG_SITE_CONFS_DIR`
- `/mnt/storagebox/swag/proxy-confs` → `SWAG_PROXY_CONFS_DIR`
- `/mnt/storagebox/swag/certbot`
- `/mnt/storagebox/grafana/data` → `GRAFANA_DATA_DIR`
- `/mnt/storagebox/precipitation/images`

Because StorageBox is mounted at `/mnt/storagebox` on all app nodes, the directories are created only once and all nodes share them. Prometheus uses a local Docker named volume, not StorageBox.
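
After bootstrap, the directory list above can be checked from any node that has the StorageBox mount. This is a hypothetical helper (`check_storagebox_dirs`), not part of the `storagebox` role. Its list is copied from this section, while the authoritative list is `storagebox_managed_directories` in `group_vars/all/vars.yml`, so the two need to stay in sync.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the directories listed above are
# missing under the StorageBox mount point. Pass another root to test it.
check_storagebox_dirs() {
  local root="${1:-/mnt/storagebox}" dir rc=0
  # Copied from this document; keep in sync with storagebox_managed_directories.
  local -a dirs=(
    ssl
    swag
    swag/dns-conf
    swag/site-confs
    swag/proxy-confs
    swag/certbot
    grafana/data
    precipitation/images
  )
  for dir in "${dirs[@]}"; do
    if [[ ! -d "$root/$dir" ]]; then
      echo "missing: $root/$dir"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on a node with the StorageBox mount:
#   check_storagebox_dirs && echo "StorageBox directories OK"
```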

## Swarm Setup Verification

After bootstrap, check the Swarm status with the following commands. Run them on a manager node (for example `iklim-app-01`), since `docker node` commands only work on managers:

```bash
# 6 nodes: 3 managers (Leader/Reachable), 3 workers (Ready)
docker node ls

# App node label
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
# Expected: map[type:service]

# DB node label
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
# Expected: map[db-index:01 role:db]

# swarm-init.sh idempotency: list where init/join are called, then confirm
# they are guarded so init is not attempted again in an already active Swarm
grep -n "swarm init\|swarm join" init/swarm-init.sh
```
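
The `docker node ls` check can also be done non-interactively, for example at the end of a bootstrap run. The sketch below (hypothetical name `check_swarm_nodes`) reads the output of `docker node ls --format '{{.Hostname}}|{{.Status}}|{{.ManagerStatus}}'` and expects exactly the layout described in this document: 3 ready `iklim-app-*` managers that are Leader or Reachable, and 3 ready `iklim-db-*` workers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check the expected Swarm membership non-interactively.
# Reads "hostname|status|manager-status" lines on stdin; manager-status is
# empty for workers. Prints any unexpected node and a summary line, and
# returns non-zero unless there are 3 managers and 3 workers as described.
check_swarm_nodes() {
  awk -F'|' '
    # Managers: ready app nodes that are Leader or Reachable.
    $1 ~ /^iklim-app-/ && $2 == "Ready" && ($3 == "Leader" || $3 == "Reachable") { managers++; next }
    # Workers: ready DB nodes without a manager status.
    $1 ~ /^iklim-db-/ && $2 == "Ready" && $3 == "" { workers++; next }
    # Anything else (Down, Unreachable, unknown host) is reported.
    { bad++; print "UNEXPECTED: " $0 }
    END {
      printf "managers=%d workers=%d\n", managers, workers
      exit !(managers == 3 && workers == 3 && bad == 0)
    }'
}

# Usage on a manager node:
#   docker node ls --format '{{.Hostname}}|{{.Status}}|{{.ManagerStatus}}' | check_swarm_nodes
```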

## Acceptance Criteria

- `ansible all -m ping` succeeds.
- The 3 Swarm manager nodes appear as Leader/Reachable in `docker node ls`.
- The 3 DB nodes appear as workers in `docker node ls`.
- Manager quorum holds: with 3 managers, the Swarm tolerates the loss of 1.
- The `iklimco-net` overlay network is created by the Swarm stack after `docker-stack-infra_db-prod.yml` is deployed.
- Node labels (`type=service`, `role=db`, `db-index=01/02/03`) are verified with `docker node inspect`.
- `swarm-init.sh` is idempotent: it does not attempt init again in an active Swarm.
- `/mnt/storagebox` is mounted on every node.
- The `/opt/iklimco/ssl` directory exists on every app node.
- The `db`, `ssl`, `swag`, `swag/dns-conf`, `swag/site-confs`, `swag/proxy-confs`, `swag/certbot`, `grafana/data`, and `precipitation/images` directories exist on StorageBox.
- The Gitea Act Runner service is running on every app node.
- The `/opt/iklimco/db` and `/opt/iklimco/backup` directories exist on DB nodes. Node-specific `mongod.conf` and other DB configurations are created on StorageBox (`/mnt/storagebox/db/...`) in the `08-prod-db-cluster-setup.md` step.
- The public firewall allows ingress only on ports `22`, `80`, and `443`.