# 07 - Prod Ansible Bootstrap

This phase prepares the prod machines created by Terraform: base Linux setup, security hardening, Docker, and Swarm. This playbook does not install the DB cluster software; the DB nodes do, however, join Swarm as workers.

## Ansible Installation

Ansible must be installed on the control machine, that is, your own computer. No agent is installed on the target servers; SSH access is enough.

### Installation by Operating System

- **Ubuntu / Debian:**
```bash
sudo apt update
sudo apt install -y pipx python3-venv

# Put pipx-installed binaries on PATH (also applied to the current shell)
pipx ensurepath
export PATH="$HOME/.local/bin:$PATH"

pipx install --include-deps ansible
pipx install ansible-lint
```

- **Fedora / Rocky Linux / RHEL:**
```bash
sudo dnf install -y pipx python3-virtualenv

# Put pipx-installed binaries on PATH (also applied to the current shell)
pipx ensurepath
export PATH="$HOME/.local/bin:$PATH"

pipx install --include-deps ansible
pipx install ansible-lint
```

- **macOS (Homebrew):**
```bash
brew install ansible
```

- **With pipx, on any platform:**
```bash
pipx install --include-deps ansible
pipx install ansible-lint
```

### Additional Python Dependencies

`passlib` is required on the control machine for the `password_hash` filter:

```bash
pipx inject ansible passlib
```

> If you installed with `pip`: `pip install passlib`

### Verify the Installation

Whichever installation method you used, verify the result with the following commands:

```bash
# Check the Ansible version and configuration paths
ansible --version

# Check which location the Ansible binary is running from
which -a ansible
```

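The same check can be scripted. The sketch below is a minimal example, assuming the command names installed in the steps above; `missing_tools` is a hypothetical helper name and is not part of Ansible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every command name from the arguments
# that cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# Usage (prints nothing when everything is installed):
#   missing_tools ansible ansible-playbook ansible-galaxy ansible-lint
```

On macOS, `ansible-lint` would be reported as missing after the Homebrew step alone, because that step installs only `ansible`.
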
## Running Ansible Commands

All commands must be run from the `ansible/prod/` directory. The `ansible.cfg` file there sets the inventory and `roles_path` automatically.

### 0. Install Required Collections Once During Initial Setup

```bash
ansible-galaxy collection install -r ../requirements.yml
```

### 1. Connection Test (Ping)

```bash
ansible all -m ping
```

### 2. Run the Bootstrap Playbook

```bash
ansible-playbook prod-bootstrap.yml --ask-vault-pass
```

*Note: `--ask-vault-pass` prompts for the Ansible Vault password, which is used to decrypt the StorageBox password.*

### 3. Run Only a Specific Role (Tags)

```bash
ansible-playbook prod-bootstrap.yml --tags "hardening" --ask-vault-pass
```

## Target Machines

| Host | Role |
| --- | --- |
| `iklim-app-01` | Swarm manager + app worker |
| `iklim-app-02` | Swarm manager + app worker |
| `iklim-app-03` | Swarm manager + app worker |
| `iklim-db-01` | Swarm worker + DB cluster node (set up manually) |
| `iklim-db-02` | Swarm worker + DB cluster node (set up manually) |
| `iklim-db-03` | Swarm worker + DB cluster node (set up manually) |

## Recommended File Structure

```text
ansible/
  prod/
    ansible.cfg
    inventory/
      generated/
        prod.yml
    group_vars/
      all/
        vars.yml
        vault.yml
    prod-bootstrap.yml
  roles/
    base/
    hardening/
    docker/
    swarm/
    node_dirs/
    storagebox/
    storagebox_ssh_key/
    act_runner/
    db_stack/
```

## Base Role

Applied to all prod nodes:

- Package cache update
- `epel-release`: installed first as a separate task; `fail2ban`, `davfs2`, `htop`, and `btop` depend on this repo
- Base packages, installed after `epel-release` is active:
  - `curl`
  - `wget`
  - `git`
  - `jq`
  - `tar`
  - `unzip`
  - `bash-completion`
  - `gettext`: required for `envsubst` in CI/CD deploy pipelines
  - `tree`
  - `ca-certificates`
  - `fail2ban`
  - `chrony`
  - `python3`
  - `python3-pip`
  - `python3-passlib`: for the `password_hash` filter (EPEL)
  - `htop`: interactive process monitoring (EPEL)
  - `btop`: resource monitor with graphical interface (EPEL)
- Timezone: `Europe/Istanbul`
- Hostname setup
- Keyboard layout: `trq` (Turkish Q)
- chrony/NTP active

## Security Hardening Role

Applied to all prod nodes:

- SSH password auth is disabled.
- Root SSH login is disabled.
- Only SSH key auth remains.
- `PermitEmptyPasswords no`
- `MaxAuthTries 3`
- `fail2ban` is enabled.
- Automatic security updates are enabled with `dnf-automatic`.
- The `iklim` system user is created and added to the `wheel` group; the password is read from vault.
- `firewalld` default: incoming deny (`drop` zone), outgoing allow.
- The SSH rule is first written as a rich rule to the `drop` zone, and only then is the default zone set to `drop`, so that SSH access is not cut off when the default zone changes.
- SSH is opened only from the admin CIDR.
- DB ports are not opened publicly.

The Hetzner Cloud Firewall is considered the actual perimeter. `firewalld` is the second layer of defense on the host.

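The ordering of the firewalld steps matters, so it may help to see it spelled out. The sketch below only prints `firewall-cmd` calls in that order and runs nothing; the role itself may well use Ansible modules instead, since its implementation is not shown here. `firewalld_ssh_plan` is a hypothetical helper name, and the CIDR in the usage line is a documentation-range placeholder rather than the real admin network.

```bash
#!/usr/bin/env bash
# Sketch only: print the firewall-cmd calls in the order described above.
# The admin CIDR is supplied by the caller.
firewalld_ssh_plan() {
  local admin_cidr="$1"
  # 1. Allow SSH from the admin CIDR in the drop zone first.
  printf '%s\n' "firewall-cmd --permanent --zone=drop --add-rich-rule='rule family=\"ipv4\" source address=\"${admin_cidr}\" service name=\"ssh\" accept'"
  # 2. Load the permanent rule into the running configuration.
  printf '%s\n' "firewall-cmd --reload"
  # 3. Only then make drop the default zone.
  printf '%s\n' "firewall-cmd --set-default-zone=drop"
}

# Usage (placeholder CIDR):
#   firewalld_ssh_plan 203.0.113.0/24
```

Printing the plan instead of executing it keeps the example safe to try on any machine, including one without `firewalld`.
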
## Docker Role

Required on all prod nodes, both app and db. Because the DB nodes join the Swarm as workers, Docker Engine must be installed on every machine.

Packages to install:

- `docker-ce`
- `docker-ce-cli`
- `containerd.io`
- `docker-buildx-plugin`
- `docker-compose-plugin`

The packages are installed from the official Docker dnf repository (`https://download.docker.com/linux/rhel/docker-ce.repo`).

## Swarm Role

Prod Swarm is set up with 3 managers:

1. `docker swarm init` on `iklim-app-01` (advertise/data path addr: `10.20.10.11`)
2. `iklim-app-02` and `iklim-app-03` join as managers.
3. `iklim-db-01/02/03` join as workers.
4. The overlay network is created: `iklimco-net`
5. Node labels:
   - `iklim-app-*` -> `type=service`
   - `iklim-db-*` -> `role=db`, `db-index=01/02/03`, for Patroni node coordination
6. All nodes remain `AVAILABILITY=Active`.

The `db-index` labels are added through `iklim-app-01` in a separate play inside `prod-bootstrap.yml`, not by the swarm role.

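The label scheme in step 5 can be derived from the hostname alone. The sketch below is illustrative and assumes the hostnames follow the `iklim-app-NN` / `iklim-db-NN` pattern used in this guide; `node_labels` and `label_command` are hypothetical helpers, and the playbook's real tasks are not shown here. The second function prints the `docker node update` call instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the Swarm labels for a node from its hostname,
# following the naming scheme listed above.
node_labels() {
  local host="$1"
  case "$host" in
    iklim-app-*) printf '%s\n' "type=service" ;;
    # ${host##*-} strips everything up to the last dash: iklim-db-02 -> 02
    iklim-db-*)  printf '%s\n' "role=db db-index=${host##*-}" ;;
    *)           return 1 ;;
  esac
}

# Print (not run) the docker command that would apply those labels.
label_command() {
  local host="$1" labels label args=""
  labels="$(node_labels "$host")" || return 1
  for label in $labels; do
    args="$args --label-add $label"
  done
  printf 'docker node update%s %s\n' "$args" "$host"
}

# Usage:
#   label_command iklim-db-03
```

The printed command would have to be run on a manager node, which matches the note that the labels are added through `iklim-app-01`.
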
## Node Directory Role

On all `iklim-app-*` nodes:
```text
/opt/iklimco/ssl
/opt/iklimco/init
/opt/iklimco/stacks
/opt/iklimco/vault/data
```

`/opt/iklimco/vault/data` is the host path volume of the Vault Raft node, so it must be created separately on every app node. Swarm does not manage this directory as an overlay volume; if it is missing, the Vault container will not start.

On DB nodes:
```text
/opt/iklimco/db
/opt/iklimco/backup
```

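As a rough illustration of what this role has to achieve, the sketch below creates the listed directories for one node based on its hostname. `create_node_dirs` is a hypothetical helper, not the role's actual task list, and the optional second argument (a root prefix) exists only so the function can be tried in a scratch directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the directories listed above for one node.
# ROOT defaults to "/" on a real host; pass a scratch directory to try it out.
create_node_dirs() {
  local host="$1" root="${2:-/}" dir dirs
  case "$host" in
    iklim-app-*) dirs="ssl init stacks vault/data" ;;
    iklim-db-*)  dirs="db backup" ;;
    *)           echo "unknown host: $host" >&2; return 1 ;;
  esac
  for dir in $dirs; do
    # ${root%/} drops a trailing slash so "/" and "/tmp/x" both work
    mkdir -p "${root%/}/opt/iklimco/$dir" || return 1
  done
}

# Usage in a scratch directory:
#   create_node_dirs iklim-app-01 "$(mktemp -d)"
```

Because `mkdir -p` succeeds when a directory already exists, calling the function repeatedly is harmless.
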
## StorageBox DAVFS Mount Role

Applied to every node, all `iklim-app-*` and `iklim-db-*`.

### Prod Sub-Account

| Parameter | Variable | Value |
| --- | --- | --- |
| Main account | `storagebox_account` | `u469968` |
| Sub-account | `storagebox_user` | `u469968-sub5` |
| WebDAV URL | `storagebox_url` | `https://u469968-sub5.your-storagebox.de/` |
| Mount point | `storagebox_mount_point` | `/mnt/storagebox` |

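To show how these values typically fit together with davfs2, the sketch below renders an `/etc/fstab` line and an `/etc/davfs2/secrets` line from them. The mount options (`rw,_netdev`) are an assumption, since the role's real options are not shown in this guide, and both function names are hypothetical. Nothing is written to disk; the lines are only printed.

```bash
#!/usr/bin/env bash
# Values from the table above.
storagebox_url="https://u469968-sub5.your-storagebox.de/"
storagebox_mount_point="/mnt/storagebox"

# Assumed mount options; the role's real options are not shown in this guide.
davfs_options="rw,_netdev"

# Print an /etc/fstab line for the davfs mount.
davfs_fstab_line() {
  printf '%s %s davfs %s 0 0\n' \
    "$storagebox_url" "$storagebox_mount_point" "$davfs_options"
}

# Print an /etc/davfs2/secrets line: "<url> <user> <password>".
davfs_secrets_line() {
  local user="$1" password="$2"
  printf '%s %s %s\n' "$storagebox_url" "$user" "$password"
}

# Usage (password comes from Ansible Vault in the real setup):
#   davfs_fstab_line
#   davfs_secrets_line u469968-sub5 "$STORAGEBOX_PASSWORD"
```

A password containing spaces or quotes would need additional quoting in the secrets file, which this sketch does not handle.
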
## StorageBox SSH Key Role

Applied to every node. The role generates the ed25519 key pair `/root/.ssh/id_ed25519_storagebox` on the server. Uploading the generated public key to the StorageBox main account (SSH `authorized_keys`) is a separate manual step:

```bash
# For each node:
cat /root/.ssh/id_ed25519_storagebox.pub | \
ssh -p 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de \
"cat >> .ssh/authorized_keys"
```

## Act Runner Role

Applied to `iklim-app-*` nodes. Gitea Act Runner is installed on each app node and started as a systemd service. In prod, the runner runs on all 3 app nodes; the deploy pipeline can be triggered on any of these runners.

## DB Stack Role

Applied to `iklim-db-*` nodes. On each DB node, the role creates the `/opt/iklimco/db` and `/opt/iklimco/backup` directories, as well as a local reference directory for MongoDB.

The actual production configuration (node-specific `mongod.conf`, replica set auth key, Patroni, and etcd configurations) is not created by this role. It is set up on StorageBox in the `08-prod-db-cluster-kurulum.md` step, under:

- `/mnt/storagebox/prod/db/mongodb-0X/config/`
- `/mnt/storagebox/prod/db/postgresql-0X/config/`
- `/mnt/storagebox/prod/db/etcd-0X/data/`

## /opt/iklimco/stacks/.env

The password variables required by the DB cluster stacks live in the `/opt/iklimco/stacks/.env` file. The source copy is kept on StorageBox as `prod/secrets/iklim.co/.env.stacks`. Before the first deploy, fetch it on `iklim-app-01` with the following commands:

```bash
scp -P 23 STORAGEBOX_USER@STORAGEBOX_USER.your-storagebox.de:prod/secrets/iklim.co/.env.stacks \
/opt/iklimco/stacks/.env
# Restrict the secrets file to its owner
chmod 600 /opt/iklimco/stacks/.env
```

## StorageBox Directory Structure

After the Ansible bootstrap is completed and before the infra stack is deployed, create the following directories on `iklim-app-01`. StorageBox must already be mounted:

```bash
# SWAG certificate and configuration directories
mkdir -p /mnt/storagebox/ssl
mkdir -p /mnt/storagebox/swag/config
mkdir -p /mnt/storagebox/swag/site-confs

# Monitoring data directories; Grafana on StorageBox, Prometheus on local volume
mkdir -p /mnt/storagebox/grafana/data
mkdir -p /mnt/storagebox/prometheus/data

# Image directory for the precipitation service
mkdir -p /mnt/storagebox/precipitation/images
```

These directories match the `SWAG_CERT_DIR`, `SWAG_CONFIG_DIR`, `SWAG_SITE_CONFS_DIR`, `GRAFANA_DATA_DIR`, and `PROMETHEUS_DATA_DIR` variables in `env-prod/.env`. Because StorageBox is mounted at the same `/mnt/storagebox` path on all app nodes, the directories need to be created only once and are then shared by all nodes.

## Swarm Setup Verification

After bootstrap, check the Swarm status with the following commands:

```bash
# 6 nodes: 3 managers (Leader/Reachable), 3 workers (Ready)
docker node ls

# App node label
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
# Expected: map[type:service]

# DB node label
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
# Expected: map[db-index:01 role:db]

# swarm-init.sh idempotency: init must not be attempted again in an already active Swarm
grep -n "swarm init\|swarm join" init/swarm-init.sh
```

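The node count check can also be automated rather than read by eye. The sketch below is one possible approach: it counts managers and workers from the output of `docker node ls --format`, relying on the fact that workers have an empty manager status. `count_swarm_roles` is a hypothetical helper name, and the `;` separator is simply a choice made for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "hostname;manager-status" lines on stdin and count
# managers and workers. Workers have an empty manager status.
count_swarm_roles() {
  awk -F';' '
    NF == 0  { next }               # skip blank lines
    $2 != "" { managers++; next }   # Leader, Reachable, or Unreachable
             { workers++ }
    END      { printf "managers=%d workers=%d\n", managers, workers }
  '
}

# Usage on a manager node:
#   docker node ls --format '{{.Hostname}};{{.ManagerStatus}}' | count_swarm_roles
```

For the cluster described in this guide, the output should read `managers=3 workers=3`. The function counts an unreachable manager as a manager, so `docker node ls` itself is still needed to confirm the Leader/Reachable states.
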
## Acceptance Criteria

- `ansible all -m ping` succeeds.
- 3 Swarm manager nodes appear as Leader/Reachable in `docker node ls`.
- 3 DB nodes appear as workers in `docker node ls`.
- Manager quorum holds: with 3 managers, the loss of 1 manager is tolerated.
- The `iklimco-net` overlay network exists.
- Node labels (`type=service`, `role=db`, `db-index=01/02/03`) are verified with inspect.
- `swarm-init.sh` does not attempt init again in an active Swarm; it is idempotent.
- `/mnt/storagebox` is mounted on every node.
- The `/opt/iklimco/vault/data` directory exists on every app node.
- The `ssl`, `swag/config`, `swag/site-confs`, `grafana/data`, `prometheus/data`, and `precipitation/images` directories exist on StorageBox.
- The Gitea Act Runner service is running on every app node.
- The `/opt/iklimco/db` and `/opt/iklimco/backup` directories exist on DB nodes. Node-specific `mongod.conf` and other DB configurations are created on StorageBox (`/mnt/storagebox/prod/db/...`) in the `08-prod-db-cluster-kurulum.md` step.
- The public firewall allows ingress only on ports `22`, `80`, and `443`.

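The quorum criterion follows from majority arithmetic: a Swarm with N managers needs more than half of them available, so it tolerates the loss of (N - 1) / 2 managers, rounded down. A small sketch of that calculation, with hypothetical function names:

```bash
#!/usr/bin/env bash
# Raft quorum arithmetic for N Swarm managers (shell integer division).
quorum_size()        { echo $(( $1 / 2 + 1 )); }
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

# Usage:
#   quorum_size 3          # majority needed among 3 managers
#   tolerated_failures 3   # managers that may be lost
```

With 3 managers the quorum is 2 and 1 loss is tolerated. Adding a fourth manager would raise the quorum to 3 without increasing the tolerated losses, which is why odd manager counts are the usual choice.
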