Integrate DB nodes into Swarm and refine prod service deployment

- Database nodes now join the Docker Swarm as workers with `role=db` labels, allowing Swarm to manage their dedicated services.
- The `docker-stack-infra.yml` has been updated for production to focus solely on application-level infrastructure components.
- Dedicated database services (PostgreSQL, MongoDB, Patroni-etcd) are now explicitly deployed in separate Swarm stacks on `iklim-db-XX` nodes.
- Standardizes node naming conventions (`iklim-app-XX`, `iklim-db-XX`) across the production roadmap documentation.
- Clarifies that the `etcd` service within `docker-stack-infra.yml` is exclusively for APISIX configuration, distinct from Patroni's etcd cluster.

parent 720c79d460 · commit 76f87aa2f9

@@ -4,79 +4,103 @@

- **Repo:** `iklim.co` root
- **Environment:** prod
- **Topology:**
  - 3 × app nodes (`iklim-app-01/02/03`) — all act as **Swarm managers AND app workers** (Raft quorum: 1 can fail)
  - 3 × DB nodes (`iklim-db-01/02/03`) — join Swarm as **workers** with the `role=db` label; DB services are placed exclusively on them
- **Sizing:** app nodes are `cpx42`, DB nodes are `cpx32`; see `../../hetzner-sizing-report.md`
- All 6 nodes are in the same private network.
- Pipeline trigger: push to `prod-env` branch → Gitea runner on `prod-runner` (first app node).
- App Swarm managers: all 3 app nodes are manager-eligible and carry app workloads (no dedicated worker-only app nodes).

## Node labeling plan

| Node | Role | Swarm role | Labels |
|------|------|------------|--------|
| `iklim-app-01` | API services, SWAG, Vault | Manager + Worker | `type=service` |
| `iklim-app-02` | API services replicas | Manager + Worker | `type=service` |
| `iklim-app-03` | API services replicas | Manager + Worker | `type=service` |
| `iklim-db-01` | PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-02` | PostgreSQL (Patroni), etcd | Worker | `role=db` |
| `iklim-db-03` | MongoDB replica + PostgreSQL (Patroni), etcd | Worker | `role=db` |

## Step 1 — Init Swarm on iklim-app-01 (the prod-runner node)

```bash
MANAGER_IP=$(hostname -I | awk '{print $1}')
if ! docker info --format '{{.Swarm.LocalNodeState}}' | grep -q "active"; then
  docker swarm init --advertise-addr "$MANAGER_IP"
  echo "Swarm initialized on $MANAGER_IP"
else
  echo "Swarm already active"
fi
```

## Step 2 — Get manager join token

```bash
docker swarm join-token manager   # for iklim-app-02, iklim-app-03
```

Save this token — needed on iklim-app-02 and iklim-app-03.

## Step 3 — Join iklim-app-02 and iklim-app-03 as managers

SSH into iklim-app-02 and iklim-app-03, run:

```bash
docker swarm join --token <MANAGER_TOKEN> 10.10.10.11:2377
```

## Step 4 — Label app nodes

On iklim-app-01, after iklim-app-02 and iklim-app-03 have joined:

```bash
for node in iklim-app-01 iklim-app-02 iklim-app-03; do
  docker node update --label-add type=service "$node"
done
```

## Step 5 — Join DB nodes as Swarm workers

Get the worker join token on iklim-app-01:

```bash
docker swarm join-token worker
```

SSH into each DB node and join:

```bash
docker swarm join --token <WORKER_TOKEN> 10.10.10.11:2377
```

Then label them on iklim-app-01:

```bash
for node in iklim-db-01 iklim-db-02 iklim-db-03; do
  docker node update --label-add role=db "$node"
done
```

> DB nodes are Swarm **workers** only — they never become managers.
> DB services are pinned to them via the `node.labels.role == db` placement constraint (see the sketch below).
> See `08-prod-db-cluster-kurulum.md` for DB stack deployment.
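
As a concrete illustration, a DB stack service pinned this way could look like the following minimal sketch; the service, image, and volume names here are illustrative, not the actual stack definitions:

```yaml
# hypothetical excerpt from a separate DB stack file
version: "3.8"
services:
  mongodb:
    image: mongo:7
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.role == db   # only iklim-db-01/02/03 qualify
    volumes:
      - mongo-data:/data/db

volumes:
  mongo-data:
```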

## Step 6 — Verify

```bash
docker node ls
```

Expected: 6 nodes — 3 managers with `MANAGER STATUS` = `Leader` or `Reachable`, and 3 workers showing `Ready` under `STATUS`.
All 6 nodes remain in `AVAILABILITY=Active` (not drained), so the manager nodes also carry workloads.

```bash
docker node inspect iklim-app-01 --format '{{.Spec.Labels}}'
docker node inspect iklim-db-01 --format '{{.Spec.Labels}}'
```

Expected: `map[type:service]` for app nodes (the label added in Step 4 is `type=service`), `map[role:db]` for DB nodes.
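
To review every node's labels in one pass, a plain loop over the standard Swarm commands works (no assumptions beyond the stock docker CLI):

```bash
for node in $(docker node ls -q); do
  docker node inspect "$node" --format '{{.Description.Hostname}}: {{.Spec.Labels}}'
done
```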

## Step 7 — Confirm `init/swarm-init.sh` multi-node awareness

The script is idempotent (skips init if already active). Verify:

```bash
grep -n "swarm init\|swarm join" init/swarm-init.sh
```

The prod pipeline runs on iklim-app-01 only. iklim-app-02/03 are joined via Ansible (`swarm` role),
not via the Gitea pipeline.
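
That Ansible join might look roughly like the sketch below, using the `community.docker.docker_swarm` module; the task name and token variable are assumptions, not the repo's actual `swarm` role:

```yaml
# hypothetical task applied to iklim-app-02 and iklim-app-03
- name: Join node to the prod Swarm as a manager
  community.docker.docker_swarm:
    state: join
    join_token: "{{ swarm_manager_token }}"   # assumed fetched from iklim-app-01 earlier in the play
    remote_addrs:
      - "10.10.10.11:2377"
```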

## Placement constraints used in `docker-stack-infra.yml`

| Constraint | Resolves to |
|------------|-------------|
| `node.role == manager` | iklim-app-01, iklim-app-02, iklim-app-03 |
| `node.labels.type == service` | iklim-app-01, iklim-app-02, iklim-app-03 |
| `node.labels.role == db` | iklim-db-01, iklim-db-02, iklim-db-03 |

SWAG, Vault, cert-reloader: pinned to `node.role == manager`.
Microservices: no constraint today. Note that with the DB workers now in the Swarm, an unconstrained service can also land on them; add `node.labels.type == service` if microservices must stay on the app nodes (see the sketch below).
DB services (Patroni, etcd, MongoDB): pinned to `node.labels.role == db` in separate DB stacks.
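
Keeping a microservice on the app nodes is one stanza per service, along the lines of this sketch (the service and image names are hypothetical):

```yaml
# hypothetical microservice entry in docker-stack-infra.yml
services:
  orders-api:
    image: registry.iklim.co/orders-api:latest   # illustrative image reference
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.labels.type == service   # keeps the service off the role=db workers
```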

@@ -32,7 +32,7 @@ No additional action needed in the repo.

## Step 3 — (Handled by pipeline) Write credentials file on prod host

The deploy pipeline (see `08-deploy-pipeline-update.md`) runs on iklim-app-01:

```bash
mkdir -p /opt/iklimco/swag/dns-conf
```

@@ -42,16 +42,16 @@ chmod 600 /opt/iklimco/swag/dns-conf/godaddy.ini

## Step 4 — GoDaddy A records for prod subdomains

In the GoDaddy DNS panel for `iklim.co`, add/update A records pointing to iklim-app-01's public IP:

| Record | Value |
|--------|-------|
| `api` | `<iklim-app-01-public-ip>` |
| `apigw` | `<iklim-app-01-public-ip>` |
| `rabbitmq` | `<iklim-app-01-public-ip>` |
| `grafana` | `<iklim-app-01-public-ip>` |

> Swarm's routing mesh means any node IP would work, but iklim-app-01 is the designated
> entry point (runs SWAG). Using a single IP keeps DNS simple.
>
> For HA: add a load balancer or use Hetzner's floating IP in front of the 3 app nodes.
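
Once the records propagate, a quick check with `dig` (standard tooling, nothing repo-specific):

```bash
for sub in api apigw rabbitmq grafana; do
  dig +short "$sub.iklim.co"   # each should print iklim-app-01's public IP
done
```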

@@ -2,41 +2,22 @@

## Context
- **File:** `docker-stack-infra.yml` (repo root — shared between test and prod)
- All changes from `test-env/03-infra-stack-changes.md` apply here identically.
- **Additional prod-specific changes:**
  - Microservices have no constraint (distributed across app nodes by Swarm).
  - Replica counts for stateless services are increased.
- **Note:** PostgreSQL and MongoDB are **not** in `docker-stack-infra.yml` for prod. They run on
  dedicated DB nodes in separate stacks (`iklim-db` and `iklim-patroni`). See `08-prod-db-cluster-kurulum.md`.

## Step 1 — Apply all test-env changes first

Follow every step in `test-env/03-infra-stack-changes.md`:
- Add `swag` service
- Add `cert-reloader` service
- Remove published ports for vault, apisix, rabbitmq, prometheus, grafana, apisix-dashboard
- Add `swag-vl` volume

## Step 2 — Pin Vault to manager node (initial prod — single instance)

Vault starts as a single instance pinned to the manager node.
Raft cluster migration is handled separately in `07-vault-raft-plan.md`.

@@ -48,7 +29,7 @@ Raft cluster migration is handled separately in `07-vault-raft-plan.md`.

```yaml
      - node.role == manager
```

## Step 3 — Increase APISIX replicas for prod

```yaml
# CHANGE in apisix service deploy block:
replicas: 2
```

@@ -59,40 +40,46 @@

APISIX is stateless (config in etcd) — multiple replicas are safe.
Swarm load-balances SWAG's requests across APISIX replicas via VIP.

## Step 4 — etcd: single instance in docker-stack-infra.yml (APISIX config store only)

The `etcd` service in `docker-stack-infra.yml` is used exclusively by APISIX as its configuration
store. It runs as a single instance on a manager node and is separate from the etcd cluster used by
Patroni for PostgreSQL HA.

```yaml
# etcd placement stays as:
placement:
  constraints:
    - node.role == manager
```

> The 3-node etcd cluster for Patroni/PostgreSQL HA is deployed separately via `08-prod-db-cluster-kurulum.md`
> on the dedicated DB nodes. These are two independent etcd deployments with different purposes.
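
A quick health check for this etcd instance, assuming the stack names the service `iklimco_etcd` (adjust the filter to whatever `docker ps` actually shows):

```bash
ETCD_CTR=$(docker ps -q -f name=iklimco_etcd)   # service name is an assumption
docker exec "$ETCD_CTR" etcdctl endpoint health
# APISIX stores its config under the /apisix key prefix by default
docker exec "$ETCD_CTR" etcdctl get /apisix --prefix --keys-only | head
```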

## Step 5 — Verify the complete file

After all edits, validate the YAML:

```bash
docker stack config -c docker-stack-infra.yml > /dev/null && echo "YAML valid"
```

If the command prints no errors, the file is valid.

## Placement summary for prod (docker-stack-infra.yml only)

| Service | Placement |
|---------|-----------|
| swag | `node.role == manager` |
| cert-reloader | `node.role == manager` |
| vault | `node.role == manager` |
| apisix (2 replicas) | no constraint (any node, including DB workers, unless a `type=service` constraint is added) |
| apisix-dashboard | no constraint |
| redis | `node.role == manager` |
| rabbitmq | `node.role == manager` |
| etcd (APISIX store) | `node.role == manager` |
| prometheus | `node.role == manager` |
| grafana | `node.role == manager` |

> PostgreSQL and MongoDB are deployed in separate DB stacks on `iklim-db-*` nodes.
> See `08-prod-db-cluster-kurulum.md` for those stacks.
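
Deploying those separate stacks then looks something like the sketch below; the stack names match the verification doc, but the compose file names are assumptions:

```bash
# run on a manager node (iklim-app-01); file names are illustrative
docker stack deploy -c docker-stack-db-etcd.yml iklim-db-etcd
docker stack deploy -c docker-stack-patroni.yml iklim-patroni
docker stack deploy -c docker-stack-db.yml iklim-db
```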

@@ -48,7 +48,7 @@ will contain `server_name api.iklim.co;` — correct for prod.

## Verification

After deploy, on iklim-app-01:
```bash
grep server_name /opt/iklimco/swag/proxy-confs/api.conf
```

@@ -20,20 +20,20 @@ No cross-node distribution needed.

## Future behavior (3-node Vault Raft — see step 07)

When Vault runs on iklim-app-01, iklim-app-02, iklim-app-03:

```
cert-reloader detects cert change
  → copies cert to /opt/iklimco/ssl/ on iklim-app-01 (local)
  → SSH copy to iklim-app-02:/opt/iklimco/ssl/
  → SSH copy to iklim-app-03:/opt/iklimco/ssl/
  → docker service update --force iklimco_vault (restarts all 3 replicas)
```

This requires (see the setup sketch below):
- An SSH key that cert-reloader can use to reach iklim-app-02 and iklim-app-03
- That key mounted as a Docker secret into cert-reloader
- `known_hosts` entries for iklim-app-02 and iklim-app-03 pre-configured
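
A sketch of that key setup, assuming root SSH between the app nodes and the secret name from the prerequisites in step 07; the known_hosts path is an assumption:

```bash
# on iklim-app-01: generate a dedicated key pair for cert-reloader
ssh-keygen -t ed25519 -N "" -f /tmp/cert_reloader_key

# authorize the public key on the other two app nodes
ssh-copy-id -i /tmp/cert_reloader_key.pub root@iklim-app-02
ssh-copy-id -i /tmp/cert_reloader_key.pub root@iklim-app-03

# store the private key as the Docker secret the stack expects
docker secret create cert_reloader_ssh_key /tmp/cert_reloader_key

# pre-populate known_hosts for both targets (path is an assumption)
ssh-keyscan iklim-app-02 iklim-app-03 > /opt/iklimco/cert-reloader/known_hosts

rm /tmp/cert_reloader_key /tmp/cert_reloader_key.pub
```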

Script update for this phase is tracked in `07-vault-raft-plan.md`.

@@ -1,7 +1,7 @@

# 07 — Vault: Initial Single Instance + Raft Cluster Migration Plan (Prod)

## Context
Vault starts as a single instance on the manager node (iklim-app-01) for the initial prod launch.
This matches the current `docker-stack-infra.yml` configuration (file storage, single replica).

Raft HA cluster is planned for a later phase.

@@ -9,8 +9,8 @@ Raft HA cluster is planned for a later phase.

## Phase 1 — Initial prod launch (current)

- **Replicas:** 1
- **Storage:** file (`/vault/file`) on iklim-app-01
- **Placement:** `node.role == manager` (iklim-app-01)
- **Cert:** from `/opt/iklimco/ssl/` (populated by cert-reloader from SWAG volume)
- **TLS:** `VAULT_LOCAL_CONFIG` unchanged — `api_addr: https://vault.iklim.co:8200`

@@ -22,14 +22,14 @@ No changes to `docker-stack-infra.yml` vault service for Phase 1.

- **Replicas:** 3 (one per app node)
- **Storage:** Raft integrated (replaces file storage)
- **Placement:** `node.labels.type == service` (all 3 app nodes)
- **Cert distribution:** cert-reloader SSH-copies renewed cert to iklim-app-02, iklim-app-03

### Prerequisites before migration
- [ ] All 3 app nodes are running and labeled `type=service`
- [ ] Vault data backed up from Phase 1 (Phase 1 uses file storage, so copy the data directory; `vault operator raft snapshot save` applies only once Raft is live; see the backup sketch below)
- [ ] SSH key created for cert-reloader to reach iklim-app-02 and iklim-app-03
- [ ] SSH key stored as Docker secret `cert_reloader_ssh_key`
- [ ] `/opt/iklimco/ssl/` directory exists on iklim-app-02 and iklim-app-03
- [ ] Vault data directory `/opt/iklimco/vault/data/` exists on all 3 nodes (host path volumes)
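
A minimal backup sketch covering both phases: a plain archive of the file-storage directory now, and an online Raft snapshot once Phase 2 is live. The backup paths are assumptions, and the snapshot command needs a valid token inside the container:

```bash
# Phase 1 (file storage): archive the data directory on iklim-app-01
tar czf "/opt/iklimco/backups/vault-file-$(date +%F).tgz" /opt/iklimco/vault/data/

# Phase 2 (Raft storage): online snapshot from inside the container
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)
docker exec "$VAULT_CTR" vault operator raft snapshot save /tmp/vault.snap
docker cp "$VAULT_CTR":/tmp/vault.snap "/opt/iklimco/backups/vault-raft-$(date +%F).snap"
```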

### Vault service update for Raft

@@ -65,7 +65,7 @@ vault:

Only the leader needs to be bootstrapped; others join via `vault operator raft join`:

```bash
# On the primary Vault (iklim-app-01 container):
VAULT_CTR=$(docker ps -q -f name=iklimco_vault)

# Unseal if needed
```

@@ -75,22 +75,22 @@ docker exec -it "$VAULT_CTR" vault operator unseal

```bash
docker exec "$VAULT_CTR" vault operator raft list-peers
```

On iklim-app-02 and iklim-app-03 containers:

```bash
docker exec -it <vault-on-iklim-app-02> vault operator raft join \
  https://vault.iklim.co:8200
```

### cert-reloader update for Raft

Update the cert-reloader command in `docker-stack-infra.yml` to SSH-copy the cert
to iklim-app-02 and iklim-app-03 after renewal:

```bash
# After copying to local /opt/iklimco/ssl/:
ssh -i /run/secrets/cert_reloader_ssh_key iklim-app-02 \
  "cp /dev/stdin /opt/iklimco/ssl/STAR.iklim.co.full.crt" < /opt/iklimco/ssl/STAR.iklim.co.full.crt
# (repeat for iklim-app-03 and privkey)
docker service update --force iklimco_vault
```

@@ -8,7 +8,7 @@ Run after a successful prod pipeline deployment.

```bash
docker node ls
```

Expected: 3 managers (`Leader` + 2 `Reachable`) for `iklim-app-01/02/03`, 3 workers (`Ready`) for `iklim-db-01/02/03`.

```bash
docker service ls --filter label=project=co.iklim
```

@@ -57,7 +57,7 @@ curl -si https://rabbitmq.iklim.co   # HTTP 200 RabbitMQ Management

```bash
# From outside — must fail
curl -sk --connect-timeout 5 https://<iklim-app-01-public-ip>:8200/v1/sys/health
# Expected: connection refused or timeout
```

@@ -86,10 +86,21 @@ Only `iklimco_swag` should show `*:80->80/tcp, *:443->443/tcp`.

## 8 — DB nodes running correct services

```bash
# Patroni (PostgreSQL HA) stack
docker stack services iklim-patroni
docker service ps iklim-patroni_patroni-01
docker service ps iklim-patroni_patroni-02
docker service ps iklim-patroni_patroni-03

# etcd cluster (for Patroni)
docker stack services iklim-db-etcd

# MongoDB replica set
docker stack services iklim-db
docker service ps iklim-db_mongodb
```

All tasks should show node names matching `iklim-db-01`, `iklim-db-02`, or `iklim-db-03`, as enforced by the `role=db` placement constraint.
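
A complementary check: confirm no app-stack task drifted onto the DB workers. The loop below lists running tasks with the node each landed on and flags anything on an `iklim-db-*` node that does not belong to the DB stacks; any output means a service is missing a placement constraint:

```bash
for svc in $(docker service ls -q); do
  docker service ps "$svc" --filter desired-state=running --format '{{.Name}} {{.Node}}'
done | grep 'iklim-db-0' | grep -vE '^(iklim-patroni_|iklim-db)'
```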

## 9 — APISIX replicas

@@ -4,6 +4,7 @@

- **Repo:** `iklim.co` root
- **Environment:** test
- **Server:** single node — same machine is both Swarm manager and worker
- **Sizing:** Terraform test app node is `cpx42`; see `../../hetzner-sizing-report.md`
- Pipeline trigger: push to `test-env` branch → Gitea runner executes directly on the test server
- `init/swarm-init.sh` already exists in the repo and is called by the pipeline