cluster: enforce master/worker topology — remove duplicate primary stack from zampp2 #12

Closed

opened 2026-05-21 21:45:47 -04:00 by zgaetano · 0 comments

## Summary

zampp2 was running a full duplicate primary stack (mam-api, web-ui, editor, db, redis, worker) alongside node-agent. This violated the intended topology where only zampp1 is the master orchestrator. Fixed in this session.

## Work completed (2026-05-22)

### Cluster architecture cleanup

**Problem:** zampp2 had both `docker-compose.yml` (full primary stack) and `docker-compose.worker.yml` (node-agent) running simultaneously. This gave it a duplicate MAM web UI, its own isolated postgres/redis, and a second mam-api on port 47432.

**Fix:**

- Stopped and removed the primary stack on zampp2 (`docker compose -f docker-compose.yml down`)
- zampp2 now runs **node-agent only** (`restart: unless-stopped`, so it survives reboots via Docker's restart policy)
- Exposed postgres (`:5432`) and redis (`:6379`) on zampp1's host so worker nodes can connect when running `--profile worker` transcoding jobs (commit `00f3f29`)
- Updated zampp2's `.env` (untracked because it contains credentials) to point `DATABASE_URL` and `REDIS_URL` at `172.18.91.216` (zampp1) instead of local container names
- Deleted the stale `docker-compose.cluster.yml` workaround file that had been sitting untracked on zampp1 (its function now lives directly in `docker-compose.yml`)
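
The `.env` change above is easy to get wrong on a new node. As a minimal sketch (not part of the repository), a pre-flight check could parse `DATABASE_URL` and `REDIS_URL` and confirm they point at the primary's LAN address rather than at a local container name. The helper name `checkWorkerEnv`, the `db`/`redis` container hostnames, and the example credentials and database name are all assumptions for illustration.

```javascript
// Hypothetical pre-flight check for a worker node's .env (not in the repo).
// Assumes DATABASE_URL and REDIS_URL are URLs such as postgres://... and
// redis://..., and that the primary stack's container hostnames are "db"
// and "redis" (assumed from the service names in this issue).
const LOCAL_HOSTS = new Set(["db", "redis", "localhost", "127.0.0.1"]);

function checkWorkerEnv(env, primaryHost) {
  const problems = [];
  for (const key of ["DATABASE_URL", "REDIS_URL"]) {
    const raw = env[key];
    if (!raw) {
      problems.push(`${key} is not set`);
      continue;
    }
    let host;
    try {
      host = new URL(raw).hostname; // WHATWG URL parser, global in Node.js
    } catch {
      problems.push(`${key} is not a valid URL`);
      continue;
    }
    if (LOCAL_HOSTS.has(host)) {
      // The zampp2 mistake: a node talking to its own local db/redis.
      problems.push(`${key} points at local host "${host}", expected ${primaryHost}`);
    } else if (host !== primaryHost) {
      problems.push(`${key} points at ${host}, expected ${primaryHost}`);
    }
  }
  return problems;
}

// Example: the database URL is correct, the Redis URL still uses a container name.
const problems = checkWorkerEnv(
  {
    DATABASE_URL: "postgres://mam:example@172.18.91.216:5432/mam", // hypothetical creds/db name
    REDIS_URL: "redis://redis:6379",
  },
  "172.18.91.216",
);
console.log(problems); // one entry, for REDIS_URL
```

Returning a list of problems instead of throwing on the first one lets an onboarding script report every misconfigured variable in a single pass.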

### `pickIp()` LAN false-positive fix

**Problem:** `pickIp()` in `routes/cluster.js` used the regex `/^172\.(1[6-9]|2\d|3[01])\./` to detect Docker bridge IPs. That pattern flags the entire RFC 1918 block `172.16.0.0/12` (172.16.x.x through 172.31.x.x). The real LAN (`172.18.91.x`) falls inside this range, so any worker node that joined without an explicit `NODE_IP` would have its LAN address treated as a Docker bridge address and mangled.
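
To see the false positive concretely, this snippet runs the original pattern, as quoted in this issue, against a few addresses that include both cluster nodes' LAN IPs. The addresses other than zampp1's and zampp2's are illustrative.

```javascript
// The original bridge-detection pattern, as quoted in this issue. It matches
// every address in 172.16.0.0/12 (172.16.x.x through 172.31.x.x).
const OLD_BRIDGE_RE = /^172\.(1[6-9]|2\d|3[01])\./;

const samples = [
  ["172.17.0.2", "default docker0 range"],
  ["172.18.91.216", "zampp1 LAN address"],
  ["172.18.91.217", "zampp2 LAN address"],
  ["172.32.0.1", "just outside the /12 block"],
  ["192.168.1.10", "a different private range"],
];

for (const [ip, note] of samples) {
  const verdict = OLD_BRIDGE_RE.test(ip) ? "treated as Docker bridge" : "kept";
  console.log(`${ip.padEnd(15)} ${verdict.padEnd(24)} (${note})`);
}
// Both LAN addresses come out as "treated as Docker bridge".
```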

**Fix:** Narrowed the pattern to `/^172\.17\./`, which matches only the default `docker0` bridge range (commit `37767f9`). In practice the bug was masked by `NODE_IP=172.18.91.217` being set in zampp2's `.env`, but a new node onboarded without that override would have hit it.
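
The selection logic after the fix can be sketched as follows. `pickIpSketch` is a hypothetical stand-in, because this issue does not show the actual `pickIp()` signature or where its candidates come from. The sketch assumes the candidates are plain IPv4 strings, such as a caller might collect from Node's `os.networkInterfaces()`, and that an explicit `NODE_IP` takes precedence.

```javascript
// Hypothetical sketch; the real pickIp() in routes/cluster.js may differ.
// Candidates are plain IPv4 strings (e.g. gathered from os.networkInterfaces()).
const DOCKER0_RE = /^172\.17\./; // narrowed pattern from commit 37767f9

function pickIpSketch(candidates, nodeIp) {
  // An explicit NODE_IP override always wins, as on zampp2.
  if (nodeIp) return nodeIp;
  for (const ip of candidates) {
    if (ip.startsWith("127.")) continue; // skip loopback
    if (DOCKER0_RE.test(ip)) continue;   // skip only the default docker0 bridge
    return ip;                           // first remaining address is used
  }
  return null; // no usable address found
}

console.log(pickIpSketch(["127.0.0.1", "172.17.0.1", "172.18.91.217"])); // 172.18.91.217
console.log(pickIpSketch(["172.17.0.1"], "172.18.91.217"));              // 172.18.91.217 (override)
```

With the old pattern, the same candidate list would have had `172.18.91.217` filtered out along with `172.17.0.1`. The sketch shares one trade-off with the narrower pattern: addresses on user-defined Docker networks, which Docker typically allocates from other 172.x ranges, are no longer filtered. When `NODE_IP` is unset, the result therefore depends on candidate order.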

## Commits

| SHA | Description |
|-----|-------------|
| `00f3f29` | feat(cluster): expose db/redis ports for worker-node connectivity |
| `37767f9` | fix(cluster): pickIp() only treats 172.17.x as docker bridge |

## Current state

```
zampp1  primary  online  172.18.91.216  full stack (mam-api, db, redis, worker, web-ui, editor, capture)
zampp2  worker   online  172.18.91.217  node-agent only + on-demand capture sidecars
```

Both machines' checkouts are at `main` HEAD (`37767f9`), with no local commits ahead of `main` on either machine.
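
To keep the master/worker split from drifting again, a node could compare its running services against its role. This is a minimal sketch that assumes the caller has already collected the node's Compose service names. The function name `topologyViolations` is hypothetical. Leaving `worker` and `capture` off the primary-only list is also an assumption, based on worker nodes running `--profile worker` jobs and on-demand capture sidecars.

```javascript
// Hypothetical topology guard (not in the repo). Service names follow this
// issue; "worker" and "capture" are left out of PRIMARY_ONLY because worker
// nodes may run transcoding jobs and capture sidecars on demand.
const PRIMARY_ONLY = ["mam-api", "web-ui", "editor", "db", "redis"];

function topologyViolations(role, runningServices) {
  const running = new Set(runningServices);
  if (role === "worker") {
    // A worker must not run any piece of the primary stack (the zampp2 bug).
    return PRIMARY_ONLY.filter((svc) => running.has(svc)).map(
      (svc) => `worker is running primary-only service ${svc}`,
    );
  }
  if (role === "primary") {
    // The primary must run the full orchestrator stack.
    return PRIMARY_ONLY.filter((svc) => !running.has(svc)).map(
      (svc) => `primary is missing ${svc}`,
    );
  }
  return [`unknown role ${role}`];
}

// zampp2 before the fix: node-agent plus a duplicate primary stack (5 violations).
console.log(topologyViolations("worker", ["node-agent", "mam-api", "web-ui", "editor", "db", "redis", "worker"]));
// zampp2 after the fix: no violations.
console.log(topologyViolations("worker", ["node-agent"]));
```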

## Related

- Closes in conjunction with #10 (DeckLink end-to-end, which required the remote-node routing that this topology depends on)
- `deploy/onboard-node.sh` is the canonical path for adding future worker nodes; it uses `--project-name wild-dragon-worker` and writes `.env.worker` (zampp2 was onboarded manually before that script existed)