diff --git a/docs/superpowers/plans/2026-05-21-cluster-codec-revamp.md b/docs/superpowers/plans/2026-05-21-cluster-codec-revamp.md
new file mode 100644
index 0000000..f03c667
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-21-cluster-codec-revamp.md
@@ -0,0 +1,57 @@
+# Cluster Hardening + Codec Settings Revamp
+
+> Status: **mostly shipped 2026-05-21**. One follow-up remains: the recorders.html UI rewrite. See "Pending" below.
+
+## Goal
+Four user-driven asks from 2026-05-20:
+1. Fix cluster page: workers were registering with docker bridge IPs, and three duplicate "zampp2" rows kept appearing.
+2. Expand recorder codec settings: per-recorder control over bitrate, framerate, audio channels, container format.
+3. Better DeckLink port picker: "BM1/BM2" dropdown was unusable -- diagram the card so operators pick a port visually.
+4. Validate the cluster end-to-end now that GPUs are in place.
+
+## What shipped (commit list)
+| Commit | Area | Summary |
+|---|---|---|
+| `a39c983` | mam-api | Migration 007 -- dedupe `cluster_nodes` rows + unique index on `hostname`. |
+| `049beb8` | mam-api | Migration 008 -- expanded codec columns on `recorders` (video/audio bitrate, framerate, audio channels, container, plus `node_id` / `device_index` pinning). |
+| `3b4af6e` | node-agent | Prefer `NODE_IP` env override; skip docker bridge / veth / cni interfaces when auto-detecting. Version bumped to 1.1.0. |
+| `0efef0d` | mam-api | `routes/cluster.js`: `pickIp()` fallback to request source IP. New `GET /api/v1/cluster/devices/blackmagic` flattens every node's DeckLink capabilities. |
+| `40a66ba` | compose | `docker-compose.worker.yml`: `network_mode: host` for node-agent so it inherits host hostname + LAN IP. |
+| `0ebb3cf` | deploy | `onboard-node.sh`: auto-detect host LAN IP and write `NODE_IP` + `BMD_MODEL` to `.env.worker`. |
+| `f4a83ee` | capture | `capture-manager.js`: dynamic ffmpeg args. Exports `VIDEO_CODECS`, `AUDIO_CODECS`, `CONTAINER_FMT`, `CONTAINER_EXT`. |
+| `485af25` | capture | `index.js` bootstrap forwards every codec env var to `captureManager.start()`. |
+| `4c65753` | mam-api | `routes/recorders.js`: full codec field whitelist; `/start` passes settings to the capture sidecar. |
+| `d39f86d` | web-ui | `services/web-ui/public/js/bmd-card.js` -- SVG renderer for DeckLink port selection. Models: Duo 2, Quad 2, Mini Recorder 4K, Mini Monitor 4K, UltraStudio 4K Mini. |
+| `8aa3783` | deploy | `deploy/test-cluster.sh` cluster smoke test. |
+| `4a3a672` | cluster | `mam-api` self-heartbeat reads `NODE_HOSTNAME` (otherwise every restart spawns a new primary row). Smoke test rewritten with `jq` after Python f-strings were found to silently false-pass the docker-bridge check. Bridge alarm narrowed to 172.17.x since this LAN occupies 172.18.0.0/16. |
+
+## Verified cluster state (post-deploy, 2026-05-21)
+```
+$ MAM_API_URL=http://localhost:47432 bash deploy/test-cluster.sh
+6 pass 0 fail
+```
+Two nodes registered, no duplicate hostnames, real LAN IPs (zampp1=172.18.91.216 primary, zampp2=172.18.91.217 worker), fresh heartbeats, 3 NVIDIA GPUs visible on zampp1, DeckLink Duo 2 reporting all 4 ports on zampp2.
+
+## Deploy state
+- **zampp1**: at `4a3a672`, rebuilt `mam-api`/`web-ui`/`worker`/`capture`, migrations 007+008 applied at startup. `.env` has `NODE_HOSTNAME=zampp1`, `NODE_IP=172.18.91.216`.
+- **zampp2**: at `4a3a672`, rebuilt `node-agent` + `worker`. `.env` has `NODE_IP=172.18.91.217`, `BMD_COUNT=4`, `BMD_MODEL="DeckLink Duo 2"`, `BMD_DEVICE_0..3` populated.
+- **Forgejo PAT** is at `/root/.git-credentials` on zampp1 (mode 600). Pushes from zampp1 need `HOME=/root`.
+
+## LAN topology gotcha
+The user's LAN is **172.18.91.0/24** -- inside Docker's reserved 172.16.0.0/12 range. Any heuristic that flags all of 172.16-172.31 as "docker bridge" will produce false positives. The smoke test now alarms only on 172.17.x (default docker0). The server-side `pickIp()` in `routes/cluster.js` has the same vulnerability but the node-agent's `NODE_IP` env-var override masks it in practice.
+
+## Pending
+- [ ] **`services/web-ui/public/recorders.html` rewrite.** The supporting pieces are in `main` but the HTML wiring was lost to a context-compaction event mid-session. Required UI:
+  - Tabbed codec settings (Video / Audio / Container) for both master and proxy.
+  - SDI source picker: node dropdown + inline `BMDCards.render(...)` SVG with click-to-select.
+  - Load BMD card data from `GET /api/v1/cluster/devices/blackmagic`.
+  - `` in the head.
+  - SVG styles (`.bmd-card-svg`, `.bmd-port-ring`, `.bmd-port-group.is-selected`, ...) inlined or split into a CSS file.
+- [ ] **Visual polish pass** with flyonui MCP -- the user noted the current UI "still looks AI-designed". Should happen AFTER the recorders.html rewrite.
+
+## How to pick this up
+1. `cd /opt/wild-dragon && git pull` on zampp1 (or zampp2).
+2. Read this file end-to-end. Then `services/web-ui/public/js/bmd-card.js` (top JSDoc explains the API) and `services/capture/src/capture-manager.js` (codec catalogs).
+3. Inspect `recorders.html` -- it still has the pre-revamp "BM1/BM2" dropdown and flat codec fields. Compare against the `recorders` table columns in `008-codec-settings.sql` for the full field set the UI should drive.
+4. Iterate against a live deployment: `bash deploy/test-cluster.sh` for regression check, plus the actual `/recorders.html` page in a browser (web-ui on port 8080, mam-api on 47432).
+5. Commit through Forgejo MCP if the diff is small; otherwise push from zampp1 (see Deploy state above for creds location). **Cloudflare WAF blocks large MCP uploads** (the blocked domain is `anthropic.com`, not Forgejo) -- pushing from a host with creds is faster for anything over ~3 KB.
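The LAN topology gotcha in the plan comes down to two things: which CIDR the bridge check uses, and whether an explicit override is consulted first. The following is a minimal JavaScript sketch under stated assumptions. It is not the actual node-agent or `routes/cluster.js` code. `SKIP_IFACE_PREFIXES`, `inCidr`, `looksLikeDockerBridge`, `looksLikeDockerBridgeNaive` and `pickNodeIp` are hypothetical names, and the interface map is assumed to have the shape returned by Node's `os.networkInterfaces()`.

The sketch contrasts the naive 172.16.0.0/12 check, which misclassifies this LAN's 172.18.91.x addresses, with the narrowed 172.17.0.0/16 check. It also honours `NODE_IP` before any auto-detection.

```javascript
// Hypothetical sketch of LAN-IP selection for cluster registration.
// All names here are illustrative assumptions, not the real
// node-agent / routes/cluster.js implementation.

// Assumed interface-name prefixes for container networking to ignore.
const SKIP_IFACE_PREFIXES = ["docker", "br-", "veth", "cni"];

// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True if `ip` falls inside `cidr` (e.g. "172.16.0.0/12").
function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  // JS masks shift counts to 5 bits, so a shift by 32 would be a no-op; special-case /0.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Naive alarm: flags all of Docker's reserved 172.16.0.0/12 range,
// which false-positives on a LAN like 172.18.91.0/24.
function looksLikeDockerBridgeNaive(ip) {
  return inCidr(ip, "172.16.0.0/12");
}

// Narrowed alarm: only the default docker0 subnet, 172.17.0.0/16.
function looksLikeDockerBridge(ip) {
  return inCidr(ip, "172.17.0.0/16");
}

// `env` is an env-like object; `interfaces` has the shape of
// os.networkInterfaces(): { ifaceName: [{ address, family, internal }, ...] }.
function pickNodeIp(env, interfaces) {
  if (env.NODE_IP) return env.NODE_IP; // explicit override wins
  for (const [name, addrs] of Object.entries(interfaces)) {
    if (SKIP_IFACE_PREFIXES.some((p) => name.startsWith(p))) continue;
    for (const a of addrs) {
      if (a.family === "IPv4" && !a.internal && !looksLikeDockerBridge(a.address)) {
        return a.address;
      }
    }
  }
  return null; // nothing suitable; a server-side caller could use the request source IP
}

// Hypothetical interface map: interface names invented, IPs from the deploy notes.
const sample = {
  lo: [{ address: "127.0.0.1", family: "IPv4", internal: true }],
  docker0: [{ address: "172.17.0.1", family: "IPv4", internal: false }],
  eno1: [{ address: "172.18.91.217", family: "IPv4", internal: false }],
};
console.log(looksLikeDockerBridgeNaive("172.18.91.217")); // true (false positive)
console.log(looksLikeDockerBridge("172.18.91.217")); // false
console.log(pickNodeIp({}, sample)); // 172.18.91.217
```

Checking the override first is what keeps the same weakness in the server-side `pickIp()` out of sight: a node that sends `NODE_IP` never exercises the heuristic at all.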