dragonflight/docs/superpowers/plans/2026-05-21-cluster-codec-revamp.md
Zach Gaetano 8403355ba9 docs: add handoff plan for cluster + codec revamp session
Captures the full commit list, current cluster state, the 172.18 LAN topology gotcha, the remaining recorders.html UI rewrite, and how the next agent should pick it up.
2026-05-21 13:13:42 +00:00

Cluster Hardening + Codec Settings Revamp

Status: mostly shipped 2026-05-21. The main follow-up is the recorders.html UI rewrite, with a visual polish pass after it. See "Pending" below.

Goal

Four user-driven asks from 2026-05-20:

  1. Fix cluster page: workers were registering with docker bridge IPs, and three duplicate "zampp2" rows kept appearing.
  2. Expand recorder codec settings: per-recorder control over bitrate, framerate, audio channels, container format.
  3. Better DeckLink port picker: "BM1/BM2" dropdown was unusable -- diagram the card so operators pick a port visually.
  4. Validate the cluster end-to-end now that GPUs are in place.

What shipped (commit list)

| Commit | Area | Summary |
| --- | --- | --- |
| a39c983 | mam-api | Migration 007 -- dedupe cluster_nodes rows + unique index on hostname. |
| 049beb8 | mam-api | Migration 008 -- expanded codec columns on recorders (video/audio bitrate, framerate, audio channels, container, plus node_id / device_index pinning). |
| 3b4af6e | node-agent | Prefer NODE_IP env override; skip docker bridge / veth / cni interfaces when auto-detecting. Version bumped to 1.1.0. |
| 0efef0d | mam-api | routes/cluster.js: pickIp() fallback to request source IP. New GET /api/v1/cluster/devices/blackmagic flattens every node's DeckLink capabilities. |
| 40a66ba | compose | docker-compose.worker.yml: network_mode: host for node-agent so it inherits host hostname + LAN IP. |
| 0ebb3cf | deploy | onboard-node.sh: auto-detect host LAN IP and write NODE_IP + BMD_MODEL to .env.worker. |
| f4a83ee | capture | capture-manager.js: dynamic ffmpeg args. Exports VIDEO_CODECS, AUDIO_CODECS, CONTAINER_FMT, CONTAINER_EXT. |
| 485af25 | capture | index.js bootstrap forwards every codec env var to captureManager.start(). |
| 4c65753 | mam-api | routes/recorders.js: full codec field whitelist; /start passes settings to the capture sidecar. |
| d39f86d | web-ui | services/web-ui/public/js/bmd-card.js -- SVG renderer for DeckLink port selection. Models: Duo 2, Quad 2, Mini Recorder 4K, Mini Monitor 4K, UltraStudio 4K Mini. |
| 8aa3783 | deploy | deploy/test-cluster.sh cluster smoke test. |
| 4a3a672 | cluster | mam-api self-heartbeat reads NODE_HOSTNAME (otherwise every restart spawns a new primary row). Smoke test rewritten with jq after Python f-strings were found to silently false-pass the docker-bridge check. Bridge alarm narrowed to 172.17.x since this LAN occupies 172.18.0.0/16. |
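
For orientation, the "dynamic ffmpeg args" idea from f4a83ee can be sketched roughly as below. This is not the code in capture-manager.js: the catalog names mirror the exports listed above, but their contents, the settings field names (videoCodec, videoBitrate, ...), the buildEncodeArgs helper and the output path are all assumptions made for illustration. Only the ffmpeg flags themselves (-c:v, -b:v, -r, -c:a, -b:a, -ac, -f) are standard.

```javascript
// Hypothetical catalogs: the real contents live in capture-manager.js.
const VIDEO_CODECS = { h264: 'libx264', h264_nvenc: 'h264_nvenc', prores: 'prores_ks' };
const AUDIO_CODECS = { aac: 'aac', pcm24: 'pcm_s24le' };
const CONTAINER_FMT = { mov: 'mov', mp4: 'mp4', mkv: 'matroska' };
const CONTAINER_EXT = { mov: '.mov', mp4: '.mp4', mkv: '.mkv' };

// Build the output-side ffmpeg arguments from per-recorder settings.
// Input-side arguments (the DeckLink device) are deliberately left out.
function buildEncodeArgs(settings, outBase) {
  // Assumed defaults; optional fields are only emitted when set.
  const s = { videoCodec: 'h264', audioCodec: 'aac', container: 'mov', ...settings };
  const vcodec = VIDEO_CODECS[s.videoCodec];
  const acodec = AUDIO_CODECS[s.audioCodec];
  const fmt = CONTAINER_FMT[s.container];
  if (!vcodec || !acodec || !fmt) {
    throw new Error(`unsupported codec settings: ${JSON.stringify(s)}`);
  }
  const args = ['-c:v', vcodec];
  if (s.videoBitrate) args.push('-b:v', String(s.videoBitrate));
  if (s.framerate) args.push('-r', String(s.framerate));
  args.push('-c:a', acodec);
  if (s.audioBitrate) args.push('-b:a', String(s.audioBitrate));
  if (s.audioChannels) args.push('-ac', String(s.audioChannels));
  args.push('-f', fmt, outBase + CONTAINER_EXT[s.container]);
  return args;
}

// Example: a ProRes master with 24-bit PCM stereo audio (path is made up).
console.log(buildEncodeArgs(
  { videoCodec: 'prores', audioCodec: 'pcm24', framerate: '30000/1001', audioChannels: 2 },
  '/media/rec1'
).join(' '));
```

Keeping the mapping in lookup tables means an unknown value fails loudly before ffmpeg is spawned, which pairs naturally with the field whitelist in routes/recorders.js.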

Verified cluster state (post-deploy, 2026-05-21)

$ MAM_API_URL=http://localhost:47432 bash deploy/test-cluster.sh
6 pass  0 fail

Two nodes registered, no duplicate hostnames, real LAN IPs (zampp1=172.18.91.216 primary, zampp2=172.18.91.217 worker), fresh heartbeats, 3 NVIDIA GPUs visible on zampp1, DeckLink Duo 2 reporting all 4 ports on zampp2.
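
The properties verified above can be restated as a small check. This is an illustrative sketch only: the real smoke test is the bash + jq script in deploy/test-cluster.sh, and the node field names (hostname, ip, lastHeartbeat), the checkClusterNodes helper and the 60-second freshness window are assumptions, not the API's actual shape.

```javascript
// Return a list of human-readable failures; an empty list means all checks pass.
function checkClusterNodes(nodes, nowMs, maxHeartbeatAgeMs = 60000) {
  const failures = [];
  const seen = new Set();
  for (const n of nodes) {
    // No duplicate hostnames (migration 007 enforces this in the database too).
    if (seen.has(n.hostname)) failures.push(`duplicate hostname: ${n.hostname}`);
    seen.add(n.hostname);
    // Only 172.17.x counts as a docker bridge address on this LAN.
    if (n.ip.startsWith('172.17.')) {
      failures.push(`docker bridge IP on ${n.hostname}: ${n.ip}`);
    }
    // Written as !(age <= max) so an unparseable timestamp (NaN) also fails.
    const age = nowMs - Date.parse(n.lastHeartbeat);
    if (!(age <= maxHeartbeatAgeMs)) failures.push(`stale heartbeat on ${n.hostname}`);
  }
  return failures;
}

// Example using the two nodes from this deployment (timestamps are made up).
const now = Date.parse('2026-05-21T13:00:00Z');
console.log(checkClusterNodes([
  { hostname: 'zampp1', ip: '172.18.91.216', lastHeartbeat: '2026-05-21T12:59:50Z' },
  { hostname: 'zampp2', ip: '172.18.91.217', lastHeartbeat: '2026-05-21T12:59:55Z' },
], now));
```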

Deploy state

  • zampp1: at 4a3a672, rebuilt mam-api/web-ui/worker/capture, migrations 007+008 applied at startup. .env has NODE_HOSTNAME=zampp1, NODE_IP=172.18.91.216.
  • zampp2: at 4a3a672, rebuilt node-agent + worker. .env has NODE_IP=172.18.91.217, BMD_COUNT=4, BMD_MODEL="DeckLink Duo 2", BMD_DEVICE_0..3 populated.
  • Forgejo PAT is at /root/.git-credentials on zampp1 (mode 600). Pushes from zampp1 need HOME=/root.

LAN topology gotcha

The user's LAN is 172.18.91.0/24, which sits inside the 172.16.0.0/12 private range (RFC 1918) that Docker allocates its default bridge subnets from. Any heuristic that flags all of 172.16-172.31 as "docker bridge" will therefore produce false positives on this network. The smoke test now alarms only on 172.17.x (the default docker0 subnet). The server-side pickIp() in routes/cluster.js has the same weakness, but the node-agent's NODE_IP env-var override masks it in practice.
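
A minimal sketch of an IP-selection routine that avoids this trap, assuming the approach described for the node-agent (NODE_IP override first, then skip container interfaces by name). The function names and the interface-name pattern are hypothetical and not taken from the node-agent source; the interfaces argument has the shape returned by Node's os.networkInterfaces().

```javascript
// Interface-name prefixes treated as container plumbing (assumed list).
// Docker names user-defined bridges "br-<id>", so those are skipped by name
// rather than by subnet.
const VIRTUAL_IFACE = /^(docker|br-|veth|cni)/;

// Only the default docker0 subnet is rejected by address, because the LAN
// itself lives in 172.18.x.
function isDefaultDockerBridge(ip) {
  return ip.startsWith('172.17.');
}

// env: e.g. process.env; interfaces: e.g. os.networkInterfaces().
function pickNodeIp(env, interfaces) {
  if (env.NODE_IP) return env.NODE_IP; // explicit override always wins
  for (const [name, addrs] of Object.entries(interfaces)) {
    if (VIRTUAL_IFACE.test(name)) continue;
    for (const a of addrs || []) {
      // Node has reported family as either 'IPv4' or 4 depending on version.
      const isV4 = a.family === 'IPv4' || a.family === 4;
      if (!isV4 || a.internal) continue;
      if (isDefaultDockerBridge(a.address)) continue;
      return a.address;
    }
  }
  return null; // nothing usable; caller can fall back to the request source IP
}

// Example: a worker with docker0, a compose bridge, and a real NIC.
const sampleInterfaces = {
  lo: [{ address: '127.0.0.1', family: 'IPv4', internal: true }],
  docker0: [{ address: '172.17.0.1', family: 'IPv4', internal: false }],
  'br-3f2a': [{ address: '172.18.0.1', family: 'IPv4', internal: false }],
  eth0: [{ address: '172.18.91.217', family: 'IPv4', internal: false }],
};
console.log(pickNodeIp({}, sampleInterfaces));
```

Filtering by interface name is what lets a legitimate 172.18.x LAN address through while a compose bridge in the same /12 is still ignored. An address-only rule cannot reliably tell the two apart.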

Pending

  • services/web-ui/public/recorders.html rewrite. The supporting pieces are in main but the HTML wiring was lost to a context-compaction event mid-session. Required UI:
    • Tabbed codec settings (Video / Audio / Container) for both master and proxy.
    • SDI source picker: node dropdown + inline BMDCards.render(...) SVG with click-to-select.
    • Load BMD card data from GET /api/v1/cluster/devices/blackmagic.
    • <script src="js/bmd-card.js?v=1"></script> in the head.
    • SVG styles (.bmd-card-svg, .bmd-port-ring, .bmd-port-group.is-selected, ...) inlined or split into a CSS file.
  • Visual polish pass with flyonui MCP -- the user noted the current UI "still looks AI-designed". Should happen AFTER the recorders.html rewrite.

How to pick this up

  1. cd /opt/wild-dragon && git pull on zampp1 (or zampp2).
  2. Read this file end-to-end. Then services/web-ui/public/js/bmd-card.js (top JSDoc explains the API) and services/capture/src/capture-manager.js (codec catalogs).
  3. Inspect recorders.html -- it still has the pre-revamp "BM1/BM2" dropdown and flat codec fields. Compare against the recorders table columns in 008-codec-settings.sql for the full field set the UI should drive.
  4. Iterate against a live deployment: bash deploy/test-cluster.sh for regression check, plus the actual /recorders.html page in a browser (web-ui on port 8080, mam-api on 47432).
  5. Commit through Forgejo MCP if the diff is small; otherwise push from zampp1 (see Deploy state above for creds location). Cloudflare WAF blocks large MCP uploads (the blocked domain is anthropic.com, not Forgejo) -- pushing from a host with creds is faster for anything over ~3 KB.