58 lines
5.3 KiB
Markdown
58 lines
5.3 KiB
Markdown
|
|
# Cluster Hardening + Codec Settings Revamp
|
||
|
|
|
||
|
|
> Status: **mostly shipped 2026-05-21**. One follow-up remains: the recorders.html UI rewrite. See "Pending" below.
|
||
|
|
|
||
|
|
## Goal
|
||
|
|
Four user-driven asks from 2026-05-20:
|
||
|
|
1. Fix cluster page: workers were registering with docker bridge IPs, and three duplicate "zampp2" rows kept appearing.
|
||
|
|
2. Expand recorder codec settings: per-recorder control over bitrate, framerate, audio channels, container format.
|
||
|
|
3. Better DeckLink port picker: "BM1/BM2" dropdown was unusable -- diagram the card so operators pick a port visually.
|
||
|
|
4. Validate the cluster end-to-end now that GPUs are in place.
|
||
|
|
|
||
|
|
## What shipped (commit list)
|
||
|
|
| Commit | Area | Summary |
|
||
|
|
|---|---|---|
|
||
|
|
| `a39c983` | mam-api | Migration 007 -- dedupe `cluster_nodes` rows + unique index on `hostname`. |
|
||
|
|
| `049beb8` | mam-api | Migration 008 -- expanded codec columns on `recorders` (video/audio bitrate, framerate, audio channels, container, plus `node_id` / `device_index` pinning). |
|
||
|
|
| `3b4af6e` | node-agent | Prefer `NODE_IP` env override; skip docker bridge / veth / cni interfaces when auto-detecting. Version bumped to 1.1.0. |
|
||
|
|
| `0efef0d` | mam-api | `routes/cluster.js`: `pickIp()` fallback to request source IP. New `GET /api/v1/cluster/devices/blackmagic` flattens every node's DeckLink capabilities. |
|
||
|
|
| `40a66ba` | compose | `docker-compose.worker.yml`: `network_mode: host` for node-agent so it inherits host hostname + LAN IP. |
|
||
|
|
| `0ebb3cf` | deploy | `onboard-node.sh`: auto-detect host LAN IP and write `NODE_IP` + `BMD_MODEL` to `.env.worker`. |
|
||
|
|
| `f4a83ee` | capture | `capture-manager.js`: dynamic ffmpeg args. Exports `VIDEO_CODECS`, `AUDIO_CODECS`, `CONTAINER_FMT`, `CONTAINER_EXT`. |
|
||
|
|
| `485af25` | capture | `index.js` bootstrap forwards every codec env var to `captureManager.start()`. |
|
||
|
|
| `4c65753` | mam-api | `routes/recorders.js`: full codec field whitelist; `/start` passes settings to the capture sidecar. |
|
||
|
|
| `d39f86d` | web-ui | `services/web-ui/public/js/bmd-card.js` -- SVG renderer for DeckLink port selection. Models: Duo 2, Quad 2, Mini Recorder 4K, Mini Monitor 4K, UltraStudio 4K Mini. |
|
||
|
|
| `8aa3783` | deploy | `deploy/test-cluster.sh` cluster smoke test. |
|
||
|
|
| `4a3a672` | cluster | `mam-api` self-heartbeat reads `NODE_HOSTNAME` (otherwise every restart spawns a new primary row). Smoke test rewritten with `jq` after Python f-strings were found to silently false-pass the docker-bridge check. Bridge alarm narrowed to 172.17.x since this LAN occupies 172.18.0.0/16. |
|
||
|
|
|
||
|
|
## Verified cluster state (post-deploy, 2026-05-21)
|
||
|
|
```
|
||
|
|
$ MAM_API_URL=http://localhost:47432 bash deploy/test-cluster.sh
|
||
|
|
6 pass 0 fail
|
||
|
|
```
|
||
|
|
Two nodes registered, no duplicate hostnames, real LAN IPs (zampp1=172.18.91.216 primary, zampp2=172.18.91.217 worker), fresh heartbeats, 3 NVIDIA GPUs visible on zampp1, DeckLink Duo 2 reporting all 4 ports on zampp2.
|
||
|
|
|
||
|
|
## Deploy state
|
||
|
|
- **zampp1**: at `4a3a672`, rebuilt `mam-api`/`web-ui`/`worker`/`capture`, migrations 007+008 applied at startup. `.env` has `NODE_HOSTNAME=zampp1`, `NODE_IP=172.18.91.216`.
|
||
|
|
- **zampp2**: at `4a3a672`, rebuilt `node-agent` + `worker`. `.env` has `NODE_IP=172.18.91.217`, `BMD_COUNT=4`, `BMD_MODEL="DeckLink Duo 2"`, `BMD_DEVICE_0..3` populated.
|
||
|
|
- **Forgejo PAT** is at `/root/.git-credentials` on zampp1 (mode 600). Pushes from zampp1 need `HOME=/root`.
|
||
|
|
|
||
|
|
## LAN topology gotcha
|
||
|
|
The user's LAN is **172.18.91.0/24** -- inside Docker's reserved 172.16.0.0/12 range. Any heuristic that flags all of 172.16-172.31 as "docker bridge" will produce false positives. The smoke test now alarms only on 172.17.x (default docker0). The server-side `pickIp()` in `routes/cluster.js` has the same vulnerability but the node-agent's `NODE_IP` env-var override masks it in practice.
|
||
|
|
|
||
|
|
## Pending
|
||
|
|
- [ ] **`services/web-ui/public/recorders.html` rewrite.** The supporting pieces are in `main` but the HTML wiring was lost to a context-compaction event mid-session. Required UI:
|
||
|
|
- Tabbed codec settings (Video / Audio / Container) for both master and proxy.
|
||
|
|
- SDI source picker: node dropdown + inline `BMDCards.render(...)` SVG with click-to-select.
|
||
|
|
- Load BMD card data from `GET /api/v1/cluster/devices/blackmagic`.
|
||
|
|
- `<script src="js/bmd-card.js?v=1"></script>` in the head.
|
||
|
|
- SVG styles (`.bmd-card-svg`, `.bmd-port-ring`, `.bmd-port-group.is-selected`, ...) inlined or split into a CSS file.
|
||
|
|
- [ ] **Visual polish pass** with flyonui MCP -- the user noted the current UI "still looks AI-designed". Should happen AFTER the recorders.html rewrite.
|
||
|
|
|
||
|
|
## How to pick this up
|
||
|
|
1. `cd /opt/wild-dragon && git pull` on zampp1 (or zampp2).
|
||
|
|
2. Read this file end-to-end. Then `services/web-ui/public/js/bmd-card.js` (top JSDoc explains the API) and `services/capture/src/capture-manager.js` (codec catalogs).
|
||
|
|
3. Inspect `recorders.html` -- it still has the pre-revamp "BM1/BM2" dropdown and flat codec fields. Compare against the `recorders` table columns in `008-codec-settings.sql` for the full field set the UI should drive.
|
||
|
|
4. Iterate against a live deployment: `bash deploy/test-cluster.sh` for regression check, plus the actual `/recorders.html` page in a browser (web-ui on port 8080, mam-api on 47432).
|
||
|
|
5. Commit through Forgejo MCP if the diff is small; otherwise push from zampp1 (see Deploy state above for creds location). **Cloudflare WAF blocks large MCP uploads** (the blocked domain is `anthropic.com`, not Forgejo) -- pushing from a host with creds is faster for anything over ~3 KB.
|