docs(playout): work log — commit map, decisions, testing checklist

Replaces the earlier aspirational "complete" log with the actual commit
sequence on feat/playout-mcr, the §7 decisions as built, the media-flow
diagram, port-contention + failover scope, and a runtime testing checklist
(migration → image build → SRT smoke → failover kill test).
This commit is contained in:
Zac 2026-05-30 13:18:55 +00:00
parent d505a488ac
commit 34352e3299

101
WORK_LOG_PLAYOUT.md Normal file
View file

@ -0,0 +1,101 @@
# Playout / Master Control — Implementation Work Log
**Branch:** `feat/playout-mcr` (off `main`)
**Started:** 2026-05-30
**Status:** Code complete, awaiting runtime validation
Tracks the build of the playout (MCR) subsystem against the design at
`docs/superpowers/specs/2026-05-30-playout-mcr-design.md`.
---
## Commit sequence
| # | Commit | Scope |
|---|--------|-------|
| 1 | `docs(playout)` | Design spec, §7 questions answered |
| 2 | `feat(mam-api): migration 029` | Six tables, failover columns, audio_normalized flag |
| 3 | `feat(worker): playout-stage` | S3 → /media + EBU R128 loudnorm + index.js wiring |
| 4 | `feat(playout): sidecar` | CasparCG image + AMCP shim, HLS preview consumer, fps-aware frame math |
| 5 | `feat(mam-api): /playout control plane + auto-failover` | Routes + scheduler health tick + restartChannel helper |
| 6 | `feat(web-ui): MCR page` | screens-playout, styles, app/shell/index.html wiring |
| 7 | `build(playout): compose wiring + .env knobs` | /media volume, queue addition, build-only service |
| 8 | `docs(playout): work log` | This file |
## Resolved §7 decisions (2026-05-30)
- **Audio loudness:** pre-normalize at stage time. ffmpeg `loudnorm` two-pass
(I=-23 LUFS, TP=-1 dBTP, LRA=11), linear mode preserves dynamics. Output
AAC 192k @ 48 kHz, video stream copied. Per-item `audio_normalized` flag
so re-stages of the same asset skip the pass.
- **Frame rate:** `1080p5994` default (was `1080i5994`). Per-channel
override allowed via `video_format`. `fpsFor(videoFormat)` helper in
the sidecar drives SEEK / LENGTH / transition-frames math.
- **Preview latency:** HLS v1. CasparCG runs a second FFMPEG consumer
alongside the primary output, writing `/media/live/<channel_id>/index.m3u8`
(~600 kbps, 2s segments, 6-window list). Web UI plays via the existing
HLS plumbing.
- **Failover:** auto-restart on healthy node for NDI/SRT/RTMP. Alert-only
for DeckLink (device-index pinning makes blind re-placement risky).
Scheduler tick (PG advisory lock, same lock as recorder schedules) polls
sidecar `/status`; ~3 missed checks → `restartChannel(id)` picks the most
recently-seen-online other node, bumps `restart_count`, calls `/start`.
## Architecture notes
**Sidecar model.** One CasparCG container per channel. Spawned by mam-api
via local Docker socket (primary node) or remote node-agent
`/sidecar/start`. Tracked in `playout_sidecars` plus `playout_channels.container_id`.
Killed on `/stop` or by `restartChannel` during failover.
**Media flow.**
```
S3 master/proxy → playout-stage worker → /media/playout/<assetId>.<ext>
(loudnormed, AAC@-23 LUFS)
CasparCG channel #1
primary consumer HLS consumer
(DeckLink/NDI/ ↓
SRT/RTMP) /media/live/<ch_id>/*.m3u8
```
**Port contention.** `assertDeckLinkFree()` blocks starting a SDI channel
when a recorder or another channel on the same node+device_index is active.
**Failover scope.** NDI/SRT/RTMP have no hardware tie, so any healthy
cluster_node is eligible. DeckLink channels surface an alert in the UI
(`status='error'` + `error_message`) and require operator intervention.
## Testing checklist
- [ ] Apply migration 029 on dev DB
- [ ] Build playout image: `docker compose --profile build-only build playout`
- [ ] Build web-ui (`screens-playout` joins the esbuild list automatically)
- [ ] Create channel via POST /api/v1/playout/channels (SRT first, no HW)
- [ ] Stage 2-3 assets to a playlist, verify loudnorm metadata in stderr
- [ ] Start channel → sidecar container appears in `docker ps`
- [ ] AMCP smoke: `telnet <host> 5250`, `VERSION`, `INFO`
- [ ] Play playlist; verify HLS at /media/live/<id>/index.m3u8
- [ ] Skip / pause / resume / stop
- [ ] As-run log: GET /api/v1/playout/channels/:id/asrun
- [ ] Kill sidecar container → scheduler should restart on another node
within ~3 ticks (~45s), restart_count increments
- [ ] DeckLink channel kill: status flips to 'error', NO restart attempt
- [ ] Try starting a decklink channel on a device_index already held by a
recorder → 409
- [ ] MCR UI smoke: nav entry visible, page renders, drag-drop adds items,
transport buttons hit the API
## Known gaps (deferred)
- No WebRTC preview (HLS-only v1 — 4-6s lag, fine for confidence monitor).
- No graphics/CG overlay layer in Phase A (templates land in Phase B).
- No Phase B scheduler / 24/7 wall-clock channel (schema is in place,
scheduler tick is not).
- No multi-channel grid view (one channel at a time per page).
- No timecode / remaining-duration overlay (would need CasparCG INFO poll).
- No audio level meters on the UI.
- `restartChannel` updates DB state and triggers `/start`; if the new node
also fails repeatedly, there's no exponential backoff yet — bounded only
by the manual stop button.