diff --git a/docs/superpowers/specs/2026-05-30-playout-mcr-design.md b/docs/superpowers/specs/2026-05-30-playout-mcr-design.md
new file mode 100644
index 0000000..c1bd8a3
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-30-playout-mcr-design.md
@@ -0,0 +1,235 @@
+# Wild Dragon MAM — Playout / Master Control (MCR)
+
+**Date:** 2026-05-30 (revised 2026-05-30 — §7 closed)
+**Status:** APPROVED — implementation in progress (code drafted but uncommitted; see WORK_LOG_PLAYOUT.md)
+**Author:** Zac + Claude
+
+---
+
+## Resolved Decisions
+
+| Question | Decision |
+|----------|----------|
+| Playout engine | **CasparCG Server** (orchestrated via AMCP), not ffmpeg-native |
+| Channel count | **Multi-channel from start** — N independent channels, placed across cluster nodes by capability (mirrors recorders) |
+| Scheduling model | **Phased** — Phase A: on-demand playlist player; Phase B: 24/7 continuous channel |
+| Output targets | SDI (DeckLink), NDI, SRT, RTMP — all via CasparCG consumers |
+| Media source | Assets live in **S3**; must be staged to a CasparCG-local media volume before play (see §4) |
+| CasparCG packaging | **Build our own image** (like `capture/build-with-decklink.sh`) — GL context via GPU passthrough or Xvfb; NDI + DeckLink SDKs fetched at build time (not redistributable) |
+| Master codec playability | Zac confirms current masters **play fine in CasparCG** — no transcode-for-playout step; staging is a plain S3→/media copy. Validate on hardware but do not gate on it |
+| Management UI | **Single Dragonflight `playout.html` GUI** drives everything via AMCP; operator never touches CasparCG directly |
+
+---
+
+## Overview
+
+Playout adds **master-control-room** capability to Dragonflight: take library assets, arrange them on a timeline / playlist, and play them out continuously to a broadcast output — SDI via DeckLink, or stream via SRT / RTMP / NDI. Drag-and-drop scheduling, a program/preview monitor, and as-run logging.
+
+This is the **mirror image** of the existing capture path. Capture is `input → ffmpeg encode → S3`. Playout is `S3 asset → engine → output`. We reuse three things wholesale:
+
+1. **Cluster node + capability model** — nodes already advertise DeckLink/Deltacast/GPU in `cluster_nodes.capabilities`; channels are placed on nodes that have a free output port, exactly as recorders claim input ports.
+2. **Sidecar orchestration** — mam-api spawns containers via the local Docker socket or the remote `node-agent /sidecar/start`. A CasparCG channel is just a different sidecar image.
+3. **Scheduler tick + PG advisory lock** — `src/scheduler.js` already runs a single-leader tick over a schedule table. Phase B's wall-clock channel reuses this pattern.
+
+### Why CasparCG over ffmpeg-native
+
+The capture stack proves we can drive ffmpeg + DeckLink. But playout's hard part is **gapless, frame-accurate, clean transitions between clips** — every clip boundary in an ffmpeg-per-clip model is a black flash unless we engineer a concat-feeder. CasparCG solves this natively: a channel is a persistent output with a playlist, hard/mix/wipe transitions, layered graphics/logo (CG), and DeckLink/NDI/SRT/RTMP consumers built in. We orchestrate it over **AMCP** (its TCP control protocol) instead of reinventing a feeder. Trade: a new dependency + container image, and media must be on a CasparCG-visible disk (§4).
+
+---
+
+## 1. Data Model
+
+New migration `029-playout.sql`. Five tables.
+
+### 1.1 `playout_channels`
+A logical output. One channel → one engine instance → one output target.
+
+```
+id            uuid pk
+name          text                      -- "Channel 1", "Pop-up SDI"
+node_id       uuid -> cluster_nodes(id) -- where the engine runs (null = primary)
+output_type   text                      -- 'decklink' | 'ndi' | 'srt' | 'rtmp'
+output_config jsonb                     -- { device_index } | { ndi_name } | { url, latency } | { url, key }
+video_format  text                      -- '1080i5994' | '1080p5994' | '720p5994' ...
+status        text                      -- 'stopped' | 'starting' | 'running' | 'error'
+container_id  text                      -- running CasparCG sidecar
+project_id    uuid -> projects(id)      -- RBAC scoping (nullable = admin-only)
+created_at / updated_at
+```
+
+`output_type` + `output_config` map straight to a CasparCG consumer:
+- `decklink` → `ADD DECKLINK ...`
+- `ndi` → `ADD NDI ...`
+- `srt`/`rtmp` → `ADD FFMPEG -f mpegts ...` (CasparCG 2.3+ ffmpeg consumer)
+
+### 1.2 `playout_playlists`
+An ordered list of items bound to a channel. Phase A's primary object.
+
+```
+id, channel_id -> playout_channels(id)
+name, loop boolean, created_at / updated_at
+```
+
+### 1.3 `playout_items`
+One entry on a playlist OR one entry on the 24/7 timeline.
+
+```
+id
+playlist_id   uuid -> playout_playlists(id) -- Phase A
+asset_id      uuid -> assets(id)
+sort_order    int                           -- position in playlist (Phase A)
+scheduled_at  timestamptz                   -- wall-clock start (Phase B, null in A)
+in_point      numeric                       -- seconds, trim head (reuse subclip in/out from editor)
+out_point     numeric                       -- seconds, trim tail
+transition    text                          -- 'cut' | 'mix' | 'wipe'
+transition_ms int
+graphics      jsonb                         -- optional CG/template overlay (Phase B+)
+media_status  text                          -- 'pending' | 'staging' | 'ready' | 'error' (see §4)
+media_path    text                          -- resolved path inside the CasparCG media volume
+```
+
+### 1.4 `playout_schedule` (Phase B)
+Day-ahead, wall-clock-bound timeline rows. Same shape as `playout_items` but `scheduled_at` is authoritative and the scheduler tick (§5) drives transitions. Phase A can ignore this table.
+
+### 1.5 `playout_as_run`
+Append-only log: what actually played, when, for how long. Compliance / billing.
+
+```
+id, channel_id, asset_id, item_id
+started_at, ended_at, duration_s, result -- 'played' | 'skipped' | 'error'
+```
+
+---
+
+## 2. Services & Components
+
+### 2.1 New sidecar: `services/playout/` (CasparCG wrapper)
+A thin container: **CasparCG Server** + a small Node control shim exposing HTTP, the same way `capture` wraps ffmpeg.
+
+- Base image: official/community `casparcg/server` (Linux build with DeckLink + NDI + FFmpeg producers/consumers).
+- Node shim (`src/index.js`): opens an AMCP TCP socket to local CasparCG, exposes:
+  - `POST /channel/start` → `ADD <channel> <consumer>` for the channel's output target
+  - `POST /play` → `PLAY <channel>-<layer> <clip> [transition]`
+  - `POST /loadbg` + `/play` → preview/cue then take (preview monitor)
+  - `POST /stop`, `GET /status` → `INFO <channel>` (current clip, position, fps)
+  - playlist load → translate `playout_items` rows into a sequence of AMCP `LOADBG`/`PLAY` calls, advancing on `OnTransition` / end-of-clip events.
+- Mirrors capture's status-polling contract so the UI monitor reuses existing plumbing.
+
+### 2.2 mam-api: `src/routes/playout.js`
+CRUD + control, RBAC-scoped via the existing `assertProjectAccess` helper (channels carry `project_id`).
+
+```
+GET    /playout/channels                      list (project-filtered)
+POST   /playout/channels                      create (edit on project)
+POST   /playout/channels/:id/start|stop       spawn/kill CasparCG sidecar
+GET    /playout/channels/:id/status           proxy engine INFO
+POST   /playout/channels/:id/play|pause|skip  transport control
+GET/POST/PUT/DELETE /playout/playlists...     playlist + item CRUD, reorder
+POST   /playout/items/:id/stage               kick S3→media-volume staging (§4)
+GET    /playout/channels/:id/asrun            as-run log
+```
+
+Channel start/stop reuses `resolveNodeTarget()` + the Docker-socket / `node-agent /sidecar/start` split already in `recorders.js`. **Refactor opportunity:** lift that sidecar-spawn logic out of `recorders.js` into `src/orchestration/sidecar.js` so both recorders and playout share it (keep this small — only what both need).
+
+### 2.3 web-ui: `playout.html` + `public/playout.jsx`
+New MCR page. Layout:
+
+```
+┌─ PREVIEW ───────────┬─ PROGRAM (on air) ───────┐
+│ [cued clip]         │ [live output]   ● ON AIR │
+│ TC / duration       │ TC / remaining           │
+│ [CUE] [TAKE]        │ [PLAY][PAUSE][SKIP][STOP]│
+├─ MEDIA BIN ─────────┴──────────────────────────┤
+│ (draggable asset list, reuse asset browser)    │
+├─ PLAYLIST / TIMELINE ──────────────────────────┤
+│ ▸ clip A ──▸ clip B ──▸ clip C   (drag-drop)   │  Phase A: ordered list
+│ └ 24h time grid w/ now-bar                     │  Phase B: time-of-day grid
+└────────────────────────────────────────────────┘
+```
+
+- Drag-drop: reuse whatever the NLE editor timeline uses (check `editor.jsx`); assets drag from the bin into the playlist/grid.
+- API via existing `ZAMPP_API.fetch` wrapper.
+- Program monitor: HLS preview of the output — CasparCG can emit a second low-bitrate FFmpeg consumer to HLS, reusing the `/live/` HLS plumbing capture already uses.
+
+---
+
+## 3. Channel placement & ports
+
+A DeckLink port is exclusive — same constraint capture already handles. A node's DeckLink port can be an **input (recorder)** or an **output (playout channel)**, never both at once. So:
+
+- Extend the capability/port-claim check: when starting a channel on `output_type=decklink`, verify the target node has that device index free (no active recorder, no active channel).
+- NDI / SRT / RTMP outputs have no hardware contention → can stack many per node (GPU/CPU-bound only).
+- Surface a unified "device map" (extend the existing cluster DeckLink-status endpoint) showing each port's role: idle / recording / playing-out.
+
+---
+
+## 4. Media staging (the S3 ⇄ CasparCG gap)
+
+**The crux.** Assets live in S3 (`original_s3_key` / `proxy_s3_key`). CasparCG plays from a **local media folder**. Options:
+
+- **A — Pre-stage to a shared media volume (recommended).** Before a clip can go on air, download/symlink it from S3 to a CasparCG-visible volume (`/media`), set `playout_items.media_status='ready'` + `media_path`. A new BullMQ `playout-stage` job (reuses the worker pattern) does the pull. UI shows per-item readiness; TAKE is blocked until `ready`. Mirrors the growing-file SMB share already mounted for capture.
+- **B — Stream from S3 via presigned URL.** CasparCG FFmpeg producer plays an HTTPS presigned URL directly. Zero staging, but seeking/trim and gapless transitions over network are fragile for broadcast. Acceptable as a fallback for SRT/RTMP, risky for SDI.
+
+**Decision:** Phase A uses **A** (stage proxies for preview, masters for air) with **B** available as a per-channel "low-latency / no-stage" toggle. Zac confirms the current masters play fine in CasparCG, so staging is a **plain S3→/media copy — no transcode-for-playout step**. (Validate on hardware during implementation, but the model does not assume a transcode stage.)
+
+---
+
+## 5. Scheduling
+
+### Phase A — playlist player
+No wall clock. Operator builds a `playout_playlists` row, drags items in, hits PLAY. The playout sidecar walks `playout_items` by `sort_order`, cueing the next clip during the current one (`LOADBG`) and taking it at end-of-clip with the configured transition. `loop` repeats. As-run logged per item.
+
+### Phase B — 24/7 continuous channel
+Wall-clock timeline in `playout_schedule`. Reuse `src/scheduler.js`:
+- Add a second tick (or extend the existing one) under the **same PG advisory lock pattern** — exactly-one-leader, so a multi-replica deploy doesn't double-fire.
+- Tick responsibilities: stage upcoming items (look-ahead window), verify the on-air item matches the schedule, **fill gaps** (loop a filler/slate asset when the timeline has a hole — a channel must never go black), roll the day forward.
+- As-run becomes the compliance record.
+
+---
+
+## 6. Phasing / Milestones
+
+**Phase A — Playlist playout MVP**
+1. Migration `029-playout.sql` (channels, playlists, items, as-run).
+2. `services/playout/` sidecar: CasparCG image + AMCP control shim, single output target (start with **SRT or NDI** — no hardware needed for dev; DeckLink behind hardware check).
+3. mam-api `routes/playout.js` — channel + playlist CRUD, start/stop, transport, RBAC.
+4. `playout-stage` BullMQ job (S3 → /media).
+5. web-ui `playout.html` — bin + drag-drop ordered playlist + program/preview monitors + transport.
+6. DeckLink output on real hardware; port-contention check vs recorders.
+
+**Phase B — 24/7 continuous channel**
+7. `playout_schedule` + time-of-day grid UI.
+8. Scheduler tick (advisory-locked) — look-ahead staging, gap-fill/slate, day-roll.
+9. As-run reporting view.
+10. Graphics/CG overlay (logo bug, lower-thirds) via CasparCG templates.
+
+---
+
+## 7. Open Questions (for review)
+
+**Resolved (2026-05-30):**
+- ~~CasparCG packaging~~ → **build our own image.** Fetch DeckLink + NDI SDKs at build time (not redistributable — same as capture's DeckLink build). GL context for the mixer comes from GPU passthrough on a real node, or **Xvfb** (virtual framebuffer) where there's no display — community images run `--privileged` + X11 socket. Pin the NDI SDK version to what the server expects (`.so` version mismatch is the common docker failure).
+- ~~Master codec playability~~ → Zac confirms masters **play fine**; no transcode-for-playout. Staging = plain S3→/media copy.
+- ~~Management GUI~~ → **single Dragonflight `playout.html`** drives everything via AMCP; operator never touches CasparCG.
+- ~~Audio loudness~~ → **pre-normalize at stage time** (Zac, 2026-05-30). `playout-stage` job runs ffmpeg `loudnorm` (EBU R128, target −23 LUFS, true-peak −1 dBTP) once, on the S3→/media copy. Output is the cached version CasparCG plays. Staging is no longer a pure copy — staging cost ≈ realtime CPU per clip on first stage; results are reusable across channels. Override (`media_status='ready'` + raw copy) available for clips already mastered to spec.
+- ~~Frame rate~~ → **`1080p5994`** default for new channels (Zac, 2026-05-30). Progressive 1080 @ 59.94 fps. Per-channel override allowed (`video_format` column). Streaming-friendly (SRT/RTMP/NDI) and current SDI gear accepts it; matches capture's 59.94 cadence.
+- ~~Preview latency~~ → **HLS v1** (Zac, 2026-05-30). Reuse capture's `/live/` HLS plumbing. CasparCG emits a second low-bitrate FFmpeg consumer to HLS. ~4–6s lag, fine for confidence monitor. Operator desk gets a real downstream monitor off the SDI/NDI output anyway. Revisit WebRTC if MCR operators complain.
+- ~~Failover~~ → **auto-restart on healthy node** (Zac, 2026-05-30). Scheduler tick (§5) monitors `playout_sidecars` health (AMCP ping + container alive); on N missed checks marks the channel `error`, re-places it on another capability-matching node with a free output port, resumes the playlist from the next item after the as-run-logged on-air item. Gap = black/slate for ~5–30 s during respawn (operator sees a flag in the UI). **DeckLink channels are not auto-failed-over in v1** — device-index pinning makes the destination port non-trivial; v1 alerts and lets the operator move the channel. NDI/SRT/RTMP channels (no hardware contention) fail over automatically. Tracked via `restart_count` + `last_restart_at` on `playout_channels`.
+
+**Still open:**
+- (none — all §7 questions resolved 2026-05-30)
+
+---
+
+## 8. Reused building blocks (already in the repo)
+
+| Need | Existing piece |
+|------|----------------|
+| Spawn engine container local/remote | `recorders.js` Docker-socket + `node-agent /sidecar/start` |
+| Node capability / port model | `cluster_nodes.capabilities`, cluster DeckLink-status endpoint |
+| Single-leader scheduled transitions | `src/scheduler.js` + PG advisory lock |
+| Background media jobs | BullMQ worker (`services/worker`) |
+| RBAC scoping | `src/auth/authz.js` `assertProjectAccess` (channel/project_id) |
+| HLS preview plumbing | capture's `/live/` HLS output |
+| Subclip in/out points | NLE editor in/out marking |
+| API wrapper / SPA shell | `ZAMPP_API.fetch`, esbuild JSX pages |
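
The port-contention rule in §3 (a DeckLink port is either an input or an output, never both) can be sketched as a small pure function. This is a minimal illustration in JavaScript, chosen to match the Node services named in the spec, and not the repo's implementation. The recorder row shape, its `active` flag, the `portIndexes` argument, and the function names `buildDeviceMap` and `canStartDecklinkChannel` are all hypothetical. Only the channel fields (`output_type`, `output_config.device_index`, `status`) and the three roles (idle / recording / playing-out) come from the spec.

```javascript
// Sketch only. Recorder rows and portIndexes are assumed shapes, not the repo's schema.

// Channel statuses that hold a port, taken from playout_channels.status in §1.1.
const ACTIVE_CHANNEL_STATUSES = new Set(['starting', 'running']);

/**
 * Build a role map for the DeckLink ports of a single node.
 * @param {number[]} portIndexes device indexes present on the node (assumed known)
 * @param {Array<{device_index: number, active: boolean}>} recorders hypothetical recorder rows
 * @param {Array<object>} channels playout_channels rows placed on the same node
 * @returns {Map<number, string>} device index -> 'idle' | 'recording' | 'playing-out'
 */
function buildDeviceMap(portIndexes, recorders, channels) {
  const roles = new Map();
  for (const index of portIndexes) roles.set(index, 'idle');

  for (const recorder of recorders) {
    if (recorder.active) roles.set(recorder.device_index, 'recording');
  }

  for (const channel of channels) {
    // NDI / SRT / RTMP channels never claim a hardware port.
    if (channel.output_type !== 'decklink') continue;
    if (!ACTIVE_CHANNEL_STATUSES.has(channel.status)) continue;
    roles.set(channel.output_config.device_index, 'playing-out');
  }
  return roles;
}

/** A DeckLink channel may start only on a port that is known and currently idle. */
function canStartDecklinkChannel(roles, deviceIndex) {
  return roles.get(deviceIndex) === 'idle';
}

// Example: port 1 is recording, port 2 is on air, a stopped channel leaves port 3 idle.
const exampleRoles = buildDeviceMap(
  [1, 2, 3],
  [{ device_index: 1, active: true }],
  [
    { output_type: 'decklink', status: 'running', output_config: { device_index: 2 } },
    { output_type: 'decklink', status: 'stopped', output_config: { device_index: 3 } },
    { output_type: 'srt', status: 'running', output_config: { url: 'srt://example.invalid:9000' } },
  ]
);
console.log(Object.fromEntries(exampleRoles));
console.log('can start on port 3:', canStartDecklinkChannel(exampleRoles, 3));
```

Keeping the check pure makes it easy to unit-test and to reuse for the unified device map. A real version would presumably need to run under whatever lock or transaction claims the port, so that two concurrent starts cannot both see the same port as idle.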