dragonflight/docs/superpowers/specs/2026-05-30-playout-mcr-design.md
Zac 512267159a docs(playout): MCR design spec — Phase A playlist + Phase B 24/7
Single-doc design covering the playout subsystem: CasparCG-backed sidecars,
multi-channel placement, S3→/media staging, scheduling phases, the data
model, channel placement vs port contention.

§7 questions are answered inline (2026-05-30): −23 LUFS at stage time,
1080p5994 default, HLS preview v1, auto-restart-on-healthy-node failover
(DeckLink alert-only).
2026-05-30 14:02:25 +00:00

235 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Wild Dragon MAM — Playout / Master Control (MCR)
**Date:** 2026-05-30 (revised 2026-05-30 — §7 closed)
**Status:** APPROVED — implementation in progress (code drafted but uncommitted; see WORK_LOG_PLAYOUT.md)
**Author:** Zac + Claude
---
## Resolved Decisions
| Question | Decision |
|----------|----------|
| Playout engine | **CasparCG Server** (orchestrated via AMCP), not ffmpeg-native |
| Channel count | **Multi-channel from start** — N independent channels, placed across cluster nodes by capability (mirrors recorders) |
| Scheduling model | **Phased** — Phase A: on-demand playlist player; Phase B: 24/7 continuous channel |
| Output targets | SDI (DeckLink), NDI, SRT, RTMP — all via CasparCG consumers |
| Media source | Assets live in **S3**; must be staged to a CasparCG-local media volume before play (see §4) |
| CasparCG packaging | **Build our own image** (like `capture/build-with-decklink.sh`) — GL context via GPU passthrough or Xvfb; NDI + DeckLink SDKs fetched at build time (not redistributable) |
| Master codec playability | Zac confirms current masters **play fine in CasparCG** — no transcode-for-playout step; staging is a plain S3→/media copy. Validate on hardware but do not gate on it |
| Management UI | **Single Dragonflight `playout.html` GUI** drives everything via AMCP; operator never touches CasparCG directly |
---
## Overview
Playout adds **master-control-room** capability to Dragonflight: take library assets, arrange them on a timeline / playlist, and play them out continuously to a broadcast output — SDI via DeckLink, or stream via SRT / RTMP / NDI. Drag-and-drop scheduling, a program/preview monitor, and as-run logging.
This is the **mirror image** of the existing capture path. Capture is `input → ffmpeg encode → S3`. Playout is `S3 asset → engine → output`. We reuse three things wholesale:
1. **Cluster node + capability model** — nodes already advertise DeckLink/Deltacast/GPU in `cluster_nodes.capabilities`; channels are placed on nodes that have a free output port, exactly as recorders claim input ports.
2. **Sidecar orchestration** — mam-api spawns containers via the local Docker socket or the remote `node-agent /sidecar/start`. A CasparCG channel is just a different sidecar image.
3. **Scheduler tick + PG advisory lock**`src/scheduler.js` already runs a single-leader tick over a schedule table. Phase B's wall-clock channel reuses this pattern.
### Why CasparCG over ffmpeg-native
The capture stack proves we can drive ffmpeg + DeckLink. But playout's hard part is **gapless, frame-accurate, clean transitions between clips** — every clip boundary in an ffmpeg-per-clip model is a black flash unless we engineer a concat-feeder. CasparCG solves this natively: a channel is a persistent output with a playlist, hard/mix/wipe transitions, layered graphics/logo (CG), and DeckLink/NDI/SRT/RTMP consumers built in. We orchestrate it over **AMCP** (its TCP control protocol) instead of reinventing a feeder. Trade: a new dependency + container image, and media must be on a CasparCG-visible disk (§4).
---
## 1. Data Model
New migration `029-playout.sql`. Five tables.
### 1.1 `playout_channels`
A logical output. One channel → one engine instance → one output target.
```
id uuid pk
name text -- "Channel 1", "Pop-up SDI"
node_id uuid -> cluster_nodes(id) -- where the engine runs (null = primary)
output_type text -- 'decklink' | 'ndi' | 'srt' | 'rtmp'
output_config jsonb -- { device_index } | { ndi_name } | { url, latency } | { url, key }
video_format text -- '1080i5994' | '1080p5994' | '720p5994' ...
status text -- 'stopped' | 'starting' | 'running' | 'error'
container_id text -- running CasparCG sidecar
project_id uuid -> projects(id) -- RBAC scoping (nullable = admin-only)
created_at / updated_at
```
`output_type` + `output_config` map straight to a CasparCG consumer:
- `decklink``ADD <ch> DECKLINK <device> ...`
- `ndi``ADD <ch> NDI ...`
- `srt`/`rtmp` → `ADD <ch> FFMPEG <url> -f mpegts ...` (CasparCG 2.3+ ffmpeg consumer)
### 1.2 `playout_playlists`
An ordered list of items bound to a channel. Phase A's primary object.
```
id, channel_id -> playout_channels(id)
name, loop boolean, created_at / updated_at
```
### 1.3 `playout_items`
One entry on a playlist OR one entry on the 24/7 timeline.
```
id
playlist_id uuid -> playout_playlists(id) -- Phase A
asset_id uuid -> assets(id)
sort_order int -- position in playlist (Phase A)
scheduled_at timestamptz -- wall-clock start (Phase B, null in A)
in_point numeric -- seconds, trim head (reuse subclip in/out from editor)
out_point numeric -- seconds, trim tail
transition text -- 'cut' | 'mix' | 'wipe'
transition_ms int
graphics jsonb -- optional CG/template overlay (Phase B+)
media_status text -- 'pending' | 'staging' | 'ready' | 'error' (see §4)
media_path text -- resolved path inside the CasparCG media volume
```
### 1.4 `playout_schedule` (Phase B)
Day-ahead, wall-clock-bound timeline rows. Same shape as `playout_items` but `scheduled_at` is authoritative and the scheduler tick (§5) drives transitions. Phase A can ignore this table.
### 1.5 `playout_as_run`
Append-only log: what actually played, when, for how long. Compliance / billing.
```
id, channel_id, asset_id, item_id
started_at, ended_at, duration_s, result -- 'played' | 'skipped' | 'error'
```
---
## 2. Services & Components
### 2.1 New sidecar: `services/playout/` (CasparCG wrapper)
A thin container: **CasparCG Server** + a small Node control shim exposing HTTP, the same way `capture` wraps ffmpeg.
- Base image: official/community `casparcg/server` (Linux build with DeckLink + NDI + FFmpeg producers/consumers).
- Node shim (`src/index.js`): opens an AMCP TCP socket to local CasparCG, exposes:
- `POST /channel/start``ADD <ch> <consumer>` for the channel's output target
- `POST /play``PLAY <ch>-<layer> <media> [transition]`
- `POST /loadbg` + `/play` → preview/cue then take (preview monitor)
- `POST /stop`, `GET /status``INFO <ch>` (current clip, position, fps)
- playlist load → translate `playout_items` rows into a sequence of AMCP `LOADBG`/`PLAY` calls, advancing on `OnTransition` / end-of-clip events.
- Mirrors capture's status-polling contract so the UI monitor reuses existing plumbing.
### 2.2 mam-api: `src/routes/playout.js`
CRUD + control, RBAC-scoped via the existing `assertProjectAccess` helper (channels carry `project_id`).
```
GET /playout/channels list (project-filtered)
POST /playout/channels create (edit on project)
POST /playout/channels/:id/start|stop spawn/kill CasparCG sidecar
GET /playout/channels/:id/status proxy engine INFO
POST /playout/channels/:id/play|pause|skip transport control
GET/POST/PUT/DELETE /playout/playlists... playlist + item CRUD, reorder
POST /playout/items/:id/stage kick S3→media-volume staging (§4)
GET /playout/channels/:id/asrun as-run log
```
Channel start/stop reuses `resolveNodeTarget()` + the Docker-socket / `node-agent /sidecar/start` split already in `recorders.js`. **Refactor opportunity:** lift that sidecar-spawn logic out of `recorders.js` into `src/orchestration/sidecar.js` so both recorders and playout share it (keep this small — only what both need).
### 2.3 web-ui: `playout.html` + `public/playout.jsx`
New MCR page. Layout:
```
┌─ PREVIEW ───────────┬─ PROGRAM (on air) ──────┐
│ [cued clip] │ [live output] ● ON AIR │
│ TC / duration │ TC / remaining │
│ [CUE] [TAKE] │ [PLAY][PAUSE][SKIP][STOP]│
├─ MEDIA BIN ─────────┴──────────────────────────┤
│ (draggable asset list, reuse asset browser) │
├─ PLAYLIST / TIMELINE ──────────────────────────┤
│ ▸ clip A ──▸ clip B ──▸ clip C (drag-drop) │ Phase A: ordered list
│ └ 24h time grid w/ now-bar │ Phase B: time-of-day grid
└────────────────────────────────────────────────┘
```
- Drag-drop: reuse whatever the NLE editor timeline uses (check `editor.jsx`); assets drag from the bin into the playlist/grid.
- API via existing `ZAMPP_API.fetch` wrapper.
- Program monitor: HLS preview of the output — CasparCG can emit a second low-bitrate FFmpeg consumer to HLS, reusing the `/live/<id>` HLS plumbing capture already uses.
---
## 3. Channel placement & ports
A DeckLink port is exclusive — same constraint capture already handles. A node's DeckLink port can be an **input (recorder)** or an **output (playout channel)**, never both at once. So:
- Extend the capability/port-claim check: when starting a channel on `output_type=decklink`, verify the target node has that device index free (no active recorder, no active channel).
- NDI / SRT / RTMP outputs have no hardware contention → can stack many per node (GPU/CPU-bound only).
- Surface a unified "device map" (extend the existing cluster DeckLink-status endpoint) showing each port's role: idle / recording / playing-out.
---
## 4. Media staging (the S3 ⇄ CasparCG gap)
**The crux.** Assets live in S3 (`original_s3_key` / `proxy_s3_key`). CasparCG plays from a **local media folder**. Options:
- **A — Pre-stage to a shared media volume (recommended).** Before a clip can go on air, download/symlink it from S3 to a CasparCG-visible volume (`/media`), set `playout_items.media_status='ready'` + `media_path`. A new BullMQ `playout-stage` job (reuses the worker pattern) does the pull. UI shows per-item readiness; TAKE is blocked until `ready`. Mirrors the growing-file SMB share already mounted for capture.
- **B — Stream from S3 via presigned URL.** CasparCG FFmpeg producer plays an HTTPS presigned URL directly. Zero staging, but seeking/trim and gapless transitions over network are fragile for broadcast. Acceptable as a fallback for SRT/RTMP, risky for SDI.
**Decision:** Phase A uses **A** (stage proxies for preview, masters for air) with **B** available as a per-channel "low-latency / no-stage" toggle. Zac confirms the current masters play fine in CasparCG, so staging is a **plain S3→/media copy — no transcode-for-playout step**. (Validate on hardware during implementation, but the model does not assume a transcode stage.)
---
## 5. Scheduling
### Phase A — playlist player
No wall clock. Operator builds a `playout_playlists` row, drags items in, hits PLAY. The playout sidecar walks `playout_items` by `sort_order`, cueing the next clip during the current one (`LOADBG`) and taking it at end-of-clip with the configured transition. `loop` repeats. As-run logged per item.
### Phase B — 24/7 continuous channel
Wall-clock timeline in `playout_schedule`. Reuse `src/scheduler.js`:
- Add a second tick (or extend the existing one) under the **same PG advisory lock pattern** — exactly-one-leader, so a multi-replica deploy doesn't double-fire.
- Tick responsibilities: stage upcoming items (look-ahead window), verify the on-air item matches the schedule, **fill gaps** (loop a filler/slate asset when the timeline has a hole — a channel must never go black), roll the day forward.
- As-run becomes the compliance record.
---
## 6. Phasing / Milestones
**Phase A — Playlist playout MVP**
1. Migration `029-playout.sql` (channels, playlists, items, as-run).
2. `services/playout/` sidecar: CasparCG image + AMCP control shim, single output target (start with **SRT or NDI** — no hardware needed for dev; DeckLink behind hardware check).
3. mam-api `routes/playout.js` — channel + playlist CRUD, start/stop, transport, RBAC.
4. `playout-stage` BullMQ job (S3 → /media).
5. web-ui `playout.html` — bin + drag-drop ordered playlist + program/preview monitors + transport.
6. DeckLink output on real hardware; port-contention check vs recorders.
**Phase B — 24/7 continuous channel**
7. `playout_schedule` + time-of-day grid UI.
8. Scheduler tick (advisory-locked) — look-ahead staging, gap-fill/slate, day-roll.
9. As-run reporting view.
10. Graphics/CG overlay (logo bug, lower-thirds) via CasparCG templates.
---
## 7. Open Questions (for review)
**Resolved (2026-05-30):**
- ~~CasparCG packaging~~ → **build our own image.** Fetch DeckLink + NDI SDKs at build time (not redistributable — same as capture's DeckLink build). GL context for the mixer comes from GPU passthrough on a real node, or **Xvfb** (virtual framebuffer) where there's no display — community images run `--privileged` + X11 socket. Pin the NDI SDK version to what the server expects (`.so` version mismatch is the common docker failure).
- ~~Master codec playability~~ → Zac confirms masters **play fine**; no transcode-for-playout. Staging = plain S3→/media copy.
- ~~Management GUI~~ → **single Dragonflight `playout.html`** drives everything via AMCP; operator never touches CasparCG.
- ~~Audio loudness~~ → **pre-normalize at stage time** (Zac, 2026-05-30). `playout-stage` job runs ffmpeg `loudnorm` (EBU R128, target 23 LUFS, true-peak 1 dBTP) once, on the S3→/media copy. Output is the cached version CasparCG plays. Staging is no longer a pure copy — staging cost ≈ realtime CPU per clip on first stage; results are reusable across channels. Override (`media_status='ready'` + raw copy) available for clips already mastered to spec.
- ~~Frame rate~~ → **`1080p5994`** default for new channels (Zac, 2026-05-30). Progressive 1080 @ 59.94 fps. Per-channel override allowed (`video_format` column). Streaming-friendly (SRT/RTMP/NDI) and current SDI gear accepts it; matches capture's 59.94 cadence.
- ~~Preview latency~~ → **HLS v1** (Zac, 2026-05-30). Reuse capture's `/live/<id>` HLS plumbing. CasparCG emits a second low-bitrate FFmpeg consumer to HLS. ~46s lag, fine for confidence monitor. Operator desk gets a real downstream monitor off the SDI/NDI output anyway. Revisit WebRTC if MCR operators complain.
- ~~Failover~~ → **auto-restart on healthy node** (Zac, 2026-05-30). Scheduler tick (§5) monitors `playout_sidecars` health (AMCP ping + container alive); on N missed checks marks the channel `error`, re-places it on another capability-matching node with a free output port, resumes the playlist from the next item after the as-run-logged on-air item. Gap = black/slate for ~530 s during respawn (operator sees a flag in the UI). **DeckLink channels are not auto-failed-over in v1** — device-index pinning makes the destination port non-trivial; v1 alerts and lets the operator move the channel. NDI/SRT/RTMP channels (no hardware contention) failover automatically. Tracked via `restart_count` + `last_restart_at` on `playout_channels`.
**Still open:**
- (none — all §7 questions resolved 2026-05-30)
---
## 8. Reused building blocks (already in the repo)
| Need | Existing piece |
|------|----------------|
| Spawn engine container local/remote | `recorders.js` Docker-socket + `node-agent /sidecar/start` |
| Node capability / port model | `cluster_nodes.capabilities`, cluster DeckLink-status endpoint |
| Single-leader scheduled transitions | `src/scheduler.js` + PG advisory lock |
| Background media jobs | BullMQ worker (`services/worker`) |
| RBAC scoping | `src/auth/authz.js` `assertProjectAccess` (channel/project_id) |
| HLS preview plumbing | capture's `/live/<id>` HLS output |
| Subclip in/out points | NLE editor in/out marking |
| API wrapper / SPA shell | `ZAMPP_API.fetch`, esbuild JSX pages |