dragonflight/docs/superpowers/specs/2026-05-30-playout-mcr-design.md

236 lines
16 KiB
Markdown
Raw Normal View History

# Wild Dragon MAM — Playout / Master Control (MCR)
**Date:** 2026-05-30 (revised 2026-05-30 — §7 closed)
**Status:** APPROVED — implementation in progress (code drafted but uncommitted; see WORK_LOG_PLAYOUT.md)
**Author:** Zac + Claude
---
## Resolved Decisions
| Question | Decision |
|----------|----------|
| Playout engine | **CasparCG Server** (orchestrated via AMCP), not ffmpeg-native |
| Channel count | **Multi-channel from start** — N independent channels, placed across cluster nodes by capability (mirrors recorders) |
| Scheduling model | **Phased** — Phase A: on-demand playlist player; Phase B: 24/7 continuous channel |
| Output targets | SDI (DeckLink), NDI, SRT, RTMP — all via CasparCG consumers |
| Media source | Assets live in **S3**; must be staged to a CasparCG-local media volume before play (see §4) |
| CasparCG packaging | **Build our own image** (like `capture/build-with-decklink.sh`) — GL context via GPU passthrough or Xvfb; NDI + DeckLink SDKs fetched at build time (not redistributable) |
| Master codec playability | Zac confirms current masters **play fine in CasparCG** — no transcode-for-playout step; staging is a plain S3→/media copy. Validate on hardware but do not gate on it |
| Management UI | **Single Dragonflight `playout.html` GUI** drives everything via AMCP; operator never touches CasparCG directly |
---
## Overview
Playout adds **master-control-room** capability to Dragonflight: take library assets, arrange them on a timeline / playlist, and play them out continuously to a broadcast output — SDI via DeckLink, or stream via SRT / RTMP / NDI. Drag-and-drop scheduling, a program/preview monitor, and as-run logging.
This is the **mirror image** of the existing capture path. Capture is `input → ffmpeg encode → S3`. Playout is `S3 asset → engine → output`. We reuse three things wholesale:
1. **Cluster node + capability model** — nodes already advertise DeckLink/Deltacast/GPU in `cluster_nodes.capabilities`; channels are placed on nodes that have a free output port, exactly as recorders claim input ports.
2. **Sidecar orchestration** — mam-api spawns containers via the local Docker socket or the remote `node-agent /sidecar/start`. A CasparCG channel is just a different sidecar image.
3. **Scheduler tick + PG advisory lock**`src/scheduler.js` already runs a single-leader tick over a schedule table. Phase B's wall-clock channel reuses this pattern.
### Why CasparCG over ffmpeg-native
The capture stack proves we can drive ffmpeg + DeckLink. But playout's hard part is **gapless, frame-accurate, clean transitions between clips** — every clip boundary in an ffmpeg-per-clip model is a black flash unless we engineer a concat-feeder. CasparCG solves this natively: a channel is a persistent output with a playlist, hard/mix/wipe transitions, layered graphics/logo (CG), and DeckLink/NDI/SRT/RTMP consumers built in. We orchestrate it over **AMCP** (its TCP control protocol) instead of reinventing a feeder. Trade: a new dependency + container image, and media must be on a CasparCG-visible disk (§4).
---
## 1. Data Model
New migration `029-playout.sql`. Five tables.
### 1.1 `playout_channels`
A logical output. One channel → one engine instance → one output target.
```
id uuid pk
name text -- "Channel 1", "Pop-up SDI"
node_id uuid -> cluster_nodes(id) -- where the engine runs (null = primary)
output_type text -- 'decklink' | 'ndi' | 'srt' | 'rtmp'
output_config jsonb -- { device_index } | { ndi_name } | { url, latency } | { url, key }
video_format text -- '1080i5994' | '1080p5994' | '720p5994' ...
status text -- 'stopped' | 'starting' | 'running' | 'error'
container_id text -- running CasparCG sidecar
project_id uuid -> projects(id) -- RBAC scoping (nullable = admin-only)
created_at / updated_at
```
`output_type` + `output_config` map straight to a CasparCG consumer:
- `decklink``ADD <ch> DECKLINK <device> ...`
- `ndi``ADD <ch> NDI ...`
- `srt`/`rtmp` → `ADD <ch> FFMPEG <url> -f mpegts ...` (CasparCG 2.3+ ffmpeg consumer)
### 1.2 `playout_playlists`
An ordered list of items bound to a channel. Phase A's primary object.
```
id, channel_id -> playout_channels(id)
name, loop boolean, created_at / updated_at
```
### 1.3 `playout_items`
One entry on a playlist OR one entry on the 24/7 timeline.
```
id
playlist_id uuid -> playout_playlists(id) -- Phase A
asset_id uuid -> assets(id)
sort_order int -- position in playlist (Phase A)
scheduled_at timestamptz -- wall-clock start (Phase B, null in A)
in_point numeric -- seconds, trim head (reuse subclip in/out from editor)
out_point numeric -- seconds, trim tail
transition text -- 'cut' | 'mix' | 'wipe'
transition_ms int
graphics jsonb -- optional CG/template overlay (Phase B+)
media_status text -- 'pending' | 'staging' | 'ready' | 'error' (see §4)
media_path text -- resolved path inside the CasparCG media volume
```
### 1.4 `playout_schedule` (Phase B)
Day-ahead, wall-clock-bound timeline rows. Same shape as `playout_items` but `scheduled_at` is authoritative and the scheduler tick (§5) drives transitions. Phase A can ignore this table.
### 1.5 `playout_as_run`
Append-only log: what actually played, when, for how long. Compliance / billing.
```
id, channel_id, asset_id, item_id
started_at, ended_at, duration_s, result -- 'played' | 'skipped' | 'error'
```
---
## 2. Services & Components
### 2.1 New sidecar: `services/playout/` (CasparCG wrapper)
A thin container: **CasparCG Server** + a small Node control shim exposing HTTP, the same way `capture` wraps ffmpeg.
- Base image: official/community `casparcg/server` (Linux build with DeckLink + NDI + FFmpeg producers/consumers).
- Node shim (`src/index.js`): opens an AMCP TCP socket to local CasparCG, exposes:
- `POST /channel/start``ADD <ch> <consumer>` for the channel's output target
- `POST /play``PLAY <ch>-<layer> <media> [transition]`
- `POST /loadbg` + `/play` → preview/cue then take (preview monitor)
- `POST /stop`, `GET /status``INFO <ch>` (current clip, position, fps)
- playlist load → translate `playout_items` rows into a sequence of AMCP `LOADBG`/`PLAY` calls, advancing on `OnTransition` / end-of-clip events.
- Mirrors capture's status-polling contract so the UI monitor reuses existing plumbing.
### 2.2 mam-api: `src/routes/playout.js`
CRUD + control, RBAC-scoped via the existing `assertProjectAccess` helper (channels carry `project_id`).
```
GET /playout/channels list (project-filtered)
POST /playout/channels create (edit on project)
POST /playout/channels/:id/start|stop spawn/kill CasparCG sidecar
GET /playout/channels/:id/status proxy engine INFO
POST /playout/channels/:id/play|pause|skip transport control
GET/POST/PUT/DELETE /playout/playlists... playlist + item CRUD, reorder
POST /playout/items/:id/stage kick S3→media-volume staging (§4)
GET /playout/channels/:id/asrun as-run log
```
Channel start/stop reuses `resolveNodeTarget()` + the Docker-socket / `node-agent /sidecar/start` split already in `recorders.js`. **Refactor opportunity:** lift that sidecar-spawn logic out of `recorders.js` into `src/orchestration/sidecar.js` so both recorders and playout share it (keep this small — only what both need).
### 2.3 web-ui: `playout.html` + `public/playout.jsx`
New MCR page. Layout:
```
┌─ PREVIEW ───────────┬─ PROGRAM (on air) ──────┐
│ [cued clip] │ [live output] ● ON AIR │
│ TC / duration │ TC / remaining │
│ [CUE] [TAKE] │ [PLAY][PAUSE][SKIP][STOP]│
├─ MEDIA BIN ─────────┴──────────────────────────┤
│ (draggable asset list, reuse asset browser) │
├─ PLAYLIST / TIMELINE ──────────────────────────┤
│ ▸ clip A ──▸ clip B ──▸ clip C (drag-drop) │ Phase A: ordered list
│ └ 24h time grid w/ now-bar │ Phase B: time-of-day grid
└────────────────────────────────────────────────┘
```
- Drag-drop: reuse whatever the NLE editor timeline uses (check `editor.jsx`); assets drag from the bin into the playlist/grid.
- API via existing `ZAMPP_API.fetch` wrapper.
- Program monitor: HLS preview of the output — CasparCG can emit a second low-bitrate FFmpeg consumer to HLS, reusing the `/live/<id>` HLS plumbing capture already uses.
---
## 3. Channel placement & ports
A DeckLink port is exclusive — same constraint capture already handles. A node's DeckLink port can be an **input (recorder)** or an **output (playout channel)**, never both at once. So:
- Extend the capability/port-claim check: when starting a channel on `output_type=decklink`, verify the target node has that device index free (no active recorder, no active channel).
- NDI / SRT / RTMP outputs have no hardware contention → can stack many per node (GPU/CPU-bound only).
- Surface a unified "device map" (extend the existing cluster DeckLink-status endpoint) showing each port's role: idle / recording / playing-out.
---
## 4. Media staging (the S3 ⇄ CasparCG gap)
**The crux.** Assets live in S3 (`original_s3_key` / `proxy_s3_key`). CasparCG plays from a **local media folder**. Options:
- **A — Pre-stage to a shared media volume (recommended).** Before a clip can go on air, download/symlink it from S3 to a CasparCG-visible volume (`/media`), set `playout_items.media_status='ready'` + `media_path`. A new BullMQ `playout-stage` job (reuses the worker pattern) does the pull. UI shows per-item readiness; TAKE is blocked until `ready`. Mirrors the growing-file SMB share already mounted for capture.
- **B — Stream from S3 via presigned URL.** CasparCG FFmpeg producer plays an HTTPS presigned URL directly. Zero staging, but seeking/trim and gapless transitions over network are fragile for broadcast. Acceptable as a fallback for SRT/RTMP, risky for SDI.
**Decision:** Phase A uses **A** (stage proxies for preview, masters for air) with **B** available as a per-channel "low-latency / no-stage" toggle. Zac confirms the current masters play fine in CasparCG, so staging is a **plain S3→/media copy — no transcode-for-playout step**. (Validate on hardware during implementation, but the model does not assume a transcode stage.)
---
## 5. Scheduling
### Phase A — playlist player
No wall clock. Operator builds a `playout_playlists` row, drags items in, hits PLAY. The playout sidecar walks `playout_items` by `sort_order`, cueing the next clip during the current one (`LOADBG`) and taking it at end-of-clip with the configured transition. `loop` repeats. As-run logged per item.
### Phase B — 24/7 continuous channel
Wall-clock timeline in `playout_schedule`. Reuse `src/scheduler.js`:
- Add a second tick (or extend the existing one) under the **same PG advisory lock pattern** — exactly-one-leader, so a multi-replica deploy doesn't double-fire.
- Tick responsibilities: stage upcoming items (look-ahead window), verify the on-air item matches the schedule, **fill gaps** (loop a filler/slate asset when the timeline has a hole — a channel must never go black), roll the day forward.
- As-run becomes the compliance record.
---
## 6. Phasing / Milestones
**Phase A — Playlist playout MVP**
1. Migration `029-playout.sql` (channels, playlists, items, as-run).
2. `services/playout/` sidecar: CasparCG image + AMCP control shim, single output target (start with **SRT or NDI** — no hardware needed for dev; DeckLink behind hardware check).
3. mam-api `routes/playout.js` — channel + playlist CRUD, start/stop, transport, RBAC.
4. `playout-stage` BullMQ job (S3 → /media).
5. web-ui `playout.html` — bin + drag-drop ordered playlist + program/preview monitors + transport.
6. DeckLink output on real hardware; port-contention check vs recorders.
**Phase B — 24/7 continuous channel**
7. `playout_schedule` + time-of-day grid UI.
8. Scheduler tick (advisory-locked) — look-ahead staging, gap-fill/slate, day-roll.
9. As-run reporting view.
10. Graphics/CG overlay (logo bug, lower-thirds) via CasparCG templates.
---
## 7. Open Questions (for review)
**Resolved (2026-05-30):**
- ~~CasparCG packaging~~ → **build our own image.** Fetch DeckLink + NDI SDKs at build time (not redistributable — same as capture's DeckLink build). GL context for the mixer comes from GPU passthrough on a real node, or **Xvfb** (virtual framebuffer) where there's no display — community images run `--privileged` + X11 socket. Pin the NDI SDK version to what the server expects (`.so` version mismatch is the common docker failure).
- ~~Master codec playability~~ → Zac confirms masters **play fine**; no transcode-for-playout. Staging = plain S3→/media copy.
- ~~Management GUI~~ → **single Dragonflight `playout.html`** drives everything via AMCP; operator never touches CasparCG.
- ~~Audio loudness~~ → **pre-normalize at stage time** (Zac, 2026-05-30). `playout-stage` job runs ffmpeg `loudnorm` (EBU R128, target 23 LUFS, true-peak 1 dBTP) once, on the S3→/media copy. Output is the cached version CasparCG plays. Staging is no longer a pure copy — staging cost ≈ realtime CPU per clip on first stage; results are reusable across channels. Override (`media_status='ready'` + raw copy) available for clips already mastered to spec.
- ~~Frame rate~~ → **`1080p5994`** default for new channels (Zac, 2026-05-30). Progressive 1080 @ 59.94 fps. Per-channel override allowed (`video_format` column). Streaming-friendly (SRT/RTMP/NDI) and current SDI gear accepts it; matches capture's 59.94 cadence.
- ~~Preview latency~~ → **HLS v1** (Zac, 2026-05-30). Reuse capture's `/live/<id>` HLS plumbing. CasparCG emits a second low-bitrate FFmpeg consumer to HLS. ~46s lag, fine for confidence monitor. Operator desk gets a real downstream monitor off the SDI/NDI output anyway. Revisit WebRTC if MCR operators complain.
- ~~Failover~~ → **auto-restart on healthy node** (Zac, 2026-05-30). Scheduler tick (§5) monitors `playout_sidecars` health (AMCP ping + container alive); on N missed checks marks the channel `error`, re-places it on another capability-matching node with a free output port, resumes the playlist from the next item after the as-run-logged on-air item. Gap = black/slate for ~530 s during respawn (operator sees a flag in the UI). **DeckLink channels are not auto-failed-over in v1** — device-index pinning makes the destination port non-trivial; v1 alerts and lets the operator move the channel. NDI/SRT/RTMP channels (no hardware contention) failover automatically. Tracked via `restart_count` + `last_restart_at` on `playout_channels`.
**Still open:**
- (none — all §7 questions resolved 2026-05-30)
---
## 8. Reused building blocks (already in the repo)
| Need | Existing piece |
|------|----------------|
| Spawn engine container local/remote | `recorders.js` Docker-socket + `node-agent /sidecar/start` |
| Node capability / port model | `cluster_nodes.capabilities`, cluster DeckLink-status endpoint |
| Single-leader scheduled transitions | `src/scheduler.js` + PG advisory lock |
| Background media jobs | BullMQ worker (`services/worker`) |
| RBAC scoping | `src/auth/authz.js` `assertProjectAccess` (channel/project_id) |
| HLS preview plumbing | capture's `/live/<id>` HLS output |
| Subclip in/out points | NLE editor in/out marking |
| API wrapper / SPA shell | `ZAMPP_API.fetch`, esbuild JSX pages |