dragonflight/docs/superpowers/specs/2026-05-30-playout-mcr-design.md
Zac 512267159a docs(playout): MCR design spec — Phase A playlist + Phase B 24/7
Single-doc design covering the playout subsystem: CasparCG-backed sidecars,
multi-channel placement, S3→/media staging, scheduling phases, the data
model, channel placement vs port contention.

§7 questions are answered inline (2026-05-30): −23 LUFS at stage time,
1080p5994 default, HLS preview v1, auto-restart-on-healthy-node failover
(DeckLink alert-only).
2026-05-30 14:02:25 +00:00


Wild Dragon MAM — Playout / Master Control (MCR)

Date: 2026-05-30 (revised 2026-05-30, §7 closed)
Status: APPROVED, implementation in progress (code drafted but uncommitted; see WORK_LOG_PLAYOUT.md)
Author: Zac + Claude


Resolved Decisions

| Question | Decision |
|---|---|
| Playout engine | CasparCG Server (orchestrated via AMCP), not ffmpeg-native |
| Channel count | Multi-channel from start: N independent channels, placed across cluster nodes by capability (mirrors recorders) |
| Scheduling model | Phased. Phase A: on-demand playlist player; Phase B: 24/7 continuous channel |
| Output targets | SDI (DeckLink), NDI, SRT, RTMP, all via CasparCG consumers |
| Media source | Assets live in S3; must be staged to a CasparCG-local media volume before play (see §4) |
| CasparCG packaging | Build our own image (like capture/build-with-decklink.sh); GL context via GPU passthrough or Xvfb; NDI + DeckLink SDKs fetched at build time (not redistributable) |
| Master codec playability | Zac confirms current masters play fine in CasparCG, so there is no transcode-for-playout step; staging is a plain S3→/media copy. Validate on hardware but do not gate on it |
| Management UI | Single Dragonflight playout.html GUI drives everything via AMCP; operator never touches CasparCG directly |

Overview

Playout adds master-control-room capability to Dragonflight: take library assets, arrange them on a timeline / playlist, and play them out continuously to a broadcast output — SDI via DeckLink, or stream via SRT / RTMP / NDI. Drag-and-drop scheduling, a program/preview monitor, and as-run logging.

This is the mirror image of the existing capture path. Capture is input → ffmpeg encode → S3. Playout is S3 asset → engine → output. We reuse three things wholesale:

  1. Cluster node + capability model — nodes already advertise DeckLink/Deltacast/GPU in cluster_nodes.capabilities; channels are placed on nodes that have a free output port, exactly as recorders claim input ports.
  2. Sidecar orchestration — mam-api spawns containers via the local Docker socket or the remote node-agent /sidecar/start. A CasparCG channel is just a different sidecar image.
  3. Scheduler tick + PG advisory lock: src/scheduler.js already runs a single-leader tick over a schedule table. Phase B's wall-clock channel reuses this pattern.
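
The single-leader pattern from item 3 can be sketched roughly as follows. This is not the contents of src/scheduler.js, which this document does not show; `tickIfLeader`, the lock key, and the injected `client` (any object with a pg-style `query(text, params)` method) are illustrative assumptions. `pg_try_advisory_lock` and `pg_advisory_unlock` are standard PostgreSQL functions.

```javascript
// Hypothetical single-leader guard around a scheduler tick.
// Session-level advisory locks belong to one connection, so the same
// dedicated client must be used for both the lock and the unlock.
async function tickIfLeader(client, lockKey, tick) {
  const { rows } = await client.query(
    'SELECT pg_try_advisory_lock($1) AS locked',
    [lockKey]
  );
  if (!rows[0].locked) {
    return false; // another replica holds the lock and runs this tick
  }
  try {
    await tick();
  } finally {
    // Always release, even when the tick throws.
    await client.query('SELECT pg_advisory_unlock($1)', [lockKey]);
  }
  return true;
}
```

A replica that gets `false` back simply skips the tick and tries again on the next interval, which is what keeps a multi-replica deploy from double-firing.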

Why CasparCG over ffmpeg-native

The capture stack proves we can drive ffmpeg + DeckLink. But playout's hard part is gapless, frame-accurate, clean transitions between clips — every clip boundary in an ffmpeg-per-clip model is a black flash unless we engineer a concat-feeder. CasparCG solves this natively: a channel is a persistent output with a playlist, hard/mix/wipe transitions, layered graphics/logo (CG), and DeckLink/NDI/SRT/RTMP consumers built in. We orchestrate it over AMCP (its TCP control protocol) instead of reinventing a feeder. Trade: a new dependency + container image, and media must be on a CasparCG-visible disk (§4).


1. Data Model

New migration 029-playout.sql. Five tables.

1.1 playout_channels

A logical output. One channel → one engine instance → one output target.

id              uuid pk
name            text                 -- "Channel 1", "Pop-up SDI"
node_id         uuid -> cluster_nodes(id)   -- where the engine runs (null = primary)
output_type     text   -- 'decklink' | 'ndi' | 'srt' | 'rtmp'
output_config   jsonb  -- { device_index } | { ndi_name } | { url, latency } | { url, key }
video_format    text   -- '1080i5994' | '1080p5994' | '720p5994' ...
status          text   -- 'stopped' | 'starting' | 'running' | 'error'
container_id    text                 -- running CasparCG sidecar
project_id      uuid -> projects(id) -- RBAC scoping (nullable = admin-only)
created_at / updated_at

output_type + output_config map straight to a CasparCG consumer:

  • decklink → ADD <ch> DECKLINK <device> ...
  • ndi → ADD <ch> NDI ...
  • srt/rtmp → ADD <ch> FFMPEG <url> -f mpegts ... (CasparCG 2.3+ ffmpeg consumer)
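
As a rough illustration of that mapping, the shim could build the ADD command from a channel row along these lines. This is a sketch only: `consumerCommand` is a hypothetical name, the command shapes follow the three templates above, and the trailing parameters (the NDI `NAME` argument, appending the RTMP key to the URL, the `-f flv` muxer for RTMP) are assumptions to verify against the CasparCG build in the image. SRT latency handling is left out.

```javascript
// Hypothetical helper: turn a playout_channels row into an AMCP ADD command.
// Command shapes follow the spec's templates; parameters are assumptions.
function consumerCommand(channel, amcpChannel = 1) {
  const cfg = channel.output_config || {};
  switch (channel.output_type) {
    case 'decklink':
      if (!Number.isInteger(cfg.device_index)) {
        throw new Error('decklink output needs an integer device_index');
      }
      return `ADD ${amcpChannel} DECKLINK ${cfg.device_index}`;
    case 'ndi':
      // Assumed: the NDI source name is passed as NAME "<name>".
      return cfg.ndi_name
        ? `ADD ${amcpChannel} NDI NAME "${cfg.ndi_name}"`
        : `ADD ${amcpChannel} NDI`;
    case 'srt':
      if (!cfg.url) throw new Error('srt output needs a url');
      return `ADD ${amcpChannel} FFMPEG ${cfg.url} -f mpegts`;
    case 'rtmp': {
      if (!cfg.url) throw new Error('rtmp output needs a url');
      // Assumed: the stream key is appended to the ingest URL as a path segment.
      const target = cfg.key ? `${cfg.url.replace(/\/$/, '')}/${cfg.key}` : cfg.url;
      // RTMP carries FLV, so this sketch uses -f flv rather than mpegts.
      return `ADD ${amcpChannel} FFMPEG ${target} -f flv`;
    }
    default:
      throw new Error(`unknown output_type: ${channel.output_type}`);
  }
}
```

Keeping this translation in one pure function means the output_config shapes listed in the table definition are validated in a single place before anything is sent over AMCP.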

1.2 playout_playlists

An ordered list of items bound to a channel. Phase A's primary object.

id, channel_id -> playout_channels(id)
name, loop boolean, created_at / updated_at

1.3 playout_items

One entry on a playlist OR one entry on the 24/7 timeline.

id
playlist_id   uuid -> playout_playlists(id)   -- Phase A
asset_id      uuid -> assets(id)
sort_order    int                              -- position in playlist (Phase A)
scheduled_at  timestamptz                      -- wall-clock start (Phase B, null in A)
in_point      numeric   -- seconds, trim head (reuse subclip in/out from editor)
out_point     numeric   -- seconds, trim tail
transition    text      -- 'cut' | 'mix' | 'wipe'
transition_ms int
graphics      jsonb     -- optional CG/template overlay (Phase B+)
media_status  text      -- 'pending' | 'staging' | 'ready' | 'error'  (see §4)
media_path    text      -- resolved path inside the CasparCG media volume

1.4 playout_schedule (Phase B)

Day-ahead, wall-clock-bound timeline rows. Same shape as playout_items but scheduled_at is authoritative and the scheduler tick (§5) drives transitions. Phase A can ignore this table.

1.5 playout_as_run

Append-only log: what actually played, when, for how long. Compliance / billing.

id, channel_id, asset_id, item_id
started_at, ended_at, duration_s, result  -- 'played' | 'skipped' | 'error'

2. Services & Components

2.1 New sidecar: services/playout/ (CasparCG wrapper)

A thin container: CasparCG Server + a small Node control shim exposing HTTP, the same way capture wraps ffmpeg.

  • Base image: official/community casparcg/server (Linux build with DeckLink + NDI + FFmpeg producers/consumers).
  • Node shim (src/index.js): opens an AMCP TCP socket to local CasparCG, exposes:
    • POST /channel/start → ADD <ch> <consumer> for the channel's output target
    • POST /play → PLAY <ch>-<layer> <media> [transition]
    • POST /loadbg + /play → preview/cue then take (preview monitor)
    • POST /stop, GET /status → INFO <ch> (current clip, position, fps)
    • playlist load → translate playout_items rows into a sequence of AMCP LOADBG/PLAY calls, advancing on OnTransition / end-of-clip events.
  • Mirrors capture's status-polling contract so the UI monitor reuses existing plumbing.
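
The playlist-load step above (rows in, LOADBG/PLAY calls out) might look something like this for a single item. It is a sketch under assumptions: `itemToLoadbg` and the layer number are hypothetical, and it assumes SEEK, LENGTH and the transition duration are given in frames, which should be checked against the AMCP reference for the deployed CasparCG version.

```javascript
// Hypothetical translation of one playout_items row into an AMCP LOADBG line.
// Assumes SEEK/LENGTH and the transition duration are expressed in frames.
const FPS_5994 = 60000 / 1001; // 59.94 fps cadence used by 1080p5994

function itemToLoadbg(item, { channel = 1, layer = 10, fps = FPS_5994 } = {}) {
  const toFrames = (seconds) => Math.round(seconds * fps);
  const parts = [`LOADBG ${channel}-${layer}`, `"${item.media_path}"`];

  // 'cut' is the default transition, so only mix/wipe are spelled out.
  if (item.transition && item.transition !== 'cut') {
    const frames = toFrames((item.transition_ms || 0) / 1000);
    parts.push(item.transition.toUpperCase(), String(frames));
  }
  if (item.in_point != null) {
    parts.push('SEEK', String(toFrames(item.in_point)));
  }
  if (item.out_point != null) {
    // LENGTH is measured from the in point, not from the start of the file.
    const start = toFrames(item.in_point || 0);
    parts.push('LENGTH', String(toFrames(item.out_point) - start));
  }
  return parts.join(' ');
}

// Example: a clip trimmed to 10 s..20 s with a 500 ms mix.
console.log(itemToLoadbg({
  media_path: 'promo/clip_a',
  in_point: 10,
  out_point: 20,
  transition: 'mix',
  transition_ms: 500,
}));
// prints: LOADBG 1-10 "promo/clip_a" MIX 30 SEEK 599 LENGTH 600
```

The seconds-to-frames rounding is the part worth testing on real media, since in_point and out_point are stored in seconds while the engine counts frames.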

2.2 mam-api: src/routes/playout.js

CRUD + control, RBAC-scoped via the existing assertProjectAccess helper (channels carry project_id).

GET    /playout/channels                     list (project-filtered)
POST   /playout/channels                     create (edit on project)
POST   /playout/channels/:id/start|stop      spawn/kill CasparCG sidecar
GET    /playout/channels/:id/status          proxy engine INFO
POST   /playout/channels/:id/play|pause|skip transport control
GET/POST/PUT/DELETE /playout/playlists...    playlist + item CRUD, reorder
POST   /playout/items/:id/stage              kick S3→media-volume staging (§4)
GET    /playout/channels/:id/asrun           as-run log

Channel start/stop reuses resolveNodeTarget() + the Docker-socket / node-agent /sidecar/start split already in recorders.js. Refactor opportunity: lift that sidecar-spawn logic out of recorders.js into src/orchestration/sidecar.js so both recorders and playout share it (keep this small — only what both need).

2.3 web-ui: playout.html + public/playout.jsx

New MCR page. Layout:

┌─ PREVIEW ───────────┬─ PROGRAM (on air) ──────┐
│  [cued clip]        │  [live output] ● ON AIR  │
│  TC / duration      │  TC / remaining          │
│  [CUE] [TAKE]       │  [PLAY][PAUSE][SKIP][STOP]│
├─ MEDIA BIN ─────────┴──────────────────────────┤
│  (draggable asset list, reuse asset browser)   │
├─ PLAYLIST / TIMELINE ──────────────────────────┤
│  ▸ clip A ──▸ clip B ──▸ clip C   (drag-drop)  │  Phase A: ordered list
│  └ 24h time grid w/ now-bar                     │  Phase B: time-of-day grid
└────────────────────────────────────────────────┘
  • Drag-drop: reuse whatever the NLE editor timeline uses (check editor.jsx); assets drag from the bin into the playlist/grid.
  • API via existing ZAMPP_API.fetch wrapper.
  • Program monitor: HLS preview of the output — CasparCG can emit a second low-bitrate FFmpeg consumer to HLS, reusing the /live/<id> HLS plumbing capture already uses.

3. Channel placement & ports

A DeckLink port is exclusive — same constraint capture already handles. A node's DeckLink port can be an input (recorder) or an output (playout channel), never both at once. So:

  • Extend the capability/port-claim check: when starting a channel on output_type=decklink, verify the target node has that device index free (no active recorder, no active channel).
  • NDI / SRT / RTMP outputs have no hardware contention → can stack many per node (GPU/CPU-bound only).
  • Surface a unified "device map" (extend the existing cluster DeckLink-status endpoint) showing each port's role: idle / recording / playing-out.

4. Media staging (the S3 ⇄ CasparCG gap)

The crux. Assets live in S3 (original_s3_key / proxy_s3_key). CasparCG plays from a local media folder. Options:

  • A — Pre-stage to a shared media volume (recommended). Before a clip can go on air, download/symlink it from S3 to a CasparCG-visible volume (/media), set playout_items.media_status='ready' + media_path. A new BullMQ playout-stage job (reuses the worker pattern) does the pull. UI shows per-item readiness; TAKE is blocked until ready. Mirrors the growing-file SMB share already mounted for capture.
  • B — Stream from S3 via presigned URL. CasparCG FFmpeg producer plays an HTTPS presigned URL directly. Zero staging, but seeking/trim and gapless transitions over network are fragile for broadcast. Acceptable as a fallback for SRT/RTMP, risky for SDI.

Decision: Phase A uses A (stage proxies for preview, masters for air) with B available as a per-channel "low-latency / no-stage" toggle. Zac confirms the current masters play fine in CasparCG, so staging needs no transcode-for-playout step: it is an S3→/media copy plus the one-time loudness-normalization pass decided in §7. (Validate on hardware during implementation, but the model does not assume a transcode stage.)
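
To make the stage job concrete: §7 settles on loudness normalization at stage time, so the job is either a raw copy (the override case) or one ffmpeg pass. The sketch below only builds the ffmpeg argument list and the TAKE gate. `buildStageArgs` and `canTake` are hypothetical names, and copying the video stream untouched, the audio codec, and the 48 kHz resample are assumptions of this sketch. The `loudnorm` filter and its `I` / `TP` options are real ffmpeg options.

```javascript
// Hypothetical argument builder for the playout-stage job.
// Returns null when the caller should do a plain file copy instead.
function buildStageArgs({ src, dst, normalize = true, audioCodec = 'pcm_s24le' }) {
  if (!normalize) return null; // clip already mastered to spec: raw copy
  return [
    '-y',
    '-i', src,
    '-c:v', 'copy',                 // assumed: video is passed through untouched
    '-af', 'loudnorm=I=-23:TP=-1',  // EBU R128 target from §7
    '-ar', '48000',                 // loudnorm may upsample internally; pin the output rate
    '-c:a', audioCodec,             // assumed codec; depends on the master container
    dst,
  ];
}

// TAKE is blocked until the item has been staged.
function canTake(item) {
  return item.media_status === 'ready';
}
```

The worker would pass the returned array to ffmpeg (for example via `child_process.spawn('ffmpeg', args)`) and set `media_status` to `ready` or `error` based on the exit code.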


5. Scheduling

Phase A — playlist player

No wall clock. Operator builds a playout_playlists row, drags items in, hits PLAY. The playout sidecar walks playout_items by sort_order, cueing the next clip during the current one (LOADBG) and taking it at end-of-clip with the configured transition. loop repeats. As-run logged per item.
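
The walk over sort_order, including the loop flag, reduces to a small pure function. This is a minimal sketch with a hypothetical name (`nextItem`); it does not model readiness checks or as-run logging.

```javascript
// Hypothetical playlist walker: pick the item to cue after `currentId`.
function nextItem(items, currentId, loop = false) {
  const ordered = [...items].sort((a, b) => a.sort_order - b.sort_order);
  if (ordered.length === 0) return null;

  const idx = ordered.findIndex((it) => it.id === currentId);
  if (idx === -1) return ordered[0];                 // nothing on air yet: start at the top
  if (idx + 1 < ordered.length) return ordered[idx + 1];
  return loop ? ordered[0] : null;                   // end of list: wrap or stop
}
```

The sidecar would call this while the current clip is playing, issue LOADBG for the result, and stop the channel cleanly when it returns null.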

Phase B — 24/7 continuous channel

Wall-clock timeline in playout_schedule. Reuse src/scheduler.js:

  • Add a second tick (or extend the existing one) under the same PG advisory lock pattern — exactly-one-leader, so a multi-replica deploy doesn't double-fire.
  • Tick responsibilities: stage upcoming items (look-ahead window), verify the on-air item matches the schedule, fill gaps (loop a filler/slate asset when the timeline has a hole — a channel must never go black), roll the day forward.
  • As-run becomes the compliance record.
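
The tick responsibilities above can be separated into a pure planning step and the side effects that act on it. The sketch below covers only the planning step. `planTick`, the 10-minute look-ahead default, and the `duration_s` field on schedule rows are assumptions, since the spec does not list the playout_schedule columns in full.

```javascript
// Hypothetical pure core of the Phase B tick: decide what should be on air
// right now and which upcoming rows still need staging.
function planTick(schedule, now, { lookAheadMs = 10 * 60 * 1000, filler = null } = {}) {
  const t = now.getTime();
  const rows = [...schedule].sort((a, b) => a.scheduled_at - b.scheduled_at);

  // The scheduled row whose [start, start + duration) window contains `now`.
  const scheduled = rows.find((r) => {
    const start = r.scheduled_at.getTime();
    return start <= t && t < start + r.duration_s * 1000;
  });

  // Rows starting inside the look-ahead window that have not been staged yet.
  const toStage = rows.filter((r) => {
    const start = r.scheduled_at.getTime();
    return start > t && start <= t + lookAheadMs && r.media_status === 'pending';
  });

  return {
    onAir: scheduled || filler, // a hole in the timeline falls back to the slate
    gapFill: !scheduled,
    toStage,
  };
}
```

The leader's tick would compare `onAir` with what the engine reports via INFO, enqueue playout-stage jobs for `toStage`, and loop the filler asset whenever `gapFill` is true.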

6. Phasing / Milestones

Phase A — Playlist playout MVP

  1. Migration 029-playout.sql (channels, playlists, items, as-run).
  2. services/playout/ sidecar: CasparCG image + AMCP control shim, single output target (start with SRT or NDI — no hardware needed for dev; DeckLink behind hardware check).
  3. mam-api routes/playout.js — channel + playlist CRUD, start/stop, transport, RBAC.
  4. playout-stage BullMQ job (S3 → /media).
  5. web-ui playout.html — bin + drag-drop ordered playlist + program/preview monitors + transport.
  6. DeckLink output on real hardware; port-contention check vs recorders.

Phase B — 24/7 continuous channel

  7. playout_schedule + time-of-day grid UI.
  8. Scheduler tick (advisory-locked): look-ahead staging, gap-fill/slate, day-roll.
  9. As-run reporting view.
  10. Graphics/CG overlay (logo bug, lower-thirds) via CasparCG templates.


7. Open Questions (for review)

Resolved (2026-05-30):

  • CasparCG packaging → build our own image. Fetch DeckLink + NDI SDKs at build time (not redistributable, same as capture's DeckLink build). GL context for the mixer comes from GPU passthrough on a real node, or Xvfb (virtual framebuffer) where there's no display; community images run --privileged + X11 socket. Pin the NDI SDK version to what the server expects (.so version mismatch is the common docker failure).
  • Master codec playability → Zac confirms masters play fine; no transcode-for-playout. Staging = plain S3→/media copy.
  • Management GUI → single Dragonflight playout.html drives everything via AMCP; operator never touches CasparCG.
  • Audio loudness → pre-normalize at stage time (Zac, 2026-05-30). playout-stage job runs ffmpeg loudnorm (EBU R128, target −23 LUFS, true-peak −1 dBTP) once, on the S3→/media copy. Output is the cached version CasparCG plays. Staging is no longer a pure copy: staging cost ≈ realtime CPU per clip on first stage; results are reusable across channels. Override (media_status='ready' + raw copy) available for clips already mastered to spec.
  • Frame rate → 1080p5994 default for new channels (Zac, 2026-05-30). Progressive 1080 @ 59.94 fps. Per-channel override allowed (video_format column). Streaming-friendly (SRT/RTMP/NDI) and current SDI gear accepts it; matches capture's 59.94 cadence.
  • Preview latency → HLS v1 (Zac, 2026-05-30). Reuse capture's /live/<id> HLS plumbing. CasparCG emits a second low-bitrate FFmpeg consumer to HLS. ~4–6 s lag, fine for confidence monitor. Operator desk gets a real downstream monitor off the SDI/NDI output anyway. Revisit WebRTC if MCR operators complain.
  • Failover → auto-restart on healthy node (Zac, 2026-05-30). Scheduler tick (§5) monitors playout_sidecars health (AMCP ping + container alive); on N missed checks marks the channel error, re-places it on another capability-matching node with a free output port, resumes the playlist from the next item after the as-run-logged on-air item. Gap = black/slate for ~5–30 s during respawn (operator sees a flag in the UI). DeckLink channels are not auto-failed-over in v1: device-index pinning makes the destination port non-trivial, so v1 alerts and lets the operator move the channel. NDI/SRT/RTMP channels (no hardware contention) fail over automatically. Tracked via restart_count + last_restart_at on playout_channels.
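
The failover rule in the last bullet can be expressed as a small decision function. This is a sketch, not the implementation: `failoverAction`, the missed-check threshold of 3, and the shape of `nodes` (`id`, `healthy`, `outputs`) are all assumptions.

```javascript
// Hypothetical failover decision for one channel after a health check round.
function failoverAction(channel, missedChecks, nodes, { maxMissed = 3 } = {}) {
  if (missedChecks < maxMissed) return { action: 'none' };

  // v1: DeckLink channels are pinned to a device index, so only alert.
  if (channel.output_type === 'decklink') return { action: 'alert' };

  // NDI/SRT/RTMP: pick any other healthy node that supports the output type.
  const target = nodes.find(
    (n) => n.healthy && n.id !== channel.node_id && n.outputs.includes(channel.output_type)
  );
  return target
    ? { action: 'restart', node_id: target.id }
    : { action: 'alert' }; // nowhere to go: fall back to operator alert
}
```

On a `restart` result the tick would mark the channel `error`, respawn the sidecar on `node_id`, bump `restart_count` / `last_restart_at`, and resume from the item after the last as-run entry.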

Still open:

  • (none — all §7 questions resolved 2026-05-30)

8. Reused building blocks (already in the repo)

| Need | Existing piece |
|---|---|
| Spawn engine container local/remote | recorders.js Docker-socket + node-agent /sidecar/start |
| Node capability / port model | cluster_nodes.capabilities, cluster DeckLink-status endpoint |
| Single-leader scheduled transitions | src/scheduler.js + PG advisory lock |
| Background media jobs | BullMQ worker (services/worker) |
| RBAC scoping | src/auth/authz.js assertProjectAccess (channel/project_id) |
| HLS preview plumbing | capture's /live/<id> HLS output |
| Subclip in/out points | NLE editor in/out marking |
| API wrapper / SPA shell | ZAMPP_API.fetch, esbuild JSX pages |