From 0f6c715a30044951dabcb1056b457f5be87e4496 Mon Sep 17 00:00:00 2001
From: Zac Gaetano
Date: Fri, 29 May 2026 04:16:17 +0000
Subject: [PATCH] docs: All-Intra HEVC (NVENC) growing-file ingest design

Captures the current working system (capture sidecar, finalize flow, live
monitor, capability-routed GPU worker pool, deploy gotchas) and the target
design: GPU All-Intra HEVC master to offload the ProRes CPU wall while
keeping edit-while-record, scaling to 8 signals + multi-vendor
(Blackmagic/Deltacast/AJA). Includes a validation gate (prove Premiere
growing-HEVC edit on one channel first).

Co-Authored-By: Claude Opus 4.8
---
 .../2026-05-29-all-intra-hevc-ingest.md | 197 ++++++++++++++++++
 1 file changed, 197 insertions(+)
 create mode 100644 docs/design/2026-05-29-all-intra-hevc-ingest.md

diff --git a/docs/design/2026-05-29-all-intra-hevc-ingest.md b/docs/design/2026-05-29-all-intra-hevc-ingest.md
new file mode 100644
index 0000000..104e2c1
--- /dev/null
+++ b/docs/design/2026-05-29-all-intra-hevc-ingest.md
+# All-Intra HEVC (NVENC) Growing-File Ingest
+
+Date: 2026-05-29 | Status: design, pending validation gate (see §8)
+Authors: Zac + Claude
+
+## 1. Purpose
+
+Replace the CPU-bound ProRes capture encode with **All-Intra HEVC on NVENC**
+as the growing-file master codec, so we can:
+
+- **Offload ingest encode from CPU to GPU** (the current scaling wall), and
+- **Keep edit-while-record** (all-intra => growing file stays editable), and
+- **Scale to up to 8 simultaneous signals per machine**, across Blackmagic
+  today and Deltacast + AJA later.
+
+This doc captures the target design AND the current working system it builds on,
+so it is self-contained for whoever implements it.
+
+## 2. Why this codec
+
+Growing-file editing (Premiere/Avid mounting a still-recording file over SMB)
+requires two things: **intra-frame** (every frame a keyframe, so a partial file
+is decodable to the last whole frame) and a **container whose index is not
+deferred to EOF**. ProRes/DNxHR satisfy this but are CPU-only (NVIDIA has no
+ProRes encoder). Long-GOP H.264/HEVC/AV1 do NOT work for edit-while-record.
+
+**All-Intra HEVC (`-g 1 -bf 0`) via `hevc_nvenc`** is the one path that is both
+GPU-accelerated AND all-intra: it breaks the "ProRes must be CPU" constraint
+without losing edit-while-record. Trade-off: All-Intra bitrate approaches
+ProRes, so the win is **CPU offload, not storage**. AV1 is rejected (no NLE
+edit support; av1_nvenc absent from our ffmpeg builds).
+
+## 3. Current working system (what we build on)
+
+### Topology
+- **zampp1** (172.18.91.200): primary. Runs db (postgres), queue (redis),
+  mam-api (:47432), web-ui (:47434), and the GPU worker pool. GPUs: Tesla P4 +
+  2x Quadro P400. Repo at /opt/wild-dragon (its own clone).
+- **zampp2** (172.18.91.216): worker/capture node. 12-vCPU QEMU VM, NVIDIA L4,
+  4x Blackmagic DeckLink (exposed as /dev/blackmagic/io0..io3). Runs node-agent
+  (:7436). Repo at /opt/wild-dragon (separate clone).
+- The repo is checked out independently on BOTH nodes; node-specific files
+  (node-agent, capture, worker overlay) are edited on the node that runs them.
+
+### Capture (current)
+mam-api `POST /recorders/:id/start` pre-creates a `live` asset and dispatches
+`POST /sidecar/start` to the recorder's node-agent, which spawns a
+`wild-dragon-capture:latest` container (host network, privileged,
+/dev/blackmagic bound). The capture ffmpeg:
+- input: `-f decklink -i "DeckLink Duo (N)"`
+- filter: `yadif` (CPU deinterlace)
+- output 0 (master): `prores_ks` (CPU) -> S3 (pipe) or growing SMB file
+- output 1 (preview): `libx264` veryfast HLS -> /live/{assetId} (CPU)
+DeckLink does capture (cheap); BOTH encodes are CPU. ~5 vCPU per 1080i signal
+=> ~2 signals saturate the 12-vCPU VM. GPUs are idle during capture.
+
+### Stop / finalize (working)
+node-agent stops the sidecar with a **180s grace** (was 10s -> SIGKILL bug).
+Capture's SIGTERM handler finalises the session and calls
+`POST /assets/:id/finalize` (the live asset id passed as ASSET_ID), which flips
+the asset out of `live`, records duration + S3 keys, and kicks the
+proxy -> thumbnail -> filmstrip chain. (Earlier 409 bug: it used to POST a new
+asset and collide with the live row.)
+
+### Live monitor (working)
+SDI HLS preview is a 2nd output of the capture ffmpeg (one DeckLink read ->
+split -> ProRes + H.264 HLS), written to /live/{assetId} on the capture node.
+node-agent serves GET /live/* over HTTP; mam-api proxies
+GET /api/v1/recorders/:id/live/* to the recorder's node-agent; the web-ui
+HlsPreview loads the proxied URL. Browser auth is the session cookie
+(same-origin).
+
+### GPU worker pool (working, post-capture)
+BullMQ on shared Redis; queues are type-named (proxy/thumbnail/filmstrip/
+conform/trim). Workers are capability-routed by `WORKER_QUEUES`, one GPU-pinned
+container per card (`NVIDIA_VISIBLE_DEVICES` by UUID):
+- HEAVY (proxy/conform/trim): Tesla P4 (zampp1) + L4 (zampp2), `h264_nvenc`.
+- LIGHT (thumbnail/filmstrip): 2x Quadro P400 (zampp1).
+DB setting `gpu_transcode_enabled=true` + `gpu_codec=h264_nvenc` enable NVENC.
+Each worker stamps `WORKER_LABEL` onto job data -> Jobs UI "Node" column.
+`RUN_PROMOTION=true` on exactly one worker runs the growing-files->S3 scan.
+The worker GPU image is built from services/worker/Dockerfile.gpu (CUDA base +
+Ubuntu ffmpeg with h264/hevc_nvenc; NO av1_nvenc).
+
+### Deploy gotchas (learned)
+- Service source is BAKED into images; edits need rebuild + recreate (or the
+  GPU image rebuild reuses cached layers so only final COPY changes -> fast).
+- The capture image can only build on zampp2 (DeckLink SDK present there).
+- Per-node `.env`: zampp2's REDIS_URL/DATABASE_URL/S3_* now point at zampp1
+  (.200); secrets live only in .env, never in committed compose.
+- Clear all containers on both nodes before a full redeploy (user preference).
+
+## 4. Target design
+
+### 4.1 Capture ffmpeg gains NVENC
+The capture image's custom FFmpeg 7.1 is currently built WITHOUT nvenc (only
+prores_ks/dnxhd/libx264). Rebuild `services/capture/Dockerfile` ffmpeg with:
+`--enable-cuda-nvcc --enable-libnpp --enable-nvenc --enable-cuvid` plus
+nv-codec-headers (ffnvcodec) installed before configure. Keep `--enable-decklink`
+and the existing codecs (ProRes stays available as a selectable fallback).
+Verify `ffmpeg -encoders | grep nvenc` shows hevc_nvenc/h264_nvenc afterwards.
+
+### 4.2 Capture sidecar gets a GPU
+node-agent `handleSidecarStart` currently spawns the capture container with no
+GPU. Add NVIDIA runtime + device pinning to the sidecar create spec:
+`HostConfig.Runtime='nvidia'` (or DeviceRequests with the node's GPU) and env
+`NVIDIA_VISIBLE_DEVICES=` + `NVIDIA_DRIVER_CAPABILITIES=video,compute,utility`.
+The capture node's GPU is shared with its worker-l4 (see capacity, §5).
+
+### 4.3 Encode parameters (master)
+All-Intra HEVC on NVENC:
+`-c:v hevc_nvenc -preset p4 -rc vbr -g 1 -bf 0 -profile:v main10 -pix_fmt p010le`
+(10-bit 4:2:2 is not NVENC-native; NVENC HEVC is 4:2:0 8/10-bit.
+If 4:2:2 mezzanine is required, that is a HARD blocker for NVENC and we stay on
+ProRes for those feeds; see §8). Bitrate target tuned per format (1080i59.94
+~100-160 Mbps to rival ProRes HQ). `-g 1 -bf 0` => every frame IDR (all-intra).
+
+### 4.4 Container (growing-file)
+Write the master to a growing file on the SMB share (GROWING_PATH), same path
+the promotion worker already uploads on EOF. Container candidates, in order of
+preference for Premiere growing-file mounts:
+1. **MXF OP1a** (`-f mxf`): broadcast standard, designed for
+   growing/edit-while-ingest; best Avid/Premiere support. HEVC-in-MXF support
+   in Premiere is the key unknown to validate (§8).
+2. **Fragmented MOV/MP4** (`-movflags +frag_keyframe+empty_moov+default_base_moof`):
+   no moov-at-EOF, readable while growing; fallback if MXF+HEVC is unsupported.
+The HLS preview path is unchanged except it can also move to h264_nvenc now that
+capture has NVENC (frees the last libx264 CPU cost).
+
+## 5. Capacity & scaling (8 signals/machine)
+
+After the move, per-signal CPU is just: DeckLink capture + yadif + mux + frame
+upload to the GPU. The heavy HEVC encode is on NVENC. The constraint shifts from
+CPU to **NVENC throughput + GPU memory + PCIe/host bandwidth**:
+- The **L4 is a datacenter card => unlimited NVENC sessions** (no consumer
+  3-session cap). 8x 1080i HEVC-I encode sessions are well within an L4.
+- GPU memory: ~8 concurrent 1080 NVENC sessions + frame buffers fit in 24 GB.
+- The capture node's L4 is shared between capture (per-signal HEVC-I) and the
+  worker-l4 proxy jobs. Under 8-signal load, give capture priority; consider
+  moving worker-l4 (post-record proxies) to zampp1's P4 only, or gate worker-l4
+  intake while signals are live.
+- yadif on CPU is still ~0.5-1 vCPU/signal; consider `yadif_cuda`/`bwdif_cuda`
+  (GPU deinterlace) once frames are uploaded to the GPU, keeping CPU near-idle.
+
+**Node sizing:** a 12-vCPU VM was the ProRes wall; with GPU encode the same VM
+should carry many more signals, but for 8x SDI + GPU + card passthrough prefer a
+larger VM or bare metal with proper PCIe passthrough. Or spread signals across
+multiple capture nodes (the node-agent model already supports N nodes; mam-api
+routes each recorder to its node).
+
+## 6. Multi-vendor capture (Blackmagic / Deltacast / AJA)
+
+Today capture is hard-wired to `-f decklink`. Before three vendors accrue
+special-cases, introduce a **source-backend abstraction** in capture-manager:
+each backend returns ffmpeg input args + device discovery.
+- **Blackmagic**: `-f decklink -i ""` (current). Devices via
+  `ffmpeg -sources decklink`.
+- **Deltacast**: VideoMaster SDK. No native ffmpeg demuxer upstream; needs an
+  SDK-backed capture (their SDK -> pipe to ffmpeg, or a small grabber). Plan a
+  `deltacast` backend that shells their tool into ffmpeg stdin (rawvideo).
+- **AJA**: libajantv2. Also no upstream ffmpeg input; AJA ships `ntv2` capture
+  tools. Plan an `aja` backend feeding rawvideo into ffmpeg.
+All backends converge on the SAME encode/output stage (HEVC-I NVENC + HLS), so
+only the input differs. node-agent already binds the right /dev nodes per
+sourceType (decklink/deltacast); extend for AJA.
+
+## 7. Risks
+
+- **4:2:2 / 10-bit chroma:** NVENC HEVC is 4:2:0 (8/10-bit). ProRes HQ is 4:2:2
+  10-bit. If a workflow REQUIRES 4:2:2 mezzanine, NVENC HEVC cannot match it and
+  those feeds stay on ProRes (CPU). Decide per-workflow.
+- **Premiere growing HEVC support:** edit-while-record for HEVC-in-MXF (or frag
+  MOV) is unproven in our stack; this is the make-or-break validation (§8).
+- **GPU contention** between live capture and post-record proxies on the same
+  L4; mitigate by prioritising capture / relocating proxy load.
+- **Storage:** All-Intra HEVC bitrate ~ ProRes; expect similar disk usage.
+- **Editor performance:** HEVC-I decode in Premiere is heavier than ProRes on
+  the edit workstation (decode cost moves to the editor). Validate scrubbing.
+- **NVENC quality at all-intra** vs ProRes for archival; tune bitrate/preset.
+
+## 8. Validation gate (do FIRST, before building the pipeline)
+
+Prove the editor story on ONE channel before wiring 8:
+1. Rebuild capture ffmpeg with NVENC; give the sidecar the L4.
+2. Capture one DeckLink feed to All-Intra HEVC, writing a GROWING file to the
+   SMB share in (a) MXF OP1a, then (b) fragmented MOV.
+3. While still recording, mount it in Premiere over SMB and confirm:
+   edit-while-record works, scrubbing is acceptable, audio in sync, file remains
+   valid after stop. Pick the container that works; if neither does, HEVC-I is
+   capture-only (no growing edit) and we keep ProRes for growing workflows.
+
+## 9. Rollout
+1. Validation gate (§8) on one channel.
+2. Make capture codec/container a recorder setting; default growing feeds to
+   HEVC-I NVENC, keep ProRes selectable.
+3. Move HLS preview to h264_nvenc.
+4. Source-backend abstraction (§6); land before Deltacast/AJA hardware.
+5. GPU deinterlace + capacity test to 8 signals; finalise node sizing.
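
Reviewer sketch for §4.3 and §4.4: one way the master output arguments could be assembled for the two container candidates in the validation gate. The encoder flags and `-movflags` string are taken from the design text. The function name `master_output_args`, the `"mxf"`/`"fmov"` container labels, the use of `-b:v` to carry the bitrate target, and the output paths are illustrative assumptions, not part of the repo.

```python
from typing import List

# Encoder flags quoted in section 4.3 of the design doc.
HEVC_INTRA_FLAGS = [
    "-c:v", "hevc_nvenc",
    "-preset", "p4",
    "-rc", "vbr",
    "-g", "1",      # GOP length 1: every frame is a keyframe
    "-bf", "0",     # no B-frames
    "-profile:v", "main10",
    "-pix_fmt", "p010le",
]

# Fragmented MOV flags quoted in section 4.4.
FRAG_MOVFLAGS = "+frag_keyframe+empty_moov+default_base_moof"


def master_output_args(container: str, bitrate_mbps: int, out_path: str) -> List[str]:
    """Build ffmpeg output args for an All-Intra HEVC master.

    container: "mxf" (MXF OP1a) or "fmov" (fragmented MOV). Hypothetical labels.
    bitrate_mbps: target bitrate; passing it via -b:v is an assumption here.
    """
    if container not in ("mxf", "fmov"):
        raise ValueError(f"unknown container: {container!r}")
    args = list(HEVC_INTRA_FLAGS) + ["-b:v", f"{bitrate_mbps}M"]
    if container == "mxf":
        args += ["-f", "mxf"]
    else:
        args += ["-movflags", FRAG_MOVFLAGS, "-f", "mov"]
    args.append(out_path)
    return args


if __name__ == "__main__":
    # Hypothetical growing-file path on the SMB share.
    print(" ".join(master_output_args("fmov", 120, "/mnt/growing/example.mov")))
```

Keeping the container choice as a single parameter matches rollout step 2, where codec and container become a recorder setting, and lets the validation gate run both candidates through the same code path.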
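
Reviewer sketch for §4.2: the sidecar change amounts to adding a runtime and two environment variables to the container create spec. The snippet below models that as a pure function over a Docker-style create body. It is written in Python for illustration only; the node-agent's actual language, the shape of its existing spec, the helper name `with_gpu`, and the GPU identifier are all assumptions.

```python
import copy
from typing import Any, Dict

# Env keys that section 4.2 says the sidecar needs.
_GPU_ENV_KEYS = ("NVIDIA_VISIBLE_DEVICES=", "NVIDIA_DRIVER_CAPABILITIES=")


def with_gpu(spec: Dict[str, Any], gpu_id: str) -> Dict[str, Any]:
    """Return a copy of a container create spec with NVIDIA runtime + pinning.

    gpu_id is whatever identifier the node uses for its GPU (hypothetical here).
    The input spec is not modified.
    """
    out = copy.deepcopy(spec)
    host = out.setdefault("HostConfig", {})
    host["Runtime"] = "nvidia"
    # Drop any stale NVIDIA_* entries, then append the pinned values.
    env = [e for e in out.get("Env", []) if not e.startswith(_GPU_ENV_KEYS)]
    env.append(f"NVIDIA_VISIBLE_DEVICES={gpu_id}")
    env.append("NVIDIA_DRIVER_CAPABILITIES=video,compute,utility")
    out["Env"] = env
    return out


if __name__ == "__main__":
    # Minimal stand-in for the current sidecar spec (fields are illustrative).
    base = {
        "Image": "wild-dragon-capture:latest",
        "Env": ["ASSET_ID=example"],
        "HostConfig": {"NetworkMode": "host", "Privileged": True},
    }
    print(with_gpu(base, "GPU-example-uuid"))
```

Returning a copy keeps the GPU-less spec available as the ProRes fallback path, which the design keeps selectable.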
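
Reviewer sketch for §6: a minimal shape for the source-backend abstraction, where each backend yields ffmpeg input args plus an optional discovery command and the encode stage stays shared. The DeckLink args and `ffmpeg -sources decklink` come from the design text. Everything else is a placeholder: the class names, the `BACKENDS` registry, and especially the rawvideo parameters (pixel format, size, frame rate) for the Deltacast/AJA pipe path, which would depend on what the vendor grabber actually emits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


class SourceBackend(Protocol):
    def input_args(self, device: str) -> List[str]: ...
    def discovery_command(self) -> Optional[List[str]]: ...


@dataclass(frozen=True)
class DecklinkBackend:
    def input_args(self, device: str) -> List[str]:
        return ["-f", "decklink", "-i", device]

    def discovery_command(self) -> Optional[List[str]]:
        return ["ffmpeg", "-sources", "decklink"]


@dataclass(frozen=True)
class RawPipeBackend:
    """Placeholder for SDK-backed vendors: a grabber writes rawvideo to stdin."""
    pixel_format: str = "uyvy422"    # assumed; depends on the grabber
    video_size: str = "1920x1080"    # assumed
    framerate: str = "30000/1001"    # assumed

    def input_args(self, device: str) -> List[str]:
        # The device is selected by the vendor tool, not by ffmpeg, so it is unused.
        return [
            "-f", "rawvideo",
            "-pixel_format", self.pixel_format,
            "-video_size", self.video_size,
            "-framerate", self.framerate,
            "-i", "pipe:0",
        ]

    def discovery_command(self) -> Optional[List[str]]:
        return None  # vendor-specific tooling; not known at design time


BACKENDS: Dict[str, SourceBackend] = {
    "decklink": DecklinkBackend(),
    "deltacast": RawPipeBackend(),
    "aja": RawPipeBackend(),
}


def capture_command(source_type: str, device: str, output_args: List[str]) -> List[str]:
    """Join a backend's input args with the shared encode/output stage."""
    backend = BACKENDS[source_type]
    return ["ffmpeg", *backend.input_args(device), *output_args]
```

With this split, adding a vendor means registering one backend; the output arguments are passed in unchanged for every source type.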