Commit graph

159 commits

Author SHA1 Message Date
4609517756 fix(growing): AVC-Intra 100 1080p59.94 via libx264 (verified raw2bmx accepts)
Verified on-node: libx264 high422 10-bit + x264 avcintra-class=100 -> raw2bmx -t op1a --avci100_1080p -f 60000/1001 produces a valid MXF (ffprobe: h264 High 4:2:2 Intra yuv422p10le 60000/1001). True 1080p59.94, openable in Premiere. CPU encode for now per demo deadline; NVENC h264 cannot do 4:2:2.
2026-06-04 18:35:42 +00:00
e6b7856bb2 fix(growing): restore proven XDCAM HD422 (MPEG-2 422) rdd9 @1080i59.94
Today's codec churn (h264_nvenc/AVC-Intra/op1a) produced .mxf files Premiere could not open. Subagent diagnosis vs the old working files on the share proved the working format is XDCAM HD422 = mpeg2video 4:2:2 yuv422p, clip type rdd9, wrapped at 1080i59.94 (30000/1001). raw2bmx REJECTS MPEG-2 422 at 60000/1001, so a 1080p59.94 SDI feed must wrap as 1080i59.94. Reverted GROWING_VIDEO_ELEMENTARY_ARGS to mpeg2video, raw2bmx -t rdd9 --mpeg2lg_422p_hl_1080i -f 30000/1001, -f mpeg2video pipe, and rates() 59.94->30000/1001 with 1080 forced interlaced.
2026-06-04 18:31:03 +00:00
e6eb565e30 fix(growing): switch to NVENC H.264 High Intra
Using libx264 confirmed raw2bmx works with AVC-Intra parameters at 59.94. Switching back to hardware-accelerated h264_nvenc with matching parameters (High profile, all-intra GOP 1, yuv422p, aud) to keep CPU load low during 8-port burn tests.
2026-06-04 18:18:41 +00:00
4bbbc9f4dc fix(growing): add -aud 1 to h264_nvenc to fix raw2bmx parse failure 2026-06-04 18:12:50 +00:00
42806b5e10 fix(growing): use stable H.264 High All-Intra for 1080p59.94 MXF
XDCAM HD422 does not strictly support 1080p59.94, and ffmpeg/raw2bmx failed to negotiate the stream. Reverted to h264_nvenc (High profile, all-intra GOP 1, yuv420p) which raw2bmx can reliably wrap as OP1a (--avc_high) at 60000/1001. This restores NVENC hardware acceleration and Premiere edit-while-record compatibility.
2026-06-04 17:56:08 +00:00
f36de429e8 feat(growing): configure AVC-Intra 100 1080p59.94 via NVENC for growing files 2026-06-04 17:49:21 +00:00
1f71de494f fix(growing): correct growing frame rate selection for 59.94 fps 2026-06-04 17:31:43 +00:00
c2d15b4e3a debug: capture raw2bmx output to log 2026-06-04 17:11:47 +00:00
80f157968f fix(capture): pin NVENC to a GPU with ffmpeg -gpu N (privileged bypasses env)
NVIDIA_VISIBLE_DEVICES=1 was set but the sidecar still SAW /dev/nvidia0,1,2 and nvenc used GPU 0 — because capture sidecars run Privileged, which exposes every GPU device node regardless of NVIDIA_VISIBLE_DEVICES/DeviceRequests. Real fix: node-agent passes CAPTURE_GPU_INDEX to the sidecar and capture-manager adds ffmpeg '-gpu N' to the hevc_nvenc + h264_nvenc encoders, so each port's master+HLS encode is explicitly bound to its assigned L4. Spreads 8 ports across 3 cards.
2026-06-04 16:07:59 +00:00
4be12c6f9a fix(stability): spread capture encodes across all GPUs + GOP parse + filmstrip retry
VIDEO FREEZE UNDER BURN (transient stall, self-recovers): all 8 capture
sidecars ran NVIDIA_VISIBLE_DEVICES=all with no -gpu selector, so ffmpeg
nvenc put every session (8 HEVC masters + 8 HLS = 16) on physical GPU 0
while the other two L4s sat idle. GPU 0 NVENC hit 86%, encode fell below
realtime, the framecache ring lapped → video froze → caught up → recovered.
Bridge verified smooth at 60fps throughout. FIX: node-agent now round-robins
each sidecar to a GPU by capture port (port % detected-GPU-count) via
NVIDIA_VISIBLE_DEVICES, honoring an explicit gpuUuid when set. Auto-detects
GPU count from nvidia-smi (override CAPTURE_GPU_COUNT). ~3 encoders/GPU now.

GOP PARSE: Number.parseFloat('60000/1001') returns 60000, making GOP 120000
(near open-GOP) instead of ~120. Added parseFps() to handle rational rates;
fixed hevcNvencArgs + buildHlsVideoArgs.

FILMSTRIP: RustFS object store intermittently returns NoSuchKey on GET for
keys that List/Head confirm exist, blanking the strip. Generation/queue/DB
all verified healthy (13/15 assets HAVE filmstrips). FIX: API now serves the
filmstrip JSON through itself with retry-on-NoSuchKey (succeeds within a
couple attempts) instead of handing the browser a signed URL — also closes
the S3 CORS gap. Frontend updated to consume the direct JSON.
2026-06-04 15:56:44 +00:00
d3e520e3b1 fix(capture+gui): kill audio-drift regression + fix elapsed/signal status
A/V REGRESSION (no audio + start stutter): capture-manager.js dropped the
-use_wallclock_as_timestamps 1 flag on the audio FIFO input (re-added by
d6b0b3a). Wallclock stamped audio by arrival time while video is CFR
frame-count, so audio ran 3-18% longer and master aresample padded seconds
of LEADING SILENCE → silent head, late video start, apparent 'no audio'.
Removing it restores the sample-count PTS baseline (8e5405c/55a72af):
audio shares the SDI clock domain, no drift, no pad.

GUI BUG A (elapsed showed 1hr+ on standby/just-started): frontend seeded
elapsed from recorder.started_at = the standby CONTAINER boot time (hours
old). Now seeds ONLY from the sidecar session duration (liveStatus.duration
when live.recording), shows nothing when idle. Backend /status now returns
session-scoped duration + recording flag, not container uptime.

GUI BUG B (false 'stopped' signal on idle ports): backend inferred signal
from container Running state (running->receiving, down->stopped) — so idle
standby ports with down sidecars showed red 'stopped'. Now signal comes
from the sidecar session (live.recording); standby = neutral 'idle', never
a false 'stopped'/'receiving'.
2026-06-04 13:21:30 +00:00
0c405ae7d4 fix(growing): read GROWING_ENABLED from env at record time + drop dead const
Second half of the growing-never-engages bug. start() decided growing via the module-level const GROWING_ENABLED (captured false at standby boot) and referenced the now-removed GROWING_SMB_MOUNT const (ReferenceError, silently swallowed). Both made growingActive=false, so every growing record produced HEVC/S3 instead of XDCAM HD422 MXF. Now reads process.env.GROWING_ENABLED + growingSmbConfig().mount fresh at record start.
2026-06-04 13:02:23 +00:00
ac1d7e1e1f fix(growing): read SMB params from env at mount time, not module load
Root cause of growing producing .mov instead of XDCAM HD422 .mxf:

mountGrowingShare() used module-level consts (GROWING_SMB_MOUNT etc.)
captured from process.env at IMPORT time. Standby capture containers boot
with these unset and receive the SMB mount/credentials per-session over
/capture/start (capture.js sets process.env right before start()). Because
the consts were already frozen empty, mountGrowingShare() saw no mount
source, returned false, and growing silently fell back to S3 streaming —
producing an HEVC .mov while the asset key said .mxf.

Fix: growingSmbConfig() reads process.env fresh at mount time. Also drop
the stale const guard in unmountGrowingShare().
2026-06-04 12:51:32 +00:00
cb25711ec6 fix(growing): inline CIFS creds + capture caps + storage probe timeout
Three fixes to restore growing-files (XDCAM HD422 MXF) recording:

1. capture-manager mountGrowingShare: pass username=/password= inline
   instead of a credentials= file. TrueNAS SMB3 rejects the creds-file form
   with EACCES (-13, 'cannot mount read-only') while the identical inline
   creds mount fine. This was causing every growing record to silently fall
   back to the HEVC/S3 path (producing .mov, not .mxf).

2. docker-compose capture: add cap_add SYS_ADMIN + DAC_READ_SEARCH and
   apparmor:unconfined so mount.cifs can run inside the container.

3. storage /overview: wrap S3 HeadBucket/ListObjects probe in a 5s timeout
   so the admin 'Mount health' card stops hanging on 'Probing…' forever
   when S3 is slow.
2026-06-04 12:42:39 +00:00
d6b0b3a9a6 fix(capture): restore proven-clean wallclock audio (match de509c6 baseline)
Removing wallclock made A/V length drift far worse (audio 11.8% long). The
known-clean config used wallclock + master aresample=async=1; the leading
silence is a standby backlog artifact addressed by the bridge live-edge flush +
record-start audio FIFO drain, not by changing the timestamp source.
2026-06-04 05:06:40 +00:00
55a72af905 fix(capture): derive audio PTS from sample count (kill 2.5s leading silence)
The persistent ~2.5s of leading silence was the master aresample=async=1 PADDING
the audio to reconcile a PTS-origin mismatch: video PTS starts at frame 0
(-framerate), but -use_wallclock_as_timestamps stamped the first audio chunk at
its wall-clock arrival time (~2.5s after the ffmpeg graph opened). aresample
filled the gap with silence.

Drop wallclock: audio PTS now comes from the 48kHz sample count starting at 0 —
the same origin as video frame 0 — so the streams align with no pad. The bridge
already hands live audio (backlog flushed on attach), so no rate reference is
needed from wallclock.
2026-06-04 05:01:05 +00:00
e9e883d06e fix(deltacast-bridge): flush queued audio backlog to live edge on reader attach
The ~2.5s of leading silence at record start was the VHD audio slot QUEUE: while
the recorder is idle (no FIFO reader), the bridge blocks on open(O_WRONLY) but the
board keeps buffering audio slots. When the record ffmpeg attaches, the bridge
streamed that stale backlog first — heard as leading silence and pushing audio
out of alignment with the live video.

On each reader attach, drain slots that lock FAST (already-queued backlog) and
stop at the first lock that takes ~a frame period (= waiting on a live slot), so
the reader is handed the live edge, A/V aligned.
2026-06-04 04:54:32 +00:00
b1a2249f36 fix(capture): align A/V at record start (kill leading silence + length drift)
Root cause of 'silent first ~1s then clean' + ~0.5% audio-too-long: in standby
the bridge keeps filling the audio FIFO while the idle-preview consumes only
video, so when recording starts ffmpeg reads a ~0.5s backlog of stale audio,
AND the video-only pre-roll discards video frames the audio never had.

Fix: (1) skip the video-only pre-roll in standby (warm slot = no unstable
frames), (2) drain the audio FIFO non-blocking immediately before ffmpeg opens
it, so audio starts at the live edge aligned with the first real video frame.
2026-06-04 04:49:53 +00:00
fffb6b63b5 fix(capture): revert 16ch audio to clean 2ch — fixes pitch/rate regression
The 16ch interleave in the deltacast bridge produced audio at HALF the correct
sample rate (measured 24224 vs 48000 samples/s/ch), which broke A/V sync and
pitch. Per the working baseline (audio was clean before the channel selector),
revert the bridge audio thread to the original single-group 2ch extraction and
the capture-manager audio input to -ac 2 + wallclock + aresample.

KEPT the good fixes: long-GOP HEVC for non-growing (NVENC realtime, no frame
drops) and GPU-only codec list. 16ch/channel-select is shelved for a separate,
properly-validated change.
2026-06-04 04:33:34 +00:00
b28393eb76 Revert "fix(capture): skip video-only pre-roll in standby to stop A/V pitch drift"
This reverts commit 51b66d882f.
2026-06-04 04:28:11 +00:00
51b66d882f fix(capture): skip video-only pre-roll in standby to stop A/V pitch drift
The pre-roll drained only the video pipe (fc_pipe) while the audio FIFO kept
buffering, so ffmpeg read ~PRE_ROLL_SECONDS of surplus pre-roll audio — making
audio longer than video, which when synced compresses audio ~0.5% (pitch-up,
measured: 2591573 audio samples vs 2579395 expected for the video duration).

In standby the framecache slot is already warm (no unstable startup frames), so
the drain is unnecessary; skipping it lets ffmpeg open video and audio together
from the same instant. Cold on-demand spawns keep the brief drain.
2026-06-04 04:24:08 +00:00
07eea02109 fix(capture): restore audio wallclock (throughput) + remove CPU codec options
- restore -use_wallclock_as_timestamps on audio input: without it ffmpeg's raw
  s16le reader stalled the graph (NVENC idle at 9%, ~half frames dropped). With
  it + long-GOP HEVC the encoder runs realtime and A/V length stays locked.
- remove all CPU codec options (prores*, dnxh*, libx264/265) from recorder UI;
  GPU NVENC only (hevc_nvenc / h264_nvenc). 3x L4 cluster, no reason for CPU.
- GPU codec defaults in env builders + proxy default h264_nvenc.
2026-06-04 04:14:59 +00:00
0ea22e1e53 fix(capture): gate all-intra HEVC on growing-files; normal record uses long-GOP
The hevc_nvenc codec was hardcoded to all-intra (-force_key_frames expr:1), which
is ~4x the NVENC load. Applied to every recording it exceeded the L4's realtime
budget at 1080p59.94 10-bit -> fc_pipe dropped ~half the frames -> video came out
shorter than the (correct) audio -> A/V drift + pitch-up on playback.

Now all-intra is used ONLY when growing-files is on (where it's required for the
editable head). Normal recordings use efficient long-GOP HEVC (2s GOP, 2 B-frames)
which NVENC sustains in realtime with zero drops.
2026-06-04 04:09:14 +00:00
8e5405c3f9 fix(capture): derive deltacast audio PTS from sample count, not wall-clock
Removing -use_wallclock_as_timestamps on the SDI audio input. The bridge writes
SDI-clock-paced samples, so PTS from the 48kHz sample count shares the video's
clock domain and the audio length tracks the video length exactly. Wall-clock
timestamps made audio length = real elapsed time, which drifted ~1% longer than
the frame-count video when the encoder dipped under realtime (pitch-up).
2026-06-04 04:01:54 +00:00
51f939b1fe fix(deltacast-bridge): use group-0 sample count as authoritative audio length
Taking the MAX sample count across the 4 audio groups could emit more audio
frames per slot than group 0 (the SDI-clock reference), drifting the audio
stream slightly longer than video — heard as a ~1% pitch-up. Group 0 paces the
timeline exactly as the original 2ch path did; shorter groups are silence-padded
to its length, never extending it.
2026-06-04 04:01:25 +00:00
095306d9cf feat(recorders): 16ch SDI audio capture + per-recorder channel select + menu redesign
Audio:
- deltacast-bridge: always extract all 4 SDI audio groups (16ch), interleave to
  one 16ch s16le stream per port FIFO; format JSON reports audio_channels:16
- capture-manager: declare FIFO as 16ch input; keep first N discrete channels
  (2/8/16) via pan channelmap on the master (no downmix); HLS preview stays
  stereo. effAudioChannels drives -ac on the master container.
- config modal: Audio channels select (2/8/16)
- channel count already flows mam-api->node-agent->capture via RECORDING_AUDIO_CHANNELS

UI redesign (production craft):
- recorders grouped into per-node hardware 'rack' cards (online/offline state)
- lifecycle accent rail: grey DISABLED / green ENABLED / pulsing-red RECORDING
- promoted capture-port chip, monospaced metadata, Enable as primary CTA
- dedicated recorder CSS block; built on existing design tokens
2026-06-04 03:34:41 +00:00
5668c03615 chore(capture): remove stale legacy FIFO path + pin capture profile
- capture-manager: remove dead legacy deltacast FIFO video path (FC_SLOT_ID
  is now always set by node-agent, framecache mandatory on all SDI nodes)
- node-agent: correct stale comment about legacy FIFO fallback
- onboard-node.sh: harden detect_sdi (device-node checks, not just lspci) and
  persist COMPOSE_PROFILES so framecache survives every redeploy on SDI nodes
- remove committed capture.js.bak

Root cause of this session's outage: zampp3 came up without the capture
compose profile, so framecache never started; the bridge published to shm
with no consumer and recorders showed 'receiving' with no real capture.
2026-06-04 02:50:57 +00:00
Wild Dragon Dev
8efcf5c545 feat(capture): remove build-with-decklink.sh script 2026-06-04 01:27:41 +00:00
Wild Dragon Dev
e5abbede43 debug(fc_writer): add trace logs for GET slots path 2026-06-04 01:13:19 +00:00
Wild Dragon Dev
cc489f7774 fix(fc_writer): handle 409 Conflict by fetching existing slot details via GET 2026-06-04 01:12:06 +00:00
Wild Dragon Dev
5b72ee167d fix(decklink-bridge): prevent redundant fc_writer_open loops via last_format tracking 2026-06-04 01:10:47 +00:00
Wild Dragon Dev
d957ce74ae fix(decklink-bridge): avoid redundant fc_writer_open calls in reopen_slot 2026-06-04 01:09:08 +00:00
Wild Dragon Dev
6481760dff revert(capture): Dockerfile copy paths to root-relative for compose build 2026-06-04 00:39:24 +00:00
Wild Dragon Dev
650a100d17 build(capture): include decklink-bridge in runtime image 2026-06-04 00:37:49 +00:00
Wild Dragon Dev
400cb786ab fix(decklink-bridge): use IDeckLinkVideoBuffer QueryInterface to get raw bytes 2026-06-04 00:35:16 +00:00
Wild Dragon Dev
74055e79f8 fix(decklink-bridge): use GetFrameInternalBufferBytes instead of GetBytes 2026-06-04 00:28:19 +00:00
Wild Dragon Dev
a096226072 fix(capture): remove -use_wallclock_as_timestamps from framecache video input
The framecache ring delivers frame-accurate frames at exactly the SDI clock
rate. -use_wallclock_as_timestamps was wrong for this source — it stamped
frames by ffmpeg arrival time rather than capture time, causing the recorded
file to report wrong framerates (e.g. 56.06 instead of 59.94) and a
glitchy first second at startup (NVENC cold-start backlog bunched timestamps).

Fix: remove -use_wallclock_as_timestamps from the rawvideo (pipe:0) input
and rely on -framerate for correct CFR timestamps from frame 0.
Audio keeps its FIFO wallclock; aresample=async=1 on the master output
resamples audio to align with the CFR video PTS.
2026-06-03 22:30:03 +00:00
Wild Dragon Dev
7631527f46 fix(capture): add auth header to finalize call in POST /capture/stop 2026-06-03 22:11:15 +00:00
Wild Dragon Dev
3d4880d944 fix(capture): reduce pre-roll to 1s in standby mode (slot already warm) 2026-06-03 22:05:11 +00:00
Wild Dragon Dev
ef57900583 feat(recorders): always-on standby sidecars for deltacast, sdi, blackmagic
Sidecars now spawn at recorder CREATE time instead of /start time.
The container boots in STANDBY=1 mode (idle preview only, no ffmpeg master).
On /start, mam-api sends per-session params (CLIP_NAME, ASSET_ID, PROJECT_ID)
to the running sidecar via HTTP POST /capture/start — ffmpeg starts in <1s.
On /stop, mam-api calls HTTP POST /capture/stop — container stays alive in
standby, ready for the next take immediately.
Container is only killed on recorder DELETE.

This eliminates: Docker create/start overhead (~1-2s), bridge startup (~2-5s),
and pre-roll wait (~5s). Latency from 'record' click to first encoded frame
drops from ~10s to ~1s.

Changes:
- capture/src/index.js: boot in standby when STANDBY=1 env is set; still
  start idle preview (live thumbnail visible before recording)
- capture/src/routes/capture.js: POST /start accepts full codec params and
  asset_id in body (skips mam-api asset creation when asset_id provided)
- node-agent/index.js: handleSidecarStandby() + POST /sidecar/standby route;
  warms bridge at recorder create time
- recorders.js POST /: spawn standby sidecar after DB insert (non-fatal)
- recorders.js POST /:id/start: HTTP fast-path to standby sidecar; falls
  back to on-demand spawn if standby not available
- recorders.js POST /:id/stop: HTTP /capture/stop, keep container in standby
- recorders.js GET /:id/status: use port-based URL for local capture status
2026-06-03 21:59:33 +00:00
Wild Dragon Dev
7172447644 fix(capture): remove leftover localMasterPath from session state 2026-06-03 21:42:35 +00:00
Wild Dragon Dev
37b325e1d8 fix(capture): restore direct-to-S3 streaming (pipe:1 + fragmented MOV)
Reverts the local-temp+faststart approach from 549ca6c. Masters now stream
ffmpeg stdout directly to S3 via multipart upload — no local disk consumed
on the worker. Uses +frag_keyframe+empty_moov+default_base_moof which
Premiere Pro 25.x handles natively (to be confirmed separately).

Zero /tmp/capture files. Worker disk stays flat during recording.
2026-06-03 21:40:58 +00:00
Wild Dragon Dev
ccaef50c09 fix(decklink): cast videoFrame to base type for GetBytes, re-enable build 2026-06-03 18:45:15 +00:00
Wild Dragon Dev
522faacdcc fix(capture): remove stale CMakeCache on rebuilds 2026-06-03 18:28:22 +00:00
Wild Dragon Dev
5eaf71b70c fix(capture): correct npm install COPY path in Dockerfile 2026-06-03 18:14:39 +00:00
Wild Dragon Dev
38b31d6170 fix(capture): temporarily disable decklink-bridge build stage 2026-06-03 18:02:50 +00:00
Wild Dragon Dev
aa646dbb71 fix(capture): fix redefined 'expected' variable in decklink-bridge 2026-06-03 17:40:40 +00:00
Wild Dragon Dev
6294e98dc3 fix(capture): copy full deltacast-bridge dir for fc_writer to ensure include path 2026-06-03 17:38:59 +00:00
Wild Dragon Dev
4b018cb8cb fix(capture): fix decklink-bridge include path for fc_writer 2026-06-03 17:34:52 +00:00
Wild Dragon Dev
d193b84466 fix(capture): correct Dockerfile copy path for framecache source 2026-06-03 17:25:00 +00:00