Commit graph

1116 commits

Author SHA1 Message Date
e9e883d06e fix(deltacast-bridge): flush queued audio backlog to live edge on reader attach
The ~2.5s of leading silence at record start was the VHD audio slot QUEUE: while
the recorder is idle (no FIFO reader), the bridge blocks on open(O_WRONLY) but the
board keeps buffering audio slots. When the record ffmpeg attaches, the bridge
streamed that stale backlog first — heard as leading silence and pushing audio
out of alignment with the live video.

On each reader attach, drain slots that lock FAST (already-queued backlog) and
stop at the first lock that takes roughly a frame period (= waiting on a live slot), so
the reader is handed the live edge, A/V aligned.
2026-06-04 04:54:32 +00:00
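
The drain heuristic in e9e883d06e can be sketched as below. This is only an illustration of the timing idea, not the bridge's code: lock_slot, unlock_slot, the 0.5 threshold fraction and the max_drain cap are all hypothetical names and values, and the clock is injectable purely so the logic can be exercised without hardware.

```python
import time
from typing import Callable, Tuple


def drain_to_live_edge(lock_slot: Callable[[], None],
                       unlock_slot: Callable[[], None],
                       frame_period_s: float,
                       fast_fraction: float = 0.5,   # assumed threshold, not from the commit
                       max_drain: int = 1000,        # assumed safety cap
                       clock: Callable[[], float] = time.monotonic) -> Tuple[int, bool]:
    """Discard queued slots until a lock has to wait for the board.

    Returns (slots_dropped, holding_live_slot). When holding_live_slot is
    True, the last locked slot is left locked for the caller to stream.
    """
    dropped = 0
    while dropped < max_drain:
        start = clock()
        lock_slot()                       # blocks until a slot is available
        waited = clock() - start
        if waited >= fast_fraction * frame_period_s:
            # The lock waited on the hardware, so this slot is live: keep it.
            return dropped, True
        unlock_slot()                     # returned instantly: stale backlog, discard
        dropped += 1
    return dropped, False                 # cap reached, no slot is held
```

A slot that locks almost instantly was already queued; one that takes close to a frame period was produced while the caller waited, which is what marks the live edge.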
b1a2249f36 fix(capture): align A/V at record start (kill leading silence + length drift)
Root cause of 'silent first ~1s then clean' + ~0.5% audio-too-long: in standby
the bridge keeps filling the audio FIFO while the idle-preview consumes only
video, so when recording starts ffmpeg reads a ~0.5s backlog of stale audio,
AND the video-only pre-roll discards video frames the audio never had.

Fix: (1) skip the video-only pre-roll in standby (warm slot = no unstable
frames), (2) drain the audio FIFO non-blocking immediately before ffmpeg opens
it, so audio starts at the live edge aligned with the first real video frame.
2026-06-04 04:49:53 +00:00
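
Step (2) of b1a2249f36, the non-blocking drain, might look roughly like this. It is a minimal sketch assuming a POSIX FIFO or pipe; the function name and chunk size are illustrative and not taken from capture-manager.

```python
import os


def drain_fd_nonblocking(fd: int, chunk: int = 65536) -> int:
    """Read and discard whatever is buffered on fd right now.

    Returns the number of bytes dropped. The descriptor is left in
    non-blocking mode.
    """
    os.set_blocking(fd, False)
    dropped = 0
    while True:
        try:
            data = os.read(fd, chunk)
        except BlockingIOError:
            break          # buffer is empty and a writer is still attached
        if not data:
            break          # EOF: no writer attached
        dropped += len(data)
    return dropped
```

For a named FIFO the descriptor could come from os.open(path, os.O_RDONLY | os.O_NONBLOCK), closed again after the drain so the next reader starts on fresh data.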
fffb6b63b5 fix(capture): revert 16ch audio to clean 2ch — fixes pitch/rate regression
The 16ch interleave in the deltacast bridge produced audio at HALF the correct
sample rate (measured 24224 vs 48000 samples/s/ch), which broke A/V sync and
pitch. Per the working baseline (audio was clean before the channel selector),
revert the bridge audio thread to the original single-group 2ch extraction and
the capture-manager audio input to -ac 2 + wallclock + aresample.

KEPT the good fixes: long-GOP HEVC for non-growing (NVENC realtime, no frame
drops) and GPU-only codec list. 16ch/channel-select is shelved for a separate,
properly-validated change.
2026-06-04 04:33:34 +00:00
b28393eb76 Revert "fix(capture): skip video-only pre-roll in standby to stop A/V pitch drift"
This reverts commit 51b66d882f.
2026-06-04 04:28:11 +00:00
51b66d882f fix(capture): skip video-only pre-roll in standby to stop A/V pitch drift
The pre-roll drained only the video pipe (fc_pipe) while the audio FIFO kept
buffering, so ffmpeg read ~PRE_ROLL_SECONDS of surplus pre-roll audio — making
audio longer than video, which when synced compresses audio ~0.5% (pitch-up,
measured: 2591573 audio samples vs 2579395 expected for the video duration).

In standby the framecache slot is already warm (no unstable startup frames), so
the drain is unnecessary; skipping it lets ffmpeg open video and audio together
from the same instant. Cold on-demand spawns keep the brief drain.
2026-06-04 04:24:08 +00:00
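
The ~0.5% figure in 51b66d882f follows directly from the two sample counts it quotes. A small check (the helper name is illustrative):

```python
def length_drift_pct(actual_samples: int, expected_samples: int) -> float:
    """Percentage by which the audio stream is longer than expected."""
    return 100.0 * (actual_samples - expected_samples) / expected_samples


# Sample counts quoted in the commit message.
drift = length_drift_pct(2_591_573, 2_579_395)
print(f"audio is {drift:.2f}% longer than the video")
```

The surplus is 12178 samples, or about 0.47%, consistent with the "~0.5%" in the message.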
07eea02109 fix(capture): restore audio wallclock (throughput) + remove CPU codec options
- restore -use_wallclock_as_timestamps on audio input: without it ffmpeg's raw
  s16le reader stalled the graph (NVENC idle at 9%, ~half frames dropped). With
  it + long-GOP HEVC the encoder runs realtime and A/V length stays locked.
- remove all CPU codec options (prores*, dnxh*, libx264/265) from recorder UI;
  GPU NVENC only (hevc_nvenc / h264_nvenc). 3x L4 cluster, no reason for CPU.
- GPU codec defaults in env builders + proxy default h264_nvenc.
2026-06-04 04:14:59 +00:00
0ea22e1e53 fix(capture): gate all-intra HEVC on growing-files; normal record uses long-GOP
The hevc_nvenc codec was hardcoded to all-intra (-force_key_frames expr:1), which
is ~4x the NVENC load. Applied to every recording it exceeded the L4's realtime
budget at 1080p59.94 10-bit -> fc_pipe dropped ~half the frames -> video came out
shorter than the (correct) audio -> A/V drift + pitch-up on playback.

Now all-intra is used ONLY when growing-files is on (where it's required for the
editable head). Normal recordings use efficient long-GOP HEVC (2s GOP, 2 B-frames)
which NVENC sustains in realtime with zero drops.
2026-06-04 04:09:14 +00:00
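
The gating in 0ea22e1e53 amounts to choosing between two ffmpeg argument sets. The sketch below is an assumption about how that might be expressed, not the capture-manager code: the function name is hypothetical, -force_key_frames expr:1 comes from the commit message, and -g / -bf are the standard ffmpeg options for GOP length and B-frame count that a "2s GOP, 2 B-frames" setting would plausibly map to.

```python
from fractions import Fraction


def hevc_nvenc_args(growing_files: bool,
                    fps: Fraction = Fraction(60000, 1001)) -> list:
    """Return hevc_nvenc arguments for a master recording."""
    args = ["-c:v", "hevc_nvenc"]
    if growing_files:
        # All-intra: every frame is a keyframe (needed for the editable head).
        args += ["-force_key_frames", "expr:1"]
    else:
        # Long-GOP: 2-second GOP (120 frames at 59.94 fps), 2 B-frames.
        gop = round(2 * fps)
        args += ["-g", str(gop), "-bf", "2"]
    return args
```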
8e5405c3f9 fix(capture): derive deltacast audio PTS from sample count, not wall-clock
Remove -use_wallclock_as_timestamps from the SDI audio input. The bridge writes
SDI-clock-paced samples, so PTS from the 48kHz sample count shares the video's
clock domain and the audio length tracks the video length exactly. Wall-clock
timestamps made audio length = real elapsed time, which drifted ~1% longer than
the frame-count video when the encoder dipped under realtime (pitch-up).
2026-06-04 04:01:54 +00:00
51f939b1fe fix(deltacast-bridge): use group-0 sample count as authoritative audio length
Taking the MAX sample count across the 4 audio groups could emit more audio
frames per slot than group 0 (the SDI-clock reference), drifting the audio
stream slightly longer than video — heard as a ~1% pitch-up. Group 0 paces the
timeline exactly as the original 2ch path did; shorter groups are silence-padded
to its length, never extending it.
2026-06-04 04:01:25 +00:00
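
The rule in 51f939b1fe (group 0 sets the length, shorter groups are padded) can be illustrated as follows. This is a simplified model, not the bridge's C code: each group is a list of 4-sample frames, and dropping samples from a group that is longer than group 0 is an assumption read from "never extending it".

```python
CHANNELS_PER_GROUP = 4            # an SDI audio group carries 4 channels
SILENCE = (0,) * CHANNELS_PER_GROUP


def interleave_to_group0(groups):
    """Interleave audio groups into one stream, paced by group 0.

    groups: list of groups; each group is a list of 4-sample tuples.
    The output holds exactly len(groups[0]) frames. Shorter groups are
    padded with silence; extra frames in longer groups are ignored.
    """
    n_frames = len(groups[0])
    out = []
    for i in range(n_frames):
        for group in groups:
            out.extend(group[i] if i < len(group) else SILENCE)
    return out
```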
095306d9cf feat(recorders): 16ch SDI audio capture + per-recorder channel select + menu redesign
Audio:
- deltacast-bridge: always extract all 4 SDI audio groups (16ch), interleave to
  one 16ch s16le stream per port FIFO; format JSON reports audio_channels:16
- capture-manager: declare FIFO as 16ch input; keep first N discrete channels
  (2/8/16) via pan channelmap on the master (no downmix); HLS preview stays
  stereo. effAudioChannels drives -ac on the master container.
- config modal: Audio channels select (2/8/16)
- channel count already flows mam-api->node-agent->capture via RECORDING_AUDIO_CHANNELS

UI redesign (production craft):
- recorders grouped into per-node hardware 'rack' cards (online/offline state)
- lifecycle accent rail: grey DISABLED / green ENABLED / pulsing-red RECORDING
- promoted capture-port chip, monospaced metadata, Enable as primary CTA
- dedicated recorder CSS block; built on existing design tokens
2026-06-04 03:34:41 +00:00
de509c66ab feat(recorders): hardware-identity model with Enable/Disable lifecycle
Recorders are now physical capture ports, not user-created rows:
- migration 036: label, enabled, auto_provisioned + UNIQUE(node_id,device_index)
  (the structural fix that makes two recorders sharing a port impossible)
- mam-api: auto-provision one recorder row per port from heartbeat capabilities
  (reconcileRecordersForNode); create-once, never overwrites operator config
- mam-api: POST /:id/enable + /:id/disable (provision/teardown standby sidecar);
  PATCH accepts label; config persists across enable/disable
- node-agent: freeCapturePort() force-removes any container on a capture port
  before standby/start — eliminates the EADDRINUSE collisions
- web-ui: recorder menu grouped by node (online/offline), Enable/Disable toggle,
  per-recorder config modal (codec/bitrate/growing/label/project), friendly
  label over hardware name, no destructive delete

Fixes the delete/recreate churn that orphaned standby sidecars and collided on
capture ports during this session's outage.
2026-06-04 03:14:43 +00:00
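
The UNIQUE(node_id,device_index) constraint and the create-once reconcile from de509c66ab can be demonstrated with an in-memory SQLite table. The schema and the reconcile_recorders helper below are hypothetical stand-ins (only the column names label, enabled, auto_provisioned and the constraint come from the commit); the real migration and reconcileRecordersForNode may differ, and the production database may not be SQLite.

```python
import sqlite3

SCHEMA = """
CREATE TABLE recorders (
    id               INTEGER PRIMARY KEY,
    node_id          TEXT    NOT NULL,
    device_index     INTEGER NOT NULL,
    label            TEXT,
    enabled          INTEGER NOT NULL DEFAULT 0,
    auto_provisioned INTEGER NOT NULL DEFAULT 0,
    UNIQUE (node_id, device_index)
)
"""


def reconcile_recorders(db: sqlite3.Connection, node_id: str, port_count: int) -> None:
    """Ensure one recorder row per capture port; never touch existing rows."""
    for device_index in range(port_count):
        db.execute(
            "INSERT OR IGNORE INTO recorders "
            "(node_id, device_index, label, auto_provisioned) VALUES (?, ?, ?, 1)",
            (node_id, device_index, f"{node_id} port {device_index}"),
        )
    db.commit()
```

Because the insert is ignored when the (node_id, device_index) pair already exists, repeated heartbeats leave operator-edited labels and settings alone, and a second recorder on the same port cannot be created.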
9f2eac7b61 merge: capture cleanup + standby reconcile helper (base for recorder redesign) 2026-06-04 03:05:06 +00:00
bf4632b911 feat(mam-api): extract ensureStandbySidecar + add POST /recorders/reconcile-standby
Re-provisions the persistent standby sidecar for SDI/deltacast recorders that
lost theirs (manual cleanup, node redeploy, wiped /dev/shm). Without this the
recorder falls back to slow on-demand spawn on /start, which can collide on the
capture port (EADDRINUSE). Idempotent; { force:true } recreates even when a
container_id is already set.
2026-06-04 03:05:00 +00:00
5668c03615 chore(capture): remove stale legacy FIFO path + pin capture profile
- capture-manager: remove dead legacy deltacast FIFO video path (FC_SLOT_ID
  is now always set by node-agent, framecache mandatory on all SDI nodes)
- node-agent: correct stale comment about legacy FIFO fallback
- onboard-node.sh: harden detect_sdi (device-node checks, not just lspci) and
  persist COMPOSE_PROFILES so framecache survives every redeploy on SDI nodes
- remove committed capture.js.bak

Root cause of this session's outage: zampp3 came up without the capture
compose profile, so framecache never started; the bridge published to shm
with no consumer and recorders showed 'receiving' with no real capture.
2026-06-04 02:50:57 +00:00
Wild Dragon Dev
4045e30cd2 fix(node-agent): make http server handler async 2026-06-04 01:54:38 +00:00
Wild Dragon Dev
df6ca084ff feat(web-ui): add Node column to Containers screen + integrated log viewer 2026-06-04 01:48:44 +00:00
Wild Dragon Dev
2f13c8d8b1 feat(mam-api): aggregate containers from all nodes + proxy logs 2026-06-04 01:42:13 +00:00
Wild Dragon Dev
a90adb5b52 feat(node-agent): add /containers and /sidecar/:id/logs endpoints 2026-06-04 01:40:44 +00:00
Wild Dragon Dev
8efcf5c545 feat(capture): remove build-with-decklink.sh script 2026-06-04 01:27:41 +00:00
Wild Dragon Dev
e5abbede43 debug(fc_writer): add trace logs for GET slots path 2026-06-04 01:13:19 +00:00
Wild Dragon Dev
cc489f7774 fix(fc_writer): handle 409 Conflict by fetching existing slot details via GET 2026-06-04 01:12:06 +00:00
Wild Dragon Dev
5b72ee167d fix(decklink-bridge): prevent redundant fc_writer_open loops via last_format tracking 2026-06-04 01:10:47 +00:00
Wild Dragon Dev
d957ce74ae fix(decklink-bridge): avoid redundant fc_writer_open calls in reopen_slot 2026-06-04 01:09:08 +00:00
Wild Dragon Dev
58c058b10c fix(framecache): bind port 7435 to 0.0.0.0 so remote bridges can register slots 2026-06-04 01:00:54 +00:00
Wild Dragon Dev
e715af158d fix(node-agent): pass FRAMECACHE_IP to node-agent env 2026-06-04 00:58:51 +00:00
Wild Dragon Dev
21ba7595b3 fix(node-agent): await async cleanup + fix syntax 2026-06-04 00:57:22 +00:00
Wild Dragon Dev
315b31a68b fix(node-agent): await stopDecklinkBridge and clean up stale occurrences 2026-06-04 00:54:29 +00:00
Wild Dragon Dev
d1b40f5303 fix(node-agent): pass correct FC_URL and Cmd to containerized decklink-bridge 2026-06-04 00:51:14 +00:00
Wild Dragon Dev
6ee8dd5694 feat(node-agent): containerized decklink-bridge + async bridge management 2026-06-04 00:46:19 +00:00
Wild Dragon Dev
8ca7c79acd fix(node-agent): mount decklink-bridge wrapper script as file (not dir) 2026-06-04 00:43:19 +00:00
Wild Dragon Dev
fb0ce320a5 build(node-agent): mount host /usr/local/bin to expose decklink-bridge wrapper 2026-06-04 00:42:31 +00:00
Wild Dragon Dev
6481760dff revert(capture): Dockerfile copy paths to root-relative for compose build 2026-06-04 00:39:24 +00:00
Wild Dragon Dev
650a100d17 build(capture): include decklink-bridge in runtime image 2026-06-04 00:37:49 +00:00
Wild Dragon Dev
400cb786ab fix(decklink-bridge): use IDeckLinkVideoBuffer QueryInterface to get raw bytes 2026-06-04 00:35:16 +00:00
Wild Dragon Dev
74055e79f8 fix(decklink-bridge): use GetFrameInternalBufferBytes instead of GetBytes 2026-06-04 00:28:19 +00:00
Wild Dragon Dev
a5aed86349 fix(recorders): kill stale standby container before on-demand respawn to prevent EADDRINUSE 2026-06-03 23:04:17 +00:00
Wild Dragon Dev
a096226072 fix(capture): remove -use_wallclock_as_timestamps from framecache video input
The framecache ring delivers frame-accurate frames at exactly the SDI clock
rate. -use_wallclock_as_timestamps was wrong for this source — it stamped
frames by ffmpeg arrival time rather than capture time, causing the recorded
file to report wrong framerates (e.g. 56.06 instead of 59.94) and a
glitchy first second at startup (NVENC cold-start backlog bunched timestamps).

Fix: remove -use_wallclock_as_timestamps from the rawvideo (pipe:0) input
and rely on -framerate for correct CFR timestamps from frame 0.
Audio keeps its FIFO wallclock; aresample=async=1 on the master output
resamples audio to align with the CFR video PTS.
2026-06-03 22:30:03 +00:00
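
Why -framerate gives exact timestamps in a096226072: with constant frame rate, each PTS is derived from the frame index alone, so arrival jitter cannot leak into the file. A small illustration, assuming the usual 60000/1001 rational for 59.94 fps (the helper name is illustrative):

```python
from fractions import Fraction

FPS_59_94 = Fraction(60000, 1001)


def cfr_pts_seconds(frame_index: int, fps: Fraction = FPS_59_94) -> Fraction:
    """Presentation time of a frame under constant frame rate."""
    return frame_index / fps


# Frame spacing is always 1001/60000 s, regardless of when ffmpeg read the frame.
spacing = cfr_pts_seconds(1) - cfr_pts_seconds(0)
```

Wall-clock stamping instead records when ffmpeg received each frame, so a startup backlog that arrives in a burst produces bunched timestamps and a wrong reported frame rate.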
Wild Dragon Dev
7631527f46 fix(capture): add auth header to finalize call in POST /capture/stop 2026-06-03 22:11:15 +00:00
Wild Dragon Dev
a22bda44a7 fix(recorders): set PRE_ROLL_SECONDS=1 for sdi/deltacast/blackmagic sidecars 2026-06-03 22:07:16 +00:00
Wild Dragon Dev
3d4880d944 fix(capture): reduce pre-roll to 1s in standby mode (slot already warm) 2026-06-03 22:05:11 +00:00
Wild Dragon Dev
ef57900583 feat(recorders): always-on standby sidecars for deltacast, sdi, blackmagic
Sidecars now spawn at recorder CREATE time instead of /start time.
The container boots in STANDBY=1 mode (idle preview only, no ffmpeg master).
On /start, mam-api sends per-session params (CLIP_NAME, ASSET_ID, PROJECT_ID)
to the running sidecar via HTTP POST /capture/start — ffmpeg starts in <1s.
On /stop, mam-api calls HTTP POST /capture/stop — container stays alive in
standby, ready for the next take immediately.
Container is only killed on recorder DELETE.

This eliminates: Docker create/start overhead (~1-2s), bridge startup (~2-5s),
and pre-roll wait (~5s). Latency from 'record' click to first encoded frame
drops from ~10s to ~1s.

Changes:
- capture/src/index.js: boot in standby when STANDBY=1 env is set; still
  start idle preview (live thumbnail visible before recording)
- capture/src/routes/capture.js: POST /start accepts full codec params and
  asset_id in body (skips mam-api asset creation when asset_id provided)
- node-agent/index.js: handleSidecarStandby() + POST /sidecar/standby route;
  warms bridge at recorder create time
- recorders.js POST /: spawn standby sidecar after DB insert (non-fatal)
- recorders.js POST /:id/start: HTTP fast-path to standby sidecar; falls
  back to on-demand spawn if standby not available
- recorders.js POST /:id/stop: HTTP /capture/stop, keep container in standby
- recorders.js GET /:id/status: use port-based URL for local capture status
2026-06-03 21:59:33 +00:00
Wild Dragon Dev
7172447644 fix(capture): remove leftover localMasterPath from session state 2026-06-03 21:42:35 +00:00
Wild Dragon Dev
37b325e1d8 fix(capture): restore direct-to-S3 streaming (pipe:1 + fragmented MOV)
Reverts the local-temp+faststart approach from 549ca6c. Masters now stream
ffmpeg stdout directly to S3 via multipart upload — no local disk consumed
on the worker. Uses +frag_keyframe+empty_moov+default_base_moof which
Premiere Pro 25.x handles natively (to be confirmed separately).

Zero /tmp/capture files. Worker disk stays flat during recording.
2026-06-03 21:40:58 +00:00
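
Streaming ffmpeg stdout to S3 as in 37b325e1d8 needs the byte stream cut into multipart parts. The generator below is a sketch of that chunking step only; the name iter_parts and the 8 MiB default are assumptions, and the actual upload calls are left out.

```python
from typing import BinaryIO, Iterator


def iter_parts(stream: BinaryIO,
               part_size: int = 8 * 1024 * 1024,
               read_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield fixed-size parts from a stream; the final part may be shorter.

    S3 multipart upload requires every part except the last to be at
    least 5 MiB, so part_size should not go below that in real use.
    """
    buf = bytearray()
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break                          # encoder closed its stdout
        buf += chunk
        while len(buf) >= part_size:
            yield bytes(buf[:part_size])
            del buf[:part_size]
    if buf:
        yield bytes(buf)                   # trailing short part
```

Only one part is buffered in memory at a time, which matches the goal of keeping worker disk flat during recording.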
Wild Dragon Dev
dc66833247 fix: declare all slot functions in slot.h to prevent 64-bit pointer truncation
fc_slot_create, fc_slot_destroy, fc_slot_open, fc_slot_close, and
fc_slot_write_frame were defined in slot.c but never declared in slot.h.
Any translation unit calling them without seeing a proper prototype
would fall back to implicit int return (32 bits), truncating 64-bit
pointers and causing SIGSEGV on dereference.

This affected framecache.c (POST /slots → fc_slot_create, DELETE
→ fc_slot_destroy) and other callers.
2026-06-03 20:16:35 +00:00
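
The truncation described in dc66833247 can be modelled numerically. The sketch below mimics what typically happens on x86-64 when a pointer-returning function is called through an implicit int declaration (formally the behaviour is undefined): only the low 32 bits survive, and they are sign-extended when converted back to a pointer. The function name and example addresses are illustrative.

```python
def through_implicit_int(ptr: int) -> int:
    """Model a 64-bit pointer passing through a 32-bit signed int return."""
    low = ptr & 0xFFFF_FFFF              # only the low 32 bits are kept
    if low & 0x8000_0000:
        low -= 1 << 32                   # interpret as signed int
    return low & 0xFFFF_FFFF_FFFF_FFFF   # sign-extend back to pointer width


original = 0x7F12_3456_7890              # a plausible heap/mmap address
mangled = through_implicit_int(original) # 0x34567890: points somewhere else entirely
```

Dereferencing the mangled value is what produces the SIGSEGV; declaring the prototypes in slot.h makes the compiler use the full pointer return type.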
Wild Dragon Dev
2198199a9f fix: inline accessors in slot.h now that struct fc_slot is a complete type 2026-06-03 20:11:14 +00:00
Wild Dragon Dev
f318e9c501 fix: move struct fc_slot definition to slot.h and declare accessors to fix 64-bit pointer truncation
The struct fc_slot was defined only in slot.c, making it an incomplete type
in slot.h. The inline accessor functions (fc_slot_id, fc_slot_header, etc.)
in slot.h could not compile because they referenced incomplete struct
members. The compiler fell back to implicit int return type, truncating
64-bit pointers to 32 bits, causing SIGSEGV in registry_add() when
strncpy received a truncated slot_id pointer.

Fix: move the struct definition to slot.h and add proper function
declarations for the accessors (definitions stay in slot.c).
2026-06-03 20:10:31 +00:00
Wild Dragon Dev
902d985ca8 framecache: add SIGPIPE ignore, signal logging, and init:true for stable POST handling 2026-06-03 20:05:55 +00:00
Wild Dragon Dev
0ed1254fd9 fix(framecache): bind port 7435 to host loopback so host bridges can register slots 2026-06-03 19:29:04 +00:00
Wild Dragon Dev
b5235e0a2c fix(node-agent): always mount /dev/shm into sidecars for framecache access 2026-06-03 19:02:04 +00:00
Wild Dragon Dev
5686e65df9 fix(docker): mount /dev/shm into capture sidecars for framecache access 2026-06-03 18:55:44 +00:00