Root cause of 'silent first ~1s then clean' + ~0.5% audio-too-long: in standby
the bridge keeps filling the audio FIFO while the idle-preview consumes only
video, so when recording starts ffmpeg reads a ~0.5s backlog of stale audio,
AND the video-only pre-roll discards video frames the audio never had.
Fix: (1) skip the video-only pre-roll in standby (warm slot = no unstable
frames), (2) drain the audio FIFO non-blocking immediately before ffmpeg opens
it, so audio starts at the live edge aligned with the first real video frame.
The 16ch interleave in the deltacast bridge produced audio at HALF the correct
sample rate (measured 24224 vs 48000 samples/s/ch), which broke A/V sync and
pitch. Per the working baseline (audio was clean before the channel selector),
revert the bridge audio thread to the original single-group 2ch extraction and
the capture-manager audio input to -ac 2 + wallclock + aresample.
KEPT the good fixes: long-GOP HEVC for non-growing (NVENC realtime, no frame
drops) and GPU-only codec list. 16ch/channel-select is shelved for a separate,
properly-validated change.
The pre-roll drained only the video pipe (fc_pipe) while the audio FIFO kept
buffering, so ffmpeg read ~PRE_ROLL_SECONDS of surplus pre-roll audio — making
audio longer than video, which when synced compresses audio ~0.5% (pitch-up,
measured: 2591573 audio samples vs 2579395 expected for the video duration).
In standby the framecache slot is already warm (no unstable startup frames), so
the drain is unnecessary; skipping it lets ffmpeg open video and audio together
from the same instant. Cold on-demand spawns keep the brief drain.
- restore -use_wallclock_as_timestamps on audio input: without it ffmpeg's raw
s16le reader stalled the graph (NVENC idle at 9%, ~half frames dropped). With
it + long-GOP HEVC the encoder runs realtime and A/V length stays locked.
- remove all CPU codec options (prores*, dnxh*, libx264/265) from recorder UI;
GPU NVENC only (hevc_nvenc / h264_nvenc). 3x L4 cluster, no reason for CPU.
- GPU codec defaults in env builders + proxy default h264_nvenc.
The hevc_nvenc codec was hardcoded to all-intra (-force_key_frames expr:1), which
is ~4x the NVENC load. Applied to every recording it exceeded the L4's realtime
budget at 1080p59.94 10-bit -> fc_pipe dropped ~half the frames -> video came out
shorter than the (correct) audio -> A/V drift + pitch-up on playback.
Now all-intra is used ONLY when growing-files is on (where it's required for the
editable head). Normal recordings use efficient long-GOP HEVC (2s GOP, 2 B-frames)
which NVENC sustains in realtime with zero drops.
Removing -use_wallclock_as_timestamps on the SDI audio input. The bridge writes
SDI-clock-paced samples, so PTS from the 48kHz sample count shares the video's
clock domain and the audio length tracks the video length exactly. Wall-clock
timestamps made audio length = real elapsed time, which drifted ~1% longer than
the frame-count video when the encoder dipped under realtime (pitch-up).
Taking the MAX sample count across the 4 audio groups could emit more audio
frames per slot than group 0 (the SDI-clock reference), drifting the audio
stream slightly longer than video — heard as a ~1% pitch-up. Group 0 paces the
timeline exactly as the original 2ch path did; shorter groups are silence-padded
to its length, never extending it.
Recorders are now physical capture ports, not user-created rows:
- migration 036: label, enabled, auto_provisioned + UNIQUE(node_id,device_index)
(the structural fix that makes two recorders sharing a port impossible)
- mam-api: auto-provision one recorder row per port from heartbeat capabilities
(reconcileRecordersForNode); create-once, never overwrites operator config
- mam-api: POST /:id/enable + /:id/disable (provision/teardown standby sidecar);
PATCH accepts label; config persists across enable/disable
- node-agent: freeCapturePort() force-removes any container on a capture port
before standby/start — eliminates the EADDRINUSE collisions
- web-ui: recorder menu grouped by node (online/offline), Enable/Disable toggle,
per-recorder config modal (codec/bitrate/growing/label/project), friendly
label over hardware name, no destructive delete
Fixes the delete/recreate churn that orphaned standby sidecars and collided on
capture ports during this session's outage.
Re-provisions the persistent standby sidecar for SDI/deltacast recorders that
lost theirs (manual cleanup, node redeploy, wiped /dev/shm). Without this the
recorder falls back to slow on-demand spawn on /start, which can collide on the
capture port (EADDRINUSE). Idempotent; { force:true } recreates even when a
container_id is already set.
- capture-manager: remove dead legacy deltacast FIFO video path (FC_SLOT_ID
is now always set by node-agent, framecache mandatory on all SDI nodes)
- node-agent: correct stale comment about legacy FIFO fallback
- onboard-node.sh: harden detect_sdi (device-node checks, not just lspci) and
persist COMPOSE_PROFILES so framecache survives every redeploy on SDI nodes
- remove committed capture.js.bak
Root cause of this session's outage: zampp3 came up without the capture
compose profile, so framecache never started; the bridge published to shm
with no consumer and recorders showed 'receiving' with no real capture.
The framecache ring delivers frame-accurate frames at exactly the SDI clock
rate. -use_wallclock_as_timestamps was wrong for this source — it stamped
frames by ffmpeg arrival time rather than capture time, causing the recorded
file to report wrong framerates (e.g. 56.06 instead of 59.94) and a
glitchy first second at startup (NVENC cold-start backlog bunched timestamps).
Fix: remove -use_wallclock_as_timestamps from the rawvideo (pipe:0) input
and rely on -framerate for correct CFR timestamps from frame 0.
Audio keeps its FIFO wallclock; aresample=async=1 on the master output
resamples audio to align with the CFR video PTS.
Sidecars now spawn at recorder CREATE time instead of /start time.
The container boots in STANDBY=1 mode (idle preview only, no ffmpeg master).
On /start, mam-api sends per-session params (CLIP_NAME, ASSET_ID, PROJECT_ID)
to the running sidecar via HTTP POST /capture/start — ffmpeg starts in <1s.
On /stop, mam-api calls HTTP POST /capture/stop — container stays alive in
standby, ready for the next take immediately.
Container is only killed on recorder DELETE.
This eliminates: Docker create/start overhead (~1-2s), bridge startup (~2-5s),
and pre-roll wait (~5s). Latency from 'record' click to first encoded frame
drops from ~10s to ~1s.
Changes:
- capture/src/index.js: boot in standby when STANDBY=1 env is set; still
start idle preview (live thumbnail visible before recording)
- capture/src/routes/capture.js: POST /start accepts full codec params and
asset_id in body (skips mam-api asset creation when asset_id provided)
- node-agent/index.js: handleSidecarStandby() + POST /sidecar/standby route;
warms bridge at recorder create time
- recorders.js POST /: spawn standby sidecar after DB insert (non-fatal)
- recorders.js POST /:id/start: HTTP fast-path to standby sidecar; falls
back to on-demand spawn if standby not available
- recorders.js POST /:id/stop: HTTP /capture/stop, keep container in standby
- recorders.js GET /:id/status: use port-based URL for local capture status
Reverts the local-temp+faststart approach from 549ca6c. Masters now stream
ffmpeg stdout directly to S3 via multipart upload — no local disk consumed
on the worker. Uses +frag_keyframe+empty_moov+default_base_moof which
Premiere Pro 25.x handles natively (to be confirmed separately).
Zero /tmp/capture files. Worker disk stays flat during recording.
fc_slot_create, fc_slot_destroy, fc_slot_open, fc_slot_close, and
fc_slot_write_frame were defined in slot.c but never declared in slot.h.
Any translation unit calling them without seeing a proper prototype
would fall back to implicit int return (32 bits), truncating 64-bit
pointers and causing SIGSEGV on dereference.
This affected framecache.c (POST /slots → fc_slot_create, DELETE
→ fc_slot_destroy) and other callers.
The struct fc_slot was defined only in slot.c, making it an incomplete type
in slot.h. The inline accessor functions (fc_slot_id, fc_slot_header, etc.)
in slot.h could not compile because they referenced incomplete struct
members. The compiler fell back to implicit int return type, truncating
64-bit pointers to 32 bits, causing SIGSEGV in registry_add() when
strncpy received a truncated slot_id pointer.
Fix: move the struct definition to slot.h and add proper function
declarations for the accessors (definitions stay in slot.c).
- services/capture/src/capture-manager.js:
- Added 5s pre-roll delay for Deltacast, Blackmagic, and SDI capture paths.
- Activates when is present.
- Spawns the process immediately, drains/discards the unstable frames for 5 seconds, and then pipes the stdout of the to the actual process.
- Keeps the master output perfectly clean from frame 0.
- services/web-ui/public/screens-ingest.jsx:
- Removed currentFps framerate display from both recording and idle status states.
The scheduler tick loop updates a schedule's status to 'starting' and 'stopping'
in the database while it initiates the API calls to the recorder container. The
original CHECK constraint in recorder_schedules rejected these two statuses,
causing the scheduler to crash on constraint violation and never start the job.
- services/framecache/src/net_ingest.c: new network ingest process
- Spawns ffmpeg to decode SRT/RTMP → raw UYVY422 stdout
- Reads decoded frames and writes into framecache slot via shm
- Registers slot with framecache HTTP API on startup
- Deregisters slot on clean exit (SIGTERM)
- Reconnect loop for listener mode (stays alive between sessions)
- --url, --slot-id, --fc-url, --width, --height, --fps-num/den,
--source-type, --listen, --listen-port, --stream-key args
- Emits format JSON to stderr on first frame
- services/framecache/CMakeLists.txt: add net_ingest target
- services/framecache/Dockerfile: copy net_ingest to runtime image
- services/node-agent/index.js:
- startNetIngest() / stopNetIngest(): lifecycle management per recorder
- Spawns net_ingest before sidecar start for srt/rtmp sourceTypes
- Injects FC_SLOT_ID=net-<containerId> into sidecar env
- Sets IpcMode=host for network sidecars using framecache
- Maps temp id → real containerId after container create
- stopNetIngest() called on sidecar stop
- NET_INGEST_BIN env var (default: docker exec framecache net_ingest)
- services/capture/src/capture-manager.js:
- _buildInputArgs srt/rtmp: framecache path when FC_SLOT_ID set
(spawns fc_pipe, uses pipe:0 rawvideo input — same as SDI path)
- Falls back to direct URL when FC_SLOT_ID not set (legacy path)
- audioMap: network via framecache uses '0🅰️0?' (video-only fc_pipe,
no audio FIFO — audio-in-shm is roadmap)
- HLS tee: sdiHlsDir covers network-via-framecache; legacy tee gated
on !FC_SLOT_ID to avoid duplicate HLS outputs
- fc_pipe piped to ffmpeg stdin for network framecache path
- docker-compose.worker.yml: FC_URL + NET_INGEST_BIN in node-agent env
- fc_writer.h/fc_writer.c: new framecache slot writer module
- Registers slot via POST /slots to framecache HTTP API on signal lock
- Opens shm file returned by API (O_RDWR + mmap MAP_SHARED)
- fc_writer_write(): atomic write_cursor advance + sem_post per frame
- fc_writer_close(): DELETE /slots/:id + munmap + sem_close
- HTTP calls via raw POSIX sockets (no libcurl dependency)
- Parses host:port from FC_URL env var or --fc-url arg
- main.c changes:
- PortState gains slot_id, fc_url, fc_writer fields
- --fc-url CLI arg + FC_URL env var (default http://localhost:7435)
- On signal lock: fc_writer_open() before thread launch;
falls back to FIFO if framecache unreachable (fc_writer == NULL)
- video_thread: shm path primary (fc_writer != NULL),
FIFO path fallback (fc_writer == NULL or LEGACY_FIFO=1)
- Format JSON now includes slot_id field for node-agent consumption
- Cleanup: fc_writer_close() before VHD_CloseBoardHandle
- CMakeLists.txt:
- Add fc_writer.c to build
- Link rt (shm_open, sem_open)
- LEGACY_FIFO option (OFF by default) for nodes without framecache
Audio thread unchanged — audio stays in FIFO (shm audio is roadmap).
- services/framecache/: new standalone container
- slot.h/slot.c: shm ring buffer (120 frames, FC_MAGIC header, atomic
write_cursor, POSIX semaphore per slot)
- registry.h/registry.c: in-memory slot registry + /dev/shm/framecache/
registry.json persistence
- framecache.c: HTTP API server (libmicrohttpd, port 7435)
POST /slots, GET /slots, GET /slots/:id, DELETE /slots/:id, GET /health
- fc_client.h/fc_client.c: consumer library — fc_consumer_open/read/close
with per-consumer cursor, timeout via sem_timedwait, automatic skip+count
when consumer falls behind writer by > ring_depth frames
- fc_test_consumer.c: dev utility to attach to any slot and print fps/stats
- CMakeLists.txt: framecache server + fc_client static lib + test consumer
- Dockerfile: builder + slim runtime stages
- docker-compose.worker.yml: add framecache service (profile: capture,
ipc: host, shm_size from FC_SHM_SIZE_GB env var, healthcheck)
- .env.example: document FC_SHM_SIZE_GB with per-node guidance