When a capture sidecar stopped or restarted, the bridge video thread got EPIPE
on the FIFO write and set g_port_stop[port]=1. The port then stayed dead until
a full bridge restart, and subsequent record attempts on that port would hang
in 'connecting' forever.
Fix: mirror the audio thread pattern. On EPIPE, close the FIFO and loop back
to a blocking open() that waits for the next reader. Hardware lock errors
(SDK failures) still stop the port via g_port_stop as before. Only
reader-disconnect (EPIPE) now recovers gracefully.
This was the cause of port 6 (Ghost) failure in the burn test.
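A minimal sketch of the write-side classification described above, not the
actual main.c code. The names write_result, write_frame and ignore_sigpipe
are hypothetical. The idea is that the video thread distinguishes "reader
went away" (EPIPE) from any other write failure, so only the latter stops
the port.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result codes for one FIFO write attempt. */
typedef enum { WRITE_OK, WRITE_READER_GONE, WRITE_FATAL } write_result;

/* Ignore SIGPIPE so a vanished reader surfaces as EPIPE from write(). */
static void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Write the whole buffer, retrying on short writes and EINTR. */
static write_result write_frame(int fd, const unsigned char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n > 0) {
            off += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                 /* interrupted, try again */
        if (n < 0 && errno == EPIPE)
            return WRITE_READER_GONE; /* reader closed its end */
        return WRITE_FATAL;           /* anything else is a real error */
    }
    return WRITE_OK;
}
```

On WRITE_READER_GONE the caller would close the fd and go back to
open(path, O_WRONLY), which blocks until the next reader attaches. On
WRITE_FATAL it would set g_port_stop[port] as before.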
🤖 Generated with Claude Code
ROOT CAUSE of 'connecting' hangs and intermittent port failures:
The DELTA-12G-e-h 8c is a bidirectional card. Without calling
VHD_SetBiDirCfg(board_index, VHD_BIDIR_80) before streaming, the
board remains in its default bi-dir config (likely 4RX/4TX). Per the
SDK docs, RX stream opens then fail with VHDERR_RESOURCEUNAVAILABLE on
channels configured as TX, which shows up as random 'connecting' hangs.
Per SDK Tools.cpp SetNbChannels() pattern:
1. Open temporary board handle
2. Check IS_BIDIR + channel counts
3. Call VHD_SetBiDirCfg(board_index, VHD_BIDIR_80) for 8ch bidir
4. Close temp handle, then open real board handle for streaming
Also add VHD_SetChannelProperty(VHD_CHANNEL_MODE_SDI) for ASI-type
channels per Sample_RX.cpp — required for 12G-ASI/3G-ASI channel
types to correctly detect incoming video standard.
Root cause A (main.c): audio_thread set the global g_stop flag on EPIPE
(ffmpeg reader died). This killed ALL port threads across the entire bridge
process, which then exited and took all 8 ports with it.
Root cause B (node-agent/index.js): startDeltacastBridge() skipped respawn
when FIFOs existed in /dev/shm/deltacast, even if the bridge process was dead.
The next ffmpeg then opened the audio FIFO read end and blocked forever (no
writer), so any new recording got no audio or video.
Fix A (main.c):
- Add per-port atomic g_port_stop[MAX_PORTS] flags.
- Audio thread: on EPIPE, close the FIFO fd and loop back to reopen it.
The VHD ANC stream stays open across reconnects. Other ports unaffected.
- Video thread: on EPIPE or stream error, set only g_port_stop[port], not
the global g_stop. Other ports keep running.
- MAX_PORTS #define moved before globals so g_port_stop[MAX_PORTS] compiles.
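A minimal sketch of the per-port stop flags from Fix A, assuming C11 atomics
and MAX_PORTS = 8 (the source mentions 8 ports but does not give the actual
value). port_should_run and stop_port are hypothetical helper names, not
functions known to exist in main.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_PORTS 8 /* assumed value; must be defined before the globals */

static atomic_int g_stop;                 /* global: whole bridge shuts down */
static atomic_int g_port_stop[MAX_PORTS]; /* per-port: only that port stops */

/* A port's threads keep running until either flag is raised. */
static bool port_should_run(int port)
{
    return !atomic_load(&g_stop) && !atomic_load(&g_port_stop[port]);
}

/* Stop one port without touching the others. */
static void stop_port(int port)
{
    atomic_store(&g_port_stop[port], 1);
}
```

With this split, an EPIPE or stream error on one port calls stop_port(port)
only, and g_stop stays reserved for process-wide shutdown such as SIGTERM.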
Fix B (node-agent/index.js):
- Add _dcBridgeProcessAlive() — scans /proc/<pid>/cmdline for deltacast-bridge.
- startDeltacastBridge(): if FIFOs exist but no live bridge process is found,
spawn a fresh bridge instead of silently returning. Detects bridges started
externally (e.g. sudo on the host before node-agent started).
Requires: bridge rebuild + restart on zampp3. No capture image rebuild needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The old architecture spawned one deltacast-capture per recorder port; each
called VHD_OpenBoardHandle, triggering a BufMngr.c:781 OOB fault in the
delta_x300 kernel driver whenever two opens raced.
Fix: a single deltacast-bridge daemon opens the board once, opens RX
streams for all requested ports concurrently, and writes each port's
video/audio to named FIFOs (/dev/shm/deltacast/video-<N>.fifo,
/dev/shm/deltacast/audio-<N>.fifo). Capture sidecars read from those
FIFOs directly — no board handle, no race, no flock.
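As an illustration of the FIFO naming scheme above, here is a small sketch
that builds a per-port path and creates the FIFO if it is missing. The
helpers fifo_path and ensure_fifo and the 0660 mode are assumptions for the
example, not taken from main.c.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define FIFO_DIR "/dev/shm/deltacast"

/* Build "<FIFO_DIR>/<kind>-<port>.fifo"; kind must be "video" or "audio".
 * Returns 0 on success, -1 on bad input or if the buffer is too small. */
static int fifo_path(char *out, size_t cap, const char *kind, int port)
{
    if (strcmp(kind, "video") != 0 && strcmp(kind, "audio") != 0)
        return -1;
    if (port < 0)
        return -1;
    int n = snprintf(out, cap, "%s/%s-%d.fifo", FIFO_DIR, kind, port);
    return (n > 0 && (size_t)n < cap) ? 0 : -1;
}

/* Create the FIFO if needed; an already existing path counts as success
 * (this sketch does not verify that the existing path is a FIFO). */
static int ensure_fifo(const char *path)
{
    if (mkfifo(path, 0660) == 0 || errno == EEXIST)
        return 0;
    return -1;
}
```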
Changes:
services/capture/deltacast-bridge/main.c
- Complete rewrite: --ports csv arg, board opened once, one
video+audio thread pair per port, FIFO paths per port, format
JSON emitted per port on signal lock, SIGTERM clean shutdown.
- flock/serialize logic removed (no longer needed).
- --port single-port compat alias retained.
services/capture/deltacast-bridge/CMakeLists.txt
- Rename target deltacast-capture -> deltacast-bridge.
- POST_BUILD symlink deltacast-capture -> deltacast-bridge for compat.
services/capture/src/capture-manager.js
- deltacast _buildInputArgs: remove bridge spawn; wait up to 30s
for FIFOs to exist (bridge may be starting); return rawvideo +
s16le FIFO inputArgs. bridgeProcess=null.
- audioMap: keyed on sourceType instead of bridgeProcess (both
inputs are always present for deltacast).
- Remove readFirstStderrLine helper (no longer needed).
- Remove bridgeProcess.stdout.pipe / processes.bridge stop signal.
services/node-agent/index.js
- Add import spawn for bridge daemon management.
- Add startDeltacastBridge / stopDeltacastBridge: host-process
lifecycle for the shared bridge, ref-counted by sidecar count.
- handleSidecarStart: on deltacast, increment counter + start bridge;
decrement on container create/start failure.
- handleSidecarStop: decrement counter; stop bridge when last sidecar.
- _containerSourceType map tracks containerId->sourceType for stop.
- Old acquireDcLock mutex retained but no longer called.
The signal timeout deadline was set at process start before waiting for
the flock. Bridges queued behind earlier ports waited minutes for the
lock, then found their 30s signal deadline had already expired before
they even opened the board, causing false "no signal" failures on ports
that have live signal.
Fix: move clock_gettime deadline initialisation to AFTER flock acquired
and board opened, so the full sig_timeout is always available for signal
detection regardless of queue wait time.
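A sketch of the deadline handling, assuming CLOCK_MONOTONIC (the source only
says clock_gettime). deadline_in and deadline_expired are hypothetical names.
The point is that the deadline is computed only once the lock is held and the
board is open, so queue time does not eat into the signal timeout.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdbool.h>
#include <time.h>

/* Return an absolute monotonic deadline `seconds` from now. */
static struct timespec deadline_in(int seconds)
{
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    t.tv_sec += seconds;
    return t;
}

/* True once the monotonic clock has reached the deadline. */
static bool deadline_expired(const struct timespec *d)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return now.tv_sec > d->tv_sec ||
           (now.tv_sec == d->tv_sec && now.tv_nsec >= d->tv_nsec);
}

/* Intended ordering (comments only, SDK calls omitted):
 *   1. wait for the flock          <- may take minutes
 *   2. open the board
 *   3. deadline = deadline_in(sig_timeout)
 *   4. poll for signal until deadline_expired(&deadline)
 */
```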
Concurrent VHD_OpenBoardHandle calls from multiple capture sidecars
trigger delta_x300 BufMngr.c:781 array-index-out-of-bounds, wedging all
RX channels until the module is reloaded. The node-agent stagger only
delays container start — the bridge binary starts ~2s later and can still
race. This fix acquires an exclusive flock on /dev/shm/deltacast/bridge.lock
before VHD_OpenBoardHandle and holds it until signal lock succeeds (then
adds a 4s settle before releasing so the board's buffer queues stabilize).
Lock is released on signal failure too so the next bridge is never
permanently blocked. All 8 channels can now start safely by serializing
through the same lock file mounted into every sidecar.
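The serialization described above could look roughly like the following
sketch. bridge_lock_acquire and bridge_lock_release are hypothetical names
and the 0666 mode is an assumption. The board open, signal wait and 4s settle
would happen between the two calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Open (creating if needed) the lock file and block until an exclusive
 * flock is held. Returns the fd holding the lock, or -1 on error. */
static int bridge_lock_acquire(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;
    while (flock(fd, LOCK_EX) != 0) {
        if (errno != EINTR) { /* retry only when interrupted by a signal */
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Release on every path, success or failure, so the next bridge in the
 * queue is never blocked permanently. */
static void bridge_lock_release(int fd)
{
    if (fd >= 0) {
        flock(fd, LOCK_UN);
        close(fd);
    }
}
```

Because flock is tied to the open file description, the kernel also drops the
lock if the holder crashes, which avoids stale locks after a failed bridge.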
ffmpeg opens all inputs before processing; input 1 is the audio FIFO. The
bridge previously opened the FIFO writer only after VHD_OpenStreamHandle +
VHD_StartStream succeeded. On failure or when there was no embedded audio it
returned early and never opened the FIFO, so ffmpeg blocked forever on input 1,
giving 0 fps and an empty HLS preview. Now the FIFO writer is opened first and
unconditionally, and the audio thread feeds a continuous, wall-clock-paced
s16le stereo stream (real samples when available, otherwise silence). SIGPIPE
is ignored so a write to a dead ffmpeg returns EPIPE instead of killing the
bridge.
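A sketch of the wall-clock pacing and silence fill, assuming a 48 kHz sample
rate (not stated in the source) and interleaved s16le stereo. frames_due and
fill_block are hypothetical helpers, not the actual audio thread code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_RATE 48000ULL /* assumed */
#define CHANNELS 2           /* stereo */
#define NSEC_PER_SEC 1000000000ULL

/* How many sample frames should be written now so that the total sent
 * matches the elapsed wall-clock time. Seconds and remainder are scaled
 * separately so the multiplication cannot overflow on long uptimes. */
static uint64_t frames_due(uint64_t elapsed_ns, uint64_t frames_sent)
{
    uint64_t secs = elapsed_ns / NSEC_PER_SEC;
    uint64_t rem = elapsed_ns % NSEC_PER_SEC;
    uint64_t target = secs * SAMPLE_RATE + rem * SAMPLE_RATE / NSEC_PER_SEC;
    return target > frames_sent ? target - frames_sent : 0;
}

/* Fill dst with want_frames stereo frames: real samples first, then
 * silence for whatever the source could not provide. */
static void fill_block(int16_t *dst, size_t want_frames,
                       const int16_t *src, size_t src_frames)
{
    size_t n = src_frames < want_frames ? src_frames : want_frames;
    if (n > 0)
        memcpy(dst, src, n * CHANNELS * sizeof(int16_t));
    memset(dst + n * CHANNELS, 0,
           (want_frames - n) * CHANNELS * sizeof(int16_t));
}
```

Calling fill_block with src_frames = 0 produces pure silence, which is what
keeps ffmpeg's input 1 flowing when the source carries no embedded audio.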
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>