dragonflight

Author	SHA1	Message	Date
ZGaetano	d3e520e3b1	fix(capture+gui): kill audio-drift regression + fix elapsed/signal status A/V REGRESSION (no audio + start stutter): capture-manager.js dropped the -use_wallclock_as_timestamps 1 flag on the audio FIFO input (re-added by `d6b0b3a`). Wallclock stamped audio by arrival time while video is CFR frame-count, so audio ran 3-18% longer and master aresample padded seconds of LEADING SILENCE → silent head, late video start, apparent 'no audio'. Removing it restores the sample-count PTS baseline (8e5405c/55a72af): audio shares the SDI clock domain, no drift, no pad. GUI BUG A (elapsed showed 1hr+ on standby/just-started): frontend seeded elapsed from recorder.started_at = the standby CONTAINER boot time (hours old). Now seeds ONLY from the sidecar session duration (liveStatus.duration when live.recording), shows nothing when idle. Backend /status now returns session-scoped duration + recording flag, not container uptime. GUI BUG B (false 'stopped' signal on idle ports): backend inferred signal from container Running state (running->receiving, down->stopped) — so idle standby ports with down sidecars showed red 'stopped'. Now signal comes from the sidecar session (live.recording); standby = neutral 'idle', never a false 'stopped'/'receiving'.	2026-06-04 13:21:30 +00:00
ZGaetano	727bdaae80	feat(growing): auto-promotion scanner + hours-based delay setting The growing_promote_after_seconds setting was stored but NEVER read — no scanner existed, so growing clips only left the SMB share on a manual right-click 'Move to S3'. This adds the missing automation: - promotion-scanner.js: every 60s, finds pending_migration assets idle (updated_at) longer than settings.growing_promote_after_seconds and enqueues a promotion job. Idempotent (status guard + stable jobId) so it's safe on every promotion worker. 12h default fallback. - worker/index.js: starts the scanner on promotion-capable workers. - Settings UI: the delay field is now 'Auto-promote to S3 after (hours)' (converts hours<->seconds; storage stays seconds). Notes the manual Library right-click 'Move to S3' option too. Manual promotion (right-click Move to S3) and the safe HLS-segment live thumbnail were already implemented and working.	2026-06-04 13:14:03 +00:00
ZGaetano	0c405ae7d4	fix(growing): read GROWING_ENABLED from env at record time + drop dead const Second half of the growing-never-engages bug. start() decided growing via the module-level const GROWING_ENABLED (captured false at standby boot) and referenced the now-removed GROWING_SMB_MOUNT const (ReferenceError, silently swallowed). Both made growingActive=false, so every growing record produced HEVC/S3 instead of XDCAM HD422 MXF. Now reads process.env.GROWING_ENABLED + growingSmbConfig().mount fresh at record start.	2026-06-04 13:02:23 +00:00
ZGaetano	b27b9f6909	fix(s3): keep-alive agents + long timeouts to end socket starvation Root cause of stuck 'processing', failed deletes, and dead playback: The mam-api proxies media (/video, /hls pipe the full S3 body through Express), holding long-lived streaming sockets. With the SDK's default http agents (no keep-alive, unbounded but unpooled) those streams starved control-plane calls — DeleteObject and the proxy worker's master download — which timed out (10s connectionTimeout) in bursts. Fixes: - mam-api S3 client: dedicated keep-alive http/https Agents (maxSockets 256) + requestTimeout raised 30s→300s so large master GETs finish. - worker S3 client: previously had NO handler config at all (SDK defaults). Added keep-alive agents + 600s requestTimeout so proxy/conform master downloads (hundreds of MB) don't stall and leave assets in 'processing'.	2026-06-04 12:53:28 +00:00
ZGaetano	ac1d7e1e1f	fix(growing): read SMB params from env at mount time, not module load Root cause of growing producing .mov instead of XDCAM HD422 .mxf: mountGrowingShare() used module-level consts (GROWING_SMB_MOUNT etc.) captured from process.env at IMPORT time. Standby capture containers boot with these unset and receive the SMB mount/credentials per-session over /capture/start (capture.js sets process.env right before start()). Because the consts were already frozen empty, mountGrowingShare() saw no mount source, returned false, and growing silently fell back to S3 streaming — producing an HEVC .mov while the asset key said .mxf. Fix: growingSmbConfig() reads process.env fresh at mount time. Also drop the stale const guard in unmountGrowingShare().	2026-06-04 12:51:32 +00:00
ZGaetano	cb25711ec6	fix(growing): inline CIFS creds + capture caps + storage probe timeout Three fixes to restore growing-files (XDCAM HD422 MXF) recording: 1. capture-manager mountGrowingShare: pass username=/password= inline instead of a credentials= file. TrueNAS SMB3 rejects the creds-file form with EACCES (-13, 'cannot mount read-only') while the identical inline creds mount fine. This was causing every growing record to silently fall back to the HEVC/S3 path (producing .mov, not .mxf). 2. docker-compose capture: add cap_add SYS_ADMIN + DAC_READ_SEARCH and apparmor:unconfined so mount.cifs can run inside the container. 3. storage /overview: wrap S3 HeadBucket/ListObjects probe in a 5s timeout so the admin 'Mount health' card stops hanging on 'Probing…' forever when S3 is slow.	2026-06-04 12:42:39 +00:00
ZGaetano	2812705d1c	feat(recorders): expose XDCAM HD422 bitrate field in growing mode Growing mode now shows an editable 'XDCAM HD422 bitrate (Mbps)' input (codec stays fixed to the growing MXF path). Default seeds to 50 Mbps for growing, 25 for GPU master. Backend already honored recording_video_bitrate via _buildGrowingOrchestrator -b:v/minrate/maxrate; this surfaces the control in the config modal.	2026-06-04 11:52:07 +00:00
ZGaetano	91d0d755a5	fix(node-agent): proper stdcopy demux for container logs (clean line starts)	2026-06-04 05:29:56 +00:00
ZGaetano	179a740453	feat(admin): cluster-wide Logs page + fix container log demux + poll containers - mam-api: dockerLogs() + demuxDockerStream() — the local container-log path JSON.parsed Docker's raw multiplexed stream and always returned '(no logs)'; now strips stdcopy framing and returns readable text (tail configurable). - web-ui: new Logs admin page — every container across every node grouped by node in a left rail, live-follow log viewer with filter + copy on the right. Reuses the now-working /cluster/containers/:node/:id/logs endpoint. - web-ui: Containers screen now polls every 5s (was load-once) so the cross-cluster view stays live without manual refresh. - icons: add server + file glyphs (were referenced but missing -> blank). - nav: Logs wired into the Admin sidebar section + routes + breadcrumbs.	2026-06-04 05:28:17 +00:00
ZGaetano	1348db8f33	fix(node-agent): import crypto — auth was ALWAYS failing on remote nodes THE root cause of 'container view only shows the primary': checkAgentAuth used crypto.timingSafeEqual but 'crypto' was never imported (ES module). The call threw ReferenceError, the try/catch swallowed it, _bearerEq returned false, so EVERY bearer-token check on a node-agent failed. The primary's own containers showed only because the local node-agent has no NODE_TOKEN (auth skipped). Adding 'import crypto from crypto' makes token comparison work, so the primary mam-api can now read containers + logs from every node.	2026-06-04 05:21:33 +00:00
ZGaetano	4ad145f00a	debug(node-agent): log token prefix/suffix on auth reject	2026-06-04 05:20:13 +00:00
ZGaetano	90bd82f49a	debug(node-agent): log auth reject token lengths	2026-06-04 05:18:34 +00:00
ZGaetano	70c873ae95	fix(cluster): shared CLUSTER_READ_TOKEN so mam-api sees containers on ALL nodes /cluster/containers only returned the primary's containers: mam-api fanned out to each node-agent with a single NODE_AGENT_TOKEN, but each node-agent only accepted its own bound NODE_TOKEN, so remote nodes returned 401 and were silently dropped (UI showed 'only zampp1'). node-agent now ALSO accepts a shared CLUSTER_READ_TOKEN (= mam-api's NODE_AGENT_TOKEN) for the read-only container/log endpoints, so the aggregate container view + per-container logs work across the whole cluster.	2026-06-04 05:14:44 +00:00
ZGaetano	d6b0b3a9a6	fix(capture): restore proven-clean wallclock audio (match `de509c6` baseline) Removing wallclock made A/V length drift far worse (audio 11.8% long). The known-clean config used wallclock + master aresample=async=1; the leading silence is a standby backlog artifact addressed by the bridge live-edge flush + record-start audio FIFO drain, not by changing the timestamp source.	2026-06-04 05:06:40 +00:00
ZGaetano	55a72af905	fix(capture): derive audio PTS from sample count (kill 2.5s leading silence) The persistent ~2.5s of leading silence was the master aresample=async=1 PADDING the audio to reconcile a PTS-origin mismatch: video PTS starts at frame 0 (-framerate), but -use_wallclock_as_timestamps stamped the first audio chunk at its wall-clock arrival time (~2.5s after the ffmpeg graph opened). aresample filled the gap with silence. Drop wallclock: audio PTS now comes from the 48kHz sample count starting at 0 — the same origin as video frame 0 — so the streams align with no pad. The bridge already hands live audio (backlog flushed on attach), so no rate reference is needed from wallclock.	2026-06-04 05:01:05 +00:00
ZGaetano	e9e883d06e	fix(deltacast-bridge): flush queued audio backlog to live edge on reader attach The ~2.5s of leading silence at record start was the VHD audio slot QUEUE: while the recorder is idle (no FIFO reader), the bridge blocks on open(O_WRONLY) but the board keeps buffering audio slots. When the record ffmpeg attaches, the bridge streamed that stale backlog first — heard as leading silence and pushing audio out of alignment with the live video. On each reader attach, drain slots that lock FAST (already-queued backlog) and stop at the first lock that takes ~a frame period (= waiting on a live slot), so the reader is handed the live edge, A/V aligned.	2026-06-04 04:54:32 +00:00
ZGaetano	b1a2249f36	fix(capture): align A/V at record start (kill leading silence + length drift) Root cause of 'silent first ~1s then clean' + ~0.5% audio-too-long: in standby the bridge keeps filling the audio FIFO while the idle-preview consumes only video, so when recording starts ffmpeg reads a ~0.5s backlog of stale audio, AND the video-only pre-roll discards video frames the audio never had. Fix: (1) skip the video-only pre-roll in standby (warm slot = no unstable frames), (2) drain the audio FIFO non-blocking immediately before ffmpeg opens it, so audio starts at the live edge aligned with the first real video frame.	2026-06-04 04:49:53 +00:00
ZGaetano	fffb6b63b5	fix(capture): revert 16ch audio to clean 2ch — fixes pitch/rate regression The 16ch interleave in the deltacast bridge produced audio at HALF the correct sample rate (measured 24224 vs 48000 samples/s/ch), which broke A/V sync and pitch. Per the working baseline (audio was clean before the channel selector), revert the bridge audio thread to the original single-group 2ch extraction and the capture-manager audio input to -ac 2 + wallclock + aresample. KEPT the good fixes: long-GOP HEVC for non-growing (NVENC realtime, no frame drops) and GPU-only codec list. 16ch/channel-select is shelved for a separate, properly-validated change.	2026-06-04 04:33:34 +00:00
ZGaetano	b28393eb76	Revert "fix(capture): skip video-only pre-roll in standby to stop A/V pitch drift" This reverts commit `51b66d882f`.	2026-06-04 04:28:11 +00:00
ZGaetano	51b66d882f	fix(capture): skip video-only pre-roll in standby to stop A/V pitch drift The pre-roll drained only the video pipe (fc_pipe) while the audio FIFO kept buffering, so ffmpeg read ~PRE_ROLL_SECONDS of surplus pre-roll audio — making audio longer than video, which when synced compresses audio ~0.5% (pitch-up, measured: 2591573 audio samples vs 2579395 expected for the video duration). In standby the framecache slot is already warm (no unstable startup frames), so the drain is unnecessary; skipping it lets ffmpeg open video and audio together from the same instant. Cold on-demand spawns keep the brief drain.	2026-06-04 04:24:08 +00:00
ZGaetano	07eea02109	fix(capture): restore audio wallclock (throughput) + remove CPU codec options - restore -use_wallclock_as_timestamps on audio input: without it ffmpeg's raw s16le reader stalled the graph (NVENC idle at 9%, ~half frames dropped). With it + long-GOP HEVC the encoder runs realtime and A/V length stays locked. - remove all CPU codec options (prores, dnxh, libx264/265) from recorder UI; GPU NVENC only (hevc_nvenc / h264_nvenc). 3x L4 cluster, no reason for CPU. - GPU codec defaults in env builders + proxy default h264_nvenc.	2026-06-04 04:14:59 +00:00
ZGaetano	0ea22e1e53	fix(capture): gate all-intra HEVC on growing-files; normal record uses long-GOP The hevc_nvenc codec was hardcoded to all-intra (-force_key_frames expr:1), which is ~4x the NVENC load. Applied to every recording it exceeded the L4's realtime budget at 1080p59.94 10-bit -> fc_pipe dropped ~half the frames -> video came out shorter than the (correct) audio -> A/V drift + pitch-up on playback. Now all-intra is used ONLY when growing-files is on (where it's required for the editable head). Normal recordings use efficient long-GOP HEVC (2s GOP, 2 B-frames) which NVENC sustains in realtime with zero drops.	2026-06-04 04:09:14 +00:00
ZGaetano	8e5405c3f9	fix(capture): derive deltacast audio PTS from sample count, not wall-clock Removing -use_wallclock_as_timestamps on the SDI audio input. The bridge writes SDI-clock-paced samples, so PTS from the 48kHz sample count shares the video's clock domain and the audio length tracks the video length exactly. Wall-clock timestamps made audio length = real elapsed time, which drifted ~1% longer than the frame-count video when the encoder dipped under realtime (pitch-up).	2026-06-04 04:01:54 +00:00
ZGaetano	51f939b1fe	fix(deltacast-bridge): use group-0 sample count as authoritative audio length Taking the MAX sample count across the 4 audio groups could emit more audio frames per slot than group 0 (the SDI-clock reference), drifting the audio stream slightly longer than video — heard as a ~1% pitch-up. Group 0 paces the timeline exactly as the original 2ch path did; shorter groups are silence-padded to its length, never extending it.	2026-06-04 04:01:25 +00:00
ZGaetano	095306d9cf	feat(recorders): 16ch SDI audio capture + per-recorder channel select + menu redesign Audio: - deltacast-bridge: always extract all 4 SDI audio groups (16ch), interleave to one 16ch s16le stream per port FIFO; format JSON reports audio_channels:16 - capture-manager: declare FIFO as 16ch input; keep first N discrete channels (2/8/16) via pan channelmap on the master (no downmix); HLS preview stays stereo. effAudioChannels drives -ac on the master container. - config modal: Audio channels select (2/8/16) - channel count already flows mam-api->node-agent->capture via RECORDING_AUDIO_CHANNELS UI redesign (production craft): - recorders grouped into per-node hardware 'rack' cards (online/offline state) - lifecycle accent rail: grey DISABLED / green ENABLED / pulsing-red RECORDING - promoted capture-port chip, monospaced metadata, Enable as primary CTA - dedicated recorder CSS block; built on existing design tokens	2026-06-04 03:34:41 +00:00
ZGaetano	de509c66ab	feat(recorders): hardware-identity model with Enable/Disable lifecycle Recorders are now physical capture ports, not user-created rows: - migration 036: label, enabled, auto_provisioned + UNIQUE(node_id,device_index) (the structural fix that makes two recorders sharing a port impossible) - mam-api: auto-provision one recorder row per port from heartbeat capabilities (reconcileRecordersForNode); create-once, never overwrites operator config - mam-api: POST /:id/enable + /:id/disable (provision/teardown standby sidecar); PATCH accepts label; config persists across enable/disable - node-agent: freeCapturePort() force-removes any container on a capture port before standby/start — eliminates the EADDRINUSE collisions - web-ui: recorder menu grouped by node (online/offline), Enable/Disable toggle, per-recorder config modal (codec/bitrate/growing/label/project), friendly label over hardware name, no destructive delete Fixes the delete/recreate churn that orphaned standby sidecars and collided on capture ports during this session's outage.	2026-06-04 03:14:43 +00:00
ZGaetano	9f2eac7b61	merge: capture cleanup + standby reconcile helper (base for recorder redesign)	2026-06-04 03:05:06 +00:00
ZGaetano	bf4632b911	feat(mam-api): extract ensureStandbySidecar + add POST /recorders/reconcile-standby Re-provisions the persistent standby sidecar for SDI/deltacast recorders that lost theirs (manual cleanup, node redeploy, wiped /dev/shm). Without this the recorder falls back to slow on-demand spawn on /start, which can collide on the capture port (EADDRINUSE). Idempotent; { force:true } recreates even when a container_id is already set.	2026-06-04 03:05:00 +00:00
ZGaetano	5668c03615	chore(capture): remove stale legacy FIFO path + pin capture profile - capture-manager: remove dead legacy deltacast FIFO video path (FC_SLOT_ID is now always set by node-agent, framecache mandatory on all SDI nodes) - node-agent: correct stale comment about legacy FIFO fallback - onboard-node.sh: harden detect_sdi (device-node checks, not just lspci) and persist COMPOSE_PROFILES so framecache survives every redeploy on SDI nodes - remove committed capture.js.bak Root cause of this session's outage: zampp3 came up without the capture compose profile, so framecache never started; the bridge published to shm with no consumer and recorders showed 'receiving' with no real capture.	2026-06-04 02:50:57 +00:00
Wild Dragon Dev	4045e30cd2	fix(node-agent): make http server handler async	2026-06-04 01:54:38 +00:00
Wild Dragon Dev	df6ca084ff	feat(web-ui): add Node column to Containers screen + integrated log viewer	2026-06-04 01:48:44 +00:00
Wild Dragon Dev	2f13c8d8b1	feat(mam-api): aggregate containers from all nodes + proxy logs	2026-06-04 01:42:13 +00:00
Wild Dragon Dev	a90adb5b52	feat(node-agent): add /containers and /sidecar/:id/logs endpoints	2026-06-04 01:40:44 +00:00
Wild Dragon Dev	8efcf5c545	feat(capture): remove build-with-decklink.sh script	2026-06-04 01:27:41 +00:00
Wild Dragon Dev	e5abbede43	debug(fc_writer): add trace logs for GET slots path	2026-06-04 01:13:19 +00:00
Wild Dragon Dev	cc489f7774	fix(fc_writer): handle 409 Conflict by fetching existing slot details via GET	2026-06-04 01:12:06 +00:00
Wild Dragon Dev	5b72ee167d	fix(decklink-bridge): prevent redundant fc_writer_open loops via last_format tracking	2026-06-04 01:10:47 +00:00
Wild Dragon Dev	d957ce74ae	fix(decklink-bridge): avoid redundant fc_writer_open calls in reopen_slot	2026-06-04 01:09:08 +00:00
Wild Dragon Dev	58c058b10c	fix(framecache): bind port 7435 to 0.0.0.0 so remote bridges can register slots	2026-06-04 01:00:54 +00:00
Wild Dragon Dev	e715af158d	fix(node-agent): pass FRAMECACHE_IP to node-agent env	2026-06-04 00:58:51 +00:00
Wild Dragon Dev	21ba7595b3	fix(node-agent): await async cleanup + fix syntax	2026-06-04 00:57:22 +00:00
Wild Dragon Dev	315b31a68b	fix(node-agent): await stopDecklinkBridge and clean up stale occurrences	2026-06-04 00:54:29 +00:00
Wild Dragon Dev	d1b40f5303	fix(node-agent): pass correct FC_URL and Cmd to containerized decklink-bridge	2026-06-04 00:51:14 +00:00
Wild Dragon Dev	6ee8dd5694	feat(node-agent): containerized decklink-bridge + async bridge management	2026-06-04 00:46:19 +00:00
Wild Dragon Dev	8ca7c79acd	fix(node-agent): mount decklink-bridge wrapper script as file (not dir)	2026-06-04 00:43:19 +00:00
Wild Dragon Dev	fb0ce320a5	build(node-agent): mount host /usr/local/bin to expose decklink-bridge wrapper	2026-06-04 00:42:31 +00:00
Wild Dragon Dev	6481760dff	revert(capture): Dockerfile copy paths to root-relative for compose build	2026-06-04 00:39:24 +00:00
Wild Dragon Dev	650a100d17	build(capture): include decklink-bridge in runtime image	2026-06-04 00:37:49 +00:00
Wild Dragon Dev	400cb786ab	fix(decklink-bridge): use IDeckLinkVideoBuffer QueryInterface to get raw bytes	2026-06-04 00:35:16 +00:00
Wild Dragon Dev	74055e79f8	fix(decklink-bridge): use GetFrameInternalBufferBytes instead of GetBytes	2026-06-04 00:28:19 +00:00
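
Commits 179a740453 and 91d0d755a5 above describe stripping Docker's stdcopy framing from container logs instead of JSON.parsing the raw multiplexed stream. The sketch below illustrates that technique only; it is not the repository's code. The function name is borrowed from the commit message, while the `frame` helper and the sample log text are hypothetical. It assumes a non-TTY container, where each frame is an 8-byte header (byte 0 is the stream type, bytes 4-7 are a big-endian payload length) followed by the payload.

```javascript
// Sketch of stdcopy demultiplexing (assumption: non-TTY container logs).
// Frame layout: [streamType, 0, 0, 0, uint32 BE payload length] + payload.
function demuxDockerStream(buf) {
  const stdout = [];
  const stderr = [];
  let offset = 0;
  while (offset + 8 <= buf.length) {
    const streamType = buf[offset];              // 0 = stdin, 1 = stdout, 2 = stderr
    const length = buf.readUInt32BE(offset + 4); // payload size in bytes
    const start = offset + 8;
    const end = start + length;
    if (end > buf.length) break;                 // truncated trailing frame: stop
    (streamType === 2 ? stderr : stdout).push(buf.subarray(start, end));
    offset = end;
  }
  // Decode after concatenation so a multi-byte UTF-8 character split
  // across two frames of the same stream is not corrupted.
  return {
    stdout: Buffer.concat(stdout).toString('utf8'),
    stderr: Buffer.concat(stderr).toString('utf8'),
  };
}

// Hypothetical helper that builds one framed chunk, for demonstration only.
function frame(streamType, text) {
  const payload = Buffer.from(text, 'utf8');
  const header = Buffer.alloc(8);
  header[0] = streamType;
  header.writeUInt32BE(payload.length, 4);
  return Buffer.concat([header, payload]);
}

const raw = Buffer.concat([
  frame(1, 'ready\n'),
  frame(2, 'warn: slow\n'),
  frame(1, 'done\n'),
]);
const logs = demuxDockerStream(raw);
console.log(logs.stdout);
console.log(logs.stderr);
```

Stopping at a truncated trailing frame rather than throwing suits a tailing log viewer, where the next poll is expected to deliver the rest of the frame.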

1131 commits