datarhei-dragonfork-core/NOTES.md
ZGaetano 891f65dff6
Some checks failed
ci / vet + build (push) Failing after 5m0s
ci / race tests (push) Has been skipped
ci / WebRTC smoke (5-viewer fanout) (push) Has been skipped
ci / WebRTC latency p95 gate (push) Has been skipped
docs: add v0.2 and v0.3 implementation notes
Documents the SDK gap and monkey-patch workaround, seed-data.sh
no-clobber problem, keyframe cache lock ordering rationale, and
the STAP-A IDR detection gap.
2026-05-10 13:07:04 -04:00

5.2 KiB

Datarhei - Dragon Fork — Implementation Notes

This file tracks observations, gotchas, and decisions made during the Dragon Fork WebRTC egress implementation. Keep entries chronological; each milestone adds a new section.

Baseline (M1, 2026-04-17)

  • Forked from upstream datarhei/core commit 0de97f4 ("Add linux/arm/v8 build").
  • Upstream module path: github.com/datarhei/core/v16. The Dragon Fork keeps this module path so internal imports don't churn; the fork is distinguished by its repo location (forge.wilddragon.net/zgaetano/datarhei-dragonfork-core) and branch history, not its Go module identity.
  • Toolchain: Go 1.22.8, FFmpeg 4.4.2 in the sandbox. FFmpeg 6.x recommended for publishers in Task 10; 4.4.2 is sufficient for the PoC (libx264 + libopus + RTP muxer all present).
  • go build ./... on the clean fork: succeeds.
  • go test -short ./... on the clean fork: all packages pass. No upstream flakes observed.

Pre-existing state of note

  • None flagged.

v0.2 (2026-05-06)

Restreamer SDK gap — UpsertIngest does not forward webrtc or whip_ingest

The datarhei Restreamer SDK's UpsertIngest builds the FFmpeg process config from control.hls, control.rtmp, control.srt, and control.process, but silently discards control.webrtc and control.whip_ingest. This was confirmed by inspecting the minified SDK bundle inside the running container.

Implication: toggling WebRTC egress or WHIP ingest from the Restreamer UI required a monkey-patch. The Edit view (overlay/src/views/Edit/index.js) overrides props.restreamer._upsertProcess immediately before the UpsertIngest call to inject whip_ingest.enabled (and in future, webrtc.enabled) into the process config JSON, then restores the original method in a finally block. The patch is narrow and reversible.

Why not patch the SDK? The SDK is a vendored minified bundle inside the upstream Restreamer UI. Patching it would require either maintaining a fork of the minifier input (the full SDK source is not in the UI repo) or deobfuscating and re-minifying. The monkey-patch is pragmatic and self-documenting.

UI state uses enable (no d); Core API uses enabled (with d)

WHEP.js and WHIP.js use enable (matching existing convention in the UI controls). The monkey-patch performs the mapping:

config.whip_ingest = { enabled: !!(control.whip_ingest && control.whip_ingest.enable) };

If this causes confusion in the future, unify on enabled in both places.


v0.3 (2026-05-10)

ProcessConfigWHIPIngest was the critical backend gap

http/api/process.go had ProcessConfigWebRTC for WHEP but no corresponding struct for WHIP. Without it, the JSON field whip_ingest was dropped on unmarshal and app.Config.WHIPIngest.Enabled was always false. WHIP could never activate via the API regardless of what the UI sent. Adding ProcessConfigWHIPIngest and wiring it into Marshal()/Unmarshal() was the essential fix (commit 4d94c88).

seed-data.sh no-clobber bug

seed-data.sh used cp -n (no-clobber) for all files including the built JS bundle. On first deploy the static/ directory doesn't exist and the copy succeeds. On every subsequent deploy the directory already exists and the copy is skipped — so a rebuilt container silently serves the old bundle. The symptom was confusing: the binary had new code, but the UI didn't reflect it.

Fix: split into two phases. index.html, asset-manifest.json, and static/ are always overwritten (cp -Rfp). All other content (channel database, player bundles, operator-uploaded media) remains no-clobber to protect live data.

Keyframe cache lock ordering

keyFrameCache has its own mutex c.mu distinct from Source.mu. The two locks are never nested:

  • readLoop calls cache.push(pkt) (acquires/releases c.mu), then acquires s.mu for the subscriber fanout — sequential, not nested.
  • Source.Subscribe() takes the snapshot outside s.mu (acquires/releases c.mu), then acquires s.mu to register the subscriber — also sequential.

This means there is no deadlock risk even though two separate mutexes are involved. The snapshot-before-lock pattern in Subscribe was chosen for clarity, not necessity; but it documents the intent explicitly.

STAP-A IDR detection is not implemented

isH264IDRStart handles single-NAL (type 5) and FU-A start (type 28, start bit, inner type 5). It does not handle STAP-A aggregates (type 24) that happen to lead with an IDR NAL.

In practice, FFmpeg and GStreamer never emit IDR slices inside STAP-A — IDR frames are large and STAP-A is designed for small NALs (SPS + PPS combos). If a publisher that does use STAP-A for IDR ever appears, the cache will miss the keyframe boundary; the worst outcome is a larger-than-expected burst (the cache grows until the next correctly-detected IDR) rather than a crash or incorrect video. Add STAP-A handling in a future revision if needed.

go test -race ./core/webrtc/... baseline

All 34 tests in core/webrtc pass under the race detector as of v0.3 (commit 228ed4b). The suite covers config, ICE, registry, peer creation, WHEP handler, keyframe cache, and Source subscribe/pre-fill/close. Total runtime ≈ 16 s (dominated by the two 5-second ICE gathering timeouts in TestPeerFactory_*).