docs: add Dragon Fork WebRTC egress design spec and M1 plan

Zac Gaetano 2026-04-17 08:40:05 -04:00
parent 0de97f4a6b
commit 262a393b8d
3 changed files with 2389 additions and 0 deletions

NOTES.md Normal file

@@ -0,0 +1,26 @@
# Datarhei - Dragon Fork — Implementation Notes
This file tracks observations, gotchas, and decisions made during the Dragon
Fork WebRTC egress implementation. Keep entries chronological; each milestone
adds a new section.
## Baseline (M1, 2026-04-17)
- Forked from upstream `datarhei/core` commit `0de97f4` ("Add linux/arm/v8 build").
- Upstream module path: `github.com/datarhei/core/v16`. The Dragon Fork keeps
this module path so internal imports don't churn; the fork is distinguished
by its repo location (`forge.wilddragon.net/zgaetano/datarhei-dragonfork-core`)
and branch history, not its Go module identity.
- Toolchain: Go 1.22.8, FFmpeg 4.4.2 in the sandbox. FFmpeg 6.x recommended
for publishers in Task 10; 4.4.2 is sufficient for the PoC (libx264 +
libopus + RTP muxer all present).
- `go build ./...` on the clean fork: succeeds.
- `go test -short ./...` on the clean fork: all packages pass. No upstream
flakes observed.
### Pre-existing state of note
- None flagged.
---
<!-- Add M1 verification notes here after Task 12 succeeds. -->

File diff suppressed because it is too large.


@@ -0,0 +1,282 @@
# Datarhei - Dragon Fork: Low-Latency WebRTC Output
**Status:** Draft for review
**Author:** Zac (Wild Dragon)
**Date:** 2026-04-16
**Upstream:** [datarhei/core](https://github.com/datarhei/core), [datarhei/restreamer](https://github.com/datarhei/restreamer)
---
## Summary
Fork datarhei Core and add a native WebRTC egress module ("Dragon Fork") that delivers sub-second live video to a small audience (1–5 viewers) via the WHEP protocol. All existing datarhei ingest paths (RTMP, SRT, RTSP) and outputs (HLS, DASH, SRT, etc.) remain untouched. The new module taps the existing FFmpeg pipeline via local RTP and fans packets to browser clients using [Pion](https://github.com/pion/webrtc).
The fork is branded **"Datarhei - Dragon Fork"** — preserving upstream attribution (Apache 2.0 / MIT) while marking it as a Wild Dragon-branded distribution.
## Goals
- Sub-second end-to-end latency for a 1-to-few live broadcast (target: glass-to-glass p95 < 300ms on RTMP ingest, < 200ms on SRT ingest).
- Zero changes to existing datarhei ingest, transcoding, or non-WebRTC outputs.
- Viewer connects with plain WHEP (HTTP POST with SDP offer, receives SDP answer).
- Additive package — reverting the fork's WebRTC work is a `git revert` away.
- Practical deployment: single binary, single Docker image, no new infrastructure dependencies beyond optional TURN.
## Non-Goals (v1)
- SFU clustering or cascading (irrelevant at 1–5 viewers).
- Simulcast, SVC, or adaptive bitrate on the WebRTC path.
- LL-HLS / LL-DASH outputs.
- WHIP *ingest* (accepting WebRTC as input). Tracked as a candidate for v2 — it is the only out-of-scope feature that would meaningfully tighten the latency budget further.
- In-memory keyframe cache for faster first-frame rendering (v2 optimization).
- DVR / recording tied to the WebRTC output.
- Bundled TURN server — users run `coturn` themselves if required.
- Any Ant Media Server or Millicast feature beyond WHEP egress (conference rooms, analytics, geo-routing, multi-view, token-gated playback, etc.).
## Context & Constraints
- **Scale:** 1–5 concurrent viewers per stream, typically 1. Single-node SFU is more than enough.
- **Ingest:** RTMP and SRT (both already supported by datarhei).
- **Publisher control:** Publisher codec settings are controllable. Expected feed: H.264 baseline/constrained-baseline + AAC (OBS default) or Opus where possible.
- **Latency budget:**
  - RTMP ingest path: ~100–300ms publisher buffering + ~30ms server hop + ~50–150ms network + ~30ms decode ⇒ realistic p95 **250–500ms**.
  - SRT (low-latency mode) ingest path: ~20–120ms publisher buffering + same server/network/decode ⇒ realistic p95 **150–300ms**.
- **Existing datarhei:** Already deployed and trusted. The fork builds on that trust; it does not replace it.
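
As a rough cross-check of the latency budget, the per-hop figures can be summed as ranges. The sketch below is illustrative only: the `Range` type and `Sum` helper are hypothetical names, and adding lower and upper bounds independently is a simplifying assumption (it gives best-case and worst-case totals, not a true p95).

```go
package main

import "fmt"

// Range is a closed interval in milliseconds (hypothetical helper type).
type Range struct{ Lo, Hi int }

// Sum adds the lower and upper bounds of each component independently,
// giving best-case and worst-case totals for a chain of hops.
func Sum(parts ...Range) Range {
	var total Range
	for _, p := range parts {
		total.Lo += p.Lo
		total.Hi += p.Hi
	}
	return total
}

func main() {
	// Per-hop figures taken from the latency budget above.
	server := Range{30, 30}
	network := Range{50, 150}
	decode := Range{30, 30}

	rtmp := Sum(Range{100, 300}, server, network, decode)
	srt := Sum(Range{20, 120}, server, network, decode)

	fmt.Printf("RTMP path: %d-%d ms\n", rtmp.Lo, rtmp.Hi)
	fmt.Printf("SRT path:  %d-%d ms\n", srt.Lo, srt.Hi)
}
```

The totals come out to 210–510 ms for RTMP and 130–330 ms for SRT, which bracket the realistic p95 figures quoted in the budget.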
## Architecture
### Data flow
```
Publisher (OBS / encoder)
│ RTMP or SRT (H.264 + AAC/Opus)
datarhei ingest [existing]
FFmpeg process [existing, orchestrated by datarhei Core]
│ -c:v copy (H.264 passthrough, no re-encode)
│ -c:a libopus (AAC → Opus, ~5–15ms)
│ -force_key_frames (2s GOP on the webrtc output)
│ -f rtp rtp://127.0.0.1:<video_port>
│ -f rtp rtp://127.0.0.1:<audio_port>
Local UDP sockets (RTP)
┌──────────────────────────────────────┐
│ NEW: core/webrtc module (Pion) │
│ • RTP reader per stream │
│ • Registry: stream_id → source │
│ • WHEP HTTP endpoint │
│ • PeerConnection fan-out │
└──────────────────────────────────────┘
WebRTC peers (browsers, 1–5)
```
### Why this shape
- **FFmpeg → local RTP → Pion** is the standard integration pattern for attaching WebRTC to a non-WebRTC media server. It reuses datarhei's existing FFmpeg supervision, keeps the new code strictly egress-side, and avoids writing RTP packetization in Go.
- **H.264 passthrough + Opus-only transcode** means no GPU dependency, minimal server CPU, and the smallest achievable added latency on the egress hop.
- **WHEP** (a simple HTTP request/response) sidesteps the complexity of custom WebSocket signaling. It is the protocol Ant Media Server and Millicast both standardized on, and is supported by modern players and browser libraries.
- **Purely additive:** existing ingest, transcode, and non-WebRTC output code paths are unchanged. The only contact with existing code is registering a new URL scheme (`webrtc://`) with the output resolver — a new handler, not a modification of existing handlers. Isolated blast radius.
## Module Design
### Package layout
```
core/webrtc/
config.go # configuration struct + validation
registry.go # stream_id → Source mapping (thread-safe)
source.go # RTP reader from local UDP, fan-out to subscribers
peer.go # PeerConnection lifecycle + track attachment
whep.go # HTTP handlers for POST/DELETE/PATCH /whep/{stream}
ice.go # ICE server + NAT1To1 config
keyframe.go # GOP enforcement helpers
```
### Peer connection lifecycle (WHEP)
1. Viewer sends `POST /whep/{stream_id}` with SDP offer (`Content-Type: application/sdp`).
2. Handler looks up `stream_id` in `Registry`. If missing, return `404 Not Found`.
3. If codec negotiation would fail (viewer does not offer H.264 or Opus), return `406 Not Acceptable` with a body describing the mismatch.
4. If `max_peers_total` would be exceeded, return `503 Service Unavailable`.
5. Create a Pion `PeerConnection`, add two `TrackLocalStaticRTP` tracks (video H.264, audio Opus) with SSRCs matching the source.
6. Set remote description, create answer, set local description, wait for ICE gathering (with a 5s timeout and trickle-ICE support via `PATCH`).
7. Return `201 Created`, `Location: /whep/{stream_id}/{resource_id}`, SDP answer in body.
8. A source goroutine now forwards RTP packets to this peer's tracks.
9. Teardown on either `DELETE /whep/{stream_id}/{resource_id}` or ICE state `disconnected`/`failed`.
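
The status-code ordering in steps 2 to 7 can be sketched with the standard library alone. This is not the planned `whep.go`: `whepHandler`, its fields, and the `post` helper are hypothetical, the stream set stands in for `Registry`, the codec check is a crude substring match on `rtpmap` lines rather than real SDP negotiation, and no Pion `PeerConnection` is created.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// whepHandler is a hypothetical stand-in for the real handler: the stream
// set replaces the Registry and no PeerConnection is created.
type whepHandler struct {
	mu       sync.Mutex
	streams  map[string]bool
	peers    int
	maxPeers int
}

func (h *whepHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "only POST is sketched here", http.StatusMethodNotAllowed)
		return
	}
	streamID := strings.TrimPrefix(r.URL.Path, "/whep/")
	offer, err := io.ReadAll(io.LimitReader(r.Body, 64<<10))
	if err != nil {
		http.Error(w, "cannot read SDP offer", http.StatusBadRequest)
		return
	}
	h.mu.Lock()
	defer h.mu.Unlock()
	if !h.streams[streamID] { // step 2: unknown stream
		http.Error(w, "unknown stream", http.StatusNotFound)
		return
	}
	// Step 3: crude codec check; SDP encoding names are case-insensitive.
	sdp := strings.ToUpper(string(offer))
	if !strings.Contains(sdp, "H264/90000") || !strings.Contains(sdp, "OPUS/48000") {
		http.Error(w, "offer must include H.264 and Opus", http.StatusNotAcceptable)
		return
	}
	if h.peers >= h.maxPeers { // step 4: global peer cap
		http.Error(w, "peer limit reached", http.StatusServiceUnavailable)
		return
	}
	h.peers++
	// Step 7: 201 with a Location header for the new resource.
	w.Header().Set("Content-Type", "application/sdp")
	w.Header().Set("Location", fmt.Sprintf("/whep/%s/peer-%d", streamID, h.peers))
	w.WriteHeader(http.StatusCreated)
	io.WriteString(w, "v=0\r\n") // placeholder where the real SDP answer would go
}

// post sends an SDP offer to the handler in-process and returns the status.
func post(h http.Handler, path, offer string) int {
	req := httptest.NewRequest(http.MethodPost, path, strings.NewReader(offer))
	req.Header.Set("Content-Type", "application/sdp")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := &whepHandler{streams: map[string]bool{"myStream": true}, maxPeers: 1}
	offer := "a=rtpmap:96 H264/90000\r\na=rtpmap:111 opus/48000/2\r\n"
	fmt.Println(post(h, "/whep/missing", offer))
	fmt.Println(post(h, "/whep/myStream", "a=rtpmap:96 VP8/90000\r\n"))
	fmt.Println(post(h, "/whep/myStream", offer))
	fmt.Println(post(h, "/whep/myStream", offer))
}
```

Checking the registry before the codec list, and the codec list before the peer cap, mirrors the order of the numbered steps, so a viewer who can never succeed gets a `404` or `406` rather than a transient `503`.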
### Source fan-out
One goroutine per active stream reads RTP packets from its local UDP socket and writes into an in-memory ring buffer. Each subscribed peer has a goroutine that reads from the ring and writes to its `TrackLocalStaticRTP`. At 1–5 viewers, overhead is negligible.
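
A minimal sketch of the fan-out idea follows. It swaps in a different mechanism from the one described above: each subscriber gets a buffered channel instead of a cursor into a shared ring buffer, and packets are dropped for a subscriber whose channel is full. `Fanout` and its methods are hypothetical names, and the UDP reader and Pion track writer are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Fanout distributes packets from one source to many subscribers.
// Hypothetical type: buffered channels stand in for the ring buffer.
type Fanout struct {
	mu   sync.RWMutex
	subs map[int]chan []byte
	next int
}

func NewFanout() *Fanout {
	return &Fanout{subs: make(map[int]chan []byte)}
}

// Subscribe registers a subscriber with a queue of the given depth.
func (f *Fanout) Subscribe(depth int) (int, <-chan []byte) {
	f.mu.Lock()
	defer f.mu.Unlock()
	id := f.next
	f.next++
	ch := make(chan []byte, depth)
	f.subs[id] = ch
	return id, ch
}

// Unsubscribe removes the subscriber and closes its channel. The write lock
// excludes Publish, so nothing can send on the channel after it is closed.
func (f *Fanout) Unsubscribe(id int) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if ch, ok := f.subs[id]; ok {
		delete(f.subs, id)
		close(ch)
	}
}

// Publish offers pkt to every subscriber without blocking and returns how
// many subscribers missed it because their queue was full. The slice is
// shared, so subscribers must treat it as read-only.
func (f *Fanout) Publish(pkt []byte) (dropped int) {
	f.mu.RLock()
	defer f.mu.RUnlock()
	for _, ch := range f.subs {
		select {
		case ch <- pkt:
		default:
			dropped++
		}
	}
	return dropped
}

func main() {
	f := NewFanout()
	_, a := f.Subscribe(2)
	_, b := f.Subscribe(2)
	for seq := byte(1); seq <= 3; seq++ {
		fmt.Println("dropped:", f.Publish([]byte{seq}))
	}
	fmt.Println("queued:", len(a), len(b))
}
```

The non-blocking send means one stalled viewer cannot hold up the source goroutine or the other viewers; the cost is that the slow viewer loses packets and has to rely on retransmission or the next keyframe.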
### Keyframe strategy
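RTP from FFmpeg is one-way, so viewer-originated PLI/FIR cannot be propagated back to the encoder. We enforce a **2-second forced keyframe interval on the WebRTC output** via `-force_key_frames "expr:gte(t,n_forced*2)"`. Because `-force_key_frames` only takes effect when FFmpeg encodes the video, it does nothing under `-c:v copy`; on the passthrough path the 2-second interval has to be set on the publisher's encoder instead (publisher codec settings are controllable, see Context & Constraints). Worst-case first-frame latency on join is ~2s.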
RTCP PLI from viewers is absorbed and logged. Pion's built-in NACK/retransmission handles typical packet-loss recovery transparently.
### ICE / NAT / TURN
- Default STUN servers: `stun:stun.cloudflare.com:3478`, `stun:stun.l.google.com:19302` (overridable).
- Optional TURN: config field accepts one or more TURN URIs with credentials. Not required at target scale but wired through for flexibility.
- Public IP advertised via Pion `SettingEngine.SetNAT1To1IPs` — the operator provides the server's public IP once in config; Pion inserts it into candidates. Avoids requiring a STUN round-trip from the server itself.
## Datarhei Integration
### New output type: `webrtc://`
A new URL scheme recognized by the datarhei Core output resolver. Example process configuration:
```json
{
"id": "myStream",
"input": [{ "address": "{rtmp,name=myStream.stream}", "options": [] }],
"output": [
{ "address": "...existing HLS output..." },
{
"address": "webrtc://internal/myStream",
"options": ["-c:v", "copy", "-an"]
},
{
"address": "webrtc://internal/myStream?track=audio",
"options": ["-c:a", "libopus", "-b:a", "128k", "-vn"]
}
]
}
```
### Resolver behavior
On process start, each `webrtc://` output triggers the resolver to:
1. Allocate a local UDP port from the configured `udp_port_range`.
2. Register `(stream_id, track, ssrc, port)` in `webrtc.Registry`.
3. Rewrite the FFmpeg output from `webrtc://internal/{stream_id}` to `rtp://127.0.0.1:<port>?pkt_size=1200`, and (for video tracks only) prepend `-force_key_frames "expr:gte(t,n_forced*2)"` to the options list. Both transformations are done by the resolver — the user's process JSON never contains these details.
On process stop (clean exit, crash, or user stop):
1. Tear down all peer connections subscribed to this stream (RTCP BYE + `PeerConnection.Close()`).
2. Deregister from the registry.
3. Release UDP ports to the pool.
Hooked into datarhei's existing process lifecycle events — no new supervision logic required.
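
The allocate-and-rewrite steps can be sketched as below. Everything here is hypothetical (`PortPool`, `Rewritten`, `Rewrite`); it assumes an output without a `track` query parameter is the video track, as in the example process JSON, and it omits the registry call, SSRC assignment, and locking.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// PortPool hands out UDP ports from an inclusive range (not goroutine-safe).
type PortPool struct {
	lo, hi int
	used   map[int]bool
}

func NewPortPool(lo, hi int) *PortPool {
	return &PortPool{lo: lo, hi: hi, used: make(map[int]bool)}
}

// Acquire returns the lowest free port or an error when the range is full.
func (p *PortPool) Acquire() (int, error) {
	for port := p.lo; port <= p.hi; port++ {
		if !p.used[port] {
			p.used[port] = true
			return port, nil
		}
	}
	return 0, errors.New("webrtc: udp port range exhausted")
}

// Release returns a port to the pool on process stop.
func (p *PortPool) Release(port int) { delete(p.used, port) }

// Rewritten is the result of resolving one webrtc:// output.
type Rewritten struct {
	StreamID, Track, Address string
	Port                     int
	Options                  []string
}

// Rewrite maps webrtc://internal/{stream_id} to a local RTP address and, for
// video, prepends the forced-keyframe option as described in step 3.
func Rewrite(address string, options []string, pool *PortPool) (Rewritten, error) {
	u, err := url.Parse(address)
	if err != nil {
		return Rewritten{}, err
	}
	streamID := strings.TrimPrefix(u.Path, "/")
	if u.Scheme != "webrtc" || streamID == "" {
		return Rewritten{}, fmt.Errorf("not a webrtc:// output: %q", address)
	}
	track := "video" // assumption: no ?track= parameter means video
	if u.Query().Get("track") == "audio" {
		track = "audio"
	}
	port, err := pool.Acquire()
	if err != nil {
		return Rewritten{}, err
	}
	opts := append([]string{}, options...)
	if track == "video" {
		opts = append([]string{"-force_key_frames", "expr:gte(t,n_forced*2)"}, opts...)
	}
	return Rewritten{
		StreamID: streamID, Track: track, Port: port, Options: opts,
		Address: fmt.Sprintf("rtp://127.0.0.1:%d?pkt_size=1200", port),
	}, nil
}

func main() {
	pool := NewPortPool(10000, 10100)
	v, _ := Rewrite("webrtc://internal/myStream", []string{"-c:v", "copy", "-an"}, pool)
	a, _ := Rewrite("webrtc://internal/myStream?track=audio", []string{"-c:a", "libopus", "-vn"}, pool)
	fmt.Println(v.Address, v.Options)
	fmt.Println(a.Address, a.Options)
}
```

Copying the options slice before prepending keeps the caller's process configuration untouched, which matches the intent that the user's JSON never contains the rewritten details.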
### API endpoints
| Method | Path | Purpose | Auth |
|---|---|---|---|
| `POST` | `/whep/{stream_id}` | Subscribe (SDP offer in, SDP answer out) | Public or token-gated (see Open Questions) |
| `DELETE` | `/whep/{stream_id}/{resource_id}` | Unsubscribe | — |
| `PATCH` | `/whep/{stream_id}/{resource_id}` | Trickle ICE | — |
| `GET` | `/api/v3/webrtc/streams` | List active streams + subscriber counts | Admin |
| `GET` | `/api/v3/webrtc/streams/{id}/peers` | Per-stream peer stats | Admin |
### Configuration
Added to datarhei Core's config (HCL/JSON; example in HCL):
```hcl
webrtc {
enabled = true
whep_listen = ":8787"
public_ip = "203.0.113.10"
udp_port_range = "10000-10100"
ice_servers = ["stun:stun.cloudflare.com:3478"]
max_peers_total = 32
}
```
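
One way `config.go` might validate these fields is sketched below. The `Config` struct, its field names, and the specific checks are assumptions made for illustration; the actual decoding from datarhei Core's config format is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// Config mirrors the webrtc block above (hypothetical field names).
type Config struct {
	Enabled       bool
	WHEPListen    string
	PublicIP      string
	UDPPortRange  string
	ICEServers    []string
	MaxPeersTotal int
}

// ParsePortRange parses "lo-hi" into an inclusive, ordered port range.
func ParsePortRange(s string) (lo, hi int, err error) {
	a, b, ok := strings.Cut(s, "-")
	if !ok {
		return 0, 0, fmt.Errorf("udp_port_range %q: want lo-hi", s)
	}
	if lo, err = strconv.Atoi(strings.TrimSpace(a)); err != nil {
		return 0, 0, fmt.Errorf("udp_port_range %q: %w", s, err)
	}
	if hi, err = strconv.Atoi(strings.TrimSpace(b)); err != nil {
		return 0, 0, fmt.Errorf("udp_port_range %q: %w", s, err)
	}
	if lo < 1 || hi > 65535 || lo > hi {
		return 0, 0, fmt.Errorf("udp_port_range %q: out of order or out of bounds", s)
	}
	return lo, hi, nil
}

// Validate checks each field; a disabled module is always valid.
func (c Config) Validate() error {
	if !c.Enabled {
		return nil
	}
	if _, _, err := net.SplitHostPort(c.WHEPListen); err != nil {
		return fmt.Errorf("whep_listen: %w", err)
	}
	if c.PublicIP != "" && net.ParseIP(c.PublicIP) == nil {
		return fmt.Errorf("public_ip %q is not an IP address", c.PublicIP)
	}
	if _, _, err := ParsePortRange(c.UDPPortRange); err != nil {
		return err
	}
	for _, s := range c.ICEServers {
		if !strings.HasPrefix(s, "stun:") && !strings.HasPrefix(s, "turn:") &&
			!strings.HasPrefix(s, "turns:") {
			return fmt.Errorf("ice_servers: unsupported URI %q", s)
		}
	}
	if c.MaxPeersTotal <= 0 {
		return errors.New("max_peers_total must be positive")
	}
	return nil
}

func main() {
	c := Config{
		Enabled: true, WHEPListen: ":8787", PublicIP: "203.0.113.10",
		UDPPortRange: "10000-10100", MaxPeersTotal: 32,
		ICEServers: []string{"stun:stun.cloudflare.com:3478"},
	}
	fmt.Println("valid:", c.Validate() == nil)
}
```

Failing at load time on a malformed port range or IP keeps these mistakes out of the process-start path, where they would otherwise surface as FFmpeg or ICE failures.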
### UI
**Out of scope for v1.** API-only first. The Restreamer Vue UI gets a minor addition in a later release: a "WebRTC" checkbox on each stream, the WHEP URL, and a live viewer count. UI work is decoupled and non-blocking.
## Error Handling & Edge Cases
| Scenario | Behavior |
|---|---|
| Publisher disconnects / FFmpeg exits | Registry emits "source removed"; all peers for that stream torn down with RTCP BYE; WHEP returns 404 until stream restarts. |
| Viewer disconnects (tab close, network) | Pion `OnConnectionStateChange` → cleanup; peer unsubscribed; no server-side retry. |
| First-frame on join | Up to ~2s (forced-GOP interval). Acceptable for broadcast. v2 optimization: in-memory keyframe cache. |
| Viewer codec mismatch | `406 Not Acceptable` with body describing mismatch. In practice never hit — every modern browser supports H.264 baseline + Opus via WebRTC. |
| UDP port exhaustion | Process start fails with clear error. At target scale (≤5 streams) irrelevant. |
| Peer cap reached | `503 Service Unavailable` on new WHEP POSTs. Hard safety rail. |
| ICE gathering timeout | 5s limit; return `500` with diagnostic error message. |
| TURN credential failure | Logged; surfaced in `/api/v3/webrtc/streams` so admins see it without tailing logs. |
| FFmpeg-to-UDP push failure (port conflict, etc.) | Piggybacks on existing datarhei FFmpeg supervision (restart with backoff). No new logic. |
## Testing
### Unit tests (`core/webrtc`)
- `registry`: register/deregister, concurrent access, not-found paths.
- `source`: RTP reading, fan-out to N subscribers, subscriber cleanup on close.
- `whep`: handlers with mock peer-connection factory; verify `201`/`404`/`406`/`503`; SDP parse happy path + malformed input.
- `ice`: config → Pion `SettingEngine` translation.
Coverage target: ~70% on this package. Not chasing 100% — some Pion paths are impractical to mock meaningfully.
### Integration tests (end-to-end, in CI)
1. Start forked datarhei Core in-process.
2. Launch an FFmpeg publisher sending a deterministic test pattern (`testsrc2` with burned-in frame counter + timecode) over RTMP.
3. Configure a process with `webrtc://` outputs.
4. Use a Pion-based test WHEP client (headless — no browser) to subscribe.
5. Assert: connection establishes, RTP arrives, keyframe seen within 3s of subscribe.
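
For the "keyframe seen" assertion in step 5, the test client needs some way to recognise an IDR frame in the incoming RTP. A minimal sketch, assuming the RTP header has already been stripped and the payload follows the H.264 RTP payload format (RFC 6184); `IsH264Keyframe` is a hypothetical helper, not an existing datarhei or Pion function.

```go
package main

import "fmt"

// NAL unit types relevant to keyframe detection.
const (
	nalIDR   = 5  // coded slice of an IDR picture
	nalSTAPA = 24 // single-time aggregation packet
	nalFUA   = 28 // fragmentation unit
)

// IsH264Keyframe reports whether an RTP payload (RTP header already removed)
// carries an IDR slice or the first fragment of one.
func IsH264Keyframe(payload []byte) bool {
	if len(payload) == 0 {
		return false
	}
	switch payload[0] & 0x1F {
	case nalIDR:
		return true
	case nalSTAPA:
		// STAP-A: 1-byte header, then repeated [2-byte big-endian size][NAL unit].
		for i := 1; i+2 < len(payload); {
			size := int(payload[i])<<8 | int(payload[i+1])
			i += 2
			if size == 0 || i+size > len(payload) {
				return false // malformed aggregation packet
			}
			if payload[i]&0x1F == nalIDR {
				return true
			}
			i += size
		}
	case nalFUA:
		// FU-A: the second byte holds the start bit (0x80) and the original
		// NAL type in its low five bits. Only the first fragment counts.
		return len(payload) > 1 && payload[1]&0x80 != 0 && payload[1]&0x1F == nalIDR
	}
	return false
}

func main() {
	fmt.Println(IsH264Keyframe([]byte{0x65, 0x88}))       // single IDR NAL unit
	fmt.Println(IsH264Keyframe([]byte{0x41, 0x9A}))       // non-IDR slice
	fmt.Println(IsH264Keyframe([]byte{0x7C, 0x85, 0x88})) // FU-A start of IDR
}
```

The test can then record the time of the first packet for which this returns true and compare it against the 3-second limit.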
### Latency measurement (CI pass/fail)
- Publisher embeds a frame counter via `drawtext` in `testsrc2`.
- Test client decodes and extracts the frame counter (simple pixel sampling against a known bounding box — lighter than full OCR, no new dependency).
- Latency per frame = wall-clock at decode − publisher wall-clock at encode.
- 60-second run; record p50/p95/p99.
- CI gate:
- RTMP ingest path: p95 < 300ms.
- SRT ingest path: p95 < 200ms.
### Browser smoke test (manual)
A `test/whep-player.html` — plain HTML + `RTCPeerConnection` + a WHEP URL input. Used for real-browser / real-network human verification. Documented in `TESTING.md`, not automated.
### Load test (one-shot, not CI)
Script opens 5 concurrent WHEP peers against one stream, holds 10 minutes, reports CPU/memory/packet-loss/jitter. Run once before cutting v1.
## Milestones
| # | Scope | Duration | Exit criteria |
|---|---|---|---|
| M1 | Media-path PoC (hardcoded stream, manual FFmpeg, test WHEP client, no datarhei integration) | 1–2 weeks | 1 publisher → 1 viewer, decoded video |
| M2 | Process integration (`webrtc://` resolver, config, WHEP served from Core, lifecycle hooks) | 1 week | Standard datarhei process JSON with `webrtc://` output works end-to-end |
| M3 | Robustness + multi-viewer (fan-out, teardown paths, keyframe enforcement, error codes, admin API) | 1 week | 5 concurrent viewers, all error paths correct, clean teardown |
| M4 | Tests & CI (unit, integration, latency p95 gate, browser smoke, `TESTING.md`) | 3–5 days | CI green, latency targets met |
| M5 | Dragon Fork branding & release (UI logo swap, README, `NOTICE`/`CREDITS`, Docker image, tag `v0.1.0-dragonfork`) | 1–2 days | Publishable release |
**Total realistic scope: ~4–5 weeks of focused work.**
## Branding
- **Project name:** Datarhei - Dragon Fork
- **Go module path:** `github.com/wilddragon/datarhei-dragonfork-core` (placeholder — confirm at M5)
- **Docker images:** `wilddragon/datarhei-dragonfork-core`, `wilddragon/datarhei-dragonfork-restreamer`
- **Logo asset:** Wild Dragon mark, used as Restreamer UI logo, README header, and any shipped WHEP viewer page
- **Upstream attribution:** `NOTICE` / `CREDITS` file referencing datarhei Core (Apache 2.0) and Restreamer (MIT); README header clearly labels the project as a fork.
## Open Questions (to resolve during M1–M2)
1. **WHEP auth model.** Public endpoint vs. simple bearer token vs. time-limited signed URL. Not decided; for an invite-only audience of 1–5 viewers, a shared bearer token is probably fine. Can revisit once M1 is working.
2. **Exact Go module path.** Depends on repo location.
3. **Restreamer UI version target.** Confirm which UI repo/branch to rebrand at M5.
## References
- [datarhei/core](https://github.com/datarhei/core) (Apache 2.0)
- [datarhei/restreamer](https://github.com/datarhei/restreamer) (MIT)
- [Pion WebRTC](https://github.com/pion/webrtc) (MIT)
- [WHEP draft spec (IETF)](https://datatracker.ietf.org/doc/draft-murillo-whep/)
- [WHIP draft spec (IETF)](https://datatracker.ietf.org/doc/draft-ietf-wish-whip/) — referenced for the future v2 ingest path
- [Ant Media Server Community](https://github.com/ant-media/Ant-Media-Server) — prior-art reference for WHEP/WHIP in a Java SFU
- [OvenMediaEngine](https://github.com/AirenSoft/OvenMediaEngine) — prior-art reference for sub-second WebRTC broadcast