datarhei-dragonfork-core/docs/design/2026-04-17-datarhei-dragon-fork-m2-webrtc-core-integration.md


# M2 — WebRTC into datarhei Core proper
**Status:** Design approved, implementation pending
**Date:** 2026-04-17
**Author:** Zac (zgaetano@wilddragon.net), Dragon Fork
**Depends on:** M1 (`2026-04-16-datarhei-dragon-fork-m1-webrtc-poc.md`)
**Branch:** `m2-webrtc-core-integration`
## 1. Purpose
M1 produced a standalone `cmd/webrtc-poc` binary that proved the Pion-based
WHEP egress path end-to-end on TrueNAS. M2 promotes that work into the
datarhei Core binary so WebRTC becomes a first-class output alongside
RTMP, SRT, and HLS, surfaced in the core-ui dashboard.
After M2 a user can:
1. Create or edit a process in core-ui.
2. Toggle a "WebRTC" switch on that process's config.
3. Save → Core restarts the process with an extra RTP output leg.
4. Open the process's "Live (WebRTC)" tab and watch the feed in the
browser with sub-second latency, authenticated by the user's JWT.
Out of scope for M2 (explicit):
- Public / unauthenticated embeds (handled in M3 via signed URLs).
- A separate "broadcast center" dashboard page (per-process tab is enough).
- Lazy / on-demand Source binding — eager binding only.
- WHIP ingest — that's M4.
## 2. High-level architecture
```
                 ┌────────────────────────────────────────────┐
                 │               datarhei Core                │
                 │                                            │
  FFmpeg (per    │   ┌──────────────┐      ┌──────────────┐   │
  process,       │   │   restream   │─────▶│  app/webrtc  │   │
  spawned by     │──▶│              │◀─────│    (NEW)     │   │
  restream) ──┐  │   │ - lifecycle  │hooks │              │   │
              │  │   │ - AppendOut  │      │ - registry   │   │
              │  │   │ - config     │      │ - sources    │   │
              │  │   │   (now incl. │      │ - PeerFactory│   │
              │  │   │    WebRTC)   │      │ - WHEP mux   │   │
              │  │   └──────────────┘      └──────┬───────┘   │
              │  │                                │           │
  udp://      │  │   ┌──────────────┐             │           │
  127.0.0.1:  └─▶│   │ core/webrtc  │◀────uses────┘           │
  <auto>rtp      │   │  (from M1,   │                         │
                 │   │  unchanged)  │    ┌────────────────┐   │
                 │   └──────────────┘    │  http/server   │   │
                 │                       │                │   │
                 │                       │ mounts         │   │
                 │                       │ /api/v3/process│   │
                 │                       │ /:id/whep      │   │
                 │                       └────────┬───────┘   │
                 └────────────────────────────────┼───────────┘
                            (DTLS-SRTP over ICE)  │
                                          Browser (core-ui
                                          player tab, RTCPeer)
```
Three boxes matter:
- **existing `restream`** — grows two tiny hooks.
- **existing `core/webrtc`** (from M1) — unchanged.
- **new `app/webrtc`** — the glue subsystem.
## 3. Key decisions (settled during brainstorming)
| # | Decision | Choice |
|---|----------|--------|
| 1 | Scope | Backend + full UI with embedded player |
| 2 | Stream addressing | `/api/v3/process/{processID}/whep` (per-process) |
| 3 | HTTP listener | Under Core's `/api/v3` group (inherits JWT) |
| 4 | Viewer auth | JWT only in M2 — public embeds are M3 |
| 5 | FFmpeg wiring | Auto-inject UDP RTP output; re-encode when needed |
| 6 | Enable state | Field on `restream.Config.WebRTC` |
| 7 | UI surface | New "Live (WebRTC)" tab on process detail view |
| 8 | Lifecycle | Eager — Source bound when process starts |
| 9 | Code placement | New `app/webrtc` sibling subsystem (not inside restream) |
## 4. Components
### 4.1 Config — `config/data.go` + `restream/app/process.go`
Per-process:
```go
// restream/app/process.go — new sibling of ConfigIO on Config
type ConfigWebRTC struct {
	Enabled        bool  // master switch for this process
	VideoPT        uint8 // default 102 (H.264)
	AudioPT        uint8 // default 111 (Opus)
	ForceTranscode bool  // default false — true => always re-encode
}
```
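
The defaults named in the struct comments could be applied once, before the config reaches the subsystem. The following is a minimal sketch under that assumption; `withDefaults` and the two constants are hypothetical names, not part of the design.

```go
package main

import "fmt"

// ConfigWebRTC mirrors the per-process struct above.
type ConfigWebRTC struct {
	Enabled        bool
	VideoPT        uint8
	AudioPT        uint8
	ForceTranscode bool
}

// Defaults taken from the struct comments above.
const (
	defaultVideoPT uint8 = 102 // H.264
	defaultAudioPT uint8 = 111 // Opus
)

// withDefaults is a hypothetical helper: it fills in payload types that were
// left at their zero value and leaves explicit values untouched.
func withDefaults(c ConfigWebRTC) ConfigWebRTC {
	if c.VideoPT == 0 {
		c.VideoPT = defaultVideoPT
	}
	if c.AudioPT == 0 {
		c.AudioPT = defaultAudioPT
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", withDefaults(ConfigWebRTC{Enabled: true}))
}
```

Treating zero as "unset" is safe here only because 0 is not one of the dynamic payload types this design uses.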
Global (Core config, one block):
```go
// config/data.go
type DataWebRTC struct {
	Enable     bool     // master feature flag; default false for safety
	PublicIP   string   // NAT1To1 / ICE host candidate rewrite (e.g. LAN IP)
	NAT1To1IPs []string // advanced: multiple public IPs
	// optional: single UDP port for all ICE traffic
	// (0 = ephemeral per peer, default)
	UDPMuxPort int
}
```
Registered through the existing `vars.Register` mechanism in `config/config.go`.
### 4.2 New package — `app/webrtc/`
| File | Responsibility |
|------|----------------|
| `subsystem.go` | `type WebRTC struct` with `Start()` / `Stop()`; owns the `core/webrtc.Registry` and a single `core/webrtc.PeerFactory`. Implements the same shape as other Core subsystems. |
| `lifecycle.go` | `OnProcessStart(id, cfg)` / `OnProcessStop(id)` callbacks registered with restream. Allocates a UDP port, calls `restream.AppendOutput`, binds a `core/webrtc.Source`, registers it. |
| `portalloc.go` | `Alloc() (int, error)` — binds `:0` on loopback, reads the port, closes the listener, returns the number. Race window is microseconds; `NewSourceOn` re-binds immediately. If the rebind fails (rare: another process grabbed the port in the gap), `OnStart` returns the error, restream aborts the start, operator retries. Tested with 100× tight-loop. |
| `ffmpeg_args.go` | `BuildArgs(cfg ConfigWebRTC, port int) []string` — emits the `-map`, `-c:v`, `-c:a`, `-f rtp`, `udp://127.0.0.1:PORT?pkt_size=1316` fragments. Branches on `ForceTranscode`. |
| `handler.go` | HTTP handler for WHEP — wraps the M1 `core/webrtc.NewWHEPHandler`, but looks up the Source by `processID` path param. Adds `DELETE /api/v3/process/:id/whep/:peerid`. |
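
The allocate-then-rebind pattern described for `portalloc.go` can be sketched with the standard library alone. This is an illustration rather than the planned implementation: `rebind` is a hypothetical stand-in for what `NewSourceOn` would do, and the error text reuses the log message from the error-handling table.

```go
package main

import (
	"fmt"
	"net"
)

// Alloc asks the kernel for a free loopback UDP port by binding port 0,
// reads the assigned port, then releases it so the caller can re-bind.
func Alloc() (int, error) {
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1), Port: 0})
	if err != nil {
		return 0, fmt.Errorf("webrtc: port alloc failed: %w", err)
	}
	port := conn.LocalAddr().(*net.UDPAddr).Port
	if err := conn.Close(); err != nil {
		return 0, fmt.Errorf("webrtc: port release failed: %w", err)
	}
	return port, nil
}

// rebind is a hypothetical stand-in for NewSourceOn: it binds the port again
// and fails if another process took it in the gap.
func rebind(port int) error {
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1), Port: port})
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	// Mirrors the 100-iteration tight loop mentioned in the table.
	for i := 0; i < 100; i++ {
		port, err := Alloc()
		if err != nil {
			fmt.Println("alloc:", err)
			return
		}
		if err := rebind(port); err != nil {
			fmt.Println("rebind:", err)
			return
		}
	}
	fmt.Println("100 ports allocated and re-bound")
}
```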
### 4.3 Two additions to `restream`
1. **Lifecycle callback pair.** Added as fields on the restream manager:

   ```go
   type ProcessHook func(id string, cfg *app.Config) error

   type ProcessHooks struct {
   	OnStart ProcessHook // fires after args are assembled, before exec
   	OnStop  ProcessHook // fires after wait() returns
   }
   ```

   Single consumer is fine — no event bus yet. `app/webrtc` registers itself at subsystem start.

2. **`AppendOutput(id string, extra []string) error`** — mutates the *pending*
   FFmpeg args for a process that has fired `OnStart` but has not yet exec'd.
   Inside `OnStart`, the subsystem calls `AppendOutput` to add the
   `-f rtp udp://…` fragment; restream then exec's with the augmented
   args. Outside the `OnStart` window `AppendOutput` returns an error —
   Core does not mutate running FFmpeg processes.

These two additions are useful beyond WebRTC (stats consumers, future
sidecar modules), so the surface cost is justified.
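
One way to enforce the `OnStart` window is to keep pending args in a map that only has an entry while the hook is running. The sketch below assumes that approach; `manager`, `start`, `errNotPending`, and the reduced `Config` type are hypothetical stand-ins for the real restream types.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Config is a stand-in for app.Config.
type Config struct{}

type ProcessHook func(id string, cfg *Config) error

type ProcessHooks struct {
	OnStart ProcessHook
	OnStop  ProcessHook
}

var errNotPending = errors.New("restream: process is not inside its OnStart window")

// manager is a hypothetical, reduced restream manager.
type manager struct {
	mu      sync.Mutex
	pending map[string][]string // entry exists only while OnStart runs
	hooks   ProcessHooks
}

// AppendOutput adds args to a process that is between "args assembled" and exec.
func (m *manager) AppendOutput(id string, extra []string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	args, ok := m.pending[id]
	if !ok {
		return errNotPending
	}
	m.pending[id] = append(args, extra...)
	return nil
}

// start opens the window, fires OnStart, closes the window, and returns the
// final args that would be handed to exec.
func (m *manager) start(id string, cfg *Config, base []string) ([]string, error) {
	m.mu.Lock()
	m.pending[id] = append([]string(nil), base...)
	m.mu.Unlock()

	var err error
	if m.hooks.OnStart != nil {
		err = m.hooks.OnStart(id, cfg) // lock not held: the hook calls AppendOutput
	}

	m.mu.Lock()
	args := m.pending[id]
	delete(m.pending, id)
	m.mu.Unlock()

	if err != nil {
		return nil, fmt.Errorf("start %s aborted: %w", id, err)
	}
	return args, nil
}

func main() {
	m := &manager{pending: map[string][]string{}}
	m.hooks.OnStart = func(id string, _ *Config) error {
		return m.AppendOutput(id, []string{"-f", "rtp", "udp://127.0.0.1:5004"})
	}
	args, err := m.start("p1", &Config{}, []string{"-i", "in"})
	fmt.Println(args, err)
	fmt.Println(m.AppendOutput("p1", nil)) // outside the window: error
}
```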
### 4.4 Routes in `http/server.go`
Inside the existing `/api/v3` group (inherits JWT auth):
```go
api.POST("/process/:id/whep", webrtcHandler.Subscribe)
api.DELETE("/process/:id/whep/:peerid", webrtcHandler.Unsubscribe)
```
### 4.5 UI — `core-ui/src/views/Edit/LiveTab.jsx` (new)
- Shown only when `process.config.webrtc.enabled === true`.
- `<video autoplay muted playsinline />` driven by a small `useWHEP()` hook
  that does:
  1. `new RTCPeerConnection({ iceServers: [] })`
  2. `pc.addTransceiver('video', { direction: 'recvonly' })`
  3. `pc.addTransceiver('audio', { direction: 'recvonly' })`
  4. `await pc.setLocalDescription(await pc.createOffer())`
  5. POST offer SDP to `/api/v3/process/{id}/whep` with the JWT.
  6. `pc.setRemoteDescription(answer)`.
  7. `pc.ontrack` → attach stream to the `<video>`.
- "Copy WHEP URL" button.
- Status line derived from `pc.connectionState` + `pc.getStats()` (codec, bitrate).
- No external WebRTC dependency — browser-native `RTCPeerConnection`.
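
The hook itself runs in the browser, but the HTTP exchange in steps 5 and 6 is plain WHEP and can be shown in Go, the language of the rest of this spec. The sketch below assumes the JWT travels as an `Authorization: Bearer` header; `postOffer` and `fakeWHEP` are hypothetical names, and the fake server only imitates the response shape.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// postOffer sends an SDP offer to a WHEP endpoint and returns the SDP answer
// plus the Location of the created peer resource.
func postOffer(c *http.Client, url, jwt, offer string) (answer, location string, err error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(offer))
	if err != nil {
		return "", "", err
	}
	req.Header.Set("Content-Type", "application/sdp")
	req.Header.Set("Authorization", "Bearer "+jwt) // assumption: bearer-token header
	resp, err := c.Do(req)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return "", "", fmt.Errorf("whep: unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", "", err
	}
	return string(body), resp.Header.Get("Location"), nil
}

// fakeWHEP is a stand-in server that accepts one fixed token.
func fakeWHEP() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer test-token" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/sdp")
		w.Header().Set("Location", r.URL.Path+"/peer-1")
		w.WriteHeader(http.StatusCreated)
		io.WriteString(w, "v=0 (answer)")
	}))
}

func main() {
	srv := fakeWHEP()
	defer srv.Close()
	answer, loc, err := postOffer(srv.Client(), srv.URL+"/api/v3/process/p1/whep", "test-token", "v=0 (offer)")
	fmt.Printf("answer=%q location=%q err=%v\n", answer, loc, err)
}
```

The returned `Location` is what the "Copy WHEP URL" flow and a later `DELETE` would build on.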
## 5. Data flow
### 5.1 Enabling WebRTC (write)
```
core-ui ──PUT /api/v3/process/{id} { ..., config: { webrtc: { enabled: true }}}──▶ http
http ──restream.UpdateProcess(id, cfg)──▶ restream
restream ──persist → stop old → about to exec new──▶ OnProcessStart(id, cfg)
app/webrtc ─port P = Alloc()
app/webrtc ─restream.AppendOutput(id, BuildArgs(cfg.WebRTC, P))
app/webrtc ─NewSourceOn(id, "127.0.0.1", P).Start() → registry[id] = src
restream ─exec ffmpeg with augmented args
```
Ordering guarantee: the Source is bound *before* FFmpeg execs, so FFmpeg's first RTP packet always finds a listener.
### 5.2 WHEP subscribe (read)
```
browser ──POST /api/v3/process/{id}/whep (SDP offer, JWT)──▶ http
http (JWT ok) ──handler.Subscribe──▶ app/webrtc
app/webrtc ─src = registry[id] (404 if absent)
app/webrtc ─peer, answer = factory.NewPeer(src, offer)
app/webrtc ─go forwarder: src.Subscribe(ch) → peer.WriteRTP
http ──201 Created, Location: .../whep/{peerid}, body=answer──▶ browser
browser ──ICE, DTLS-SRTP──▶ peer ──▶ <video>
```
### 5.3 Process stop (teardown)
```
restream ─kill ffmpeg, wait()──▶ OnProcessStop(id)
app/webrtc ─for each peer in peers[id]: peer.Close()
app/webrtc ─src = registry.Remove(id); src.Close()
app/webrtc ─delete peers[id]
```
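
The teardown order above, together with the "closed exactly once" requirement from the lifecycle test, can be sketched as follows. The types are hypothetical: `subsystem` holds plain `io.Closer` values where the real code would hold Sources and peers, and `counted` exists only to observe the calls.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// subsystem is a reduced stand-in for app/webrtc's state.
type subsystem struct {
	mu      sync.Mutex
	sources map[string]io.Closer
	peers   map[string][]io.Closer
}

func newSubsystem() *subsystem {
	return &subsystem{sources: map[string]io.Closer{}, peers: map[string][]io.Closer{}}
}

func (s *subsystem) bind(id string, src io.Closer) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sources[id] = src
}

func (s *subsystem) addPeer(id string, p io.Closer) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.peers[id] = append(s.peers[id], p)
}

// OnProcessStop detaches everything for id under the lock, then closes peers
// first and the source last. A second call finds nothing and is a no-op.
func (s *subsystem) OnProcessStop(id string) {
	s.mu.Lock()
	src, ok := s.sources[id]
	peers := s.peers[id]
	delete(s.sources, id)
	delete(s.peers, id)
	s.mu.Unlock()

	for _, p := range peers {
		_ = p.Close()
	}
	if ok {
		_ = src.Close()
	}
}

// counted records how often Close was called.
type counted struct{ closed int }

func (c *counted) Close() error { c.closed++; return nil }

func main() {
	s := newSubsystem()
	src, peer := &counted{}, &counted{}
	s.bind("p1", src)
	s.addPeer("p1", peer)
	s.OnProcessStop("p1")
	s.OnProcessStop("p1") // repeated stop must not close again
	fmt.Println(src.closed, peer.closed)
}
```

Removing the map entries before closing means a concurrent WHEP POST for the same ID sees a 404 rather than a half-closed Source.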
### 5.4 Disabling WebRTC on a running process
Same as 5.1 in reverse: new cfg has `webrtc.enabled = false`. Restream
persists → stops (fires `OnProcessStop` → 5.3 runs) → starts without RTP leg.
### 5.5 Core restart
Restream enumerates stored configs at boot and starts each process.
`OnProcessStart` fires inside that loop for every `webrtc.enabled = true`
process. WebRTC state rebuilds from the persisted config — no separate
bootstrap path.
## 6. Error handling
| Failure | Surface |
|---------|---------|
| Port alloc fails | `OnProcessStart` returns error → restream aborts start, logs `webrtc: port alloc failed`. Process shows failed in UI. |
| FFmpeg wiring fails (bad codec + !ForceTranscode) | Source binds; RTP counter stays zero. Log after N seconds of silence; expose `RTPPacketsReceived` to UI. |
| WHEP POST for unknown id | `404 stream not found` (same as M1). |
| Peer DELETE unknown peerid | `204 No Content` (idempotent). |
| JWT missing / invalid | `401` — inherited from `/api/v3` group. No code in handler. |
| ICE fails on client | Browser `iceconnectionstatechange = failed` → UI retry button. Server no-op. |
| Subsystem Start fails at boot (bad `PublicIP`, etc.) | Subsystem logs the error and declines to start; the hooks are never registered; restream runs all processes without the RTP leg. Core does **not** exit — WebRTC is non-critical. |
| Subscriber backpressure | Already handled in `core/webrtc.Source` — full channel drops. No change. |
**Design rule:** a WebRTC subsystem failure must not prevent a process's
RTMP/SRT/HLS outputs from running. Hooks wrap their own errors and log;
restream does not abort a start because of a WebRTC problem *unless* the
`AppendOutput` itself fails (wrong args shape — a programming bug, not a
runtime condition).
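
For the "RTP counter stays zero" row in the table above, a silence check needs little more than an atomic counter and a grace period. The sketch below is one possible shape, not the M1 code: `sourceStats`, `onRTP`, and `waitForRTP` are hypothetical, and only the `RTPPacketsReceived` name comes from the table.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// sourceStats is a hypothetical counter a Source could embed.
type sourceStats struct {
	packets atomic.Uint64
}

// onRTP would be called from the Source's read loop for every packet.
func (s *sourceStats) onRTP() { s.packets.Add(1) }

// RTPPacketsReceived is the value the UI would display.
func (s *sourceStats) RTPPacketsReceived() uint64 { return s.packets.Load() }

// waitForRTP polls until the first packet arrives or the grace period ends.
// It returns false when the source stayed silent, which is the cue to log.
func waitForRTP(s *sourceStats, grace, poll time.Duration) bool {
	deadline := time.Now().Add(grace)
	for time.Now().Before(deadline) {
		if s.RTPPacketsReceived() > 0 {
			return true
		}
		time.Sleep(poll)
	}
	return s.RTPPacketsReceived() > 0
}

func main() {
	s := &sourceStats{}
	if !waitForRTP(s, 50*time.Millisecond, 5*time.Millisecond) {
		fmt.Println("webrtc: no RTP received, check codec / ForceTranscode")
	}
	s.onRTP()
	fmt.Println(waitForRTP(s, 50*time.Millisecond, 5*time.Millisecond))
}
```

The check only logs; in line with the design rule it never stops the process.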
## 7. Testing strategy
### 7.1 Unit (fast, in-package, no network)
- `app/webrtc/ffmpeg_args_test.go` — table-driven: video-only, audio-only,
both, transcode on/off. Asserts exact arg slice.
- `app/webrtc/portalloc_test.go` — `Alloc()` returns a port that a
subsequent `ListenUDP` can bind; run 100× to catch races.
- `app/webrtc/lifecycle_test.go` — fake restream calls `OnProcessStart` /
`OnProcessStop`; asserts registry state transitions and Source is closed
exactly once.
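
To make the "asserts exact arg slice" idea concrete, here is a sketch of `BuildArgs` with a two-row table. The flag layout is an assumption, since this spec names the fragments but not their order: one `-f rtp` leg per media type to the same port, told apart by `-payload_type`, with `copy` versus `libx264`/`libopus` chosen by `ForceTranscode`. The video-only and audio-only cases are left out.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// ConfigWebRTC mirrors the per-process struct from section 4.1.
type ConfigWebRTC struct {
	Enabled        bool
	VideoPT        uint8
	AudioPT        uint8
	ForceTranscode bool
}

// BuildArgs returns the extra FFmpeg output args for the RTP leg.
// Assumed layout: one video leg and one audio leg, both sent to the same port.
func BuildArgs(cfg ConfigWebRTC, port int) []string {
	url := fmt.Sprintf("udp://127.0.0.1:%d?pkt_size=1316", port)
	vcodec, acodec := "copy", "copy"
	if cfg.ForceTranscode {
		vcodec, acodec = "libx264", "libopus"
	}
	return []string{
		"-map", "0:v:0", "-c:v", vcodec, "-payload_type", strconv.Itoa(int(cfg.VideoPT)), "-f", "rtp", url,
		"-map", "0:a:0", "-c:a", acodec, "-payload_type", strconv.Itoa(int(cfg.AudioPT)), "-f", "rtp", url,
	}
}

func main() {
	url := "udp://127.0.0.1:5004?pkt_size=1316"
	cases := []struct {
		name      string
		transcode bool
		want      []string
	}{
		{"copy", false, []string{
			"-map", "0:v:0", "-c:v", "copy", "-payload_type", "102", "-f", "rtp", url,
			"-map", "0:a:0", "-c:a", "copy", "-payload_type", "111", "-f", "rtp", url,
		}},
		{"transcode", true, []string{
			"-map", "0:v:0", "-c:v", "libx264", "-payload_type", "102", "-f", "rtp", url,
			"-map", "0:a:0", "-c:a", "libopus", "-payload_type", "111", "-f", "rtp", url,
		}},
	}
	for _, tc := range cases {
		cfg := ConfigWebRTC{Enabled: true, VideoPT: 102, AudioPT: 111, ForceTranscode: tc.transcode}
		got := BuildArgs(cfg, 5004)
		fmt.Println(tc.name, "match:", reflect.DeepEqual(got, tc.want))
	}
}
```

In the real test file the loop body would call `t.Run` and fail on mismatch instead of printing.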
### 7.2 Integration (in-process, real HTTP, no FFmpeg)
- `app/api/api_webrtc_whep_test.go` — boot a Core with a fake process that
has `webrtc.enabled=true`; inject synthetic RTP on the allocated port;
POST a WHEP offer using the M1 `test/whep-client.Subscribe` helper (now
imported as a library); assert both tracks receive a packet within 2s.
- `app/api/api_webrtc_auth_test.go` — POST without JWT → 401; POST for
unknown id → 404; DELETE unknown peerid → 204.
- `app/api/config_persist_test.go` — create process with `webrtc.enabled`,
simulate Core restart, assert Source is re-bound and WHEP still works.
### 7.3 End-to-end (manual, TrueNAS)
- Replace the M1 `test/publish.sh` workflow with a real Core process
configured via core-ui (`testsrc2` as input), flip WebRTC on, open the
Live tab, verify the test pattern plays.
- Use `chrome://webrtc-internals` to confirm ICE completes and SRTP is
flowing.
No new test dependencies. `test/whep-client` graduates from binary to
importable helper package.
## 8. Acceptance criteria
M2 is done when, on a fresh TrueNAS deploy of the Core binary:
1. `POST /api/v3/config` with a `webrtc.enable=true` global block succeeds.
2. Creating a process with `config.webrtc.enabled=true` via core-ui
persists and starts.
3. `POST /api/v3/process/{id}/whep` with a valid JWT returns `201` with an
SDP answer, and the peer connection's `iceConnectionState` reaches `connected`.
4. The core-ui "Live (WebRTC)" tab plays video within 3 seconds of opening.
5. Disabling WebRTC in the UI stops the stream and subsequent WHEP POSTs
return `404`.
6. Restarting the Core binary keeps the stream working without manual
reconfiguration.
7. All unit and integration tests pass with `-race`.
## 9. Rollback
Each layer has a rollback lever:
- **Operator:** set global `webrtc.enable = false` in Core config → subsystem
declines to start (no hooks registered); processes run without the RTP
leg; existing RTMP/SRT/HLS unaffected. Core continues to serve normally.
- **Per-process:** toggle `config.webrtc.enabled = false` in the process
config → restream restarts the process without the leg.
- **Code:** the `app/webrtc` subsystem is a single import in `main.go`.
Removing that import and the two restream hook wires restores pre-M2
behavior. `core/webrtc` stays in the tree as inert code.
## 10. Milestones inside M2
Not the full plan — that lives in a separate plan doc after this spec is
approved. This is a sanity breakdown:
1. **Config wiring** — add `DataWebRTC` and `ConfigWebRTC`; tests for
marshal/unmarshal and defaults.
2. **Restream hooks** — add `ProcessHooks` and `AppendOutput`; unit tests
using the existing restream test harness.
3. **`app/webrtc` package** — subsystem, lifecycle, portalloc, ffmpeg_args,
handler; unit tests per the testing strategy.
4. **Core main.go wiring** — instantiate subsystem, register hooks, mount
HTTP route.
5. **Integration tests** — in-process WHEP end-to-end, auth, persistence.
6. **core-ui LiveTab** — new React tab + WHEP hook.
7. **TrueNAS smoke test** — rebuild Core image, redeploy, verify live.
Each milestone ends with a commit. The feature branch is
`m2-webrtc-core-integration` (created from `m1-webrtc-poc`).