M3: Robustness, multi-viewer, full error matrix #5

Closed
zgaetano wants to merge 0 commits from m3-robustness into m2-webrtc-core-integration
Owner

Stacked on top of PR #4. Targets m2-webrtc-core-integration so M2's review cadence isn't blocked.

Acceptance criterion (from design)

"5 concurrent viewers, all error paths correct, clean teardown."

Met in this PR. Verified by the new TestIntegration_FiveViewerFanout (5 Pion subscribers attached to one process, each receives RTP, every peer is reaped on onProcessStop).

Multi-viewer correctness

  • streamID → resourceID → Peer two-level index in the Handler (was a single flat map). Per-stream and total caps are now separately enforceable.
  • Peer.Done() channel — read-only view of the existing close signal. Lets external indexes await teardown without polling.
  • awaitPeerClose goroutine per peer — when Pion's OnConnectionStateChange triggers peer.Close() after ICE failure/disconnect, the index entry is removed and the counter ticks back down. No leaks if a browser rage-quits.
  • Subsystem.SetTeardownHook(fn) — process stop drives all subscribed peers to close before their Sources go away, closing the "subscribers fan out into a closed channel" race the design's §6 calls out.
  • Per-stream peer cap alongside the existing total cap. Defaults: 8/stream, total from corewebrtc.DefaultConfig (32). Overridable via NewHandlerWithCaps.

Error matrix (design §6)

Status Trigger Was Now
406 Offer SDP missing H264 or Opus rtpmap (silent black frame) descriptive 406
504 ICE gathering timeout (5s ceiling) swallowed as 500 504 with body
204 DELETE for unknown resource id 404 204 (idempotent per WHEP spec)
503 Per-stream peer cap reached n/a distinct body from total-cap 503

400/404/201 paths unchanged.

WHEP spec compatibility additions

  • PATCH /whep/:id/:resource for trickle-ICE; reads application/trickle-ice-sdpfrag, forwards each a=candidate: line via peer.AddICECandidate.
  • OPTIONS preflight answered on every WHEP route.
  • CORS headers (Access-Control-Allow-{Origin,Methods,Headers} + Access-Control-Expose-Headers: Location, ETag) on every response, so browser-side WHEP players living on a different origin can subscribe.
  • ETag header on the Subscribe response (some WHEP servers expect it for resumption / If-Match).

Test coverage

ok  github.com/datarhei/core/v16/core/webrtc       1.066s
ok  github.com/datarhei/core/v16/app/webrtc        1.097s
ok  github.com/datarhei/core/v16/config            1.010s
ok  github.com/datarhei/core/v16/restream         15.113s
ok  github.com/datarhei/core/v16/http/handler/api  1.841s

New tests:

Test Surface
TestIntegration_FiveViewerFanout end-to-end 5-peer fanout + teardown
TestSubsystem_TeardownHookFiresOnProcessStop callback wiring
TestHandler_Subscribe_406OnCodecMismatch 406 path
TestHandler_Subscribe_503OnTotalCap total cap
TestHandler_Subscribe_503OnPerStreamCap per-stream cap
TestHandler_Trickle_404WhenUnknown PATCH unknown
TestHandler_PreflightCORS OPTIONS + CORS
TestHandler_RegisterMountsAllRoutes all 5 routes mounted
TestHandler_Close_DrainsPeers shutdown path
TestRequireH264AndOpus SDP scanner (table-driven)

Updated:

  • TestHandler_Unsubscribe_204WhenUnknown (was _404WhenUnknown).

Out of scope for this PR

  • Per-process auth — design's open question; M3 originally listed this but I'm punting it to a follow-on so this PR stays reviewable. The existing /api/v3 group still inherits Core's JWT, so it isn't unprotected.
  • Issue #2 (-map 0:v:0/0:a:0 hardcoding) — separate small PR.
  • Issue #3 (Swagger doesn't list /whep) — M4 docs polish.
  • RTCP PLI drain + counter — punted to M4 with the metrics work.

Commits

de4b215 chore: ignore the whep-client test binary (top-level build artifact)
8d60cbd test(app/webrtc): 5-viewer fanout integration + teardown-hook unit test
07b6b43 test(app/webrtc): M3 unit tests for error matrix + Register + CORS
4d2f11d feat(app/webrtc): M3 robustness — error matrix, per-stream index, PATCH, CORS
3abd4d8 feat(app/webrtc): broadcast process-stop via SetTeardownHook
4f84c72 feat(core/webrtc): expose Peer.Done() channel + AddICECandidate

Co-authored with Claude Opus 4.7.

Stacked on top of [PR #4](https://forge.wilddragon.net/zgaetano/datarhei-dragonfork-core/pulls/4). Targets `m2-webrtc-core-integration` so M2's review cadence isn't blocked. ## Acceptance criterion (from design) > *"5 concurrent viewers, all error paths correct, clean teardown."* Met in this PR. Verified by the new `TestIntegration_FiveViewerFanout` (5 Pion subscribers attached to one process, each receives RTP, every peer is reaped on `onProcessStop`). ## Multi-viewer correctness - **`streamID → resourceID → Peer`** two-level index in the Handler (was a single flat map). Per-stream and total caps are now separately enforceable. - **`Peer.Done()`** channel — read-only view of the existing close signal. Lets external indexes await teardown without polling. - **`awaitPeerClose` goroutine per peer** — when Pion's `OnConnectionStateChange` triggers `peer.Close()` after ICE failure/disconnect, the index entry is removed and the counter ticks back down. No leaks if a browser rage-quits. - **`Subsystem.SetTeardownHook(fn)`** — process stop drives all subscribed peers to close *before* their Sources go away, closing the "subscribers fan out into a closed channel" race the design's §6 calls out. - **Per-stream peer cap** alongside the existing total cap. Defaults: 8/stream, total from `corewebrtc.DefaultConfig` (32). Overridable via `NewHandlerWithCaps`. ## Error matrix (design §6) | Status | Trigger | Was | Now | | :----: | ------- | :-: | :-: | | 406 | Offer SDP missing H264 or Opus rtpmap | (silent black frame) | descriptive 406 | | 504 | ICE gathering timeout (5s ceiling) | swallowed as 500 | 504 with body | | 204 | DELETE for unknown resource id | 404 | 204 (idempotent per WHEP spec) | | 503 | Per-stream peer cap reached | n/a | distinct body from total-cap 503 | 400/404/201 paths unchanged. ## WHEP spec compatibility additions - **`PATCH /whep/:id/:resource`** for trickle-ICE; reads `application/trickle-ice-sdpfrag`, forwards each `a=candidate:` line via `peer.AddICECandidate`. - **`OPTIONS` preflight** answered on every WHEP route. - **CORS headers** (`Access-Control-Allow-{Origin,Methods,Headers}` + `Access-Control-Expose-Headers: Location, ETag`) on every response, so browser-side WHEP players living on a different origin can subscribe. - **`ETag` header** on the Subscribe response (some WHEP servers expect it for resumption / If-Match). ## Test coverage ``` ok github.com/datarhei/core/v16/core/webrtc 1.066s ok github.com/datarhei/core/v16/app/webrtc 1.097s ok github.com/datarhei/core/v16/config 1.010s ok github.com/datarhei/core/v16/restream 15.113s ok github.com/datarhei/core/v16/http/handler/api 1.841s ``` New tests: | Test | Surface | | ---- | ------- | | `TestIntegration_FiveViewerFanout` | end-to-end 5-peer fanout + teardown | | `TestSubsystem_TeardownHookFiresOnProcessStop` | callback wiring | | `TestHandler_Subscribe_406OnCodecMismatch` | 406 path | | `TestHandler_Subscribe_503OnTotalCap` | total cap | | `TestHandler_Subscribe_503OnPerStreamCap` | per-stream cap | | `TestHandler_Trickle_404WhenUnknown` | PATCH unknown | | `TestHandler_PreflightCORS` | OPTIONS + CORS | | `TestHandler_RegisterMountsAllRoutes` | all 5 routes mounted | | `TestHandler_Close_DrainsPeers` | shutdown path | | `TestRequireH264AndOpus` | SDP scanner (table-driven) | Updated: - `TestHandler_Unsubscribe_204WhenUnknown` (was `_404WhenUnknown`). ## Out of scope for this PR - **Per-process auth** — design's open question; M3 originally listed this but I'm punting it to a follow-on so this PR stays reviewable. The existing `/api/v3` group still inherits Core's JWT, so it isn't unprotected. - **Issue [#2](https://forge.wilddragon.net/zgaetano/datarhei-dragonfork-core/issues/2)** (`-map 0:v:0`/`0:a:0` hardcoding) — separate small PR. - **Issue [#3](https://forge.wilddragon.net/zgaetano/datarhei-dragonfork-core/issues/3)** (Swagger doesn't list /whep) — M4 docs polish. - **RTCP PLI drain + counter** — punted to M4 with the metrics work. ## Commits ``` de4b215 chore: ignore the whep-client test binary (top-level build artifact) 8d60cbd test(app/webrtc): 5-viewer fanout integration + teardown-hook unit test 07b6b43 test(app/webrtc): M3 unit tests for error matrix + Register + CORS 4d2f11d feat(app/webrtc): M3 robustness — error matrix, per-stream index, PATCH, CORS 3abd4d8 feat(app/webrtc): broadcast process-stop via SetTeardownHook 4f84c72 feat(core/webrtc): expose Peer.Done() channel + AddICECandidate ``` Co-authored with Claude Opus 4.7.
zgaetano added 6 commits 2026-05-03 07:24:37 -04:00
Two small additions to support the M3 handler:

- Peer.Done() — read-only view of the existing 'done' channel,
  closed on Close(). Lets external indexes (Handler, admin API)
  await peer teardown without polling.
- Peer.AddICECandidate — passthrough so the WHEP PATCH handler
  can forward trickle-ICE candidates without reaching into the
  PeerConnection directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Subsystem.SetTeardownHook installs a callback the subsystem invokes
just before closing per-stream Sources in onProcessStop. Used by the
WHEP Handler in M3 to drain its per-stream peer index before the
underlying Sources go away — closes the 'subscribers fan out into a
closed channel' race the design's §6 error matrix calls out as
'Publisher disconnects / FFmpeg exits'.

Single consumer by design (one subsystem, one handler). Calling
SetTeardownHook again replaces the previous callback; nil detaches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Major Handler rewrite implementing the design's M3 acceptance
criteria ('5 concurrent viewers, all error paths correct, clean
teardown'):

Multi-viewer correctness:
- streamID -> resourceID -> Peer two-level index (was flat)
- per-stream peer cap alongside total cap, defaults match the
  design's '5–8 viewer' target (8/stream, total from corewebrtc)
- per-peer awaitPeerClose goroutine watches Peer.Done() so ICE
  failures yank the index entry + decrement the counter (no leaks)
- tearDownStreamPeers callback (registered with Subsystem in
  NewHandler) drives all peer closes when the source process stops

Error matrix from design §6:
- 406 on codec mismatch (offer missing H264 or Opus rtpmap)
- 504 on ICE gathering timeout (passthrough from CreatePeerFromSources)
- 204 on DELETE unknown resource (idempotent per WHEP spec; was 404)
- 503 on per-stream cap reached (separate body from total-cap 503)
- 400 on missing/empty body (unchanged)
- 404 on unknown stream (unchanged)

WHEP spec compatibility:
- PATCH /whep/:id/:resource for trickle-ICE
- OPTIONS preflight on every WHEP path
- CORS Allow-Origin/Methods/Headers + Expose-Headers (Location, ETag)
- ETag header on Subscribe response

Defensive nil-peer guards in tearDown / Close paths so a partial
state doesn't panic.

Refactor: 134 -> 341 lines on handler.go but the surface is the
same (NewHandler/Register/Subscribe/Unsubscribe/Close); existing
callers continue to work. Pre-M3 test 'Unsubscribe_404WhenUnknown'
renamed and updated to the new 204 expectation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Covers each new code path that the design's §6 table requires:
- Subscribe -> 406 on non-H264 / non-Opus offer (TestHandler_Subscribe_406OnCodecMismatch)
- Subscribe -> 503 when total cap exhausted (TestHandler_Subscribe_503OnTotalCap)
- Subscribe -> 503 when per-stream cap exhausted (TestHandler_Subscribe_503OnPerStreamCap)
- Trickle -> 404 on unknown resource (TestHandler_Trickle_404WhenUnknown)
- preflight -> 204 + CORS headers (TestHandler_PreflightCORS)
- Register installs all 5 routes (TestHandler_RegisterMountsAllRoutes)
- Close drains the index without panicking (TestHandler_Close_DrainsPeers)
- requireH264AndOpus table-driven (TestRequireH264AndOpus)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
TestIntegration_FiveViewerFanout drives the M3 acceptance criterion
in the wide direction: spin up the subsystem, register one process,
attach 5 Pion subscribers in parallel via the real Echo handler,
spray synthetic RTP at the allocated UDP ports, and assert each
subscriber's video + audio track receive at least one packet inside
a 15s window. After onProcessStop, the per-stream peer index must
drain to zero within 3s.

TestSubsystem_TeardownHookFiresOnProcessStop is the unit-level
counterpart — confirms the callback registered via
SetTeardownHook actually fires when a process is torn down, even
without a full Pion handshake.

Together these cover the acceptance language: '5 concurrent viewers,
all error paths correct, clean teardown'.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chore: ignore the whep-client test binary (top-level build artifact)
Some checks failed
tests / build (push) Failing after 2s
tests / build (pull_request) Failing after 2s
de4b215123
Author
Owner

Merged into main via direct push as part of the v0.1.0-dragonfork release. Branch commits are reachable from main; closing this PR. Release: https://forge.wilddragon.net/zgaetano/datarhei-dragonfork-core/releases/tag/v0.1.0-dragonfork

Merged into `main` via direct push as part of the v0.1.0-dragonfork release. Branch commits are reachable from main; closing this PR. Release: https://forge.wilddragon.net/zgaetano/datarhei-dragonfork-core/releases/tag/v0.1.0-dragonfork
zgaetano closed this pull request 2026-05-03 08:28:57 -04:00
Some checks failed
tests / build (push) Failing after 2s
tests / build (pull_request) Failing after 2s

Pull request closed

Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: zgaetano/datarhei-dragonfork-core#5
No description provided.