Commit graph

2 commits

Author SHA1 Message Date
b7afd0f08a ci(webrtc): server-hop latency p95 gate
Some checks failed
ci / vet + build (push) Successful in 9m54s
ci / vet + build (pull_request) Successful in 9m49s
ci / race tests (push) Failing after 8m1s
ci / WebRTC smoke (5-viewer fanout) (push) Successful in 9m45s
ci / WebRTC latency p95 gate (push) Successful in 10m3s
ci / race tests (pull_request) Failing after 7m59s
ci / WebRTC smoke (5-viewer fanout) (pull_request) Successful in 9m45s
ci / WebRTC latency p95 gate (pull_request) Successful in 10m4s
Adds an end-to-end RTP-arrival latency probe that runs as a dedicated
CI job and asserts p95 < 50ms.

Implementation
--------------
A build-tagged test (-tags latency, off by default) sends 1000
synthetic RTP packets at 60Hz into corewebrtc.Source and reads them
back via a Pion subscriber's track.ReadRTP(). Each packet's payload
starts with the publisher's UnixNano send time; the subscriber diffs
against time.Now() at arrival and accumulates p50/p95/p99.

This exercises every link of the egress hop: Source UDP read,
subscriber fan-out, forwardRTPSplit, Pion's TrackLocalStaticRTP
write, DTLS-SRTP encrypt, ICE socket write, decrypt at the
subscriber, RTP unmarshal at ReadRTP. Pure server-side; no FFmpeg
or codecs involved.

Why not glass-to-glass
----------------------
The design's §7 calls for FFmpeg drawtext frame counters + decode-
side pixel sampling, p95<300ms RTMP / <200ms SRT. Implementing that
in pure Go needs a cgo H.264 decoder or an FFmpeg sidecar pipe — a
significantly bigger lift for a marginal regression-detection win
(encode/decode latency is roughly fixed by the codec stack and
isn't moved by Core code changes). The server-hop measurement
captures everything Core code can actually regress.

Threshold
---------
50ms p95. Locally observed on a quiet host:
  p50=110µs, p95=237µs, p99=318µs.
The 50ms gate is ~200x headroom — generous enough to absorb CI
runner noise without false alarms, tight enough to catch a real
slowdown.

Race-clean: latencySamples uses a sync.Mutex around the slice append
(initial draft had a slice racing with the receive goroutine; vet
caught it).

Documented in test/TESTING.md and wired to .forgejo/workflows/test.yml
as the latency-gate job (depends on lint-and-vet, parallel with test
and webrtc-smoke).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-03 12:18:57 +00:00
927ccc6ced ci+test: forgejo workflow, browser WHEP player, TESTING.md (M4 part 1)
Some checks failed
ci / vet + build (push) Successful in 9m50s
ci / vet + build (pull_request) Successful in 9m49s
ci / race tests (push) Failing after 8m4s
ci / WebRTC smoke (5-viewer fanout) (push) Successful in 9m48s
ci / race tests (pull_request) Failing after 6m28s
ci / WebRTC smoke (5-viewer fanout) (pull_request) Successful in 9m46s
Three artifacts that close out the easier half of the M4 milestone:

1. .forgejo/workflows/test.yml — CI on every push and PR. Three jobs:
     - lint-and-vet: go vet + go build (~30s)
     - test:        go test -race -short ./... + a no-race coverage
                    pass that uploads coverage.out as an artifact
     - webrtc-smoke: TestIntegration_FiveViewerFanout and the rest of
                     the WebRTC subsystem tests in isolation, so a
                     failure on the egress path stays readable in the
                     log.
   Pinned to Go 1.24 to match go.mod. The forge has a
   forgejo-runner sibling container; this YAML uses GitHub Actions
   syntax which Forgejo Actions accepts unchanged.

2. test/whep-player.html — self-contained browser WHEP subscriber for
   manual smoke testing. RTCPeerConnection (recvonly V+A) + fetch()
   POST/DELETE/PATCH against /api/v3/whep/:id, ICE/PC state pills,
   inbound-bitrate sampling at 1 Hz, codec hint pulled from the answer
   SDP, JWT token field, ?url=&token= shareable query string. No
   external deps; works from file:// or any static host.

3. test/TESTING.md — short doc that ties together the in-process race
   tests, the browser player, and the existing Pion CLI helper at
   test/whep-client/. Notes the latency p95 gate as a follow-up.

Latency gate (FFmpeg drawtext frame counter + decode-side pixel
sampling, p95 < 300ms RTMP / < 200ms SRT) is queued for a separate
PR — it's a several-hundred-line addition in its own right and
shouldn't block CI from landing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-03 12:14:43 +00:00