Closes the v0.1 observability gap. Eleven new metrics in the dragonfork_webrtc_* namespace (RED-method on the WHEP surface plus state gauges from the WebRTC subsystem), Prom + Grafana containers added to deploy/truenas/core/, four pre-loaded alert rules, one pre-provisioned dashboard. Hybrid instrumentation: direct client_golang in app/webrtc/ for hot-path counters and histograms; snapshot collector in prometheus/webrtc.go for slow-changing gauges. Rationale and trade-offs against the upstream monitor/metric bus pattern documented in the Approach section. Targets v0.2.0-dragonfork.
Datarhei - Dragon Fork: WebRTC Prometheus Metrics
Status: Draft for review
Author: Zac (Wild Dragon)
Date: 2026-05-03
Predecessors:
- 2026-04-16-datarhei-dragon-fork-webrtc-design.md
- 2026-04-17-datarhei-dragon-fork-m2-webrtc-core-integration.md
- v0.1.0-dragonfork released 2026-05-03
Summary
Add Prometheus instrumentation to Dragon Fork's WebRTC subsystem and ship a collection-and-dashboard stack in the existing TrueNAS deploy bundle. Closes the v0.1 observability gap: the WHEP egress has been running in production since 2026-04-17 with zero per-subsystem signal.
The deliverable is a RED-method dashboard ("rate, errors, duration") that
answers a single operator question — is the WebRTC stack healthy right now?
Eleven new metrics in the dragonfork_webrtc_* namespace, two new containers
(Prometheus + Grafana) in deploy/truenas/core/, four pre-loaded alert rules,
one pre-provisioned dashboard.
Goals
- Operator can answer "is WebRTC healthy right now?" from a single Grafana dashboard, without tailing logs or hitting the API.
- Per-stream drill-down available when the dashboard goes red: labels carry stream_id everywhere it's meaningful, never peer_id.
- Deploy is one-command on a fresh TrueNAS box (docker compose up -d), matching the existing v0.1 deploy ergonomics.
- Backwards-compatible: zero changes to upstream's /metrics payload. New metrics are purely additive.
- Bucket choices and label sets are tuned for the realistic latency ranges observed in v0.1 (server-hop p95 ≈ 240µs, ICE establishment seconds-scale).
Non-Goals
- Alertmanager bundling. Alert rules are loaded into Prometheus but not routed. Paging configuration is too opinionated to ship a default; separate spec if/when paging is wanted.
- Per-peer metric labels. Peer-level forensics (individual session lifetimes, per-resource teardown reasons) is out of scope. peer_id is unbounded under churn and risks cardinality bloat.
- Federated multi-Core scrape. Single-deploy scrape config only. The core label is set statically to dragonfork-truenas.
- Latency p95 CI gate via Prometheus. Server-hop latency stays a Go test gate (-tags latency); not a Prometheus histogram.
- Server-hop microsecond histogram. The 240µs server-hop is well below HTTP request scales and would need its own bucket set; it's already covered by the latency CI test, no need to duplicate in Prom.
- Custom monitor/metric bus integration. Upstream pulls from monitor/metric.Reader. We diverge; see Approach ("Why not pure snapshot?") for rationale.
Context
v0.1 surface area:
- WHEP HTTP routes: POST /api/v3/whep/{id}, DELETE /api/v3/whep/{id}/{r}, PATCH /api/v3/whep/{id}/{r}, plus admin GET /api/v3/webrtc/streams and GET /api/v3/webrtc/streams/{id}/peers.
- Error matrix in v0.1: 406 codec mismatch, 503 cap reached (split into global vs per-stream in response body), 504 ICE timeout, 204 DELETE idempotent, 404 unknown stream.
- Pion-mediated peer connection lifecycle in app/webrtc/lifecycle.go: ICE state transitions are the natural hook for ICE timing/failure metrics.
- FFmpeg RTP output legs supervised by the existing process supervisor; silent leg failure is a known "quietly degrading" risk worth instrumenting.
Existing Prometheus integration (upstream):
- prometheus/prometheus.go exposes a Metrics interface with Register and an HTTPHandler(). Single shared prometheus.Registry.
- prometheus/restream.go is the reference collector: it pulls from monitor/metric.Reader via metric.Pattern queries and emits via prometheus.MustNewConstMetric. All upstream collectors carry a core label as the first dimension.
- /metrics endpoint already exposed by Core; auth handled at the same layer as the rest of the API.
Approach
Hybrid instrumentation, in two surfaces:
- Direct prometheus/client_golang instrumentation in app/webrtc/ for hot-path counters and histograms (request rate, request duration, ICE establishment duration, error counters by reason). Histograms can't be reconstructed from a scrape-time snapshot, so this is non-negotiable for RED-method.
- Snapshot-style collector in prometheus/webrtc.go for slow-changing gauges (active streams, active peers per stream, UDP port pool usage). Calls a new Stats() method on the WebRTC subsystem at scrape time.
Both surfaces register against the same prometheus.Registerer exposed by
prometheus.Metrics. No new HTTP endpoint, no new auth path. Both take a
core first-label dimension to match upstream collector convention.
Why not pure snapshot?
Upstream's prometheus/restream.go pulls from a monitor/metric bus that
the FFmpeg supervision layer writes into. We could mirror that for WebRTC
— have app/webrtc/lifecycle.go and handler.go push events onto the bus,
have prometheus/webrtc.go pull them. Two reasons not to:
- Histograms don't fit the pattern. The bus stores point-in-time values (gauges and counters), not distributions. RED-method needs duration p50 and p95; you'd end up maintaining an in-process sliding-window quantile estimator inside the WebRTC subsystem, which is more code than just using client_golang.Histogram directly.
- The bus is FFmpeg-shaped. metric.Pattern queries are designed for process-state metrics (process IDs, FFmpeg states). Bolting WebRTC semantics on requires defining new patterns the bus consumers all need to know about, for a payload only the WebRTC collector cares about.
The hybrid keeps each metric type on the cleanest path. The cost is two
patterns in the codebase instead of one — accepted, with a comment in
prometheus/webrtc.go pointing at this rationale so the next contributor
doesn't try to "fix" the divergence.
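To make the first reason concrete, here is a minimal standard-library sketch of what a cumulative-bucket histogram has to record. The type and function names are illustrative only; they are not part of the codebase or of client_golang. The bucket bounds are the ones proposed later in this spec.

```go
package main

import "fmt"

// bucketBounds mirrors the upper bounds proposed in this spec (seconds).
var bucketBounds = []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}

// sketchHistogram is a hypothetical, simplified stand-in for a Prometheus
// histogram: cumulative bucket counts plus a running sum and count.
type sketchHistogram struct {
	cumulative []uint64 // cumulative[i] counts observations <= bucketBounds[i]
	count      uint64   // all observations (what the implicit +Inf bucket reports)
	sum        float64
}

func newSketchHistogram() *sketchHistogram {
	return &sketchHistogram{cumulative: make([]uint64, len(bucketBounds))}
}

// Observe has to run at event time. The individual durations are gone by the
// time a scrape happens, so a scrape-time snapshot cannot rebuild them.
func (h *sketchHistogram) Observe(seconds float64) {
	for i, bound := range bucketBounds {
		if seconds <= bound {
			h.cumulative[i]++
		}
	}
	h.count++
	h.sum += seconds
}

func main() {
	h := newSketchHistogram()
	for _, d := range []float64{0.03, 0.2, 0.2, 3.0, 12.0} {
		h.Observe(d)
	}
	fmt.Println(h.cumulative, h.count) // prints: [1 1 3 3 3 3 4 4] 5
}
```

A gauge pushed onto the bus would only carry the latest duration; the bucket counts above are what p50 and p95 queries need.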
Why not pure direct?
Pure client_golang everywhere would mean the gauges (active streams,
active peers, UDP ports) sit in app/webrtc/ alongside histograms. Workable,
but loses the "one collector file per subsystem in prometheus/" pattern
that anyone reading the repo's existing structure would expect. Snapshot
gauges are cheap to implement via the existing pattern, so we keep them
where a casual reader would look.
Module Layout
New files
app/webrtc/metrics.go (~150 LOC)
app/webrtc/metrics_test.go (~200 LOC)
prometheus/webrtc.go (~120 LOC)
prometheus/webrtc_test.go (~150 LOC)
deploy/truenas/core/prom/prometheus.yml
deploy/truenas/core/prom/rules/webrtc-alerts.yml
deploy/truenas/core/grafana/provisioning/datasources/prometheus.yml
deploy/truenas/core/grafana/provisioning/dashboards/webrtc.yml
deploy/truenas/core/grafana/dashboards/dragonfork-webrtc-health.json
Modified files
app/webrtc/handler.go — add metric middleware around WHEP routes
app/webrtc/lifecycle.go — record ICE timing in OnConnectionStateChange
app/webrtc/subsystem.go — add Stats() method, instrument process hooks
deploy/truenas/core/docker-compose.yml — add prom + grafana services
deploy/truenas/core/README.md — document new env vars + ports
README.md — quick-start mentions Grafana URL
CHANGELOG.md — v0.2.0-dragonfork section
app/webrtc/metrics.go — direct instrumentation
promauto-registered into the shared registry, exposed as package-level
vars so handler.go and lifecycle.go can increment without dependency
injection. Single Init(reg prometheus.Registerer, core string) called
from subsystem.New after the registry is available.
// Sketch: exact wire format finalized at implementation.
package webrtc

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// histBuckets are the shared upper bounds (seconds) for both histograms.
var histBuckets = []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}

type metrics struct {
	whepRequests        *prometheus.CounterVec   // route, code, stream_id
	whepRequestDuration *prometheus.HistogramVec // route, stream_id
	iceEstablishment    *prometheus.HistogramVec // stream_id, result
	iceFailures         *prometheus.CounterVec   // stream_id, reason
	codecMismatches     *prometheus.CounterVec   // stream_id, kind
	capRejections       *prometheus.CounterVec   // stream_id, scope
	ffmpegLegFailures   *prometheus.CounterVec   // stream_id, leg
}

func newMetrics(reg prometheus.Registerer, core string) *metrics {
	factory := promauto.With(reg)
	return &metrics{
		whepRequests: factory.NewCounterVec(prometheus.CounterOpts{
			Name:        "dragonfork_webrtc_whep_requests_total",
			Help:        "Count of WHEP requests by route, status code, and stream.",
			ConstLabels: prometheus.Labels{"core": core},
		}, []string{"route", "code", "stream_id"}),
		// ... etc
	}
}
The core label is a ConstLabels (set once at construction) rather than a
per-request dimension — matches the upstream collector pattern and avoids
threading it through every call site.
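The metric middleware listed for app/webrtc/handler.go could follow the usual status-capturing wrapper pattern. The sketch below uses only net/http and an in-memory map so that it runs standalone; the real handler sits behind Core's own router and would increment whepRequests and observe whepRequestDuration instead. Every name in it (statusRecorder, requestCounts, instrument, demo) is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
	"time"
)

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// requestCounts is a stand-in for a CounterVec keyed by route and code.
type requestCounts struct {
	mu     sync.Mutex
	counts map[string]int
	last   time.Duration // stand-in for a histogram observation
}

func newRequestCounts() *requestCounts {
	return &requestCounts{counts: map[string]int{}}
}

// instrument wraps next and records one count per request under a fixed
// route name, never the raw URL path, to keep label cardinality bounded.
func (c *requestCounts) instrument(route string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		elapsed := time.Since(start)

		c.mu.Lock()
		defer c.mu.Unlock()
		c.counts[route+" "+strconv.Itoa(rec.status)]++
		c.last = elapsed
	})
}

// demo sends one simulated codec-mismatch request through the middleware.
func demo() map[string]int {
	c := newRequestCounts()
	h := c.instrument("whep_post", http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNotAcceptable)
	}))
	req := httptest.NewRequest(http.MethodPost, "/api/v3/whep/live", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return c.counts
}

func main() {
	fmt.Println(demo()) // prints: map[whep_post 406:1]
}
```

The recorder defaults to 200 because a handler that writes a body without calling WriteHeader implicitly sends that status.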
prometheus/webrtc.go — snapshot collector
Standard prometheus.Collector interface (Describe / Collect). Keeps a
reference to a WebRTCStatsSource interface, which the WebRTC subsystem
implements via its Stats() method. Avoids importing app/webrtc from
prometheus/ — the dependency arrow points the right way.
// Sketch.
type WebRTCStatsSource interface {
	Stats() WebRTCStats
}

type WebRTCStats struct {
	StreamCount       int
	PeersByStream     map[string]int
	UDPPortsInUse     int
	UDPPortsAvailable int
}

type webrtcCollector struct {
	core                  string
	source                WebRTCStatsSource
	activeStreamsDesc     *prometheus.Desc
	activePeersDesc       *prometheus.Desc
	udpPortsInUseDesc     *prometheus.Desc
	udpPortsAvailableDesc *prometheus.Desc
}

func NewWebRTCCollector(core string, source WebRTCStatsSource) prometheus.Collector { ... }
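The WebRTCStats type lives in prometheus/webrtc.go (not in app/webrtc/)
so the dependency stays one-directional: app/webrtc imports prometheus/ for
the return type, and prometheus/ never imports app/webrtc. The subsystem
satisfies WebRTCStatsSource implicitly, by having a Stats() method of the
right shape, with no explicit declaration.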
app/webrtc/subsystem.go — Stats() method
// Here prometheus is Core's own prometheus/ package, not client_golang.
func (s *Subsystem) Stats() prometheus.WebRTCStats {
	s.mu.Lock()
	defer s.mu.Unlock()

	// Copy counts out under the lock so the collector never touches live maps.
	peers := make(map[string]int, len(s.streams))
	for id, st := range s.streams {
		peers[id] = len(st.peers) // assume peers tracked per-stream
	}
	return prometheus.WebRTCStats{
		StreamCount:       len(s.streams),
		PeersByStream:     peers,
		UDPPortsInUse:     s.portAlloc.InUse(),
		UDPPortsAvailable: s.portAlloc.Available(),
	}
}
The existing subsystem tracks streams in s.streams under s.mu. Peer
count per stream needs the per-stream peer index that already exists in
handler.go — the Stats() method consults it via the existing teardown
hook plumbing or a small new accessor on Handler. Pick whichever surface
introduces the smaller blast radius at implementation time.
Metric Inventory
Eleven metrics. Eight new label dimensions across them. Under 500 active series at typical 1-5 stream scale (see Cardinality budget).
Direct instrumentation (app/webrtc/metrics.go)
| Name | Type | Labels | Description |
|---|---|---|---|
| dragonfork_webrtc_whep_requests_total | Counter | core, route, code, stream_id | Count of WHEP requests by route+status code. |
| dragonfork_webrtc_whep_request_duration_seconds | Histogram | core, route, stream_id | Server-side WHEP request duration. Buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]. |
| dragonfork_webrtc_ice_establishment_duration_seconds | Histogram | core, stream_id, result | Time from SetLocalDescription to first connected or failed ICE state. Same buckets. |
| dragonfork_webrtc_ice_failures_total | Counter | core, stream_id, reason | ICE failure count. reason ∈ {timeout, disconnected, failed}. |
| dragonfork_webrtc_codec_mismatches_total | Counter | core, stream_id, kind | 406 rejections by kind. kind ∈ {video, audio}. |
| dragonfork_webrtc_cap_rejections_total | Counter | core, stream_id, scope | 503 rejections. scope ∈ {global, stream}. |
| dragonfork_webrtc_ffmpeg_leg_failures_total | Counter | core, stream_id, leg | RTP output leg failures. leg ∈ {video, audio}. |
Snapshot collector (prometheus/webrtc.go)
| Name | Type | Labels | Description |
|---|---|---|---|
| dragonfork_webrtc_active_streams | Gauge | core | Streams currently registered (processes with webrtc.enabled=true running). |
| dragonfork_webrtc_active_peers | Gauge | core, stream_id | Currently subscribed WHEP peers per stream. |
| dragonfork_webrtc_udp_ports_in_use | Gauge | core | UDP ports currently allocated from the pool. |
| dragonfork_webrtc_udp_ports_available | Gauge | core | Pool size minus in-use (explicit for alert friendliness). |
Label rationale
- whep_request_duration_seconds deliberately omits code: separating distributions per outcome makes p95 noisy, and per-route per-stream p95 is what an operator actually looks at. Errors get visibility through the request-counter ratio.
- ice_establishment_duration_seconds intentionally includes both connected and failed results in the same histogram via the result label, so the dashboard can compare success latency to failure-tail latency on the same axis.
- cap_rejections_total keeps the scope label because v0.1's response body already splits global vs per-stream rejections; metrics mirror that distinction so the dashboard shows whether to raise max_peers_total or just one stream's per-stream cap.
- ffmpeg_leg_failures_total is the "quietly degrading" canary: a silent RTP-output-leg failure (port bind, encoder crash) is exactly what the "is it healthy?" framing is meant to catch.
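For the ICE establishment histogram, the "first connected or failed state" rule implies exactly one observation per peer connection. One way to sketch that is a once-only timer driven from the connection-state callback. The helper below is hypothetical (iceTimer, OnState, and the plain string state names are assumptions for illustration), uses an injectable clock, and takes a callback where the real code would call Observe on iceEstablishment.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// iceTimer is a hypothetical helper: it records one duration per peer
// connection, at the first terminal ICE state it sees.
type iceTimer struct {
	start  time.Time
	once   sync.Once
	now    func() time.Time                     // injectable clock for tests
	record func(result string, d time.Duration) // e.g. a histogram Observe
}

func newICETimer(now func() time.Time, record func(string, time.Duration)) *iceTimer {
	return &iceTimer{start: now(), now: now, record: record}
}

// OnState is meant to be called from the connection-state callback. Only
// "connected" and "failed" end the measurement; anything after the first
// terminal state is ignored.
func (t *iceTimer) OnState(state string) {
	if state != "connected" && state != "failed" {
		return
	}
	t.once.Do(func() { t.record(state, t.now().Sub(t.start)) })
}

// demo drives the timer with a fake clock and returns what was recorded.
func demo() []string {
	clock := time.Unix(0, 0)
	now := func() time.Time { return clock }
	var got []string
	timer := newICETimer(now, func(result string, d time.Duration) {
		got = append(got, fmt.Sprintf("%s %v", result, d))
	})
	clock = clock.Add(1200 * time.Millisecond)
	timer.OnState("connecting") // not terminal, ignored
	timer.OnState("connected")  // recorded with result=connected
	clock = clock.Add(30 * time.Second)
	timer.OnState("failed") // a result already exists, ignored
	return got
}

func main() {
	fmt.Println(demo()) // prints: [connected 1.2s]
}
```

Later failures of an already-connected peer would then show up in ice_failures_total rather than in the establishment histogram.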
Cardinality budget
At typical scale (5 streams, 3 routes, ~6 status codes seen in practice):
- whep_requests_total: 5 × 3 × 6 = 90 series (worst case)
- whep_request_duration_seconds: 5 × 3 × (8 buckets + the implicit +Inf bucket + sum + count) = 165 series
- ice_establishment_duration_seconds: 5 × 2 × 11 = 110 series
- All others: 5–15 series each
- Total: <500 active series at 5-stream sustained load
Well within Prometheus's comfort zone. At 15s scrape interval × 15-day retention, on-disk storage ~80MB.
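The series arithmetic can be checked with a few lines of Go. This is a sketch with hypothetical helper names; the one non-obvious input is that a Prometheus histogram exposes an implicit +Inf bucket in addition to the explicitly configured ones, alongside its _sum and _count series.

```go
package main

import "fmt"

// histogramSeries returns the number of time series one histogram exposes:
// per label combination, one series for each explicit bucket, one for the
// implicit +Inf bucket, plus _sum and _count.
func histogramSeries(labelCombos, explicitBuckets int) int {
	return labelCombos * (explicitBuckets + 1 + 2)
}

// counterSeries is simply the number of label combinations.
func counterSeries(labelCombos int) int {
	return labelCombos
}

func main() {
	const streams, routes, codes, results, buckets = 5, 3, 6, 2, 8

	// whep_requests_total
	fmt.Println(counterSeries(streams * routes * codes)) // prints: 90
	// whep_request_duration_seconds
	fmt.Println(histogramSeries(streams*routes, buckets)) // prints: 165
	// ice_establishment_duration_seconds
	fmt.Println(histogramSeries(streams*results, buckets)) // prints: 110
}
```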
Specifically excluded metrics
- Per-peer session metrics. Listed under non-goals.
- Bytes-out / bandwidth. Pion exposes RTP write bytes via stats; would be useful but pulls peer-level state. Defer to a future v0.3 spec ("WebRTC bandwidth observability") if needed.
- Server-hop latency (FFmpeg → peer). Microsecond scale, already covered by the -tags latency test gate, would need its own bucket set.
Deploy Bundle
deploy/truenas/core/docker-compose.yml additions
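Two new services on a new bridge network dragonfork-mon. Core continues
on network_mode: host unchanged. Prometheus reaches Core via
host.docker.internal:${CORE_HTTP_PORT} (Linux Docker resolves this when
extra_hosts: ["host.docker.internal:host-gateway"] is set on the service).
Prometheus does not expand environment variables in scrape targets, so the
bundled prometheus.yml hardcodes port 8080; edit the target there if
CORE_HTTP_PORT differs.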
services:
  core:
    # ... existing definition unchanged

  prom:
    image: prom/prometheus:v2.55.0
    container_name: dragonfork-prom
    restart: unless-stopped
    networks: [dragonfork-mon]
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - ./prom/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prom/rules:/etc/prometheus/rules:ro
      - ./prom-data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=15d
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/usr/share/prometheus/console_libraries
      - --web.console.templates=/usr/share/prometheus/consoles
    ports:
      - "${PROM_PORT:-9090}:9090"

  grafana:
    image: grafana/grafana-oss:11.3.0
    container_name: dragonfork-grafana
    restart: unless-stopped
    networks: [dragonfork-mon]
    depends_on: [prom]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_ADMIN_PASSWORD:?set in .env}"
      GF_USERS_ALLOW_SIGN_UP: "false"
      GF_AUTH_ANONYMOUS_ENABLED: "false"
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
      - ./grafana-data:/var/lib/grafana
    ports:
      - "${GRAFANA_PORT:-3000}:3000"

networks:
  dragonfork-mon:
    driver: bridge
prom/prometheus.yml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    core: dragonfork-truenas

rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  - job_name: dragonfork-core
    static_configs:
      - targets: ["host.docker.internal:8080"]
    metrics_path: /metrics
    # If API auth is enabled on /metrics, uncomment and provide creds via
    # env-substituted file. v0.1 leaves /metrics public by default.
    # basic_auth:
    #   username_file: /run/secrets/prom_basic_user
    #   password_file: /run/secrets/prom_basic_pass
prom/rules/webrtc-alerts.yml
groups:
  - name: dragonfork-webrtc
    rules:
      - alert: WebRTCWHEPErrorRateHigh
        expr: |
          sum by (stream_id) (
            rate(dragonfork_webrtc_whep_requests_total{code=~"4..|5.."}[5m])
          ) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "WHEP error rate high on stream {{ $labels.stream_id }}"
          description: "Sustained 4xx/5xx rate >0.5/sec for 5m."

      - alert: WebRTCICEEstablishmentSlow
        expr: |
          histogram_quantile(0.95,
            sum by (le, stream_id) (
              rate(dragonfork_webrtc_ice_establishment_duration_seconds_bucket[10m])
            )
          ) > 3
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "ICE establishment p95 >3s on {{ $labels.stream_id }}"

      - alert: WebRTCICEFailureRateHigh
        expr: |
          sum by (stream_id) (rate(dragonfork_webrtc_ice_failures_total[5m])) > 0.2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "ICE failures sustained on {{ $labels.stream_id }}"

      - alert: WebRTCFFmpegLegFailure
        expr: |
          increase(dragonfork_webrtc_ffmpeg_leg_failures_total[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "FFmpeg RTP leg failed on {{ $labels.stream_id }} ({{ $labels.leg }})"
          description: "Silent degradation of RTP output. Check FFmpeg logs."
Alerts evaluate but route nowhere. Alertmanager bundling deferred — see non-goals.
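One detail worth keeping in mind for WebRTCICEEstablishmentSlow: the 3s threshold falls inside the (2.5, 5] bucket, so the p95 it compares against is an interpolated estimate, not a measured value. The sketch below approximates the linear interpolation that histogram_quantile applies to classic histograms. It is a simplified illustration, not Prometheus's implementation; the function name and sample counts are made up, and q is assumed to be in (0, 1].

```go
package main

import (
	"fmt"
	"math"
)

// bucket is one cumulative histogram bucket: count of observations <= upper.
type bucket struct {
	upper float64 // math.Inf(1) for the +Inf bucket
	count float64
}

// estimateQuantile locates the bucket containing the target rank and
// interpolates linearly inside it. Buckets must be sorted by upper bound,
// cumulative, and end with the +Inf bucket.
func estimateQuantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	i := 0
	for i < len(buckets)-1 && buckets[i].count < rank {
		i++
	}
	if i == len(buckets)-1 {
		// Rank lands in +Inf: report the highest finite bound instead.
		return buckets[len(buckets)-2].upper
	}
	lower, below := 0.0, 0.0
	if i > 0 {
		lower, below = buckets[i-1].upper, buckets[i-1].count
	}
	return lower + (buckets[i].upper-lower)*(rank-below)/(buckets[i].count-below)
}

// demoBuckets is made-up data: 100 ICE establishments, 10 of them in (2.5, 5].
func demoBuckets() []bucket {
	return []bucket{
		{0.05, 0}, {0.1, 0}, {0.25, 10}, {0.5, 40}, {1, 70},
		{2.5, 90}, {5, 100}, {10, 100}, {math.Inf(1), 100},
	}
}

func main() {
	fmt.Printf("%.2f\n", estimateQuantile(0.95, demoBuckets())) // prints: 3.75
}
```

With this data the rule would fire even if every slow sample actually took 2.6s, because the estimate assumes samples are spread evenly across the bucket. An extra bucket bound near 3s would tighten it, at the cost of more series.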
Grafana provisioning
Datasource provisioning at grafana/provisioning/datasources/prometheus.yml:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prom:9090
    isDefault: true
    editable: false
Dashboard provisioning at grafana/provisioning/dashboards/webrtc.yml:
apiVersion: 1
providers:
  - name: dragonfork
    orgId: 1
    folder: "Dragon Fork"
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /var/lib/grafana/dashboards
Dashboard JSON: dragonfork-webrtc-health.json
Single dashboard, five rows aligned to the questions from the metric inventory:
- WHEP API health: request rate by route (stat panel), error rate stacked by code (timeseries), p95 request duration by route (timeseries).
- ICE establishment: success/failure rate (gauge), p50/p95 establishment duration (timeseries with a 3s threshold line for the alert), failure breakdown by reason (table).
- What's flowing: active_streams (stat), active_peers per stream (timeseries), top 5 streams by peer count (table).
- Capacity headroom: udp_ports_available (gauge with red-zone <10), cap rejection rate by scope (timeseries).
- Silent degradation: FFmpeg leg failure timeline (timeseries with annotations), codec mismatch counter (stat).
Built in Grafana 11.3, exported as JSON, committed to the repo. Refresh default 30s.
.env template additions
Append to deploy/truenas/core/README.md's example .env:
# --- Observability (added in v0.2) ---
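# .env files are not shell scripts, so $(...) is not expanded.
# Run: openssl rand -base64 24, then paste the output as the value.
GRAFANA_ADMIN_PASSWORD=paste-generated-value-here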
GRAFANA_PORT=3000
PROM_PORT=9090
Testing
Unit tests — prometheus/webrtc_test.go
Mock WebRTCStatsSource. Drive the collector through three states (no
streams, one stream with N peers, multiple streams). Use
testutil.CollectAndCompare to assert exact metric/label/value output
against a golden plaintext fixture.
// Golden fixture (excerpt):
// # HELP dragonfork_webrtc_active_streams ...
// # TYPE dragonfork_webrtc_active_streams gauge
// dragonfork_webrtc_active_streams{core="test"} 2
// # HELP dragonfork_webrtc_active_peers ...
// # TYPE dragonfork_webrtc_active_peers gauge
// dragonfork_webrtc_active_peers{core="test",stream_id="live"} 3
// dragonfork_webrtc_active_peers{core="test",stream_id="cam"} 1
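The mock source for these tests can be very small. The sketch below repeats the WebRTCStats and WebRTCStatsSource shapes from this spec so that it compiles on its own, and renders the active_peers samples as plain text. fakeSource and renderPeers are hypothetical names; the real test would hand the fake to NewWebRTCCollector and compare through testutil.CollectAndCompare rather than formatting lines by hand.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// WebRTCStats and WebRTCStatsSource repeat the shapes sketched in this spec.
type WebRTCStats struct {
	StreamCount       int
	PeersByStream     map[string]int
	UDPPortsInUse     int
	UDPPortsAvailable int
}

type WebRTCStatsSource interface {
	Stats() WebRTCStats
}

// fakeSource is a hypothetical test double returning a canned snapshot.
type fakeSource struct{ stats WebRTCStats }

func (f fakeSource) Stats() WebRTCStats { return f.stats }

// renderPeers formats the active_peers samples, sorted by stream_id so the
// output is deterministic despite Go's randomized map iteration order.
func renderPeers(core string, src WebRTCStatsSource) string {
	stats := src.Stats()
	ids := make([]string, 0, len(stats.PeersByStream))
	for id := range stats.PeersByStream {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	var b strings.Builder
	for _, id := range ids {
		fmt.Fprintf(&b, "dragonfork_webrtc_active_peers{core=%q,stream_id=%q} %d\n",
			core, id, stats.PeersByStream[id])
	}
	return b.String()
}

func main() {
	// State 1: no streams, so no active_peers samples at all.
	fmt.Printf("%q\n", renderPeers("test", fakeSource{}))

	// State 3: multiple streams.
	multi := fakeSource{stats: WebRTCStats{
		StreamCount:   2,
		PeersByStream: map[string]int{"live": 3, "cam": 1},
	}}
	fmt.Print(renderPeers("test", multi))
}
```

Whatever the comparison mechanism, iterating PeersByStream directly would produce a different sample order from run to run, so any hand-rolled assertion needs the sort.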
Unit tests — app/webrtc/metrics_test.go
Reuse handler_test.go setup (fake registry, in-process Echo router).
Hit each WHEP route, assert the corresponding counter and histogram have
the expected increment via testutil.ToFloat64. Drive forced error paths
(unknown stream → 404, codec-less SDP → 406, cap exceeded → 503, ICE
timeout → 504) and assert the right error-bucket counters bumped.
Integration verification — test/TESTING.md
New section "Verifying Prometheus metrics":
1. docker compose up -d
2. curl -s http://<host>:8080/metrics | grep dragonfork_webrtc_
- expect: 11 metric families present, all with `core="dragonfork-truenas"`
3. Open http://<host>:3000 (Grafana), log in with GRAFANA_ADMIN_PASSWORD
4. Navigate to Dashboards → Dragon Fork → WebRTC Health
- expect: all 5 rows render, no "no data" panels except where stream traffic is absent
5. Trigger one of each error in test/whep-player.html (intentional codec
mismatch via SDP edit, kill the publisher mid-stream, etc.)
6. Watch the Grafana panels and verify counters tick within 15s.
CI
Existing test runner picks up the new _test.go files. No new CI gates
beyond standard build+test — observability isn't a contract; the unit
tests verify shape only. Grafana dashboard JSON is not validated in CI
(no good lightweight validator); manual verification only.
Load test alignment
The deferred 5-peer × 10-min load test (separate spec) will use this dashboard as its primary observation surface. Recording rules for the load test's specific aggregations can be added in that spec without touching this one.
Rollout
The TrueNAS v0.1.0-dragonfork deploy upgrades via:
cd deploy/truenas/core
git pull # latest main with this change
# Add new lines to .env (see template above)
docker compose pull # grabs prom + grafana images
docker compose up -d # core unchanged, prom + grafana new
Core continues on host networking. The new containers connect via
host.docker.internal:host-gateway, no firewall changes required for
intra-host traffic. External Grafana access is on ${GRAFANA_PORT}.
Backwards compatibility
- No upstream metric names or labels modified. New metrics are purely additive in the dragonfork_webrtc_* namespace.
- No API changes. /metrics payload grows but stays well-formed Prometheus exposition.
- Existing config, env vars, and process JSON formats unchanged.
Forward compatibility
- The core label being a ConstLabels value (not a per-event dimension) means future federated multi-Core scrapes will distinguish series cleanly by setting core="dragonfork-truenas-east" etc. in each deploy's config loader. Spec'd here, implemented when needed.
- New metrics in this spec follow the dragonfork_<subsystem>_<noun> naming pattern. Future Dragon-Fork-specific metrics (WHIP, keyframe cache, bandwidth) should adopt the same convention.
Known gaps post-rollout
- No paging. Alerts evaluate, no Alertmanager. If WebRTCFFmpegLegFailure fires at 3am, no notification is sent; the operator notices at the next dashboard check. Acceptable for v0.2 single-operator deploy. Track as a v0.3 spec.
- Grafana dashboard JSON is hand-edited via Grafana UI then re-exported. No JSON-as-code library used. If dashboard maintenance gets painful, Grafonnet/Grafana-as-code is a v0.3+ refactor.
- /metrics itself is unauthenticated by default in v0.1 (matches upstream). If Core's deploy bundle is exposed to untrusted networks, the operator should already be using auth on Core's HTTP listener. Not this spec's problem to solve, but worth a one-line note in deploy/truenas/core/README.md.
Open Decisions
- Should the Stats() method live on Subsystem or on Handler? The peer count is in Handler's per-stream peer index; stream count is in Subsystem's registry; UDP port pool is in portalloc. Easiest shape: Subsystem.Stats() is the public surface and internally gathers from Handler (via the existing teardown-hook plumbing) and portalloc. Decide at implementation time based on which surface exposes the cleanest seams.
- Should histograms also include a core label, given it's already a ConstLabels? Yes. ConstLabels is automatically present on every sample, has no per-call overhead, and federations need it.
- Should Prometheus retention be configurable via .env? Defaulting to 15d covers the realistic window for "what happened last week?" queries. Adding PROM_RETENTION_DAYS=15d to .env is a one-line change. Including it as optional, defaulting to 15d.
- Import-alias collision. The local package is package prometheus (at github.com/datarhei/core/v16/prometheus) and client_golang is also package prometheus. Files in app/webrtc/ that need both must alias one; the convention is coreprom "github.com/datarhei/core/v16/prometheus". Implementation note only; doesn't change the design.
References
- Prometheus client_golang
- Prometheus instrumentation best practices
- Histogram bucket design
- Grafana provisioning docs
- v0.1 design: docs/design/2026-04-16-datarhei-dragon-fork-webrtc-design.md
- M2 integration: docs/design/2026-04-17-datarhei-dragon-fork-m2-webrtc-core-integration.md