Feature: live NVENC/GPU encode telemetry on the Cluster screen #166

Open
opened 2026-05-29 14:11:33 -04:00 by zgaetano · 0 comments
Context

The Cluster screen shows static GPU capability (name/VRAM/bound) but not live encode load. With NVENC ingest, the operationally critical numbers during a show are: active NVENC sessions per GPU, encoder utilization %, and whether capture is competing with worker proxy/conform jobs on the same card.

Proposal

Surface per-node, per-GPU live telemetry (poll nvidia-smi via node-agent heartbeat or a dedicated endpoint):

  • NVENC encoder utilization %, NVDEC decoder utilization %, GPU memory used
  • Active NVENC session count vs. card limit (L4 = unlimited; consumer cards = 3–5, a cap enforced by the driver, so it can vary with driver version)
  • Which containers hold the GPU (capture sidecars vs worker pool)
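A minimal sketch of the polling side, assuming the node-agent shells out to `nvidia-smi` in CSV query mode. The query field names (`utilization.encoder`, `utilization.decoder`, `encoder.stats.sessionCount`) are assumptions that should be confirmed with `nvidia-smi --help-query-gpu` on the target driver; `GpuSample`, `parse_gpu_csv`, and `poll_gpus` are hypothetical names, not existing node-agent code. Parsing is kept separate from the subprocess call so it can be exercised without a GPU.

```python
import csv
import io
import subprocess
from dataclasses import dataclass
from typing import List, Optional

# Assumed field names; verify with `nvidia-smi --help-query-gpu` before relying on them.
QUERY_FIELDS = [
    "index",
    "name",
    "utilization.encoder",
    "utilization.decoder",
    "memory.used",
    "memory.total",
    "encoder.stats.sessionCount",
]


@dataclass
class GpuSample:
    index: int
    name: str
    enc_util_pct: Optional[int]
    dec_util_pct: Optional[int]
    mem_used_mib: Optional[int]
    mem_total_mib: Optional[int]
    nvenc_sessions: Optional[int]


def _to_int(value: str) -> Optional[int]:
    """Convert a CSV cell to int; unsupported cells such as "[N/A]" become None."""
    try:
        return int(value.strip())
    except ValueError:
        return None


def parse_gpu_csv(text: str) -> List[GpuSample]:
    """Parse `--format=csv,noheader,nounits` output, one GpuSample per GPU row."""
    samples = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != len(QUERY_FIELDS):
            continue  # skip blank or malformed lines
        idx, name, enc, dec, used, total, sessions = row
        samples.append(
            GpuSample(
                index=int(idx),
                name=name.strip(),
                enc_util_pct=_to_int(enc),
                dec_util_pct=_to_int(dec),
                mem_used_mib=_to_int(used),
                mem_total_mib=_to_int(total),
                nvenc_sessions=_to_int(sessions),
            )
        )
    return samples


def poll_gpus(timeout_s: float = 5.0) -> List[GpuSample]:
    """Run nvidia-smi once and return the parsed samples (requires the NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(
        cmd, capture_output=True, text=True, check=True, timeout=timeout_s
    )
    return parse_gpu_csv(result.stdout)
```

Whether the samples ride on the existing heartbeat or a dedicated endpoint is left open here; either way the payload could be a list of these records per node.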

Render in the existing cluster hardware panel, glance-readable per PRODUCT.md.

Why

The 8-signals-per-node plan (design §5) explicitly flags GPU contention between live capture and post-record proxies as the scaling risk. Operators need to see that headroom in real time, not discover it when a 9th encode session fails.
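To make the headroom idea concrete, here is a rough sketch of how a per-GPU status could be derived from the session count, the card limit, and the container split. The status labels, the `GpuHolders` type, and the function names are hypothetical, and the limit of 8 in the usage note is only inferred from the "9th encode session" wording above. How container ownership of a GPU is detected is out of scope and treated as an input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GpuHolders:
    capture: int  # containers identified as capture sidecars on this GPU
    worker: int   # containers identified as worker-pool jobs on this GPU


def session_headroom(active: int, limit: Optional[int]) -> Optional[int]:
    """Remaining NVENC sessions; None means the card has no session cap."""
    if limit is None:
        return None
    return max(limit - active, 0)


def gpu_status(active: int, limit: Optional[int], holders: GpuHolders) -> str:
    """Collapse the signals into one glance-readable label (labels are illustrative)."""
    headroom = session_headroom(active, limit)
    if headroom == 0:
        return "full"       # the next encode session would be refused
    if holders.capture > 0 and holders.worker > 0:
        return "contended"  # live capture shares the card with proxy/conform work
    if headroom is not None and headroom <= 1:
        return "tight"
    return "ok"
```

For example, `gpu_status(8, 8, GpuHolders(capture=8, worker=0))` yields `"full"`, while an uncapped card shared by one capture sidecar and one worker yields `"contended"`.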

Reference: WildDragonLLC/dragonflight#166