Feature: per-recorder GPU affinity (replace NVIDIA_VISIBLE_DEVICES=all) #167

Open
opened 2026-05-29 14:11:43 -04:00 by zgaetano · 0 comments
Owner

Context

The NVENC sidecar (#161) currently attaches the GPU with NVIDIA_VISIBLE_DEVICES=all. That is fine for the single-GPU L4 node, but on multi-GPU hosts it exposes every card to the sidecar, pins the recorder to none of them, and gives no way to balance capture load across cards.

Proposal

  • Store an optional gpu_uuid (or device index) per recorder (or per node-pool policy).
  • node-agent handleSidecarStart passes the specific UUID in NVIDIA_VISIBLE_DEVICES instead of all (the code already has a TODO comment noting this).
  • When gpu_uuid is unset, behavior is unchanged: the sidecar still receives NVIDIA_VISIBLE_DEVICES=all.
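
The selection logic proposed above could look roughly like the sketch below. It is illustrative only: the issue does not show the node-agent code or say what language it is written in, so this uses Python, and the function name `visible_devices_env` and the example UUID are hypothetical. The only thing taken from the proposal is the rule itself: pass the recorder's `gpu_uuid` when one is set, otherwise fall back to `all`.

```python
from typing import Dict, Optional


def visible_devices_env(gpu_uuid: Optional[str]) -> Dict[str, str]:
    """Build the NVIDIA_VISIBLE_DEVICES setting for one recorder's sidecar.

    Hypothetical helper: the real handleSidecarStart may be structured
    differently. `gpu_uuid` is the optional per-recorder field proposed
    in this issue.
    """
    # Unset or blank: keep today's behavior and expose every GPU.
    if gpu_uuid is None or not gpu_uuid.strip():
        return {"NVIDIA_VISIBLE_DEVICES": "all"}
    # Set: expose only the requested card to the sidecar container.
    return {"NVIDIA_VISIBLE_DEVICES": gpu_uuid.strip()}


# Made-up UUID in the usual "GPU-<uuid>" form, for illustration only.
pinned = visible_devices_env("GPU-00000000-0000-0000-0000-000000000000")
default = visible_devices_env(None)
```

NVIDIA_VISIBLE_DEVICES also accepts device indices, so the same shape would work if the stored field ends up being an index. A UUID is probably the safer key, since index order is not guaranteed to stay stable on a host.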

Why

Prerequisite for capacity planning and for dedicating a card to capture vs. the worker proxy/conform pool on multi-GPU nodes (e.g. zampp1's Tesla P4 + 2× Quadro P400). Pairs with the capacity telemetry in the sibling issue.

Reference: WildDragonLLC/dragonflight#167