Commit graph

8 commits

Author SHA1 Message Date
726343db96 fix(node-agent): bind nvidia-smi for full GPU info (name, VRAM, driver)
index.js:
- detectGpusViaSmi(): runs nvidia-smi --query-gpu=index,name,memory.total,
  driver_version and parses the output into structured GPU objects with
  name, memory_mb, driver, device — the same fields the cluster UI uses
  to determine BOUND status
- Falls back to /dev/nvidia* file scan if nvidia-smi isn't available

docker-compose.worker.yml:
- Bind-mount /usr/bin/nvidia-smi and libnvidia-ml.so.1 from host into
  node-agent container (read-only). These are the minimum binaries needed
  for nvidia-smi to execute inside the container.
- Mounts are optional — Docker ignores them silently if paths don't exist
  (e.g. on nodes without NVIDIA hardware)
2026-05-26 18:19:23 +00:00
8186b181cc fix(decklink): mount /dev/blackmagic in sidecar + remote node routing via node-agent
Two bugs fixed:
1. SDI capture sidecar never had /dev/blackmagic bound — ffmpeg opened the
   decklink input inside a container with no device nodes, so frame=0.
   Fix: local spawns now push '/dev/blackmagic:/dev/blackmagic' onto Binds
   when source_type='sdi'.

2. recorders.js always spawned sidecars against the local Docker socket
   (zampp1), even when a recorder's node_id pointed at zampp2 (where the
   card is). Fix: resolveNodeTarget() looks up the recorder's cluster node;
   if it's a different hostname the sidecar is spawned via a new
   POST /sidecar/start endpoint on the remote node-agent.

node-agent gains three new routes (all talk to the local Docker socket):
  POST   /sidecar/start         — create + start container (host network,
                                   privileged, /dev/blackmagic bind for sdi)
  DELETE /sidecar/:id           — stop + remove
  GET    /sidecar/:id/status    — inspect + poll capture service

docker-compose.worker.yml: add /var/run/docker.sock and LIVE_DIR to
node-agent so it can spawn sidecars, and document build-capture prerequisite.: index.js
2026-05-21 18:51:09 -04:00
3b4af6ef11 node-agent: prefer NODE_IP and skip docker bridge interfaces
In bridge mode the agent was reporting the container's 172.x address
because the first non-internal interface in os.networkInterfaces() was
docker0. Now honours NODE_IP, skips lo/docker*/br-*/veth*/etc, and
down-ranks the 172.16-31 range so real LAN IPs win. Also exposes the
detected IP on /health for the onboarding script to print.
2026-05-21 00:15:03 -04:00
cc8ee63639 fix(node-agent): replace express with built-in http — no external deps needed 2026-05-20 22:59:03 -04:00
a941f609f0 feat: node-agent detects NVIDIA GPUs and Blackmagic DeckLink cards, reports in heartbeat 2026-05-20 14:18:07 -04:00
c5a358888b feat(node-agent): heartbeat agent — CPU/mem stats, health endpoint, bearer token auth 2026-05-20 13:48:18 -04:00
0bc1ac9161 feat(node-agent): add Dockerfile 2026-05-20 13:47:57 -04:00
feb78b8bcb feat(node-agent): add package.json for cluster heartbeat agent 2026-05-20 13:47:53 -04:00