Commit graph

931 commits

Author SHA1 Message Date
869ae1aa83 fix(playout): use static ffmpeg, not apt, to avoid CasparCG SIGABRT
apt-get install ffmpeg pulls in ~80 transitive shared libs (libav*,
libx264, libdrm, libva...) that perturb CasparCG 2.4.0's headless
runtime linking and make it abort with SIGABRT (exit 134) on almost
every launch. Replace it with John Van Sickle's self-contained static
ffmpeg/ffprobe binaries in /usr/local/bin — the standalone CLI the HLS
re-muxer needs, with zero new shared libraries, keeping CasparCG's
environment identical to the known-good image.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 14:18:56 -04:00
abfbd034ab fix(playout): point CasparCG log/data paths at writable /media volume
The 2.4.x server aborts at startup if its configured log-path isn't
writable/creatable. /opt/casparcg is a read-only-ish symlinked install
dir; the entrypoint already mkdirs /media/casparcg/{log,data}. Point the
config there to match (the working image used these paths).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 14:11:00 -04:00
739d08d4b5 fix(web-ui): restore home screen icon fixes on feat/playout-mcr
playout tile: monitor -> signal, dashboard tile: home -> layout.
All feature-branch content (Dragon-ISO, tagline, etc.) preserved.
2026-05-31 13:56:19 -04:00
27a868aa5c fix(playout): clean video-only HLS preview via standalone ffmpeg re-mux
CasparCG's bundled FFMPEG/HLS consumer muxes a broken audio track
(aac sample_rate=0, time_base 1/0) into the preview, and silently drops
every arg that would remove it (-an, -codec:a, -g, -r all "Unused
option"). That corrupt audio black-screens the browser preview because
neither ffmpeg nor hls.js can decode the playlist.

Re-architect the preview path: CasparCG now STREAMs plain mpegts to a
UDP loopback port, and a Node-spawned STANDALONE ffmpeg (where -an
actually works) re-muxes it to clean, video-only HLS with -c:v copy.
The child process is tracked, auto-respawned while running, and killed
in stopChannel(). The PRIMARY SRT/RTMP/SDI/NDI output (with program
audio) is untouched.

Also fix the Dockerfile to match the working image: ubuntu:22.04 base +
CasparCG 2.4.0 ubuntu22 zip + NodeSource Node 20, and add a standalone
ffmpeg CLI. The old 2.3.3 tarball URL 404s. entrypoint.sh updated for
the 2.4.x bin/casparcg layout + bundled lib/ LD_LIBRARY_PATH.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:55:18 -04:00
1db0e81efb fix(web-ui): restore nav icon fixes on feat/playout-mcr
youtube: import, schedule: clock, playout: signal.
Keeps getting wiped by other agents writing full file replacements.
2026-05-31 13:54:16 -04:00
40e987b4a2 fix(web-ui): restore icon audit fixes on feat/playout-mcr
jobs: bulleted list (not hamburger), import: added, grid: rx=1,
hdd: cylinder, proxy: sliders. Keeps getting wiped by other agents.
2026-05-31 13:52:37 -04:00
426273129d fix(playout): video-only HLS preview (broken audio time_base was the black-screen cause)
Definitive root cause of the black preview, found via server-side ffmpeg
decode of the live playlist:

  Error while decoding stream #0:1: Invalid data found (x57)
  [abuffer] Value inf for parameter 'time_base' ... time_base to value 1/0

Stream #0:1 is the AAC audio. CasparCG's real-time channel feeds the HLS
consumer an audio stream whose muxed time_base is 1/0 (infinity). ffmpeg
itself cannot decode the playlist, and hls.js silently fails to append the
fragment after demux, so the <video> stays at readyState 0 (black) even
though the video PTS is perfectly continuous and segments serve 200.

Fix: drop audio from the HLS confidence monitor (-an). The video track is
clean h264 and plays in hls.js. Program audio still rides the primary
SRT/RTMP/SDI/NDI output, which is unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:44:45 -04:00
87d988810f fix(playout): use CFR rate + frame GOP for uniform HLS segments
CasparCG's FFMPEG consumer ignores -force_key_frames ("Unused option")
because it routes args to the muxer, not the encoder. Revert to the
frame-based GOP (-g 60 -keyint_min 60) but keep the forced CFR rate
(-r 30000/1001): at 29.97fps a 60-frame GOP is 2.002s, so keyframes
and HLS splits land on uniform ~2s boundaries. CFR is what was missing
originally — with the channel's irregular feed rate, "60 frames" drifted.
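
As a quick check of the arithmetic (helper name is illustrative): a 60-frame GOP at 30000/1001 fps lasts 60 * 1001 / 30000 seconds, which is within 2 ms of the 2s segment target.

```javascript
// Duration in seconds of a GOP of `frames` frames at the rational
// frame rate fpsNum/fpsDen.
function gopSeconds(frames, fpsNum, fpsDen) {
  return (frames * fpsDen) / fpsNum;
}

console.log(gopSeconds(60, 30000, 1001)); // 2.002
```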

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:38:21 -04:00
f28799317d fix(playout): clean CFR HLS preview so hls.js can sync
Root cause of the black preview: CasparCG's real-time channel feeds the
HLS consumer frames with irregular timestamps (the "packet with pts X has
duration 0" warnings). With frame-count GOPs (-g 60) the muxer split
points drift, producing erratic segment durations (0.4s-4.2s) that exceed
the declared TARGETDURATION. hls.js parses the resulting live playlist but
can never establish a fragment timeline — it reloads forever
("sliding 0.00 / prev-sn na / MISSED") and never appends a fragment, so
the video element stays at readyState 0 (black). Verified live via the
browser: manifest + segments serve 200, segment is valid h264/aac with a
keyframe start, yet hls.js logs zero FRAG_LOADED.

Fix: force a constant output frame rate (-r 30000/1001, regenerates
uniform PTS) and time-based keyframes every 2s (-force_key_frames
expr:gte(t,n_forced*2)), so every segment is a clean keyframe-aligned 2.0s
chunk. Yields a spec-compliant playlist (TARGETDURATION 2, stable
8-segment/16s window) identical in shape to the capture/VOD HLS the rest
of the app already plays successfully through the same hls.js.
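
One way to spot the erratic-segment symptom from a playlist alone, sketched under the assumption that the manifest text is already in hand (the function name is hypothetical). It flags every EXTINF whose rounded duration exceeds the declared EXT-X-TARGETDURATION, which is the comparison the HLS spec prescribes.

```javascript
// Return the segment durations that exceed the playlist's declared
// target duration. Expects EXT-X-TARGETDURATION to precede the segments.
function oversizedSegments(m3u8) {
  let target = null;
  const bad = [];
  for (const line of m3u8.split(/\r?\n/)) {
    if (line.startsWith('#EXT-X-TARGETDURATION:')) {
      target = Number(line.slice('#EXT-X-TARGETDURATION:'.length));
    } else if (line.startsWith('#EXTINF:')) {
      // parseFloat stops at the comma that follows the duration
      const dur = parseFloat(line.slice('#EXTINF:'.length));
      if (target !== null && Math.round(dur) > target) bad.push(dur);
    }
  }
  return bad;
}
```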

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 13:35:45 -04:00
d778aa4cdb fix(playout): HLS preview path + live elapsed counter
- nginx.conf: add /media/live/ location serving from the media volume
  mount. CasparCG sidecar writes HLS preview to /media/live/<id>/ but
  nginx only had /live/ (capture volume). Without this, preview
  requests returned the SPA shell instead of the .m3u8 playlist.
- ProgramMonitor: add live elapsed counter (MM:SS, ticks every 500ms)
  driven by engine.currentItemStartedAt. Shows alongside clip index.
  Adds a ⚠ pip when lastError is set (e.g. NDI SDK missing) without
  blocking operation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 13:16:57 -04:00
00b04aa4a8 fix(playout): non-fatal consumer + loadPlaylist guard
- startChannel: make primary consumer ADD non-fatal. CasparCG decodes
  and routes media without an output consumer, so NDI channels (no SDK)
  and misconfigured SRT/RTMP channels still load/play clips and expose
  the HLS preview. state.lastError carries the consumer error for UI
  visibility without blocking operation.
- loadPlaylist: throw early if state.running=false (channel/start was
  never called or failed hard) with a clear error instead of a cryptic
  CasparCG AMCP error propagating to the operator.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 13:01:21 -04:00
e8f91cf4b4 fix(playout): immediate failover on new channels + play 502 vs 409
- spawnChannelSidecar: set last_heartbeat_at = NOW() when flipping
  channel to 'running'. Without this, last_heartbeat_at is NULL so
  the first scheduler tick sees ageMs = (now - epoch) >> TIMEOUT_MS
  and triggers failover before the sidecar has had a single chance
  to respond.
- scheduler playoutHealthTick: when last_heartbeat_at is NULL fall
  back to updated_at as the baseline (belt-and-suspenders with the
  spawnChannelSidecar fix). Also include updated_at in the query.
- POST /channels/:id/play: catch callSidecar errors explicitly and
  return 502 Bad Gateway instead of delegating to next(err) which
  the error middleware maps to 409 Conflict.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 12:34:41 -04:00
e51cf1aa9c feat(jobs): surface playout-stage queue in Jobs screen
- jobs.js: add playout-stage BullMQ queue to QUEUES; asset_id from
  job data is already resolved to a name by attachAssetNames
- screens-jobs.jsx: map type 'playout-stage' -> kind 'Stage' with
  monitor icon

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 12:06:08 -04:00
f7cf56ae0d fix(playout): silent-audio staging crash, home tiles, channel delete
- playout-stage: skip loudnorm pass 2 when measured_I=-inf (silent or
  no-audio clip); fall back to plain AAC transcode so staging completes
  instead of erroring out
- screens-home: add Playout tile; replace Premiere panel tile with
  Downloads tile opening a combined modal (Premiere panel releases +
  Dragon-ISO link to forge.wilddragon.net/WildDragonLLC/dragon-iso)
- screens-playout: add Delete channel button (visible only when stopped);
  removes channel from list and selects next on confirm
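
The silent-clip guard in the first item can be sketched roughly as below. It assumes pass 1 ran loudnorm with print_format=json and that the parsed object exposes the measurement as the string field `input_i` (for example "-23.5" or "-inf"); the function name is hypothetical and this is not the worker's actual code.

```javascript
// Decide whether loudnorm pass 2 can run on the pass-1 measurements.
// A non-finite integrated loudness (silent or no-audio clip) means
// pass 2 would fail, so the caller should fall back to a plain AAC transcode.
function canRunLoudnormPass2(measured) {
  if (!measured) return false;
  // Number('-inf') is NaN, which Number.isFinite rejects.
  return Number.isFinite(Number(measured.input_i));
}
```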

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 12:03:20 -04:00
12115a053a feat(playout): fix 409 drag bug, add HLS preview, advanced playlist
- Fix event bubbling: e.stopPropagation() in onItemDrop prevents
  duplicate POST when dropping on an existing playlist item
- Wrap all drop handlers in try/catch with inline error display
- ProgramMonitor: replace text placeholder with hls.js video player
  loading /media/live/<channel_id>/index.m3u8; falls back to native
  HLS on Safari; destroys Hls instance on channel stop/unmount
- Playlist: per-item duration (MM:SS), staging progress bar with
  animated stripe while staging, now-playing highlight + ▶ indicator
  driven by engine.currentIndex from 4s status poll
- Playlist footer: clip count + total duration sum
- Transport: Play button disabled + shows ' N staging' until all
  items are media_status=ready, eliminating the staging-not-ready 409

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 11:45:25 -04:00
43656a5e88 shell: youtube nav icon: download → import
download is used by the Downloads section tile on the home screen.
YouTube ingest gets the new import icon (arrow entering box) instead.
2026-05-31 10:52:10 -04:00
68461af990 icons: add import icon (arrow entering box)
Distinct from download (vertical arrow+line). Used for YouTube ingest
to avoid sharing a glyph with the Downloads section tile.
2026-05-31 10:50:43 -04:00
8bc460025d screens-home: fix launcher tile icons
- Dashboard tile: home → layout (matches sidebar nav icon)
- Playout tile: monitor → signal (matches sidebar nav fix)
2026-05-31 00:19:34 -04:00
3578c7b4e9 fix(playout): Privileged only for decklink (SRT/NDI/RTMP/HLS crashed when GPU exposed without driver) 2026-05-30 18:59:27 -04:00
cddcc9a29e fix(mam-api): selfHeartbeat writes last_seen_at so primary node isn't stale-failover-killed 2026-05-30 18:57:20 -04:00
0e844c0fc3 fix(scheduler): use updated_at as grace anchor when last_heartbeat_at NULL
Without this, a freshly-spawned channel with NULL last_heartbeat_at was
instantly failover-killed by the playoutHealthTick because `0` was used as
the lastSeen timestamp, making ageMs huge on the very first tick.
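
A minimal sketch of the grace-anchor logic, assuming a channel row shaped like `{ last_heartbeat_at, updated_at }`; the function name is illustrative and the no-baseline branch is an assumption of the sketch.

```javascript
// Age in ms of a channel's last sign of life. Falls back to updated_at
// when last_heartbeat_at is NULL, so a freshly spawned channel is not
// measured against the epoch.
function heartbeatAgeMs(channel, now = Date.now()) {
  const anchor = channel.last_heartbeat_at ?? channel.updated_at;
  if (anchor == null) return 0; // no baseline at all: treat as fresh (assumption)
  return now - new Date(anchor).getTime();
}
```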
2026-05-30 17:32:15 -04:00
551af09dc7 fix(playout): install libnss3 so CEF can init (NSS -8023 was killing the channel ~30s in) 2026-05-30 17:16:54 -04:00
4d6a999665 fix(playout): pre-create NSS dir + CEF cache so CEF/HTML producer doesn't SIGABRT 2026-05-30 17:14:07 -04:00
f971d57bb9 fix(playout): use unzip not python zipfile (preserves exec bits) 2026-05-30 17:00:25 -04:00
7ab70948a0 fix(playout): entrypoint handles 2.4.x bin/casparcg layout + LD_LIBRARY_PATH for bundled libs 2026-05-30 16:50:04 -04:00
13bbd4216e fix(playout): correct 2.4.0 zip layout — binary is at casparcg_server/bin/casparcg 2026-05-30 16:49:48 -04:00
fcd8e8dd2e fix(playout): entrypoint finds binary in /opt/casparcg for 2.4.x tarball layout 2026-05-30 16:44:23 -04:00
67ac007706 fix(playout): downgrade CasparCG to 2.4.0 ubuntu22 zip (2.5 requires AVX2, ZAMPP has AVX only) 2026-05-30 16:44:07 -04:00
b4f2fb12ff fix(mam-api): heartbeat writes last_seen_at so playout failover sees healthy nodes 2026-05-30 16:32:11 -04:00
aa7f836493 fix(playout): strip XML comments from casparcg.config (2.5 rejects them) 2026-05-30 16:30:54 -04:00
c2409bd037 fix(mam-api): add last_seen_at to cluster_nodes for playout failover
Playout failover queries cluster_nodes.last_seen_at to find healthy nodes
for channel re-placement. Column missing from original cluster schema.

Migration 031 adds column + backfills existing nodes to NOW().

Fixes scheduler error: column "last_seen_at" does not exist

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-05-30 13:39:06 -04:00
42064acefa shell: fix nav icon conflicts
- schedule: jobs → clock (was sharing hamburger icon with Jobs)
- playout: monitor → signal (was sharing TV icon with Monitors)
2026-05-30 13:30:42 -04:00
2e2b091653 icons: fix 4 icon issues found in audit
- jobs: replace hamburger (nav menu) with bulleted list (task queue)
- grid: add rx="1" to match library icon (consistency)
- hdd: replace circle+dot (vinyl) with cylinder (storage)
- proxy: replace upload-arrow with sliders (transcode/transform)
2026-05-30 13:29:18 -04:00
Zac
c502d4a16f feat(web-ui): update home tagline + add "Let's create" motto
Tagline "Self-hosted broadcast media-asset management" ->
"Media Asset Management & Production Platform"; add italic accent motto
"Let's create" below it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 16:04:28 +00:00
Zac
9d098e9778 feat(auth-ui): interactive permissions matrix, admin 2FA reset, Downloads button
Backend (routes/users.js):
- GET / now returns totp_enabled so the UI can show 2FA status
- GET /:id/access — admin-only effective per-project access (MAX over direct +
  group grants), labels via=direct|group:<name>; admins report all/edit
- POST /:id/totp/disable — admin clears a locked-out user's 2FA without their
  password (self-service disable still requires it); dev user blocked
- role validated against {admin,editor,viewer} on create + PATCH (was unchecked)

Frontend:
- Users>Policies tab: static prose replaced with interactive per-user matrix —
  inline role select, 2FA badge, Reset-2FA action, lazy per-user access expander
- Home "Premiere panel" tile -> "Downloads"; modal renamed, adds Teams ISO row
  (disabled "coming soon" until the .exe is supplied); UXP .ccx link unchanged
- data.jsx: window.TEAMS_ISO placeholder ({available:false})

Not runtime-tested in browser yet. Teams ISO .exe still pending from user.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 15:59:27 +00:00
Zac
02631f7b96 fix(playout): locate CasparCG 2.5 binary at /usr/bin/casparcg-server-2.5
First 2.5 build got past the deb install but the binary-discovery step
produced an empty $BIN (test -n failed): the 2.5 deb names its executable
casparcg-server-2.5, which the old case pattern (*/casparcg, */CasparCG
Server) didn't match. Broaden the match to /usr/bin/*casparcg*server*, fall
back to the known /usr/bin/casparcg-server-2.5, symlink it to
/usr/local/bin/casparcg, and make /opt/casparcg a real dir for our config
(no longer symlinked onto /usr/bin). Entrypoint launches `casparcg <config>`
from PATH instead of ./casparcg in a cwd.

Still NOT runtime-validated: 2.5 may reject the 2.3-era casparcg.config
schema (a bad config shows up as "Configuration file --version was not
found"); the deb ships a reference config at
/usr/share/casparcg-server-2.5/casparcg.config to diff against at smoke time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 15:34:02 +00:00
Zac
9436434599 fix(playout): build CasparCG 2.5.0 from .deb (2.3.3 tarball was a dead URL)
The image never built: CASPAR_URL pointed at a v2.3.3-stable Linux tarball
that CasparCG never published (2.3.x is Windows-only; Linux builds start at
2.4.0, and 2.4.1+ ship only as .deb). Rewrite to install the 2.5.0 noble
server + CEF debs on an ubuntu:24.04 base (Node 20 via nodesource), letting
apt resolve the GL/ffmpeg/openal runtime deps. Binary install dir is
discovered from the deb file list and symlinked to /opt/casparcg so the
entrypoint + config still run from there. Move CasparCG log/data dirs to
/media (writable mount) since the install dir may be read-only.

NOT runtime-validated: the 2.5 casparcg.config schema and the AMCP consumer
syntax (ADD <ch> STREAM/FILE) were authored against 2.3 and must be smoke-
tested against 2.5 before a channel start can be trusted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 15:25:31 +00:00
Zac
f837e57969 feat(web-ui): add Playout tile to home screen
Fetches /playout/channels separately and degrades silently when the
endpoint or schema is absent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 14:59:59 +00:00
Zac
ca71e47035 fix(playout): repair failover, authenticate scheduler self-calls, fix playlist walk + CasparCG consumer syntax
Post-review fixes for the 8-commit playout-mcr drop:

- Scheduler self-calls (callSelf -> /recorders, /playout) carried no auth, so
  under AUTH_ENABLED=true requireUiHeader 403'd every mutating POST. This broke
  playout failover AND scheduled recordings. Add a per-boot in-process service
  token (x-internal-token) the scheduler attaches; requireAuth/requireUiHeader
  treat it as the seeded admin. No env/compose config needed.

- Failover deadlocked: restartChannel set status='starting' then the scheduler
  called the guarded /start route, which 409s on 'starting'. Extract the spawn
  body into spawnChannelSidecar() shared by /start and restartChannel; failover
  now spawns directly with no self-call.

- Phase A playlist stalled after 2 clips: _scheduleAdvance cued the next clip
  via LOADBG AUTO but never advanced the pointer. Pass asset_duration_ms in the
  /play payload and arm a duration-based timer that advances currentIndex and
  cues subsequent clips, keeping as-run in sync for arbitrary-length playlists.

- CasparCG consumer syntax was invalid: "ADD <ch> FFMPEG" is the producer name,
  not a consumer keyword, and old -vcodec/-acodec short args are rejected. Use
  STREAM/FILE with -codec:v / -codec:a / -preset:v / -tune:v and a format=yuv420p
  filter ahead of libx264 (channel output is RGBA).
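
The per-boot service token from the first item could look roughly like this. Only the x-internal-token header name comes from the commit; the constant and function names, and the lower-cased header map, are assumptions of this sketch.

```javascript
const crypto = require('node:crypto');

// Generated once per process start, never persisted or configured.
const INTERNAL_TOKEN = crypto.randomBytes(32).toString('hex');

// True when the request carries this process's own service token.
// `headers` is assumed to be a lower-cased header map.
function isInternalCall(headers) {
  const presented = headers['x-internal-token'];
  if (typeof presented !== 'string') return false;
  const a = Buffer.from(presented);
  const b = Buffer.from(INTERNAL_TOKEN);
  // timingSafeEqual throws on unequal lengths, so check length first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```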

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 14:51:35 +00:00
Zac
34352e3299 docs(playout): work log — commit map, decisions, testing checklist
Replaces the earlier aspirational "complete" log with the actual commit
sequence on feat/playout-mcr, the §7 decisions as built, the media-flow
diagram, port-contention + failover scope, and a runtime testing checklist
(migration → image build → SRT smoke → failover kill test).
2026-05-30 14:05:57 +00:00
Zac
d505a488ac build(playout): compose wiring + .env knobs
- Add /mnt/NVME/MAM/wild-dragon-media:/media to mam-api (rw) and worker-p4
  (rw); web-ui (ro, for serving HLS preview segments).
- worker-p4 WORKER_QUEUES gains 'playout-stage' so master-tier nodes pick up
  the loudnorm stage jobs (they already have ffmpeg + the media mount).
- New build-only 'playout' service with profile ["build-only"] so
  `docker compose --profile build-only build playout` produces the
  wild-dragon-playout:latest image without compose trying to up it as a
  long-running service. mam-api spawns these on demand.
- mam-api env adds PLAYOUT_IMAGE + PLAYOUT_AMCP_BASE_PORT (5250 default).
- .env.example: PLAYOUT_IMAGE, PLAYOUT_AMCP_BASE_PORT.
2026-05-30 14:05:57 +00:00
Zac
793011b78b feat(web-ui): MCR page — channels, playlist, transport, preview
screens-playout.jsx + styles-playout.css: program monitor (HLS preview from
the sidecar), media bin, drag-drop playlist editor, transport controls. Plain
HTML5 drag-drop, no extra library. Talks to /api/v1/playout via
ZAMPP_API.fetch.

Wired into the shell: "Playout" under Operations, breadcrumb mapping, route
case in app.jsx, stylesheet + dist/screens-playout.js script in index.html.
Format dropdown defaults to 1080p5994 (matches the new channel default).
2026-05-30 14:02:25 +00:00
Zac
5538683d78 feat(mam-api): /playout control plane + auto-failover
Routes: channel + playlist CRUD, start/stop/play/pause/skip transport, as-run
log. RBAC via assertProjectAccess on channel.project_id; null project ⇒
admin-only (recorder convention).

Sidecar orchestration mirrors recorders.js: Docker socket for local node,
node-agent /sidecar/start for remote. Channel start passes CHANNEL_ID env so
the sidecar can write HLS preview to /media/live/<id>.

DeckLink port-contention guard: blocks starting a decklink channel when a
recorder or another channel on the same node+device_index is active.

restartChannel(id) helper picks another healthy cluster node and re-places
non-decklink channels; decklink is alert-only. Exposed for the scheduler.

Scheduler tick adds step 6: poll each running channel's sidecar /status,
update last_heartbeat_at, and after ~3 misses trigger restartChannel +
self-call /start. Reuses the existing PG advisory lock so multi-replica
deploys don't double-fire failovers.
2026-05-30 14:02:25 +00:00
Zac
d62af34e98 feat(playout): CasparCG sidecar image + Node AMCP shim
One container per channel. Built like capture/build-with-decklink: NDI +
DeckLink SDKs fetched at build, runs --privileged with Xvfb for the GL
context where no real display is present.

Components:
- entrypoint.sh: Xvfb + CasparCG launch, creates /media/live/<CHANNEL_ID>
- src/amcp.js: TCP AMCP client
- src/playout-manager.js: channel lifecycle, playlist walk via LOADBG AUTO
  for gapless transitions; primary consumer (decklink/ndi/srt/rtmp) plus a
  second FFMPEG HLS consumer (~600 kbps, 2s segments) for the UI preview
- src/index.js: HTTP shim — /channel/start, /playlist/load, transport
- frame-rate helper picks fps from video_format (59.94 → 60000/1001) so
  SEEK / LENGTH frame math is correct
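
A sketch of what such a frame-rate helper might look like. It assumes progressive format names whose last four digits are the rate times 100 (1080p5994, 1080p5000, ...); interlaced and named formats are deliberately not handled, and both function names are hypothetical.

```javascript
// Map a progressive video_format string to a rational frame rate.
function fpsFromVideoFormat(videoFormat) {
  const m = /p(\d{4})$/.exec(videoFormat);
  if (!m) throw new Error(`unsupported video_format: ${videoFormat}`);
  // NTSC-family rates are exact rationals over 1001.
  const ntsc = { '2398': 24000, '2997': 30000, '5994': 60000 };
  const num = ntsc[m[1]];
  return num ? { num, den: 1001 } : { num: Number(m[1]) / 100, den: 1 };
}

// Convert a duration in milliseconds to a whole number of frames.
function msToFrames(ms, fps) {
  return Math.round((ms * fps.num) / (1000 * fps.den));
}
```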
2026-05-30 14:02:25 +00:00
Zac
209f9fda52 feat(worker): playout-stage job — S3 → /media + EBU R128 loudnorm
Stages playlist items from S3 to the shared CasparCG media volume. Pass 1
measures, pass 2 applies linear loudnorm (I=-23 LUFS, TP=-1 dBTP, LRA=11);
output is AAC 192k @ 48 kHz, video stream copied. Atomic rename on success
so CasparCG never sees a partial file. Per-item audio_normalized flag means
re-stages of the same asset skip the loudnorm pass.
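
For illustration, the pass-2 filter string could be assembled from the pass-1 measurements as below. The targets (I=-23, TP=-1, LRA=11, linear) come from this commit; the input field names assume loudnorm's print_format=json output, and the function name is hypothetical.

```javascript
// Build the ffmpeg loudnorm filter for pass 2 from pass-1 measurements.
// `m` is assumed to be the parsed JSON that pass 1 printed.
function loudnormPass2Filter(m) {
  return [
    'loudnorm=I=-23:TP=-1:LRA=11',
    `measured_I=${m.input_i}`,
    `measured_TP=${m.input_tp}`,
    `measured_LRA=${m.input_lra}`,
    `measured_thresh=${m.input_thresh}`,
    `offset=${m.target_offset}`,
    'linear=true',
  ].join(':');
}
```

The result would be passed as the `-af` value alongside the AAC 192k / 48 kHz output settings and `-c:v copy`.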

Wired into worker/src/index.js behind WORKER_QUEUES=playout-stage so
capability-routed deploys can pin it to nodes that already have ffmpeg +
the media mount.
2026-05-30 14:02:25 +00:00
Zac
29187a90df feat(mam-api): migration 029 — playout schema
Six tables: channels, playlists, items, sidecars (sidecar registry for
health-check), schedule (Phase B), as-run log.

- video_format default 1080p5994 (house standard, capture cadence)
- restart_count / last_restart_at / last_heartbeat_at on channels for
  auto-failover bookkeeping
- audio_normalized flag on items so re-stages skip the loudnorm pass
- unique partial index on (channel_id) for running sidecars
2026-05-30 14:02:25 +00:00
Zac
512267159a docs(playout): MCR design spec — Phase A playlist + Phase B 24/7
Single-doc design covering the playout subsystem: CasparCG-backed sidecars,
multi-channel placement, S3→/media staging, scheduling phases, the data
model, channel placement vs port contention.

§7 questions are answered inline (2026-05-30): −23 LUFS at stage time,
1080p5994 default, HLS preview v1, auto-restart-on-healthy-node failover
(DeckLink alert-only).
2026-05-30 14:02:25 +00:00
Zac
72fc608d8a fix(mam-api): harden TOTP login flow + tighten Google domain check
Review of the v2 auth landing turned up four weak spots in the MFA path.
All four are now fixed; behaviour is unchanged for the password-correct
+ correct-TOTP happy path.

1. TOTP brute-force gate (the big one). /login was calling
   ipBackoff.recordSuccess(ip) the instant the password hashed correctly,
   *before* the second factor was proven. That cleared the per-IP failure
   counter, so each /login retry let an attacker with a known password
   hammer the 6-digit /login/totp space (10^6) at full speed.
   Now recordSuccess fires only inside establishSession() — i.e. after
   every required factor has actually passed (password [+TOTP] or
   OAuth [+TOTP]).

2. MFA ticket binding. Tickets issued by /login (and the Google callback)
   were unbound — a stolen ticket replayed from a different origin still
   worked. Tickets now carry SHA-256 hashes of the issuing request's IP
   and User-Agent; redeemTicket rejects on mismatch. The ticket is burned
   even on mismatch so a wrong-binding probe can't be retried.

3. TOTP replay within the same 30s step (RFC 6238 §5.2). The verifier
   accepted the same code as many times as you submitted it. Now
   verifyToken returns the matched counter, and /login/totp does a CAS
   UPDATE on users.totp_last_counter — codes at counters <= the last
   accepted value are rejected. New migration 030 adds totp_last_counter,
   seeded on /totp/enable so the enrollment code itself can't be reused
   at first login, and zeroed on /totp/disable.

4. Google OAuth domain check no longer falls back to the email suffix
   when the hd (hosted-domain) claim is missing. Email-suffix matching
   let consumer (non-Workspace) Google accounts whose email happens to
   end in the allowed domain through; if GOOGLE_ALLOWED_DOMAIN is set,
   the operator means "only this Workspace", so accounts without a
   verified hd must be rejected.
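
The replay guard in item 3 boils down to a compare-and-set on the last accepted counter. An in-memory sketch follows; the Map stands in for users.totp_last_counter, and the function name is hypothetical.

```javascript
// Stand-in for the users.totp_last_counter column (illustrative only).
const lastCounter = new Map();

// Accept a verified TOTP counter only if it is strictly greater than the
// last accepted one, then record it. In SQL this would be a single
// conditional UPDATE so two concurrent logins cannot both succeed.
function acceptCounter(userId, matchedCounter) {
  const prev = lastCounter.get(userId) ?? 0;
  if (matchedCounter <= prev) return false; // replay or stale code
  lastCounter.set(userId, matchedCounter);
  return true;
}
```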

Tests: new mfa-tickets.test.js covers ip/UA binding, single-use on
mismatch, and bindings-absent back-compat. totp.test.js updated for the
new verifyToken return shape (counter on success, null on failure;
truthiness still works at call sites) and adds an explicit
matched-counter check.
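
The binding behaviour these tests exercise can be sketched as follows. The in-memory store, `issueTicket` and the ticket shape are assumptions; only the redeemTicket name, the SHA-256 hashing and the burn-on-mismatch rule come from the commit, and the bindings-absent back-compat path is omitted.

```javascript
const crypto = require('node:crypto');

const sha256 = (s) => crypto.createHash('sha256').update(String(s)).digest('hex');
const tickets = new Map(); // ticket id -> { userId, ipHash, uaHash }

// Issue a ticket bound to the requesting IP and User-Agent.
function issueTicket(userId, ip, userAgent) {
  const id = crypto.randomBytes(16).toString('hex');
  tickets.set(id, { userId, ipHash: sha256(ip), uaHash: sha256(userAgent) });
  return id;
}

// Redeem a ticket; returns the user id, or null on any failure.
function redeemTicket(id, ip, userAgent) {
  const t = tickets.get(id);
  if (!t) return null;
  tickets.delete(id); // burn first, so a wrong-binding probe cannot retry
  const ok = t.ipHash === sha256(ip) && t.uaHash === sha256(userAgent);
  return ok ? t.userId : null;
}
```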

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 12:52:53 +00:00
Zac
3fe7d6bba2 fix(mam-api): close cross-project authz gaps in assets/bins/jobs/upload
Review of the v2 auth landing found four routers where the per-project RBAC
helpers weren't applied to destination/source projects, letting a scoped
editor write into projects they don't have access to:

- assets PATCH /:id: bin_id moved with no check, so an editor in project A
  could stuff their asset into a bin in project B. Now validates the bin's
  project_id matches the asset's own project (assets don't change project).
- assets POST /:id/copy: body's projectId/binId never checked, so any
  reachable asset could be cloned into an arbitrary project. Now asserts
  edit on the destination project and validates binId belongs there.
- bins POST /:id/assets: requireBinEdit checks edit on the bin's project but
  not on the source asset's project, so an asset from project B could be
  pulled into A's bin tree (and surfaced in A's views). Now the asset must
  belong to the bin's own project.
- jobs POST /conform: project_id from body never gated, so any logged-in
  user could enqueue conform jobs against any project. Now asserts edit.
- upload POST /init, POST /simple: projectId/binId from body never gated,
  same class of bug. Now asserts edit on projectId and validates binId.
- upload GET /: returned every in-progress upload globally, leaking
  filenames across projects. Now scoped via accessibleProjectIds.

These are the same pattern as the holes 2615143 closed on recorders/
sequences/imports/comments — these routes existed before the RBAC commit
landed and were never marked TODO(authz), so the broad sweep missed them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 12:52:29 +00:00
Zac
2615143c6d feat(mam-api): finish per-project authz on the deferred routers
Phase 1 scoped only projects/assets/bins and left recorders, sequences,
imports, comments carrying TODO(authz) markers. A scoped editor/viewer could
still read and mutate those across every project. This closes the gap using
the existing authz.js helpers — no open TODO(authz) markers remain.

- recorders: param('id') resolves project + view baseline, requireRecorderEdit
  on PATCH/DELETE/start/stop, GET / filtered by accessibleProjectIds, POST /
  asserts edit on the target project (null project = admin-only)
- sequences: same param pattern + requireSequenceEdit on PUT /:id, /clips, conform
  and DELETE; GET / and POST / assert on the query/body project
- imports: POST /youtube asserts edit on the body projectId
- comments: router.use guard resolves project via the asset (view to read, edit
  to write); also fixes the author bug (req.session.userId -> req.user.id, which
  was always NULL so comments had no recorded author)
- capture: intentionally any-logged-in (shared hardware, asset scoped on
  registration) — TODO replaced with a rationale note

Security fixes from review of this change:
- recorders POST /:id/start: a per-take projectId override could route a live
  asset into a project the caller lacks edit on — now asserts edit on the
  override target
- sequences PUT /:id/clips: spliced asset_ids weren't checked, so an editor
  could pull in (and via GET /:id leak signed proxy URLs for) assets from a
  project they can't access — now every clip asset must belong to the
  sequence's project; pre-transaction queries moved inside try/catch so a DB
  error returns 500 instead of hanging the request

- tests: recorders-access, sequences-access (incl. cross-project clip guard),
  comments-access (incl. author-id regression)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 03:48:02 +00:00