Code-review feedback:
- Dummy hash for user-enumeration-defense timing was 63 chars (bcrypt strings
are 60 chars). Worked by accident because bcrypt 5.x is lenient about
trailing chars; a future tightening would silently regress the timing
defense. Replaced with a real pre-computed bcrypt hash.
- last_login_at UPDATE now logs errors instead of silently swallowing them,
matching the pattern in requireAuth for api_tokens.last_used_at.
- Removed dead import of comparePassword from auth.test.js.
Code-review feedback: startsWith('/cluster') was a prefix match that exposed
destructive operator endpoints (POST /containers/:id/restart, DELETE /:id,
GET /devices/blackmagic/*) unauthenticated. Only POST /heartbeat is genuine
node-agent traffic; everything else in cluster.js is operator/UI surface
that should go through requireAuth. Long-term: issue node-agent a bound
api_token and drop the carve-out entirely.
Code-review feedback:
- Hard-fail boot when AUTH_ENABLED=true and SESSION_SECRET is unset, so
express-session can't silently use an in-memory random secret that
invalidates sessions on restart and breaks multi-node clusters.
- CORS rejection now returns cb(null, false) instead of cb(new Error)
so misconfigured origins surface as clean CORS errors in the browser
instead of HTTP 500s. Log a warn line for operator visibility.
- pruneSessionInterval units comment.
Code-review feedback: writing last_seen_at = now before loadUser() lets
the stamp persist if the lookup throws (resave:false still writes when
modified), extending the idle window without confirming the user exists.
Also clarify DEV_USER_ID is a specific placeholder, not a generic sentinel.
Code-review feedback: ON CONFLICT (id) only catches id collisions; a pre-existing
'dev' username would trigger a unique_violation on the username index and roll
back the migration, hard-failing the mam-api boot. Switch to bare ON CONFLICT
DO NOTHING so any unique conflict is no-op-safe.
CEP's embedded Chromium (used by Premiere Pro panels) does not support
oklch() color syntax. All color tokens were rendering as invalid/transparent,
causing the panel to appear unstyled. Converted all oklch() values to their
precise hex/rgba equivalents via OKLab→sRGB math. No design changes.
- Capture screen now polls /cluster/devices/blackmagic/signal every 3s
- Per-port chips show signal state (RECEIVING/CONNECTING/LOST/ERROR/IDLE) with pulsing dot
- BMD SVG card diagram rendered per node card
- Sidebar nav badge on Capture item shows live/total port count (pulsing green dot)
The Dashboard page was rendering as plaintext because all .dash-* CSS
classes (dash-section, dash-onair-*, dash-jobs-*, dash-cluster-*,
dash-statusbar, etc.) were missing. Added them with the full dark-theme
design-system styling matching the rest of the app.
The Cluster page's .stat-row and .stat-card classes were also missing,
causing node statistics (counts, CPU, GPUs, memory) to render unstyled.
Added grid-based stat row and card styles.
- remove requireAuth from all route files
- delete auth.js, tokens.js, users.js routes
- delete auth middleware
- remove session middleware and all auth deps from index.js
- delete login.html and auth-guard.js from web-ui
Scope (locked in via planning Q&A):
- Identity: local accounts only (PG users table) + existing bearer
tokens for headless callers.
- Transport: httpOnly cookie session for browser, Bearer for API.
- RBAC: admin / editor / viewer roles, plus an orthogonal
is_client flag for external (agency, talent, customer) accounts.
- Bootstrap: ADMIN_BOOTSTRAP_USER + ADMIN_BOOTSTRAP_PASSWORD env
seed the first admin on a clean install. Set ADMIN_BOOTSTRAP_RESET
to force-reset the named user (break-glass).
- Rate limit: in-memory, 10 fails per 15min per (IP, username).
- Password policy: \u22658 chars, mixed case, digit, symbol; small
blocklist of common passwords; cannot equal username.
- Self-service: change own display name + password. Everything
else (role, is_client, other-user mgmt) is admin only.
- Audit log: append-only table, indexed by actor + event_type +
created_at, populated by every auth/admin event.
Files added:
- services/mam-api/src/db/migrations/022-auth-rework.sql
users.is_client + last_login_at + failed_attempts; audit_log
table with FK to users (ON DELETE SET NULL).
- services/mam-api/src/middleware/audit.js
Fire-and-forget audit() helper. Caller never awaits, failure
logs but never throws — auditing cannot break the request
that triggered it.
- services/mam-api/src/middleware/passwordPolicy.js
Shared checkPassword(pw, { username }) used by setup, user
create/update, and self-service password change.
- services/mam-api/src/tasks/bootstrapAdmin.js
Runs after migrations. No-ops unless ADMIN_BOOTSTRAP_USER +
ADMIN_BOOTSTRAP_PASSWORD are set AND (users table empty OR
ADMIN_BOOTSTRAP_RESET=true).
- services/mam-api/src/routes/audit.js
Admin-only GET /audit (paginated, filter by event_type /
actor / target / date) and GET /audit/event-types.
- services/web-ui/public/modal-account-settings.jsx
Profile + Password tabs. Triggered by sidebar user button.
Files rewritten:
- services/mam-api/src/routes/auth.js
- POST /login: regenerate(), no manual save(); audit success/
fail/lockout; updates last_login_at + failed_attempts.
- POST /logout: destroys session, audits logout.
- GET /me: returns is_client + last_login_at. Synthetic admin
when AUTH_ENABLED=false.
- GET /setup-status: drives login.html UI state.
- POST /setup: blocked once any user exists; password policy.
- POST /password: self-service. Requires current pw, runs
policy, audits, invalidates other sessions implicitly via
users.js if changed by admin.
- PATCH /me: self-service display_name update.
- services/mam-api/src/routes/users.js
- is_client field in create/update/list/get.
- Guardrails: cannot delete or demote last admin, cannot
delete self, admins cannot be flagged is_client.
- Password change invalidates all sessions for that user
(DELETE FROM sessions WHERE sess->>'userId' = id).
- Audit on every mutation.
- Password policy enforced.
- services/mam-api/src/middleware/auth.js
- requireAuth now exposes req.user.is_client.
- New requireRole(["admin","editor"], { rejectClients: true })
helper. Applied to cluster, sdk, capture routes (infra).
- Synthetic user when AUTH_ENABLED=false has is_client=false.
- services/mam-api/src/index.js
- Loads bootstrap admin after migrations.
- Wires /api/v1/audit.
- Cleans up an earlier comment block.
- services/web-ui/public/login.html
- Password hint added next to setup-mode password field.
- services/web-ui/public/shell.jsx
- Sidebar user footer is a button that opens AccountSettings.
- CLIENT badge next to role when is_client=true.
- Nav filters: clients lose ingest tree + jobs + editor;
viewers lose ingest + editor; only admins see the Admin
section. Power button hidden when synthetic user.
- services/web-ui/public/screens-admin.jsx
- Users table: new Client column with inline toggle.
- InviteUserModal: Client checkbox + password hint, gated
off when role=admin.
- Last login column replaces Created in primary view.
- CSV export includes client + last_login.
- services/web-ui/public/data.jsx
- ZAMPP_DATA.ME carries is_client + display_name.
- services/web-ui/public/index.html
- Loads dist/modal-account-settings.js.
- services/web-ui/public/styles-rest.css
- .user-row grid widened to 6 columns.
- docker-compose.yml
- Plumbs SESSION_COOKIE_SECURE + ADMIN_BOOTSTRAP_* env vars.
Deploy:
cd /opt/wild-dragon
git pull origin main
# In .env:
# AUTH_ENABLED=true
# SESSION_SECRET=<openssl rand -hex 48>
# ADMIN_BOOTSTRAP_USER=admin
# ADMIN_BOOTSTRAP_PASSWORD=<strong>
docker compose build mam-api web-ui
docker compose up -d --force-recreate --no-deps mam-api web-ui
Auth work is parked until after ship. While AUTH_ENABLED=false:
- login.html now auto-redirects to / on load (no one should ever see
the login screen while auth is off; it was confusing).
- sidebar power button is hidden entirely when /auth/me returns a
synthetic user, so there's no broken-feeling no-op control.
- Removed connect-pg-simple createTableIfMissing flag in case
v9.0.1's handling of that option was responsible for the recent
boot 502 (the schema is created by migration 021 anyway).
The /auth/login + session.regenerate() + cookie fix from c34a721
stays in place — when we re-enable auth it'll work end-to-end. The
sessions table from migration 021 stays. Operator action to restore
auth later: set AUTH_ENABLED=true + SESSION_SECRET=<random> in the
mam-api environment and restart.
Login was returning 200 + correct user JSON + writing a row to the
sessions table, but emitting zero Set-Cookie headers. Root cause:
session.regenerate() → set fields → session.save() → res.json()
Calling session.save() manually writes the store but bypasses
express-session's res.end() hook, which is the only path that adds
the Set-Cookie header to the response. The cookie was never sent to
the browser even though the session existed server-side — hence the
redirect loop.
Fix: remove the manual save(). Set the session fields and call
res.json() directly inside regenerate()'s callback; express-session
handles store write + Set-Cookie automatically on res.end().
The redirect loop after successful login was almost certainly the
`sessions` table never being created. `schema.sql` defines it but
only runs on first-init via the postgres entrypoint; instances
bootstrapped via mam-api's own migration loop never got the table.
express-session's `req.session.save()` then failed silently and the
cookie pointed at a sid that wasn't in the store — every subsequent
request looked like a brand-new visitor.
- New migration 021-ensure-sessions-table.sql (idempotent).
- connect-pg-simple now configured with `createTableIfMissing: true`
as belt-and-braces.
- `POST /auth/login` now explicitly waits for session.save() and
surfaces both regenerate() and save() errors instead of treating
them as 'success'. Logs sid + req.secure + req.protocol so we can
confirm trust-proxy is doing the right thing behind NPM.
Three concrete issues kept the login flow broken on dragonflight.live:
1. mam-api trusted no proxy headers, so behind nginx/Cloudflare the
session cookie's `secure` flag and the rate-limiter's IP keying
both saw the wrong values. Now sets `app.set('trust proxy', 1)`.
2. Session config was tied to NODE_ENV and lacked sameSite/name. Now:
- SESSION_COOKIE_SECURE env (default: true when AUTH_ENABLED) so a
site behind HTTPS gets Secure cookies regardless of NODE_ENV.
- `sameSite: 'lax'` for predictable post-login redirects.
- Renamed to `df.sid` so it's obvious in DevTools.
- `rolling: true` extends the 7-day TTL on active use.
- SESSION_SECRET is now required when AUTH_ENABLED=true; the
server refuses to start with a dev default in prod.
3. login.html silently showed the sign-in panel even when no users
exist or auth is off:
- New GET /auth/setup-status reports {needs_setup, user_count,
auth_enabled}.
- login.html calls it on load and auto-flips into setup mode when
needs_setup is true, or shows an explicit "auth is off" flash
when auth_enabled is false (the previous symptom: logout button
did nothing because /auth/me returned a synthetic admin no matter
what).
- Added a `.flash.info` style for the new neutral notice.
4. Sidebar logout used to call /auth/logout then `window.location
.reload()`. With auth off that reload landed back on the synthetic-
admin app and looked like nothing happened. It now redirects to
/login.html in all states so the operator sees feedback (and the
server-side messaging about auth being off) instead of a no-op.
Deploy notes for zampp1:
- Set AUTH_ENABLED=true and a random SESSION_SECRET in the
mam-api environment (e.g. /opt/wild-dragon/.env).
- Restart mam-api.
- First load of /login.html will auto-route to the setup form so
you can create the first admin.
RustFS returns empty bodies for ranged GETs whose start offset is past
~5.9 MB on single-file proxy MP4s. HEAD reports correct size, full GET
(`bytes=0-`) works, but `bytes=8179166-` comes back 206 + correct
Content-Range header with zero bytes. Confirmed via direct S3 probe
against broadcastmgmt.cloud/dragonmam (see scratch tests).
Workaround in mam-api `GET /api/v1/assets/:id/video` until the proxy
worker emits HLS (planned v1.2.1):
- HEAD the object first to learn total size (also gives ETag /
Last-Modified for conditional requests).
- No-Range / unparseable-Range / pre-EOF requests \u2192 plain pipe.
- Parsed `bytes=N-M` requests below RUSTFS_RANGE_SAFE_START
(default 5_500_000) \u2192 direct ranged GET, RustFS handles fine.
- Anything reaching into the broken zone \u2192 stream from offset 0,
drop bytes below start, stop at end. Memory stays flat; extra
bandwidth = (end+1 - requested-size) per seek.
- Genuinely out-of-range \u2192 416 with Cache-Control: no-store so the
browser doesn't poison its cache.
Also stashes (not yet wired up) the HLS pieces we'll need for the
follow-up: `segmentToHls` ffmpeg helper + `uploadDirectoryToS3`
worker s3 helper. Harmless additions; not referenced by any code path
yet.
Confirmed against the affected asset (a72aaa03-...): bytes=0-100k +
50% +100k native pass-through; 70% +100k and near-EOF previously hung
the browser, now stream correctly via the stitched path.
Refs #143.
Frontend / UX / a11y
- Sidebar collapse/expand toggle with localStorage persistence (#142)
- Settings sections wrap inputs in <form> with Enter-to-submit + native
validation; password autocomplete=new-password (#141, #138)
- Asset thumbnails get descriptive alt text (#140)
- Production deploy now precompiles JSX via esbuild and loads the
production React UMD instead of dev builds + in-browser Babel (#139,
#122)
- Search wrapper gets role=search; global search input gets aria-label,
role=combobox, aria-controls/aria-expanded/aria-activedescendant
wiring (#137, #135)
- Dashboard and Library no longer share the same nav icon (#136)
- Sidebar collapses off-canvas with a topbar menu button below 768 px;
mobile default is collapsed (#134)
- --text-3 bumped to #8B92A0 for WCAG AA contrast on --bg-0 (#133)
- Schedule and Library routes were rendering empty inside the .main
flex container — switched to flex:1 + min-height:0 (#131, #132,
editor + asset detail get the same fix)
- Jobs nav badge now polls /jobs?status=active every 10 s and reflects
the live count (#130, #113)
- aria-label sweep on every icon-only button (#126)
- Premiere panel release list moved to window.PREMIERE_RELEASES in
data.jsx; Editor + Settings read from the same source (#125)
- Typo setPgMclips → setPgmClips (#124)
- Stray console.error / console.warn calls gated behind
window.DF_LOG.{warn,error} (#123)
- Hardcoded /api/v1 paths route through window.ZAMPP_API_PREFIX (#115)
- Schedule rows no longer crash on null recorder_id (#117)
- EditorKeyboard guards against document.activeElement === null (#116)
- Unmount-safe timers for PasswordResetModal, Containers, Editor (#111)
- Player seek clamps below totalMs, server-side range clamping +
uncached 416 on EOF, client-side EOF-stall watchdog (#143)
- Duration badge overlap fix on narrow asset cards (#52)
Backend / security / reliability
- GET /recorders fixed N+1: single LATERAL JOIN for live_asset_id;
Docker inspects bounded to actually-recording rows (#121)
- Upload disk-storage (multer.diskStorage) streams parts to S3 instead
of buffering 500 MB in RAM (#120)
- /assets list clamps limit to MAX_LIMIT=500 to prevent OOM (#119)
- SDK upload archive listing + post-extract sanitize block zip-slip /
tar-slip and symlink escapes (#118)
- Migrations track applied state in schema_migrations, run in a
transaction, and exit non-zero on failure (#107)
- node-agent BMD_COUNT override uses BMD_DEVICE_PREFIX; filesystem
detection wins (#109, #127)
- GPU_COUNT override now merges with nvidia-smi enrichment (#108)
- /cluster/heartbeat requires a node-bound token or admin user;
tokens carry bound_hostname (#106)
- /recorders/:id/start error responses no longer echo the Docker
create payload — env vars stay out of client responses (#105)
- /recorders/probe restricts schemes (srt/rtmp/rtsp/udp/rtp), blocks
private + loopback hosts for non-admins, denies common service
ports (#104)
- Scheduler tick guarded by a Postgres advisory lock; pending/running
rows claimed via UPDATE...RETURNING + FOR UPDATE SKIP LOCKED to
survive multi-node deploys (#103)
- UUID validateUuid('id') param middleware on every /:id route (#102)
- Error handler scrubs Postgres error messages and 5xx detail (#101)
- Graceful SIGTERM/SIGINT shutdown — stops scheduler, drains the HTTP
server, ends the pool, 25 s force-exit watchdog (#100)
- AMPP sync moved from fire-and-forget to a persisted retry queue
(ampp_sync_status / attempts / next_attempt_at + scheduler retry
loop with exponential backoff) (#77)
Migrations
- 019: api_tokens.bound_hostname (#106)
- 020: assets.ampp_sync_status + retry bookkeeping (#77)
Other
- Defer #92 Growing-files per-upload toggle, #80 Audio tab, #57
Dashboard redesign, #56 Editor SPA polish phase 3, #114 S3
migration tool to v1.3
The BMD card SVG renderer (window.BMDCards) was created in an earlier
session but never wired into index.html, so the new video-presence
indicator from a44d8bd was silently bailing at the !window.BMDCards
guard. Loading it alongside the other helpers in /js/.
Adds per-port video signal state to the admin Cluster panel:
- New GET /cluster/devices/blackmagic/signal endpoint joins recorders by
node_id+device_index and queries each active capture container's
/capture/status (local: http://recorder-<id>:3001, remote: api_url/
sidecar/<container_id>/status). Returns receiving/connecting/lost/
error/idle/no-recorder per port plus framesReceived and currentFps.
- bmd-card.js render() now accepts portSignals (Map or object) and
overlays a colored dot on each BNC connector with pulse animation
for receiving/connecting states.
- screens-admin.jsx Cluster panel polls the new endpoint every 5s,
feeds the signal map into both the port chips (now show
RECEIVING/CONNECTING/LOST + fps) and the BMD SVG card diagram
rendered below them via a new BmdCardPanel component.
- styles-fixes.css adds bmd-card-* styles for the SVG diagram and
bmd-port-signal --pulse animation.
mam-api /video endpoint:
- S3 InvalidRange (httpStatusCode 416) was being caught and returned as 500
via next(err), which the video element treats as a fatal load error and
freezes the player. Now we catch the specific 416 case, do a no-range
HEAD-equivalent to learn the real file size, and return proper 416 with
Content-Range: bytes */<total> so the browser can recover.
screens-asset.jsx — player health + buffer visualization:
- New states: playerState ('idle'|'loading'|'playing'|'paused'|'seeking'|
'waiting'|'stalled'|'error'), playerError, buffered (array of {start,end}
ms from HTMLMediaElement.buffered), stallStart, stallElapsedMs
- Wired video element events: onProgress, onWaiting, onStalled, onPlaying,
onCanPlay, onCanPlayThrough, onSeeking, onSeeked, onError
- onError captures MediaError code+message into a console.error and the
on-screen badge so freeze causes are now visible
- Status badge overlay (top-right of player): shows SEEKING / BUFFERING /
STALLED / ERROR + elapsed seconds since the stall began
- PlaybackBar renders buffered ranges as translucent grey segments so you
can see what the browser has loaded vs. what's still pending — makes
seek-related freezes immediately obvious
Replaced sync execFileSync('docker') approach (no docker CLI in container)
with async Docker socket HTTP API calls:
- POST /containers/create with nvidia runtime + DeviceRequests
- POST /containers/:id/start
- Poll inspect until not running
- GET /containers/:id/logs, strip 8-byte frame headers, parse csv
probeGpusViaSmi() runs once at startup before the first heartbeat.
Result cached in _gpuCache; detectHardware() reads cache on every heartbeat.
Falls back to /dev/nvidia* scan if probe fails or runtime unavailable.
nsenter approach failed (requires SYS_ADMIN in container).
nvidia-smi bind-mount failed (Alpine vs Ubuntu glibc incompatibility).
Working solution: spawn 'docker run --rm --gpus all ubuntu:22.04 nvidia-smi'
via the Docker socket. The NVIDIA Container Runtime injects nvidia-smi and
driver libs into any container with --gpus all, regardless of the base image.
ubuntu:22.04 is already cached on GPU nodes.
Result: GPU reported with name, memory_mb, driver_version — shows as BOUND
in the cluster UI.