Cluster: AddNodeModal on Admin->Cluster mints a node token via /auth/tokens
and emits a ready-to-paste curl|bash onboarding command. New admin-only
GET /cluster/onboard-info returns apiUrl/scriptUrl/branch. Role->PROFILES
mapping (worker/capture/gpu); gate worker-l4 behind compose profile [gpu].
Home: restore "Let's Create" kicker + one-line "Media Asset Management &
Production Platform" tagline; animated accent pulse behind the dragon logo
(reduced-motion safe); move Settings tile to a centered bottom row.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Root cause of the persistent black preview, fully isolated: ZAMPP1's nginx
serves the live .m3u8 fresh on every request (no-store works there), but
the PUBLIC reverse proxy (159.112.211.103 -> ZAMPP1) caches the static
.m3u8 by path with a multi-second TTL, ignoring both the origin's no-store
and query params. hls.js reloads the playlist ~every second, always landing
inside that TTL, so it sees the live playlist as never advancing
("live playlist MISSED" forever), never establishes the timeline, and never
loads a fragment -> readyState 0 (black). Proven: rapid reads via ZAMPP1
localhost advance (404->405); the same rapid reads via the public URL are
stuck; query-busting doesn't help (proxy caches by path).
Fix: serve the playlist through GET /api/v1/playout/channels/:id/hls/index.m3u8
instead of the static /media/live path. /api/ is not proxy-cached (the live
status poll already updates fine through it), so hls.js always gets the fresh
live edge. Segment (.ts) lines are rewritten to absolute /media/live/<id>/
URLs so they still load from the static path (immutable; caching them is
correct). ProgramMonitor points hls.js at the /api playlist and sends the
session cookie (xhrSetup withCredentials) since /api is auth-gated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Growing-files mode became per-recorder (recorders.growing_enabled); the
global growing_enabled setting was removed. GET /assets/:id/live-path —
which the Premiere plugin calls to mount a still-growing master — still
required growing_enabled==='true', so Mount Live would 409 "Growing-files
mode is disabled" on any deploy where that stale key isn't set. Drop the
global gate: a status='live' asset already proves a growing recorder is
producing the file; only the editor-facing growing_smb_url is required.
Response contract is unchanged, so the plugin needs no update.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implements docs/superpowers/specs/2026-05-31-storage-settings-growing-smb-design.md.
1. Storage warning banner at the top of Settings → Storage (set-once /
path-change-corrupts-data warning).
2. Growing-files SMB credentials + system CIFS mount (Approach A):
- settings.js: new global keys growing_smb_mount / growing_smb_username /
growing_smb_vers; growing_smb_password is write-only (GET returns only
growing_smb_password_exists; growing_smb_password_clear:true removes it).
- GrowingSettingsCard: SMB mount/username/password (masked, "saved" state) +
CIFS version fields.
- capture Dockerfile: add cifs-utils + util-linux.
- capture-manager: on growing start, mount //host/share at /growing using a
root-only credentials file (creds never on the command line); unmount on
stop; mount failure falls back to S3 streaming so a recording is never lost.
- recorders.js: pass GROWING_SMB_* env; don't host-bind /growing when a CIFS
mount is configured (an empty mountpoint is required).
3. Per-recorder growing mode (global toggle removed):
- Removed the global "capture writes to local SMB share first" checkbox; the
growing card is now SMB-infrastructure-only.
- recorders.js reads the per-recorder recorders.growing_enabled column
(already present from migration 014) instead of the global setting;
RECORDER_FIELDS += growing_enabled.
- New-recorder modal: "Growing-files mode" toggle.
- storage.js overview: "enabled" now means the SMB landing zone is configured
(mount source set), surfaced as smb_mount; health strip labels updated.
No DB migration required (recorders.growing_enabled exists; new settings are
key/value rows).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- spawnChannelSidecar: set last_heartbeat_at = NOW() when flipping
channel to 'running'. Without this, last_heartbeat_at is NULL so
the first scheduler tick sees ageMs = (now - epoch) >> TIMEOUT_MS
and triggers failover before the sidecar has had a single chance
to respond.
- scheduler playoutHealthTick: when last_heartbeat_at is NULL fall
back to updated_at as the baseline (belt-and-suspenders with the
spawnChannelSidecar fix). Also include updated_at in the query.
- POST /channels/:id/play: catch callSidecar errors explicitly and
return 502 Bad Gateway instead of delegating to next(err) which
the error middleware maps to 409 Conflict.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- jobs.js: add playout-stage BullMQ queue to QUEUES; asset_id from
job data is already resolved to a name by attachAssetNames
- screens-jobs.jsx: map type 'playout-stage' -> kind 'Stage' with
monitor icon
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backend (routes/users.js):
- GET / now returns totp_enabled so the UI can show 2FA status
- GET /:id/access — admin-only effective per-project access (MAX over direct +
group grants), labels via=direct|group:<name>; admins report all/edit
- POST /:id/totp/disable — admin clears a locked-out user's 2FA without their
password (self-service disable still requires it); dev user blocked
- role validated against {admin,editor,viewer} on create + PATCH (was unchecked)
Frontend:
- Users>Policies tab: static prose replaced with interactive per-user matrix —
inline role select, 2FA badge, Reset-2FA action, lazy per-user access expander
- Home "Premiere panel" tile -> "Downloads"; modal renamed, adds Teams ISO row
(disabled "coming soon" until the .exe is supplied); UXP .ccx link unchanged
- data.jsx: window.TEAMS_ISO placeholder ({available:false})
Not runtime-tested in browser yet. Teams ISO .exe still pending from user.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Post-review fixes for the 8-commit playout-mcr drop:
- Scheduler self-calls (callSelf -> /recorders, /playout) carried no auth, so
under AUTH_ENABLED=true requireUiHeader 403'd every mutating POST. This broke
playout failover AND scheduled recordings. Add a per-boot in-process service
token (x-internal-token) the scheduler attaches; requireAuth/requireUiHeader
treat it as the seeded admin. No env/compose config needed.
- Failover deadlocked: restartChannel set status='starting' then the scheduler
called the guarded /start route, which 409s on 'starting'. Extract the spawn
body into spawnChannelSidecar() shared by /start and restartChannel; failover
now spawns directly with no self-call.
- Phase A playlist stalled after 2 clips: _scheduleAdvance cued the next clip
via LOADBG AUTO but never advanced the pointer. Pass asset_duration_ms in the
/play payload and arm a duration-based timer that advances currentIndex and
cues subsequent clips, keeping as-run in sync for arbitrary-length playlists.
- CasparCG consumer syntax was invalid: "ADD <ch> FFMPEG" is the producer name,
not a consumer keyword, and old -vcodec/-acodec short args are rejected. Use
STREAM/FILE with -codec:v / -codec:a / -preset:v / -tune:v and a format=yuv420p
filter ahead of libx264 (channel output is RGBA).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Routes: channel + playlist CRUD, start/stop/play/pause/skip transport, as-run
log. RBAC via assertProjectAccess on channel.project_id; null project ⇒
admin-only (recorder convention).
Sidecar orchestration mirrors recorders.js: Docker socket for local node,
node-agent /sidecar/start for remote. Channel start passes CHANNEL_ID env so
the sidecar can write HLS preview to /media/live/<id>.
DeckLink port-contention guard: blocks starting a decklink channel when a
recorder or another channel on the same node+device_index is active.
restartChannel(id) helper picks another healthy cluster node and re-places
non-decklink channels; decklink is alert-only. Exposed for the scheduler.
Scheduler tick adds step 6: poll each running channel's sidecar /status,
update last_heartbeat_at, and after ~3 misses trigger restartChannel +
self-call /start. Reuses the existing PG advisory lock so multi-replica
deploys don't double-fire failovers.
Review of the v2 auth landing turned up four weak spots in the MFA path.
All four are now fixed; behaviour is unchanged for the password-correct
+ correct-TOTP happy path.
1. TOTP brute-force gate (the big one). /login was calling
ipBackoff.recordSuccess(ip) the instant the password hashed correctly,
*before* the second factor was proven. That cleared the per-IP failure
counter, so each /login retry let an attacker with a known password
hammer the 6-digit /login/totp space (10^6) at full speed.
Now recordSuccess fires only inside establishSession() — i.e. after
every required factor has actually passed (password [+TOTP] or
OAuth [+TOTP]).
2. MFA ticket binding. Tickets issued by /login (and the Google callback)
were unbound — a stolen ticket replayed from a different origin still
worked. Tickets now carry SHA-256 hashes of the issuing request's IP
and User-Agent; redeemTicket rejects on mismatch. The ticket is burned
even on mismatch so a wrong-binding probe can't be retried.
3. TOTP replay within the same 30s step (RFC 6238 §5.2). The verifier
accepted the same code as many times as you submitted it. Now
verifyToken returns the matched counter, and /login/totp does a CAS
UPDATE on users.totp_last_counter — codes at counters <= the last
accepted value are rejected. New migration 030 adds totp_last_counter,
seeded on /totp/enable so the enrollment code itself can't be reused
at first login, and zeroed on /totp/disable.
4. Google OAuth domain check no longer falls back to the email suffix
when the hd (hosted-domain) claim is missing. Email-suffix matching
let consumer (non-Workspace) Google accounts whose email happens to
end in the allowed domain through; if GOOGLE_ALLOWED_DOMAIN is set,
the operator means "only this Workspace", so accounts without a
verified hd must be rejected.
Tests: new mfa-tickets.test.js covers ip/UA binding, single-use on
mismatch, and bindings-absent back-compat. totp.test.js updated for the
new verifyToken return shape (counter on success, null on failure;
truthiness still works at call sites) and adds an explicit
matched-counter check.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Review of the v2 auth landing found four places where the per-project RBAC
helpers weren't applied to destination/source projects, letting a scoped
editor write into projects they don't have access to:
- assets PATCH /🆔 bin_id moved with no check, so an editor in project A
could stuff their asset into a bin in project B. Now validates the bin's
project_id matches the asset's own project (assets don't change project).
- assets POST /:id/copy: body's projectId/binId never checked, so any
reachable asset could be cloned into an arbitrary project. Now asserts
edit on the destination project and validates binId belongs there.
- bins POST /:id/assets: requireBinEdit checks edit on the bin's project but
not on the source asset's project, so an asset from project B could be
pulled into A's bin tree (and surfaced in A's views). Now the asset must
belong to the bin's own project.
- jobs POST /conform: project_id from body never gated, so any logged-in
user could enqueue conform jobs against any project. Now asserts edit.
- upload POST /init, POST /simple: projectId/binId from body never gated,
same class of bug. Now asserts edit on projectId and validates binId.
- upload GET /: returned every in-progress upload globally, leaking
filenames across projects. Now scoped via accessibleProjectIds.
These are the same pattern as the holes 2615143 closed on recorders/
sequences/imports/comments — these routes existed before the RBAC commit
landed and were never marked TODO(authz), so the broad sweep missed them.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 1 scoped only projects/assets/bins and left recorders, sequences,
imports, comments carrying TODO(authz) markers. A scoped editor/viewer could
still read and mutate those across every project. This closes the gap using
the existing authz.js helpers — no open TODO(authz) markers remain.
- recorders: param('id') resolves project + view baseline, requireRecorderEdit
on PATCH/DELETE/start/stop, GET / filtered by accessibleProjectIds, POST /
asserts edit on the target project (null project = admin-only)
- sequences: same param pattern + requireSequenceEdit on PUT/:id,/clips,conform
and DELETE; GET//POST/ assert on the query/body project
- imports: POST /youtube asserts edit on the body projectId
- comments: router.use guard resolves project via the asset (view to read, edit
to write); also fixes the author bug (req.session.userId -> req.user.id, which
was always NULL so comments had no recorded author)
- capture: intentionally any-logged-in (shared hardware, asset scoped on
registration) — TODO replaced with a rationale note
Security fixes from review of this change:
- recorders POST /:id/start: a per-take projectId override could route a live
asset into a project the caller lacks edit on — now asserts edit on the
override target
- sequences PUT /:id/clips: spliced asset_ids weren't checked, so an editor
could pull in (and via GET /:id leak signed proxy URLs for) assets from a
project they can't access — now every clip asset must belong to the
sequence's project; pre-transaction queries moved inside try/catch so a DB
error returns 500 instead of hanging the request
- tests: recorders-access, sequences-access (incl. cross-project clip guard),
comments-access (incl. author-id regression)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Optional "Sign in with Google" with auto-provisioning, fully config-gated:
without GOOGLE_CLIENT_ID/SECRET and OAUTH_REDIRECT_URL the routes 404 and the
button is hidden, so deployments without SSO are unaffected.
- migration 028: users.google_sub (unique) + email; password_hash nullable
for OAuth-only accounts
- src/auth/google-oauth.js: lazy google-auth-library, ID-token verify,
GOOGLE_ALLOWED_DOMAIN enforcement, requires email_verified === true
- auth routes: /auth/google (state-CSRF redirect), /auth/google/callback,
/auth/google/enabled; reuses establishSession
- web-ui: "Sign in with Google" on the login screen (shown only when enabled),
friendly callback error handling
- .env.example documents all new vars
Security hardening (from review of this + the TOTP work):
- resolveGoogleUser links ONLY by google_sub, never by email — a Google login
can never seize a pre-existing local account (account-takeover fix)
- a Google-linked account with TOTP still requires the second factor (ticket
in session, /?mfa=1 step) instead of bypassing it
- /login/totp now applies the per-IP login backoff
- recovery-code consumption is atomic (WHERE used_at IS NULL + rowCount)
- concurrent first-login race on google_sub is caught and re-resolved
- tests: google-oauth config helpers + google-link takeover/dedup regression
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The HLS-VOD work made GET /assets/:id/stream return the HLS playlist URL as
`url` whenever hls_s3_key was set. The Premiere plugin's "Import Proxy"
downloads `url` to a file and imports it — so it was saving an .m3u8 playlist
as .mp4, and Premiere rejected it ("unsupported compression type"). This hit
every YouTube asset (all get HLS generated), regardless of codec.
/stream now returns the directly-downloadable MP4 proxy as `url` (type mp4)
and the HLS playlist as a separate `hls_url`. The web player prefers `hls_url`
(so in-browser HLS playback is unchanged), while the already-installed plugin
gets a real MP4 again — no plugin reinstall needed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 0.2 of the NVENC All-Intra HEVC ingest plan.
node-agent/handleSidecarStart:
- Accept useGpu: true in the sidecar start body
- When useGpu: adds Runtime=nvidia, DeviceRequests=[gpu], and injects
NVIDIA_VISIBLE_DEVICES=all + NVIDIA_DRIVER_CAPABILITIES=video,compute,utility
into the container env. CPU-codec recorders are unaffected (useGpu defaults false).
mam-api/recorders (start endpoint):
- Derive useGpu from recorder.recording_codec — true for hevc_nvenc/h264_nvenc
- Pass useGpu to remote sidecar start body
- Apply same Runtime/DeviceRequests to the local Docker spawn path
capture/capture-manager:
- Update hevc_nvenc codec entry with all-intra flags:
-g 1 -bf 0 (every frame IDR, no B-frames — required for growing-file
edit-while-record), -rc vbr, -profile:v main10, pixFmt p010le (10-bit 4:2:0)
Next: validation gate (§8) — test MXF OP1a then fragmented MOV on one
DeckLink channel, mount in Premiere while recording.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stuck-live fix: capture sidecar now finalises the pre-created live asset by id (new POST /assets/:id/finalize) instead of POSTing a new asset (409 collision); node-agent gives the sidecar a 180s stop grace so the S3 upload + callback complete; node-agent logs sidecar start/stop for diagnostics.
Live SDI monitor: HLS preview is now a 2nd output of the hires ffmpeg (single DeckLink read, split to ProRes/S3 + H.264/HLS); node-agent serves /live over HTTP; mam-api proxies GET /recorders/:id/live/* to the recorder node; web-ui HlsPreview loads from the proxied URL.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- capture/src/index.js: read MAM_API_TOKEN from env; include
Authorization: Bearer header in shutdown callback fetches to mam-api
(POST /assets and POST /assets/:id/mark-empty). Without this, mam-api
AUTH_ENABLED=true rejects the callback with 401, leaving assets stuck in live
- recorders.js: pass MAM_API_TOKEN=${CAPTURE_TOKEN} in sidecar env so the
capture container receives the token at boot
- api_tokens: inserted capture-sidecar token (unbound, prefix b3d3d3c4)
- recorders.js: when isRemote=true, replace MAM_API_URL in sidecar env with
http://<NODE_IP>:<PORT_MAM_API> so capture containers on worker host network
can reach mam-api (fixes assets stuck in live status after recorder stop)
- cluster.js: add GET /api/v1/cluster/metrics endpoint returning per-node
cpu/ram/gpu utilization; update heartbeat handler to persist metrics JSONB
- web-ui: add Resources panel to dashboard with live CPU/RAM/GPU bars per node,
polling /api/v1/cluster/metrics every 5s
## capture service
- capture-manager.js: add 'deltacast' source_type to _buildInputArgs.
Uses 'deltacast://<index>' with ffmpeg deltacast demuxer when
/dev/deltacast<N> exists; falls back to lavfi testsrc2 + sine test card
(matching deltacast-sdi-recorder standalone app) when hardware absent.
- routes/capture.js: add GET /devices/deltacast endpoint (enumerates
/dev/deltacast* + DELTACAST_PORT_COUNT env fallback). Extend /probe to
handle source_type=deltacast.
## node-agent
- detectHardware(): add 'deltacast' array to capabilities payload.
Enumerates /dev/deltacast* nodes; falls back to DELTACAST_PORT_COUNT env.
Adds DELTACAST_MODEL env support. Logs dc= count in heartbeat line.
- sidecar /start: bind /dev/deltacast* device nodes into capture containers
when sourceType='deltacast'.
## mam-api
- cluster.js: add GET /cluster/devices/deltacast and
GET /cluster/devices/deltacast/signal endpoints — same shape as
blackmagic equivalents for UI parity.
- recorders.js /start: pass DELTACAST_PORT_COUNT env to capture container;
bind /dev/deltacast* device nodes on local spawn.
- migration 024: ALTER TYPE source_type ADD VALUE 'deltacast' (idempotent).
- schema.sql: add 'deltacast' to source_type ENUM for fresh installs.
## web-ui
- modal-new-recorder.jsx: add 'Deltacast' source type card; fetch
/cluster/devices/deltacast on selection; port picker with TEST CARD
badge when hardware absent; falls through to manual index entry if
no devices detected.
- capture-manager.js, routes/capture.js: fix ffmpeg -sources decklink
parse regex from v4l2 hex-address format (never matched DeckLink output)
to correct indented-line format. Port 2+ (index 1+) was falling through
to a wrong model-name fallback, causing ffmpeg to open the wrong input
and produce black frames. Now logs the detected device list and the
selected name at start.
- recorders.js (/start): accept per-take projectId override in request
body. If provided, clips go to that project instead of the recorder's
default project_id. Used for both the live-asset INSERT and the
PROJECT_ID env var passed to the capture container.
- screens-ingest.jsx (RecorderRow): add project dropdown shown when
recorder is stopped. Defaults to the recorder's configured project;
operator can change it before hitting Record without editing the
recorder config.
The conform worker's final step INSERTs the rendered output into the
assets table:
INSERT INTO assets (project_id, filename, display_name, …)
VALUES ($1, …)
-- project_id NOT NULL
It reads projectId from job.data, but the /sequences/:id/conform
endpoint never set it. Render finished cleanly, ffmpeg ran, output
uploaded to S3, then the final asset row INSERT failed:
null value in column "project_id" of relation "assets"
Pass seq.project_id from the loaded sequence row. The rendered output
lands as an asset under the same project as its source sequence —
the natural target.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two cooperating bugs left Export Timeline stuck at "Rendering Hi-Res"
forever:
A. worker emitted "Invalid FCP XML: no sequence element" because
Timeline.generateFcpXml produced fcpxml (FCP X schema:
<fcpxml><resources>/<library>/...) while the worker's parseFcpXml
expects xmeml (FCP 7 schema: <xmeml><sequence>...). Two completely
different formats.
Rewrite generateFcpXml to emit xmeml v5 with the structure the
parser walks:
xmeml/sequence/{name,duration,rate{timebase,ntsc},
media/video/{format/samplecharacteristics,
track[@currentExplodedTrackIndex]
/clipitem/{name,duration,rate,in,out,
start,end,file/{name,pathurl}}}}
Clipitem in/out are SOURCE frames (the underlying media in/out);
start/end are TIMELINE frames (the cut position). The worker uses
the rate timebase to parse them.
B. /api/v1/jobs/:id rejected the panel's polls with
"Invalid id — must be a UUID". The handlers below correctly parse
BullMQ-prefixed ids ("conform:42"), but router.param('id',
validateUuid('id')) ran first and 400'd everything that wasn't a
UUID. The panel's pollConform swallows the resulting fetch error
silently and polls forever.
Drop the validator. Comment in the file explains why.
Bumps panel to v2.2.2.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- requireAuth bearer path now selects api_tokens.bound_hostname and users.role,
populates req.tokenBoundHostname and req.user.role. /cluster/heartbeat can
now authenticate via a bound api_token (issued via POST /auth/tokens with
bound_hostname).
- routes/tokens.js POST accepts bound_hostname; GET returns it so users can
see which tokens are bound.
- Remove /cluster/heartbeat from SERVICE_PATHS so requireAuth runs on it (the
bearer auth handles the gate; the heartbeat handler still enforces the
body.hostname === bound match).
- /auth/me now returns role (final-review I2). Closes the gap where every
signed-in user appeared as 'viewer' in the UI regardless of actual role.
- loadUser SELECTs role for session auth.
- Backend tests still 37/15/0/22 — no test changes needed; existing token
CRUD tests stay passing since bound_hostname is optional.
Final-review findings:
- Mount usersRouter at /api/v1/users in addition to /api/v1/auth/users so the
existing SPA Users page works; add PATCH /:id for inline edits (display_name,
role, password).
- Add X-Requested-With: dragonflight-ui to raw XHR/fetch paths that bypass
apiFetch (file uploads, SDK uploads, EDL export) — without it, requireUiHeader
403s before reaching the route.
- Exempt SERVICE_PATHS (/cluster/heartbeat) from requireUiHeader so node-agent
heartbeats keep working when NODE_TOKEN is unset.
- Remove stale auth.js.bak.
Code-review feedback:
- Dummy hash for user-enumeration-defense timing was 63 chars (bcrypt strings
are 60 chars). Worked by accident because bcrypt 5.x is lenient about
trailing chars; a future tightening would silently regress the timing
defense. Replaced with a real pre-computed bcrypt hash.
- last_login_at UPDATE now logs errors instead of silently swallowing them,
matching the pattern in requireAuth for api_tokens.last_used_at.
- Removed dead import of comparePassword from auth.test.js.
- remove requireAuth from all route files
- delete auth.js, tokens.js, users.js routes
- delete auth middleware
- remove session middleware and all auth deps from index.js
- delete login.html and auth-guard.js from web-ui
Scope (locked in via planning Q&A):
- Identity: local accounts only (PG users table) + existing bearer
tokens for headless callers.
- Transport: httpOnly cookie session for browser, Bearer for API.
- RBAC: admin / editor / viewer roles, plus an orthogonal
is_client flag for external (agency, talent, customer) accounts.
- Bootstrap: ADMIN_BOOTSTRAP_USER + ADMIN_BOOTSTRAP_PASSWORD env
seed the first admin on a clean install. Set ADMIN_BOOTSTRAP_RESET
to force-reset the named user (break-glass).
- Rate limit: in-memory, 10 fails per 15min per (IP, username).
- Password policy: \u22658 chars, mixed case, digit, symbol; small
blocklist of common passwords; cannot equal username.
- Self-service: change own display name + password. Everything
else (role, is_client, other-user mgmt) is admin only.
- Audit log: append-only table, indexed by actor + event_type +
created_at, populated by every auth/admin event.
Files added:
- services/mam-api/src/db/migrations/022-auth-rework.sql
users.is_client + last_login_at + failed_attempts; audit_log
table with FK to users (ON DELETE SET NULL).
- services/mam-api/src/middleware/audit.js
Fire-and-forget audit() helper. Caller never awaits, failure
logs but never throws — auditing cannot break the request
that triggered it.
- services/mam-api/src/middleware/passwordPolicy.js
Shared checkPassword(pw, { username }) used by setup, user
create/update, and self-service password change.
- services/mam-api/src/tasks/bootstrapAdmin.js
Runs after migrations. No-ops unless ADMIN_BOOTSTRAP_USER +
ADMIN_BOOTSTRAP_PASSWORD are set AND (users table empty OR
ADMIN_BOOTSTRAP_RESET=true).
- services/mam-api/src/routes/audit.js
Admin-only GET /audit (paginated, filter by event_type /
actor / target / date) and GET /audit/event-types.
- services/web-ui/public/modal-account-settings.jsx
Profile + Password tabs. Triggered by sidebar user button.
Files rewritten:
- services/mam-api/src/routes/auth.js
- POST /login: regenerate(), no manual save(); audit success/
fail/lockout; updates last_login_at + failed_attempts.
- POST /logout: destroys session, audits logout.
- GET /me: returns is_client + last_login_at. Synthetic admin
when AUTH_ENABLED=false.
- GET /setup-status: drives login.html UI state.
- POST /setup: blocked once any user exists; password policy.
- POST /password: self-service. Requires current pw, runs
policy, audits, invalidates other sessions implicitly via
users.js if changed by admin.
- PATCH /me: self-service display_name update.
- services/mam-api/src/routes/users.js
- is_client field in create/update/list/get.
- Guardrails: cannot delete or demote last admin, cannot
delete self, admins cannot be flagged is_client.
- Password change invalidates all sessions for that user
(DELETE FROM sessions WHERE sess->>'userId' = id).
- Audit on every mutation.
- Password policy enforced.
- services/mam-api/src/middleware/auth.js
- requireAuth now exposes req.user.is_client.
- New requireRole(["admin","editor"], { rejectClients: true })
helper. Applied to cluster, sdk, capture routes (infra).
- Synthetic user when AUTH_ENABLED=false has is_client=false.
- services/mam-api/src/index.js
- Loads bootstrap admin after migrations.
- Wires /api/v1/audit.
- Cleans up an earlier comment block.
- services/web-ui/public/login.html
- Password hint added next to setup-mode password field.
- services/web-ui/public/shell.jsx
- Sidebar user footer is a button that opens AccountSettings.
- CLIENT badge next to role when is_client=true.
- Nav filters: clients lose ingest tree + jobs + editor;
viewers lose ingest + editor; only admins see the Admin
section. Power button hidden when synthetic user.
- services/web-ui/public/screens-admin.jsx
- Users table: new Client column with inline toggle.
- InviteUserModal: Client checkbox + password hint, gated
off when role=admin.
- Last login column replaces Created in primary view.
- CSV export includes client + last_login.
- services/web-ui/public/data.jsx
- ZAMPP_DATA.ME carries is_client + display_name.
- services/web-ui/public/index.html
- Loads dist/modal-account-settings.js.
- services/web-ui/public/styles-rest.css
- .user-row grid widened to 6 columns.
- docker-compose.yml
- Plumbs SESSION_COOKIE_SECURE + ADMIN_BOOTSTRAP_* env vars.
Deploy:
cd /opt/wild-dragon
git pull origin main
# In .env:
# AUTH_ENABLED=true
# SESSION_SECRET=<openssl rand -hex 48>
# ADMIN_BOOTSTRAP_USER=admin
# ADMIN_BOOTSTRAP_PASSWORD=<strong>
docker compose build mam-api web-ui
docker compose up -d --force-recreate --no-deps mam-api web-ui
Login was returning 200 + correct user JSON + writing a row to the
sessions table, but emitting zero Set-Cookie headers. Root cause:
session.regenerate() → set fields → session.save() → res.json()
Calling session.save() manually writes the store but bypasses
express-session's res.end() hook, which is the only path that adds
the Set-Cookie header to the response. The cookie was never sent to
the browser even though the session existed server-side — hence the
redirect loop.
Fix: remove the manual save(). Set the session fields and call
res.json() directly inside regenerate()'s callback; express-session
handles store write + Set-Cookie automatically on res.end().
The redirect loop after successful login was almost certainly the
`sessions` table never being created. `schema.sql` defines it but
only runs on first-init via the postgres entrypoint; instances
bootstrapped via mam-api's own migration loop never got the table.
express-session's `req.session.save()` then failed silently and the
cookie pointed at a sid that wasn't in the store — every subsequent
request looked like a brand-new visitor.
- New migration 021-ensure-sessions-table.sql (idempotent).
- connect-pg-simple now configured with `createTableIfMissing: true`
as belt-and-braces.
- `POST /auth/login` now explicitly waits for session.save() and
surfaces both regenerate() and save() errors instead of treating
them as 'success'. Logs sid + req.secure + req.protocol so we can
confirm trust-proxy is doing the right thing behind NPM.
Three concrete issues kept the login flow broken on dragonflight.live:
1. mam-api trusted no proxy headers, so behind nginx/Cloudflare the
session cookie's `secure` flag and the rate-limiter's IP keying
both saw the wrong values. Now sets `app.set('trust proxy', 1)`.
2. Session config was tied to NODE_ENV and lacked sameSite/name. Now:
- SESSION_COOKIE_SECURE env (default: true when AUTH_ENABLED) so a
site behind HTTPS gets Secure cookies regardless of NODE_ENV.
- `sameSite: 'lax'` for predictable post-login redirects.
- Renamed to `df.sid` so it's obvious in DevTools.
- `rolling: true` extends the 7-day TTL on active use.
- SESSION_SECRET is now required when AUTH_ENABLED=true; the
server refuses to start with a dev default in prod.
3. login.html silently showed the sign-in panel even when no users
exist or auth is off:
- New GET /auth/setup-status reports {needs_setup, user_count,
auth_enabled}.
- login.html calls it on load and auto-flips into setup mode when
needs_setup is true, or shows an explicit "auth is off" flash
when auth_enabled is false (the previous symptom: logout button
did nothing because /auth/me returned a synthetic admin no matter
what).
- Added a `.flash.info` style for the new neutral notice.
4. Sidebar logout used to call /auth/logout then `window.location
.reload()`. With auth off that reload landed back on the synthetic-
admin app and looked like nothing happened. It now redirects to
/login.html in all states so the operator sees feedback (and the
server-side messaging about auth being off) instead of a no-op.
Deploy notes for zampp1:
- Set AUTH_ENABLED=true and a random SESSION_SECRET in the
mam-api environment (e.g. /opt/wild-dragon/.env).
- Restart mam-api.
- First load of /login.html will auto-route to the setup form so
you can create the first admin.
RustFS returns empty bodies for ranged GETs whose start offset is past
~5.9 MB on single-file proxy MP4s. HEAD reports correct size, full GET
(`bytes=0-`) works, but `bytes=8179166-` comes back 206 + correct
Content-Range header with zero bytes. Confirmed via direct S3 probe
against broadcastmgmt.cloud/dragonmam (see scratch tests).
Workaround in mam-api `GET /api/v1/assets/:id/video` until the proxy
worker emits HLS (planned v1.2.1):
- HEAD the object first to learn total size (also gives ETag /
Last-Modified for conditional requests).
- No-Range / unparseable-Range / pre-EOF requests \u2192 plain pipe.
- Parsed `bytes=N-M` requests below RUSTFS_RANGE_SAFE_START
(default 5_500_000) \u2192 direct ranged GET, RustFS handles fine.
- Anything reaching into the broken zone \u2192 stream from offset 0,
drop bytes below start, stop at end. Memory stays flat; extra
bandwidth = (end+1 - requested-size) per seek.
- Genuinely out-of-range \u2192 416 with Cache-Control: no-store so the
browser doesn't poison its cache.
Also stashes (not yet wired up) the HLS pieces we'll need for the
follow-up: `segmentToHls` ffmpeg helper + `uploadDirectoryToS3`
worker s3 helper. Harmless additions; not referenced by any code path
yet.
Confirmed against the affected asset (a72aaa03-...): bytes=0-100k +
50% +100k native pass-through; 70% +100k and near-EOF previously hung
the browser, now stream correctly via the stitched path.
Refs #143.
Frontend / UX / a11y
- Sidebar collapse/expand toggle with localStorage persistence (#142)
- Settings sections wrap inputs in <form> with Enter-to-submit + native
validation; password autocomplete=new-password (#141, #138)
- Asset thumbnails get descriptive alt text (#140)
- Production deploy now precompiles JSX via esbuild and loads the
production React UMD instead of dev builds + in-browser Babel (#139,
#122)
- Search wrapper gets role=search; global search input gets aria-label,
role=combobox, aria-controls/aria-expanded/aria-activedescendant
wiring (#137, #135)
- Dashboard and Library no longer share the same nav icon (#136)
- Sidebar collapses off-canvas with a topbar menu button below 768 px;
mobile default is collapsed (#134)
- --text-3 bumped to #8B92A0 for WCAG AA contrast on --bg-0 (#133)
- Schedule and Library routes were rendering empty inside the .main
flex container — switched to flex:1 + min-height:0 (#131, #132,
editor + asset detail get the same fix)
- Jobs nav badge now polls /jobs?status=active every 10 s and reflects
the live count (#130, #113)
- aria-label sweep on every icon-only button (#126)
- Premiere panel release list moved to window.PREMIERE_RELEASES in
data.jsx; Editor + Settings read from the same source (#125)
- Typo setPgMclips → setPgmClips (#124)
- Stray console.error / console.warn calls gated behind
window.DF_LOG.{warn,error} (#123)
- Hardcoded /api/v1 paths route through window.ZAMPP_API_PREFIX (#115)
- Schedule rows no longer crash on null recorder_id (#117)
- EditorKeyboard guards against document.activeElement === null (#116)
- Unmount-safe timers for PasswordResetModal, Containers, Editor (#111)
- Player seek clamps below totalMs, server-side range clamping +
uncached 416 on EOF, client-side EOF-stall watchdog (#143)
- Duration badge overlap fix on narrow asset cards (#52)
Backend / security / reliability
- GET /recorders fixed N+1: single LATERAL JOIN for live_asset_id;
Docker inspects bounded to actually-recording rows (#121)
- Upload disk-storage (multer.diskStorage) streams parts to S3 instead
of buffering 500 MB in RAM (#120)
- /assets list clamps limit to MAX_LIMIT=500 to prevent OOM (#119)
- SDK upload archive listing + post-extract sanitize block zip-slip /
tar-slip and symlink escapes (#118)
- Migrations track applied state in schema_migrations, run in a
transaction, and exit non-zero on failure (#107)
- node-agent BMD_COUNT override uses BMD_DEVICE_PREFIX; filesystem
detection wins (#109, #127)
- GPU_COUNT override now merges with nvidia-smi enrichment (#108)
- /cluster/heartbeat requires a node-bound token or admin user;
tokens carry bound_hostname (#106)
- /recorders/:id/start error responses no longer echo the Docker
create payload — env vars stay out of client responses (#105)
- /recorders/probe restricts schemes (srt/rtmp/rtsp/udp/rtp), blocks
private + loopback hosts for non-admins, denies common service
ports (#104)
- Scheduler tick guarded by a Postgres advisory lock; pending/running
rows claimed via UPDATE...RETURNING + FOR UPDATE SKIP LOCKED to
survive multi-node deploys (#103)
- UUID validateUuid('id') param middleware on every /:id route (#102)
- Error handler scrubs Postgres error messages and 5xx detail (#101)
- Graceful SIGTERM/SIGINT shutdown — stops scheduler, drains the HTTP
server, ends the pool, 25 s force-exit watchdog (#100)
- AMPP sync moved from fire-and-forget to a persisted retry queue
(ampp_sync_status / attempts / next_attempt_at + scheduler retry
loop with exponential backoff) (#77)
Migrations
- 019: api_tokens.bound_hostname (#106)
- 020: assets.ampp_sync_status + retry bookkeeping (#77)
Other
- Defer #92 Growing-files per-upload toggle, #80 Audio tab, #57
Dashboard redesign, #56 Editor SPA polish phase 3, #114 S3
migration tool to v1.3