This is the real cause of the login loop. mam-api sets its session cookie
with Secure=true (production config). express-session refuses to emit a
Secure Set-Cookie unless req.secure is true. With `app.set('trust proxy')`
on, req.secure derives from X-Forwarded-Proto.
web-ui's nginx was unconditionally sending `X-Forwarded-Proto: $scheme`.
Inside the web-ui container nginx listens on port 80, so $scheme is always
"http" — regardless of whether the outer NPM proxy terminated TLS. mam-api
saw http, decided the connection was insecure, and silently dropped the
Set-Cookie from the login response. Login succeeded server-side (session
row landed in PG, last_login_at updated) but the browser never received a
cookie, so the very next /auth/me check came back 401 and AuthGate bounced
to the login screen. Infinite loop.
The previous Connection: "upgrade" → $connection_upgrade fix wasn't wrong
(the hardcode is a real latent bug worth fixing) — it just wasn't the
proximate cause.
Fix: a second `map` directive forwards the outer X-Forwarded-Proto through
when present, falling back to $scheme only when no proxy header exists (so
direct localhost curls still work). Both /api/ and /capture/ now send the
correct value upstream, mam-api sees https, req.secure is true, Set-Cookie
flows through, login works.
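The cause and the fix can be modelled in a few lines of JavaScript. This is
only a sketch: the function names are hypothetical, the nginx `map` is
represented by a plain function, and reqSecure() is a simplified stand-in for
Express's req.secure that assumes trust proxy is on and the immediate peer is
trusted.

```javascript
// Header value web-ui's nginx sends upstream.
// Before the fix: always $scheme, which is "http" inside the container.
function forwardedProtoBefore(incomingForwardedProto, scheme) {
  return scheme;
}

// After the fix: pass the outer proxy's value through when present,
// otherwise fall back to $scheme.
function forwardedProtoAfter(incomingForwardedProto, scheme) {
  return incomingForwardedProto || scheme;
}

// Simplified model of req.secure with trust proxy enabled. Express uses the
// first value of a comma-separated X-Forwarded-Proto header.
function reqSecure(forwardedProto) {
  const proto = (forwardedProto || "http").split(",")[0].trim();
  return proto === "https";
}

// Outer proxy terminated TLS and sent X-Forwarded-Proto: https.
const secureBefore = reqSecure(forwardedProtoBefore("https", "http"));
const secureAfter = reqSecure(forwardedProtoAfter("https", "http"));
// Direct localhost curl: no proxy header at all.
const secureDirect = reqSecure(forwardedProtoAfter("", "http"));

console.log({ secureBefore, secureAfter, secureDirect });
```

Only the secureAfter case yields a request that express-session would treat
as secure, which matches the curl results described below.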
Verified by curling the existing direct-to-mam-api path: with X-Forwarded-
Proto: https on the request, Set-Cookie comes back; without it, no
Set-Cookie. That's the exact difference between web-ui-proxied and
direct-to-mam-api in our previous diagnostic curls.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Login was infinite-looping in production. Server side was healthy (sessions
landing in PG, /me returning 200 to a manually-signed cookie) but the
browser never received `Set-Cookie`. Bisected the proxy chain layer by
layer with direct curls on the box:
- mam-api direct (port 47432) → Set-Cookie present
- web-ui nginx (port 47434) → Set-Cookie STRIPPED
- NPM (https://dragonflight.live) → Set-Cookie stripped (because web-ui ate it)
Root cause was this in /api/ and /capture/:
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
The literal "upgrade" was being sent on every request, not just real
WebSocket negotiations. Nginx then routes the upstream response through
its tunnel/upgrade code path, which doesn't preserve all response headers
the same way — Set-Cookie got silently dropped. mam-api doesn't speak
WebSockets today so it never sent a 101, and the bad pattern went
unnoticed until session-cookie auth shipped.
Fix is the standard conditional pattern: a `map` directive at the top of
default.conf computes $connection_upgrade as "upgrade" only when the
client actually requested Upgrade, otherwise "close". Both location blocks
now send `Connection $connection_upgrade` instead of the hardcoded literal.
WebSocket support on either location continues to work unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
User reported infinite login loop on dragonflight.live. Root cause: openresty
fronts both http:// and https:// without redirecting, and a user landing on
http:// gets the Set-Cookie response silently dropped — cookies are Secure-only
when TRUST_PROXY=true, and the CORS allowlist refuses the http:// origin.
Result: login appears to succeed, next request has no session cookie, AuthGate
bounces back to login.
Two defensive layers (the openresty box is outside our control):
- web-ui index.html: tiny inline redirect; if location is http://dragonflight.live,
rewrite to https:// before anything else runs. Bounded to that exact hostname
so local / LAN access on http://172.18.91.x stays as-is.
- mam-api: emit Strict-Transport-Security on HTTPS responses when AUTH_ENABLED=true.
After one successful HTTPS visit, browsers auto-upgrade future http:// requests
on their own — closes the loophole even if someone bypasses the index.html JS.
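A minimal sketch of both layers, under stated assumptions: the helper names
httpsRedirectTarget and hstsMiddleware are hypothetical, and the max-age of
180 days is an illustrative value, not necessarily what mam-api sends.

```javascript
// Layer 1 (index.html): return the https:// URL to navigate to, or null when
// no redirect applies. Bounded to the exact production hostname so LAN and
// localhost access over http:// is left alone.
function httpsRedirectTarget(href) {
  const url = new URL(href);
  if (url.protocol === "http:" && url.hostname === "dragonflight.live") {
    url.protocol = "https:";
    return url.toString();
  }
  return null;
}

// Layer 2 (mam-api): Express-style middleware that adds HSTS only on
// requests that arrived over HTTPS, and only when auth is enabled.
function hstsMiddleware({ authEnabled, maxAgeSeconds = 15552000 }) {
  return function hsts(req, res, next) {
    if (authEnabled && req.secure) {
      res.setHeader("Strict-Transport-Security", `max-age=${maxAgeSeconds}`);
    }
    next();
  };
}
```

In the page, the first helper would be used roughly as
`const t = httpsRedirectTarget(location.href); if (t) location.replace(t);`
before any other script runs.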
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- requireAuth bearer path now selects api_tokens.bound_hostname and users.role,
populates req.tokenBoundHostname and req.user.role. /cluster/heartbeat can
now authenticate via a bound api_token (issued via POST /auth/tokens with
bound_hostname).
- routes/tokens.js POST accepts bound_hostname; GET returns it so users can
see which tokens are bound.
- Remove /cluster/heartbeat from SERVICE_PATHS so requireAuth runs on it (the
bearer auth handles the gate; the heartbeat handler still enforces the
body.hostname === bound match).
- /auth/me now returns role (final-review I2). Closes the gap where every
signed-in user appeared as 'viewer' in the UI regardless of actual role.
- loadUser SELECTs role for session auth.
- Backend tests still 37/15/0/22; no test changes needed, because
bound_hostname is optional and the existing token CRUD tests keep passing.
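The hostname binding enforced by the heartbeat handler could look roughly
like the sketch below. heartbeatHostnameAllowed is a hypothetical helper, and
rejecting tokens that have no bound hostname is an assumption; the real
handler may treat unbound tokens differently.

```javascript
// Runs after requireAuth has populated req.tokenBoundHostname from
// api_tokens.bound_hostname.
function heartbeatHostnameAllowed(req) {
  const bound = req.tokenBoundHostname;
  // Assumption: a token without a bound hostname may not post heartbeats.
  if (!bound) return false;
  // The reported hostname must match the hostname the token was issued for.
  return req.body != null && req.body.hostname === bound;
}
```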
- On connect success: hide form, show compact connected-bar with hostname
- On disconnect: clear assets, reset buttons, restore form
- Wire disconnect-btn click to disconnectFromServer()
On startup the full form shows. On successful connect the form hides and a
compact connected-bar appears with the server hostname and a Disconnect button.
Task 18 documented the two new env vars in .env.example and README but never
added them to docker-compose.yml's mam-api environment block. Without that,
the vars in .env never reach the container — so AUTH_ENABLED=true was running
with effective TRUST_PROXY=false (req.ip = proxy IP, rate-limit collapses to
per-proxy bucket) and ALLOWED_ORIGINS unset (CORS allows any origin).
Migration 023 was fixed in 9dc572b to use '00000000-0000-4000-8000-000000000000'
because 'v' isn't a valid hex digit, but the DEV_USER_ID constant in
middleware/auth.js still referenced the original '...000000000dev'. Every
route that passes DEV_USER_ID as a query parameter (users list, login lookup,
setup-required count) was throwing 22P02 invalid input syntax for type uuid.
The errors were swallowed by Promise.allSettled in the SPA's data load so the
app appeared to work in dev mode, but enabling AUTH_ENABLED=true would have
broken login entirely.
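For context on why the errors stayed hidden: Promise.allSettled never
rejects, so a failed request only shows up if each result's status is
inspected. A small sketch of surfacing those failures follows; the names
rejectedReasons and loadAll are hypothetical and not taken from the SPA.

```javascript
// Collect the rejection reasons from a Promise.allSettled result array.
function rejectedReasons(settledResults) {
  return settledResults
    .filter((r) => r.status === "rejected")
    .map((r) => r.reason);
}

// Run every loader, log each failure, and return both views of the outcome.
async function loadAll(loaders) {
  const results = await Promise.allSettled(loaders.map((fn) => fn()));
  const failures = rejectedReasons(results);
  for (const reason of failures) {
    console.error("data load failed:", reason);
  }
  return { results, failures };
}
```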
Final-review findings:
- Mount usersRouter at /api/v1/users in addition to /api/v1/auth/users so the
existing SPA Users page works; add PATCH /:id for inline edits (display_name,
role, password).
- Add X-Requested-With: dragonflight-ui to raw XHR/fetch paths that bypass
apiFetch (file uploads, SDK uploads, EDL export) — without it, requireUiHeader
403s before reaching the route.
- Exempt SERVICE_PATHS (/cluster/heartbeat) from requireUiHeader so node-agent
heartbeats keep working when NODE_TOKEN is unset.
- Remove stale auth.js.bak.
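A sketch of the two sides of the X-Requested-With check, assuming Express
conventions. withUiHeader is a hypothetical client helper, the 403 body is
illustrative, and the exemption is shown as an exact-path set.

```javascript
// Client side: add the UI header to fetch() options that bypass apiFetch.
// Assumes init.headers, when present, is a plain object.
function withUiHeader(init = {}) {
  return {
    ...init,
    headers: { ...(init.headers || {}), "X-Requested-With": "dragonflight-ui" },
  };
}

// Server side: reject requests without the header, except service paths.
const SERVICE_PATHS = new Set(["/cluster/heartbeat"]);

function requireUiHeader(req, res, next) {
  if (SERVICE_PATHS.has(req.path)) return next();
  if (req.get("X-Requested-With") !== "dragonflight-ui") {
    return res.status(403).json({ error: "missing X-Requested-With header" });
  }
  return next();
}
```

For raw XHR uploads the equivalent is
`xhr.setRequestHeader("X-Requested-With", "dragonflight-ui")` after open().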
Fixes three issues in the authentication system:
C1: Add boot-time warning when AUTH_ENABLED=true but TRUST_PROXY!=true.
Without TRUST_PROXY=true behind nginx, req.ip becomes the proxy IP for all
clients, collapsing per-IP rate limiting into a shared pool. Operators must
explicitly set TRUST_PROXY=true to make per-IP rate limiting effective.
C2: Mount requireUiHeader middleware in test helpers (auth.test.js,
users.test.js, tokens.test.js). The CSRF header validation was not being
exercised in the test suite. Tests now send X-Requested-With: dragonflight-ui
headers that are actually validated by the middleware.
I1: Implement bounded rate-limit Map with MAX_ENTRIES=10000 and LRU eviction.
Unbounded Maps are vulnerable to spray attacks: attackers can force memory
exhaustion by requesting with distinct IPs. Now we evict the oldest entry
(by insertion order) when the map reaches capacity.
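The eviction rule in I1 can be sketched with a plain Map, which iterates keys
in insertion order. boundedSet is a hypothetical helper name. Note that
updating an existing key does not move it in a Map; a strict LRU would delete
and re-insert the key on every hit.

```javascript
const MAX_ENTRIES = 10000;

// Insert or update a rate-limit entry, evicting the oldest-inserted key when
// a new key would push the map past its capacity.
function boundedSet(map, key, value, maxEntries = MAX_ENTRIES) {
  if (!map.has(key) && map.size >= maxEntries) {
    // Map iteration follows insertion order, so the first key is the oldest.
    const oldestKey = map.keys().next().value;
    map.delete(oldestKey);
  }
  map.set(key, value);
}
```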
Code-review feedback:
- Dummy hash for user-enumeration-defense timing was 63 chars (bcrypt strings
are 60 chars). Worked by accident because bcrypt 5.x is lenient about
trailing chars; a future tightening would silently regress the timing
defense. Replaced with a real pre-computed bcrypt hash.
- last_login_at UPDATE now logs errors instead of silently swallowing them,
matching the pattern in requireAuth for api_tokens.last_used_at.
- Removed dead import of comparePassword from auth.test.js.
Code-review feedback: startsWith('/cluster') was a prefix match that exposed
destructive operator endpoints (POST /containers/:id/restart, DELETE /:id,
GET /devices/blackmagic/*) unauthenticated. Only POST /heartbeat is genuine
node-agent traffic; everything else in cluster.js is operator/UI surface
that should go through requireAuth. Long-term: issue node-agent a bound
api_token and drop the carve-out entirely.
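The difference between the old and new carve-out, as a sketch. Function names
are hypothetical, the full paths assume the cluster router is mounted at
/cluster, and the explicit method check is an assumption.

```javascript
// Old carve-out: anything under /cluster skipped requireAuth.
function isServicePathByPrefix(path) {
  return path.startsWith("/cluster");
}

// New carve-out: only the node-agent heartbeat is exempt.
function isServiceRequest(method, path) {
  return method === "POST" && path === "/cluster/heartbeat";
}

// A destructive operator endpoint that the prefix match let through.
const restartPath = "/cluster/containers/abc/restart";
console.log(isServicePathByPrefix(restartPath), isServiceRequest("POST", restartPath));
```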
Code-review feedback:
- Hard-fail boot when AUTH_ENABLED=true and SESSION_SECRET is unset, so
express-session can't silently use an in-memory random secret that
invalidates sessions on restart and breaks multi-node clusters.
- CORS rejection now returns cb(null, false) instead of cb(new Error)
so misconfigured origins surface as clean CORS errors in the browser
instead of HTTP 500s. Log a warn line for operator visibility.
- Add a units comment to pruneSessionInterval.
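A sketch of the first two items, assuming the origin callback shape used by
the cors package, (origin, callback). makeOriginCheck and assertSessionSecret
are hypothetical names, and allowing requests that carry no Origin header is
an assumption.

```javascript
// Fail fast at boot instead of running with an unstable session secret.
function assertSessionSecret(env) {
  if (env.AUTH_ENABLED === "true" && !env.SESSION_SECRET) {
    throw new Error("SESSION_SECRET must be set when AUTH_ENABLED=true");
  }
}

// Build an origin check that rejects unknown origins without raising an
// error, so the browser sees a CORS failure rather than an HTTP 500.
function makeOriginCheck(allowedOrigins, log = console) {
  const allowed = new Set(allowedOrigins);
  return function originCheck(requestOrigin, cb) {
    // Assumption: same-origin and non-browser requests send no Origin header.
    if (!requestOrigin || allowed.has(requestOrigin)) return cb(null, true);
    log.warn(`CORS: rejected origin ${requestOrigin}`);
    return cb(null, false);
  };
}
```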
Code-review feedback: writing last_seen_at = now before loadUser() lets
the stamp persist if the lookup throws (resave:false still writes when
modified), extending the idle window without confirming the user exists.
Stamp last_seen_at only after loadUser() succeeds.
Also clarify DEV_USER_ID is a specific placeholder, not a generic sentinel.
Code-review feedback: ON CONFLICT (id) only catches id collisions; a pre-existing
'dev' username would trigger a unique_violation on the username index and roll
back the migration, hard-failing the mam-api boot. Switch to bare ON CONFLICT
DO NOTHING so any unique conflict is no-op-safe.
CEP's embedded Chromium (used by Premiere Pro panels) does not support
oklch() color syntax. All color tokens were rendering as invalid/transparent,
causing the panel to appear unstyled. Converted all oklch() values to their
precise hex/rgba equivalents via OKLab→sRGB math. No design changes.
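For reference, the OKLab to sRGB conversion can be sketched as below, using
the published OKLab matrices. oklchToHex is a hypothetical helper, not the
tool used for this commit; it clamps out-of-gamut channels naively and
ignores alpha.

```javascript
// Convert an OKLCH color (L in 0..1, C >= 0, hue in degrees) to "#rrggbb".
function oklchToHex(L, C, hDeg) {
  // Polar (C, h) to the OKLab a/b axes.
  const h = (hDeg * Math.PI) / 180;
  const a = C * Math.cos(h);
  const b = C * Math.sin(h);

  // OKLab to nonlinear LMS, then cube to get linear LMS.
  const l = (L + 0.3963377774 * a + 0.2158037573 * b) ** 3;
  const m = (L - 0.1055613458 * a - 0.0638541728 * b) ** 3;
  const s = (L - 0.0894841775 * a - 1.291485548 * b) ** 3;

  // Linear LMS to linear sRGB.
  const linear = [
    4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s,
    -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s,
    -0.0041960863 * l - 0.7034186147 * m + 1.707614701 * s,
  ];

  // Clamp to gamut, apply the sRGB transfer function, and format as hex.
  const channels = linear.map((x) => {
    const c = Math.min(1, Math.max(0, x));
    const encoded = c <= 0.0031308 ? 12.92 * c : 1.055 * c ** (1 / 2.4) - 0.055;
    return Math.round(encoded * 255).toString(16).padStart(2, "0");
  });
  return "#" + channels.join("");
}
```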
- Capture screen now polls /cluster/devices/blackmagic/signal every 3s
- Per-port chips show signal state (RECEIVING/CONNECTING/LOST/ERROR/IDLE) with pulsing dot
- BMD SVG card diagram rendered per node card
- Sidebar nav badge on Capture item shows live/total port count (pulsing green dot)