# Dragonflight User Authentication: Design

**Status:** Approved, ready for implementation planning
**Date:** 2026-05-27
**Brainstormed with:** Zac

## Problem

Dragonflight has the skeleton of an auth system spread across the codebase:

- `users` table (`id`, `username`, `password_hash`, `display_name`, `role`)
- `sessions` table (`sid`, `sess`, `expire`) for `connect-pg-simple`
- `groups`, `user_groups`, `api_tokens` tables
- `SESSION_SECRET` env var
- `AUTH_ENABLED` env flag with boot-log toggle
- PR #26 frontend handler that bounces to `/login.html` on 401
- Issue #94 "session security fixes" deployed 2026-05-26 (commit `3ebe5d6`)

But the actual `express-session` middleware was never mounted in `services/mam-api/src/index.js`. There is no `/api/v1/auth/*` router. There is no `requireAuth` middleware. As a result, when `AUTH_ENABLED=true` was tried:

1. User submits login, server returns 200 OK from a stub endpoint.
2. No `Set-Cookie` is ever sent (no session middleware mounted).
3. The next request to a protected route returns 401.
4. Frontend bounces to `/login.html`.
5. **Infinite redirect loop.**

The prior attempts failed because auth was being built reactively in pieces, with no single source of truth for what "logged in" means.

## Goals

- One coherent, readable auth code path.
- Web UI logins survive page reloads and container restarts.
- Premiere panel can authenticate via long-lived bearer tokens.
- First-run setup works on a fresh install with no env var or CLI gymnastics.
- The whole auth flow can be exercised by automated tests, including a regression test for the redirect-loop failure mode.

## Non-goals (v1)

- MFA / TOTP.
- OAuth / OIDC delegation (Forgejo, Google, etc.).
- Per-project or per-recorder permissions. Flat access: logged in = full access.
- Email-based "forgot password" (no SMTP assumed; admin-reset only).
- Audit log of who-did-what (the `last_login_at` column is the minimum).
- Service-to-service auth for `node-agent`, which keeps the existing `019-node-token-binding` mechanism.

## Decisions

| Decision | Choice | Reasoning |
|---|---|---|
| Client surface | Web UI + Premiere panel | Two transports (cookies + bearer), one identity backend |
| Permission model | Flat (logged in = full access) | Small homogeneous operator population. `groups` / `user_groups` schemas stay inert. |
| Identity provider | Local username/password | On-prem broadcast operators won't tolerate OIDC roundtrips. Matches existing schema. |
| First-user bootstrap | First-run setup page | Hardest to mis-configure. No env vars to leak. No CLI to remember. |
| Session lifetime | 8h absolute + 1h sliding idle | Operator security posture, tighter than typical SaaS. |
| Auth library | Hand-rolled (`express-session` + `connect-pg-simple`) | Explicit, debuggable. Rejected JWT and Passport for this codebase. |

## Architecture

### Single source of truth

"Logged in" means exactly one of two things:

1. The request carries a valid `dragonflight.sid` cookie whose row in `sessions` hasn't expired and isn't past its 1h-idle or 8h-absolute window, OR
2. The request carries `Authorization: Bearer <token>` and the token's SHA-256 hash matches an `api_tokens` row that hasn't been revoked or expired.

Nothing else counts. No `localStorage` flags, no JWT, no client-side "I think I'm logged in" hints.

### One middleware, one check

`services/mam-api/src/middleware/auth.js` exposes a single `requireAuth` function:

```js
// pool, loadUser, destroyAnd401, parseBearer, sha256hex and DEV_USER are
// assumed to be defined or imported elsewhere in this module.
export async function requireAuth(req, res, next) {
  // Dev mode preserved. The 'dev' user is a real row in `users` seeded at
  // boot when AUTH_ENABLED !== 'true', so FK-bearing routes (api_tokens,
  // future comments, audit fields) keep working without conditional logic.
  if (process.env.AUTH_ENABLED !== 'true') {
    req.user = DEV_USER; // { id: <fixed UUID>, username: 'dev' }
    return next();
  }

  // 1. Session check
  if (req.session?.user_id) {
    const now = Date.now();
    if (now - req.session.first_seen_at > 8 * 3600 * 1000) return destroyAnd401(req, res);
    if (now - req.session.last_seen_at > 1 * 3600 * 1000) return destroyAnd401(req, res);
    req.session.last_seen_at = now;
    req.user = await loadUser(req.session.user_id);
    if (!req.user) return destroyAnd401(req, res);
    return next();
  }

  // 2. Bearer check
  const bearer = parseBearer(req.headers.authorization);
  if (bearer) {
    const hash = sha256hex(bearer);
    const row = await pool.query(
      `SELECT t.id, t.user_id, t.expires_at, u.username
         FROM api_tokens t JOIN users u ON u.id = t.user_id
        WHERE t.token_hash = $1`,
      [hash]);
    if (row.rows.length && (!row.rows[0].expires_at || row.rows[0].expires_at > new Date())) {
      // Fire-and-forget: a failed last_used_at update must not fail the request.
      pool.query(`UPDATE api_tokens SET last_used_at = NOW() WHERE id = $1`, [row.rows[0].id]).catch(() => {});
      req.user = { id: row.rows[0].user_id, username: row.rows[0].username };
      return next();
    }
  }

  // 3. Otherwise
  return res.status(401).json({ error: 'unauthorized' });
}
```

Mounted at the `/api/v1` level in `services/mam-api/src/index.js`, **before** the individual route mounts, with an allowlist for the three pre-login auth paths:

```js
app.use('/api/v1', (req, res, next) => {
  // req.path is relative to the '/api/v1' mount point.
  const unauth = ['/auth/login', '/auth/setup', '/auth/setup-required'];
  if (unauth.some(p => req.path === p)) return next();
  return requireAuth(req, res, next);
});
// then: app.use('/api/v1/assets', assetsRouter), etc.
```

`/health` lives at the root, outside the `/api/v1` mount, so it is unaffected.

`/api/v1/cluster/*` sits behind the same gate. Its routes still perform their existing `019-node-token-binding` check on request bodies, but `requireAuth` runs first and returns 401 for any unauthenticated request. Under this layout node-agent traffic must therefore also carry a valid user session or an api_token, which is a behavior change: node-agent would need to be issued an api_token at install time.
Alternative: carve `/api/v1/cluster/*` out of the `requireAuth` gate too, and keep node-agent on its existing binding token alone. The implementer should pick one; the choice is flagged in the implementation order.

### Session middleware (actually wired this time)

In `services/mam-api/src/index.js`, **before any route**:

```js
import session from 'express-session';
import connectPgSimple from 'connect-pg-simple';

const PgStore = connectPgSimple(session);

if (process.env.TRUST_PROXY === 'true') app.set('trust proxy', 1);

app.use(session({
  store: new PgStore({ pool, tableName: 'sessions', pruneSessionInterval: 60 * 15 }),
  secret: process.env.SESSION_SECRET,
  name: 'dragonflight.sid',
  cookie: {
    httpOnly: true,
    sameSite: 'lax',
    secure: process.env.TRUST_PROXY === 'true',
    path: '/',
    maxAge: 8 * 3600 * 1000,
  },
  rolling: false, // sliding renewal handled in requireAuth so we can enforce idle + absolute separately
  resave: false,
  saveUninitialized: false,
}));
```

### Auth router

`services/mam-api/src/routes/auth.js`:

| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/api/v1/auth/setup-required` | none | `{ required: bool }`. Cheap, no auth. |
| `POST` | `/api/v1/auth/setup` | none | Only succeeds if `users` is empty. Creates first user, logs them in. |
| `POST` | `/api/v1/auth/login` | none | `{ username, password }` -> 200 + cookie or 401 |
| `POST` | `/api/v1/auth/logout` | required | Destroys session row, clears cookie |
| `GET` | `/api/v1/auth/me` | required | `{ id, username, display_name }` |
| `POST` | `/api/v1/auth/password` | required | Change own password (requires current) |
| `GET/POST/DELETE` | `/api/v1/auth/users[/:id]` | required | User CRUD |
| `GET/POST/DELETE` | `/api/v1/auth/tokens[/:id]` | required | Current user's API tokens |

### Data model

Existing schema is almost right.
One small migration:

```sql
-- services/mam-api/src/db/migrations/023-auth-session-timestamps.sql
ALTER TABLE users ADD COLUMN IF NOT EXISTS password_updated_at TIMESTAMPTZ DEFAULT NOW();
ALTER TABLE users ADD COLUMN IF NOT EXISTS last_login_at TIMESTAMPTZ;
-- idle / absolute timestamps live inside sessions.sess JSONB; no schema change needed
```

`groups` and `user_groups` stay as-is, unused for v1. `api_tokens` is already correctly shaped.

## Flows

### Browser login (the one that broke last time)

1. SPA boots, `<AuthGate>` calls `GET /api/v1/auth/me`.
2. `requireAuth` returns 401.
3. AuthGate calls `GET /api/v1/auth/setup-required`. If `true`, render Setup screen. Otherwise, render Login screen.
4. User submits `POST /api/v1/auth/login`. The server verifies the password with `bcrypt.compare`, then sets `req.session.user_id`, `first_seen_at`, and `last_seen_at`. **Critical:** `await new Promise((resolve, reject) => req.session.save(err => (err ? reject(err) : resolve())))` before responding, so the session row is persisted to Postgres before the next request can arrive and a failed save surfaces as an error instead of a 200.
5. AuthGate re-calls `/api/v1/auth/me`, gets 200, renders the app.

**Why this doesn't loop:** awaiting the `req.session.save()` callback before responding guarantees the session row exists before the SPA can fire its next request. `requireAuth` returns a clean 401 (not a redirect), so the SPA decides what to render. The static `/login.html` is deleted; there is no HTML bounce.

### Premiere panel bearer

1. Web UI -> Settings -> API Tokens -> "New token" named "Premiere panel".
2. `POST /api/v1/auth/tokens` returns `{ token: 'dfl_<32 hex>', prefix: 'dfl_a3f2', id }` **exactly once**.
3. Premiere panel sends `Authorization: Bearer dfl_<...>` on every request. `requireAuth` SHA-256s it, looks up `api_tokens.token_hash`, updates `last_used_at`.
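
The token steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the codebase's implementation: `issueToken` and the `tokenHash` field name are hypothetical, `sha256hex` and `parseBearer` are stand-ins for the helpers `requireAuth` refers to, and the prefix is assumed to be the first eight characters of the token (matching the `dfl_a3f2` example). It uses `require` so it runs as a standalone script; the service itself uses ESM imports.

```javascript
// Hypothetical helpers; names and shapes are illustrative assumptions.
const { createHash, randomBytes } = require('node:crypto');

// Hex-encoded SHA-256 of the raw token. Only this hash would be stored
// in api_tokens.token_hash; the raw token is never persisted.
function sha256hex(value) {
  return createHash('sha256').update(value, 'utf8').digest('hex');
}

// Mint a token in the documented shape: 'dfl_' followed by 32 hex
// characters (16 random bytes).
function issueToken() {
  const token = 'dfl_' + randomBytes(16).toString('hex');
  return {
    token,                       // shown to the user exactly once
    prefix: token.slice(0, 8),   // assumed display prefix, e.g. 'dfl_a3f2'
    tokenHash: sha256hex(token), // value to persist
  };
}

// Extract the token from an Authorization header value.
// Returns null when the header is missing or not a Bearer credential.
function parseBearer(header) {
  const match = /^Bearer\s+(\S+)$/i.exec(header ?? '');
  return match ? match[1] : null;
}

// Round trip: the hash of the presented token equals the stored hash.
const issued = issueToken();
const presented = parseBearer(`Bearer ${issued.token}`);
console.log(issued.prefix, sha256hex(presented) === issued.tokenHash);
```

Because only the hash is stored, a leaked `api_tokens` table does not directly reveal usable tokens, and the lookup in `requireAuth` stays a single indexed equality match on `token_hash`.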
### Idle + absolute timeout (inside `requireAuth`)

```
if session present:
  if now - session.first_seen_at > 8h -> destroy session, 401
  if now - session.last_seen_at > 1h -> destroy session, 401
  session.last_seen_at = now
  req.user = lookup(session.user_id)
  next()
```

Bearer tokens have their own optional `expires_at` (`NULL` = never expires); checked the same way.

## Frontend

- **`services/web-ui/src/auth-gate.jsx`**: new component that wraps the SPA. On mount: `GET /me`. On 401: check `setup-required`, render either Setup or Login. On 200: render the app shell.
- **Login screen**: layout B from the brainstorm, with a 22px wordmark over the "WILD DRAGON BROADCAST" tagline, above a `--bg-1` card containing username, password, and a "Sign in" button. Matches DESIGN.md tokens.
- **Setup screen**: same chrome; fields = username, password, confirm password; button = "Create admin".
- **Settings -> Account section**: change password.
- **Settings -> API Tokens section**: list / create / revoke. New token shown exactly once with a copy affordance.
- **Fetch wrapper**: the central `ZAMPP_API.fetch` (already exists) gains a 401 handler that re-mounts AuthGate's Login state with the current path saved as `last_path`, restored after re-auth.

### Removed

- The static `/login.html` page (PR #26's bounce target) is deleted. SPA handles login internally; no full-page reload.

## Error handling

| Case | Behavior |
|---|---|
| Wrong username or password | `401 { error: 'invalid credentials' }`. Same message either way, no user enumeration. |
| Login rate limiting | Per-IP exponential backoff (1s, 2s, 4s, 8s, max 30s). In-memory `Map`. Single-instance limitation documented. |
| Idle / absolute expiry | 401 -> AuthGate Login. Last path saved, restored on re-auth. |
| Setup after first user exists | `409 { error: 'setup already complete' }`. Permanently disabled. |
| Token revoke | `DELETE /api/v1/auth/tokens/:id`; only the owner can revoke. Subsequent bearer requests 401. |
| Delete-self when only user | `409 { error: 'cannot delete last user' }`. |
| Forgot password | No self-serve. Any logged-in user can reset another via `POST /api/v1/auth/users/:id/password`. Documented as the recovery path. |
| Password rules | Min 12 chars, no max, no character class requirements (NIST SP 800-63B). `bcrypt` cost 12. |
| CSRF | `SameSite=Lax` + same origin + required `X-Requested-With: dragonflight-ui` header on mutating requests (belt-and-suspenders). |
| Session table growth | `connect-pg-simple` `pruneSessionInterval: 60 * 15` (every 15 min). |

## Testing

- **Unit (`services/mam-api/test/middleware/auth.test.js`):** `requireAuth` with (a) no creds, (b) valid session, (c) idle-expired session, (d) absolute-expired session, (e) valid bearer, (f) invalid bearer, (g) bearer matching a deleted user.
- **Integration (`services/mam-api/test/auth.integration.test.js`):** spin up Express + test Postgres. Walks: setup -> login -> /me -> mutating call -> logout -> /me 401. Second pass: idle timeout simulated by mutating `last_seen_at` in DB. Third pass: bearer issue -> use -> revoke -> 401.
- **Regression test for the redirect-loop bug:** explicit test that after `POST /auth/login` returns 200, a subsequent `GET /auth/me` with the returned cookie returns 200 in the same test client. This is the test that would have caught the original failure.
- **Manual smoke (documented in PR):** fresh install -> setup -> create admin -> land on dashboard -> reload (stays logged in) -> wait 1h idle -> reload -> bounce to login.

## Implementation order

Suggested sequencing for the implementation plan (writing-plans will refine):

1. Migration `023-auth-session-timestamps.sql`. Add idempotent seed of the dev user (`INSERT ... ON CONFLICT DO NOTHING` with a fixed UUID) so dev mode FK-bearing routes work out of the box.
2. `express-session` + `connect-pg-simple` wiring in `index.js`.
3. `requireAuth` middleware (with `DEV_USER` constant resolved from the seeded row).
4. Auth router (setup, login, logout, me, password).
5. Apply `requireAuth` to API router with allowlist. Decide cluster carve-out (see Architecture).
6. Auth tests (unit + integration + regression).
7. Frontend `<AuthGate>` + Login screen + Setup screen.
8. Frontend Settings -> Account + API Tokens.
9. Delete `/login.html`.
10. User CRUD + token CRUD routes.
11. Rate limiting + CSRF header.
12. Documentation: README updates, `AUTH_ENABLED` transition notes.

## Out-of-band notes for the implementer

- The current `cors({ origin: true, credentials: true })` in `index.js` is too permissive once cookies start carrying authority. Tighten to a specific origin list (driven by an `ALLOWED_ORIGINS` env var) at the same time as wiring the session middleware; otherwise we're undoing the `SameSite=Lax` protection from the other side.
- node-agent -> mam-api traffic on `/api/v1/cluster/*` must keep working. Add a route-level carve-out comment that this path uses the existing `019-node-token-binding` token, not the user-auth path.
- The boot log currently says `Authentication: ENABLED` / `DISABLED (set AUTH_ENABLED=true for production)`. Once this lands, the recommended default flips: `AUTH_ENABLED=true` becomes the documented default in `.env.example` and the README, and `AUTH_ENABLED=false` is documented as a dev-only escape hatch.
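
For the CORS note above, one possible shape for the origin allowlist is sketched below. It assumes `ALLOWED_ORIGINS` is a comma-separated list of exact origins, which the design does not specify, and the helper names `parseAllowedOrigins` and `makeOriginCheck` are hypothetical. The returned function follows the `(origin, callback)` form that the `cors` package accepts for its `origin` option. The example origins are placeholders.

```javascript
// Parse a comma-separated ALLOWED_ORIGINS value into a Set of exact
// origins. The comma-separated format is an assumption for illustration.
function parseAllowedOrigins(raw) {
  return new Set(
    (raw ?? '')
      .split(',')
      .map((origin) => origin.trim())
      .filter(Boolean)
  );
}

// Build an origin check in the (origin, callback) shape accepted by the
// `cors` package's `origin` option.
function makeOriginCheck(allowed) {
  return (origin, callback) => {
    // Assumption: requests with no Origin header (curl, same-origin
    // navigations) are allowed through and still face requireAuth.
    if (!origin) return callback(null, true);
    // Exact-match only; no wildcard or suffix matching.
    callback(null, allowed.has(origin));
  };
}

// Placeholder origins for illustration.
const allowed = parseAllowedOrigins('https://mam.example.test, http://localhost:5173');
const check = makeOriginCheck(allowed);

check('http://localhost:5173', (err, ok) => console.log('localhost allowed:', ok));
check('https://evil.example.test', (err, ok) => console.log('unknown allowed:', ok));

// Intended wiring (not executed here):
//   app.use(cors({ origin: check, credentials: true }));
```

Exact matching keeps the rule easy to audit; an empty or unset `ALLOWED_ORIGINS` yields an empty set, so every cross-origin request with an `Origin` header is refused by default.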