docs(auth): spec for user auth system, brainstormed 2026-05-27
parent ad9e1ef5f1
commit 183e10f8e6
1 changed files with 256 additions and 0 deletions
256
docs/superpowers/specs/2026-05-27-auth-system-design.md
Normal file
@@ -0,0 +1,256 @@
# Dragonflight User Authentication — Design

**Status:** Approved, ready for implementation planning
**Date:** 2026-05-27
**Brainstormed with:** Zac

## Problem

Dragonflight has the skeleton of an auth system spread across the codebase:

- `users` table (`id`, `username`, `password_hash`, `display_name`, `role`)
- `sessions` table (`sid`, `sess`, `expire`) for `connect-pg-simple`
- `groups`, `user_groups`, `api_tokens` tables
- `SESSION_SECRET` env var
- `AUTH_ENABLED` env flag with boot-log toggle
- PR #26 frontend handler that bounces to `/login.html` on 401
- Issue #94 "session security fixes" deployed 2026-05-26 (commit `3ebe5d6`)

But the actual `express-session` middleware was never mounted in `services/mam-api/src/index.js`. There is no `/api/v1/auth/*` router. There is no `requireAuth` middleware. As a result, when `AUTH_ENABLED=true` was tried:

1. User submits login, server returns 200 OK from a stub endpoint.
2. No `Set-Cookie` is ever sent (no session middleware mounted).
3. The next request to a protected route returns 401.
4. Frontend bounces to `/login.html`.
5. **Infinite redirect loop.**

The prior attempts failed because auth was being built reactively in pieces, with no single source of truth for what "logged in" means.

## Goals

- One coherent, readable auth code path.
- Web UI logins survive page reloads and container restarts.
- Premiere panel can authenticate via long-lived bearer tokens.
- First-run setup works on a fresh install with no env var or CLI gymnastics.
- The whole auth flow can be exercised by automated tests, including a regression test for the redirect-loop failure mode.

## Non-goals (v1)

- MFA / TOTP.
- OAuth / OIDC delegation (Forgejo, Google, etc.).
- Per-project or per-recorder permissions. Flat access: logged in = full access.
- Email-based "forgot password" (no SMTP assumed; admin-reset only).
- Audit log of who-did-what (the `last_login_at` column is the only trace kept in v1).
- Service-to-service auth for `node-agent`, which keeps the existing `019-node-token-binding` mechanism.

## Decisions

| Decision | Choice | Reasoning |
|---|---|---|
| Client surface | Web UI + Premiere panel | Two transports (cookies + bearer), one identity backend |
| Permission model | Flat (logged in = full access) | Small homogeneous operator population. `groups` / `user_groups` schemas stay inert. |
| Identity provider | Local username/password | On-prem broadcast operators won't tolerate OIDC roundtrips. Matches existing schema. |
| First-user bootstrap | First-run setup page | Hardest to mis-configure. No env vars to leak. No CLI to remember. |
| Session lifetime | 8h absolute + 1h sliding idle | Operator security posture, tighter than typical SaaS. |
| Auth library | Hand-rolled (`express-session` + `connect-pg-simple`) | Explicit, debuggable. Rejected JWT and Passport for this codebase. |

## Architecture

### Single source of truth

"Logged in" means exactly one of two things:

1. The request carries a valid `dragonflight.sid` cookie whose row in `sessions` hasn't expired and isn't past its 1h-idle or 8h-absolute window, OR
2. The request carries `Authorization: Bearer <token>` whose SHA-256 matches an `api_tokens` row that hasn't been revoked or expired.

Nothing else counts. No `localStorage` flags, no JWT, no client-side "I think I'm logged in" hints.

### One middleware, one check

`services/mam-api/src/middleware/auth.js` exposes a single `requireAuth` function:

```js
// Assumed to be defined or imported elsewhere in this module:
// pool, loadUser, parseBearer, sha256hex, destroyAnd401.
export async function requireAuth(req, res, next) {
  // Dev mode preserved
  if (process.env.AUTH_ENABLED !== 'true') {
    req.user = { id: 'dev', username: 'dev' };
    return next();
  }

  // 1. Session check
  if (req.session?.user_id) {
    const now = Date.now();
    if (now - req.session.first_seen_at > 8 * 3600 * 1000) return destroyAnd401(req, res);
    if (now - req.session.last_seen_at > 1 * 3600 * 1000) return destroyAnd401(req, res);
    req.session.last_seen_at = now;
    req.user = await loadUser(req.session.user_id);
    if (!req.user) return destroyAnd401(req, res);
    return next();
  }

  // 2. Bearer check
  const bearer = parseBearer(req.headers.authorization);
  if (bearer) {
    const hash = sha256hex(bearer);
    const row = await pool.query(
      `SELECT t.id, t.user_id, t.expires_at, u.username
       FROM api_tokens t JOIN users u ON u.id = t.user_id
       WHERE t.token_hash = $1`, [hash]);
    if (row.rows.length && (!row.rows[0].expires_at || row.rows[0].expires_at > new Date())) {
      // Fire-and-forget: a failed last_used_at update must not fail the request.
      pool.query(`UPDATE api_tokens SET last_used_at = NOW() WHERE id = $1`, [row.rows[0].id]).catch(() => {});
      req.user = { id: row.rows[0].user_id, username: row.rows[0].username };
      return next();
    }
  }

  // 3. Otherwise
  return res.status(401).json({ error: 'unauthorized' });
}
```
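
`requireAuth` calls `parseBearer` and `destroyAnd401`, which this spec names but does not define. The following is a minimal sketch of what they might look like, not the agreed implementation. Both bodies are assumptions. The only library calls used are `req.session.destroy(cb)` from `express-session` and `res.clearCookie` / `res.status` / `res.json` from Express.

```js
// Hypothetical helper: extract the token from "Authorization: Bearer <token>".
// Returns null for a missing header, a different scheme, or an empty token.
function parseBearer(header) {
  if (typeof header !== 'string') return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}

// Hypothetical helper: drop the server-side session, clear the cookie, answer 401.
// The cookie name matches the `name` option given to the session middleware.
function destroyAnd401(req, res) {
  req.session.destroy(() => {
    res.clearCookie('dragonflight.sid', { path: '/' });
    res.status(401).json({ error: 'unauthorized' });
  });
}
```

Keeping `parseBearer` strict (one scheme, one token, nothing else) means a malformed header falls through to the plain 401 at the end of `requireAuth` rather than reaching the database.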

`requireAuth` is mounted at the `/api/v1` level in `services/mam-api/src/index.js`, with an allowlist for `/api/v1/auth/login`, `/api/v1/auth/setup`, `/api/v1/auth/setup-required`, and `/health`.
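
One way the allowlist could be expressed is a thin wrapper around `requireAuth`, sketched below. The wrapper name `authUnlessPublic` and the exact-match strategy are assumptions, not part of the spec; only the four paths come from the text above.

```js
// Paths that must stay reachable without credentials (taken from the spec).
const PUBLIC_PATHS = new Set([
  '/api/v1/auth/login',
  '/api/v1/auth/setup',
  '/api/v1/auth/setup-required',
  '/health',
]);

// Hypothetical wrapper: skip `requireAuth` for allowlisted paths, enforce it elsewhere.
function authUnlessPublic(requireAuth) {
  return (req, res, next) => {
    // originalUrl keeps the mount prefix; strip the query string before matching.
    const path = req.originalUrl.split('?')[0];
    if (PUBLIC_PATHS.has(path)) return next();
    return requireAuth(req, res, next);
  };
}

// Assumed usage in index.js:
//   app.use('/api/v1', authUnlessPublic(requireAuth));
```

Exact matching fails closed: any path variant that is not literally in the set (a trailing slash, an extra segment) goes through `requireAuth`.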

### Session middleware (actually wired this time)

In `services/mam-api/src/index.js`, **before any route**:

```js
import session from 'express-session';
import connectPgSimple from 'connect-pg-simple';
const PgStore = connectPgSimple(session);

if (process.env.TRUST_PROXY === 'true') app.set('trust proxy', 1);

app.use(session({
  store: new PgStore({ pool, tableName: 'sessions', pruneSessionInterval: 60 * 15 }),
  secret: process.env.SESSION_SECRET,
  name: 'dragonflight.sid',
  cookie: {
    httpOnly: true,
    sameSite: 'lax',
    secure: process.env.TRUST_PROXY === 'true',
    path: '/',
    maxAge: 8 * 3600 * 1000,
  },
  rolling: false, // sliding renewal handled in requireAuth so we can enforce idle + absolute separately
  resave: false,
  saveUninitialized: false,
}));
```

### Auth router

`services/mam-api/src/routes/auth.js`:

| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/api/v1/auth/setup-required` | none | `{ required: bool }`. Cheap, no auth. |
| `POST` | `/api/v1/auth/setup` | none | Only succeeds if `users` is empty. Creates first user, logs them in. |
| `POST` | `/api/v1/auth/login` | none | `{ username, password }` -> 200 + cookie or 401 |
| `POST` | `/api/v1/auth/logout` | required | Destroys session row, clears cookie |
| `GET` | `/api/v1/auth/me` | required | `{ id, username, display_name }` |
| `POST` | `/api/v1/auth/password` | required | Change own password (requires current) |
| `GET/POST/DELETE` | `/api/v1/auth/users[/:id]` | required | User CRUD |
| `GET/POST/DELETE` | `/api/v1/auth/tokens[/:id]` | required | Current user's API tokens |

### Data model

Existing schema is almost right. One small migration:

```sql
-- services/mam-api/src/db/migrations/023-auth-session-timestamps.sql
ALTER TABLE users ADD COLUMN IF NOT EXISTS password_updated_at TIMESTAMPTZ DEFAULT NOW();
ALTER TABLE users ADD COLUMN IF NOT EXISTS last_login_at TIMESTAMPTZ;
-- idle / absolute timestamps live inside session.sess JSONB; no schema change needed
```

`groups` and `user_groups` stay as-is, unused for v1. `api_tokens` is already correctly shaped.

## Flows

### Browser login (the one that broke last time)

1. SPA boots, `<AuthGate>` calls `GET /api/v1/auth/me`.
2. `requireAuth` returns 401.
3. AuthGate calls `GET /api/v1/auth/setup-required`. If `true`, render Setup screen. Otherwise, render Login screen.
4. User submits `POST /api/v1/auth/login`. Server `bcrypt.compare`s, sets `req.session.user_id`, `first_seen_at`, `last_seen_at`. **Critical:** `await new Promise((resolve, reject) => req.session.save(err => (err ? reject(err) : resolve())))` before responding, so the session row is persisted to Postgres before the next request can arrive and a failed save surfaces as an error instead of a silent 200.
5. AuthGate re-calls `/api/v1/auth/me`, gets 200, renders the app.

**Why this doesn't loop:** the explicit `req.session.save()` callback before response guarantees the session row exists before the SPA can fire its next request. `requireAuth` returns a clean 401 (not a redirect) so the SPA decides what to render. The static `/login.html` is deleted; there is no HTML bounce.
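
Step 4 is the part that matters for the loop, so here is a hedged sketch of a login handler that saves before it responds. The factory `makeLoginHandler`, the injected `findUserByUsername` / `verifyPassword` functions (stand-ins for the `users` lookup and `bcrypt.compare`), and the shape of the 200 response body are all assumptions made to keep the example self-contained.

```js
// Promisified req.session.save that rejects on store errors instead of swallowing them.
function saveSession(session) {
  return new Promise((resolve, reject) => {
    session.save((err) => (err ? reject(err) : resolve()));
  });
}

// Hypothetical factory: dependencies are injected so the handler can be tested
// without Postgres or bcrypt.
function makeLoginHandler({ findUserByUsername, verifyPassword }) {
  return async function login(req, res) {
    const { username, password } = req.body ?? {};
    const user = username ? await findUserByUsername(username) : null;
    const ok = user ? await verifyPassword(password ?? '', user.password_hash) : false;
    // Same response for unknown user and wrong password: no user enumeration.
    if (!ok) return res.status(401).json({ error: 'invalid credentials' });

    const now = Date.now();
    req.session.user_id = user.id;
    req.session.first_seen_at = now;
    req.session.last_seen_at = now;

    // Persist the session row before answering, so the follow-up GET /me finds it.
    await saveSession(req.session);
    return res.status(200).json({ id: user.id, username: user.username });
  };
}
```

A production version would likely also run a dummy hash comparison for unknown usernames, so that response timing does not reveal which usernames exist.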

### Premiere panel bearer

1. Web UI -> Settings -> API Tokens -> "New token" named "Premiere panel".
2. `POST /api/v1/auth/tokens` returns `{ token: 'dfl_<32 hex>', prefix: 'dfl_a3f2', id }` **exactly once**.
3. Premiere panel sends `Authorization: Bearer dfl_<...>` on every request. `requireAuth` SHA-256s it, looks up `api_tokens.token_hash`, updates `last_used_at`.
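
The token shape in step 2 and the hashing in step 3 can be sketched with Node's built-in `crypto` module. The function name `issueToken` and the returned field names are assumptions, and the 8-character prefix is inferred from the `dfl_a3f2` example. The block uses `require` so it runs standalone; in the ESM codebase this would be an `import`.

```js
const { createHash, randomBytes } = require('node:crypto');

// SHA-256 of the raw token as lowercase hex; this is what would sit in api_tokens.token_hash.
function sha256hex(value) {
  return createHash('sha256').update(value, 'utf8').digest('hex');
}

// Hypothetical helper: mint a token in the `dfl_<32 hex>` shape from the spec.
function issueToken() {
  const token = 'dfl_' + randomBytes(16).toString('hex'); // 16 bytes -> 32 hex chars
  return {
    token,                       // returned to the caller exactly once, never stored
    prefix: token.slice(0, 8),   // e.g. "dfl_a3f2", safe to display in the token list
    tokenHash: sha256hex(token), // the only form persisted server-side
  };
}
```

Because the token carries 128 bits of randomness, a plain unsalted SHA-256 is a reasonable lookup key here, unlike for user-chosen passwords, which stay on `bcrypt`.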

### Idle + absolute timeout (inside `requireAuth`)

```
if session present:
  if now - session.first_seen_at > 8h -> destroy session, 401
  if now - session.last_seen_at  > 1h -> destroy session, 401
  session.last_seen_at = now
  req.user = lookup(session.user_id)
  next()
```

Bearer tokens have their own optional `expires_at` (`NULL` = never expires), which the bearer branch of `requireAuth` checks on every request.
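
The two window checks can be pulled into a pure function, which makes the idle and absolute cases easy to unit-test with fixed timestamps. This is a sketch under assumptions: the name `sessionExpiry` and its return values are hypothetical. It also adds one guard the pseudocode does not have. If either timestamp is missing, `now - undefined` is `NaN`, every comparison is false, and the session would never expire, so the sketch treats such a session as malformed.

```js
const ABSOLUTE_LIMIT_MS = 8 * 3600 * 1000; // 8h since login
const IDLE_LIMIT_MS = 1 * 3600 * 1000;     // 1h since the last authenticated request

// Returns 'malformed', 'absolute', 'idle', or null (session still valid).
function sessionExpiry(session, now = Date.now()) {
  // Guard (an addition to the spec's pseudocode): missing timestamps must not pass.
  if (!Number.isFinite(session.first_seen_at) || !Number.isFinite(session.last_seen_at)) {
    return 'malformed';
  }
  if (now - session.first_seen_at > ABSOLUTE_LIMIT_MS) return 'absolute';
  if (now - session.last_seen_at > IDLE_LIMIT_MS) return 'idle';
  return null;
}
```

Inside `requireAuth`, any non-null result would map to `destroyAnd401(req, res)`.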

## Frontend

- **`services/web-ui/src/auth-gate.jsx`:** new component that wraps the SPA. On mount: `GET /api/v1/auth/me`. On 401: check `/api/v1/auth/setup-required`, render either Setup or Login. On 200: render the app shell.
- **Login screen:** layout B from brainstorm: 22px wordmark over "WILD DRAGON BROADCAST" tagline above a `--bg-1` card containing username, password, "Sign in" button. Matches DESIGN.md tokens.
- **Setup screen:** same chrome; fields = username, password, confirm password; button = "Create admin".
- **Settings -> Account section:** change password.
- **Settings -> API Tokens section:** list / create / revoke. New token shown exactly once with a copy affordance.
- **Fetch wrapper:** the central `ZAMPP_API.fetch` (already exists) gains a 401 handler that re-mounts AuthGate's Login state with the current path saved as `last_path`, restored after re-auth.
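
The 401 handling described for the fetch wrapper might look like the sketch below. How `ZAMPP_API.fetch` is built is not shown in this spec, so the wrapper takes everything by injection: `withAuthRedirect`, `fetchImpl`, `getCurrentPath`, and `onUnauthorized` are all hypothetical names.

```js
// Hypothetical wrapper factory around whatever fetch implementation the SPA uses.
function withAuthRedirect(fetchImpl, { getCurrentPath, onUnauthorized }) {
  return async function authedFetch(url, options) {
    const response = await fetchImpl(url, options);
    if (response.status === 401) {
      // Remember where the user was so AuthGate can restore it after re-auth.
      onUnauthorized({ last_path: getCurrentPath() });
    }
    // The response is still returned, so callers can handle the 401 themselves too.
    return response;
  };
}

// Assumed usage in the browser:
//   const apiFetch = withAuthRedirect(window.fetch.bind(window), {
//     getCurrentPath: () => window.location.pathname,
//     onUnauthorized: (state) => authGate.showLogin(state), // authGate is hypothetical
//   });
```

The wrapper only reports the 401; it does not navigate. That keeps the decision about what to render inside AuthGate, which is the property that prevents the old redirect loop.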

### Removed

- The static `/login.html` page (PR #26's bounce target) is deleted. SPA handles login internally; no full-page reload.

## Error handling

| Case | Behavior |
|---|---|
| Wrong username or password | `401 { error: 'invalid credentials' }`. Same message either way, no user enumeration. |
| Login rate limiting | Per-IP exponential backoff (1s, 2s, 4s, 8s, max 30s). In-memory `Map`. Single-instance limitation documented. |
| Idle / absolute expiry | 401 -> AuthGate Login. Last path saved, restored on re-auth. |
| Setup after first user exists | `409 { error: 'setup already complete' }`. Permanently disabled. |
| Token revoke | `DELETE /api/v1/auth/tokens/:id`; only the owner can revoke. Subsequent bearer requests with that token return 401. |
| Delete-self when only user | `409 { error: 'cannot delete last user' }`. |
| Forgot password | No self-serve. Any logged-in user can reset another via `POST /api/v1/auth/users/:id/password`. Documented as the recovery path. |
| Password rules | Min 12 chars, no max, no character class requirements (NIST SP 800-63B). `bcrypt` cost 12. Note that `bcrypt` only hashes the first 72 bytes of input. |
| CSRF | `SameSite=Lax` + same origin + required `X-Requested-With: dragonflight-ui` header on mutating requests (belt-and-suspenders). |
| Session table growth | `connect-pg-simple` `pruneSessionInterval: 60 * 15` (every 15 min). |
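
The rate-limiting row can be illustrated with a small in-memory limiter. This is a sketch, not the agreed design. The function names are hypothetical, and the step between 8s and the 30s cap (16s) is an assumption based on continued doubling, since the table only lists the first four delays and the maximum.

```js
const MAX_DELAY_MS = 30000;
const failures = new Map(); // ip -> { count, blockedUntil }

// Delay after the n-th consecutive failure: 1s, 2s, 4s, 8s, 16s, then capped at 30s.
function backoffMs(count) {
  return Math.min(1000 * 2 ** (count - 1), MAX_DELAY_MS);
}

// How long this IP must still wait before another attempt is accepted (0 = go ahead).
function retryAfterMs(ip, now = Date.now()) {
  const entry = failures.get(ip);
  return entry ? Math.max(0, entry.blockedUntil - now) : 0;
}

// Call after a failed login attempt.
function recordFailure(ip, now = Date.now()) {
  const count = (failures.get(ip)?.count ?? 0) + 1;
  failures.set(ip, { count, blockedUntil: now + backoffMs(count) });
}

// Call after a successful login: clears the IP's history.
function recordSuccess(ip) {
  failures.delete(ip);
}
```

The `Map` lives in one process, which is the single-instance limitation the table mentions: a second `mam-api` instance would keep its own counters.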

## Testing

- **Unit (`services/mam-api/test/middleware/auth.test.js`):** `requireAuth` with (a) no creds, (b) valid session, (c) idle-expired session, (d) absolute-expired session, (e) valid bearer, (f) invalid bearer, (g) bearer matching a deleted user.
- **Integration (`services/mam-api/test/auth.integration.test.js`):** spin up Express + test Postgres. Walks: setup -> login -> /me -> mutating call -> logout -> /me 401. Second pass: idle timeout simulated by mutating `last_seen_at` in DB. Third pass: bearer issue -> use -> revoke -> 401.
- **Regression test for the redirect-loop bug:** explicit test that after `POST /api/v1/auth/login` returns 200, a subsequent `GET /api/v1/auth/me` with the returned cookie returns 200 in the same test client. This is the test that would have caught the original failure.
- **Manual smoke (documented in PR):** fresh install -> setup -> create admin -> land on dashboard -> reload (stays logged in) -> wait 1h idle -> reload -> bounce to login.

## Implementation order

Suggested sequencing for the implementation plan (writing-plans will refine):

1. Migration `023-auth-session-timestamps.sql`.
2. `express-session` + `connect-pg-simple` wiring in `index.js`.
3. `requireAuth` middleware.
4. Auth router (setup, login, logout, me, password).
5. Apply `requireAuth` to API router with allowlist.
6. Auth tests (unit + integration + regression).
7. Frontend `<AuthGate>` + Login screen + Setup screen.
8. Frontend Settings -> Account + API Tokens.
9. Delete `/login.html`.
10. User CRUD + token CRUD routes.
11. Rate limiting + CSRF header.
12. Documentation: README updates, `AUTH_ENABLED` transition notes.

## Out-of-band notes for the implementer

- The current `cors({ origin: true, credentials: true })` in `index.js` is too permissive once cookies start carrying authority. Tighten to a specific origin list (driven by an `ALLOWED_ORIGINS` env var) at the same time as wiring the session middleware; otherwise we're undoing the `SameSite=Lax` protection from the other side.
- node-agent -> mam-api traffic on `/api/v1/cluster/*` must keep working. Add a route-level carve-out comment that this path uses the existing `019-node-token-binding` token, not the user-auth path.
- The boot log currently says `Authentication: ENABLED` / `DISABLED (set AUTH_ENABLED=true for production)`. Once this lands, the recommended default flips: `AUTH_ENABLED=true` becomes the documented default in `.env.example` and the README, and `AUTH_ENABLED=false` is documented as a dev-only escape hatch.
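
For the CORS tightening in the first note, one possible shape is a small parser that turns `ALLOWED_ORIGINS` into the array form that the `cors` package's `origin` option accepts. The comma-separated format and the function name `parseAllowedOrigins` are assumptions; the spec only names the env var.

```js
// Hypothetical parser for a comma-separated ALLOWED_ORIGINS value,
// e.g. "https://mam.example.lan, https://mam.example.lan:8443".
function parseAllowedOrigins(raw) {
  return (raw ?? '')
    .split(',')
    .map((origin) => origin.trim().replace(/\/+$/, '')) // origins carry no trailing slash
    .filter(Boolean);
}

// Assumed usage in index.js:
//   app.use(cors({
//     origin: parseAllowedOrigins(process.env.ALLOWED_ORIGINS),
//     credentials: true,
//   }));
```

An unset or empty variable yields an empty list, so no cross-origin request is allowed by default and the configuration fails closed.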