Adds a paste-URL ingest path under Ingest → YouTube. Worker hosts yt-dlp, downloads to S3, then hands off to the existing proxy + thumbnail pipeline so imported assets share one lifecycle with uploads. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# YouTube Importer: Design Spec

> Status: **design approved 2026-05-23**, awaiting user review of the spec before the implementation plan is written.

## Context

The Ingest group in Dragonflight today covers file Upload, Recorders (SRT/RTMP/SDI), Capture (DeckLink), Monitors, and Schedule. There is no path to bring in media that already lives on the public web. The frequent ask is: "I want to grab a YouTube link and have it become an asset in my project, with the same proxy/thumbnail pipeline as anything else." This spec adds a YouTube importer that mirrors the existing upload flow: paste a URL, pick a project, click Import, and the asset shows up in the Library once the worker is done.

The importer rides on the existing job pipeline. After the download lands in S3, the asset re-enters the same proxy → thumbnail → ready path as a regular upload, so there is no parallel "imported asset" lifecycle to maintain.
## Goals & non-goals

**Goals**

- Paste a public YouTube URL, end up with a `ready` asset in the chosen project.
- Reuse the existing `assets` table, S3 layout, BullMQ pipeline, and Jobs screen, with no parallel state machine.
- Progress visible from both the import screen (queue rows) and the Jobs screen.
- Clear, actionable errors for the obvious failure modes (private, age-gated, removed, geo-blocked, network).

**Non-goals**

- Playlists, channels, or batch-paste of multiple URLs. Single URL per submission. (Easy to add later.)
- Cookies / login. Private, members-only, and age-gated videos are out of scope for v1.
- Quality picker. Always grabs best MP4 (with M4A audio merge fallback).
- Non-YouTube sources (Vimeo, Twitch VODs, Dropbox links, etc.). The route is `/imports/youtube` precisely to leave room for siblings later.
- Auto-update of yt-dlp inside the running container. Updates land via image rebuild.
- Copyright enforcement. We surface a one-line "only import videos you have rights to use" note and stop there.
## Architecture

The importer threads through four existing layers:

```
[web-ui] YouTube screen ──POST /imports/youtube──▶ [mam-api]
                                                       │
                                        assets row (status='ingesting')
                                        jobs row (type='youtube_import')
                                                       │
                                             BullMQ "import" queue
                                                       ▼
                                                   [worker]
                                        yt-dlp download → S3 originals/
                                        ffprobe metadata → assets row
                                        status='processing'
                                                       │
                                             BullMQ "proxy" queue ◀── existing path
                                                       ▼
                                           proxy → thumbnail → ready
```

Once the worker hands off to the `proxy` queue, the asset is indistinguishable from one that came through Upload: same proxy worker, same thumbnail worker, same Library list.
## 1. UX

### Nav

A sixth child is added to the **Ingest** group in `shell.jsx`, between Upload and Recorders:

```js
{ id: "youtube", label: "YouTube", icon: "download" },
```

The `download` glyph already exists in `icons.jsx`. The matching `ingestChildren` array in `shell.jsx` and the crumbs map in `app.jsx` both gain `"youtube"`.

### Screen

A new `YouTubeImport` component lives in `screens-ingest.jsx` and is exported on `window` alongside `Upload`, `Recorders`, etc. It is registered as a route in `app.jsx`.

Layout (visually a sibling of the Upload screen):

- **Header**: title "YouTube", subtitle "Paste a link. We download and import the best available MP4."
- **Project selector**: same `select` element as Upload's, pre-selected to the first project.
- **URL input**: a single-line `field-input` with placeholder "Paste a YouTube URL (youtube.com/watch, youtu.be, or shorts)…" and an inline **Import** button. Enter submits. The button is disabled until a URL pattern matches.
- **Subtitle line under the input**: "Only import videos you have rights to use. Private, age-gated, and members-only videos are not supported."
- **Queue panel**: identical structure to Upload's queue, with one row per submitted URL showing:
  - Source icon (use the `link` glyph) and the URL (truncated in the middle, full URL in the `title` tooltip).
  - Title once known (filled in by a poll on the asset row).
  - Progress bar tied to job `progress` (0–100). The worker drives this from 5 to 60 % for the download and from 60 to 100 % for the upload + DB writes.
  - Status pill: queued → downloading → processing → done / failed.
  - Error text if the job fails (red, one line).
- A "Clear done" button at the top of the queue.

The queue persists for the session in component state only; there is no separate UI table. The Jobs screen remains the canonical history.
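The middle truncation used for the URL cell in each queue row could look roughly like the sketch below. The helper name `truncateMiddle` and the 48-character default are assumptions for illustration, not names or values taken from the existing Upload screen.

```javascript
// Hypothetical helper: shorten long text by cutting out the middle,
// keeping the start and the end visible around a single ellipsis.
function truncateMiddle(text, maxLength = 48) {
  if (text.length <= maxLength) return text;
  const keep = maxLength - 1;          // characters left after the ellipsis
  const head = Math.ceil(keep / 2);    // leading part gets the extra char
  const tail = Math.floor(keep / 2);
  return text.slice(0, head) + '…' + text.slice(text.length - tail);
}
```

The row would render `truncateMiddle(url)` as its visible text and keep the untouched `url` in the `title` attribute, so the tooltip still shows the full link.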
### URL validation (client-side, before POST)

Accept (case-insensitive) any of these patterns:

- `https?://(www\.|m\.)?youtube\.com/watch\?[^ ]*v=[A-Za-z0-9_-]{11}`
- `https?://youtu\.be/[A-Za-z0-9_-]{11}`
- `https?://(www\.)?youtube\.com/shorts/[A-Za-z0-9_-]{11}`

Anything else is rejected inline ("That doesn't look like a YouTube URL") without an API call. The server re-validates as a defense-in-depth check.
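As a minimal sketch, the three patterns translate to JavaScript regex literals as follows. The name `isYouTubeUrl` is hypothetical, and anchoring each pattern at the start of the trimmed input is an assumption; the spec lists the patterns without anchors.

```javascript
// Patterns copied from the spec; the `i` flag makes them case-insensitive.
// Assumption: each pattern is anchored with ^ so the URL must start the input.
const YOUTUBE_URL_PATTERNS = [
  /^https?:\/\/(www\.|m\.)?youtube\.com\/watch\?[^ ]*v=[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/youtu\.be\/[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/(www\.)?youtube\.com\/shorts\/[A-Za-z0-9_-]{11}/i,
];

// Returns true when the trimmed input matches any accepted URL shape.
function isYouTubeUrl(input) {
  const url = String(input).trim();
  return YOUTUBE_URL_PATTERNS.some((pattern) => pattern.test(url));
}
```

The Import button's `disabled` state can be derived directly from `!isYouTubeUrl(value)`, and the same array can be shared with the server-side check so the two never drift apart.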
### Out-of-scope v1 (called out, not built)

- Pasting a playlist URL. Server returns 400 "Playlists aren't supported yet."
- Multi-line paste. Single URL only.
- Quality picker. yt-dlp format string is hard-coded.
- Cookies upload. Private videos fail with a clear message.
## 2. API

### Route

New file `services/mam-api/src/routes/imports.js`, mounted at `/api/v1/imports` in `services/mam-api/src/index.js`.

**`POST /api/v1/imports/youtube`**

Request body:

```json
{ "url": "https://youtu.be/dQw4w9WgXcQ", "projectId": "uuid", "binId": "uuid?" }
```

Behavior:

1. Validate `url` against the same three regexes as the client. 400 on miss.
2. Reject playlist URLs (URL contains `list=`) with 400 "Playlists aren't supported yet."
3. Generate `assetId = uuidv4()`.
4. Insert into `assets` with:
   - `status='ingesting'`
   - `media_type='video'`
   - `filename = url` (placeholder; the worker overwrites it with the sanitized title once yt-dlp prints metadata, which keeps the row queryable in the meantime)
   - `display_name = url` (same; worker overwrites)
   - `original_s3_key = NULL` (worker fills in)
   - `source_url = url` (new column, see Schema)
   - `project_id`, `bin_id`, timestamps.
5. Insert into `jobs` with `type='youtube_import'`, `asset_id`, `payload={ url }`, `status='queued'`, `progress=0`.
6. Enqueue BullMQ job on the `import` queue:

   ```js
   await importQueue.add('youtube', { assetId, url });
   ```

7. Respond `200 { assetId, jobId }`.

Errors:

- Missing fields → 400.
- Bad URL → 400 with `error: 'Invalid YouTube URL'`.
- Playlist URL → 400 with `error: "Playlists aren't supported yet."`.
- Project not found → 404.
- DB / queue failure → 500 (via `next(err)`).
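The validation order and the two row shapes above can be sketched as a pure function, kept free of Express and database calls so it can be exercised in isolation. Everything here is a hedged illustration: `planYouTubeImport`, the `'Missing fields'` message, and the `id` column name are assumptions, and the project lookup (404) and the actual inserts and enqueue are left to the route handler.

```javascript
const { randomUUID } = require('node:crypto');

// Same three shapes the client accepts (anchoring with ^ is an assumption).
const ACCEPTED_URLS = [
  /^https?:\/\/(www\.|m\.)?youtube\.com\/watch\?[^ ]*v=[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/youtu\.be\/[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/(www\.)?youtube\.com\/shorts\/[A-Za-z0-9_-]{11}/i,
];

// Hypothetical helper: returns either an error response or the rows to insert.
function planYouTubeImport(body) {
  const { url, projectId, binId = null } = body ?? {};
  if (typeof url !== 'string' || !url || !projectId) {
    return { status: 400, error: 'Missing fields' }; // message text is assumed
  }
  if (!ACCEPTED_URLS.some((pattern) => pattern.test(url))) {
    return { status: 400, error: 'Invalid YouTube URL' };
  }
  if (url.includes('list=')) {
    return { status: 400, error: "Playlists aren't supported yet." };
  }
  const assetId = randomUUID();
  return {
    status: 200,
    asset: {
      id: assetId,            // primary-key column name is an assumption
      status: 'ingesting',
      media_type: 'video',
      filename: url,          // placeholder until the worker knows the title
      display_name: url,
      original_s3_key: null,
      source_url: url,
      project_id: projectId,
      bin_id: binId,
    },
    job: { type: 'youtube_import', asset_id: assetId, payload: { url }, status: 'queued', progress: 0 },
  };
}
```

Because the regex check runs before the `list=` check, a bare `youtube.com/playlist?list=…` link is reported as an invalid URL, while a watch link that carries a `list=` parameter gets the playlist message.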
### Jobs screen integration

`services/web-ui/public/screens-jobs.jsx` already normalizes job types via a `kindMap`. Add one entry:

```js
const kindMap = { proxy: 'Proxy', thumbnail: 'Thumbnail', conform: 'Conform', transcode: 'Transcode', youtube_import: 'YouTube' };
```

Retry, delete, and the SSE event stream all work for the new type with no further changes because they key off `job.id`, not `job.type`.
## 3. Worker

### Container changes

`services/worker/Dockerfile` gains two packages, `yt-dlp` and `python3`, on its `apk add` line:

```dockerfile
RUN apk add --no-cache ffmpeg yt-dlp python3
```

`yt-dlp` is in the Alpine `community` repo and pulls `python3` as a runtime dep; we list it explicitly for clarity. Image grows by ~25 MB.

### New worker

`services/worker/src/workers/youtube-import.js`, registered in `services/worker/src/index.js`:

```js
const workers = [
  createWorker('proxy', proxyWorker),
  createWorker('thumbnail', thumbnailWorker),
  createWorker('conform', conformWorker),
  createWorker('import', youtubeImportWorker),
];
```
### Job handler

For a job with `{ assetId, url }`:

1. `job.updateProgress(2)` (accepted).
2. Build a temp directory `tmpdir()/yt-${jobId}`.
3. Run yt-dlp:

   ```sh
   yt-dlp \
     --no-playlist \
     --no-warnings \
     --restrict-filenames \
     -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]/b" \
     --merge-output-format mp4 \
     --print-json \
     --newline \
     -o "<tmpdir>/<assetId>.%(ext)s" \
     "<url>"
   ```

   - `--print-json` writes one JSON line at the end with title, duration, width, height, uploader, etc.
   - `--newline` makes progress lines newline-terminated so we can parse them.
   - `--restrict-filenames` prevents shell-special characters in temp paths.
4. Stream stdout line-by-line. Lines matching `/\[download\]\s+(\d+(\.\d+)?)%/` map to `job.updateProgress(5 + Math.floor(pct * 0.55))`, so the download takes us from 5 to 60 %.
5. On yt-dlp non-zero exit: parse stderr for the first line containing `ERROR:` and use it as the job's error message. Mark the asset `status='error'`, mark the job failed, and throw so BullMQ records it. Surface a friendly substitution for the common cases:
   - "Private video" → "Private video: not supported."
   - "Sign in to confirm your age" → "Age-restricted video: not supported."
   - "Video unavailable" → "Video unavailable or removed."
   - "This video is not available in your country" → "Video is geo-blocked from this region."
   - HTTP 429 → "YouTube rate-limited the importer. Try again later."
   - Anything else → use yt-dlp's stderr line verbatim, truncated to 300 chars.
6. Parse the last stdout line as JSON to read metadata. The resulting file is `<tmpdir>/<assetId>.mp4`.
7. Run `getMediaInfo` (existing helper in `services/worker/src/ffmpeg/executor.js`) on that path. Use ffprobe's values for codec/fps/duration when yt-dlp's are missing or wrong.
8. Sanitize the title for the S3 filename: keep `[A-Za-z0-9 ._-]`, collapse runs of whitespace, trim, and cap at 120 chars. The stored filename is the sanitized title plus `.mp4`. If the sanitized title is empty, fall back to `youtube-<videoId>.mp4`.
9. Upload to `originals/{assetId}/{sanitized-title}.mp4` via the existing `uploadToS3` helper. Progress 60 → 90 %.
10. UPDATE the assets row with:
    - `filename = <sanitized title>.mp4`
    - `display_name = <yt-dlp title untouched>`
    - `original_s3_key = originals/<assetId>/<sanitized-title>.mp4`
    - `codec`, `resolution`, `fps`, `duration_ms`, `file_size` from ffprobe.
    - `status = 'processing'`
    - `updated_at = NOW()`
11. Enqueue a `proxy` job on the existing `proxy` queue with the same payload shape `upload.js` uses:

    ```js
    await proxyQueue.add('generate', {
      assetId,
      inputKey: asset.original_s3_key,
      outputKey: `proxies/${assetId}.mp4`,
    });
    ```

12. `job.updateProgress(100)`. Return, and BullMQ marks the import job done. The proxy job picks up the rest exactly like a regular upload.
13. Always `rm -rf` the temp directory in a `finally`.
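One way to assemble the yt-dlp invocation from step 3 is as an argument array rather than a shell string, so the URL is never interpreted by a shell. This is a sketch under assumptions: `buildYtDlpArgs` is a hypothetical helper name, and the `--` separator before the URL is an added precaution that is not part of the command shown in the spec.

```javascript
const path = require('node:path');

// Hypothetical helper: returns the argv for the step 3 yt-dlp command.
function buildYtDlpArgs({ url, assetId, tmpDir }) {
  return [
    '--no-playlist',
    '--no-warnings',
    '--restrict-filenames',
    '-f', 'bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]/b',
    '--merge-output-format', 'mp4',
    '--print-json',
    '--newline',
    // yt-dlp expands %(ext)s itself; Node passes it through untouched.
    '-o', path.join(tmpDir, `${assetId}.%(ext)s`),
    '--', // assumption: end-of-options marker so the URL is never read as a flag
    url,
  ];
}
```

The array would be handed to `child_process.spawn('yt-dlp', args)`, which passes each element as a separate argument without shell quoting.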
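The progress mapping in step 4 is simple arithmetic: a download percentage of 0 maps to 5 and 100 maps to 5 + 55 = 60. A minimal sketch follows, with `downloadProgress` as a hypothetical name and the clamp to 100 added as a defensive assumption.

```javascript
// Matches yt-dlp progress lines such as "[download]  45.3% of 10.00MiB ...".
const DOWNLOAD_PROGRESS = /\[download\]\s+(\d+(?:\.\d+)?)%/;

// Returns the job progress (5..60) for a progress line, or null for any
// other line (destination notices, merge messages, the final JSON line).
function downloadProgress(line) {
  const match = DOWNLOAD_PROGRESS.exec(line);
  if (!match) return null;
  const pct = Math.min(100, parseFloat(match[1])); // clamp is an assumption
  return 5 + Math.floor(pct * 0.55);
}
```

In the handler, each line read from the child process would go through `downloadProgress`, and only non-null results would be forwarded to `job.updateProgress`.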
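The error substitutions in step 5 amount to a first-match-wins lookup over the first `ERROR:` line of stderr. The sketch below is illustrative only: `friendlyYtDlpError` is a hypothetical name, the match for HTTP 429 assumes the stderr line contains the text `HTTP Error 429`, and the table order simply follows the list above.

```javascript
// [pattern, friendly message] pairs, checked in order; first match wins.
const FRIENDLY_ERRORS = [
  [/Private video/i, 'Private video: not supported.'],
  [/Sign in to confirm your age/i, 'Age-restricted video: not supported.'],
  [/Video unavailable/i, 'Video unavailable or removed.'],
  [/not available in your country/i, 'Video is geo-blocked from this region.'],
  [/HTTP Error 429/i, 'YouTube rate-limited the importer. Try again later.'],
];

// Picks the first stderr line containing "ERROR:" and maps it to a message.
function friendlyYtDlpError(stderr) {
  const lines = String(stderr).split(/\r?\n/);
  const errorLine = lines.find((line) => line.includes('ERROR:')) ?? String(stderr).trim();
  for (const [pattern, message] of FRIENDLY_ERRORS) {
    if (pattern.test(errorLine)) return message;
  }
  return errorLine.slice(0, 300); // unknown errors: verbatim, capped at 300 chars
}
```

Keeping the table as data makes it cheap to add a new substitution when yt-dlp changes its wording, without touching the handler's control flow.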
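The title sanitization in step 8 can be sketched as a short chain of string operations. `sanitizeTitle` is a hypothetical name, and normalizing tabs and newlines to spaces before filtering is an assumption made so that words separated by them do not run together.

```javascript
// Returns the S3-safe filename for a video title, including the .mp4 suffix.
function sanitizeTitle(title, videoId) {
  const cleaned = String(title)
    .replace(/\s+/g, ' ')               // tabs/newlines become single spaces
    .replace(/[^A-Za-z0-9 ._-]/g, '')   // keep only the allowed characters
    .replace(/ +/g, ' ')                // collapse gaps left by removed chars
    .trim()
    .slice(0, 120)                      // cap the stem at 120 chars
    .trim();                            // the cut may end on a space
  return cleaned ? `${cleaned}.mp4` : `youtube-${videoId}.mp4`;
}
```

For example, a title made up entirely of non-ASCII characters sanitizes to an empty string and therefore falls back to the `youtube-<videoId>.mp4` form, while `display_name` still carries the untouched title.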
### Concurrency & retries

- Default BullMQ concurrency for this queue: **1** per worker process. Two simultaneous yt-dlp invocations risk YouTube rate-limiting more than they help throughput. Configurable later via env if needed.
- No automatic BullMQ retry. yt-dlp failures are almost always permanent (private, geo, removed) and a silent retry storm would chew through quota. The Jobs screen's manual Retry button is the right knob for "this should be transient" cases.
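If the concurrency does become configurable, one possible shape is sketched below. The environment variable name `IMPORT_CONCURRENCY` and both constant names are hypothetical; `concurrency` (a BullMQ worker option) and `attempts` (a BullMQ job option) are the two settings the bullets above describe.

```javascript
// Hypothetical: read the import concurrency from the environment,
// falling back to 1 for anything missing, non-numeric, or below 1.
function importConcurrency(env = process.env) {
  const parsed = Number.parseInt(env.IMPORT_CONCURRENCY ?? '', 10);
  return Number.isInteger(parsed) && parsed >= 1 ? parsed : 1;
}

// Would be passed as the options argument when the 'import' worker is created.
const importWorkerOptions = { concurrency: importConcurrency() };

// Would be passed as the third argument of importQueue.add(...):
// a single attempt, so a failed download is never retried automatically.
const importJobOptions = { attempts: 1 };
```

Whether `createWorker` in `services/worker/src/index.js` accepts an options object is not stated in this spec, so the wiring is left open here.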
## 4. Schema migration

New file `services/mam-api/src/db/migrations/011-youtube-import.sql`:

```sql
-- 1. Add the new job type to the enum.
--    Postgres requires ALTER TYPE ... ADD VALUE for enum changes.
ALTER TYPE job_type ADD VALUE IF NOT EXISTS 'youtube_import';

-- 2. Remember where an asset came from. NULL for everything that
--    pre-dates the importer; populated for any imported asset.
ALTER TABLE assets ADD COLUMN IF NOT EXISTS source_url TEXT;
```

`source_url` is exposed on the asset drawer as a "Source" line ("imported from youtu.be/…") in a follow-up PR. That is out of scope for this spec, but worth noting that the column exists for it.
## 5. Files touched

**New**

- `services/mam-api/src/routes/imports.js`
- `services/mam-api/src/db/migrations/011-youtube-import.sql`
- `services/worker/src/workers/youtube-import.js`

**Edited**

- `services/mam-api/src/index.js`: mount the new route.
- `services/web-ui/public/screens-ingest.jsx`: add `YouTubeImport`, export on `window`.
- `services/web-ui/public/shell.jsx`: add the nav child, extend `ingestChildren`.
- `services/web-ui/public/app.jsx`: register the route and the crumb.
- `services/web-ui/public/screens-jobs.jsx`: extend `kindMap` with `youtube_import: 'YouTube'`.
- `services/worker/src/index.js`: register the `import` queue worker.
- `services/worker/Dockerfile`: add `yt-dlp` and `python3` to the apk install line.
## 6. Risks & trade-offs

- **Worker egress**. The worker container needs outbound HTTPS to YouTube. Fine in the current homelab; will fail in a locked-down cluster. Documented in the implementation plan.
- **yt-dlp drift**. YouTube changes break old yt-dlp versions every few weeks. The Alpine package lags upstream by days. The fix is to rebuild the worker image. We do not auto-update inside the running container, as that is too risky for an offline / locked-down deploy. If imports start failing en masse, the runbook is `docker compose build worker && docker compose up -d worker`.
- **Single-URL UX feels light**. That is deliberate for v1. Multi-URL paste and playlist expansion are both small follow-ups once the single-URL path is stable.
- **No copyright enforcement**. We rely on the one-line notice in the UI. If misuse becomes a real concern, the next step would be an admin allowlist of domains or a per-user import quota, neither of which is in this spec.
- **`filename = url` placeholder**. Briefly, the asset row in the Library shows the URL as the name. The worker overwrites it within seconds for a successful import. Acceptable; the Library already handles "ingesting" assets with placeholder names from the upload path.
## 7. Acceptance

The feature is done when:

- A user can navigate to **Ingest → YouTube**, paste a public YouTube URL, pick a project, click Import, and within a minute or two see the asset appear in the Library with proxy and thumbnail.
- A failed import (private video, removed video, bogus URL) shows a clear error message on both the YouTube screen's queue row and the Jobs screen.
- The Jobs screen lists "YouTube" jobs alongside Proxy / Thumbnail / Conform, with Retry working.
- `source_url` is populated on the imported asset row.
- Image rebuild + `docker compose up -d worker` is the documented recovery path if YouTube changes break yt-dlp.