Adds a paste-URL ingest path under Ingest → YouTube. Worker hosts yt-dlp, downloads to S3, then hands off to the existing proxy + thumbnail pipeline so imported assets share one lifecycle with uploads.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
YouTube Importer — Design Spec
Status: design approved 2026-05-23, awaiting user review of the spec before the implementation plan is written.
Context
The Ingest group in Dragonflight today covers file Upload, Recorders (SRT/RTMP/SDI), Capture (DeckLink), Monitors, and Schedule. There is no path to bring in media that already lives on the public web. The frequent ask is: "I want to grab a YouTube link and have it become an asset in my project, with the same proxy/thumbnail pipeline as anything else." This spec adds a YouTube importer that mirrors the existing upload flow: paste a URL, pick a project, click Import, and the asset shows up in the Library once the worker is done.
The importer rides on the existing job pipeline. After the download lands in S3, the asset re-enters the same proxy → thumbnail → ready path as a regular upload, so there is no parallel "imported asset" lifecycle to maintain.
Goals & non-goals
Goals
- Paste a public YouTube URL and end up with a ready asset in the chosen project.
- Reuse the existing assets table, S3 layout, BullMQ pipeline, and Jobs screen, with no parallel state machine.
- Progress visible from both the import screen (queue rows) and the Jobs screen.
- Clear, actionable errors for the obvious failure modes (private, age-gated, removed, geo-blocked, network).
Non-goals
- Playlists, channels, or batch-paste of multiple URLs. Single URL per submission. (Easy to add later.)
- Cookies / login. Private, members-only, and age-gated videos are out of scope v1.
- Quality picker. Always grabs the best MP4 video merged with the best M4A audio, falling back to the best single-file MP4 and then to the best available format.
- Non-YouTube sources (Vimeo, Twitch VODs, Dropbox links, etc.). The route is /imports/youtube precisely to leave room for siblings later.
- Auto-update of yt-dlp inside the running container. Updates land via image rebuild.
- Copyright enforcement. We surface a one-line "only import videos you have rights to use" note and stop there.
Architecture
The importer threads through four existing layers:
[web-ui] YouTube screen ──POST /imports/youtube──▶ [mam-api]
│
assets row (status='ingesting')
jobs row (type='youtube_import')
│
BullMQ "import" queue
▼
[worker]
yt-dlp download → S3 originals/
ffprobe metadata → assets row
status='processing'
│
BullMQ "proxy" queue ◀── existing path
▼
proxy → thumbnail → ready
Once the worker hands off to the proxy queue, the asset is indistinguishable from one that came through Upload — same proxy worker, same thumbnail worker, same Library list.
1. UX
Nav
A 6th child is added to the Ingest group in shell.jsx, between Upload and Recorders:
{ id: "youtube", label: "YouTube", icon: "download" },
The download glyph already exists in icons.jsx. The matching ingestChildren array in shell.jsx and the crumbs map in app.jsx both gain "youtube".
Screen
A new YouTubeImport component lives in screens-ingest.jsx and is exported on window alongside Upload, Recorders, etc. It is registered as a route in app.jsx.
Layout — visually a sibling of the Upload screen:
- Header: title "YouTube", subtitle "Paste a link — we download and import the best available MP4."
- Project selector: same select element as Upload's, pre-selected to the first project.
- URL input: a single-line field-input with placeholder "Paste a YouTube URL (youtube.com/watch, youtu.be, or shorts)…" and an inline Import button. Enter submits. The button is disabled until a URL pattern matches.
- Subtitle line under the input: "Only import videos you have rights to use. Private, age-gated, and members-only videos are not supported."
- Queue panel: identical structure to Upload's queue, with one row per submitted URL showing:
  - Source icon (use the link glyph) and the URL (truncated in the middle, with the full URL in the title tooltip).
  - Title once known (filled in by a poll on the asset row).
  - Progress bar tied to job progress (0–100). The worker drives this from 5 to 60 % for download and from 60 to 100 % for upload + DB writes.
  - Status pill: queued → downloading → processing → done / failed.
  - Error text if the job fails (red, one line).
- A "Clear done" button at the top of the queue.
The queue persists for the session in component state only — no separate UI table. Jobs screen remains the canonical history.
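A minimal sketch of two helpers the queue panel needs: middle truncation for the URL cell, and a mapping from the polled job and asset rows to the status pill. The helper names and the job.status values (queued, active, completed, failed) are assumptions; the asset statuses (ingesting, processing, ready, error) come from this spec. In this sketch the pill reads "downloading" for the whole active import job, including the S3 upload.

```javascript
// Hypothetical helpers for the YouTubeImport queue panel.

// Shortens a long URL by cutting out its middle so both ends stay visible.
// The full URL still goes in the row's title tooltip.
function truncateMiddle(text, maxLength = 48) {
  if (text.length <= maxLength) return text;
  const keep = maxLength - 1; // one character is reserved for the ellipsis
  const head = Math.ceil(keep / 2);
  const tail = Math.floor(keep / 2);
  return text.slice(0, head) + "…" + text.slice(text.length - tail);
}

// Derives the status pill from the polled rows.
// Assumed job.status values: queued | active | completed | failed.
// asset.status values from the spec: ingesting | processing | ready | error.
function queueRowStatus(job, asset) {
  if (job.status === "failed" || (asset && asset.status === "error")) return "failed";
  if (asset && asset.status === "ready") return "done";
  if (asset && asset.status === "processing") return "processing"; // proxy/thumbnail stage
  if (job.status === "queued") return "queued";
  return "downloading"; // active import job: download and S3 upload
}
```

In the row this would render as something like `<span title={url}>{truncateMiddle(url)}</span>`.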
URL validation (client-side, before POST)
Accept (case-insensitive) any of these patterns:
- https?://(www\.|m\.)?youtube\.com/watch\?[^ ]*v=[A-Za-z0-9_-]{11}
- https?://youtu\.be/[A-Za-z0-9_-]{11}
- https?://(www\.)?youtube\.com/shorts/[A-Za-z0-9_-]{11}
Anything else is rejected inline ("That doesn't look like a YouTube URL") without an API call. The server re-validates as a defense-in-depth check.
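The client-side check could look like this sketch, which applies the three patterns with the case-insensitive flag. Anchoring each pattern with ^ is an assumption added here, so that a YouTube link embedded in another URL's query string is rejected; the function and constant names are hypothetical.

```javascript
// The three accepted shapes from the spec, anchored at the start (assumption)
// and matched case-insensitively.
const YOUTUBE_URL_PATTERNS = [
  /^https?:\/\/(www\.|m\.)?youtube\.com\/watch\?[^ ]*v=[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/youtu\.be\/[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/(www\.)?youtube\.com\/shorts\/[A-Za-z0-9_-]{11}/i,
];

// True when the pasted text (surrounding whitespace ignored) matches a pattern.
function isYouTubeUrl(input) {
  const url = String(input).trim();
  return YOUTUBE_URL_PATTERNS.some((re) => re.test(url));
}

// Inline error text for the input, or null when Import may be enabled.
function validateYouTubeInput(input) {
  return isYouTubeUrl(input) ? null : "That doesn't look like a YouTube URL";
}
```

The Import button's disabled state would then be `validateYouTubeInput(value) !== null`.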
Out-of-scope v1 (called out, not built)
- Pasting a playlist URL. Server returns 400 "Playlists aren't supported yet."
- Multi-line paste. Single URL only.
- Quality picker. yt-dlp format string is hard-coded.
- Cookies upload. Private videos fail with a clear message.
2. API
Route
New file services/mam-api/src/routes/imports.js, mounted at /api/v1/imports in services/mam-api/src/index.js.
POST /api/v1/imports/youtube
Request body:
{ "url": "https://youtu.be/dQw4w9WgXcQ", "projectId": "uuid", "binId": "uuid?" }
Behavior:
- Validate url against the same three regexes as the client. 400 on miss.
- Reject playlist URLs (URL contains list=) with 400 "Playlists aren't supported yet."
- Look up projectId; 404 if the project does not exist.
- Generate assetId = uuidv4().
- Insert into assets with:
  - status='ingesting'
  - media_type='video'
  - filename = url (placeholder; the worker overwrites it with the sanitized title when it updates the assets row after the upload, and the placeholder keeps the row queryable in the meantime)
  - display_name = url (same; worker overwrites)
  - original_s3_key = NULL (worker fills in)
  - source_url = url (new column; see Schema migration)
  - project_id, bin_id, timestamps.
- Insert into jobs with type='youtube_import', asset_id, payload={ url }, status='queued', progress=0.
- Enqueue a BullMQ job on the import queue: await importQueue.add('youtube', { assetId, url });
- Respond 200 { assetId, jobId }.
Errors:
- Missing fields → 400.
- Bad URL → 400 with error: 'Invalid YouTube URL'.
- Playlist URL → 400 with error: 'Playlists aren't supported yet'.
- Project not found → 404.
- DB / queue failure → 500 (next(err)).
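A sketch of the route handler under stated assumptions: db.query(sql, params) resolving to { rows } in the node-postgres style, a projects table with an id column, and an injected importQueue exposing BullMQ's add(name, data). Column names beyond those listed above (id, created_at, updated_at) are guesses, and the factory is hypothetical so the handler can be exercised without Postgres or Redis. One deliberate difference from the listed order: the list= check runs before the URL patterns, so a bare youtube.com/playlist?list=… link gets the playlist message instead of the generic one.

```javascript
const { randomUUID } = require("node:crypto");

// Same patterns as the client; re-checked server-side as defense in depth.
const YOUTUBE_URL_PATTERNS = [
  /^https?:\/\/(www\.|m\.)?youtube\.com\/watch\?[^ ]*v=[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/youtu\.be\/[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/(www\.)?youtube\.com\/shorts\/[A-Za-z0-9_-]{11}/i,
];

// Hypothetical factory: dependencies are injected (db, importQueue, id generator).
function makeYouTubeImportHandler({ db, importQueue, newId = randomUUID }) {
  return async function postYouTubeImport(req, res, next) {
    try {
      const { url, projectId, binId = null } = req.body || {};
      if (typeof url !== "string" || !url || !projectId) {
        return res.status(400).json({ error: "url and projectId are required" });
      }
      if (url.includes("list=")) {
        return res.status(400).json({ error: "Playlists aren't supported yet" });
      }
      if (!YOUTUBE_URL_PATTERNS.some((re) => re.test(url))) {
        return res.status(400).json({ error: "Invalid YouTube URL" });
      }
      const project = await db.query("SELECT id FROM projects WHERE id = $1", [projectId]);
      if (project.rows.length === 0) {
        return res.status(404).json({ error: "Project not found" });
      }
      const assetId = newId(); // randomUUID() produces a v4 UUID
      // filename, display_name and source_url all start as the URL ($4).
      await db.query(
        `INSERT INTO assets (id, project_id, bin_id, status, media_type, filename,
                             display_name, original_s3_key, source_url, created_at, updated_at)
         VALUES ($1, $2, $3, 'ingesting', 'video', $4, $4, NULL, $4, NOW(), NOW())`,
        [assetId, projectId, binId, url]
      );
      const job = await db.query(
        `INSERT INTO jobs (type, asset_id, payload, status, progress)
         VALUES ('youtube_import', $1, $2, 'queued', 0) RETURNING id`,
        [assetId, JSON.stringify({ url })]
      );
      await importQueue.add("youtube", { assetId, url });
      return res.status(200).json({ assetId, jobId: job.rows[0].id });
    } catch (err) {
      return next(err); // DB / queue failure → 500 via the error middleware
    }
  };
}
```

In imports.js this would be mounted as something like router.post('/youtube', makeYouTubeImportHandler({ db, importQueue })). The two INSERTs are not wrapped in a transaction here; doing so, and cleaning up if importQueue.add throws, would avoid leaving an orphaned ingesting asset.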
Jobs screen integration
services/web-ui/public/screens-jobs.jsx already normalizes job types via a kindMap. Add one entry:
const kindMap = { proxy: 'Proxy', thumbnail: 'Thumbnail', conform: 'Conform', transcode: 'Transcode', youtube_import: 'YouTube' };
Retry, delete, and the SSE event stream all work for the new type with no further changes because they key off job.id, not job.type.
3. Worker
Container changes
services/worker/Dockerfile gains two packages:
RUN apk add --no-cache ffmpeg yt-dlp python3
yt-dlp is in the Alpine community repo and pulls python3 as a runtime dep — we list it explicitly for clarity. Image grows by ~25 MB.
New worker
services/worker/src/workers/youtube-import.js, registered in services/worker/src/index.js:
const workers = [
createWorker('proxy', proxyWorker),
createWorker('thumbnail', thumbnailWorker),
createWorker('conform', conformWorker),
createWorker('import', youtubeImportWorker),
];
Job handler
For a job with { assetId, url }:
1. job.updateProgress(2): accepted.
2. Build a temp directory tmpdir()/yt-${jobId}.
3. Run yt-dlp:

       yt-dlp \
         --no-playlist \
         --no-warnings \
         --restrict-filenames \
         -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]/b" \
         --merge-output-format mp4 \
         --print-json \
         --newline \
         -o "<tmpdir>/<assetId>.%(ext)s" \
         "<url>"

   - --print-json writes one JSON line at the end with title, duration, width, height, uploader, etc.
   - --newline makes progress lines newline-terminated so we can parse them.
   - --restrict-filenames prevents shell-special characters in temp paths.
4. Stream stdout line by line. Lines matching /\[download\]\s+(\d+(\.\d+)?)%/ map to job.updateProgress(5 + Math.floor(pct * 0.55)), so download takes us from 5 to 60 %.
5. On a non-zero yt-dlp exit: take the first stderr line containing ERROR: as the job's error message. Mark the asset status='error', mark the job failed, and throw so BullMQ records it. Surface a friendly substitution for the common cases:
   - "Private video" → "Private video — not supported."
   - "Sign in to confirm your age" → "Age-restricted video — not supported."
   - "Video unavailable" → "Video unavailable or removed."
   - "This video is not available in your country" → "Video is geo-blocked from this region."
   - HTTP 429 → "YouTube rate-limited the importer — try again later."
   - Anything else → use yt-dlp's ERROR: line verbatim, truncated to 300 chars.
6. Parse the last stdout line as JSON to read metadata. The resulting file is <tmpdir>/<assetId>.mp4 whenever one of the MP4 formats was picked; the final /b fallback can in principle produce another container such as WebM, which is worth handling (for example with yt-dlp's --remux-video mp4).
7. Run getMediaInfo (existing helper in services/worker/src/ffmpeg/executor.js) on that path. Use ffprobe's values for codec/fps/duration when yt-dlp's are missing or wrong.
8. Sanitize the title for the S3 filename: keep [A-Za-z0-9 ._-], collapse runs of whitespace, trim, cap at 120 chars, append .mp4. If the sanitized title is empty, fall back to youtube-<videoId>.mp4.
9. Upload to originals/{assetId}/{sanitized-title}.mp4 via the existing uploadToS3 helper. Progress 60 → 90 %.
10. UPDATE the assets row with:
    - filename = <sanitized title>.mp4
    - display_name = <yt-dlp title untouched>
    - original_s3_key = originals/<assetId>/<sanitized-title>.mp4
    - codec, resolution, fps, duration_ms, file_size from ffprobe
    - status = 'processing'
    - updated_at = NOW()
11. Enqueue a proxy job on the existing proxy queue with the same payload shape upload.js uses:

        await proxyQueue.add('generate', {
          assetId,
          inputKey: asset.original_s3_key,
          outputKey: `proxies/${assetId}.mp4`,
        });

12. job.updateProgress(100). Return; BullMQ marks the import job done. The proxy job picks up the rest exactly like a regular upload.
13. Always rm -rf the temp directory in a finally.
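Steps 3 and 4 could be sketched with Node's child_process.spawn and readline, passing the arguments as an array so the URL never goes through a shell. The function names are hypothetical. The progress tracker only reports increases, because when yt-dlp merges separate video and audio downloads its percentage restarts at 0 for the second stream. One detail worth confirming against the pinned yt-dlp version: its documentation lists --print-json as shorthand for -j --no-simulate, and -j implies quiet mode, so the [download] lines may only appear with the additional --progress flag (and it is worth checking which stream they are written to).

```javascript
const { spawn } = require("node:child_process");
const readline = require("node:readline");
const path = require("node:path");

// Step 3 arguments, as an array for spawn() (no shell interpolation).
function buildYtDlpArgs({ url, tmpDir, assetId }) {
  return [
    "--no-playlist",
    "--no-warnings",
    "--restrict-filenames",
    "-f", "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]/b",
    "--merge-output-format", "mp4",
    "--print-json",
    "--newline",
    "-o", path.join(tmpDir, `${assetId}.%(ext)s`),
    url,
  ];
}

// Spawns a command, calls onLine for each stdout line as it arrives, and
// buffers stderr for error reporting. Resolves after the process exits and
// its stdio streams close, so every stdout line has been delivered by then.
function runStreamingLines(command, args, onLine) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] });
    const stdoutLines = [];
    let stderr = "";
    child.stderr.setEncoding("utf8");
    child.stderr.on("data", (chunk) => { stderr += chunk; });
    readline.createInterface({ input: child.stdout }).on("line", (line) => {
      stdoutLines.push(line);
      onLine(line);
    });
    child.on("error", reject); // e.g. yt-dlp missing from the image
    child.on("close", (code) => resolve({ code, stdoutLines, stderr }));
  });
}

// Step 4: "[download]  42.5% of ..." maps onto the 5-60 % band of the job.
const DOWNLOAD_PCT = /\[download\]\s+(\d+(?:\.\d+)?)%/;

function downloadToJobProgress(pct) {
  const clamped = Math.min(Math.max(pct, 0), 100);
  return 5 + Math.floor(clamped * 0.55);
}

// Returns a line handler that calls report() only when job progress rises.
function makeProgressTracker(report) {
  let last = 0;
  return function onLine(line) {
    const match = DOWNLOAD_PCT.exec(line);
    if (!match) return;
    const next = downloadToJobProgress(Number(match[1]));
    if (next > last) {
      last = next;
      report(next);
    }
  };
}
```

Wired into the handler, this would look roughly like runStreamingLines('yt-dlp', buildYtDlpArgs({ url, tmpDir, assetId }), makeProgressTracker((p) => job.updateProgress(p))), with tmpDir = path.join(os.tmpdir(), `yt-${job.id}`).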
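Steps 6 and 8 reduce to two small pure functions, sketched here with hypothetical names. Rather than trusting the final stdout line, the metadata parser scans backwards for the last line that parses as a JSON object, which tolerates trailing progress or merger messages. The sanitizer normalizes whitespace before dropping disallowed characters (an ordering choice the spec does not state), so tabs and newlines in a title become spaces instead of joining adjacent words.

```javascript
// Step 6: the metadata object is the last stdout line that parses as JSON.
function lastJsonObject(lines) {
  for (let i = lines.length - 1; i >= 0; i--) {
    const text = lines[i].trim();
    if (!text.startsWith("{")) continue;
    try {
      return JSON.parse(text);
    } catch {
      // Looked like JSON but was not; keep scanning upwards.
    }
  }
  return null;
}

// Step 8: S3 filename from the yt-dlp title, with the youtube-<videoId> fallback.
function sanitizeTitleForS3(title, videoId) {
  const base = String(title ?? "")
    .replace(/\s+/g, " ")             // tabs/newlines become single spaces first
    .replace(/[^A-Za-z0-9 ._-]/g, "") // keep only the allowed characters
    .replace(/ {2,}/g, " ")           // removals can leave double spaces behind
    .trim()
    .slice(0, 120)
    .trim();                          // the 120-char cut may end on a space
  return base ? `${base}.mp4` : `youtube-${videoId}.mp4`;
}
```

For example, a title made entirely of non-Latin characters sanitizes to an empty string, so the key falls back to youtube-<videoId>.mp4 while display_name keeps the untouched title.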
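The friendly substitutions in step 5 could be an ordered table of patterns, as in this sketch. yt-dlp's exact error wording varies between versions and extractors, so the patterns are loosened slightly ("available in your country" also catches YouTube's "The uploader has not made this video available in your country" wording), and the geo-block pattern is checked before the broader "Video unavailable". Matching "HTTP Error 429" for rate limiting, and falling back to the trimmed stderr when no ERROR: line exists, are assumptions.

```javascript
// Ordered (pattern, message) pairs; the first match wins.
const FRIENDLY_ERRORS = [
  [/Private video/i, "Private video — not supported."],
  [/Sign in to confirm your age/i, "Age-restricted video — not supported."],
  [/available in your country/i, "Video is geo-blocked from this region."],
  [/HTTP Error 429/i, "YouTube rate-limited the importer — try again later."],
  [/Video unavailable/i, "Video unavailable or removed."],
];

// Maps yt-dlp's stderr to the job's error message.
function friendlyYtDlpError(stderr) {
  const lines = String(stderr).split(/\r?\n/);
  const errorLine = lines.find((line) => line.includes("ERROR:"));
  // Assumption: without an ERROR: line, fall back to the whole trimmed stderr.
  const raw = (errorLine ?? String(stderr)).trim() || "yt-dlp failed";
  for (const [pattern, message] of FRIENDLY_ERRORS) {
    if (pattern.test(raw)) return message;
  }
  return raw.slice(0, 300); // verbatim ERROR: line, capped at 300 chars
}
```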
Concurrency & retries
- Default BullMQ concurrency for this queue: 1 per worker process. Two simultaneous yt-dlp invocations risk YouTube rate-limiting more than they help throughput. Configurable later via env if needed.
- No automatic BullMQ retry — yt-dlp failures are almost always permanent (private, geo, removed) and a silent retry storm would chew through quota. The Jobs screen's manual Retry button is the right knob for "this should be transient" cases.
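The concurrency and retry policy maps onto two BullMQ settings: the Worker option concurrency and the job option attempts. The helper below, and the YT_IMPORT_CONCURRENCY environment variable it reads, are hypothetical; it only shows how the "configurable later via env" knob could default to 1. Setting attempts: 1 explicitly keeps a queue-level defaultJobOptions from silently enabling retries.

```javascript
// Hypothetical helper: BullMQ options for the import queue.
// YT_IMPORT_CONCURRENCY is an assumed env var name, not an existing setting.
function importQueueOptions(env = process.env) {
  const parsed = Number.parseInt(env.YT_IMPORT_CONCURRENCY ?? "", 10);
  const concurrency = Number.isInteger(parsed) && parsed > 0 ? parsed : 1;
  return {
    // BullMQ WorkerOptions.concurrency: import jobs run in parallel per process.
    worker: { concurrency },
    // BullMQ JobsOptions.attempts: a single attempt, so failures are final and
    // retrying stays a manual action from the Jobs screen.
    job: { attempts: 1 },
  };
}
```

Assuming createWorker can forward options, registration would become createWorker('import', youtubeImportWorker, opts.worker), and the API's enqueue call importQueue.add('youtube', { assetId, url }, opts.job).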
4. Schema migration
New file services/mam-api/src/db/migrations/011-youtube-import.sql:
-- 1. Add the new job type to the enum.
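-- Postgres requires ALTER TYPE ... ADD VALUE for enum changes.
-- Before PostgreSQL 12 it cannot run inside a transaction block; on 12+
-- the new value cannot be used until the transaction commits.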
ALTER TYPE job_type ADD VALUE IF NOT EXISTS 'youtube_import';
-- 2. Remember where an asset came from. NULL for everything that
-- pre-dates the importer; populated for any imported asset.
ALTER TABLE assets ADD COLUMN IF NOT EXISTS source_url TEXT;
source_url is exposed on the asset drawer as a "Source" line ("imported from youtu.be/…") in a follow-up PR — out of scope for this spec, but worth noting that the column exists for it.
5. Files touched
New
- services/mam-api/src/routes/imports.js
- services/mam-api/src/db/migrations/011-youtube-import.sql
- services/worker/src/workers/youtube-import.js
Edited
- services/mam-api/src/index.js: mount the new route.
- services/web-ui/public/screens-ingest.jsx: add YouTubeImport, export on window.
- services/web-ui/public/shell.jsx: add the nav child, extend ingestChildren.
- services/web-ui/public/app.jsx: register the route and the crumb.
- services/web-ui/public/screens-jobs.jsx: extend kindMap with youtube_import: 'YouTube'.
- services/worker/src/index.js: register the import queue worker.
- services/worker/Dockerfile: add yt-dlp and python3 to the apk install line.
6. Risks & trade-offs
- Worker egress. The worker container needs outbound HTTPS to YouTube. Fine in the current homelab; will fail in a locked-down cluster. Documented in the implementation plan.
- yt-dlp drift. YouTube changes break old yt-dlp versions every few weeks. The Alpine package lags upstream by days. Fix is to rebuild the worker image. We do not auto-update inside the running container; that is too risky for an offline / locked-down deploy. If imports start failing en masse, the runbook is docker compose build --pull --no-cache worker && docker compose up -d worker. Without --no-cache, Docker can reuse the cached apk layer and keep the old yt-dlp.
- Single-URL UX feels light. That is deliberate for v1. Multi-URL paste and playlist expansion are both small follow-ups once the single-URL path is stable.
- No copyright enforcement. We rely on the one-line notice in the UI. If misuse becomes a real concern, the next step would be an admin allowlist of domains or a per-user import quota — not in this spec.
- filename = url placeholder. Until the import finishes, the asset row in the Library shows the URL as its name. The worker overwrites it when it updates the assets row after the download and S3 upload: seconds for a short clip, longer for a long video. Acceptable; the Library already handles "ingesting" assets with placeholder names from the upload path.
7. Acceptance
The feature is done when:
- A user can navigate to Ingest → YouTube, paste a public YouTube URL, pick a project, click Import, and within a minute or two see the asset appear in the Library with proxy and thumbnail.
- A failed import (private video, removed video, bogus URL) shows a clear error message on both the YouTube screen's queue row and the Jobs screen.
- The Jobs screen lists "YouTube" jobs alongside Proxy / Thumbnail / Conform, with Retry working.
- source_url is populated on the imported asset row.
- Image rebuild + docker compose up -d worker is the documented recovery path if YouTube changes break yt-dlp.