dragonflight/docs/superpowers/specs/2026-05-23-youtube-importer-design.md
Zac Gaetano 7a2710dc9a docs: design spec for YouTube importer
Adds a paste-URL ingest path under Ingest → YouTube. Worker hosts
yt-dlp, downloads to S3, then hands off to the existing proxy +
thumbnail pipeline so imported assets share one lifecycle with uploads.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 16:04:28 -04:00


YouTube Importer — Design Spec

Status: design approved 2026-05-23, awaiting user review of the spec before the implementation plan is written.

Context

The Ingest group in Dragonflight today covers file Upload, Recorders (SRT/RTMP/SDI), Capture (DeckLink), Monitors, and Schedule. There is no path to bring in media that already lives on the public web. The frequent ask is: "I want to grab a YouTube link and have it become an asset in my project, with the same proxy/thumbnail pipeline as anything else." This spec adds a YouTube importer that mirrors the existing upload flow: paste a URL, pick a project, click Import, and the asset shows up in the Library once the worker is done.

The importer rides on the existing job pipeline. After the download lands in S3, the asset re-enters the same proxy → thumbnail → ready path as a regular upload, so there is no parallel "imported asset" lifecycle to maintain.

Goals & non-goals

Goals

  • Paste a public YouTube URL, end up with a ready asset in the chosen project.
  • Reuse the existing assets table, S3 layout, BullMQ pipeline, and Jobs screen — no parallel state machine.
  • Progress visible from both the import screen (queue rows) and the Jobs screen.
  • Clear, actionable errors for the obvious failure modes (private, age-gated, removed, geo-blocked, network).

Non-goals

  • Playlists, channels, or batch-paste of multiple URLs. Single URL per submission. (Easy to add later.)
  • Cookies / login. Private, members-only, and age-gated videos are out of scope v1.
  • Quality picker. Always grabs best MP4 (with M4A audio merge fallback).
  • Non-YouTube sources (Vimeo, Twitch VODs, Dropbox links, etc.). The route is /imports/youtube precisely to leave room for siblings later.
  • Auto-update of yt-dlp inside the running container. Updates land via image rebuild.
  • Copyright enforcement. We surface a one-line "only import videos you have rights to use" note and stop there.

Architecture

The importer threads through four existing layers:

[web-ui]  YouTube screen  ──POST /imports/youtube──▶  [mam-api]
                                                          │
                                            assets row (status='ingesting')
                                            jobs row    (type='youtube_import')
                                                          │
                                              BullMQ "import" queue
                                                          ▼
                                                  [worker]
                                            yt-dlp download → S3 originals/
                                            ffprobe metadata → assets row
                                            status='processing'
                                                          │
                                              BullMQ "proxy" queue   ◀── existing path
                                                          ▼
                                            proxy → thumbnail → ready

Once the worker hands off to the proxy queue, the asset is indistinguishable from one that came through Upload — same proxy worker, same thumbnail worker, same Library list.

1. UX

Nav

A 6th child is added to the Ingest group in shell.jsx, between Upload and Recorders:

{ id: "youtube", label: "YouTube", icon: "download" },

The download glyph already exists in icons.jsx. The matching ingestChildren array in shell.jsx and the crumbs map in app.jsx both gain "youtube".

Screen

A new YouTubeImport component lives in screens-ingest.jsx and is exported on window alongside Upload, Recorders, etc. It is registered as a route in app.jsx.

Layout — visually a sibling of the Upload screen:

  • Header: title "YouTube", subtitle "Paste a link — we download and import the best available MP4."
  • Project selector: same select element as Upload's, pre-selected to the first project.
  • URL input: a single-line field-input with placeholder "Paste a YouTube URL (youtube.com/watch, youtu.be, or shorts)…" and an inline Import button. Enter submits. The button is disabled until a URL pattern matches.
  • Subtitle line under the input: "Only import videos you have rights to use. Private, age-gated, and members-only videos are not supported."
  • Queue panel: identical structure to Upload's queue — one row per submitted URL, showing:
    • Source icon (use link glyph) and the URL (truncated middle, full URL in title tooltip).
    • Title once known (filled in by a poll on the asset row).
    • Progress bar tied to job progress (0–100). The worker drives it from 5 to 60 % during the download and from 60 to 100 % during the S3 upload and DB writes.
    • Status pill: queued → downloading → processing → done / failed.
    • Error text if the job fails (red, one line).
    • A "Clear done" button at the top of the queue.

The queue persists for the session in component state only — no separate UI table. Jobs screen remains the canonical history.
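The middle truncation used for the URL cell in a queue row can be sketched as below. This is a minimal illustration, not the component's actual code: the function name truncateMiddle and the default width of 48 characters are assumptions; the spec only asks for a middle-truncated URL with the full URL kept in the title tooltip.

```javascript
// Hypothetical helper: shorten a long URL by cutting out its middle,
// keeping the start (scheme + host) and the end (video id) readable.
// The 48-character default is an assumption, not a value from the spec.
function truncateMiddle(text, maxLength = 48) {
  if (text.length <= maxLength) return text;
  const ellipsis = '…';
  const keep = maxLength - ellipsis.length; // characters of the original we can show
  const head = Math.ceil(keep / 2);         // the head gets the extra character on odd widths
  const tail = Math.floor(keep / 2);
  return text.slice(0, head) + ellipsis + text.slice(text.length - tail);
}
```

In the row, the truncated string would be the visible text and the untouched URL would go into the element's title attribute.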

URL validation (client-side, before POST)

Accept (case-insensitive) any of these patterns:

  • https?://(www\.|m\.)?youtube\.com/watch\?[^ ]*v=[A-Za-z0-9_-]{11}
  • https?://youtu\.be/[A-Za-z0-9_-]{11}
  • https?://(www\.)?youtube\.com/shorts/[A-Za-z0-9_-]{11}

Anything else is rejected inline ("That doesn't look like a YouTube URL") without an API call. The server re-validates as a defense-in-depth check.
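A minimal sketch of the client-side check, assuming the three patterns above are applied as-is with the case-insensitive flag. The leading ^ anchor and the function names are assumptions added for illustration; the spec does not say whether the patterns are anchored.

```javascript
// The three accepted shapes from the spec. The ^ anchor is an assumption;
// the `i` flag implements the case-insensitive match the spec asks for.
const YOUTUBE_URL_PATTERNS = [
  /^https?:\/\/(www\.|m\.)?youtube\.com\/watch\?[^ ]*v=[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/youtu\.be\/[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/(www\.)?youtube\.com\/shorts\/[A-Za-z0-9_-]{11}/i,
];

// Hypothetical helper: true when the pasted text matches any accepted shape.
function isYouTubeUrl(input) {
  const url = String(input ?? '').trim();
  return YOUTUBE_URL_PATTERNS.some((re) => re.test(url));
}

// Hypothetical helper: the inline error text, or null when the URL is fine.
function validationMessage(input) {
  return isYouTubeUrl(input) ? null : "That doesn't look like a YouTube URL";
}
```

The Import button's disabled state could then be driven by isYouTubeUrl(value), and the inline error by validationMessage(value).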

Out-of-scope v1 (called out, not built)

  • Pasting a playlist URL. Server returns 400 "Playlists aren't supported yet."
  • Multi-line paste. Single URL only.
  • Quality picker. yt-dlp format string is hard-coded.
  • Cookies upload. Private videos fail with a clear message.

2. API

Route

New file services/mam-api/src/routes/imports.js, mounted at /api/v1/imports in services/mam-api/src/index.js.

POST /api/v1/imports/youtube

Request body:

{ "url": "https://youtu.be/dQw4w9WgXcQ", "projectId": "uuid", "binId": "uuid?" }

Behavior:

  1. Validate url against the same three regexes as the client. 400 on miss.
  2. Reject playlist URLs (URL contains list=) with 400 "Playlists aren't supported yet."
  3. Generate assetId = uuidv4().
  4. Insert into assets with:
    • status='ingesting'
    • media_type='video'
    • filename = url (placeholder; worker overwrites with the sanitized title once yt-dlp prints metadata — keeps the row queryable in the meantime)
    • display_name = url (same; worker overwrites)
    • original_s3_key = NULL (worker fills in)
    • source_url = url (new column — see Schema)
    • project_id, bin_id, timestamps.
  5. Insert into jobs with type='youtube_import', asset_id, payload={ url }, status='queued', progress=0.
  6. Enqueue BullMQ job on the import queue:
    await importQueue.add('youtube', { assetId, url });
    
  7. Respond 200 { assetId, jobId }.

Errors:

  • Missing fields → 400.
  • Bad URL → 400 with error: 'Invalid YouTube URL'.
  • Playlist URL → 400 with error: "Playlists aren't supported yet".
  • Project not found → 404.
  • DB / queue failure → 500 (next(err)).
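The route's decision flow can be sketched as a plain function with its collaborators injected. Everything on deps (projectExists, insertAsset, insertJob, newId, importQueue) is a hypothetical stand-in for the real DB layer, uuidv4, and the BullMQ queue, and the error strings for the missing-fields and project-not-found cases are assumptions, since the spec only fixes their status codes.

```javascript
// Same three shapes the client checks (the ^ anchor is an assumption).
const SERVER_URL_PATTERNS = [
  /^https?:\/\/(www\.|m\.)?youtube\.com\/watch\?[^ ]*v=[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/youtu\.be\/[A-Za-z0-9_-]{11}/i,
  /^https?:\/\/(www\.)?youtube\.com\/shorts\/[A-Za-z0-9_-]{11}/i,
];

// Hypothetical core of POST /api/v1/imports/youtube. Returns { status, body }
// so an Express handler can do res.status(status).json(body). Exceptions from
// deps propagate to the caller, which maps them to 500 via next(err).
async function createYouTubeImport(body, deps) {
  const { url, projectId, binId = null } = body ?? {};
  if (typeof url !== 'string' || !url || !projectId) {
    return { status: 400, body: { error: 'Missing fields' } }; // wording assumed
  }
  if (!SERVER_URL_PATTERNS.some((re) => re.test(url))) {
    return { status: 400, body: { error: 'Invalid YouTube URL' } };
  }
  if (url.includes('list=')) {
    return { status: 400, body: { error: "Playlists aren't supported yet" } };
  }
  if (!(await deps.projectExists(projectId))) {
    return { status: 404, body: { error: 'Project not found' } }; // wording assumed
  }

  const assetId = deps.newId();
  await deps.insertAsset({
    id: assetId,
    status: 'ingesting',
    media_type: 'video',
    filename: url,          // placeholder until the worker knows the title
    display_name: url,      // placeholder, same reason
    original_s3_key: null,  // filled in by the worker after upload
    source_url: url,
    project_id: projectId,
    bin_id: binId,
  });
  const jobId = await deps.insertJob({
    type: 'youtube_import',
    asset_id: assetId,
    payload: { url },
    status: 'queued',
    progress: 0,
  });
  await deps.importQueue.add('youtube', { assetId, url });
  return { status: 200, body: { assetId, jobId } };
}
```

Keeping the flow free of Express and Postgres specifics makes the ordering of the checks easy to unit-test with fakes.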

Jobs screen integration

services/web-ui/public/screens-jobs.jsx already normalizes job types via a kindMap. Add one entry:

const kindMap = { proxy: 'Proxy', thumbnail: 'Thumbnail', conform: 'Conform', transcode: 'Transcode', youtube_import: 'YouTube' };

Retry, delete, and the SSE event stream all work for the new type with no further changes because they key off job.id, not job.type.

3. Worker

Container changes

services/worker/Dockerfile gains two packages:

RUN apk add --no-cache ffmpeg yt-dlp python3

yt-dlp is in the Alpine community repo and pulls python3 as a runtime dep — we list it explicitly for clarity. Image grows by ~25 MB.

New worker

services/worker/src/workers/youtube-import.js, registered in services/worker/src/index.js:

const workers = [
  createWorker('proxy', proxyWorker),
  createWorker('thumbnail', thumbnailWorker),
  createWorker('conform', conformWorker),
  createWorker('import', youtubeImportWorker),
];

Job handler

For a job with { assetId, url }:

  1. job.updateProgress(2) — accepted.
  2. Build a temp directory tmpdir()/yt-${jobId}.
  3. Run yt-dlp:
    yt-dlp \
      --no-playlist \
      --no-warnings \
      --restrict-filenames \
      -f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]/b" \
      --merge-output-format mp4 \
      --print-json \
      --newline \
      -o "<tmpdir>/<assetId>.%(ext)s" \
      "<url>"
    
    • --print-json writes one JSON line at the end with title, duration, width, height, uploader, etc.
    • --newline makes progress lines newline-terminated so we can parse them.
    • --restrict-filenames prevents shell-special characters in temp paths.
  4. Stream stdout line by line. Lines matching /\[download\]\s+(\d+(\.\d+)?)%/ map to job.updateProgress(5 + Math.floor(pct * 0.55)), where pct is the captured percentage, so the download phase moves progress from 5 to 60 %.
  5. On yt-dlp non-zero exit: parse stderr for the first line containing ERROR: and use it as the job's error message. Mark the asset status='error', mark the job failed, throw so BullMQ records it. Surface a friendly substitution for the common cases:
    • "Private video" → "Private video — not supported."
    • "Sign in to confirm your age" → "Age-restricted video — not supported."
    • "Video unavailable" → "Video unavailable or removed."
    • "This video is not available in your country" → "Video is geo-blocked from this region."
    • HTTP 429 → "YouTube rate-limited the importer — try again later."
    • Anything else → use yt-dlp's stderr line verbatim, truncated to 300 chars.
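  6. Parse the last stdout line as JSON to read metadata. The resulting file is normally <tmpdir>/<assetId>.mp4; the final /b fallback in the format string can pick a non-MP4 container, so confirm the extension on disk before uploading.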
  7. getMediaInfo (existing helper in services/worker/src/ffmpeg/executor.js) on that path. Use ffprobe's values for codec/fps/duration when yt-dlp's are missing or wrong.
  8. Sanitize the title for the S3 filename: keep [A-Za-z0-9 ._-], collapse runs of whitespace, trim, cap at 120 chars, append .mp4. If the sanitized title is empty, fall back to youtube-<videoId>.mp4.
  9. Upload to originals/{assetId}/{sanitized-title}.mp4 via the existing uploadToS3 helper. Progress 60 → 90 %.
  10. UPDATE the assets row with:
    • filename = <sanitized title>.mp4
    • display_name = <yt-dlp title untouched>
    • original_s3_key = originals/<assetId>/<sanitized-title>.mp4
    • codec, resolution, fps, duration_ms, file_size from ffprobe.
    • status = 'processing'
    • updated_at = NOW()
  11. Enqueue a proxy job on the existing proxy queue with the same payload shape upload.js uses:
    await proxyQueue.add('generate', {
      assetId,
      inputKey:  asset.original_s3_key,
      outputKey: `proxies/${assetId}.mp4`,
    });
    
  12. job.updateProgress(100). Return — BullMQ marks the import job done. The proxy job picks up the rest exactly like a regular upload.
  13. Always rm -rf the temp directory in a finally.
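Three of the steps above are pure string handling and can be sketched in isolation: the progress mapping in step 4, the error substitution in step 5, and the title sanitization in step 8. The function names are hypothetical, the 'HTTP Error 429' match string and the fallback text for a missing ERROR: line are assumptions, and the friendly messages are copied from the list in step 5.

```javascript
// Step 4: map a yt-dlp "[download]  42.3% ..." line onto the 5..60 band.
const PROGRESS_RE = /\[download\]\s+(\d+(?:\.\d+)?)%/;

function progressFromLine(line) {
  const match = PROGRESS_RE.exec(line);
  if (!match) return null; // not a progress line
  const pct = Math.min(100, parseFloat(match[1]));
  return 5 + Math.floor(pct * 0.55);
}

// Step 5: substitutions for the common failures, checked in order.
const FRIENDLY_ERRORS = [
  ['Private video', 'Private video — not supported.'],
  ['Sign in to confirm your age', 'Age-restricted video — not supported.'],
  ['Video unavailable', 'Video unavailable or removed.'],
  ['This video is not available in your country', 'Video is geo-blocked from this region.'],
  // Assumption: a rate limit shows up in stderr as "HTTP Error 429".
  ['HTTP Error 429', 'YouTube rate-limited the importer — try again later.'],
];

function friendlyError(stderr) {
  const line = String(stderr).split('\n').find((l) => l.includes('ERROR:'));
  // Assumption: the spec does not say what to show when no ERROR: line exists.
  if (!line) return 'yt-dlp exited with an error.';
  const hit = FRIENDLY_ERRORS.find(([needle]) => line.includes(needle));
  return hit ? hit[1] : line.trim().slice(0, 300);
}

// Step 8: keep [A-Za-z0-9 ._-], collapse whitespace, trim, cap at 120 chars.
function sanitizeTitle(title, videoId) {
  const cleaned = String(title ?? '')
    .replace(/\s+/g, ' ')              // tabs and newlines become single spaces
    .replace(/[^A-Za-z0-9 ._-]/g, '')  // drop everything outside the allowed set
    .replace(/ +/g, ' ')               // removals can leave doubled spaces
    .trim()
    .slice(0, 120)
    .trimEnd();                        // the cap may land on a space
  return (cleaned || `youtube-${videoId}`) + '.mp4';
}
```

Whitespace is normalized before the character filter so that a tab between two words becomes a space instead of gluing the words together.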

Concurrency & retries

  • Default BullMQ concurrency for this queue: 1 per worker process. Two simultaneous yt-dlp invocations risk YouTube rate-limiting more than they help throughput. Configurable later via env if needed.
  • No automatic BullMQ retry — yt-dlp failures are almost always permanent (private, geo, removed) and a silent retry storm would chew through quota. The Jobs screen's manual Retry button is the right knob for "this should be transient" cases.
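The two policies above translate into a pair of small option objects. This is a sketch under assumptions: the IMPORT_CONCURRENCY variable name is hypothetical (the spec only says concurrency may become configurable via env later), and how the options reach BullMQ depends on the project's createWorker wrapper, whose signature is not shown here. The concurrency worker option and the attempts job option are standard BullMQ settings.

```javascript
// Worker-side: one yt-dlp process at a time unless the (hypothetical)
// IMPORT_CONCURRENCY variable holds a positive integer.
function importWorkerOptions(env = process.env) {
  const parsed = Number.parseInt(env.IMPORT_CONCURRENCY ?? '', 10);
  return { concurrency: Number.isInteger(parsed) && parsed > 0 ? parsed : 1 };
}

// Queue-side: attempts: 1 means the job runs once and BullMQ never retries
// it on its own; the Jobs screen's manual Retry remains the only retry path.
const importJobOptions = { attempts: 1 };
```

With plain BullMQ these would be spread into new Worker('import', handler, { connection, ...importWorkerOptions() }) and passed as the third argument to importQueue.add('youtube', { assetId, url }, importJobOptions).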

4. Schema migration

New file services/mam-api/src/db/migrations/011-youtube-import.sql:

-- 1. Add the new job type to the enum.
--    Postgres requires ALTER TYPE ... ADD VALUE for enum changes.
ALTER TYPE job_type ADD VALUE IF NOT EXISTS 'youtube_import';

-- 2. Remember where an asset came from. NULL for everything that
--    pre-dates the importer; populated for any imported asset.
ALTER TABLE assets ADD COLUMN IF NOT EXISTS source_url TEXT;

source_url is exposed on the asset drawer as a "Source" line ("imported from youtu.be/…") in a follow-up PR — out of scope for this spec, but worth noting that the column exists for it.

5. Files touched

New

  • services/mam-api/src/routes/imports.js
  • services/mam-api/src/db/migrations/011-youtube-import.sql
  • services/worker/src/workers/youtube-import.js

Edited

  • services/mam-api/src/index.js — mount the new route.
  • services/web-ui/public/screens-ingest.jsx — add YouTubeImport, export on window.
  • services/web-ui/public/shell.jsx — add the nav child, extend ingestChildren.
  • services/web-ui/public/app.jsx — register the route and the crumb.
  • services/web-ui/public/screens-jobs.jsx — extend kindMap with youtube_import: 'YouTube'.
  • services/worker/src/index.js — register the import queue worker.
  • services/worker/Dockerfile — add yt-dlp and python3 to the apk install line.

6. Risks & trade-offs

  • Worker egress. The worker container needs outbound HTTPS to YouTube. Fine in the current homelab; will fail in a locked-down cluster. Documented in the implementation plan.
  • yt-dlp drift. YouTube changes break old yt-dlp versions every few weeks. The Alpine package lags upstream by days. Fix is to rebuild the worker image. We do not auto-update inside the running container because that is too risky for an offline / locked-down deploy. If imports start failing en masse, the runbook is docker compose build --no-cache worker && docker compose up -d worker; without --no-cache, Docker can reuse the cached apk add layer and keep the old yt-dlp.
  • Single-URL UX feels light. That is deliberate for v1. Adding multi-URL paste and playlist expansion are both small follow-ups once the single-URL path is stable.
  • No copyright enforcement. We rely on the one-line notice in the UI. If misuse becomes a real concern, the next step would be an admin allowlist of domains or a per-user import quota — not in this spec.
  • filename = url placeholder. Briefly, the asset row in the Library shows the URL as the name. The worker overwrites it within seconds for a successful import. Acceptable; the Library already handles "ingesting" assets with placeholder names from the upload path.

7. Acceptance

The feature is done when:

  • A user can navigate to Ingest → YouTube, paste a public YouTube URL, pick a project, click Import, and within a minute or two see the asset appear in the Library with proxy and thumbnail.
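  • A failed import (private video, removed video, well-formed URL for a video that does not exist) shows a clear error message on both the YouTube screen's queue row and the Jobs screen. A URL that fails pattern validation is rejected inline on the YouTube screen and never creates a job.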
  • The Jobs screen lists "YouTube" jobs alongside Proxy / Thumbnail / Conform, with Retry working.
  • source_url is populated on the imported asset row.
  • Image rebuild + docker compose up -d worker is the documented recovery path if YouTube changes break yt-dlp.