Burn test: 5 assets errored during proxy with 'aborted'/'socket hang up'
during the master DOWNLOAD. The masters all exist in S3 (262-269MB) — it's
the connection-limited RustFS backend dropping streams when 8 jobs hammer it
at once. Two fixes:
1. downloadFromS3/uploadToS3 now retry transient failures (aborted, socket
hang up, ECONNRESET, timeout, 5xx, throttle) up to 5x with exponential
backoff, cleaning the partial file between download attempts. A single
mid-stream abort no longer errors the whole asset.
2. Reuse ONE shared S3 client instead of createS3Client()+client.destroy()
per call. The per-call destroy tore down the keep-alive agent's sockets
every time, so connection pooling never happened and each transfer opened
fresh connections — exactly what overwhelmed RustFS. A long-lived client
lets the keep-alive pool actually be reused.
Root cause of stuck 'processing', failed deletes, and dead playback:
The mam-api proxies media (/video, /hls pipe the full S3 body through
Express), holding long-lived streaming sockets. With the SDK's default
http agents (no keep-alive, unbounded but unpooled) those streams starved
control-plane calls — DeleteObject and the proxy worker's master download
— which timed out (10s connectionTimeout) in bursts.
Fixes:
- mam-api S3 client: dedicated keep-alive http/https Agents (maxSockets 256)
+ requestTimeout raised 30s→300s so large master GETs finish.
- worker S3 client: previously had NO handler config at all (SDK defaults).
Added keep-alive agents + 600s requestTimeout so proxy/conform master
downloads (hundreds of MB) don't stall and leave assets in 'processing'.
RustFS returns empty bodies for ranged GETs whose start offset is past
~5.9 MB on single-file proxy MP4s. HEAD reports correct size, full GET
(`bytes=0-`) works, but `bytes=8179166-` comes back 206 + correct
Content-Range header with zero bytes. Confirmed via direct S3 probe
against broadcastmgmt.cloud/dragonmam (see scratch tests).
Workaround in mam-api `GET /api/v1/assets/:id/video` until the proxy
worker emits HLS (planned v1.2.1):
- HEAD the object first to learn total size (also gives ETag /
Last-Modified for conditional requests).
- No-Range / unparseable-Range / pre-EOF requests \u2192 plain pipe.
- Parsed `bytes=N-M` requests below RUSTFS_RANGE_SAFE_START
(default 5_500_000) \u2192 direct ranged GET, RustFS handles fine.
- Anything reaching into the broken zone \u2192 stream from offset 0,
drop bytes below start, stop at end. Memory stays flat; extra
bandwidth = (end+1 - requested-size) per seek.
- Genuinely out-of-range \u2192 416 with Cache-Control: no-store so the
browser doesn't poison its cache.
Also stashes (not yet wired up) the HLS pieces we'll need for the
follow-up: `segmentToHls` ffmpeg helper + `uploadDirectoryToS3`
worker s3 helper. Harmless additions; not referenced by any code path
yet.
Confirmed against the affected asset (a72aaa03-...): bytes=0-100k +
50% +100k native pass-through; 70% +100k and near-EOF previously hung
the browser, now stream correctly via the stitched path.
Refs #143.