BUG: Thumbnail failure silently overrides proxy failure — flips asset to ready even when proxy is absent #87

Closed
opened 2026-05-25 19:25:49 -04:00 by zgaetano · 0 comments
Owner

Description

In services/worker/src/workers/thumbnail.js, the catch block unconditionally sets the asset status = 'ready' when a thumbnail job fails:

} catch (error) {
  // Thumbnail failed but the asset is still usable via proxy — mark it ready
  // so it doesn't stay in 'processing' state forever.
  await query(
    `UPDATE assets SET status = 'ready', updated_at = NOW() WHERE id = $1`,
    [assetId]
  );
  throw error;
}

The comment correctly describes the happy-failure case: proxy succeeded, thumbnail failed → mark ready so the asset isn't stuck in processing. However, when the proxy job also fails, the proxy worker sets status = 'error'. If the thumbnail job was already queued and runs after the proxy fails (BullMQ delivers queued jobs regardless), the thumbnail job fails to download from S3 (there is no proxy file), then its catch block flips the asset status from 'error' back to 'ready' — even though no proxy and no thumbnail exist.

Sequence that reproduces the bug:

  1. File upload succeeds → asset status = 'processing'
  2. Proxy job starts, transcoding fails (e.g., corrupt source, ffmpeg crash)
  3. Proxy worker sets status = 'error'
  4. Meanwhile the thumbnail job was already enqueued (it's queued at the end of the proxy worker — but wait, the thumbnail is only queued after successful proxy upload: await thumbnailQueue.add('generate', ...) comes after await uploadToS3(...) in proxy.js). So the thumbnail job is only queued if the proxy upload succeeds.

Revised assessment: The sequence is safe in the normal case. However, there is a scenario where proxy_s3_key and thumbnail_s3_key are already set (from a prior successful run), then a re-queued thumbnail job fails for an asset that was later set to 'error' by a different pathway (e.g., operator re-ran a proxy job on a corrupted re-upload). In that case, the thumbnail catch would silently flip 'error''ready'.

Also: If someone manually enqueues a thumbnail job for an asset that has no proxy (e.g., via the API or a future admin tool), the download will 404, the catch will set status = 'ready', and the asset appears ready with no proxy and no thumbnail.

Suggested fix: In the thumbnail catch block, only set status = 'ready' when the current status is 'processing' (i.e., don't silently override 'error'):

await query(
  `UPDATE assets SET status = 'ready', updated_at = NOW()
   WHERE id = $1 AND status = 'processing'`,
  [assetId]
);
## Description In `services/worker/src/workers/thumbnail.js`, the `catch` block unconditionally sets the asset `status = 'ready'` when a thumbnail job fails: ```js } catch (error) { // Thumbnail failed but the asset is still usable via proxy — mark it ready // so it doesn't stay in 'processing' state forever. await query( `UPDATE assets SET status = 'ready', updated_at = NOW() WHERE id = $1`, [assetId] ); throw error; } ``` The comment correctly describes the happy-failure case: proxy succeeded, thumbnail failed → mark ready so the asset isn't stuck in `processing`. However, **when the proxy job also fails**, the proxy worker sets `status = 'error'`. If the thumbnail job was already queued and runs after the proxy fails (BullMQ delivers queued jobs regardless), the thumbnail job fails to download from S3 (there is no proxy file), then its `catch` block flips the asset status from `'error'` back to `'ready'` — even though no proxy and no thumbnail exist. **Sequence that reproduces the bug:** 1. File upload succeeds → asset `status = 'processing'` 2. Proxy job starts, transcoding fails (e.g., corrupt source, ffmpeg crash) 3. Proxy worker sets `status = 'error'` 4. Meanwhile the thumbnail job was already enqueued (it's queued at the end of the proxy worker — **but wait**, the thumbnail is only queued after successful proxy upload: `await thumbnailQueue.add('generate', ...)` comes after `await uploadToS3(...)` in proxy.js). So the thumbnail job is only queued if the proxy upload succeeds. **Revised assessment:** The sequence is safe in the *normal* case. However, there is a scenario where `proxy_s3_key` and `thumbnail_s3_key` are already set (from a prior successful run), then a *re-queued* thumbnail job fails for an asset that was later set to `'error'` by a different pathway (e.g., operator re-ran a proxy job on a corrupted re-upload). In that case, the thumbnail `catch` would silently flip `'error'` → `'ready'`. **Also:** If someone manually enqueues a thumbnail job for an asset that has no proxy (e.g., via the API or a future admin tool), the download will 404, the `catch` will set `status = 'ready'`, and the asset appears ready with no proxy and no thumbnail. **Suggested fix:** In the thumbnail `catch` block, only set `status = 'ready'` when the current status is `'processing'` (i.e., don't silently override `'error'`): ```js await query( `UPDATE assets SET status = 'ready', updated_at = NOW() WHERE id = $1 AND status = 'processing'`, [assetId] ); ```
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: WildDragonLLC/dragonflight#87
No description provided.