fix(worker): conform — lock fps + sample rate in concat filter graph

After the demuxer → filter switch, concat still failed with
  [fc#0] Error sending frames to consumers: Invalid argument
on Job 8. The filter graph normalised pixels (scale+pad+yuv420p) but
left the time-domain axes mixed:

  segment-1: 23.98 fps video, 44100 Hz audio
  segment-2: 60    fps video, 48000 Hz audio
  segment-3: …

ffmpeg 8's concat filter requires identical frame rate + audio sample
rate + channel layout across inputs. Force them on each leg:

  video: fps=<seqFps>, setpts=PTS-STARTPTS
  audio: aresample=48000,
         aformat=channel_layouts=stereo:sample_fmts=fltp,
         asetpts=PTS-STARTPTS

setpts/asetpts re-zero each input's clock so concat's per-input PTS
window resets cleanly between segments.

Target fps comes from the sequence's frame_rate (rounded) — same axis
the sequence editor stores. Sample rate is pinned to 48000 (broadcast
standard) so the AAC encode is consistent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Claude 2026-05-28 15:21:23 -04:00
parent 94b6710e2d
commit fcf4c8bbe7

View file

@ -282,29 +282,44 @@ export const conformWorker = async (job) => {
const inputArgs = [];
concatList.forEach(p => { inputArgs.push('-i', p); });
// Build the filter graph: scale each video stream to a consistent
// resolution + pixel format, then concat them. The audio leg only
// runs if audio is being kept.
const targetW = isProRes ? 1920 : 1920;
// Build the filter graph. The concat filter in ffmpeg 8.x requires
// identical resolution, pixel format, SAR, FRAME RATE and audio
// SAMPLE RATE / CHANNEL LAYOUT across all inputs. Different-spec
// sources (e.g. a 23.98 fps clip + a 60 fps clip) trip
// [fc#0] Error sending frames to consumers: Invalid argument
// even though our earlier scale+pad+format pass took care of the
// pixel side. Force the time-domain axes too:
// fps=<target> — resample video to a constant rate
// setpts=PTS-STARTPTS — re-zero PTS so concat's per-input clock
// resets cleanly
// aresample=48000 — force a single audio sample rate
// asetpts=PTS-STARTPTS — same for audio
const targetW = 1920;
const targetH = 1080;
const targetFps = Math.round(seqFps) || 30;
const targetSampleRate = 48000;
const vLabels = [];
const aLabels = [];
let normalize = '';
for (let i = 0; i < concatList.length; i++) {
// scale=W:H force_original_aspect_ratio=decrease + pad to box keeps
// mixed-aspect sources inside the frame without distortion.
normalize += `[${i}:v:0]scale=${targetW}:${targetH}:force_original_aspect_ratio=decrease,pad=${targetW}:${targetH}:(ow-iw)/2:(oh-ih)/2,setsar=1,format=yuv420p[v${i}];`;
normalize +=
`[${i}:v:0]fps=${targetFps},` +
`scale=${targetW}:${targetH}:force_original_aspect_ratio=decrease,` +
`pad=${targetW}:${targetH}:(ow-iw)/2:(oh-ih)/2,` +
`setsar=1,format=yuv420p,setpts=PTS-STARTPTS[v${i}];`;
vLabels.push(`[v${i}]`);
if (wantAudio) {
// anullsrc as a fallback so missing audio doesn't blow up concat.
normalize += `[${i}:a:0]aresample=async=1:first_pts=0[a${i}];`;
normalize +=
`[${i}:a:0]aresample=${targetSampleRate},` +
`aformat=channel_layouts=stereo:sample_fmts=fltp,` +
`asetpts=PTS-STARTPTS[a${i}];`;
aLabels.push(`[a${i}]`);
}
}
const n = concatList.length;
let concatExpr;
if (wantAudio) {
// interleaved [v0][a0][v1][a1]…
const interleaved = [];
for (let i = 0; i < n; i++) { interleaved.push(vLabels[i], aLabels[i]); }
concatExpr = `${interleaved.join('')}concat=n=${n}:v=1:a=1[outv][outa]`;