FFmpeg – How to Mix Audio/Video File with Audio File with Offset


I have two media files that I want to combine using ffmpeg.

  1. File 1 contains audio and video and starts at time = 0 sec.
  2. File 2 contains audio only, and the start of this file is at time = 2.5 sec.

How do I use ffmpeg to combine these files, with the audio file properly offset?

This is what I have tried:

ffmpeg -i video_and_audio.webm -itsoffset 2.5 -i audio_only.webm -filter_complex amix out.webm

This results in an audio/video file that appears to have the correct length, with the audio from both files mixed, but the audio from the audio-only file is not offset properly. The audio-only file appears to start at time = 0, as if there were no -itsoffset argument. I have also tried other values after -itsoffset, thinking the units might not actually be seconds. Even with a value of 2500, the audio-only file still seems to start immediately.

Since the ffmpeg output is commonly requested, here it is:

ffmpeg version 2.8.2 Copyright (c) 2000-2015 the FFmpeg developers
  built with gcc 4.8 (SUSE Linux)
  configuration: --shlibdir=/usr/lib64 --prefix=/usr --mandir=/usr/share/man --libdir=/usr/lib64 --enable-shared --disable-static --enable-debug --disable-stripping --extra-cflags='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g' --enable-pic --optflags='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g' --enable-gpl --enable-x11grab --enable-version3 --enable-pthreads --datadir=/usr/share/ffmpeg --enable-avfilter --enable-libpulse --enable-libwebp --enable-libvpx --enable-libopus --enable-libmp3lame --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libxvid --enable-libx264 --enable-libx265 --enable-libschroedinger --enable-libgsm --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-postproc --enable-libdc1394 --enable-librtmp --enable-libfreetype --enable-avresample --enable-libtwolame --enable-libvo-aacenc --enable-gnutls --enable-libass --disable-decoder=dca --enable-libdcadec --enable-frei0r --enable-libcelt --enable-libcdio --enable-ladspa
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
Input #0, matroska,webm, from 'video_and_audio.webm':
    encoder         : GStreamer matroskamux version 1.5.91
    creation_time   : 2015-11-23 02:57:26
  Duration: 00:01:20.78, start: 0.000000, bitrate: 323 kb/s
    Stream #0:0(eng): Video: vp8, yuv420p, 480x640, SAR 1:1 DAR 3:4, 1k tbr, 1k tbn, 1k tbc (default)
      title           : Video
    Stream #0:1(eng): Audio: vorbis, 48000 Hz, stereo, fltp (default)
      title           : Audio
Input #1, matroska,webm, from 'audio_only.webm':
    encoder         : GStreamer matroskamux version 1.5.91
    creation_time   : 2015-11-23 02:58:46
  Duration: 00:01:17.11, start: 0.000000, bitrate: 79 kb/s
    Stream #1:0(eng): Audio: vorbis, 48000 Hz, stereo, fltp (default)
      title           : Audio
[libopus @ 0x16c3320] No bit rate set. Defaulting to 96000 bps.
[libvpx-vp9 @ 0x16bd800] v1.3.0
Output #0, webm, to 'out.webm':
    encoder         : Lavf56.40.101
    Stream #0:0: Audio: opus (libopus), 48000 Hz, stereo, flt, 96 kb/s (default)
      encoder         : Lavc56.60.100 libopus
    Stream #0:1(eng): Video: vp9 (libvpx-vp9), yuv420p, 480x640 [SAR 1:1 DAR 3:4], q=-1--1, 200 kb/s, 1k fps, 1k tbn, 1k tbc (default)
      title           : Video
      encoder         : Lavc56.60.100 libvpx-vp9
Stream mapping:
  Stream #0:1 (vorbis) -> amix:input0 (graph 0)
  Stream #1:0 (vorbis) -> amix:input1 (graph 0)
  amix (graph 0) -> Stream #0:0 (libopus)
  Stream #0:0 -> #0:1 (vp8 (native) -> vp9 (libvpx-vp9))
Press [q] to stop, [?] for help
Input stream #0:0 frame changed from size:480x640 fmt:yuv420p to size:360x480 fmt:yuv420p
[libopus @ 0x16c3320] Queue input is backward in time
Input stream #0:0 frame changed from size:360x480 fmt:yuv420p to size:240x320 fmt:yuv420p
[libopus @ 0x16c3320] Queue input is backward in time
frame= 1077 fps=4.9 q=0.0 Lsize=    2433kB time=00:01:20.79 bitrate= 246.7kbits/s    
video:1660kB audio:738kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.456821%

Best Answer

I have successfully used the adelay filter to do this. Here is the command:

ffmpeg \
    -i video_and_audio.webm \
    -i audio_only.webm \
    -c:v copy \
    -filter_complex '[1:a] adelay=2500|2500 [delayed]; [0:a] [delayed] amix [out]' \
    -map 0:v \
    -map '[out]' \
    out.webm

Note that the adelay filter takes its delay in milliseconds, and the delay must be given separately for each audio channel. In this example, the audio is stereo (2 channels), so the delay is specified twice.
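If the offset is known in seconds, or the channel count varies between files, the adelay argument can be generated instead of typed by hand. The following is only a sketch: adelay_arg and its two parameters (offset in seconds, number of channels) are hypothetical names, not part of ffmpeg, and it assumes awk is available for the decimal arithmetic. It prints the filter graph rather than running ffmpeg.

```shell
#!/bin/sh
# adelay_arg SECONDS CHANNELS
# Hypothetical helper: prints the delay in milliseconds, repeated once per
# channel and joined with "|", which is the form adelay expects.
adelay_arg() {
    seconds=$1
    channels=$2
    # Convert seconds to whole milliseconds, rounding to the nearest integer.
    ms=$(awk -v s="$seconds" 'BEGIN { printf "%d", s * 1000 + 0.5 }')
    arg=$ms
    i=1
    # Append one more "|ms" for every channel after the first.
    while [ "$i" -lt "$channels" ]; do
        arg="$arg|$ms"
        i=$((i + 1))
    done
    printf '%s\n' "$arg"
}

# Stereo audio delayed by 2.5 seconds, as in the command above.
delay=$(adelay_arg 2.5 2)
printf '%s\n' "[1:a] adelay=$delay [delayed]; [0:a] [delayed] amix [out]"
```

The printed string can be passed as the value of -filter_complex. For a mono file the second argument would be 1, giving a single delay value.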

Line by line explanation (starting from the line after ffmpeg):

  • -i video_and_audio.webm: first input file
  • -i audio_only.webm: second input file
  • -c:v copy: copy video directly to output without any changes
  • -filter_complex: complex filter which takes the audio of the second input, delays it, and then mixes that with the audio of the first input
  • -map 0:v: selects the video stream from the first file to output
  • -map '[out]': selects the mixed audio stream to output
  • last argument: output file name

You may not actually need the -map lines, but I prefer to use them.
