FFmpeg: audio start time way off

Tags: audio, ffmpeg, timestamp

Real-time audio and video are recorded from one audio source and two video sources:

ffmpeg -y -copyts \
  -f pulse -thread_queue_size 1024 -i alsa_input.usb-Focusrite_Scarlett_2i2_USB_Y8CAJW2063E5BD-00.analog-stereo \
  -f v4l2 -thread_queue_size 1024 -video_size 1920x1080 -input_format mjpeg -i /dev/video0 \
  -f v4l2 -thread_queue_size 1024 -video_size 1920x1080 -input_format mjpeg -i /dev/video6 \
  -map 0:a -map 1:v -map 2:v -c:v libx264 -preset ultrafast test.mp4

The -copyts option synchronizes the two video streams (see also "FFmpeg: synchronize streams from two webcams"). But no audio can be heard in the recording. The start times in FFmpeg's output explain why:

Input #0, pulse, from 'alsa_input.usb-Focusrite_Scarlett_2i2_USB_Y8CAJW2063E5BD-00.analog-stereo':
  Duration: N/A, start: 1599927759.812456, bitrate: 1536 kb/s
    Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
Input #1, video4linux2,v4l2, from '/dev/video0':
  Duration: N/A, start: 54432.851793, bitrate: N/A
    Stream #1:0: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 1920x1080, 30 fps, 30 tbr, 1000k tbn, 1000k tbc
Input #2, video4linux2,v4l2, from '/dev/video6':
  Duration: N/A, start: 54433.882342, bitrate: N/A
    Stream #2:0: Video: mjpeg, yuvj422p(pc, bt470bg/unknown/unknown), 1920x1080, 30 fps, 30 tbr, 1000k tbn, 1000k tbc

The two times from the video devices are close together, as expected: 54432.851793 and 54433.882342. But that from the audio device is way off: 1599927759.812456.

Any ideas how to fix this?

Best Answer

The time provided by the audio device is actually the saner one: it appears to be the time since the Epoch, whereas the video devices seem to count from system boot. See https://stackoverflow.com/questions/10266451/where-does-v4l2-buffer-timestamp-value-starts-counting for more on this.

This also suggests a fix: -af "asetpts=PTS-x/TB", where x is the point in time at which the system was booted, in seconds since the Epoch (the boot timestamp, not the uptime). You can get it from /proc/stat, on the line starting with btime. It appears that this value can be thrown off when the system suspends or similar, so watch out for that.
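As a quick check of the boot-time idea, here is a minimal sketch that reads the btime value from /proc/stat. The function name boot_time_epoch and its optional file argument are made up for illustration; they are not part of FFmpeg or any standard tool.

```shell
#!/bin/sh
# Hypothetical helper: print the boot time in seconds since the Epoch.
# The optional argument (a file in /proc/stat format) is an assumption
# added only to make the function easy to try out; it defaults to /proc/stat.
boot_time_epoch() {
  awk '$1 == "btime" { print $2; exit }' "${1:-/proc/stat}"
}

# Print the value if this system exposes /proc/stat (Linux).
if [ -r /proc/stat ]; then
  echo "btime: $(boot_time_epoch)"
fi
```

With the start times from the question, 1599927759.812456 - 54432.851793 = 1599873326.960663. If the video timestamps really count from boot, that difference should be roughly equal to the btime value on the recording machine.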

Using asetpts and setpts also makes the second pass with -start_at_zero unnecessary (that second pass was suggested in "FFmpeg: synchronize streams from two webcams"). Here is a full example solution:

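# Boot time in seconds since the Epoch
btime_=$(awk '$1 == "btime" {print $2}' /proc/stat)
# Uptime in seconds
utime_=$(awk '{print $1}' /proc/uptime)
# Boot time plus uptime: roughly "now" in seconds since the Epoch
btime_utime_=$(echo "${btime_?} + ${utime_?}" | bc -l)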
ffmpeg -y -copyts \
  -f pulse -thread_queue_size 1024 -i alsa_input.usb-Focusrite_Scarlett_2i2_USB_Y8CAJW2063E5BD-00.analog-stereo \
  -f v4l2 -thread_queue_size 1024 -video_size 1920x1080 -input_format mjpeg -i /dev/video0 \
  -f v4l2 -thread_queue_size 1024 -video_size 1920x1080 -input_format mjpeg -i /dev/video6 \
  -map 0:a -map 1:v -map 2:v -c:v libx264 -preset ultrafast \
  -af "asetpts=PTS-${btime_utime_?}/TB" \
  -vf "setpts=PTS-${utime_?}/TB" \
  test.mp4

Note that shifting by the uptime (in order to avoid the second pass with -start_at_zero) is not perfectly exact: the first PTS will be slightly greater than zero. But it has been good enough for my purposes so far.
