FFMPEG audio out of sync when transcoding (demuxing) from DV

audiodvffmpegsyncvideo

I've been stuck with this problem for months. I have over 50 DV tapes (from and old Sony camcorder) to be converted to a more modern, usable format (most likely H264). I've started off with pulling the files to my PC (via firewire) using DVGRAB. There I had two options: pulling RAW data from the dv tape, resulting in a muxed file OR demuxing it and saving to a DVI file.

That's where the problems started. Saving it to a DVI file resulted in the audio being out of sync. I thought it's a problem with DVGRAB so I saved the RAW files (which are synced correctly) and wanted to process them with ffmpeg.

It turns out that no matter how I demux it the audio is always out of sync. BEFORE you say anything about the sampling frequency – the audio differences are of absolutely random length. An hour long tape can have between 0.1 and 4 seconds of audio lag at the end.

Here's an example file that I've split into separate audio and video files to check the differences.

# ffprobe -i ./video_conversion/13.dv 
ffprobe version 2.8.4 Copyright (c) 2007-2015 the FFmpeg developers
  built with gcc 5.3.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-avisynth --enable-avresample --enable-fontconfig --enable-gnutls --enable-gpl --enable-ladspa --enable-libass --enable-libbluray --enable-libdcadec --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-shared --enable-version3 --enable-x11grab
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
[dv @ 0x864f2a0] Detected timecode is invalid
[dv @ 0x864f2a0] Estimating duration from bitrate, this may be inaccurate
Input #0, dv, from './video_conversion/13.dv':
  Duration: 01:00:45.80, start: 0.000000, bitrate: 28800 kb/s
    Stream #0:0: Video: dvvideo, yuv420p, 720x576 [SAR 16:15 DAR 4:3], 28800 kb/s, 25 fps, 25 tbr, 25 tbn, 25 tbc
    Stream #0:1: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s

# ffprobe -i ./video_conversion/tmp/13.mp4
ffprobe version 2.8.4 Copyright (c) 2007-2015 the FFmpeg developers
  built with gcc 5.3.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-avisynth --enable-avresample --enable-fontconfig --enable-gnutls --enable-gpl --enable-ladspa --enable-libass --enable-libbluray --enable-libdcadec --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-shared --enable-version3 --enable-x11grab
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './video_conversion/tmp/13.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf56.40.101
  Duration: 01:00:45.80, start: 0.000000, bitrate: 5685 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 720x576 [SAR 16:15 DAR 4:3], 5683 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler

# ffprobe -i ./video_conversion/tmp/13.mp3
ffprobe version 2.8.4 Copyright (c) 2007-2015 the FFmpeg developers
  built with gcc 5.3.0 (GCC)
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-avisynth --enable-avresample --enable-fontconfig --enable-gnutls --enable-gpl --enable-ladspa --enable-libass --enable-libbluray --enable-libdcadec --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-shared --enable-version3 --enable-x11grab
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
[mp3 @ 0x954c2a0] Skipping 0 bytes of junk at 237.
Input #0, mp3, from './video_conversion/tmp/13.mp3':
  Metadata:
    encoder         : Lavf56.40.101
  Duration: 01:00:44.35, start: 0.023021, bitrate: 128 kb/s
    Stream #0:0: Audio: mp3, 48000 Hz, stereo, s16p, 128 kb/s
    Metadata:
      encoder         : Lavc56.60

This particular one differs by 1.448 seconds. As I said the differences vary greatly.

As for the solution. I could just stretch the audio and combine it with the video (I've tested that), but I can't be certain if the audio will be in sync somewhere in the middle of the recording.

I think I've pinpointed the source of this behaviour. Whenever I turn the camera on or off (as to start and stop recording) the video starts just a tiny bit faster then the audio. So the more "fragments" are on the tape, the more these differences add up.

How can I fix this? Is there a way to demux the audio and video with timestamps, so that after conversion they will add up correctly? Or is there anyway to fill these gaps in audio, so that both streams are the same size to begin with?

Best Answer

Here are three wildcard attempts at solving this issue:

Method 1a Use system time as timestamps

ffmpeg -use_wallclock_as_timestamps 1 -i input.dv \
       -c:v libx264 -b:v 4000k -c:a aac -b:a 128k -fflags +genpts method1.ts

Method 1b Use resampler with flag set to inject silence when input audio timestamps have gaps

ffmpeg -i input.dv -c:v libx264 -b:v 4000k \
       -af "aresample=async=1:first_pts=0" -c:a aac -b:a 128k -fflags +genpts method1.ts

Method 2 Merge with dummy audio

ffmpeg -i input.dv -f lavfi -i "aevalsrc=0:c=2:s=48000" \
       -filter_complex "[0:a][1:a]amerge[a]" -map 0:v -map "[a]" -c:v libx264 -b:v 4000k -c:a aac -b:a 128k -ac 2 -shortest method2.ts

Method 3 Combination of the above

ffmpeg -use_wallclock_as_timestamps 1 -i input.dv -f lavfi -use_wallclock_as_timestamps 1 -i "aevalsrc=0:c=2:s=48000" \
       -filter_complex "[0:a][1:a]amerge[a]" -map 0:v -map "[a]"  -c:v libx264 -b:v 4000k -c:a aac -b:a 128k -ac 2 -shortest method3.ts

You can test each of them for a short duration by inserting -t N e.g. -t 20 for a 20 second test.

If any of them work, we can then proceed to wrapping the output as MP4.

Related Question