Audio Sync – Fixing Audio Desync Issues with avconv Commands

I'm currently trying to combine several videos from different devices into one video. The setup is as follows:

User records 3 videos on a phone (Android or iOS)
The system sends these videos (MP4 format) to our server
The server moves the files and converts them into TS files (why? because we thought it would be nice to bring every video into this one format)

After these steps, we would like to combine them with 4 of our own videos. If we call our videos 'ov', the final order would be:

ov1 > sent video 1 > ov2 > sent video 2 > ov3 > sent video 3 > ov4

This does bring up some issues. We've been converting every video to the same format, the same audio channels and so on with this line (the only thing that changes is the filename):

avconv -ss 0 -i "48_0.85825500-14815530815710.mp4" -vcodec libx264 -acodec aac -bsf:a aac_adtstoasc -bsf:v h264_mp4toannexb -f mpegts -strict experimental -y "tmp_48_0.ts"

This means, as far as we understand, that every video is encoded to the same video (libx264, passed through the h264_mp4toannexb bitstream filter) and audio (aac, passed through the aac_adtstoasc bitstream filter) format. When we combine these videos like so, the audio desyncs:

avconv -i concat:"2connect-ae.ts|tmp_48_1.ts" -c copy -bsf:a aac_adtstoasc -bsf:v h264_mp4toannexb -y "output.mp4"

We've tried changing the way we compress/convert the videos in different ways and with different sources, but had no success. We got 4 exported .mov files from Final Cut Pro and Adobe After Effects. Changing our tools didn't help.

Somehow the sound from our videos (ov1, ov2, ov3, ov4) is overlapping into the videos sent by the user, resulting in an audio delay which keeps increasing every second. We are baffled as to why this keeps occurring; any help would be greatly appreciated.

If we only attach the sent videos to each other, there is no delay. If we add any of our own videos to the list, the audio starts to lag. If you would like some more information, please do let me know.

Information about our versions:

avconv version 9.11-6:9.11-2ubuntu2, Copyright (c) 2000-2013 the Libav developers
  built on Mar 24 2014 06:12:33 with gcc 4.8 (Ubuntu 4.8.2-17ubuntu1)
avconv 9.11-6:9.11-2ubuntu2
libavutil     52. 3. 0 / 52. 3. 0
libavcodec    54. 35. 0 / 54. 35. 0
libavformat   54. 20. 3 / 54. 20. 4
libavdevice   53. 2. 0 / 53. 2. 0
libavfilter   3. 3. 0 / 3. 3. 0
libavresample 1. 0. 1 / 1. 0. 1
libswscale    2. 1. 1 / 2. 1. 1

Edit for the given comments:
Currently we're trying to transform every video into the same dimensions using this option:
-filter:v 'scale=iw*min(1920/iw\,1080/ih):ih*min(1920/iw\,1080/ih), pad=1920:1080:(1920-iw*min(1920/iw\,1080/ih))/2:(1080-ih*min(1920/iw\,1080/ih))/2'

Every video gets transformed into 1920×1080. We're also trying to strip out every option we had and build our way back up, to see where it went wrong. But this does seem like a huge step, and a rather desperate one too.
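
As a rough check of what the scale/pad expression above computes, the same arithmetic can be reproduced in plain shell. This is only a sketch: `fit_dims` is a hypothetical helper (not part of avconv), and it uses integer division, so the filter's own floating-point result may differ by a pixel.

```shell
#!/bin/sh
# Hypothetical helper: mimic scale=...:... , pad=1920:1080:x:y for one input size.
# Prints: scaled_width scaled_height pad_x pad_y
fit_dims() {
    iw=$1; ih=$2; tw=1920; th=1080
    # min(tw/iw, th/ih) is the width ratio when tw/iw <= th/ih,
    # which is the same as tw*ih <= th*iw (avoids floating point).
    if [ $((tw * ih)) -le $((th * iw)) ]; then
        out_w=$tw
        out_h=$((ih * tw / iw))
    else
        out_w=$((iw * th / ih))
        out_h=$th
    fi
    # The pad offsets centre the scaled picture in the 1920x1080 frame.
    echo "$out_w $out_h $(( (tw - out_w) / 2 )) $(( (th - out_h) / 2 ))"
}

fit_dims 1280 720   # landscape clip: prints "1920 1080 0 0"
fit_dims 720 1280   # portrait clip:  prints "607 1080 656 0"
```

A portrait clip therefore ends up as a narrow picture with wide bars on both sides, while a 16:9 clip fills the frame.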

Edit 2:

These are the commands we're currently running:

avconv -i "33_0.98002500-14821542538733.mp4" -f mpegts -filter:v 'scale=iw*min(1920/iw\,1080/ih):ih*min(1920/iw\,1080/ih), pad=1920:1080:(1920-iw*min(1920/iw\,1080/ih))/2:(1080-ih*min(1920/iw\,1080/ih))/2' "tmp_ruud_0.ts"
avconv -i "33_0.57471800-14821542544448.mp4" -f mpegts -filter:v 'scale=iw*min(1920/iw\,1080/ih):ih*min(1920/iw\,1080/ih), pad=1920:1080:(1920-iw*min(1920/iw\,1080/ih))/2:(1080-ih*min(1920/iw\,1080/ih))/2' "tmp_ruud_1.ts"
avconv -i "33_0.27939600-14821542541226.mp4" -f mpegts -filter:v 'scale=iw*min(1920/iw\,1080/ih):ih*min(1920/iw\,1080/ih), pad=1920:1080:(1920-iw*min(1920/iw\,1080/ih))/2:(1080-ih*min(1920/iw\,1080/ih))/2' "tmp_ruud_2.ts"

avconv -i "templates/v2/wiebenik-v2.mov" -f mpegts "wiebenik-v5.ts"
avconv -i "templates/v2/ikkanjehelpenmet-v2.mov" -f mpegts "ikkanjehelpenmet-v5.ts"
avconv -i "templates/v2/ikbenopzoeknaar-v2.mov" -f mpegts "ikbenopzoeknaar-v5.ts"

avconv -i concat:"wiebenik-v5.ts|tmp_ruud_0.ts|ikkanjehelpenmet-v5.ts|tmp_ruud_1.ts|ikbenopzoeknaar-v5.ts|tmp_ruud_2.ts" -strict experimental "test.mp4"

Edit 3: I'm currently converting every file to the same dimensions and also normalizing the audio, as shown here:

avconv -i "public/videos/44_0.90768300-14822311688474.mp4" -filter:v "transpose=1, scale=iw*min(1920/iw\,1080/ih):ih*min(1920/iw\,1080/ih), pad=1920:1080:(1920-iw*min(1920/iw\,1080/ih))/2:(1080-ih*min(1920/iw\,1080/ih))/2" -c:a copy "public/videos/tmp_44_0.mp4"
avconv -i public/videos/tmp_44_0.mp4 -c:v copy -c:a libmp3lame -b:a 128k -ac 2 -ar 48000 public/videos/tmp_44_0_normalized.mp4
avconv -i "public/videos/44_0.09416600-14822311719723.mp4" -filter:v "transpose=1, scale=iw*min(1920/iw\,1080/ih):ih*min(1920/iw\,1080/ih), pad=1920:1080:(1920-iw*min(1920/iw\,1080/ih))/2:(1080-ih*min(1920/iw\,1080/ih))/2" -c:a copy "public/videos/tmp_44_1.mp4"
avconv -i public/videos/tmp_44_1.mp4 -c:v copy -c:a libmp3lame -b:a 128k -ac 2 -ar 48000 public/videos/tmp_44_1_normalized.mp4
avconv -i "public/videos/44_0.37376500-14822311735955.mp4" -filter:v "transpose=1, scale=iw*min(1920/iw\,1080/ih):ih*min(1920/iw\,1080/ih), pad=1920:1080:(1920-iw*min(1920/iw\,1080/ih))/2:(1080-ih*min(1920/iw\,1080/ih))/2" -c:a copy "public/videos/tmp_44_2.mp4"
avconv -i public/videos/tmp_44_2.mp4 -c:v copy -c:a libmp3lame -b:a 128k -ac 2 -ar 48000 public/videos/tmp_44_2_normalized.mp4

After these changes, I'm combining them like so:

MP4Box public/videos/templates/v2/wiebenik-v2.mp4 -cat public/videos/tmp_44_0_normalized.mp4 -cat public/videos/templates/v2/ikkanjehelpenmet-v2.mp4 -cat public/videos/tmp_44_1_normalized.mp4 -cat public/videos/templates/v2/ikbenopzoeknaar-v2.mp4 -cat public/videos/tmp_44_2_normalized.mp4 -cat public/videos/templates/v2/2connect-v2.mp4 -out public/pitches/44_9b374c31dcaa7433daf0f5163a3789dc.mp4

But somehow, even with just 2 files, I'm getting the error:

Cannot concatenate files: Different AVC Level Indication between source (42) and destination (40)

It turns out that my files were encoded at different H.264 levels (4.0 and 4.2); no idea how to resolve this yet.
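
One possible way around the level mismatch is to re-encode every clip, templates included, at one and the same level before handing them to MP4Box. The sketch below is untested against this setup and rests on an assumption: that the libx264 wrapper in this avconv build honours the generic `-level` codec option (value = level × 10, matching the 40 and 42 in the MP4Box message). `encode_at_level` and the `AVCONV` variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# AVCONV can be overridden (e.g. AVCONV=echo) to preview the command line.
AVCONV=${AVCONV:-avconv}

# Hypothetical helper: re-encode video at a fixed H.264 level, copy audio.
# ASSUMPTION: "-level 40" requests level 4.0 from libx264 in this build.
# usage: encode_at_level input.mp4 output.mp4 40
encode_at_level() {
    "$AVCONV" -i "$1" -c:v libx264 -level "$3" -c:a copy -y "$2"
}
```

Level 4.0 tops out at roughly 1080p at 30 frames per second, so if any clip has a higher frame rate it may be safer to encode everything at 42 instead.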

Best Answer

It would appear that the problem you are having is related to a bad combination of choices.

You've chosen an MPEG transport stream as a container: transport streams tend to be broadcast at constant bitrate (CBR) to maintain a consistent broadcast rate, filled with padding bytes when not enough data exists. It would seem that this has been complicated by the choice of libx264 as the encoder, as x264 does not have a native constant bitrate mode. To me this seems counter-intuitive.

If I were in charge of this project there are a number of things I would want to ensure.

1) All clips intended to be joined must be normalized (i.e. same video dimensions, same audio sampling rates, same audio and video bitrates).

2) The codecs and container format I choose have compatible characteristics.
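
The normalization in point 1 could be sketched roughly as follows, reusing the scale/pad filter and audio options from the question. The concrete values (25 fps, 4000k video bitrate, 48 kHz stereo at 128k) are illustrative assumptions rather than recommendations, and `normalize_clip`, `FIT` and `AVCONV` are hypothetical names. The point is only that every clip, template or upload, goes through the identical command.

```shell
#!/bin/sh
# AVCONV can be overridden (e.g. AVCONV=echo) to preview the command line.
AVCONV=${AVCONV:-avconv}

# Letterbox to 1920x1080; same expression as used in the question.
FIT='scale=iw*min(1920/iw\,1080/ih):ih*min(1920/iw\,1080/ih), pad=1920:1080:(1920-iw*min(1920/iw\,1080/ih))/2:(1080-ih*min(1920/iw\,1080/ih))/2'

# Hypothetical helper: bring one clip to common video and audio parameters.
# usage: normalize_clip input.mp4 output.mp4
normalize_clip() {
    "$AVCONV" -i "$1" \
        -filter:v "$FIT" -r 25 -c:v libx264 -b:v 4000k \
        -c:a aac -strict experimental -ac 2 -ar 48000 -b:a 128k \
        -y "$2"
}
```

Writing to MP4 here keeps the MPEG-TS questions out of the picture until the joined result has been checked.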

It would likely be prudent to do all editing with lossless samples at a specific bit-rate to ensure accurate timing.

As you are beginning with AAC, which is lossy compression, your audio timing will be questionable.

Further research indicates that there is a long-standing AAC copy issue with -bsf:a aac_adtstoasc when the container is not flv, m4a, mov or mp4. You might wish to read the final comment in that post.

For actual example options see my answer here, and if you truly need -f mpegts, do that step last, after confirming full A/V sync.
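
If a transport stream really is needed at the end, that last step might look like the sketch below. It uses only options that already appear in the question and stream-copies, so nothing is re-encoded after sync has been confirmed. The audio bitstream filter is left out on purpose, given the issue mentioned above. `to_mpegts` and `AVCONV` are hypothetical names.

```shell
#!/bin/sh
# AVCONV can be overridden (e.g. AVCONV=echo) to preview the command line.
AVCONV=${AVCONV:-avconv}

# Hypothetical helper: remux an already-joined, in-sync MP4 to MPEG-TS.
# usage: to_mpegts joined.mp4 joined.ts
to_mpegts() {
    "$AVCONV" -i "$1" -c copy -bsf:v h264_mp4toannexb -f mpegts -y "$2"
}
```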
