FFmpeg raw audio and H264 in RTSP

Tags: audio, ffmpeg, pcm, rtsp, video

I am trying to correctly grab video and audio from a Hikvision IP camera.

Everything works like a charm with, for example, H264 + MP2.

But when I try to grab raw PCM s16le audio, the smile goes off my face.

Here is how I grab from my camera (you can try it yourself; it is open to the world):

ffmpeg -re -acodec pcm_s16le -ac 1 -rtsp_transport tcp -i rtsp://superuser:superuser12345@91.214.203.250:10554 -vcodec copy -acodec libfdk_aac -vbr 5 test.ts

The command works and records the RTSP stream into a TS file.

However, the audio and video durations differ. For example, when I record 21 seconds, I get 21 seconds of audio but only 15 seconds of video.

The audio is stretched and its pitch is lowered. I have spent several days reading the FFmpeg documentation and trying various options, such as async and changing the sample rate, with no luck.
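One way to put numbers on the mismatch, rather than judging it by ear in a player, is to ask ffprobe for each stream's duration in the output file. Below is a minimal Python sketch, assuming ffprobe is on the PATH next to ffmpeg; the helper names probe_stream_durations and durations_by_type are hypothetical. For MPEG-TS, ffprobe estimates stream durations from timestamps and may not always report one, so missing values are skipped rather than guessed.

```python
import json
import subprocess


def probe_stream_durations(path: str) -> str:
    """Run ffprobe on path and return its JSON report of per-stream durations."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "stream=index,codec_type,duration",
        "-of", "json",
        path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def durations_by_type(report: str) -> dict:
    """Map codec_type ("video", "audio", ...) to duration in seconds.

    ffprobe prints durations as strings; streams without a usable
    duration are skipped instead of guessed.
    """
    durations = {}
    for stream in json.loads(report).get("streams", []):
        value = stream.get("duration")
        if value is not None and value != "N/A":
            durations[stream.get("codec_type", "unknown")] = float(value)
    return durations
```

Calling durations_by_type(probe_stream_durations("test.ts")) on the file from the question should show whether the audio stream really is about 1.4 times longer than the video stream.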

I hope Mulvya or another FFmpeg expert can advise me on a fix.

Full console output from a run with the native aac encoder:

C:\Users\User>d:/ffmpeg/bin/ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://superuser:superuser12345@91.214.203.250:10554 -vcodec copy -acodec aac -b:a 96k d:/ffmpeg/hik_aac.ts
ffmpeg version N-83410-gb1e2192 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 5.4.0 (GCC)
configuration: --enable-gpl --enable-version3 --enable-cuda --enable-cuvid --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib
libavutil      55. 46.100 / 55. 46.100
libavcodec     57. 75.100 / 57. 75.100
libavformat    57. 66.101 / 57. 66.101
libavdevice    57.  2.100 / 57.  2.100
libavfilter     6. 72.100 /  6. 72.100
libswscale      4.  3.101 /  4.  3.101
libswresample   2.  4.100 /  2.  4.100
libpostproc    54.  2.100 / 54.  2.100
Guessed Channel Layout for Input Stream #0.1 : mono
Input #0, rtsp, from 'rtsp://superuser:superuser12345@91.214.203.250:10554':
Metadata:
title           : Media Presentation
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, 16 fps, 25 tbr, 90k tbn, 32.01 tbc
Stream #0:1: Audio: pcm_s16le, 16000 Hz, mono, s16, 256 kb/s
Output #0, mpegts, to 'd:/ffmpeg/hik_aac.ts':
Metadata:
title           : Media Presentation
encoder         : Lavf57.66.101
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, q=2-31, 16 fps, 25 tbr, 90k tbn, 90k tbc
Stream #0:1: Audio: aac (LC), 16000 Hz, mono, fltp, 96 kb/s
Metadata:
  encoder         : Lavc57.75.100 aac
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33976, current: 7200; changing to 33977. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33977, current: 14400; changing to 33978. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33978, current: 18000; changing to 33979. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33979, current: 25200; changing to 33980. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33980, current: 28800; changing to 33981. This may result in incorrect timestamps in the output file.
frame=   85 fps= 11 q=-1.0 Lsize=    1357kB time=00:00:07.42 bitrate=1497.1kbits/s speed=0.997x
video:1196kB audio:51kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 8.805858%
[aac @ 00000000030a0a00] Qavg: 63342.980
Exiting normally, received signal 2.

Best Answer

As worked out in the comments, the camera's actual sampling rate appears to be 22.05 kHz rather than the 16 kHz the stream advertises (see the Stream #0:1 line in the log), so we can conform the audio to that rate.
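A quick sanity check of that hypothesis, assuming the 15 s of video reflects the true recording length: if samples captured at 22050 Hz are played back as 16000 Hz, the audio is stretched by 22050/16000 ≈ 1.378 and its pitch drops by the same factor, which matches both symptoms in the question. The 22050 Hz figure is the answer's hypothesis, not something the log confirms; the function names below are illustrative only.

```python
# Assumption: the camera really samples at 22050 Hz (the answer's
# hypothesis) while the stream is described as 16000 Hz (from the log).
DECLARED_RATE = 16000
ACTUAL_RATE = 22050


def stretch_factor(declared_hz: float, actual_hz: float) -> float:
    """How much longer audio plays when samples captured at actual_hz
    are interpreted as declared_hz."""
    return actual_hz / declared_hz


def perceived_pitch(tone_hz: float, declared_hz: float, actual_hz: float) -> float:
    """Frequency heard for a tone_hz signal played back at the wrong rate."""
    return tone_hz / stretch_factor(declared_hz, actual_hz)


factor = stretch_factor(DECLARED_RATE, ACTUAL_RATE)
print(f"predicted stretch: {factor:.3f}")   # 1.378
print(f"observed stretch:  {21 / 15:.3f}")  # 1.400 (21 s audio / 15 s video)
print(f"audio expected for 15 s of video: {15 * factor:.1f} s")  # 20.7 s
print(f"1000 Hz tone is heard at {perceived_pitch(1000, DECLARED_RATE, ACTUAL_RATE):.0f} Hz")  # 726 Hz
```

The predicted 20.7 s is close to the 21 s of audio reported, which is what makes 22.05 kHz a plausible candidate.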

Use

ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://URL -vcodec copy -af asetrate=22050 -acodec aac -b:a 96k test.mp4

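The asetrate filter does not resample the audio; it only changes the sample rate the samples are tagged with, so the same samples play back faster, which restores both the correct duration and the correct pitch.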
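The difference between relabelling and resampling is easiest to see with plain sample counts. The sketch below is a toy Python model, not FFmpeg code: set_rate mimics what asetrate does (same samples, new rate label), while resample_linear stands in for a real resampler such as the aresample filter or -ar, using naive linear interpolation purely for illustration. It also suggests why changing the sample rate did not help in the question: a resampler trusts the wrong 16 kHz label, so it faithfully preserves the wrong duration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PcmBuffer:
    samples: List[int]  # decoded mono samples (s16 values as ints)
    rate: int           # sample rate the samples are tagged with, in Hz

    @property
    def duration(self) -> float:
        """Playback length in seconds implied by the rate label."""
        return len(self.samples) / self.rate


def set_rate(buf: PcmBuffer, new_rate: int) -> PcmBuffer:
    """Model of asetrate: keep every sample, change only the rate label.
    Duration scales by buf.rate / new_rate and pitch by the inverse."""
    return PcmBuffer(list(buf.samples), new_rate)


def resample_linear(buf: PcmBuffer, new_rate: int) -> PcmBuffer:
    """Toy linear-interpolation resampler (not FFmpeg's algorithm).
    Changes the sample count so that duration and pitch are preserved."""
    n_in = len(buf.samples)
    n_out = round(n_in * new_rate / buf.rate)
    out = []
    for i in range(n_out):
        pos = i * buf.rate / new_rate  # fractional position in the input
        left = min(int(pos), n_in - 1)
        right = min(left + 1, n_in - 1)
        frac = pos - left
        out.append(round(buf.samples[left] * (1 - frac) + buf.samples[right] * frac))
    return PcmBuffer(out, new_rate)


# One second of (silent) audio captured at 22050 Hz but tagged as 16000 Hz.
captured = PcmBuffer([0] * 22050, 16000)
print(f"{captured.duration:.3f}")                          # 1.378 (stretched)
print(f"{set_rate(captured, 22050).duration:.3f}")         # 1.000 (asetrate-style fix)
print(f"{resample_linear(captured, 22050).duration:.3f}")  # 1.378 (still stretched)
```

Only the relabelling step brings the one second of captured audio back to one second, which is why the answer uses asetrate rather than -ar.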
