The problem is that ffmpeg chooses the default for -vsync based on the output muxer. For the mp4 muxer it defaults to -vsync 1 (constant frame rate), but it picks a very high frame rate so that it can put a frame at the exact right time for every input frame.
(The input frame timing isn't constant; ffmpeg reports an average of 14.97 fps. It's probably from a phone camera, since phones record variable-FPS video. I think they slow down to get more light for each frame, but it might be for another reason.)
So ffmpeg will duplicate frames up to the 30k fps (or so) that it has chosen. H.264 is pretty efficient at storing duplicate frames, but that's still ridiculous.
Anyway, the solution is to use -vsync 2 (variable frame rate) on your ffmpeg command line. In newer FFmpeg releases -vsync is deprecated in favor of the equivalent -fps_mode vfr. Alternatively, output to mkv and then remux to mp4; the reason that works is that mkv defaults to -vsync 2. It's really that simple. You don't need to make your output CFR. YouTube handles arbitrary frame rates just fine, as long as they're <= 60, and so do most other players. I assume phones are fine too, since they make variable-FPS videos in the first place. You don't need to use -r something to force frame duplication to hit exactly 30 fps or anything.
TL;DR
I would recommend the following:
- libx264: -g X -keyint_min X (and optionally add -force_key_frames "expr:gte(t,n_forced*N)")
- libx265: -x265-params "keyint=X:min-keyint=X"
- libvpx-vp9: -g X
where X is the interval in frames and N is the interval in seconds. For example, for a 2-second interval with a 30 fps video, X = 60 and N = 2.
A note about different frame types
In order to properly explain this topic, we first have to define the two types of I-frames / keyframes:
- Instantaneous Decoder Refresh (IDR) frames: These allow independent decoding of the following frames, without access to frames previous to the IDR frame.
- Non-IDR frames: These require a previous IDR frame for decoding to work. Non-IDR I-frames can be used for scene cuts in the middle of a GOP (group of pictures).
What is recommended for streaming?
For the streaming case, you want to:
- Ensure that all IDR frames are at regular positions (e.g. at 2, 4, 6, … seconds) so that the video can be split up into segments of equal length.
- Enable scene cut detection to improve coding efficiency and quality. This means allowing I-frames to be placed in between IDR frames. You can still work with scene cut detection disabled (many guides still recommend that), but it's not necessary.
What do the parameters do?
In order to configure the encoder, we have to understand what the keyframe parameters do. I did some tests and discovered the following for the three encoders libx264, libx265 and libvpx-vp9 in FFmpeg:
libx264:
-g sets the keyframe interval.
-keyint_min sets the minimum keyframe interval.
-x264-params "keyint=x:min-keyint=y" is the same as -g x -keyint_min y.
Note: x264 clamps the minimum interval to at most half the maximum interval plus one, so when both are set to the same value, the minimum effectively becomes keyint/2+1, as seen in the x264 code:
h->param.i_keyint_min = x264_clip3( h->param.i_keyint_min, 1, h->param.i_keyint_max/2+1 );
libx265:
-g is not implemented.
-x265-params "keyint=x:min-keyint=y" works.
libvpx-vp9:
-g sets the keyframe interval.
-keyint_min sets the minimum keyframe interval.
Note: Due to how FFmpeg works, -keyint_min is only forwarded to the encoder when it is the same as -g. In the code from libvpxenc.c in FFmpeg we can find:
if (avctx->keyint_min >= 0 && avctx->keyint_min == avctx->gop_size)
    enccfg.kf_min_dist = avctx->keyint_min;
if (avctx->gop_size >= 0)
    enccfg.kf_max_dist = avctx->gop_size;
This might be a bug (or a missing feature?), since libvpx definitely supports setting a different value for kf_min_dist.
Should you use -force_key_frames?
The -force_key_frames option forcibly inserts keyframes at the given interval (expression). This works for all encoders, but it might mess with the rate control mechanism. Especially for VP9, I've noticed severe quality fluctuations, so I cannot recommend using it in this case.
Don't treat the guidelines as strict requirements. The general recommendation is to provide the highest quality that is practical for you to upload. It's that simple: whatever you upload is going to be re-encoded anyway, and YouTube will almost always accept whatever you give it. That means you either upload the original content or, if the original is too big, re-encode it at high quality. Example using ffmpeg:
See FFmpeg Wiki: H.264 for more details, specifically the -crf and -preset options. Notice that I simply copied the audio, but you may choose to re-encode it if the source contains uncompressed audio. Your player may not be able to play the output for various reasons, but YouTube certainly will.