You're dealing with a directory of videos, so you will probably need to use a loop. The following loop splits each matched file into ten-minute segments, as requested in your comment:
for i in *.mp4; do
  ffmpeg -i "$i" -c copy \
    -f segment -segment_time 600 \
    -reset_timestamps 1 \
    "${i/%.mp4/_part%02d.mp4}"
done
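The `${i/%.mp4/_part%02d.mp4}` expansion may look cryptic: it replaces the trailing `.mp4` with `_part%02d.mp4`, leaving the `%02d` placeholder for the segment muxer to number the parts. A quick sketch of what it produces (the filename is made up; this is bash syntax):

```shell
# bash pattern substitution: /% anchors the pattern at the end of the string
i="holiday.mp4"
echo "${i/%.mp4/_part%02d.mp4}"   # prints: holiday_part%02d.mp4
```

ffmpeg then expands the `%02d` itself, yielding holiday_part00.mp4, holiday_part01.mp4, and so on.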
However, if your input is a directory of images, then the image2 demuxer lets you use wildcards. Just specify -pattern_type glob and pass your glob pattern to -i in a quoted string, so that the shell does not expand it.
For example, I did the following when converting a directory of JPEG files to an MPEG-4 video:
ffmpeg -f image2 -pattern_type glob -i '*.jpg' output.mp4
Just be aware that the glob pattern alone determines the order in which the matched image files are processed.
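Glob order is plain lexicographic, so img10.jpg sorts before img2.jpg. One way around this (a sketch; the img prefix and unpadded numbering are assumptions about your filenames) is to zero-pad the numbers before running ffmpeg:

```shell
# Rename img2.jpg -> img0002.jpg etc. so lexicographic order matches numeric order
for f in img*.jpg; do
  n="${f#img}"      # strip the prefix
  n="${n%.jpg}"     # strip the suffix, leaving the bare number
  mv -- "$f" "$(printf 'img%04d.jpg' "$n")"
done
```

After that, the glob '*.jpg' hands the frames to ffmpeg in the intended order.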
This is by no means a "useless use of cat".
some_command | cat | some_command
This isn't a traditional "useless use of cat", which usually stems from unfamiliarity with the shell. Instead, this appears to be a deliberate attempt to exploit the dynamics of cat. In this case I believe it's acting as a cache.
My Second thoughts
Even if the size of the reads and writes is no different, there are a couple of effects, hard to detect directly, which could also be in play.
Firstly (and this is very important): see Why is processing a sorted array faster than processing an unsorted array?. If you do anything to change the sequence in which the CPU processes the data, the timing may change. If cat succeeds in letting each sort run for longer without suspending (and switching to a different process), this could dramatically affect the CPU's branch prediction and result in a much larger or smaller run time.
Secondly, even if the number and size of the reads are unaffected, the number of times a task has to suspend (block) may differ, and that alone is likely to add or remove overhead. So even if the reads and writes are the same size, the cat (caching) layer might be reducing the number of times each read() and write() occurs.
cat might simply force each sort to wait longer, so that it has more data to work through before it next suspends, reducing the number of times each process blocks. This would be very difficult to detect.
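A rough way to probe this (a sketch; the exact pipeline and input file are assumptions, since the original command isn't shown) is to time the pipeline with and without the cat layers on the same data:

```shell
# Generate some unsorted sample data (an assumption about the real input)
seq 1000000 | shuf > data.txt

# Same pipeline, with and without the interposed cat layers
time sort data.txt | sort | sort > /dev/null
time sort data.txt | cat | sort | cat | sort > /dev/null
```

Both pipelines produce identical output, so any reproducible timing difference must come from the extra cat layers, not from the work being done.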
My First thoughts
My expectation here would be that if you put both versions in their own script and ran strace -f on each, you would see fewer read/write calls in the example with cat; at the very least, I would expect to see much larger reads at each layer when cat is involved. My expectation for sort is that it writes single lines and doesn't buffer much internally: it probably read()s in large enough blocks but only write()s single lines. That means it's not well designed for piping into itself.
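The expectation above can be checked with strace's summary mode, which counts syscalls across the whole pipeline (a sketch; data.txt is a stand-in for the real input):

```shell
printf '%s\n' banana apple cherry > data.txt   # tiny stand-in input

# -f follows the forked children, -c prints a per-syscall count summary,
# -e trace=read,write restricts the trace to the calls we care about
strace -f -c -e trace=read,write sh -c 'sort data.txt | sort > /dev/null'
strace -f -c -e trace=read,write sh -c 'sort data.txt | cat | sort > /dev/null'
```

Comparing the read/write counts between the two summaries shows whether cat changes the number (or size) of calls.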
As laktak points out in his answer, cat reads in blocks of 128 KB (see here), but pipes typically only buffer 64 KB. If I'm right, then while cat is suspended waiting for a read() to complete, this gives a large (128 KB + 64 KB) buffer for the upstream sort to write into without ever needing to suspend. By the time cat resumes, there will be a good chunk of data (much more than sort sends in a single write) to pass on to the downstream sort, which can then read quite a lot without being suspended.
I also suspect that adding a layer of cat closest to the files would have next to no impact, or a negative one, on performance: those files are already cached on your RAM disk. But the layers in between the calls to sort act as buffers and should reduce the number of blocking reads and writes. That is, the truly "useless uses of cat" are those which use cat merely to read from a file, i.e. those of the form:
cat some_file | some_command
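In that form the shell can hand the file straight to the command with a redirection, saving a process and an extra copy through a pipe. A quick sketch (the file name and contents are made up):

```shell
printf 'b\na\nc\n' > some_file

cat some_file | sort    # useless use of cat: extra process, extra copy through a pipe
sort < some_file        # same output; sort reads the file directly
```

Both commands print the same sorted lines; only the second avoids the needless cat.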
An interesting experiment
I would be interested to know whether the same effect can be induced by increasing the buffer size on the pipes. If you set up the same pipeline from a proper programming language (not a shell), e.g. in C using pipe(), dup2(), fork() and exec(), you could call fcntl() with F_SETPIPE_SZ on each pipe first to raise its buffer size (see "Pipe Capacity" in pipe(7)).
Best Answer
If you have an old computer, any codec will cause problems for real-time encoding (not just for the CPU but for the disk as well). I suggest finding a resolution low enough to reduce the file size. If you want custom resolutions and frame rates, MPEG-1/MPEG-2 can't be used. Choose the resolution and frame rate wisely.
Here are my suggested switches for ffmpeg. I use 800x600 (this may be too low) at a frame rate of 15 fps; for better performance, lower the frame rate from 15 to 10 fps. In my experience x264 is a fast codec and allows custom resolutions and frame rates.
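A sketch of such an invocation, assuming libx264; the input name, preset and audio settings are placeholders rather than the author's exact switches:

```shell
# 800x600 at 15 fps with x264, tuned for speed on a slow machine
ffmpeg -i input.avi \
  -c:v libx264 -preset ultrafast -tune zerolatency \
  -s 800x600 -r 15 \
  -c:a aac -b:a 96k \
  output.mp4
```

Dropping -r to 10 trades smoothness for a lighter encoding load, as described above.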
Here is a setup for MPEG-2 (which is faster but restricted in resolution and frame rate):
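Again a sketch rather than the original command: MPEG-2 encodes faster but only accepts standard frame rates (e.g. 25 or 29.97 fps) and needs an explicit bitrate, so the values here are assumptions:

```shell
# MPEG-2: cheap to encode, but resolution/frame rate must stay standard
ffmpeg -i input.avi \
  -c:v mpeg2video -b:v 2M \
  -s 720x576 -r 25 \
  -c:a mp2 -b:a 128k \
  output.mpg
```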