Gentoo – Why Recommend -j3 Option for Make on Dual-Core CPU?

Tags: gentoo, make

In Gentoo Linux it is possible to set the MAKEOPTS variable in /etc/portage/make.conf to tell make how many jobs it should run in parallel when building packages. Since I have a dual-core CPU, I naively chose the -j2 option: one job per core, so both have something to do. The "problem" is that a lot of references tell users with a dual-core CPU to set -j3 instead.

For example, the Gentoo handbook says:

A good choice is the number of CPUs (or CPU cores) in your system plus one, but this guideline isn't always perfect.

But what is the rationale for the "CPUs + 1" rule? Why the extra job?

The make.conf(5) man page even says:

Suggested settings are between CPUs+1 and 2*CPUs+1.

I also read section 5.4 (Parallel Execution) of the make info manual and the make man page's explanation of the -j option, but it seems there's no answer there.
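For context, the setting under discussion lives in /etc/portage/make.conf. A minimal sketch (the -j3 value here is the handbook's suggestion for dual-core, not a measured optimum):

```shell
# /etc/portage/make.conf (excerpt)
# Dual-core CPU: the handbook suggests CPUs + 1 parallel jobs.
MAKEOPTS="-j3"
```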

Best Answer

There isn't a simple rule that always works. People might recommend a particular figure because they experimented with a particular compilation on a particular machine and this was the best setting, or because they followed some reasoning that may or may not have some relation with reality.

If you're blessed with a lot of RAM, then the limiting factor in a long compilation will be CPU time. Then one task per CPU, plus one pending task for those occasional I/O blocks, is a good setting. That makes it -j3 for a dual-core CPU (or more precisely, for a dual-CPU machine — if each core is hyperthreaded, that would be 4 CPUs, so -j5).
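The "CPUs + 1" figure is easy to derive mechanically. A sketch using nproc from GNU coreutils:

```shell
# Derive "-jN" from the CPUs-plus-one guideline.
cpus=$(nproc)          # number of available CPU cores
jobs=$((cpus + 1))     # one extra job to cover occasional I/O stalls
echo "MAKEOPTS=\"-j${jobs}\""
```

On a dual-core machine this prints MAKEOPTS="-j3"; on a hyperthreaded dual-core (4 logical CPUs) it prints MAKEOPTS="-j5".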

If you have very little RAM, then a limiting factor may be that you can't have many concurrent jobs, or else they'll keep making each other swap out. For example, if you can't comfortably fit two compiler instances in memory, make -j2 may already be slower than make. Since this is dependent on how many compiler processes you can fit in RAM at once, there's no way to derive a general figure.
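One way to act on this is to cap the job count by available RAM as well as by CPUs. The 2 GiB-per-job figure below is an illustrative assumption, not a measurement; substitute whatever one compiler instance actually needs for your builds:

```shell
# Cap parallel jobs both by RAM (assumption: ~2 GiB per compiler job)
# and by the CPUs + 1 guideline; take the smaller of the two.
mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
by_ram=$(( mem_kib / (2 * 1024 * 1024) ))   # jobs that fit in RAM
by_cpu=$(( $(nproc) + 1 ))                  # the CPUs + 1 guideline
jobs=$(( by_ram < by_cpu ? by_ram : by_cpu ))
[ "$jobs" -ge 1 ] || jobs=1                 # always run at least one job
echo "-j${jobs}"
```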

In between, it may be beneficial to have more jobs. If each compiler process is small, but the build as a whole touches a lot of data, then disk I/O may be the blocking factor. In this case, you'll want several jobs per CPU at once, so that there is always one job using each CPU while others are waiting for I/O. Again, this is very dependent on the build job and on the available RAM, here on what's available for data cache (there's an optimum after which having too many jobs pollutes the cache too much).
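Since the optimum depends on the build and the machine, measuring beats guessing. A toy demonstration, with sleep standing in for compilation (for real tuning, time a clean build of an actual package at each -j value instead):

```shell
# Compare wall-clock time at different -j values on a toy Makefile
# whose three targets each just sleep for one second.
workdir=$(mktemp -d)
cd "$workdir"
printf 'all: a b c\na b c:\n\tsleep 1\n' > Makefile
for j in 1 3; do
    start=$(date +%s)
    make -j"$j" >/dev/null
    echo "-j${j}: $(( $(date +%s) - start ))s"
done
```

With three independent one-second targets, -j3 finishes in roughly a third of the -j1 time; real builds show smaller, workload-dependent gains.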
