Linux – What could be causing make to hang when compiling on multiple cores

Tags: compiling, linux, make

Yesterday I was trying to compile the ROOT package from source. Since I was compiling it on a 6 core monster machine, I decided to build on multiple cores with make -j 6. The compile went smoothly and really fast at first, but at some point make hung, using 100% CPU on just one core.

I did some googling and found this post on the ROOT message boards. Since I built this computer myself, I was worried that I hadn't properly applied the heatsink and the CPU was overheating or something. Unfortunately, I don't have a fridge here at work that I can stick it in. 😉

I installed the lm-sensors package and ran make -j 6 again, this time monitoring the CPU temperature. Although it got high (close to 60 °C), it never exceeded the high or critical thresholds.

I tried running make -j 4 but again make hung sometime during the compile, this time at a different spot.

In the end, I compiled with plain make and it worked fine. My question is: why was it hanging? Since it stopped at two different spots, I would guess it was some sort of race condition, but I would think make should be clever enough to get everything in the right order, since it offers the -j option.

Best Answer

I don't have an answer to this precise issue, but I can try to give you a hint of what may be happening: Missing dependencies in Makefiles.

Example:

target: a.bytecode b.bytecode
    link a.bytecode b.bytecode -o target

a.bytecode: a.source
    compile a.source -o a.bytecode

b.bytecode: b.source
    compile b.source a.bytecode -o b.bytecode

If you run make target, everything compiles correctly. The compilation of a.source runs first (an arbitrary but deterministic choice), followed by the compilation of b.source, so a.bytecode already exists when the second compile needs it.

With make -j2 target, however, both compile commands run in parallel, and the broken dependencies show up: the second compile assumes a.bytecode already exists, but a.bytecode is not listed among the prerequisites of b.bytecode, so an error is likely. The correct dependency line for b.bytecode is:

b.bytecode: b.source a.bytecode
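The example above can be reproduced with a small script. This is only a sketch, assuming a POSIX shell and a make that accepts -j: cp and cat stand in for the hypothetical compile and link commands, the sleep is added just to make the race easy to hit, and the names workdir, write_makefile and clean are made up for illustration.

```shell
#!/bin/sh
# Sketch: a missing prerequisite that only shows up under parallel make.
# cp and cat are stand-ins for the hypothetical compile and link commands.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
echo "source a" > a.source
echo "source b" > b.source

# Write the Makefile; $1 is the prerequisite list for b.bytecode.
# Recipes follow ';' on the rule line, so no literal tabs are needed.
# The sleep makes a.bytecode slow to appear, so the race is easy to hit.
write_makefile() {
    cat > Makefile <<EOF
target: a.bytecode b.bytecode ; cat a.bytecode b.bytecode > target
a.bytecode: a.source ; sleep 1 && cp a.source a.bytecode
b.bytecode: $1 ; cat b.source a.bytecode > b.bytecode
EOF
}

clean() { rm -f target a.bytecode b.bytecode; }

# Broken rule: b.bytecode reads a.bytecode but does not depend on it.
write_makefile "b.source"
clean; make     >/dev/null 2>&1; serial_broken=$?
clean; make -j2 >/dev/null 2>&1; parallel_broken=$?

# Fixed rule: the dependency on a.bytecode is declared.
write_makefile "b.source a.bytecode"
clean; make -j2 >/dev/null 2>&1; parallel_fixed=$?

echo "broken Makefile, serial:   exit status $serial_broken"
echo "broken Makefile, parallel: exit status $parallel_broken"
echo "fixed Makefile,  parallel: exit status $parallel_fixed"
```

The serial build of the broken Makefile should succeed, the parallel build of the same Makefile should fail because a.bytecode does not exist yet when it is read, and the parallel build with the declared dependency should succeed again. In this toy case the symptom is a clean error; a real tool that reads a missing or half-written file may misbehave in less obvious ways.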

To come back to your problem: if you are unlucky, a missing dependency can leave a command stuck in a 100% CPU loop instead of failing outright. That is probably what is happening here. A sequential build could not reveal the missing dependency, but your parallel build did.
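If the build hangs again, it may help to see which command is actually spinning, since its inputs point at the rule with the missing dependency. A minimal sketch, assuming a ps that supports the standard -e and -o options; the function name busiest is made up for illustration.

```shell
#!/bin/sh
# List the processes with the highest CPU usage, busiest first.
# pid, ppid, pcpu, etime and args are standard ps output columns.
# Sorting numerically on column 3 (pcpu) pushes the header line to the end.
busiest() {
    ps -eo pid,ppid,pcpu,etime,args | sort -k3,3 -nr | head -n "${1:-5}"
}

# Run this in a second terminal while make appears to be stuck.
busiest 5
```

The args column shows the full command line of each process, so a compiler or generator step sitting near 100% should be easy to match against the rule that launched it.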
