Multi-threaded CPU-intensive task throttles CPU well before temperature limits

cpu hardware performance temperature

I have written a very CPU-intensive threaded task that works as expected on my 2012 quad-core MacBook Pro. I turn it loose with 20 threads, and the temperature climbs to about 100 °C, as measured with Intel Power Gadget, with minimal throttling.

When I take the same program and data files home to my 2016 13" dual-core MacBook Pro and start it up, I would expect it to also hold 3.3-3.4 GHz until the temperature gets near the 100 °C mark. The top command shows the task at 350% (2 cores, each dual-threaded), but the CPU frequency gets cut to 1.6-1.8 GHz with the temperature at only 60 °C or so and the fans dead quiet. If I start 4 separate single-threaded CPU tasks, the machine behaves as expected: it holds 3.3-3.4 GHz until it hits 100 °C and the fans get cranking. The question is: why is my CPU being throttled?

Both machines are up to date and running the same version of gcc. Even if I take the binary from the working machine and put it on the 2016 Mac, it has the same problem.

If I run 3 or 4 single-threaded CPU tasks so the machine is going at full speed and then start the threaded program, the frequency drops as well.

Both machines have 16 GB of RAM.

Edit

After playing around with the code, I suspect that it is getting throttled when a task creates too many threads. In this program, I take each record I read and create a thread for it. I only let 20 or so threads run at a time, so there are never more than 21 threads at once, but there are 14,400,000 records to be processed, so over the 30 minutes or so of the run each of those records is processed by a separate thread.
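For scale, 14,400,000 records in roughly 30 minutes works out to about 8,000 thread creations per second, so with 20 threads in flight each thread lives for only about 2.5 ms on average. One way to check whether thread churn alone is expensive on a given machine is to time bare create/join pairs. The following is a minimal sketch, not the original program: `create_join_usec` and `noop` are hypothetical names, and it assumes a POSIX system where `clock_gettime(CLOCK_MONOTONIC, ...)` is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Thread body that does no work, so only creation/teardown is timed. */
static void *noop(void *arg) { return arg; }

/* Hypothetical helper: average microseconds per pthread_create +
 * pthread_join pair over n threads, run one after another.
 * Returns -1.0 if n is not positive or a thread cannot be created. */
double create_join_usec(int n) {
    if (n <= 0) return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, noop, NULL) != 0) return -1.0;
        pthread_join(tid, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (double)(t1.tv_sec - t0.tv_sec)
               + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return sec * 1e6 / n;
}
```

Calling `create_join_usec(100000)` from `main` on both laptops (built with `gcc -O2 -pthread`) and comparing the result against the roughly 2.5 ms each thread lives would show how much of every thread's lifetime is spent on creation and teardown rather than on the record itself.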

I created a trivial pthread program that just burned CPU time and ran it with 10 threads. The problem laptop ran that and warmed up to 95 °C without issues.
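The test program itself is not shown in the post. A minimal sketch of what such a CPU burner could look like is below; `burn_cpu`, `spin`, and `MAX_SPINNERS` are hypothetical names, and the integer arithmetic in the loop is an arbitrary choice that touches almost no memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define MAX_SPINNERS 64  /* arbitrary cap for this sketch */

struct spinner { double seconds; uint64_t batches; };

/* Monotonic wall-clock time in seconds. */
static double mono_now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Pure arithmetic loop: no I/O, no locks, almost no memory traffic. */
static void *spin(void *p) {
    struct spinner *s = p;
    double end = mono_now() + s->seconds;
    volatile uint64_t x = 1;  /* volatile keeps the loop from being optimized away */
    do {
        for (int i = 0; i < 100000; i++)
            x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        s->batches++;
    } while (mono_now() < end);
    return NULL;
}

/* Run nthreads spinners for about `seconds` each.
 * Returns the total number of batches completed, or 0 on bad input. */
uint64_t burn_cpu(int nthreads, double seconds) {
    if (nthreads < 1 || nthreads > MAX_SPINNERS) return 0;

    pthread_t tid[MAX_SPINNERS];
    struct spinner sp[MAX_SPINNERS];
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        sp[i].seconds = seconds;
        sp[i].batches = 0;
        if (pthread_create(&tid[i], NULL, spin, &sp[i]) != 0) break;
        started++;
    }

    uint64_t total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += sp[i].batches;
    }
    return total;
}
```

A call such as `burn_cpu(10, 600.0)` mirrors the 10-thread experiment. Note that a loop like this never waits on memory, locks, or I/O, which may be exactly where it differs from the real workload.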

I guess I will rewrite my code to reuse the same threads instead of destroying them and starting new ones.

Update 5/13/17

After several hours of work, it now creates only n threads and reuses them, but that didn't help. Other than CPU temperature, what will cause this machine to throttle down?
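The rewritten code is not shown, so the following is only a sketch of the general "n long-lived threads" pattern, under the assumption that records can be identified by an index. `run_pool`, `worker`, and `process_record` are hypothetical names, and `process_record` is an empty stand-in for the real per-record work.

```c
#include <pthread.h>

#define MAX_WORKERS 64  /* arbitrary cap for this sketch */

struct pool {
    pthread_mutex_t lock;
    long next;      /* next unclaimed record index */
    long nrecords;  /* total number of records */
    long done;      /* records processed so far */
};

/* Hypothetical stand-in for the real per-record work. */
static void process_record(long i) { (void)i; }

/* Each worker lives for the whole run and claims records one at a time. */
static void *worker(void *p) {
    struct pool *pl = p;
    long mine = 0;
    for (;;) {
        pthread_mutex_lock(&pl->lock);
        long i = (pl->next < pl->nrecords) ? pl->next++ : -1;
        pthread_mutex_unlock(&pl->lock);
        if (i < 0) break;  /* no records left */
        process_record(i);
        mine++;
    }
    pthread_mutex_lock(&pl->lock);
    pl->done += mine;
    pthread_mutex_unlock(&pl->lock);
    return NULL;
}

/* Process nrecords records with nthreads reusable threads.
 * Returns the number of records processed, or -1 on bad input. */
long run_pool(int nthreads, long nrecords) {
    if (nthreads < 1 || nthreads > MAX_WORKERS || nrecords < 0) return -1;

    struct pool pl;
    pthread_mutex_init(&pl.lock, NULL);
    pl.next = 0;
    pl.nrecords = nrecords;
    pl.done = 0;

    pthread_t tid[MAX_WORKERS];
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, &pl) != 0) break;
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(tid[i], NULL);

    pthread_mutex_destroy(&pl.lock);
    return pl.done;
}
```

Even with reused threads, every record still passes through one shared lock here. If the real program hands out records in a similar way, claiming them in batches rather than one at a time would be one way to reduce that contention.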

Best Answer

This may be a long shot, but perhaps the difference in single-core performance and/or cache performance between the 2012 and 2016 CPU packages is large enough that the cores are data-starved and clocking down until they have work to do again?

I'm making that guess because you indicate enough single-thread processes can run full speed on all cores, and a simple multi-thread program can run full speed on all cores.

That makes me think there is something in the program design of your real workload, as opposed to the test multi-thread workload, that isn't letting the CPUs work all the time.
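One rough way to test part of this guess is to compare the CPU time the process consumes against elapsed wall time. The sketch below assumes a POSIX system with `getrusage` and `clock_gettime`; `mark_usage`, `busy_cores`, and `struct usage_mark` are hypothetical names introduced for illustration.

```c
#define _XOPEN_SOURCE 700
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Snapshot of wall-clock time and total process CPU time, in seconds. */
struct usage_mark { double wall; double cpu; };

struct usage_mark mark_usage(void) {
    struct usage_mark m;
    struct timespec ts;
    struct rusage ru;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    getrusage(RUSAGE_SELF, &ru);  /* user + system time of all threads */

    m.wall = (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
    m.cpu  = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6
           + (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
    return m;
}

/* Average number of logical cores kept busy between two snapshots. */
double busy_cores(struct usage_mark a, struct usage_mark b) {
    double wall = b.wall - a.wall;
    return wall > 0.0 ? (b.cpu - a.cpu) / wall : 0.0;
}
```

Taking one snapshot before the threads start and another after they finish gives a figure comparable to the 350% that top reports (about 3.5 busy logical cores). A value well below the number of logical cores would point to threads blocked on locks or I/O. This measurement has a limit, though: a thread stalled on cache misses still counts as busy, so it cannot confirm or rule out memory starvation on its own.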