Is the L3 cache shared by all cores on a Sandy Bridge-E Xeon CPU?

cache performance xeon

In a related question I asked about the benefit of a dual-CPU system in terms of doubling the L3 cache.

However, I have noticed that the Xeon E5-2600 series of CPUs has exactly 2.5 MB of L3 cache per core.

This leads me to believe that the operating system reserves 2.5 MB of L3 cache per core. However, I also have the contradictory impression that the L3 cache is shared among all cores. There is surprisingly little information or discussion about this.

My major concern is whether low-priority background applications might "hog" the L3 cache and slow down performance for higher-priority foreground applications. Two specific performance problems that I have motivate this question.

  1. Compiling a certain C++ program requires 25 minutes on my current development system in VS 2008, whereas on another system it goes vastly faster, requiring only 5 minutes on VS 2008 with identical settings – despite the fact that I have a near high-end i7-970 CPU and sufficient RAM.

  2. Programs often take up to 20 seconds to start (i.e., to display their main window) on my system; on a related note, the Windows shell requires up to 10 seconds to display the Windows Explorer context menu (related behaviors take about as long), despite my attempts to limit the context menu entries (there are currently perhaps 10 additional ones beyond the default).

My system is certainly loaded with a very large number of applications that I have installed (and uninstalled) over the years, but I do my best to streamline the system nonetheless.

I also have many low-priority background applications running, in particular redundant cloud backup software such as CrashPlan; together they typically account for about 25% of total CPU utilization on this 6-core, 12-thread system.

I will be getting a new computer. I know that I will continue to be running many background applications, and installing/uninstalling many programs. If I thought that getting a dual-CPU system that doubles not only the cores but the L3 cache would assist with overcoming the horrible C++ compiler performance and the general system slow-down, I would gladly do it.

There should be no reason why a high-end system operates so slowly, even with many programs and background applications. But if my problems will occur no matter how much CPU power and L3 cache I give the system, simply because I do have so many programs and background applications installed and running, I don't want to waste an additional $2,500 on a dual-CPU system that won't help solve my problem.

Any suggestions, in particular regarding my question about whether the L3 cache is shared among all cores (such that low-priority background applications might conceivably be hogging the L3 cache, slowing down higher-priority programs), or rather if it is tied to individual cores, would be appreciated.

Best Answer

On these CPUs, each physical core has its own L2 cache. The L3 cache is shared by all cores and is inclusive: any data that resides in any core's L2 cache also resides in the L3 cache.
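As a rough back-of-the-envelope sketch of what inclusivity costs, the arithmetic can be written out as below. The constants are assumptions taken from this thread (2.5 MB of L3 per core from the question, 256 KB of L2 per core on these parts), the function names are made up for illustration, and the much smaller L1 caches are ignored.

```cpp
#include <cstddef>

// Sizes in KiB. Both per-core figures are assumptions taken from this thread.
constexpr std::size_t kL3PerCoreKiB = 2560;  // 2.5 MB of L3 per core
constexpr std::size_t kL2PerCoreKiB = 256;   // 256 KB of private L2 per core

// Total L3 on a die with `cores` cores.
constexpr std::size_t total_l3_kib(std::size_t cores) {
    return cores * kL3PerCoreKiB;
}

// Upper bound on the L3 space holding copies of L2 contents,
// since an inclusive L3 also holds whatever is in every core's L2.
constexpr std::size_t l2_copies_kib(std::size_t cores) {
    return cores * kL2PerCoreKiB;
}

// What remains for all cores to compete over.
constexpr std::size_t contended_l3_kib(std::size_t cores) {
    return total_l3_kib(cores) - l2_copies_kib(cores);
}
```

Under these assumptions an 8-core part has 20 MB of L3, of which at most 2 MB mirrors the L2 caches; the remaining 18 MB is not partitioned per core.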

While this may seem a waste of L3 space, it actually makes the L3 invaluable for accelerating inter-core memory operations. The primary purpose of the L3 cache is to act as a switchboard and staging area for the cores. For example, if one core wants to know whether a region of memory might be cached by another core, it can check the L3 cache. If information was processed by one core and next needs to be processed by another core, they hand it off through the L3 cache rather than through the slower off-chip memory. Beyond that, its performance impact is small except for unusual algorithms: the L2 cache is big enough for small working sets, and the L3 cache is too small for big ones.

So while each core does have its own 256 KB L2 cache and effectively 256 KB reserved in the L3 cache, the balance is shared by all cores. Less important activity on other cores can harm the performance of a more important task that benefits from using L3 space. But for the reasons I mentioned, the effect is generally not significant in practice, and it's generally not worth worrying about beyond optimizing "bulk data" operations (such as compression and scanning) to minimize cache pollution, for example by using non-temporal operations.
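To illustrate what "non-temporal operations" can look like in a bulk copy, here is a minimal sketch using the SSE2 streaming-store intrinsic `_mm_stream_si128`. The helper name `stream_copy` is hypothetical, and the code falls back to a plain `memcpy` on non-x86 targets or when the destination is not 16-byte aligned. Streaming stores are a hint that the written data will not be reused soon; whether they help in a given workload has to be measured.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // _mm_loadu_si128, _mm_stream_si128, _mm_sfence
#define HAVE_SSE2_STREAM 1
#endif

// Hypothetical helper: copy `bytes` bytes from src to dst, writing the
// destination with non-temporal (streaming) stores where possible so the
// written data is less likely to displace other lines in the caches.
void stream_copy(void* dst, const void* src, std::size_t bytes) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
#ifdef HAVE_SSE2_STREAM
    // _mm_stream_si128 requires a 16-byte-aligned destination.
    if (reinterpret_cast<std::uintptr_t>(d) % 16 == 0) {
        while (bytes >= 16) {
            __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s));
            _mm_stream_si128(reinterpret_cast<__m128i*>(d), v);
            d += 16;
            s += 16;
            bytes -= 16;
        }
        _mm_sfence();  // order the streamed stores before later stores
    }
#endif
    // Remaining tail, unaligned destination, or non-x86 fallback.
    std::memcpy(d, s, bytes);
}
```

A backup or compression tool that writes large output buffers it will not read again is the kind of place where a routine like this might reduce pressure on the shared L3.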
