Linux – Using Linux cgroups to balance CPU performance

cgroups, linux, performance

I have two dual-core Linux systems with relatively recent kernels and cgroups set up; one is running Debian Squeeze, the other Ubuntu 11.04 Natty Narwhal. I've gotten CPU load balancing with cgroups working a bit better on the Debian system, despite its older kernel. But it's not right for everything, and the specific oddity I'm asking about here happens on both systems.

If you read Resource Management in Linux with Control Groups, it gives an example showing how to reproduce the problem. Here's the Ubuntu version (run this as root):

cd /sys/fs/cgroup/cpu
    [On Debian Squeeze start at /mnt/cgroups/cpu instead]
mkdir low high
echo 512 > low/cpu.shares
echo 2048 > high/cpu.shares
yes low > /dev/null &
echo $! > low/tasks
yes high > /dev/null &
echo $! > high/tasks
ps -C yes -opid,%cpu,psr,args
    [repeat that a few times]
killall -9 yes
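
One extra check worth doing before that final killall (my own addition, not part of the recipe above): confirm the share values took and that the PIDs really landed in the right groups:

cat low/cpu.shares high/cpu.shares
    [should print 512 and 2048]
cat low/tasks high/tasks
    [each file should list exactly one yes PID]
cat /proc/$(pgrep -f "yes high")/cgroup
    [the cpu controller line should end in /high]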

I was expecting the "high" process to be allocated more time than the "low" one; what actually happens with this test case is always more like this:

root@black:/sys/fs/cgroup/cpu# ps -C yes -opid,%cpu,psr,args
  PID %CPU PSR COMMAND
 3105 88.3   1 yes low
 3106 94.5   0 yes high

The CPU usage figures are nearly equal. Here's my question: why is that happening?

In the presentation, this problem is shown to go away by pinning each process to the same CPU; here are the additional lines to test that:

taskset -c 1 yes high > /dev/null &
echo $! > high/tasks
taskset -c 1 yes low > /dev/null &
echo $! > low/tasks
ps -C yes -opid,%cpu,psr,args
[later, rinse, repeat]
killall -9 yes

The result then is what I was expecting to see all the time: the "high" process getting a much higher percentage of the CPU:

root@black:/sys/fs/cgroup/cpu# ps -C yes -opid,%cpu,psr,args
  PID %CPU PSR COMMAND
 3128 83.3   1 yes high
 3129 20.7   1 yes low
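
For a less jumpy number than the instantaneous %CPU from ps, cumulative usage per group can also be read from the cpuacct controller; on this Ubuntu layout I'd expect it to be a separate hierarchy, so the groups have to be recreated there (treat the path as an assumption):

cd /sys/fs/cgroup/cpuacct
mkdir low high
echo $(pgrep -f "yes low") > low/tasks
echo $(pgrep -f "yes high") > high/tasks
sleep 10
cat low/cpuacct.usage high/cpuacct.usage
    [total CPU time consumed by each group so far, in nanoseconds]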

Explaining why this works would also be a useful step toward figuring out why the earlier case doesn't.

Best Answer

I've gotten an initial explanation of this test case from Stefan Seyfried, who wrote the paper this example was taken from. The problem here is that the CPU scheduler parts of cgroups always aim to keep any available CPU busy; they never enforce a hard limit if everything fits within the available CPU time.

When only two processes (high and low here) are running on two or more cores, the scheduler simply keeps high on one core and low on the other. Both then run all the time at close to 100% usage, because they can do so without the scheduler ever having to ration CPU time between them. cpu.shares scheduling only kicks in when there's a shortage.
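
A way to check that explanation without pinning anything is to oversubscribe the cores so the scheduler is forced to ration; this is my own sketch rather than anything from the paper, e.g. two busy loops per group on a dual-core machine:

cd /sys/fs/cgroup/cpu
for i in 1 2; do
    yes low > /dev/null &
    echo $! > low/tasks
    yes high > /dev/null &
    echo $! > high/tasks
done
ps -C yes -opid,%cpu,psr,args
    [with four runnable processes on two cores there's now a shortage,
     so the high processes should pull ahead, ideally approaching the
     4:1 ratio implied by 2048 vs. 512 shares]
killall -9 yes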

In the second case, both processes are pinned to the same CPU. Then the CPU sharing logic has to do something useful with the relative cpu.shares numbers to balance them out, and it does that as hoped.
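
The same contention can be created with the cpuset controller instead of taskset, confining both groups to one core; this is a sketch assuming the usual Ubuntu mount point for cpuset, not something from the paper:

cd /sys/fs/cgroup/cpuset
mkdir onecpu
echo 1 > onecpu/cpuset.cpus
echo 0 > onecpu/cpuset.mems
    [cpuset.mems must be set before any tasks can be added]
yes low > /dev/null &
echo $! > onecpu/tasks
echo $! > /sys/fs/cgroup/cpu/low/tasks
yes high > /dev/null &
echo $! > onecpu/tasks
echo $! > /sys/fs/cgroup/cpu/high/tasks
ps -C yes -opid,%cpu,psr,args
    [both processes now compete on CPU 1, so the cpu.shares ratio applies]
killall -9 yes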

Hard limits on CPU usage aren't likely to appear until the CFS Bandwidth Control patch set hits the mainline kernel. At that point it may be possible to get something more like what I was hoping for.
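
Once that bandwidth control work lands, setting a hard cap should look roughly like this, based on the interface files described in the patch series (a sketch; neither of these systems has them yet):

echo 100000 > high/cpu.cfs_period_us
    [100 ms accounting period]
echo 50000 > high/cpu.cfs_quota_us
    [the high group is then limited to 50 ms of CPU per period,
     i.e. half a core, even when the machine is otherwise idle]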
