Unprivileged LXC containers are the ones making use of user namespaces (userns), i.e. a kernel feature that allows a range of UIDs on the host to be mapped into a namespace inside which a user with UID 0 can exist again.
Contrary to my initial perception of unprivileged LXC containers, this does not mean that the container has to be owned by an unprivileged host user. That is only one possibility.
What is relevant is:

- that a range of subordinate UIDs and GIDs is defined for the host user (`usermod [-v|--add-subuids] [-w|--add-subgids]`)
- ... and that this range is mapped in the container configuration (`lxc.id_map = ...`)
So even `root` can own unprivileged containers, since the effective UIDs of container processes on the host will end up inside the range defined by the mapping.
However, for `root` you have to define the subordinate IDs first. Unlike users created via `adduser`, `root` will not have a range of subordinate IDs defined by default.
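For example, to give `root` such a range you could run something like the following (a sketch; the range values are purely illustrative, pick ones that are free on your system):

```shell
# allocate subordinate UIDs and GIDs to root (illustrative range)
usermod -v 100000-399999 root   # -v / --add-subuids
usermod -w 100000-399999 root   # -w / --add-subgids

# the result is recorded in /etc/subuid and /etc/subgid, e.g.:
# root:100000:300000
grep '^root:' /etc/subuid /etc/subgid
```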
Also keep in mind that the full range you give is at your disposal, so you could have 3 containers with the following configuration lines (only UID mapping shown):

```
lxc.id_map = u 0 100000 100000
lxc.id_map = u 0 200000 100000
lxc.id_map = u 0 300000 100000
```

NB: as per a comment, recent versions call this `lxc.idmap`!

This assumes that `root` owns the subordinate UIDs between 100000 and 400000. All documentation I found suggests using 65536 subordinate IDs per container; some use 100000 to make it more human-readable, though.
In other words: You don't have to assign the same range to each container.
With over 4 billion (~ 2^32) possible subordinate IDs, that means you can be generous when dealing out subordinate ranges to your host users.
Unprivileged container owned and run by root
To drive the point home: an unprivileged LXC guest does not have to be run by an unprivileged user on the host.
Configuring your container with a subordinate UID/GID mapping like this:

```
lxc.id_map = u 0 100000 100000
lxc.id_map = g 0 100000 100000
```

where the user `root` on the host owns that given subordinate ID range, will allow you to confine guests even better.
However, there is one important additional advantage in such a scenario (and yes, I have verified that it works): you can auto-start your container at system startup.
Usually when scouring the web for information about LXC you will be told that it is not possible to autostart an unprivileged LXC guest. However, that is only true by default for those containers which are not in the system-wide storage for containers (usually something like `/var/lib/lxc`). If they are (which usually means they were created by root and are started by root), it's a whole different story.

```
lxc.start.auto = 1
```

will do the job quite nicely, once you put it into your container config.
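There are related start-up keys that can be useful alongside it (key names as in current LXC releases; check lxc.container.conf(5) on your system, since older versions may differ):

```
lxc.start.auto = 1    # start the container at host boot
lxc.start.delay = 5   # seconds to wait before starting the next container
lxc.start.order = 10  # relative start-up priority among autostarted guests
```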
Getting permissions and configuration right
I struggled with this myself a bit, so I'm adding a section here.
In addition to the configuration snippet included via `lxc.include`, which usually goes by the name `/usr/share/lxc/config/$distro.common.conf` (where `$distro` is the name of a distro), you should check whether there is also a `/usr/share/lxc/config/$distro.userns.conf` on your system and include that as well. E.g.:

```
lxc.include = /usr/share/lxc/config/ubuntu.common.conf
lxc.include = /usr/share/lxc/config/ubuntu.userns.conf
```
Furthermore, add the subordinate ID mappings:

```
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
```

which means that host UID 100000 is `root` inside the user namespace of the LXC guest.
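The mapping is just a fixed offset, so you can compute the host-side UID of any container UID; a quick sketch (the values assume the mapping above):

```shell
# host UID = mapping base + container UID (for IDs inside the mapped range)
BASE=100000        # third field of "lxc.id_map = u 0 100000 65536"
CONTAINER_UID=1000 # e.g. the first regular user inside the guest
echo $(( BASE + CONTAINER_UID ))   # → 101000
```

This is why files created by the guest's first regular user show up on the host as owned by UID 101000.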
Now make sure that the permissions are correct. If the name of your guest is stored in the environment variable `$lxcguest`, you would run the following:

```
# Directory for the container
chown root:root $(lxc-config lxc.lxcpath)/$lxcguest
chmod ug=rwX,o=rX $(lxc-config lxc.lxcpath)/$lxcguest
# Container config
chown root:root $(lxc-config lxc.lxcpath)/$lxcguest/config
chmod u=rw,go=r $(lxc-config lxc.lxcpath)/$lxcguest/config
# Container rootfs
chown 100000:100000 $(lxc-config lxc.lxcpath)/$lxcguest/rootfs
chmod u=rwX,go=rX $(lxc-config lxc.lxcpath)/$lxcguest/rootfs
```
This should allow you to run the container after your first attempt may have given some permission-related errors.
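With the permissions in place, a start attempt plus a status check might look like this (a sketch; these are standard LXC 1.x/2.x commands):

```shell
lxc-start -n "$lxcguest" -d   # start the guest detached
lxc-ls -f                     # list containers with state and IPs
lxc-info -n "$lxcguest"       # show state, PID and IP of the guest
```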
Best Answer
I wanted to post the answer to this question in case anyone else sees a similarly confusing result. It looks like I had two problems:

1. I needed to use the number of CPUs on the host, not the number of CPUs available in the cgroup's cpuset, to estimate CPU bandwidth: (# of CPUs on the host) * (cpu.cfs_period_us) * (0.25), so 40 * 100000 * 0.25 = 1000000.
2. My run of `stress-ng` inside the container was using the `cpu` and `cpuset` controllers of the `/lxc/foo` cgroup, while the run of `stress-ng` outside of the container was using the `/system/sshd.service` cgroup.

To better model my real-world application, I should have specified which controllers to use by using `cgexec`:
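For instance (a sketch: the cgroup path `/lxc/foo` and the `stress-ng` options mirror the setup above, and `cgexec` comes from the libcgroup tools):

```shell
# 25% of a 40-CPU host, with cpu.cfs_period_us = 100000:
NCPUS=40
PERIOD=100000
echo $(( NCPUS * PERIOD / 4 ))   # → 1000000, the cpu.cfs_quota_us to set

# run the load in the same cgroup the container uses, e.g.:
# cgexec -g cpu,cpuset:/lxc/foo stress-ng --cpu 4 --timeout 60s
```

Pinning both runs to the same `cpu`/`cpuset` controllers makes the inside-vs-outside comparison meaningful.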