Systemd Cgroups – Set a Default Resource Limit for All Users

cgroups systemd

I can set a memory limit for users like so:

systemctl set-property user-UID.slice MemoryHigh=24G

Is there a way for this to apply for all users? I would like each user to get 24G, not a total of 24G for all user processes (which I think would be the result of setting it on user.slice directly).

Best Answer

At first I thought there was no officially supported way to do this. (That is incorrect; see the note at the bottom.) An officially discouraged way, discouraged because it manipulates the cgroup directly, is as follows:

Create the following file as /etc/systemd/system/user@.service.d/set-memhigh.conf:

[Service]
Type=simple
ExecStartPost=+/root/set-memoryhigh.sh %i

Then create the following script as /root/set-memoryhigh.sh and make it executable:

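#!/bin/bash
# Called from the user@.service drop-in with the user's UID as $1.
exec >>/var/tmp/log.txt 2>&1 # for logging
set -x # for logging
# Enable the memory controller at every level down to the user's slice.
for d in /sys/fs/cgroup /sys/fs/cgroup/user.slice "/sys/fs/cgroup/user.slice/user-$1.slice"; do
  echo "+memory" >>"${d}/cgroup.subtree_control"
done
/bin/echo "24G" >> "/sys/fs/cgroup/user.slice/user-$1.slice/memory.high"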

You can check whether it worked by running:

cat /sys/fs/cgroup/user.slice/user-${UID}.slice/memory.high
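The kernel reports memory.high in bytes (or as max when no limit is set), so 24G should read back as 25769803776. A minimal sketch of a check that does the conversion for you; the function names to_bytes and check_memory_high are made up for this example, and the optional third argument exists only so the check can be pointed at a test directory instead of /sys/fs/cgroup:

```shell
#!/bin/sh
# to_bytes SIZE: convert 24G, 512M, 4096 or "max" into the form the kernel
# prints. Only upper-case K/M/G/T suffixes (powers of 1024) are handled.
to_bytes() {
  local size="$1"
  case "$size" in
    max) echo max ;;
    *K) echo $(( ${size%K} * 1024 )) ;;
    *M) echo $(( ${size%M} * 1024 * 1024 )) ;;
    *G) echo $(( ${size%G} * 1024 * 1024 * 1024 )) ;;
    *T) echo $(( ${size%T} * 1024 * 1024 * 1024 * 1024 )) ;;
    *) echo "$size" ;;
  esac
}

# check_memory_high UID EXPECTED [CGROUP_ROOT]
# Succeeds if the user's slice carries the expected limit.
check_memory_high() {
  local uid="$1" expected="$2" root="${3:-/sys/fs/cgroup}"
  local file="$root/user.slice/user-$uid.slice/memory.high"
  local actual
  actual=$(cat "$file") || return 2
  [ "$actual" = "$(to_bytes "$expected")" ]
}
```

For example, check_memory_high "$(id -u)" 24G && echo "limit is in place".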

If /sys/fs/cgroup/user.slice does not exist, the unified cgroup hierarchy is not enabled. Enable it as described in https://unix.stackexchange.com/a/452728/297666
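Another way to tell whether the unified hierarchy is mounted is to look for cgroup.controllers at the top of the mount, which only cgroup v2 provides. A rough sketch, with hypothetical function names and an optional root argument so it can be tried on a scratch directory:

```shell
#!/bin/sh
# is_unified [ROOT]: succeed if ROOT looks like a cgroup v2 mount.
is_unified() {
  local root="${1:-/sys/fs/cgroup}"
  [ -f "$root/cgroup.controllers" ]
}

# has_controller NAME [ROOT]: succeed if controller NAME is listed as
# available at the top of the unified hierarchy.
has_controller() {
  local name="$1" root="${2:-/sys/fs/cgroup}"
  grep -qw -- "$name" "$root/cgroup.controllers" 2>/dev/null
}
```

For example, is_unified && has_controller memory || echo "memory controller not available on the unified hierarchy".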

Although it works, I am not sure whether you will like this approach.

Note added on July 25: Creating the following file as /etc/systemd/system/user-1000.slice for each user (replacing 1000 with the user's UID) imposes a memory limit on that user. I verified this with systemd 237 on Ubuntu 18.04 and on Debian stretch with systemd 237 installed from stretch-backports:

[Slice]
Slice=user.slice
MemoryHigh=24G

The inconvenience is that this file has to be created for each user. With systemd 239 and later, the same settings can instead be placed in /etc/systemd/system/user-.slice.d/memory.conf, and the memory limit is then imposed on every user. However, systemd 239 has a bug (fixed in 240) that keeps this from working as intended. To work around the bug, create the following file as user-0.slice and run systemctl enable user-0.slice. This file only has to be created once, not for each user.

[Unit]
Before=systemd-logind.service
[Slice]
Slice=user.slice
[Install]
WantedBy=multi-user.target
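For systemd 240 or later, where the workaround is not needed, creating the drop-in could be scripted roughly as below. This is only a sketch: the function name write_user_slice_limit and its optional root prefix (useful for staging or testing) are made up here, and I am assuming the drop-in needs nothing more than the MemoryHigh= line in its [Slice] section.

```shell
#!/bin/sh
# write_user_slice_limit LIMIT [ROOT]
# Writes ROOT/etc/systemd/system/user-.slice.d/memory.conf so that every
# user-UID.slice picks up MemoryHigh=LIMIT. ROOT defaults to empty, i.e.
# the real /etc (which requires root).
write_user_slice_limit() {
  local limit="$1" root="${2:-}"
  local dir="$root/etc/systemd/system/user-.slice.d"
  mkdir -p "$dir" || return 1
  printf '[Slice]\nMemoryHigh=%s\n' "$limit" > "$dir/memory.conf"
}
```

After write_user_slice_limit 24G on the real system, run systemctl daemon-reload so that systemd reads the new file.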

Related Solutions

Cgroups – How to Limit All Processes Except Whitelist to a Single CPU

UPDATE: Note that the answer below applies to RHEL 6. In RHEL 7, most cgroups are managed by systemd, and libcgroup is deprecated.

Since posting this question I have studied the entire guide that I linked to above, as well as the majority of the cgroups.txt documentation and cpusets.txt. I now know more than I ever expected to learn about cgroups, so I'll answer my own question here.

There are multiple approaches you can take. Our company's contact at Red Hat (a Technical Architect) recommended against a blanket restriction of all processes in preference to a more declarative approach—restricting only the processes we specifically wanted restricted. The reason for this, according to his statements on the subject, is that it is possible for system calls to depend on user space code (such as LVM processes) which if restricted could slow the system down—the opposite of the intended effect. So I ended up restricting several specifically-named processes and leaving everything else alone.

Additionally, I want to mention some basic facts about cgroups that I was missing when I posted my question.

Cgroups do not depend on libcgroup being installed. However, that is a set of tools for automatically handling cgroup configuration and process assignments to cgroups and can be very helpful.

I found that the libcgroup tools can also be misleading, because the libcgroup package is built on its own set of abstractions and assumptions about your use of cgroups, which differ slightly from the actual kernel-level implementation of cgroups. (I can add examples, but it would take some work; comment if you're interested.)

Therefore, before using libcgroup tools (such as /etc/cgconfig.conf, /etc/cgrules.conf, cgexec, cgcreate, cgclassify, etc.) I highly recommend getting very familiar with the /cgroup virtual filesystem itself, and manually creating cgroups, cgroup hierarchies (including hierarchies with multiple subsystems attached, which libcgroup sneakily and leakily abstracts away), reassigning processes to different cgroups by running echo $the_pid > /cgroup/some_cgroup_hierarchy/a_cgroup_within_that_hierarchy/tasks, and other seemingly magical tasks that libcgroup performs under the hood.
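To give an idea of what "manually" means here, this is a rough sketch of creating a cpuset cgroup and moving a process into it through the virtual filesystem. The function names are made up, and the mount point /cgroup/cpuset in the usage line is an assumption about how the hierarchy is mounted on your system. On a real cpuset hierarchy, cpuset.cpus and cpuset.mems must be set before any task can be added.

```shell
#!/bin/sh
# make_cpuset HIERARCHY NAME CPUS MEMS
# Create cgroup NAME inside a mounted cpuset hierarchy and give it CPUs
# and memory nodes (e.g. CPUS=0, MEMS=0).
make_cpuset() {
  local hierarchy="$1" name="$2" cpus="$3" mems="$4"
  mkdir -p "$hierarchy/$name" || return 1
  echo "$cpus" > "$hierarchy/$name/cpuset.cpus" || return 1
  echo "$mems" > "$hierarchy/$name/cpuset.mems"
}

# move_pid HIERARCHY NAME PID
# Reassign one process by writing its PID to the cgroup's tasks file.
move_pid() {
  local hierarchy="$1" name="$2" pid="$3"
  echo "$pid" > "$hierarchy/$name/tasks"
}
```

As root, something like make_cpuset /cgroup/cpuset cpu0only 0 0 && move_pid /cgroup/cpuset cpu0only "$$" would confine the current shell to CPU 0.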

Another basic concept I was missing was that if the /cgroup virtual filesystem is mounted on your system at all (or more accurately, if any of the cgroup subsystems aka "controllers" are mounted at all), then every process on your entire system is in a cgroup. There is no such thing as "some processes are in a cgroup and some aren't".
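You can see this for any process in /proc/PID/cgroup, which has one line per mounted hierarchy in the form hierarchy-ID:controller-list:cgroup-path. A small sketch that prints the controllers and path for each line; the function name list_cgroups and the label "unified" for an empty controller list (as on a cgroup v2 line) are my own choices:

```shell
#!/bin/sh
# list_cgroups [FILE]: print "controllers path" for every hierarchy the
# process belongs to. FILE defaults to /proc/self/cgroup.
list_cgroups() {
  local file="${1:-/proc/self/cgroup}"
  local id controllers path
  while IFS=: read -r id controllers path; do
    printf '%s %s\n' "${controllers:-unified}" "$path"
  done < "$file"
}
```

Use list_cgroups "/proc/$pid/cgroup" to inspect another process.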

Each hierarchy has what is called the root cgroup, which owns all the system's resources for the attached subsystems. For example, a cgroup hierarchy that has the cpuset and blkio subsystems attached would have a root cgroup that owns all the CPUs and all the blkio on the system, and that could share some of those resources with child cgroups. You can't restrict the root cgroup because it owns all your system's resources, so restricting it wouldn't even make sense.

Some other simple data I was missing about libcgroup:

If you use /etc/cgconfig.conf, you should ensure that chkconfig --list cgconfig shows that cgconfig is set to run at system boot.

If you change /etc/cgconfig.conf, you need to run service cgconfig restart to load in the changes. (And problems with stopping the service or running cgclear are very common when fooling around testing. For debugging I recommend, for example, lsof /cgroup/cpuset, if cpuset is the name of the cgroup hierarchy you are using.)

If you want to use /etc/cgrules.conf, you need to ensure the "cgroup rules engine daemon" (cgrulesengd) is running: service cgred start and chkconfig cgred on. (And you should be aware of a possible but unlikely race condition regarding this service, as described in the Red Hat Resource Management Guide in section 2.8.1 at the bottom of the page.)

If you want to fool around manually and set up your cgroups using the virtual filesystem (which I recommend for first use), you can do so and then create a cgconfig.conf file to mirror your setup by using cgsnapshot with its various options.

And finally, the key piece of info I was missing when I wrote the following:

However, the caveat on this seems to be...that the children of myprocessname will be reassigned to the restricted cpu0only cgroup.

I was correct, but there is an option I was unaware of.

cgexec is the command to start a process/run a command and assign it to a cgroup.

cgclassify is the command to assign an already running process to a cgroup.

Both of these will also prevent cgred (cgrulesengd) from reassigning the specified process to a different cgroup based on /etc/cgrules.conf.

Both cgexec and cgclassify support the --sticky flag, which additionally prevents cgred from reassigning child processes based on /etc/cgrules.conf.

So, the answer to the question as I wrote it (though not the setup I ended up implementing, because of the advice from our Red Hat Technical Architect mentioned above) is:

Make the cpu0only and anycpu cgroup as described in my question. (Ensure cgconfig is set to run at boot.)

Make the * cpuset cpu0only rule as described in my question. (And ensure cgred is set to run at boot.)

Start any processes I want unrestricted with: cgexec -g cpuset:anycpu --sticky myprocessname.

Those processes will be unrestricted, and all their child processes will be unrestricted as well. Everything else on the system will be restricted to CPU 0 (once you reboot, since cgred doesn't apply cgrules to already running processes unless they change their EUID). This is not completely advisable, but that was what I initially requested and it can be done with cgroups.
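If you start several whitelisted processes this way, a small wrapper saves typing. This is only a sketch: run_unrestricted and the DRY_RUN variable are names I made up, and the cgroup name anycpu is the one from my question.

```shell
#!/bin/sh
# run_unrestricted CMD [ARGS...]
# Start CMD in the anycpu cpuset cgroup with --sticky, so that cgred
# leaves the process and its children alone.
# Set DRY_RUN=1 to print the command line instead of running it.
run_unrestricted() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo cgexec -g cpuset:anycpu --sticky "$@"
  else
    cgexec -g cpuset:anycpu --sticky "$@"
  fi
}
```

DRY_RUN=1 run_unrestricted myprocessname shows what would be executed without needing libcgroup installed.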

Cgroups/systemd: How to create a cgroup for a process tree [non-root]

You don't need to be root to start a user-scoped group with systemd-run:

$ systemd-run --user --scope /bin/bash 
  Running scope as unit run-23318.scope.
  $ sleep 999 &
  [1] 23369

You can see the unit:

$ systemctl --user status run-23318.scope
* run-23318.scope - /bin/bash
  Loaded: loaded (/run/user/1000/systemd/user/run-23318.scope; static; 
         vendor preset: enabled)
 Drop-In: /run/user/1000/systemd/user/run-23318.scope.d
      `-50-Description.conf
  Active: active (running) since Sun 2016-07-17 08:16:51 CEST; 10min ago
  CGroup: /user.slice/user-1000.slice/user@1000.service/run-23318.scope
      |-23318 /bin/bash
      `-23369 sleep 999
  Jul 17 08:16:51 home systemd[1056]: Started /bin/bash.
  Jul 17 08:16:51 home systemd[1056]: Starting /bin/bash.

and also with

$ systemd-cgls /user.slice/user-1000.slice/user@1000.service/run-23318.scope
   /user.slice/user-1000.slice/user@1000.service/run-23318.scope:
   |-23318 /bin/bash
   `-23369 sleep 999
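The cgroup path in the output above follows a fixed pattern built from the UID and the scope name. A tiny helper, purely illustrative (user_scope_cgroup is not a systemd command), that builds the path as it appears on the system shown here; newer systemd versions may place the scope under an additional slice inside user@UID.service, so treat the layout as an assumption:

```shell
#!/bin/sh
# user_scope_cgroup UID SCOPE: print the cgroup path of a user scope unit,
# following the layout shown in the systemctl status output above.
user_scope_cgroup() {
  local uid="$1" scope="$2"
  printf '/user.slice/user-%s.slice/user@%s.service/%s\n' "$uid" "$uid" "$scope"
}
```

For example, systemd-cgls "$(user_scope_cgroup "$(id -u)" run-23318.scope)".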
