I use unprivileged lxc containers in Arch Linux. Here is the basic system information:
[chb@conventiont ~]$ uname -a
Linux conventiont 3.17.4-Chb #1 SMP PREEMPT Fri Nov 28 12:39:54 UTC 2014 x86_64 GNU/Linux
It's a custom-compiled kernel with user namespaces enabled:
[chb@conventiont ~]$ lxc-checkconfig
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled
Multiple /dev/pts instances: enabled
--- Control groups ---
Cgroup: enabled
Cgroup clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled
--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
File capabilities: enabled
Note : Before booting a new kernel, you can check its configuration
usage : CONFIG=/path/to/config /usr/bin/lxc-checkconfig
[chb@conventiont ~]$ systemctl --version
systemd 217
+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD +IDN
Unfortunately, systemd does not play well with lxc currently. In particular, setting up cgroups for a non-root user does not seem to work well, or I am just too unfamiliar with how to do it. lxc will only start a container in unprivileged mode when it can create the necessary cgroups in /sys/fs/cgroup/XXX/*. This, however, is not possible for lxc, because systemd mounts the root cgroup hierarchy in /sys/fs/cgroup/* and those directories are owned by root, so an unprivileged user cannot create cgroups in them. A workaround seems to be the following:
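for d in /sys/fs/cgroup/*; do
    f=$(basename "$d")
    echo "looking at $f"
    if [ "$f" = "cpuset" ]; then
        # new cpuset children copy cpuset.cpus/cpuset.mems from the parent
        echo 1 | sudo tee -a "$d/cgroup.clone_children" > /dev/null
    elif [ "$f" = "memory" ]; then
        # enable hierarchical memory accounting
        echo 1 | sudo tee -a "$d/memory.use_hierarchy" > /dev/null
    fi
    # create a per-user child cgroup and hand it over to the user
    sudo mkdir -p "$d/$USER"
    sudo chown -R "$USER" "$d/$USER"
    # move the current shell ($$) into the new per-user cgroup
    echo $$ > "$d/$USER/tasks"
done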
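The script needs sudo for mkdir because the hierarchy roots under /sys/fs/cgroup are owned by root. A minimal bash sketch to check this: the hypothetical helper check_writable reports, for each directory under a given root (default /sys/fs/cgroup), whether the current user has write permission there, i.e. whether it could create a child cgroup without sudo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the current user may create
# subdirectories (child cgroups) in each hierarchy root.
check_writable() {
    local root=${1:-/sys/fs/cgroup}
    local d
    for d in "$root"/*; do
        [ -d "$d" ] || continue   # skip plain files such as cgroup.procs
        if [ -w "$d" ]; then
            echo "$(basename "$d"): writable"
        else
            echo "$(basename "$d"): not writable"
        fi
    done
}

check_writable
```

Run as the unprivileged user, every hierarchy mounted by systemd is expected to show up as "not writable".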
This code creates the corresponding cgroup directories in the cgroup hierarchy for an unprivileged user. However, something I don't understand happens. Before executing the code above, I see this:
[chb@conventiont ~]$ cat /proc/self/cgroup
8:blkio:/
7:net_cls:/
6:freezer:/
5:devices:/
4:memory:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-1000.slice/session-c1.scope
After executing the code above, I see this in the shell I ran it in:
[chb@conventiont ~]$ cat /proc/self/cgroup
8:blkio:/chb
7:net_cls:/chb
6:freezer:/chb
5:devices:/chb
4:memory:/chb
3:cpu,cpuacct:/chb
2:cpuset:/chb
1:name=systemd:/chb
But in any other shell I still see:
[chb@conventiont ~]$ cat /proc/self/cgroup
8:blkio:/
7:net_cls:/
6:freezer:/
5:devices:/
4:memory:/
3:cpu,cpuacct:/
2:cpuset:/
1:name=systemd:/user.slice/user-1000.slice/session-c1.scope
Hence, I can start my unprivileged lxc container in the shell where I executed the code above, but not in any other.
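The difference between the two shells can be read directly from /proc/<pid>/cgroup, whose lines have the form hierarchy-id:controllers:path. A minimal bash sketch with hypothetical helper names: cgroup_paths prints the controllers and path of each line, and in_user_cgroups succeeds only if every hierarchy places the process in /<user> (default: $USER). Pointing it at /proc/<pid>/cgroup of each open shell shows which shells the workaround applies to.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "controllers path" for each line of a
# /proc/<pid>/cgroup file (format: hierarchy-id:controllers:path).
cgroup_paths() {
    local file=$1 id ctrl path
    while IFS=: read -r id ctrl path; do
        printf '%s %s\n' "$ctrl" "$path"
    done < "$file"
}

# Hypothetical helper: succeed only if every hierarchy listed in the
# file places the process in /<user> (default: the current $USER).
in_user_cgroups() {
    local file=$1 user=${2:-$USER} id ctrl path
    while IFS=: read -r id ctrl path; do
        [ "$path" = "/$user" ] || return 1
    done < "$file"
    return 0
}

# Example: inspect the current shell ($$).
cgroup_paths "/proc/$$/cgroup"
if in_user_cgroups "/proc/$$/cgroup"; then
    echo "this shell is in the per-user cgroups"
else
    echo "this shell is not in the per-user cgroups"
fi
```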
1. Can someone explain this behaviour?
2. Has someone found a better way to set up the required cgroups with a current version of systemd (>= 217)?
Best Answer
A better and safer solution is to install cgmanager and run it with systemctl start cgmanager (on a systemd-based distro). You can then have your root user, or yourself if you have sudo rights on the host, create cgroups for your unprivileged user in all controllers with:
Once they have been created, your unprivileged user can move any processes they have access to into their cgroup for every controller by using:
This is safer, faster and more reliable than the shell script I posted.
Manual solution:
To answer 1.
I didn't know exactly what was going on when I wrote that script, but reading the cgroups documentation and experimenting a bit helped me understand it. What I am basically doing in this script is creating a new cgroup session for the current user, which is what I already stated above. When I run these commands in the current shell, or run them in a script that is evaluated in the current shell rather than in a subshell (i.e. via ". script"; the leading "." is important for this to work!), I don't just open a new session for the user but also add the current shell as a process that runs in this new cgroup. I can achieve the same effect by running the script in a subshell and then, for each cgroup hierarchy, descending into its chb subcgroup and using echo $$ > tasks to add the current shell to every member of the chb cgroup hierarchy.

Hence, when I run lxc in that shell, my container also becomes a member of all the chb subcgroups that the current shell is a member of. That is to say, my container inherits the cgroup status of my shell. This also explains why it doesn't work in any other shell that is not part of the current chb subcgroups.

I still have to pass on 2. We'll probably need to wait for either a systemd update or further kernel developments to make systemd adopt a consistent behaviour, but I prefer the manual setup anyway, as it forces you to understand what you're doing.
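Two points from the manual solution can be checked from a shell. First, $$ names the interactive shell only when the code runs inside it: when the file is sourced with ".", or in a ( ... ) subshell, where bash still expands $$ to the invoking shell's PID, but not when the file is executed as a separate process. Second, since a child process inherits its parent's cgroup membership at fork, any other shell can be added later by writing its PID into the per-user tasks files, after which the containers it starts inherit that membership. The sketch below assumes bash, the cgroup v1 layout from the question, and per-user cgroups that already exist and are owned by the user; join_user_cgroups and the CGROUP_ROOT override are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash

# Part 1: what $$ refers to when a file is sourced vs. executed.
script=$(mktemp)
printf 'echo $$\n' > "$script"
# Sourced inside a command substitution: $$ still expands to this shell's PID.
sourced=$(. "$script")
# Executed as a separate bash process: $$ is that new process's PID.
executed=$(bash "$script")
echo "this shell: $$, sourced: $sourced, executed: $executed"
rm -f "$script"

# Part 2 (hypothetical helper): move a PID, by default this shell, into the
# per-user cgroup of every hierarchy. Processes started from that shell
# afterwards (e.g. an lxc container) inherit the membership.
# CGROUP_ROOT is an illustrative override; it defaults to /sys/fs/cgroup.
join_user_cgroups() {
    local pid=${1:-$$} root=${CGROUP_ROOT:-/sys/fs/cgroup} tasks
    for tasks in "$root"/*/"$USER"/tasks; do
        [ -e "$tasks" ] || continue   # this hierarchy has no per-user cgroup
        echo "$pid" > "$tasks"
    done
}

# Usage: join_user_cgroups         (move this shell)
#        join_user_cgroups 12345   (move another of the user's shells by PID)
```

Writing another shell's PID moves only that process; children it already started stay where they are, so the shell should be moved before lxc is launched from it.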