Linux – LXC: Any security difference between root and end-user owned unprivileged containers

Tags: linux, lxc, namespace, security

I intend to use LXC containers to isolate most of the network facing services.

As per my understanding, I have mainly two ways to do this:

  1. Create unprivileged containers owned by root. In this case, root will have a single large set of sub-UIDs and sub-GIDs, and a different subset of this range will be assigned to each container (no two containers will share any sub-UID or sub-GID).

  2. Create unprivileged containers owned by unprivileged system accounts. In this case, each account will own a single container and the subordinate UIDs and GIDs required for this single container.
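The "no shared sub-UID" condition in option 1 can be checked mechanically. Below is a minimal sketch in POSIX shell; the function name ranges_disjoint and the example ranges are made up for illustration and do not come from any LXC tool.

```shell
# ranges_disjoint: reads "start count" pairs on stdin, one per line,
# and succeeds only if no two ranges share an ID.
ranges_disjoint() {
    sort -n | awk '
        # After sorting by start, a range overlaps an earlier one
        # exactly when it starts before the highest end seen so far.
        NR > 1 && $1 < prev_end { overlap = 1 }
        { end = $1 + $2; if (NR == 1 || end > prev_end) prev_end = end }
        END { exit (overlap ? 1 : 0) }
    '
}

# Hypothetical layout: a pool starting at 100000, split between three containers.
printf '%s\n' '100000 65536' '165536 65536' '231072 65536' | ranges_disjoint \
    && echo "no shared IDs"

# Hypothetical mistake: the second range starts inside the first one.
printf '%s\n' '100000 65536' '150000 65536' | ranges_disjoint \
    || echo "overlap detected"
```

The same check applies unchanged to sub-GID ranges.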

From a usability point of view, the former is far better: easier to set up and maintain.

However, from a security perspective, is there any difference between the two?

For instance:

  • Is there any link or horizontal relationship of some sort between IDs belonging to the same pool (same line) as defined in /etc/subuid and /etc/subgid, compared to IDs belonging to different users and therefore belonging to different pools (different lines)?

  • Is there any link or vertical relationship of some sort between a subordinate ID and its owner account? May a subordinate ID owned by root manage to get higher privilege than a subordinate ID owned by an unprivileged user? Can a subordinate ID escalate to its owner ID in an easier way than escalating to any other arbitrary ID?

  • Owned by root means that all commands to administer the container will be launched with the host's root privileges. Does this constitute a weakness, or are all privileges dropped early, for instance?

  • Etc.

In other words: may root owned unprivileged containers be "less unprivileged" than ones owned by standard accounts?

Best Answer

In other words: may root owned unprivileged containers be "less unprivileged" than ones owned by standard accounts?

I don't think so. What matters is what's in /proc/$PID/uid_map for the processes in the container's user namespace, not what's in /etc/subuid. Suppose you execute the following from the initial user namespace (that is, not from the container) for the $PID of a process running in the container:

$ cat /proc/$PID/uid_map
0 200000 1000

This means that the UID range [0-1000) inside the user namespace of process $PID is mapped to the UID range [200000-201000) outside of it (that is, outside of the container). Conversely, host UIDs outside of the [200000-201000) range show up as 65534 ($(cat /proc/sys/kernel/overflowuid)) in the container. This can happen, for instance, if you don't create a new PID namespace. In that case, the process in the container would see processes outside, but their UID would be 65534.
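The arithmetic behind a single uid_map line can be sketched in a few lines of POSIX shell. The two function names below are made up for illustration, and the sketch handles only one map line, whereas a real uid_map may contain several.

```shell
# ns_to_host_uid UID INSIDE_START OUTSIDE_START COUNT
# Translates a UID seen inside the namespace to the UID seen outside.
# Prints nothing and returns 1 if the UID is not covered by the map line.
ns_to_host_uid() {
    if [ "$1" -ge "$2" ] && [ "$1" -lt $(( $2 + $4 )) ]; then
        echo $(( $3 + $1 - $2 ))
    else
        return 1
    fi
}

# host_to_ns_uid UID INSIDE_START OUTSIDE_START COUNT [OVERFLOW]
# Reverse direction: how an outside UID appears inside the namespace.
# Unmapped UIDs appear as the overflow UID (65534 unless changed in
# /proc/sys/kernel/overflowuid).
host_to_ns_uid() {
    if [ "$1" -ge "$3" ] && [ "$1" -lt $(( $3 + $4 )) ]; then
        echo $(( $2 + $1 - $3 ))
    else
        echo "${5:-65534}"
    fi
}

# With the map line "0 200000 1000":
ns_to_host_uid 0 0 200000 1000        # container root -> 200000
ns_to_host_uid 999 0 200000 1000      # -> 200999
host_to_ns_uid 200500 0 200000 1000   # -> 500
host_to_ns_uid 1000 0 200000 1000     # unmapped host user -> 65534
```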

So with proper UID mapping, even if the container is started by root, its processes will have unprivileged UIDs outside of it.

Subordinate UIDs in /etc/subuid are not in any way linked to a single UID outside. The purpose of this file is to allow unprivileged users to start containers which use more than one UID (which is the case for most Linux operating systems). By default, if you're an unprivileged user, you can only map your own UID. That is, if your UID is 1000 and $PID refers to a process in the container, you can only do

echo "$N 1000 1" >/proc/$PID/uid_map

for any $N as an unprivileged user. Everything else is not permitted. If you could map a longer range, e.g.

echo "$N 1000 50" >/proc/$PID/uid_map

you would gain access to UIDs [1000-1050) outside of the container through the container. And of course, if you could change the start of the outer UID range, you'd have an easy way to get root. So /etc/subuid defines the outer ranges which you are allowed to use. This file is read by newuidmap, which is typically installed setuid root.

$ cat /etc/subuid
woky:200000:50
$ echo '0 200000 50' >/proc/$PID/uid_map
-bash: echo: write error: Operation not permitted
$ newuidmap $PID 0 200000 50
$ # success
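The range check that newuidmap applies before writing the map can be approximated as follows. This is a simplified sketch and not newuidmap's actual logic: the function name range_allowed is made up for illustration, it matches entries by user name only, and it takes the file path as a parameter so that it can be tried against a scratch copy instead of the real /etc/subuid.

```shell
# range_allowed FILE USER START COUNT
# Succeeds if [START, START+COUNT) lies entirely inside one entry of USER
# in a subuid-format FILE (lines of the form name:start:count).
range_allowed() {
    awk -F: -v user="$2" -v start="$3" -v count="$4" '
        $1 == user && count > 0 && start >= $2 && start + count <= $2 + $3 { ok = 1 }
        END { exit (ok ? 0 : 1) }
    ' "$1"
}

# Try it against a scratch file holding the entry shown above.
subuid_file=$(mktemp)
printf 'woky:200000:50\n' > "$subuid_file"
range_allowed "$subuid_file" woky 200000 50 && echo "200000+50: allowed"
range_allowed "$subuid_file" woky 200000 51 || echo "200000+51: refused"
range_allowed "$subuid_file" woky 0 1       || echo "0+1: refused"
rm -f "$subuid_file"
```

The last call shows why the file matters: a range starting at outer UID 0 is refused because no entry for the user covers it.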

The details are much more complicated and I'm probably not the right person to explain them, but I guess it's better than having no answer. :-) You might want to check the man pages user_namespaces(7) and newuidmap(1), and my own research in "First process in a new Linux user namespace needs to call setuid()?". Unfortunately, I'm not entirely sure how LXC uses this file.
