In other words: may root owned unprivileged containers be "less
unprivileged" than ones owned by standard accounts?
I don't think so. What matters is what's in /proc/$PID/uid_map for processes in the container's user namespace, not what's in /etc/subuid. Suppose you run the following from the initial user namespace (that is, not from inside the container), where $PID is a process running in the container:
$ cat /proc/$PID/uid_map
0 200000 1000
This means that the UID range [0-1000) inside the user namespace of process $PID is mapped to the UID range [200000-201000) outside of it. UIDs outside the [200000-201000) range appear as 65534 ($(cat /proc/sys/kernel/overflowuid)) inside the container. This can happen, for instance, if you don't create a new PID namespace: the process in the container would then see processes outside, but their UID would be 65534.
So with proper UID mapping, even if the container is started by root, its processes will have unprivileged UIDs outside of it.
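The arithmetic behind a single uid_map line can be sketched in plain shell. This is only an illustration of the translation rule described above, not kernel code; the function names inside_to_outside and outside_to_inside are made up for this example, and the overflow UID is assumed to be the default 65534.

```shell
#!/bin/sh
# Hypothetical helpers illustrating one uid_map line of the form
# "inside_start outside_start length". Not how the kernel implements it.

# Translate a UID as seen inside the namespace to the UID seen outside.
# Usage: inside_to_outside INSIDE_START OUTSIDE_START LENGTH UID
inside_to_outside() {
    in_start=$1 out_start=$2 len=$3 uid=$4
    if [ "$uid" -ge "$in_start" ] && [ "$uid" -lt $((in_start + len)) ]; then
        echo $((out_start + uid - in_start))
    else
        echo unmapped
    fi
}

# Reverse direction: an outside UID that falls outside the mapped range
# shows up inside the namespace as the overflow UID (assumed 65534 here).
# Usage: outside_to_inside INSIDE_START OUTSIDE_START LENGTH UID
outside_to_inside() {
    in_start=$1 out_start=$2 len=$3 uid=$4
    if [ "$uid" -ge "$out_start" ] && [ "$uid" -lt $((out_start + len)) ]; then
        echo $((in_start + uid - out_start))
    else
        echo 65534
    fi
}

inside_to_outside 0 200000 1000 0      # prints 200000
inside_to_outside 0 200000 1000 999    # prints 200999
outside_to_inside 0 200000 1000 1000   # prints 65534
```

With the "0 200000 1000" mapping, root inside the container (UID 0) is UID 200000 outside, regardless of who started the container.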
Subordinate UIDs in /etc/subuid are not tied in any way to a single UID outside. The purpose of this file is to allow unprivileged users to start containers that use more than one UID (which is the case for most Linux operating systems). By default, if you're an unprivileged user, you can only map your own UID. That is, if your UID is 1000 and $PID refers to a process in the container, you can only do
echo "$N 1000 1" >/proc/$PID/uid_map
for any $N as an unprivileged user. Everything else is not permitted. If you could map a longer range, e.g.
echo "$N 1000 50" >/proc/$PID/uid_map
you would gain access to UIDs [1000-1050) outside of the container through the container. And of course, if you could change the start of the outer UID range, you'd have an easy way to get root. So /etc/subuid defines the outer ranges that you are allowed to use. This file is used by newuidmap, which is setuid root.
$ cat /etc/subuid
woky:200000:50
$ echo '0 200000 50' >/proc/$PID/uid_map
-bash: echo: write error: Operation not permitted
$ newuidmap $PID 0 200000 50
$ # success
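The kind of containment check implied here can be sketched as follows. This is a simplified illustration under my own assumptions, not newuidmap's actual code: the function name range_allowed is hypothetical, and it only looks at a single /etc/subuid entry passed in as a string.

```shell
#!/bin/sh
# Hypothetical check, NOT newuidmap's real implementation: does the
# requested outer range [START, START+COUNT) lie inside one subuid
# entry of the form "user:first:size" belonging to USER?
# Usage: range_allowed ENTRY USER START COUNT
range_allowed() {
    entry=$1 user=$2 start=$3 count=$4
    owner=${entry%%:*}          # text before the first colon
    rest=${entry#*:}            # remaining "first:size"
    first=${rest%%:*}
    size=${rest#*:}
    [ "$owner" = "$user" ] &&
        [ "$start" -ge "$first" ] &&
        [ $((start + count)) -le $((first + size)) ]
}

range_allowed 'woky:200000:50' woky 200000 50 && echo allowed   # prints allowed
range_allowed 'woky:200000:50' woky 200000 51 || echo denied    # prints denied
range_allowed 'woky:200000:50' woky 0 1 || echo denied          # prints denied
```

The last call shows why the outer start matters: a range starting at 0 would include root, so it must never be granted to an unprivileged user.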
The details are much more complicated and I'm probably not the right person to explain them, but I guess it's better than having no answer. :-) You might want to check the man pages user_namespaces(7) and newuidmap(1), and my own research: "First process in a new Linux user namespace needs to call setuid()?". Unfortunately, I'm not entirely sure how LXC uses this file.