Linux – LXC: Any security difference between root and end-user owned unprivileged containers


I intend to use LXC containers to isolate most of my network-facing services.

As I understand it, there are two main ways to do this:

- Create unprivileged containers owned by root. In this case, root has a single large set of sub-UIDs and sub-GIDs, and a different subset of this range is assigned to each container (no two containers share any sub-UID or sub-GID).
- Create unprivileged containers owned by unprivileged system accounts. In this case, each account owns a single container and the subordinate UIDs and GIDs required for that container.

From a usability point of view, the former is far better: easier to set up and maintain.

However, from a security perspective, is there any difference between the two?

For instance:

- Is there any link or horizontal relationship between IDs belonging to the same pool (the same line in /etc/subuid and /etc/subgid), compared to IDs belonging to different users and therefore to different pools (different lines)?
- Is there any link or vertical relationship between a subordinate ID and its owner account? Can a subordinate ID owned by root obtain higher privileges than one owned by an unprivileged user? Can a subordinate ID escalate to its owner's ID more easily than to any other arbitrary ID?
- "Owned by root" means that all commands used to administer the container are launched with the host's root privileges. Does this constitute a weakness, or are all privileges dropped early, for instance?
- Etc.

In other words: may root-owned unprivileged containers be "less unprivileged" than ones owned by standard accounts?

Best Answer

In other words: may root-owned unprivileged containers be "less unprivileged" than ones owned by standard accounts?

I don't think so. What matters is what's in /proc/$PID/uid_map for the processes in the container's user namespace, not what's in /etc/subuid. Suppose you run the following from the initial user namespace (that is, not from inside the container), where $PID is a process running in the container:

$ cat /proc/$PID/uid_map
0 200000 1000

This means that the UID range [0-1000) of process $PID is mapped to the UID range [200000-201000) outside of the container's user namespace. Host UIDs outside of the [200000-201000) range appear as 65534 ($(cat /proc/sys/kernel/overflowuid)) inside the container. This can happen, for instance, if you don't create a new PID namespace: a process in the container would then see the processes outside, but their UID would be shown as 65534.

So with proper UID mapping, even if the container is started by root, its processes will have unprivileged UIDs outside of it.
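
As a rough illustration of the arithmetic behind a single uid_map line, the following Bash sketch translates a UID in both directions. The helper names to_outside and to_inside are made up for this example; the kernel does the real translation and also supports multi-line maps, which this sketch ignores.

```shell
#!/usr/bin/env bash
# Sketch only: mimics the arithmetic of ONE uid_map line "inside outside count".
# to_outside/to_inside are hypothetical helper names, not part of any tool.

OVERFLOWUID=65534   # usual default of /proc/sys/kernel/overflowuid

# to_outside INSIDE_START OUTSIDE_START COUNT UID
# Prints the host-side UID for a UID inside the namespace; fails if unmapped.
to_outside() {
    local in_start=$1 out_start=$2 count=$3 uid=$4
    if (( uid >= in_start && uid < in_start + count )); then
        echo $(( uid - in_start + out_start ))
    else
        return 1
    fi
}

# to_inside INSIDE_START OUTSIDE_START COUNT UID
# Prints how a host-side UID appears inside the namespace.
to_inside() {
    local in_start=$1 out_start=$2 count=$3 uid=$4
    if (( uid >= out_start && uid < out_start + count )); then
        echo $(( uid - out_start + in_start ))
    else
        echo "$OVERFLOWUID"
    fi
}

to_outside 0 200000 1000 0      # container root      -> 200000
to_inside  0 200000 1000 1000   # unmapped host user  -> 65534
```

With the map "0 200000 1000", root inside the container is simply UID 200000 on the host, whoever started the container.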

Subordinate UIDs in /etc/subuid are not linked in any way to a single UID outside. The purpose of this file is to allow unprivileged users to start containers that use more than one UID (which is the case for most Linux operating systems). By default, an unprivileged user can only map their own UID. That is, if your UID is 1000 and $PID refers to a process in the container, the only thing you can do is

echo "$N 1000 1" >/proc/$PID/uid_map

for any $N, as an unprivileged user. Nothing else is permitted. If you could map a longer range, e.g.

echo "$N 1000 50" >/proc/$PID/uid_map

you would gain access to UIDs [1000-1050) outside of the container through the container. And of course, if you could change the start of the outer UID range, you would have an easy way to get root. So /etc/subuid defines the outer ranges that you are allowed to use. This file is read by newuidmap, which is setuid root.

$ cat /etc/subuid
woky:200000:50
$ echo '0 200000 50' >/proc/$PID/uid_map
-bash: echo: write error: Operation not permitted
$ newuidmap $PID 0 200000 50
$ # success
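
The range check described above can be approximated in a few lines of Bash. This is only a sketch of the idea: range_allowed is a hypothetical helper name, and the real newuidmap performs further checks that are not modelled here.

```shell
#!/usr/bin/env bash
# Sketch only: approximates the "is this outer range allowed?" check.
# range_allowed is a hypothetical helper; it is NOT how newuidmap is implemented.

# range_allowed USER START COUNT [FILE]
# Succeeds if [START, START+COUNT) lies inside one line of FILE owned by USER.
range_allowed() {
    local user=$1 start=$2 count=$3 file=${4:-/etc/subuid}
    awk -F: -v u="$user" -v s="$start" -v c="$count" '
        $1 == u && s >= $2 && s + c <= $2 + $3 { ok = 1 }
        END { exit !ok }
    ' "$file"
}

# Demo against a throwaway copy of the example entry
demo=$(mktemp)
echo 'woky:200000:50' > "$demo"
range_allowed woky 200000 50 "$demo" && echo "200000+50: allowed"
range_allowed woky 200000 51 "$demo" || echo "200000+51: refused"
rm -f "$demo"
```

Note that the owner's own UID plays no part in the comparison: only the start and length of the listed range matter.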

The details are much more complicated and I'm probably not the right person to explain them, but I guess this is better than having no answer. :-) You might want to check the man pages user_namespaces(7) and newuidmap(1), and my own research in "First process in a new Linux user namespace needs to call setuid()?". Unfortunately, I'm not entirely sure how LXC uses this file.

Related solution: Migrate an unprivileged LXC container between users

Preliminary assumptions

I'll stick with the following assumptions, based on your question:

- the host has a user1 and a user2; where a piece of information isn't specific to one of them, we'll use userX
- the container name is held in a variable, rendered as $container
- the home folders of user1 and user2 are written in the Bash notation ~user1 and ~user2
- the subordinate UID and GID ranges are assumed to be 100000..165535 for user1 and 200000..265535 for user2 (65536 IDs each), just for brevity
- the root FS folder of $container is rendered as $rootfs, regardless of where it ends up (~userX/.local/share/lxc/$container/rootfs)
- the container configuration is by default in ~userX/.local/share/lxc/$container/config

Moving the container

Two relevant pieces of data govern userns containers:

- the owner and group of the files/folders that make up $container
- the subordinate UIDs and GIDs, which are assigned in two places: in /etc/sub{uid,gid} for the user account (manipulated via usermod --{add,del}-sub-{uid,gid}s) and in lxc.id_map (named lxc.idmap in newer LXC releases) in the $container configuration (~userX/.local/share/lxc/$container/config)
  - I don't know for certain whether it is possible to define different ranges in the container configuration for each container. E.g. if the host user userX has 65536 subordinate GIDs and UIDs, it might be possible to assign 5000 each to several different containers, but I haven't tested that hypothesis.
  - it is certain, though, that this setting tells LXC which GID and UID ranges are valid in the child namespace.

So the gist is that you need to make sure that the file/folder owner and group for the container match the configuration, which in turn has to be a valid subset of the host subordinate GIDs/UIDs assigned to user1 and user2 respectively.
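
To see which host-side ranges a container configuration actually asks for, something like the following sketch can help. It assumes map lines of the usual form "lxc.id_map = u 0 100000 65536" (or lxc.idmap); config_ranges is a hypothetical helper name, and comparing the output with /etc/subuid and /etc/subgid is left to the reader.

```shell
#!/usr/bin/env bash
# Sketch only: lists the host-side ranges requested by a container config.
# config_ranges is a hypothetical helper name, not an LXC command.

# config_ranges CONFIG
# Prints one line per map: "<u|g> <host-start> <host-end-inclusive>".
config_ranges() {
    awk '
        /^[ \t]*lxc\.id_?map[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")     # drop "lxc.id_map = ", fields are re-split
            # $1 = u|g, $2 = first inside ID, $3 = first host ID, $4 = count
            print $1, $3, $3 + $4 - 1
        }
    ' "$1"
}

# Demo with a throwaway config that uses the ranges assumed for user1
cfg=$(mktemp)
printf '%s\n' 'lxc.id_map = u 0 100000 65536' 'lxc.id_map = g 0 100000 65536' > "$cfg"
config_ranges "$cfg"    # u 100000 165535 / g 100000 165535
rm -f "$cfg"
```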

If you're using Bash, for example, you can use $((expression)) for arithmetic expansion and let to assign the result of an arithmetic expression to a variable. This is very useful if you know a base value (100000 and 200000 respectively) and the GID/UID of the "inside" users.
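
A minimal sketch of that arithmetic, using the example base values from the assumptions and a made-up inside UID of 33:

```shell
#!/usr/bin/env bash
# Sketch only: host-side ID of an "inside" user, given the base of the range.
# The values are the example ranges assumed earlier, not read from the system.

base_user1=100000
base_user2=200000
inside_uid=33          # hypothetical UID of a service account inside the container

host_uid_user1=$(( base_user1 + inside_uid ))   # arithmetic expansion
let host_uid_user2=base_user2+inside_uid        # same thing with let

echo "$host_uid_user1"   # 100033
echo "$host_uid_user2"   # 200033

# Translating a file owner from user1's range into user2's range:
echo $(( host_uid_user1 - base_user1 + base_user2 ))   # 200033
```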

The main points are:

- it's possible
- either the capability CAP_CHOWN or superuser rights are required

Here's a script that will probably need some more honing (for example, for migrating a root-created container to an unprivileged one), but it works for me for this purpose:

#!/usr/bin/env bash

function syntax
{
    echo "SYNTAX: ${0##*/} <from-user> <to-user> <container-name>"
    [[ -n "$1" ]] && echo -e "\nERROR: ${1}."
    exit 1
}

# Checks
[[ -n "$1" ]] || syntax "<from-user> is not set"
[[ -n "$2" ]] || syntax "<to-user> is not set"
[[ -n "$3" ]] || syntax "<container-name> is not set"
[[ "$UID" -eq "0" ]] || syntax "You must be superuser to make use of this script"
# Constants with stuff we need
readonly USERFROM=$1
readonly USERTO=$2
readonly CONTAINER=$3
LXCLOCAL=".local/share/lxc"
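# Look up the home folders in the passwd database instead of eval'ing user input
readonly HOMEFROM=$(getent passwd "$USERFROM" | cut -d: -f6)
readonly HOMETO=$(getent passwd "$USERTO" | cut -d: -f6)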
readonly LXCFROM="$HOMEFROM/$LXCLOCAL"
readonly LXCTO="$HOMETO/$LXCLOCAL"
# Match the user name exactly; a regex match would also hit e.g. "user10" for "user1"
readonly GIDBASEFROM=$(awk -F: -v u="$USERFROM" '$1 == u {print $2; exit}' /etc/subgid)
readonly UIDBASEFROM=$(awk -F: -v u="$USERFROM" '$1 == u {print $2; exit}' /etc/subuid)
readonly GIDSIZEFROM=$(awk -F: -v u="$USERFROM" '$1 == u {print $3; exit}' /etc/subgid)
readonly UIDSIZEFROM=$(awk -F: -v u="$USERFROM" '$1 == u {print $3; exit}' /etc/subuid)
readonly GIDBASETO=$(awk -F: -v u="$USERTO" '$1 == u {print $2; exit}' /etc/subgid)
readonly UIDBASETO=$(awk -F: -v u="$USERTO" '$1 == u {print $2; exit}' /etc/subuid)
readonly GIDSIZETO=$(awk -F: -v u="$USERTO" '$1 == u {print $3; exit}' /etc/subgid)
readonly UIDSIZETO=$(awk -F: -v u="$USERTO" '$1 == u {print $3; exit}' /etc/subuid)
unset LXCLOCAL
# More checks
[[ -d "$LXCFROM" ]] || syntax "Could not locate '$LXCFROM'. It is not a directory as expected"
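# The final mv moves the container INTO $LXCTO, so that folder must exist and the name must be free
[[ -d "$LXCTO" ]] || syntax "Destination folder '$LXCTO' does not exist. Create it as $USERTO first"
[[ -e "$LXCTO/$CONTAINER" ]] && syntax "Destination '$LXCTO/$CONTAINER' already exists. However, it must not"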
for i in GIDBASEFROM UIDBASEFROM GIDBASETO UIDBASETO; do
    (($i > 0)) || syntax "Could not determine base/offset of subordinate UID/GID range"
done
for i in GIDSIZEFROM UIDSIZEFROM GIDSIZETO UIDSIZETO; do
    (($i > 0)) || syntax "Could not determine length of subordinate UID/GID range"
done

echo "Going to migrate container: $CONTAINER"
echo -e "\tfrom user $USERFROM ($HOMEFROM): subUID=${UIDBASEFROM}..$((UIDBASEFROM+UIDSIZEFROM-1)); subGID=${GIDBASEFROM}..$((GIDBASEFROM+GIDSIZEFROM-1))"
echo -e "\tto user $USERTO ($HOMETO): subUID=${UIDBASETO}..$((UIDBASETO+UIDSIZETO-1)); subGID=${GIDBASETO}..$((GIDBASETO+GIDSIZETO-1))"
while read -p "Do you want to continue? (y/N) "; do
    case ${REPLY:0:1} in
        y|Y)
            break;
            ;;
        *)
            echo "User asked to abort."
            exit 1
            ;;
    esac
done

# Find the UIDs and GIDs in use in the container
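# (only look at the container being moved, not at every container of $USERFROM)
readonly SUBGIDSFROM=$(find -H "$LXCFROM/$CONTAINER" -printf '%G\n'|sort -u)
readonly SUBUIDSFROM=$(find -H "$LXCFROM/$CONTAINER" -printf '%U\n'|sort -u)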

# Change group
for gid in $SUBGIDSFROM; do
    let GIDTO=$(id -g "$USERTO")
    if ((gid == $(id -g "$USERFROM"))); then
        echo "Changing group from $USERFROM ($gid) to $USERTO ($GIDTO)"
        # -h: change symlinks themselves instead of following them to host files
        find -H "$LXCFROM/$CONTAINER" -gid $gid -exec chgrp -h $GIDTO {} +
    elif ((gid >= GIDBASEFROM )) && ((gid < GIDBASEFROM+GIDSIZEFROM)); then
        let GIDTO=$((gid-GIDBASEFROM+GIDBASETO))
        echo "Changing group $gid -> $GIDTO"
        find -H "$LXCFROM/$CONTAINER" -gid $gid -exec chgrp -h $GIDTO {} +
    else
        echo "ERROR: Some file/folder inside '$LXCFROM/$CONTAINER' has a group not assigned to $USERFROM (assigned subordinate GIDs)."
        echo -e "Use:\n\tfind -H '$LXCFROM/$CONTAINER' -gid $gid\nto list those files/folders."
        exit 1
    fi
done

# Change owner
for uid in $SUBUIDSFROM; do
    let UIDTO=$(id -u "$USERTO")
    if ((uid == $(id -u "$USERFROM"))); then
        echo "Changing owner from $USERFROM ($uid) to $USERTO ($UIDTO)"
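        # -h: change symlinks themselves instead of following them to host files
        # Note: on Linux, chown clears setuid/setgid bits on regular files
        find -H "$LXCFROM/$CONTAINER" -uid $uid -exec chown -h $UIDTO {} +
    elif ((uid >= UIDBASEFROM )) && ((uid < UIDBASEFROM+UIDSIZEFROM)); then
        let UIDTO=$((uid-UIDBASEFROM+UIDBASETO))
        echo "Changing owner $uid -> $UIDTO"
        find -H "$LXCFROM/$CONTAINER" -uid $uid -exec chown -h $UIDTO {} +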
    else
        echo "ERROR: Some file/folder inside '$LXCFROM/$CONTAINER' has an owner not assigned to $USERFROM (assigned subordinate UIDs)."
        echo -e "Use:\n\tfind -H '$LXCFROM/$CONTAINER' -uid $uid\nto list those files/folders."
        exit 1
    fi
done
mv "$LXCFROM/$CONTAINER" "$LXCTO/" || { echo "ERROR: failed to move to destination: ${LXCTO}/${CONTAINER}."; exit 1; }

In addition to the license terms of the StackExchange network, I am putting this into the public domain. So reuse and modify for whatever purpose, but it comes without any warranty and I must not be held liable for its use or abuse.

Usage

SYNTAX: lxc-reassign-userns.sh <from-user> <to-user> <container-name>

It assumes find, sort, awk (mawk and gawk should work), id, bash, chown, chgrp and so on to be available and to understand all the command line switches it uses. Bash is assumed to understand readonly, let and arithmetic expressions. find is assumed to accept + as a terminator for the -exec action and to support -printf.
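
A quick way to check for the tools before running the script is sketched below. missing_tools is a hypothetical helper name; it only tests that each command is on PATH, not that it supports the switches the script uses.

```shell
#!/usr/bin/env bash
# Sketch only: report which of the needed tools are missing from PATH.
# missing_tools is a hypothetical helper name.

# missing_tools TOOL...
# Succeeds if every TOOL is found; otherwise names the missing ones on stderr.
missing_tools() {
    local tool rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            rc=1
        fi
    done
    return "$rc"
}

missing_tools find sort awk id chown chgrp mv || echo "Install the missing tools first." >&2
```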

This list is probably not complete.

Backups

Yes, you can make backups and restore them elsewhere, as long as you also adjust the file owner and group accordingly.

However, assuming you use something like tar, there's a caveat: tar will ignore sockets, so $rootfs/dev/log will be missing from the archive, and any other socket in the container will have the same problem.
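
To find out in advance which files are affected, the sockets under a root FS can be listed with find. The sketch below uses a hypothetical helper name, list_sockets, and a placeholder path.

```shell
#!/usr/bin/env bash
# Sketch only: list the sockets under a rootfs, which tar will not archive.
# list_sockets is a hypothetical helper name.

# list_sockets DIR
# Prints every socket below DIR without crossing into other filesystems.
list_sockets() {
    find "$1" -xdev -type s -print
}

# Example (the path is a placeholder and is only scanned if it exists):
rootfs=${rootfs:-/nonexistent/rootfs}
if [[ -d "$rootfs" ]]; then
    list_sockets "$rootfs"
fi
```

Sockets are normally recreated by the service that owns them when it starts, so a list like this is mostly useful for knowing what to expect after a restore. If GNU tar is used, its --numeric-owner option keeps the numeric IDs in the archive rather than translating them through user and group names.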
