What does it mean if a Linux container (LXC container) is called "unprivileged"?
Related Solutions
I was just doing something very similar, moving KVM VMs into unprivileged LXC.
I was using system containers for this (so they can be started automatically on boot), but with mapped UID/GIDs (user namespaces).
- edit /etc/subuid and /etc/subgid (I assigned UIDs/GIDs 10M-100M to root and use 100K per container)
- for the first container, map UIDs/GIDs 10000000-10099999 in /var/lib/lxc/CTNAME/config
- mount the container storage on /var/lib/lxc/CTNAME/rootfs (or do nothing if you don't use separate volume/dataset/whatever per container)
- chown 10000000:10000000 /var/lib/lxc/CTNAME/rootfs
- setfacl -m u:10000000:x /var/lib/lxc (or simply chmod o+x /var/lib/lxc)
- lxc-usernsexec -m b:0:10000000:100000 -- /bin/bash
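The range bookkeeping in the steps above can be sketched as a small shell helper. This is only an illustration of the scheme described here (base 10000000, 100000 IDs per container); the function name idmap_lines and the 0-based container index are assumptions made for this example and not part of LXC.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lxc.id_map lines for the n-th container
# (0-based), assuming root owns the IDs from 10000000 upwards in
# /etc/subuid and /etc/subgid and each container gets a block of 100000.
idmap_lines() {
    local index=$1
    local base=$((10000000 + index * 100000))
    printf 'lxc.id_map = u 0 %d 100000\n' "$base"
    printf 'lxc.id_map = g 0 %d 100000\n' "$base"
}

idmap_lines 0   # first container: host IDs 10000000-10099999
```

The printed lines would go into /var/lib/lxc/CTNAME/config of the respective container.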
Now you're in the first container's user namespace. Everything is the same, but your process thinks its UID is 0, when in fact it is UID 10000000 in the host namespace. Check /proc/self/uid_map to see whether your UID is mapped or not. You will notice that you can no longer read from /root, which now appears to be owned by nobody/nogroup.
While in the user namespace, I rsync from the original host.
Outside the user namespace, you will see that the files in /var/lib/lxc/CTNAME/rootfs are not owned by the same UIDs as on the original installation, but by 10000000+remote_uid. This is what you want.
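The relationship between the two views is plain arithmetic. The sketch below assumes the three-column format of /proc/self/uid_map (first ID inside the namespace, first ID on the host, length of the range); the function name host_id is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate an ID as seen inside the user namespace
# into the ID the host sees, using uid_map-style lines read from stdin
# (columns: first ID inside, first ID on the host, number of IDs).
host_id() {
    local wanted=$1 inside outside count
    while read -r inside outside count; do
        if ((wanted >= inside && wanted < inside + count)); then
            echo $((outside + wanted - inside))
            return 0
        fi
    done
    return 1   # the ID is not mapped at all
}

# The mapping created by "lxc-usernsexec -m b:0:10000000:100000":
echo "0 10000000 100000" | host_id 0      # prints 10000000
echo "0 10000000 100000" | host_id 1000   # prints 10001000
```

Inside the namespace the same function could be fed from /proc/self/uid_map directly.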
That's it. Once your data is synced, remove everything from the container's /etc/fstab so it won't try to mount things, and it should start. There might be other things to change; check what the LXC template for the containerised distro does. You can definitely remove the kernel, grub, ntp and any hardware-probing packages in the container (you don't even have to run it for that, you can chroot into the container from the user namespace).
If you don't have a running remote VM, you can also mount the original VM storage in the host namespace and rsync over SSH back into localhost. The effect will be the same.
If you (as it seems) simply want to change your privileged container into an unprivileged one, you might as well just add the subordinate UID/GID ranges, add a mapping as above to your container config and then do something along the lines of:
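# Shift every UID/GID in 0..65535 up by 10000000 (run as root on the host).
# -h changes symlinks themselves instead of the files they point to.
for i in $(seq 0 65535); do
    find /var/lib/lxc/CTNAME/rootfs -uid "$i" -exec chown -h "$((10000000+i))" {} +
    find /var/lib/lxc/CTNAME/rootfs -gid "$i" -exec chgrp -h "$((10000000+i))" {} +
done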
That should be all that needs doing; you should now be able to run the container unprivileged. The loop above is extremely inefficient, since it walks the whole tree twice for every ID. uidshift will probably do a better job at this (but I haven't used it yet).
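A single pass over the tree avoids the repeated walks. The following is a sketch only, assuming GNU find and the same offset of 10000000; shift_ids is a name invented here, and it merely prints the chown commands it would run so that the result can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical one-pass alternative to the loop above (assumes GNU find).
# Prints one "chown -h" command per file; nothing is changed on disk.
shift_ids() {
    local root=$1 offset=$2 record uid gid rest path
    # %U and %G are the numeric owner and group; NUL-terminated records
    # keep file names with spaces or newlines intact.
    find "$root" -printf '%U %G %p\0' |
    while IFS= read -r -d '' record; do
        uid=${record%% *}      # first field
        rest=${record#* }
        gid=${rest%% *}        # second field
        path=${rest#* }        # everything after the second space
        printf 'chown -h %d:%d -- %q\n' \
            "$((uid + offset))" "$((gid + offset))" "$path"
    done
}

# Example: shift_ids /var/lib/lxc/CTNAME/rootfs 10000000
```

Unlike the loop, this shifts every ID it finds, so it should only be pointed at a tree that has not been shifted yet.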
HTH.
I now know how to do this. If you can't follow this explanation, please ask, but also make sure you have read up on userns in the readings I am giving at the bottom.
Preliminary assumptions
I'll stick with the following assumptions, extended from what I have from your question:
- host has a user1 and a user2; if a piece of information isn't specific to one, we'll use userX
- the container will be named by a variable which we will render as $container
- home folders for user1 and user2 will be given in the notation known from Bash as ~user1 and ~user2
- we'll assume the subordinate UID and GID ranges to be 100000..165536 for user1 and 200000..265536 for user2, just for brevity
- the root FS folder for $container will be rendered as $rootfs, regardless of where it will end up (~userX/.local/share/lxc/$container/rootfs)
- container configuration is by default in ~userX/.local/share/lxc/$container/config
Moving the container
There are two relevant pieces of data that govern the userns containers:
- owner and group for the files/folders comprising the $container
- the subordinate UIDs and GIDs assigned in two places: /etc/sub{uid,gid} for the user account (manipulated via usermod --{add,del}-sub{uid,gid}s) and lxc.id_map in the $container configuration (~userX/.local/share/lxc/$container/config) respectively
  - I don't know for certain whether it is possible to define different ranges in the container configuration for each container. E.g. if the host user userX has 65536 subordinate GIDs and UIDs, it might be possible to assign 1000 to each of 65 different containers, but I haven't tested that hypothesis.
  - it is certain, though, that this setting communicates to LXC which are the valid ranges for GID and UID in the child namespace.
So the gist is really that you need to make sure that the file/folder owner and group for the container match the configuration, which in turn has to be a valid subset of the host subordinate GIDs/UIDs assigned to user1 and user2 respectively.
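The "valid subset" condition boils down to a range comparison. A minimal sketch, assuming each range is given as a start and a count the way /etc/subuid and lxc.id_map express them; range_within is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the host range used in a container's mapping
# (start, count) lie completely inside a subordinate range (base, size)
# as listed in /etc/subuid or /etc/subgid?
range_within() {
    local start=$1 count=$2 base=$3 size=$4
    ((start >= base && start + count <= base + size))
}

# user1 is assumed to own 65536 IDs starting at 100000
range_within 100000 65536 100000 65536 && echo "full range: ok"
range_within 150000 65536 100000 65536 || echo "runs past the end: not ok"
```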
If you're using Bash, for example, you can use $((expression)) for arithmetic expressions and let to assign arithmetic expressions to variables. This is mighty useful if you know a base value (100000 and 200000 respectively) and the GID/UID for the "inside" users.
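For illustration, here is that arithmetic with the base values assumed above; the variable names are made up for this example.

```shell
#!/usr/bin/env bash
# Bases of the subordinate ranges assumed above for user1 and user2.
base_from=100000
base_to=200000

# UID 1000 inside the container is stored on the host as base + 1000.
inside_uid=1000
let host_uid_from=base_from+inside_uid
echo "in user1's range: $host_uid_from"

# Moving the container to user2 keeps the inside UID and swaps the base.
host_uid_to=$((host_uid_from - base_from + base_to))
echo "in user2's range: $host_uid_to"
```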
The main points are:
- it's possible
- either the capability CAP_CHOWN or superuser rights are required
Here's a script that will probably need some more honing (example: migration from root-created container to unprivileged), but it works for me for the purpose:
#!/usr/bin/env bash
# Move an unprivileged container between two host users and shift the
# owner/group of its files into the target user's subordinate ID ranges.
function syntax
{
    echo "SYNTAX: ${0##*/} <from-user> <to-user> <container-name>"
    [[ -n "$1" ]] && echo -e "\nERROR: ${1}."
    exit 1
}
# Print field $3 of the first entry for user $2 in file $1 (exact name match)
function subid
{
    awk -F : -v user="$2" -v field="$3" '$1 == user {print $field; exit}' "$1"
}
# Checks
[[ -n "$1" ]] || syntax "<from-user> is not set"
[[ -n "$2" ]] || syntax "<to-user> is not set"
[[ -n "$3" ]] || syntax "<container-name> is not set"
[[ "$UID" -eq "0" ]] || syntax "You must be superuser to make use of this script"
# Constants with stuff we need
readonly USERFROM=$1
readonly USERTO=$2
readonly CONTAINER=$3
LXCLOCAL=".local/share/lxc"
readonly HOMEFROM=$(eval echo ~$USERFROM)
readonly HOMETO=$(eval echo ~$USERTO)
readonly LXCFROM="$HOMEFROM/$LXCLOCAL"
readonly LXCTO="$HOMETO/$LXCLOCAL"
readonly GIDBASEFROM=$(subid /etc/subgid "$USERFROM" 2)
readonly UIDBASEFROM=$(subid /etc/subuid "$USERFROM" 2)
readonly GIDSIZEFROM=$(subid /etc/subgid "$USERFROM" 3)
readonly UIDSIZEFROM=$(subid /etc/subuid "$USERFROM" 3)
readonly GIDBASETO=$(subid /etc/subgid "$USERTO" 2)
readonly UIDBASETO=$(subid /etc/subuid "$USERTO" 2)
readonly GIDSIZETO=$(subid /etc/subgid "$USERTO" 3)
readonly UIDSIZETO=$(subid /etc/subuid "$USERTO" 3)
unset LXCLOCAL
# More checks
[[ -d "$LXCFROM/$CONTAINER" ]] || syntax "Could not locate '$LXCFROM/$CONTAINER'. It is not a directory as expected"
[[ -d "$LXCTO" ]] || syntax "Destination folder '$LXCTO' does not exist"
[[ -e "$LXCTO/$CONTAINER" ]] && syntax "Destination '$LXCTO/$CONTAINER' already exists. However, it must not"
for i in GIDBASEFROM UIDBASEFROM GIDBASETO UIDBASETO; do
    (($i > 0)) || syntax "Could not determine base/offset of subordinate UID/GID range"
done
for i in GIDSIZEFROM UIDSIZEFROM GIDSIZETO UIDSIZETO; do
    (($i > 0)) || syntax "Could not determine length of subordinate UID/GID range"
done
echo "Going to migrate container: $CONTAINER"
echo -e "\tfrom user $USERFROM ($HOMEFROM): subUID=${UIDBASEFROM}..$((UIDBASEFROM+UIDSIZEFROM-1)); subGID=${GIDBASEFROM}..$((GIDBASEFROM+GIDSIZEFROM-1))"
echo -e "\tto user $USERTO ($HOMETO): subUID=${UIDBASETO}..$((UIDBASETO+UIDSIZETO-1)); subGID=${GIDBASETO}..$((GIDBASETO+GIDSIZETO-1))"
read -r -p "Do you want to continue? (y/N) "
case ${REPLY:0:1} in
    y|Y)
        ;;
    *)
        echo "User asked to abort."
        exit 1
        ;;
esac
# Find the UIDs and GIDs in use in the container
readonly SUBGIDSFROM=$(find -H "$LXCFROM/$CONTAINER" -printf '%G\n'|sort -u)
readonly SUBUIDSFROM=$(find -H "$LXCFROM/$CONTAINER" -printf '%U\n'|sort -u)
# Change group (-h: change symlinks themselves, never their targets)
for gid in $SUBGIDSFROM; do
    let GIDTO=$(id -g "$USERTO")
    if ((gid == $(id -g "$USERFROM"))); then
        echo "Changing group from $USERFROM ($gid) to $USERTO ($GIDTO)"
        find -H "$LXCFROM/$CONTAINER" -gid "$gid" -exec chgrp -h "$GIDTO" {} +
    elif ((gid >= GIDBASEFROM)) && ((gid < GIDBASEFROM+GIDSIZEFROM)); then
        let GIDTO=$((gid-GIDBASEFROM+GIDBASETO))
        echo "Changing group $gid -> $GIDTO"
        find -H "$LXCFROM/$CONTAINER" -gid "$gid" -exec chgrp -h "$GIDTO" {} +
    else
        echo "ERROR: Some file/folder inside '$LXCFROM/$CONTAINER' has a group not assigned to $USERFROM (assigned subordinate GIDs)."
        echo -e "Use:\n\tfind -H '$LXCFROM/$CONTAINER' -gid $gid\nto list those files/folders."
        exit 1
    fi
done
# Change owner
for uid in $SUBUIDSFROM; do
    let UIDTO=$(id -u "$USERTO")
    if ((uid == $(id -u "$USERFROM"))); then
        echo "Changing owner from $USERFROM ($uid) to $USERTO ($UIDTO)"
        find -H "$LXCFROM/$CONTAINER" -uid "$uid" -exec chown -h "$UIDTO" {} +
    elif ((uid >= UIDBASEFROM)) && ((uid < UIDBASEFROM+UIDSIZEFROM)); then
        let UIDTO=$((uid-UIDBASEFROM+UIDBASETO))
        echo "Changing owner $uid -> $UIDTO"
        find -H "$LXCFROM/$CONTAINER" -uid "$uid" -exec chown -h "$UIDTO" {} +
    else
        echo "ERROR: Some file/folder inside '$LXCFROM/$CONTAINER' has an owner not assigned to $USERFROM (assigned subordinate UIDs)."
        echo -e "Use:\n\tfind -H '$LXCFROM/$CONTAINER' -uid $uid\nto list those files/folders."
        exit 1
    fi
done
mv "$LXCFROM/$CONTAINER" "$LXCTO/" || { echo "ERROR: failed to move to destination: ${LXCTO}/${CONTAINER}."; exit 1; }
In addition to the license terms of the StackExchange network, I am putting this into the public domain. So reuse and modify for whatever purpose, but it comes without any warranty and I must not be held liable for its use or abuse.
Usage
SYNTAX: lxc-reassign-userns.sh <from-user> <to-user> <container-name>
It assumes find, sort, uniq, awk (mawk and gawk should work), id, bash, chown, chmod and so on to be available and to understand all the command line switches it is using. For Bash, readonly and let and arithmetic expressions are assumed to be understood. For find it assumes + is a valid terminator for the -exec action.
This list is probably not complete.
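Whether the tools are present can be checked up front. The following is a minimal sketch using the shell built-in command -v; the function name missing_tools and the list of tools passed to it are assumptions for this example.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print every tool from the argument list
# that cannot be found, and fail if at least one is missing.
missing_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

missing_tools find sort awk id chown chgrp mv || echo "install the tools listed above first"
```

This only proves that the commands exist, not that they understand the switches used in the script (for example -printf, which is a GNU find extension).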
Backups
Yes, you can make backups and restore them elsewhere, as long as you also adjust the file owner and group accordingly.
However, assuming you use something like tar, there's a caveat: tar will ignore sockets, so $rootfs/dev/log will pose an issue, and other sockets may cause a similar one.
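To see in advance which entries are affected, the sockets below the root FS folder can be listed. A small sketch, with list_sockets being a name chosen for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the sockets below a container's root file
# system, i.e. the entries a tar archive of it would be missing.
list_sockets() {
    find "$1" -xdev -type s -print
}

# Example: list_sockets "$rootfs"
```

A socket is usually created by the service that listens on it, so the list mainly tells you which services to check after a restore.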
Best Answer
Unprivileged LXC containers are the ones making use of user namespaces (userns), i.e. of a kernel feature that allows mapping a range of UIDs on the host into a namespace inside of which a user with UID 0 can exist again.
Contrary to what I believed about unprivileged LXC containers for a while, this does not mean that the container has to be owned by an unprivileged host user. That is only one possibility.
Relevant is:
- that a range of subordinate UIDs and GIDs is defined for the host user (usermod [-v|-w|--add-subuids|--add-subgids])
- that this range is mapped in the container configuration (lxc.id_map = ...)
So even root can own unprivileged containers, since the effective UIDs of container processes on the host will end up inside the range defined by the mapping.
However, for root you have to define the subordinate IDs first. Unlike users created via adduser, root will not have a range of subordinate IDs defined by default.
Also keep in mind that the full range you give is at your disposal, so you could have 3 containers with the following configuration lines (only UID mapping shown):
lxc.id_map = u 0 100000 100000
lxc.id_map = u 0 200000 100000
lxc.id_map = u 0 300000 100000
assuming that root owns the subordinate UIDs between 100000 and 400000.
NB: as per a comment, recent versions call this lxc.idmap!
All documentation I found suggests using 65536 subordinate IDs per container; some use 100000 to make it more human-readable, though.
In other words: You don't have to assign the same range to each container.
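If every container is supposed to get IDs of its own, the chosen ranges must not overlap. A minimal sketch of such a check, assuming each range is given as a start and a count as in the mapping lines; ranges_overlap is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check: do two host ID ranges, each given as start and
# count, share at least one ID?
ranges_overlap() {
    local start_a=$1 count_a=$2 start_b=$3 count_b=$4
    ((start_a < start_b + count_b && start_b < start_a + count_a))
}

# The three example containers, 100000 IDs each:
ranges_overlap 100000 100000 200000 100000 || echo "containers 1 and 2 are disjoint"
ranges_overlap 200000 100000 300000 100000 || echo "containers 2 and 3 are disjoint"
```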
With over 4 billion (~2^32) possible subordinate IDs, that means you can be generous when dealing out the subordinate ranges to your host users.
Unprivileged container owned and run by root
To rub that in again: an unprivileged LXC guest does not have to be run by an unprivileged user on the host.
Configuring your container with a subordinate UID/GID mapping (lxc.id_map lines like the ones above), where the user root on the host owns that given subordinate ID range, will allow you to confine guests even better.
However, there is one important additional advantage in such a scenario (and yes, I have verified that it works): you can auto-start your container at system startup.
Usually when scouring the web for information about LXC you will be told that it is not possible to autostart an unprivileged LXC guest. However, that is only true by default for those containers which are not in the system-wide storage for containers (usually something like /var/lib/lxc). If they are (which usually means they were created by root and are started by root), it's a whole different story.
The autostart setting (lxc.start.auto = 1) will do the job quite nicely, once you put it into your container config.
Getting permissions and configuration right
I struggled with this myself a bit, so I'm adding a section here.
In addition to the configuration snippet included via lxc.include, which usually goes by the name /usr/share/lxc/config/$distro.common.conf (where $distro is the name of a distro), you should check if there is also a /usr/share/lxc/config/$distro.userns.conf on your system and include that as well, i.e. have one lxc.include line for each of the two files.
Furthermore add the subordinate ID mappings, i.e. lxc.id_map lines for u and g that map ID 0 inside the guest to host ID 100000, which means that the host UID 100000 is root inside the user namespace of the LXC guest.
Now make sure that the permissions are correct for the guest whose name is stored in the environment variable $lxcguest: its rootfs has to be owned by the host ID that is root inside the guest (100000 here), and that ID has to be able to traverse the directories leading to it. This should allow you to run the container after your first attempt may have given some permission-related errors.
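What such a permission fix could look like is sketched below. These are not necessarily the commands originally given here: the sketch borrows the chown and chmod o+x steps from the related answer above, assumes the system-wide path /var/lib/lxc, uses the made-up function name perm_commands and the placeholder guest name myguest, and only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print (not run) commands that hand a guest's rootfs
# to the host ID that is root inside the guest and make the directories
# above it traversable. /var/lib/lxc is an assumed default.
perm_commands() {
    local lxcguest=$1 rootid=$2 lxcpath=${3:-/var/lib/lxc}
    printf 'chown %d:%d %q\n' "$rootid" "$rootid" "$lxcpath/$lxcguest/rootfs"
    printf 'chmod o+x %q %q\n' "$lxcpath" "$lxcpath/$lxcguest"
}

perm_commands "${lxcguest:-myguest}" 100000
```

Review the printed commands against your own layout before running any of them as root.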