The technical explanation of what an unprivileged container is, is quite good. However, it is not aimed at the ordinary PC user. Is there a simple answer to when and why people should use unprivileged containers, and what are their benefits and downsides?
What are the benefits and downsides of unprivileged containers?
Tags: lxc, not-root-user, privileges, security, virtual-machine
Related Solutions
I was just doing something very similar, moving KVM VMs into unprivileged LXC.
I was using system containers for this (so they can be started automatically on boot), but with mapped UID/GIDs (user namespaces).
- edit /etc/subuid,subgid (I mapped uid/gids 10M-100M to root and use 100K per container)
- for first container, use u/gids 10000000-10099999 in /var/lib/lxc/CTNAME/config
- mount the container storage on /var/lib/lxc/CTNAME/rootfs (or do nothing if you don't use separate volume/dataset/whatever per container)
- chown 10000000:10000000 /var/lib/lxc/CTNAME/rootfs
- setfacl -m u:10000000:x /var/lib/lxc (or simply chmod o+x /var/lib/lxc)
- lxc-usernsexec -m b:0:10000000:100000 -- /bin/bash
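For reference, the first two steps might look like this on disk. This is only a sketch assuming the 10M base and 100K-per-container scheme above, with CTNAME as a placeholder; the config key is lxc.idmap on LXC 3.x and later, lxc.id_map on older releases:

```ini
# /etc/subuid and /etc/subgid -- delegate uids/gids 10M-100M to root:
root:10000000:90000000

# /var/lib/lxc/CTNAME/config -- first container gets 10000000-10099999:
lxc.idmap = u 0 10000000 100000
lxc.idmap = g 0 10000000 100000
```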
Now you're in the first container's user namespace. Everything looks the same, but your process thinks its uid is 0, when in fact in the host namespace its uid is 10000000. Check /proc/self/uid_map to see whether your uid is mapped or not. You will notice you can no longer read from /root, and it seems to be owned by nobody/nogroup.
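The translation that /proc/self/uid_map describes is simple arithmetic. A minimal sketch, assuming a single map line of the form "ns_start host_start count" (ns_to_host is a helper name invented for this example):

```shell
# Translate a uid as seen inside the namespace to the host uid,
# given the three fields of one uid_map line.
ns_to_host() {
  local uid="$1" ns_start="$2" host_start="$3" count="$4"
  if [ "$uid" -ge "$ns_start" ] && [ "$uid" -lt $((ns_start + count)) ]; then
    echo $((host_start + uid - ns_start))
  else
    echo "unmapped" >&2
    return 1
  fi
}

ns_to_host 0 0 10000000 100000   # container root -> host uid 10000000
```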
While in the user namespace, I rsync from the original host.
Outside the user namespace, you will see that the files in /var/lib/lxc/CTNAME/rootfs are now owned not by the expected (same) uids as in the origin installation, but rather by 10000000+remote_uid. This is what you want.
That's it. Once your data is synced, remove everything from the container's /etc/fstab so it won't try to mount things, and it should start. There might be other things to change; check what the LXC template for the containerised distro does. You can definitely remove the kernel, grub, ntp and any hardware-probing packages in the container (you don't even have to run it; you can chroot into the container from the user namespace).
If you don't have a running remote VM, you can also mount the original VM storage in the host namespace and rsync/SSH back in to localhost. The effect will be the same.
If you (as it seems) simply want to change your privileged container to unprivileged, you might as well just add the uid/gid mapping as above to your container config and then do something along the lines of:
for i in $(seq 0 65535); do
    find /var/lib/lxc/CTNAME/rootfs -uid $i -exec chown $((10000000+i)) {} \;
    find /var/lib/lxc/CTNAME/rootfs -gid $i -exec chgrp $((10000000+i)) {} \;
done
That should be all that needs doing; now you should be able to run the container unprivileged. The example above is extremely inefficient; uidshift will probably do a better job at this (but I haven't used it yet).
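For comparison, a single-pass variant is sketched below. It is deliberately a dry run: it only prints the chown commands rather than executing them (drop the echo to apply them for real, as root). shift_ids and OFFSET are names invented for this sketch, and stat -c assumes GNU coreutils.

```shell
OFFSET=10000000

# Walk the tree once, printing one chown per file that shifts both
# the uid and the gid by OFFSET (dry run: remove `echo` to apply).
shift_ids() {
  local root="$1" f u g
  find "$root" -print0 |
    while IFS= read -r -d '' f; do
      u=$(stat -c %u "$f")
      g=$(stat -c %g "$f")
      echo chown -h "$((OFFSET + u)):$((OFFSET + g))" "$f"
    done
}
```

Running shift_ids /var/lib/lxc/CTNAME/rootfs prints the commands so you can inspect them before committing; unlike the 65536-iteration loop, it stats each file exactly once.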
HTH.
A Virtual Machine (VM) is quite a generic term for many virtualisation technologies.
There are many variations on virtualisation technologies, but the main ones are:
- Hardware Level Virtualisation
- Operating System Level Virtualisation
qemu-kvm and VMWare are examples of the first. They employ a hypervisor to manage the virtual environments in which a full operating system runs. For example, on a qemu-kvm system you can have one VM running FreeBSD, another running Windows, and another running Linux.
The virtual machines created by these technologies behave like isolated individual computers to the guest. These have a virtual CPU, RAM, NIC, graphics etc which the guest believes are the genuine article. Because of this, many different operating systems can be installed on the VMs and they work "out of the box" with no modification needed.
While this is very convenient, in that many OSes will install without much effort, it has a drawback: the hypervisor has to simulate all the hardware, which can slow things down. An alternative is para-virtualised hardware, in which a new virtual device and driver is developed for the guest, designed for performance in a virtual environment. qemu-kvm provides the virtio range of devices and drivers for this. The downside is that the guest OS must support them; but where it does, the performance benefits are great.
lxc is an example of Operating System Level Virtualisation, or containers. Under this system there is only one kernel installed: the host kernel. Each container is simply an isolation of the userland processes. For example, a web server (for instance apache) is installed in a container. As far as that web server is concerned, the only installed server is itself. Another container may be running an FTP server. That FTP server isn't aware of the web server installation, only its own. Another container can contain the full userland installation of a Linux distro (as long as that distro is capable of running with the host system's kernel).
However, there are no separate operating system installations when using containers - only isolated instances of userland services. Because of this, you cannot install different platforms in a container - no Windows on Linux.
Containers are usually created using a chroot. This creates a separate private root (/) for a process to work with. By creating many individual private roots, processes (web servers, a Linux distro, etc.) run in their own isolated filesystem. More advanced techniques, such as cgroups, can isolate other resources such as network and RAM.
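Both building blocks mentioned above can be observed for any running process via /proc; a quick illustration:

```shell
# Every process has a private root directory (the thing chroot changes)
# and a cgroup membership (the resource-isolation mechanism):
ls -l /proc/self/root    # symlink to this process's root directory
cat /proc/self/cgroup    # cgroup hierarchies the process belongs to
```

For an ordinary host process the root link points at / and the cgroup file shows the system hierarchy; for a containerised process both reflect its isolated view instead.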
There are pros and cons to both and many long running debates as to which is best.
- Containers are lighter, in that a full OS isn't installed for each; which is the case for hypervisors. They can therefore run on lower spec'd hardware. However, they can only run Linux guests (on Linux hosts). Also, because they share the kernel, there is the possibility that a compromised container may affect another.
- Hypervisors are more secure and can run different OSes because a full OS is installed in each VM and guests are not aware of other VMs. However, this utilises more resources on the host, which has to be relatively powerful.
Best Answer
Running unprivileged containers is the safest way to run containers in a production environment. Containers get bad publicity when it comes to security, and one of the reasons is that some users have found that if a user gets root in a container, there is a possibility of gaining root on the host as well. Basically, what an unprivileged container does is mask the userid from the host. With unprivileged containers, non-root users can create containers; inside the container they appear as root, but on the host they appear as, for example, userid 100000 (whatever you map the userids to). I recently wrote a blog post on this based on Stephane Graber's blog series on LXC (one of the brilliant minds/lead developers of LXC and someone to definitely follow). I say again, extremely brilliant.
From my blog (the ps listings are omitted here): running ps inside the container shows its processes as root, while the same processes viewed from the host appear under uid 100000.
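You can tell which side you are looking from by reading the map itself; in the initial (host) user namespace the mapping is the identity:

```shell
# On the host this typically prints the identity map "0 0 4294967295";
# inside an unprivileged container it shows the shifted range instead
# (e.g. "0 100000 65536").
cat /proc/self/uid_map
```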
So to sum up. Benefits: added security and added isolation. Downsides: a little confusing to wrap your head around at first, and not for the novice user.