Chroot – How to Perform Chroot with Linux Namespaces

After reading about Linux namespaces I was under the impression that they are, amongst a lot of other features, an alternative to chroot. For example, in this article:

Other uses [of namespaces] include […] chroot()-style isolation of a process to a portion of the single directory hierarchy.

However, when I clone the mount namespace, for example with the following command, I still see the whole original root tree.

unshare --mount -- /bin/bash

I understand that I am now able to perform additional mounts in the new namespace that are not shared with the original namespace and thus this provides isolation, but it is still the same root, e.g. /etc is still the same for both namespaces. Do I still need chroot to change the root or is there an alternative?

I was expecting that this question would provide an answer, but the answer only uses chroot, again.

EDIT #1

There was a now-deleted comment that mentioned pivot_root. Since this is implemented in linux/fs/namespace.c, it is in fact part of the namespaces implementation. This suggests that changing the root directory with only unshare and mount is not possible, and that namespaces provide their own, cleverer version of chroot. Still, even after reading the source code, I do not get the main idea that makes this approach fundamentally different from chroot (e.g. in terms of security or better isolation).

EDIT #2

This is not a duplicate of this question. After executing all the commands from the answer I have separate /tmp/tmp.vyM9IwnKuY (or similar), but the root directory is still the same!

Best Answer

Entering a mount namespace before setting up a chroot lets you avoid cluttering the host namespace with additional mounts, e.g. for /proc. Using chroot inside a mount namespace is a nice and simple hack.

I think there are advantages to understanding pivot_root, but it has a bit of a learning curve. The documentation does not quite explain everything... although there is a usage example in man 8 pivot_root (for the shell command). man 2 pivot_root (for the system call) might be clearer if it did the same, and included an example C program.

How to use pivot_root

Immediately after entering the mount namespace, you also need mount --make-rslave / or equivalent. Otherwise, all your mount changes propagate to the mounts in the original namespace, including the pivot_root. You don't want that :).

If you used the unshare --mount command, note it is documented to apply mount --make-rprivate by default. AFAICS this is a bad default and you don't want this in production code. E.g. at this point, it would stop eject from working on a mounted DVD or USB in the host namespace. The DVD or USB would remain mounted inside the private mount tree, and the kernel would not let you eject the DVD.
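To check which propagation a mount actually has before and after these commands, you can read the optional fields in /proc/self/mountinfo. The following is a minimal sketch, assuming a POSIX shell with awk and the mountinfo field layout described in proc(5). The function name mount_propagation is made up for illustration and is not part of util-linux.

```shell
# mount_propagation MOUNTPOINT [MOUNTINFO_FILE]
# Hypothetical helper: print the propagation tags (e.g. shared:1, master:1)
# of every mount at MOUNTPOINT, or "private" when a mount has no tags.
# Stacked mounts on the same mount point produce one line each.
mount_propagation() {
    awk -v mp="$1" '
        $5 == mp {
            tags = ""
            # Optional fields start at field 7 and end at a lone "-".
            for (i = 7; i <= NF && $i != "-"; i++)
                tags = tags (tags == "" ? "" : ",") $i
            print (tags == "" ? "private" : tags)
        }
    ' "${2:-/proc/self/mountinfo}"
}

# Example (not run here): mount_propagation /
```

On a host where / is a shared mount, running mount_propagation / before and after mount --make-rslave / would typically show a shared:N tag turning into a master:N tag. Inside a plain unshare --mount, where everything has been made private, it should print private.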

Once you've done that, you can mount e.g. the /proc directory you will be using. The same way you would for chroot.

Unlike when you use chroot, pivot_root requires that your new root filesystem is a mount point. If it is not one already, you can satisfy this by simply applying a bind mount: mount --rbind new_root new_root.

Use pivot_root, and then umount the old root filesystem with the -l / MNT_DETACH option. (You don't need umount -R, which can take longer.)

Technically, using pivot_root generally needs to involve using chroot as well; it's not "either-or".

As per man 2 pivot_root, it is only defined as swapping the root of the mount namespace. It is not defined to change which physical directory the process root points to, nor the current working directory (/proc/self/cwd). It happens that it does do so, but this is a hack to handle kernel threads. The manpage says that could change in future.
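The two per-process values mentioned here can be inspected through /proc, which makes it easier to see what pivot_root and chroot each changed. This is a small sketch, assuming /proc is mounted. The name proc_dirs is hypothetical.

```shell
# proc_dirs [PID]
# Hypothetical helper: print the root directory and the current working
# directory that the kernel records for a process (default: this shell).
proc_dirs() {
    pid=${1:-$$}
    printf 'root=%s\n' "$(readlink "/proc/$pid/root")"
    printf 'cwd=%s\n' "$(readlink "/proc/$pid/cwd")"
}
```

Calling proc_dirs before and after each step of the sequence shows whether the shell's own root and working directory moved. Looking at another process's entries needs sufficient permission over that process.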

Usually you want this sequence:

chdir(new_root);            // cd new_root
pivot_root(".", put_old);   // pivot_root . put_old
chroot(".");                // chroot .

The position of the chroot in this sequence is yet another subtle detail. Although the point of pivot_root is to rearrange the mount namespace, the kernel code seems to find the root filesystem to move by looking at the per-process root, which is what chroot sets.
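Put together, the steps above might look like the following shell function. Treat it as a sketch under assumptions rather than a hardened implementation:

- It is meant to be called as root from a shell that is already inside a new mount namespace, for example one started with unshare --mount.
- The names pivot_into and .old_root are made up for illustration.
- The new root is assumed to contain its own sh, chroot, umount and rmdir binaries.

```shell
# pivot_into NEW_ROOT [COMMAND [ARG...]]
# Hypothetical helper. Call it only inside a new mount namespace.
pivot_into() {
    new_root=$1
    if [ ! -d "$new_root" ]; then
        echo "pivot_into: $new_root is not a directory" >&2
        return 1
    fi
    shift
    [ "$#" -gt 0 ] || set -- /bin/sh

    # Stop mount changes from propagating back to the original namespace.
    mount --make-rslave / || return 1
    # pivot_root requires the new root to be a mount point.
    mount --rbind "$new_root" "$new_root" || return 1
    # Mount /proc for the new root, the same way you would for chroot.
    mkdir -p "$new_root/proc" && mount -t proc proc "$new_root/proc" || return 1

    cd "$new_root" || return 1
    mkdir -p .old_root || return 1
    pivot_root . .old_root || return 1
    # chroot and sh are looked up in whichever root this process has now,
    # so they should exist under both the old and the new root.
    exec chroot . /bin/sh -c \
        'umount -l /.old_root && rmdir /.old_root; exec "$@"' sh "$@"
}
```

A call such as pivot_into /srv/newroot /bin/sh (the path is only an example) replaces the calling shell on success, so nothing after it runs. The directory check comes first so that a wrong argument fails before any mount is touched.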

Why to use pivot_root

In principle, it makes sense to use pivot_root for security and isolation. I like to think about the theory of capability-based security. You pass in a list of the specific resources needed, and the process can access no other resources. In this case we are talking about the filesystems passed in to a mount namespace. This idea applies generally to the Linux "namespaces" feature, though I'm probably not expressing it very well.

chroot only sets the process root, but the process still refers to the full mount namespace. If a process retains the privilege to perform chroot, then it can traverse back up the filesystem namespace. As detailed in man 2 chroot, "the superuser can escape from a 'chroot jail' by...".

Another thought-provoking way to undo chroot is nsenter --mount=/proc/self/ns/mnt. This is perhaps a stronger argument for the principle. nsenter / setns() necessarily re-loads the process root, from the root of the mount namespace... although the fact that this works when the two refer to different physical directories, might be considered a kernel bug. (Technical note: there could be multiple filesystems mounted on top of each other at the root; setns() uses the top, most recently mounted one).

This illustrates one advantage of combining a mount namespace with a "PID namespace". Being inside a PID namespace would prevent you from entering the mount namespace of an unconfined process. It also prevents you from entering the root of an unconfined process (/proc/$PID/root). And of course a PID namespace also prevents you from killing any process which is outside it :-).
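Whether two processes share a namespace can be checked by comparing the symlinks under /proc/PID/ns. This is a rough sketch, assuming /proc is mounted and that you have permission to read the entries. ns_id and same_ns are hypothetical helper names.

```shell
# ns_id TYPE PID
# Print the namespace identifier of a process, e.g. mnt:[4026531840].
# Fails if the /proc entry cannot be read.
ns_id() {
    readlink "/proc/$2/ns/$1"
}

# same_ns TYPE PID1 PID2
# Succeed only if both processes are in the same namespace of the given
# TYPE (mnt, pid, user, ...).
same_ns() {
    a=$(ns_id "$1" "$2") && b=$(ns_id "$1" "$3") && [ "$a" = "$b" ]
}
```

From the host, same_ns mnt $$ PID would be expected to fail for a process that was started in its own mount namespace. From inside a PID namespace with its own /proc, the host's processes have no entry to inspect at all, which is the confinement described above.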

Related Solutions

Linux – Mount points in a chroot

Yes, you would need to use "unshare" instead of (or as well as) chroot; chroot ONLY changes the root directory of the process. While it's difficult in practice to get to anything which is above it, there are many ways to break out. It's not a jail.

There are some tools which do this, such as "lxc" (Linux containers).

Linux – unshare --map-root-user switch to original uid/username after setup

The unshare(1) command can't do it:

-r, --map-root-user
[...] As a mere convenience feature, it does not support more sophisticated use cases, such as mapping multiple ranges of UIDs and GIDs.

Supplementary groups if any (video, ...) will be lost anyway (or mapped to nogroup).

By entering a second, nested user namespace, it's possible to revert the mapping. This requires a custom program, since unshare(1) won't do it. Here's a very minimal C program as proof of concept (one user only: uid/gid 1000/1000, zero failure check). Let's call it revertuid.c:

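#define _GNU_SOURCE
#include <sched.h>      /* unshare(), CLONE_NEWUSER */

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>      /* open(), O_WRONLY */

#include <unistd.h>     /* write(), close(), execvp() */

/* Proof of concept: return values are deliberately not checked. */
int main(int argc, char *argv[]) {
    int fd;

    /* Enter a second user namespace, nested inside the current one. */
    unshare(CLONE_NEWUSER);
    /* An unprivileged process must deny setgroups before writing gid_map. */
    fd=open("/proc/self/setgroups",O_WRONLY);
    write(fd,"deny",4);
    close(fd);
    /* Map uid 1000 in the new namespace to uid 0 in the parent namespace. */
    fd=open("/proc/self/uid_map",O_WRONLY);
    write(fd,"1000 0 1",8);
    close(fd);
    /* Same for the gid. */
    fd=open("/proc/self/gid_map",O_WRONLY);
    write(fd,"1000 0 1",8);
    close(fd);
    /* Replace this process with the requested command. */
    execvp(argv[1],argv+1);
    return 1;   /* only reached if execvp() failed */
}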

It simply reverses the mapping done by unshare -r -m. That first mapping was unavoidable, since it is what lets you be root and use mount, as seen with:

$ strace unshare -r -m /bin/sleep 1 2>&1 |sed -n '/^unshare/,/^execve/p'
unshare(CLONE_NEWNS|CLONE_NEWUSER)      = 0
open("/proc/self/setgroups", O_WRONLY)  = 3
write(3, "deny", 4)                     = 4
close(3)                                = 0
open("/proc/self/uid_map", O_WRONLY)    = 3
write(3, "0 1000 1", 8)                 = 8
close(3)                                = 0
open("/proc/self/gid_map", O_WRONLY)    = 3
write(3, "0 1000 1", 8)                 = 8
close(3)                                = 0
execve("/bin/sleep", ["/bin/sleep", "1"], [/* 18 vars */]) = 0
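Each line written to uid_map or gid_map has the form "ID-inside ID-outside length", as described in user_namespaces(7). To make the two mappings easier to compare, here is a small sketch that translates an ID through such a map. The name map_to_parent is hypothetical. The function only reads text and does not touch any namespace.

```shell
# map_to_parent ID [MAP_FILE]
# Hypothetical helper: translate an ID inside a user namespace to the ID
# it maps to in the parent namespace, using lines of the form
# "inside outside length". Exits with status 1 if the ID is unmapped.
map_to_parent() {
    awk -v id="$1" '
        id + 0 >= $1 + 0 && id + 0 < $1 + $3 {
            print $2 + (id - $1)
            found = 1
            exit
        }
        END { exit !found }
    ' "${2:-/proc/self/uid_map}"
}
```

With the map "0 1000 1" written by unshare -r, ID 0 translates to 1000. With the map "1000 0 1" written by revertuid, ID 1000 translates to 0, which is root in the intermediate namespace and therefore uid 1000 on the host again.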

So that gives:

user@stretch-amd64:~$ gcc -o revertuid revertuid.c
user@stretch-amd64:~$ mkdir -p /tmp/src /tmp/dst
user@stretch-amd64:~$ touch /tmp/src/file
user@stretch-amd64:~$ ls /tmp/dst
user@stretch-amd64:~$ id
uid=1000(user) gid=1000(user) groups=1000(user)
user@stretch-amd64:~$ unshare -r -m
root@stretch-amd64:~# mount --bind /tmp/src /tmp/dst
root@stretch-amd64:~# ls /tmp/dst
file
root@stretch-amd64:~# exec ./revertuid bash
user@stretch-amd64:~$ ls /tmp/dst
file
user@stretch-amd64:~$ id
uid=1000(user) gid=1000(user) groups=1000(user)

Or shorter:

user@stretch-amd64:~$ unshare -r -m sh -c 'mount --bind /tmp/src /tmp/dst; exec ./revertuid bash'
user@stretch-amd64:~$ ls /tmp/dst
file

The behaviour probably changed after kernel 3.19 as seen in user_namespaces(7):

The /proc/[pid]/setgroups file was added in Linux 3.19, but was backported to many earlier stable kernel series, because it addresses a security issue. The issue concerned files with permissions such as "rwx---rwx".