It seems that others have missed your point. You were not asking for reasons to use changed roots, which you clearly already know, nor what else you can do to place limits on dæmons, since you clearly also know about running under the aegides of unprivileged user accounts. You were asking why one would do this stuff inside the application. There's actually a fairly on-point example of why.
Consider the design of the httpd dæmon program in Daniel J. Bernstein's publicfile package. The first thing that it does is change root to the root directory that it was told to use with a command argument, then drop privileges to the unprivileged user ID and group ID that are passed in two environment variables.
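As a rough illustration of that start-up sequence, here is a minimal sketch in C. It is not publicfile's actual source. The function names id_from_env and chroot_and_drop are made up for this example, the variable names UID and GID follow what envuidgid sets, and the code assumes Linux, where uid_t and gid_t are 32-bit unsigned integers.

```c
#define _GNU_SOURCE             /* chroot(), setgroups() */
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse a non-negative decimal ID from the named environment variable.
 * Returns 0 on success, -1 if it is unset, malformed, or out of range. */
int id_from_env(const char *name, unsigned int *out)
{
    const char *s = getenv(name);
    char *end;
    unsigned long v;

    /* Insist on a leading digit: strtoul() would accept signs and spaces. */
    if (s == NULL || *s < '0' || *s > '9')
        return -1;
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT_MAX)
        return -1;
    *out = (unsigned int)v;
    return 0;
}

/* Hypothetical helper: confine the process to `root`, then give up
 * superuser privileges. Meant to run first thing, while still root. */
int chroot_and_drop(const char *root)
{
    unsigned int uid, gid;
    gid_t g;

    assert(root != NULL);
    /* Read both IDs before changing anything, so bad input fails early. */
    if (id_from_env("UID", &uid) == -1 || id_from_env("GID", &gid) == -1)
        return -1;
    if (chdir(root) == -1 || chroot(".") == -1)
        return -1;
    g = (gid_t)gid;
    /* Groups first: once setuid() has succeeded they can no longer be changed. */
    if (setgroups(1, &g) == -1 || setgid(g) == -1)
        return -1;
    if (setuid((uid_t)uid) == -1)
        return -1;
    return 0;
}
```

The point of the sketch is how little privileged code there is: a program would call chroot_and_drop with its directory argument before reading any network input, and exit if it returns -1.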
Dæmon management toolsets have dedicated tools for things like changing root directory and dropping to unprivileged user and group IDs. Gerrit Pape's runit has chpst. My nosh toolset has chroot and setuidgid-fromenv. Laurent Bercot's s6 has s6-chroot and s6-setuidgid. Wayne Marshall's Perp has runtool and runuid. And so forth. Indeed, they all have M. Bernstein's own daemontools toolset with setuidgid as an antecedent.
One would think that one could extract the functionality from httpd and use such dedicated tools. Then, as you envision, no part of the server program ever runs with superuser privileges.
The problem is that, as a direct consequence, one has to do significantly more work to set up the changed root, and this exposes new problems.
With Bernstein httpd as it stands, the only files and directories that are in the root directory tree are ones that are to be published to the world. There is nothing else in the tree at all. Moreover, there is no reason for any executable program image file to exist in that tree.
But move the root directory change out into a chain-loading program (or systemd), and suddenly the program image file for httpd, any shared libraries that it loads, and any special files in /etc, /run, and /dev that the program loader or C runtime library access during program initialization (which you might find quite surprising if you truss/strace a C or C++ program), also have to be present in the changed root. Otherwise httpd cannot be chained to and won't load/run.
Remember that this is a HTTP(S) content server. It can potentially serve up any (world-readable) file in the changed root. This now includes things like your shared libraries, your program loader, and copies of various loader/CRTL configuration files for your operating system. And if by some (accidental) means the content server has access to write stuff, a compromised server can possibly gain write access to the program image for httpd itself, or even your system's program loader. (Remember that you now have two parallel sets of /usr, /lib, /etc, /run, and /dev directories to keep secure.)
None of this is the case where httpd changes root and drops privileges itself.
So you have traded a small amount of privileged code, which is fairly easy to audit and which runs with superuser privileges right at the start of the httpd program, for a greatly expanded attack surface of files and directories within the changed root.
This is why it is not as simple as doing everything externally to the service program.
Notice that this is nonetheless a bare minimum of functionality within httpd itself. All of the code that does things such as look in the operating system's account database for the user ID and group ID to put into those environment variables in the first place is external to the httpd program, in simple standalone auditable commands such as envuidgid. (And of course it is a UCSPI tool, so it contains none of the code to listen on the relevant TCP port(s) or to accept connections, those being the domain of commands such as tcpserver, tcp-socket-listen, tcp-socket-accept, s6-tcpserver4-socketbinder, s6-tcpserver4d, and so on.)
Entering a mount namespace before setting up a chroot lets you avoid cluttering the host namespace with additional mounts, e.g. for /proc. You can use chroot inside a mount namespace as a nice and simple hack.
I think there are advantages to understanding pivot_root, but it has a bit of a learning curve. The documentation does not quite explain everything... although there is a usage example in man 8 pivot_root (for the shell command). man 2 pivot_root (for the system call) might be clearer if it did the same, and included an example C program.
How to use pivot_root
Immediately after entering the mount namespace, you also need mount --make-rslave / or equivalent. Otherwise, all your mount changes propagate to the mounts in the original namespace, including the pivot_root. You don't want that :).
If you used the unshare --mount command, note it is documented to apply mount --make-rprivate by default. AFAICS this is a bad default and you don't want this in production code. E.g. at this point, it would stop eject from working on a mounted DVD or USB in the host namespace. The DVD or USB would remain mounted inside the private mount tree, and the kernel would not let you eject the DVD.
Once you've done that, you can mount e.g. the /proc directory you will be using, the same way you would for chroot.
Unlike when you use chroot, pivot_root requires that your new root filesystem is a mount point. If it is not one already, you can satisfy this by simply applying a bind mount: mount --rbind new_root new_root.
Use pivot_root, and then umount the old root filesystem with the -l / MNT_DETACH option. (You don't need umount -R, which can take longer.)
Technically, using pivot_root generally needs to involve using chroot as well; it's not "either-or".
As per man 2 pivot_root, it's only defined as swapping the root of the mount namespace. It isn't defined to change which physical directory the process root is pointing to, or the current working directory (/proc/self/cwd). It happens that it does do so, but this is a hack to handle kernel threads. The manpage says that could change in future.
Usually you want this sequence:
chdir(new_root); // cd new_root
pivot_root(".", put_old); // pivot_root . put_old
chroot("."); // chroot .
The position of the chroot in this sequence is yet another subtle detail. Although the point of pivot_root is to rearrange the mount namespace, the kernel code seems to find the root filesystem to move by looking at the per-process root, which is what chroot sets.
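Putting the steps from this section together, a minimal C sketch could look like the following. The names enter_new_root and do_pivot_root are made up for this example. It assumes Linux, a caller with CAP_SYS_ADMIN, and that put_old names an existing directory given relative to new_root. pivot_root is called through syscall(2) because the manual page describes it as having no glibc wrapper.

```c
#define _GNU_SOURCE             /* unshare(), CLONE_NEWNS */
#include <assert.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw system call. */
static int do_pivot_root(const char *new_root, const char *put_old)
{
    return (int)syscall(SYS_pivot_root, new_root, put_old);
}

/* Hypothetical helper. Returns 0 on success, -1 (with errno set) on failure. */
int enter_new_root(const char *new_root, const char *put_old)
{
    assert(new_root != NULL && put_old != NULL);

    /* Enter a new mount namespace. */
    if (unshare(CLONE_NEWNS) == -1)
        return -1;
    /* mount --make-rslave / : our changes stop propagating to the host,
     * while host mounts and unmounts still propagate in to us. */
    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) == -1)
        return -1;
    /* mount --rbind new_root new_root : makes new_root a mount point. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) == -1)
        return -1;
    /* Mounts such as /proc under new_root would be set up here. */

    if (chdir(new_root) == -1)                 /* cd new_root */
        return -1;
    if (do_pivot_root(".", put_old) == -1)     /* pivot_root . put_old */
        return -1;
    if (chroot(".") == -1)                     /* chroot . */
        return -1;
    /* umount -l put_old : lazily detach the old root. The relative path
     * still resolves because the working directory is the new root. */
    if (umount2(put_old, MNT_DETACH) == -1)
        return -1;
    return 0;
}
```

A caller would typically follow a successful return by dropping privileges before doing anything else. Without the necessary privilege the first step fails, and the function returns -1 without having changed anything.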
Why to use pivot_root
In principle, it makes sense to use pivot_root for security and isolation. I like to think about the theory of capability-based security. You pass in a list of the specific resources needed, and the process can access no other resources. In this case we are talking about the filesystems passed in to a mount namespace. This idea applies generally to the Linux "namespaces" feature, though I'm probably not expressing it very well.
chroot only sets the process root, but the process still refers to the full mount namespace. If a process retains the privilege to perform chroot, then it can traverse back up the filesystem namespace. As detailed in man 2 chroot, "the superuser can escape from a 'chroot jail' by...".
Another thought-provoking way to undo chroot is nsenter --mount=/proc/self/ns/mnt. This is perhaps a stronger argument for the principle. nsenter / setns() necessarily re-loads the process root, from the root of the mount namespace... although the fact that this works when the two refer to different physical directories might be considered a kernel bug. (Technical note: there could be multiple filesystems mounted on top of each other at the root; setns() uses the top, most recently mounted one.)
This illustrates one advantage of combining a mount namespace with a "PID namespace". Being inside a PID namespace would prevent you from entering the mount namespace of an unconfined process. It also prevents you entering the root of an unconfined process (/proc/$PID/root). And of course a PID namespace also prevents you from killing any process which is outside it :-).
Best Answer
In step 2 you bind mounted / on /root/chroot.
If you create step 2.5 as ls /root/chroot you'll find all the directories of / listed, including the system's /tmp directory.
If you touch /root/chroot/test you'll see that test is also in the output of ls /. If you rm /test you'll notice that it's also gone from /root/chroot/. So / and /root/chroot/ are exactly the same place.
If you want to look in slightly more detail, run stat / and then stat /root/chroot and you'll notice that both return the same Inode. An Inode is a data structure that refers to a particular file/directory on the disk. As they both return the same Inode, both paths are pointing to exactly the same directory.
Step 3 therefore bind mounts the /root/tmp directory over the system /tmp directory within the already bind mounted /root/chroot.
When you chroot in step 4, you'll be in a chrooted / using the /tmp directory in /root instead of the system wide /tmp. This way, the chroot isn't sharing a /tmp with every other user on the system.