Linux Kernel – Understanding Mount Namespace Functionality

chroot, linux-kernel, mount, namespace

either if you fork into a new mount namespace, or enter an existing one.

It is possible to hold file descriptors from a foreign mount namespace. You can demonstrate this very easily, by finding a process in a foreign mount namespace such as [kdevtmpfs], and opening /proc/$PID/root. (If I change to this directory and run /bin/pwd, it seems to print the awesome error message /usr/bin/pwd: couldn't find directory entry in ‘..’ with matching i-node, and strace shows that getcwd() returned (unreachable)/).
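To find candidate processes for this experiment, one option is to compare the `/proc/<pid>/ns/mnt` links. The following is a minimal sketch, not part of the original question; the helper names `mount_ns_id` and `foreign_mount_ns_pids` are made up for illustration. It assumes a Linux `/proc`, and without root it will only see processes whose namespace links you are allowed to read.

```python
import os


def mount_ns_id(pid="self"):
    """Return the mount-namespace link target of a process, e.g. 'mnt:[...]'.

    Returns None when the link cannot be read (no /proc, process gone,
    or insufficient permission).
    """
    try:
        return os.readlink(f"/proc/{pid}/ns/mnt")
    except OSError:
        return None


def foreign_mount_ns_pids():
    """List PIDs whose mount namespace is readable and differs from ours."""
    ours = mount_ns_id()
    if ours is None:
        return []
    foreign = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        theirs = mount_ns_id(entry)
        if theirs is not None and theirs != ours:
            foreign.append(int(entry))
    return sorted(foreign)


if __name__ == "__main__":
    print("our namespace:", mount_ns_id())
    print("foreign PIDs:", foreign_mount_ns_pids())
```

Run as root, the list would normally include kernel threads such as [kdevtmpfs]; any PID it prints can then be used for the `/proc/$PID/root` experiment described above.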

Please define what happens to the existing references which a process holds to the current mount namespace – the current directory and current root (chroot) – when entering a new mount namespace.

If neither of these references were modified, there would not be much point entering a new mount namespace. E.g. opening a file /path/to/file would open it from the old mount namespace, if the process' root still pointed into the old mount namespace.

Again, I would like to understand both the case of clone() with CLONE_NEWNS (like the unshare command), and the case of setns() (like the nsenter command).

Best Answer

Both the current working directory, and the root, are reset to the root filesystem of the entered mount namespace.

For example, I have tested that I can escape chroot by running nsenter -m --target $$.

(Reminder: chroot is easy to escape when you are still root. man chroot documents the well-known way of doing this).
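One way to observe this is to record the (device, inode) identity of the root and of the working directory before and after entering the namespace, much like `ls -lid / .` would. This is a small sketch; the helper name `fs_identity` is hypothetical and not taken from the answer.

```python
import os


def fs_identity():
    """Return (st_dev, st_ino) for this process's root and working directory."""
    def dev_ino(path):
        st = os.stat(path)
        return (st.st_dev, st.st_ino)

    # "/" resolves relative to the process root, "." relative to its cwd.
    return {"root": dev_ino("/"), "cwd": dev_ino(".")}


if __name__ == "__main__":
    print(fs_identity())
```

If the behaviour described above holds, running this inside a shell started with nsenter -m should report the same pair for "root" and "cwd", whatever they were before.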

Source

https://elixir.bootlin.com/linux/latest/source/fs/namespace.c?v=4.17#L3507

static int mntns_install(struct nsproxy *nsproxy, struct ns_common *ns)
{
    struct fs_struct *fs = current->fs;

Note: current means the current task - the current thread/process.

->fs will be the filesystem data of that task - this is shared between tasks that are threads within the same process. E.g. you will see below that changing the working directory is an operation on ->fs.

E.g. changing the working directory affects all threads of the same process. POSIX-compatible threads like this are implemented using the CLONE_FS flag of clone().
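The sharing of ->fs between threads can be seen from user space. A minimal sketch, assuming CPython on Linux where `threading` threads are ordinary POSIX threads; `chdir_in_thread` is a name invented for this example:

```python
import os
import tempfile
import threading


def chdir_in_thread(path):
    """Call chdir() in a worker thread, then return the cwd the caller sees."""
    worker = threading.Thread(target=os.chdir, args=(path,))
    worker.start()
    worker.join()
    return os.getcwd()


if __name__ == "__main__":
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        seen = chdir_in_thread(tmp)
        # The calling thread observes the directory change made by the worker.
        print(os.path.samefile(seen, tmp))
        os.chdir(original)  # leave the temporary directory before it is removed
```

The calling thread never calls chdir() itself, yet its working directory changes, because both threads refer to the same filesystem data.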

    struct mnt_namespace *mnt_ns = to_mnt_ns(ns), *old_mnt_ns;
    struct path root;
    int err;

...

    /* Find the root */
    err = vfs_path_lookup(mnt_ns->root->mnt.mnt_root, &mnt_ns->root->mnt,
                "/", LOOKUP_DOWN, &root);

Here are the lines in question:

    /* Update the pwd and root */
    set_fs_pwd(fs, &root);
    set_fs_root(fs, &root);

...

}

...

const struct proc_ns_operations mntns_operations = {
    .name       = "mnt",
    .type       = CLONE_NEWNS,
    .get        = mntns_get,
    .put        = mntns_put,
    .install    = mntns_install,
    .owner      = mntns_owner,
};

Related Solutions

Linux Mount – Unable to Umount After Pivot Root

I believe you are running the command umount /old_root while still inside the old root, and therefore it is busy.

I once did a similar script, and the following worked for me:

#!/bin/sh

 mount -v -n -t proc  -onodev,noexec,nosuid proc  /proc
 mount -v -n -t sysfs -onodev,noexec,nosuid sysfs /sys

 mount -v -t ext4 /dev/sdb1 /mnt/root                           

 mount --move /dev  /mnt/root/dev/                                  
 mount --move /proc /mnt/root/proc/                                 
 mount --move /sys  /mnt/root/sys/                                  

 echo "Switching root filesystem..."
 cd /mnt/root                                               
 pivot_root . mnt/tmp/                                          

 exec chroot . /sbin/init

then, inside the new root, the first command the new init executes is umount /mnt/tmp/.

Linux – Did the pivot_root() documentation anticipate the feature of mount namespaces

It sounds like the alternative implementation of pivot_root() would put the calling process in a new, altered mount namespace. Is that a valid reading?

No. IMO this is not very clear, but there is a much more consistent and correct reading.

The essential part of pivot_root(), which must be the same in either implementation, is:

pivot_root() moves the root filesystem of the calling process to the directory put_old and makes new_root the new root filesystem of the calling process.

The essential part of pivot_root() is not limited only to the calling process. The operation described in this quote works on the mount namespace of the calling process. It will affect the view of all the processes in the same mount namespace.

Consider the effect the essential change has on such a second process - or kernel thread - whose working directory was the old root filesystem. Its current directory will still be the old root filesystem. This will keep the /put_old mount point busy, and so it will not be possible to unmount the old root filesystem.

If you control this second process, you resolve this, as per the manpage, by setting its working directory to new_root before pivot_root() is called. After pivot_root() is called, its current directory will still be the new root filesystem.

So process S(ystemd) has been configured to signal process P(lymouth), to change working directory before S calls pivot_root(). No problem. But, we also have kernel threads, which start in /. The current implementation of pivot_root() takes care of the kernel threads for us; it is equivalent to setting the working directories of kernel threads and any other process to new_root before the essential part of pivot_root().

Except, the current implementation of pivot_root() only changes the working directory of a process if the old working directory was /. So it's actually quite easy to see the difference this makes:

$ unshare -rm
# cd /tmp    # work in a subdir instead of '/', and pivot_root() will not change it
# /bin/pwd
/tmp
# mount --bind /new-root /new-root
# pivot_root /new-root /new-root/mnt
# /bin/pwd
/mnt/tmp    # see below: if pivot_root had not updated our current chroot, this would still show /tmp

vs.

$ unshare -rm
# cd /
# /bin/pwd
/
# ls -lid .
2 dr-xr-xr-x. 19 nfsnobody nfsnobody 4096 Jun 13 01:17 .
# ls -lid /new-root
6424395 dr-xr-xr-x. 20 nfsnobody nfsnobody 4096 May 10 12:53 /new-root
# mount --bind /new-root /new-root
# pivot_root /new-root /new-root/mnt
# /bin/pwd
/
# ls -lid .
6424395 dr-xr-xr-x. 20 nobody nobody 4096 May 10 12:53 .
# ls -lid /
6424395 dr-xr-xr-x. 20 nobody nobody 4096 May 10 12:53 /
# ls -lid /mnt
2 dr-xr-xr-x. 19 nobody nobody 4096 Jun 13 01:17 /mnt

Now that I understand what's happening with the working directory, I find it easier to understand what's happening with chroot(). The current chroot of the process which calls pivot_root() may be a reference to the original root filesystem, just as its current working directory may be.

Note, if you do chdir()+pivot_root() but forget to chroot(), your current directory will be outside your current chroot. When your current directory is outside your current chroot, things get quite confusing. You probably don't want to run your program in this state.

# cd /
# python
>>> import os
>>> os.chroot("/new-root")
>>> os.system("/bin/pwd")
(unreachable)/
0
>>> os.getcwd()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 2] No such file or directory
>>> os.system("ls -l ./proc/self/cwd")
lrwxrwxrwx. 1 root root 0 Jun 17 13:46 ./proc/self/cwd -> /
0
>>> os.system("ls -lid ./proc/self/cwd/")
2 dr-xr-xr-x. 19 root root 4096 Jun 13 01:17 ./proc/self/cwd/
0
>>> os.system("ls -lid /")
6424395 dr-xr-xr-x. 20 root root 4096 May 10 12:53 /
0

POSIX does not specify the result of pwd or getcwd() in this situation :). POSIX gives no warning that you might get a "No such file or directory" (ENOENT) error from getcwd(). Linux manpages point out this error as being possible, if the working directory was unlinked (e.g. with rm). I think this is a very good parallel.
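The unlinked-directory case needs no privileges, so it is easy to reproduce. A minimal sketch, assuming Linux; the function name `getcwd_after_unlink` is made up for this example:

```python
import errno
import os
import tempfile


def getcwd_after_unlink():
    """Remove the current working directory, then return the errno from getcwd().

    Returns None if getcwd() unexpectedly succeeds.
    """
    original = os.getcwd()
    tmp = tempfile.mkdtemp()
    os.chdir(tmp)
    os.rmdir(tmp)  # the cwd still exists as an open reference, but has no name
    try:
        os.getcwd()
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.chdir(original)  # restore a usable working directory


if __name__ == "__main__":
    result = getcwd_after_unlink()
    print(errno.errorcode.get(result, result))
```

This is the same OSError with Errno 2 that os.getcwd() raised in the chroot session above, reached by a different route.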
