Linux Kernel – Understanding Mount Namespace Functionality

chroot, linux-kernel, mount, namespace

either if you fork into a new mount namespace, or enter an existing one.

It is possible to hold file descriptors from a foreign mount namespace. You can demonstrate this easily by finding a process in a foreign mount namespace, such as [kdevtmpfs], and opening /proc/$PID/root. (If I change to this directory and run /bin/pwd, it seems to print the awesome error message /usr/bin/pwd: couldn't find directory entry in ‘..’ with matching i-node, and strace shows that getcwd() returned (unreachable)/.)

Please define what happens to the existing references which a process holds to the current mount namespace – the current directory and current root (chroot) – when entering a new mount namespace.

If neither of these references was modified, there would not be much point in entering a new mount namespace. E.g. opening a file /path/to/file would open it from the old mount namespace, if the process' root still pointed into the old mount namespace.

Again, I would like to understand both the case of clone() with CLONE_NEWNS (like the unshare command), and the case of setns() (like the nsenter command).

Best Answer

Both the current working directory and the root are reset to the root filesystem of the entered mount namespace.

For example, I have tested that I can escape chroot by running nsenter -m --target $$.

(Reminder: chroot is easy to escape when you are still root. man chroot documents the well-known way of doing this).
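The nsenter test above can be approximated from C. The following is only a minimal sketch, assuming a Linux system with /proc mounted; the helper names reenter_own_mount_namespace() and cwd_equals() are hypothetical and not part of the original answer. It calls setns() on the process' own mount namespace, which needs CAP_SYS_ADMIN and CAP_SYS_CHROOT, so without those capabilities it is expected to fail and leave the working directory alone.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: re-enter the mount namespace this process already
 * belongs to, similar in spirit to `nsenter -m --target $$`.
 * Returns 0 on success, or -1 with errno set (typically EPERM when the
 * caller lacks the required capabilities).
 */
int reenter_own_mount_namespace(void)
{
    int fd = open("/proc/self/ns/mnt", O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    /* On success the kernel points both root and cwd at the namespace root. */
    int rc = setns(fd, CLONE_NEWNS);

    /* Preserve the errno from setns() across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return rc;
}

/* Returns 1 if the working directory equals expected, 0 if not, -1 on error. */
int cwd_equals(const char *expected)
{
    char buf[PATH_MAX];

    assert(expected != NULL);
    if (getcwd(buf, sizeof buf) == NULL)
        return -1;
    return strcmp(buf, expected) == 0;
}
```

A caller could chdir() somewhere other than /, call reenter_own_mount_namespace(), and then check cwd_equals("/"). When the call succeeds, the working directory should have moved to the namespace root, even though the process never called chdir() or chroot() itself.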


Source

https://elixir.bootlin.com/linux/latest/source/fs/namespace.c?v=4.17#L3507

static int mntns_install(struct nsproxy *nsproxy, struct ns_common *ns)
{
    struct fs_struct *fs = current->fs;

Note: current is the current task, i.e. the current thread or process.

->fs is the filesystem data of that task, including its root and working directory. It is shared between tasks that are threads within the same process: as you will see below, changing the working directory is an operation on ->fs, so it affects all threads of the same process. POSIX-compatible threads get this sharing from the CLONE_FS flag of clone().

    struct mnt_namespace *mnt_ns = to_mnt_ns(ns), *old_mnt_ns;
    struct path root;
    int err;

...

    /* Find the root */
    err = vfs_path_lookup(mnt_ns->root->mnt.mnt_root, &mnt_ns->root->mnt,
                "/", LOOKUP_DOWN, &root);

Here are the lines in question:

    /* Update the pwd and root */
    set_fs_pwd(fs, &root);
    set_fs_root(fs, &root);

...

}

...

const struct proc_ns_operations mntns_operations = {
    .name       = "mnt",
    .type       = CLONE_NEWNS,
    .get        = mntns_get,
    .put        = mntns_put,
    .install    = mntns_install,
    .owner      = mntns_owner,
};
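
The note on CLONE_FS above can be checked from user space without any privileges. This is a minimal sketch, assuming Linux with glibc; the helper names child_chdir() and parent_cwd_after_child_chdir() are hypothetical. It creates a child with clone(), lets the child call chdir(), and then reads the parent's working directory. With CLONE_FS the two tasks share one fs_struct, so the child's chdir() should be visible in the parent; without it the child gets its own copy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Child entry point: change directory; the return value is the exit status. */
int child_chdir(void *dir)
{
    return chdir((const char *)dir) == 0 ? 0 : 1;
}

/*
 * Hypothetical helper: run a child created with clone(extra_flags) that
 * calls chdir(dir), wait for it, then store the parent's working directory
 * in buf. Returns 0 on success, -1 on failure.
 */
int parent_cwd_after_child_chdir(int extra_flags, const char *dir,
                                 char *buf, size_t len)
{
    assert(dir != NULL && buf != NULL && len > 0);

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* clone() takes the top of the child stack, since stacks grow downwards. */
    pid_t pid = clone(child_chdir, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, (void *)dir);

    int status = 0;
    int ok = pid != -1 && waitpid(pid, &status, 0) == pid &&
             WIFEXITED(status) && WEXITSTATUS(status) == 0;
    free(stack);

    if (!ok)
        return -1;
    return getcwd(buf, len) != NULL ? 0 : -1;
}
```

Starting from a directory such as /tmp, calling the helper with extra_flags set to 0 and dir set to "/" should leave the parent where it was, while passing CLONE_FS should leave the parent in /. This sharing is also why mntns_install() can update the root and working directory of a task simply by writing to its ->fs.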