Why is there no rootfs filesystem present on Linux systems?

Tags: kernel, linux, mount, root-filesystem

The Linux kernel documentation claims:

Rootfs is a special instance of ramfs (or tmpfs, if that's enabled),
which is always present in 2.6 systems. You can't unmount rootfs …

On all the Linux systems I tested (kernel > 2.6 and, as far as I know, a normal boot procedure, e.g. Ubuntu 12.04), mount does not show a rootfs entry.

However, on a Buildroot image booted with an external .cpio archive, it is present.

In which cases does mount show a rootfs entry?

Best Answer

  1. On old systems, mount may disagree with /proc/mounts
  2. Most of the time you won't see rootfs in /proc/mounts, but it is still mounted.
  3. Can we prove that rootfs is still mounted?

1. On old systems, mount may disagree with /proc/mounts

man mount says: "The programs mount and umount traditionally maintained a list of currently mounted filesystems in the file /etc/mtab."

The old approach does not really work for the root filesystem, which may have been mounted by the kernel rather than by mount. Therefore the entry for / in /etc/mtab may be somewhat contrived, and is not necessarily in sync with the kernel's current list of mounts.

I haven't verified this, but in practice I don't think any system that uses the old scheme initializes mtab with a rootfs line. (In theory, whether mount shows rootfs depends on the software that first created the mtab file.)
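Whether the two lists disagree can be checked by comparing them directly. A minimal sketch, assuming both files use the usual whitespace-separated format (device, mount point, type, ...); `mount_set` and `mtab_disagreements` are hypothetical helper names, and octal escapes such as `\040` are left undecoded for brevity:

```python
def mount_set(text):
    """Collect (mount point, filesystem type) pairs from mtab-style text."""
    pairs = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # device, mount point, fstype, ...
            pairs.add((fields[1], fields[2]))
    return pairs


def mtab_disagreements(mtab_text, proc_mounts_text):
    """Report mounts listed by only one of /etc/mtab and /proc/mounts."""
    user, kernel = mount_set(mtab_text), mount_set(proc_mounts_text)
    return {
        "only_in_mtab": sorted(user - kernel),
        "only_in_kernel": sorted(kernel - user),
    }
```

Fed the contents of a regular-file `/etc/mtab` and of `/proc/mounts`, any line the kernel reports but mtab omits (such as a rootfs entry) lands under `only_in_kernel`. When mtab is a symlink to /proc/mounts, both lists should normally come out empty.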

man mount continues: "the real mtab file is still supported, but on current Linux systems it is better to make it a symlink to /proc/mounts instead, because a regular mtab file maintained in userspace cannot reliably work with namespaces, containers and other advanced Linux features."

mtab was converted into a symlink in Debian 7 and in Ubuntu 15.04.
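To tell which scheme a given system uses, it is enough to look at what /etc/mtab is. A small sketch; the helper name `mtab_kind` and its return labels are made up for illustration:

```python
import os


def mtab_kind(path="/etc/mtab"):
    """Classify how the mtab at `path` is maintained."""
    if os.path.islink(path):
        # A symlink into /proc (e.g. to /proc/mounts or ../proc/self/mounts)
        # means the list comes straight from the kernel.
        target = os.path.realpath(path)
        return "kernel" if target.startswith("/proc/") else "other symlink"
    if os.path.isfile(path):
        # A regular file is the old scheme, written by mount/umount.
        return "userspace file"
    return "missing"
```

On Debian 7 or Ubuntu 15.04 and later, `mtab_kind()` would be expected to return `"kernel"`.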

1.1 Sources

Debian report #494001 - "debian-installer: /etc/mtab must be a symlink to /proc/mounts with linux >= 2.6.26"

#494001 is resolved in sysvinit-2.88dsf-14. See the closing message, dated 14 Dec 2011. The change is included in Debian 7 "Wheezy", released on 4 May 2013. (It uses sysvinit-2.88dsf-41).

Ubuntu delayed this change until sysvinit_2.88dsf-53.2ubuntu1. That changelog page shows the change enters "vivid", which is the codename for Ubuntu 15.04.

2. Most of the time you won't see rootfs in /proc/mounts, but it is still mounted

As of Linux v4.17, this kernel documentation is still up to date. rootfs is always present, and it can never be unmounted. But most of the time you cannot see it in /proc/mounts.

You can see rootfs if you boot into an initramfs shell. If your initramfs is built by dracut, as in Fedora Linux, you can do this by adding the option rd.break to the kernel command line (e.g. by editing the boot entry in the GRUB boot loader).

switch_root:/# grep rootfs /proc/mounts
rootfs / rootfs rw 0 0
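The grep above matches "rootfs" anywhere on a line. A slightly stricter check looks only at the filesystem-type column. A minimal sketch, assuming the /proc/mounts format of whitespace-separated fields with special characters escaped as three-digit octal (`\040` for a space); `parse_mounts` and `shows_rootfs` are hypothetical helper names:

```python
import re

_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def parse_mounts(text):
    """Parse /proc/mounts text into (device, mount point, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        # The kernel escapes space, tab, newline and backslash as \ooo octal.
        device, mount_point, fstype, options = (
            _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), f)
            for f in fields[:4]
        )
        entries.append((device, mount_point, fstype, options))
    return entries


def shows_rootfs(text):
    """True if a mount whose filesystem type is "rootfs" is listed."""
    return any(fstype == "rootfs" for _, _, fstype, _ in parse_mounts(text))
```

Inside the rd.break shell, `shows_rootfs(open("/proc/mounts").read())` should be true; after the switch to the real root it should be false.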

When dracut switches the system to the real root filesystem, you can no longer see rootfs in /proc/mounts. dracut can use either switch_root or systemd to do this. Both follow the same sequence of operations, which is the one advised in the kernel documentation.

In some other posts, people can see rootfs in /proc/mounts after switching out of the initramfs. For example on Debian 7: 'How can I find out about "rootfs"'. I think this must be because the kernel changed how it shows /proc/mounts, at some point between the kernel version in Debian 7 and my current kernel v4.17. From further searches, I think rootfs is shown on Ubuntu 14.04, but is not shown on Ubuntu 16.04 with Ubuntu kernel 4.4.0-28-generic.

Even if I don't use an initramfs, and have the kernel mount the root filesystem instead, I cannot see rootfs in /proc/mounts. This makes sense as the kernel code also seems to follow the same sequence of operations.

The operation which hides rootfs is chroot.

switch_root:/# cd /sysroot
switch_root:/sysroot# mount --bind /proc proc
switch_root:/sysroot# grep rootfs proc/mounts
rootfs / rootfs rw 0 0

switch_root:/sysroot# chroot .
sh-4.4# cat proc/mounts
/dev/sda3 / ext4 ro,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
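The chroot does not unmount anything; it only changes which mounts the process can see. proc(5) documents that field 2 of /proc/self/mountinfo is the parent mount ID, that a parent lying outside the process's root directory has no record of its own, and that the initramfs parent of the root mount is the special case of this. A minimal sketch that looks for such unlisted parents, assuming the standard mountinfo field layout; `hidden_parents` is a hypothetical helper and the mount IDs in the test sample are made up:

```python
def hidden_parents(mountinfo_text):
    """Map mount point -> parent mount ID for mounts whose parent is not listed.

    Field 1 is the mount ID, field 2 the parent ID, field 5 the mount point
    (still octal-escaped; decoding is omitted in this sketch).
    """
    mounts = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        mounts.append((int(fields[0]), int(fields[1]), fields[4]))
    listed = {mount_id for mount_id, _, _ in mounts}
    # A parent equal to the mount itself marks the namespace's own root: not hidden.
    return {point: parent for mount_id, parent, point in mounts
            if parent not in listed and parent != mount_id}
```

After the chroot, running `hidden_parents(open("/proc/self/mountinfo").read())` would be expected to report "/" with a parent ID that no listed mount has, consistent with rootfs still being mounted underneath.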

3. Can we prove that rootfs is still mounted?

Notoriously, a simple chroot can be escaped from when you are running as a privileged user. If switch_root did nothing more than chroot, we could reverse it and see the rootfs again.

sh-4.4# python3
...
>>> import os
>>> os.system('mount --bind / /mnt')
>>> os.system('cat proc/mounts')
/dev/sda3 / ext4 ro,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda3 /mnt ext4 ro,relatime 0 0
>>> os.chroot('/mnt')
>>>
>>> # now the root, "/", is the old "/mnt"...
>>> # but the current directory, ".", is outside the root :-)
>>>
>>> os.system('cat proc/mounts')
/dev/sda3 / ext4 ro,relatime 0 0
>>> os.chdir('..')
>>> os.system('bash')
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
bash-4.4# chroot .
sh-4.4# grep rootfs proc/mounts
rootfs / rootfs rw 0 0

However, the full switch_root sequence cannot be reversed by this technique. The full sequence does the following:

  1. Change the current working directory (as in /proc/self/cwd), to the mount point of the new filesystem:

    cd /newmount
    
  2. Move the new filesystem, i.e. change its mount point, so that it sits directly on top of the root directory.

    mount --move . /
    
  3. Change the current root directory (as in /proc/self/root) to match the current working directory.

    chroot .
    

In the chroot escape above, we were able to traverse from the root directory of the ext4 filesystem back to rootfs using .., because the ext4 filesystem was mounted on a subdirectory of the rootfs. The escape method does not work when the ext4 filesystem is mounted on the root directory of the rootfs.

I was able to find the rootfs using a different method. (At least one prominent kernel developer considers this a bug in Linux.)

http://archive.today/2018.07.22-161140/https://lore.kernel.org/lkml/20141007133339.GH7996@ZenIV.linux.org.uk/

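/* CURSED.c - DO NOT RUN THIS PROGRAM INSIDE YOUR MAIN MOUNT NAMESPACE */

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>       /* open() */
#include <sys/mount.h>   /* umount2(), MNT_DETACH */
#include <sched.h>       /* setns(), CLONE_NEWNS */
#include <sys/statfs.h>  /* statfs() */
#include <stdio.h>       /* perror(), printf() */
#include <linux/magic.h> /* RAMFS_MAGIC, TMPFS_MAGIC */

int main(void) {
        int fd = open("/proc/self/ns/mnt", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* "umount -l /" - lazy unmount everything we can see */
        if (umount2("/", MNT_DETACH) < 0) { perror("umount2"); return 1; }

        /* reset root, by re-entering our mount namespace */
        if (setns(fd, CLONE_NEWNS) < 0) { perror("setns"); return 1; }

        /* "stat -f /" - inspect the root */
        struct statfs fs;
        if (statfs("/", &fs) < 0) { perror("statfs"); return 1; }

        printf("f_type = %#lx%s\n", (unsigned long)fs.f_type,
               fs.f_type == RAMFS_MAGIC ? " (ramfs)" :
               fs.f_type == TMPFS_MAGIC ? " (tmpfs)" : "");
        return 0;
}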

Tested on Linux 4.17.3-200.fc28.x86_64:

$ make CURSED
cc CURSED.c -o CURSED
$ sudo unshare -m strace ./CURSED
...
openat(AT_FDCWD, "/proc/self/ns/mnt", O_RDONLY) = 3
umount2("/", MNT_DETACH)                = 0
setns(3, CLONE_NEWNS)                   = 0
statfs("/", {f_type=RAMFS_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID}) = 0
                    ^
                    ^ result: rootfs uses ramfs code on this system

(I also confirmed that this filesystem is empty as expected, and writeable).
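Since rootfs is ramfs, or tmpfs if that is enabled, the f_type value is what tells the two apart. A small lookup sketch; the table holds only a few magic numbers from <linux/magic.h>, and `fs_type_name` is a hypothetical helper that also accepts the hex string printed by GNU `stat -f -c %t /`:

```python
# A few filesystem magic numbers from <linux/magic.h>
FS_MAGIC = {
    0x858458F6: "ramfs",
    0x01021994: "tmpfs",
    0xEF53: "ext2/ext3/ext4",
}


def fs_type_name(magic):
    """Name for a statfs() f_type value; accepts an int or a hex string."""
    if isinstance(magic, str):
        magic = int(magic, 16)  # e.g. "858458f6" from `stat -f -c %t /`
    return FS_MAGIC.get(magic, "unknown (%#x)" % magic)
```

For the strace result above, `fs_type_name(0x858458F6)` gives `"ramfs"`.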
