So if I want my rootfs to be in RAM, I need to set CONFIG_INITRAMFS_SOURCE to point to my rootfs (in cpio format presumably).
That's one way to do it, yes, but it is not the only way.
If you have a bootloader that can be configured to load the kernel and the initramfs as separate files, you don't need to use CONFIG_INITRAMFS_SOURCE while building the kernel. It is enough to have CONFIG_BLK_DEV_INITRD set in the kernel configuration. (Before initramfs there was an older version of the technique named initrd, and the old name still appears in some places.) The bootloader will load the initramfs file, and then fill in some information about its memory location and size into a data structure in a specific location of the already-loaded kernel image. The kernel has built-in routines that will use that information to find the initramfs in the system RAM and uncompress it.
Having the initramfs as a separate file will allow you to modify the initramfs file more easily, and if your bootloader can accept input from the user, perhaps specify another initramfs file to be loaded instead of the regular one at boot time. (That's very handy if you try and create a customized initramfs and get some things wrong. Been there, done that.)
For a traditional BIOS-based x86 system, you'll find information about these details in (kernel source)/Documentation/x86/boot.txt. UEFI-based systems do it a bit differently (also described in the same file), and other architectures like ARM have their own sets of details about passing information from the bootloader to the kernel.
Furthermore, what if I want my rootfs to be on physical storage (like eMMC, flash drive, etc.) and not in RAM?
In regular non-embedded systems, the initramfs will usually only contain enough functionality to activate the essential sub-systems. In a regular PC, those would usually be the drivers for the keyboard, display and the driver for the storage controller for your root filesystem, plus any kernel modules and tools required to activate subsystems like LVM, disk encryption, and/or software RAID, if you use those features.
Once the essential sub-systems are active and the root filesystem is accessible, the initramfs will typically perform a switch_root(8) operation to move from the initramfs to the real root filesystem. (The older initrd scheme used pivot_root(8) for this; pivot_root does not work on the rootfs that holds an initramfs.) But an embedded system, or a specialized utility like DBAN, could package everything it needs into the initramfs and just never switch to another root filesystem.
Usually, the scripts and/or tools within the initramfs will get the necessary information to locate the real root filesystem from the options on the kernel command line. But you don't have to do that: with a customized initramfs, you could do something like switching to a different root filesystem if a specific key or mouse button is held down at a specific time in the boot sequence.
With a complex storage configuration (e.g. encrypted LVM on top of a software RAID, on a system that uses redundant multipathed SAN storage), all the information needed to activate the root filesystem might not fit onto the kernel command line, so you could include the bigger pieces into initramfs.
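As a rough illustration of how an initramfs tool might pick the root filesystem out of the kernel command line, here is a minimal Python sketch. The function name parse_cmdline and the example command line are made up for this illustration; real initramfs scripts are usually shell and handle more corner cases.

```python
import shlex


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {key: value}; bare flags map to True."""
    params = {}
    for token in shlex.split(cmdline):
        # Split on the first "=" only, so root=UUID=... keeps its value intact.
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


# Hypothetical command line, similar in shape to what /proc/cmdline contains.
example = "BOOT_IMAGE=/vmlinuz-4.17 root=UUID=1234-abcd ro rd.break quiet"
params = parse_cmdline(example)
print("root filesystem spec:", params.get("root"))
```

On a running system the same function could be fed the contents of /proc/cmdline instead of the example string.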
Modern distributions usually use an initramfs generator to build a tailored initramfs for each installed kernel. Different distributions used to have their own initramfs generators: Red Hat used mkinitrd while Debian had update-initramfs. But after the introduction of systemd it looks like many distributions are standardizing on dracut as an initramfs generator.
A modern initramfs file can be a concatenation of multiple .cpio archives, and each part may or may not be compressed. A typical initramfs file on a modern x86_64 system might have an "early microcode update" file as a first component (usually just a single file in an uncompressed cpio archive, as the microcode file is typically encrypted and so not very compressible). After that comes the regular initramfs content, as a compressed .cpio file.
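To make the "concatenation of archives" idea concrete, the following Python sketch walks the entries of a leading uncompressed cpio archive in the "newc" format and reports where the next component starts. It only illustrates the layout, it is not how skipcpio or the kernel is implemented. The helper names, the demo file contents, and the 512-byte padding are assumptions chosen for the demo.

```python
import gzip

MAGIC = b"070701"   # "newc" cpio header magic
HEADER_LEN = 110    # 6-byte magic + 13 fields of 8 hex digits each


def _align4(n: int) -> int:
    """Round n up to a multiple of 4 (newc pads names and file data)."""
    return (n + 3) & ~3


def newc_entry(name: str, data: bytes = b"") -> bytes:
    """Build one newc entry; only here to keep the demo self-contained."""
    raw_name = name.encode() + b"\0"
    # ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor,
    # rdevmajor, rdevminor, namesize, check
    fields = [0, 0o100644, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(raw_name), 0]
    out = MAGIC + b"".join(b"%08X" % f for f in fields) + raw_name
    out += b"\0" * (_align4(len(out)) - len(out))
    out += data
    out += b"\0" * (_align4(len(out)) - len(out))
    return out


def end_of_first_archive(blob: bytes) -> int:
    """Return the offset just past the first cpio archive and its NUL padding."""
    pos = 0
    while True:
        if blob[pos:pos + 6] != MAGIC:
            raise ValueError(f"no newc header at offset {pos}")
        filesize = int(blob[pos + 54:pos + 62], 16)
        namesize = int(blob[pos + 94:pos + 102], 16)
        name_start = pos + HEADER_LEN
        name = blob[name_start:name_start + namesize - 1]
        pos = _align4(name_start + namesize)  # skip header and name
        pos = _align4(pos + filesize)         # skip file data
        if name == b"TRAILER!!!":             # marks the end of one archive
            break
    while pos < len(blob) and blob[pos] == 0:  # padding between archives
        pos += 1
    return pos


# Demo image: uncompressed "early" archive followed by a gzip-compressed one.
early = newc_entry("kernel/x86/microcode/GenuineIntel.bin", b"\x01\x02\x03")
early += newc_entry("TRAILER!!!")
early += b"\0" * (-len(early) % 512)
main = gzip.compress(newc_entry("init", b"#!/bin/sh\n") + newc_entry("TRAILER!!!"))
image = early + main

offset = end_of_first_archive(image)
print("second component starts at byte", offset)
print("looks like gzip:", image[offset:offset + 2] == b"\x1f\x8b")
```

The sketch assumes the first component really is an uncompressed newc archive; a file that starts directly with compressed data would raise ValueError here.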
To gain a deeper understanding of your system, I would encourage you to extract an initramfs file to a temporary directory and then examine its contents. On Debian, there is an unmkinitramfs(8) tool that can be used to extract an initramfs file in a straightforward manner. On Red Hat 7, you might need to use /usr/lib/dracut/skipcpio <initramfs file> to skip the microcode update file, and then pipe the resulting output to zcat (named gzcat on some systems) and onward to cpio -i -d to extract the initramfs contents to the current working directory. Ubuntu might use lzcat in place of zcat.
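Since the right decompressor depends on how the distribution built the initramfs, one way to choose it is to look at the first bytes of the compressed component. The Python sketch below does that. The signature table reflects the commonly documented leading bytes of each format, and the tool names are the usual userspace commands; both are assumptions about your particular system, and pick_decompressor is a name invented for this example.

```python
import gzip
import lzma

# Leading two bytes of each format, paired with a "decompress to stdout" tool.
SIGNATURES = [
    (b"\x1f\x8b", "zcat"),     # gzip
    (b"BZ", "bzcat"),          # bzip2
    (b"\x5d\x00", "lzcat"),    # legacy LZMA
    (b"\xfd\x37", "xzcat"),    # xz
    (b"\x02\x21", "lz4cat"),   # lz4 (legacy frame)
    (b"\x28\xb5", "zstdcat"),  # zstd
    (b"07", "cat"),            # uncompressed cpio ("070701" header)
]


def pick_decompressor(head: bytes) -> str:
    """Return the name of a tool that should be able to decompress the data."""
    for magic, tool in SIGNATURES:
        if head.startswith(magic):
            return tool
    raise ValueError(f"unrecognised signature: {head[:2]!r}")


print(pick_decompressor(gzip.compress(b"demo")))
print(pick_decompressor(lzma.compress(b"demo")))
```

In practice you would pass in the first few bytes read from the initramfs file after any early microcode archive has been skipped.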
Best Answer
On old systems, mount may disagree with /proc/mounts.
Most of the time you won't see rootfs in /proc/mounts, but it is still mounted.
1. On old systems, mount may disagree with /proc/mounts
man mount says: "The programs mount and umount traditionally maintained a list of currently mounted filesystems in the file /etc/mtab."
The old approach does not really work for the root filesystem. The root filesystem may have been mounted by the kernel, not by mount. Therefore entries for / in /etc/mtab may be quite contrived, and not necessarily in sync with the kernel's current list of mounts.
I haven't checked for sure, but in practice I don't think any system that uses the old scheme will initialize mtab to show a line with rootfs. (In theory, whether mount shows rootfs would depend on the software that first installed the mtab file).
man mount continues: "the real mtab file is still supported, but on current Linux systems it is better to make it a symlink to /proc/mounts instead, because a regular mtab file maintained in userspace cannot reliably work with namespaces, containers and other advanced Linux features."
mtab is converted into a symlink in Debian 7, and in Ubuntu 15.04.
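To find out which scheme a given system uses, you can check whether /etc/mtab is a symlink. A small Python sketch follows; describe_mtab is a name made up for this example, and the exact link target differs between systems (many current ones point at ../proc/self/mounts rather than /proc/mounts).

```python
import os


def describe_mtab(path: str = "/etc/mtab") -> str:
    """Report whether the mtab file is a symlink, a regular file, or missing."""
    if os.path.islink(path):
        return f"{path} is a symlink to {os.readlink(path)}"
    if os.path.exists(path):
        return f"{path} is a regular file maintained in userspace"
    return f"{path} does not exist"


print(describe_mtab())
```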
1.1 Sources
Debian report #494001 - "debian-installer: /etc/mtab must be a symlink to /proc/mounts with linux >= 2.6.26"
#494001 is resolved in sysvinit-2.88dsf-14. See the closing message, dated 14 Dec 2011. The change is included in Debian 7 "Wheezy", released on 4 May 2013. (It uses sysvinit-2.88dsf-41).
Ubuntu delayed this change until sysvinit_2.88dsf-53.2ubuntu1. That changelog page shows the change enters "vivid", which is the codename for Ubuntu 15.04.
2. Most of the time you won't see rootfs in /proc/mounts, but it is still mounted
As of Linux v4.17, this kernel documentation is still up to date. rootfs is always present, and it can never be unmounted. But most of the time you cannot see it in /proc/mounts.
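A quick way to check what your own kernel shows is to scan /proc/mounts for an entry whose filesystem type is rootfs. The Python sketch below parses text in that format. The function names and the sample lines are illustrative only; the options column in particular will differ on a real system.

```python
def mount_entries(text: str):
    """Parse /proc/mounts-style text into (device, mount_point, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def rootfs_visible(text: str) -> bool:
    """True if any listed mount has the filesystem type 'rootfs'."""
    return any(fstype == "rootfs" for _, _, fstype in mount_entries(text))


# Illustrative sample, shaped like the output seen from an initramfs shell.
sample = """\
rootfs / rootfs rw 0 0
/dev/sda1 /sysroot ext4 rw,relatime 0 0
"""
print(rootfs_visible(sample))
```

To test a live system, pass in the result of open("/proc/mounts").read(). A False result only means rootfs is not listed, not that it is absent.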
You can see rootfs if you boot into an initramfs shell. If your initramfs is dracut, as in Fedora Linux, you can do this by adding the option rd.break to the kernel command line. (E.g. inside the GRUB boot loader).
When dracut switches the system to the real root filesystem, you can no longer see rootfs in /proc/mounts. dracut can use either switch_root or systemd to do this. Both of these follow the same sequence of operations, which are advised in the linked kernel doc.
In some other posts, people can see rootfs in /proc/mounts after switching out of the initramfs. For example on Debian 7: 'How can I find out about "rootfs"'. I think this must be because the kernel changed how it shows /proc/mounts, at some point between the kernel version in Debian 7 and my current kernel v4.17. From further searches, I think rootfs is shown on Ubuntu 14.04, but is not shown on Ubuntu 16.04 with Ubuntu kernel 4.4.0-28-generic.
Even if I don't use an initramfs, and have the kernel mount the root filesystem instead, I cannot see rootfs in /proc/mounts. This makes sense as the kernel code also seems to follow the same sequence of operations.
The operation which hides rootfs is chroot.
3. Can we prove that rootfs is still mounted?
Notoriously, a simple chroot can be escaped from when you are running as a privileged user. If switch_root did nothing more than chroot, we could reverse it and see the rootfs again.
However, the full switch_root sequence can not be reversed by this technique. The full sequence does the following:
Change the current working directory (as in /proc/self/cwd), to the mount point of the new filesystem.
Move the new filesystem, i.e. change its mount point, so that it sits directly on top of the root directory.
Change the current root directory (as in /proc/self/root) to match the current working directory.
In the chroot escape above, we were able to traverse from the root directory of the ext4 filesystem back to rootfs using .., because the ext4 filesystem was mounted on a subdirectory of the rootfs. The escape method does not work when the ext4 filesystem is mounted on the root directory of the rootfs.
I was able to find the rootfs using a different method. (At least one important kernel developer thinks of this as a bug in Linux).
http://archive.today/2018.07.22-161140/https://lore.kernel.org/lkml/20141007133339.GH7996@ZenIV.linux.org.uk/
Tested on Linux 4.17.3-200.fc28.x86_64:
(I also confirmed that this filesystem is empty as expected, and writeable).
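For reference, the three-step switch_root sequence listed earlier in this answer could be sketched in Python roughly as follows. This is only an illustration of the order of operations, not the code that switch_root or systemd actually runs: the real tools also clean up the initramfs contents and exec the new init. The function name is made up, it needs root privileges, and it should not be run outside a throwaway initramfs or VM.

```python
import ctypes
import os

MS_MOVE = 0x2000  # value of MS_MOVE in the Linux mount(2) headers


def switch_root_sketch(new_root: str) -> None:
    """Run the chdir / move-mount / chroot sequence (Linux only, needs root)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    libc.mount.restype = ctypes.c_int

    # Step 1: make the new filesystem's mount point the working directory.
    os.chdir(new_root)
    # Step 2: move that mount so it sits directly on top of the root directory.
    if libc.mount(b".", b"/", None, MS_MOVE, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Step 3: change the root directory to match the working directory.
    os.chroot(".")
```

The function is deliberately not called here; inside an initramfs it would be invoked with the mount point of the real root filesystem (for example a directory such as /sysroot, which is an assumption about the layout).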