Linux – How does linux know where the rootfs is

linux-kernel root-filesystem

I'm trying to understand how the linux kernel knows where the desired rootfs is on boot.

I read this document:

https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt

One part of interest says:

All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is extracted into rootfs when the kernel boots up … If rootfs does not contain an init program after the embedded cpio archive is extracted into it, the kernel will fall through to the older code to locate and mount a root partition

Our kernel is 4.X, but I'm guessing this still applies? This sounds like all kernels have an embedded "cpio" rootfs.

And indeed as we read on it says:

The 2.6 kernel build process always creates a gzipped cpio format initramfs archive and links it into the resulting kernel binary. By default, this archive is empty … The config option CONFIG_INITRAMFS_SOURCE … can be used to specify a source for the initramfs archive

This raises a few more questions:

So if I want my rootfs to be in RAM, I need to set CONFIG_INITRAMFS_SOURCE to point to my rootfs (in cpio format presumably).

But won't that mean my kernel and rootfs are now inseparable? What if I want to make a small tweak to the RootFS without rebuilding? What if I want my rootfs stored separate from the kernel? How do I tell the kernel the location of my rootfs?

Furthermore, what if I want my rootfs to be on physical storage (like eMMC, flash drive, etc.) and not in RAM?

It said earlier that:

If rootfs does not contain an init program after the embedded cpio archive is extracted into it, the kernel will fall through to the older code to locate and mount a root partition

But… how? How does it know where to locate the rootfs? If it's on eMMC I need to tell the kernel that somehow, right?

The bootloader I am using is U-boot. I checked U-boot environment variables to see if it was somehow passing the rootfs location to the kernel as a boot arg, but it doesn't seem to be the case…

Edit:

As pointed out in the comments, the location of the rootfs is passed to the kernel via a boot arg. In my case, U-Boot is passing root=/dev/mmcblk0p4 rw as a boot arg to the kernel. So that answers one of my questions: you can pass the location of any decompressed rootfs as a boot arg.
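To make the boot-arg mechanism concrete, here is a minimal Python sketch that splits a kernel command line into parameters and picks out root=. The function names are hypothetical, the console= value in the example is an assumption, and the parsing is deliberately simplified (the kernel's own parser also handles quoting).

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into a dict of parameters.

    Tokens of the form key=value map key to value; bare flags such as
    'rw' map to True. Quoted values are NOT handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        # Split only at the first '=', so root=UUID=... keeps its value intact.
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def read_running_cmdline(path: str = "/proc/cmdline") -> dict:
    """Parse the command line the running kernel was booted with."""
    with open(path) as f:
        return parse_cmdline(f.read())


# Example string modelled on the boot args from the question;
# the console= part is an illustrative assumption.
example = parse_cmdline("console=ttyS0,115200 root=/dev/mmcblk0p4 rw")
print(example["root"])
```

On a running Linux system, read_running_cmdline() should show the arguments the bootloader actually passed, which is often the quickest way to check where root= came from.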

I'm still not clear on how, given some rootfs.tar.gz that is separate from the kernel, to tell the kernel to untar it into RAM and use it as the rootfs. Maybe that's not possible and I just need to use CONFIG_INITRAMFS_SOURCE? At any rate, I'll read up on the 4.X docs.

Best Answer

So if I want my rootfs to be in RAM, I need to set CONFIG_INITRAMFS_SOURCE to point to my rootfs (in cpio format presumably).

That's one way to do it, yes, but it is not the only way.

If you have a bootloader that can be configured to load the kernel and the initramfs as separate files, you don't need to use CONFIG_INITRAMFS_SOURCE while building the kernel. It is enough to have CONFIG_BLK_DEV_INITRD set in the kernel configuration. (Before initramfs there was an older version of the technique named initrd, and the old name still appears in some places.) The bootloader will load the initramfs file, and then fill in some information about its memory location and size into a data structure in a specific location of the already-loaded kernel image. The kernel has built-in routines that will use that information to find the initramfs in the system RAM and uncompress it.

Having the initramfs as a separate file will allow you to modify the initramfs file more easily, and if your bootloader can accept input from the user, perhaps specify another initramfs file to be loaded instead of the regular one at boot time. (That's very handy if you try and create a customized initramfs and get some things wrong. Been there, done that.)

For a traditional BIOS-based x86 system, you'll find information about these details in (kernel source)/Documentation/x86/boot.txt. UEFI-based systems do it a bit differently (also described in the same file), and other architectures like ARM have their own sets of details about passing information from the bootloader to the kernel.

Furthermore, what if I want my rootfs to be on physical storage (like eMMC, flash drive, etc.) and not in RAM?

In regular non-embedded systems, the initramfs will usually only contain enough functionality to activate the essential sub-systems. In a regular PC, those would usually be the drivers for the keyboard, display and the driver for the storage controller for your root filesystem, plus any kernel modules and tools required to activate subsystems like LVM, disk encryption, and/or software RAID, if you use those features.

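Once the essential sub-systems are active and the root filesystem is accessible, the initramfs will typically do a switch_root(8) operation to switch from initramfs to the real root filesystem. (The older initrd mechanism used pivot_root(8) for this step, but rootfs itself cannot be pivoted away from or unmounted, so an initramfs overmounts it with the new root instead.) But an embedded system, or a specialized utility like DBAN, could package everything it needs into the initramfs and just never switch to another root.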

Usually, the scripts and/or tools within the initramfs will get the necessary information to locate the real root filesystem from the options on the kernel command line. But you don't have to do that: with a customized initramfs, you could do something like switching to a different root filesystem if a specific key or mouse button is held down at a specific time in the boot sequence.

With a complex storage configuration (e.g. encrypted LVM on top of a software RAID, on a system that uses redundant multipathed SAN storage), all the information needed to activate the root filesystem might not fit onto the kernel command line, so you could include the bigger pieces into initramfs.

Modern distributions usually use an initramfs generator to build a tailored initramfs for each installed kernel. Different distributions used to have their own initramfs generators: RedHat used mkinitrd while Debian had update-initramfs. But after the introduction of systemd it looks like many distributions are standardizing on dracut as an initramfs generator.

A modern initramfs file can be a concatenation of multiple .cpio archives, and each part may or may not be compressed. A typical initramfs file on a modern x86_64 system might have an "early microcode update" file as a first component (usually just a single file in an uncompressed cpio archive, as the microcode file is typically encrypted and so not very compressible). After that comes the regular initramfs content, as a compressed .cpio file.
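One way to see which kind of segment an initramfs file starts with is to look at its first few bytes. The Python sketch below checks them against well-known magic numbers. The function name and the table are illustrative and not exhaustive, and the sketch inspects only the start of the data, so it does not locate later segments in a concatenated file.

```python
# Magic numbers of formats commonly seen at the start of an initramfs segment.
MAGIC = [
    (b"070701", "cpio (newc, uncompressed)"),
    (b"070702", "cpio (newc with CRC, uncompressed)"),
    (b"\x1f\x8b", "gzip"),
    (b"\xfd\x37\x7a\x58\x5a\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"BZh", "bzip2"),
    (b"\x02\x21\x4c\x18", "lz4 (legacy frame)"),
]


def identify_segment(head: bytes) -> str:
    """Guess the format of an initramfs segment from its leading bytes."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Identify the first segment of an initramfs file on disk."""
    with open(path, "rb") as f:
        return identify_segment(f.read(8))
```

If the first segment is reported as an uncompressed cpio archive, that is consistent with a leading microcode archive followed by the compressed main content.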

To gain a deeper understanding of your system, I would encourage you to extract an initramfs file to a temporary directory and then examine its contents. On Debian, there is an unmkinitramfs(8) tool that can be used to extract an initramfs file in a straightforward manner. On RedHat 7, you might need to use /usr/lib/dracut/skipcpio <initramfs file> to skip the microcode update file, and then pipe the resulting output to gzcat (named zcat on most Linux systems) and onward to cpio -i -d to extract the initramfs contents to the current working directory. Ubuntu might use lzcat in place of gzcat.
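If you only want to see which files an archive contains, without extracting it, the newc format is simple enough to walk by hand. The Python sketch below lists the entry names in an already-decompressed newc cpio stream. The function name is hypothetical, and it assumes that any leading microcode segment has already been skipped and that the data starts at a record boundary.

```python
HEADER_LEN = 110  # 6-byte magic plus 13 fields of 8 hex digits each


def list_newc(raw: bytes) -> list:
    """Return the file names stored in an uncompressed newc cpio archive."""
    names = []
    pos = 0
    while pos + HEADER_LEN <= len(raw):
        header = raw[pos:pos + HEADER_LEN]
        if header[:6] not in (b"070701", b"070702"):
            break  # not a newc record; stop rather than guess
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = pos + HEADER_LEN
        # namesize counts the trailing NUL, which is dropped here.
        name = raw[name_start:name_start + namesize - 1].decode(errors="replace")
        if name == "TRAILER!!!":
            break  # end-of-archive marker
        names.append(name)
        # Header+name and file data are each padded to a 4-byte boundary.
        data_start = (name_start + namesize + 3) & ~3
        pos = (data_start + filesize + 3) & ~3
    return names
```

For a gzip-compressed archive, the input could be obtained with gzip.decompress() from the standard library before calling list_newc().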
