Linux – How does Linux differentiate between real and non-existent (e.g. device) files

This is a rather low-level question, and I understand that this might not be the best place to ask. But it seemed more appropriate than any other SE site, so here goes.

I know that on a Linux filesystem some files actually exist: /usr/bin/bash, for example. However, as far as I understand it, others don't really exist as such and are more like virtual files, e.g. /dev/sda, /proc/cpuinfo, etc. My questions are below (there are two, but they are too closely related to ask separately):

  • When a read (or similar) call is issued, how does the Linux kernel work out whether the file is real (and should therefore be read from disk) or not?
  • If the file isn't real, how does the kernel work out what data to return? For example, a read from /dev/random returns random data, and a read from /dev/null returns EOF. Likewise, what happens when data is written to such a virtual file? Is there some kind of map with pointers to separate read/write functions for each file, or even for the virtual directory itself? Then an entry for /dev/null could simply return EOF.

Best Answer

So there are basically two different types of thing here:

  1. Normal filesystems, which hold files in directories with data and metadata, in the familiar manner (including soft links, hard links, and so on). These are often, but not always, backed by a block device for persistent storage (a tmpfs lives in RAM only, but is otherwise identical to a normal filesystem). The semantics of these are familiar; read, write, rename, and so forth, all work the way you expect them to.
  2. Virtual filesystems, of various kinds. /proc and /sys are examples here, as are FUSE custom filesystems like sshfs or ifuse. There's much more diversity among these, because the term really just means a filesystem whose semantics are in some sense 'custom'. Thus, when you read from a file under /proc, you aren't accessing a specific piece of data that something else stored there earlier, as you would on a normal filesystem. You're essentially making a kernel call that requests information generated on the fly. That code can do anything it likes, since it's just a function somewhere implementing read semantics. This is why files under /proc behave oddly, for instance pretending to be symlinks when they aren't really.
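This is easy to observe from user space. The Python sketch below (the helper name `size_vs_content` is invented for illustration) compares what `stat()` reports with what `read()` actually returns. On typical Linux systems most `/proc` files report a size of 0, yet reading them yields data, because the content is only produced when the read is serviced; for a regular file the two numbers agree.

```python
import os


def size_vs_content(path: str, limit: int = 1 << 20) -> tuple[int, int]:
    """Return (size reported by stat, number of bytes read back), reading at most `limit` bytes."""
    st_size = os.stat(path).st_size
    with open(path, "rb") as f:
        # For /proc files this data is generated by kernel code during the read itself
        data = f.read(limit)
    return st_size, len(data)


if __name__ == "__main__":
    for p in ("/etc/hostname", "/proc/cpuinfo", "/proc/self/status"):
        if os.path.exists(p):
            size, n = size_vs_content(p)
            print(f"{p}: stat says {size} bytes, read returned {n} bytes")
```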

The key point is that /dev is usually one of the first kind. In modern distributions /dev is normally something like a tmpfs, but in older systems it was usually a plain directory on disk, without any special attributes. What makes it special is that the files under /dev are device nodes, a type of special file similar to FIFOs or Unix sockets. A device node carries a major and a minor number, and reading or writing it calls into a kernel driver, much as reading or writing a FIFO calls the kernel to buffer your output in a pipe. The driver can do whatever it wants, but it usually touches hardware somehow, e.g. to access a hard disk or play sound through the speakers.
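Because the file type is stored in the inode like any other metadata, a plain `lstat()` is enough to tell a device node apart from a regular file, FIFO, or socket. A minimal Python sketch (the `describe` helper is hypothetical) that classifies a path and, for device nodes, prints the major:minor pair recorded in `st_rdev`:

```python
import os
import stat


def describe(path: str) -> str:
    """Classify a path by the file type in its inode, without following symlinks."""
    st = os.lstat(path)
    mode = st.st_mode
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        kind = "character device" if stat.S_ISCHR(mode) else "block device"
        # st_rdev is the device number the node points at, not the disk it lives on
        return f"{kind} {os.major(st.st_rdev)}:{os.minor(st.st_rdev)}"
    if stat.S_ISFIFO(mode):
        return "FIFO"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "unknown"


if __name__ == "__main__":
    for p in ("/usr/bin/bash", "/dev/null", "/dev/sda", "/dev"):
        if os.path.lexists(p):
            print(p, "->", describe(p))
```

Per the kernel's documented device list, /dev/null is character device 1:3, so that is what the sketch should report for it.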

To answer the original questions:

  1. There are two separate questions hiding in whether the 'file exists': whether the device node file literally exists, and whether the kernel code backing it is meaningful. The former is resolved just like anything else on a normal filesystem. Modern systems use udev or something like it to watch for hardware events and automatically create and remove the device nodes under /dev accordingly, while older systems or minimal custom builds can simply have all their device nodes created on disk ahead of time. Meanwhile, when you read one of these files, you're calling kernel code selected by the node's major and minor numbers; if those numbers don't correspond to anything (for instance, a block device that doesn't exist), you'll get an error instead of data, typically ENXIO ("No such device or address") when you try to open the node.

  2. How the kernel decides which code to call for a given file varies. Virtual filesystems like /proc implement their own read and write functions; the kernel calls that code based on which mount the file lives under, and the filesystem implementation takes care of the rest. For device files, dispatch is based on the node's type (character or block) and its major and minor device numbers.
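Both points can be illustrated with a small Python sketch; all names in it (`probe`, `dev_read`, `CHAR_DEVICES`) are invented for illustration. `probe()` opens a real path and separates the two failure modes from point 1: no directory entry at all (ENOENT) versus a node with nothing behind it (ENXIO or ENODEV). `dev_read()` is a toy model of point 2, not kernel code: a table keyed by (major, minor) that picks a read handler. In the real kernel, drivers register for a major number and supply a `struct file_operations` whose read/write pointers get called, which is roughly the 'map of pointers' the question guesses at. The numbers used here match the real ones for /dev/null (1:3) and /dev/zero (1:5).

```python
import errno
import os
from typing import Callable, Dict, Tuple


# --- Point 1: does the node exist, and is anything behind it? ---
def probe(path: str) -> str:
    """Tell 'no directory entry' apart from 'node exists, but no device/driver'."""
    try:
        # O_NONBLOCK so that FIFOs and terminals don't make the probe hang
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except FileNotFoundError:
        return "no such node"  # ENOENT: the filesystem has no such entry
    except OSError as e:
        if e.errno in (errno.ENXIO, errno.ENODEV):
            return "node exists, but nothing backs it"
        return "open failed: " + errno.errorcode.get(e.errno, str(e.errno))
    os.close(fd)
    return "ok"


# --- Point 2: a toy model of dispatch by (major, minor) ---
# Hypothetical handler type: takes a requested byte count, returns the data read.
ReadHandler = Callable[[int], bytes]


def read_null(n: int) -> bytes:
    return b""  # like /dev/null: always end-of-file


def read_zero(n: int) -> bytes:
    return b"\x00" * n  # like /dev/zero: as many zero bytes as requested


# Invented stand-in for the kernel's table of registered character devices.
CHAR_DEVICES: Dict[Tuple[int, int], ReadHandler] = {
    (1, 3): read_null,
    (1, 5): read_zero,
}


def dev_read(major: int, minor: int, n: int) -> bytes:
    handler = CHAR_DEVICES.get((major, minor))
    if handler is None:
        # Mirrors the error a real open() of an unbacked node typically reports
        raise OSError(errno.ENXIO, os.strerror(errno.ENXIO))
    return handler(n)


if __name__ == "__main__":
    print(probe("/dev/null"), "|", probe("/no/such/path"))
    rdev = os.stat("/dev/zero").st_rdev
    print(dev_read(os.major(rdev), os.minor(rdev), 4))
```

Creating a node with an unregistered major number to see ENXIO for real requires mknod privileges, which is why the sketch only models that case.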

Related Question