Filesystems – Relationship of Inodes, LBA, Logical Volumes, Blocks, and Sectors


I'm a bit embarrassed to ask this question, but I'd like to see a diagram that shows how the following things are related. It would be nice if the diagram also included any transforms required to map between the various layers.

As I understand it, they're related in the following way, but I'm not sure that my understanding is 100% accurate.

                           .-----------------.
                           |      inode      |
                           '-----------------'
                                    |
                           .-----------------.
                           |      EXT4       |
                           '-----------------'
                                    |
                         .---------------------.
                         | logical volume (LV) | --- part of LVM
                         '---------------------'
                                    |
                          .-------------------.
                          | volume group (VG) |  --- part of LVM
                          '-------------------'
                                    |
                            .---------------.
                            | /dev/<device> |
                            '---------------'
                                    |
                   .--------------------------------.
                   | Logical Block Addressing (LBA) |
                   '--------------------------------'
                                    |
                           .-----------------.
                           | blocks/sectors  |
                           '-----------------'
                                    |
                                   HDD     
                                _.-----._  
                              .-         -.
                              |-_       _-|
                              |  ~-----~  |
                              |           |
                              `._       _.'
                                 "-----"   


Best Answer

way tl;dr

Your diagram is essentially correct.

/dev/<device> files

I think the most basic way to start answering your question is with what /dev/<device> files are. Say you have a hard disk. This hard disk has an MBR-based partition table, and it has two partitions, one formatted ext4 with some files on it, and the other set up for LVM. Note that this answer talks about on-the-fly device file creation, which implies that you're using a Linux kernel. Things are a little different on other Unices.

When you plug this hard disk in (or when the system detects it at boot time), a device file will be created in the /dev directory, generally called either /dev/sd* or /dev/hd* (depending on which driver handles the drive's controller; /dev/hd* names come from the old IDE driver), where * is a letter. Bytes on the device file are mapped linearly to bytes on the disk: if you use a tool to write to the beginning of the device file, that data will also be written to the very beginning of the physical disk.
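A minimal Python sketch of that linear mapping, assuming 512-byte logical sectors: the byte offset of a sector is its number times the sector size, so seeking there in the device file lands on the matching spot on the disk. The demo uses a temporary file as a stand-in for /dev/sda; pointing read_sector at a real device file works the same way but needs root.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def read_sector(path, lba, sector_size=SECTOR_SIZE):
    """Return sector number `lba` of a device file or disk image."""
    with open(path, "rb") as dev:
        dev.seek(lba * sector_size)  # linear mapping: byte offset = sector number * sector size
        return dev.read(sector_size)


# Demo on a 4-sector stand-in "disk" where sector n is filled with byte value n,
# so nothing here needs root or a real /dev/sd* device.
with tempfile.NamedTemporaryFile(delete=False) as fake_disk:
    for n in range(4):
        fake_disk.write(bytes([n]) * SECTOR_SIZE)
third_sector = read_sector(fake_disk.name, 2)
os.unlink(fake_disk.name)
print(third_sector[:4])  # b'\x02\x02\x02\x02'
```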

Now, the system also understands partition tables like MBRs and GPTs. Once the initial device file has been created, it will be read to determine if it has a partition table. If it does, device files representing these partitions will be created. So assuming that the original device file was called /dev/sda, a device file called /dev/sda1 will be created (representing the first, ext4-formatted partition), as well as a /dev/sda2 device (representing the second, LVM partition). These are mapped linearly to their respective partitions in the same way as the entire drive: if you use a tool to (for example) write to the beginning of /dev/sda2, the data will be physically written to the beginning of the second partition, which is somewhere in the middle of the whole disk, because that's where the second partition starts.
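So the transform from a partition device to the whole disk is just an added offset, read from the partition table. A minimal sketch for a classic MBR, assuming 512-byte sectors: the four primary entries start at byte 446 of sector 0, each 16 bytes long, with the type byte at offset 4 and the starting LBA and sector count stored as little-endian 32-bit integers at offsets 8 and 12. The partition layout built in the demo (a 0x83 Linux partition at sector 2048 and a 0x8e Linux LVM partition right after it) is made up, and GPT tables are not handled.

```python
import struct

SECTOR_SIZE = 512        # assumption: 512-byte logical sectors
MBR_TABLE_OFFSET = 446   # the four 16-byte partition entries start here
MBR_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR


def parse_mbr(sector0):
    """Return (number, type, start_lba, num_sectors) for each used primary partition."""
    if sector0[510:512] != MBR_SIGNATURE:
        raise ValueError("no MBR boot signature")
    parts = []
    for i in range(4):
        start = MBR_TABLE_OFFSET + 16 * i
        entry = sector0[start:start + 16]
        ptype = entry[4]
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:  # type 0x00 marks an empty slot
            parts.append((i + 1, ptype, start_lba, num_sectors))
    return parts


def partition_to_disk_offset(start_lba, offset_in_partition, sector_size=SECTOR_SIZE):
    """Translate a byte offset inside /dev/sdaN into a byte offset inside /dev/sda."""
    return start_lba * sector_size + offset_in_partition


# Hypothetical MBR: status, 3 CHS bytes (ignored), type, 3 CHS bytes, start LBA, count.
mbr = bytearray(512)
struct.pack_into("<B3xB3xII", mbr, 446, 0x00, 0x83, 2048, 204800)
struct.pack_into("<B3xB3xII", mbr, 462, 0x00, 0x8E, 206848, 409600)
mbr[510:512] = MBR_SIGNATURE
parts = parse_mbr(bytes(mbr))
print(parts)                                 # [(1, 131, 2048, 204800), (2, 142, 206848, 409600)]
print(partition_to_disk_offset(206848, 0))   # 105906176
```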

Blocks and sectors

This is a convenient time to talk about blocks and sectors. A sector is the smallest unit a drive reads or writes; it is traditionally 512 bytes, and 4 KiB on many newer drives (which often still present 512-byte logical sectors for compatibility). A block is the unit a filesystem allocates and keeps track of; on ext4 it is almost always 4 KiB, though smaller sizes such as 1 KiB are possible. When someone talks about reading and writing blocks, that just means that instead of reading each byte of data individually, they read and write data in block-sized chunks, 4 KiB at a time on a typical ext4 filesystem.
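How those units nest can be sketched with a little arithmetic, assuming 512-byte sectors, 4 KiB filesystem blocks, and a partition whose starting LBA is already known (for example from the partition table): a filesystem block number becomes a sector number on the whole disk by scaling it to sectors and adding the partition's start.

```python
SECTOR_SIZE = 512  # assumption: logical sector size the drive reports
BLOCK_SIZE = 4096  # assumption: the usual ext4 block size


def block_to_disk_lba(block, partition_start_lba,
                      block_size=BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """First sector (LBA on the whole disk) holding filesystem block `block`."""
    if block_size % sector_size:
        raise ValueError("block size must be a multiple of the sector size")
    sectors_per_block = block_size // sector_size  # 8 with 4 KiB blocks on 512 B sectors
    return partition_start_lba + block * sectors_per_block


print(block_to_disk_lba(10, 2048))                   # 2128
print(block_to_disk_lba(10, 256, sector_size=4096))  # 266 (drive with 4 KiB logical sectors)
```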

Filesystems and inodes

Next up, filesystems and inodes. A filesystem is a fairly simple concept: near the beginning of the region in which the filesystem resides (this region is usually a partition), there's a header describing the filesystem, called the superblock (on ext2/3/4 it starts 1024 bytes into the partition). The superblock is first used to determine which filesystem driver should be used to read the filesystem, and then it's used by the chosen driver to read files. This is a simplification, of course, but a filesystem basically stores two things (which may or may not be stored as two distinct data structures on disk, depending on the filesystem type): the directory tree and a table of inodes. The directory tree is what you see when you run ls or tree. It states which files and directories are the children of which other directories, and this parent-child relationship forms the UNIX directory tree as we know it.
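To make the superblock concrete, here is a sketch that pulls a few fields out of an ext2/3/4 superblock, using the offsets from the ext4 disk-layout documentation: magic number 0xEF53 at offset 0x38, block size stored as a shift of 1024 at 0x18, inodes per group at 0x28, inode size at 0x58. Checking a magic number at a known offset is the kind of test used to recognize the filesystem type. The demo feeds it a fabricated buffer rather than a real partition, and the block count returned is only the low 32 bits.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # ext2/3/4 superblock starts 1024 bytes into the partition
EXT_MAGIC = 0xEF53


def parse_ext_superblock(raw):
    """Read a few ext2/3/4 superblock fields (all little-endian on disk)."""
    sb = raw[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 filesystem")
    inodes_count, blocks_count_lo = struct.unpack_from("<II", sb, 0x00)
    (log_block_size,) = struct.unpack_from("<I", sb, 0x18)
    (inodes_per_group,) = struct.unpack_from("<I", sb, 0x28)
    (inode_size,) = struct.unpack_from("<H", sb, 0x58)
    return {
        "inodes_count": inodes_count,
        "blocks_count": blocks_count_lo,  # low 32 bits only; 64-bit filesystems store a high half too
        "block_size": 1024 << log_block_size,
        "inodes_per_group": inodes_per_group,
        "inode_size": inode_size,
    }


# Fabricated start of a partition holding only the fields read above.
fake = bytearray(2048)
struct.pack_into("<II", fake, SUPERBLOCK_OFFSET + 0x00, 65536, 262144)
struct.pack_into("<I", fake, SUPERBLOCK_OFFSET + 0x18, 2)      # 1024 << 2 = 4096
struct.pack_into("<I", fake, SUPERBLOCK_OFFSET + 0x28, 8192)
struct.pack_into("<H", fake, SUPERBLOCK_OFFSET + 0x38, EXT_MAGIC)
struct.pack_into("<H", fake, SUPERBLOCK_OFFSET + 0x58, 256)
info = parse_ext_superblock(bytes(fake))
print(info["block_size"], info["inode_size"])  # 4096 256
```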

But the directory tree only includes names, each associated with an inode number. The inode that number identifies holds the file's metadata (size, owner, permissions, timestamps) and records where the pieces of the file are physically stored on disk. An inode by itself is simply "a file" with no name; it gets a name only through an entry in the directory tree. See also What is a Superblock, Inode, Dentry and a File?
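One way to see that names live in directories while the file itself is the inode: create a hard link, which is just a second directory entry pointing at the same inode number. A small Python sketch using os.link and os.stat, assuming a Linux filesystem that supports hard links (ext4, tmpfs, and so on); the file names are arbitrary.

```python
import os
import tempfile

# Assumes the temp directory is on a filesystem that supports hard links.
with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "a.txt")
    alias = os.path.join(d, "b.txt")
    with open(original, "w") as f:
        f.write("hello")
    os.link(original, alias)  # second directory entry, same inode
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    link_count = os.stat(original).st_nlink  # two names point at this inode
    os.remove(original)  # drop one name...
    with open(alias) as f:
        survivor = f.read()  # ...the inode and its data are still reachable
print(same_inode, link_count, survivor)  # True 2 hello
```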

So far, we have the following explanation: /dev/sd* files map to hard drives, and /dev/sd*# files map to partition number # on /dev/sd*. A filesystem is a data structure on disk that keeps track of a directory tree; it is generally kept in a partition (/dev/sd*#). A filesystem contains inodes: numbered records, one per file, holding everything about that file except its name and its position in the directory tree.
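On ext4 specifically, the transform from an inode number to a place on disk is plain arithmetic: inodes are numbered from 1 and split evenly across block groups, each group having its own inode table. A sketch of that lookup following the ext4 disk-layout documentation; the inode-table block numbers in the demo are made-up values that a real tool would read from each group descriptor (bg_inode_table), and the result is a byte offset inside the partition, not the whole disk.

```python
def locate_inode(ino, inodes_per_group, inode_size, inode_table_blocks, block_size):
    """Return (block_group, index_in_group, byte_offset_in_partition) for an ext4 inode.

    inode_table_blocks[g] is the first block of group g's inode table; here it is
    supplied by the caller (hypothetical values), a real tool reads bg_inode_table.
    """
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    group, index = divmod(ino - 1, inodes_per_group)
    offset = inode_table_blocks[group] * block_size + index * inode_size
    return group, index, offset


# Hypothetical geometry: 8192 inodes per group, 256-byte inodes, 4 KiB blocks,
# and made-up inode-table locations for groups 0 and 1.
tables = [100, 200]
print(locate_inode(2, 8192, 256, tables, 4096))     # (0, 1, 409856)  inode 2 is the root directory
print(locate_inode(8193, 8192, 256, tables, 4096))  # (1, 0, 819200)
```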

It's worth noting that filesystems generally keep track of data in blocks. Usually, the directory tree and inode table are stored in blocks, not bytes, and inodes point to blocks on disk, not bytes. (This means a file wastes, on average, half a block of space: the filesystem allocates an entire block for the last part of the file even when that part doesn't fill it.)
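The wasted space is easy to compute: round the file size up to whole blocks and subtract. A sketch assuming 4 KiB blocks; it ignores metadata blocks and features such as ext4's inline data for very small files.

```python
def allocation(file_size, block_size=4096):
    """Blocks a file occupies and the bytes left unused in its last block.

    Assumes plain block allocation (no inline data, no sparse holes).
    """
    blocks = (file_size + block_size - 1) // block_size  # round up to whole blocks
    slack = blocks * block_size - file_size
    return blocks, slack


print(allocation(10_000))  # (3, 2288)
print(allocation(4096))    # (1, 0)
```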

The device mapper

The final piece of the puzzle is a very important part of the Linux kernel called the device mapper (the dm_mod module; load it with modprobe dm_mod if it isn't built into your kernel). The device mapper basically lets you create another device file in the /dev/mapper directory. That device file is then mapped to another source of data, possibly getting transformed in the process. The simplest example is exposing a portion of a disk image as a device of its own.

Say you have a full-disk image, complete with the partition table. You need to read the data off one of the partitions in the image, but you can't get to just that partition, since it's a full-disk image instead of a single-partition image. The solution is to find where in the image your partition is, and then create a new device file mapping to that portion of the disk image. (The device mapper works on block devices rather than regular files, so in practice the image is first attached to a loop device with losetup.) Here's a diagram:

.-------------------.
|  /dev/mapper/foo  | <- This is the device file created with the device mapper
.___________________.
\                   /
 \                 /
  \               /   <- This is a small section of the image being mapped to
   \             /         the new device file
    \           /
     \         /
 .------------------.
 |  diskimage.img   | <- This is the full-disk image. It's a regular file.
 .__________________.     Notice how the mapping goes to _part_ of the file.
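A sketch that builds the device-mapper table for that mapping, assuming the image has already been attached to a loop device (for example with losetup -f --show diskimage.img, which prints a name such as /dev/loop0) and that the partition's start and size are known, e.g. from the MBR. The string follows the linear target's format, <start> <length> linear <device> <offset>, with every number in 512-byte sectors, and would be passed to dmsetup create foo --table as root. The loop device name and partition numbers in the demo are illustrative.

```python
DM_SECTOR = 512  # device-mapper tables always count in 512-byte sectors


def linear_table(start_lba, num_sectors, source_dev, sector_size=512):
    """One-line table for dm's linear target: <start> <length> linear <device> <offset>."""
    scale = sector_size // DM_SECTOR  # convert partition-table units into dm units
    return f"0 {num_sectors * scale} linear {source_dev} {start_lba * scale}"


def map_sector(sector, offset, length):
    """What the linear target does to each request: shift it by `offset`."""
    if not 0 <= sector < length:
        raise ValueError("sector outside the mapped region")
    return sector + offset


print(linear_table(206848, 409600, "/dev/loop0"))  # 0 409600 linear /dev/loop0 206848
print(map_sector(0, 206848, 409600))               # 206848
```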

Another way to think of it is like a transformation pipeline (this is the more accurate metaphor for what is happening internally in the kernel). Imagine a conveyor belt. A request - a read, a write, etc. - starts at one end of the conveyor belt, on a device file created with the device mapper. The request then travels through the device mapper transformation to the source file. In the above example, this source file is a regular file, diskimage.img. Here's the diagram:

Read operation goes onto
device mapper conveyor belt

read()                                      The device mapper transforms the read         The modified read request finally
  \                                         request by moving the requested region        reaches the source file, and the data
   \         Beginning of conveyor belt     to read forward by some number of bytes.      is retrieved from the filesystem.
    \     
     \       .-------------------.          .--------------------------.                  .------------------------.
      \      |  /dev/mapper/foo  |          |   Transformation logic   |                  | /path/to/diskimage.img |
       \     .___________________.          .___+_____+_____+_____+____.                  .________________________.
        \-->                                             
             ---------------------------------------------------------------------------------------------------------------
             o          o          o          o          o          o          o          o          o          o          o

Notice how in the diagram, the transformation logic that's been hooked up with the device mapper has little tools (+s) to manipulate the read request as it moves by on the conveyor belt.
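The conveyor belt can be modeled as a list of stages that each rewrite a request before handing it on. This is a toy Python model, not kernel code: Request, LinearTarget and send_down are hypothetical names, and only address shifting is shown. Stacking two stages shows how one mapped device can sit on top of another.

```python
from dataclasses import dataclass


@dataclass
class Request:
    op: str      # "read" or "write"
    sector: int  # first 512-byte sector of the request
    count: int   # number of sectors


class LinearTarget:
    """Toy stage that shifts requests by a fixed offset, like dm's linear target."""

    def __init__(self, offset, length):
        self.offset = offset  # where this device starts on the layer below
        self.length = length  # size of this device in sectors

    def transform(self, req):
        if req.sector + req.count > self.length:
            raise ValueError("request runs past the end of the device")
        return Request(req.op, req.sector + self.offset, req.count)


def send_down(req, stages):
    """Carry a request along the belt; each stage may rewrite it."""
    for stage in stages:
        req = stage.transform(req)
    return req


# Hypothetical stack: a device mapped 2048 sectors into a partition that itself
# starts at sector 206848 of the disk.
stack = [LinearTarget(2048, 8192), LinearTarget(206848, 409600)]
print(send_down(Request("read", 10, 8), stack))
# Request(op='read', sector=208906, count=8)
```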

Now, I don't particularly feel like copying that diagram and modifying it for LVM, but basically, the transformation part can be anything, not just shifting the byte range forward. This is how LVM works: an LVM Physical Volume (PV) is a disk or partition that has been initialized for LVM. It carries LVM's metadata, which keeps track of where data is, and its space is divided into fixed-size Physical Extents (4 MiB by default). Think of that metadata as the filesystem of LVM. In the conveyor belt metaphor, a Physical Volume is one of the source files, and the transformation is LVM doing its thing: mapping a request on a Logical Volume (the leftmost item on the conveyor belt) to the Physical Extents that hold the data on disk. Speaking of which...
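An LV is an ordered list of extents, each pointing at a Physical Extent on some PV. A sketch of that transform, assuming LVM's default 4 MiB extent size and a PV data area that starts 1 MiB in (on a real system, pvs -o +pe_start shows the actual value); the segment list is hypothetical and would really come from the volume group metadata, which lvs --segments summarizes. The sector returned is relative to the PV device, so adding the partition's start from the partition table gives the LBA on the whole disk.

```python
EXTENT_SECTORS = 8192  # LVM's default 4 MiB extent, in 512-byte sectors
PE_START = 2048        # assumption: PV data area begins 1 MiB in


def lv_to_pv(lv_sector, segments, extent_sectors=EXTENT_SECTORS, pe_start=PE_START):
    """Map a sector on an LV to (pv_device, sector_on_that_pv).

    segments: ordered list of (pv_device, first_pe, pe_count) making up the LV
    (hypothetical here; real values live in the VG metadata).
    """
    le, within = divmod(lv_sector, extent_sectors)  # logical extent + offset inside it
    for pv, first_pe, pe_count in segments:
        if le < pe_count:
            pe = first_pe + le  # physical extent backing this logical extent
            return pv, pe_start + pe * extent_sectors + within
        le -= pe_count  # skip past this segment's extents
    raise ValueError("sector beyond the end of the logical volume")


# Hypothetical LV: 10 extents on /dev/sda2, then 5 extents starting at PE 50 on /dev/sdb1.
segments = [("/dev/sda2", 0, 10), ("/dev/sdb1", 50, 5)]
print(lv_to_pv(100, segments))            # ('/dev/sda2', 2148)
print(lv_to_pv(10 * 8192 + 5, segments))  # ('/dev/sdb1', 411653)
```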

A Volume Group pools one or more Physical Volumes into a single store of extents, so it behaves like one big virtual disk in LVM. Features like RAID levels, mirroring, and striping are chosen per Logical Volume when it is created (lvcreate --type raid1, for example), not per Volume Group. A Logical Volume, then, is like a partition carved out of that virtual disk, and Logical Volumes are what actually have device files representing them. You put filesystems and stuff on Logical Volumes.
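Each LV normally shows up under two paths: /dev/<vg>/<lv> and /dev/mapper/<vg>-<lv>, both typically symlinks to the same /dev/dm-N node. In the /dev/mapper name, any hyphen inside the VG or LV name is doubled so the single separating hyphen stays unambiguous. A small sketch of that naming rule; the volume names are examples.

```python
def dm_escape(name):
    """Device-mapper names double any hyphen inside a VG or LV name."""
    return name.replace("-", "--")


def lv_device_paths(vg, lv):
    """The two paths LVM normally provides for a logical volume."""
    return f"/dev/{vg}/{lv}", f"/dev/mapper/{dm_escape(vg)}-{dm_escape(lv)}"


print(lv_device_paths("my-vg", "root"))  # ('/dev/my-vg/root', '/dev/mapper/my--vg-root')
```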

The cool thing about the device mapper is that logic built with it can be inserted arbitrarily into the data stack: all you have to do is change the name of the device you're reading. This is how encrypted partitions work (dm-crypt, which LUKS builds on; not encryption schemes that work at the file level, some of which, like EncFS, use FUSE), and this is how LVM works. Other device-mapper users include multipath I/O (dm-multipath), SSD caching (dm-cache), and integrity checking (dm-verity). The device mapper is pretty badass.

Logical Block Addressing

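This is the linear mapping described in the /dev/<device> section. The drive presents its storage as one flat array of sectors numbered 0, 1, 2, and so on, and the kernel addresses each sector by that number (its LBA) instead of by cylinder, head, and sector (CHS) as older drives required. Translating an LBA into a physical location on the platters is the drive firmware's job and is invisible to the operating system.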
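LBA numbering replaced the older cylinder/head/sector (CHS) scheme, and the two convert with LBA = (C × heads + H) × sectors_per_track + (S − 1), since CHS sectors count from 1. A sketch of both directions, assuming the 255-head, 63-sectors-per-track geometry that BIOSes commonly reported; modern drives are addressed by LBA only, so this is mostly useful for reading old partition tables.

```python
HEADS = 255             # assumption: common BIOS-translated geometry
SECTORS_PER_TRACK = 63


def chs_to_lba(c, h, s, heads=HEADS, spt=SECTORS_PER_TRACK):
    """CHS sectors are numbered from 1; LBAs start at 0."""
    return (c * heads + h) * spt + (s - 1)


def lba_to_chs(lba, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Inverse of chs_to_lba for the same geometry."""
    c, rem = divmod(lba, heads * spt)
    h, s0 = divmod(rem, spt)
    return c, h, s0 + 1


print(chs_to_lba(0, 1, 1))  # 63
print(lba_to_chs(2048))     # (0, 32, 33)
```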
