Why size reporting for directories is different than other files

I was wondering why an empty directory occupied 4096 bytes of space and I have seen this question. It is stated that space is allocated in blocks and hence, the size of a new directory is 4096 bytes.

However, I am pretty sure that allocation for "normal" files is done in blocks as well. At least it is like that in Windows filesystems, and I am guessing that it must be at least similar in ext*.

Now, as far as I understand, the size listing for other types of files, such as regular files, symbolic links, etc., is done in terms of real size: when I create an empty file, I see 0 as the size, and when I type a few characters, I see <number of characters> bytes as the size.

So my question is: although allocation for other files is done in blocks too, why does the policy for reporting the size of a directory differ from that of a file?

Clarification

I thought the question was clear enough, but apparently it wasn't. I will try to clarify the question here.

1) What I think a directory is:

I will try to explain what I think a directory is by the following example. After reading, if it is wrong, please notify me.

Let's say that we have a directory named mydir. And let's say that it contains 3 files, which are: f0, f1 and f2. Let's assume that each file is 1 byte long.

Now, what is mydir? It is a pointer to an inode which contains the following: String "f0" and the inode number which f0 points to. String "f1" and the inode number which f1 points to. And string "f2" and the inode number which f2 points to. (At least this is what I think a directory is. Please correct me if I am wrong.)

Now there may be two methods for calculating the size of a directory:

1) Calculating the size of the inode which mydir points to.

2) Summing the sizes of the inodes which contents of mydir points to.

Although 1 is more counterintuitive, let's assume that it is the method being used. (For this question, it does not matter which method is actually used.) Then the size of mydir is calculated as follows:

2 + 2 + 2 + 3 * <space_required_to_store_an_inode_number>

2's are because each filename is 2 bytes long.

2) The question:

Now the question: assuming my understanding of a directory is correct, the reported size for mydir should be much, much less than 4096, no matter whether method 1 or method 2 is used to calculate its size.

Now, you will say that the reason it is reported as 4096 bytes is that allocation is done in blocks; hence the reported size is that big.

But then I will say: Allocation is done in blocks for regular files as well. (See thrig's answer for reference) But nevertheless, their sizes are reported in real sizes. (1 byte if they contain 1 character, 2 bytes if they contain 2 characters etc.)

So my question is: why is the policy for reporting the sizes of directories so different from the one for reporting the sizes of regular files?

More clarification:

We know that the initial number of blocks allocated for a non-empty file and for an empty directory is 8 blocks in both cases. (See thrig's answer.) So even though allocation is made in the same number of blocks for both regular files and directories, why is the reported size for a directory so much bigger?

Best Answer

I think the reason you're confused is that you don't know what a directory is. To clear this up, let's take a step back and examine how Unix filesystems work.

The Unix filesystem has several separate notions for addressing data on disk:

  • data blocks are groups of blocks on a disk that hold the contents of a file.
  • inodes are special blocks on a filesystem, each with a numerical address unique within that filesystem, which contain metadata about a file such as:
    • permissions
    • access / modification times
    • size
    • pointers to the data blocks (could be a list of blocks, extents, etc)
  • filenames are hierarchical locations on a filesystem root that are mapped to inodes.

In other words, a "file" is actually composed of three different things:

  1. a PATH in the filesystem
  2. an inode with metadata
  3. data blocks pointed to by the inode

Most of the time, users imagine a file to be synonymous to "the entity associated with the filename" - it's only when you're dealing with low-level entities or the file/socket API that you think of inodes or data blocks. Directories are one of those low-level entities.

You might think that a directory is a file that contains a bunch of other files. That's only half-correct. A directory is a file that maps filenames to inode numbers. It doesn't "contain" files, only filenames that point to inodes. Think of it like a text file that contains entries like this:

  • . - inode 1234
  • .. - inode 200
  • Documents - inode 2008
  • README.txt - inode 2009

The entries above are called directory entries. They are basically mappings from filenames to inode numbers. A directory is a special file that contains directory entries.
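
To see these mappings on a real system, here is a minimal Python sketch that prints each name in a directory next to its inode number. The helper name directory_entries and the demo file names are made up for illustration; os.scandir leaves out the . and .. entries, so the sketch adds them back with os.stat.

```python
import os
import tempfile


def directory_entries(path):
    """Return sorted (name, inode number) pairs for the entries of a directory."""
    with os.scandir(path) as entries:
        listing = [(entry.name, entry.inode()) for entry in entries]
    # os.scandir omits "." and "..", so look them up explicitly.
    listing.append((".", os.stat(path).st_ino))
    listing.append(("..", os.stat(os.path.join(path, "..")).st_ino))
    return sorted(listing)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        os.mkdir(os.path.join(demo_dir, "Documents"))
        with open(os.path.join(demo_dir, "README.txt"), "w") as handle:
            handle.write("hello\n")
        for name, inode in directory_entries(demo_dir):
            print(f"{name} - inode {inode}")
```

The inode numbers printed will differ from the ones in the list above; only the name-to-number shape of each entry is the point.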

That's a simplification of course, but it explains the basic idea and other directory weirdness.

  • Why don't directories know their own size?
    • Because they only contain pointers to other stuff, you have to iterate over their contents to find the size
  • Why aren't directories ever empty?
    • Because they contain at least the . and .. entries. Thus, a proper directory will be at least as large as the smallest filesize that can contain those entries. In most filesystems, 4096 bytes is the smallest.
  • Why is it that you need write permission on the parent directory when renaming a file?
    • Because you're not just changing the file, you're changing the directory entry pointing to the file.
  • Why does ls show a weird number of "links" to a directory?
    • a directory can be referenced (linked to) by itself (its . entry), by its parent, and by each of its subdirectories (their .. entries).
  • What does a hard link do and how does it differ from a symlink?
    • a hard link adds a directory entry pointing to the same inode number. Because it points to an inode number, it can only point to files in the same filesystem (inodes are local to a filesystem)
    • a symlink adds a new inode which points to a separate filename. Because it refers to a filename it can point to arbitrary files in the tree.
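
The hard link versus symlink distinction can be checked directly. The following is a small Python sketch, assuming a filesystem that supports both kinds of link; the function name compare_links and the file names are hypothetical.

```python
import os
import tempfile


def compare_links(directory):
    """Create a file, a hard link and a symlink to it; report what each points at."""
    target = os.path.join(directory, "original.txt")
    hard = os.path.join(directory, "hardlink.txt")
    soft = os.path.join(directory, "symlink.txt")
    with open(target, "w") as handle:
        handle.write("data\n")
    os.link(target, hard)     # second directory entry for the same inode
    os.symlink(target, soft)  # separate inode whose content is a path
    return {
        "original_inode": os.stat(target).st_ino,
        "hardlink_inode": os.stat(hard).st_ino,
        # lstat inspects the symlink itself instead of following it.
        "symlink_inode": os.lstat(soft).st_ino,
        "symlink_target": os.readlink(soft),
        "link_count": os.stat(target).st_nlink,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        for key, value in compare_links(demo_dir).items():
            print(f"{key}: {value}")
```

The hard link should report the same inode number as the original file and raise its link count, while the symlink has an inode of its own and only stores the target's path.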

But wait! Weird things are happening!

ls -ld somedirectory typically shows the filesize of a small directory to be 4096, whereas ls -l somefile shows the actual size of a file. Why?

Point of confusion 1: when we say "size" we can be referring to two things:

  • filesize, which is a number stored in the inode; and
  • allocated size, which is the number of blocks associated with the inode times the size of each block.

In general, these are not the same number. Try running stat on a regular file and you'll see this difference.
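
If you prefer to read both numbers from a script, here is a rough Python equivalent of looking at stat output. It assumes a Unix-like system where st_blocks is counted in 512-byte units (that is how Linux reports it); size_report is a made-up helper name.

```python
import os
import tempfile


def size_report(path):
    """Return the inode's filesize and the allocated size, both in bytes."""
    info = os.stat(path)
    return {
        "filesize": info.st_size,
        # Assumption: st_blocks is in 512-byte units, as on Linux.
        "allocated": info.st_blocks * 512,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        path = os.path.join(demo_dir, "one_byte.txt")
        with open(path, "w") as handle:
            handle.write("x")
        print(size_report(path))
```

For a one-byte file the filesize is 1, while the allocated size depends on the filesystem and is often a whole block or more.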

When a filesystem creates a non-empty file, it usually eagerly allocates data blocks in groups. This is because files have a tendency to grow and shrink arbitrarily fast. If the filesystem only allocated as many data blocks as needed to represent the file, growing / shrinking would be slower, and fragmentation would be a serious concern. So in practice, filesystems don't have to keep reallocating space for small changes. This means that there may be a lot of space on disk that is "claimed" by files but completely unused.

What does the filesystem do with all this unused space? Nothing. Until it feels like it needs to. If your filesystem optimizer tool - maybe an online optimizer running in the background, maybe part of your fsck, maybe built-in to your filesystem itself - feels like it, it may reassign the data blocks of your files - moving used blocks, freeing unused blocks, etc.

So now we come to the difference between regular files and directories: because directories form the "backbone" of your filesystem, you expect that they may need to be accessed or modified frequently and should thus be optimized. And so you don't want them fragmented at all. When directories are created, they always max out all their data blocks in size, even when they only have so many directory entries. This is okay for directories, because, unlike files, directories are typically limited in size and growth rate.

The 4096 reported as the size of a directory is the "filesize" number stored in the directory's inode; it does not reflect the number of entries in the directory. It isn't a fixed number - it's the maximum number of bytes that will fit into the blocks allocated for the directory. Typically, this is 512 bytes/block times 8 blocks = 4096 bytes, the same allocation a file with any contents gets. So, for directories, the filesize and the allocated size are the same. Because the space is allocated as a single group, the filesystem optimizer won't move its blocks around.

As the directory grows, more data blocks are assigned to it, and it will also max out those blocks by adjusting the filesize accordingly.

And so ls and stat will show the filesize field of the directory's inode, which is set to the size of the data blocks assigned to it.
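
To watch this happen, here is a hedged Python sketch that reads a directory's filesize and allocated size before and after adding many entries. The helper names directory_size and grow_directory, the entry count and the long file names are arbitrary choices for the demo, and st_blocks is again assumed to be in 512-byte units. The exact numbers depend on the filesystem: ext4 typically starts at 4096 and grows in whole blocks, while other filesystems (tmpfs or btrfs, for example) report directory sizes differently.

```python
import os
import tempfile


def directory_size(path):
    """Return (filesize, allocated bytes) as recorded for the directory itself."""
    info = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as on Linux.
    return info.st_size, info.st_blocks * 512


def grow_directory(path, count=500):
    """Add `count` empty files with long names so the directory needs more room."""
    for index in range(count):
        name = f"entry-{index:04d}-" + "x" * 100
        open(os.path.join(path, name), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        print("empty directory:", directory_size(demo_dir))
        grow_directory(demo_dir)
        print("after 500 entries:", directory_size(demo_dir))
```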