Disk Usage – How ‘du’ Counts Blocks Used

I'm curious to understand how du counts blocks used in a file.

Scenario

dd bs=1 seek=2GiB if=/dev/null of=big
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.3324e-05 s, 0.0 kB/s

ls -lh big
-rw-r--r-- 1 roaima roaima 2.0G May 19 15:55 big

du -h big
0       big

I've always accepted that it will give me different answers to ls, and that's fine because they're measuring different things.

Now I have a cloud-based filesystem where I get charged not only for storage but also each time I download data, so I need to minimise the amount of data accessed by general housekeeping activities such as "how much disk space is used in this tree?"

I'm not aware of a library or system call that reports the number of used blocks, although there could easily be one. I don't believe du reads its way through every file it considers, because reading the data cannot distinguish a file filled with zeros from one that is truly sparse.

So, how does du count blocks used?

Best Answer

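du uses stat(2) to find the number of blocks used by a file. The count comes from the st_blocks field of the file's metadata, so the file's contents are never read. If you run stat big, the number of blocks it reports should match the number given by du.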
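
To see the same numbers from a script, here is a minimal Python sketch using os.lstat. It assumes a platform where st_blocks is counted in 512-byte units (as on Linux) and a filesystem that supports sparse files; the helper name usage and the 16 MiB test file are illustrative, not from the question.

```python
import os
import tempfile

def usage(path):
    """Return (apparent_bytes, allocated_bytes) for path, using metadata only."""
    st = os.lstat(path)  # like du's default, do not follow symlinks
    # Assumption: st_blocks is in 512-byte units (true on Linux),
    # independent of the filesystem's own block size.
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as d:
    sparse = os.path.join(d, "big")
    with open(sparse, "wb") as f:
        f.truncate(16 * 1024**2)  # extend to 16 MiB without writing any data
    apparent, allocated = usage(sparse)
    print(apparent, allocated)
```

On a filesystem with sparse-file support the second number should be far smaller than the first (typically 0), mirroring the ls versus du difference in the question.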

You can force du to report the apparent size in bytes with the -b option; its output then matches that of ls -l.

In both cases it uses stat(2) (or rather fstatat(2), at least in the version I have):

$ strace du big|&grep big
execve("/usr/bin/du", ["du", "big"], [/* 57 vars */]) = 0
newfstatat(AT_FDCWD, "big", {st_mode=S_IFREG|0644, st_size=2147483648, ...}, AT_SYMLINK_NOFOLLOW) = 0
write(1, "0\tbig\n", 60 big

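(In the last line, du's own output, "0 big", is interleaved with strace's trace of the write call, which is why the line appears to end in "60 big".)

The difference between the two modes is visible in du.c: by default du reports the allocated size derived from st_blocks, whereas with -b it reports st_size.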
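
For the "how much disk space is used in this tree?" case, the same idea can be sketched in Python. This is only an approximation of what du -s does, not a port of du.c: the function name tree_usage is made up, st_blocks is again assumed to be in 512-byte units, and errors such as unreadable directories are not handled. It issues only metadata calls (lstat and directory listings) and never opens a file's contents.

```python
import os
import stat

def tree_usage(root):
    """Sum allocated bytes under root from metadata only.

    Each inode is counted once, so hard links are not double-counted.
    """
    seen = set()
    total = 0
    stack = [root]
    while stack:
        path = stack.pop()
        st = os.lstat(path)  # do not follow symlinks
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # another hard link to an inode already counted
        seen.add(key)
        total += st.st_blocks * 512  # assumption: 512-byte units, as on Linux
        if stat.S_ISDIR(st.st_mode):
            with os.scandir(path) as entries:
                stack.extend(entry.path for entry in entries)
    return total
```

Calling tree_usage(".") should give a figure in bytes that is close to du -s --block-size=1 for the same directory, though the two may differ in edge cases.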
