Disk Usage – How ‘du’ Counts Blocks Used

disk-usage

I'm curious to understand how du counts blocks used in a file.

Scenario

dd bs=1 seek=2GiB if=/dev/null of=big
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.3324e-05 s, 0.0 kB/s

ls -lh big
-rw-r--r-- 1 roaima roaima 2.0G May 19 15:55 big

du -h big
0       big

I've always accepted that it will give me different answers to ls, and that's fine because they're measuring different things.

Now I have a cloud-based filesystem where I get charged not only for storage but also each time I download data, so I need to minimise the amount of data accessed by general housekeeping activities such as "how much disk space is used in this tree?"

I'm not aware of a library/system call to tell me the number of used blocks, although there could easily be one. I don't believe du reads its way through every file it's considering because that doesn't differentiate between a file filled with zeros and one that's truly sparse.

So, how does du count blocks used?

Best Answer

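du uses stat(2) to find the number of blocks used by a file. If you run stat big you should see that the number of blocks matches the number given by du. (For non-empty files, bear in mind that stat reports 512-byte blocks on Linux, whereas GNU du prints 1 KiB units by default, so the raw figures can differ by a factor of two while describing the same allocation.)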

You can force du to count bytes using the -b option; then its output matches ls's.
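As a rough illustration of the two numbers involved, the sketch below creates a small sparse file and reads both fields through stat. It assumes GNU coreutils (truncate, stat -c, du --block-size); the scratch directory and the file name sparse are made up for the example, and the allocated figure depends on the filesystem.

```shell
# Create a 1 MiB sparse file in a scratch directory (assumes GNU coreutils).
tmpdir=$(mktemp -d)
f="$tmpdir/sparse"
truncate -s 1M "$f"

apparent=$(stat -c %s "$f")   # st_size: logical length in bytes
blocks=$(stat -c %b "$f")     # st_blocks: number of allocated blocks
unit=$(stat -c %B "$f")       # size in bytes of each block counted by %b
allocated=$((blocks * unit))

echo "apparent=$apparent allocated=$allocated"

du --block-size=1 "$f"        # disk usage in bytes, derived from st_blocks
du -b "$f"                    # apparent size in bytes, derived from st_size

# Remove the scratch directory when finished:
# rm -rf "$tmpdir"
```

None of these commands opens the file to read its data; they only ask for its metadata.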

In both cases it uses stat(2) (or rather, fstatat(2) at least in the version I have):

$ strace du big|&grep big
execve("/usr/bin/du", ["du", "big"], [/* 57 vars */]) = 0
newfstatat(AT_FDCWD, "big", {st_mode=S_IFREG|0644, st_size=2147483648, ...}, AT_SYMLINK_NOFOLLOW) = 0
write(1, "0\tbig\n", 60 big

The difference in processing is visible in du.c.
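For the housekeeping case in the question, a tree total can be gathered from metadata alone in the same way. The following is only a sketch, assuming GNU find (for -printf, whose %b is the usage in 512-byte blocks) and any POSIX awk; tree_usage is a hypothetical helper name. Unlike du, it counts a hard-linked file once per name and ignores the space taken by the directories themselves.

```shell
# Print "<apparent bytes> <allocated bytes>" for all regular files under $1.
# Uses only stat-style metadata; file contents are never read.
tree_usage() {
    find "$1" -type f -printf '%s %b\n' |
        awk '{ apparent += $1; allocated += $2 * 512 }
             END { printf "%.0f %.0f\n", apparent, allocated }'
}
```

Calling tree_usage /path/to/tree prints the two totals on one line; whether a metadata lookup is itself billed on a given cloud filesystem is something to check with the provider.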

Related Solutions

What does “1K-blocks” column mean in the output of `df`

The 1K-blocks column is the total size of the file system, measured in 1 KiB (1024-byte) units. Historically, and according to the POSIX standard, df should report the space in units of 512-byte blocks; you can get that output by doing:

POSIXLY_CORRECT=1 df

The "block" here is simply the unit used for the amounts; it is not related to the file system block size (or cluster size, if appropriate for the file system involved). For ext2/ext3/ext4 filesystems you can display the file system info with:

sudo dumpe2fs -h /dev/sda7

(replace /dev/sda7 with the file system device).

Note that if you add the Used and Available columns you don't get the total size shown. The difference is the blocks that are reserved for root, shown in the output of dumpe2fs as Reserved block count:. Those blocks can only be used by root; the idea is that if a user fills up the filesystem, critical stuff still works and root can fix the problem.
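The arithmetic can be checked from df's own output. This is a minimal sketch rather than a definitive tool: df_gap is a hypothetical helper, it relies on the POSIX -P and -k options to get one line per file system in 1024-byte units, and it assumes the device name contains no spaces. On ext2/ext3/ext4 the gap should roughly correspond to the reserved blocks described above; other file systems may account for space differently.

```shell
# Show total, used, available, and the part of the total that is neither,
# all in KiB, for the file system containing the path given as $1.
df_gap() {
    df -P -k "$1" |
        awk 'NR == 2 {
            printf "total_kib=%.0f used_kib=%.0f avail_kib=%.0f gap_kib=%.0f\n",
                   $2, $3, $4, $2 - $3 - $4
        }'
}
```

For example, df_gap / reports the figures for the root file system.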

Does the ‘du’ command count the size of inaccessible folders

No, it does not. Consider this example:

du -shc *
4,0K    AUDIO_TS
4,4G    VIDEO_TS
4,4G    total
chmod 000 *   # don't run this in the wrong directory!
du -shc *
du: cannot read directory 'VIDEO_TS': Permission denied
du: cannot read directory 'AUDIO_TS': Permission denied
4,0K    AUDIO_TS
4,0K    VIDEO_TS
8,0K    total
