I'm curious to understand how du counts the blocks used by a file.
Scenario
dd bs=1 seek=2GiB if=/dev/null of=big
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.3324e-05 s, 0.0 kB/s
ls -lh big
-rw-r--r-- 1 roaima roaima 2.0G May 19 15:55 big
du -h big
0 big
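The same situation can be reproduced from C. This is a minimal sketch that assumes Linux or another POSIX system whose filesystem supports sparse files. The helper names make_sparse_file and get_sizes are hypothetical. Extending a new file with ftruncate(2) plays the role of the dd seek, and a single stat(2) call returns both numbers: st_size, which ls -l shows, and st_blocks, which du counts.

```c
#define _XOPEN_SOURCE 700     /* ftruncate(), st_blocks */
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t, so 2 GiB fits on 32-bit systems too */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) `path` and set its length to
 * `size` bytes without writing any data, roughly what
 * `dd bs=1 seek=2GiB if=/dev/null of=big` ends up doing.
 * Returns 0 on success, -1 on error (errno is set). */
static int make_sparse_file(const char *path, off_t size)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* Growing a file with ftruncate() leaves a hole: the length changes,
     * but a filesystem with sparse-file support allocates no data blocks. */
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: report the apparent size (st_size, what ls -l shows)
 * and the allocated size (st_blocks, what du counts) of `path`.
 * Returns 0 on success, -1 on error. */
static int get_sizes(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *apparent = (long long)st.st_size;
    /* On Linux st_blocks is in 512-byte units, whatever the filesystem block size. */
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}
```

Calling make_sparse_file("big", (off_t)2 << 30) and then get_sizes("big", ...) should give an apparent size of 2147483648 bytes and, on a filesystem such as ext4, an allocated size at or near zero, mirroring the ls and du output above.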
I've always accepted that du will give me different answers from ls, and that's fine because they're measuring different things.
Now I have a cloud-based filesystem where I'm charged not only for storage but also every time I download data, so I need to minimise the amount of data accessed by general housekeeping activities such as "how much disk space is used in this tree?"
I'm not aware of a library or system call that tells me the number of blocks used, although there could easily be one. I don't believe du reads its way through every file it considers, because that wouldn't distinguish a file filled with zeros from one that's truly sparse.
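The zeros-versus-sparse point can be checked directly. The sketch below assumes a POSIX system and a filesystem with sparse-file support and no transparent compression (btrfs or ZFS compression could blur the result); all helper names are hypothetical. It writes one file full of explicit zero bytes and creates another of the same length as a hole. Reading either file returns nothing but zeros, so scanning the data can't tell them apart; only the allocation reported by stat(2) differs.

```c
#define _XOPEN_SOURCE 700     /* ftruncate(), fsync(), st_blocks */
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` explicit zero bytes, which occupy real blocks. */
static int write_zero_file(const char *path, size_t len)
{
    static const char zeros[4096];  /* static storage starts zero-filled */
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    while (len > 0) {
        size_t chunk = len < sizeof zeros ? len : sizeof zeros;
        ssize_t n = write(fd, zeros, chunk);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        len -= (size_t)n;
    }
    /* Flush before anyone inspects st_blocks, so the blocks are really allocated. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: give `path` a length of `len` bytes without writing
 * anything, leaving one big hole. */
static int write_hole_file(const char *path, off_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, len) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Returns 1 if every byte of `path` reads as zero, 0 if not, -1 on error.
 * Holes read back as zeros, so both kinds of file return 1. */
static int reads_as_all_zeros(const char *path)
{
    char buf[4096];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int result = 1;
    ssize_t n = 0;
    while (result == 1 && (n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != 0) {
                result = 0;
                break;
            }
    if (result == 1 && n < 0)
        result = -1;
    close(fd);
    return result;
}

/* Bytes allocated to `path` (what du counts), or -1 on error. */
static long long allocated_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;  /* 512-byte units on Linux */
}
```

The design choice here is deliberate: reads_as_all_zeros stands in for "reading its way through every file", and it gives the same answer for both files, whereas allocated_bytes only needs metadata and does tell them apart.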
So, how does du count the blocks used?
Best Answer
du uses stat(2) to find the number of blocks used by a file (the st_blocks field). If you run stat big, you should see that the number of blocks matches the number given by du.

You can force du to count bytes using the -b option (in GNU du, shorthand for --apparent-size --block-size=1); its output then matches the size shown by ls -l.

In both cases it uses stat(2) (or rather fstatat(2), at least in the version I have). The difference in processing is visible in du.c.
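For the cloud-storage case, the useful consequence is that a du-style total needs only metadata. Here is a minimal sketch assuming POSIX.1-2008 (openat(2), fdopendir(3), fstatat(2)) and assuming the cloud filesystem can answer stat calls without fetching file contents. It is not how du.c is structured, and the names struct usage, add_entry, walk and tree_usage are hypothetical. It sums st_blocks * 512 (roughly du) and st_size (roughly du -b) without opening any regular file. Unlike GNU du, it counts a hard-linked file once per name and silently skips entries it cannot stat.

```c
#define _XOPEN_SOURCE 700     /* openat(), fdopendir(), fstatat(), st_blocks */
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

struct usage {
    long long allocated;  /* sum of st_blocks * 512: roughly what du reports */
    long long apparent;   /* sum of st_size: roughly what du -b reports */
};

static void add_entry(struct usage *u, const struct stat *st)
{
    u->allocated += (long long)st->st_blocks * 512;  /* 512-byte units on Linux */
    u->apparent += (long long)st->st_size;
}

/* Add everything below the open directory `fd` to `u`, then close `fd`.
 * Only metadata is consulted; no regular file is ever opened. */
static int walk(int fd, struct usage *u)
{
    DIR *dir = fdopendir(fd);  /* on success, dir owns fd */
    if (dir == NULL) {
        close(fd);
        return -1;
    }
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        struct stat st;
        /* Stat the entry itself rather than a symlink's target,
         * matching du's default of not following symlinks. */
        if (fstatat(fd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;  /* simplification: skip entries we cannot stat */
        add_entry(u, &st);
        if (S_ISDIR(st.st_mode)) {
            int sub = openat(fd, de->d_name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
            if (sub >= 0)
                walk(sub, u);  /* errors in subtrees are ignored too */
        }
    }
    return closedir(dir);
}

/* Totals for `path` and, if it is a directory, everything below it.
 * Hard links are counted once per name (GNU du counts them once). */
static int tree_usage(const char *path, struct usage *u)
{
    assert(path != NULL && u != NULL);
    struct stat st;
    u->allocated = 0;
    u->apparent = 0;
    if (lstat(path, &st) != 0)
        return -1;
    add_entry(u, &st);
    if (!S_ISDIR(st.st_mode))
        return 0;
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    return walk(fd, u);
}
```

Run against a directory containing only the sparse big file from the question, tree_usage would be expected to report an apparent total just over 2 GiB and a far smaller allocated total, the same split as ls versus du.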