Ubuntu – Why do du -sh and the file manager disagree


I would like to know why the directory sizes I get when I execute du -sh differ from the ones the file manager shows. What do they do differently, and how big is my data really? I am not so much interested in the size it takes on disk (because of blocks and stuff); I just want to know how big the actual data is.

Best Answer

Short answer: The file manager calculates with units based on 1000, while du by default calculates with units based on 1024. Because of this, the file manager shows a file of 1024 bytes as "1.024 kB", while du counts it as "1.000 KiB". The discrepancy (literally) multiplies with larger units, for example megabyte (1000 * 1000) vs. mebibyte (1024 * 1024) or gigabyte (1000 * 1000 * 1000) vs. gibibyte (1024 * 1024 * 1024).

Long answer: The difference stems from the different ways computers and humans count. Most human societies today do their math in the decimal system, with base 10. Not all cultures in history did; that is why, for example, we divide the day into 24 hours. But for most things we use 10, multiples of 10, or powers of 10. This is evident in the International System of Units (SI), which uses prefixes to mark powers of 10 such as 10 ^ 3 = 1000: 1000 grams equal 1 *kilo*gram, the 1000th part of 1 meter is 1 *milli*meter, and so forth. Technically, 1000 kilograms would be 1 "megagram", but traditionally we use a different word for it, the (metric) "ton". Still, it is based on 1000.

On the other hand, computers do not calculate in base 10 but in base 2: on/off, power/no power, true/false. Therefore, computers use powers of 2 instead of powers of 10: 2, 4, 8, 16, 32, 64 and so forth. The power of 2 nearest to 1000 is 1024 (2 ^ 10). Because of this, "1 kilobyte" was originally defined not as "1000 bytes", as the SI prefix would suggest, but as "1024 bytes". In the same way, "1 megabyte" originally was "1024 * 1024 bytes", "1 gigabyte" originally was "1024 * 1024 * 1024 bytes" and so forth.

"Back then", most people who used computers knew about this, and at the scales used in those days it didn't make much of a difference. Whether a file is 1000 bytes "large" or 1024 bytes rarely matters. But time went on, computers became omnipresent, and the numbers became larger. Today, many computer users don't know about 1000 vs. 1024, or don't care. It doesn't make much sense to explain to "Joe Everybody" that with almost everything "kilo" means "1000 of it", but with computers it's different. Additionally, the difference becomes significant as the prefixes grow: a binary "gigabyte" (1024 ^ 3 bytes) is roughly 7% larger than a decimal one (1000 ^ 3 bytes), with "terabyte" the gap is already about 10%, and it keeps growing with each larger prefix.

Therefore, standards bodies, notably the IEC, decided to differentiate between the two systems. The classical prefixes kilo-, mega-, giga-, tera- etc. are now defined on the basis of 1000, as in SI. So a file with 1024 bytes is no longer "1.000 kilobyte", but "1.024 kilobytes". The units based on 1024 got new prefixes, formed from the first two letters of the "old" prefix followed by "bi" (for "binary"): kilo -> kibi, mega -> mebi, giga -> gibi, tera -> tebi and so forth. The symbols are KiB, MiB, GiB, TiB and so forth.
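
To see how the gap between the two systems grows with each prefix, here is a short Python sketch. It is plain arithmetic, not taken from any tool; the list of prefix pairs and the function name gap_percent are just illustrative choices.

```python
# Percent by which each binary (IEC) unit exceeds the decimal (SI) unit
# with the same leading letter.
PREFIXES = ["kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi", "peta/pebi"]


def gap_percent(power: int) -> float:
    """Percent by which 1024**power exceeds 1000**power."""
    return (1024 ** power / 1000 ** power - 1) * 100


for power, name in enumerate(PREFIXES, start=1):
    print(f"{name:10} {gap_percent(power):5.1f} %")
```

It prints about 2.4 % for kilo/kibi, 7.4 % for giga/gibi and 10.0 % for tera/tebi.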

Nautilus, Ubuntu's file manager, calculates based on 1000, so it shows your file sizes in kilobytes, megabytes etc. du, on the other hand, still calculates based on 1024, so with du you see your file sizes in kibibytes, mebibytes etc. (du -h labels them simply K, M, G). And as said above, once we are in the tera- vs. tebi- range and up, the difference really starts to show ;)
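
As an illustration of the two labelling conventions, here is a minimal Python sketch. The function name human() and the three-decimal formatting are assumptions made for this example; this is not the code Nautilus or du actually run, and both tools round and format differently.

```python
# Label a byte count with decimal (SI) or binary (IEC) units.
# Simplified: always three decimals, no tool-specific rounding.
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB"]
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def human(n_bytes: int, binary: bool = False) -> str:
    base = 1024 if binary else 1000
    units = IEC_UNITS if binary else SI_UNITS
    value = float(n_bytes)
    i = 0
    # Divide by the base until the value fits under the next unit.
    while value >= base and i < len(units) - 1:
        value /= base
        i += 1
    return f"{value:.3f} {units[i]}"


for size in (1024, 10**9, 2**40):
    print(size, human(size), "vs", human(size, binary=True))
```

For 1024 bytes this gives "1.024 kB" vs. "1.000 KiB", matching the short answer; for 10**9 bytes it gives "1.000 GB" vs. "953.674 MiB".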

du offers the --si switch. It works like -h, but uses SI units (powers of 1000) instead of powers of 1024. So

du --si -s my_files/

would give you a size in kB, MB, GB etc., while

du -sh my_files/

would give you a size in KiB, MiB, GiB etc.
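
If what you want is "how big the actual data is", note that du by default counts the space allocated on disk, while a file manager typically adds up file lengths (GNU du has an --apparent-size option for the latter). The Python sketch below computes both totals for a directory, assuming Linux, where st_blocks is reported in 512-byte units. tree_sizes is a hypothetical helper name; hard links and directory entries are handled differently from du, so the numbers will not match du exactly.

```python
# Total the files under a directory two ways (Linux assumptions):
#   apparent  = sum of st_size, the number of bytes of data
#   allocated = sum of st_blocks * 512, the space reserved on disk
# Hard links are counted once per name and directory entries themselves
# are not counted, so this will not match du exactly.
import os
import sys


def tree_sizes(top):
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))  # don't follow symlinks
            except OSError:
                continue  # file vanished or is unreadable: skip it
            apparent += st.st_size
            allocated += st.st_blocks * 512  # 512-byte units on Linux
    return apparent, allocated


if __name__ == "__main__" and len(sys.argv) > 1:
    for label, n in zip(("apparent", "allocated"), tree_sizes(sys.argv[1])):
        print(f"{label:9} {n} bytes = {n / 1000**3:.2f} GB = {n / 1024**3:.2f} GiB")
```

Run it as, for example, python3 sizes.py my_files/ (the script name is arbitrary).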

Related Solutions

Disk Usage – What is Taking Up Disk Space Besides Filesystem

If you run Disk Analyzer as a normal user, there may be files it can't access or see. You can try starting it with superuser privileges: open a terminal, or press ALT+F2, and type:

gksudo baobab

Baobab, if you are wondering, is the geeky name of the Disk Analyzer. Started this way, it may be able to show you where those missing megabytes are.
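
To see why a scan as a normal user can come up short, here is a hypothetical Python sketch (scan() is an illustrative name, not part of Baobab). It walks a tree as the current user and records every path it could not read; everything below an unreadable directory is simply missing from the total.

```python
# Walk a tree as the current user and record every path that could not be read.
# Everything below an unreadable directory is silently left out of the total,
# which is why a scan with root privileges can find more.
import os
import sys


def scan(top):
    total = 0
    skipped = []

    def on_error(err):
        # os.walk passes the OSError raised while listing a directory
        skipped.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(top, onerror=on_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                skipped.append(path)
    return total, skipped


if __name__ == "__main__" and len(sys.argv) > 1:
    total, skipped = scan(sys.argv[1])
    print(f"{total} bytes counted, {len(skipped)} unreadable paths skipped")
```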

Ubuntu – Find cluster size

Edit: to work out the size on disk in advance:

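size = 5000   # file size in bytes (example value)
block = 4096  # filesystem block size in bytes (example value)

# Round the size up to a whole number of blocks
blocks = size // block
if size % block != 0:
    blocks += 1
print(blocks * block)  # size on disk: 8192 for this example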

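To see how many blocks a file occupies on disk:

ls -s

In the calculation above, block is the partition's block size, and the size on disk is block size * number of blocks. Note that GNU ls -s prints the allocation in units of 1024 bytes by default rather than in filesystem blocks (--block-size changes the unit).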
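
The same information is available programmatically through stat(). A minimal Python sketch, assuming Linux (where st_blocks counts 512-byte units); disk_info is a hypothetical helper name:

```python
# Compare a file's data length with the space allocated for it (Linux assumptions).
import os
import sys


def disk_info(path):
    st = os.stat(path)
    return {
        "size": st.st_size,               # bytes of data (what ls -l shows)
        "allocated": st.st_blocks * 512,  # st_blocks counts 512-byte units on Linux
        "io_block": st.st_blksize,        # preferred I/O block size of the filesystem
    }


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, disk_info(p))
```

For a small file on a filesystem with 4096-byte blocks, "allocated" is usually a whole multiple of 4096 even though "size" is smaller; sparse or compressed files can show less.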

Explanation about block size terminology differences
sudo fdisk -l /dev/sda
where /dev/sda is the hard disk in question

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c1f6b

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63      498014      248976   83  Linux
/dev/sda2          498015   976768064   488135025    5  Extended
/dev/sda5          498078   976768064   488134993+  83  Linux
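
The numbers in this output can be re-derived by hand. The Python sketch below assumes the "Blocks" column counts 1024-byte blocks (two 512-byte sectors each) and that the trailing "+" marks a partition with an odd number of sectors; both assumptions reproduce the figures shown. fdisk_blocks is a hypothetical helper name.

```python
# Re-derive numbers from the fdisk -l output above.
SECTOR = 512               # "Units = sectors of 1 * 512 = 512 bytes"
total_sectors = 976773168  # "total 976773168 sectors"

print(total_sectors * SECTOR)            # 500107862016 bytes
print(total_sectors * SECTOR / 1000**3)  # ~500.1 GB, the decimal figure fdisk prints
print(total_sectors * SECTOR / 1024**3)  # ~465.8 GiB in binary units


def fdisk_blocks(start, end):
    """Whole 1024-byte blocks in a partition, plus whether a half block is left."""
    sectors = end - start + 1  # Start and End are both inclusive
    return sectors // 2, sectors % 2 == 1


for name, start, end in [("sda1", 63, 498014),
                         ("sda2", 498015, 976768064),
                         ("sda5", 498078, 976768064)]:
    blocks, odd = fdisk_blocks(start, end)
    print(name, f"{blocks}{'+' if odd else ''}")  # sda1 248976, sda2 488135025, sda5 488134993+
```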

The fdisk output tells you several things. Someone else has already explained them better, so here is a quote:

The problem with this is that there are four distinct units that you must be keeping in mind. To make things even worse, two of these units bear the same name. These are the different units:

Hardware block size, "sector size"

Filesystem block size, "block size"

Kernel buffer cache block size, "block size"

Partition table block size, "cylinder size"

To differentiate between the filesystem block size and the buffer cache block size, I will follow FAT terminology and use "cluster size" for the filesystem block size.

The sector size is the unit that the hardware deals with. This varies between hardware types, but most PC-style hardware (floppies, IDE disks, etc.) uses 512-byte sectors.

The cluster size is the allocation unit that the filesystem uses, and is what causes fragmentation - I'm sure you know about that. On a moderately sized ext3 filesystem, this is usually 4096 bytes, but you can check that with dumpe2fs. Remember that these are also usually called "blocks", only that I refer to them as clusters here.

The cluster size is what gets returned in st_blksize in the stat buffer, in order for programs to be able to calculate the actual disk usage of a file.

The block size is the size of the buffers that the kernel uses internally when it caches sectors that have been read from storage devices (hence the name "block device"). Since this is the most primitive form of storage in the kernel, all filesystem cluster sizes must be multiples of this. This block size is also what is almost always referred to by userspace programs. For example, when you run "du" without the -h or -H options, it will return how many of these blocks a file takes up. df will also report sizes in these blocks, the "Blocks" column in the fdisk -l output is of this type, and so on. It is what is most commonly referred to as a "block". Two disk sectors fit into each block.

The cylinder size is only used in the partition table and by the BIOS (and the BIOS isn't used by Linux).

"df" only operates on filesystems, so, no, it can't be used without a filesystem - without a filesystem, the data that it would return doesn't exist. "du" operates on individual files.

from here.
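
Since df works per filesystem, its numbers come from the filesystem's own block ("cluster") size. A rough Python sketch of a df-style summary using os.statvfs, which wraps POSIX statvfs(); fs_summary is a hypothetical name, and GNU df's exact columns and rounding differ:

```python
# A df-style summary of the filesystem containing `path`, via os.statvfs.
# f_frsize is the fundamental block ("cluster") size; f_blocks, f_bfree and
# f_bavail are counted in units of f_frsize.
import os
import sys


def fs_summary(path="/"):
    vfs = os.statvfs(path)
    bs = vfs.f_frsize
    return {
        "block_size": bs,
        "total": vfs.f_blocks * bs,
        "free": vfs.f_bfree * bs,        # includes blocks reserved for root
        "available": vfs.f_bavail * bs,  # what unprivileged users can use
    }


if __name__ == "__main__":
    info = fs_summary(sys.argv[1] if len(sys.argv) > 1 else "/")
    for key, value in info.items():
        print(f"{key:10} {value}")
```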
