Ubuntu – Why is disk usage greater than the size of all files on it

ext4, hard drive, partitioning

I have a 3TB HDD. The properties screen of the HDD says that I have used 471.4GB, but when I select all the files in Nautilus, it says that 321.0GB is selected. If I only have 321.0GB of files on the HDD, why is it using 471.4GB?

The HDD uses a GUID partition table and the file system is ext4. When I select the HDD in the Disk Utility app, I get a warning saying:

WARNING: The partition is misaligned by 3072 bytes.
This may result in very poor performance.  Repartitioning is suggested.

Has that got anything to do with the missing 150.4GB?

Best Answer

Files on disk have two sizes: the "apparent size" and the "size on disk". Several things can cause a large discrepancy between the two:

  • A large number of files results in a large amount of overhead because of internal fragmentation. E.g. ext4 has a 4KiB default block size; a file smaller than that always consumes 4KiB, and larger sizes are rounded up to the next multiple of the block size.
  • Directories are also files, and the same rule applies to them. Moreover, if you create a large number of files in a directory and remove them again later, the space used by the directory file itself can't be reclaimed (recreating the directory helps).
  • Sparse files are special files that appear to be large but don't consume that much space. This is common in virtualization for virtual disk images; they appear large, but the real size can be a lot smaller. A lot of utilities (and file managers) are incapable of showing the actual disk usage.
  • Hard links. The contents of a file exist on disk only once, even when multiple names refer to them. Some file managers may count the size once for every name.
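
To make the sparse-file case concrete, here is a minimal sketch, assuming GNU coreutils (truncate, stat) and a filesystem that supports sparse files, as ext4 does. The file name sparse.img is made up for the example, and the exact on-disk figure will vary by filesystem.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
sparse="$workdir/sparse.img"          # hypothetical file name

# Set the file length to 1 GiB without writing any data blocks.
truncate -s 1G "$sparse"

apparent=$(stat -c %s "$sparse")      # apparent size in bytes
blocks=$(stat -c %b "$sparse")        # number of allocated blocks
blksz=$(stat -c %B "$sparse")         # bytes per block as counted by %b
on_disk=$((blocks * blksz))           # space actually allocated

echo "apparent size: $apparent bytes"
echo "size on disk:  $on_disk bytes"

rm -r "$workdir"
```

A tool that only looks at the apparent size would count this file as 1 GiB even though almost nothing is allocated for it.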

I would suggest using a disk usage tool that is known to be capable of listing both sizes to see if this is the issue. Try ncdu in a terminal and press a to toggle between apparent size and disk usage.
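
If ncdu is not installed, the same comparison can probably be made with GNU du alone. This is only a sketch: it assumes GNU du (for --apparent-size, -B1 and -x), and the directory argument is whatever you want to inspect, for example the mount point of the disk in question.

```shell
#!/bin/sh
# Directory to inspect; defaults to the current directory.
target="${1:-.}"

# -s prints one summary line, -x stays on a single filesystem,
# -B1 reports sizes in bytes.
apparent=$(du -sx -B1 --apparent-size "$target" | cut -f1)
on_disk=$(du -sx -B1 "$target" | cut -f1)

echo "apparent size: $apparent bytes"
echo "size on disk:  $on_disk bytes"
echo "difference:    $((on_disk - apparent)) bytes"
```

A large positive difference points at block rounding on many small files; a negative one points at sparse files.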

A short demo of internal fragmentation on a filesystem with a 4KiB block size, using du:

$ sudo tune2fs -l /dev/path-to-device | grep "Block size"
Block size:               4096

$ echo blaataaap > myfile                      # creates a 10-byte file

$ du --block-size=1 myfile                     # prints the usage on disk (filesystem)
4096   myfile

$ du --apparent-size --block-size=1 myfile     # prints the apparent size, i.e.
10     myfile                                  # content length when seeking

$ ls -al
-rw-rw-r-- 1 gert gert 10 Jan 1 23:24 myfile   # ls uses apparent sizes

This means that this 10-byte file takes up 4086 bytes more on disk than it appears to in a listing; that wasted space is internal fragmentation.
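
The rounding itself is simple arithmetic. As a rough sketch, the helper below (size_on_disk is a name invented for this example) rounds a byte count up to the next multiple of the block size. It ignores filesystem metadata and special cases such as empty files or inline data, so treat it as an estimate only.

```shell
#!/bin/sh
# Round a byte count up to the next multiple of the block size.
# $1 = apparent size in bytes, $2 = block size (defaults to 4096).
size_on_disk() {
    size=$1
    block=${2:-4096}
    echo $(( (size + block - 1) / block * block ))
}

size_on_disk 10      # prints 4096
size_on_disk 4096    # prints 4096
size_on_disk 4097    # prints 8192
```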

A short demo of hard links, where a file listing (ls in this case) overstates the disk usage:

$ dd if=/dev/zero of=1MBfile bs=1M count=1 # create a 1MB file
$ ln 1MBfile a_hard_link                   # create a hard link to it

$ ls -alht                                 # ls will report 2MB
total 2.1M
drwxrwxr-x  2 gert gert 4.0K Jan  2 11:21 .
-rw-rw-r--  2 gert gert 1.0M Jan  2 11:21 1MBfile
-rw-rw-r--  2 gert gert 1.0M Jan  2 11:21 a_hard_link

$ du -B 1024 .                             # du reports 1028K total for directory
1028    .

$ du -B 1024 a_hard_link                   # and 1024K for each file individually
1024    a_hard_link
$ du -B 1024 1MBfile
1024    1MBfile
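
To check whether two names really are hard links to the same data, you can compare inode numbers and link counts. The following is a sketch that assumes GNU stat and find and recreates the two files from the demo above in a temporary directory.

```shell
#!/bin/sh
# Recreate the demo files in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

dd if=/dev/zero of=1MBfile bs=1M count=1 2>/dev/null
ln 1MBfile a_hard_link

# The same inode number means both names refer to the same data.
inode_a=$(stat -c %i 1MBfile)
inode_b=$(stat -c %i a_hard_link)
links=$(stat -c %h 1MBfile)           # number of names for this inode
echo "inodes: $inode_a $inode_b, link count: $links"

# List every regular file here that has more than one name.
multi=$(find . -type f -links +1 | sort)
echo "$multi"

cd / && rm -r "$workdir"
```

Running the same find command on the mount point of a disk gives a list of files that a naive size total may count more than once.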