How to `du` only the space used up by files that are not hardlinked elsewhere

disk-usage, hard-link

I use rsync --link-dest for space-saving snapshots. How can I figure out how much space I actually saved? Or, more generally:

How can I figure out how much space a directory uses, counting only files that are not hardlinked from somewhere outside that directory tree? Put differently: how much space would actually be freed if I deleted that directory? (du -hs would lie, because it also counts files that are still linked from elsewhere. The space required for the hardlinks themselves may be included.)

Best Answer

Assuming there aren't internal hardlinks (that is, every file with more than 1 hardlink is linked from outside the tree), you can do:

find . -links -2 -print0 | du -c --files0-from=-
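
To see what that selects, here is a small scratch setup. It is only a sketch: the directory names snap1 and snap2 and the file names are made up, and it assumes GNU find and GNU du (--files0-from is a GNU extension). The demo adds ! -type d because some filesystems report a link count of 1 for directories, in which case du would descend into them; that filter is an addition for the demo, not part of the command above.

```shell
# Scratch demo; snap1/snap2 and the file names are hypothetical.
tmp=$(mktemp -d)
mkdir "$tmp/snap1" "$tmp/snap2"
printf 'shared\n' > "$tmp/snap1/shared.txt"
ln "$tmp/snap1/shared.txt" "$tmp/snap2/shared.txt"   # same inode, link count 2
printf 'unique\n' > "$tmp/snap2/unique.txt"          # link count 1

# Non-directories in snap2 with fewer than 2 links: only these would be freed.
only_here=$(cd "$tmp/snap2" && find . ! -type d -links -2)
printf '%s\n' "$only_here"

# Same selection, summed by du (GNU du only).
(cd "$tmp/snap2" && find . ! -type d -links -2 -print0 | du -c --files0-from=-)

rm -rf "$tmp"
```

Only unique.txt is selected; shared.txt is left out because deleting snap2 would not free its blocks while snap1 still links to it.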

EDIT: Here is what I sketched in the comments, applied, only without du; kudos to @StephaneChazelas for noticing that du is not necessary. Explanation below.

( find . -type d -printf '%k + ' ; \
  find . \! -type d -printf '%n\t%i\t%k\n' | \
    sort | uniq -c                         | \
    awk '$1 >= $2 { print $4 " +\\" }' ; \
  echo 0 ) | bc

We build a string with the disk usage (in 1 KiB blocks, as printed by %k) of every relevant file, separated by plus signs, and then feed that big sum to bc.

The first find invocation does that for directories.

The second find prints link count, inode number, and disk usage for everything that is not a directory. We pass that list through sort | uniq -c to get a list of (number of appearances in the tree, link count, inode, disk usage).
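
As an illustration of that intermediate list, here is the sort | uniq -c step run on made-up find output (the inode numbers and sizes are invented for the example). Inode 101 shows up twice in the tree, the others once:

```shell
# Invented sample of what the second find prints: link count, inode, 1 KiB blocks.
sample=$(printf '2\t101\t4\n2\t101\t4\n2\t202\t8\n1\t303\t12\n')

# uniq -c prefixes each distinct line with how many times it occurred.
counted=$(printf '%s\n' "$sample" | sort | uniq -c)
printf '%s\n' "$counted"
```

The first column is the number of appearances. The line for inode 202 has 1 appearance but a link count of 2, so one of its links must live outside the tree.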

We pass that list through awk and, if the first field (number of appearances) is greater than or equal to the second (number of hardlinks), meaning there are no links to this file from outside the tree, print the fourth field (disk usage) with a plus sign and a backslash attached. The backslash makes bc treat the following newline as a line continuation.

Finally we output a 0, so the formula is syntactically correct (it would end in a + otherwise), and pass it to bc. Phew.
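
For a feel of what bc ends up reading, here is the filter and the closing 0 applied to a made-up list. Everything in it is invented for illustration: the inode numbers, the sizes, and the two 4-block directories standing in for the first find.

```shell
# Invented sample: appearances, link count, inode, disk usage in 1 KiB blocks.
counted='2 2 101 4
1 2 202 8
1 1 303 12'

# Pretend the first find reported two directories of 4 blocks each.
dirs='4 + 4 + '

# Keep only files whose links are all inside the tree, then close the sum with 0.
formula=$( { printf '%s' "$dirs"
             printf '%s\n' "$counted" | awk '$1 >= $2 { print $4 " +\\" }'
             echo 0; } )
printf '%s\n' "$formula"

# bc is not installed everywhere, so only evaluate if it is available.
if command -v bc >/dev/null 2>&1; then
    printf '%s\n' "$formula" | bc
fi
```

With bc available the result is 24: 8 for the directories, 4 for inode 101, and 12 for inode 303. Inode 202 is skipped because one of its two links lives outside the tree.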

(But I would use the simpler first method, if it gives a good enough answer.)
