If you copy a file, the content is duplicated, so modifying one of the two files has no effect on the other.
If you make a hard link, you create another name pointing to the same content, so a change made through either name is seen through both.
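A minimal sketch of that difference, run in a throwaway directory. The file names (file, copy, hardlink) are made up for the illustration.

```shell
tmp=$(mktemp -d)
echo original > "$tmp/file"
cp "$tmp/file" "$tmp/copy"        # independent duplicate of the content
ln "$tmp/file" "$tmp/hardlink"    # second name for the same inode

echo changed > "$tmp/file"        # rewrites the existing inode in place

copy_content=$(cat "$tmp/copy")
link_content=$(cat "$tmp/hardlink")
echo "copy:     $copy_content"    # copy:     original
echo "hardlink: $link_content"    # hardlink: changed
rm -rf "$tmp"
```

Note that this only holds for in-place writes. A program that saves by writing a new file and renaming it over the old name ends up with a new inode, and the other link keeps the old content.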
You could do it by hand with GNU find:
find snapshot-dir -type d -printf '1 %b\n' -o -printf '%n %b %i\n' |
awk '$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
END{print t*512}'
That counts the disk usage of files whose link count would go down to 0 after all the links found in the snapshot directory have been removed.
find prints:
1 <disk-usage>
for directories, and
<link-count> <disk-usage> <inode-number>
for other types of files, where <disk-usage> is in 512-byte blocks (hence the multiplication by 512 at the end).
We pretend the link count is always one for directories: when in practice it's not, it's because of the .. entries, which find doesn't list, and directories generally don't have other hard links.
From that output, awk counts the disk usage of the entries that have a link count of 1, and also of the inodes it has seen <link-count> times (that is, the ones all of whose hard links are inside the directory tree and which, like the ones with a link count of one, would have their space reclaimed once the directory tree is deleted).
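To see the counting rule in isolation, here is a small sketch that feeds awk hand-written lines in the same format instead of real find output. The inode numbers (101 to 103) and block counts are invented for the example.

```shell
# One directory, one single-link file, inode 102 seen twice out of 2 links,
# inode 103 seen twice out of 3 links (one link lives outside the tree).
total=$(printf '%s\n' \
  '1 8' \
  '1 16 101' \
  '2 8 102' \
  '2 8 102' \
  '3 24 103' \
  '3 24 103' |
  awk '$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
       END{print t*512}')
echo "$total"   # 16384
```

The directory (8 blocks), inode 101 (16 blocks) and inode 102 (8 blocks, counted once when its second link is seen) add up to 32 blocks, that is 16384 bytes. Inode 103 is never counted because only two of its three links were seen.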
You can also use find snapshot-dir1 snapshot-dir2 to find out how much disk space would be reclaimed if both directories were removed (which may be more than the sum of the space for the two directories taken individually if there are files that are found in both and only in those snapshots).
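The effect can be reproduced on a toy tree. This is only a sketch: the directory and file names and the reclaim wrapper are made up, and the exact numbers depend on the filesystem, so none are shown.

```shell
# Two toy snapshots sharing one hard-linked file.
tmp=$(mktemp -d)
mkdir "$tmp/snap1" "$tmp/snap2"
head -c 1048576 /dev/urandom > "$tmp/snap1/shared"   # 1 MiB of data
ln "$tmp/snap1/shared" "$tmp/snap2/shared"           # second link, same inode
head -c 4096 /dev/urandom > "$tmp/snap1/only1"       # exists only in snap1

reclaim() {  # hypothetical wrapper around the pipeline above
  find "$@" -type d -printf '1 %b\n' -o -printf '%n %b %i\n' |
    awk '$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
         END{print t*512}'
}

one=$(reclaim "$tmp/snap1")
two=$(reclaim "$tmp/snap2")
both=$(reclaim "$tmp/snap1" "$tmp/snap2")
echo "snap1 alone: $one"
echo "snap2 alone: $two"
echo "both:        $both"
rm -rf "$tmp"
```

The shared file is counted only in the last figure, because that is the only run in which both of its links are seen.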
If you want to find out how much space you would save after each snapshot-dir deletion (cumulatively), you could do:
find snapshot-dir* \( -path '*/*' -o -printf "%p:\n" \) \
-type d -printf '1 %b\n' -o -printf '%n %b %i\n' |
awk '/:$/ {if (NR>1) print t*512; printf "%s ", $0; next}
$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
END{print t*512}'
That processes the list of snapshots in lexical order. If you processed it in a different order, that would likely give you different numbers except for the final one (when all snapshots are removed).
See numfmt to make the numbers more readable.
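For instance, assuming the numfmt from GNU coreutils, a byte count can be converted to IEC units like this:

```shell
# Convert a plain byte count to a human-readable size.
readable=$(echo 2048 | numfmt --to=iec)
echo "$readable"   # 2.0K
```

The single-number output of the first command can be piped straight into numfmt --to=iec. For the cumulative output, where the number is the second whitespace-separated field, numfmt --field=2 --to=iec should work as long as the directory names contain no spaces.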
That assumes all files are on the same filesystem. If not, you can replace %i with %D:%i (though if they're not all on the same filesystem, that would mean you have a mount point in there, which you couldn't remove anyway).
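With that substitution, the first pipeline might look as follows. It is wrapped in a function only so that it can be reused; the name reclaimable_bytes is hypothetical.

```shell
# Same pipeline as the first one, keyed on device:inode instead of inode alone.
reclaimable_bytes() {
  find "$@" -type d -printf '1 %b\n' -o -printf '%n %b %D:%i\n' |
    awk '$1 == 1 || ++c[$3] == $1 {t+=$2;delete c[$3]}
         END{print t*512}'
}
```

It would be called as reclaimable_bytes snapshot-dir, or with several directories at once.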
Best Answer
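Total size in bytes of all files in hourly.2 which have only one link: select them with find's -links 1 test and sum the sizes it prints. From the find man page, -links n matches files that have n links.
To get the sum in kilobytes instead of bytes, use -printf "%k\n".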
To list files with different link counts, play around with find -links +1 (more than one link), find -links -5 (fewer than five links) and so on.
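A minimal sketch of such a pipeline, assuming GNU find. The restriction to regular files (-type f), the use of awk for the sum, and the function name one_link_bytes are assumptions made for the illustration.

```shell
# Sum the sizes in bytes (%s) of regular files that have exactly one link.
one_link_bytes() {
  find "$1" -type f -links 1 -printf '%s\n' |
    awk '{s += $1} END {print s + 0}'
}
```

It would be called as one_link_bytes ./hourly.2. Replacing '%s\n' with '%k\n' gives the sum in kilobytes, as described above.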