How to delete all duplicate hardlinks to a file

Tags: duplicate-files, hard-link

I've got a directory tree created by rsnapshot, which contains multiple snapshots of the same directory structure with all identical files replaced by hardlinks.

I would like to delete all those hardlink duplicates and keep only a single copy of every file (so I can later move all files into a sorted archive without having to touch identical files twice).

Is there a tool that does that?
So far I've only found tools that find duplicates and create hardlinks to replace them…
I guess I could list all files and their inode numbers and implement the deduplicating and deleting myself, but I don't want to reinvent the wheel here.

Best Answer

In the end it wasn't too hard to do this manually, based on Stéphane's and xenoid's hints and some prior experience with find.
I had to adapt a few commands to work with FreeBSD's non-GNU tools — GNU find has the -printf option that could have replaced the -exec stat, but FreeBSD's find doesn't have that.

# create a list of "<inode number> <tab> <full file path>"
find rsnapshots -type f -links +1 -exec stat -f '%i%t%R' {} + > inodes.txt
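
For reference, on a system with GNU find the same list could have been produced without stat; this is the -printf alternative mentioned above (untested here, since FreeBSD's find lacks that option):

# GNU find only: print "<inode number> <tab> <full file path>" directly
find rsnapshots -type f -links +1 -printf '%i\t%p\n' > inodes.txt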

# sort the list by inode number (to have consecutive blocks of duplicate files)
sort -n inodes.txt > inodes.sorted.txt
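
Optionally, a quick look at how many paths share each inode (purely an inspection step, not required for the rest):

# count paths per inode, biggest groups first
cut -f1 inodes.sorted.txt | uniq -c | sort -rn | head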

# drop the first entry of each inode block from the list (that one link is kept; all later entries are the duplicates to delete)
awk -F'\t' 'BEGIN {lastinode = 0} {inode = 0+$1; if (inode == lastinode) {print $2}; lastinode = inode}' inodes.sorted.txt > inodes.to-delete.txt
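
An equivalent awk one-liner that remembers seen inodes in an array instead of relying on adjacent lines (a sketch; with this variant the sort step isn't strictly needed):

# print the path of every occurrence after the first per inode
awk -F'\t' 'seen[$1]++ {print $2}' inodes.sorted.txt > inodes.to-delete.txt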

# delete the duplicates (IFS= and read -r preserve leading whitespace and backslashes; filenames containing newlines would still need special handling)
while IFS= read -r line; do rm -f "$line"; done < inodes.to-delete.txt
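
Afterwards, assuming every link to each file lived inside the rsnapshots tree, a quick check should come back empty:

# files with more than one remaining hard link (ideally none are left)
find rsnapshots -type f -links +1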