Way to search for files by hash value

file searchhashingsearch

Is there a way I can have a hash value as input when searching for files and a complete list of files and their locations as output?

This could be helpful when trying to pin point file duplicates. I often times find myself in situations where I have a bunch of files that I know I already have stored in some location but I don't know where. They are essentially duplicates.

For instance, I could have a bunch of files on a portable hard drive, and also hard copies of those files on the internal hard drive of a desktop computer… but I'm not sure of the location! Now if the files are not renamed, I could do a file name search to try to locate the hard copy on the desktop. I could then compare them side by side and in case they are the same I could delete the copy I have on the portable hard drive. But if the files have been renamed on either one of the hard drives this would probably not work (depending on how much the new names differ from the original).

If a file is renamed, but not edited, I could calculate its hash value, e.g. SHA1 value is 74e7432df4a66f246b5214d60b190b67e2f6ce52. I would then like to have this value as input when searching for files and have the operating system search through a given directory or the entire file system for files with this exact SHA1 hash value and output a complete list of locations where these files are stored.

I'm using Windows, but I am generally interested in knowing how something like this could be achieved, regardless of operating system.

Best Answer

Linux example:

hash='74e7432df4a66f246b5214d60b190b67e2f6ce52'
find . -type f -exec sh -c '
   sha1sum "$2" | cut -f 1 -d " " | sed "s|^\\\\||" | grep -Eqi "$1"
' find-sh "$hash" {} \; -print

This code is more complex than you would think it should be because:

  • it is intended to correctly handle filenames with spaces, newlines, backslashes, quotations, special characters etc. (change -print to -print0 to parse them further);
  • it is intended to accept hash(es) as regex (compatible with grep -E i.e. egrep),
    e.g. '(^00)|(00$)' will match if the file hash starts or ends with 00.

You can use other *sum tools with compatible interface (e.g. md5sum).

Related Question