Bash – md5sum of files in two folders


I'm trying to compare all the files in two folders via md5sum in one command. Something like the following (bash) on Debian:

$ cd ~/FOLDER1
$ md5sum ~/FOLDER2/* | md5sum -c -

The idea is that the hashes output by the first md5sum are piped into the second one and used as its input file. However, testing shows that it just compares each file in FOLDER2 to itself and returns "OK" for each one. I think this fails because the filenames output by the first md5sum include the full path, so the second md5sum simply re-reads the same files. I've looked at md5deep but have not found anything there to help me. I know that it is possible to run md5sum for one folder, write the results to a file, and then use that file as the input for the second md5sum. I wanted to do it all in one line through a pipe, rather than using two commands and writing out a file.
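For illustration, here is a minimal sketch of how the full-path problem described above could be avoided within a single pipe. It assumes GNU md5sum, and the function name check_against is made up for this example. Each side of the pipe runs in its own subshell and changes into its folder first, so the hash list contains bare file names and the checking side resolves them in the other folder.

```shell
#!/usr/bin/env bash
# check_against is a hypothetical helper name, not part of md5sum.
# $1: folder whose files are hashed; $2: folder whose files are verified.
check_against() {
    # Left subshell: hash the files in $1 using bare names (no path prefix).
    # Right subshell: read that list on stdin and verify the same names in $2.
    (cd "$1" && md5sum -- *) | (cd "$2" && md5sum -c -)
}
```

Calling check_against ~/FOLDER1 ~/FOLDER2 should print one "name: OK" or "name: FAILED" line per file in FOLDER1. Subdirectories matched by * make md5sum print an error, so this sketch only suits flat folders.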

Edit: The accepted answer here (using diff) might do what I want, but I don't know if diff (correctly) compares binary files.

Edit: To get the output I wanted using md5sum (which shows the filename and "OK"), I've resorted to writing a shell script. Execute it with ~/FOLDER1 ~/FOLDER2 as arguments.

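# Remember the starting directory so we can return to it at the end.
HERE=$PWD
cd "$1" || exit 1
# Hash every file in the first folder, using bare file names.
md5sum * > /tmp/md5sum.cmp
cd "$2" || exit 1
# Verify the same file names in the second folder against those hashes.
md5sum -c /tmp/md5sum.cmp
cd "$HERE"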

This script only compares files that are present in ~/FOLDER1. If ~/FOLDER2 has additional files, they are not compared, and nothing in the output indicates that they even exist.
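One way to cover that gap is a separate pass that lists the names found only in the second folder. This is a rough sketch, not part of the script above; the function name extra_files is invented for the example, and it only looks at names, not at file contents.

```shell
#!/usr/bin/env bash
# extra_files is a hypothetical helper name.
# Prints the name of every entry in $2 that has no counterpart in $1.
extra_files() {
    for f in "$2"/*; do
        # Skip the unexpanded pattern when $2 is empty.
        [ -e "$f" ] || continue
        name=$(basename "$f")
        # Report names that do not exist in the first folder.
        [ -e "$1/$name" ] || printf '%s\n' "$name"
    done
}
```

For example, extra_files ~/FOLDER1 ~/FOLDER2 prints nothing when every name in FOLDER2 also exists in FOLDER1.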

Best Answer

You can use process substitution to pass the output of the two md5sum runs to diff. Diff is fine in this case because it compares the md5 output, which is plain text, rather than the binary files themselves. Something like:

diff <(md5 ~/FOLDER1/* | awk '{print $4}') <(md5 ~/FOLDER2/* | awk '{print $4}')

Sorry, I don't have Debian here and can't test this on it. The above was tested on OS X, which has md5 rather than md5sum, so the output format may differ slightly. On OS X the 4th column of md5's output is the actual MD5 sum, which is why I take just that column.
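On Debian, a comparable sketch with GNU md5sum might look like the following. This is an untested-on-your-system adaptation, not the command from above: GNU md5sum prints the hash first and the file name second, so no awk step is needed, and changing into each folder in a subshell keeps the names relative so that diff compares both hash and name. The function name compare_dirs is made up for the example.

```shell
#!/usr/bin/env bash
# compare_dirs is a hypothetical helper name, not part of md5sum or diff.
compare_dirs() {
    # Each subshell changes into its folder so md5sum prints bare file names,
    # which makes the two listings directly comparable line by line.
    diff <(cd "$1" && md5sum -- *) <(cd "$2" && md5sum -- *)
}
```

compare_dirs ~/FOLDER1 ~/FOLDER2 prints nothing and returns 0 when both folders hold the same names with the same contents. A file that exists in only one folder shows up as an added or removed line in the diff output.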

Instead of awk, you could also use cut, but you may need to change the separator to get the 4th column (these are not tab-separated).
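As a sketch of the cut variant with GNU md5sum (an assumption, since the command above uses OS X md5): the hash is the first field and is followed by a space, so a single space works as the delimiter. The function name hashes_only is invented for this example.

```shell
#!/usr/bin/env bash
# hashes_only is a hypothetical helper name.
# Prints only the MD5 hashes of the files in folder $1, one per line.
hashes_only() {
    # -d ' ' splits on spaces; -f 1 keeps the hash and drops the file name.
    (cd "$1" && md5sum -- *) | cut -d ' ' -f 1
}
```

The two lists can then be compared with diff <(hashes_only ~/FOLDER1) <(hashes_only ~/FOLDER2). Because the file names are dropped, this only checks that the hashes match in glob order, not that each hash belongs to the same name in both folders.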
