Ubuntu – Comparing the contents of two directories

command line

I have two directories that should contain the same files and have the same directory structure.

I think that something is missing in one of these directories.

Using the bash shell, is there a way to compare my directories and see if one of them is missing files that are present in the other?

Best Answer

A good way to do this comparison is to use find with md5sum, then a diff.

Example

Use find to list all the files in the first directory, calculate the MD5 hash of each file, and write the list, sorted by file name, to a file:

find /dir1/ -type f -exec md5sum {} + | sort -k 2 > dir1.txt

Do the same for the other directory:

find /dir2/ -type f -exec md5sum {} + | sort -k 2 > dir2.txt

Then compare the two resulting files with diff:

diff -u dir1.txt dir2.txt

Or as a single command using process substitution:

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2) <(find /dir2/ -type f -exec md5sum {} + | sort -k 2)

If you want to see only the changes, compare just the hashes:

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ") <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ")

The cut command keeps only the hash (the first field) for diff to compare. Without it, diff reports every line, because the directory paths differ even when the hashes are the same.
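To make the effect of cut concrete, here is a minimal sketch that runs it on a single line in md5sum's output format. The helper name hash_only and the sample path are made up for illustration; the hash is the MD5 of empty input.

```shell
# hash_only: read md5sum-style lines ("<hash>  <path>") on stdin
# and print only the hash, i.e. the first space-separated field.
# (hash_only is a hypothetical helper name, not part of any tool.)
hash_only() {
    cut -f1 -d" "
}

# One sample line; /dir1/a.txt is an illustrative path only.
printf '%s  %s\n' d41d8cd98f00b204e9800998ecf8427e /dir1/a.txt | hash_only
# prints: d41d8cd98f00b204e9800998ecf8427e
```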

However, this output does not tell you which file changed.

For that, you can strip the directory part of each path with sed, leaving only the hash and the bare file name:

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /') <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /')
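Note that the sed expression keeps only the base name, so two files with the same name in different subdirectories look alike in the output. One possible alternative, sketched here under the assumption of GNU md5sum and a shell that supports local (such as bash), is to run find from inside each directory so that both listings carry the same relative paths. The function names hash_tree and compare_trees are hypothetical.

```shell
# hash_tree DIR: print "<md5>  ./relative/path" for every regular file
# under DIR, sorted by path. Runs in a subshell so the caller's working
# directory is left unchanged. (hash_tree is a hypothetical helper name.)
hash_tree() {
    ( cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2 )
}

# compare_trees DIR1 DIR2: diff the two listings.
# Returns diff's exit status: 0 if the trees match, 1 if they differ.
compare_trees() {
    local list1 list2 status=0
    list1=$(mktemp) || return 2
    list2=$(mktemp) || { rm -f "$list1"; return 2; }
    hash_tree "$1" > "$list1"
    hash_tree "$2" > "$list2"
    diff -u "$list1" "$list2" || status=$?
    rm -f "$list1" "$list2"
    return "$status"
}

# Usage (paths as in the examples above):
#   compare_trees /dir1 /dir2
```

Because the paths are relative, a file that exists on only one side shows up as a single added or removed line, and a file whose content changed shows up as a removed line plus an added line with the same path.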

The hash-based strategy is very useful when the two directories are not on the same machine and you need to make sure that the files in both are identical.
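For the two-machine case, one way this could look is to write a checksum list on the first machine, copy it across, and let md5sum verify it on the second. This is a minimal sketch assuming GNU md5sum (for the -c and --quiet options); the function names and the manifest file are hypothetical, and how the file is copied between machines is left out.

```shell
# make_manifest DIR: on the first machine, print "<md5>  ./relative/path"
# for every regular file under DIR. (Hypothetical helper name.)
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2 )
}

# verify_manifest DIR MANIFEST: on the second machine, check the files
# under DIR against the list. The manifest is opened before the cd, so a
# relative MANIFEST path still works. With --quiet, only files that are
# missing or differ are reported; the exit status is non-zero if any are.
verify_manifest() {
    ( cd "$1" && md5sum -c --quiet ) < "$2"
}

# Usage:
#   make_manifest /dir1 > manifest.md5      # first machine
#   verify_manifest /dir2 manifest.md5      # second machine, after copying
```

This check is one-directional: it reports files from the list that are missing or changed on the second machine, but not extra files that exist only there.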

Another good way to do the job is Git’s diff command. It may cause problems when files have different permissions, because every file is then listed in the output:

git diff --no-index dir1/ dir2/