Ubuntu – Comparing the contents of two directories


I have two directories that should contain the same files and have the same directory structure.

I think that something is missing in one of these directories.

Using the bash shell, is there a way to compare my directories and see if one of them is missing files that are present in the other?

Best Answer

A good way to do this comparison is to hash every file with md5sum (listing the files with find) and then compare the two hash lists with diff.

Example

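Use find to list all the files in the directory, calculate the MD5 hash of each file, sort the output by file name, and write it to a file. Run find from inside the directory (here in a subshell, so your current directory does not change) so that the listed paths are relative and can be compared between the two directories:

(cd /dir1 && find . -type f -exec md5sum {} + | sort -k 2) > dir1.txt

Do the same for the other directory:

(cd /dir2 && find . -type f -exec md5sum {} + | sort -k 2) > dir2.txt

Then compare the two resulting files with diff. Lines that appear in only one list point to files that are missing from the other directory or whose contents differ:

diff -u dir1.txt dir2.txt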

Or as a single command using process substitution:

diff <(cd /dir1 && find . -type f -exec md5sum {} + | sort -k 2) <(cd /dir2 && find . -type f -exec md5sum {} + | sort -k 2)
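To answer the original question directly (which files are missing), the checksum lists can also be reduced to their path column and compared with comm. This is a minimal sketch under a few assumptions: GNU coreutils, file names without newlines or backslashes (md5sum escapes such names, which shifts the columns), and lists generated from inside each directory so that the paths are relative. The sample trees are made up for the demonstration.

```shell
#!/usr/bin/env bash
# List files present in dir1 but missing from dir2, reusing md5sum lists.
# Assumes GNU coreutils and file names without newlines or backslashes
# (md5sum escapes such names, which would shift the columns).
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Sample trees: dir2 lacks sub/b.txt.
mkdir -p "$tmp/dir1/sub" "$tmp/dir2/sub"
echo one > "$tmp/dir1/a.txt"
echo one > "$tmp/dir2/a.txt"
echo two > "$tmp/dir1/sub/b.txt"

# Hash each tree from inside it so both lists use paths like ./sub/b.txt.
(cd "$tmp/dir1" && find . -type f -exec md5sum {} + | sort -k 2) > "$tmp/dir1.txt"
(cd "$tmp/dir2" && find . -type f -exec md5sum {} + | sort -k 2) > "$tmp/dir2.txt"

# Each line is 32 hex digits, two spaces, then the path (column 35 onward).
# comm -23 prints the lines that occur only in its first sorted input.
missing=$(comm -23 <(cut -c35- "$tmp/dir1.txt" | sort) \
                   <(cut -c35- "$tmp/dir2.txt" | sort))
echo "Missing from dir2: $missing"   # prints: Missing from dir2: ./sub/b.txt
```

Swapping -23 for -13 (same argument order) would instead list the files that exist only in dir2.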

If you want to see only the changes:

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ") <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ")

The cut command keeps only the hash (the first field) for diff to compare. Without it, diff reports every line as changed, because the directory prefixes in the paths differ even when the hashes are the same.

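However, this way you won't know which file changed.

For that, you can try something like the following. The sed expression deletes everything from the first space up to the last slash, leaving only the hash and the file's base name, so files with the same name in different subdirectories can no longer be told apart: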

diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /') <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /')
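The effect of that sed expression can be checked on a small sample tree. This sketch (the temporary directories and file names are made up for the demonstration, and it assumes the temporary path contains no spaces) shows that two identical files in different subdirectories reduce to the same line:

```shell
#!/usr/bin/env bash
# Show what sed 's/ .*\// /' does to md5sum output lines: it removes
# everything from the first space up to the last slash, keeping only
# the hash and the base name of each file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Two files with identical contents in different subdirectories.
mkdir -p "$tmp/d/a" "$tmp/d/b"
echo hello > "$tmp/d/a/x"
echo hello > "$tmp/d/b/x"

stripped=$(find "$tmp/d" -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /')
# Both lines now read "<hash> x", so a/x and b/x are indistinguishable.
printf '%s\n' "$stripped"
```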

This strategy is especially useful when the two directories are on different machines: generate the hash list on each machine, copy one list over to the other machine, and compare them to make sure that the files are identical in both directories.
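When the directories live on different machines, one way to do the final comparison is to let md5sum verify a manifest instead of diffing two lists. The sketch below simulates both machines with local temporary directories (in practice you would copy the manifest across, for example with scp); it assumes GNU md5sum, whose -c (--check) and --quiet options it uses. The directory and manifest names are hypothetical.

```shell
#!/usr/bin/env bash
# Verify a copy against a checksum manifest made from the original tree.
# "source" and "copy" stand in for the two machines.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/source/sub" "$tmp/copy/sub"
echo data > "$tmp/source/sub/f1"
echo more > "$tmp/source/f2"
cp "$tmp/source/sub/f1" "$tmp/copy/sub/f1"   # f2 is deliberately not copied

# First machine: record hashes with paths relative to the root.
(cd "$tmp/source" && find . -type f -exec md5sum {} +) > "$tmp/manifest.md5"

# Second machine: check every listed file. --quiet suppresses the OK lines,
# and the exit status is non-zero if a file is missing or differs.
(cd "$tmp/copy" && md5sum -c --quiet "$tmp/manifest.md5") || echo "copy is incomplete or differs"
```

A manifest check only looks at files listed in the manifest, so files that exist only on the second machine go unnoticed; comparing the file lists as well covers that case.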

Another good way to do the job is Git's diff command (this can be noisy when the files have different permissions, since every such file is then listed in the output):

git diff --no-index dir1/ dir2/

Related Solutions

Ubuntu – How to compare two directories recursively and check if one of the directories contains the other

#!/bin/bash

# cmp_dir - program to compare two directories

# Check for required arguments
if [ $# -ne 2 ]; then
  echo "usage: $0 directory_1 directory_2" 1>&2
  exit 1
fi

# Make sure both arguments are directories
if [ ! -d "$1" ]; then
  echo "$1 is not a directory!" 1>&2
  exit 1
fi

if [ ! -d "$2" ]; then
  echo "$2 is not a directory!" 1>&2
  exit 1
fi

# Process each file in directory_1, comparing it to directory_2
missing=0
while IFS= read -r -d $'\0' filename
do
  fn=${filename#"$1"}  # drop the $1 prefix; quoting makes it match literally, not as a glob
  if [ ! -f "$2/$fn" ]; then
      echo "$fn is missing from $2"
      missing=$((missing + 1))
  fi
done < <(find "$1" -type f -print0)

echo "$missing files missing"

Note that I have added double quotes around $1 and $2 in various places above to protect them from shell expansion. Without the double quotes, directory names containing spaces or other special characters would cause errors.

The key loop is:

while IFS= read -r -d $'\0' filename
do
  fn=${filename#"$1"}  # drop the $1 prefix; quoting makes it match literally, not as a glob
  if [ ! -f "$2/$fn" ]; then
      echo "$fn is missing from $2"
      missing=$((missing + 1))
  fi
done < <(find "$1" -type f -print0)

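This uses find to recurse into directory $1 and list the file names. The construction while IFS= read -r -d $'\0' filename; do .... done < <(find "$1" -type f -print0) is safe for all file names: find -print0 separates the names with NUL bytes, which cannot occur in a file name, and IFS= together with -r keeps leading whitespace and backslashes intact.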
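To see why the loop reads NUL-separated names rather than splitting the output of find on whitespace, the following sketch (file names made up for the demonstration) creates one file whose name contains a space and another whose name contains a newline, then counts them both ways:

```shell
#!/usr/bin/env bash
# Compare a NUL-delimited read loop with naive word splitting.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
touch "$tmp/with space.txt" "$tmp/$(printf 'new\nline.txt')"

# -d $'\0' is the same as -d '': read up to each NUL written by -print0.
count=0
while IFS= read -r -d $'\0' filename; do
  count=$((count + 1))
  printf 'found: %q\n' "$filename"
done < <(find "$tmp" -type f -print0)
echo "$count files"   # prints: 2 files

# Word splitting on $(find ...) breaks each of these names into two words.
naive=0
for f in $(find "$tmp" -type f); do naive=$((naive + 1)); done
echo "naive loop saw $naive words"
```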

basename is no longer used because we are looking at files within subdirectories and we need to keep the subdirectories. So, in place of the call to basename, the line fn=${filename#$1} is used. This just removes from filename the prefix containing directory $1.
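The ${var#prefix} form is bash's prefix-removal parameter expansion: it deletes the shortest leading part of the variable's value that matches the given pattern. A minimal sketch with a hypothetical directory name shows why it is safer to quote $1 inside it, since an unquoted value is treated as a glob pattern:

```shell
#!/usr/bin/env bash
# ${var#pattern} removes the shortest leading match of pattern from var.
dir='/data/run[1]'                 # hypothetical directory name
filename='/data/run[1]/sub/a.txt'  # a path as find would print it

quoted=${filename#"$dir"}   # quoted: matched as a literal string
unquoted=${filename#$dir}   # unquoted: [1] is a bracket expression matching "1"
echo "quoted:   $quoted"     # prints: quoted:   /sub/a.txt
echo "unquoted: $unquoted"   # prints the full path unchanged (no match)
```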

Problem 2

Suppose that we match files by name but regardless of directory. In other words, if the first directory contains a file a/b/c/some.txt, we will consider it present in the second directory if a file named some.txt exists in any subdirectory of the second directory. To do this, replace the loop above with:

while IFS= read -r -d $'\0' filename
do
  fn=$(basename "$filename")
  if ! find "$2" -name "$fn" | grep -q . ; then
      echo "$fn is missing from $2"
      missing=$((missing + 1))
  fi
done < <(find "$1" -type f -print0)
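The loop above runs find over the whole second directory once per file, which can be slow on large trees. A possible alternative, sketched here under the assumptions of GNU find (for -printf '%f\n', which prints each file's base name) and file names without newlines, collects the base names of each tree once and compares the two sorted lists with comm. The helper name names and the sample trees are hypothetical:

```shell
#!/usr/bin/env bash
# Report base names found in one tree but absent from every
# subdirectory of the other, using one find per tree.
names() { find "$1" -type f -printf '%f\n' | sort -u; }   # hypothetical helper

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/one/a/b/c" "$tmp/two/elsewhere"
touch "$tmp/one/a/b/c/some.txt" "$tmp/one/other.txt"
touch "$tmp/two/elsewhere/some.txt"

# comm -23: names present only in the first list.
missing=$(comm -23 <(names "$tmp/one") <(names "$tmp/two"))
echo "$missing"   # prints: other.txt
```

Unlike the loop, this reports each missing name once, and it does not treat characters such as * or ? in file names as find -name patterns.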

Ubuntu – Copy files and directories without file contents

From man cp:

--attributes-only don't copy the file data, just the attributes

So, if you want to copy all folders and files in somedirectory without their contents, run:

cp -R --attributes-only somedirectory destinationdirectory
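A quick way to confirm what this does is to copy a small sample tree and inspect the result. The sketch below assumes GNU cp (coreutils), which provides --attributes-only; the directory names mirror the ones above and are otherwise arbitrary:

```shell
#!/usr/bin/env bash
# Copy a tree's structure and files without their data, then inspect it.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/somedirectory/sub"
echo "some content" > "$tmp/somedirectory/sub/file.txt"

# destinationdirectory does not exist yet, so cp -R creates it as the copy.
cp -R --attributes-only "$tmp/somedirectory" "$tmp/destinationdirectory"

# The directory tree and file are recreated, but the file holds no data.
ls -lR "$tmp/destinationdirectory"
```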