#!/bin/bash
# cmp_dir - program to compare two directories

# Check for required arguments
if [ $# -ne 2 ]; then
    echo "usage: $0 directory_1 directory_2" 1>&2
    exit 1
fi

# Make sure both arguments are directories
if [ ! -d "$1" ]; then
    echo "$1 is not a directory!" 1>&2
    exit 1
fi
if [ ! -d "$2" ]; then
    echo "$2 is not a directory!" 1>&2
    exit 1
fi

# Process each file in directory_1, comparing it to directory_2
missing=0
while IFS= read -r -d $'\0' filename
do
    fn=${filename#$1}
    if [ ! -f "$2/$fn" ]; then
        echo "$fn is missing from $2"
        missing=$((missing + 1))
    fi
done < <(find "$1" -type f -print0)
echo "$missing files missing"
Note that I have added double quotes around $1 and $2 at various places above to protect them from word splitting and filename expansion by the shell. Without the double quotes, directory names with spaces or other difficult characters would cause errors.
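To make the effect of the quotes concrete, here is a minimal sketch. The helper count_args and the directory name "my photos" are made up for the demonstration and are not part of cmp_dir.

```shell
#!/bin/bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

dir="my photos"                  # assumed example name containing a space

unquoted=$(count_args $dir)      # word splitting turns this into two arguments
quoted=$(count_args "$dir")      # the quotes keep it as a single argument

echo "unquoted: $unquoted, quoted: $quoted"
```

With the default IFS, the unquoted call sees two arguments ("my" and "photos"), which is why a test such as [ ! -d $1 ] would fail on such a name.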
The key loop now reads:
while IFS= read -r -d $'\0' filename
do
    fn=${filename#$1}
    if [ ! -f "$2/$fn" ]; then
        echo "$fn is missing from $2"
        missing=$((missing + 1))
    fi
done < <(find "$1" -type f -print0)
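This uses find to descend recursively into directory $1 and list the regular files it contains. The construction while IFS= read -r -d $'\0' filename; do .... done < <(find "$1" -type f -print0) is safe against all file names: find -print0 terminates each name with a NUL character, the one character that cannot occur in a path, and read -d $'\0' splits the input on it. IFS= and -r keep leading or trailing whitespace and backslashes in the names intact.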
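The difference from a plain line-based read shows up as soon as a name contains a newline. The following is a small sketch that assumes bash and a scratch directory created with mktemp; the file names are invented for the demonstration. It writes -d '' which bash treats the same way as -d $'\0'.

```shell
#!/bin/bash
# Scratch directory so the demo does not touch real data.
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/with space.txt" "$tmp"/$'new\nline.txt'

# Line-based reading: the embedded newline splits one name into two records.
by_line=0
while IFS= read -r name; do
    by_line=$((by_line + 1))
done < <(find "$tmp" -type f)

# NUL-delimited reading: exactly one record per file.
by_nul=0
while IFS= read -r -d '' name; do
    by_nul=$((by_nul + 1))
done < <(find "$tmp" -type f -print0)

echo "line-based: $by_line, NUL-delimited: $by_nul"
rm -rf "$tmp"
```

Three files exist, but the line-based loop sees four records, while the NUL-delimited loop sees three.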
basename is no longer used because we are looking at files within subdirectories and need to keep the subdirectory part of each path. So, in place of the call to basename, the line fn=${filename#$1} is used. This just removes the leading directory $1 from filename, leaving the path of the file below $1.
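A short sketch of what the prefix removal does, using assumed example values in place of $1 and the names produced by find. It also shows one caveat: when the variable is left unquoted inside the expansion, the shell treats its value as a glob pattern, so quoting it there is the more defensive form for directory names that contain characters such as [ or *.

```shell
#!/bin/bash
# Assumed example values; in cmp_dir the directory comes from "$1".
dir="photos"
filename="photos/2020/trip/a.jpg"

rel=${filename#"$dir"}      # strip the leading directory: /2020/trip/a.jpg
base=${filename##*/}        # what basename would give: a.jpg

# A directory name that contains glob characters.
gdir="data[1]"
gfile="data[1]/x.txt"
unquoted=${gfile#$gdir}     # pattern data[1] only matches "data1": nothing is stripped
quoted=${gfile#"$gdir"}     # literal match: /x.txt

printf '%s\n' "$rel" "$base" "$unquoted" "$quoted"
```

The leading slash left in the result is harmless in a test such as [ -f "$2/$fn" ], because a doubled slash inside a path is treated like a single one.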
Problem 2
Suppose that we match files by name but regardless of directory. In other words, if the first directory contains a file a/b/c/some.txt, we will consider it present in the second directory if a file named some.txt exists anywhere under the second directory. To do this, replace the loop above with:
while IFS= read -r -d $'\0' filename
do
    fn=$(basename "$filename")
    if ! find "$2" -name "$fn" | grep -q . ; then
        echo "$fn is missing from $2"
        missing=$((missing + 1))
    fi
done < <(find "$1" -type f -print0)
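The loop above runs find once per file, and -name interprets its argument as a pattern, so a name such as a*b.txt would also be "found" if the second directory only holds axxb.txt. As a possible alternative, and not the method of the answer itself, the sketch below indexes the base names under the second directory once and then looks each name up literally. It assumes bash 4 or later for associative arrays; count_missing_by_name is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: count files under dir1 whose base name does not occur under dir2.
# Assumes bash 4+ (associative arrays). The function name is hypothetical.
count_missing_by_name() {
    local dir1=$1 dir2=$2 path key missing=0
    local -A seen=()

    # Index every base name that occurs anywhere under dir2.
    while IFS= read -r -d '' path; do
        key=${path##*/}
        seen[$key]=1
    done < <(find "$dir2" -type f -print0)

    # Look up each base name from dir1 as a literal string.
    while IFS= read -r -d '' path; do
        key=${path##*/}
        if [[ -z ${seen[$key]+x} ]]; then
            missing=$((missing + 1))
        fi
    done < <(find "$dir1" -type f -print0)

    echo "$missing"
}
```

It would be called as count_missing_by_name "$1" "$2". Unlike the loop above, it considers only regular files in the second directory and prints just the count, so the per-file message would have to be added if it is wanted.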
Best Answer
Use tree:
Or you can combine find and awk:
Or, generalizing:
If you don't mind concise output:
This will count one more directory than tree; the starting directory is not counted in tree's output.
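The off-by-one can be seen with find alone. This is a minimal sketch and not the command from the answer. It assumes a find that supports -mindepth (GNU and BSD find do; it is not in POSIX) and uses a throwaway directory tree made up for the demonstration.

```shell
#!/bin/bash
# Throwaway tree: two top-level directories, one nested directory, two files.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b" "$tmp/c"
touch "$tmp/a/f1" "$tmp/a/b/f2"

# $(( ... )) strips the padding some wc implementations add.
# Counting lines assumes no name contains a newline.
dirs_with_start=$(( $(find "$tmp" -type d | wc -l) ))              # includes "$tmp" itself
dirs_like_tree=$(( $(find "$tmp" -mindepth 1 -type d | wc -l) ))   # excludes the starting directory
files=$(( $(find "$tmp" -type f | wc -l) ))

echo "$dirs_with_start directories including the start, $dirs_like_tree without it, $files files"
rm -rf "$tmp"
```

Here the plain find count is 4 directories, while excluding the starting directory gives 3, the figure a tree-style summary would report.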