I would go for a low-key solution. If I understand you correctly, you want to update web pages and don't expect most of them to change, in which case I would just upload the changed files, whole.
This could be achieved e.g. in mc: connect one panel over FTP to your web host, and let the other panel show the local version. Then select everything, copy, and choose to overwrite only newer files (you can choose that for all files at once). Or use another file manager's synchronize facility; I believe Krusader has one. Unless you have big files which change only locally (what are they? databases¹? executables maybe, but not compressed ones?), binary deltas won't gain you much IMO.
NOTE 1: Synchronizing databases in this way is a bad idea.
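The "overwrite only newer files" step can also be scripted. A minimal local sketch using cp's update mode, assuming GNU coreutils; the directory names are placeholders, and in practice the "remote" side would be an FTP/SSH mount or a staging checkout:

```shell
# Stand-in directories for the local working copy and the remote site.
mkdir -p /tmp/site-local /tmp/site-remote
echo "new version" > /tmp/site-local/index.html
touch -d "2020-01-01" /tmp/site-remote/index.html   # stale copy on the "remote" side

# -r: recurse, -u: copy only when the source file is newer, -v: show what was copied
cp -ruv /tmp/site-local/. /tmp/site-remote/
```

After this, the stale remote copy has been overwritten, while files that are already up to date are left untouched.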
A good way to do this comparison is to use find with md5sum, then diff.
Example
Use find to list all the files in the directory, then calculate the MD5 hash for each file, and write the output, sorted by file name, to a file:
find /dir1/ -type f -exec md5sum {} + | sort -k 2 > dir1.txt
Do the same procedure for the other directory:
find /dir2/ -type f -exec md5sum {} + | sort -k 2 > dir2.txt
Then compare the two resulting files with diff:
diff -u dir1.txt dir2.txt
Or as a single command using process substitution:
diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2) <(find /dir2/ -type f -exec md5sum {} + | sort -k 2)
If you want to see only the changes:
diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ") <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | cut -f1 -d" ")
The cut command keeps only the hash (the first field), so that is all diff compares. Otherwise diff would report every line, since the directory paths differ even when the hashes are the same.
But you won't know which file changed...
For that, you can try something like
diff <(find /dir1/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /') <(find /dir2/ -type f -exec md5sum {} + | sort -k 2 | sed 's/ .*\// /')
This strategy is very useful when the two directories to be compared are not in the same machine and you need to make sure that the files are equal in both directories.
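The cross-machine case works because the comparison can be made independent of the parent paths. A local sketch of the idea, using relative paths instead of sed; in practice one of the two subshells would run on the other machine, e.g. wrapped in an ssh invocation:

```shell
# Two directory trees with identical content under different parent paths.
mkdir -p /tmp/cmp/a/sub /tmp/cmp/b/sub
echo hello > /tmp/cmp/a/sub/f.txt
echo hello > /tmp/cmp/b/sub/f.txt

# Running find from inside each directory makes the printed paths relative
# (./sub/f.txt), so the second field is directly comparable across machines.
diff <(cd /tmp/cmp/a && find . -type f -exec md5sum {} + | sort -k 2) \
     <(cd /tmp/cmp/b && find . -type f -exec md5sum {} + | sort -k 2) \
  && echo "directories match"
```

Unlike the cut variant, a mismatch here shows both the differing hash and the file it belongs to.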
Another good way to do the job is Git's diff command (this may cause problems when files have different permissions: every such file is then listed in the output):
git diff --no-index dir1/ dir2/
Best Answer
Note that I have added double-quotes around $1 and $2 at various places above to protect them from shell expansion. Without the double-quotes, directory names with spaces or other difficult characters would cause errors.
The key loop now reads:
while IFS= read -r -d $'\0' filename; do
    ....
done < <(find "$1" -type f -print0)
This uses find to recursively dive into directory $1 and collect file names. This construction is safe against all file names. basename is no longer used because we are looking at files within subdirectories and we need to keep the subdirectory part. So, in place of the call to basename, the line
fn=${filename#$1}
is used. This just removes from filename the prefix containing directory $1.
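A self-contained sketch of that loop, with an illustrative body (the directory name here is a placeholder, and the original answer's loop body is elided as "...."):

```shell
#!/bin/bash
# Demo of the file-name-safe loop: iterate over every file under a directory
# and derive its path relative to the top, as in fn=${filename#$1}.
set -- /tmp/loopdemo/                 # "$1" stands in for the first directory
mkdir -p "${1}sub dir"                # a subdirectory with a space in its name
printf 'hi\n' > "${1}sub dir/a file.txt"

while IFS= read -r -d $'\0' filename; do
    fn=${filename#$1}                 # strip the directory prefix, keep subdirectories
    printf 'relative name: %s\n' "$fn"
done < <(find "$1" -type f -print0)
```

The NUL-delimited read is what makes this robust: spaces, tabs, and even newlines in file names cannot split or mangle the loop variable.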
Problem 2
Suppose that we match files by name but regardless of directory. In other words, if the first directory contains a file a/b/c/some.txt, we will consider it present in the second directory if a file named some.txt exists in any subdirectory of the second directory. To do this, replace the loop above with:
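The replacement loop itself is not shown in this excerpt. One possible shape of such a loop (illustrative only, not the original answer's code; the directory names are stand-ins for "$1" and "$2") checks each base name against the whole second tree with find -name:

```shell
#!/bin/bash
# Illustrative sketch: for each file under dir1, report whether a file with
# the same base name exists anywhere under dir2.
dir1=/tmp/match/a; dir2=/tmp/match/b          # stand-ins for "$1" and "$2"
mkdir -p "$dir1/x/y" "$dir2/deep/er"
echo v1 > "$dir1/x/y/some.txt"
echo v2 > "$dir2/deep/er/some.txt"

while IFS= read -r -d $'\0' filename; do
    name=$(basename "$filename")
    # -print -quit stops find at the first match (GNU extension)
    if [ -n "$(find "$dir2" -type f -name "$name" -print -quit)" ]; then
        echo "present in second directory: $name"
    else
        echo "missing from second directory: $name"
    fi
done < <(find "$dir1" -type f -print0)
```

Note that -name treats the base name as a glob pattern, so file names containing characters like * or ? would need escaping in a robust version.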