How to check which files have been copied to the main hard drive and which have not

applescript, backup, bash, data transfer

I formatted my MacBook's main drive some months ago. To avoid losing data, I created a .dmg image of the hard drive before formatting and saved it to an external device.

Then, some days ago, I started restoring the data by copying most of the backup files back to the main drive.

Now I would like to check whether all of these files have really been copied back to my MacBook. I am therefore looking for an application that can analyze the files on my external drive (in particular their name, hash, etc.) and find the corresponding files on my Macintosh hard drive; any file without a corresponding match should be highlighted. If such an application exists, can you recommend it?

If no application has the features described above, can you suggest a short Bash script or AppleScript that does this? I'm not familiar with either language, but I have a little Batch scripting background, and I was thinking of a few statements (e.g. a for loop, md5, etc.) that generate a list of filenames plus MD5 checksums for both drives and then find the correspondences. Would that work? Can you suggest some examples?


Note: This is not the same as "How to confirm that a file has copied to a new disk without any errors?". The new directory structure on the main drive is somewhat different from the previous one, which is stored on the external drive. Because of this, a standard folder/volume comparison with an rsync dry run is not exactly what I am looking for.

Best Answer

With a bit of Bash and a few standard utilities, it is possible to compare the MD5 checksum of every single file. I will assume here that identical MD5 checksums mean identical content.

Generate an MD5 checksum for every file:

# -exec ... {} + passes file names to md5 safely, even if they contain spaces
find /one/dir -type f -exec md5 {} + > one.txt
find /other/dir -type f -exec md5 {} + > other.txt

Compare the checksums to find what is missing where:

diff -u <(sed 's/.*=//' one.txt | sort) <(sed 's/.*=//' other.txt | sort) > diff.txt
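
If only the differing checksums are needed, `comm` is a possible alternative to the diff step, since it prints the lines unique to each of two sorted inputs directly. A minimal sketch, assuming the listings are in the macOS md5 format (`MD5 (path) = hash`); the helper names `hashes_of` and `only_in_first` are made up for illustration:

```shell
# Hypothetical helper: print the unique checksums from an md5 listing,
# one per line, sorted. Assumes lines look like "MD5 (path) = hash".
hashes_of() {
    sed 's/.*= *//' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: print checksums that appear in the first listing
# but not in the second. Runs in a subshell so temp names do not leak.
only_in_first() (
    tmp1=$(mktemp) && tmp2=$(mktemp) || exit 1
    hashes_of "$1" > "$tmp1"
    hashes_of "$2" > "$tmp2"
    # -2 drops lines unique to the second file, -3 drops common lines
    LC_ALL=C comm -23 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
)
```

For example, `only_in_first one.txt other.txt` would print the checksums that occur only under /one/dir; swapping the arguments gives the other direction.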

Only found in /one/dir:

grep -f <(sed -n 's/^- //p' diff.txt) one.txt

Only found in /other/dir:

grep -f <(sed -n 's/^+ //p' diff.txt) other.txt
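
The steps above could also be combined into a single function that prints every file in the backup whose content is not found anywhere on the main drive, regardless of its name or location. This is only a rough sketch under a few assumptions: the function names (`file_md5`, `checksum_list`, `list_missing`) are invented here, file names are assumed not to contain newlines, and `md5 -q` (macOS) is used only when `md5sum` is not installed.

```shell
# Print only the MD5 hash of one file.
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | cut -d ' ' -f 1
    else
        md5 -q "$1"    # macOS: -q prints just the checksum
    fi
}

# Print "hash<TAB>path" for every regular file below the given directory.
# Assumption: no file name contains a newline.
checksum_list() {
    find "$1" -type f | while IFS= read -r f; do
        printf '%s\t%s\n' "$(file_md5 "$f")" "$f"
    done
}

# Usage: list_missing BACKUP_DIR MAIN_DIR
# Prints the backup files whose checksum does not occur under MAIN_DIR.
list_missing() (
    main_list=$(mktemp) || exit 1
    checksum_list "$2" > "$main_list"
    checksum_list "$1" | awk -F '\t' -v main="$main_list" '
        BEGIN {
            # Load every checksum found on the main drive.
            while ((getline line < main) > 0) {
                split(line, parts, "\t")
                seen[parts[1]] = 1
            }
        }
        # Print the path (everything after the first tab) if unseen.
        !($1 in seen) { print substr($0, index($0, "\t") + 1) }
    '
    rm -f "$main_list"
)
```

It could be called as `list_missing /Volumes/Backup /Users/me > missing.txt`, where both paths are placeholders. Hashing one file per call is slow on large drives, so the batch approach above remains preferable when speed matters.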

Let me know if this works.