Compare an old file and a new file, but ignore lines that exist only in the new file

diff, file-comparison, join, text-processing

I have two files:

  1. oldlist – This contains a list of files and an MD5 hash for each file. It was generated one year ago.
  2. newlist – This also contains a list of files and an MD5 hash for each file. However, some files have been changed (i.e. their MD5 hash is different) and some new files have been added.

I would like to see all differences between oldlist and newlist, but I want to ignore any files which don't exist in oldlist.

That is, I don't care about new files. I only want to compare the md5 hashes for each old file, so that I can see if any files have changed within the last year.

I have tried diff and comm, but have not found a solution yet.

Best Answer

Use join to combine matching lines from the two files. Assuming the file names come after the checksums (as in md5sum output) and don't contain whitespace, this will print all file names that are present in both lists, together with the old checksum and the new checksum:

join -1 2 -2 2 <(sort -k 2 oldlist) <(sort -k 2 newlist)
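To make the output format concrete, here is a small sketch with made-up data. The file names and checksums are hypothetical placeholders, and the sorted lists are written to intermediate files instead of using process substitution so that it also runs in a plain POSIX shell. Setting LC_ALL=C for both sort and join is my assumption, to make sure they agree on the ordering.

```shell
# Hypothetical sample data in md5sum format: "<checksum>  <file name>".
dir=$(mktemp -d)
printf '%s\n' 'aaa111  a.txt' 'bbb222  b.txt' > "$dir/oldlist"
printf '%s\n' 'aaa111  a.txt' 'bbb999  b.txt' 'ccc333  c.txt' > "$dir/newlist"

# join needs both inputs sorted on the join field (field 2, the file name).
LC_ALL=C sort -k 2 "$dir/oldlist" > "$dir/oldlist.sorted"
LC_ALL=C sort -k 2 "$dir/newlist" > "$dir/newlist.sorted"

# Each output line is: file name, old checksum, new checksum.
joined=$(LC_ALL=C join -1 2 -2 2 "$dir/oldlist.sorted" "$dir/newlist.sorted")
printf '%s\n' "$joined"

rm -rf "$dir"
```

This prints "a.txt aaa111 aaa111" and "b.txt bbb222 bbb999". The new file c.txt is absent because it has no match in oldlist.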

To also see new files (those listed only in newlist), pass the -a 2 option to join. Piping the output through awk then removes the file names for which the checksum has not changed:

join -a 2 -1 2 -2 2 <(sort -k 2 oldlist) <(sort -k 2 newlist) |
awk '$2 != $3'
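For the case in the question, where new files should be ignored, the same awk filter can be applied to the join without -a. The sketch below uses the same kind of hypothetical sample data and intermediate sorted files as a portable stand-in for process substitution. Appending an empty string to each field is my addition: it forces awk to compare the checksums as strings, since awk would otherwise compare fields that look like numbers (for example an all-digit checksum) numerically.

```shell
# Hypothetical sample data in md5sum format: "<checksum>  <file name>".
dir=$(mktemp -d)
printf '%s\n' 'aaa111  a.txt' 'bbb222  b.txt' > "$dir/oldlist"
printf '%s\n' 'aaa111  a.txt' 'bbb999  b.txt' 'ccc333  c.txt' > "$dir/newlist"

# Sort both lists on the file name so that join can pair them up.
LC_ALL=C sort -k 2 "$dir/oldlist" > "$dir/oldlist.sorted"
LC_ALL=C sort -k 2 "$dir/newlist" > "$dir/newlist.sorted"

# No -a option: only files present in both lists are joined.
# The awk filter keeps lines whose old and new checksums differ;
# the "" suffix forces a string comparison rather than a numeric one.
changed=$(LC_ALL=C join -1 2 -2 2 "$dir/oldlist.sorted" "$dir/newlist.sorted" |
  awk '$2 "" != $3 ""')
printf '%s\n' "$changed"

rm -rf "$dir"
```

With this sample data the only line printed is "b.txt bbb222 bbb999": a.txt is unchanged and c.txt is new, so both are dropped.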