Shell – md5 hash only first 512 bytes of file

data hashsum shell-script

Background

I am about to migrate files from my old NAS to a new one and want to verify data integrity. The old NAS (Debian) uses the ext3 file system, while the new one (FreeNAS) is based on ZFS. To speed up the integrity validation, I am trying a triage approach:

  • first, validate all file sizes
  • second, MD5-hash the first 512 bytes of each file
  • last, MD5-hash the entire file

The idea is that the first two steps filter out obviously corrupted files much more quickly than running md5 over terabytes of files in bulk.
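The first triage step, comparing file sizes, could be sketched as below. This is only an illustration: the function name size_manifest is made up, and it assumes GNU stat (-c) on the Debian side and BSD stat (-f) on the FreeNAS side. Sorting under LC_ALL=C is a choice made here so that both machines order the paths byte by byte regardless of locale.

```shell
#!/bin/sh
# size_manifest DIR (hypothetical helper name)
# Prints "<size in bytes>  <path>" for each regular file under DIR,
# sorted by path so two listings can be compared line by line.
size_manifest() {
    if stat -c '%s' / >/dev/null 2>&1; then
        # GNU stat: %s is the size in bytes, %n the file name.
        find "$1" -type f -exec stat -c '%s  %n' {} +
    else
        # Assumption: BSD stat, where %z is the size and %N the file name.
        find "$1" -type f -exec stat -f '%z  %N' {} +
    fi | LC_ALL=C sort -k 2
}
```

Run from the top of the share on each NAS (for example size_manifest . > sizes.txt), the two listings can then be compared with diff.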

Question

I have constructed a bash command on my Linux NAS that MD5-hashes every file in a directory tree and sorts the output by file name to ensure a deterministic order.

# find somedir -type f -exec md5sum {} \; | sort -k 2
18409feb00b6519c891c751fe2541fdc  /somedir/bar.txt
12e761f96223145aa63f4f48f252d7fb  /somedir/foo.txt

How can I modify the command above to MD5-hash only the first 512 bytes of each file?

Best Answer

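You can use dd to pipe only the first 512 bytes to md5sum. Because md5sum then reads from standard input, it no longer knows the filename and prints - in its place, so keep only the hash and print the filename next to it yourself. Passing the filename to sh as an argument ("$1"), rather than embedding {} in the script text, keeps names containing spaces or quotes intact. Note also that sort -k takes a field number, not a character offset, so -k 2 is what sorts on the filename.

find . -type f -exec sh -c 'printf "%s  %s\n" "$(dd if="$1" bs=512 count=1 2>/dev/null | md5sum | cut -d " " -f 1)" "$1"' sh {} \; | sort -k 2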
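Since the listing has to be produced on both the Debian and the FreeNAS side, the same idea might be wrapped in a small function like the sketch below. The name head_md5_manifest is invented for this example. The fallback to md5 -q assumes a FreeBSD-style md5 tool on systems without md5sum, and LC_ALL=C is set so that both machines sort the paths identically.

```shell
#!/bin/sh
# head_md5_manifest DIR (hypothetical helper name)
# Prints "<md5 of first 512 bytes>  <path>" for each regular file under DIR,
# sorted by path so the listing is deterministic.
head_md5_manifest() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            # dd copies at most one 512-byte block to stdout.
            if command -v md5sum >/dev/null 2>&1; then
                # md5sum prints "<hash>  -" for stdin; cut keeps the hash.
                hash=$(dd if="$f" bs=512 count=1 2>/dev/null | md5sum | cut -d " " -f 1)
            else
                # Assumption: FreeBSD-style md5, where -q prints only the hash.
                hash=$(dd if="$f" bs=512 count=1 2>/dev/null | md5 -q)
            fi
            printf "%s  %s\n" "$hash" "$f"
        done
    ' sh {} + | LC_ALL=C sort -k 2
}
```

Running it from the top of the share on each machine (for example head_md5_manifest . > head512.txt) yields relative paths, so the two files can be compared directly with diff. Files shorter than 512 bytes are simply hashed in full.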