Merge nonzero blocks of huge (sparse) file A into huge file B

Tags: binary, files, large-files

I have two partial disk images from a failing hard drive. File B contains the bulk of the disk's contents, with gaps where sector reads failed. File A is the result of telling ddrescue to retry all the failed sectors, so it is almost entirely gaps, but contains a few places where rereads succeeded. I now need to merge the interesting contents of File A back into File B. The algorithm is simple:

while not eof(A):
    read 512 bytes from A
    if any of them are nonzero:
        seek to corresponding offset in B
        write bytes into B

and I could sit down and write this myself, but I would first like to know if someone else has already written and debugged it.
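(For reference, the loop above is only a few lines of real code. Here is a sketch in Python; the file names at the bottom are placeholders, and nothing here has been tested against real disk images.)

```python
# Sketch of the merge loop above. The 512-byte block size matches the
# sector size; adjust it if your images use another unit.
BLOCK = 512

def merge_nonzero(src_path, dst_path, block=BLOCK):
    """Write every block of src that contains a nonzero byte into dst
    at the same offset, leaving the rest of dst untouched."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        offset = 0
        while True:
            chunk = src.read(block)
            if not chunk:          # eof(A)
                break
            if any(chunk):         # any nonzero byte in this block
                dst.seek(offset)   # seek to corresponding offset in B
                dst.write(chunk)   # write bytes into B
            offset += len(chunk)

# Usage (paths are placeholders for your two images):
# merge_nonzero("A", "B")
```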

(To complicate matters, due to limited space, File B and File A are on two different computers — this is why I didn't just tell ddrescue to attempt to fill in the gaps in B in the first place — but A can be transferred over the network relatively easily, being sparse.)

Best Answer

Your algorithm is implemented in GNU dd.

dd bs=512 if=A of=B conv=sparse,notrunc

Please verify this beforehand with some test files of your choice; you don't want to inadvertently damage your File B. A safer algorithm would also check that B has zeroes at the target position before writing, but that is something dd cannot do.
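One way to do that dry run, assuming GNU dd is on your PATH; the file names and contents below are made up for the test:

```python
# Build a tiny A and B, run the dd command from the answer, and check
# that only the nonzero block of A lands in B.
import os, subprocess, tempfile

d = tempfile.mkdtemp()
a, b = os.path.join(d, "A"), os.path.join(d, "B")

# B: three 512-byte blocks of 0xBB (stands in for the bulk image).
with open(b, "wb") as f:
    f.write(b"\xbb" * 1536)
# A: all zeroes except the middle block (a successful reread).
with open(a, "wb") as f:
    f.write(bytes(512) + b"\xaa" * 512 + bytes(512))

subprocess.run(
    ["dd", "bs=512", f"if={a}", f"of={b}", "conv=sparse,notrunc"],
    check=True, stderr=subprocess.DEVNULL)

with open(b, "rb") as f:
    merged = f.read()
# Only the middle block should have been overwritten.
assert merged == b"\xbb" * 512 + b"\xaa" * 512 + b"\xbb" * 512
```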

As for two different computers, you have several options. Use a network filesystem that supports seeks on writes (not all do); transfer the file beforehand; or pipe through SSH like so:

dd if=A | ssh -C B-host dd of=B conv=sparse,notrunc
# or the other way around
ssh -C A-host dd if=A | dd of=B conv=sparse,notrunc

The ssh -C option enables compression; without it you'd be transferring gigabytes of zeroes over the network.