SSH – Verify Data Integrity of Tar Output

backup, checksum, data integrity, ssh, tar

I am writing a backup script that creates a tar archive, encrypts it with gpg, and sends the output over ssh to an offsite host. I am now wondering whether, and how, I can verify that all of the data was transmitted successfully. Since the backup is multiple GB in size, there is a reasonable chance the connection might drop midway. Since ssh uses TCP, I am not worried about individual packets being dropped, but what about when the connection is gone for 20 seconds or more?

This is the command I am using:

tar \
    --create \
    ... \
    --listed-incremental="${metadata_file}" \
    --file - \
    "${FILES_TO_BACKUP[@]}" |
gpg --encrypt ... --output - |
ssh -T "${REMOTE_HOST}" "cat > ${output_file}"

After the backup I copy the metadata file as well, and that copy I can verify with a checksum. But what about the backup itself?

One possibility is to:

  1. write the backup to local storage
  2. copy to remote
  3. verify checksum
  4. rm the local backup

However, since I don't have that much storage locally, I'd prefer not to do that if possible.

I am on Linux with Bash >4.0.

Best Answer

You don't have to save the file locally to compute its checksum.

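The idea is to insert the checksum utility of your choice into the pipe after gpg and before ssh, so the checksum is calculated locally while the data streams past, without ever being saved to a local file. To keep the checksum itself from being sent to ssh, use tee with process substitution: tee passes the data on to ssh unchanged and feeds a copy to the checksum utility.

The pipe will then look like this: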

tar ... | gpg ... | tee >(checksum-utility > output-file) | ssh ...

For output-file you may use a real file, or /dev/tty to print the checksum to the terminal.

See, for example, the post "Checksum and pipe an input simultaneously".
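As a rough sketch of how the tee approach could be wrapped up in a script: the helper below is hypothetical (the name stream_with_checksum, its arguments, and the choice of sha256sum are all assumptions, not part of the original post). It also swaps the >( ) process substitution for a named pipe, purely so that the script can wait for the checksum process to finish before the checksum file is read.

```shell
#!/usr/bin/env bash

# Hypothetical helper (an illustration, not from the original post):
#   stream_with_checksum CHECKSUM_FILE SINK_COMMAND [ARGS...]
# Copies stdin to SINK_COMMAND and writes the SHA-256 of the same bytes
# to CHECKSUM_FILE. The body runs in a subshell, so pipefail and the
# variables below do not leak into the caller.
stream_with_checksum() (
    set -o pipefail
    checksum_file=$1
    shift

    # Named pipe used instead of >( ) so the hashing job can be waited on.
    fifo_dir=$(mktemp -d) || exit 1
    fifo="${fifo_dir}/stream"
    mkfifo "${fifo}" || exit 1

    # Background reader: hashes whatever tee writes into the FIFO.
    sha256sum < "${fifo}" | cut -d' ' -f1 > "${checksum_file}" &
    sum_pid=$!

    # tee duplicates the stream: one copy to the FIFO, one to the sink.
    tee "${fifo}" | "$@"
    status=$?

    # Make sure the checksum file is complete before returning.
    wait "${sum_pid}" || status=$?
    rm -rf "${fifo_dir}"
    exit "${status}"
)
```

With the pipeline from the question, the sink would be the ssh command, i.e. tar ... | gpg ... | stream_with_checksum local.sha256 ssh -T "${REMOTE_HOST}" "cat > ${output_file}" (local.sha256 is a placeholder name). Afterwards, assuming sha256sum is also available on the remote host, the output of ssh -T "${REMOTE_HOST}" "sha256sum < ${output_file}" can be compared with the contents of the local checksum file; a mismatch, or a non-zero exit status from the helper, suggests the transfer was cut short.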
