How to determine the size of a tar archive without creating it


Every night I archive a few directories to LTO-7 tape. Each directory contains about 100 large (2 GB) files.

To check that the data has been written correctly, I verify that the number of bytes tar reports writing matches the number that should have been written.

First I get the size of the archive with a tar "dry run" that streams the archive to wc instead of writing it anywhere:

tar -cP --warning=no-file-changed "$OLDEST_DIR" | wc -c

Then I'm creating the archive with:

tar -cvf /dev/nst0 --warning=no-file-changed --totals "$OLDEST_DIR"

If the two sizes match, I delete the original files.

The problem is that this dry run has to read the entire contents of every file, which can take several hours. Ideally, tar would take the file sizes from the filesystem metadata, add the necessary headers and padding, and report the resulting archive size, rather than thrashing the disk for hours.
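The arithmetic is simple enough to do outside tar. Each member costs one 512-byte header, regular-file data is rounded up to a multiple of 512 bytes, the archive ends with two zero-filled 512-byte blocks, and GNU tar pads the whole archive to a multiple of the record size (20 × 512 = 10240 bytes by default). Below is a minimal bash sketch of that calculation. It assumes GNU find and GNU tar's defaults (gnu format, blocking factor 20). `estimate_tar_size` is a hypothetical helper, not a tar option. The rule that names over 100 bytes get an extra `././@LongLink` header is also an assumption. Hard links, sparse files, sockets, long symlink targets and names containing newlines are not modelled, so check it against a real `tar -c ... | wc -c` on a small test directory before relying on it.

```shell
#!/usr/bin/env bash
# estimate_tar_size DIR (hypothetical helper): estimate the size of `tar -c DIR`
# from file metadata only, without reading any file contents.
# Assumptions: GNU find, GNU tar "gnu" format, default blocking factor 20;
# no hard links, sparse files, sockets, long symlink targets or newlines in names.
estimate_tar_size() {
    # %y = file type (f, d, l, ...), %s = size in bytes, %p = path
    find "$1" -printf '%y %s %p\n' |
    LC_ALL=C awk '
        # Round n up to the next multiple of the 512-byte tar block
        function pad512(n) { return int((n + 511) / 512) * 512 }
        {
            type = $1; size = $2
            # The path is everything after "TYPE SIZE " (it may contain spaces)
            name = substr($0, length($1) + length($2) + 3)
            if (type == "d") name = name "/"    # tar stores directories as "dir/"
            total += 512                         # one header block per member
            # Assumed GNU-format rule: long names need an extra LongLink header
            if (length(name) > 100) total += 512 + pad512(length(name) + 1)
            if (type == "f") total += pad512(size)   # data blocks, regular files only
        }
        END {
            total += 1024                        # two zero blocks mark end of archive
            record = 20 * 512                    # default blocking factor 20
            # Pad to a whole record; %.0f avoids 32-bit %d limits in some awks
            printf "%.0f\n", int((total + record - 1) / record) * record
        }'
}
```

Run it with the same path argument you give tar, e.g. `estimate_tar_size "$OLDEST_DIR"`, and compare the result with the number on the `Total bytes written:` line that `--totals` prints. Because it only stats files, it should finish in seconds rather than hours.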

Using du -s or similar doesn't work because the sizes don't quite match. For example, the filesystem reports a directory as 4096 bytes, whereas tar stores it with 0 bytes of data.

Alternatively, is there a better way of checking that the archive has been written correctly? I can't trust tar's return code, since I'm suppressing certain warnings (to work around some sort of bug with tar/mdraid).

Best Answer

If you add an extra v to the tar command that writes to the drive (tar -cvvf ...), it will list the size of each file. You could parse that listing and compare the sizes against the files on disk, without having to read all the files twice.
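A sketch of that parsing step in bash. It assumes GNU tar's default `-vv` line layout (permissions, owner/group, size, date, time, name) and GNU find. `compare_sizes` is a hypothetical helper name. It compares regular files only, and assumes there are no hard links and no names that tar escapes in its listing (for example, names containing newlines or backslashes).

```shell
#!/usr/bin/env bash
# compare_sizes LISTING DIR (hypothetical helper)
#   LISTING: stdout saved from `tar -cvvf ... DIR`
#   DIR:     the same path argument that was given to tar
# Assumes GNU tar's default -vv line layout, e.g.
#   -rw-r--r-- user/group      1000 2024-01-01 12:00 DIR/file
# Regular files only; no hard links, no escaped names.
# Prints any mismatches (via diff) and returns non-zero if there are any.
compare_sizes() {
    local listing=$1 dir=$2
    diff \
        <(LC_ALL=C awk '/^-/ {
              size = $3
              # Drop perms, owner/group, size, date and time; the rest is the name
              sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ /, "")
              print size, $0
          }' "$listing" | LC_ALL=C sort) \
        <(find "$dir" -type f -printf '%s %p\n' | LC_ALL=C sort)
}
```

For example, save the listing with `tar -cvvf /dev/nst0 --warning=no-file-changed --totals "$OLDEST_DIR" > listing.txt` and then run `compare_sizes listing.txt "$OLDEST_DIR"`. The find side reads only metadata, so the check is quick. As the answer says, it only confirms the sizes, not the contents.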

You have to realise that this is no substitute for proper verification, and the only real verification of a backup is a restore... Note that LTO drives verify writes as they go, so you're not quite driving blind here. But simply relying on file size comparisons doesn't tell you all that much!
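If you want something closer to a restore without writing anything back to disk, GNU tar's `--compare` (`-d`) mode reads an archive and compares each member with the file on disk, including its contents. It exits with status 1 if any file differs. Below is a minimal sketch with `verify_archive` as a hypothetical wrapper. It must run from the same working directory as the original `tar -c`. For a tape, positioning /dev/nst0 back at the start of the archive (for example with `mt`) is not shown, because it depends on how many archives are on the tape. It also re-reads both the tape and the source files in full, so expect it to take about as long as the original write.

```shell
#!/usr/bin/env bash
# verify_archive ARCHIVE (hypothetical wrapper around GNU tar's --compare / -d).
# Reads ARCHIVE and compares every member with the file on disk
# (size, mode, owner, modification time and contents).
# Run from the directory the archive was created in.
# GNU tar exit status: 0 = no differences, 1 = some files differ, 2 = fatal error.
verify_archive() {
    local archive=$1
    tar --compare --file="$archive"
}
```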

I'd actually strongly recommend using proper backup software, such as Bacula which is ideally suited for tape backups. Once you've set it up, it will take care of your verifications for you.
