Resumable archive

Tags: archive, tar

I've a need to archive some big directory structures on an NFS server. They're not likely to be needed particularly soon, if ever – they're just being retained for policy reasons.

To this end, I'm making tarballs, and probably eventually writing them out to tape.

There's just one problem: I'm having a bit of difficulty with the really large volumes (10TB+). The runtime is long enough that the job gets left overnight, and in a few cases it seems to have 'stalled', though that's not easy to tell with a backgrounded tar cvfz.

And then things like running out of space, network interruptions, etc. mean that for archives which don't complete in a single session, I'm not entirely sure the result is a) complete and b) entirely valid.

So I'm hoping for some advice. Ideally I'd like something resumable, like rsync, so that I can make the copy in multiple passes without starting over.

Is there a way to "rsync to a tar.gz"?
And is there a not-too-expensive way of verifying what was written? I'm currently looking at 'extract, shasum and compare', but that's also a rather expensive/intensive process.
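
For illustration, the 'extract, shasum and compare' check I have in mind is roughly the following (paths are just examples, and it assumes the tarball was made from /your/dir so its contents land under your/dir when extracted):

# extract into a scratch area
mkdir -p /tmp/verify
tar xzf backup.tar.gz -C /tmp/verify
# checksum both trees and compare
( cd /your/dir && find . -type f -exec sha256sum {} + | sort -k 2 ) > /tmp/original.sums
( cd /tmp/verify/your/dir && find . -type f -exec sha256sum {} + | sort -k 2 ) > /tmp/extracted.sums
diff /tmp/original.sums /tmp/extracted.sums

That reads every file twice and writes out a full second copy, which is why I'm hoping there's something cheaper.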

Best Answer

Perhaps splitting your backups would be a step towards addressing your problem?

tar cvzf - /your/dir/ | split --bytes=1000MB - backup.tar.gz
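
To restore (or just to check that the pieces go back together into a readable archive), concatenate the split parts in order; the aa, ab, ... suffixes that split generates sort correctly, so something like this should work:

cat backup.tar.gz* | tar tzf - > /dev/null    # read-only pass to test that the stream is intact
cat backup.tar.gz* | tar xvzf -               # actual restore

(This assumes the pieces keep their backup.tar.gzaa, backup.tar.gzab, ... names and nothing else in the directory matches that glob.)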

Or you can look into dar, perhaps. It has splitting built-in: http://dar.linux.free.fr/doc/Features.html
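
As a rough sketch of what a sliced dar run looks like (option names from memory, so check your dar man page; the paths and the 1G slice size are just examples):

# write /backups/mybackup.1.dar, /backups/mybackup.2.dar, ... as gzip-compressed 1G slices
dar -c /backups/mybackup -R /your/dir -s 1G -z
# read the slices back and test the archive's coherence
dar -t /backups/mybackup

dar also keeps a catalogue of the archive's contents, which is what the resume/differential approach below relies on.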

I also found some info on how to resume an interrupted dar backup job that might help:

http://sourceforge.net/p/dar/mailman/message/30863378/
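
I can't reproduce that thread here, but the usual shape of a multi-pass dar run is a differential backup that references the earlier archive, so only files that aren't already saved get archived on the second pass. Roughly (again, option names from memory; -A names the reference archive):

# second pass: archive only what isn't already in /backups/mybackup
dar -c /backups/mybackup_part2 -A /backups/mybackup -R /your/dir -s 1G -z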
