Archive big data into multiple parts

archive gzip rhel tar

I'm working with big data and need to archive a directory that is larger than 64 terabytes. I cannot create such a large file (archive) on my file system. Unfortunately, all the solutions I have found for creating a multi-part archive on Linux suggest creating the archive first and then splitting it into smaller files with the split command.

I know that it is possible with, e.g., 7zip, but unfortunately I'm restricted to the tools built into Red Hat 6 – tar, gzip, bzip2…

I was considering a script that would ask the user for a maximum volume size, compress every single file with gzip, split those files that are too big, and then merge the results into multiple tars of the chosen size. Is that a good idea?

Is there any other way to produce a split archive with basic Linux commands?

UPDATE:

I've tested the solution on a filesystem with a restricted maximum file size and it worked. The pipe that redirects the tar output directly into the split command behaves as intended:

tar -czf - HugeDirectory | split --bytes=100GB - MyArchive.tgz.

The created parts stay below the size limit, and merging them back together during extraction never creates an oversized file:

cat MyArchive.tgz* | tar -xzf -
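The whole round trip can be sketched at a small scale so it runs anywhere; HugeDirectory and MyArchive are made-up names, and 1MB parts stand in for the real 100GB:

```shell
# Round-trip sketch of the pipeline above, scaled down for a quick test.
set -e
workdir=$(mktemp -d)
cd "$workdir"

mkdir HugeDirectory
head -c 3M /dev/urandom > HugeDirectory/data.bin

# Archive and split in one pipeline; no full-size archive ever exists.
tar -czf - HugeDirectory | split --bytes=1MB - MyArchive.tgz.

# Reassemble and extract into a separate directory, then compare.
mkdir restore
cat MyArchive.tgz.* | tar -xzf - -C restore
cmp HugeDirectory/data.bin restore/HugeDirectory/data.bin && echo "round trip OK"
```

Because the shell glob sorts the parts lexicographically (aa, ab, ac, …), cat reassembles them in the right order without any extra bookkeeping.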

Best Answer

If you have enough space to store the compressed archive, then the archive could be created and split in one go (assuming GNU split):

tar -c -vz -f - directory | split --additional-suffix=.gz.part -b 1G

This creates files called xaa.gz.part, xab.gz.part, etc., each one a 1G compressed piece of the tar archive.
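If the default x-prefixed names are awkward, GNU split also accepts an explicit prefix as its final argument and numeric suffixes via -d. A scaled-down sketch (the directory name and the 1K part size are made up for the demo):

```shell
# Tiny demo directory; with real data you would use -b 1G.
mkdir -p directory && echo "hello" > directory/file.txt

# -d gives numeric suffixes; the trailing argument sets the prefix,
# producing directory.tar.gz.part00, directory.tar.gz.part01, ...
tar -czf - directory | split -d -b 1K - directory.tar.gz.part

# Reassembly works the same way regardless of the naming scheme.
cat directory.tar.gz.part* | tar -tzf -
```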

To extract the archive:

cat x*.gz.part | tar -x -vz -f -
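Before committing to a long extraction, the reassembled stream can be verified without writing anything to disk, since gzip can test its own format. A small self-contained sketch (the demo directory name is made up):

```shell
# Create a tiny split archive with the same naming as above.
mkdir -p demo && echo "payload" > demo/file.txt
tar -czf - demo | split -b 1K --additional-suffix=.gz.part -

# gzip -t reads the whole stream and exits non-zero on corruption,
# so a damaged or missing part is caught before extraction starts.
cat x*.gz.part | gzip -t && echo "stream OK"
```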

If the filesystem cannot store the compressed archive, the archive parts need to be written to another filesystem, or alternatively to some remote location.

Running on the machine that will store the parts, for example:

ssh user@serverwithfiles tar -c -vz -f - directory | split --additional-suffix=.gz.part -b 1G

This transfers the compressed archive over ssh from the machine holding the big directory to the machine running the command, where it is split.
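GNU split can also hand each part to a command via --filter, which avoids ever keeping a part on the source machine; the filter below is a local stand-in, but it could just as well be an ssh upload such as ssh backuphost 'cat > /backup/$FILE' (hypothetical host and path):

```shell
# Demo data; with real data the part size would be 1G, not 1K.
mkdir -p bigdir && echo "contents" > bigdir/file.txt

# split runs the filter once per part with $FILE set to the part
# name (xaa, xab, ...); here each part is written under a prefix,
# but the same hook could stream it to remote storage instead.
tar -czf - bigdir | split -b 1K --filter='cat > uploaded-$FILE' -

# The concatenated parts are still one valid compressed tar stream.
cat uploaded-x* | tar -tzf -
```

The single quotes around the filter matter: they keep $FILE from being expanded by the outer shell, so split's own per-part value is used.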