Linux – How could I portably split large backup files over multiple discs

backup, linux, open source, windows, zip

Context: I make backups / archives, primarily of photos. I'm experimenting with Bup, which is designed for backup to hard disk. Basically it creates Git repos which include packfiles of up to 1GB. But I still need last-ditch backups to keep offline and move offsite (and keeping them on read-only media is good too!).

What are the options for archiving and splitting large files over several discs like CDs (and reading them back!)? I'd prefer methods which

  • will stay readable in future.
  • are portable e.g. to Windows.
  • have known simple implementations, so I could re-implement them myself if necessary.

(Using Bup packs will stretch my robustness budget. So I want to be confident about how other parts of the system would behave).

I heard split archives are possible with both ZIP and 7-Zip. Is that right?

Best Answer

  • Info-ZIP on Linux can create split ZIP files in a documented format. The option is e.g. zip -s 600m ... These can be extracted by WinZip, but...

    unfortunately most other unzip software will choke on them. Windows' built-in extraction doesn't claim to support them. Even Info-ZIP's own unzip can't extract them directly :(. You have to merge them back into a single archive first, using zip -F (or zip -s0).

  • 7-Zip on Linux (and presumably Windows) can create split archives of any type. The option is e.g. 7z a -tzip -v600m ... (for ZIP files).

    7-Zip's split archives are simply generic split files, not the documented ZIP split format. Other extraction software won't handle them correctly, including unzip, WinZip, and certainly Windows' built-in extraction.

    This means that the individual pieces after the first have no header (so they won't be recognised by the file command). The first piece has a header, but will appear corrupt to other software. The ZIP format may make matters worse, because the "header" is somewhat optional: even if it's not present, reading software is supposed to recognise the "central directory" at the end of the file. So the last piece in the sequence will be a valid ZIP file by the standard, and this will confuse other software.

    Confusion isn't great for backups! At least on Windows, you can hope the extra numeric file extension will provide some protection. The sequence of filenames which 7-Zip generates is very different to what you get with Info-ZIP.

    And on the plus side, 7-Zip is open source and widely used. Modulo the caveats above, this split "format" is well-known and can be recombined on any operating system:

    $ cat split.zip.* > split.zip
    
    C:\> copy /b split.zip.* split.zip
    
  • Simple split files like the ones 7-Zip produces can also be generated with split -b 600m split.zip split.zip. (the trailing dot is part of the output prefix, not punctuation). By default split uses a different sequence of filenames, so 7-Zip won't extract them directly.
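
The split-and-rejoin round trip from the list above can be sketched in shell. This is only a minimal sketch, assuming GNU coreutils split (the --numeric-suffixes=1 option is a GNU extension and may be missing elsewhere). The function names split_volumes and join_volumes are hypothetical, and the .001, .002, ... suffixes are chosen to resemble 7-Zip's numeric volume names.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils split.

# split_volumes FILE SIZE
# Write FILE.001, FILE.002, ... each holding at most SIZE bytes of FILE.
split_volumes() {
    split -b "$2" -a 3 --numeric-suffixes=1 "$1" "$1."
}

# join_volumes PREFIX OUTPUT
# Concatenate PREFIX.001, PREFIX.002, ... into OUTPUT.
# The shell expands the glob in sorted order, which matches the volume
# order because the suffixes are zero-padded to a fixed width.
join_volumes() {
    cat "$1".[0-9][0-9][0-9] > "$2"
}
```

For example, split_volumes split.zip 600m followed by join_volumes split.zip rejoined.zip should produce a rejoined.zip that cmp split.zip rejoined.zip reports as identical. Keeping the suffix width fixed is what makes a plain cat with a glob safe.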

I suggest using simple generic split files like 7-Zip does, and using ZIP for a portable archive format. So you could create them using the 7-Zip command above.
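
If 7-Zip isn't available, roughly the same result can be had by building an ordinary ZIP and cutting it up afterwards. The sketch below swaps in Info-ZIP's zip -r plus split for the 7-Zip command, so it is not what 7-Zip itself runs. It assumes Info-ZIP zip and unzip and GNU coreutils split are installed, and the function names make_volumes and check_volumes are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes Info-ZIP zip/unzip and GNU coreutils split.

# make_volumes DIR ARCHIVE SIZE
# Zip DIR into ARCHIVE (an ordinary, unsplit ZIP), then cut ARCHIVE into
# ARCHIVE.001, ARCHIVE.002, ... of at most SIZE bytes each.
# ARCHIVE should end in .zip, otherwise zip appends the extension itself.
make_volumes() {
    zip -q -r "$2" "$1" &&
    split -b "$3" -a 3 --numeric-suffixes=1 "$2" "$2."
}

# check_volumes ARCHIVE OUTPUT
# Rejoin ARCHIVE.001, ARCHIVE.002, ... into OUTPUT, then have unzip test
# every member's CRC without extracting anything.
check_volumes() {
    cat "$1".[0-9][0-9][0-9] > "$2" &&
    unzip -tq "$2"
}
```

A run might look like make_volumes photos photos.zip 600m before burning the pieces, and check_volumes photos.zip rejoined.zip after copying them back from the discs. The pieces are only meaningful once rejoined, so the check is done on the rejoined file rather than on each piece.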
