Extracting a single file from a zip file is a fast operation, so I assumed this would be true for TAR as well, but I learned that even though a TAR file is uncompressed, extracting a single file can take a very long time. I had used tar to back up my home folder on OS X, and I then needed a single file. Since tar doesn't store an index of where each file is, it had to scan the entire 300GB archive before it could extract anything. This makes TAR a terrible format for most backup scenarios, so I'd like to know my options.
So, which archival file formats are suitable for quickly extracting a single file?
Even though this question isn't really about compression, I don't mind answers listing formats that combine archiving and compression (like zip), in which case "solid compression" will matter.
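For illustration (archive and file names here are made up), both tools can extract a single entry, but zip seeks straight to it via its central directory while tar has to read from the beginning:

```bash
# zip keeps a central directory, so this seeks straight to the entry:
unzip backup.zip home/user/notes.txt

# tar has no index; it reads the archive sequentially until it finds
# a match, which on a 300GB file takes a long time:
tar -xf backup.tar home/user/notes.txt
```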
Best Answer
It sounds like speed & efficiency of extraction are your main concerns, and I'm assuming you're using Linux or macOS and so want to preserve special file attributes (the ones zip & 7z ignore). In that case, an excellent archive format would be:
An ext[2/3/4] filesystem - Just copy the files somewhere, then extracting a single file is as quick & easy as mounting & reading the original file. You could put the whole archive filesystem inside a single archive file if you wish: just create a file big enough, format it & mount it (you don't even need the `-o loop` option anymore; there's a sketch after this list).

Pros:
- A nice bonus is you can easily add encryption (LUKS) to the whole archive file too, or any other encryption the filesystem supports (eCryptFS, EncFS, etc).
- You can also use rsync-based backup solutions easily.
- It's easy to add/delete files (up to the overall archive file's size).
Cons:

- Resizing the archive takes a couple of steps: `resize2fs` to shrink the filesystem, then `truncate` to shrink the file (or vice versa to expand). The sketch below shows the shrink case.

The same filesystem you're already using, in case you're using macOS and it likes something other than ext. I'm pretty sure macOS's mount command works with a single large archive file too.
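Here's a minimal sketch of the whole create/mount/shrink cycle, assuming Linux with e2fsprogs & util-linux (the archive name and sizes are made up):

```bash
# Create a sparse 10GB file and format it as ext4:
truncate -s 10G archive.img
mkfs.ext4 archive.img

# Mount it; modern mount sets up the loop device by itself:
mkdir -p /mnt/archive
sudo mount archive.img /mnt/archive

# Copy files in; single files can be read back at any time:
cp -a ~/Documents /mnt/archive/

# Shrinking later: filesystem first, then the file.
sudo umount /mnt/archive
e2fsck -f archive.img      # resize2fs requires a clean check first
resize2fs archive.img 5G
truncate -s 5G archive.img
```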
If you do want some compression as well, that's usually where solid archives & slow reading come in. Some filesystems support compression directly (btrfs, reiserfs/reiser4, planned for ext?) but I'd just go with:
SquashFS - It might be the compression king: it saves file attributes and allows quick extraction of a single file (in fact you can mount & browse every file). It's great for archives too, and has adjustable levels of compression - use it.
Or perhaps combine it with incremental backups & overlay mounts for a nice "partial backups but full files" solution.
A con is that it's impossible to grow or shrink the archive, or to add/delete files, once it's created.
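A minimal sketch of that workflow, assuming squashfs-tools is installed (names & paths are made up):

```bash
# Build a compressed archive from a directory (xz gives a high ratio):
mksquashfs ~/Documents docs.sqfs -comp xz

# Pull out a single file without unpacking everything else:
unsquashfs -d extracted docs.sqfs some/dir/notes.txt

# Or just mount it read-only and browse every file:
sudo mount -t squashfs docs.sqfs /mnt/archive
```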
Or just use an existing backup product (Time Machine?).
If you really wanted to use an archive like 7z/zip anyway, but still keep the file attributes, you could tar each file individually (saving the attributes) and then store the separate tar files in a 7z/zip archive. It's an extra step and some hassle, but it would let you easily extract a single (tar'd) file, and expand or shrink the archive without re-compressing everything (as long as it's not a solid archive).
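A hypothetical sketch of that tar-per-file idea (all names & paths are made up):

```bash
# Wrap each top-level item in its own tar, preserving attributes:
mkdir tars
for f in ~/Documents/*; do
  tar -cf "tars/$(basename "$f").tar" -C ~/Documents "$(basename "$f")"
done

# zip compresses each entry independently (it isn't solid), so single
# entries stay cheap to extract, add or delete:
zip backup.zip tars/*.tar

# Later: pull one tar back out & unpack it.
unzip -o backup.zip tars/notes.txt.tar
tar -xf tars/notes.txt.tar
```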