Why would I tar a single file?

archive, compression, tar

At my company, we download a local development database snapshot as a db.dump.tar.gz file. The compression makes sense, but the tarball only contains a single file (db.dump).

Is there any point to archiving a single file, or is .tar.gz just such a common idiom? Why not just .gz?

Best Answer

The advantages of using .tar.gz instead of .gz are:

tar stores more metadata (UNIX permissions etc.) than gzip.
The setup can more easily be expanded to store multiple files.
.tar.gz files are very common; gzip-only files may puzzle some users (cf. MelBurslan's comment).

The overhead of using tar is also very small.
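
To get a feel for both the extra metadata and the small overhead, here is a minimal sketch. The file name db.dump matches the question, but its contents (the output of seq), the permissions, and the temporary directory are made up for illustration, and the exact sizes will vary by system.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for the real dump: roughly 100 kB of compressible text.
seq 1 20000 > db.dump
chmod 640 db.dump

gzip -c db.dump > db.dump.gz      # compression only
tar czf db.dump.tar.gz db.dump    # archive + compression

# tar records mode, owner, size, and mtime for the member.
tar tvzf db.dump.tar.gz

# Compare the two compressed sizes; the tar header and padding
# are mostly zeros and text, so they compress to very little.
gz_size=$(wc -c < db.dump.gz)
tgz_size=$(wc -c < db.dump.tar.gz)
overhead=$((tgz_size - gz_size))
echo "gzip: $gz_size bytes, tar+gzip: $tgz_size bytes, overhead: $overhead bytes"
```

On a multi-gigabyte database dump the difference is negligible.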

Unless you really need those advantages, I still would not recommend tarring a single file. Many useful tools can access compressed single files directly (zcat, zgrep, and so on, with equivalents for bzip2 and xz).
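
As a rough illustration of that convenience: a single gzipped file can be read and searched in place, whereas a tarball member has to be named and streamed out of the archive first. The file name and contents below are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical three-line text file standing in for a real dump.
printf 'alpha\nbeta\ngamma\n' > notes.txt
gzip -c notes.txt > notes.txt.gz
tar czf notes.tar.gz notes.txt

# Plain .gz: the z* tools read the compressed file directly.
first_line=$(zcat notes.txt.gz | head -n 1)
match=$(zgrep 'bet' notes.txt.gz)

# .tar.gz: the member must be named and written to stdout (-O) first.
tar_match=$(tar -xzOf notes.tar.gz notes.txt | grep 'bet')

echo "$first_line / $match / $tar_match"
```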

Related Solutions

Command-Line – Program for Consistent Interface Across Multiple Archive Types

You can use p7zip. It automatically identifies the archive type and decompresses it.

p7zip is the command line version of 7-Zip for Unix/Linux, made by an independent developer.

7z e <file_name>

How should I combine many compressed files into one archive?

Since tar files are a streaming format (you can cat two of them together and get an almost-correct result), you don't need to extract them to disk at all to do this. You can decompress the archives without unpacking them, concatenate the resulting tar streams, and recompress the combined stream:

xzcat *.tar.xz | xz -c > combined.tar.xz

combined.tar.xz will be a compressed tarball of all the files in the component tarballs that is only slightly corrupt. To extract, you'll have to use the --ignore-zeros option (in GNU tar), because the archives do have an "end-of-file" marker that will appear in the middle of the result. Other than that, though, everything will work correctly.
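
A small self-contained sketch of this behaviour, assuming GNU tar. It uses gzip rather than xz only so that it also runs on systems without xz installed; the pipeline has the same shape either way. The archive and file names are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Two hypothetical single-file tarballs.
echo one > a.txt
echo two > b.txt
tar czf first.tar.gz a.txt
tar czf second.tar.gz b.txt

# Decompress both, concatenate the raw tar streams, recompress once.
gzip -dc first.tar.gz second.tar.gz | gzip -c > combined.tar.gz

# By default tar stops at the first end-of-archive marker...
plain=$(tar -tzf combined.tar.gz 2>/dev/null || true)
# ...while --ignore-zeros reads past it and sees every member.
all=$(tar --ignore-zeros -tzf combined.tar.gz)

echo "default listing:     $plain"
echo "with --ignore-zeros:" $all
```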

GNU tar also supports a --concatenate mode for producing combined archives. It has the same limitation as above (you must use --ignore-zeros to extract), and in addition it doesn't work with compressed archives. You can build something up to trick it into working using process substitution, but it's a hassle and even more fragile.
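
For reference, a sketch of the --concatenate route on decompressed copies, again assuming GNU tar and hypothetical file names, with gzip standing in for xz:

```shell
set -eu
work=$(mktemp -d)
cd "$work"

echo one > a.txt
echo two > b.txt
tar czf first.tar.gz a.txt
tar czf second.tar.gz b.txt

# --concatenate needs uncompressed archives, so decompress to disk first.
gzip -dc first.tar.gz > combined.tar
gzip -dc second.tar.gz > second.tar

# Append the members of second.tar to combined.tar in place.
tar --concatenate -f combined.tar second.tar
rm second.tar

# Recompress, then list with --ignore-zeros as advised above.
gzip combined.tar
members=$(tar --ignore-zeros -tzf combined.tar.gz)
echo $members
```

Unlike the pipeline version, this needs disk space for the decompressed archives.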

If there are files that appear more than once in different tar files, this won't work properly, but you've got that problem regardless. Otherwise this will give you what you want — piping the output through xz is how tar compresses its output anyway.

If archives that only work with a particular tar implementation aren't adequate for your purposes, appending to the archive with r is your friend. tar cannot append to a compressed archive, so build an uncompressed combined.tar and compress it once at the end:

# pushd/popd are bash builtins; run this with bash.
tar cf combined.tar dummy-file
for x in db-*.tar.xz
do
    mkdir tmp
    pushd tmp
    tar xJf "../$x"
    # Append this archive's extracted files to the combined archive.
    tar rf ../combined.tar .
    popd
    rm -r tmp
done
# Compress the finished archive in a single pass.
xz combined.tar

This only ever extracts a single archive at a time, so the extracted files never take more space than one archive's contents, although the uncompressed combined.tar also has to fit on disk until the final xz step. Because the whole archive is compressed in one pass, the compression will be as good as it would have been had you made the final archive all at once. Extracting every archive and re-adding its files makes this slower than the cat versions, but the resulting archive will work anywhere without any special support.

Note that — depending on what exactly you want — just adding the uncompressed tar files themselves to an archive might suffice. They will compress (almost) exactly as well as their contents in a single file, and it will reduce the compression overhead for each file. This would look something like:

# tar cannot append (r) to a compressed archive, so compress at the end.
tar cf combined.tar dummy-file
for x in db-*.tar.xz
do
    xz -dk "$x"
    tar rf combined.tar "${x%.xz}"
    rm -f "${x%.xz}"
done
xz combined.tar

This is slightly less efficient in terms of the final compressed size because there are extra tar headers in the stream, but saves some time on extracting and re-adding all the files as files. You'd end up with combined.tar.xz containing many (uncompressed) db-*.tar files.
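
With this layout, getting at the files later takes two steps: pull one inner tarball out of the combined archive, then extract it. Here is a minimal sketch that does both in one pipeline without writing the inner tarball to disk. The names are hypothetical, and gzip is used in place of xz so that it runs without xz installed.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Build a stand-in for the combined archive: a tar of (uncompressed) tars.
echo one > a.txt
tar cf db-1.tar a.txt
echo two > b.txt
tar cf db-2.tar b.txt
tar czf combined.tar.gz db-1.tar db-2.tar
rm a.txt b.txt db-1.tar db-2.tar

# List the inner tarballs.
tar -tzf combined.tar.gz

# Stream one inner tarball to stdout (-O) and extract it directly.
mkdir out
tar -xzOf combined.tar.gz db-2.tar | tar -C out -xf -

restored=$(cat out/b.txt)
echo "$restored"
```

For an xz-compressed combined archive, J replaces z in the tar options.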
