Since tar is a streaming format (you can cat two archives together and get an almost-correct result), you don't need to extract anything to disk to do this. You can decompress the archives (decompress only, without unpacking them), concatenate the resulting tar streams, and recompress the combined stream:
xzcat *.tar.xz | xz -c > combined.tar.xz
combined.tar.xz will be a compressed tarball of all the files in the component tarballs that is only slightly corrupt. To extract it, you'll have to use the --ignore-zeros option (in GNU tar), because each component archive ends with an end-of-archive marker, and those markers now appear in the middle of the result. Other than that, though, everything will work correctly.
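As a rough illustration of the round trip described above, here is a small self-contained sketch. It assumes GNU tar and xz-utils are installed; the file names (a.txt, b.txt, a.tar.xz, b.tar.xz) and the scratch directory are made up for the demo.

```shell
set -eu
work=$(mktemp -d)        # scratch directory so nothing real is touched
cd "$work"

# Build two tiny sample tarballs (hypothetical names).
echo one > a.txt
echo two > b.txt
tar -cJf a.tar.xz a.txt
tar -cJf b.tar.xz b.txt

# Decompress both, concatenate the raw tar streams, recompress the result.
xzcat a.tar.xz b.tar.xz | xz -c > combined.tar.xz

# Without --ignore-zeros, tar stops at the first end-of-archive marker.
mkdir plain
tar -xJf combined.tar.xz -C plain

# With --ignore-zeros, tar skips the zero blocks and keeps reading.
mkdir all
tar -xJf combined.tar.xz --ignore-zeros -C all

ls plain all
```

With GNU tar, the plain directory should end up holding only the first archive's file, while all holds both, which is why the option matters when extracting.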
GNU tar also supports a --concatenate mode for producing combined archives. That has the same limitation as above (you must use --ignore-zeros to extract), and it doesn't work with compressed archives at all. You can build something up to trick it into working using process substitution, but it's a hassle and even more fragile.
If there are files that appear more than once in different tar files, this won't work properly, but you've got that problem regardless. Otherwise this will give you what you want: piping the output through xz is how tar compresses its output anyway.
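If you want to find out ahead of time whether any member name occurs in more than one archive, something along these lines may help. It is only a sketch: the sample archives and file names are invented for the demo, and it compares names only, not contents.

```shell
set -eu
work=$(mktemp -d)        # scratch directory for the demo
cd "$work"

# Two sample archives that both contain shared.txt (hypothetical names).
echo 1 > shared.txt
echo 2 > only-a.txt
echo 3 > only-b.txt
tar -cJf a.tar.xz shared.txt only-a.txt
tar -cJf b.tar.xz shared.txt only-b.txt

# List every member of every archive, then keep names seen more than once.
dupes=$(for f in *.tar.xz; do tar -tJf "$f"; done | sort | uniq -d)
echo "$dupes"
```

An empty result suggests the archives don't overlap by name. Note that directory entries such as ./ will show up as duplicates if the archives were created from a directory.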
If archives that only work with a particular tar implementation aren't adequate for your purposes, appending to the archive with r is your friend. tar can't append to a compressed archive, so build an uncompressed combined.tar and compress it once at the end:
tar cf combined.tar dummy-file
for x in db-*.tar.xz
do
    mkdir tmp
    pushd tmp
    tar xJf "../$x"
    tar rf ../combined.tar .
    popd
    rm -r tmp
done
xz combined.tar
This only ever extracts a single archive at a time, so the working space is limited to the uncompressed combined.tar plus a single archive's contents. The final compression is one xz pass over the whole archive, just as if you had made it all at once, so it will be as good as it ever could have been. You do a lot of excess extraction and re-archiving that will make this slower than the cat versions, but the resulting archive will work anywhere without any special support.
Note that, depending on what exactly you want, just adding the uncompressed tar files themselves to an archive might suffice. They will compress (almost) exactly as well as their contents in a single file, and it will reduce the compression overhead for each file. This would look something like the following, with the outer archive built uncompressed and compressed in one pass at the end:
tar cf combined.tar dummy-file
for x in db-*.tar.xz
do
    xz -dk "$x"
    tar rf combined.tar "${x%.xz}"
    rm -f "${x%.xz}"
done
xz combined.tar
This is slightly less efficient in terms of the final compressed size because there are extra tar headers in the stream, but it saves some time on extracting and re-adding all the files as files. You'd end up with combined.tar.xz containing many (uncompressed) db-*.tar files.
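Getting the files back out of such a nested archive takes two steps: unpack the outer archive, then unpack each inner tarball. A minimal sketch, using a stand-in archive built on the spot (the db-1.tar and db-2.tar names and their contents are invented for the demo):

```shell
set -eu
work=$(mktemp -d)        # scratch directory for the demo
cd "$work"

# Stand-in for the nested archive: an outer .tar.xz holding inner .tar files.
echo one > a.txt
echo two > b.txt
tar -cf db-1.tar a.txt
tar -cf db-2.tar b.txt
tar -cJf combined.tar.xz db-1.tar db-2.tar
rm a.txt b.txt db-1.tar db-2.tar

# Step 1: unpack the outer archive to get the inner tarballs.
mkdir out
tar -xJf combined.tar.xz -C out

# Step 2: unpack each inner tarball, deleting it once extracted.
for t in out/db-*.tar
do
    tar -xf "$t" -C out
    rm -f "$t"
done
ls out
```

Deleting each inner tarball as soon as it is extracted keeps the extra disk usage down to roughly one inner archive at a time, on top of the unpacked outer archive.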
Best Answer
The advantage of using .tar.gz instead of .gz is that tar stores more metadata (UNIX permissions etc.) than gzip. The overhead of using tar is also very small.
Unless it is really needed, though, I still do not recommend putting a single file in a tar archive. There are many useful tools that can access compressed single files directly (such as zcat, zgrep etc., which also exist for bzip2 and xz).
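For instance, a compressed single file can be read and searched without unpacking it first. The sketch below assumes gzip and xz-utils (which provide zcat, zgrep, xzcat and xzgrep) are installed; the sample file names are made up.

```shell
set -eu
work=$(mktemp -d)        # scratch directory for the demo
cd "$work"

# A small gzip-compressed text file (hypothetical name).
printf 'alpha\nbeta\ngamma\n' | gzip > sample.txt.gz

# Print the decompressed contents without creating an uncompressed copy.
zcat sample.txt.gz

# Search inside the compressed file directly.
match=$(zgrep beta sample.txt.gz)
echo "$match"

# The xz counterparts work the same way (bzip2 has bzcat and bzgrep).
printf 'alpha\nbeta\ngamma\n' | xz > sample.txt.xz
xzcat sample.txt.xz
xzgrep beta sample.txt.xz
```

None of this works on a file wrapped in tar, since the tools would then see the tar headers along with the data.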