An image is a raw (literal, byte-for-byte) copy of a filesystem. Because this includes all of the filesystem metadata, you can mount it the same way you would mount a physical device holding the exact same bytes.
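For example, on Linux you can typically attach an image to a loop device and mount it directly; a minimal sketch, with an illustrative image name and mount point:

# Loop-mount a filesystem image and browse it like any other filesystem
mkdir -p /mnt/img
mount -o loop disk.img /mnt/img    # use -o ro,loop for read-only access
ls /mnt/img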
A tar file (a.k.a. a 'tarchive') is an archival format that is filesystem-agnostic: although it includes information such as permissions and ownership and maintains directory structure, it does not depend further upon the source filesystem. This means tarchives are portable from one type of filesystem to another; anywhere you have a tar utility, you should be able to use a tar file regardless of its origin.
A tarchive is not a literal byte-for-byte copy of a region of storage. It is a set of files structured by tar, and hence, unlike an image, its contents can be analyzed and manipulated externally (by the tar utility itself). This also means it depends on some existing filesystem in order to be unpacked and used.
A tarchive can contain the contents of an entire filesystem, but this is not the same as containing the actual filesystem, as an image does. In order to reproduce the original filesystem, you would have to create a filesystem partition of the same type (n.b., the tarchive contains no indication of which type that was) and unpack into it. Conversely, if you want to "unpack" an image into a subdirectory of an existing filesystem, you must mount it and copy out manually (although there may be tools to aid in this).
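As a rough sketch of both directions (the device, mount point, and filesystem type here are illustrative assumptions):

# Reproduce a tarchive's contents on a freshly made filesystem
mkfs.ext4 /dev/sdb1
mount /dev/sdb1 /mnt/restore
tar -xpf backup.tar -C /mnt/restore    # -p preserves permissions

# "Unpack" an image into a subdirectory of an existing filesystem
mount -o loop disk.img /mnt/img
cp -a /mnt/img/. /srv/restored/        # -a preserves attributes and symlinks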
So, the two methodologies best suit slightly different purposes. With regard to back-ups, tar is the better choice for a number of reasons:
- You are only copying actual files, and not empty space.
- You are not bringing the underlying filesystem and its attendant imperfections with you (fragmentation, inconsistencies).
- You can avoid including things which should never be included (e.g., /proc, /dev); see the sketch after this list.
- Tar files are easier to update.
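For instance, a root-filesystem backup along these lines might look like the following (GNU tar assumed; the archive name is illustrative):

tar -cJf backup.tar.xz --exclude=/proc --exclude=/sys --exclude=/dev /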
Since tar files are a streaming format (you can cat two of them together and get an almost-correct result), you don't need to extract them to disk at all to do this. You can decompress (only) the files, concatenate them together, and recompress that stream:
xzcat *.tar.xz | xz -c > combined.tar.xz
combined.tar.xz will be a compressed tarball of all the files in the component tarballs, and it will be only slightly corrupt. To extract, you'll have to use the --ignore-zeros option (in GNU tar), because each component archive has an "end-of-file" marker that will appear in the middle of the result. Other than that, though, everything will work correctly.
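Extraction is then the usual invocation plus that flag (GNU tar):

tar -xf combined.tar.xz --ignore-zeros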
GNU tar also supports a --concatenate mode for producing combined archives. It has the same limitation as above (you must use --ignore-zeros to extract), but it doesn't work with compressed archives. You can trick it into handling them using process substitution, but it's a hassle and even more fragile.
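On uncompressed archives, that mode looks something like this (file names are illustrative):

tar --concatenate --file=combined.tar next-part.tar
tar -xf combined.tar --ignore-zeros    # still needed on extraction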
If some files appear more than once across the different tar files, this won't work properly, but you've got that problem regardless. Otherwise this gives you what you want; piping the output through xz is how tar compresses its output anyway.
If archives that only work with a particular tar implementation aren't adequate for your purposes, appending to the archive with r is your friend:
# Note: tar cannot append (r) to a compressed archive, so build the
# combined archive uncompressed and compress it once at the end.
tar cf combined.tar dummy-file
for x in db-*.tar.xz
do
    mkdir tmp
    pushd tmp
    tar xJf "../$x"            # unpack one component archive
    tar rf ../combined.tar .   # append its contents
    popd
    rm -r tmp
done
xz combined.tar                # yields combined.tar.xz
This only ever extracts a single archive at a time, so the working space is limited to the size of a single archive's contents. The compression is streaming, just as it would have been had you made the final archive all at once, so it will be as good as it ever could have been. You do a lot of excess decompression and recompression, which makes this slower than the cat versions, but the resulting archive will work anywhere without any special support.
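Extracting that archive is then just the ordinary invocation, with no need for --ignore-zeros:

tar -xJf combined.tar.xz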
Note that — depending on what exactly you want — just adding the uncompressed tar files themselves to an archive might suffice. They will compress (almost) exactly as well as their contents in a single file, and it will reduce the compression overhead for each file. This would look something like:
# Same caveat as above: append to an uncompressed archive and
# compress once at the end.
tar cf combined.tar dummy-file
for x in db-*.tar.xz
do
    xz -dk "$x"                       # decompress, keeping the .xz original
    tar rf combined.tar "${x%.xz}"    # append the uncompressed tarball itself
    rm -f "${x%.xz}"
done
xz combined.tar                       # yields combined.tar.xz
This is slightly less efficient in terms of the final compressed size, because there are extra tar headers in the stream, but it saves the time spent extracting and re-adding all the files as files. You'd end up with combined.tar.xz containing many (uncompressed) db-*.tar files.
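Recovering the original files from that nested layout means extracting twice; a minimal sketch:

tar -xJf combined.tar.xz
for t in db-*.tar
do
    tar -xf "$t"    # unpack each inner (uncompressed) tarball
done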
Best Answer
An option could be to use avfs (here assuming a GNU system):