Compression method that supports solid compression and also adding data to the compressed file

compression

I have a large compressed .tar.xz file containing log files. The compression ratio is very good – but it takes a long time to compress, and if I want to add additional log files to it, I have to extract it, add the new file, and recompress it – which takes even longer and uses up a lot of disk space.

Is there an archive/compression method that allows me to efficiently add a new file to an existing archive while still retaining the benefits of solid compression? (i.e. not compressing files on an individual basis, which is what .zip does).

Best Answer

It is not possible to update or delete files in a solid archive in place. In solid compression, the way later data is compressed depends on statistics gathered from the earlier data (which is what usually improves the compression ratio), so removing or changing a file requires decompressing and recompressing the whole archive.

It is also important to understand that solid compression is usually used when you want to save disk space or bandwidth and don't mind the extra time it takes to compress or decompress, or losing the flexibility to update or edit the archive. Other tools offer quick compression and decompression, including updating an existing archive, but their compression ratio does not match that of solid compression.

The solid compression you are referring to, "tar.xz", is an "emulated" (indirect) solid compression:

In computing, solid compression refers to a method for data compression of multiple files, wherein all the uncompressed files are concatenated and treated as a single data block. Such an archive is called a solid archive. It is used natively in the 7z [1] and RAR [2] formats, as well as indirectly in tar-based formats such as .tar.gz and .tar.bz2. By contrast, the ZIP format is not solid because it stores separate compressed files (though solid compression can be emulated for small archives by combining the files into an uncompressed zip archive and then compressing the zip archive inside a second compressed zip file).
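To make the difference concrete, here is a minimal sketch that compresses the same files once individually and once as a single tar stream. The file names and log contents are made up, and the files are deliberately almost identical, which is the best case for solid compression; it assumes `tar`, `xz`, and `awk` are installed.

```shell
#!/bin/sh
# Sketch: compare per-file compression with solid (tar | xz) compression.
set -eu

work=$(mktemp -d)   # scratch directory holding hypothetical sample data
cd "$work"

# Create 20 log files that differ only in their first line.
i=1
while [ "$i" -le 20 ]; do
    {
        printf '# log for host %s\n' "$i"
        awk 'BEGIN { for (j = 1; j <= 200; j++) print "2024-01-01 INFO request handled id=" ((j * j * 7919) % 100003) }'
    } > "app$i.log"
    i=$((i + 1))
done

# Non-solid: compress every file on its own and add up the sizes.
separate=$(for f in app*.log; do xz -c "$f" | wc -c; done | awk '{ s += $1 } END { print s }')

# Solid: concatenate the files with tar, then compress the single stream.
solid=$(tar -cf - app*.log | xz -c | wc -c | tr -d ' ')

echo "separate: $separate bytes, solid: $solid bytes"
```

With data like this, the solid stream should come out much smaller, because everything after the first file can be encoded as references to data the compressor has already seen. Real log files will show a smaller gap.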

Let me start by explaining how your current method, tar.xz, works.

tar

In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball.

Hence tar is basically an archiver: it bundles files together but does not compress them.

xz

xz is a lossless data compression program and file format which incorporates the LZMA/LZMA2 compression algorithms. It has a high compression ratio, at the cost of slow compression (decompression is noticeably faster than compression, though still slower than gzip).

Hence when you combine the two, you first archive (tar) a number of files and then compress (xz) that single file.

Now to the question of how to add new content to, or update, the existing compressed file.

If using tar.xz, you will have to decompress the tar.xz, which leaves you with a tar file, and then you can use the following to append to the archive:

tar --append --file=archive.tar file_or_dir_to_add

and then compress it again:

xz archive.tar
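Put together, the whole cycle might look like the sketch below. The file names (`old.log`, `new.log`, `archive.tar`) are placeholders, and the first few commands only exist to build a small archive to work on.

```shell
#!/bin/sh
# Sketch of the decompress / append / recompress cycle for a .tar.xz file.
set -eu

work=$(mktemp -d)   # scratch directory with hypothetical file names
cd "$work"
echo "first entry"  > old.log
echo "second entry" > new.log

# Build an initial archive: bundle with tar, then compress with xz.
tar -cf archive.tar old.log
xz archive.tar                 # replaces archive.tar with archive.tar.xz

# 1. Decompress back to a plain tar file.
xz -d archive.tar.xz           # replaces archive.tar.xz with archive.tar
# 2. Append the new file (-r -f is the short form of --append --file).
tar -rf archive.tar new.log
# 3. Recompress the whole archive.
xz archive.tar

# List the contents without unpacking anything to disk.
contents=$(xz -dc archive.tar.xz | tar -tf -)
echo "$contents"
```

Note that step 1 needs enough free disk space for the full uncompressed tar file, and step 3 recompresses everything, which is exactly the cost described in the question.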

Or alternatively, you can use ZIP.

zip -g archive.zip folder/file

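ZIP is an archive file format that supports lossless data compression. A .ZIP file may contain one or more files or directories that may have been compressed. The .ZIP file format permits a number of compression algorithms, though DEFLATE is the most common. Because each file is compressed individually, new files can be added without recompressing the existing ones, but for the same reason ZIP does not give you solid compression.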
