Compression – Differences Between Various Compression Systems


I've always used TAR and ZIP for compression, but recently I have heard about the *.Z compression algorithm. This brought up a question for me:

With all of these compression systems, which one is best for general use and compression?

Running a few tests, I discovered that tar does NOT actually compress (unless explicitly told to). So what is it good for compared to other compression methods?

I am already aware that ZIP is the most widely used compression system, but should I use it instead of *.Z, *.7z, *.tar, or *.tar.<insert ending here>?

Post Summary:

  1. Should I use *.zip, *.Z, *.7z, *.tar, or *.tar.<insert ending here> for the best compression?
  2. If plain *.tar doesn't compress, why do we use it?

EDIT: From what I have learned, not all of these formats store Linux permissions. Which ones do, and is there some sort of hack (or script) I could use to store permissions?

Best Answer

tar stands for tape archive. All it does is pack files and their metadata (permissions, ownership, etc.) into a stream of bytes that can be stored on a tape drive (or in a file) and restored later. Compression is an entirely separate matter: if you wanted it, you used to have to pipe tar's output through an external utility. GNU tar was nice enough to add switches that tell it to filter the output through the appropriate utility automatically, as a shortcut.
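To make the split between archiving and compression concrete, here is a minimal sketch using Python's standard tarfile module. The file names, contents, and the `build_tar` helper are made up for illustration. It packs the same two files once as a plain tar and once filtered through gzip, then reads the permission bits back out of the plain archive.

```python
import io
import tarfile

def build_tar(files, mode="w"):
    """Pack {name: (data, permission_bits)} into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, (data, perms) in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mode = perms          # metadata travels with the file
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Hypothetical sample files: (contents, permission bits)
files = {
    "run.sh": (b"#!/bin/sh\necho hello\n" * 200, 0o755),
    "notes.txt": (b"plain tar does not compress\n" * 200, 0o644),
}

plain = build_tar(files)             # archive only, no compression
gzipped = build_tar(files, "w:gz")   # same archive, filtered through gzip

# Read the stored permission bits back out of the plain archive.
with tarfile.open(fileobj=io.BytesIO(plain)) as tar:
    modes = {member.name: oct(member.mode) for member in tar.getmembers()}

raw_size = sum(len(data) for data, _ in files.values())
print("raw:", raw_size, "tar:", len(plain), "tar.gz:", len(gzipped))
print(modes)
```

The plain archive comes out larger than the raw data because of headers and padding, while the gzip-filtered one is much smaller, and the permission bits survive either way.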

Zip and 7z combine archiving and compression into their own container formats. They were designed for packing files on DOS/Windows systems, so they do not reliably preserve Unix permissions and ownership (7z does not record owner or group at all, and while some Unix zip tools record permission bits as an extension, you cannot count on every tool honoring them). Thus, if you want to store permissions for proper backups, you need to stick with tar. If you plan on exchanging files with Windows users, then zip or 7z is good. The actual compression algorithms that zip and 7zip use can also be used with tar, by using gzip and lzma respectively.
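As for a script that keeps permissions, a rough sketch along these lines should work on a Unix-like system. It uses Python's tarfile module with the xz filter; the file name `deploy.sh`, the `backup_and_restore` helper, and the temporary directory layout are purely illustrative, and restoring ownership (as opposed to mode bits) would additionally require running as root.

```python
import os
import stat
import tarfile
import tempfile

def backup_and_restore(perms=0o750):
    """Round-trip one file through a .tar.xz; return (stored, restored) mode bits."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "src")
        out = os.path.join(work, "out")
        os.makedirs(src)
        os.makedirs(out)

        # Hypothetical file to back up, with non-default permissions.
        script = os.path.join(src, "deploy.sh")
        with open(script, "w") as fh:
            fh.write("#!/bin/sh\necho deployed\n")
        os.chmod(script, perms)

        archive = os.path.join(work, "backup.tar.xz")
        with tarfile.open(archive, "w:xz") as tar:   # "w:gz" would use gzip instead
            tar.add(script, arcname="deploy.sh")

        with tarfile.open(archive, "r:*") as tar:    # compression is auto-detected
            stored = tar.getmember("deploy.sh").mode
            tar.extract("deploy.sh", path=out)

        restored_path = os.path.join(out, "deploy.sh")
        restored = stat.S_IMODE(os.stat(restored_path).st_mode)
        return stored, restored

stored, restored = backup_and_restore()
print(oct(stored), oct(restored))
```

On the command line, GNU tar's `-z` (gzip) and `-J` (xz) switches do the same job.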

lzma (the algorithm behind *.xz files) has one of the best compression ratios and is quite fast at decompression, making it a top choice these days. It does, however, require a lot of RAM and CPU time to compress. The venerable gzip is quite a bit faster at compression, so it may be used if you don't want to dedicate that much CPU time. There is also an even faster alternative called lzop, which trades compression ratio for speed. bzip2 is still fairly popular, since it largely replaced gzip for a time before 7zip/lzma came about thanks to its better compression ratios, but it is falling out of favor these days because 7z/lzma decompresses faster and compresses better. The compress utility, which normally names files *.Z, is ancient and long forgotten.
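If you would rather measure this trade-off on your own data than take it on faith, a quick sketch with Python's standard gzip, bz2, and lzma modules could look like the following. The `compare` helper and the synthetic log data are only placeholders; sizes and timings depend heavily on the input and the machine, so treat any single run as a rough indication.

```python
import bz2
import gzip
import lzma
import time

def compare(data):
    """Compress data with gzip, bzip2 and xz; return {name: (size, seconds)}."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        "xz": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        assert decompress(packed) == data   # sanity check: lossless round trip
        results[name] = (len(packed), elapsed)
    return results

# Synthetic, log-like sample data; substitute your own files here.
sample = b"".join(b"%d GET /index.html 200\n" % i for i in range(50000))

results = compare(sample)
for name, (size, seconds) in results.items():
    print(f"{name:6} {size:8d} bytes  {seconds:.3f} s to compress")
```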

Another important difference between zip and tar is that zip compresses each file separately, whereas when you compress a tar file, you compress the whole thing at once. The latter gives better compression ratios, but in order to extract a single file at the end of the archive, you must decompress everything before it. Thus the zip format is better for extracting a file or two from a large archive. 7z and dar let you choose between compressing the whole thing (called "solid" mode) and compressing in smaller chunks for easy piecemeal extraction.
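The per-file versus whole-archive difference is easy to see with many small, similar files. The sketch below uses Python's zipfile and tarfile modules on made-up data (the `make_files`, `as_zip`, and `as_tar_gz` helpers and the `user_N.txt` names are illustrative). It builds both archive types from the same files and then pulls one member straight out of the zip.

```python
import io
import tarfile
import zipfile

def make_files(count=200):
    """Generate many small, similar files (hypothetical sample data)."""
    return {
        f"user_{i}.txt": f"id={i}\nname=user{i}\nrole=member\nquota=1000\n".encode()
        for i in range(count)
    }

def as_zip(files):
    """Zip archive: every member is compressed on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def as_tar_gz(files):
    """tar.gz archive: one gzip stream over the whole tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

files = make_files()
zipped = as_zip(files)
solid = as_tar_gz(files)
print("zip:", len(zipped), "bytes   tar.gz:", len(solid), "bytes")

# The zip's central directory lets us jump directly to a single member.
with zipfile.ZipFile(io.BytesIO(zipped)) as zf:
    last = zf.read("user_199.txt")
print(last.decode())
```

With data like this, the tar.gz ends up smaller because gzip can exploit redundancy across files, while the zip can hand back `user_199.txt` without touching the other members.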