macOS – the fastest compression method for a large number of files

compression · gzip · macos · tar · zip

I need to compress a directory containing around 350,000 fairly small files, about 100 GB in total. I am using macOS and am currently using the standard "Compress" tool, which converts the directory into a .zip file. Is there a faster way to do this?

Best Answer

For directories I'd use tar piped to bzip2 with maximum compression.

A simple way to do this is:

tar cfj archive.tar.bz2 dir-to-be-archived/ 

This works great if you don't intend to fetch small sets of files out of the archive
and are just planning to extract the whole thing whenever/wherever required.
Even if you do want to pull a small set of files back out, it's not too bad.

I prefer to name such archives filename.tar.bz2 and extract them with the 'xfj' options.
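
For example, extracting the whole archive, or pulling a single file back out (the inner path here is just a hypothetical example), looks like this:

tar xfj archive.tar.bz2 
tar xfj archive.tar.bz2 dir-to-be-archived/some/file.txt 
#      ^ tar still has to decompress and scan the whole archive to find that one member, 
#        which is why extracting small sets is workable but not fast.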

The max-compression pipe looks like this,

tar cf - dir-to-be-archived/ | bzip2 -9 - > archive.tar.bz2  
#      ^ pipe the tarball from tar into bzip2 ^ which writes the archive file. 

Note: the bzip2 method, with its higher compression, tends to be slower than regular gzip from 'tar cfz'.
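
Since the question is about raw speed rather than compression ratio, the gzip form of the same pipeline is the faster choice; a minimal sketch, assuming the same directory name:

tar cf - dir-to-be-archived/ | gzip -1 > archive.tar.gz 
#                                   ^ -1 favours speed over ratio; plain 'tar cfz archive.tar.gz dir-to-be-archived/' uses the default level.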

If you have a fast network and the archive is going to live on a different machine,
you can speed things up by piping across the network (effectively using two machines together).

tar cf - dir/ | ssh user@server "bzip2 -9 - > /target-path/archive.tar.bz2"  
#      ^ pipe the tarball over the network ^ compress and archive it on the remote machine.
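
That version ships the uncompressed tar stream and lets the remote machine do the bzip2 work. If the network is the bottleneck and the local CPU has headroom, the opposite split may work better; a sketch, assuming the same paths as above:

tar cf - dir/ | bzip2 -9 | ssh user@server "cat > /target-path/archive.tar.bz2" 
#               ^ compress locally ^ so only the smaller compressed stream crosses the network.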

Some references,

  1. Linux Journal: Compression Tools Compared, Jul 28, 2005
  2. gzip vs. bzip2, Aug 26, 2003
  3. A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA, May 31, 2005