Is it better to compress all data together or each directory separately?

archiving compression rar tar zip

I'm archiving some projects. Let's say each of them has its own directory:

projects
 |- project-1
 |- project-2
 |- project-3

I started compressing them as follows:

==== SITUATION 1 ====

projects
 |- project-1.zip
 |- project-2.zip
 |- project-3.zip

and then I started wondering whether it would be better to compress all the data into one zip file:

==== SITUATION 2 ====

projects.zip
 |- project-1
 |- project-2
 |- project-3

or maybe compress the already compressed files:

==== SITUATION 3 ====

projects.zip
 |- project-1.zip
 |- project-2.zip
 |- project-3.zip

Which situation is the best (occupies the least space)? Why? Does it depend on the compression algorithm? I know that compressing one compressed file cannot help much, but what about, say, 20 of them? To me, situation 1 doesn't look like a good idea.

Best Answer

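To be honest, I doubt that the different schemes would make much difference, since compression algorithms typically only search a limited window of data for repeats in order to control memory use. Zip in particular compresses each file separately, so situations 1 and 2 should come out almost the same size.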

The exception is situation 3, which would most likely end up larger, since compressing an already compressed file adds overhead but cannot shrink the data any further.
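If you want to check this rather than take my word for it, here is a rough sketch in Python that builds all three situations in memory and compares the sizes. The pseudo-random "project files" and the helper names `make_text` and `zip_bytes` are made up for illustration; real projects will give different numbers, so treat it as an experiment to adapt, not a benchmark.

```python
import io
import random
import string
import zipfile

# Hypothetical vocabulary used to fake compressible "project files".
_vocab_rng = random.Random(0)
VOCAB = [
    "".join(_vocab_rng.choice(string.ascii_lowercase)
            for _ in range(_vocab_rng.randint(3, 9)))
    for _ in range(200)
]

def make_text(seed, n_words=5000):
    """Return deterministic pseudo-random text as bytes (stand-in for a real file)."""
    rng = random.Random(seed)
    return " ".join(rng.choice(VOCAB) for _ in range(n_words)).encode()

def zip_bytes(entries):
    """Build a DEFLATE zip in memory from {archive_name: bytes}; return the zip bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()

# Three fake projects with three files each.
projects = {
    f"project-{p}": {
        f"project-{p}/file{i}.txt": make_text(10 * p + i) for i in range(3)
    }
    for p in (1, 2, 3)
}
raw = sum(len(d) for files in projects.values() for d in files.values())

# Situation 1: one zip per project.
per_project = {f"{name}.zip": zip_bytes(files) for name, files in projects.items()}
s1 = sum(len(z) for z in per_project.values())

# Situation 2: a single zip holding every file.
all_files = {n: d for files in projects.values() for n, d in files.items()}
s2 = len(zip_bytes(all_files))

# Situation 3: a zip of the per-project zips.
s3 = len(zip_bytes(per_project))

print(f"raw={raw}  s1={s1}  s2={s2}  s3={s3}")
```

On this kind of data I would expect `s1` and `s2` to differ only by a few bytes of zip bookkeeping, and `s3` to be no real improvement on `s1`. A solid format such as .tar.gz or 7z can behave differently, because it compresses across file boundaries.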

If you want better compression, look at newer archiving tools that have better algorithms. 7-Zip, for example, generally compresses better than zip.

As for the difference between situations 1 and 2, I would say that it depends on how you are most likely to use the archives in the future and how big they end up.

Really big archives are a pain to handle (moving, opening, etc.), and this is likely to matter more than saving a few kB.

Additionally, when thinking of long-term storage, don't ignore "bit-rot". A small error in a large archive can be devastating. Losing one project is probably much better than losing them all.
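One way to catch bit-rot early is to test your archives from time to time. As a minimal sketch, Python's `zipfile.ZipFile.testzip()` re-reads every member and checks its CRC. The example below simulates damage by flipping a single bit in an in-memory archive; the file name and payload are invented, and `ZIP_STORED` is used only so the damage lands in a predictable place.

```python
import io
import zipfile

payload = b"important project data " * 100

# Build a small archive in memory. ZIP_STORED keeps the payload byte-for-byte,
# which makes it easy to place the simulated damage inside the file contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("project-1/notes.txt", payload)

# Simulate bit-rot: flip one bit inside the stored file contents.
archive = bytearray(buf.getvalue())
offset = archive.find(payload)
archive[offset + 10] ^= 0x01

# testzip() returns the name of the first member whose CRC check fails, or None.
with zipfile.ZipFile(io.BytesIO(bytes(archive))) as zf:
    damaged = zf.testzip()

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    intact = zf.testzip()

print("damaged copy:", damaged)
print("intact copy:", intact)
```

This only detects the damage; it does not repair anything, which is where redundancy comes in.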

You might, however, look at something like RAR, which supports redundancy and split archives. This is a bit like RAID 5: you create multiple archive files with built-in redundancy, so that you can lose a file and still recreate the original data.
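To show the idea behind that kind of redundancy, here is a toy sketch using simple XOR parity, which is the principle RAID 5 relies on. It is not how RAR implements its recovery data, and the function names are hypothetical; it only illustrates how one lost piece can be rebuilt from the others.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_with_parity(data: bytes, n_parts: int):
    """Split data into n_parts equal chunks (zero-padded) plus one XOR parity chunk."""
    size = -(-len(data) // n_parts)  # ceiling division
    padded = data.ljust(size * n_parts, b"\0")
    parts = [padded[i * size:(i + 1) * size] for i in range(n_parts)]
    parity = reduce(xor_bytes, parts)
    return parts, parity

def rebuild_missing(parts, parity):
    """Recreate the single chunk marked as None from the other chunks and the parity."""
    missing = [i for i, p in enumerate(parts) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing chunk")
    present = [p for p in parts if p is not None]
    return reduce(xor_bytes, present, parity)

data = b"project archive contents that we would not like to lose"
parts, parity = split_with_parity(data, 4)

# Pretend the third volume was lost.
damaged = list(parts)
damaged[2] = None
restored = rebuild_missing(damaged, parity)

damaged[2] = restored
recovered = b"".join(damaged)[:len(data)]
print(recovered == data)
```

With a single parity chunk you can survive the loss of any one piece, but not two.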
