How to specify the order files should be compressed in 7zip

7-zip archiving compression

I have a set of files I would like to compress that I know to be repetitive and compressible, but 7zip chooses a non-optimal order to compress the files and fails to take advantage of their compressibility. How can I get 7zip to compress the files in another order?

The files I want to compress are the following:

  • A 200MB PDF containing a large number of embedded JPGs
  • 190MB of JPGs, all of which are separately embedded in the PDF
  • About 500MB of miscellaneous other moderately compressible files

I know 7zip can take advantage of the repetition between the PDF and the bare JPGs, because when I archive just the PDF and the JPGs together, I get a compression ratio of 47% (compressed size as a percentage of the original). But when I include the 500MB of other files, 7zip compresses the JPGs first, then the miscellaneous data, and by the time it reaches the PDF the compression algorithm must have 'forgotten' about the JPGs, because the PDF is hardly compressed at all.
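One plausible explanation for the 'forgetting' is that an LZMA-style encoder can only refer back as far as its dictionary size, so a duplicate that recurs after more intervening data than the dictionary holds cannot be matched. The sketch below illustrates that effect on a small scale with Python's standard `lzma` module. It is not 7zip itself, and the 64 KiB dictionary, the random stand-in data, and the names `jpgs`, `pdf`, and `misc` are all assumptions chosen for illustration.

```python
import lzma
import os

DICT_SIZE = 64 * 1024  # deliberately tiny dictionary so the effect shows on small inputs


def xz_size(data: bytes) -> int:
    """Return the LZMA2-compressed size of data using the small dictionary above."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": DICT_SIZE}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


# Hypothetical stand-ins: random bytes are incompressible on their own, like JPG data.
jpgs = os.urandom(32 * 1024)    # the bare JPGs
pdf = jpgs                      # the PDF, simplified to an exact copy of the same bytes
misc = os.urandom(256 * 1024)   # unrelated data, much larger than the dictionary

adjacent = xz_size(jpgs + pdf + misc)    # the copy recurs within the dictionary window
separated = xz_size(jpgs + misc + pdf)   # the copy recurs far beyond the window

print("JPGs then PDF then misc:", adjacent, "bytes")
print("JPGs then misc then PDF:", separated, "bytes")
```

With the duplicate placed next to the original, the second copy should cost almost nothing. With the unrelated data in between, it is compressed as if it had never been seen.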

With 7-zip 9.32 alpha, using the 7z archive format, ultra compression level, LZMA2 algorithm, 256MB dictionary size, a word size of 128, 4GB solid block size, and 2 CPU threads, I get the following compression ratios:

  • PDF only: 93%
  • JPGs only: 95%
  • PDF and JPGs together: 47%
  • Misc. files only: 44%
  • Misc. files and PDF: 55%
  • Misc. files and PDF and JPGs: 63%

Since the misc. files are compressible to 44% of their original size, and the PDF and JPGs together are compressible to 47%, I would expect everything together to be compressible to somewhere in the 44-47% range, but due to the poor ordering of files by 7zip, I get a significantly worse result.
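To put a number on that expectation, here is a rough size-weighted estimate using the approximate file sizes and the ratios listed above. It assumes the two groups compress independently of each other, which is a simplification.

```python
# Approximate sizes in MB, taken from the file list in the question.
pdf_mb, jpg_mb, misc_mb = 200, 190, 500

pdf_jpg_ratio = 0.47   # PDF and JPGs compressed together
misc_ratio = 0.44      # misc. files compressed on their own
observed_ratio = 0.63  # everything in one archive, as measured

total_mb = pdf_mb + jpg_mb + misc_mb

# Size-weighted estimate, assuming the two groups do not affect each other.
expected_mb = (pdf_mb + jpg_mb) * pdf_jpg_ratio + misc_mb * misc_ratio
expected_ratio = expected_mb / total_mb
observed_mb = total_mb * observed_ratio

print(f"expected: {expected_mb:.1f} MB ({expected_ratio:.1%})")
print(f"observed: {observed_mb:.1f} MB ({observed_ratio:.0%})")
print(f"excess:   {observed_mb - expected_mb:.1f} MB")
```

The estimate comes out at roughly 45% of the original size, so the 63% result is on the order of 150MB larger than it needs to be.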

I have tried to alter the order in which 7zip compresses files by playing with file creation, modification, and access dates. I have tried moving the files to another folder and copying them back so they are rewritten to disk consecutively. I have even tried archiving all the JPGs in a zip file with store-level compression, so that their file size will approximately match the PDF. No matter what I do, I can't seem to get 7zip to compress the PDF and the JPGs without the misc. files in between.

Any ideas? I am unable to increase the dictionary size due to memory constraints.

Best Answer

I managed to solve this problem. The solution was to create an archive containing only the miscellaneous files, then select both the PDF and the JPGs and choose "add to archive" from the Explorer context menu. In the 7zip "Add to Archive" dialog, I chose the same compression settings and archive name as before.

This compressed the PDF and JPGs together - taking full advantage of their redundancy - then added them to the existing archive. It resulted in an overall 45% compression ratio, exactly what I was looking for.
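For anyone who prefers the command line, the same two-step idea might be scripted roughly as follows. This is only a sketch: I did this through the GUI, so it is an assumption that `7z a` on an existing archive behaves the same way, and the archive and file names here are hypothetical. The switches are meant to mirror the settings from the question (7z format, ultra, LZMA2, 256MB dictionary, word size 128, 4GB solid block, 2 threads). The script only builds and prints the commands; pass each list to `subprocess.run` to actually execute it.

```python
import shlex

# Switches intended to mirror the GUI settings described in the question.
SETTINGS = ["-t7z", "-mx=9", "-m0=lzma2", "-md=256m", "-mfb=128", "-ms=4g", "-mmt=2"]


def build_commands(archive, misc_paths, pdf_and_jpg_paths):
    """Return two '7z a' command lines: misc files first, then the PDF and JPGs."""
    first = ["7z", "a", *SETTINGS, archive, *misc_paths]
    # Second pass adds the PDF and JPGs to the same, already existing archive.
    second = ["7z", "a", *SETTINGS, archive, *pdf_and_jpg_paths]
    return [first, second]


# Hypothetical names: a folder of misc files, the PDF, and a folder of JPGs.
commands = build_commands("everything.7z", ["misc"], ["scans.pdf", "jpgs"])

for cmd in commands:
    print(shlex.join(cmd))
```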
