Why does a zip file appear larger than the source file, especially when it is text?

7-zip compression zip

I have a text file that is 19 bytes in size. After compressing it with zip and 7zip, the archive appears to be larger than the original. I have read Why is a 7zipped file larger than the raw file? as well as Why doesn't ZIP Compression compress anything?, but since this file is not already compressed, I would have expected it to shrink. Attached is a screenshot.

EDIT0

I took the example further by creating a file of random data with dd if=/dev/urandom of=sample.log bs=1G count=1 and compressing it with both zip and 7zip. Neither produced any reduction in size. Why is that?

Best Answer

As @kinokijuf said, there is a file header. But to expand upon that there are a few other things to understand about file compression.

A zip archive stores metadata alongside the file data. Each entry begins with a local header holding a signature (the magic number), the zip version needed to extract it and the file name, and the archive ends with a central directory listing all the files it contains.
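To make that overhead visible, here is a minimal sketch using Python's standard zipfile module rather than the zip command itself. The 19-byte payload and the name sample.txt are made up for illustration, and the exact byte counts will vary between archivers.

```python
import io
import zipfile

# Hypothetical 19-byte payload standing in for the file from the question.
payload = b"hello, zip archive!"

# Build a one-entry archive in memory; "sample.txt" is an illustrative name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.txt", payload)
archive = buf.getvalue()

# Each zip entry starts with the local file header signature "PK\x03\x04".
print("magic number:      ", archive[:4])
print("payload bytes:     ", len(payload))
print("archive bytes:     ", len(archive))
print("container overhead:", len(archive) - len(payload))
```

The archive comes out several times larger than the payload, because the headers and the file listing take up far more room than 19 bytes of data.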

Your file probably wasn't compressed at all. If you run unzip -v example.zip, you will probably see that the compressed size is no smaller than the original and that the method is listed as Stored. Even if 19 bytes were compressible by DEFLATE (the main compression method used by zip), the overhead would probably exceed whatever was saved.
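The per-entry sizes that unzip reports can also be read programmatically. The sketch below assumes Python's zipfile module and a made-up 19-byte payload. Note that, as far as I know, Python's zipfile applies DEFLATE whenever it is asked to and does not fall back to storing, so it shows what DEFLATE alone does to such a tiny input.

```python
import io
import zipfile

payload = b"hello, zip archive!"  # hypothetical 19-byte file content

# Write a single DEFLATE-compressed entry into an in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.txt", payload)

# Re-open the archive and read the sizes recorded for the entry.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    info = zf.getinfo("sample.txt")

print("original size:  ", info.file_size)
print("compressed size:", info.compress_size)
```

For an input this small with no repeated substrings, the DEFLATE stream is not smaller than the original, before the archive headers are even counted.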

In other cases, such as PNG images, the data is already compressed, so zip will just store it. DEFLATE gains little or nothing on data that is already compressed, and zip typically falls back to storing a file when compressing it would not make it smaller.
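The same reasoning applies to the random file from the question's edit: random bytes contain no patterns for DEFLATE to exploit. A small sketch, assuming Python's zlib module as a stand-in for the DEFLATE step inside zip, and 1 MiB of random data instead of the 1 GB used in the question:

```python
import os
import zlib

# 1 MiB of random bytes, a scaled-down stand-in for the /dev/urandom file.
raw = os.urandom(1 << 20)

# Compress at the maximum level, then try compressing the result again.
packed = zlib.compress(raw, 9)
repacked = zlib.compress(packed, 9)

print("random input:     ", len(raw))
print("after DEFLATE:    ", len(packed))
print("compressed twice: ", len(repacked))
```

Each pass adds a little framing overhead and removes nothing, which is why neither zip nor 7zip could shrink sample.log.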

If on the other hand you had a lot of text files, and their size was more than a few kilobytes each, you would get great savings by putting them all into a single zip archive.

You will get your best savings when compressing very regular, formatted data, like a text file containing a SQL dump. For example, I once had a dump of a small SQL database at around 13MB. I ran zip -9 dump.zip dump.sql on it and ended up with around 1MB afterwards.
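You can reproduce the effect on synthetic data. This is only a sketch: the table name, columns and row count are invented, and zlib is used here as an approximation of what zip -9 does to a single file.

```python
import zlib

# Build a fake SQL dump; the schema and values are purely illustrative.
rows = [
    f"INSERT INTO users (id, name, active) VALUES ({i}, 'user_{i}', 1);\n"
    for i in range(20000)
]
dump = "".join(rows).encode("utf-8")

# Level 9 is the maximum DEFLATE effort, comparable to zip -9.
packed = zlib.compress(dump, 9)
ratio = len(packed) / len(dump)

print("dump bytes:      ", len(dump))
print("compressed bytes:", len(packed))
print(f"ratio:            {ratio:.2%}")
```

Because nearly every line repeats the same statement skeleton, the compressed output is a small fraction of the input.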

Another factor is your compression level. Many archivers by default will only compress at mid-level, going for speed over reduction. When compressing with zip, try the -9 flag for maximum compression (I think the 3.x manual says that compression levels are only supported by DEFLATE at this time).
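To get a feel for what the level changes, the sketch below compresses the same buffer at several levels. It assumes Python's zlib module, which exposes a comparable 0 to 9 scale for DEFLATE; the sample text is made up, and the numbers from the zip command itself may differ.

```python
import zlib

# Semi-regular sample text; the line format is invented for illustration.
lines = [f"row {i}: status=ok value={(i * i) % 9973}\n" for i in range(20000)]
text = "".join(lines).encode("ascii")

# Level 0 stores without compressing, 1 is fastest, 9 is maximum effort.
sizes = {level: len(zlib.compress(text, level)) for level in (0, 1, 6, 9)}

print("input bytes:", len(text))
for level, size in sizes.items():
    print(f"level {level}: {size} bytes")
```

Level 0 only adds framing, while the higher levels trade more CPU time for a smaller result. How much extra you gain from 9 over the default depends heavily on the data.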

TL;DR

The overhead of the archive exceeded any gains you may have gotten from compressing the file. Try putting larger text files in there and see what you get. Use the -v flag when zipping to see your savings as you go.