I had a large (~60G) compressed file (tar.gz). I used split to break it into 4 parts and then cat to join them back together.
However, when I now try to estimate the size of the uncompressed file, it is reported as smaller than the compressed file. How is this possible?
$ gzip -l myfile.tar.gz
compressed uncompressed ratio uncompressed_name
60680003101 3985780736 -1422.4% myfile.tar
Best Answer
This is caused by the size of the field used to store the uncompressed size in gzipped files: it is only 32 bits, so gzip can only store sizes of files up to 4 GiB. Anything larger is compressed and uncompressed correctly, but the stored size wraps around (it is recorded modulo 2^32), so gzip -l gives an incorrect uncompressed size.

So splitting the tarball and reconstructing it hasn't caused this, and shouldn't have affected the file. If you want to make sure, you can check it with gzip -tv.

See "Fastest way of working out uncompressed size of large GZIPPED file" and the gzip manual for more details.
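
To make the 32-bit limit concrete, here is a minimal sketch that compares the size stored in the gzip trailer with the size obtained by actually decompressing the data. The function names stored_size and true_size are made up for this illustration and are not part of gzip. It assumes a single-member gzip file whose name does not start with a dash, and a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Illustrative helpers only; the names are hypothetical, not gzip features.

# Stored size: the last 4 bytes of a gzip member hold the uncompressed
# length modulo 2^32, in little-endian order. The bytes are read one at a
# time so the result does not depend on the host's byte order.
stored_size() {
    set -- $(tail -c 4 "$1" | od -An -tu1)
    echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}

# True size: decompress to a pipe and count the bytes. This reads the whole
# archive, so it is slow for a 60 GB file, but it is not limited to 32 bits.
true_size() {
    gzip -dc "$1" | wc -c | tr -d ' '
}

# Example use (not run here):
#   stored_size myfile.tar.gz
#   true_size myfile.tar.gz
```

For a file larger than 4 GiB, the two values should differ by a whole multiple of 4294967296 (2^32). The streaming count is the one to trust, since it does not rely on the trailer field at all.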