Once a file is gzipped, is there a way of quickly querying it to find out the uncompressed file size (without decompressing it), especially in cases where the uncompressed file is > 4GB in size?
According to the RFC https://tools.ietf.org/html/rfc1952#page-5 you can query the last 4 bytes of the file, but if the uncompressed file was > 4GB then the value just represents the uncompressed size modulo 2^32.
This value can also be retrieved by running gunzip -l foo.gz; however, the "uncompressed" column again just contains the uncompressed size modulo 2^32, presumably because it is reading the footer as described above.
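For illustration, here is a minimal Python sketch of reading that footer directly. The function name gzip_isize is made up for this example, and it assumes a well-formed file with no trailing padding; for a file made of several concatenated gzip members it only sees the last member's field.

```python
import os
import struct

def gzip_isize(path):
    """Return the ISIZE trailer field of a gzip file.

    Per RFC 1952 this is the uncompressed size modulo 2**32, stored
    little-endian in the last 4 bytes of the (last) gzip member.
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)              # jump to the last 4 bytes
        (isize,) = struct.unpack("<I", f.read(4))
    return isize
```

This is instant regardless of file size, but it has exactly the limitation described above: a result of 100 could mean 100 bytes or 100 + 2^32 bytes.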
I was just wondering if there is a way of getting the uncompressed file size without having to decompress the file first. This would be especially useful where gzipped files contain 50GB+ of data and would take a while to decompress using methods like gzcat foo.gz | wc -c
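For comparison, the gzcat foo.gz | wc -c approach can be sketched in Python as below. It still decompresses everything, so it is not the shortcut being asked for; it only avoids writing the output anywhere. The function name gzip_uncompressed_size and the 64 KiB read size are choices made for this example, and it assumes a well-formed file (possibly several concatenated members, no trailing garbage).

```python
import zlib

def gzip_uncompressed_size(path, chunk_size=64 * 1024):
    """Count uncompressed bytes by streaming the file through zlib."""
    total = 0
    # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(wbits=31)
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_size)
            if not buf:
                break
            while buf:
                # Output is counted and discarded immediately; the small
                # read size keeps each decompressed piece bounded.
                total += len(d.decompress(buf))
                if d.eof:
                    # End of one gzip member: the remaining bytes belong
                    # to the next member, so start a fresh decompressor.
                    buf = d.unused_data
                    d = zlib.decompressobj(wbits=31)
                else:
                    buf = b""
    return total
```

Unlike the footer, the count here is an ordinary Python integer, so it is not truncated at 4GB.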
EDIT: The 4GB limitation is openly acknowledged in the man page of the gzip utility included with OSX (Apple gzip 242):
BUGS
According to RFC 1952, the recorded file size is stored in a 32-bit
integer, therefore, it can not represent files larger than 4GB. This
limitation also applies to -l option of gzip utility.
Best Answer
I believe the fastest way is to modify gzip so that testing in verbose mode outputs the number of bytes decompressed; on my system, with a 7761108684-byte file, I get
To modify gzip (1.6, as available in Debian), the patch is as follows: