Gzip: unexpected end of file, how to read the file anyway?


I have a job on a batch system that runs for an extremely long time and produces tons of output. So much, in fact, that I have to pipe the standard output through gzip to keep the batch node from filling its work area and subsequently crashing.

longscript | gzip -9 > log.gz

Now, I would like to investigate the output of the job while it is still running.
So I do this:

gunzip log.gz

This takes a very long time, as it is a huge file (several GB). I can see the output file being created while gunzip runs, and I can look at it while it is being built.

tail log
> some-line-of-the-log-file
tail log
> some-other-line-of-the-log-file

However, gunzip eventually reaches the current end of the gzipped file. Since the job is still running and gzip is still writing the file, there is no proper footer yet, so this happens:

gzip: log.gz: unexpected end of file

After this, the extracted log file is deleted, as gzip assumes that the corrupted extracted data is of no use to me. I, however, disagree: even if the last couple of lines are scrambled, the output is still highly interesting to me.

How can I convince gzip to let me keep the "corrupted" file?

Best Answer

Apart from the very end of the file, you will be able to see the uncompressed data with zcat (or gzip -dc, or gunzip -c):

zcat log.gz | tail

or

zcat log.gz | less

or

zless log.gz
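To try this without waiting for the real job, here is a small sketch that fakes a half-written archive by chopping a finished .gz file in two. It assumes GNU gzip plus the common `seq` and `head -c` utilities; the file names full.gz, partial.gz, partial.log and gzip.err are made up for the demonstration. The point it illustrates is that decompressing to stdout keeps whatever could be decoded, while the "unexpected end of file" error only shows up as a message and a nonzero exit status.

```shell
#!/bin/sh
# Simulate a log.gz that is still being written: compress some sample
# lines, then keep only the first half of the compressed bytes, so the
# gzip trailer at the end of the file is missing.
seq 1 200000 | gzip -9 > full.gz
size=$(wc -c < full.gz)
head -c "$(( $size / 2 ))" full.gz > partial.gz

# Decompress to stdout instead of in place. GNU gzip writes out what it
# has decoded before reporting the truncation, and since it is not
# creating an output file itself, it has nothing to delete afterwards.
gzip -dc partial.gz > partial.log 2> gzip.err
status=$?

echo "gzip exit status: $status"   # nonzero because the stream is cut short
cat gzip.err                       # the "unexpected end of file" message
wc -l < partial.log                # number of complete lines recovered
tail -n 2 partial.log              # the very last line may be incomplete
```

On the real job the same pattern applies: `zcat log.gz > log` or `zcat log.gz | tail` will complain at the end, but the output you already got stays where you sent it.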

gzip buffers data for obvious reasons (it needs to compress the data in chunks), so even though the program may already have written some output, that data may not yet have reached the log.gz file.

You may also store the uncompressed log with

zcat log.gz > log

... but that would be silly since there's obviously a reason why you compress the output in the first place.
