If you just wanted to insert another line on top, it would be simple.
```shell
echo some line | gzip > newfile.gz
cat newfile.gz oldfile.gz > result.gz
```
gzip allows concatenation, as long as you don't mind it reporting a wrong uncompressed file size when you just look at the file without uncompressing it. Also, some programs cannot handle such multi-member files; WinRAR, for example.
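To see both behaviors concretely (file contents here are illustrative):

```shell
# Two independent gzip members concatenated into one file:
printf 'new first line\n' | gzip > newfile.gz
printf 'old content\n'    | gzip > oldfile.gz
cat newfile.gz oldfile.gz > result.gz

zcat result.gz      # decompresses both members, in order
gzip -l result.gz   # the "uncompressed" column reflects only the last member
```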
To get closer to what you actually want, the question is whether your gzip file is made up of blocks that function entirely independently of one another, and if so, how to find the block boundaries.
If you knew you wanted to do this beforehand and created the gzip by concatenating two independent gzip files in the first place, it would be easy to solve. On an arbitrary gzip file, however, if it can be done at all, it would require much more in-depth knowledge of the gzip file format (deflate blocks are not byte-aligned and may reference data from earlier blocks, so there is no boundary you can simply cut at).
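The "created it that way in the first place" case can be sketched like this, assuming you control archive creation (file names and contents are illustrative):

```shell
# Build the archive as two independent gzip members up front,
# recording the first member's size so it can be skipped later.
printf 'header line\n'  | gzip > part1.gz
printf 'rest of data\n' | gzip > part2.gz
skip=$(stat -c %s part1.gz)        # byte offset of the member boundary
cat part1.gz part2.gz > whole.gz

# Later: drop the first member without recompressing anything.
tail -c +$((skip + 1)) whole.gz | zcat
```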
I remember there was such a program for bzip2 (but I forgot its name); it created a bzip2 block map that allowed direct access to specific offsets without uncompressing everything that comes before them.
Bottom line, though: most people just recompress. You likely won't be able to avoid rewriting the entire file anyway, and writing files is usually slower than gzip can compress data, so even if you managed to pull it off, you'd probably save some CPU cycles, but no time.
Not a solution to your gzip question, but... don't use `tail` to get rid of the first line; it's probably very inefficient compared to a `sed 1d` or whatever. No need to count all lines of a file just to get rid of the first one.
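For the record, both of these drop the first line of a stream (the contents are illustrative):

```shell
printf 'first\nsecond\nthird\n' | sed 1d        # prints: second, third
printf 'first\nsecond\nthird\n' | tail -n +2    # same output
```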
Best Answer
This is what you asked for. But it may not be what you really want. Use at your own risk.
If the 420GB file is stored on a filesystem with sparse file and punch-hole support (e.g. `ext4`, `xfs`, but not `ntfs`), it would be possible to read a file and free the already-read blocks using `fallocate --punch-hole`. However, if the process is cancelled for any reason, there may be no way to recover, since all that's left is a half-deleted, half-uncompressed file. Don't attempt it without making another copy of the source file first.

Very rough proof of concept:
The `urandom.img.gz` file occupies 76% of available space, so it can't be uncompressed directly. Pipe the uncompressed result to `md5sum` so we can verify it later.

Uncompress while hole punching (this is very rough, without any error checking whatsoever):
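A minimal sketch of such a loop, assuming GNU `dd`, util-linux `fallocate`, and a filesystem with hole-punching support. The file name and sizes here are illustrative (scaled far down from the 6 GB scenario), and there is deliberately no error checking:

```shell
# Sketch only - no error checking. Do not use on data you care about.
# Create a small compressed stand-in for urandom.img.gz:
seq 1 200000 | gzip > urandom.img.gz
expected=$(seq 1 200000 | md5sum)

size=$(stat -c %s urandom.img.gz)
chunk=$((64 * 1024))                       # read and punch in 64 KiB steps
blocks=$(( (size + chunk - 1) / chunk ))

result=$(
  for i in $(seq 0 $((blocks - 1))); do
      # emit one chunk downstream, then free its blocks in the source file
      dd if=urandom.img.gz bs="$chunk" skip="$i" count=1 2>/dev/null
      fallocate --punch-hole --offset $((i * chunk)) --length "$chunk" urandom.img.gz
  done | gunzip | md5sum
)
echo "$result"
```

Afterwards `ls -l` still shows the file's original apparent size, but `du` shows that (almost) no blocks remain allocated.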
Result: the checksum matches, and the on-disk size of the source file shrank from 6GB to 0 while it was being uncompressed in place.
But there are so many things that can go wrong... better not to do it at all, or, if you really have to, at least use a program that does saner error checking. The loop above does not guarantee at all that the data was read and processed before it gets deleted. If `dd` or `gunzip` returns an error for any reason, `fallocate` still happily tosses the data away... so if you must use this approach, better write a saner `read-and-eat` program.