I am sending a compressed file through a pipe, either from a local file or from a network location. On the receiving end, I would like to detect the type of compression and use the appropriate decompression utility (gzip, bzip2, xz, etc.) to extract it. The commands look as follows:
Local:
cat misteryCompressedFile | [compressionUtility] -d -fc > /opt/files/uncompressedfile
Over network:
ssh user@ipaddr "cat misteryCompressedFile" | [compressionUtility] -d -fc > /opt/files/uncompressedfile
One can tell the type of compression used, even when no extension (e.g., .gz or .bz2) is provided, by looking at the first few bytes of the file. For example, if I use xxd to look at the first few bytes of two compressed files, I see 1f8b 0808 for gzip and 425a 6836 for bzip2.
However, while still using a pipe, how can I check the first incoming bytes to select the proper decompression utility for the rest of the file?
So if the unknown compressed file is gzip, the command would be:
cat misteryCompressedFile | gzip -d -fc > /opt/files/uncompressedfile
and if the unknown compressed file is bzip2, the command would be:
cat misteryCompressedFile | bzip2 -d -fc > /opt/files/uncompressedfile
Is it possible to make this decision on the fly within the pipeline, without having to download the entire file first and then decide which utility to use for decompression?
Best Answer
Yes, you can do that in the pipeline, without having to read the whole file.
This first script fragment illustrates the mechanism by which we will intercept and inspect the header and pass it on. Notice that we print the header to stderr (>&2), yet it continues to appear in the output:
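A minimal sketch of such a fragment, assuming a POSIX shell with dd and od available. The function name peek_header is hypothetical, and od stands in for xxd here only because it is specified by POSIX:

```shell
# Hypothetical helper: show the first 4 bytes of stdin on stderr,
# then pass the complete stream (header included) to stdout.
peek_header() {
    # bs=1 count=4 makes dd read exactly 4 bytes and leave the rest unread.
    # od turns them into hex text so they can be held in a shell variable.
    hex=$(dd bs=1 count=4 2>/dev/null | od -An -tx1)

    # Unquoted on purpose: word splitting normalises od's spacing.
    echo "header:" $hex >&2

    # Re-emit each header byte: hex -> octal escape -> raw byte.
    for h in $hex; do
        printf "\\$(printf '%03o' "0x$h")"
    done

    # Forward whatever is still waiting on stdin.
    cat
}
```

It can be dropped into the middle of an existing pipeline, for example: cat misteryCompressedFile | peek_header | gzip -d -fc > /opt/files/uncompressedfile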
The key is using dd, the file conversion utility, with a small block size (bs=1), so that it reads exactly the header bytes and leaves the rest of the stream in the pipe.
Expanding on that, this is a working solution. We'll use a temporary file to store the binary header. If it doesn't see one of the two 4-byte headers, it does nothing:
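A sketch of what such a solution could look like, assuming a POSIX shell with mktemp, dd, od, gzip and bzip2 installed. The function name autodecompress is hypothetical. As an assumption of this sketch, it matches only the fixed leading bytes of each header (1f 8b for gzip, 42 5a 68, i.e. "BZh", for bzip2), because the remaining bytes vary with flags and compression level, and it returns a non-zero status when nothing matches:

```shell
# Hypothetical helper: decompress stdin to stdout, choosing the tool
# from the first 4 bytes of the stream.
autodecompress() {
    hdr_file=$(mktemp) || return 1

    # Read exactly 4 bytes into the temp file; the rest stays in the pipe.
    dd bs=1 count=4 of="$hdr_file" 2>/dev/null

    # Hex string of the header, e.g. 1f8b0808 or 425a6836.
    magic=$(od -An -tx1 "$hdr_file" | tr -d ' \n')

    case $magic in
        1f8b*)   tool=gzip ;;    # gzip magic number
        425a68*) tool=bzip2 ;;   # "BZh", followed by the level digit
        *)       tool= ;;        # unknown header: do nothing
    esac

    status=1
    if [ -n "$tool" ]; then
        # Replay the saved header, then the remainder of stdin ("-").
        cat "$hdr_file" - | "$tool" -d -fc
        status=$?
    fi

    rm -f "$hdr_file"
    return "$status"
}
```

With that defined, the network case from the question becomes: ssh user@ipaddr "cat misteryCompressedFile" | autodecompress > /opt/files/uncompressedfile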