Gzip decompress on file with other extension

filenames gzip

Is it possible to use gzip to decompress a gzipped file, without the gz extension, and without moving the file?

Best Answer

You can pass the -S option to use a suffix other than .gz.

gunzip -S .compressed file.compressed
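As a minimal sketch of the -S option in action, the following round trip compresses a throwaway file with a non-standard suffix and decompresses it again. The scratch directory and the names workdir and file are only illustrative, and the sketch assumes GNU gzip and mktemp are installed.

```shell
# Work in a scratch directory so nothing real is touched (illustrative names).
workdir=$(mktemp -d)
cd "$workdir"

printf 'Hello World\n' > file

# Compress with a custom suffix: creates file.compressed and removes file.
gzip -S .compressed file

# Decompress, telling gunzip which suffix to strip: recreates file.
gunzip -S .compressed file.compressed

cat file
```

Without -S, gunzip would refuse file.compressed because it does not recognise the suffix.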

If you want the uncompressed file to have some other name, run

gzip -dc <compressed-file >uncompressed-file
gunzip <compressed-file >uncompressed-file

(these commands are equivalent).
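To illustrate the equivalence, here is a small sketch that decompresses the same data both ways and compares the results. The file names (original, blob, out1, out2) are made up for the example, and GNU gzip is assumed.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'Hello World\n' > original
gzip -c < original > blob        # gzipped data in a file with no .gz suffix

gzip -dc < blob > out1           # explicit "decompress to stdout"
gunzip   < blob > out2           # gunzip reading stdin also writes to stdout

# Both outputs should match each other and the original input.
cmp out1 out2 && cmp out1 original && echo "both outputs match the original"
```

Because both commands only read from standard input, the compressed file itself is left in place and its name never matters.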

Normally unzipping restores the name and date of the original file (when it was compressed); this doesn't happen with -c.

If you want the compressed file and the uncompressed file to have the same name, you can't do it directly: you need to either rename the compressed file or rename the uncompressed file. In particular, gzip removes and recreates its target file, so if you need to modify the file in place because you don't have write permission in the directory, you need to use -c or redirection.

cp somefile /tmp
gunzip </tmp/somefile >|somefile
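A slightly more defensive variant of the two commands above might use mktemp for the temporary copy instead of a fixed name in /tmp, and remove the copy afterwards. This is only a sketch: the variable names workdir and tmpcopy are hypothetical, and the first lines merely create test data.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'Hello World\n' | gzip -c > somefile   # gzipped data, no .gz suffix

tmpcopy=$(mktemp)                # unpredictable name instead of /tmp/somefile
cp somefile "$tmpcopy"

# >| overwrites somefile even if the shell's noclobber option is set.
# The existing file is rewritten, not removed and recreated.
gunzip < "$tmpcopy" >| somefile && rm -f "$tmpcopy"

cat somefile
```

If gunzip fails, the temporary copy is kept, so the compressed data is not lost.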

Note that gunzip <somefile >somefile will not work, because the shell truncates somefile to 0 bytes before gunzip starts reading it. Even if you could avoid the truncation, gunzip would end up reading back its own output; either way, this can't be done in place.

Related Solutions

Why do tar and gzip files usually have a file extension

Originally, on Unix systems, the extensions on file names were a matter of convention. They allowed a human being to choose the right program to open a file. The modern convention is to use extensions in most cases; common exceptions are:

Only regular files have an extension, not directories or device names. The mere fact of being a directory or device is enough file type indication.
Executables that are meant to be invoked directly don't have an extension. The mere fact of being executable is enough information for the user, and the kernel doesn't care about file names.
Files beginning with a word in all caps are often text files, e.g. README, TODO. Sometimes there is an additional part that indicates a subcategory, e.g. INSTALL.linux, INSTALL.solaris.
Files whose name begins with a dot are configuration or state files of a particular application, and often don't have an extension, e.g. .bashrc, .profile, .emacs.
There are a few traditional cases, e.g. Makefile.

(These are common cases, not hard-and-fast rules.)

Most binary file formats also contain some kind of header that describes properties of the file, and typically allows the file format to be identified through magic numbers. The file command looks at this information and shows you its guesses.
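For gzip specifically, the magic number is the two bytes 1f 8b at the start of the stream. The sketch below checks for them by hand with od; the file name mystery and the variable names are illustrative. In practice, running file on the name does a much more thorough job.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'Hello World\n' | gzip -c > mystery    # gzipped data, no extension at all

# Dump the first two bytes as hex and strip the whitespace od adds.
magic=$(od -An -tx1 -N2 mystery | tr -d ' \n')

if [ "$magic" = "1f8b" ]; then
    echo "mystery looks like gzip data"
else
    echo "mystery does not look like gzip data"
fi
```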

Sometimes the file extension gives more information than the file format, sometimes it's the other way round. For example, many file formats consist of a zip archive: Java libraries (.jar), OpenOffice documents (.odt, …), Microsoft Office documents (.docx, …), etc. Another example is source code files, where the extension indicates the programming language, which can be difficult for a computer to guess automatically from the file contents. Conversely, some extensions are wildly ambiguous, for example .o is used for compiled code files (object files), but inspection of the file contents usually easily reveals what machine type and operating system the object file is for.

An advantage of the extension is that it's a lot faster to recognize it than to open the file and look for magic sequences. For example completion of file names in shells is almost always based on the name (mainly the extension), because reading every file in a large directory can take a long time whereas just reading the file names is fast enough for a Tab press.

Sometimes changing a file's extension can allow you to say how a file is to be interpreted, when two file formats are almost, but not wholly identical. For example a web server might treat .shtml and .html differently, the former undergoing some server-side preprocessing, the latter being served as-is.

In the case of gzip archives, gzip won't recompress files whose name ends in .gz, .tgz and a few other extensions. That way you can run gzip * to compress every file in a directory, and already compressed files are not modified.
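A quick way to see this behaviour, assuming GNU gzip, is the sketch below. The file names notes.txt and old.gz are invented for the demonstration. GNU gzip prints a warning for the file it skips and reports that through its exit status, so both are deliberately ignored here.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'plain text\n' > notes.txt
printf 'already packed\n' | gzip -c > old.gz
before=$(cksum < old.gz)

# Compress everything; old.gz is skipped with a warning on stderr.
gzip * 2>/dev/null || true

after=$(cksum < old.gz)
[ "$before" = "$after" ] && echo "old.gz was left unchanged"
ls
```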

How to create a gzip file without .gz file extension

This does NOT work:

# echo Hello World > example.txt
# gzip < example.txt > example.txt # WRONG!
# file example.txt
example.txt: gzip compressed data, from Unix, last modified: Thu Mar 21 19:45:29 2013
# gunzip < example.txt
<empty file>

This is a race condition:

# echo Hello World > example.txt
# dd if=example.txt | gzip | dd of=example.txt # still WRONG!
# gunzip < example.txt 
Hello World # may also be empty

The problem is that > example.txt (or dd of=example.txt, for that matter) truncates the file before the other process has had a chance to read it. So there is no obvious solution, which is why you should stick to mv.

There are a number of ways you could cheat. You can open the file, then unlink it - the file will continue to exist until you close it - and then create a new file with the same name and write the gzipped data to that. However I do not know an obvious way to coerce bash to use that, and even if I did, my answer would still be:

Don't even do it.

If gzip fails for any reason, or any problem occurs, like you running out of space while gzipping (because other processes are writing, or gzip result is larger than the input - which happens for random data - etc.), you just lost your file. Congratulations!

Create a separate file and mv on success. That's the simplest, easiest to understand, and most reliable method you will ever find.
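One way this might look in practice is sketched below. The temporary file is created next to the original (the template name example.txt.XXXXXX is just an illustration) so that mv is a plain rename on the same filesystem, and the original is replaced only if gzip reports success.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'Hello World\n' > example.txt

# Create the temporary file in the same directory as the target.
tmp=$(mktemp ./example.txt.XXXXXX)

if gzip -c < example.txt > "$tmp"; then
    mv "$tmp" example.txt        # replace the original only after success
else
    rm -f "$tmp"                 # on failure the original is left untouched
fi

gunzip < example.txt
```

Note that mktemp creates the file with restrictive permissions, so you may want to restore the original mode with chmod after the mv.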
