Wget and curl saving web page as gibberish (encrypted?)


When I download https://www.wired.com/category/security/ using either wget or curl, the result is gibberish/encrypted.

Is it possible (and if so what is the correct way) to save that web page (unencrypted / plain HTML) from the command line?

Executive summary:

The downloaded file is gzip-compressed, not encrypted. Decompress it to get the plain HTML.

Detailed answer


wget https://www.wired.com/category/security/

This produces a downloaded index.html file.

Running the file command on the downloaded file shows:

$ file index.html 
index.html: gzip compressed data, from Unix

Renaming the file and decompressing it turns it into an HTML document:

$ mv index.html index.html.gz
$ gunzip index.html.gz 
$ file index.html 

index.html: HTML document, UTF-8 Unicode text, with very long lines, with overstriking
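
If you'd rather not rename the file, the same check and decompression can be done in place. This is only a minimal sketch, assuming a POSIX shell with gzip installed; decompress_if_gzip is a hypothetical helper name of my own, not something wget or curl provides.

```shell
# Hypothetical helper: decompress a downloaded file in place if it is gzip data.
# Usage: decompress_if_gzip FILE
decompress_if_gzip() {
    f=$1
    # gzip -t tests the data read from stdin and fails if it is not gzip.
    # Reading from stdin avoids any dependence on a .gz file suffix.
    if gzip -t < "$f" 2>/dev/null; then
        # -d decompresses, -c writes to stdout; replace the file only on success.
        gzip -dc < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
        echo "decompressed: $f"
    else
        echo "not gzip, left as-is: $f"
    fi
}
```

Calling decompress_if_gzip index.html after the wget command above should leave a plain HTML file under the same name, and it does nothing to files that are already uncompressed.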

Extra info: why did wget download a compressed file?

As explained in How To Optimize Your Site With GZIP Compression:

Instead of downloading a large text file, modern HTTP servers and clients use compressed HTTP responses, which reduce the size of the transferred files.
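
To skip the manual decompression step, you can ask the client to handle the compressed response itself. With curl, the --compressed option requests a compressed response and decompresses it automatically. The sketch below wraps that in a small function; fetch_html is a hypothetical name of my own, and the other options are just my usual choices rather than anything required. As far as I know, newer wget releases (1.19.2 and later) have a similar --compression=auto option, but I have not checked which versions ship it on every platform.

```shell
# Hypothetical wrapper around curl (the name fetch_html is my own).
# Usage: fetch_html URL OUTPUT_FILE
fetch_html() {
    # --compressed           request a compressed response and decompress it
    # --location             follow redirects
    # --silent --show-error  hide the progress meter but still print errors
    # --output FILE          write the body to FILE instead of stdout
    curl --silent --show-error --location --compressed --output "$2" "$1"
}
```

For the page in the question, that would be fetch_html https://www.wired.com/category/security/ index.html, after which file index.html should report an HTML document rather than gzip data.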

