Linux – How to get uncompressed content when using recursive wget

Tags: command-line, compression, linux, wget

I am downloading many single pages with all their static content (JS, CSS, images…) via recursive wget. It turned out that content the server delivered compressed (gzip) is stored by wget in its compressed form, but I want the uncompressed form. I would rather not write another script that walks the directories recursively and tries to decompress whatever it can. So is there any way to get the content uncompressed?

CMD:

wget -E -H -k -K -p https://some.example

Even --header='Accept-Encoding: ' (telling the server not to use gzip) did not help.

Thanks for any advice 🙂

Best Answer

  1. Use httrack instead of wget
  2. Set up a decompression proxy. Squid with some 3rd-party plugin should be able to do that. I'm more familiar with Java, so I used LittleProxy, overrode the method getMaximumResponseBufferSizeInBytes(), and that was it. I wrote about the latter here.
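For completeness, the post-hoc route the question rules out is also fairly short. A hedged sketch (the `mirror` directory name and the `decompress_mirror` helper are illustrative, not from the question): walk the downloaded tree and decompress only the files that really are gzip streams, identified by their magic bytes.

```shell
# Hedged sketch: walk a wget mirror directory and decompress every file
# that is actually a gzip stream, keeping its original file name.
decompress_mirror() {
    find "$1" -type f | while read -r f; do
        # The first two bytes of any gzip stream are 1f 8b.
        if [ "$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]; then
            mv "$f" "$f.gz" && gunzip "$f.gz"   # restores "$f", uncompressed
        fi
    done
}

# "mirror" is an illustrative directory name, not from the question.
if [ -d mirror ]; then
    decompress_mirror mirror
fi
```

Checking the magic bytes instead of the file extension matters here, because wget stores the compressed body under the original URL's name with no `.gz` suffix.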

EDIT: Wget 1.19.2 introduced gzip Content-Encoding decompression (and it works).
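For reference, the relevant flag in that release is `--compression`. A sketch, assuming wget >= 1.19.2 built with zlib (`https://some.example` is the question's placeholder host):

```shell
# --compression=auto asks the server for gzip and lets wget transparently
# decompress responses before writing them to disk
# (assumes wget >= 1.19.2 built with zlib support).
wget --compression=auto -E -H -k -K -p https://some.example
```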
