How to tell wget to download files with URL-encoded names

character encoding, wget

I'm trying to download an entire website using wget and this is the command I use:

wget --recursive --no-clobber --page-requisites --convert-links --domains example.com --no-parent  http://www.example.com/en/

It works fine, with one problem: some files (mainly images) have names that contain Chinese characters, like this:

http://www.example.com/path/to/首页主KV3.jpg

After downloading, the file has been saved under this name:

??%96页主KV3.jpg

In the converted HTML page it is referenced like this, which does not match the saved file name and therefore results in a 404 error:

�%2596页主KV3.jpg

How can I prevent this inconsistency?

Best Answer

I fought with this today as well.

In my case the problem was with German letters like "ä", "ö" and "ü".

I fixed it by setting ALL of my language (locale) settings to UTF-8.

You can see a tutorial here:

https://perlgeek.de/en/article/set-up-a-clean-utf8-environment
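
As a rough sketch of what "setting everything to UTF-8" can look like in a shell session before running wget: the locale name en_US.UTF-8 is an assumption (it has to be installed on your system), and mirror_site is a hypothetical helper that only wraps the options from the question.

```shell
# Assumption: en_US.UTF-8 is installed. List what you have with `locale -a`
# and substitute any UTF-8 locale (for example C.UTF-8 or de_DE.UTF-8).
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8   # LC_ALL overrides every individual LC_* category

# Hypothetical helper: runs wget with the same options as in the question.
# $1 = domain to stay within, $2 = start URL
mirror_site() {
    wget --recursive --no-clobber --page-requisites --convert-links \
         --domains "$1" --no-parent "$2"
}

# Usage:
#   mirror_site example.com http://www.example.com/en/
```

After the exports, `locale charmap` should print UTF-8 if the locale was applied. The exports only affect the current shell session; making them permanent is what the linked tutorial covers.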
