I'm trying to download an entire website using wget
and this is the command I use:
wget --recursive --no-clobber --page-requisites --convert-links --domains example.com --no-parent http://www.example.com/en/
It's working just fine, but there is one problem. There are files (mainly images) whose names contain Chinese characters.
After downloading, the file has been saved with this name:
??%96页主KV3.jpg
In the HTML page it is referenced like this, which results in a 404 error:
�%2596页主KV3.jpg
How can I prevent this inconsistency?
Best Answer
I fought with this today as well.
In my case the problem was with German letters like "ä", "ö", "ü".
I fixed it by setting ALL my language settings to UTF-8.
You can see a tutorial here:
https://perlgeek.de/en/article/set-up-a-clean-utf8-environment
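As a rough sketch of what "setting everything to UTF-8" can look like for a single shell session: the snippet below assumes the en_US.UTF-8 locale is installed on your system (check with `locale -a`; any other installed UTF-8 locale should do). The helper name mirror_site is made up for illustration; it only wraps the command from the question.

```shell
# Assumption: en_US.UTF-8 is available on this machine (see `locale -a`).
# These exports affect only the current shell and the programs started from it.
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8

# Hypothetical helper: runs the wget command from the question
# against the URL passed as the first argument.
mirror_site() {
    wget --recursive --no-clobber --page-requisites --convert-links \
         --domains example.com --no-parent "$1"
}
```

Run `locale` afterwards to confirm that every category reports the UTF-8 locale, then call `mirror_site http://www.example.com/en/`. To make the setting permanent, follow the tutorial linked above instead of exporting the variables by hand each time.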