I am trying to mirror a Blogger site so that I have an exact copy of it
on my filesystem to view. I have tried issuing the following command on Linux:
wget -r -k -x -e robots=off --wait 1 http://your.site.here.blogspot.com/
I have also tried using the -D flag with a comma-separated list of domains
to follow (though I would prefer wget to follow any domain without my having
to specify them all). I have even tried changing the .com part of the URL
to the top-level domain for my country (.it); without this change, wget
retrieves only index.html and no other page, for a reason I don't understand
and would like to know (perhaps someone here can explain why).
So, even when I do a
wget -r -k -x -e robots=off --wait 1 http://your.site.here.blogspot.it/
several HTML files and the favicon.ico are downloaded, but none of the .png
images from Blogger are. Why is this, and how can I get wget to work
properly? I've read the wget man page but had no luck.
Thanks.
Best Answer
As jayhendren suggested, I had tried listing the domain bp.blogspot.com in the list following the -D flag. However, what I forgot to do was add the -H flag. Why wget requires the extra -H flag to be given separately from the list of domains to follow with -D is unclear to me, but it works.

Note: this works from Italy. Change .it to .com or to whatever other top-level domain applies if you want this to work from your location. Here is the command I ultimately specified to mirror the Blogger site, including the images served from the external domain:
Regards.