I have a problem with the following wget command:
wget -nd -r -l 10 http://web.archive.org/web/20110726051510/http://feedparser.org/docs/
It should recursively download all of the documents linked from the original page, but it downloads only two files (index.html and robots.txt).
How can I achieve a recursive download of this site?
Best Answer
wget by default honours the robots.txt exclusion standard when crawling pages, just as search engines do, and archive.org's robots.txt disallows the entire /web/ subdirectory. To override this behaviour, use -e robots=off.
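A sketch of the corrected command, keeping the original options and adding the override (the -e robots=off setting tells wget to ignore robots.txt; everything else is unchanged from the question):

```shell
# -e robots=off : ignore robots.txt (archive.org disallows /web/ for crawlers)
# -nd           : don't recreate the directory hierarchy locally
# -r -l 10      : recurse up to 10 levels deep
wget -e robots=off -nd -r -l 10 \
  http://web.archive.org/web/20110726051510/http://feedparser.org/docs/
```

Note that -e robots=off must appear as its own option with a space before robots=off; running it against a live mirror will fetch every page linked under that snapshot.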