Wget Recursive Download – How to Download Recursively with Wget


I have a problem with the following wget command:

wget -nd -r -l 10 http://web.archive.org/web/20110726051510/http://feedparser.org/docs/

It should recursively download all of the linked documents on the original site, but it downloads only two files (index.html and robots.txt).

How can I download this site recursively?

Best Answer

By default, wget honours the robots.txt standard when crawling pages, just like search engines do, and archive.org's robots.txt disallows the entire /web/ subdirectory. That is why the crawl stops after index.html and robots.txt. To override this, use -e robots=off:

wget -nd -r -l 10 -e robots=off http://web.archive.org/web/20110726051510/http://feedparser.org/docs/
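
To confirm that robots.txt is what stops the crawl, it can help to look at the rules the site publishes. The following is a minimal sketch, not part of wget: robots_disallows is a hypothetical helper name, and the parsing is deliberately simplified (it assumes a space after the colon and one User-agent line per rule group, which real robots.txt files do not always follow).

```shell
# Hypothetical helper: print the Disallow paths listed under "User-agent: *"
# from a robots.txt read on standard input.
# Simplified parsing: assumes "Field: value" with a space after the colon
# and a single User-agent line per group.
robots_disallows() {
    awk '
        { sub(/\r$/, "") }    # tolerate CRLF line endings
        tolower($1) == "user-agent:" { in_star = ($2 == "*"); next }
        in_star && tolower($1) == "disallow:" && $2 != "" { print $2 }
    '
}

# Usage (needs network access):
#   wget -qO- http://web.archive.org/robots.txt | robots_disallows
```

If /web/ shows up in the output, a recursive wget run without -e robots=off will not follow links under that path.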