I would like to crawl links under www.website.com/XYZ and download only the files that are under www.website.com/ABC.
I am using the following wget command to get the files I want:
wget -I ABC -r -e robots=off --wait 0.25 http://www.website.com/XYZ
This works perfectly with wget 1.13.4. The problem is that I have to run this command on a server that has wget 1.11, and there the same command also downloads other domains, such as:
www.website.de
www.website.it
...
How can I avoid this problem? I tried using
--exclude domains=www.website.de,www.website.it
but it kept downloading those domains.
Note also that I can't use --no-parent,
since the files I want are not below the starting directory (I want the files under website.com/ABC, reached by crawling links under website.com/XYZ).
Any hints?
Best Answer
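This is wrong:
--exclude domains=www.website.de,www.website.it
The right way is:
--exclude-domains www.website.de,www.website.it
From the wget man page:
--exclude-domains domain-list
    Specify the domains that are not to be followed.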
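This is a minimal sketch, assuming a POSIX shell. It is the question's original command with --exclude-domains added. The crawl_abc function name and the WGET variable are hypothetical. They exist only so that the command can be reused, or pointed at a different wget binary. The URL and domains come from the question.

```shell
#!/bin/sh
# Hypothetical: WGET may point at another wget binary (for example a newer
# build in your home directory); it defaults to the "wget" on PATH.
WGET="${WGET:-wget}"

# crawl_abc START_URL EXCLUDED_DOMAINS  (hypothetical helper name)
#   -r                  recurse from START_URL
#   -I ABC              only follow links into the /ABC directory (--include-directories)
#   -e robots=off       ignore robots.txt
#   --wait 0.25         wait 0.25 seconds between retrievals
#   --exclude-domains   never follow links into the comma-separated domain list
crawl_abc() {
    "$WGET" -r -I ABC -e robots=off --wait 0.25 \
        --exclude-domains "$2" \
        "$1"
}
```

For example: crawl_abc http://www.website.com/XYZ www.website.de,www.website.it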