Linux – wget recursive limited to children of URL path

linux, wget

I want to recursively download everything under the following URL path using wget:

www.example.com/A/B

So if that URL has links to www.example.com/A/B/C and www.example.com/A/B/D, these two should also be downloaded.

But I don't want anything outside the www.example.com/A/B path to be downloaded. For example, if www.example.com/A/B/C links back to www.example.com, the page www.example.com should not be downloaded.

What wget command should I use?

Best Answer

Use wget's --no-parent option together with --recursive. The wget manual describes it as follows:

--no-parent

Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
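As a minimal sketch of how this might look for the URL in the question: the helper name mirror_below is hypothetical (it is not part of wget), and the http:// scheme is an assumption, since the question gives none. The helper adds a trailing slash when it is missing, because for HTTP wget treats http://www.example.com/A/B (no slash) as a file named B inside A/, which would make A/ the directory that --no-parent protects. This assumes the URL you pass really is a directory-like path, not a single file.

```shell
# mirror_below: hypothetical helper, not part of wget itself.
# Recursively downloads a URL path without ascending above it.
mirror_below() {
    url=$1
    # Append a trailing slash if missing, so wget treats the last
    # path component as a directory rather than as a file name.
    case "$url" in
        */) ;;
        *) url="$url/" ;;
    esac
    # --recursive follows links; --no-parent refuses to go above $url.
    wget --recursive --no-parent "$url"
}

# Usage (performs real network requests, so left commented out):
# mirror_below "http://www.example.com/A/B"
```

Without the helper, the equivalent direct command would be: wget --recursive --no-parent http://www.example.com/A/B/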
