Mirror a blog with wget

Tags: mirroring, wget

I am trying to mirror a blog, eg www.example.com, with wget.

I use wget with the following options (shell variables are substituted correctly):

wget -m -p -H -k -E -np \
    -w 1 \
    --random-wait \
    --restrict-file-names=windows \
    -P $folder \
    -Q${quota}m \
    -t 3 \
    --referer=$url \
    -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' \
    -e robots=off \
    -D $domains \
    -- $url

The blog contains images that reside on other domains.

Even though I have specified the -p option (--page-requisites, which downloads all assets a page needs), these images are not downloaded unless I list each of their domains explicitly in the -D option.

If I omit the -D option then wget will follow every link outside www.example.com and download the whole internet.

Is it possible for wget to follow every link under www.example.com and download each page’s required assets, whether those reside on the same domain or not, without me having to specify each domain explicitly?

Best Answer

No. The only way is to specify the domains you want wget to follow, using -D or --domains=[domain list] (a comma-separated list).
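For example, if the blog's images turn out to live on one or two external hosts, a trimmed-down command might look like this. This is a sketch: `img.example-cdn.com` is a hypothetical domain standing in for whatever hosts you discover the images actually come from.

```shell
# -H allows wget to span to other hosts; -D then restricts that
# spanning to the comma-separated whitelist, so only the listed
# domains (hypothetical here) are crawled in addition to the blog.
wget -m -p -k -E -np \
    -H \
    -D www.example.com,img.example-cdn.com \
    -- http://www.example.com/
```

Without -H, wget never leaves the starting host at all; with -H but no -D, it follows every external link it finds, which is why the two options are used together.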
