How to recursively download a web page and its linked content from a URL

Tags: download, html, recursive, wget

I'd like to use wget to recursively download a web page. By "recursively" I mean that every other file the page points to should be downloaded as well, so that I can reasonably view its content offline.

The page I need to download also links to other pages on the same website, and I'd like those downloaded too.

Is it possible to do so with wget? Are there any flags for that?

Best Answer

Try:

wget -r -np -k -p http://www.site.com/dir/page.html

The args (see man wget) are:

  • -r Recurse into links, retrieving those pages too (the default maximum depth is 5; it can be changed with -l, as in the example after this list).
  • -np Never ascend to the parent directory (i.e., don't follow a "home" link and mirror the whole site; this keeps wget from going above /dir/ in the URL above).
  • -k Convert links so they point to the local copies.
  • -p Get page requisites such as stylesheets and images (this is an exception to the -np rule).

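For example, to fetch one extra level of linked pages and save everything with .html extensions (the -E/--adjust-extension flag, which helps browsers open the saved files), a command along these lines should work. The URL is just the placeholder from above, and --wait=1 is optional politeness toward the server:

# Sketch: recurse two levels deep, stay below /dir/, rewrite links for
# offline viewing, grab stylesheets/images, pause 1s between requests.
wget -r -l 2 -np -k -p -E --wait=1 http://www.site.com/dir/page.html
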
If I remember correctly, wget will create a directory named after the domain and put everything in there, but run it from an empty directory just in case.
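
If you'd rather skip that host-named directory, wget's -nH (--no-host-directories) and -P (--directory-prefix) options, both documented in man wget, let you choose where the files land. A minimal sketch, with offline-copy as an arbitrary target directory:

# Sketch: omit the www.site.com/ level, save everything under ./offline-copy/
wget -r -np -k -p -nH -P offline-copy http://www.site.com/dir/page.html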