Download all source files for a webpage

command-line, curl, web, wget

I want to download the source files for a webpage that is a database search engine. Using curl I'm only able to download the main HTML page. I would also like to download all the JavaScript files, CSS files, and PHP files that are linked from the webpage and referenced in the main HTML page. Is this possible to do using curl/wget or some other utility?
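(For reference, the kind of invocation I have been using looks roughly like the one below; the URL is just a placeholder for the actual search page. It saves only the single HTML document and does not fetch any of the scripts or stylesheets it references.)

curl -o results.html "https://example.com/search?query=term"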

Best Answer

First of all, you should check with the website operator that this is an acceptable use of their service. After that, you can do something like this:

wget -pk example.com

-p downloads the requisites needed to view the page (the JavaScript, CSS, images, etc.). -k converts the links in the page so that they work for local viewing. A fuller example follows the excerpt below.

From man wget:

-p, --page-requisites

This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.

[...]

-k, --convert-links

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
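As a fuller, illustrative sketch (the host name, the output directory, and the extra flags beyond -p and -k are assumptions, not part of the original command), the following also rewrites saved file extensions with -E and allows page requisites hosted on other domains, such as a CDN, to be fetched with -H:

wget -p -k -E -H -P ./local-copy "https://example.com/search?query=term"

Note that this can only save what the server sends to the browser: PHP runs on the server, so you will receive the HTML it generates rather than the PHP source itself.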
