When I use a browser to save this page:
http://maine.craigslist.org/fuo/
the links are saved with the full URL, so they still point to the original content, like this:
href="http://maine.craigslist.org/fuo/4323535885.html"
When I try to use wget instead:
$ wget --no-parent maine.craigslist.org/fuo
the links are saved as relative paths:
href="/fuo/4305913395.html"
I have tried options:
--spider
--page-requisites
--user-agent="Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:27.0) Gecko/20100101 Firefox/27.0"
but the links still come out without the base URL attached.
I have the rest of the script working: it parses out my location and builds a new list of links for furniture in my area. But I cannot figure out how to get the same output that I get when I save the page via Firefox.
I thought using wget would be simplest, but perhaps that isn't right. If I can achieve the same effect with some other software, as long as I can write a script to drive it, I will be happy.
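(For what it's worth, a post-processing sketch with sed would probably also work, assuming all of the listing links are root-relative paths on the same host; fuo.html is just a hypothetical output name:
$ wget -O fuo.html maine.craigslist.org/fuo
$ sed -i 's|href="/|href="http://maine.craigslist.org/|g' fuo.html
But I assume wget has a proper option for this.)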
Best Answer
The
--convert-links
option should do what you're looking for. More detail on what it does is below (summarized from
man wget
): after the download is complete, wget converts the links in the document to make them suitable for local viewing. Links to files that have been downloaded are rewritten as relative links; links to files that have not been downloaded are changed to include the host name and absolute path of the location they point to.
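For example, adding the option to the command from the question:
$ wget --no-parent --convert-links maine.craigslist.org/fuo
Since only the index page itself is downloaded here, every listing link points to a file wget did not fetch, so each href is rewritten as a full http://maine.craigslist.org/fuo/... URL, matching what Firefox produces when it saves the page.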