I try to download a file with wget and curl, and it is rejected with a 403 error (Forbidden).
I can view the file using the web browser on the same machine.
I try again with my browser's user agent, obtained from http://www.whatsmyuseragent.com. I do this:
wget -U 'Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0' http://...
and
curl -A 'Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0' http://...
but it is still forbidden. What other reasons might there be for the 403, and in what ways can I alter the wget and curl commands to overcome them?
(This is not about being able to get the file; I know I can just save it from my browser. It's about understanding why the command-line tools work differently.)
Update
Thanks to all the excellent answers given to this question. The specific problem I had encountered was that the server was checking the referrer. By adding this to the command line I could get the file using curl and wget.
The server that checked the referrer bounced through a 302 to another location that performed no checks at all, so a curl or wget of that site worked cleanly.
If anyone is interested, this came about because I was reading this page to learn about embedded CSS and was trying to look at the site's CSS for an example. The actual URL I was having trouble with was this, and the curl command I ended up with is
curl -L -H 'Referer: http://css-tricks.com/forums/topic/font-face-in-base64-is-cross-browser-compatible/' http://cloud.typography.com/610186/691184/css/fonts.css
and the wget command is
wget --referer='http://css-tricks.com/forums/topic/font-face-in-base64-is-cross-browser-compatible/' http://cloud.typography.com/610186/691184/css/fonts.css
Very interesting.
Best Answer
An HTTP request may contain more headers that are not set by curl or wget. For example:

- Cookie: given a cookie key=val, you can set it with the -b key=val (or --cookie key=val) option for curl.
- Referer: some servers check which page the request came from. The curl option for this is -e URL (or --referer URL).
- Authorization: for resources behind HTTP authentication, the credentials can be passed to curl with the -u user:password (or --user user:password) option.
- User-Agent: some servers serve different responses depending on the user agent, or reject user agents which do not start with Mozilla, or which contain Wget or curl (the default user agents of both tools fall foul of such checks).

The corresponding wget options are sketched below.

You can normally use the Developer tools of your browser (Firefox and Chrome support this) to read the headers sent by your browser. If the connection is not encrypted (that is, not using HTTPS), then you can also use a packet sniffer such as Wireshark for this purpose.
Besides these headers, websites may also trigger actions behind the scenes that change state. For example, when opening a page, a request may be performed in the background to prepare the download link, or a redirect may happen on the page. These actions typically make use of JavaScript, but there may also be a hidden frame to facilitate them.
If you are looking for a method to easily fetch files from a download site, have a look at plowdown, included with plowshare.