I'm trying to download this published journal article using cURL. It's the main page of an open-access article, so there should be no problem for anyone to view or download it. I then extract the pdfurl, which keeps changing.
Then I try to download the pdf:
curl -L -o test.pdf "http://www.sciencedirect.com/science/article/pii/S0378426612000817/pdfft?md5=6a85f34def09dd5cfb1d1b8feded0d51&pid=1-s2.0-S0378426612000817-main.pdf"
but every time it redirects me to the main page, which is then saved as an HTML page named "test.pdf".
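One way to confirm this symptom, as a minimal sketch: a real PDF file begins with the bytes %PDF-, so inspecting the first five bytes of the saved file shows whether curl fetched the article or an HTML page. The helper name is_pdf is hypothetical and not part of curl; the file name test.pdf is taken from the command above.

```shell
# Hypothetical helper (not part of curl): succeeds only if the file
# starts with the PDF magic bytes "%PDF-".
is_pdf() {
    [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Check the file written by the curl command above.
if is_pdf test.pdf; then
    echo "test.pdf looks like a PDF"
else
    echo "test.pdf is not a PDF (possibly the HTML landing page)"
fi
```

If the check fails, opening test.pdf in a text editor will usually show the HTML of the page that curl was redirected to.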
Best Answer
curl seems to handle redirects differently from wget by default. The direct download URL will involve some redirects, and it also requires the HTTP referer header to be set correctly after the first redirect (otherwise, you will get an HTML page).

First, you need to enable location redirects in curl with -L, and then enable curl's automatic handling of the referer header with --referer ";auto", that is,