Download an article with cURL given a dynamic download link

curl download pdf

I'm trying to download this published journal article using cURL. It's an open-access article, so there should be no problem for anyone to view or download it. From the article's main page I extract the PDF URL, which keeps changing.

Then I try to download the pdf:

curl -L -o test.pdf "http://www.sciencedirect.com/science/article/pii/S0378426612000817/pdfft?md5=6a85f34def09dd5cfb1d1b8feded0d51&pid=1-s2.0-S0378426612000817-main.pdf"

but it always redirects me to the main page, which is then saved as an HTML page named "test.pdf".
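A quick way to confirm what actually arrived is to look at the first bytes of the saved file: a PDF normally starts with the signature %PDF-, whereas the redirected main page starts with HTML markup. A minimal Python sketch of that check follows; the helper name looks_like_pdf and the script name check_pdf.py are illustrative choices, not part of any tool mentioned here.

```python
import sys

# PDF files normally begin with this byte signature.
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(path: str) -> bool:
    """Return True if the file at `path` begins with the PDF signature."""
    with open(path, "rb") as f:
        return f.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE


if __name__ == "__main__":
    # Check every file named on the command line, e.g. `python check_pdf.py test.pdf`.
    for name in sys.argv[1:]:
        verdict = "PDF" if looks_like_pdf(name) else "not a PDF (probably an HTML page)"
        print(f"{name}: {verdict}")
```

Running it on the "test.pdf" produced by the command above would report "not a PDF" whenever curl saved the redirected HTML page instead of the article.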

Best Answer

By default, curl does not follow redirects, whereas wget does. The direct download URL goes through several redirects, and it also requires the HTTP Referer header to be set correctly after the first redirect (otherwise, you will get an HTML page).

First, enable redirect following in curl with -L, and then enable curl's automatic handling of the Referer header with --referer ";auto", that is,

curl -L --referer ";auto" -o test.pdf URL-for-direct-download
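To illustrate what those two options do together, here is a Python sketch using only the standard library's urllib. It is not how curl works internally; it simply mirrors the described behaviour: follow each redirect and send the URL that issued the redirect as the Referer of the next request. The class RefererRedirectHandler and the function download are hypothetical names chosen for this example.

```python
import shutil
import sys
import urllib.request


class RefererRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects (like `curl -L`) and set the Referer header on each
    redirected request to the URL that issued the redirect, which is roughly
    what curl's `--referer ";auto"` does."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None:
            # The previous hop's URL becomes the Referer of the next request.
            new_req.add_header("Referer", req.full_url)
        return new_req


def download(url: str, out_path: str) -> str:
    """Save the final response body to `out_path` and return its MIME type."""
    # Passing a subclass of HTTPRedirectHandler replaces the default handler.
    opener = urllib.request.build_opener(RefererRedirectHandler())
    with opener.open(url) as resp, open(out_path, "wb") as out:
        shutil.copyfileobj(resp, out)
        return resp.headers.get_content_type()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python fetch.py URL-for-direct-download test.pdf
    print(download(sys.argv[1], sys.argv[2]))
```

If the returned MIME type is text/html rather than application/pdf, the server still did not accept the request chain.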

Related Solutions

How to download videos from Coursera with curl

As with any web service, the exact method changes often.

In the coursera-dl project, we try to handle all of this for you. The code involves following several redirects and keeping cookies in place (they change at almost every login), but you can run it with the --debug option to see how it calls curl, wget, or your preferred downloader.

Disclaimer: I am a contributor to the project.

Curl download multiple files with brace syntax

Update: This has been implemented in curl 7.19.0. See @Besworks' answer.

According to the man page, there is no way to keep the original file names other than passing multiple -O options. Alternatively, you can use your own file names:

curl "http://{one,two}.site.com" -o "file_#1.txt"

This saves http://one.site.com to file_one.txt and http://two.site.com to file_two.txt. Quote the URL so that the shell does not expand the braces itself before curl sees them.

You can even combine multiple globs:

curl "http://{site,host}.host[1-5].com" -o "#1_#2"

resulting in http://site.host1.com being saved to site_1, http://host.host1.com being saved to host_1 and so on.
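To make the #1/#2 mapping concrete, here is a small Python sketch that expands a globbed URL and an -o template into (URL, file name) pairs. It only handles {a,b} sets and plain numeric [N-M] ranges, which is a simplifying assumption for illustration; curl's own globbing also supports letter ranges, step values, and zero-padded ranges. The expand function is a hypothetical helper, not part of curl, and it does not download anything.

```python
import itertools
import re

# Matches a {a,b,...} set or a plain numeric [N-M] range (a simplified subset
# of curl's glob syntax: no letter ranges, steps, or zero padding).
GLOB = re.compile(r"\{([^}]*)\}|\[(\d+)-(\d+)\]")


def expand(url: str, template: str) -> list[tuple[str, str]]:
    """Return (url, output_name) pairs for a globbed URL and an -o template."""
    literals = []  # text between globs
    choices = []   # alternatives for each glob, in left-to-right order
    pos = 0
    for m in GLOB.finditer(url):
        literals.append(url[pos:m.start()])
        if m.group(1) is not None:
            choices.append(m.group(1).split(","))
        else:
            lo, hi = int(m.group(2)), int(m.group(3))
            choices.append([str(n) for n in range(lo, hi + 1)])
        pos = m.end()
    literals.append(url[pos:])

    pairs = []
    for combo in itertools.product(*choices):
        # Interleave the literal text with the chosen value of each glob.
        parts = [literals[0]]
        for value, lit in zip(combo, literals[1:]):
            parts += [value, lit]

        # Replace #N with the value of the N-th glob; unknown references stay as-is.
        def sub(ref):
            n = int(ref.group(1))
            return combo[n - 1] if 1 <= n <= len(combo) else ref.group(0)

        pairs.append(("".join(parts), re.sub(r"#(\d+)", sub, template)))
    return pairs


if __name__ == "__main__":
    for u, name in expand("http://{site,host}.host[1-5].com", "#1_#2"):
        print(u, "->", name)
```

Each pair corresponds to one transfer curl would make for the second command above, with #1 taken from the {site,host} set and #2 from the [1-5] range.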
