I want to download a file (https://discovery.ucl.ac.uk/1575442/1/Palmisanoetal.zip) from a public server via the R commands

db_url <- "https://discovery.ucl.ac.uk/1575442/1/Palmisanoetal.zip"
temp <- tempfile()
utils::download.file(db_url, temp, method = "curl")
This does not work on my Ubuntu 18.04.3 LTS (Bionic Beaver) system. I get the following error:
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
Error in utils::download.file(db_url, temp, method = "curl") :
'curl' call had nonzero exit status
I get the same error on the command line with curl (curl https://discovery.ucl.ac.uk/1575442/1/Palmisanoetal.zip).
I did some experimenting and googling and realized that I can access the file without any problems in my browser (Chromium). My system curl seems to lack a CA certificate that my browser has. I tried to determine which certificate this server is using with

openssl s_client -showcerts -servername discovery.ucl.ac.uk -connect discovery.ucl.ac.uk:443

and appended the result (QuoVadis EV SSL ICA G3) to my /etc/ssl/certs/ca-certificates.crt file. This did not solve the problem.
I don't want to solve this with the curl --insecure flag. I also don't have any control over https://discovery.ucl.ac.uk. I just want to access the file with R.
Best Answer
Curl is failing because that site is incorrectly configured
Certificates are used to sign other certificates, forming chains. A CA has a root certificate, which is trusted by operating systems and browsers. This root certificate is most commonly used to sign one or several intermediate certificates, which in turn are used to sign leaf certificates (which cannot sign other certificates); leaf certificates are what websites use.
Browsers and operating systems tend to carry only the root certificates, but to verify a leaf certificate (and establish a secure connection), a client needs the entire chain of certificates. In practice, that means that a website must not just supply its leaf certificate, it must also supply the used intermediate certificate. And discovery.ucl.ac.uk fails to do that. I'll show you.
Finding the problem
openssl is an X509 / SSL swiss army knife that proves very useful here. Relevant for us is the part of the s_client output after Certificate chain: it shows only one certificate. Feeding that -----BEGIN CERTIFICATE----- block through

openssl x509 -text -noout

presents the certificate in a more readable form. Particularly relevant are its Subject and Issuer lines.
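The server's actual output is not reproduced here, but the decoding step can be sketched offline. The snippet below generates a throwaway self-signed certificate and runs it through openssl x509 -text -noout; against the real server you would instead feed in the certificate printed by the s_client command from the question (the CN here is made up):

```shell
# Generate a throwaway self-signed certificate as a stand-in for the one
# the server sends (the CN is made up for the demo):
dir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$dir/key.pem" -out "$dir/leaf.pem" \
  -subj "/CN=example.test" 2>/dev/null

# Decode the PEM block into readable form and pick out the relevant lines:
openssl x509 -in "$dir/leaf.pem" -text -noout | grep -E 'Subject:|Issuer:'
# A self-signed cert has identical Subject and Issuer; the real leaf instead
# shows Subject CN discovery.ucl.ac.uk, Issuer CN QuoVadis EV SSL ICA G3.
```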
This shows that the provided certificate is a leaf certificate, for discovery.ucl.ac.uk, and that it is signed by some certificate (or rather entity) named QuoVadis EV SSL ICA G3. It will become clear later that this is not a root certificate (for now, the lack of CA in the name is a hint; and ICA commonly means intermediate certificate authority).

The certificate @little_dog suggested you download is the missing intermediate certificate (NOT the root certificate!). You can see that from the Subject and Issuer lines quoted in his answer.
That certificate is the QuoVadis EV SSL ICA G3 referenced by the leaf certificate above! But this certificate is not a root certificate. Root certificates are signed by themselves, whereas this certificate is signed by QuoVadis Root CA 2 G3. Which, by the way, has CA in its name.

So, where do we get the root certificate? Ideally, it is already in your browser or OS trust store. For Debian at least (and probably Ubuntu as well), we can check by listing the certificate subjects ("names") of all system-trusted CA certificates and searching them for the relevant QuoVadis root certificate. On my system that search finds a match, so the root certificate is present.
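The original command is not reproduced above; one way to perform that check, assuming the standard Debian/Ubuntu bundle path, is to split the bundle on certificate boundaries and print each subject:

```shell
# Pipe each PEM block of the system CA bundle through
# `openssl x509 -noout -subject`, then search the subjects for the
# QuoVadis root (bundle path is the Debian/Ubuntu default):
awk -v cmd='openssl x509 -noout -subject' \
    '/BEGIN/ {close(cmd)} {print | cmd}' \
    /etc/ssl/certs/ca-certificates.crt \
  | grep -i 'QuoVadis Root CA 2 G3' \
  || echo 'QuoVadis root not found in this trust store'
```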
To recap, the full chain looks like this:

QuoVadis Root CA 2 G3 (on your system)
 └── QuoVadis EV SSL ICA G3 (missing)
      └── discovery.ucl.ac.uk (provided by web server)

Where should the intermediate cert come from? The answer to that is simple: the web server should provide it as well. Then the client can check the whole chain up until the root certificate (which comes from its trust store).
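The effect of the missing link can be demonstrated offline with a toy chain. The snippet below builds a made-up root → intermediate → leaf chain and shows that verification against the root alone fails (with the same "unable to get local issuer certificate" complaint curl makes) until the intermediate is supplied; all names are invented for the demo:

```shell
cd "$(mktemp -d)"

# Self-signed root (trust anchor):
openssl req -x509 -newkey rsa:2048 -nodes -days 2 \
  -keyout root.key -out root.pem -subj "/CN=Demo Root CA" 2>/dev/null

# Intermediate, signed by the root (must carry CA:TRUE to sign further certs):
openssl req -newkey rsa:2048 -nodes -keyout ica.key -out ica.csr \
  -subj "/CN=Demo Intermediate CA" 2>/dev/null
printf 'basicConstraints=CA:TRUE\n' > ica.ext
openssl x509 -req -in ica.csr -CA root.pem -CAkey root.key -CAcreateserial \
  -days 2 -extfile ica.ext -out ica.pem 2>/dev/null

# Leaf, signed by the intermediate (the discovery.ucl.ac.uk role):
openssl req -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr \
  -subj "/CN=leaf.example" 2>/dev/null
openssl x509 -req -in leaf.csr -CA ica.pem -CAkey ica.key -CAcreateserial \
  -days 2 -out leaf.pem 2>/dev/null

# With only the root available, verification fails -- curl's situation:
openssl verify -CAfile root.pem leaf.pem 2>&1 | tail -n 1
# Once the intermediate is supplied alongside, verification succeeds:
openssl verify -CAfile root.pem -untrusted ica.pem leaf.pem
```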
Getting it fixed
@little_dog's answer had you download the intermediate and install it in your trust store, effectively turning that intermediate into a root cert for your system. That will work for this particular problem, for now, but there are drawbacks: only your machine is fixed (every other client still fails), and the workaround silently stops helping if the site ever switches to a different intermediate.
The real solution is getting the website fixed. Try reporting it to the discovery.ucl.ac.uk webmasters. Any decent web server admin should know exactly what's up when you report to them that the webserver isn't serving the intermediate CA certificate. If they need more information, this answer has plenty :)
There are also dozens of online services that will check any web server you specify and report a list of potential security issues and configuration problems. I tried a handful, and they all complained about the missing intermediate certificate; Qualys SSL Labs is a well-known example.
But it worked in Chrome?
The story becomes more complicated here. There's a mechanism called Authority Information Access (AIA) that lets a client fetch a missing intermediate certificate from a URL embedded in the certificate itself; such URLs show up in the openssl x509 -text output of the leaf certificate.
But not every client implements AIA fetching. Internet Explorer and Safari do. Chrome relies on the OS to do this (so yes on some platforms, no on others). Android does not. Firefox does not, because of privacy concerns. Curl and wget do not, as far as I can tell.
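What such an AIA pointer looks like can be sketched with a locally generated certificate (the URL and CN are made up; OpenSSL ≥ 1.1.1 is assumed for -addext):

```shell
dir=$(mktemp -d)
# Create a certificate carrying an AIA "CA Issuers" URL (URL is made up):
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$dir/key.pem" -out "$dir/cert.pem" -subj "/CN=demo" \
  -addext 'authorityInfoAccess=caIssuers;URI:http://cert.example/ica.crt' \
  2>/dev/null

# An AIA-fetching client would download the intermediate from this URL:
openssl x509 -in "$dir/cert.pem" -noout -text \
  | grep -A2 'Authority Information Access'
```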
Complicating things further, browsers can cache intermediate certificates they encounter, so if you visit some website that correctly sends the QuoVadis EV SSL ICA G3 intermediate, your browser may cache that certificate, and then a website that otherwise wouldn't work suddenly would. Finally, browsers/OSes could come with (some) intermediate certificates pre-loaded, which would also hide this issue. At least Firefox is exploring this option.

None of these things can be relied on, though; plenty of clients don't do AIA fetching or pre-loading. So until these mechanisms become mandatory and universally supported, web servers will still need to include all the certificates needed to complete the chain.