UPDATE: It seems the core issue with images not loading stemmed from the way the EFF’s HTTPS Everywhere plugin/extension handled some Tumblr URLs. The developers were notified and a fix appears to be in place. This answer breaks down the detective work done to uncover the issue outlined in the initial question and could prove useful for further debugging/diagnosis if a similar issue appears in the future.
EDIT: The larger content about image leeching seems invalid, so I will add a new idea at the top and leave the image leeching info at the bottom in case it is useful to someone.
Amazon CloudFront CDN Ideas
Okay, using the URLs you have provided—as well as some of my real world experience with Amazon CloudFront CDN setups—I think I discovered something. It seems like Tumblr’s Amazon CloudFront CDN config is choking for some reason. Here is why I think that is the case.
Let’s take this example URL:
http://36.media.tumblr.com/d685b02fdf2d3f167c22d9a97e27e87a/tumblr_nfpq5qPZ4v1tognpro1_1280.png
Now let’s run curl -I to get header information on that file:
curl -I http://36.media.tumblr.com/d685b02fdf2d3f167c22d9a97e27e87a/tumblr_nfpq5qPZ4v1tognpro1_1280.png
The output for that would be something like this:
HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 782141
Connection: keep-alive
Accept-Ranges: bytes
Cache-Control: max-age=1209600
Date: Thu, 05 Mar 2015 02:15:44 GMT
Server: nginx
X-Cache: Miss from cloudfront
Via: 1.1 7e54fc06cd70e4752fe050bbe5c130be.cloudfront.net (CloudFront)
X-Amz-Cf-Id: QyIUyzfaJJN3PU_xWkW0P-D2kjg_1cVenKzFAoY2PubgZQlBHWorZQ==
Now the things to pay attention to here are the Date (the date and time of the file on the CloudFront endpoint) and X-Cache (Amazon content delivery status) headers. Typical behavior on Amazon CloudFront is that the first access returns “Miss from cloudfront”, and if you run another curl -I right away afterwards there should be a “Hit from cloudfront”.
But that’s not what I saw just now. Here is a breakdown of the Date and X-Cache status of a bunch of accesses I made:
Date: Thu, 05 Mar 2015 02:19:37 GMT = X-Cache: Miss from cloudfront
Date: Thu, 05 Mar 2015 02:19:39 GMT = X-Cache: Miss from cloudfront
Date: Thu, 05 Mar 2015 02:19:44 GMT = X-Cache: Miss from cloudfront
Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Miss from cloudfront
Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Hit from cloudfront
Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Hit from cloudfront
Date: Thu, 05 Mar 2015 02:19:50 GMT = X-Cache: Hit from cloudfront
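A breakdown like the one above can be collected with a small shell helper. This is only a sketch: the function names summarize_headers and probe_cache are made up for illustration, and the one-second pause between requests is an arbitrary choice, not something taken from the original test.

```shell
# Read one block of HTTP response headers on stdin and print
# "Date: ... = X-Cache: ..." on a single line.
summarize_headers() {
  awk '
    { sub(/\r$/, "") }                       # curl prints CRLF line endings
    tolower($0) ~ /^date:/    { date = $0 }  # header names are case-insensitive
    tolower($0) ~ /^x-cache:/ { cache = $0 }
    END { if (date != "" || cache != "") print date " = " cache }
  '
}

# Send COUNT header-only requests to URL (default 5) and summarize each one.
# Hypothetical helper; it needs network access, so it is only defined here.
probe_cache() {
  url="$1"
  count="${2:-5}"
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -sI "$url" | summarize_headers
    i=$((i + 1))
    sleep 1
  done
}
```

Calling probe_cache with the example image URL and a count of 7 should print one line per request in the same "Date = X-Cache" shape as the list above, which makes it easy to see when the misses turn into hits.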
The reason there are multiple items with the exact same date which are Hit from cloudfront near the end is that this is what happens on a CDN: if the endpoint of the CDN has the file, then Date correlates to the actual creation/modification date of the copy that endpoint has.
Notice that the first four accesses are seconds apart, with different dates/times, and all of them are Miss from cloudfront, right? That means the CDN endpoint is just echoing back that there was an attempt to access that file at those times and all attempts were misses.
So my armchair assessment of this is that Tumblr’s systems are not keeping up with the Amazon CloudFront CDN or the Amazon CloudFront CDN is not keeping up with Tumblr. But in some way, things are amiss on their server side. And since this is a CDN, someone accessing the files in one location might not notice an issue while someone else in another location would have issues viewing the image.
Which is all to say, I don’t think this can easily be cleared up on the client side.
EDIT: So the original poster added some new URLs, and this still points to a server-side issue, but I just wanted to post the details for the record.
EdgeCast & Highwinds CDN Ideas
The original poster added more specifics, so here are more details based on the blog post that is being used as an example:
http://claystorks.tumblr.com/post/112741831192/soulmister-claystorks-windspeare-explain
And these image URLs are provided as examples of URLs in that post:
https://gs1.wac.edgecastcdn.net/8019B6/data.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_500.png
https://gs1.wac.edgecastcdn.net/8019B6/data.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_1280.png
And those two image URLs do indeed fail. But from my side, looking at the original source code of the blog post from Brooklyn, New York, USA, I am not seeing those EdgeCast (gs1.wac.edgecastcdn.net) URLs. Rather, these are the URLs I am seeing:
http://41.media.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_500.png
http://41.media.tumblr.com/76493f424ebb3b62d6de43e53643180a/tumblr_nkps82DdCh1sjn35qo1_1280.png
So my first thought is: why is the original poster seeing those EdgeCast (gs1.wac.edgecastcdn.net) URLs? But then if I do a traceroute to 41.media.tumblr.com I see that it is a server managed by Highwinds (!?!?). In contrast, the initial URLs passed on by the original poster use the 36.media.tumblr.com hostname, and you can see they are managed by Amazon CloudFront CDN servers.
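Working out which CDN sits behind a hostname can be roughed out in shell. The sketch below only does string matching on a name you already have, such as the host in a Via header or a hop name from a traceroute. The function name guess_cdn is hypothetical, and the suffix list is an assumption: cloudfront.net and edgecastcdn.net appear in the output above, while hwcdn.net as a Highwinds domain is my own addition and should be verified.

```shell
# Map a hostname to a CDN name based on its domain suffix.
# The suffix list is illustrative and incomplete.
guess_cdn() {
  host="${1%.}"                      # drop a trailing dot from DNS-style names
  case "$host" in
    *.cloudfront.net)  echo "Amazon CloudFront" ;;
    *.edgecastcdn.net) echo "EdgeCast" ;;
    *.hwcdn.net)       echo "Highwinds" ;;   # assumed Highwinds domain
    *)                 echo "unknown" ;;
  esac
}
```

For example, passing it the host from the Via header shown earlier (7e54fc06cd70e4752fe050bbe5c130be.cloudfront.net) reports Amazon CloudFront, while a bare Tumblr hostname such as 41.media.tumblr.com reports unknown until you resolve it to whatever sits behind it.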
Which is all to say, as I said before, all of this seems to be a server-side issue with Tumblr and their CDN management. But from my side, in Brooklyn, New York, USA, I am clearly seeing content being delivered as expected from Highwinds CDN servers as well as Amazon CloudFront CDN servers. Where these EdgeCast URLs are coming from, or how/why they are then failing, is out of anyone’s control on the client side. This would definitely be something to contact Tumblr tech staff about because there is no way a desktop end user could resolve this.
Image Leeching Ideas
Might not be relevant anymore, but here for reference.
Your statement here gives me a clue:
Using wget on the images' direct links works.
Many sites have rules in place, usually set via Apache, that prevent image leeching. More details on how those rules work are provided here and are summarized as follows:
Using .htaccess, you can disallow hot linking on your server, so those
attempting to link to an image or CSS file on your site, for example,
is either blocked (failed request, such as a broken image) or served a
different content (ie: an image of an angry man).
Your description, and the fact you can access the images via wget, leads me to believe that the images you are having issues with are not hosted on Tumblr by users, but are rather images placed on a Tumblr blog and actually hosted on another site.
When standard image leeching procedures are put in place, viewing an embedded image on one site that is hosted on another site—which blocks leeching—would result in a broken image link or perhaps a “Stop Leeching!” image being returned. This is because basic anti-leeching rules—such as those in that example page—crosscheck image referrers to make sure the page requesting the image matches the domain hosting the image.
So when you are accessing the image via wget you are accessing the image directly, and image leeching rules would not kick in. Thus you can get the image via wget but not when it is embedded in another page.
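The referrer check described above can be modeled in a few lines of shell. This is a simplified sketch of what a basic anti-leeching rule decides, not the configuration of any particular site. The function names url_host and hotlink_decision are hypothetical, and the choice to allow requests with an empty Referer is an assumption (it matches the wget behavior described here, but sites can be stricter).

```shell
# Extract the host part from a URL such as http://example.com/path.
url_host() {
  printf '%s\n' "$1" | sed 's#^[a-zA-Z]*://##; s#[/:?].*$##'
}

# Decide what a basic anti-leeching rule would do.
# $1 = Referer URL sent by the client (may be empty)
# $2 = domain that hosts the image
hotlink_decision() {
  referer="$1"
  image_host="$2"
  if [ -z "$referer" ]; then
    echo "allow"                     # direct access (wget, address bar)
    return
  fi
  ref_host="$(url_host "$referer")"
  case "$ref_host" in
    "$image_host"|*."$image_host") echo "allow" ;;  # same site or a subdomain
    *)                             echo "block" ;;  # embedded on another site
  esac
}
```

To test a real server rather than the model, curl can send a Referer with its -e option (for example curl -I -e followed by the embedding page's URL and then the image URL) and the status code can be compared against a plain curl -I of the same image.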
Best Answer
Solution: disable VPN.
I was using a VPN to work remotely on a client's system. Apparently that means that all my traffic goes through their network, and restrictions like website bans (imgur) are applied. The images I COULD see were probably from my cache.