Curl and Google Image


I tried to scrape a Google image search results page with curl from Terminal, but it doesn't return the HTML source that I see with "View Page Source" in Firefox. I tried both "curl [url]" and "curl -L [url]". Both returned a short HTML page that includes "Your client does not have permission to get URL" and "from this server". How can I get, from a shell script, the same HTML source that Firefox shows me?

Part of the short HTML I got in Terminal read:

Please see Google's Terms of Service posted at
http://www.google.com/terms_of_service.html

If you believe
that you have received this response in error, please report
your problem. However, please make sure to take a look at our Terms of
Service (http://www.google.com/terms_of_service.html). In your email,
please send us the entire code displayed below.

Best Answer

The error message contains a broken link, but Google's current terms of service say:

Do not misuse our Services, for example, do not interfere with our Services or try to access them using a method other than the interface and the instructions that we provide.

(emphasis mine)

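They're refusing your request for some reason. It could be that they've seen suspicious activity from your IP address, but most probably they've spotted that you're using curl instead of a regular browser (in which you would see the adverts). By default curl identifies itself in the User-Agent request header as "curl/" followed by its version number, so it is easy to tell apart from a browser.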

You could make curl imitate such a browser by passing a common user-agent string (e.g. from http://www.browser-info.net/useragents) to the -A option, but that would still violate the ToS.
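As a rough sketch of what the -A option looks like in a script: the user-agent string and the example.com URL below are placeholders chosen for illustration, not values from the question, and the function name fetch_as_browser is made up. The script only prints the command unless RUN=1 is set, so nothing is sent by default. The ToS caveat above still applies to any Google URL.

```shell
#!/bin/sh
# Illustrative values only (assumptions, not taken from the original post).
UA='Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0'
URL='https://example.com/'

# Hypothetical helper: fetch a page while sending a browser-like User-Agent.
#   -s  suppress the progress meter
#   -L  follow redirects
#   -A  set the User-Agent request header
fetch_as_browser() {
    curl -s -L -A "$UA" "$1"
}

# Dry run by default: show the command instead of making a request.
if [ "${RUN:-0}" = "1" ]; then
    fetch_as_browser "$URL"
else
    printf 'curl -s -L -A "%s" "%s"\n' "$UA" "$URL"
fi
```

Running it as "RUN=1 sh script.sh" performs the request; without RUN it just echoes the command line so you can check the quoting first.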
