I don't want to start up a whole Firefox/Chrome/Opera/… just to look up the meaning of a word with Google Translate, so I decided to write a shell script that uses wget to get the content of translate.google.hu and extracts the translation from the downloaded file. But I got stuck at the first step.
E.g. if I want to find the translation (from English to Hungarian) of the word 'Enthusiast', I would try
$ wget https://translate.google.hu/?hl=hu&tab=wT#en/hu/Enthusiast
but wget doesn't download the page that I get when I type that address into my browser's address bar. Instead, I get the following:
solid@skynet:~> wget https://translate.google.hu/?hl=hu&tab=wT#en/hu/Enthusiast
[1] 2143
solid@skynet:~> --2016-05-02 08:23:24-- https://translate.google.hu/?hl=hu
Resolving translate.google.hu (translate.google.hu)... 216.58.209.163, 2a00:1450:400d:806::2003
Connecting to translate.google.hu (translate.google.hu)|216.58.209.163|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2016-05-02 08:23:24 ERROR 403: Forbidden.
And I wait, and wait, and wait… finally I press ENTER:
[1]+ Exit 8 wget https://translate.google.hu/?hl=hu
Could someone help me solve this problem?
(I'm using openSUSE Linux 13.2.)
UPDATE: Following [Alexander Batischev]'s advice, I tried
$ wget 'https://translate.google.hu/?hl=hu&tab=wT#en/hu/Enthusiast'
This solved the problem of wget running in the background, and wget now gets the proper address (instead of the shell creating a local variable 'tab') ^.^'
But I still get the same 403 Forbidden error:
$ wget 'https://translate.google.hu/?hl=hu&tab=wT#en/hu/Enthusiast'
--2016-05-03 14:57:48-- https://translate.google.hu/?hl=hu&tab=wT
Resolving translate.google.hu (translate.google.hu)... 216.58.209.163, 2a00:1450:400d:806::2003
Connecting to translate.google.hu (translate.google.hu)|216.58.209.163|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2016-05-03 14:57:48 ERROR 403: Forbidden.
Best Answer
When you run this command:
$ wget https://translate.google.hu/?hl=hu&tab=wT#en/hu/Enthusiast
what really happens is:
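1. wget is started with the URL "https://translate.google.hu/?hl=hu";
2. wget runs in the background;
3. tab is defined and gets the value wT#en/hu/Enthusiast.
The reason for all this is that the shell reserves some characters, the ampersand included, for special purposes: an unquoted & ends the command before it and runs that command in the background. To prevent the shell from interpreting the ampersand, use quotes:
$ wget 'https://translate.google.hu/?hl=hu&tab=wT#en/hu/Enthusiast'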
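Both behaviours can be reproduced without any network access. The sketch below is illustrative only: show_args is a hypothetical stand-in for wget that simply prints each argument it receives, so you can see exactly what the command gets with and without quotes.

```shell
#!/usr/bin/env bash
# show_args (hypothetical): a stand-in for wget that prints each
# argument it receives on its own line, so no network is needed.
show_args() {
  printf 'arg: %s\n' "$@"
}

# Unquoted: the shell splits the line at '&'. show_args runs in the
# background with only "https://translate.google.hu/?hl=hu", and
# "tab=wT#en/hu/Enthusiast" becomes a separate variable assignment.
# ('#' is not a comment here because it is in the middle of a word.)
show_args https://translate.google.hu/?hl=hu&tab=wT#en/hu/Enthusiast
wait                      # let the background job finish printing
echo "tab is now: $tab"

# Quoted: the whole URL reaches the command as one single argument.
show_args 'https://translate.google.hu/?hl=hu&tab=wT#en/hu/Enthusiast'
```

Run as a script, the first call prints only the part of the URL before the &, $tab holds the rest, and the quoted call prints the complete URL.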
With that resolved, you still get the "Forbidden" response.
It's an arms race between clients that want to bypass the web interface and providers that don't want to let them. Google earns its revenue from ads, and it knows that your script won't display any. So it takes measures to forbid any access except through a browser.
The only people who can tell you precisely why you were "Forbidden" are Google engineers. That said, the simpler techniques are well known.
One of the easiest is blocking by "user agent string". This is a string identifying the make and version of the client (your browser or wget). It looks like this:
The client sends this string with every request. The server can use it to tweak the appearance of the result, or to deny access, as in your case.
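wget accepts a --user-agent flag where you can specify the user agent string to send. To imitate your own browser, you can type "what is my user agent" into that same Google and copy the string from there :) Then just pass it to wget like so:
$ wget --user-agent='<your browser's user agent string>' 'https://translate.google.hu/?hl=hu&tab=wT#en/hu/Enthusiast'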
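For the script described in the question, the download step could be wrapped in a small function. This is a minimal sketch under stated assumptions: fetch_page is a hypothetical helper name, BROWSER_UA is a placeholder that you must replace with the string your own browser reports, and sending a browser user agent is not guaranteed to get past Google, since the user agent is only one of the checks a server may apply. The options used (--quiet, --user-agent, --output-document=-) and the meaning of exit status 8 come from the wget manual.

```shell
#!/usr/bin/env bash
# Placeholder (assumption): paste your browser's user agent string here,
# or set BROWSER_UA in the environment before running the script.
BROWSER_UA="${BROWSER_UA:-PASTE-YOUR-BROWSER-USER-AGENT-HERE}"

# fetch_page (hypothetical helper): download a URL to stdout while
# sending a browser-like User-Agent header.
fetch_page() {
  local url=$1 status=0
  # "$url" is quoted so that '&', '?' and '#' reach wget untouched.
  wget --quiet --user-agent="$BROWSER_UA" --output-document=- "$url" || status=$?
  if [ "$status" -eq 8 ]; then
    # wget exits with 8 when the server issued an error response,
    # which is what the 403 Forbidden above produces.
    echo "fetch_page: server returned an error response (wget exit 8)" >&2
  fi
  return "$status"
}
```

Usage: fetch_page 'https://translate.google.hu/?hl=hu&tab=wT#en/hu/Enthusiast' > page.html. Keep in mind that the part after # is a URL fragment, which HTTP clients never send to the server (wget's own log above shows only https://translate.google.hu/?hl=hu&tab=wT), so the server never learns which word you asked about and the downloaded HTML cannot include its translation.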