[translate.google.ru...]
[translate.google.bg...]
I get the feeling there's potentially more harm than good going on there. How can I stop visits that come through those Google translations?
http://[google IP]/translate_c?hl=ru&sl=en&tl=ru&u=http://www.example.com/
Google do add the text "(via translate.google.com)" to the user-agent; however, this always says google.com regardless of the regional variation used. The requests will come from a Google IP, since they're essentially running a proxy. I don't know whether a particular IP range is allocated to their translate proxies.
So, you could easily block all translations, but not individual regional variations. I suppose you could check both the UA and the browser language to see whether the request was from a country you wanted to block. For extra accuracy you could also check that the request came from a .google.com IP, but I'm not sure that would be necessary.
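A minimal sketch of that UA-plus-language check might look like the following. It assumes the proxy passes the visitor's Accept-Language header through, which nothing above confirms, and the names TranslateProxyFilter, PrimaryLanguages, ShouldBlock and IsGoogleHostName are made up for illustration. The host name for the optional .google.com check would have to come from a reverse DNS lookup done elsewhere (for example with System.Net.Dns.GetHostEntry).

```csharp
using System;
using System.Collections.Generic;

public static class TranslateProxyFilter
{
    // Marker that Google appends to the User-Agent, as described above.
    private const string TranslateMarker = "via translate.google.com";

    // Extracts primary language tags ("ru", "bg") from an Accept-Language value
    // such as "ru-RU,ru;q=0.9,en;q=0.8". Quality values are ignored in this sketch.
    public static IEnumerable<string> PrimaryLanguages(string acceptLanguage)
    {
        if (string.IsNullOrWhiteSpace(acceptLanguage))
        {
            yield break;
        }
        foreach (string part in acceptLanguage.Split(','))
        {
            string tag = part.Split(';')[0].Trim();
            if (tag.Length == 0 || tag == "*")
            {
                continue;
            }
            yield return tag.Split('-')[0].ToLowerInvariant();
        }
    }

    // True when the request came through Google Translate and the browser
    // advertises at least one language from the blocked set.
    public static bool ShouldBlock(string userAgent, string acceptLanguage, ISet<string> blockedLanguages)
    {
        if (string.IsNullOrEmpty(userAgent) ||
            userAgent.IndexOf(TranslateMarker, StringComparison.OrdinalIgnoreCase) < 0)
        {
            return false;
        }
        foreach (string language in PrimaryLanguages(acceptLanguage))
        {
            if (blockedLanguages.Contains(language))
            {
                return true;
            }
        }
        return false;
    }

    // Optional extra check: does a reverse-DNS host name end in .google.com?
    public static bool IsGoogleHostName(string hostName)
    {
        return !string.IsNullOrEmpty(hostName) &&
               hostName.EndsWith(".google.com", StringComparison.OrdinalIgnoreCase);
    }
}
```

With a blocked set of "ru" and "bg", a translated request whose Accept-Language starts with ru-RU would be blocked, while the same request with en-US would pass. Language is only a rough stand-in for country, so treat this as a heuristic.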
// True when the request arrived through a translation or transcoding proxy.
public bool IsViaTranslate
{
    get
    {
        // A blank User-Agent cannot carry any of the proxy markers.
        if (HasBlankUserAgent)
        {
            return false;
        }

        string userAgent = Headers["User-Agent"];

        // Case-sensitive substring match on the text each service adds to the UA.
        return userAgent.Contains("via translate")
            || userAgent.Contains("via babelfish")
            || userAgent.Contains("Google Wireless Transcoder");
    }
}
As for the other two, I usually return a 403 with no content. The sites I deal with cater only to US residents who speak English, so removing these services does not hurt me at all. I would be careful about doing this on other sites, though, depending on who their target audiences are and where they are located.
Blocking these services also closes off another way people can scrape data from a site. Reducing the ways someone can get to your content reduces your risk of being successfully scraped.
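As a rough, framework-neutral sketch of that 403 policy: the helper below maps a User-Agent to the status code to send. ProxyBlockPolicy and StatusFor are hypothetical names, and it assumes the same 403 is applied to all three markers from the IsViaTranslate property, which the post does not spell out. The caller would set this status and write no response body.

```csharp
using System;

public static class ProxyBlockPolicy
{
    // User-Agent markers taken from the IsViaTranslate property above.
    private static readonly string[] Markers =
    {
        "via translate",
        "via babelfish",
        "Google Wireless Transcoder"
    };

    // Returns the HTTP status code to send: 403 for proxied requests, 200 otherwise.
    // Blank User-Agents are not treated as proxies here, matching IsViaTranslate.
    public static int StatusFor(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent))
        {
            return 200;
        }
        foreach (string marker in Markers)
        {
            if (userAgent.Contains(marker))
            {
                return 403;
            }
        }
        return 200;
    }
}
```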