Blocking Java Bots

         

Tonerman

4:11 pm on Oct 6, 2006 (gmt 0)

10+ Year Member



There is a closed thread here:

[webmasterworld.com...]

about blocking Java-based bad bots while still allowing the Java-based Google and Yahoo bots. I just implemented the rules posted there, and they are blocking Google bots that come from newer IP addresses such as [64.233.172.35...]

The original code was written by jdMorgan. jdMorgan, have you updated the routine since you created it in 2005? I sure could use the update if you have! Tonerman

jdMorgan

4:21 pm on Oct 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just add another RewriteCond line to allow ^64\.233\.172\.

That should take care of it.
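A quick way to check an exemption pattern like this before it goes into .htaccess is to try it against addresses taken from the logs. The sketch below is a Python model, not the original rule set: it assumes the rules block any User-Agent beginning with "Java" unless the client address matches an exemption pattern. The names ALLOWED_IP_PATTERNS and is_blocked, and the sample User-Agent string in the note that follows, are made up for illustration.

```python
import re

# Exemption patterns, written as they would appear in a RewriteCond.
# Only this one pattern comes from the thread; extend the list as needed.
ALLOWED_IP_PATTERNS = [
    r"^64\.233\.172\.",
]


def is_blocked(user_agent: str, remote_addr: str) -> bool:
    """Model of the assumed rule: block Java user agents unless the IP is exempt."""
    # Assumption: the rule targets user agents that begin with "Java",
    # compared case-insensitively.
    if not re.match(r"java", user_agent, re.IGNORECASE):
        return False
    # An exempt address is let through even with a Java user agent.
    return not any(re.search(p, remote_addr) for p in ALLOWED_IP_PATTERNS)
```

In this model a request with User-Agent Java/1.5.0_06 from 64.233.172.35 is allowed, while the same User-Agent from 164.233.172.35 is still blocked, because the leading ^ anchors the pattern to the start of the address.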

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

Jim

Tonerman

12:24 am on Oct 7, 2006 (gmt 0)

10+ Year Member



Thank you for your help, jd. We implemented your change. Do you think there are any other IP addresses we need to allow?

Your code sure does kill the Java site scrapers! We haven't seen one in the logs since we turned it on this morning. Very kind of you to share it with others. Thanks, Tonerman

jdMorgan

2:24 am on Oct 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Do you think there are any other IP addresses we need to allow?

No. As a matter of fact, I appreciate your posting this new IP address range, because I was unaware of it. One effect of blocking common scrapers *seems* to be that the more you block, the fewer attempts your site is subjected to over time. I suspect there are target-site lists, and once you take action, the scrapers take you off the 'easy targets' list, probably because some of the hard targets report scraper activity to the scrapers' hosting providers and ISPs... :)

Jim

Tonerman

3:51 pm on Oct 7, 2006 (gmt 0)

10+ Year Member



Jim,
Here are all the Google datacenter IP addresses:
216.239.37.104
216.239.39.104
216.239.53.104
216.239.57.104
216.239.59.104
216.239.63.104
64.233.161.104
64.233.167.104
64.233.171.104
64.233.179.104
64.233.183.104
64.233.185.104
64.233.187.104
64.233.189.104
66.102.11.104
66.102.7.104
66.102.9.104
66.249.89.104
66.249.93.104
72.14.207.104

Tom

Tonerman

10:24 pm on Oct 8, 2006 (gmt 0)

10+ Year Member



Jim:

There are 600 known Google datacenter IP addresses in total. Keeping only the first three octets of each address, the 600 IP addresses boil down to the following patterns:

^64\.233\.161\.
^64\.233\.163\.
^64\.233\.167\.
^64\.233\.169\.
^64\.233\.171\.
^64\.233\.179\.
^64\.233\.183\.
^64\.233\.185\.
^64\.233\.187\.
^64\.233\.189\.
^66\.102\.1\.
^66\.102\.7\.
^66\.102\.9\.
^66\.102\.11\.
^66\.249\.81\.
^66\.249\.83\.
^66\.249\.85\.
^66\.249\.89\.
^66\.249\.91\.
^66\.249\.93\.
^72\.14\.203\.
^72\.14\.205\.
^72\.14\.207\.
^72\.14\.209\.
^72\.14\.211\.
^72\.14\.215\.
^72\.14\.217\.
^72\.14\.219\.
^72\.14\.221\.
^72\.14\.223\.
^72\.14\.235\.
^72\.14\.253\.
^216\.239\.37\.
^216\.239\.39\.
^216\.239\.51\.
^216\.239\.53\.
^216\.239\.57\.
^216\.239\.59\.
^216\.239\.63\.
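The reduction described above (keep the first three octets, drop duplicates, escape the dots) can be sketched in Python. This is an illustrative reconstruction, not the method actually used to produce the list; the function name to_prefix_patterns is hypothetical.

```python
def to_prefix_patterns(addresses):
    """Collapse IPv4 addresses to unique, anchored first-three-octet regex prefixes."""
    prefixes = set()
    for addr in addresses:
        octets = addr.strip().split(".")
        # Reject anything that is not a plain dotted-quad address.
        if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
            raise ValueError(f"not an IPv4 address: {addr!r}")
        # Store the first three octets as integers so the set removes duplicates.
        prefixes.add(tuple(int(o) for o in octets[:3]))
    # Sort numerically, so 66.102.7 comes before 66.102.11.
    return ["^" + r"\.".join(str(o) for o in p) + r"\." for p in sorted(prefixes)]
```

Feeding it 64.233.161.104 and a second, hypothetical address in the same block such as 64.233.161.99 yields the single pattern ^64\.233\.161\. for both.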

Tom

jdMorgan

1:08 pm on Oct 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was really just looking for the ones that make requests using the Java and/or Python UAs, but thanks for the lists.

Jim