Forum Moderators: phranque

Help blocking abusive IPs

Preserve bandwidth

         

silverbytes

9:36 pm on Jan 19, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My sites are consuming a lot of bandwidth. Although I have a decent .htaccess file blocking known harvesters and other pests, my logs show very long runs of hits from the same IPs.

My first problem is that the log files are huge, so going through them by eye is extremely time consuming.

My second problem is that I can't tell for certain (though I suspect) that those IPs are pests, and I worry that I might be blocking good bots in my attempt to preserve bandwidth.

How can I know for sure which visitors are good and which are bad?

Is there a tool or service I can use to do that?

Here's a short example from my log that I suspect is "bad":

200.139.115.11 - - [31/Dec/2006:12:01:43 -0300] "GET /fotos/fbla.jpg HTTP/1.0" 200 4398 "http://www.mysite.com/index.htm" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
200.139.115.10 - - [31/Dec/2006:12:01:45 -0300] "GET /fotos/fondonav.gif HTTP/1.0" 200 230 "http://www.mysite.com/index.htm" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
200.139.115.10 - - [31/Dec/2006:12:01:46 -0300] "GET /fotos/cellback_homerightextend.gif HTTP/1.0" 304 - "http://www.mysite.com/index.htm" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
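
One way to avoid reading a huge log by eye is to tally hits and bytes per IP with a short script. The following is only a sketch, written in Python for illustration (nobody in the thread posted it). It assumes the log is in Apache combined format like the sample lines above; the names `LOG_LINE` and `summarize` are made up for the example.

```python
import re
from collections import Counter

# Apache combined format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Return two Counters keyed by client IP: request count and bytes sent."""
    hits, bytes_sent = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        ip = m.group("ip")
        hits[ip] += 1
        if m.group("bytes") != "-":  # "-" means no body was sent (e.g. a 304)
            bytes_sent[ip] += int(m.group("bytes"))
    return hits, bytes_sent

# The three sample lines from the post above.
sample = [
    '200.139.115.11 - - [31/Dec/2006:12:01:43 -0300] "GET /fotos/fbla.jpg HTTP/1.0" 200 4398 "http://www.mysite.com/index.htm" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
    '200.139.115.10 - - [31/Dec/2006:12:01:45 -0300] "GET /fotos/fondonav.gif HTTP/1.0" 200 230 "http://www.mysite.com/index.htm" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
    '200.139.115.10 - - [31/Dec/2006:12:01:46 -0300] "GET /fotos/cellback_homerightextend.gif HTTP/1.0" 304 - "http://www.mysite.com/index.htm" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
]

hits, bytes_sent = summarize(sample)
for ip, count in hits.most_common(10):
    print(ip, count, bytes_sent[ip])
```

On a real log you would pass an open file object instead of `sample`. A high hit count alone does not prove an IP is a pest (the sample lines look like an ordinary browser loading page images), so the output is a shortlist to investigate, not a ban list.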

youfoundjake

10:13 pm on Jan 19, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Set up a bot trap. That is the best way I have found to determine whether visitors are good or bad. It keeps them from stealing my bandwidth and harvesting the pages for email addresses.

silverbytes

4:02 pm on Jan 20, 2007 (gmt 0)


Could you explain more about "Set up a bot trap", please?

silverbytes

5:28 pm on Jan 20, 2007 (gmt 0)


If somebody uses a good autoban script or has some other good solution, please post it or sticky me. Much appreciated!

silverbytes

8:50 pm on Jan 20, 2007 (gmt 0)


Still replying to myself (what a shame).
I saw this old post:

[webmasterworld.com]

What exactly does this solution do?
I understand that bad bots fetch the getout file, and then what? An email is sent to the address you put there, but what happens to the bots?
Do they still eat bandwidth?

jdMorgan

9:25 pm on Jan 20, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For more background info on this subject, try this search [google.com]. The results contain links to threads about two different bad-bot scripts, one based on robots.txt violations, and the other based on frequency of access. They do different things and can be used together.

Both block access to your content by dynamically adding code to your .htaccess file that denies further access to abusive robots. As long as you serve a very small custom 403 Forbidden error page, you should see a dramatic decline in wasted bandwidth.

Either or both scripts could be modified to write to a firewall configuration file as well, thus completely stopping any further abuse. At least one member here has done this, although I don't remember the details.

Jim
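
The frequency-of-access idea mentioned above can be illustrated with a sliding-window counter. This is a rough sketch in Python, not the script from the linked threads; the class name `RateLimiter`, the threshold, and the window length are all assumptions chosen for the example.

```python
from collections import defaultdict, deque

class RateLimiter:
    """Flag an IP once it makes more than max_hits requests within window_seconds."""

    def __init__(self, max_hits, window_seconds):
        self.max_hits = max_hits
        self.window = window_seconds
        self.recent = defaultdict(deque)  # ip -> timestamps of recent requests

    def is_abusive(self, ip, now):
        q = self.recent[ip]
        q.append(now)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q) > self.max_hits

# Hypothetical threshold: more than 3 requests in 10 seconds.
limiter = RateLimiter(max_hits=3, window_seconds=10)
flags = [limiter.is_abusive("192.0.2.1", t) for t in (0, 1, 2, 3, 20)]
print(flags)  # [False, False, False, True, False]
```

A real threshold has to be generous enough that a browser fetching a page plus its images is not flagged, which is one reason these scripts need tuning per site.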

silverbytes

10:07 pm on Jan 20, 2007 (gmt 0)


Some recommend a 1x1 transparent GIF linking to the trap, which makes me think Google will hate it and I'll compromise my website (this is suggested here on WebmasterWorld and on kloth net).

Another post talks about innocent surfers getting blocked:
[webmasterworld.com]

Seems like a lot of risk... Is that the right way?

jdMorgan

11:48 pm on Jan 20, 2007 (gmt 0)


The key is to disallow any trap link in robots.txt. So no, Google shouldn't get mad.

Generally, there is little danger if you understand the scripts and are comfortable adapting them to your site. Neither of them is suitable for "plug and play" installation, though.

> Seems like a lot of risk... Is that the right way?

Your site is being abused. The solution is somewhat complex, but it can be understood and mastered with study of Perl, PHP, regular expressions, and .htaccess directives. Is it worth it to you? Only you can decide this.

Jim

youfoundjake

7:06 pm on Jan 21, 2007 (gmt 0)


I just recently went through all of this.
I set up a robots.txt that tells the bots what they can and cannot crawl on my site. I left this up for a few days so they would pick up the changes. I checked that Google was reading it correctly by using the Google Webmaster Tools interface to test my robots.txt file.
Next I created a landing page in a folder that no one should ever reach naturally without looking at the source code of my main page.
In my main page I put a 1x1 image link pointing to that directory.
Next, when a bot hits the forbidden directory, a script runs that opens the .htaccess file and adds the visitor's IP address.
Later in the script, I have an email notification that tells me someone has triggered the trap and sends me the IP address and user agent.
This way I can verify whether it's a rogue bot and, if need be, do a reverse DNS lookup on the IP address to see whether they are who their user agent says they are.
Be careful: when playing around with the .htaccess file, you can bring down the whole site.

If the mods think it's a good idea, maybe they will post a link to my write-up about this. It worked for me; I'm not sure if it will work for you, though. But it is a combination of everything you have discussed.
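
The reverse DNS check mentioned in this post is usually done as a forward-confirmed lookup: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it returns the same IP. Below is a minimal Python sketch of that idea (IPv4 only). The function name `verify_crawler` is made up, and the hostname suffix used in the demo is a placeholder; the real suffixes should come from each search engine's own documentation.

```python
import socket

def verify_crawler(ip, allowed_suffixes,
                   reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: True only if ip -> hostname -> ip round-trips
    and the hostname ends with one of the allowed suffixes."""
    try:
        hostname = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # no PTR record or lookup failure
        return False
    if not hostname.endswith(tuple(allowed_suffixes)):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses

# Offline demonstration with stub resolvers (hypothetical hostnames and IPs),
# so the example runs without touching the network.
def fake_reverse(ip):
    # Pretend every IP claims the same crawler hostname in its PTR record.
    return ("crawler-1.bot.example.com", [], [ip])

def fake_forward(hostname):
    # The hostname really belongs to 192.0.2.10 only.
    return (hostname, [], ["192.0.2.10"])

genuine = verify_crawler("192.0.2.10", [".bot.example.com"], fake_reverse, fake_forward)
spoofed = verify_crawler("198.51.100.7", [".bot.example.com"], fake_reverse, fake_forward)
print(genuine, spoofed)  # True False
```

The forward step matters because whoever controls an IP block also controls its reverse DNS, so a reverse lookup alone can be faked. In real use, omit the last two arguments so the standard `socket` resolvers are called.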

silverbytes

1:23 pm on Jan 22, 2007 (gmt 0)


> The key is to disallow any trap link in robots.txt. So no, Google shouldn't get mad.

Do I understand correctly? If I set up trap.php,

I must add this to robots.txt:

User-agent: *
Disallow: /trap.php

And link the 1x1 pixel on all my pages to /trap.php.
And Google won't think I'm hiding anything.
And bad bots will visit trap.php anyway, so they will be identified and banned in .htaccess.
Is that correct?
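
The robots.txt rule above can be sanity-checked offline before going live. This sketch uses Python's standard `urllib.robotparser` module; the `www.example.com` URLs are placeholders. It only shows how a parser that follows the basic robots.txt rules reads the file, not how any particular search engine's crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The rule exactly as proposed in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved bot must stay out of the trap but may fetch normal pages.
trap_allowed = parser.can_fetch("Googlebot", "http://www.example.com/trap.php")
home_allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.htm")
print(trap_allowed, home_allowed)  # False True
```

If `trap_allowed` came back `True`, compliant crawlers would be free to follow the trap link and would end up banned along with the bad bots.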

youfoundjake

3:31 pm on Jan 22, 2007 (gmt 0)


For the most part, yeah, you got it.
I set mine up like this:
User-agent: *
Disallow: /trap/

And then I had an index page in the trap folder that, when triggered, would add the IP address to .htaccess and send me an email saying the page was visited.

The full URL is
www.example.com/trap/index.php
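
The "add the IP address to .htaccess" step might look roughly like the sketch below. It is written in Python for illustration only (the trap page in this thread is PHP and its code is not shown), and the function name `ban_ip` is hypothetical. It assumes the older `Deny from` syntax that was current when this thread was written; Apache 2.4 and later use `Require` directives instead. The email notification is left out.

```python
import ipaddress
import os
import tempfile

def ban_ip(htaccess_path, remote_addr):
    """Append 'Deny from <ip>' to the file; return True if a line was added."""
    # Validate first so junk input can never be written into .htaccess.
    ip = str(ipaddress.ip_address(remote_addr))  # raises ValueError if invalid
    rule = f"Deny from {ip}"

    content = ""
    if os.path.exists(htaccess_path):
        with open(htaccess_path, encoding="utf-8") as fh:
            content = fh.read()
    if rule in (line.strip() for line in content.splitlines()):
        return False  # already banned, do not add a duplicate

    # Make sure the new rule starts on its own line.
    prefix = "" if content == "" or content.endswith("\n") else "\n"
    with open(htaccess_path, "a", encoding="utf-8") as fh:
        fh.write(prefix + rule + "\n")
    return True

# Demonstration on a throwaway file, never on a live .htaccess.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".htaccess")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("Order Allow,Deny\nAllow from all")  # no trailing newline on purpose
    first = ban_ip(path, "192.0.2.55")
    second = ban_ip(path, "192.0.2.55")
    with open(path, encoding="utf-8") as fh:
        final_text = fh.read()

print(first, second)
print(final_text)
```

The validation step is there because a malformed line in .htaccess can take the whole site down, as warned earlier in the thread. A production version would also need file locking, since two bots can hit the trap at the same moment.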