My first problem is that the log files are huge, and going through them by eye is extremely time-consuming.
My second problem is that I can't tell for certain (though I suspect) that those IPs are pests, and I'm afraid of blocking good bots in my attempt to preserve bandwidth.
How can I know for sure which visitors are good and which are bad?
Is there a tool or service I can use to do that?
Here's a short excerpt from my log that I suspect is "bad":
200.139.115.11 - - [31/Dec/2006:12:01:43 -0300] "GET /fotos/fbla.jpg HTTP/1.0" 200 4398 "http://www.mysite.com/index.htm" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
200.139.115.10 - - [31/Dec/2006:12:01:45 -0300] "GET /fotos/fondonav.gif HTTP/1.0" 200 230 "http://www.mysite.com/index.htm" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
200.139.115.10 - - [31/Dec/2006:12:01:46 -0300] "GET /fotos/cellback_homerightextend.gif HTTP/1.0" 304 - "http://www.mysite.com/index.htm" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
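One way to avoid reading a huge log by eye is to total hits and bytes per IP and look at the heaviest consumers first. The following is a minimal sketch, assuming the log uses the combined format shown in the excerpt above; the names `LOG_PATTERN`, `summarize`, and `top_by_bytes` are made up for illustration. A high total only flags an address for closer inspection; it does not prove the visitor is a bad bot.

```python
import re
from collections import defaultdict

# Combined Log Format, as in the excerpt above:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize(lines):
    """Return {ip: {"hits", "bytes", "agents"}} for every line that parses."""
    stats = defaultdict(lambda: {"hits": 0, "bytes": 0, "agents": set()})
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        entry = stats[match["ip"]]
        entry["hits"] += 1
        if match["size"] != "-":  # "-" means no body was sent (e.g. a 304)
            entry["bytes"] += int(match["size"])
        entry["agents"].add(match["agent"])
    return dict(stats)


def top_by_bytes(stats, limit=10):
    """Return the (ip, entry) pairs that used the most bandwidth, largest first."""
    ranked = sorted(stats.items(), key=lambda item: item[1]["bytes"], reverse=True)
    return ranked[:limit]
```

To run it against a real file, something like `top_by_bytes(summarize(open("access.log")))` would do, where `access.log` is a placeholder for your own log path.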
What exactly does this solution do?
I understand that bad bots fetch the file getout, and then what? An email is sent to the address you put there, but what happens to the bots?
Do they still eat bandwidth?
Both block access to your content by dynamically adding code to your .htaccess file that denies further access to abusive robots. As long as you serve a very small 403-Forbidden custom error page, you should see a dramatic decline in wasted bandwidth.
Either or both scripts could be modified to write to a firewall configuration file as well, thus completely stopping any further abuse. At least one member here has done this, although I don't remember the details.
Jim
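To make the "dynamically adding code to your .htaccess" step concrete, here is a rough sketch of the idea in Python. It is not either of the scripts discussed in this thread (those are written in Perl and PHP); the function names `add_deny_rule` and `ban_ip` are hypothetical. It assumes the older Apache `Deny from` syntax and that the .htaccess file already contains a suitable `Order` directive, and it ignores file locking, which a real script on a busy site would need.

```python
import ipaddress
from pathlib import Path


def add_deny_rule(htaccess_text, ip):
    """Return the .htaccess text with 'Deny from <ip>' appended if not already present."""
    # ip_address() raises ValueError on malformed input, so junk never reaches the file.
    address = str(ipaddress.ip_address(ip))
    rule = f"Deny from {address}"
    if rule in htaccess_text.splitlines():
        return htaccess_text  # already banned, nothing to change
    # Make sure the new rule starts on its own line.
    separator = "" if htaccess_text == "" or htaccess_text.endswith("\n") else "\n"
    return f"{htaccess_text}{separator}{rule}\n"


def ban_ip(htaccess_path, ip):
    """Apply add_deny_rule to a file on disk. Returns True if the file was changed."""
    path = Path(htaccess_path)
    current = path.read_text() if path.exists() else ""
    updated = add_deny_rule(current, ip)
    if updated != current:
        path.write_text(updated)
    return updated != current
```

A trap script would call something like `ban_ip(".htaccess", visitor_ip)` whenever the trap URL is requested. Keeping the text manipulation in a separate function makes it easy to check the result before letting it touch a live file.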
Another post talks about the risk of blocking innocent surfers:
[webmasterworld.com ]
Seems like a lot of risk... Is that the right way?
Generally, there is little danger if you understand the scripts and are comfortable adapting them to your site. Neither of them is suitable for "plug and play" installation, though.
> Seems like a lot of risk... Is that the right way?
Your site is being abused. The solution is somewhat complex. It can be understood and mastered by studying Perl, PHP, regular expressions, and .htaccess directives. Is it worth it to you? Only you can decide this.
Jim
If the mods think it's a good idea, maybe they will post a link to my write-up of my experience with this. It worked for me; I'm not sure whether it will work for you, though. It is a combination of everything you have discussed.
Do I understand correctly? If I set up trap.php,
I must add this to robots.txt:
User-agent: *
Disallow: /trap.php
And link a 1x1 pixel from all my pages to /trap.php.
And Google won't think I'm hiding anything.
And bad bots will visit trap.php anyway, so they will be identified and banned in .htaccess.
Is that correct?
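The robots.txt half of this can be checked before going live. Below is a small sketch using Python's standard `urllib.robotparser` to confirm that a crawler obeying robots.txt would stay away from /trap.php but could still fetch normal pages. `ROBOTS_TXT` and `is_allowed` are illustrative names, and "Googlebot" is just an example user-agent string. It only shows what a compliant parser concludes from these rules; it says nothing about how any particular search engine treats the hidden link.

```python
from urllib.robotparser import RobotFileParser

# The rules proposed in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap.php
"""


def is_allowed(robots_text, user_agent, path):
    """Return True if a robots.txt-obeying crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/trap.php", "/index.htm"):
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", path))
```

If the trap path comes back as allowed, well-behaved bots would walk into the trap too, so it is worth running this check every time robots.txt is edited.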