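[b]tail -50000 access_log | awk '{print $1}' | sort | uniq -c | sort -n | tail[/b]
The -50000 says to look at the last 50000 lines of the access_log file; tweak it if you need to look at more or fewer entries. The rest of the pipeline pulls out the first field of each line (the client IP), counts how many times each IP appears, and shows the ten busiest.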
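For anyone who wants to reuse the one-liner, here is a rough sketch that wraps it in a shell function with the stages commented. The function name top_ips and its optional arguments are my own invention, not anything standard, and it assumes the client IP is the first field of each log line, as in the common and combined log formats.

```shell
#!/bin/sh
# top_ips LOGFILE [LINES] [TOP]
# Hypothetical helper: counts requests per client IP in the last LINES
# lines of LOGFILE and prints the TOP busiest addresses, busiest last.
top_ips() {
    logfile=$1
    lines=${2:-50000}   # how far back to look (default matches the one-liner)
    top=${3:-10}        # how many addresses to show (tail's default is 10)
    tail -n "$lines" "$logfile" |
        awk '{print $1}' |   # first field is assumed to be the client IP
        sort |               # group identical IPs so uniq can count them
        uniq -c |            # prefix each IP with its hit count
        sort -n |            # order by count, ascending
        tail -n "$top"       # keep only the busiest addresses
}
```

Something like `top_ips /etc/httpd/logs/access_log 50000 10` should give the same listing as the pipeline above.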
Helped me today when I was getting bombed by "omgilibot", which nearly brought my server down. I was able to pinpoint the little sucker in seconds and add its IP to my firewall block list.
1. Have cron run this say every 15 minutes and dump the output into a file.
2. Have a perl script run to look at the output and run an IP lookup on the top culprits.
3. If there are excessive hits and they are not from one of the big dog search bots, issue an IP ban and email myself a notification with the results (so I can confirm it was a bot I wanted blocked).
I wonder if this is going to cause me more headaches than it would solve? Maybe I'll give it a whirl anyway... =)
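Step 1 of that plan could look roughly like the sketch below: a small wrapper that cron runs every 15 minutes, writing each snapshot to its own timestamped file. The script name top_ips_snapshot.sh, the function name and the output directory are placeholders I made up for illustration.

```shell
#!/bin/sh
# Hypothetical cron wrapper. Example crontab entry (paths are assumptions):
#   */15 * * * * /root/jobs/top_ips_snapshot.sh /etc/httpd/logs/access_log /var/tmp/top_ips
#
# snapshot_top_ips LOGFILE OUTDIR
# Writes the ten busiest IPs from the last 50000 log lines to a
# timestamped file in OUTDIR and prints that file's name.
snapshot_top_ips() {
    logfile=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    outfile="$outdir/top_ips.$(date +%Y%m%d-%H%M%S).txt"
    tail -n 50000 "$logfile" | awk '{print $1}' | sort | uniq -c | sort -n | tail > "$outfile"
    echo "$outfile"
}

# Only do work when both arguments were supplied on the command line.
if [ $# -eq 2 ]; then
    snapshot_top_ips "$1" "$2"
fi
```

Keeping one file per run means a later script can compare snapshots instead of only seeing the latest one, at the cost of having to clean up old files.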
My perl script now does the following:
1. Queries my access-logs and looks at the recent activity from the past few hours and tallies up the most common IP addresses accessing my content.
2. It then checks the top 10 entries against a GEO database to determine the country of origin, and then does a DNS name lookup on the IP.
3. A report is generated with all this info that looks like this:
Lookup: crawl02.exabot.com
Bot IP: 193.47.80.38
Reads: 751
Country: France
Lookup: crawl-66-249-65-225.googlebot.com
Bot IP: 66.249.65.225
Reads: 771
Country: United States
Lookup: katy-dsl-76-164-108-162.consolidated.net
Bot IP: 76.164.108.162
Reads: 1008
Country: United States
Lookup: spider38.yandex.ru
Bot IP: 77.88.30.246
Reads: 4492
Country: Russian Federation
Lookup: crawl-66-249-65-246.googlebot.com
Bot IP: 66.249.65.246
Reads: 6197
Country: United States
Lookup: b3091256.crawl.yahoo.net
Bot IP: 67.195.112.32
Reads: 6397
Country: United States
I am now adding some logic to filter the list against white-listed bots (google, yahoo, msn, ask, etc.), then adding an email notification if a non-white-listed bot is looking at too many pages.
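A minimal sketch of that filter, done with awk over the report format shown above rather than in perl. The function name flag_heavy_readers and the default threshold of 200 reads are assumptions for illustration. The whitelist is only a loose substring match on the reverse-DNS name, so a short word like "ask" can also match unrelated hostnames.

```shell
#!/bin/sh
# flag_heavy_readers [THRESHOLD]
# Reads a report in the "Lookup / Bot IP / Reads / Country" format on stdin
# and prints "IP READS HOSTNAME" for every record whose hostname is not
# whitelisted and whose read count exceeds THRESHOLD (default 200).
flag_heavy_readers() {
    awk -F': ' -v limit="${1:-200}" '
        $1 == "Lookup" { host = tolower($2) }   # remember the reverse-DNS name
        $1 == "Bot IP" { ip = $2 }              # remember the address
        $1 == "Reads"  {
            # Whitelist check: plain substring match on the hostname.
            if ($2 + 0 > limit && host !~ /google|yahoo|msn|ask/)
                print ip, $2, (host == "" ? "(no reverse DNS)" : host)
            host = ""; ip = ""                  # reset for the next record
        }
    '
}
```

Piping the report into `flag_heavy_readers 200` leaves only the entries worth a closer look, which could then go into the notification email.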
Have not decided yet if I'll auto-block based on the above rules using my firewall. Just not comfortable with doing that.
Anyway, just a status update. When I'm done I'll be sure to post my code and all my scripts in case it helps someone else.
tail -50000 /etc/httpd/logs/access_log | grep 'GET /filename1.cgi\|GET /filename2.cgi\|GET /filename3.cgi' | awk '{print $1}' | sort | uniq -c | sort -n | tail -50 | /root/jobs/ip_scan_nomail.pl | /root/jobs/ddos_scan.pl

#!/usr/bin/perl
#-- take input of top IPs and do a lookup... report suspicious activity via e-mail alert
require '/var/www/html/constants.pl';
# DNS lookup:
use Socket;
# Display some geo info
use Geo::IP;
my $gi = Geo::IP->open("/usr/local/share/GeoIP/GeoLiteCity.dat", GEOIP_STANDARD);
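while (defined($line = <STDIN>)) {
$hostname = "";
chomp($line);
$line =~ s/^\s+//; #remove leading spaces
$line =~ s/\s+$//; #remove trailing spaces
@data = split(/\s+/, $line); #"count ip" as produced by uniq -c
next unless @data >= 2; #skip blank or malformed lines
my $record = $gi->record_by_name($data[1]);
$iaddr = inet_aton($data[1]);
# gethostbyaddr returns undef when the IP has no reverse DNS entry
$hostname = defined $iaddr ? gethostbyaddr($iaddr, AF_INET) : undef;
$hostname = "" unless defined $hostname;
# record_by_name returns undef when the IP is not in the GeoIP database
$country = $record ? $record->country_name : "Unknown";
$country = "Unknown" unless defined $country;
$output = "Lookup: $hostname\nBot IP: $data[1]\nReads: $data[0]\nCountry: $country\n\n";
print $output;
}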
exit;

#!/usr/bin/perl
#-- take the lookup report on STDIN, block heavy readers that are not whitelisted, and alert on unusual traffic
require '/var/www/html/constants.pl';
$cnt = 0;
$bad_lookup_cnt = 0;
while (defined($line = <STDIN>)) {
chomp($line);
$line =~ s/^\s+//; #remove leading spaces
$line =~ s/\s+$//; #remove trailing spaces
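$line =~ s!\s+! !g; #collapse runs of whitespace to a single space
@data = split(/:\s*/, $line, 2); #split "Label: value"; a failed lookup still yields the label with an empty value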
# Track the number of non-US sources...
if($data[0] eq "Country" && $data[1] ne "United States" && $data[1] ne "Canada"){
$cnt = $cnt + 1;
$country_list = $country_list . $data[1] . "\n";
}
#Also check for scrapers... store lookup from this batch
if($data[0] eq "Lookup"){
$tmp_lookup = $data[1];
$tmp_lookup =~ tr/A-Z/a-z/;
# If by chance our ip lookup service is acting up... let's not ban ip's right now... so keep track of bad lookups
if(!$data[1] || $data[1] eq " "){
$bad_lookup_cnt = $bad_lookup_cnt + 1;
}
}
if($data[0] eq "Bot IP"){
$tmp_ip = $data[1];
}
# If not a big search bot, warn if reads are high...
if($data[0] eq "Reads" && $data[1] > 200 && $bad_lookup_cnt < 30){
if($tmp_lookup !~ /(google)|(msn)|(yahoo)|(amazon)|(ask)/){
# Block the scraper for now... and email admin
system("/usr/local/sbin/apf", "-d", $tmp_ip);
if ( $? == -1 )
{
$result = "APF command failed to run: $!\n";
}
elsif ( $? != 0 )
{
$result = "APF exited with status " . ($? >> 8) . " for $tmp_ip";
}
else
{
$result = "APF block executed: $tmp_ip";
}
# Send text msg
&send_text_2;
$tmp_lookup = "";
$result = "";
$tmp_ip = "";
}
}
}
# If high number of non-US sources hitting the site, send an alert... potential problem/ddos
if($cnt > 20){
$msg = "High number ($cnt) of international bots hitting the server right now...";
&send_mail;
&send_text;
}
exit;

[edited by: phranque at 6:27 am (utc) on Mar 16, 2010]
[edit reason] disabled graphic smileys ;) [/edit]
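For anyone not comfortable letting a script ban IPs unattended, one option is to put the firewall call behind a dry-run switch. This is only a sketch: the block_ip function and the DRY_RUN and APF variables are names I made up, and the address check is deliberately crude (digits and dots only). The apf path and the -d flag are the ones used in the script above.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper around the apf call.
APF=${APF:-/usr/local/sbin/apf}   # path as used in the perl script above
DRY_RUN=${DRY_RUN:-1}             # 1 = only report, 0 = really block

# block_ip IP
# Rejects empty input and anything containing characters other than digits
# and dots, then either prints the command (dry run) or runs it.
block_ip() {
    ip=$1
    case $ip in
        ''|*[!0-9.]*)
            echo "refusing to block invalid address: $ip" >&2
            return 2
            ;;
    esac
    if [ "$DRY_RUN" = 1 ]; then
        echo "DRY RUN: $APF -d $ip"
    else
        "$APF" -d "$ip"
    fi
}
```

Running with the default DRY_RUN=1 for a few days and reading the output shows what would have been banned before anything is actually blocked.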
Sure, we are reading, and so are the hackers and the people who write automated bots. I have written a few security scripts...