Thursday, 7 August 2014

Apache stats for bots

Recently my server became overloaded by heavy query traffic from bots, and I needed to find out which bot was responsible. I wrote a simple script that parses the Apache logs and counts successful GET requests per bot, per day and per IP address.
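#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;
use Term::ANSIColor qw(:constants);

# Log file to analyse, given as the first command-line argument.
my $plik = shift or die "Usage: $0 <access_log>\n";
my (%slugnumber, %bot);

# Plain-text logs are read directly; compressed (rotated) logs go through zcat.
if (-T $plik) {
    open(PLIK, '<', $plik) || die "Cannot open file: $plik: $!\n";
}
elsif (-B $plik) {
    open(PLIK, '-|', 'zcat', $plik) || die "Cannot open file: $plik: $!\n";
}
else {
    print "Cannot open file: $plik\n";
    exit 1;
}

while (defined(my $log = <PLIK>)) {
    # Split one access-log line into its fields.
    my ($host, $date, $reqtype, $url, $proto, $status, $size, $referrer, $agent) =
        $log =~ m/^(\S+) - - \[(\S+ [-+]\d{4})\] "(GET|POST)\s(.+)\sHTTP\/(\d\.\d)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/;
    next unless defined $agent;    # skip lines that do not match the pattern

    # Count only successful GET requests whose user agent mentions "bot".
    if ($status eq "200" && $reqtype eq "GET" && $agent =~ m/bot/i) {
        my $dt = Time::Piece->strptime($date, '%d/%b/%Y:%H:%M:%S %z');
        $date = $dt->strftime('%Y-%m-%d');
        $slugnumber{$agent}{$date}{$host}++;    # hits per agent, day and IP
        $bot{$agent}++;                         # total hits per agent (not printed)
    }
}
close(PLIK);

# Report: one section per agent, then per day, then one line per IP address.
foreach my $klucz (sort keys %slugnumber) {
    print "\n================================================\n";
    print BOLD, BLUE, "\n $klucz \n", RESET;
    foreach my $data (keys %{ $slugnumber{$klucz} }) {
        print BOLD, BLUE, "\n $data \n", RESET;
        foreach my $ipek (keys %{ $slugnumber{$klucz}{$data} }) {
            print "$klucz $data [$ipek] : $slugnumber{$klucz}{$data}{$ipek}\n";
        }
    }
}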
Below is the output:
testing> perl ipstats.pl /var/log/apache/access.log

================================================

 Yeti/1.1 (Naver Corp.; http://help.naver.com/robots/)

 2014-08-05
Yeti/1.1 (Naver Corp.; http://help.naver.com/robots/) 2014-08-05 [125.209.211.199] : 1

 2014-08-04
Yeti/1.1 (Naver Corp.; http://help.naver.com/robots/) 2014-08-04 [125.209.211.199] : 1

================================================

 msnbot/2.0b (+http://search.msn.com/msnbot.htm)

 2014-08-05
msnbot/2.0b (+http://search.msn.com/msnbot.htm) 2014-08-05 [65.55.213.247] : 10
msnbot/2.0b (+http://search.msn.com/msnbot.htm) 2014-08-05 [65.55.213.243] : 4
msnbot/2.0b (+http://search.msn.com/msnbot.htm) 2014-08-05 [65.55.213.242] : 2
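
The script also counts total hits per agent in %bot, but never prints them. To see at a glance which bot is the heaviest, those totals could be sorted by hit count. This is only a minimal sketch, not part of the original script: the helper name rank_bots is hypothetical, and the sample counts are simply the per-IP numbers from the output above added up per agent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns agent names ordered by descending hit count,
# breaking ties alphabetically so the order is deterministic.
sub rank_bots {
    my ($bot) = @_;    # hash reference: agent string => number of hits
    return sort { $bot->{$b} <=> $bot->{$a} or $a cmp $b } keys %$bot;
}

# Sample totals, summed from the output shown above (assumption: in the
# real script this hash would be the %bot filled while reading the log).
my %bot = (
    'Yeti/1.1 (Naver Corp.; http://help.naver.com/robots/)' => 2,
    'msnbot/2.0b (+http://search.msn.com/msnbot.htm)'       => 16,
);

# One line per agent: hit count first, then the agent string.
printf "%6d  %s\n", $bot{$_}, $_ for rank_bots(\%bot);
```

With the sample data, msnbot is listed before Yeti because it has more hits. In the script itself the same loop could run on %bot right after close(PLIK).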