static.88-198-7-nn.clients.your-server.de
findfiles.net/0.96 (Robot;test_robot@gmx-topmail.de)
robots.txt? Yes BUT ignored it
Since May, partial listing:
static.108.75.46.nn.clients.your-server.de
Mozilla/5.0 (compatible; heritrix/2.0.2 +http://seekda.com)
robots.txt? YES
static.108.75.46.nn.clients.your-server.de
Mozilla/5.0 (compatible; heritrix/${pom.version} +http://seekda.com)
robots.txt? YES
Fake ref? YES
static.47.84.46.nn.clients.your-server.de
Mr. X (Nutch spiderman; [agenteX.googlepages.com...] ; MyEmail)
robots.txt? Yes BUT ignored it
static.84.69.46.nn.clients.your-server.de
IE 4.01 Win98
(yeah, sure)
static.213-239-214-nn.clients.your-server.de
Mozilla/5.0 (compatible; proximic; +http://www.proximic.com)
robots.txt? YES
213-239-212-nn.clients.your-server.de
GrubNG 20080128
robots.txt? NO
static.165.71.46.nn.clients.your-server.de
Eurobot/Nutch-1.0-dev (1.0)
robots.txt? Yes BUT ignored it
static.88-198-50-nnn.clients.your-server.de
Mozilla/5.0 (Windows; U; Windows NT 5.0; de; rv:1.8.1.5) Gecko/20070713 Firefox/2.0.0.5
robots.txt? Yes BUT ignored it
The ones whose Host names show the IP the correct way around are in the ranges below:
213.239.192.0 - 213.239.223.255
88.198.0.0 - 88.198.255.255
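For anyone who keeps deny lists as CIDR blocks rather than start/end ranges, here is a minimal sketch of the conversion using Python's standard ipaddress module. The function name range_to_cidrs is only illustrative, and only the 88.198 range from above is used as input.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last (inclusive)."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    # summarize_address_range yields the smallest set of networks for the span
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

if __name__ == "__main__":
    print(range_to_cidrs("88.198.0.0", "88.198.255.255"))
```

A range that does not sit on a block boundary comes back as several smaller blocks, which is a quick way to spot a mistyped end address.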
Any chance of the true initial IP portion so they can be tracked down? I may already have them blocked (as with those above) but it would be nice to check.
-----
Any IP address or reverse DNS information not expressly belonging to a search engine should be masked as follows:
Example IP: 111.222.333.nnn
Example DNS: nnn.333.222.111.example.com
Additionally, the IPs should be obscured when discussing distributed crawlers that are run from volunteer computers.
-----
I have the full Host info, of course, but bothunting/tracking is too OCD/time-consuming as-is:) so please don't Sticky me for the missinnng details. If you know a server reverses IPs in its Host names, I guess you'll have to swap 'em around yourself as need be, sorry.
(If life was like the movie TIMECOP and "The same matter cannot occupy the same space" vis-a-vis scourge hosts/farms/clouds*, these two would be gone in a flash. Forever;)
ec2-[yada-yada].compute-1.amazonaws.com
Mozilla/5.0 (compatible; proximic; +http://www.proximic.com)
robots.txt? YES
08/01 06:20:24
static.47.34.46.nn.clients.your-server.de
Mozilla/5.0 (compatible; proximic; +http://www.proximic.com)
robots.txt? Yes BUT ignored it
08/01 06:20:25
08/01 06:20:26
*see also:
amazonaws.com plays host to wide variety of bad bots [webmasterworld.com]
For example: 47.34.46.nnn resolves to Bell in Canada. The correct IP range should begin nnn.46.34.47 but it's difficult to discover what nnn is and hence which block your-server.de resides on in that instance. nnn could be anything from 80 to 95 but isn't, nor is it in the 21n range. There are several other possibilities including 77, 78, 79 and in fact it appears to resolve to 78.46.32.0 - 78.46.63.255. I actually have the whole block 78.46.nnn.nnn already blocked. :)
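The swap-around described above can be sketched roughly as follows. This assumes the masked, dotted Host format posted in this thread (three visible octets followed by the masked one); the names REVERSED, candidate_ips and within are made up for the illustration, and the list of first octets to try is whatever you suspect.

```python
import ipaddress
import re

# Masked, reversed form as posted in this thread, e.g.
# static.47.34.46.nn.clients.your-server.de
# (octets appear as d.c.b, followed by the masked first octet a)
REVERSED = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.n+\.")

def candidate_ips(hostname, first_octets):
    """Rebuild a.b.c.d for each guessed first octet a; [] if the form differs."""
    m = REVERSED.search(hostname)
    if m is None:
        return []
    d, c, b = m.groups()
    return [f"{a}.{b}.{c}.{d}" for a in first_octets]

def within(candidates, first, last):
    """Keep only the candidates that fall inside the range first..last."""
    lo = ipaddress.IPv4Address(first)
    hi = ipaddress.IPv4Address(last)
    return [ip for ip in candidates if lo <= ipaddress.IPv4Address(ip) <= hi]

if __name__ == "__main__":
    guesses = candidate_ips("static.47.34.46.nn.clients.your-server.de", [77, 78, 79])
    print(within(guesses, "78.46.32.0", "78.46.63.255"))
```

Host names in the dashed form (88-198-7-nn) are already the right way around, so the sketch deliberately returns nothing for them.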
I agree it is not always obvious when to reverse the numbers and I appreciate your time is valuable. I also appreciate your postings. :)
blend27 - I almost wrote the same thing about netdirekt, which is a known exploit source. :)
janharders - life is too short to compile a list of exploits from there. :)
Here's the thing:
We do rDNS on the server so my Apache ELF entries show visitors by Host name. Plain IPs only appear when there's no Host.
That's why, after white- or blacklisting by UA, I then 403 by Host, and thereafter 403 by IP/CIDR if need be. And that's why the majority of my bot-sighting posts show Host info, not IPs: I don't need to WHOIS every bot-running Host I spot prior to blocking. And I don't have time to WHOIS them just to post.
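Roughly, the order of checks described above (UA first, then Host, then IP/CIDR) might look like the sketch below. It is not the actual server config: the lists are placeholders, Googlebot as a whitelisted UA is purely hypothetical, and letting a whitelisted UA skip the later checks is an assumption of this sketch.

```python
import ipaddress

UA_ALLOW = ("googlebot",)  # hypothetical whitelist entry
UA_DENY = ("heritrix", "nutch", "grubng", "findfiles.net")  # UAs seen in this thread
HOST_DENY = (".clients.your-server.de", ".compute-1.amazonaws.com")
CIDR_DENY = [ipaddress.ip_network("88.198.0.0/16")]

def decide(user_agent, host, ip):
    """Return 200 or 403, checking UA, then Host name, then IP/CIDR."""
    ua = user_agent.lower()
    if any(token in ua for token in UA_ALLOW):
        return 200  # assumption: whitelisted UAs skip the later checks
    if any(token in ua for token in UA_DENY):
        return 403
    # host is None when the address has no reverse DNS
    if host and host.lower().endswith(HOST_DENY):
        return 403
    if ip and any(ipaddress.ip_address(ip) in net for net in CIDR_DENY):
        return 403
    return 200
```

With this ordering the "IE 4.01 Win98" visitor passes the UA step but is still refused on its Host name, and a plain IP is only consulted when there is no Host to go on.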
So where does how we do things leave you in terms of lookups and/or nnn reversals?
You're on your own:)
FWIW, at least vis-a-vis hits to our Class C, the vast majority are by Hosts, and the worst trouble-making Hosts do not reverse IPs in their names.
All of my logs show IP not rDNS. It's faster, although speed is not so much of a problem now (always provided the DNS server doesn't bottleneck).
The only time the server does rDNS is for stats analysis - which I took ages setting up per site and none of the b clients uses! :(
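Logging plain IPs and resolving them only at stats time could be sketched like this, with a cache so each address is looked up once. It uses the standard socket.gethostbyaddr call; make_cached_resolver and the injectable lookup parameter are inventions of this sketch, there so it can be tried without touching DNS.

```python
import socket

def make_cached_resolver(lookup=socket.gethostbyaddr):
    """Return a resolve(ip) function that remembers earlier answers."""
    cache = {}

    def resolve(ip):
        if ip not in cache:
            try:
                # gethostbyaddr returns (hostname, aliases, addresses)
                cache[ip] = lookup(ip)[0]
            except OSError:
                # covers socket.herror / socket.gaierror: no rDNS for this IP
                cache[ip] = None
        return cache[ip]

    return resolve
```

Failed lookups are cached as None as well, so an address with no rDNS is not retried on every log line.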
I suppose a problem in blocking by host rather than IP is that server farms often have rDNS set up to the clients' domains (mine all are), so you would need to block a lot of domains instead of a range of IPs. Obviously more selective but in my case much more server-time consuming.
So: I'm on my own. No problem now I understand. :)
I was looking for one (not urgently, but for a future project). hetzner.de will do nicely :) and I promise not to run bots off it...