80legs

80legs abuse

         

MxAngel

7:04 am on Nov 27, 2010 (gmt 0)

10+ Year Member



Today I ran into them for the first time. All the existing topics are too old to reply to, so I had to start a new one to share my experience with this "botnet" ...

UA:
Mozilla/5.0 (compatible; 008/0.83; [80legs.com...] Gecko/2008032620

26/Nov/2010:15:58:15 -0700 - 26/Nov/2010:19:53:25 -0700
Search "80legs" (2409 hits in 1 files)

2409 hits in 4h ... they're out.
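For anyone who wants to put a rate on those numbers, here is a rough sketch that works out the average request rate from the two log timestamps and the hit count quoted above. The function name `request_rate` is just an illustration, and the timestamp format is assumed to be the usual Apache access-log style (parsing the month name also assumes an English/C locale).

```python
from datetime import datetime

# Assumed Apache-style timestamp layout, e.g. "26/Nov/2010:15:58:15 -0700".
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def request_rate(first_hit: str, last_hit: str, hits: int) -> tuple[float, float]:
    """Return (elapsed seconds, average hits per minute) between two log times."""
    start = datetime.strptime(first_hit, LOG_TIME_FORMAT)
    end = datetime.strptime(last_hit, LOG_TIME_FORMAT)
    elapsed = (end - start).total_seconds()
    if elapsed <= 0:
        raise ValueError("last_hit must be later than first_hit")
    return elapsed, hits * 60 / elapsed


# Figures taken from the log excerpt in the post above.
elapsed_seconds, hits_per_minute = request_rate(
    "26/Nov/2010:15:58:15 -0700",
    "26/Nov/2010:19:53:25 -0700",
    2409,
)
print(f"{elapsed_seconds:.0f} s elapsed, about {hits_per_minute:.1f} hits per minute")
```

With the figures from the post this comes to 14110 seconds and roughly 10 hits per minute on average, or about one request every six seconds. It is only an average; the log excerpt does not say how bursty the crawl was.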

tangor

8:34 am on Nov 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



80legs does honor robots.txt.

I white list (let in only a few, disallow all others), but 80legs will honor a named disallow for their bot as well.
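As a rough illustration of the white-list approach, the sketch below builds a small robots.txt and checks it with Python's standard `urllib.robotparser`. Everything in it is an assumption for demonstration: `Googlebot` stands in for whichever bots you actually let in, and the `008` token is guessed from the UA string quoted earlier in the thread, so check 80legs' own documentation for the name they really match on.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical white-list robots.txt: one allowed bot, a named disallow
# for the 80legs crawler (token assumed from its UA), everyone else out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: 008
Disallow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str = "/") -> bool:
    """Return True if a compliant bot named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for bot in ("Googlebot", "008", "SomeOtherBot"):
    verdict = "allowed" if is_allowed(ROBOTS_TXT, bot, "/index.html") else "disallowed"
    print(f"{bot}: {verdict}")
```

Strictly speaking the named `008` block is redundant here, since the catch-all `*` rule already shuts out everything not listed. It only matters if the catch-all is ever loosened. And of course this only tells you what a well-behaved bot should do, not what a given crawler actually does.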

Pfui

12:09 pm on Nov 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Personally:

These spiders-for-hire have been blockworthy in my book for as long as I can remember, because of abuse, no robots.txt, and no benefit.

Perhaps requesting/respecting robots.txt is a config that can be locally overridden? My experience is that many, if not most, of the distributed crawlers do not request robots.txt.

Previously:

80legs [webmasterworld.com...]

UA blocking of 80legs bot?
using .htaccess to block 80legs bot
[webmasterworld.com...]

Related:

Digsby IM Enables Web Crawlers Control of Your PC & Bandwidth
Plura Processing and 80Legs to Leverage Digsby Network
[webmasterworld.com...]

tangor

3:23 pm on Nov 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My white list quickly reveals if any bot/crawler disobeys, and at that time I'll deal with the miscreant in .htaccess. BUT if it OBEYS, then I don't have to do that... which is less work. Just make sure all bots, even disallowed bots, can get robots.txt... failure to show one is an invitation to rip the site.
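The "let everyone read robots.txt, block the miscreants everywhere else" rule can be sketched as plain decision logic. This is not an .htaccess recipe, just a minimal Python illustration of the order of the checks; the function name `response_status` and the `BLOCKED_UA_TOKENS` list are hypothetical, and the `008/` token is taken from the UA string quoted at the top of the thread.

```python
# Hypothetical list of UA substrings for bots that ignored robots.txt.
BLOCKED_UA_TOKENS = ("008/",)


def response_status(path: str, user_agent: str) -> int:
    """Pick an HTTP status: robots.txt is always served, blocked bots get 403."""
    # Check the path first so even banned bots can still read the rules.
    if path == "/robots.txt":
        return 200
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return 403
    return 200


# Shortened form of the UA from the first post (the elided URL is left out).
bot_ua = "Mozilla/5.0 (compatible; 008/0.83)"
print(response_status("/robots.txt", bot_ua))  # robots.txt stays reachable
print(response_status("/index.html", bot_ua))  # everything else is refused
```

The point of the ordering is that the robots.txt exemption comes before the UA test. If the block were applied first, the bot would never see the disallow at all.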