Forum Moderated by: open

Crawler, Spider, and User Agent ID


Forum to identify search engine spiders and user agents

Thread Subject    Messages    Started by    Last Message
Szukacz
claims to honour robots.txt but doesn't
5 Mokita 4:28 pm Aug 3, 2006
Adwords Bot
2 volatilegx 4:13 pm Aug 3, 2006
eBay indexing?
6 jake66 3:24 am Aug 3, 2006
PigBlock
No robots.txt
4 GaryK 7:14 pm Aug 1, 2006
Purpose of this Crawler/Bot?
7 DXL 7:04 pm Aug 1, 2006
The new Y! Slurp
3 BlackTulip 5:36 pm Aug 1, 2006
PediaSearch.com Crawler
9 keyplyr 7:19 am Jul 30, 2006
SevenTwentyFour/LinkWalker - New Owner, Mission
Watch out! Brand name surveillance via LinkWalker.
4 Wizcrafts 5:00 am Jul 30, 2006
lwp::simple/5.803 pretending to be Yahoo?
6 jake66 7:53 pm Jul 27, 2006
"NutchCVS" (again) but from penguin26.parc.xerox.com
No robots.txt
3 Pfui 9:02 am Jul 26, 2006
"GT::WWW/1.026" from .reverse.layeredtech.com
No robots.txt
7 Pfui 4:16 am Jul 26, 2006
MetagerBot/0.8-dev (MetagerBot; http://metager.de; )
Note space before close paren
8 Pfui 7:19 pm Jul 25, 2006
Downloads from a blank UA?
23 keyplyr 6:26 am Jul 25, 2006
HA! No User Agent for You!
These just flat-out annoy me!
10 GaryK 2:16 am Jul 25, 2006
Crawler/1.0 http://elibron.com
No robots.txt
5 GaryK 7:24 pm Jul 24, 2006
Mozilla/5.0 (compatible;MAINSEEK BOT)
No robots.txt
3 GaryK 3:06 pm Jul 24, 2006
Mozilla/5.0 (compatible; robtexbot/1.0; http://www.robtex.com/ )
Note space before close paren. Also: no robots.txt; uses the site URL as the referrer
10 Pfui 1:50 am Jul 24, 2006
"teoma agent1" from directhit.com -- no robots.txt
2 Pfui 11:17 pm Jul 22, 2006
"research-spider" from .cs.brown.edu
2 Pfui 11:16 pm Jul 22, 2006
"Entrieva/1.0" -- no robots.txt
2 Pfui 10:26 pm Jul 22, 2006
000s of Truncated Page Requests from Many IPs
82 jomaxx 10:59 pm Jul 20, 2006
Yahoo! Crawlers - A response from Yahoo! Search
Response from Yahoo!
9 Yahoo_Mike 8:33 am Jul 18, 2006
How to ban (compatible ; type requests
Note space between compatible and semicolon
40 larryhatch 3:06 am Jul 18, 2006
Googlebot
Google but not Googlebot
4 vortech 4:55 pm Jul 16, 2006
server2.attributor.com
12 Cromicon 4:39 pm Jul 16, 2006