There isn't a great deal of (any) detail on the technical differences between the spider versions, unfortunately. Anyone know any more about it?
They claim that Slurp 3.0 will recognize the old Slurp information, which means the robots.txt file should be OK, but those of you who use very narrow rewrite rules might need to update them. Additionally, reverse DNS validation of the crawl.yahoo.net domain will continue to function properly for the new, smaller set of IPs.
Many sites that didn't heed the call to use rDNS validation for the major SEs will start bouncing Slurp, so this will be ugly.
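For anyone who hasn't set up rDNS validation yet, here is a rough sketch of the usual forward-confirmed reverse DNS check in Python. The function name and the injectable resolver arguments are my own, added so it can be tried without network access. The only thing taken from Yahoo's statement is the crawl.yahoo.net domain.

```python
import socket

def is_yahoo_crawler(ip, reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex,
                     suffix=".crawl.yahoo.net"):
    """Forward-confirmed reverse DNS: IP -> hostname -> IP list."""
    try:
        # Reverse lookup: which hostname does this IP claim to be?
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False
    # The hostname must END with the crawler domain, not merely contain it.
    if not host.endswith(suffix):
        return False
    try:
        # Forward lookup: does that hostname resolve back to the same IP?
        addresses = forward(host)[2]
    except OSError:
        return False
    return ip in addresses
```

Called as is_yahoo_crawler(remote_ip), it uses the system resolver. Since the check keys on the hostname rather than a hard-coded address list, a change of IPs inside crawl.yahoo.net should not require any changes.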
Per their own NEW press release.
#SetEnvIf User-Agent "Slurp/3.0" keep_out
SetEnvIf User-Agent "Slurp/1.0" keep_out
SetEnvIf User-Agent "Slurp/2.0" keep_out
SetEnvIf User-Agent "slurp@inktomi.com" keep_out
SetEnvIf User-Agent "Yahoo! Slurp;" keep_out
In addition I have some very old references to the following (have no idea when they were last used):
Slurp/cat
Slurp/si
Nor have I kept up with updates on the following, which are contained in my robots.txt:
Yahoo-MMCrawler
YahooSeeker
Yahoo! Mindset
Yahoo-Blogs
Yahoo-MMAudVid
YahooFeedSeeker
YahooSeeker-Testing
YahooSeeker/CafeKelsa-dev
YahooVideoSearch
YahooYSMcm
Yahoo! DE Slurp
Yahoo! Slurp China
If so, this is what's coming to my sites from that range:
Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
Mozilla/5.0 (compatible; Yahoo! DE Slurp; [help.yahoo.com...]
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]
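To see why narrow rules might need updating, here is a small Python sketch that runs the active SetEnvIf patterns quoted earlier in the thread against the three user-agent strings above (kept truncated exactly as posted). It assumes Python's re.search is a close enough stand-in for SetEnvIf's case-sensitive, unanchored regex match. The variable and function names are mine.

```python
import re

# Active patterns from the SetEnvIf lines quoted earlier in the thread.
active_patterns = ["Slurp/1.0", "Slurp/2.0", "slurp@inktomi.com", "Yahoo! Slurp;"]
# This one is commented out in the posted config.
commented_out = "Slurp/3.0"

# User-agent strings as posted, truncation included.
agents = [
    "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]",
    "Mozilla/5.0 (compatible; Yahoo! DE Slurp; [help.yahoo.com...]",
    "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]",
]

def matching_patterns(user_agent, patterns=active_patterns):
    """Return the patterns that match anywhere in the user-agent string."""
    return [p for p in patterns if re.search(p, user_agent)]

for ua in agents:
    print(ua, "->", matching_patterns(ua))
```

Under that assumption, only the plain "Yahoo! Slurp;" string matches an active pattern. The "Yahoo! Slurp/3.0;" string slips past "Yahoo! Slurp;" because a slash, not a semicolon, follows "Slurp", so it is caught only if the commented-out "Slurp/3.0" rule is enabled.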
Slurp/3.0 was first seen around 2007-11-19 and got caught in the trap, trying...
So is this the Range 67.195.0.0/16?
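If you want to test log entries against that candidate range, a minimal Python sketch using the standard ipaddress module follows. The /16 is only the range asked about here, not something Yahoo has confirmed, and the sample addresses are made up for illustration.

```python
import ipaddress

# Candidate range from the question above; unconfirmed.
CANDIDATE_RANGE = ipaddress.ip_network("67.195.0.0/16")

def in_candidate_range(ip, network=CANDIDATE_RANGE):
    """True if the dotted-quad address falls inside the network."""
    return ipaddress.ip_address(ip) in network

# Illustrative addresses only, not taken from any log.
print(in_candidate_range("67.195.37.1"))
print(in_candidate_range("67.196.0.1"))
```

An IP range check like this is weaker than rDNS validation, since it breaks whenever the crawler moves to new addresses.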
According to their blog post:
The crawlers will start crawling from a different and much smaller set of IP addresses, but it'll still be from the crawl.yahoo.net domain.
So I'm not sure if that means they're switching to a completely new set of IPs or just dropping a large segment of their existing IPs, but it does say "different" so the jury is still out on what that means until we can verify it.
Still, these are our sites they're unleashing themselves on. Would be nice to be told what to expect, eh?
They couldn't care less what webmasters desire, at least those few that are aware of their activity.
The bots and their Dr. Frankensteins have simply grown accustomed to crawling as they please, with as many different bots running simultaneously as they like.
Unfortunately, even if every participant here banded together in a joint denial, it wouldn't slow down the crawling of the bots in any manner, nor even make them blink and wonder...
Googlebot came in 3rd with only 8K pages and msnbot claimed 2nd with 11K pages, making Slurp the biggest crawler, and it's been this way for many weeks now.
The bot that crawls the least sends the most traffic, the irony.
I am still digging for more details and doing comparisons to see if there is anything else worth noting here.