The IP resolves back to MSN. Is it spoofed, or is this a legit bot, and what's it for?
Is MSN working on a Froogle-type database?
Name: msnbot-products/1.0 (+http://search.msn.com/msnbot.htm)
IP Address: 207.68.154.139
User Agent: msnbot-products/1.0 (+http://search.msn.com/msnbot.htm)
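The usual way to answer the "spoofed?" question is a forward-confirmed reverse DNS check: look up the PTR name for the IP, check that it sits under the operator's domain, then resolve that name forward and confirm it maps back to the same IP. Below is a minimal Python sketch. The ".msn.com" suffix is an assumption for illustration, not a documented MSN hostname pattern, and the resolver functions are injectable so the logic can be tested without network access.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# ASSUMPTION: hypothetical hostname suffix treated as "belongs to MSN".
# Replace with whatever the crawler operator actually documents.
DEFAULT_SUFFIXES = (".msn.com",)

HostInfo = Tuple[str, List[str], List[str]]  # (hostname, aliases, addresses)


def is_verified_crawler(
    ip: str,
    suffixes: Iterable[str] = DEFAULT_SUFFIXES,
    reverse: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS: PTR lookup, suffix check, then A lookup."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False  # no PTR record: cannot confirm
    hostname = hostname.lower().rstrip(".")
    if not any(hostname.endswith(s) for s in suffixes):
        return False  # PTR name is outside the expected domain
    try:
        _name, _aliases, addrs = forward(hostname)
    except OSError:
        return False
    # Whoever controls an IP block can set its PTR to any name, but cannot
    # make the real domain's DNS resolve that name back to their IP.
    return ip in addrs
```

Called as is_verified_crawler("207.68.154.139"), it performs live DNS lookups, so the answer depends on whatever the DNS records say at the time you run it.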
The MSN Shopping bot is msnbot-products.
The MSN News bot is msnbot-news.
The MSN Image Search bot is msnbot-MM.
The MSN Search bot is still just plain msnbot.
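To see which team's crawler is hitting a site, one approach is to map the product token at the start of the User-Agent (the part before the first "/") against the names listed above. This is a minimal Python sketch. The mapping covers only the four names in this post, and because User-Agent strings are easy to forge, it tells you what a request claims to be, not who actually sent it.

```python
import re
from typing import Optional

# Crawler names and services as listed in the post above (lower-cased keys).
MSN_BOTS = {
    "msnbot": "MSN Search",
    "msnbot-products": "MSN Shopping",
    "msnbot-news": "MSN News",
    "msnbot-mm": "MSN Image Search",
}

# Product token: leading run of characters up to the first "/" or whitespace,
# e.g. "msnbot-products" from "msnbot-products/1.0 (+http://...)".
_TOKEN = re.compile(r"^\s*([^/\s]+)")


def classify_msn_bot(user_agent: str) -> Optional[str]:
    """Return the MSN service a User-Agent claims to be, or None if unknown."""
    m = _TOKEN.match(user_agent)
    if not m:
        return None
    return MSN_BOTS.get(m.group(1).lower())
```

Names not in the table (for example msnbot-media, which comes up later in the thread) return None rather than being guessed.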
By the way, this change was partly precipitated by people here at Webmaster World complaining that we were crawling them a lot but never indexing them; it always turned out that it wasn't MSN Search doing the crawling -- it was some other team at MSN. Now it should be much easier for people to see what's really going on -- and to block or restrict other bots (without blocking MSN Search) if they have to.
A move in the right direction... thanks for the response, MSNDude.
Will the MSNBot info page for Webmasters [search.msn.com] be updated to reflect these newly-announced 'bots? I'm currently looking for answers to the following questions:
On sites with no non-proprietary multimedia files, and with no news or shopping content, would the following construct allow or deny msnbot-media, msnbot-news, etc.?
# Allow unrestricted access for msnbot
User-agent: msnbot
Disallow:

# Disallow all others not 'allowed' above
User-agent: *
Disallow: /

Alternatively:

# Disallow all MSN specialty robots
User-agent: msnbot-
Disallow: /

# Allow unrestricted access for msnbot search robot
User-agent: msnbot
Disallow:

# Disallow all others not 'allowed' above
User-agent: *
Disallow: /
Jim
[edited by: jdMorgan at 11:38 pm (utc) on July 31, 2006]
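One way to see how a parser might read the two constructs above is to feed them to Python's standard urllib.robotparser. That parser treats a record as applying when its User-agent value is a substring of the crawler's name token, and it uses the first matching record. MSN's own crawlers may not match records the same way, so this sketch illustrates one common parser, not an authoritative answer. The URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# First construct: allow msnbot, disallow everyone else.
FIRST = """\
User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""

# Second construct: a "msnbot-" record listed first to catch the specialty bots.
SECOND = """\
User-agent: msnbot-
Disallow: /

User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str = "http://example.com/") -> bool:
    """Parse robots.txt text and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for label, text in (("first", FIRST), ("second", SECOND)):
        for ua in ("msnbot/1.0", "msnbot-news/1.0", "msnbot-media/1.0", "Googlebot/2.1"):
            print(label, ua, allowed(text, ua))
```

Under this parser, the first construct lets msnbot-news and msnbot-media through, because "msnbot" is a substring of their names. The second construct blocks them while still allowing plain msnbot, because "msnbot-" is not a substring of "msnbot".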
· MSNBot obeys robots.txt for MSNBot
· MSNBot-NAME obeys robots.txt for MSNBot *and* MSNBot-NAME.
This means site owners need to do no extra work for our additional crawlers, while still having the flexibility to restrict specific crawlers.
Hope that helps.
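The rule above can be sketched in Python as follows. Two parts of this are assumptions rather than anything stated in the reply: that "obeys both" means a path is blocked if either the msnbot record or the msnbot-NAME record disallows it, and that the "*" record is used only when neither of those records exists. The dictionary stands in for an already-parsed robots.txt.

```python
from typing import Dict, List

# ASSUMPTION: robots.txt already parsed into
# {lower-cased User-agent value: [disallowed path prefixes]}.
# An empty list models a bare "Disallow:" (nothing blocked).
Rules = Dict[str, List[str]]


def _blocked_by(prefixes: List[str], path: str) -> bool:
    # An empty prefix blocks nothing; otherwise plain prefix matching.
    return any(p and path.startswith(p) for p in prefixes)


def may_crawl(rules: Rules, bot: str, path: str) -> bool:
    """One reading of the stated rule: msnbot-NAME honours both the
    'msnbot' record and its own record; other bots use only their own."""
    bot = bot.lower()
    names = ["msnbot", bot] if bot.startswith("msnbot-") else [bot]
    matched = [rules[n] for n in names if n in rules]
    if not matched:
        # ASSUMPTION: fall back to the catch-all record.
        matched = [rules.get("*", [])]
    # Blocked if ANY applicable record disallows the path.
    return not any(_blocked_by(prefixes, path) for prefixes in matched)
```

Under this reading, a site that only has an "msnbot" record needs nothing extra for the specialty bots, and adding an "msnbot-news" record with "Disallow: /" shuts out the news bot without affecting msnbot.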
How about this: msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)
I blocked this one a while back because it didn't look kosher, and ended up with the whole site de-indexed: the MSN search bot stopped spidering the site.
It also took me a while to realise what was happening, to un-block it and get the site re-indexed. Doh!
Both Google & Slurp! have accepted compressed pages for years now (although with G it was the "Mozilla/5" bot, which is now the standard bot). msnbot never has, and consequently it consumes more bandwidth on my site than G & Y together, even though Google and Yahoo! each take far more pages than msnbot.
The inability to accept compressed pages really does give the impression of an old, out-dated technology being employed at MS. Time to join the 21st Century, no?
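A client "accepts compressed pages" by listing gzip in its Accept-Encoding request header, and a server should only send a gzipped body when that header allows it. Below is a minimal Python sketch of that server-side check. It is a simplified reading of HTTP's Accept-Encoding rules: it honours q-values, "gzip", "x-gzip" and "*", but ignores ranking between codings. The function name is hypothetical.

```python
def accepts_gzip(accept_encoding: str) -> bool:
    """True if an Accept-Encoding header value permits a gzip response."""
    explicit = None  # q-value given for gzip / x-gzip, if listed
    wildcard = None  # q-value given for "*", if listed
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if coding in ("gzip", "x-gzip"):
            explicit = q
        elif coding == "*":
            wildcard = q
    # An explicit gzip entry overrides the wildcard.
    if explicit is not None:
        return explicit > 0
    return wildcard is not None and wildcard > 0
```

A crawler that sends no gzip entry (or an empty header) gets the uncompressed page, which is why a bot that never advertises gzip costs more bandwidth per page.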
Why?
With a dynamic website it's bandwidth vs. CPU time. Compressing the page on my site increases the time to deliver it and chews up more CPU cycles, which means I can deliver fewer pages in the same amount of time.
I won't be sending compressed pages anytime soon, guess I'm using out-dated technology too ;)
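The bandwidth-versus-CPU trade-off described above can be measured rather than argued. This minimal Python sketch gzips the same page at several compression levels and reports the compressed size and the average time per page. The sample page is a made-up stand-in for generated HTML, so real numbers will depend on the actual pages and hardware.

```python
import gzip
import time


def measure(page: bytes, levels=(1, 6, 9), repeats: int = 20):
    """Return (level, compressed_size, seconds_per_page) for each gzip level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        for _ in range(repeats):
            compressed = gzip.compress(page, compresslevel=level)
        elapsed = (time.perf_counter() - start) / repeats
        results.append((level, len(compressed), elapsed))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a generated page: repetitive HTML table rows.
    rows = "".join(f"<tr><td>Item {i}</td><td>${i * 3}.99</td></tr>\n" for i in range(2000))
    page = f"<html><body><table>\n{rows}</table></body></html>".encode()
    print(f"uncompressed: {len(page)} bytes")
    for level, size, secs in measure(page):
        print(f"level {level}: {size} bytes, {secs * 1000:.2f} ms per page")
```

Comparing the per-page milliseconds against the bytes saved gives a concrete basis for deciding whether the CPU cost is worth it on a given server.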
With a dynamic website it's bandwidth vs. CPU time
Compressed pages are nice for your static web page sites
guess I'm using out-dated technology too ;)
CPU is cheap now. Times have changed.
If you just let the dynamic page be delivered as it's being created, the overall process is faster. If you wait to generate the whole page, then zip it and ship it, the overall time spent processing the page is longer, because nothing starts transmitting until the entire process is completed.
Not sure how you can get around that fact, and my server is just too busy to risk it.
[edited by: incrediBILL at 5:58 am (utc) on Aug. 8, 2006]
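The two delivery orders described in the last post can be sketched in Python by timing when the first byte would be ready to send. The page generator and its per-chunk delay are hypothetical stand-ins for database and template work. The sketch measures only time to first byte, not total download time.

```python
import gzip
import time
from typing import Iterator


def generate_page(chunks: int = 5, delay: float = 0.01) -> Iterator[bytes]:
    """Hypothetical dynamic page: each chunk takes `delay` seconds to build."""
    for i in range(chunks):
        time.sleep(delay)  # stand-in for database queries / template work
        yield f"<p>section {i}</p>\n".encode()


def first_byte_streamed() -> float:
    """Send each chunk as soon as it is generated; time until the first one."""
    start = time.perf_counter()
    next(generate_page())  # the first chunk could go out right now
    return time.perf_counter() - start


def first_byte_buffered_gzip() -> float:
    """Build the whole page, gzip it, then send; time until anything is sent."""
    start = time.perf_counter()
    gzip.compress(b"".join(generate_page()))  # nothing is sendable until this returns
    return time.perf_counter() - start
```

With five chunks, the streamed version is ready after roughly one chunk's delay, while the buffered-and-gzipped version waits for all five plus the compression time.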