I just checked the site: command on Live and we've only got about 100 pages in their index now - which is fewer than the number of pages mentioned above.
Anyway, we keep thinking about blocking MSN altogether to stop them from wasting bandwidth. I know no one really cares about Live anymore, but I was wondering if anyone else has noticed the same - especially if you think you're under some kind of penalty.
I use a couple of standard robots.txt instructions (can't remember where I got them from - probably here at WebmasterWorld, and then confirmed against the SEs' own robots pages).
For Google, Yahoo, MSN (presumably also Live Search), Teoma and another obscure SE, one block per bot:
User-agent: botname
Crawl-delay: 10
For MSN specifically:
User-agent: msnbot
Crawl-delay: 10
And a general catch-all for anyone else who decides to start being nice:
User-agent: *
Crawl-delay: 15
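That's 10 seconds and 15 seconds respectively. Obviously you can make the delay shorter or longer; just double-check each SE's own documentation to see whether, and how, it honours Crawl-delay.
I don't know if the catch-all block creates a conflict for the SEs that already have their own named block and recognise the crawl delay, but I would presume not.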
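For anyone who wants to check how a named block and the catch-all interact, here is a minimal sketch using Python's standard urllib.robotparser. It only shows how that one parser resolves the two blocks; real crawlers may behave differently, so treat it as a sanity check rather than proof. The helper name crawl_delays and the bot name "SomeOtherBot" are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two blocks from the post: a named block for msnbot and a catch-all.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Crawl-delay: 15
"""


def crawl_delays(robots_txt, agents):
    """Return the Crawl-delay this parser would apply to each user agent.

    Hypothetical helper for illustration; it parses the text locally
    and does not fetch anything over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.crawl_delay(agent) for agent in agents}


# "SomeOtherBot" is an invented name standing in for any unlisted crawler.
delays = crawl_delays(ROBOTS_TXT, ["msnbot", "SomeOtherBot"])
print(delays)
```

With this parser, msnbot picks up the 10 from its own block and the unlisted bot falls back to the catch-all 15, so at least here the two blocks do not clash.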
Since I started using it, I notice that the SEs don't seem to be so "grabby" when they come through on a large sweep after an algo update, which is usually 20-30 pages at a time on my primary site now.
Previously I had seen up to 100 pages grabbed in a single pass, and the bandwidth spikes were ... large ...
Now I mainly get multiple daily visits of 1, 2, up to 10 pages at a time from the majors and their data centres, so it would appear that if you want "steady drip" rather than "sudden flood", it works.
Hope this is useful.
Hooroo
JP