

Allocating Resources for Bots only

slow down robots


gruntel

6:20 am on Dec 3, 2014 (gmt 0)

10+ Year Member



My server is getting brutally hammered to near death by robots, most of which are good robots.

There is no way to change their crawl rate.

Anyone have a suggestion as to how to solve this problem?

I thought of creating two groups, called users and bots, and allocating 50% of the resources to each.

Then, based on behaviour or user-agent, anything identified as a bot would be put into the bots group, whose members share a maximum of 50% of total resources. If bots try to take more, the system slows them down proportionally so they never get more than 50%.

Does anyone know how to go about doing this or some other method with the same goal?
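A minimal sketch of the two-group idea, assuming the application (rather than Apache itself) decides whether to serve each request: requests are classified by User-Agent into hypothetical `users` and `bots` groups, and each group may hold at most its share of a fixed pool of concurrent slots. `BOT_TOKENS`, `classify`, `GroupSlots` and the slot counts are illustrative names, not an existing module. User-agent strings can be spoofed, so this only holds for bots that identify themselves honestly, and it refuses excess bot requests rather than slowing them down.

```python
import threading
from typing import Dict

# Hypothetical, deliberately short list of User-Agent substrings treated as crawlers.
BOT_TOKENS = ("googlebot", "bingbot", "yandexbot", "baiduspider", "slurp")


def classify(user_agent: str) -> str:
    """Return 'bots' if the User-Agent contains a known crawler token, else 'users'."""
    ua = (user_agent or "").lower()
    return "bots" if any(token in ua for token in BOT_TOKENS) else "users"


class GroupSlots:
    """Share a fixed pool of concurrent request slots between named groups.

    Each group may hold at most int(share * total_slots) slots (minimum 1),
    and all groups together never exceed total_slots.
    """

    def __init__(self, total_slots: int, shares: Dict[str, float]):
        self._total = total_slots
        self._caps = {g: max(1, int(total_slots * s)) for g, s in shares.items()}
        self._in_use = {g: 0 for g in shares}
        self._lock = threading.Lock()

    def try_acquire(self, group: str) -> bool:
        """Take a slot for `group`; False means 'refuse or defer this request'."""
        with self._lock:
            if self._in_use[group] >= self._caps[group]:
                return False  # this group is already at its share
            if sum(self._in_use.values()) >= self._total:
                return False  # the whole pool is busy
            self._in_use[group] += 1
            return True

    def release(self, group: str) -> None:
        """Give the slot back once the response has been sent."""
        with self._lock:
            if self._in_use[group] > 0:
                self._in_use[group] -= 1


if __name__ == "__main__":
    # Users may use the whole pool; bots are capped at half of it.
    slots = GroupSlots(total_slots=20, shares={"users": 1.0, "bots": 0.5})
    group = classify("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)")
    if slots.try_acquire(group):
        try:
            print("serve page for", group)
        finally:
            slots.release(group)
    else:
        print("reply 503 with Retry-After")
```

A strict 50/50 split leaves capacity idle whenever bots are quiet; giving `users` a share of 1.0 while keeping `bots` at 0.5, as in the demo, still keeps bots at or below half the pool while letting people borrow the rest.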

lucy24

6:40 am on Dec 3, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Uh... If you don't want them around, why are you admitting them at all?

"There is no way to change their crawl rate."

If they don't heed the crawl-delay directive in robots.txt (Googlebot, to name but one, frankly and forthrightly ignores it), you may have to go to the various search engines' Webmaster Tools and set a rate manually.
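To see what the directive looks like to a crawler that does honour it, here is a sketch using Python's standard `urllib.robotparser`, whose `RobotFileParser.crawl_delay()` returns the Crawl-delay value that applies to a given user-agent. The robots.txt contents (10 seconds for bingbot, 5 for everyone else, a `/checkout/` path) and the helper `delay_for` are hypothetical.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: 10 s between requests for Bing, 5 s for everyone else.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Crawl-delay: 5
Disallow: /checkout/
"""


def delay_for(robots_txt: str, agent: str) -> Optional[int]:
    """Return the Crawl-delay (seconds) the file asks of a bot named `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    for agent in ("bingbot", "YandexBot", "Googlebot"):
        print(agent, delay_for(ROBOTS_TXT, agent))
```

Googlebot falls through to the `*` group and is also handed 5 seconds here; the parser only reports what the file asks for, and whether a bot obeys is up to the bot.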

But really... It is not every day one sees "getting brutally hammered to near death" in the same sentence as "good robots". What on earth is on your site that makes the search engines love it so passionately? (File under: Problems We Wish We All Had)

gruntel

7:14 am on Dec 3, 2014 (gmt 0)

10+ Year Member



It's robots like Bing, Google, etc.

It's a large online furniture site. We know they are mostly robots we want on there, and we would like to try the solution mentioned above. Any idea how to go about implementing such a thing? Thanks a million.

not2easy

7:31 am on Dec 3, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You don't want your site to be seen as slow-loading. Really, the best thing to do is limit their crawl rate in the Bing and Google webmaster tools accounts and kick the abusers out. What lucy24 says is better than loading down a server with more work just to show "good" bots how slowly your site loads.

gruntel

11:33 am on Dec 3, 2014 (gmt 0)

10+ Year Member



That can be done for Google, but not for other bots.

I'm looking for a solution on our side.
Is there any way to add detected bot IPs to some kind of group that is limited in its total number of requests? I've heard of mod_cband, but it is known to be buggy.
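Outside of mod_cband, a "group with a total request limit" can be approximated in application code with a single token bucket shared by every IP placed in the group, so the group's combined rate stays capped however many addresses a crawler spreads its requests over. This is a minimal sketch, assuming the application sees the client IP; `SharedTokenBucket`, `BotGroup`, the rates and the documentation-range IPs are hypothetical, and none of it reflects how mod_cband itself is configured.

```python
import threading
import time
from typing import Callable, Set


class SharedTokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `burst`; one token per request."""

    def __init__(self, rate: float, burst: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self._tokens = burst  # start full so a short burst is allowed
        self._clock = clock
        self._last = clock()
        self._lock = threading.Lock()

    def allow(self) -> bool:
        with self._lock:
            now = self._clock()
            # Refill for the time elapsed since the last call, capped at `burst`.
            self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
            self._last = now
            if self._tokens >= 1:
                self._tokens -= 1
                return True
            return False


class BotGroup:
    """IPs added here all draw from one shared bucket; other IPs are not limited."""

    def __init__(self, bucket: SharedTokenBucket):
        self.members: Set[str] = set()
        self.bucket = bucket

    def add(self, ip: str) -> None:
        self.members.add(ip)

    def allow(self, ip: str) -> bool:
        if ip not in self.members:
            return True  # not a detected bot: this limiter does not apply
        return self.bucket.allow()
```

With `rate=5, burst=10`, for example, all listed crawler IPs together get roughly five requests per second after an initial burst of ten. Refused requests could be answered with a 503 or 429, which many crawlers treat as a signal to slow down.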

not2easy

3:33 pm on Dec 3, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You can control Bing's crawl from a Bing Webmaster Tools account. You can limit it to only crawl during certain hours, too.

LifeinAsia

4:53 pm on Dec 3, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Have you implemented any server-side caching? Also, if you have a lot of images, look into offloading them to a CDN. What about getting additional servers and load balancing them?
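As an illustration of the server-side caching suggestion, here is a minimal in-process sketch: rendered HTML is kept per path for a fixed time, so repeated bot requests for the same product page skip the expensive render. `PageCache`, the 60-second TTL and the `/sofas` path are hypothetical; in production this job is more often done by a reverse-proxy cache in front of the application, and this version never evicts old entries.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Cache rendered pages by path for `ttl` seconds (no size limit, no eviction)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (rendered_at, html)

    def get_or_render(self, path: str, render: Callable[[str], str]) -> str:
        """Return cached HTML if still fresh, otherwise render and store it."""
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: the render function is not called
        html = render(path)
        self._store[path] = (now, html)
        return html


if __name__ == "__main__":
    cache = PageCache(ttl=60)
    page = cache.get_or_render("/sofas", lambda p: "<html>" + p + "</html>")
    print(page)
```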

Sysor

6:25 pm on Dec 5, 2014 (gmt 0)



If you list the bots here, it'd be easier to help you figure out how to block each of them ;)