Can a website hosting server tell the difference between a visitor going through individual pages on a website and a bot doing the same thing?
If the answer is "yes", then can a server be configured to reject (or redirect) ALL bot crawling that is not approved by the siteowner?
I do realize, of course, that robots.txt is supposed to perform that function, but it's also my understanding that some malicious bots will just ignore robots.txt. That being the case, I'm wondering if there is a way for the Linux/Unix server itself to block (via .htaccess, for example) any and all bot crawling that is not explicitly allowed?
Thanks for any feedback...
......................................
[webmasterworld.com...]
Luckily, I just found that cPanel on the sites that are being heavily crawled has a feature called "IP Deny Manager", so I am carefully scrutinizing the output of the CGI script I use to capture the IP addresses of all the crawlers, and am blocking those that are not identified (or that are identified but appear useless!). Hopefully that will save me a ton of bandwidth and will keep those rodents at bay for a while.
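For anyone who wants to automate that review step, here is a rough sketch in Python of the idea: read the captured crawler log, skip any IP that ever sent an approved user agent, and list the rest as candidates for blocking. The log format (IP, a tab, then the user agent), the `APPROVED_AGENTS` list, and the function names are all assumptions made up for illustration, not what my CGI script actually produces. Keep in mind that a user-agent string is easy to fake, so this is only a first pass before checking each address by hand.

```python
import ipaddress
from collections import Counter

# Hypothetical allow-list: substrings of user agents the site owner approves.
APPROVED_AGENTS = ("googlebot", "bingbot")


def unapproved_ips(log_lines, min_hits=1):
    """Return IPs that never sent an approved user agent.

    Each log line is assumed to look like 'IP<TAB>User-Agent'
    (an invented format for this sketch).
    """
    hits = Counter()
    approved = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        ip_text, _, agent = line.partition("\t")
        try:
            ip = ipaddress.ip_address(ip_text.strip())
        except ValueError:
            continue  # skip lines that do not start with a valid IP
        hits[ip] += 1
        if any(name in agent.lower() for name in APPROVED_AGENTS):
            approved.add(ip)
    candidates = [
        ip for ip, count in hits.items()
        if count >= min_hits and ip not in approved
    ]
    # Sort by IP version first so IPv4 and IPv6 addresses never get compared.
    candidates.sort(key=lambda ip: (ip.version, int(ip)))
    return [str(ip) for ip in candidates]


def deny_lines(ips):
    """Format IPs as older Apache-style 'Deny from' lines.

    Whether a given server accepts this syntax in .htaccess is an
    assumption to verify; the addresses can also be entered by hand.
    """
    return [f"Deny from {ip}" for ip in ips]


if __name__ == "__main__":
    sample = [
        "66.249.66.1\tGooglebot/2.1",
        "203.0.113.9\tMysteryCrawler",
        "203.0.113.9\tMysteryCrawler",
    ]
    for entry in deny_lines(unapproved_ips(sample)):
        print(entry)
```

Raising `min_hits` limits the list to addresses that crawl heavily, which is where the bandwidth goes.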
Thanks for the feedback...
........................................
[edited by: Reno at 4:22 am (utc) on Feb. 2, 2007]