How do people feel about this? As I understand it, this company 'selects' parts of web sites to 'package up' and then repurposes the content for people to view offline.
Sounds an awful lot like a breach of copyright to me. But is there an advantage to webmasters? Or should it be blocked?
PiyushBot (Piyush Web Miner; [piyush.com...]
both crawling my site from the same IP address at the same time. Both read robots.txt. Neither one obeyed it.
Another interesting note: if you click on the URL in the UA, you'll wind up at a page that looks like a bad spoof of a Network Solutions "website under construction" page. Maybe it's the real thing, but I really don't think so.
<snip IP address and whois lookup data>
[edited by: volatilegx at 6:53 pm (utc) on April 2, 2007]
[edit reason] removed identifying info [/edit]
PiyushBot (Piyush Web Miner;*)
RufusBot (Rufus Web Miner;*)
SumeetBot (Sumeet Bot; *)
WebarooBot (Webaroo Bot;*)
These all seem to have a relationship to Webaroo.
So just a heads-up for anyone using "Web.?Miner" to trap Webaroo: the last two in that list don't follow the same pattern.
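A quick way to see the gap is to run the trap pattern against the four strings listed above. This is only a rough Python sketch: the UA strings are cut down to the parts visible in this thread (the trailing URLs are elided in the originals), and the wider pattern `WIDER` plus the helper `caught` are my own hypothetical names, not anything Webaroo documents.

```python
import re

# User agent prefixes as posted in this thread (URLs elided in the originals).
AGENTS = [
    "PiyushBot (Piyush Web Miner;",
    "RufusBot (Rufus Web Miner;",
    "SumeetBot (Sumeet Bot;",
    "WebarooBot (Webaroo Bot;",
]

# The trap pattern mentioned above.
NARROW = re.compile(r"Web.?Miner", re.IGNORECASE)

# Hypothetical wider pattern (an assumption): also names the two bots
# whose strings do not contain "Web Miner".
WIDER = re.compile(r"Web.?Miner|SumeetBot|WebarooBot", re.IGNORECASE)


def caught(pattern, agents):
    """Return the agents that the pattern matches anywhere in the string."""
    return [ua for ua in agents if pattern.search(ua)]


# Agents that slip past the original trap.
missed = [ua for ua in AGENTS if not NARROW.search(ua)]
print(missed)
```

Listing the bot names explicitly is brittle if they keep inventing new ones, so treat the wider pattern as a stopgap rather than a fix.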
I've said it before and I'll say it again: robots.txt is for GOOD spiders; firewalls are for all the rest.
Here's my complete list of their bots' user agent strings:
64.124.122.228 "WebarooBot (Webaroo Bot; [64.124.122.252...]
64.124.122.228 "PiyushBot (Piyush Web Miner; [piyush.com...]
64.124.122.228 "RufusBot (Rufus Web Miner; [webaroo.com...]
64.124.122.228 "RufusBot (Rufus Web Miner; [64.124.122.252...]
64.124.122.228 "SumeetBot (Sumeet Bot; [64.124.122.252...]
64.124.122.228 "PsBot (PsBot; [64.124.122.252...]
64.124.122.228 "pulseBot (pulse Web Miner)"
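For anyone who wants to act on that list before the request gets any further, here is a minimal Python sketch of an application-level check. It assumes 64.124.122.228 is the only source address (it is the only one in the log lines above) and that every bot announces itself with one of the leading names shown. The function name `is_webaroo` is hypothetical, and a real firewall rule would sit outside the application.

```python
import re

# Source address seen in every log line above (assumed to be the only one).
WEBAROO_IPS = {"64.124.122.228"}

# Leading bot names taken from the list above.
BOT_NAMES = ["WebarooBot", "PiyushBot", "RufusBot", "SumeetBot", "PsBot", "pulseBot"]

# Match any listed name at the very start of the user agent string.
NAME_RE = re.compile(
    r"^(?:" + "|".join(map(re.escape, BOT_NAMES)) + r")\b",
    re.IGNORECASE,
)


def is_webaroo(ip, user_agent):
    """True if the request comes from the listed IP or its UA starts with a listed bot name."""
    return ip in WEBAROO_IPS or bool(NAME_RE.match(user_agent))


print(is_webaroo("64.124.122.228", "pulseBot (pulse Web Miner)"))
```

Checking the IP as well as the name matters here, since the last two entries show the names do not follow one pattern.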