Some professional scrapers stage blitzkrieg raids, mounting around a dozen simultaneous attacks on a website to grab as much data as quickly as possible without being detected or crashing the site they're targeting.
Raids like these are on the rise. "Customers for whom we were regularly blocking about 1,000 to 2,000 scrapes a month are now seeing three times or in some cases 10 times as much scraping," says Marino Zini, managing director of Sentor Anti Scraping System.
The emerging business of web scraping provides some of the raw material for a rapidly expanding data economy. Marketers spent $7.8 billion on online and offline data in 2009, according to the New York management consulting firm Winterberry Group LLC. Spending on data from online sources is set to more than double, to $840 million in 2012 from $410 million in 2009.
Since the problem is well known and likely to keep growing, it's a bit surprising that something akin to Akismet-for-blocking-site-ripper-bots hasn't emerged.
Flood protection would only slow down the most amateurish scrapers. Some are smarter than that and may even randomise their request timing to look like a human visitor. What often gives them away is how sharply their behaviour differs from human browsing patterns.
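To make the distinction concrete, here is a minimal Python sketch, assuming the server can see each client's request timestamps and URL paths. `FloodGuard` is a hypothetical sliding-window rate limiter, the kind of flood protection that only trips on crude, fast scrapers. `looks_like_scraper` is a hypothetical browsing-pattern check that flags sessions requesting many pages but almost none of the stylesheets, scripts and images a real browser loads alongside them. All names and thresholds are illustrative assumptions, not tuned values or any existing product's method.

```python
from collections import defaultdict, deque

# Hypothetical thresholds: illustrative values only, not tuned on real traffic.
MAX_REQUESTS = 30        # flood limit: requests allowed per window
WINDOW_SECONDS = 10.0    # length of the sliding window in seconds
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


class FloodGuard:
    """Naive flood protection: a sliding-window request counter per client."""

    def __init__(self, max_requests: int = MAX_REQUESTS, window: float = WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        q = self.hits[client]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # rejected requests are not counted toward the window
        q.append(now)
        return True


def looks_like_scraper(paths: list[str], min_pages: int = 20,
                       max_asset_ratio: float = 0.1) -> bool:
    """Hypothetical heuristic: flag sessions that fetch many pages but
    almost none of the CSS/JS/images a browser would load with them."""
    # Ignore query strings such as "/style.css?v=2" when classifying paths.
    pages = [p for p in paths
             if not p.split("?", 1)[0].lower().endswith(ASSET_SUFFIXES)]
    assets = len(paths) - len(pages)
    if len(pages) < min_pages:
        return False  # too little evidence either way
    return assets / len(pages) < max_asset_ratio


if __name__ == "__main__":
    guard = FloodGuard(max_requests=3, window=1.0)
    print([guard.allow("1.2.3.4", t) for t in (0.0, 0.1, 0.2, 0.3, 1.5)])
    # prints [True, True, True, False, True]
    print(looks_like_scraper([f"/item/{i}" for i in range(25)]))  # prints True
```

Randomised delays do nothing to help a scraper pass the second check, though a scraper driving a full browser, or one spread thinly across many IPs, would get past both.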
It's hard to see any way to defeat scrapers altogether by blocking: even if you came up with the perfect piece of software, scrapers can already use botnets, and defeating those will be really tricky. However, when the big boys are caught at it, naming and shaming might help.
I use FF with BetterPrivacy installed,
plus RequestPolicy, ABP, NoScript and CookieSafe, just to be on the safe side.
I'm not sure what any of this privacy stuff has to do with scrapers. Nothing. Off Topic.