Most of the following impose themselves on the SERPs and signal whether or not your link is safe to click - even when they have no information they will flag your site in a way that will discourage visitors, so in this less-than-brave new world it is important to get their seal of approval.
--
This user-agent is used by Exploit Prevention Labs LinkScanner:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
It pre-fetches HTML and JavaScript from searches on Google/Yahoo/MSN done by humans - the IP in your logs will be that of the user who has it installed and is searching on your keywords.
The software was recently acquired by Grisoft AVG but is still available for download on CNET, and if you block it you will be discouraging visitors and filling your logs with 403s for no good reason.
--
This user-agent is used by Grisoft AVG 8.0 LinkScanner:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)
As above, a bandwidth-wasting (and easily fooled) pre-fetcher best dealt with by cloaking minimal content (example given by jdMorgan in the AVG thread). When I had it blocked I lost a lot of traffic and on one of my tests it produced an impressive 120 (one hundred and twenty) 403s in 12 seconds - without me even visiting the site.
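For anyone who wants to try the "minimal content" idea outside of .htaccess, here is a rough Python sketch. It is not jdMorgan's example, just an illustration of the same principle: match the exact AVG 8 user-agent quoted above and hand back a tiny HTML page instead of the full one. The names `choose_body` and `MINIMAL_PAGE` are made up for this post, and the placeholder page content is an assumption.

```python
# Exact user-agent string reported above for Grisoft AVG 8.0 LinkScanner.
AVG_LINKSCANNER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

# Hypothetical placeholder page: small, but still real HTML content.
MINIMAL_PAGE = "<html><head><title>OK</title></head><body></body></html>"


def choose_body(user_agent: str, full_page: str) -> str:
    """Return the minimal page for the pre-fetcher, the full page otherwise."""
    if user_agent.strip() == AVG_LINKSCANNER_UA:
        return MINIMAL_PAGE
    return full_page
```

The match is deliberately exact rather than a substring test, because the other strings in this thread differ from real MSIE 6 user-agents by very little.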
--
This user-agent is used by Trend Micro Internet Security and TrendProtect:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Unlike the others it does not pre-fetch SERPs in real time, but can be triggered from the "website authentication" feature of the Internet Security Pro package on demand, and also appears to be doing some general spidering for the Trend Micro "rating server".
Sometimes it comes from the Trend Micro IP range (66.180.80.0 - 66.180.95.255) but more often it comes from the Japan Network Information Center with various IPs in the 150.70.84.xx range - so you can probably expect your site to be classed as "Suspicious" (the outrageous term they use for unknown sites) if you have APNIC blocked.
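If you want to check your logs against those two ranges, something like the following Python sketch should do. It assumes that "150.70.84.xx" means the whole 150.70.84.0/24 block, which is my reading rather than anything Trend Micro has published; the first range is simply 66.180.80.0 - 66.180.95.255 written as a /20. The function name is made up for this post.

```python
import ipaddress

# Ranges taken from the observations above.
TREND_MICRO_NETWORKS = [
    ipaddress.ip_network("66.180.80.0/20"),  # 66.180.80.0 - 66.180.95.255
    ipaddress.ip_network("150.70.84.0/24"),  # assumed meaning of 150.70.84.xx
]


def is_trend_micro_ip(addr: str) -> bool:
    """True if addr falls inside one of the ranges listed above."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TREND_MICRO_NETWORKS)
```

This is handy for whitelisting those addresses ahead of a general APNIC block, if you run one.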
--
This user-agent is used by the DrWeb plugin for Explorer and Firefox:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Maxthon)
This one pre-fetches HTML and JavaScript, but only on a specific request from the user, and in my tests it always came from 81.176.67.173 (the DrWeb server) as advertised - more reasonable than the others, but like them working for real oxygen-breathing humans.
--
I have not been able to identify where McAfee SiteAdvisor gets its information, but I do have an amusing screenshot of the related Yahoo SearchScan flagging google.com as a purveyor of "Dangerous Downloads".
On all my sites it says "We've tested millions of sites but haven't tested this one yet" - and unless McAfee scans the entire web as frequently as GoogleBot it is presumably worthless.
--
None of the other 18 anti-virus packages I tested currently interfere with search results, but it may only be a matter of time, and if you don't appease them and get flagged as "clean" you may find that your ranking is considered irrelevant.
"Paranoia strikes deep - into your SERPS it will creep"
...
I am testing sending a 403 with a short error message, and will later test sending a status 200 with the same error message to see if anything different comes up, and to see whether my click-throughs from those IPs actually increase or decrease.
What it does mean is that the described behaviour can be replicated by downloading the software.
What happens if an error 500 is returned?
I haven't tested it, but the point with all of these "tools" is that if they have no information (for whatever reason) then they will not flag your site as "clean" and users will naturally be discouraged from visiting it.
If you want to know the definitive answer, download one of them and try it.
I do recognize that every webmaster has different priorities; some websites are always in "paranoid mode" and other websites have their gates wide open to every nutch and libwww-perl out there. That said, when Google Web Accelerator came out, it created (rightly or wrongly) a firestorm of controversy, and various websites, blogs, etc. posted cookie-cutter solutions for blocking GWA. Correct me if I'm wrong, but despite Google's efforts to market GWA, it is now used by a tiny minority of users, and perhaps the reason is that so many webmasters fought back.
So if enough webmasters are fed up with the unwanted noise generated by all these different scanners, then maybe these tools will also go away in time. The Internet IS a dynamic market, new products are tested (and fail) all the time, and I don't see any reason why we must automatically concede to every superfluous scanner.
At the very least, the developers at Grisoft et al. could take a moment to discuss these issues with the webmaster community: ask for our feedback, create something like a robots.txt standard, etc. So far, it seems they have been pointedly ignoring this thread.
So if enough webmasters are fed up with the unwanted noise generated by all these different scanners, then maybe these tools will also go away in time. The Internet IS a dynamic market, new products are tested (and fail) all the time, and I don't see any reason why we must automatically concede to every superfluous scanner.
Totally agree!
As an aside, the subsequent continuation of this thread in Forum 11 may replace the "Close to Perfect Htaccess" thread as the longest ever.
maybe these tools will also go away in time
I wish they would, but no matter how much opposition we put up I fear they are here to stay, and the best we can hope for is getting them to modify their behaviour - or getting someone else to do the job properly.
My own primary objection is not about bandwidth (though I appreciate that is also a serious issue) but about the hijacking of the SERPs by companies who have a vested interest in promoting fear, uncertainty and doubt. While Google has long flagged pages that are known to be dangerous, they are otherwise neutral - and they inspect a vast number of URLs daily.
The anti-virus companies take the opposite view - everything is suspect until they have proved it innocent - but those such as McAfee and Trend Micro who rely on a "rating server" seem oblivious to the fact that they would have to check every page on the web every day (at least) if their assessment is to be any use at all, while branding sites they haven't checked as "Suspicious" is as absurd as it is offensive.
Grisoft's approach, which at least checks the evidence before the verdict, seems more reliable on one level, but we know how easy it is to fool their LinkScanner, and the software is clearly deficient in other respects. Unfortunately they have introduced it as a free feature and most of their users will probably see it as a good thing, so the pressure will be on other AV vendors to do something similar to keep up.
Bandwidth, of course, is something that webmasters pay for, and statistics are something they rely on. The Grisoft approach wastes a colossal amount of bandwidth and skews statistics, and the McAfee/Trend approach will do the same if they ever get serious about crawling the web.
Then there is the issue of honesty - like many here I take a dim view of robots crawling my sites while masquerading as something else, and that is something all these services have in common. They may argue that they need to conceal their identity to do their job, but if I can identify them then so can every teenage scammer on the planet.
It seems to me that the only people in a position to accurately evaluate webpages are the search engines. Yahoo already have a tie-in with McAfee and how they exchange information is unclear, but if flagging google.com as a drive-by site is any indication then they are not doing it very well.
A move from Google in this area may well be imminent, if only because the practice of second-guessing their results will surely have a negative effect on their image if it becomes widespread - I seriously doubt that they want to become the "web police", but they may have no option.
...
I am installing AVG 8 on one PC to test it out directly, and will post my results later.
After doing a little testing with AVG: it will happily accept a 403 status code as long as HTML content is sent with it, and show the nice green check mark next to the URL. In my simple tests it only goes bonkers when it gets no content, no matter what status code is returned.
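To make the "403 with HTML content" idea concrete, here is a minimal WSGI sketch in Python. It is only an illustration of the response shape described above, not the setup actually used in the test; the body text and the name `forbidden_app` are invented for this post.

```python
# A 403 response that still carries a small HTML body, as described above.
FORBIDDEN_BODY = b"<html><body><p>Access denied.</p></body></html>"


def forbidden_app(environ, start_response):
    """WSGI app: always answer 403 Forbidden, but never with an empty body."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(FORBIDDEN_BODY))),
    ]
    start_response("403 Forbidden", headers)
    return [FORBIDDEN_BODY]
```

The point is simply that the status line and the body are independent: the scanner gets its content while the request is still refused.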
I've now removed the bloatware from the computer in question and re-installed Norton. Just to check up, I googled some of my search terms and clicked on my site; in the subsequent log entry the UA was just the same:
compatible; MSIE 6.0; Windows NT 5.1; SV1.
The indication then is that AVG 8 is not leaving any signature.
The slightly mitigating fact is that when the site is pre-loaded only the page itself is fetched, without the accompanying graphics, CSS etc., so the bandwidth hit is a fraction of what a human visitor would cause. I am going to get round the problem by giving every page a small, unique stylesheet with the same name as the page (i.e. a page called blue-widgets will refer to blue-widgets.css), and I will stop my stats programme from showing hits on .html pages. It is not an ideal situation, and a pain in the proverbial to set up, but by checking the number of hits on the .css files I will be able to see instantly how many human visitors I am getting, since they will be the only ones tripping them.
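For what it's worth, counting those per-page stylesheet hits can be sketched in a few lines of Python. This assumes access-log lines in the usual common/combined format with a quoted `"GET /path HTTP/1.x"` request, and the function name is made up for this post; adjust the pattern to whatever your server actually writes.

```python
import re
from collections import Counter

# Pulls the requested path out of a quoted GET request in a log line.
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def count_human_visits(log_lines):
    """Count hits on per-page stylesheets, keyed by page name.

    A hit on /blue-widgets.css is counted as one visit to blue-widgets;
    requests for anything other than .css files are ignored.
    """
    visits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        path = match.group(1).split("?", 1)[0]  # drop any query string
        if path.endswith(".css"):
            page = path.rsplit("/", 1)[-1][: -len(".css")]
            visits[page] += 1
    return visits
```

One caveat: browsers cache stylesheets, so repeat visitors may not request the .css file again and the count will tend to run low.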
I have brought this thread to his notice so I look forward to hearing his comments here!
[edited by: incrediBILL at 10:07 am (utc) on June 4, 2008]
[edit reason] call to action removed - see tos #26 [/edit]
he feels that the company's product is the lesser of two evils, since the disruption to millions of webmasters' stats is justified by the extra safety the product gives to surfers
He's wrong because:
a) They created a DDoS attack on popular sites with lots of bookmarks and high rankings.
b) It's less secure because everyone and his brother now knows how to spoof it. Where's the safety now?
He should know that Grisoft bought a useless product and made it substantially worse.
His customers may feel safer, but they are being deluded - every script kiddie on the planet can fool this fabulous new "security tool" and get their payload pages marked as safe by AVG.
Meanwhile ordinary webmasters are seeing their statistics rendered useless and their bandwidth charges rocketing as this useless pre-fetcher rampages through their sites.
Grisoft may know all about Windows but they appear to know nothing about the web.
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1; .NET CLR 1.1.4322)
This pre-fetcher works the same as AVG and came from the 82.166.163.xx range.
I would not recommend blocking or cloaking this one by user-agent.