Forum Moderators: martinibuster
These false visits aren't hurting us with Adsense because neither Adsense nor Analytics counts them. Apparently the JavaScript Google uses is smarter than the server logging, or maybe Google has an extensive IP blacklist. However, they are growing both in quantity and in the number of pages affected. They appear in our logs as:
GET (ourpage.html) - 80 - (random IP Addy) HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)
That the visits are false is apparent because they increase the logged traffic to a given page by a factor of 4X to 8X, resulting in hundreds of false visits a day on some pages. What's more, they never go away. Once these false visits appear on a page, they simply continue, for months in the case of our earliest affected page. Overall, our server logs are showing over a thousand of the false visits a day.
I did some research on this and saw quite a few comments from last year that it may be related to search engine caching or to a flaw in the .NET architecture, but I'm not enough of an infrastructure guy to understand those discussions. I didn't find any discussions relating the false visits to pages showing Adsense, which is what concerns me. Since we only run Adsense on a fraction of our pages and don't do huge traffic to start with, it shows up pretty clearly. Also, removing Adsense from an affected page does not free it from the daily beating.
Has anybody else seen such a rise in false traffic and does the Adsense relation hold up? Is it an attack, somebody trying to spam the system with noise to cover their tracks? Is it (to use a hardware analogy) just chattering relays out on the Internet infrastructure?
they aren't real people or even bots
Another 'Net life form discovered? ;-)
This looks like a regular XP user agent. A hint could possibly be in the IP addresses and frequency of hits. The best place to get help for something like this is over at the tracking and logging [webmasterworld.com] or spider [webmasterworld.com] forums where all the log obsessed folks hang out.
If JavaScript is not enabled then these visits have no effect on your AdSense either.
Correct, but it doesn't explain what they are trying to do. It's only a handful of Adsense pages drawing this attention, with no relation to Page Rank.
Maybe they just "ping" some high-pagerank pages to steal content/updates.
No, they wouldn't be coming back to the same pages (which are static, by the way) day after day, generating hundreds of false visits on moderately trafficked pages and dozens of false visits on pages that only draw a couple of real visitors a day.
This looks like a regular XP user agent. A hint could possibly be in the IP addresses and frequency of hits. The best place to get help for something like this is over at the tracking and logging or spider forums where all the log obsessed folks hang out.
I thought of that, but since it's only happening with pages that show Adsense, I thought I'd try here first to see if anybody else is seeing the same issue. On a page that only gets a couple dozen of these false visits per day, they come every hour or so from unique IP's. I haven't yet carefully analyzed the logs for one of the pages with 300 or 600 false visits a day to see if the IP's repeat. I suppose I'll try that if I get really annoyed.
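For anyone who wants to run the "do the IP's repeat" check described above, here is a minimal sketch in Python. It assumes each log line carries eight space-separated fields in the order suggested by the sample entry (method, page, query, port, username, client IP, HTTP version, user agent). That layout, the function names, and the `SUSPECT_UA` constant are all assumptions for illustration, so adjust them to match your own log format.

```python
from collections import Counter

# Assumed user agent, written the way it appears in the log (spaces as +).
SUSPECT_UA = ("Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;"
              "+SV1;+.NET+CLR+1.1.4322)")


def count_suspect_hits(lines, page, user_agent=SUSPECT_UA):
    """Count GET hits per client IP for one page and one exact user agent.

    Assumed field order per line (hypothetical, based on the sample entry):
    method, uri, query, port, username, client-ip, http-version, user-agent
    """
    hits = Counter()
    for line in lines:
        if line.startswith("#"):      # skip log header/directive lines
            continue
        fields = line.split()
        if len(fields) < 8:           # skip malformed or short lines
            continue
        method, uri, _query, _port, _user, ip, _version, ua = fields[:8]
        if method == "GET" and uri == page and ua == user_agent:
            hits[ip] += 1
    return hits


def repeat_summary(hits):
    """Return (total hits, distinct IPs, IPs seen more than once)."""
    total = sum(hits.values())
    repeaters = sum(1 for n in hits.values() if n > 1)
    return total, len(hits), repeaters
```

Feeding it one day's log, for example `repeat_summary(count_suspect_hits(open("ex061026.log"), "/ourpage.html"))` with a hypothetical file name, shows whether the total is much larger than the number of distinct IP's. Note that an exact user agent match will also count any real visitors who happen to send the same string.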
They then use a bot net to keep the scraped content updated. I know you said they were static, but a scraper that uses a bot net is lazy and will keep automatically checking anyway.
It's just my guess, but I would still post your question in the tracking and logging forum.
To this day I have no idea whether it's due to some kind of ISP pre-caching, or slow-motion email harvesting, or search engines doing some under-the-radar crawling in order to identify cloaking, or what.
If it becomes a major problem you can try to ban the IP blocks involved. Personally I don't see simple crawling as anything worth losing sleep over, except when they ignore my robots.txt file and hammer my executable scripts with requests.
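As a rough illustration of banning by IP block, the sketch below checks whether a client address falls inside a list of CIDR ranges using Python's standard `ipaddress` module. The function names are hypothetical and the ranges shown are reserved documentation ranges, not real offenders. In practice the ban itself would live in the web server or firewall configuration, and this only shows the matching logic.

```python
import ipaddress


def build_blocklist(cidrs):
    """Parse CIDR strings such as '192.0.2.0/24' into network objects."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_blocked(ip, blocklist):
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocklist)


# Placeholder ranges (reserved for documentation), used only as an example.
blocklist = build_blocklist(["192.0.2.0/24", "198.51.100.0/24"])
```

With that list, `is_blocked("192.0.2.77", blocklist)` is true and `is_blocked("203.0.113.5", blocklist)` is false. This only helps while the hits come from a few identifiable blocks. It does little against requests spread over a thousand unrelated addresses.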
I know you said they were static, but a scraper that uses a bot net is lazy
But why hit the same page hundreds of times a day? I understand the scraper paradigm. Before Google worked out their duplicate content algo, I used to see tens of thousands of scrapes of our pages in their index. I guess I need to study up on bot nets; that sounds like something that would use unlimited random IP's. And getting back to this forum, why only target pages displaying Adsense? It's not like we have Adsense on only the high quality pages or only the low quality pages. It's pretty random on our site, depending on where Adsense converts well and what doesn't interfere with our core business.
But why hit the same page hundreds of times a day?
I hear you, but to illustrate why I think bot nets can and will do this because they are stupid and lazy, I'll give you a real example that is off topic from adsense, but might answer your question.
On one of my sites, I had a form where users could submit info to a page. I didn't properly sanitize one of the fields, but figured it was ok, because whatever a user submitted, I had to approve prior to it showing up on the page.
Well, someone used that field to insert some JavaScript from another site. When I viewed the field to approve it, I also ran the JavaScript (because I viewed the field as HTML). No other users ever ran that JavaScript, but I didn't appreciate it much, so I went ahead and made sure that field was sanitized. I fixed the issue immediately. Also, the site hosting the code was taken down.
Well, I guess whoever did it thought they found a vulnerable page, because next thing I know I'm getting hits to that field from random IP's all inserting the same javascript.
Even though the field has been sanitized for over a month and the site hosting the bad code has also been down for over a month, this one page is still to this day getting these multiple hits from random IP's doing the exact same thing and still pointing to that down site.
All I can do is just sit back and shake my head, I guess.
But as I said in my earlier post, it's just my guess (with some tin hat stuff mixed in) that it's a bot net causing your log entries.
[edited by: Jordo_needs_a_drink at 4:53 pm (utc) on Oct. 26, 2006]
GET (ourpage.html) - 80 - (random IP Addy) HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)
Those aren't false visitors, it's a bot of some sort, you're getting scraped.
The + signs are there because whoever pasted in that user agent string copied it from a Windows IIS server log, which replaces spaces with +, and they think that's how the world sees that information, which is hysterical.
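To make the + substitution concrete, here is a small Python sketch. The function names are made up for illustration. The first converts a user agent as written in an IIS-style log back to the header a browser would send, and the second flags a raw header value that contains + signs but no spaces, which is the tell described above. One caveat: if your own server writes spaces as +, a normal browser and a bot sending literal + signs look identical in the log, so the second check is only meaningful on the raw header or on a log format that preserves spaces.

```python
def logged_ua_to_header(logged_ua):
    """Undo the log's space-to-plus substitution.

    Assumption: every '+' in the logged value stood for a space; a
    literal '+' in the original header cannot be told apart here.
    """
    return logged_ua.replace("+", " ")


def looks_pasted_from_log(raw_header_value):
    """Heuristic: a raw User-Agent header with '+' but no spaces at all
    was probably copied out of a log rather than sent by a browser."""
    return "+" in raw_header_value and " " not in raw_header_value
```

For example, `logged_ua_to_header("Mozilla/4.0+(compatible;+MSIE+6.0)")` gives `"Mozilla/4.0 (compatible; MSIE 6.0)"`.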
You can learn more about these things in Pubcon Vegas:
[pubcon.com...]
You can learn more about these things in Pubcon Vegas:
Going to be overseas for business in November; Boston Pubcon was it for the year. I'm glad the consensus is scraper bots, which is better than some sort of attack on Adsense. I can't say it makes any sense to me that the same page would get scraped hundreds of times a day, but I'm a white hat, so what do I know :-)
It's got me thinking. I had been considering becoming a malnourished data center detective, but now I'm intrigued with the thought of being a log-obsessed, crypto-bot-fearing recluse.
Thanks!
Now blocked.
What did you block, a series of IP's? We're getting a thousand different IP's a day with this junk, so that route won't work for us, and there's plenty of legit traffic with the 1.1.4322 bit. When I first saw this, maybe half a year ago, the false hits were coming from just two IP's in a known bad neighborhood, but now they're random.