http://search.live.com/result.aspx?q=KEYWORD&mrt=en-us&FORM=LVSP
When I load the referring page I am told that there are no results. Also, there is no relationship between the keyword and the page requested. The keywords are single words and seem to be mainly concerned with the usual spam areas.
I have scoured Live to try to find form 'LSVP', and searched everywhere that I can think of.
Can anyone enlighten me as to what the heck form LSVP is? Have the spammers found another flaw? I am based in the UK.
Thanks in advance.
[edited by: engine at 10:30 am (utc) on Aug. 18, 2007]
[edit reason] delinked [/edit]
example:
65.55.209.48 - - [28/Aug/2007:22:55:11 +0200] "GET /aut.php?id=3244&bib=1 HTTP/1.0" 200 6152
[deleted lines]
65.55.165.11 - - [28/Aug/2007:22:56:14 +0200] "GET /aut.php?id=3244&bib=1 HTTP/1.0" 200 6152 "http://search.live.com/results.aspx?q=tetsuya&mrt=en-us&FORM=LIVSOP" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
65.55.165.11 - - [28/Aug/2007:22:56:15 +0200] "GET /fct.js HTTP/1.0" 200 12745
65.55.165.11 - - [28/Aug/2007:22:56:16 +0200] "GET /skins/bdN.css HTTP/1.0" 200 3113 "" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
The hits, which claim to be from Microsoft, list the user agent as: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
These hits always come within 60 seconds of a visit from msnbot for the exact same page.
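The pattern described above (a live.com-referred hit arriving within 60 seconds of an msnbot visit to the same page) can be checked against an access log. What follows is only a minimal sketch, assuming Apache combined log format; the function names and the sample lines (documentation IP addresses, a made-up path) are hypothetical and not taken from anyone's real logs.

```python
import re
from datetime import datetime, timedelta

# Combined log format. The referrer and user-agent fields are optional
# because some lines in the excerpt above do not show them.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?$'
)

def parse_line(line):
    """Return a dict for one log line, or None if it does not parse."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    hit = m.groupdict()
    hit["time"] = datetime.strptime(hit["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return hit

def suspicious_followups(lines, window=timedelta(seconds=60)):
    """Flag live.com-referred hits that follow an msnbot visit to the
    same path within `window`. Lines are assumed to be in time order."""
    last_bot_visit = {}  # path -> time of the most recent msnbot request
    flagged = []
    for line in lines:
        hit = parse_line(line)
        if hit is None:
            continue
        agent = (hit["agent"] or "").lower()
        referrer = hit["referrer"] or ""
        if "msnbot" in agent:
            last_bot_visit[hit["path"]] = hit["time"]
        elif referrer.startswith("http://search.live.com/"):
            seen = last_bot_visit.get(hit["path"])
            if seen is not None and timedelta(0) <= hit["time"] - seen <= window:
                flagged.append(hit)
    return flagged

# Made-up sample lines (hypothetical IPs, path and timestamps).
MSIE = ('"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; '
        '.NET CLR 1.1.4322)"')
LIVE = '"http://search.live.com/results.aspx?q=insurance&mrt=en-us&FORM=LIVSOP"'
SAMPLE = [
    '192.0.2.10 - - [28/Aug/2007:10:00:00 +0200] "GET /page.html HTTP/1.0" '
    '200 6152 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.20 - - [28/Aug/2007:10:00:42 +0200] "GET /page.html HTTP/1.0" '
    '200 6152 ' + LIVE + ' ' + MSIE,
    '192.0.2.30 - - [28/Aug/2007:10:05:00 +0200] "GET /page.html HTTP/1.0" '
    '200 6152 ' + LIVE + ' ' + MSIE,
]

if __name__ == "__main__":
    for hit in suspicious_followups(SAMPLE):
        print(hit["ip"], hit["path"], hit["referrer"])
```

The 60-second window is a parameter, so it can be widened if the gap in your own logs turns out to be slightly longer.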
I don't get much traffic from live.com, which is one of the reasons that these hits stand out so much.
The other reason that they stand out is that the query string in the referring pages from live.com shows really interesting search terms. For example, I got one earlier today for "insurance." I wish I was getting real traffic for a term like that!
I'm reluctant to block the traffic, but I suppose if the volume of these hits increases, I might not have a choice.
What actions are you guys taking?
The keyword showing in my logs is "travel" - that's the sector the affected site is in.
Possibly M$ is beta testing some directory or context advertising system, simulating or possibly running with thousands of PCs and watching people's surfing habits? Pure conjecture, of course.
Maybe they learnt something from Halo.... Wired Report [wired.com] on Halo game testing
I'm reluctant to block the traffic, but I suppose if the volume of these hits increases, I might not have a choice. What actions are you guys taking?
Well I'm working on getting some feedback from Microsoft before it gets out of hand. Hopefully posted here.
Until they respond they are blocked... screwing up my stats..
Fair enough! Anyone else doing the same?
There are times when silence is a virtue from a big company. This is not one of them.
These are all from the 65.55.165.xx range that use the adult/spam faked referrers.
I also see one coming from bl1sch2044210.phx.gbl at 65.55.235.217 with UA of msnbot-media/1.0 (+http://search.msn.com/msnbot.htm). It first showed up at 10:03:23 AM on Sunday, August 26, 2007 and has since come back for a total of 129 pages. I have not blocked this as it has hit fewer than 130 pages.
There also seems to be a msnbot-media/1.0 bot running with livebot-65-55-213-6.search.live.com (65.55.213.6) and one at livebot-65-55-235-202.search.live.com (65.55.235.202). These do not use the adult/spam/faked referrers.
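For anyone sorting their own logs along the same lines, here is a minimal sketch of the split described above. It assumes "65.55.165.xx" means the whole 65.55.165.0/24 block; the function names are hypothetical.

```python
import ipaddress

# Assumption: "65.55.165.xx" in the post above covers the full /24.
FAKE_REFERRER_RANGE = ipaddress.ip_network("65.55.165.0/24")

def in_fake_referrer_range(ip):
    """True if the address falls in the range reported to send faked referrers."""
    return ipaddress.ip_address(ip) in FAKE_REFERRER_RANGE

def looks_like_livebot_host(hostname):
    """True if a reverse-DNS name matches the livebot hosts listed above.
    A reverse name alone is not proof of origin; it would still need a
    forward lookup that resolves back to the same IP."""
    name = hostname.lower().rstrip(".")
    return name.startswith("livebot-") and name.endswith(".search.live.com")

if __name__ == "__main__":
    for ip in ("65.55.165.11", "65.55.235.217", "65.55.213.6"):
        print(ip, in_fake_referrer_range(ip))
```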
[edited by: The_Contractor at 10:54 am (utc) on Aug. 30, 2007]
Someone's running a scraper bot using an IP faker and false referrer. The content is getting jumbled and republished somewhere in a way that may or may not be getting indexed. My guess is that the spammer is cloaking this content to Googlebot and/or just putting it up there for the traffic value or any vague hope of link juice. (Find a unique word in your content, then search for it and go to the end of the results where the spam is).
Using Microsoft IPs and referrers is convenient, as blocking them might also block real traffic.
Question - why use Microsoft instead of Google? Have Google found a fix or does the spammer think Microsoft would result in less suspicion? OR... have they found a hole in the MS technology (not necessarily search) that lets them spoof or use Microsoft's IPs? JdMorgan points out [webmasterworld.com] that they are from tide.microsoft.com so maybe there's a technology there that's being exploited.
I still think Microsoft should confirm if this isn't them. There was a guy from MS at AdChamps in the UK who gave the most knowledgeable presentation on click fraud I have ever seen - from anywhere in the industry. So they have the expertise. Just not the publicity machine on the organic side to be able to tell us.
[edited by: Receptional at 11:47 am (utc) on Aug. 30, 2007]
Someone's running a scraper bot using an IP faker and false referrer.
Everyone always thinks people fake IPs, but that has no value for data retrieval, so unless you're mounting an attack, faking does nothing.
Just a little light reading on spoofing [securityfocus.com]...
While some of the attacks described above are a bit outdated, such as session hijacking for host-based authentication services, IP spoofing is still prevalent in network scanning and probes, as well as denial of service floods. However, the technique does not allow for anonymous Internet access, which is a common misconception for those unfamiliar with the practice. Any sort of spoofing beyond simple floods is relatively advanced and used in very specific instances such as evasion and connection hijacking.
Basically, if someone was spoofing MS we'd all be sending the data BACK to MS and not the spoofer, get it? So if it's not MS IPs doing this then it's actually someone engaging all of our servers to mount an attack against MS, and even 403 errors send packets.
I'm still in the camp that thinks it's a) an MS project of some sort or b) a proxy service being abused.
Blocking it will probably have no repercussions unless it's a cloaking checker.
I'm running reverse cloaking so if any of the content collected from those IPs is actually used I'll know about it and let you know if it ever appears.
Basically, if someone was spoofing MS we'd all be sending the data BACK to MS and not the spoofer, get it? So if it's not MS IPs doing this then it's actually someone engaging all of our servers to mount an attack against MS, and even 403 errors send packets. I'm still in the camp that thinks it's a) an MS project of some sort or b) a proxy service being abused.
Yep - I get that now. Thanks for clarifying, Bill. Receptional_andy also pointed out the error of my thought process :).
Semi-advanced robot. It initially looks like a human, but if you understand the patterns in log files and what they imply you'll know that this is indeed a bot. I will not, however, go into any further detail on that aspect.
I'm not sure I can agree with the cloaking theory because after all you wouldn't want to make people aware that you're looking to figure out if they are cloaking?
The site scraper theory seems (without deep insight into my own logs) to make the most sense initially. Spammers aren't apologetic in the least about screwing up our statistical analysis.
Here is an important question: does Microsoft's Live spider support the application/xhtml+xml media type? I know Google does not. This bot is requesting pages with the following query on my site...
file.php?mime=axml
I think my site's media type switcher isn't functioning correctly (oh well it's well over a year old and soon to be replaced anyway) though I'm sure this has some implications?
Will blocking with the earlier mentioned Apache script block legitimate traffic and legitimate Microsoft Live spider crawling?
- John
I'm not sure what you base this statement on. Both Googlebot and Googlebot-Mobile regularly fetch and index my mobile-device pages, and all are of MIME type application/xhtml+xml. These mobile pages are also indexed in MSN, so I conclude that msnbot can handle that MIME type as well.
Also, that query string is meaningless to the server. It is just a query string, and unless your file.php makes use of it, it is ignored; it does not 'select' an application/xhtml+xml response unless your script interprets it as such.
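To illustrate the point about query strings, here is a minimal sketch of two hypothetical handlers: one that never looks at the query, and one that treats `mime=axml` as a request for application/xhtml+xml. Neither is the real file.php, which is not shown in this thread; the function names and the exact matching rule are assumptions.

```python
from urllib.parse import urlsplit, parse_qs

def handler_ignoring_query(url):
    """A script that never reads the query string: the response type
    is the same whatever follows the '?'."""
    return "text/html"

def handler_interpreting_query(url):
    """A script that chooses the response type from a 'mime' parameter.
    Only code like this gives 'mime=axml' any meaning."""
    query = parse_qs(urlsplit(url).query)
    if query.get("mime") == ["axml"]:
        return "application/xhtml+xml"
    return "text/html"

if __name__ == "__main__":
    url = "/file.php?mime=axml"
    print(handler_ignoring_query(url))
    print(handler_interpreting_query(url))
```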
Maybe I'm not seeing the same "Strange Referrer Activity" as the rest of the respondents to this thread, but I've managed to block all of these requests by denying access to Microsoft's "Tide" proxy servers, as I noted above.
Jim
First, we appreciate the concerns and issues that have been raised and apologize for any inconvenience this might have caused.
Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your concerns, we would request that you do not actively block the IP addresses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.
Please keep the feedback and thoughts coming as we will use this to help improve this process and make sure that it impacts your sites as little as possible.
thanks
- msndude (msd)
The traffic you are seeing is part of a quality check we run on selected pages
I understand your need for quality checking, but trying to bypass site security just to check for cloaking is a bit much. Besides, the fact that it came from Microsoft IPs and was easily detectable (we all caught it) means it can also be easily cloaked, so if you think you're really doing quality control you're just fooling yourself.
FWIW, my bot blocker quarantined that IP range as a rogue bot a long time ago because your server kept asking for pages and couldn't answer the captcha.
The traffic you are seeing is part of a quality check we run on selected pages
Sorry, but when you run through a proxy and use fake adult, spam, and s@x related referring query strings, you are blocked. Maybe you should be running a quality check on your engineers...
I'll risk a little traffic loss over the principle that a "real" company shouldn't use faked adult, spam, and s@x related referrers.
My interpretation is that it is running through all the top spam search terms and clearing a site of having those terms, so the site is then viewed as a 'quality' website.
However, by really screwing over webmasters' logs and getting blocked in many cases (which may prevent webmasters' sites being shown in the Live SERPs), MSN are actually making their SERPs worse by being denied access to legitimate sites.
Now, a quality check on a page shouldn't be interfering with webmasters' logs. An idea would be to cache the page on MSN servers and run quality checks on the cached copies and NOT on the webmaster's site.
Simple process - download the sitemap file, retrieve updated/modified pages, compare pages with existing cached copy, evaluate page changes with quality check on cached pages and reassign scoring rank based on the cached page.
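The process suggested above can be sketched roughly as follows. This is only an illustration under assumptions: the cache is a plain dict of content digests, the "quality check" is a naive count of listed spam terms, and every name here is hypothetical. Nothing in this thread says how Live Search actually works internally.

```python
import hashlib

def digest(html):
    """Fingerprint of a page's content, used to detect changes."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def changed_pages(cache, fetched):
    """Compare freshly crawled pages with the cached digests.

    cache:   dict of url -> digest from the previous crawl (updated in place)
    fetched: dict of url -> html from the current crawl
    Returns the URLs whose content is new or modified.
    """
    changed = []
    for url, html in fetched.items():
        new = digest(html)
        if cache.get(url) != new:
            cache[url] = new
            changed.append(url)
    return changed

def spam_term_hits(html, spam_terms):
    """Naive stand-in for a quality check, run on the cached copy only:
    counts how many of the listed terms occur in the page text."""
    text = html.lower()
    return sum(1 for term in spam_terms if term.lower() in text)

if __name__ == "__main__":
    cache = {}
    crawl = {"/a.html": "<p>Travel guides</p>", "/b.html": "<p>Contact us</p>"}
    for url in changed_pages(cache, crawl):
        # Scoring uses the copy already held; no extra hit on the site.
        print(url, spam_term_hits(crawl[url], ["insurance", "travel"]))
```

The point of the design is that only the ordinary crawl touches the webmaster's server; all re-scoring happens against the stored copy, so no fake-referrer hits show up in anyone's stats.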
For several weeks I have been annoyed by this "quality check", as it passes itself off as real behaviour, with full-fledged browser capabilities and a standard user agent. All of a sudden my customers are all excited over getting all this search engine traffic from Live Search, which in fact they are NOT getting.
I am very, very close to just blocking out all Microsoft traffic in the 65.55.165.* segment now! So Live guys, please, PLEASE state in the user agent that this is robotic traffic, e.g. "Live Search Quality Check Robot" or whatever, and by that give us a chance to deliver correct data to our customers.
Regards
Jesper
Although I did not do any research as to your genuine identity as a Microsoft employee, I generally question any poster's authenticity when a well-known company's new policy is posted on a discussion forum and is nowhere to be seen on their site or in official news anywhere.
This is an important issue affecting millions of companies' websites, including well-known and highly trafficked sites; surely MS should have posted something about it.
Spoofing search queries, referrers, domains and IPs in any manner will trigger security software such as mod_security, as well as other security systems and webmasters, to block IPs manually or automatically, which will in the end prevent MS and its bots from requesting and indexing the sites and pages of those millions of webmasters. Should that happen, and it looks to me like it's happening already, sooner or later 90% of the web will be inaccessible to any MS bot, and the Live database will contain only a few million lower-quality and unimportant sites and links.
I can't believe for one minute Microsoft will want that, and in my opinion, some smart hackjack is doing his/her bit to ruin MS. A competitor, or an insider employee acting as a Mole using a competitor's infrastructure and technology within one of the Microsoft buildings...
msndude said:
Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your concerns, we would request that you do not actively block the IP addresses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.
You're sending queries to Google AdSense, downloading and processing Javascript blocks using people's AdSense publisher ID, greatly inflating impressions, causing a much lower CTR, which for all we know is decreasing the per-click earnings on those accounts.
On top of that, now you are saying if we don't let you continue, we might not get included in MSN Live search?
How the hell is that Quality Control?
-Michael
The fact that IE is installed on many new machines provides MS with a huge opportunity they cannot capitalize on. The average Joe knows it - Live stinks as a search engine.