Hunting Google Safebrowsing Diagnostic Spidering

Anyone seen anything?

incrediBILL

1:35 am on May 29, 2009 (gmt 0)

Interesting thread in the Google forum about safebrowsing diagnostic spidering possibly going on at Google that may not be the standard Googlebot.

[webmasterworld.com...]

I literally have too much traffic, esp. from Google, to sort this out easily if it's not just the Googlebot doing its thing with post analysis.

What I would look for is IPs coming back and requesting just javascript as standalone files, perhaps looking for malware downloaders. I don't store enough historical data to do this myself, otherwise I'd have a server farm just holding log files.
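For anyone who does keep the logs, here is a minimal sketch of that check. It assumes Apache-style combined log lines; the function name `js_only_ips` and the regex are illustrative, not anything from Google or from this thread. It flags IPs whose logged requests were all for .js files:

```python
import re
from collections import defaultdict

# Assumed format: Apache/NCSA combined log. Captures client IP, request path, status.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)[^"]*" (\d{3})')

def js_only_ips(lines, min_hits=1):
    """Return the sorted IPs whose every logged GET/HEAD request was for a .js file."""
    paths_by_ip = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like the assumed format
        ip, path, _status = m.groups()
        # Drop any query string before checking the extension.
        paths_by_ip[ip].append(path.split("?", 1)[0].lower())
    return sorted(
        ip for ip, paths in paths_by_ip.items()
        if len(paths) >= min_hits and all(p.endswith(".js") for p in paths)
    )
```

Feed it an open log file (or any iterable of lines). Raising `min_hits` cuts down noise from one-off requests, at the cost of missing a fetcher that only visits once.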

Something tells me the only way to find this beast might be to set a honeypot trap for it.
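One way such a trap could look, purely as a sketch: publish a script at a URL that no human visitor would ever request and that robots.txt disallows, then watch the logs for anything that fetches it. The trap path and the `trap_hits` helper below are hypothetical, and the log format is again assumed to be Apache combined:

```python
import re

# Hypothetical trap URL: referenced only from a spot humans never see,
# and disallowed in robots.txt so well-behaved crawlers skip it.
TRAP_PATH = "/js/trap-7f3a.js"

# Assumed format: Apache/NCSA combined log. Captures IP, timestamp, path, user agent.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]*)\] "\S+ (\S+)[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def trap_hits(lines, trap_path=TRAP_PATH):
    """Return (ip, timestamp, user_agent) for every request that touched the trap."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not in the assumed format
        ip, when, path, agent = m.groups()
        if path.split("?", 1)[0] == trap_path:
            hits.append((ip, when, agent))
    return hits
```

Whatever turns up still needs checking by hand; a hit only says that something fetched the file, not who sent it.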

Pfui

8:46 am on May 29, 2009 (gmt 0)

Hmm. Methinks the 'bot' might be anybody using the Google Toolbar [toolbar.google.com]. From Google's "Safe Browsing for Firefox" FAQ [google.com], emphasis mine:

-----
11. What is the Enhanced Protection Feature?

If you enable Enhanced Protection, the Google Safe Browsing Extension will provide more up-to date protection by sending encrypted URLs of sites that you visit and limited information about the site content to Google for evaluation. For information about how we protect your privacy in this and other usage data, please see the Google Privacy Policy [google.com].

12. What information is sent to Google when I enable the Enhanced Protection Feature?

When enabled, the entire URL of the site that you're visiting will be securely transmitted to Google for evaluation. In addition, a very condensed version of the page's content may be sent to compare similarities between authentic and forged pages. ...
-----

FWIW: Here's an analysis [oreillynet.com] of the data exchanged circa late 2005.

Thoughts?

[edited by: Pfui at 8:50 am (utc) on May 29, 2009]

dstiles

10:06 pm on May 29, 2009 (gmt 0)

Pfui - Things have moved on quite a bit in the past YEAR, never mind 4 years.

IF google is still using that method then it's going to expose trusting punters to a) potential fraud; b) google's advert team (or whatever).

It seems from your paras 11 & 12 that urls are now being sent encrypted. Ambiguously it also mentions "limited... content" - are we to understand that this is also encrypted? Para 12 SUGGESTS it's not but it isn't clear.

Language is so seldom used accurately that something like this can IMPLY it's encrypted when it's not - or not. And that originally SSL page MAY include secure information - "Your password for this session is...". Some sites even embed sensitive info in plain text within the page as hidden form fields. After all, it's sent to the punter encrypted, right?

I agree that most people will auto-accept almost anything presented to them on-screen, from "This page is safe" to "I am about to kill your computer." I recommend NoScript to my customers. I'm reasonably certain that almost none of them, IF they install it, a) actually sets it up properly; b) understands what it's all about; c) even reads the NoScript web page downloaded with every update.

Apparently-valid SSL certs are now being used by malware sites to fool the punters.

On the plus side, google is working from info already downloaded from the web site so it isn't impacting web site bandwidth as AVG did / does.

That's assuming the Firefox SB scenario. If google is checking web sites directly by visiting them then that would impact bandwidth AND may need SSL access to get relevant pages. If this only happens occasionally (eg first page request of the day) then it shouldn't be too intrusive.

On the other hand, what happens if the google access gets rejected (IP doesn't match bot activity, badly formed UA or header)? Would google report a bad site, as older versions of AVG (STILL) do? And if so, is there a means of suing google for implied blackening of character and loss of traffic?
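On the "IP doesn't match" case: a common way to decide whether an IP really belongs to Google before rejecting it is a forward-confirmed reverse DNS lookup, which is the check Google documents for Googlebot. Whether a safebrowsing fetcher would pass it is an open question here, so treat this as a sketch. The `is_google_host` name and the injectable resolver parameters are illustrative; the defaults use the standard `socket` calls and cover IPv4 only:

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_host(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: IP -> hostname -> back to the same IP."""
    try:
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False  # hostname does not resolve
    # The hostname must resolve back to the IP that made the request.
    return ip in addresses
```

The resolver arguments exist only so the function can be tried without live DNS. In real use, cache the result per IP rather than resolving on every request.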

And, Bill, why only javascript? There are several other vectors employed on exploit web sites, from PDF and media files to Java, ActiveX and iFrames.

As I noted in the google thread mentioned above, I tried three of my domains and got nothing odd in my logs.

As noted elsewhere in that thread, google are supposed to be using a third party site for malware detection (also used by other companies). Which, if something goes worng, can certainly be used to spread the blame around. :)

Pfui

2:37 am on May 30, 2009 (gmt 0)

dstiles, I linked to the O'Reilly author's analysis because although it's older and yep, things change, here's something that hasn't, yet: FAQ #6 [google.com], a.k.a. "How does Google know a page is bogus?" is exactly the same today as it was in Dhanjani's 12-05 assessment.

I think I'll leave to ye uber-intrigued the details of whether the [google.com...] URL is still invoked, and/or what info is transmitted, and how and/or how securely; and to plaintiffs' lawyers the Google lingo:)

FWIW:
Google Safe Browsing API [code.google.com]
Google Safe Browsing API Group [groups.google.com]
(Aside: With all of their innovations, Google has yet to clean up its own house, spam post-wise. Alas.)

FYI:
From the API Developer's Guide [code.google.com], the "Reporting incorrect data" section:
-----
Report malware URLs that are not currently on our malware list
[google.com...]

Report URLs that are currently on our malware list in error:
[stopbadware.org...]
-----

dstiles

10:30 pm on May 30, 2009 (gmt 0)

Based on the number of exploit domains reportedly listed, according to security blogs, in google et al, I don't have tremendous faith in their detection - or at least, their own use of the detection. It's not that scammers can easily detect tests and bypass them, but the worst IPs are fairly well reported so could easily be blocked from listings, not just warned about.

Your safebrowsing/lookup link reports, "Your client has issued a malformed or illegal request." Is that you or google? :)

The first API link seems to assume it's MY site that has the problem and should carry the api. Now I expect it's VERY difficult for a scammer to put that api on his exploit site...

I'm not saying the enterprise is not useful, just that it could be better.

Pfui

2:56 am on May 31, 2009 (gmt 0)

The safebrowsing/lookup link was in Dhanjani's analysis [oreillynet.com], something he intercepted while testing.

For current data, looks like we need an HTTP Headers aficionado/maven with SafeBrowsing and "Live HTTP Headers [livehttpheaders.mozdev.org]" or its ilk aboard.