HuaweiSymantecSpider

Privacy, ethics

         

Frank_Rizzo

11:10 am on Nov 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



69.28.58.3 - - [02/Nov/2010:18:00:11 +0000] "GET /robots.txt HTTP/1.0" 200 3223 "/robots.txt" "HuaweiSymantecSpider/1.0+DSE-support@huaweisymantec.com+(compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR ; http://www.huaweisymantec.com/en/IRL/spider)" 0

69.28.58.6 - - [02/Nov/2010:18:02:23 +0000] "GET /robots.txt HTTP/1.0" 200 3223 "/robots.txt" "Huaweisymantecspider (compatible; MSIE 8.0; DSE-support@huaweisymantec.com)" 0

62.24.181.134 - - [26/Nov/2010:19:40:42 +0000] "GET /mysite_members.php HTTP/1.0" 200 19673 "/mysite_members.php" "HuaweiSymantecSpider/1.0+DSE-support@huaweisymantec.com+(compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR ; http://www.huaweisymantec.com/en/IRL/spider)"

other IPs noted from this bot:

69.28.58.5
69.28.58.43
62.24.181.135
62.24.252.133
62.24.252.132
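
For anyone who wants to pull these hits out of their own logs, here is a rough Python sketch. It assumes a combined-style log layout like the lines quoted above (with an optional trailing field), and the names `LOG_RE`, `parse_line` and `is_huawei_symantec` are my own, not anything standard. The UA match is case-insensitive because the bot has been seen as both "HuaweiSymantecSpider" and "Huaweisymantecspider".

```python
import re

# Assumed layout: ip - - [time] "METHOD path proto" status size "referer" "ua" ...
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def parse_line(line):
    """Return a dict of the log fields, or None if the line does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def is_huawei_symantec(user_agent):
    """Case-insensitive substring check for the spider's name in the UA."""
    return "huaweisymantecspider" in user_agent.lower()


if __name__ == "__main__":
    sample = (
        '69.28.58.6 - - [02/Nov/2010:18:02:23 +0000] '
        '"GET /robots.txt HTTP/1.0" 200 3223 "/robots.txt" '
        '"Huaweisymantecspider (compatible; MSIE 8.0; '
        'DSE-support@huaweisymantec.com)" 0'
    )
    fields = parse_line(sample)
    if fields and is_huawei_symantec(fields["ua"]):
        print(fields["ip"], fields["path"], fields["referer"])
```

If your server logs in a different format the regex will need adjusting; lines that do not match simply come back as None.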

---

This is the TalkTalk ISP privacy-invading bot. Its front is to 'protect' TalkTalk customers from visiting sites with malware, but its true modus operandi is to build up profiles of users for purposes such as targeted advertising, Government IMP / data retention, and anti-net-neutrality measures.

Either way this is a bad bot. It is scraping your pages and processing the info in China - out of the jurisdiction of the UK ICO and other such authorities.

Info on the Huawei bot:
[nodpi.org...]

Why TalkTalk customers should get their MAC and leave:
[nodpi.org...]

[edited by: incrediBILL at 5:34 am (utc) on Nov 28, 2010]
[edit reason] fixed UA links [/edit]

dstiles

8:42 pm on Nov 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




System: The following message was spliced on to this thread from: http://www.webmasterworld.com/search_engine_spiders/4236027.htm [webmasterworld.com] by incredibill - 7:46 am on Nov 29, 2010 (PST -8)


This posting is in part the result of a report at nodpi.org and partly a follow-on from a discussion in the WebmasterWorld UK & Ireland Search Engines forum at [webmasterworld.com...]

TalkTalk, who have many aliases (significant one here is Opal), employ the spider to check whether a web page has a virus or not. This would be OK if it were checking on the client machine but it's instead intercepting the communication using Deep Packet Inspection, which is illegal in many parts of the world including Europe (UK is currently being sued by EU Justice over the BT Phorm trials and other privacy issues).

In TalkTalk's Q&A:

"7. Will only customers who sign up to Network Security have the websites they visit scanned?"

"We are scanning all the websites our customer base as a whole visits, in complete anonymity. You have to opt-into the Virus Alerts product itself, so if you don't want the warnings while you browse you don't have to enable the service, or if you activate Virus Alerts, you can switch it off again at any time afterwards."

What this means in practice is that TalkTalk visits EVERY web page any of their users visit regardless of whether the facility is opted-into or not (but see below). I do not know if robots.txt is obeyed (see below).

I have just conducted an experiment with my brother, who has a TalkTalk account. SOME pages he visited were also visited by the bot about 30 seconds later (what good is that to someone visiting a new site?!) but not ALL pages. JavaScript/CSS/images were not requested by the bot, though this may be due to it receiving a 403.

The good news: it checked robots.txt first. What I don't know is if it obeyed it, since I do not have specific disallows (or allows). The specific observed action was:

robots.txt
home page (first one brother visited)
robots.txt
2nd visited page
4th visited page
robots.txt
8th visited page
10th page visited
robots.txt
11th page visited

As far as I'm aware there is no restriction on visiting webmail sites etc although, since the bot is purported to be based on phorm, it may or may not be able to read SSL (although it may know at least the original URL).

The bot was trialled in June this year - again illegally and without their customers being aware of it. It was at that time that I discovered the bot for myself, without knowing its function, and blocked it (see below). It appears the system went live on 26th Nov.

I have seen reports that the bot was engineered in China from the original Phorm DPI bot and that pages (or at least results) are forwarded to China. TalkTalk claim that no personal information is retained. See forum on nodpi.org for further details.

There is no immediate indication that adverts or other "personalised" information is to be targeted at Opal customers, but with DPI it's always possible.

===========
(Adapted from my posting on the previously identified WebmasterWorld forum)

At the start of this month (from 2nd Nov but may have begun any time before Nov) all hits were from ChinaCache USA but from 26th Nov they all seem to be from Opal on an IP range I partially blocked back in June when I had around 300 hits in 15 days on two IPs and another couple of thousand on a few others. Knowing what I do now at least some of that was obviously the experimental period of the bot around June/July.

ChinaCache North America: 69.28.48.0 - 69.28.63.255
Owned by Beijing Blue I. T. Technologies Co Ltd
Admin in US by Citynet.

Opal Telecommunications full range: 62.24.128.0 - 62.24.255.255
(bots appearing with about half-dozen IPs in groups of 2 to 5)
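
A quick way to test a logged IP against the two ranges above, sketched with Python's standard `ipaddress` module. The range boundaries are the ones listed in this post; the names `RANGES`, `NETWORKS` and `owner_of` are made up for the example. `summarize_address_range` turns each first/last pair into CIDR blocks, which is also handy if your firewall wants CIDR notation.

```python
import ipaddress

# First/last addresses as listed in the post above.
RANGES = {
    "ChinaCache North America": ("69.28.48.0", "69.28.63.255"),
    "Opal Telecommunications": ("62.24.128.0", "62.24.255.255"),
}


def to_networks(first, last):
    """Convert a first/last address pair into a list of CIDR networks."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )


NETWORKS = {name: to_networks(*bounds) for name, bounds in RANGES.items()}


def owner_of(ip):
    """Return the name of the listed range containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, networks in NETWORKS.items():
        if any(addr in network for network in networks):
            return name
    return None


if __name__ == "__main__":
    for name, networks in NETWORKS.items():
        print(name, [str(n) for n in networks])
    print(owner_of("62.24.181.134"))
```

Blocking a whole ISP range will of course also block real TalkTalk/Opal visitors if it is applied to ordinary traffic, so treat this as a lookup aid rather than a recommendation.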

Within that range I've also had multiple bot-like hits from web marketing company Global Media Applications Ltd (G-MAPPS-UK) on 62.24.226.0/23, as well as many other "bots" going back before April across several sub-ranges. The range is now completely blocked.

I am now seeing hits from ChinaCache with the UA:

page_test_larbin2.6.3@unspecified.mail

Obviously these are being rejected with 403s.

Does anyone else have information on this Opal bot, such as: is it worth putting a block in robots.txt (and if so WHAT?); and has anyone with a TalkTalk/etc account been blocked a) because of a web page virus or b) because a 403 was issued by the site?
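
On the "and if so WHAT?" part, one candidate is a Disallow keyed on the name at the front of the UA string. The sketch below only shows how a standards-following parser (Python's `urllib.robotparser`) would read such a rule. The token "HuaweiSymantecSpider" is an assumption taken from the logged UA, not a documented token, and nothing here tells you whether the bot actually obeys it. The `allowed` helper is just a name for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt entry; the token is guessed from the logged UA.
ROBOTS_TXT = """\
User-agent: HuaweiSymantecSpider
Disallow: /
"""


def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Ask the standard-library parser whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    bot_ua = "HuaweiSymantecSpider/1.0+DSE-support@huaweisymantec.com"
    print(allowed(bot_ua, "/mysite_members.php"))
    print(allowed("Mozilla/5.0", "/mysite_members.php"))
```

The only real test is to add the rule and watch whether the page requests stop while the robots.txt fetches continue.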

dstiles

5:40 pm on Nov 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



ChinaCache is now sending wget UAs:

Wget/1.9+cvs-stable (Red Hat modified)

Pfui

1:33 am on Nov 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Too weird. They've obviously gone wide in a hurry. And with a log-spamming UA, fake referers, etc. New today:

62.24.252.133
HuaweiSymantecSpider/1.0+DSE-support@huaweisymantec.com+(compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR ; http://www.huaweisymantec.com/en/IRL/spider)

11/29 17:08:49/robots.txt
11/29 17:15:31/robots.txt

robots.txt? Yes
Fake refs? Yes: "robots.txt"
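
The fake referers seen so far just repeat the path being requested (robots.txt "referred" by robots.txt), so a simple check is possible. A hedged Python sketch, where `is_self_referer` is a made-up helper name and the rule is only a heuristic based on the hits reported in this thread:

```python
def is_self_referer(path, referer):
    """True when the Referer header merely repeats the requested path."""
    # An empty or "-" referer means none was sent, which is not a fake.
    if not referer or referer == "-":
        return False
    # Compare without leading slashes: both "/robots.txt" and
    # "robots.txt" have been reported as the referer value.
    return referer.lstrip("/") == path.lstrip("/")


if __name__ == "__main__":
    print(is_self_referer("/robots.txt", "/robots.txt"))
    print(is_self_referer("/page.php", "http://example.com/"))
```

A real browser sends a full URL as the referer, so a bare path that equals the request itself is a fairly strong sign of a scripted client.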

caribguy

8:41 am on Nov 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the posts, both the original and the 1st hand analysis! I remember seeing some of these: Chinacache sounds familiar. Will dig through my logs, ipfilter tomorrow and correlate.

Pfui

6:36 pm on Dec 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FWIW, this UA just in:

62.24.181.134
(compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR ; http://www.talktalk.co.uk/products/virus-alerts/)

robots.txt? Yes BUT...
Fake ref? Yes: robots.txt

Note also the space before & after the last semi-colon:
.NET CLR ; http

dstiles

9:04 pm on Dec 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, I've begun getting those. Someone at TalkTalk/Opal screwed up the UA.

It seems to be in response to real visitor accesses, and again it arrives at least several seconds AFTER the page has been returned to the actual visitor. So either the page takes a very long time to appear in the web browser or, much more likely, the bot access is made with no intention of telling the visitor whether or not the site has a virus, much like the huaweisymantec bot. That makes both types of access look like data gathering for later advertising use rather than virus checking. Considering China is involved and there is a Phorm linkup, that should be even more worrying to TalkTalk/Opal customers.

The UA IS hitting robots.txt before each "virus-alert" hit, the same as the huaweisymantec bot. What happens later on, if/when TalkTalk/Opal reports the fact of bot rejection (or 403) to their customer - who knows?

What I previously meant to say in this thread was: I'm very concerned that symantec is apparently associated with all of this. It implies that everything is above board and that the sites are being virus-checked BEFORE the visitor gets the page, not some time after. Not only illegal use of Deep Packet Inspection but fraudulent advertising re: virus-checking, I would have thought.