C4pc

Pfui

4:00 pm on Jul 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



[Just in case -- this thread is titled "C4PC" but the post preview shows "C4pc".]

search2.cloud4search.com
C4PC UserAgent/0.7

robots.txt? Yes BUT...

First request to robots.txt used "-" as the UA and was blocked (no UA; hmm). Second hit to / used the UA above and was also blocked (no "Mozilla" in the UA).

Search results are full of hits from multi-numbered subdomains, but WHOIS shows no site, no profile; just a registration in France via OVH SAS. The owner, CLOUD4PC, matches the UA name (C4PC), and cloud4pc.com -- also no site, no profile, but "coming soon" -- is also OVH SAS in France, long a source of troublesome UAs and hostnames.

FWIW, I've been redirecting "cloud" because of junk from multiple .static.cloud-ips.com accounts, etc. Not a single real user yet.
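For anyone wanting to reproduce the checks described in this post, here is a minimal Python sketch. The function name `screen_request`, the rule order, and the returned labels are assumptions for illustration; the post does not say how the blocks are actually implemented, and "no Moz" is read here as "no Mozilla token in the UA".

```python
def screen_request(user_agent, hostname=""):
    """Return a label for why a request would be refused, or None to let it through.

    Hypothetical helper: the rules mirror the three checks mentioned in the
    post (blank UA, no "Mozilla" token, "cloud" in the hostname).
    """
    ua = (user_agent or "").strip()
    # Logs usually show a missing User-Agent header as "-".
    if ua in ("", "-"):
        return "blocked: no UA"
    # Assumed reading of "no Moz": the UA lacks a Mozilla token.
    if "mozilla" not in ua.lower():
        return "blocked: no Mozilla token"
    # The post redirects hosts containing "cloud" rather than blocking them.
    if "cloud" in hostname.lower():
        return "redirected: cloud hostname"
    return None
```

With these rules, both hits reported above would be refused, and the later MSIE 6.0 UA from a cloud4search.com host would still be caught by the hostname check.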

dstiles

10:46 pm on Jul 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For the record, this is on OVH IP range 91.121.0.0/16
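A quick way to test whether a logged address falls inside that range, written out in full as 91.121.0.0/16, is Python's standard `ipaddress` module. The helper name `in_ovh_range` and the sample addresses in the comments are made up for illustration.

```python
import ipaddress

# The /16 mentioned in the post, in full CIDR notation.
OVH_RANGE = ipaddress.ip_network("91.121.0.0/16")

def in_ovh_range(ip):
    """True if the dotted-quad string `ip` is inside 91.121.0.0/16."""
    return ipaddress.ip_address(ip) in OVH_RANGE

# Hypothetical addresses, not taken from any log in this thread:
# in_ovh_range("91.121.10.20") is True, in_ovh_range("178.33.104.1") is False
```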

Pfui

6:41 am on Jul 26, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Now they're cloaking, even if the UA is semi-notorious* --

search2.cloud4search.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)

robots.txt? NO

* [webmasterworld.com...]

Dijkgraaf

9:40 pm on Aug 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Same UA: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)
robots.txt? YES

Pfui

12:24 am on Aug 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Apparently cloud4search's bots are inconsistent. This morning, the same cloaked UA from another of their hosts made a beeline for one existing file. Too bad it wasn't robots.txt:

search3.cloud4search.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)

robots.txt? NO

Following a link? Dunno, but doesn't matter since no robots.txt means cloud4search is a goner on sight (& sites:)

blend27

10:00 pm on Aug 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is a new OVH range for me; I've been blocking all the others for ages.

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)

To me this UA simply says that someone is using IE6 on a Windows 2000 box? Come on now...

robots.txt? NO

Single hit (homepage), banned on the spot.

Dijkgraaf

10:27 pm on Sep 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I got hits from the OVH range as well
UA: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)
IP: 178.33.104.nnn
rDNS: 178-33-104-nnn.ovh.net.

robots.txt: Yes

Strange that I'm the only one seeing hits to robots.txt
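Since the rDNS names in this range embed the IP address, they can be matched with a simple pattern. The sketch below assumes the `a-b-c-d.ovh.net.` form shown above holds across the range; the function name `ovh_ip_from_rdns` is made up, and the last octet in the usage comment is a placeholder because the post masks the real one as `nnn`.

```python
import re

# Matches names like "178-33-104-12.ovh.net" with an optional trailing dot.
OVH_RDNS = re.compile(r"^(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.ovh\.net\.?$",
                      re.IGNORECASE)

def ovh_ip_from_rdns(hostname):
    """Return the dotted IP embedded in an OVH-style rDNS name, or None."""
    match = OVH_RDNS.match(hostname.strip())
    if not match:
        return None
    octets = match.groups()
    # Reject anything that cannot be a real IPv4 octet.
    if any(int(o) > 255 for o in octets):
        return None
    return ".".join(octets)

# Placeholder last octet: ovh_ip_from_rdns("178-33-104-12.ovh.net.") gives "178.33.104.12"
```

A name that matches is only a hint; a forward lookup of the hostname is still needed to confirm the rDNS record is not forged.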

keyplyr

11:16 pm on Sep 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've also seen several IPs from the 178-33-104-***.ovh.net range sneaking around as:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)

Requesting robots.txt and a few other web pages.

Pfui

6:59 am on Sep 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The OP is about "C4PC" from cloud4search.com hailing from OVH IPs in the 91.121. ranges. It looks like the OVH 178.33. spawn is something else again, possibly warranting its own thread?

FWIW... Hitting robots.txt neither makes a bad bot good, nor even minimally trustworthy, imho. Yesterday, the cloaked OVH 178-33. bot simultaneously requested robots.txt and a redirected->offsite link:

178-33-107-9n.ovh.net
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)
/offsite

178-33-104-43n.ovh.net
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)
/robots.txt

Simultaneous hits don't cut it, particularly since the link hit is only on pages the bot's disallowed from accessing.
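To check this kind of thing against a log, the standard library's `urllib.robotparser` can report which requested paths a given UA was disallowed from. This is only a sketch: the robots.txt rules passed in are hypothetical (the post does not show the site's actual file), and `disallowed_hits` is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

def disallowed_hits(robots_txt, hits):
    """Return (host, path) for every hit on a path robots.txt disallows.

    `hits` is an iterable of (host, user_agent, path) tuples pulled from a log.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return [(host, path)
            for host, user_agent, path in hits
            if not parser.can_fetch(user_agent, path)]

# Hypothetical rules; the two hits are the ones listed in the post.
UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0;)"
RULES = "User-agent: *\nDisallow: /offsite\n"
HITS = [
    ("178-33-107-9n.ovh.net", UA, "/offsite"),
    ("178-33-104-43n.ovh.net", UA, "/robots.txt"),
]
flagged = disallowed_hits(RULES, HITS)
```

Under those assumed rules, only the `/offsite` request is flagged, which is the pattern described above: robots.txt is fetched, but a disallowed link is requested at the same moment.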

enigma1

11:17 am on Sep 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All I see in my logs for the 91.121.* range are hacking attempts. The UA is different from the one in the OP: libwww and the like.

The IPs themselves responded on port 80 when I checked, so they're either compromised systems or bots set up on purpose to hack systems.

Dijkgraaf

8:15 pm on Sep 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@Pfui
I agree that just hitting robots.txt isn't sufficient to make it a good bot. It also has to obey those rules and use a UA that declares it to be a bot and contains a URL pointing to a page that explains the bot's purpose.
They seem to miss on all of these points at least some of the time.
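The two UA criteria (declares itself a bot, includes an info URL) are easy to test mechanically; obeying robots.txt is not, since that needs log analysis. A rough sketch, where the keyword list and the function name `ua_self_identifies` are assumptions and the `ExampleBot` UA in the comment is invented:

```python
import re

# Assumed keywords; real bots use many other self-identifying tokens.
BOT_WORD = re.compile(r"bot|crawler|spider", re.IGNORECASE)
INFO_URL = re.compile(r"https?://\S+", re.IGNORECASE)

def ua_self_identifies(user_agent):
    """True if the UA both names itself as a bot and carries an info URL."""
    return bool(BOT_WORD.search(user_agent)) and bool(INFO_URL.search(user_agent))

# Both UAs seen in this thread fail the check; an invented UA such as
# "ExampleBot/1.0 (+http://example.com/bot.html)" passes it.
```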

keyplyr

9:32 pm on Sep 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My take - I care more about how my work is being used. What is this crawler doing with my property? How does it benefit me?