Forum Moderators: not2easy

"proxy server" with entire copy of my site

issue a take-down request, or nothing to worry about?

         

callivert

5:09 am on Oct 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was doing a search for copies of my site when I found a proxy server that had a carbon copy of my entire site. The pages linked to one another, with no links back to my site.
It even had my AdSense ads, with my AdSense ID on them.

I contacted the owner, who told me that the site was a proxy server that did not store any data from other websites. The owner said they would block my site from users from now on.
My material is still accessible at their URL (complete with my ads).

I have a couple of questions. First, if the proxy site doesn't store data, how come (a) I found my own material on their site through Google, and (b) it was all still there when I visited their site? A Google "allinurl" search showed a ton of material from all around the web on their site.
Second, is this something I should worry about? On the one hand, they had my AdSense ads showing. On the other hand, this is not a content-sharing arrangement that I was told about, nor did I agree to it.

Is/was this a storm in a teacup? Did I do the right thing asking them to take my material down?

jdMorgan

1:21 pm on Oct 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have you reviewed this recent thread [webmasterworld.com]?

A proxy server is like a pipe -- The user connects to the proxy server, the proxy server connects to your site, and then 'pipes' your content back through to the user. That's why the owner said he didn't store any of your content on his server.

Proxies can be bad or good, and there's quite a bit of grey between those black and white distinctions.

An example of a 'good' proxy is that you can use a proxy in a foreign country to check localized search results for that country, bypassing the search engines' feature that detects your location and localizes results for your country. By using a proxy, you appear to be a user in the proxy's country, not your own.

Another 'good' example is that of a user in a country which restricts access to global information sources, Myanmar being a current example. By using a proxy, the user can possibly hide his/her own identity, and bypass the security wall around the country.

Proxies can also be used for 'bad' -- For example, a site-scraper 'bot crawling your site through one or more proxies to hide its identity and activity from you, the Webmaster. Or a proxy with a script added on top of the pure 'data piping' function, so that the proxy connects to your site to get your content as described above, but substitutes its own ads for yours.

However, in all but the last example --good and bad-- the function of the proxy hasn't changed at all -- A pure proxy server just acts as an intermediary between the user and your site -- as implied by the definition of the word "proxy."

Jim

callivert

2:26 pm on Oct 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, jdMorgan.

jdMorgan wrote: "Have you reviewed this recent thread?"

Yes, I found it after I posted this one!

jdMorgan wrote: "That's why the owner said he didn't store any of your content on his server."

Given this, I don't understand why there seem to be static pages of mine on his domain, right down to their own Google cache. I say "seem" because apparently there aren't, but that's how it looks.
Someone on the other thread you mentioned said that no proxy server should ever allow itself to be crawled by search engines, and this seems to be at least partly the problem here.

jdMorgan

2:55 pm on Oct 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It helps to consider a proxy server as a "window" through which you can view any other site -- as long as that site doesn't block the proxy server's access by IP address, hostname, or proxy-specific HTTP request headers.

So yes, it "appears" that your site is on the proxy server, but it's not -- The proxy simply requests stuff from your site and shows it to the client (browser or SE robot) accessing the proxy.

Some of these proxies may temporarily cache the content they fetch, but if they get high traffic, they won't be able to cache it for long, and you should see the proxy server's accesses to your site in your raw server access logs. Using that logged information, you can then block the proxy -- again by IP address, hostname, or proxy-specific HTTP request headers -- using mod_rewrite, ISAPI rewrite, or even a common PHP header script included on your pages.

Jim
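
To make the three blocking criteria Jim lists concrete (IP address, hostname, proxy-specific request headers), here is a minimal Python sketch of the decision logic. It is not the mod_rewrite or PHP approach mentioned above, just the same idea in another language. The network range, hostname suffix, and the function name `should_block` are all hypothetical placeholders; the real values would come from your own access logs. `Via`, `X-Forwarded-For`, and `Forwarded` are headers that proxies commonly add, but many proxies send none of them.

```python
import ipaddress

# Hypothetical values for illustration; fill these in from your access logs.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_HOST_SUFFIXES = (".proxy.example",)
# Headers that proxies commonly add to forwarded requests.
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded")


def should_block(remote_ip, remote_host, headers):
    """Return True if the request matches any of the three criteria."""
    # 1. Block by IP address (or a whole address range).
    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    # 2. Block by reverse-DNS hostname, if one was resolved.
    if remote_host and remote_host.lower().endswith(BLOCKED_HOST_SUFFIXES):
        return True
    # 3. Block by proxy-specific request headers (names are case-insensitive).
    names = {name.lower() for name in headers}
    return any(h in names for h in PROXY_HEADERS)
```

Note that the header check is the bluntest of the three: it would also turn away ordinary visitors behind corporate or ISP proxies, so in practice you might restrict it to requests that also match a known IP or hostname.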

jtara

3:23 pm on Oct 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Er, wait a minute.

Although the descriptions above of a proxy server are accurate, this doesn't sound like one.

Proxy servers don't normally make your content appear to be on another site. As stated above, they are a "conduit".

Traditionally, proxy servers are used by configuring your web browser (in its setup pages) to connect to the proxy.

The browser bar will show YOUR site, not the proxy site.

There are "web 2.0" type proxies now as well, essentially a "browser within a browser". The proxy server shows its own URL bar. Again, though, it should show your site name, only in the "browser within a browser".

Proxy sites don't normally show up in Google or other search engines, either.

Can you example-ize the URLs on this "proxy" site where your content is found and post here?

Receptional Andy

3:36 pm on Oct 4, 2007 (gmt 0)



A proxy server can be web based, e.g. www.example.com/?url=www.anotherexample.com

The point being, the content is available only when it's requested, so it isn't stored in any traditional sense. When you check to see if it's available (or when Google does), then it is, because the check itself requested it...
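
A rough Python sketch of that "fetch on request" behaviour, assuming the `?url=` format shown above. The function names and the injected `fetch` callable are hypothetical; a real proxy script would also rewrite links and handle errors. The point it illustrates is that nothing is stored: every hit on the proxy URL triggers a fresh request to the original site.

```python
from urllib.parse import urlsplit, parse_qs


def target_from_query(proxy_url, param="url"):
    """Pull the requested site out of a proxy URL such as /?url=<site>."""
    values = parse_qs(urlsplit(proxy_url).query).get(param)
    return values[0] if values else None


def handle_request(proxy_url, fetch):
    """Serve one proxy request by fetching the target at request time.

    `fetch` is any callable that takes a URL and returns the page body.
    It is passed in so the sketch can run without network access.
    """
    target = target_from_query(proxy_url)
    if target is None:
        return None
    # Assumption: the target is given without a scheme, as in the example.
    return fetch("http://" + target)
```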

callivert

4:36 am on Oct 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jtara wrote: "Can you example-ize the URLs on this "proxy" site where your content is found and post here?"

Sure.
Here's what happened. I googled my own content, and I found a non-supplemental SERP containing my content at the following URL:
www.example.com/folder/www.mysite.com/stuff-I-wrote.html

example.com is a proxy server that includes a translation service.
A further Google search showed that every page from my site was cached in the Google index as being from example.com, e.g.,
www.example.com/folder/www.mysite.com/page1.html
www.example.com/folder/www.mysite.com/page2.html
etc.
This wasn't a disaster, since my own site is also indexed, so in almost all cases, those copied pages of my content were supplemental.
I looked into the site a little further with the "allinurl" operator in a Google search.
It showed that Google has cached hundreds of thousands of pages, owned by sites all over the internet, under www.example.com.
Yahoo Site Explorer gave similar results.
The search engines have indexed a massive amount of content from many, many sites as coming from www.example.com.
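
For anyone checking which of their own pages are mirrored this way, here is a small Python sketch that maps a proxied URL of the form above back to the original page. It assumes the proxy always uses a fixed path prefix (`/folder/` in my examples) followed by the original host and path; the function name is made up for illustration.

```python
from urllib.parse import urlsplit


def original_url(proxied_url, prefix="/folder/"):
    """Recover the original page from <proxy>/folder/<host>/<path>."""
    # urlsplit needs a scheme to separate host from path.
    if "://" not in proxied_url:
        proxied_url = "http://" + proxied_url
    path = urlsplit(proxied_url).path
    if not path.startswith(prefix):
        return None
    rest = path[len(prefix):]
    # Assumption: the original site was served over plain http.
    return "http://" + rest if rest else None
```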

callivert

10:53 am on Oct 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As a follow-up:
Since this issue got cleared up, AdSense has started showing ads again on my site. Recently I'd been getting nothing but public service ads, and I had no idea why. Now the real ads are back.
Probably a coincidence...

cgiscripts4u

5:12 pm on Nov 8, 2007 (gmt 0)

10+ Year Member



I have discovered the same thing happening to one of my sites, but the proxy URL does not show my site's URL at all; i.e., the URL is in this format:

http://www.example.com/index.php?q=aGR0cEovL427b38uY28ueWs%3D

That page is an exact copy of my site. All of the links are changed so that they go through similar URLs to get exact copies of other pages on my site.

I have no problem with people using proxies, but in this case the proxy is indexed in Google, and by default it turns JavaScript off. So my bandwidth is being used up but none of my JavaScript is working; i.e., Analytics is not recording the traffic, AdSense ads are not being displayed, etc.

I have blocked the IP of this proxy for the time being but do wonder if there is a better solution.
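
A note on the `q=` parameter in URLs like the one above: the trailing `%3D` is a URL-encoded `=`, which suggests the value is a Base64-encoded target URL. That is an assumption about this kind of proxy script, not something confirmed for this site, and the value in the post has been altered, so it is not decoded here. A minimal Python sketch for testing the assumption on a real proxied link (the function name is hypothetical):

```python
import base64
import binascii
from urllib.parse import urlsplit, parse_qs


def decode_q(proxy_url, param="q"):
    """Try to read the q= value as Base64 text; return None if it isn't."""
    values = parse_qs(urlsplit(proxy_url).query).get(param)
    if not values:
        return None
    # parse_qs has already turned %3D back into "=". It also turns "+"
    # into a space, so restore any "+" that belonged to the Base64 text.
    token = values[0].replace(" ", "+")
    try:
        return base64.b64decode(token, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
```

If the decoded text turns out to be one of your own URLs, that confirms which page the proxied link points at, which helps when matching proxy hits against your access logs.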