Should I be concerned

Someone accessing mysite attaching another url

         

denisl

7:57 am on May 7, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



For some time now I have occasionally seen someone accessing my site with a different URL attached to mine.

They access a page that should be mysite.com/mypage.php?code=#*$!
(code is a widget id)

They replace #*$! with another URL. The page doesn't redirect to them (as far as I know) but displays a relatively blank page. Should I be concerned? Can it do me any harm?

Today as an extra twist, I've seen the attached url ends with: Please_click_on_my_google_adds

Fortunately, I have my adsense set up to only display if the widget description on any particular page is over a certain length, so the page that results from the above does not display adsense.

I would like to block this unwanted access to my site, but the IP used varies (not even in the same block). I assume I can do this with .htaccess, but I'm not sure how.

rocknbil

3:32 pm on May 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



should I be concerned?

I'm going to say . . probably. Others will have better contributions, but...

They replace #*$! with another url. - can it do me any harm?

That depends on the content of the "other URL." What does mypage.php do? Does it open a database, or have any way to access system components? Does the URL contain anything like "or 1=1" or possible email injection content? How secure is mypage.php, and does it filter input data?

I've seen the attached url ends with: Please_click_on_my_google_adds

I am "presuming" you are displaying AdSense on your site; is this correct? As you (should) know, any request to click a site's ads via forum posts or other inquiries is strictly against AdSense policy and grounds for termination of the account. My concern would be that this is possibly a competitor or the like, hoping these requests get logged somewhere and the logs get picked up by Google, in which case your AdSense account would get suspended.

Don't let my ramblings alarm you, as I could be completely wrong, but this would be the only real reason I could see for sending your site such a request.

denisl

6:48 pm on May 7, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you, rocknbil, for your input.

The variable in question is sanitised before it is used in a database query, so I am hopeful it cannot harm the database.

Yes, I do run AdSense on the site, but it doesn't show unless the widget description is over a certain size, so it will not show with the worrying URLs.

The urls do not show on the page, only in the address bar - but then anyone can type what they want into the address bar.

I would be interested to know if there is a way to totally block this unwanted access.

rocknbil

7:20 pm on May 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There probably is, and you'll find help in the Apache forum, probably via .htaccess and a directive based on the query string. This probably won't stop it from showing up in the logs ("permission denied by server configuration"), but it may be annoying enough to make them give up or take your site off their bot list.

If the attempts don't pass through your script and are not logged publicly, you may not have anything to worry about.
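
A rough sketch of the kind of query-string test such a directive would express, written in Python only to show the logic. The parameter name "code" comes from the original post; the function name and the idea that a URL can be spotted by "://" or a leading "www." are assumptions, not a tested rule:

```python
from urllib.parse import parse_qs


def code_contains_url(query_string: str) -> bool:
    """Return True if the 'code' parameter looks like a URL rather than a widget id.

    Assumption: a real widget id never contains '://' and never starts with 'www.'.
    """
    # parse_qs decodes percent-encoding, so 'http%3A%2F%2F...' is caught as well.
    values = parse_qs(query_string, keep_blank_values=True).get("code", [])
    for value in values:
        lowered = value.lower()
        if "://" in lowered or lowered.startswith("www."):
            return True
    return False
```

A request for which this returns True is one the server could refuse outright instead of passing it to the script.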

g1smd

10:51 pm on May 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What you haven't said, and this is the most important thing to be finding out, is what HTTP Status Code is returned in the HTTP Header for those requests.

If you don't know, right about now is the time to be installing the Live HTTP Headers extension for Firefox and finding out.
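
If you would rather not install a browser extension, a few lines of script can report the same thing. This is only a sketch: the function name is made up, and it uses Python's standard http.client, which does not follow redirects, so a 301 is reported as 301 rather than as the status of the page it points to:

```python
import http.client
from urllib.parse import urlsplit


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the status code the server sends for a GET request to url."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        # Rebuild the request target from the path and query string.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        return conn.getresponse().status
    finally:
        conn.close()
```

Call it with one of the odd URLs from your logs, for example http_status("http://mysite.com/mypage.php?code=junk"), where the host name is the placeholder from the first post.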

If it is '200 OK' then you are in deep deep trouble, because your site returns Infinite Duplicate Content, albeit with a 'relatively blank template page'.

If it is '404 Not Found' then there is little to worry about. You can leave it like that, or set up a 301 redirect to capture those requests and redirect the user to a valid URL that will return real content.

If it is '200 OK' then you have a range of options to fix it. One fix is to set up a 301 redirect to the correct URL, either by using a RewriteRule in the .htaccess file or by altering the PHP script, and the other fix is for the script to return a 404 HTTP Header to the browser. If none of the real URLs use parameters, then this can be more efficiently done in the .htaccess file instead.

idfer

11:46 pm on May 7, 2009 (gmt 0)

10+ Year Member



If it is '200 OK' then you are in deep deep trouble, because your site returns Infinite Duplicate Content, albeit with a 'relatively blank template page'.

What's "infinite duplicate content"? Why would it put someone in deep deep trouble? And how would an error page be considered "duplicate content" by anyone? It seems to me that any website with some programming behind it could potentially generate an infinite number of "duplicate" pages that say "invalid request". I can't see how that can be anything but ok.

jdMorgan

12:11 am on May 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In addition to sanitizing all query string data, you should check the requested ID value against your database and if there is no "unique page" that can be generated for that ID, then return a 404-Not Found status response. It doesn't really matter what "page content" you generate, but the 404 status is what is important. I suggest you listen to the advice above and check your server response headers unless you do not care about what search engines think of your site.
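
The check described above might look something like the following. It is a sketch of the logic only, in Python rather than the site's PHP; the WIDGETS dictionary stands in for the real database, and the assumption that ids are short numeric strings is mine, not the original poster's:

```python
import re
from typing import Dict, Tuple

# Hypothetical stand-in for the widget table; a real site would query its database.
WIDGETS: Dict[str, str] = {"1001": "Blue widget", "1002": "Red widget"}

# Assumption: widget ids are numeric strings of at most ten digits.
ID_PATTERN = re.compile(r"[0-9]{1,10}")

NOT_FOUND_BODY = "<h1>Page not found</h1><p><a href=\"/\">Home page</a></p>"


def respond(code: str) -> Tuple[int, str]:
    """Return (status, body) for a request to mypage.php?code=<code>."""
    # Reject anything that is not shaped like an id before touching the data store.
    if not ID_PATTERN.fullmatch(code):
        return 404, NOT_FOUND_BODY
    title = WIDGETS.get(code)
    # A well-formed id with no matching record is also a 404, not a blank 200 page.
    if title is None:
        return 404, NOT_FOUND_BODY
    return 200, "<h1>" + title + "</h1>"
```

The point is that both a malformed value and a well-formed but unknown id end in the same 404 status, whatever page content goes with it.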

Try doing a search for "duplicate content." One of the more interesting thread titles you'll find here is "Duplicate content - Get it right or perish."

A bit of research will reveal that search engines take steps to identify "infinite URL-spaces" on sites. If they discover that requests for every or most URLs return a 200-OK, then they will arbitrarily limit the number of URLs that they crawl on your site as a policy of self-preservation and fairness. They will also limit the number of your URLs that they display in search results, primarily to keep their indexes free of junk, and likely give you several demerits on your site's "quality score" as well.

If you have any concern for usability and retention of off-site referral traffic, then the page content returned with the 404 should explain that the requested resource was not found or does not exist, and offer text links to your home page, major category pages, major site sub-sections, and your site search facility, as applicable.

Run a tight ship.

Jim

idfer

2:16 am on May 8, 2009 (gmt 0)

10+ Year Member



Yes, you're correct. I misunderstood the original problem. If the URLs are crawled by a search engine, they should generate 404s, but if it's from someone hacking a form, I'd find it counter-intuitive to generate a 404 instead of an "invalid values in some fields" type error page.

jdMorgan

2:21 am on May 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't give malicious agents any additional information on what you check, accept, or reject. Information is power. Just a 404 will do.

If it is you that needs this level of information, then in addition to the normal 404 response, you can add a 'debug switch' to the code to temporarily enable outputting diagnostic information on the page.
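
One way such a switch could be arranged, sketched in Python with made-up names (DEBUG, not_found). The status is 404 either way; only the page body changes while the switch is on:

```python
import html
from typing import Tuple

# Hypothetical switch: leave False in production, turn on only while diagnosing.
DEBUG = False


def not_found(reason: str, debug: bool = DEBUG) -> Tuple[int, str]:
    """Return a 404 response; include the internal reason only when debugging."""
    body = "<h1>404 Not Found</h1><p>The requested page does not exist.</p>"
    if debug:
        # Escape the reason so request data echoed here cannot inject markup.
        body += "<pre>debug: " + html.escape(reason) + "</pre>"
    return 404, body
```

With the switch off, a visitor sees nothing about which check rejected the request.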

Jim

denisl

9:20 am on May 8, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks everyone for the help.

Yes, it was giving a 200 response.

After some testing I have found exactly what I need to do.

This may explain a worry I had with another site where in Google WMT, every now and again the crawl stats show them crawling far more pages than they should.

Jim - as an alternative to the debug switch, I have some code on my 404 page that sends info to me in an email.
I have also put this on a site I monitor for a client to help me keep an eye on things.
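
For anyone wanting to do the same, here is a minimal sketch of the idea in Python, not the code I actually use. The function name, the addresses and the choice of fields are placeholders. The request data goes only into the message body, never into the headers:

```python
from email.message import EmailMessage


def build_404_report(
    url: str,
    client_ip: str,
    user_agent: str,
    to_addr: str = "webmaster@example.com",   # placeholder address
    from_addr: str = "noreply@example.com",   # placeholder address
) -> EmailMessage:
    """Build (but do not send) an email describing a request that ended in a 404."""
    msg = EmailMessage()
    # Fixed subject: untrusted request data stays out of the headers.
    msg["Subject"] = "404 report"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(
        "URL: " + url + "\n"
        "IP: " + client_ip + "\n"
        "User-Agent: " + user_agent + "\n"
    )
    return msg

# Sending is left out; with a local mail server it could be done with
# smtplib.SMTP("localhost").send_message(msg), assuming one is available.
```

On a busy site it may be worth limiting how often this fires, since a bot hitting bad URLs would otherwise fill the inbox.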

BradleyT

2:56 pm on May 8, 2009 (gmt 0)

10+ Year Member



My guess is that on some pages the query string and the page heading text match up, so they thought that adding "please click on my ads" to the query string might make it show up in the heading text (h1) on the page and possibly get you banned.

On my site the ending URL and the heading text sometimes match, but the heading text is always pulled from the DB, so messing with different URL parameter text is only going to cause 404s, not different text on the page.

So now the question is, who wants to get you in trouble with AdSense, and why?

kaled

3:25 pm on May 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you can identify bad URLs, simply issue a 301 permanent redirect to a good URL. Job done.

Kaled.

g1smd

8:42 pm on May 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's a very large US retailer that allows you to change (in the URL request) the attribute value for the product category, and that attribute value is then reflected into the page content.

You can make some crazy stuff up and the website will parrot it back, with products seemingly categorised a long way from their proper location ([not real examples] e.g. watches in the underwear section, socks in the kitchen section, electric saws in the shoes section, and so on).