Search Form Spam Attack

My site search has been spammed by what looks like a torrent site

         

kiddz83

1:59 am on Jul 23, 2009 (gmt 0)

10+ Year Member



Hi guys, I wonder if anyone could help me with this. I've searched around the forum but can't find anything (yet).

My site is an ASP site and has Google Site Search installed for internal searching. A few weeks ago I got an email from Google Webmaster Tools saying "Googlebot found an extremely high number of URLs on your site".

So I checked Webmaster Tools and found over 6,000 "Not Found" URLs that all originate from my Search URL, i.e. mysite.com/search?q=torrent+keywords and mysite.com/Search/?q=more+torrent+keywords

Now I don't see how this would benefit the attacker in any way. However, it has damaged my site: if you search Google for, say, "mysearch #*$!", one of these search result URLs comes up first!

I have tried adding a rel="noindex" tag and removing the attacked URL from the Google Index because that's the least I can do about it.

But just last week the attack happened again, and there are now over 2,000 "Not Found" URLs. This time the attacker targeted paths directly under my root domain, e.g. mysite.com/advanced_search?q=even+more+torrent+keywords and mysite.com/preferences?q=borat+britney+whatnot

Those URLs don't physically exist, but Google Webmaster Tools still records them as URLs on my site.

So my questions are:
1. What on earth has happened? What is it that the attacker is trying to do?
2. What is the worst thing that can happen to my site?
3. What is the best way to stop and prevent this attack for good?
4. What settings might I have set wrong on my site that have allowed this to happen?

Thanks guys.

kaled

10:23 am on Jul 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You need to disallow indexing of your search results. If you are able to insert a robots meta tag into the search results, that would be the way to go; otherwise, the following may work if added to the robots.txt file:

User-agent: *
Disallow: /search
Disallow: /advanced_search

http://www.robotstxt.org/orig.html#format

Disallow
The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.
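
One way to sanity-check rules like these before deploying them is Python's standard `urllib.robotparser`. The sketch below is only an illustration: the sample paths are taken from the first post, `is_blocked` is a hypothetical helper name, and real crawlers may interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested above, parsed from a string instead of fetched over HTTP.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /advanced_search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(path: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules tell `agent` not to fetch `path` (hypothetical helper)."""
    return not parser.can_fetch(agent, path)


# Sample paths from the thread; the query strings are just placeholders.
for path in (
    "/search?q=torrent+keywords",
    "/Search/?q=more+torrent+keywords",
    "/advanced_search?q=x",
    "/preferences?q=x",
):
    print(path, "blocked" if is_blocked(path) else "allowed")
```

With this parser, `/search` and `/advanced_search` are reported as blocked, but `/Search/` (capital S) and `/preferences` are not: prefix matching is case-sensitive, and a rule only covers the paths it names.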

Kaled.

kiddz83

11:16 pm on Jul 23, 2009 (gmt 0)

10+ Year Member



Hi Kaled,
Thanks for your suggestion - I'm in the process of getting the IT guys to insert Disallow lines for the attacked URLs, after which I will request their removal from the Google index.

However, I'm more interested in preventing similar attacks in the future. The IT team is not the most agile (big corporate wheel...), so I need preventive measures rather than after-the-fact fixes.

kaled

10:01 am on Jul 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Both of my suggestions should correct the existing problem and prevent such attacks in the future!

Kaled.

kiddz83

6:32 am on Jul 27, 2009 (gmt 0)

10+ Year Member



Yep, but the thing is the attacker seems to use random URLs that don't physically exist, e.g. mysite.com/blabla?q=random+keywords. I can't predict the "blabla" part - so far they've used preferences, advanced_search, quality_form, etc. and they all get indexed by Google!

Even if I've noindexed the known URLs and removed them from the Google index, they can attack again with new variations of the URL e.g. mysite.com/blabla2?q=random+keywords, mysite.com/blabla3?q=random+keywords, etc.

How can I prevent this from happening?

kaled

10:52 am on Jul 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



URLs that do not exist should resolve to a 404 error page (preferably one that includes a NOINDEX robots meta tag, just to be doubly secure).

It is reasonable in this case to use cloaking, so that you deliver one error page to search engine spiders and another error page to users. However, the method by which you detect whether a URL is genuine or not depends on how your site works - I can't help you there.
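
As a rough illustration of the 404-plus-noindex idea (not the poster's ASP setup), here is a minimal Python WSGI sketch. Everything in it is an assumption made for the example: `KNOWN_PATHS` is a hypothetical whitelist standing in for however the site decides a URL is genuine, and the `X-Robots-Tag` response header is added alongside the meta tag as a second noindex signal.

```python
# Hypothetical whitelist: stands in for the site's own "is this URL genuine?" check.
KNOWN_PATHS = {"/", "/search"}

# Error page carrying a NOINDEX robots meta tag.
NOT_FOUND_PAGE = (
    b'<html><head><meta name="robots" content="noindex">'
    b"<title>Not Found</title></head>"
    b"<body><h1>Page not found</h1></body></html>"
)

OK_PAGE = b"<html><body>OK</body></html>"


def app(environ, start_response):
    """Serve known paths; answer anything else with a real 404 status plus noindex."""
    path = environ.get("PATH_INFO", "/")
    if path not in KNOWN_PATHS:
        start_response(
            "404 Not Found",
            [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(NOT_FOUND_PAGE))),
                ("X-Robots-Tag", "noindex"),  # header form of the noindex hint
            ],
        )
        return [NOT_FOUND_PAGE]
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(OK_PAGE))),
        ],
    )
    return [OK_PAGE]
```

The point of the sketch is the status line: an unknown path such as `/preferences` gets a genuine 404 instead of a 200 page, whatever query string is attached, so spammed variations give the crawler nothing to index.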

Kaled.

kiddz83

11:23 pm on Jul 27, 2009 (gmt 0)

10+ Year Member



Ahh yep that makes sense. Many thanks for all your help Kaled :)