Random URLs in log coming from Googlebot

Off topic searches

sftriman

3:52 pm on Jan 5, 2012 (gmt 0)

10+ Year Member



I've now collected a list of 10,800 URLs requested by Googlebot that hit my site's search function with random, off-topic searches. Has anyone seen this before? Where do they come from, and how do I get rid of them? My concern is that Google is indexing them, so they are taking up indexing that I'd rather have go to my real pages. For now, if a request is one of the 10,800 random searches with no relevance or meaning on my site, I return a 404. In Google Webmaster Tools, I can see that they are marked as 404. Does that mean Google will eventually drop those URLs from its index? My site deals with golf; here are some of the searches to illustrate:

billabong truckin sweater plum
billabong tulip luggage pink lady
billabong turmoil zip hoody black
billabong turmoil zip hoody navy
billabong upper deck t shirt athletic
billabong vernon walk short white
billabong vertigo walk short desert
billabong young folk hat black
billion dollar brow brow powder taupe
billy jealousy hair
billy jealousy lightning bolt electric shave
billy jealousy liquidsand cleanser
billy jealousy white knight cleanser
binding t3 automatic
binding touring auto nnn
biochem 100 berrie whey 1 39lb
biochem 100 green whey protein 2
biochem 100 hemp whey 11 7oz
biochem 100 raw food whey 11
biochem 100 whey protein 2lb
biochem 100 yams whey 11 7oz
biochem 5 htp tryptophan 50 mg
biochem aller max caps 50 vcap
biochem alpha lipoic acid 100 mg
biochem dhea 25 mg 90 caps
biochem glucolean 120 tabs
biochem glucosamaine chondroitin 60 caps
biochem green whey protein bar 12
biochem lipoic acid time released 60
biochem phosphatidylserine complex 30 gels
biochem r lipoic acid 60 vcap
biochem tension rx nighttime 90 vcap
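
For anyone wanting to try the same approach, here is a minimal sketch of the "404 the known junk searches" idea described above. It is not the poster's actual code: the query parameter name `q`, the `/search` path, and the helper names are all assumptions, and the set holds only a few sample entries from the list.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical sample; the real list would hold all ~10,800 collected queries.
JUNK_QUERIES = {
    "billabong truckin sweater plum",
    "billy jealousy hair",
    "biochem glucolean 120 tabs",
}


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(query.lower().split())


def status_for_search(url: str, param: str = "q") -> int:
    """Return 404 for a known junk search and 200 for anything else.

    `param` is the (assumed) name of the search field in the query string.
    """
    values = parse_qs(urlsplit(url).query).get(param, [])
    if values and normalize(values[0]) in JUNK_QUERIES:
        return 404
    return 200
```

The search handler would call `status_for_search(request_url)` before running the query and send the 404 without touching the database when it matches.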

viralvideowall

9:46 pm on Jan 6, 2012 (gmt 0)

10+ Year Member



This doesn't look good. Just to be safe, I'd check your site to make sure there are no exploits. Is it a WordPress site, by any chance?

tangor

5:17 am on Jan 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If these pages do not exist on your site and are not returning a 200 OK, the normal 404 should take care of them without requiring anything extra in .htaccess.

I did note you spoke of "my site's search function". Do you have an on-site internal search engine fielding these requests, rather than direct requests for pages? Have you run reverse DNS (rDNS) lookups against the IPs to see if they really are from Google?
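
The rDNS check mentioned here can be sketched roughly as follows: do a reverse lookup on the requesting IP, confirm the hostname belongs to a Google crawl domain, then do a forward lookup on that hostname and confirm it maps back to the same IP. The function names and the two domain suffixes below are assumptions for illustration, and `socket.gethostbyname_ex` only covers IPv4, so treat this as a starting point rather than a complete verifier.

```python
import socket

# Assumed crawl domains; adjust if Google's documentation lists others.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host):
    """Return the list of IPv4 addresses a hostname resolves to."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP.

    The lookup functions are parameters so the logic can be exercised
    without touching the network.
    """
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or DNS failure
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

Running `is_verified_googlebot(ip)` over the distinct IPs in the access log would show whether the requests carrying a Googlebot user agent really come from Google or from something spoofing it.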

enigma1

1:38 pm on Jan 9, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



which hit my sites search function with random, off-topic searches

Does your search function use a GET or a POST form? Best to use POST or some transparent JavaScript. Spiders may try a form if they can extract a search URL from it and test it.
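
As a rough illustration of the POST suggestion, a server-side handler could refuse to run the search unless the request arrived as a POST, on the assumption that crawlers mostly follow GET URLs. The function name, the `q` field, and the `run_search` callback are hypothetical, not part of any particular framework.

```python
def handle_search(method, form, run_search):
    """Run the search only for POST requests that carry a non-empty query.

    method     -- HTTP method of the request, e.g. "GET" or "POST"
    form       -- dict of submitted form fields (assumed field name: "q")
    run_search -- callable that performs the actual, possibly expensive, query
    Returns a (status_code, results) tuple.
    """
    if method.upper() != "POST":
        return 405, []  # Method Not Allowed: crawled GET URLs never reach the database
    query = form.get("q", "").strip()
    if not query:
        return 400, []  # Bad Request: nothing to search for
    return 200, run_search(query)
```

The trade-off is that existing inbound links to GET search URLs stop working unless they are handled separately.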

sftriman

2:53 pm on Jan 9, 2012 (gmt 0)

10+ Year Member



I use GET. So are you thinking these thousands of hits are not Googlebot, but rather spiders spoofing Googlebot? I read a theory that a browser search bar could somehow be responsible for generating all these random, well-formed search strings, but why they would then be tried on my site, I can't figure out. I suppose I could switch to POST, except that there are hundreds of useful legacy links out there pointing to my search page.

enigma1

6:05 pm on Jan 9, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I believe the useful links could be exposed in some other way.

But it is a problem with GET forms. Even Google has documented that it may try various searches when it finds them. Personally, I use some JS to process search forms, because the database queries can be intensive and I don't want bots to trigger them just like that. With JS you don't even need to output an HTML form.