"your query looks similar to automated requests"

classic google problem - any solutions yet?

         

amznVibe

4:29 am on Sep 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been trying to get to page three of a search result for a couple of days now, with no luck.

Google simply doesn't want to let me go there with the classic message that can be found in a dozen threads [google.com] here over the past few years (including one from a few months ago that I cannot reply to)

Here's a generic example:
[google.com...]

We're sorry...

... but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now.

We'll restore your access as quickly as possible, so try again soon.
...

Of course I am not infected, and I've tried this from several computers, several operating systems and different IPs, logged in or out of an account. I've tried varying the number of results per page and the start position. No luck.

The only variation is that sometimes, if I vary which of their services I use (for PDA, etc.), I will get one of those captcha verification boxes, and other times no verification is needed to proceed. Page 2 will get the captcha; page 3 is refused.

There are 800 results total, I really would like to at least see page three - any ideas?

I can do a similar request on yahoo and can see the results without issue but it doesn't have all the hits that google has.

Do you think using a developer key and going through one of the API services might do it? I don't know much about that, but if it might work I'd look up how to do it...

glengara

10:43 am on Sep 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have you tried going through a proxy?

amznVibe

12:03 pm on Sep 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, hence the "different IP" listed above ;-)

I managed to get a similar query through Yahoo to sorta get what I was looking for but Yahoo has some serious bugs in their negative/exclude (NOT) feature. Yahoo results are a mess compared to Google's even in 2007, amazing.

ps. sorry, I'm not giving out the query because the more people that do it, the more it looks like a virus to Google. It does have the word "forums" in it so I suspect that maybe spammers use Google to get lists of forums? But that can't be the only trigger...

blend27

1:14 pm on Sep 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



-- It does have the word "forums" in it so I suspect that maybe spammers use Google to get lists of forums?--

I recently came across a HACKED Apache installation that contained a small program written in PHP. From what I was able to understand, the program queried G, Y, M, and Gigablast with combinations of operators: inurl:, intitle: and intext:.

There was a log of what was going on, and it contained over 7500 different variations of queries performed: guestbooks, forums, comment forms. Clever app, I have to say. It even contained the names of the files they were looking for.

tedster

3:48 pm on Sep 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Parasite hosting for search engine scraping - that does line up exactly with Google's error message. I have not run into this warning for some time, and I've been hoping that Google had tuned their criteria in some way so they got fewer false positives.

For a while I thought that using a dial-up account was the path around the error message, but recently I banged into it on dial-up too. And I have sometimes seen the message even on site: operator queries for new sites - I really doubt scrapers would care about those searches.

amznVibe

7:05 pm on Sep 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I dug around in my old old email and found I have a Google API key from 2002, even though they aren't giving them out anymore. I'll see if it still works at some point and try to grab the results through the SOAP interface. Hopefully they don't block the API with a valid key!

[edited by: amznVibe at 7:05 pm (utc) on Sep. 1, 2007]

TheSeoDude

10:45 pm on Sep 1, 2007 (gmt 0)



Vibe my man! Stop searching to spam forums and guestbooks and blogs and so.
And you won't get that error anymore.

Let's be frank here, we ain't stupid!

PS: Try other engines to see if it works, but MSN has many funny searches figured out and Yahoo blocks you if you are too persistent.

800 results .. lemme guess! a guestbook .. ;)

g1smd

12:41 am on Sep 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I get that error message all the time, for page 3 onwards, when using the site: and inurl: operators together.

TheSeoDude

1:24 am on Sep 2, 2007 (gmt 0)



No offense m8 but there are not too many legitimate uses for them together and going to the last pages. ;)

If you are searching your own site you usually use only site: and browse based on structure. But site: and inurl: together have virtually no use for site owners. If you structured the site properly you will find what you need with site: alone! And inurl: won't help you a lot, as filenames usually have different names.

Eg: http// domain.com / level1 / level2 / title-of-content-page

You can use site: on this to browse by levels. And inurl: can be replaced by site:. I'm sure you don't want to find out how many pages on your site have a given word in the URL. And I'm sure it's not more than 30?

PS:I suggest finding a footprint in the guestbooks or forums or blogs and use that for search. That's the way to go!

g1smd

1:56 am on Sep 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Using that dual search has been vital in sorting canonicalisation issues with parameter-based URLs out.

site:domain.com -inurl:www inurl:nextoldest
site:www.domain.com inurl:mode
site:www.domain.com inurl:showthread

and many dozens of others.

TheSeoDude

3:08 am on Sep 2, 2007 (gmt 0)



inurl:mode inurl:showthread ... first is a search for guestbooks second (mode=post) is a search for forums (you can post replies) ... come on. I work with this stuff ;)

And the www (non-www) problem can be easily fixed in .htaccess.
If you do the search, you should be checking whether you have the problem or not, not which pages have it.
If one page has the problem, they all do, as relative links will make the whole site crawlable that way.
And fixing non-www problems is done in .htaccess (sitewide), so finding all the pages with problems is futile.

I rest my case ... no legitimate use for them together, and even if there were, going past page 3 ... I don't think so.

tedster

4:02 am on Sep 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



no legitimate use for them together

Sure there is - have you ever tried to audit a major website with 6 or 7 figures worth of URLs indexed? Especially when the business has not done a good job of managing its legacy URL structure?

I understand that Google does need to guard against automated queries, but when this error message comes up on my second or third click, I really do wonder. And I don't have any malware making hidden queries either - I have at times monitored every packet my box is sending out in these situations.

TheSeoDude

4:34 am on Sep 2, 2007 (gmt 0)



If you use them together you need a count not every URL. Because you can not fix 6-7 figures websites url by url. You apply general fixes to problems.

So they are cool for result count but not to check each page of results.

PS: Google limits results to 1k. I'd pay to see you check a 6-7 figure website page by page, chasing URLs.

Miamacs

12:15 pm on Sep 2, 2007 (gmt 0)

10+ Year Member



come on. I work with this stuff ;)

...

Ohkay...
Well, I have this strange urge trying to tell apart the types of people on this forum ( webmasters, SEOs, spammers, MFA'ers, coders, designers, moms and pops and the combos of either ), and have to thank you for this revealing thread, it sure is funny. Your lecturing of g1smd ( playing the part of 'forum/guestbook spammer - B' ) on .htaccess and canonical issues was a blast.

You could probably tell me who *I* am.

And in the meantime 'reveal' some additional high profile ( albeit from an SEO standpoint somewhat uneasy ) practices on how to spam worthless low quality pages in bulk, I mean... only a few tricks you would accidentally know of because of your coder background. I'd be really grateful because the IBL *count* on my sites is way too low.

...

off topic, again: Google once gave me this error when I was so lazy I didn't even type in our in-house rank checker's URL, and kept on clicking from SERP to SERP too fast, trying to find -950 pages that were scattered throughout the results. The site had - surprise surprise - relevancy problems, its inbound links used an abbreviation, and the site used the full phrase. I was able to click so fast, and at such a steady pace, I seemed like a robot.

Dunno who would have taken the day off after this. *cough* I switched to a browser w/o the toolbar ( yes I have it installed ), and finished the remaining stuff at light speed. Btw site is out of -950 for everything it needs to be out for.

amznVibe

3:32 pm on Sep 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Whoa. Is someone seriously accusing me of spamming or otherwise?
I run a small webhosting company and do programming/design on the side.
I don't even really do SEO other than understanding the basic principles.

I am simply looking at how many forums are out there with a certain very specific subject. It's an unusual query but not an illicit one. I wasn't aware spammers cared about specifics and wouldn't they just use yahoo/msn instead?

ps. ahem and I am also not "my man" - there are women on webmasterworld

[edited by: amznVibe at 3:34 pm (utc) on Sep. 2, 2007]

amznVibe

4:12 pm on Sep 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Aha! I figured out the minimum magic "bad word" combination.

Try searching for the word "forums" with the word "topics".

You'll never get past the 200th result.
No "inurl" etc. needed. Just those two words.

Now that's just silly.

g1smd

4:54 pm on Sep 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> I rest my case ... no legitimate use for them together <<

If you read the stuff I wrote a couple of years ago about sites with parameters in URLs, you will see what sort of things I have been dealing with.

In particular, forums like vBulletin can expose up to 20 URL formats for each thread and a similar number for the thread listings. These differ by having the parameters in different orders and/or having extra parameters on some of them.

This is a major Duplicate Content issue that I have written about several times in the last few years. The only way to track down every rogue format is to search for them. In many cases these are problems that cannot be entirely fixed using .htaccess but instead require some editing of the scripts. A quick and dirty way is to block some formats using robots.txt but blocking the wrong one may stop bots indexing the site at all.

Which would you block or modify?

www.domain.com/forum/forumdisplay.php?f=33&page=59
www.domain.com/forum/forumdisplay.php?page=59&f=33
www.domain.com/forum/forumdisplay.php?f=33&page=59&order=desc
www.domain.com/forum/forumdisplay.php?f=33&order=desc&page=59
www.domain.com/forum/forumdisplay.php?order=desc&f=33&page=59
www.domain.com/forum/forumdisplay.php?order=desc&page=59&f=33
and another 20 to 30 formats variously having additional "daysprune", "do", "pp", "session", "sort" and other such parameters...
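
A rough sketch of how variants like the ones above could be grouped offline, assuming you already have a list of indexed URLs to work from (from server logs, say). It sorts each URL's query parameters by name so that reordered variants collapse to a single key. The names `canonical_form` and `group_by_canonical` and the optional `drop` argument are hypothetical helpers made up for this illustration; they are not part of vBulletin or of any tool mentioned in this thread.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit


def canonical_form(url, drop=()):
    """Return host + path + query, with query parameters sorted by name.

    Parameters named in `drop` are removed first (an assumption: which
    parameters are safe to ignore depends entirely on the forum script).
    """
    # The example URLs have no scheme, so add one for parsing only.
    parts = urlsplit(url if "://" in url else "http://" + url)
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in drop
    ]
    query = urlencode(sorted(params))
    return parts.netloc + parts.path + ("?" + query if query else "")


def group_by_canonical(urls, drop=()):
    """Map each canonical form to the list of URL variants that share it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_form(url, drop)].append(url)
    return dict(groups)


urls = [
    "www.domain.com/forum/forumdisplay.php?f=33&page=59",
    "www.domain.com/forum/forumdisplay.php?page=59&f=33",
    "www.domain.com/forum/forumdisplay.php?f=33&page=59&order=desc",
    "www.domain.com/forum/forumdisplay.php?f=33&order=desc&page=59",
    "www.domain.com/forum/forumdisplay.php?order=desc&f=33&page=59",
    "www.domain.com/forum/forumdisplay.php?order=desc&page=59&f=33",
]

groups = group_by_canonical(urls)
for canonical, variants in groups.items():
    print(len(variants), "variant(s) of", canonical)
```

With sorting alone, the six URLs listed above fall into two groups: one with `order=desc` and one without. Passing `drop=("order",)` would merge them into one, but whether a parameter can be ignored like that is a judgment call about the script, not something this sketch can decide.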