I have been swamped with error URLs (7,844). I noticed this in my Sitemaps account on Webmaster Central. These are all URLs that are not on my site; somehow Googlebot is following wrong links.
Looking at my logs, I see that they are all coming from Googlebot/2.1. Having checked around, I gather that this is a phoney!
From what I've read, I need to put lines like these in an .htaccess file to block this bot:
RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]
RewriteRule /*$ http://www.site-you-are-sending-the-bot-to.com [L,R]
I don't really know where I should send the bot.
Can somebody point me in the right direction, please?
Redirecting these to another website (even the URL they came from) is a BAD practice.
The most effective solution to offer on your end is denial of access (403).
You haven't provided an IP range for these visits, nor the full UA (user-agent string).
The real Google could be requesting deliberately invalid URLs to verify that your site returns proper 404s; however, the quantity you mention (7,844) is far too many for a single website, unless you're referring to a long time frame rather than a day, a week, or a month.
If these visits "are" coming from a FAKE Google, the most effective practice is denial.
Don
RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]
Better to block by IP address:
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.1###\.##\.###$
RewriteRule .* - [F]
Or, if you use a custom 403 page (here 403forbidden.html), you'll need to allow the blocked visitors to request it; otherwise it will create a looping problem:
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.###\.##\.##$ [OR]
RewriteCond %{REMOTE_ADDR} ^##\.1###\.##\.###$
RewriteRule !^403forbidden\.html$ - [F]
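The custom page itself is normally hooked up with Apache's ErrorDocument directive in the same .htaccess file. A minimal sketch, assuming the page lives in the site root under the name used in the rule above:

```apache
# Assumed location: 403forbidden.html in the document root
ErrorDocument 403 /403forbidden.html
```

With that in place, a denied request is answered with a 403 plus an internal request for /403forbidden.html, and the !^403forbidden\.html$ exclusion in the RewriteRule is what lets that internal request through instead of denying it again.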
RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2.1 [NC,OR]
pp46,
The line that you have (UA begins with "Googlebot") should be a safe way to deny a non-Google (fake) bot.
I went through a few of my logs, and all the genuine Google UAs begin with another word.
You will, however, NEED to remove the OR from your line IF this is the only or the last RewriteCond line that you are using.
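Combining the two suggestions made in this thread (deny with a 403 instead of redirecting, and drop the OR when the condition stands alone), the original rule might look like the sketch below. It is only a sketch: it still catches just those UA strings that literally begin with Googlebot/2.1.

```apache
RewriteEngine On
# Single condition, so no [OR]; "\." matches a literal dot
RewriteCond %{HTTP_USER_AGENT} ^Googlebot/2\.1 [NC]
# "-" means no substitution; [F] answers 403 Forbidden instead of redirecting
RewriteRule .* - [F]
```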
Don
In my experience, the spoofer just pastes the entire UA string exactly as Google uses it, so, yes, the first word is not "Googlebot", and this is another reason why that example won't block effectively.
I currently block a half dozen of these.
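Because a spoofer copies Google's full string, for example Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html), a UA test on its own cannot tell the real crawler from the fake. One way to combine the UA and IP approaches, assuming you have confirmed from your own logs the address range the genuine Googlebot uses (shown with the thread's ## placeholders, not real values), is to deny only requests that claim to be Googlebot but arrive from outside that range:

```apache
RewriteEngine On
# UA mentions Googlebot anywhere (no ^ anchor), so the copied full string matches
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# AND the address is NOT in the verified range (placeholder pattern, not real)
RewriteCond %{REMOTE_ADDR} !^##\.###\.
RewriteRule .* - [F]
```

Neither condition carries [OR], so both must be true before the request is denied; genuine Googlebot visits from the verified range pass through untouched.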