these ips are from google or pretending to be Googlebot

         

pramod

7:28 am on Sep 16, 2024 (gmt 0)



Hi experts,

I get a lot of direct requests from these IPs:

66.249.77.131 13,609 requests
66.249.77.129 8,105 requests
66.249.77.142 4,904 requests
66.249.77.140 4,478 requests
66.249.77.135 4,440 requests
66.249.77.128 4,381 requests
66.249.77.134 4,170 requests
66.249.77.138 4,157 requests
66.249.77.130 4,067 requests
66.249.77.141 4,023 requests

I need to know whether these IPs belong to Google's search engine or whether someone is crawling my site while pretending to be Googlebot. [developers.google.com...]

not2easy

12:05 pm on Sep 16, 2024 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Hello pramod and welcome to WebmasterWorld [webmasterworld.com]

Google owns a lot of IPs and some host their crawlers while some IPs may be hosts for sites. They tell you how to verify their bots here: https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot

In general, anyone can claim to be one of their UAs and if it seems they are aggressively crawling, you can slow the crawl rate. I would look at the requests (are they asking for actual content or are they random resource requests with 404 responses?) and the UAs.
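The check described in that guide comes down to three steps: a reverse DNS lookup on the IP, a test that the resulting hostname sits under one of Google's crawler domains, and a forward lookup to confirm the hostname resolves back to the same IP. Here is a minimal Python sketch of that idea. The function name `is_google_crawler`, the injectable `reverse`/`forward` parameters, and the exact suffix list are assumptions made for illustration, so compare the suffixes with the current guide before relying on them. It handles IPv4 only.

```python
import socket

# Assumed suffix list; confirm against Google's verification guide.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_google_crawler(ip, reverse=None, forward=None):
    """Return True if ip passes a reverse-then-forward DNS check.

    reverse(ip) -> hostname and forward(hostname) -> list of IPs are
    injectable (hypothetical parameters) so the logic can be exercised
    without network access.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must map back to the IP we started from.
        return ip in forward(host)
    except OSError:
        # No PTR record or a failed lookup: treat as unverified.
        return False
```

Calling `is_google_crawler("66.249.77.131")` with the default resolvers performs live DNS lookups, so the result depends on the resolver you run it from. Doing the forward lookup as well matters because anyone who controls the reverse zone for their own IP can make it claim any hostname.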

pramod

1:41 pm on Sep 16, 2024 (gmt 0)



Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MM B29P) AppleWebKit/537.36 (KHTML, like Gecko) Chro me/128.0.6613.137 Mobile Safari/537.36 (compatible; GoogleOther)

They are aggressively crawling.

not2easy

2:04 pm on Sep 16, 2024 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



That is not a Googlebot and doesn't claim to be one. What does their verification guide say about GoogleOther?

pramod

2:48 pm on Sep 16, 2024 (gmt 0)



https://i.ibb.co/Bf7tK05/Capture.png [ibb.co]

So can I block these IPs?

not2easy

3:34 pm on Sep 16, 2024 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Blocking decisions are up to you, but yes, if you have investigated and found them unnecessary you can block those IPs. I can't say because I did not investigate.

SumGuy

3:44 am on Sep 21, 2024 (gmt 0)

5+ Year Member Top Contributors Of The Month



I'm seeing hits from 66.249.77.0/24 starting March 12, 2021. Nothing before that.

Pretty consistently every day, about 30k hits in total (just to the HTTP side of my site). The vast majority of the time, probably 90%, it identifies itself as googlebot.

lucy24

5:13 am on Sep 21, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google uses two adjacent but very different ranges, so watch out.

66.249.64-79 (66.249.64.0/20) is the Googlebot crawl range. (I don't think anyone has ever figured out how G### does all its crawling from a single /20, while Bing requires a dozen ranges scattered all over the IPverse.) Anything claiming to be Googlebot that doesn't come from this range is a faker, though happily these are much less common than in years past.

66.249.80-95 (66.249.80.0/20) is a variety of Googloid functions that fall into YMMV territory. You'll need to consider the User-Agent, the exact IP, or both. Looking it up I see I'm currently blocking 66.249.84, which among other things is Google Proxy.
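For anyone who wants to sort log IPs into those two ranges, here is a small Python sketch using the standard `ipaddress` module. The range boundaries are taken from the description above rather than from an official list, and the function name and the labels it returns are made up for illustration.

```python
import ipaddress

# Ranges as described in the post above (not an official list).
CRAWL_RANGE = ipaddress.ip_network("66.249.64.0/20")  # 66.249.64-79
OTHER_RANGE = ipaddress.ip_network("66.249.80.0/20")  # 66.249.80-95


def classify_google_ip(ip):
    """Label an IPv4 address by which of the two /20 ranges it falls in."""
    addr = ipaddress.ip_address(ip)
    if addr in CRAWL_RANGE:
        return "crawl"
    if addr in OTHER_RANGE:
        return "other"
    return "outside"
```

With this, every address in the OP's list (66.249.77.x) lands in the "crawl" range, while something like 66.249.84.x lands in "other". The range alone does not say which User-Agent was sent, so it is only half of the picture.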

SumGuy

1:53 pm on Sep 22, 2024 (gmt 0)

5+ Year Member Top Contributors Of The Month



I don't understand why some are claiming 66.249.77.0/24 is not googlebot.

not2easy

2:17 pm on Sep 22, 2024 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Because in the OP the UA is not Googlebot. No one has said the range is never used by Google, but the requests quoted here are not using a Googlebot UA.

SumGuy

3:46 pm on Sep 22, 2024 (gmt 0)

5+ Year Member Top Contributors Of The Month



My experience is that sometimes you will not see a googlebot UA from the 66.249.x.x range. Google will use an Android or some other civilian / retail device UA. But it should not be happening 100% of the time. Is that what the OP is claiming?

not2easy

3:57 pm on Sep 22, 2024 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



What the OP is reporting is a UA that is not Googlebot; it is posted above as containing "GoogleOther", so it is not one of their typical crawlers, though it might be part of one of their services. It does appear that someone or something is crawling aggressively, though we cannot know whether every visit uses the same UA. That is why the verification URL was posted.

This is not about the entire 66.249.x.x range, only certain IP/UA combinations.
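To separate requests by the token they announce, one option is to pull out whatever follows "compatible;" in the UA string. This is a rough Python sketch; the function name and the regular expression are assumptions based on the UA quoted in this thread, and a UA string is only a claim, so it proves nothing without the IP check.

```python
import re

# Matches the "(compatible; <token>" group seen in the UA quoted in this thread.
_COMPAT = re.compile(r"\(compatible;\s*([A-Za-z][\w-]*)")


def crawler_token(user_agent):
    """Return the product token after "compatible;", or None if absent."""
    match = _COMPAT.search(user_agent)
    return match.group(1) if match else None
```

Grouping log lines by this token alongside the source IP shows which IP/UA combinations are actually in play, for example whether every hit from a given address announces the same token.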

SumGuy

12:11 am on Sep 24, 2024 (gmt 0)

5+ Year Member Top Contributors Of The Month



I have 4800 hits where "googleother" appears in the user agent. This started December 2, 2022 and continues to the present day. Usually multiple hits on the same day, usually every day.

They come from the following /24 IP blocks:

66.249.64.0
66.249.65.0
66.249.66.0
66.249.68.0
66.249.69.0
66.249.70.0
66.249.72.0
66.249.73.0
66.249.74.0
66.249.75.0
66.249.77.0
66.249.79.0

Note there are some gaps there.

I was not limiting the search to just 66.249.x.x. It could have been any IP, but it just turns out to be those IPs.

I don't find these suspicious, I'm making no attempt to block them based on the UA being "googleother". If anyone has any solid info on what google is up to with this user agent, in terms of it being "block-worthy", I'd like to hear it.
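A tally like the one above can be reproduced with a few lines of Python. This is a sketch only: it assumes you have already pulled (IP, user agent) pairs out of your access log, which depends on your log format, and the function name `hits_by_slash24` is made up for illustration. It expects IPv4 addresses.

```python
import ipaddress
from collections import Counter


def hits_by_slash24(records, needle="googleother"):
    """Count hits per /24 for records whose UA contains needle.

    records is an iterable of (ip, user_agent) pairs; extracting those
    pairs from a raw access log depends on the log format and is left out.
    """
    counts = Counter()
    for ip, user_agent in records:
        if needle.lower() in user_agent.lower():
            # strict=False lets a host address stand in for its network.
            block = ipaddress.ip_network(f"{ip}/24", strict=False)
            counts[str(block.network_address)] += 1
    return counts
```

Sorting the resulting keys makes gaps in the sequence of /24 blocks easy to spot, and because the match is on the UA substring rather than on the IP, hits from outside 66.249.x.x would show up too.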

SumGuy

2:05 pm on Sep 24, 2024 (gmt 0)

5+ Year Member Top Contributors Of The Month



GoogleOther

Affected products

Crawling preferences addressed to the GoogleOther user agent don't affect any specific product. GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development. It has no effect on Google Search or other products.

[developers.google.com...]

The description does not lead one to believe that googleother should be hitting their site every day. So from that point of view the googleother hits are unsettling.

See also:

[seroundtable.com...]

Gary Illyes from Google explained on LinkedIn that this new crawler will "replace some of Googlebot's other jobs like R&D crawls to free up some crawl capacity for Googlebot."

Gary wrote, "We added a new crawler, GoogleOther to our list of crawlers that ultimately will take some strain off of Googlebot. This is a no-op change for you, but it's interesting nonetheless I reckon. As we optimize how and what Googlebot crawls, one thing we wanted to ensure is that Googlebot's crawl jobs are only used internally for building the index that's used by Search. For this we added a new crawler, GoogleOther, that will replace some of Googlebot's other jobs like R&D crawls to free up some crawl capacity for Googlebot. The new crawler uses the same infrastructure as Googlebot and so it has the same limitations and features as Googlebot: hostload limitations, robotstxt (though different user agent token), http protocol version, fetch size, you name it. It's basically Googlebot under a different name."

There is no comment on if this crawler may or may not be used for Google Bard
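Since the quote says GoogleOther obeys robots.txt under its own user agent token, a rule addressed to that token should leave Googlebot alone. Below is a quick Python sketch that checks this with the standard library's `urllib.robotparser`. The robots.txt content and the example.com URL are hypothetical, and Python's parser is only an approximation of how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that turns away GoogleOther only.
ROBOTS_TXT = """\
User-agent: GoogleOther
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the same question on behalf of each token.
googleother_allowed = parser.can_fetch("GoogleOther", "https://example.com/page.html")
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/page.html")
```

Here `googleother_allowed` comes out False and `googlebot_allowed` comes out True, because no rule group in this file applies to Googlebot. Whether the crawler actually honours the rule is something only your logs can confirm.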

Pfui

6:41 am on Sep 28, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



1.) To pramod: Your UA details include a space in the word Chrome: "Chro me". Is that in the UA you're seeing, or a typo?

2.) Misc. details from my notes:
- May 17, 2024: 2 new variations: GoogleOther-Image and GoogleOther-Video and another: Google-Extended. "It should be noted that the data scraped by these crawlers are not explicitly for AI training data, that’s what the Google-Extended crawler is for." [searchenginejournal.com...]
- April 20, 2023 launch:
"It respects the same directives and protocols of the main Googlebot crawler and will free up some resources for the main Googlebot crawlers." [searchengineland.com...]

The latter claims -- as evidenced by hits from variations of crawl-66-249-79-20x.googlebot.com (66.249.79.20x) and in the same sessions as Googlebot -- turned out to be false. GoogleOther tried to get multiple files that are long off-limits to Googlebot. The UA used at that time:
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.175 Mobile Safari/537.36 (compatible; GoogleOther)

3.) "[GoogleOther is] basically Googlebot under a different name." Anyone see that as a head-scratcher? So if GoogleOther doesn't look like Googlebot, and doesn't act like Googlebot, then it's Googlebot? Nah.

lucy24

4:25 pm on Sep 28, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: detour to raw logs ::

Crikey. I don't think I realized I've been getting so many of them. Just Other, no -Extended. Every last one blocked, putting them in Out of sight, Out of mind territory. All from 66.249.70.various, i.e. the crawl range rather than an Assorted Googloid range.

:: follow-up to headers ::

One simple header deficit that would also apply to Googlebot, except the UA gets a hole poked. And a further header that only affects which form of robots.txt they see--or would see, if they had requested it more than once in this calendar year.

I think some botrunners, including legitimate ones that ought to know better, don't understand that you need to use the same UA when requesting robots.txt as for other requests; you can’t just waltz in as Joe Smith and expect to find out if Jim Brown is allowed to visit.

Edit:
"the data scraped by these crawlers"
There’s just something about the word “scrape” that inspires instant confidence, isn’t there.