Where is grok bot / xAI?

Does grok use anonymous VPNs to crawl sites?

SumGuy

12:18 pm on Aug 5, 2025 (gmt 0)

5+ Year Member Top Contributors Of The Month



Up to now, I have seen no user-agent containing "grok".

What I do see, with INCREASING REGULARITY, are hits using bull crap UAs from IPs that are known residential VPN networks, including SpaceX IPs. A lot of effort goes into trying to grab my PDF files, all of it easily thwarted. Googlebot and Bingbot and Applebot and Yandex and DDG and InternetArchive and maybe a very few others are allowed, because they identify themselves (or when they don't, they still come from a very tight IP range). I don't allow amazon bot or fecebook bots.

Now understand that I IP-block huge chunks of goog / MSFT / AMZ and practically all other data centers world-wide that have popped up over the years, so maybe I'm not seeing grok because it's buried in one of those ranges.

So I'm asking here if anyone has seen a grok-bot or xai-bot hit, and from where. I've done a few web searches on this and found absolutely no examples of what a grok UA looks like.

lucy24

5:47 pm on Aug 5, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: quick run to raw logs and case-insensitive search for (grok|xai) ::

Nope, nothing.
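
For anyone who wants to repeat that check, here is a minimal Python sketch of a case-insensitive search for (grok|xai). The function name and the two sample lines are made up for illustration; they are not real log entries. Note that "xai" is a short pattern and may match unrelated strings, so any hits need eyeballing.

```python
import re

# Same idea as the search above: "grok" or "xai", in any letter case.
PATTERN = re.compile(r"grok|xai", re.IGNORECASE)

def find_matches(lines):
    """Return the log lines that contain 'grok' or 'xai' (case-insensitive)."""
    return [line for line in lines if PATTERN.search(line)]

# Hypothetical sample lines using documentation IPs (192.0.2.0/24).
sample = [
    '192.0.2.10 - - [05/Aug/2025:12:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.11 - - [05/Aug/2025:12:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GrokLikeBot/0.1)"',
]

for hit in find_matches(sample):
    print(hit)
```

On a real server you would pass an open log file instead of the sample list, since a file object also iterates line by line.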

VPNs and proxies are food for a thread of their own, because they put us in “Do you have access-control rules that would get yourself blocked” territory. (Years ago, I had to un-block one proxy because it turned out to be used by a local governmental entity that made up a large part of one site’s target audience. Oh well.)

tangor

3:58 am on Aug 6, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not seen. Then again, isn't grok the really smart one that disguises itself as all those others?* --- The ones that are "allowed"?

*(Hey ma! Where's my tin foil hat?)

SumGuy

1:28 pm on Aug 6, 2025 (gmt 0)

5+ Year Member Top Contributors Of The Month



So nobody has seen a grok UA.

I might be unknowingly IP-blocking grok-bot, but others aren't, and we're still back to nobody having seen a grok UA. So if XAI has their own bot crawling the web, they don't want to be discovered or blocked, they're not using an identifying UA, and they're not requesting / following robots.txt.

There are only 2 likely possibilities for internet data harvesting by grok. One - they are using the database of another bot, like google, bing, amazon or apple. Would money have to change hands for that? Is there any known business partnership between XAI and any of those companies? Two - XAI is using VPNs or possibly SpaceX IPs, but still doing it anonymously. I'm not blocking SpaceX IPs, but I have seen dozens of bot-like hits from different SpaceX IPs. I guess there is a third possibility - XAI isn't crawling the web at all. I don't use grok, so I don't know if its user interactions point to an obvious need for web crawling on its part to answer questions.

Just now I went to grok.com. I entered a few things, but I either got "there was an error" or an endless "reconnecting ()". Grok mode was "fast". There are other modes, but it seems I need to sign in with X to use them. Either the grok web interface is pooched or there's a problem with my browser; either way I had no success in using it.

SumGuy

1:45 pm on Aug 6, 2025 (gmt 0)

5+ Year Member Top Contributors Of The Month



A little more investigating - grok.com (104.18.28.234) is hosted by Cloudflare, so that doesn't help us. x.ai is also hosted by Cloudflare (104.18.19.80).

Twitter.com also points to a Cloudflare IP.

I've only spent a few minutes on this, but I'm not able to discover any institutional IP assignments for grok / X or X.ai or twitter. I have seen twitter-bot hits in the past, so I'll look for those again to get their IP and see where that leads. But if anyone can get any whois or BGP info on grok / xai please post it here.

Edit: Ok, I found some AS numbers for twitter and X. X is based in Ireland.

AS63179
AS54888
AS35995
AS13414
AS8945 (this is X, AKA "X Internet Unlimited Company")

I'll check all the IPv4 ranges for those AS numbers and see whether I'm blocking them or have logged any web hits from them.
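
A rough sketch of that kind of check in Python, using the standard ipaddress module. The prefixes announced by those AS numbers are not listed in this thread, so the CIDR blocks below are documentation ranges standing in as placeholders; the function name is also just for illustration.

```python
import ipaddress

# PLACEHOLDER prefixes (RFC 5737 documentation ranges), NOT the real
# prefixes for the AS numbers above. Swap in ranges from a whois/BGP lookup.
PLACEHOLDER_PREFIXES = ["192.0.2.0/24", "198.51.100.0/24"]

def in_any_network(ip, cidrs):
    """Return True if the IP address falls inside any of the CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

# Hypothetical client IPs pulled from a log.
for client_ip in ["192.0.2.55", "203.0.113.9"]:
    print(client_ip, in_any_network(client_ip, PLACEHOLDER_PREFIXES))
```

The same function works for checking an existing block list: feed it the blocked ranges instead and see whether an AS prefix's addresses land inside them.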

lucy24

3:42 pm on Aug 6, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: google, google ::
> But what makes Grok different is its direct access to posts made on X. This enables Grok to have “real-time knowledge of the world,” according to the company, which gives it a “massive advantage over other models,” as Musk put it.

Yeah. I can see where information gleaned from “X” would be more accurate and reliable than information gleaned from the internet-at-large. (Though I do enjoy seeing other AI-type entities scrape my site, because everything is better with some obscure 19th-century novels mixed in.)

SumGuy

12:05 am on Aug 7, 2025 (gmt 0)

5+ Year Member Top Contributors Of The Month



So Grok is learning from X posts. What about Reddit, Quora, 4-Chan, 8-Chan? Maybe even usenet?

Brett_Tabke

1:33 am on Aug 12, 2025 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month Best Post Of The Month



Grok says it doesn't use any search engine DB, or common crawl:

> Unlike other AIs, Grok's training emphasizes X data. Based on xAI disclosures and LLM industry analyses, estimated top sources by percentage: X (Twitter) posts 35%, Common Crawl 25%, Wikipedia 10%, academic papers 8%, books 7%, code repositories 5%, news sites 4%, blogs 3%, forums 2%, other public datasets 1%. Exact details are proprietary; I supplement with real-time tools.


[x.com...]

graeme_p

4:07 pm on Sep 2, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wonder whether Grok is using pages linked to from tweets, as Twitter/X crawls them anyway to get meta data.

It mentions forums. @Brett maybe that explains a chunk of the excessive crawling you are seeing.

lucy24

9:02 pm on Sep 2, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> Twitter/X crawls them anyway to get meta data

That brings up the obvious query: is it still called the Twitterbot, or has it too renamed itself?

:: poring over raw logs, leading to embarrassing discovery that I’ve been blocking Twitterbot because I forgot to poke a hole in a pattern that accidentally fits this UA ::

fwiw, something calling itself Twitterbot is still around.

Brett_Tabke

3:12 pm on Oct 10, 2025 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month Best Post Of The Month



> graeme_p

For sure that would explain some of it. I mean - why wouldn't they use that?

SumGuy

1:12 pm on Oct 11, 2025 (gmt 0)

5+ Year Member Top Contributors Of The Month



> For sure that would explain some of it. I mean - why wouldn't they use that?

If Grok or xAI is following URLs posted on X, what would show up in the logs for:

IP address
referer
user agent

?
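
For reference, those three fields can be pulled out of an access log with something like the sketch below. It assumes the common Apache/nginx "combined" log format, which nobody in this thread has confirmed they use; the sample line, IP, and "ExampleBot" UA are invented.

```python
import re

# Regex for the "combined" access-log format (an assumption about the server).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return IP, referer and user agent for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    return {
        "ip": m.group("ip"),
        "referer": m.group("referer"),
        "user_agent": m.group("ua"),
    }

# Hypothetical request with no referer ("-"), as most robots send.
sample = ('203.0.113.7 - - [11/Oct/2025:13:12:00 +0000] '
          '"GET /page.html HTTP/1.1" 200 1234 "-" "ExampleBot/1.0"')
print(parse_line(sample))
```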

lucy24

3:44 pm on Oct 11, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> If Grok or xAi is following URL's posted on X, what would show up in the logs

Not deducible from the information given. Most robots don’t send a referer*--especially when all it means is “I originally heard about URL A from URL B, and accordingly put it on my shopping list for later”. That wouldn't be a referer in any case.


* Biggest exception, other than malign robots trying to seem human, is search engines including a page as referer when requesting supporting files. But that doesn’t address the present question.

SumGuy

2:06 am on Oct 12, 2025 (gmt 0)

5+ Year Member Top Contributors Of The Month



> Not deducible from the information given. Most robots don’t send a referer*

But they will send a user agent, and obviously the request would come from a working IP address under their control, unless they were using one of the various VPN networks, or perhaps a Starlink IP. That would make Grok / xAI a rogue search bot, especially if it didn't request and follow robots.txt.

But trying to systematically acquire and catalog content from the web by following random URLs posted to X (which was the working hypothesis here) does not sound very likely anyway.

Wjak

8:27 pm on Dec 7, 2025 (gmt 0)



"Grok's browse tool uses a disguised iPhone Safari User-Agent (e.g., "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1") to avoid blocks—not unique for strict whitelisting / blacklisting."

I was trying to add Grok to the allowlist to have it perform tasks on my website, but that's not going to be easy, is it? The above statement was Grok's answer, btw.

SumGuy

12:35 am on Dec 9, 2025 (gmt 0)

5+ Year Member Top Contributors Of The Month



> Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1

I've only seen that UA twice: in June this year and again (from a different IP) just a few days ago. Both times the request was for just my landing page file, which only bots do. No referrers, and lots of request headers were empty for both of those. Spur and AbuseIPDB say the IPs are clean.

I doubt this UA is used by Grok/xAI. I would tend to doubt that any AI/chat bots are given specific info about their own operational details. Whatever they tell you along those lines probably comes from what they stumble across on the net, public postings and such. The idea that this iPhone UA is supposedly the Grok/xAI search bot probably came from someone's reddit post.

I have seen what looks like 1 legit hit from this slight variant:

Mozilla/5.0 (iPhone; CPU iPhone OS 17_0_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1

That was back in early 2024. As a result of this detective work I'll be adding "Version/17.0 Mobile/15E148" to my list of UA bot-detection.
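
A minimal sketch of that sort of substring rule, in Python. The list and function names are hypothetical, and this is only one way such a check could be wired up. Note that the fragment appears in both UA strings quoted above, so the rule would also have flagged the one apparently legit 17_0_1 hit.

```python
# UA fragments treated as bot indicators. Only the one entry named in the
# post above is real; the list structure itself is an illustration.
BOT_UA_FRAGMENTS = [
    "Version/17.0 Mobile/15E148",
]

def looks_like_bot(user_agent):
    """Return True if the UA string contains any flagged fragment."""
    return any(fragment in user_agent for fragment in BOT_UA_FRAGMENTS)

# The two UA strings quoted in this thread.
ua_17_0 = ("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
           "AppleWebKit/605.1.15 (KHTML, like Gecko) "
           "Version/17.0 Mobile/15E148 Safari/604.1")
ua_17_0_1 = ("Mozilla/5.0 (iPhone; CPU iPhone OS 17_0_1 like Mac OS X) "
             "AppleWebKit/605.1.15 (KHTML, like Gecko) "
             "Version/17.0 Mobile/15E148 Safari/604.1")

print(looks_like_bot(ua_17_0), looks_like_bot(ua_17_0_1))
```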

lucy24

1:29 am on Dec 9, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> I'll be adding "Version/17.0 Mobile/15E148" to my list of UA bot-detection

:: quick run to raw logs ::

Hm, yes, most requests with this element are blocked. Further consultation of logged headers tells me it’s because of a certain header deficit that I won’t name, because you never know when a botrunner might be listening.

blend27

9:03 pm on Dec 9, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why not ask "grok / xAI" itself to do some things with sites that you have access to, then see what you get for IPs, requests, UAs, and headers...

Look at it in LIVE logs?..
See what rolls in....