AI bots need huge amounts of content to learn about language and context, so they have clearly been at work for some time before producing usable data for ChatGPT, Bing Chat, Bard and Vertex AI interface applications. Keeping up with the UAs that crawl your site is an important part of managing it.
The largest AI applications have made their UAs known. I expect they will evolve, as all UAs do, so I thought it would be a good idea to put the information together here. As more become known, we will have a place to look up the latest on these AI UAs.
On Sep 25, 2023, Bing announced [webmasterworld.com] that they are sharing tips to help site owners better control indexing, snippets and the use of their content for AI training. Their article has the details: [blogs.bing.com...]
On Sep 29, 2023, both OpenAI [webmasterworld.com] and Google [webmasterworld.com] shared the same type of information: how to opt out of training their AI, how to control access, and what UA strings they currently use.
Google's information about Bard's crawler is on their blog at [blog.google...], and UA information on Vertex AI through Google-Extended is at https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#google-extended
OpenAI shares information [platform.openai.com] on how to control their crawler GPTBot and ChatGPT-User in robots.txt.
GPTBot is their crawler. User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
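If you want to spot these visitors in your own logs, a minimal Python sketch along these lines might help. The function name match_ai_token and the AI_UA_TOKENS tuple are my own illustrative names, not anything published by OpenAI, and the tuple only holds the two tokens mentioned in this post. Keep in mind a UA string can be spoofed, so treat a match as a hint rather than proof.

```python
# Tokens taken from this post; extend the tuple as more AI UAs become known.
AI_UA_TOKENS = ("GPTBot", "ChatGPT-User")


def match_ai_token(user_agent):
    """Return the first known AI token found in a UA string, or None.

    The comparison is case-insensitive and is a plain substring check,
    so it is only a rough filter for log analysis.
    """
    ua = user_agent.lower()
    for token in AI_UA_TOKENS:
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    gptbot_ua = (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
        "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
    )
    print(match_ai_token(gptbot_ua))
    print(match_ai_token("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```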
ChatGPT-User is used by their chat application. It is not a crawler, but it might access sites at users' direction.
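To see how robots.txt rules for these tokens would play out before you publish them, here is a rough sketch using Python's standard urllib.robotparser. The rules in ROBOTS_TXT, the /private/ path and the example.com URLs are made up for illustration only; check each vendor's documentation for what they actually honor. The helper name can_fetch is my own.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: block GPTBot and Google-Extended everywhere,
# and keep ChatGPT-User out of a hypothetical /private/ directory.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /private/

User-agent: Google-Extended
Disallow: /
"""


def can_fetch(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ChatGPT-User", "Google-Extended", "Googlebot"):
        print(agent, can_fetch(agent, "https://example.com/page"))
```

Agents with no matching group and no wildcard group, such as Googlebot in this sample, are allowed by default.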
Please note: Current updates to this information are posted here: [webmasterworld.com...]