A couple of days ago my website was vacuumed up by OpenAI: about 300 MB across 600 file requests, all within roughly an hour. It even fetched accessory files like the small GIFs used to render page frames and graphics, and it requested "editdata.mso" files that may once have been listed in my filelist.xml but were deleted long ago.
Most of the requests came from 74.7.241.31 and 74.7.242.174.
These are Microsoft IPs with no host names (no reverse DNS). Scamalytics adds a bit of extra info, naming the organization behind the IPs as "Cloud", and Spur identifies them as the OPENAI crawler.
I've seen hits from all over 74.7.x.y from OpenAI's GPTBot since the middle of last year, so these IPs aren't out of the ordinary, but the behavior is nothing like what I've seen before. The user agent was:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)
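For anyone who wants to check their own logs for the same pattern, here is a rough sketch that tallies requests and bytes per IP for GPTBot traffic. It assumes an Apache/Nginx access log in the standard "combined" format, and the function name `tally_gptbot` is just an illustration. It matches on the "GPTBot" substring in the user agent, which anyone can spoof, so it's worth cross-checking the IPs as well.

```python
import re
import sys
from collections import Counter

# Assumes the "combined" access log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def tally_gptbot(lines):
    """Count requests and bytes sent per client IP where the user agent mentions GPTBot."""
    requests = Counter()
    sent = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m or "GPTBot" not in m.group("agent"):
            continue  # skip malformed lines and non-GPTBot clients
        ip = m.group("ip")
        requests[ip] += 1
        if m.group("bytes") != "-":  # "-" means no body was sent (e.g. 304)
            sent[ip] += int(m.group("bytes"))
    return requests, sent


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        reqs, sent = tally_gptbot(log)
    for ip, n in reqs.most_common():
        print(f"{ip}\t{n} requests\t{sent[ip] / 1e6:.1f} MB")
```

Run it as `python tally.py access.log`. Grouping by the timestamp field as well would show whether the requests cluster into a single burst like the one described above.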
Either -
OpenAI has changed strategy. Instead of requesting a dozen or so of my (mostly PDF) files on any given day, with lots of duplicate requests over any given week, it is now downloading and caching entire sites, maybe according to some criteria?
Or - someone gave ChatGPT a specific directive to look at my site and analyze it (or my company), which resulted in this download frenzy.
Is anyone else seeing massive whole-site downloads from OpenAI?