Forum Moderators: phranque
"absolutely zero cross-linking to outside domains for any reason"
Also zero cross-linking from outside domains? At latest count, I see around 900 requests within this calendar year, and that's on teeny weeny sites.
80.248.227.abc - - [13/Mar/2019:01:52:01 -0700] "GET /robots.txt HTTP/1.1" 200 309 "-" "CipaCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/example.com)"
80.248.227.abc - - [13/Mar/2019:01:52:04 -0700] "GET /humans.txt HTTP/1.1" 403 929 "-" "CipaCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/example.com)"
80.248.227.abc - - [13/Mar/2019:01:52:07 -0700] "GET /ads.txt HTTP/1.1" 403 929 "-" "CipaCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/example.com)"
80.248.227.abc - - [13/Mar/2019:01:52:11 -0700] "GET / HTTP/1.1" 403 929 "-" "CipaCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/example.com)"
Wasn't I only just talking about bad reasons to request robots.txt? On the test site--the only one they hit twice--they would have met a comprehensive, all-encompassing Disallow.
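For anyone unsure what an all-encompassing Disallow amounts to, here is a rough sketch using Python's standard urllib.robotparser. The two-line robots.txt is an assumption for illustration, not the actual file from the test site, and the `allowed` helper is a made-up name.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of a blanket robots.txt; the real file on the
# test site is not shown in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if robots.txt permits `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # The paths the crawler went on to request after robots.txt.
    for path in ("/humans.txt", "/ads.txt", "/"):
        print(path, allowed("CipaCrawler/3.0", path))
```

A robot that honored a file like this would have stopped after robots.txt instead of continuing to the other three requests.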
"I see results from..."
Wow, that sounds like a solid collection.
"I don't know why you brought up..."
Because I was searching for requests for ads.txt and this was one of the very few with a name as opposed to a made-up humanoid UA. This led to noting that they were one of the very few that requested ads.txt as part of a set of requests for other stuff. If nothing else, it suggests that ads.txt is becoming a standard file that robots expect to find. (But humans.txt? Seriously? That one never did become a standard.)
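If anyone wants to run the same kind of search on their own logs, something like the following might do as a starting point. It is only a sketch: it assumes the combined log format seen in the lines quoted above, and the names `LOG_LINE`, `paths_by_visitor` and `ads_txt_visitors` are invented for this example. It groups requests by host and user-agent, then keeps the visitors that asked for /ads.txt, so you can see what else each one requested.

```python
import re
from collections import defaultdict

# Combined log format, as in the lines quoted in this thread.
# The host is matched loosely because the last octet is obfuscated.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

UA = ("CipaCrawler/3.0 (info@domaincrawler.com; "
      "http://www.domaincrawler.com/example.com)")

# The four requests from the log excerpt above.
SAMPLE = [
    f'80.248.227.abc - - [13/Mar/2019:01:52:01 -0700] "GET /robots.txt HTTP/1.1" 200 309 "-" "{UA}"',
    f'80.248.227.abc - - [13/Mar/2019:01:52:04 -0700] "GET /humans.txt HTTP/1.1" 403 929 "-" "{UA}"',
    f'80.248.227.abc - - [13/Mar/2019:01:52:07 -0700] "GET /ads.txt HTTP/1.1" 403 929 "-" "{UA}"',
    f'80.248.227.abc - - [13/Mar/2019:01:52:11 -0700] "GET / HTTP/1.1" 403 929 "-" "{UA}"',
]


def paths_by_visitor(lines):
    """Map (host, user-agent) to the list of paths requested, in log order."""
    visitors = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines that are not in combined format
            visitors[(match["host"], match["agent"])].append(match["path"])
    return dict(visitors)


def ads_txt_visitors(lines):
    """Keep only the visitors that requested /ads.txt at least once."""
    return {
        visitor: paths
        for visitor, paths in paths_by_visitor(lines).items()
        if "/ads.txt" in paths
    }


if __name__ == "__main__":
    for (host, agent), paths in ads_txt_visitors(SAMPLE).items():
        print(host, agent.split(" ")[0], paths)
```

Visitors whose path list contains only /ads.txt are the lone requests; the ones with several entries are asking for it as part of a set.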