Google Submits Formal Robots Exclusion Protocol to IETF

Google has drafted a Robots Exclusion Protocol Specification and submitted it to the IETF (Internet Engineering Task Force).
Surprisingly, the Robots Exclusion Protocol has never been formalised since its inception in 1994.
Note that the rules themselves have not changed; the draft merely updates them to reflect today's web.

A few points are worth noting:

Any URI-based transfer protocol can use robots.txt. It is no longer limited to HTTP and can be used for FTP or CoAP as well.
Developers must parse at least the first 500 kibibytes of a robots.txt. Defining a maximum file size ensures that connections are not open for too long, alleviating unnecessary strain on servers.
A new maximum caching time of 24 hours, or the cache directive value if one is available, gives website owners the flexibility to update their robots.txt whenever they want, without crawlers overloading websites with robots.txt requests. For example, in the case of HTTP, Cache-Control headers could be used to determine the caching time.
The specification now provides that when a previously accessible robots.txt file becomes inaccessible due to server failures, known disallowed pages are not crawled for a reasonably long period of time.
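
For anyone writing a crawler, the size limit and caching points above could be sketched roughly as follows. This is only an illustration in Python, not code from the draft: the names truncate_body and cache_ttl are made up for this example, and reading "cache directive value" as the Cache-Control max-age value is my assumption.

```python
from typing import Optional

# Values taken from the points above; the helper names are hypothetical.
MAX_ROBOTS_BYTES = 500 * 1024        # 500 kibibytes
DEFAULT_TTL_SECONDS = 24 * 60 * 60   # 24 hours


def truncate_body(body: bytes, limit: int = MAX_ROBOTS_BYTES) -> bytes:
    """Keep only the part of robots.txt that the crawler will parse."""
    return body[:limit]


def cache_ttl(cache_control: Optional[str]) -> int:
    """Return how long (in seconds) to cache a fetched robots.txt.

    Assumption: use max-age from Cache-Control when present and numeric,
    otherwise fall back to 24 hours.
    """
    if cache_control:
        for directive in cache_control.split(","):
            name, _, value = directive.strip().partition("=")
            value = value.strip()
            if name.strip().lower() == "max-age" and value.isdigit():
                return int(value)
    return DEFAULT_TTL_SECONDS
```

A crawler would call truncate_body on the response body before parsing and cache_ttl on the Cache-Control header (if any) to decide when to refetch. Whether a max-age longer than 24 hours should be capped is left open here.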

Google says it's also updated "the augmented Backus–Naur form in the internet draft to better define the syntax of robots.txt, which is critical for developers to parse the lines."
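
To give a feel for what "parsing the lines" involves, here is a deliberately simplified sketch in Python. It is not the grammar from the internet draft: it only handles User-agent, Allow and Disallow lines plus # comments, and the function name parse_robots and the bot name in the usage example are invented for illustration.

```python
def parse_robots(text):
    """Group Allow/Disallow rules under the user-agents they apply to.

    Returns a list of (agents, rules) tuples, where agents is a list of
    lowercased user-agent tokens and rules is a list of (kind, path) pairs.
    Simplified sketch only; unknown or malformed lines are skipped.
    """
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if ":" not in line:
            continue                           # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:                          # a new group starts here
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))
    return groups
```

For example, feeding it "User-agent: examplebot" followed by "Disallow: /" yields a single group for examplebot with one disallow rule. A real parser would follow the draft's ABNF and also handle matching order, wildcards and percent-encoding.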

[webmasters.googleblog.com...]
