unavailable_after meta element, which enables you to give an expiry date to your pages. (See the thread Google Plans a New Meta Tag - "unavailable_after" [webmasterworld.com] for more information.) However, there is a second, more interesting announcement in the same entry: the ability to control Googlebot behavior via HTTP headers rather than on-page meta elements, using the X-Robots-Tag header.
We've extended our support for META tags so they can now be associated with any file. Simply add any supported META tag to a new X-Robots-Tag directive in the HTTP Header used to serve the file.
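To make the equivalence concrete, here is a rough sketch in Python (chosen only for convenience; nothing in the announcement prescribes it) that reads the robots meta elements out of an HTML page and builds the X-Robots-Tag header line that would carry the same directives. The class and function names are made up for illustration, and it assumes the directives are comma-separated as in the usual meta syntax.

```python
from __future__ import annotations

from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collects the directives from every <meta name="robots"> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # Split "noindex, noarchive" into individual directives.
            for part in attributes.get("content", "").split(","):
                if part.strip():
                    self.directives.append(part.strip())


def meta_to_x_robots_tag(html: str) -> str | None:
    """Return the equivalent 'X-Robots-Tag: ...' header line, or None if no robots meta."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    collector.close()
    if not collector.directives:
        return None
    return "X-Robots-Tag: " + ", ".join(collector.directives)


page = '<html><head><meta name="robots" content="noindex, noarchive"></head></html>'
print(meta_to_x_robots_tag(page))  # X-Robots-Tag: noindex, noarchive
```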
As mentioned in the post, this is very useful for non-HTML content such as PDF, Word or plain text [webmasterworld.com] files, where you cannot insert meta elements. You can also reduce clutter in the document itself, as well as control indexing via the server configuration rather than editing the files. One caveat not mentioned by Google is that only Googlebot supports this syntax - unless the other search engines decide to follow suit - so you will still need meta elements for Yahoo or MSN. Also, how long do you reckon we'll have to wait until the first case of a hacked server being modified to send a noindex HTTP header with every request?
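For non-HTML files the header has to come from the server. As a minimal sketch, assuming Python's standard http.server is doing the serving, a handler could add X-Robots-Tag: noindex to PDF, Word and plain-text responses. The extension list and the names NOINDEX_EXTENSIONS, robots_value_for and serve are hypothetical choices, not anything from Google's post.

```python
from __future__ import annotations

import os
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical policy: file types that cannot carry a robots meta element.
NOINDEX_EXTENSIONS = {".pdf", ".doc", ".txt"}


def robots_value_for(path: str) -> str | None:
    """Return the X-Robots-Tag value for a request path, or None to send no header."""
    # Ignore any query string before looking at the file extension.
    extension = os.path.splitext(path.split("?", 1)[0])[1].lower()
    return "noindex" if extension in NOINDEX_EXTENSIONS else None


class RobotsHeaderHandler(SimpleHTTPRequestHandler):
    """Serves the current directory, adding X-Robots-Tag where the policy says so."""

    def end_headers(self) -> None:
        # self.path is unset if the request line could not be parsed.
        value = robots_value_for(getattr(self, "path", ""))
        if value is not None:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Serve the current directory on localhost until interrupted."""
    with ThreadingHTTPServer(("127.0.0.1", port), RobotsHeaderHandler) as httpd:
        httpd.serve_forever()
```

Calling serve() and requesting any .pdf, .doc or .txt path returns the extra header, while HTML pages are left to their own meta elements.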
If your server is hacked, search engine placement is the least of your worries.
LOL, no doubt! The last time my main Unix server was hacked, I didn't have any search engine placement worries; in fact, I didn't have any websites left on it at all. Thank goodness for my backup dedicated hosting, which kept the downtime minimal.
If someone has unauthorized access to your website, there are already plenty of ways they can damage your business without any need for a new meta tag: someone can already put a nofollow tag and get you out of the serps if they have access to your server.
someone can already put a nofollow tag and get you out of the serps if they have access to your server
The comment about hackers was merely an aside and not the main part of my post in any way, but I'll just reply to this: the HTTP header is much less conspicuous, and therefore much harder to detect, than actually modifying the pages themselves or changing the robots.txt (something which has been reported as occurring in the past in order to remove a site from the index).
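Since the header never appears in the page source, the only way to spot it is to look at the response headers. A rough Python sketch of such a check follows; blocks_indexing and check_url are hypothetical helpers, and the parsing is deliberately simple: it splits values on commas and treats only noindex and none as blocking.

```python
from __future__ import annotations

import urllib.request
from email.message import Message

BLOCKING_DIRECTIVES = {"noindex", "none"}  # "none" means noindex plus nofollow


def blocks_indexing(headers: Message) -> bool:
    """True if any X-Robots-Tag header in `headers` contains noindex or none."""
    # A response may carry several X-Robots-Tag headers; get_all returns them all.
    for value in headers.get_all("X-Robots-Tag") or []:
        directives = {part.strip().lower() for part in value.split(",")}
        if directives & BLOCKING_DIRECTIVES:
            return True
    return False


def check_url(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request to `url` and report whether its headers block indexing."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return blocks_indexing(response.headers)
```

Running check_url against your own URLs from time to time would flag an unexpected noindex. Some servers answer HEAD differently from GET, so a GET-based variant may be worth comparing.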
Am I right in thinking that this is basically an "Is-robot: true" or "Is-robot: 1" HTTP header? So we no longer have to sniff the User-agent string and guess whether it's a bot or a human using a Web browser?
This is not something sent by the bot itself, so it doesn't help in identifying Googlebot. It is an HTTP header that you add to your server's response to a GET request, offering similar functionality to the more commonly seen robots meta elements. You can add the header via a server-side scripting language (PHP, etc.) or via the server configuration (Apache httpd.conf, IIS...).
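As an illustration of the scripting route, here is a minimal sketch using Python's WSGI interface in place of PHP (an assumption made to keep the example self-contained; in PHP the equivalent is a header() call made before any output). The app adds X-Robots-Tag alongside its other response headers; the body text and directive values are placeholders.

```python
from __future__ import annotations

from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Minimal WSGI app whose response asks robots not to index or archive it."""
    body = b"Placeholder document body.\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # The same directives a robots meta element would carry, sent as a header.
        ("X-Robots-Tag", "noindex, noarchive"),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port: int = 8000) -> None:
    """Run the app on localhost with the standard-library reference server."""
    with make_server("127.0.0.1", port, app) as httpd:
        httpd.serve_forever()
```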
# Needs mod_headers; matches .gif, .jpg, .jpeg and .png files
<Files ~ "\.(gif|jpe?g|png)$">
    Header append X-Robots-Tag "noindex"
</Files>
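A character class such as [eg] matches exactly one character, so filename patterns like this are easy to get subtly wrong. One quick check before deploying a rule is to run candidate patterns against a few sample names. The sketch below does that in Python, assuming Python's re behaves like Apache's PCRE for patterns this simple; the sample file names are invented.

```python
from __future__ import annotations

import re

# Candidate patterns for the <Files ~ ...> rule. A character class such as [eg]
# matches exactly one character, whereas e? makes the "e" optional.
PATTERNS = {
    "jp[eg]": re.compile(r"\.(gif|jp[eg]|png)$"),
    "jpe?g": re.compile(r"\.(gif|jpe?g|png)$"),
}

# Invented sample file names.
SAMPLES = ["logo.gif", "photo.jpg", "photo.jpeg", "scan.jpe", "chart.png", "page.html"]


def matching_samples(pattern: re.Pattern) -> list[str]:
    """Return the sample names the pattern would match."""
    return [name for name in SAMPLES if pattern.search(name)]


for label, pattern in PATTERNS.items():
    print(label, matching_samples(pattern))
```

Here photo.jpeg matches only jpe?g, while scan.jpe matches only jp[eg], so the choice of pattern decides which JPEG spellings receive the header.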
The X-Robots-Tag directive is a small step towards making robots.txt obsolete.
However, you can't use noarchive, nofollow, nosnippet, or unavailable_after in a robots.txt file. The X-Robots-Tag header is a much more powerful tool: it allows us to use these directives without needing to edit files, including for media files, PDF files, etc. that can't have meta tag directives inserted in them. It can also be combined with user-agent/IP-based delivery.
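Delivering the header selectively comes down to deciding, per request, whether to send it and with which directives. The Python sketch below assumes a made-up policy (send noarchive and nosnippet, comma-separated in one header, to anything identifying itself as Googlebot) and matches on the User-Agent string only. Because that string is trivially spoofed, a real setup would also verify the requesting IP, which is left out here.

```python
from __future__ import annotations

# Hypothetical policy: directives to send, and the crawlers that should get them.
DIRECTIVES: tuple[str, ...] = ("noarchive", "nosnippet")
TARGET_AGENTS: tuple[str, ...] = ("googlebot",)  # lower-case substrings of User-Agent


def x_robots_tag_for(user_agent: str, directives: tuple[str, ...] = DIRECTIVES) -> str | None:
    """Return the X-Robots-Tag value for this requester, or None to omit the header."""
    ua = user_agent.lower()
    if directives and any(agent in ua for agent in TARGET_AGENTS):
        # Several directives share one header value, separated by commas.
        return ", ".join(directives)
    return None


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(x_robots_tag_for(googlebot_ua))  # noarchive, nosnippet
print(x_robots_tag_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"))  # None
```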