Forum Moderators: Robert Charlton & goodroi
Here's a related thread on Google's forums, with a reply from Google's John Mueller (JohnMu):
It looks like the URL of your Sitemap file is included in another one of your Sitemap files -- which is likely the reason we crawled and indexed it :). If you don't want them indexed, using the x-robots-tag HTTP header element is a great idea. An alternative would be to use a different Sitemap URL and remove this one with the URL removal tool.[google.com...]
I don't see them ranking for any keywords, but a site: query shows them.
I am using the server-based Google Sitemap Generator and there is no HTML link to the long (and generated) sitemap filename. It is of course linked to in the Sitemap index that is automatically generated.
How can I implement an X-Robots-Tag for XML.GZ files?
And if I disallow them using robots.txt, will that not prevent Google from accessing my sitemap altogether, thereby nullifying it?
:(
Google MUST spider a page even to read the noindex meta tag, so this approach does not stop Google from using your Sitemap for its intended purpose. It only stops the Sitemap from showing in the search results.
At the same time, if you are not getting any search traffic from the Sitemap's indexing, then you may want to just not worry about it.
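For the XML.GZ question above, here is a minimal sketch for Apache. It assumes mod_headers is enabled and that the generated Sitemap filenames contain "sitemap" and end in .xml or .xml.gz. That filename pattern is an assumption, so adjust it to whatever your generator actually produces. It can go in .htaccess or in the server config:

```apache
# Sketch only: assumes Apache with mod_headers available.
# The "sitemap" filename pattern below is hypothetical; change it to match your files.
<IfModule mod_headers.c>
  # Match e.g. sitemap.xml, sitemap_index.xml, sitemap_0001.xml.gz
  <FilesMatch "sitemap.*\.xml(\.gz)?$">
    # Keep these files out of search results; they can still be fetched and read
    Header set X-Robots-Tag "noindex"
  </FilesMatch>
</IfModule>
```

Note that Header set replaces any X-Robots-Tag value already on the response, while Header append adds to an existing one. After reloading, a HEAD request for one of the Sitemap URLs should show the header in the response.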
Here is some more information about the X-Robots header:
WebmasterWorld Thread [webmasterworld.com]
Google Blog [googleblog.blogspot.com]
What concerns me, and what other webmasters might want to take note of, is that Google may not handle sitemap indexes [sitemaps.org] properly. Sitemap indexes are supposed to be a way to divide large sitemaps and then point to those sitemaps. Google seems to treat such pointers as direct links to resources to be indexed.
Perhaps my mistake was submitting my sitemap index via Google Webmaster Tools (GWT). Because GWT allows you to submit multiple individual sitemap files per site, it may be redundant to also submit a sitemap index. (However, a sitemap index may be useful for Microsoft Webmaster Center, which only has one URL input per site for sitemaps.)
I was hoping Google would clear up how they're handling sitemap indexes, but I've gotten no word from them on this. Do others have this problem?
I think it's not my job as a webmaster to explain to Google, by way of an X-Robots-Tag, that a sitemap shouldn't be in search results. Google should understand that themselves.
I added the "X-Robots-Tag" HTTP header for my XML sitemaps (with a "noindex, follow" value) about 2 weeks ago.
I remember setting up instructions for the X-Robots-Tag and I don't recall seeing a follow value in the docs. There are three options with the X-Robots-Tag from what I understand...
Example of X-Robots-Tag NoIndex Directive
<Files ~ "\.(gif|jpe?g|png)$">
Header append X-Robots-Tag "noindex"
</Files>
Example of X-Robots-Tag NoFollow Directive
<Files ~ "\.(gif|jpe?g|png)$">
Header append X-Robots-Tag "nofollow"
</Files>
Example of X-Robots-Tag NoIndex, NoFollow Directive
<Files ~ "(about|contact|privacy)\.html$">
Header append X-Robots-Tag "noindex,nofollow"
</Files>
The default behavior for bots is to follow. You only need to provide directives on noindex and/or nofollow.
I don't recall seeing a follow value in the docs
"follow" wasn't mentioned by Google when they introduced the X-Robots-Tag [googleblog.blogspot.com] HTTP header, but I presume the header is based on the HTML robots meta tag, which does include it (according to the original notes [w3.org]). Since I don't know of any other detailed specifications for X-Robots-Tag, I chose to be explicit.
The default behavior for bots is to follow. You only need to provide directives on noindex and/or nofollow.
Definitely true for the HTML robots meta tag. In this case, Google [googlewebmastercentral.blogspot.com], Yahoo [ysearchblog.com], and Microsoft [bing.com] say so. Probably true for X-Robots-Tag too.
I get enough traffic originating from my sitemap that I figured this might be a low-tech workaround.
Does anything contraindicate this?