Forum Moderators: Robert Charlton & goodroi
(1) The server says the URL doesn't exist...
The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.
[w3.org...]
In addition to saying the "content" was purposely removed, a 410 says "please remove the link to this page".
As for not seeing the noindex, do you really think Google, Bing, etc. just discard the HTML of an error page without even a look?
Google can go back and tell someone that the version of a page posted N weeks ago contained a noindex tag, and that this is why the page is not currently indexed, even though the current version has no noindex and the owner, who thought they had been penalized or something, had no clue the tag was ever there when they asked.
They save and use everything they have at their disposal.
They get all the content from any error page because they use GET requests, not HEAD. A 404, 410 or other "hey, no info for you" status (401, 402, 403, 405, 406) does not stop them from getting any HTML on the error page itself: the server sends it along with the response headers.
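To make the GET vs. HEAD point concrete, here is a rough sketch using only Python's standard library. It starts a throwaway local server that answers everything with 410 Gone and then requests the same URL both ways. The handler name, the error page markup and the `/123432` path are made up for the demo; this says nothing about how any particular crawler is actually built.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical custom error page, invented for this demo.
ERROR_PAGE = (
    b"<!doctype html><html><head>"
    b'<meta name="robots" content="noindex">'
    b"<title>Gone</title></head>"
    b"<body>This page has been removed.</body></html>"
)


class GoneHandler(BaseHTTPRequestHandler):
    """Demo server that answers every request with 410 Gone."""

    def _send_headers(self):
        self.send_response(410)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(ERROR_PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(ERROR_PAGE)  # the error page travels with the 410

    def do_HEAD(self):
        self._send_headers()  # same headers, but no body is sent

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(method, port, path="/123432"):
    """Return (status, body) for one request against the local demo server."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def demo():
    """Request the same URL with GET and HEAD and return both results."""
    server = HTTPServer(("127.0.0.1", 0), GoneHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("GET", port), fetch("HEAD", port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the GET result and the HEAD result as `(status, body)` pairs. Both carry the 410 status, but only the GET result includes the error page HTML, noindex tag and all.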
If they're looking for "quality websites" to send people to, why would they not check and see if the site had useful, visitor friendly error pages or not while they have the full contents of the page right there in the system?
Added: If I had 10 links on the custom 404/410 page of a 100-page website and a page you requested could not be found, which would you, the visitor, think were the most important [top of the structural hierarchy] pages: the 90 I left unlinked or the 10 I decided to include?
Why would Google or Bing or anyone else throw that info away, since being search engines they have to try to understand hierarchy, structure and relationships within documents constantly?
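Pulling that information out of an error page is cheap once the body has been fetched. The sketch below is only an illustration, not a description of what Google or Bing actually do: it uses Python's standard `html.parser` to collect the links and any robots meta directives from an error page. `ErrorPageScanner` and `scan_error_page` are names invented for this example.

```python
from html.parser import HTMLParser


class ErrorPageScanner(HTMLParser):
    """Collects link targets and robots meta directives from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []   # href values of <a> tags, in document order
        self.robots = []  # lower-cased directives from <meta name="robots">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def scan_error_page(html):
    """Return (links, robots_directives) found in the given HTML string."""
    scanner = ErrorPageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.links, scanner.robots
```

Fed a custom 404/410 page, `scan_error_page` hands back exactly the short list of pages the site owner chose to link to, which is the kind of structural hint described above.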
"do you really think Google, Bing, etc. just discard the html of an error page without even a look?"
Any passing robot would have to assume that the page it receives (assuming it even looks at it) is the 410 page, not the page originally requested.
The first time I ever saw a known search engine requesting errorstyles.css was only a few weeks ago.*
The page seems to redirect to itself. This may result in an infinite redirect loop. Please check the Help Center article about redirects.
HTTP/1.1 410 Gone
Content-Type: text/html
Location: http://www.widget.com/123432
Date: Fri, 29 Nov 2013 17:19:28 GMT
Connection: close
Content-Length: 255

<!doctype html>
<html>
<head>
<meta charset=utf-8>
<meta name="robots" content="noindex,nofollow,noarchive">
<title></title>
</head>
<body>Go to <a href="http://www.widget.co.uk/" rel="nofollow">Widgetworld</body>
</html>
I think what's happening here is that Google is trying to second-guess the intention of the server. It should really just ignore the Location header, but I guess it sees it and thinks 'this server is probably trying to redirect' even though the status is not in the 3xx series.
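For comparison, this is roughly how a strict client would treat that response: Location is only treated as a redirect target when the status is a 3xx redirect. This is a minimal sketch in Python; `redirect_target` and the exact set of statuses are my own choices for illustration, not anything Google has documented.

```python
from urllib.parse import urljoin

# Statuses treated as redirects in this sketch (an assumption, not a spec quote).
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(status, headers, request_url):
    """Return the absolute redirect URL, or None if this is not a redirect.

    `headers` is a plain dict of response headers; names are matched
    case-insensitively. A Location header on a non-3xx status is ignored.
    """
    if status not in REDIRECT_STATUSES:
        return None
    location = next(
        (value for name, value in headers.items() if name.lower() == "location"),
        None,
    )
    if not location:
        return None
    # Location may be relative, so resolve it against the requested URL.
    return urljoin(request_url, location)
```

With the response quoted above (410 plus a Location pointing back at the same URL), this returns `None`, so there is nothing to follow and no loop to warn about.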
But there is an interesting takeaway: for all non-200 statuses, be very careful about what you're doing and how you're doing it. Don't try to be clever, and certainly don't do anything non-standard, because you can't be sure how your actions might be interpreted.