Error in Webmaster Tools after adding a custom 404

JackR

11:28 am on Sep 12, 2008 (gmt 0)

Yesterday I added a custom .404 page after removing around 40 pages of links content. Today I see this in Webmaster Tools:

URLs not followed
When we tested a sample of the URLs from your Sitemap, we found that some URLs were not accessible to Googlebot because they contained too many redirects. Please change the URLs in your Sitemap that redirect and replace them with the destination URL (the redirect target). All valid URLs will still be submitted

The page in question currently returns the following:

#1 Server Response: HTTP Status Code: HTTP/1.1 301 Moved Permanently
#2 Server Response: HTTP Status Code: HTTP/1.1 200 OK

I have three questions:

1.) Should the page(s) in question be removed from the sitemap?

2.) Is this error normal following the addition of a custom .404 page?

3.) If not, how do I fix this error?

tedster

12:42 pm on Sep 12, 2008 (gmt 0)

Google is doing you a favor by giving you an early warning on this.

Notice that your server never actually gave a 404 response? As time goes by, every bad url that search engines find will be 301 redirected to the same custom "error" message that comes with a 200 OK header. The end result will be thousands of valid urls with duplicate content.
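For anyone reading later, the failure mode described above can be sketched in a few lines of Python. This is only an illustration, not anything from Google or from the checker tool: the function name and the labels it returns are made up, and it simply looks at the chain of status codes a request for a non-existent URL produced.

```python
def classify_missing_url(statuses):
    """Classify the status-code chain seen for a URL that should NOT exist.

    `statuses` is the list of codes in the order they were received,
    e.g. [301, 200] for a redirect that ends on a "200 OK" page.
    The labels returned are hypothetical, chosen for this sketch only.
    """
    if not statuses:
        raise ValueError("need at least one status code")
    first, final = statuses[0], statuses[-1]
    if first in (404, 410):
        # The very first response already says "not found" / "gone".
        return "proper error"
    if first == 200 or (first in (301, 302, 303, 307, 308) and final == 200):
        # A missing URL that ends in 200 OK looks like a valid page.
        return "soft 404"
    return "other"


print(classify_missing_url([301, 200]))  # the chain from the first post
print(classify_missing_url([404]))       # the chain after the fix
```

The first call reports "soft 404" and the second "proper error", which matches the before and after responses quoted in this thread.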

g1smd

12:58 pm on Sep 12, 2008 (gmt 0)

You've got about a week to fix this before you start seeing weird things in your SERPs.

Leave it unfixed for a month, and you will struggle to get back where you were in Google SERPs.

JackR

1:07 pm on Sep 12, 2008 (gmt 0)

Don't you just hate it when you work late, make a stupid mistake and then spend more time wondering why you made such a stupid mistake?

Thank you both - it's now fixed!

#1 Server Response: HTTP Status Code: HTTP/1.1 404 Not Found

I have one other related question. Which is the best way to deal with non-existent pages:

1.) A simple .404.

2.) A .404 followed by a redirect to the homepage

3.) A .404 followed by a redirect to a custom .404 page

tedster

1:13 pm on Sep 12, 2008 (gmt 0)

As long as you return the 404 it doesn't matter to Google. So it's an internal company decision based on what serves your visitors the best.

I usually use the custom message page - that way the visitor knows they asked for a problematic url, and the home page redirect doesn't give them that kind of feedback. The custom page can give helpful choices to the visitor, and the standard error message doesn't do that.

JackR

1:32 pm on Sep 12, 2008 (gmt 0)

Thanks tedster - I've sent you a Stickymail

g1smd

1:41 pm on Sep 12, 2008 (gmt 0)

I don't like redirects for 404 pages. Customise the 404 page itself to tell the user what they need to know, right there.

I would never redirect to the home page. That just doesn't seem like a good idea at all. The 404 should directly show your customised page at the URL "that doesn't exist".

To be clear, a redirect involves the browser making a new HTTP request for a different URL to the one it originally requested.

You should be looking at the HTTP status code in the HTTP header to see what is really going on.

Never rely on the fact that a user might see www.domain.com/404.html in their browser URL bar after making a request for a URL that does not exist.

In many cases, that "404" page will have returned a 200 OK status code in the HTTP Header, and you would have got there through the server previously issuing a 302 redirect when the URL you requested wasn't found.

In that case you do not have a 404 error page, you have a system for getting your "error" (sic) page indexed under an infinite number of URLs.

Only if the requested URL returns a 404 status code in the HTTP Header have you truly got yourself a proper 404 error page. What you put on that page for the human visitor to read is entirely up to you, but the bot only looks as far as the HTTP STATUS: 404 line in the HTTP Header to find out what is going on.

[edited by: g1smd at 1:52 pm (utc) on Sep. 12, 2008]

JackR

1:49 pm on Sep 12, 2008 (gmt 0)

I'm glad to see we agree g1smd,

For all pages not found, the user is currently directed to a custom .404 page which includes a .404 graphic, a two-line poem and several links to the main content on the site.

Each requested page that does not exist now returns the correct .404 HTTP Status Code.

JackR

1:56 pm on Sep 12, 2008 (gmt 0)

Just to confirm 100% that I've got it right, this is what a well-known HTTP Status Codes Checker is now reporting for the page which is causing the error in Webmaster Tools:

#1 Server Response: [example...]
HTTP Status Code: HTTP/1.1 404 Not Found
Date: Fri, 12 Sep 2008 14:33:07 GMT
Server: Apache/2.2.8 (Fedora)
Accept-Ranges: bytes
Content-Length: 4200
Connection: close
Content-Type: text/html

g1smd

1:58 pm on Sep 12, 2008 (gmt 0)

*** ... the user is currently directed to a custom .404 page ... ***

I never like to hear the words "directed to" when we are talking about 404 errors, as to me that implies that there is an extra step happening between the request for some URL, and the response with the error message being displayed. There is no such extra step, unless the browser is being redirected, and such a redirection is unnecessary, unwanted, and will cause problems.

Can you confirm (using Live HTTP Headers or somesuch) that the very first thing that comes back from the server after your request is sent to it is an HTTP header that includes these words or something similar: HTTP Status: 404 Not Found?
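If a header-viewing browser extension is not to hand, the same check can be scripted. The following is a minimal sketch in Python, assuming the standard-library http.client module, which does not follow redirects on its own; the function name first_response is made up for this example. It returns the status code of the very first response plus any Location header, which is the information being asked for here.

```python
import http.client
from urllib.parse import urlsplit


def first_response(url):
    """Return (status, Location header) of the FIRST response for `url`.

    http.client never follows redirects, so a 301/302 shows up as-is
    instead of being hidden behind the page it points to.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Called on a URL that does not exist, a correctly configured server should give back a status of 404 and no Location header. A 301 or 302 with a Location pointing at the error page means a redirect is happening.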

g1smd

2:00 pm on Sep 12, 2008 (gmt 0)

Ah. You posted while I was asking.

Yes you can confirm it.

That all looks good (apart from your server date-time-stamp being fast/ahead by about half an hour).
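For the curious, the clock offset can be worked out from the Date header quoted above. This is a rough sketch only: it assumes the check was run at about the time of the post (1:56 pm GMT), which the thread does not actually state, and the function name clock_skew is made up for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew(date_header, observed_at):
    """Return how far the server's Date header is ahead of `observed_at`."""
    return parsedate_to_datetime(date_header) - observed_at


# Date header from the response above; the observation time is ASSUMED
# to be the forum timestamp of the post that quoted it.
skew = clock_skew(
    "Fri, 12 Sep 2008 14:33:07 GMT",
    datetime(2008, 9, 12, 13, 56, tzinfo=timezone.utc),
)
print(skew)  # 0:37:07
```

Under that assumption the server clock is a little over half an hour fast, in line with the estimate above.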

JackR

2:03 pm on Sep 12, 2008 (gmt 0)

Sorry - I should have written that the visitor currently sees a custom .404 page.

I'm glad it's finally working correctly.

Thank you g1smd!

g1smd

2:10 pm on Sep 12, 2008 (gmt 0)

Bots don't care what a user "sees" in their browser. They care only that a URL that doesn't exist directly responds with (i.e. the very first response is) a "404" status code in the HTTP Header information.

Sorry to labour the point, but it is very misunderstood, and I am writing the extra detail for anyone else that reads this thread way into the future...

[edited by: g1smd at 2:11 pm (utc) on Sep. 12, 2008]

g1smd

2:32 pm on Sep 12, 2008 (gmt 0)

I see a lot of servers configured wrongly and emitting the wrong codes.

One common error when using Apache is to use a full URL including the protocol and domain name to define the 404 error page (like http://www.domain.com/error404.html or somesuch).

If you do that, the server will send a 302 status code whenever a page is not found on the server. That configuration error will cause you a LOT of problems. This behavior is highlighted in the Apache documentation, but widely overlooked or ignored.

The correct implementation specifies only the local filepath starting with a / and counting from the root of the current domain (like ErrorDocument 404 /errors/error404.html or somesuch).
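A quick way to audit existing config lines for this mistake is sketched below in Python. It is a simplified illustration rather than a real Apache parser: the function name and the three labels are made up, and it only looks at how the target of a single ErrorDocument line is written.

```python
import re


def errordocument_behaviour(directive):
    """Classify one ErrorDocument line by the form of its target.

    Returns one of three hypothetical labels:
      "redirect" - target is a full URL, so the client is redirected
      "internal" - target is a local path starting with "/"
      "other"    - anything else (e.g. a text message)
    """
    parts = directive.split(None, 2)
    if len(parts) != 3 or parts[0].lower() != "errordocument":
        raise ValueError("not an ErrorDocument directive: %r" % directive)
    target = parts[2].strip()
    if re.match(r"[A-Za-z][A-Za-z0-9+.-]*://", target):
        return "redirect"
    if target.startswith("/"):
        return "internal"
    return "other"


print(errordocument_behaviour("ErrorDocument 404 http://www.domain.com/error404.html"))
print(errordocument_behaviour("ErrorDocument 404 /errors/error404.html"))
```

The first line is reported as "redirect" and the second as "internal"; only the second form keeps the 404 status on the originally requested URL.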

JackR

6:08 pm on Sep 14, 2008 (gmt 0)

g1smd,

Can I clarify one further thing: When the visitor is presented with the .404 custom page, should the URL in the browser change from http://www.example.com/example-page.html to http://www.example.com/404page.html .. or should the original (unavailable) URL be displayed?

[edited by: tedster at 2:40 am (utc) on Sep. 15, 2008]
[edit reason] fix example urls [/edit]

g1smd

8:35 pm on Sep 14, 2008 (gmt 0)

The original URL should normally be visible.

If it isn't then there is a redirect happening. A redirect generally doesn't have a 404 code.

jdMorgan

9:04 pm on Sep 14, 2008 (gmt 0)

To emphasize:

As noted above by g1smd, a simple error (or misunderstanding) when defining your ErrorDocument can cause the server to generate a 302 redirect rather than the correct error response code. This behavior is documented in the Apache ErrorDocument documentation.

Wrong: ErrorDocument 404 http://www.example.com/path-to-404-error-document.html
Right: ErrorDocument 404 /path-to-404-error-document.html

The first ErrorDocument directive above, which includes "http://www.example.com," will result in a 302 redirect response to the client. This is the single most common cause of search engines seeing problems with server error response codes on Apache servers.

Jim

JackR

10:02 am on Sep 15, 2008 (gmt 0)

Thanks to you both once again. I just wanted to clarify that final point. The server is configured correctly and is returning a .404 Status Code when a non-existent URL is reached. That URL is still displayed in the browser.

g1smd

4:43 pm on Sep 15, 2008 (gmt 0)

That's all good to go then. It is always worth checking, as a misconfiguration can see Google and others failing to list your site properly.