Forum Moderators: phranque

Redirecting all 404s to home page - good or bad?

         

brix76

6:34 pm on Nov 28, 2011 (gmt 0)

10+ Year Member



Hi all,

I have been wondering whether redirecting all 404s directly to my homepage is a good idea,
or whether having a custom 404 page is better.
Some say redirecting everything to my homepage could make the robots think I have a lot of duplicate content and rank me lower.
Others say it is indeed search engine friendly - fewer 404s, better ranking.

What do you guys think - redirect all to homepage or nicely done custom 404 page?

Thanks a lot.

g1smd

4:20 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



See also [webmasterworld.com...] and I am reminded that Google asks for
example.com/noexist_7f328a3ce23a7283.html
type URLs on a random basis from time to time.
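
A minimal Python sketch of that kind of probe, for anyone who wants to check their own site. It is not Google's actual method: the function names and the "soft 404" labelling rule are assumptions made for illustration, and only the noexist_ URL pattern comes from the post above.

```python
import secrets


def probe_url(base):
    """Build a URL that almost certainly does not exist on the site."""
    token = secrets.token_hex(8)  # 16 random hex characters
    return f"{base.rstrip('/')}/noexist_{token}.html"


def classify(status_chain):
    """Classify the HTTP statuses seen, in order, for a nonexistent URL.

    Example: [301, 200] means a redirect that ended on a normal page.
    """
    if not status_chain:
        raise ValueError("empty status chain")
    final = status_chain[-1]
    if final in (404, 410):
        return "not found"   # the server reported the page as missing
    if final == 200:
        return "soft 404"    # a missing page answered as if it existed
    return "other"
```

Fetch the probe URL with any HTTP client that records each hop, then pass the list of status codes to classify().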

[edited by: g1smd at 4:23 pm (utc) on Dec 9, 2011]

pageoneresults

4:23 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



g1smd, we need to SEE it in writing from Google!

And, here it is...

Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s. Both of these cases can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content.


Do 404s hurt my site?
[GoogleWebmasterCentral.BlogSpot.com...]

enigma1

4:45 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What you're doing is incorrect and has the potential to harm your site overall (and your clients if you've implemented this on their sites). Continue doing so at your own risk.

I know what I am doing. We've been over this argument before. What you are referring to is irrelevant. These aren't links that exist in the domain. In order to get a problem you must have invalid links in your content somewhere. And a 301 or 404 won't help in these cases.

What you read in these posts is incorrect setup of 301,302,200,404 etc headers. That's what they're talking about.

Google asks for example.com/noexist_7f328a3ce23a7283.html type URLs on a random basis from time to time.

That's on the HTML site verification option which I don't use. It's not what we are talking about here. I use a verification header so it doesn't apply. It also doesn't apply for sites who don't have a gwt account and surely is not what google does to verify how a website responds to an HTTP header for indexing.

PS:
Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s

You should remember redirect is not only via 301. Again that's not what they're talking here.

pageoneresults

5:38 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



brix76 originally asked this question...

What do you guys think - redirect all to homepage or nicely done custom 404 page?


You responded with...

The thing to remember is the type of redirect. If you want to get rid of an old page and there is no similar page do a 301 redirect to the home page. Not just a redirect. The 301 means permanent redirect, the code tries to funnel traffic of a non-existing page to the home page. A 404 doesn't do that. It just says nothing here.


From that point forward we've been having a discussion on why it is not best practice to 301 ALL invalid requests to the home page. It's referred to as a Soft 404 by Google and it is suggested that you avoid it. It WILL cause indexing challenges.

enigma1

6:18 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



why it is not best practice to 301 ALL invalid requests to the home page.

That's not soft 404.
Some websites report a "not found" error by returning a standard web page with a "200 OK" response code; this is known as a soft 404.

I never suggested doing a 200 on a non-existing page with 404 content. The Google soft 404 definition pretty much says the same thing: you present 404-like content while returning some different headers.

So let me ask you this. If you have a website www.example.com and I do a request to example.com you will do a 301 redirect to www.example.com yes? If I feed you with infinite links on example.com you will do an infinite number of redirects right? So why you think this is any different, you know the requests are irrelevant why don't you display 404s right away?

Or if you do just one 301 redirect somewhere in your page I can again cause an infinite number of redirects by feeding the same script with various parameters. Are you going to have indexing problems? No you won't.

People are having problems because they use the wrong headers. Check the threads you guys posted:
haven't thought of that at all but did a check it does 301 to the 404 page so it is fine.

It's fine? If you guys think that's appropriate you have lots of problems with your code.

Here is a problem by a poster on the blogspot thread you posted that is relevant to the OP's question.
Does a sudden large number of 404s trigger a filter? We were 301 redirecting un-found products pages to the category page and switched to 404 not found. This resulted in a few thousand 404 pages in the crawl errors. Right after that, all search results for every page on the domain have been pushed back to page 30 or 40 in the rankings down from mostly page one positions.

See the phrase "crawl errors"? What I believe happened here is that the site had invalid links in its content. They switched from 301 to 404, no rank passed thereafter from the broken links to the real pages, and the pages lost position.

How much rank would I gain if you feed my domain with invalid links? None most likely, as there is no old page which holds rank and no relevant pages, so either the home page or the sitemap will be the 301 destination. In this case however I would save plenty of b/w by not immediately serving a nicely done 404. In many cases the 301 header is not followed because it's an automatic request.

pageoneresults

6:26 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's not soft 404.


Yes it is. It says it right here enigma1...

Q: Tell me more about “Soft 404s.”
A: A soft 404 is when a web server returns a response code other than 404 (or 410) for a URL that doesn’t exist. A common example is when a site owner wants to return a pretty 404 page with helpful information for his users, and thinks that in order to serve content to users he has to return a 200 response code. Not so! You can return a 404 response code while serving whatever content you want. Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s. Both of these cases can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content.


What am I missing? What am I interpreting incorrectly? My brain frackin hurts...

How much rank would I gain if you feed my domain with invalid links?


It's not the rank I'm looking at, it's the association of anchor text being permanently redirected to the destination.
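
To make the point in the quoted Google answer concrete, here is a minimal Python WSGI sketch that serves a helpful page body together with a real 404 status. The page table and the HTML are hypothetical placeholders, not anyone's actual site.

```python
# Hypothetical set of pages that really exist on the site.
KNOWN_PAGES = {"/": "<h1>Home</h1>", "/about": "<h1>About</h1>"}

# Friendly content for visitors, still sent with a 404 status code.
NOT_FOUND_BODY = (
    "<h1>Page not found</h1>"
    '<p>Try the <a href="/">home page</a> or the site search.</p>'
)


def app(environ, start_response):
    """WSGI app: 200 for known pages, 404 plus helpful content otherwise."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PAGES:
        status, body = "200 OK", KNOWN_PAGES[path]
    else:
        status, body = "404 Not Found", NOT_FOUND_BODY
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

It can be tried locally with wsgiref.simple_server.make_server("", 8000, app).serve_forever() from the standard library.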

enigma1

6:37 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe I need to put the whole sentence.
From that point forward we've been having a discussion on why it is not best practice to 301 ALL invalid requests to the home page. It's referred to as a Soft 404 by Google and it is suggested that you avoid it. It WILL cause indexing challenges.

This is not a soft 404 what's so difficult to understand?


header("HTTP/1.1 301 Moved Permanently"); // 301
header("Location: http://www.example.com");
exit();

Where do you see the soft 404?

g1smd

6:40 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s. Both of these cases can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content.

pageoneresults

6:44 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If I have to cut and paste one more link to the Soft 404 documentation... o_O

enigma1

6:48 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s.

I mentioned above. Redirects can be achieved without 301 headers.

netmeg

6:52 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@enigma1 can set up his sites any way he wants, and the chips will fall where they may.

But anyone else seriously seeking information on this - go with pageoneresults, g1smd and Google. This is really not a debatable issue, and I'm surprised it's gotten to this many responses.

JasonD

7:03 pm on Dec 9, 2011 (gmt 0)

10+ Year Member



I believe that every webmaster that operates sites in niches I compete in can only do one thing to fully comply with Google's guidelines and answer "the 404 question" once and for all.

ALL pages should return a 404 header yet still deliver the same content as previously.

In essence this will prove that you are only building your site for your human visitors and not for bots as the human experience will be identical to having your site set up correctly for them yet will deliver a clear sign to the search engine spiders.

enigma1

7:08 pm on Dec 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@enigma1 can set up his sites any way he wants, and the chips will fall where they may.

Yes where they may, but I don't think I am the one who's having infinite duplicated content.

lucy24

2:05 am on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No it says it no longer exists at the specified address and moved to a different one.

For www purposes there is no difference. Unfortunately there is currently no way to say "this exact page still exists, but its URL has changed". A 301 can mean that, but it can also mean "this other URL is probably what you need, so let's try it".

Consider, by analogy, what your browser does if you request a domain that doesn't exist-- expired, misspelled, whatever. The browser goes the rounds of DNSs but comes up cold. When this happens, the browser doesn't go back and pick the nearest match, or send you to www.example.com or IANA or site of its choice. You get the browser equivalent of a 404: the "check your typing, check your connection, try again later" screen.

tangor

2:30 am on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In the same token infinite 404s create infinite error pages

A redundancy of bad programming if this occurs. Reality: a 404 indicates bad programming (ie, I screwed up) or the requests are not hitting my page. Either gives us clues how to make the experience better. If, on the other hand, one opts to redirect all errors, we have few clues on what to fix to ensure a proper response.

Ultimately, we're all individuals with different ideas. I will continue returning a server 404. I'll use that log entry to move forward... If everything "404" is redirected, that would make that process more difficult.
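
As an illustration of using the 404 log entries this way, here is a small Python sketch. It assumes the access log is in the common "combined" format; the function name and the regular expression are illustrative, not taken from any particular tool.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "agent"
LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def not_found_report(lines):
    """Count 404 responses per (path, referer) pair, most frequent first."""
    hits = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group("status") == "404":
            hits[(m.group("path"), m.group("referer"))] += 1
    return hits.most_common()
```

The referer column shows which page carries the broken link or image path, which is exactly the clue that disappears when every miss is redirected.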

phranque

8:50 am on Dec 10, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you have a website www.example.com and I do a request to example.com you will do a 301 redirect to www.example.com yes? If I feed you with infinite links on example.com you will do an infinite number of redirects right? So why you think this is any different, you know the requests are irrelevant why don't you display 404s right away?

no and wrong - the ideal technical solution is to immediately return a 404 (Not Found) status code and serve a document clearly stating what is amiss with some useful navigation and/or site search to help the visitor to the intended destination.
if this is not technically feasible, a 301 to the canonical hostname and then a 404 is still a higher quality signal than sending an infinite number of urls through a chain of redirects eventually to the home page and a 200 status code.

irrelevance is not OK to a search engine or a sentient bag of flesh - it should be ignored.

the "OK" url space should be finite.
the "Not Found" url space is necessarily infinite.
i'm not sure how the google documentation could be more clear about how deficient and problematic your suggested solution is.
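
a rough Python sketch of that ordering, assuming one server answers for both hostnames and knows which paths exist. The hostname and path set are hypothetical.

```python
CANONICAL_HOST = "www.example.com"           # assumed canonical hostname
KNOWN_PATHS = {"/", "/about", "/products"}   # hypothetical finite "OK" url space


def respond(host, path):
    """Return (status, location); location is None unless redirecting."""
    if path not in KNOWN_PATHS:
        return 404, None                     # Not Found wins, whatever the host
    if host != CANONICAL_HOST:
        return 301, f"http://{CANONICAL_HOST}{path}"
    return 200, None
```

checking the path first means junk urls never enter a redirect chain; only real content on the wrong hostname is redirected.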

enigma1

10:10 am on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



a 301 to the canonical hostname and then a 404 is still a higher quality signal than sending an infinite number of urls through a chain of redirects eventually to the home page and a 200 status code.

A 301 redirect to a 404 page means you don't know what you're doing. That's bad programming. Because you are now processing the request inside your domain. The 301 takes place inside your domain and its destination url should point to a valid 200 OK page certainly not to a 404. 301 to 404 in the same domain definitely confuses spiders and will give you errors.

Also if you're having chained redirects more than 2 or 3 you're having a problem with your code. The chain of redirects comment is something you made up I never suggested that.

irrelevance is not OK to a search engine or a sentient bag of flesh - it should be ignored.

Sure, and irrelevant requests may get irrelevant responses. See above, where the thousands of invalid links were mentioned. I am not going to waste my b/w displaying 404 content just because a bot wants to hack a server. So if google decides it's ok to try all kinds of irrelevant requests on my site it will get the appropriate response. But that's not what's happening you see.

If your domain operates in www.example.com then any request to example.com regardless of query do a 301 redirect and point it to the home page which will return 200. This is not the same as transitioning domains.

And the documentation you read describes how the individual HTTP headers are used; it is not in the same context as what we discuss here, which is mixing headers.

If everything "404" is redirected, that would make that process more difficult.

Precisely. It is difficult to guess the appropriate page destination on a request to a non-existing page. But as I mentioned you are the webmaster of the site, you know the domain's content and where to send the visitor for an invalid request. 404 is just easier to implement but has the 2 problems that I mentioned before: it cannot transfer rank if required, and you immediately consume resources, especially with a dynamic web system.

Displaying a nice 404 etc vs doing a 301 to an appropriate page can have exactly the same results for the user experience. I mentioned in my comments in the first page of the thread. There is nothing that restricts you from displaying an appropriate message when doing the 301 to the 200 final page. The final page can display a message, there are ways to detect where the request came from. The 404 is easier to setup but not so efficient.

And we are talking about a site that has no problems in its content. All links exposed are valid and return 200 OK. Consider this along with the question of the OP. Because lots of these references you're talking about is about people who are having problems with their content in the first place. They aren't going to fix them just because they do a 404 or 301.

phranque

12:39 pm on Dec 10, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



That's bad programming. Because you are now processing the request inside your domain. The 301 takes place inside your domain and its destination url should point to a valid 200 OK page certainly not to a 404. 301 to 404 in the same domain definitely confuses spiders and will give you errors.

not necessarily - you might be an awesome programmer who knows exactly what to do and how to do it, which is why i qualified my statement with "technical feasibility".
you don't process requests in a domain - you process requests in a server which hosts a hostname.
if example.com and www.example.com are on the same server you should do everything possible to provide a 404 (or 410 if appropriate) instead of a 301 when the requested content is not available.
and i'm not talking about a root directory request here - think about a request for something closer to http://example.com/total-junk/mixed-with-some-foo?this=crap
in many cases different hostnames of the same domain will be on autonomous servers and perhaps even controlled by another organization and only the server hosting the destination hostname will have the information or technology to know which urls are OK and which are Not Found.
in these cases your only solution is to redirect the request to the right server and sort it out there.
keeping this in mind, if your first redirect is just to get to the right planet, then your proposed solution is to further degrade the signal with, and i quote you from "the first page of the thread":
do a 301 redirect to the home page

that's where the bad programming comes into play.

And we are talking about a site that has no problems in its content.

obviously!
if you're an awesome programmer you wouldn't have problems in your content, would you?
=8)

pageoneresults

1:00 pm on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Back to the topic at hand...

enigma1 - The thing to remember is the type of redirect. If you want to get rid of an old page and there is no similar page do a 301 redirect to the home page. Not just a redirect. The 301 means permanent redirect, the code tries to funnel traffic of a non-existing page to the home page. A 404 doesn't do that. It just says nothing here.


DO NOT MASS REDIRECT INVALID REQUESTS TO YOUR HOME PAGE.

enigma1, if all of your invalid requests are going to the home page, how do you determine if one of those requests should be going somewhere else? How do you determine if you've got broken links internally? How do you know if an image path is incorrect? I don't understand how one can manage a site WITHOUT 404s? If you get an inbound link to a money page and that link is malformed, it gets 301'd to the home page, right? Well, that ain't right is it?

I really can't believe we've gone 3 pages discussing this. Even with written references from Google, you are still trying to justify your implementation. It ain't happenin. ;)

By the way, did some quick research into your site, you have challenges.

You landed here because we detected an outside redirection into our site possibly via a javascript. This intermediate step is necessary to ensure your browser is not compromised in anyway. If you did not intent to come here, click back on your browser or use its history navigation. Otherwise if you indeed want to access the requested page on our site please click the button below:


^ Ya, I got that when clicking on results from Google. WTF are you doing? You've got some major issues with the site in your profile which is set up to 301>200 all invalid requests.

P.S. Please don't ask me to list the other issues that I've encountered. You seem to know what you're doing and I'm guessing there is some mad reason why. I'll tell ya what, if I were a client and found out about that process, you'd be getting a pink slip. ;)

P.P.S. You're even redirecting what should be valid requests to your home page. Sheesh!

pageoneresults

1:37 pm on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



pageoneresults - I'd also be willing to set up a test page of 1,000 URIs (301>200) pointed to your site with my choice of path names and anchor text. Are you that sure that there would be "zero effect"?


enigma1 - From me you have the go ahead, I can put in a formal email if you wish. I get at least a thousand of invalid requests daily of which I believe I channel some traffic to my advantage. All the figures I have access to show very low consumption of resources mainly because of the 301s.


Can I take it that this approval is still on the table? I'd like to experiment and test a few things. I also have some others who want to get involved too. If you're okay with me setting something up that may potentially affect your site negatively, then I'm okay with it too. Just give me a confirmation that it is okay to do what I want to do.

Note: It's going to take me a few days to set things up. I'm willing to invest the time to put my theory to the test.

incrediBILL

2:54 pm on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



MichaelBluejay had the right answer using a custom 404 response page "ErrorDocument 404 siteindex.html", which can be dynamic. All of my custom error pages are dynamic so I know it works.

Taking all 404 pages and "permanently moving it" via a 301 redirect to a 200 OK page is very bad webmastering. It's the kind of stuff that would give the Nuns a reason to whack your knuckles with a ruler in school.

Although Google can mostly detect this kind of redirected page not found nonsense and labels it a soft-404, it's just a poorly implemented thing to do that Google, or any other SE, shouldn't have to deal with in the first place.

The major offenders of implementing soft-404s described in this thread are domain parks. They don't want your link checkers to know that http://example.com/anyoldpage.html no longer points to the original site and most of them serve up all requests as a 200 OK just to keep their sites actively linked.

That's the kind of reputation you risk getting by redirecting your 404s.

The SE just might think you're doing something shady like the domain parks.

'Nuff said.

pageoneresults

3:12 pm on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



incrediBILL, didn't you build a tool called LinkScrubber that specifically detects stuff like this? I mean, you've gone out of your way to detect exactly this type of 301>200, correct?

LinkScrubber Link Checker Service - When 200 Is Not OK!

enigma1

6:04 pm on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How do you know if an image path is incorrect? I don't understand how one can manage a site WITHOUT 404s? If you get an inbound link to a money page and that link is malformed, it gets 301'd to the home page, right? Well, that ain't right is it?

Your problem is you don't want to read my responses for some reason, or you don't want to utilize your web system.

I don't bloat .htaccess -like others do- with conditions, rules, ip bans and the like. I handle almost everything at the application level. Is this clear, do you understand it? Any incoming request is handled now by the application not by an apache script.

If someone comes in with a malformed url (like a hack attempt) I may not even redirect him to my home page or anywhere in my domain. I may redirect him to a local ip range or somewhere outside. Why wasting any resources since the request is trash in the first place. And honestly I don't care who it is, bot or human. You on the other hand you're going to service a nice 404 as you said, so perhaps you should check your server stats how much you waste, servicing these requests, something you call good webmastering. Yea right.

What is so hard to believe, I cannot identify a malformed query with a CMS or E-Commerce application? Or I cannot check the format of the request or cannot compare the requested keywords with what the website is associated with, what database records has etc.
I explain the basic code functionality on my website, and furthermore it is available on sourceforge.net if you want to configure it, explore it, try it, and you can expand it; it's open source.

So putting aside malformed urls, what about normal requests to pages that no longer exist. Because that's the only type of a request you should be concerned about. So I take the query, process it through the application and I will match it to the most relevant page, then do a 301 redirect. That's right it will be relevant because I have the whole website's content in the database I can run comparisons, tests, you name it. That's the whole idea behind to have the user go to the most appropriate page. And I can display a message there too like "the item you were looking for is discontinued but we have similar items" or whatever is more appropriate subject to the request made if need be.

And finally what about the requests that have absolutely no match? Like keywords g1smd posted previously. If the request is irrelevant with the web content but not malformed it goes to the home page via a 301 redirect. There is no penalty about it, by SEs and I doubt the visitor is looking for something relevant to my web content. Why would I care if you try foo-quux on my site? What's the point? We all know is irrelevant with the web content, 301 redirect to the home page, let him start from scratch.
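
For readers following along, the matching step described above could look roughly like this in Python. It is only a sketch under assumptions: the path list and the similarity cutoff are hypothetical, and plain string similarity from difflib stands in for whatever database comparison a real application would run. What to do when nothing matches is the point in dispute in this thread, so the function simply returns None in that case.

```python
import difflib

# Hypothetical list of paths that currently exist on the site.
LIVE_PATHS = ["/products/blue-widget", "/products/red-widget", "/about"]


def closest_live_path(requested, cutoff=0.8):
    """Return the most similar live path, or None if nothing is close enough."""
    matches = difflib.get_close_matches(requested, LIVE_PATHS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A request for /products/blue-widgit would map to /products/blue-widget, while something like /foo-quux has no close match and returns None.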

The google documentation that talks about redirects, not-found pages etc, is basic documentation that won't consider application specifics. Otherwise it would have been a thesis 1000s of pages long of script examples and possibilities. What do you expect it to tell you? You should be investigating these.

If you are so convinced that 301 to 200 is a soft redirect, why don't you test it on gwt? It's not that difficult see what errors are coming up. (gwt->crawl errors->web). I have tested it I see nothing despite my "infinite redirect space" as you call it. So no, what you claim above is completely wrong. The spider doesn't treat them as soft 404s. And why don't you test your internal 301->404 theory that you work with, to see what will come up. Force a request to a 301 redirect inside your domain and point it to a 404.

Can I take it that this approval is still on the table?

Yes pageoneresults you have my approval.

g1smd

8:08 pm on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't bloat .htaccess -like others do- with conditions, rules, ip bans and the like. I handle almost everything at the application level. Is this clear, do you understand it? Any incoming request is handled now by the application not by an apache script.

That's much like a night club deciding to not employ two bouncers on the door to keep riff raff out, and then having to employ ten security staff inside the building to clean up the mess.

Rejecting some requests at the front door, using mod_rewrite, mod_auth, etc is way more efficient than passing the request into the server application and then sorting out the mess from there. Sure, there are some requests that are right for the application to deal with, but certainly not all.

I may redirect him to a local ip range or somewhere outside. Why wasting any resources since the request is trash in the first place.

If the application is doing this, you've already wasted a lot more processor cycles than if mod_rewrite had bounced or blocked the request. And if the mod_rewrite rules are in httpd.conf rather than htaccess they're compiled into the server configuration and run even more efficiently.

If the request is irrelevant with the web content but not malformed it goes to the home page via a 301 redirect. There is no penalty about it, by SEs and I doubt the visitor is looking for something relevant to my web content. Why would I care if you try foo-quux on my site? What's the point? We all know is irrelevant with the web content, 301 redirect to the home page, let him start from scratch.

These requests should not be redirected at all and certainly not to the home page.

enigma1

8:38 pm on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the application is doing this, you've already wasted a lot more processor cycles than if mod_rewrite had bounced or blocked the request.

I will do the security at the application level. You keep the bloat on the server script, which hasn't got a clue about the request. That's our difference. And you assume too much about the server environment and setup.

Sure, there are some requests that are right for the application to deal with, but certainly not all.

Ok, once you figure out a magic way the server artificially detects the application content and matches requests against, let me know.

These requests should not be redirected at all and certainly not to the home page.

Why don't you check how much b/w you waste by serving the nice 404 pages since you're concerned about the processor cycles. It's in your stats.

g1smd

9:30 pm on Dec 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I know exactly the size of the returned 404 or 410 response for those requests. It's only a few hundred bytes or less but I'll send those error status codes rather than introduce infinite URL space by redirecting those requests elsewhere.

Ok, once you figure out a magic way the server artificially detects the application content and matches requests against, let me know.

Patterns: Regular Expressions. There's a whole bunch of stuff that doesn't even get through the front door. It's nuked on the doorstep.
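
The same idea sketched in Python rather than mod_rewrite, purely as an illustration. The patterns below are hypothetical examples for a site that serves no scripts at those paths, and the choice of a 404 status is an assumption; 410 or 403 would fit the same structure.

```python
import re

# Hypothetical patterns for requests this site never serves.
BLOCKED = [
    re.compile(r"\.(?:php|asp|cgi)$", re.IGNORECASE),
    re.compile(r"^/(?:wp-admin|phpmyadmin)(?:/|$)", re.IGNORECASE),
    re.compile(r"\.\./"),  # path traversal attempts
]


def front_door(path):
    """Return 404 if the path matches a blocked pattern, else None.

    None means the request is passed on to the application.
    """
    for pattern in BLOCKED:
        if pattern.search(path):
            return 404
    return None
```

The check is a handful of regex tests on the path alone, so a rejected request never reaches the application or the database.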

pageoneresults

1:09 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



enigma1, what type of server are you on? It's Windows huh?

aakk9999

1:21 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



enigma1, what type of server are you on? It's Windows huh?

Response headers say it is Apache (Unix)

pageoneresults

1:40 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Response headers say it is Apache (Unix)


Heh, and I knew that! That's what I get for replying at 05:09 on a SundADD morning. :)

Apache/1.3.37 (Unix) PHP/5.2.5 FrontPage/5.0.2.2510 mod_ssl/2.8.28 OpenSSL/0.9.8b


It's not every day you see ASP running on Apache.

Hey enigma1, when was the last time you checked for broken images on your site? Oh wait, you've got to go through all those 301s and figure out which ones they are, huh? For every page load, there are four http responses that are 301>200 to your home page for broken images.

incrediBILL

3:43 pm on Dec 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let's play a rousing round of Debunko Squad as the FUD being shoveled is getting eyebrow deep.

We shall start on the definition of a Soft 404 error.

From the wiki on Soft 404's [en.wikipedia.org]
Soft 404

Some websites report a "not found" error by returning a standard web page with a "200 OK" response code; this is known as a soft 404. Soft 404s are problematic for automated methods of discovering whether a link is broken. Some search engines, like Yahoo, use automated processes to detect soft 404s.[4] Soft 404s can occur as a result of configuration errors when using certain HTTP server software, for example with the Apache software, when an Error Document 404 (specified in a .htaccess file) is specified as an absolute path (e.g. http://example.com/error.html) rather than a relative path (/error.html).[5]


Furthermore, Google adds warnings about Soft 404 pages [support.google.com] causing indexing problems:
Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic. Firstly, it tells search engines that there’s a real page at that URL. As a result, that URL may be crawled and its content indexed. Because of the time Googlebot spends on non-existent pages, your unique URLs may not be discovered as quickly or visited as frequently and your site’s crawl coverage may be impacted (also, you probably don’t want your site to rank well for the search query [File not found]).


Additionally, Google claims proper 404 reporting [googlewebmastercentral.blogspot.com] can improve your site crawl!
The web is infinite, but the time search engines spend crawling your site is limited. Properly reporting non-existent pages with a 404 or 410 response code can improve the crawl coverage of your site’s best content. Additionally, soft 404s can potentially be confusing for your site's visitors as described in our past blog post, Farewell to Soft 404s.



There, that issue is settled, redirecting 404s to a 200 OK page is by definition a "soft-404" and can actually cause problems with your site being indexed, at least that's what Google claims.

Just ignore them, what would Google possibly know about this issue, probably nothing.

The proper solution was, and still is, a custom 404 error page showing the content you want to show the end user, but it's still a 404 page.

I don't bloat .htaccess -like others do- with conditions, rules, ip bans and the like.


This straw man fallacy improperly implies that anyone using .htaccess bloats their .htaccess files which is ridiculous unless it's a novice at Apache (or incompetent) and doesn't really understand .htaccess files in the first place.

Can you say RewriteMap?

That's how you avoid bloated files and maximize speed in Apache.