Forum Moderators: phranque
Google asks for example.com/noexist_7f328a3ce23a7283.html type URLs on a random basis from time to time. [edited by: g1smd at 4:23 pm (utc) on Dec 9, 2011]
Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s. Both of these cases can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content.
What you're doing is incorrect and has the potential to harm your site overall (and your clients if you've implemented this on their sites). Continue doing so at your own risk.
Google asks for example.com/noexist_7f328a3ce23a7283.html type URLs on a random basis from time to time.
Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s
What do you guys think - redirect all to homepage or nicely done custom 404 page?
The thing to remember is the type of redirect. If you want to get rid of an old page and there is no similar page do a 301 redirect to the home page. Not just a redirect. The 301 means permanent redirect, the code tries to funnel traffic of a non-existing page to the home page. A 404 doesn't do that. It just says nothing here.
why it is not best practice to 301 ALL invalid requests to the home page.
Some websites report a "not found" error by returning a standard web page with a "200 OK" response code; this is known as a soft 404.
I hadn't thought of that at all, but I did a check: it does 301 to the 404 page, so it is fine.
Does a sudden large number of 404s trigger a filter? We were 301 redirecting not-found product pages to the category page and switched to 404 Not Found. This resulted in a few thousand 404 pages in the crawl errors. Right after that, the search results for every page on the domain were pushed back to page 30 or 40 in the rankings, down from mostly page-one positions.
That's not a soft 404.
Q: Tell me more about “Soft 404s.”
A: A soft 404 is when a web server returns a response code other than 404 (or 410) for a URL that doesn’t exist. A common example is when a site owner wants to return a pretty 404 page with helpful information for his users, and thinks that in order to serve content to users he has to return a 200 response code. Not so! You can return a 404 response code while serving whatever content you want. Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s. Both of these cases can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content.
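To make the "pretty 404 page with a real 404 status" point concrete, here is a minimal sketch using only Python's standard library. The PAGES table, the error-page markup, and the respond() and serve() helpers are illustrative assumptions, not anything from Google's answer; the only point is that the body and the status code are chosen independently.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: the only paths that really exist.
PAGES = {"/": b"<h1>Home</h1>"}

# A friendly error page. The body can be as helpful as you like.
NOT_FOUND_BODY = b"<h1>Page not found</h1><p>Try the <a href='/'>home page</a>.</p>"


def respond(path):
    """Return (status, body) for a request path."""
    if path in PAGES:
        return 200, PAGES[path]
    # Helpful content for the visitor, but the status code is still 404.
    return 404, NOT_FOUND_BODY


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the demo server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve() and requesting any unknown path should show the custom page in the browser while the response headers report 404 rather than 200.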
How much rank would I gain if you feed my domain with invalid links?
From that point forward we've been having a discussion on why it is not best practice to 301 ALL invalid requests to the home page. It's referred to as a Soft 404 by Google and it is suggested that you avoid it. It WILL cause indexing challenges.
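header("HTTP/1.1 301 Moved Permanently"); // send the 301 status line
header("Location: http://www.example.com"); // redirect target: the home page
exit(); // stop here so no page body is sent after the redirect headers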
No, it says the page no longer exists at the specified address and has moved to a different one.
By the same token, infinite 404s create infinite error pages.
If you have a website www.example.com and I make a request to example.com, you will do a 301 redirect to www.example.com, yes? If I feed you infinite links on example.com, you will do an infinite number of redirects, right? So why do you think this is any different? You know the requests are irrelevant, so why don't you display 404s right away?
A 301 to the canonical hostname and then a 404 is still a higher quality signal than sending an infinite number of URLs through a chain of redirects, eventually to the home page and a 200 status code.
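The "301 to the canonical hostname, then 404" sequence can be sketched as a small decision function. This is only an illustration under assumed names: CANONICAL_HOST, KNOWN_PATHS, decide() and follow() are hypothetical, and a real site would look paths up in its own content store.

```python
from urllib.parse import urlsplit

CANONICAL_HOST = "www.example.com"      # assumed canonical hostname
KNOWN_PATHS = {"/", "/products.html"}   # hypothetical list of real pages


def decide(host, path):
    """Return (status, location); location is None unless redirecting."""
    if host != CANONICAL_HOST:
        # Fix the hostname only and keep the requested path unchanged.
        return 301, f"http://{CANONICAL_HOST}{path}"
    if path not in KNOWN_PATHS:
        # Unknown path on the canonical host: say so, do not redirect.
        return 404, None
    return 200, None


def follow(host, path, max_hops=5):
    """Follow redirects locally and return the list of status codes seen."""
    chain = []
    for _ in range(max_hops):
        status, location = decide(host, path)
        chain.append(status)
        if status != 301:
            break
        parts = urlsplit(location)
        host, path = parts.netloc, parts.path or "/"
    return chain
```

With these assumptions, follow("example.com", "/foo-quux") yields [301, 404]: one hop to fix the hostname, then an honest "not found", instead of a chain that ends on the home page with a 200.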
irrelevance is not OK to a search engine or a sentient bag of flesh - it should be ignored.
If everything "404" is redirected, that would make that process more difficult.
That's bad programming. Because you are now processing the request inside your domain. The 301 takes place inside your domain and its destination url should point to a valid 200 OK page certainly not to a 404. 301 to 404 in the same domain definitely confuses spiders and will give you errors.
do a 301 redirect to the home page
And we are talking about a site that has no problems in its content.
enigma1 - The thing to remember is the type of redirect. If you want to get rid of an old page and there is no similar page do a 301 redirect to the home page. Not just a redirect. The 301 means permanent redirect, the code tries to funnel traffic of a non-existing page to the home page. A 404 doesn't do that. It just says nothing here.
You landed here because we detected an outside redirection into our site, possibly via JavaScript. This intermediate step is necessary to ensure your browser is not compromised in any way. If you did not intend to come here, click back on your browser or use its history navigation. Otherwise, if you do want to access the requested page on our site, please click the button below:
pageoneresults - I'd also be willing to set up a test page of 1,000 URIs (301>200) pointed to your site with my choice of path names and anchor text. Are you that sure that there would be "zero effect"?
enigma1 - From me you have the go-ahead; I can put it in a formal email if you wish. I get at least a thousand invalid requests daily, of which I believe I channel some traffic to my advantage. All the figures I have access to show very low consumption of resources, mainly because of the 301s.
How do you know if an image path is incorrect? I don't understand how one can manage a site WITHOUT 404s? If you get an inbound link to a money page and that link is malformed, it gets 301'd to the home page, right? Well, that ain't right is it?
Can I take it that this approval is still on the table?
I don't bloat .htaccess -like others do- with conditions, rules, ip bans and the like. I handle almost everything at the application level. Is this clear? Do you understand it? Any incoming request is now handled by the application, not by an Apache script.
I may redirect him to a local IP range or somewhere outside. Why waste any resources, since the request is trash in the first place?
If the request is irrelevant to the web content but not malformed, it goes to the home page via a 301 redirect. There is no penalty for it from SEs, and I doubt the visitor is looking for something relevant to my web content. Why would I care if you try foo-quux on my site? What's the point? We all know it is irrelevant to the web content: 301 redirect to the home page, let him start from scratch.
If the application is doing this, you've already wasted a lot more processor cycles than if mod_rewrite had bounced or blocked the request.
Sure, there are some requests that are right for the application to deal with, but certainly not all.
These requests should not be redirected at all and certainly not to the home page.
OK, once you figure out a magic way for the server to detect the application content and match requests against it, let me know.
Response headers say it is Apache (Unix)
Apache/1.3.37 (Unix) PHP/5.2.5 FrontPage/5.0.2.2510 mod_ssl/2.8.28 OpenSSL/0.9.8b
Soft 404
Some websites report a "not found" error by returning a standard web page with a "200 OK" response code; this is known as a soft 404. Soft 404s are problematic for automated methods of discovering whether a link is broken. Some search engines, like Yahoo, use automated processes to detect soft 404s.[4] Soft 404s can occur as a result of configuration errors when using certain HTTP server software, for example with the Apache software, when an ErrorDocument 404 directive (specified in a .htaccess file) is given a full URL (e.g. http://example.com/error.html) rather than a local path (/error.html).[5]
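The Apache misconfiguration described above comes down to one line in .htaccess. The fragment below is a sketch that reuses the example paths from the quoted text; the second directive is commented out because it is the form to avoid.

```apache
# Local path: Apache serves /error.html itself and keeps the 404 status.
ErrorDocument 404 /error.html

# Full URL: Apache answers with a redirect to that URL, so the client
# never receives the original 404 status. This is the soft 404 case.
# ErrorDocument 404 http://example.com/error.html
```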
Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic. Firstly, it tells search engines that there’s a real page at that URL. As a result, that URL may be crawled and its content indexed. Because of the time Googlebot spends on non-existent pages, your unique URLs may not be discovered as quickly or visited as frequently and your site’s crawl coverage may be impacted (also, you probably don’t want your site to rank well for the search query [File not found]).
The web is infinite, but the time search engines spend crawling your site is limited. Properly reporting non-existent pages with a 404 or 410 response code can improve the crawl coverage of your site’s best content. Additionally, soft 404s can potentially be confusing for your site's visitors as described in our past blog post, Farewell to Soft 404s.
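One way to check your own server for soft 404s is to do what the random noexist_ requests mentioned earlier in the thread appear to do: ask for a URL that cannot exist and look at the status code that comes back. The sketch below is an assumption about how such a check might be written, not a description of how Google does it; probe_path(), fetch_status() and classify() are hypothetical helpers.

```python
import http.client
import secrets


def probe_path():
    """Build a random path that almost certainly does not exist."""
    return f"/noexist_{secrets.token_hex(8)}.html"


def fetch_status(host, path):
    """GET the path without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def classify(status):
    """Label the response a server gave for a nonexistent URL."""
    if status in (404, 410):
        return "hard 404"
    if status in (301, 302, 303, 307, 308):
        return "soft 404 (redirect)"
    if status == 200:
        return "soft 404 (200 OK)"
    return "other"
```

For example, classify(fetch_status("www.example.com", probe_path())[0]) should report "hard 404" on a correctly configured server, and one of the soft 404 labels if unknown URLs are redirected to the home page or answered with a 200.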
I don't bloat .htaccess -like others do- with conditions, rules, ip bans and the like.