Forum Moderators: Robert Charlton & goodroi
We simply do a 301 redirect to the corresponding city's home page for the city-specific listing that was removed, i.e., if a listing that was in "City A" is now removed, we 301 redirect that listing page URL to the home page of "City A".
- What would be the impact on Google rankings when it suddenly starts finding a huge number of 404 errors?
What is the average lifetime of these listing pages? Is it days? Weeks? Months?
I don't think you should be doing that.
Google has said don't take non-existent pages and 301 them to the HOME page.
I think that what you are doing is PRETTY similar to that.
Personally, I would "do it the other way around", meaning serve a correct 404 or 410 header for the removed pages, but rather than a generic 404/410 error page, I'd grab the information that would have been the redirect target and display it on the 404/410 page for visitor convenience. (I would also noindex the 404/410 page, just to be safe.)
ADDED: I'd probably insert a "sorry, that page has been removed, but we've provided you with similar resources below." notice at the top of the error page so visitors know what they were looking for is not present, but there are other resources on the site they might like.
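One way to picture the "other way around" approach is the minimal WSGI sketch below. It is only an illustration under assumptions: the `REMOVED_LISTINGS` and `LIVE_LISTINGS` tables, the URL paths, and the function names are hypothetical, and a real site would route live pages before ever reaching this handler.

```python
from html import escape

# Hypothetical data: removed listing paths mapped to the city they belonged to.
REMOVED_LISTINGS = {"/city-a/listing-123": "City A"}
# Hypothetical data: live listings per city, offered as alternatives.
LIVE_LISTINGS = {"City A": ["/city-a/listing-456", "/city-a/listing-789"]}


def render_gone_page(city):
    """Build the HTML body that accompanies the 410 response."""
    links = "\n".join(
        f'<li><a href="{escape(url)}">{escape(url)}</a></li>'
        for url in LIVE_LISTINGS.get(city, [])
    )
    return (
        "<html><head><title>Listing removed</title></head><body>"
        "<p>Sorry, that page has been removed, but we've provided you "
        "with similar resources below.</p>"
        f"<h1>{escape(city)}</h1><ul>{links}</ul></body></html>"
    )


def app(environ, start_response):
    """Minimal WSGI app covering only missing pages (assumption: live
    pages are served elsewhere before this handler is reached)."""
    path = environ.get("PATH_INFO", "/")
    city = REMOVED_LISTINGS.get(path)
    if city is not None:
        # Known removed listing: 410 status, but useful content in the body.
        status = "410 Gone"
        body = render_gone_page(city).encode("utf-8")
    else:
        # Anything else unknown to this sketch: plain 404.
        status = "404 Not Found"
        body = b"<html><body><p>Page not found.</p></body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

The point of the sketch is that the status line and the body are independent: the visitor sees the city's alternatives while the crawler sees a 410.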
"(I would also noindex the 404/410 page, just to be safe.)"
This is completely unnecessary if the correct header responses are returned. Better test your response codes properly.
"What would be the impact on Google rankings when it suddenly starts finding a huge number of 404 errors."
As I said, I would send 410 instead, but in any case, as long as you are not internally linking to these pages you will be OK; the WMT "Not Found" section is for your information only.
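For anyone who wants to test response codes as suggested, a rough sketch using only the Python standard library follows. The function name `fetch_status` and the user-agent string are made up for illustration; the helper deliberately does not follow redirects, so a 301 shows up as a 301 rather than as the status of its target.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Return (status_code, Location header or None) for url.

    http.client never follows redirects on its own, so the status
    reported here is the one the server sent for this exact URL.
    """
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Hypothetical user-agent string, used only for this sketch.
        conn.request("GET", path, headers={"User-Agent": "status-check/0.1"})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling `fetch_status` on a removed listing URL should give `(410, None)` or `(404, None)` if the server is set up as described; a `301` with a `Location` value means the old redirect is still in place.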
Adding a robots noindex to the page at the same time as the response code is changed to return 404/410 will not speed up dropping the page from the index, since Google will not bother with the page HTML once it receives a 404/410 response.
[edited by: JD_Toims at 1:28 am (utc) on Aug 1, 2013]
Don't know about the others, but I was thinking "noindex" on the 404/410 page itself.
Returning a code other than 404 or 410 for a non-existent page (or redirecting users to another page, such as the homepage, instead of returning a 404) can be problematic. Such pages are called soft 404s, and can be confusing to both users and search engines.
A soft 404 is when a web server returns a response code other than 404 (or 410) for a URL that doesn't exist. A common example is when a site owner wants to return a pretty 404 page with helpful information for his users, and thinks that in order to serve content to users he has to return a 200 response code. Not so! You can return a 404 response code while serving whatever content you want. Another example is when a site redirects any unknown URLs to their homepage instead of returning 404s. Both of these cases can have negative effects on our understanding and indexing of your site, so we recommend making sure your server returns the proper response codes for nonexistent content.
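The definition quoted above can be turned into a small classifier for auditing URLs that are known to be removed. This is a sketch only: the function name and the category labels are invented here, and the "redirect to homepage" check simply looks for a `Location` whose path is `/`.

```python
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def classify_removed_url(status, location=None):
    """Classify the response a server gives for a URL known to be removed.

    status   -- HTTP status code returned for the removed URL
    location -- value of the Location header, if any
    """
    if status in (404, 410):
        return "hard 404/410"
    if status in REDIRECT_CODES and location is not None:
        path = urlsplit(location).path or "/"
        if path == "/":
            return "soft 404 (redirect to homepage)"
        # Redirects to other pages (e.g. a city page) still return a
        # non-404 code for missing content; flag them for manual review.
        return "redirect (review manually)"
    if status == 200:
        return "soft 404 (200 for missing content)"
    return "other"
```

For example, `classify_removed_url(301, "https://example.com/")` is flagged as a homepage redirect, while `classify_removed_url(410)` is reported as a hard 404/410.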
old listings are removed ... We simply do a 301 redirect to the corresponding city's home page ... lot of categories too become redundant and they are removed, so we ... 301 redirect ... either one level up/two level up/three level up/ or if none exist, then redirect to the home page.
What do you think should be done? Should we treat them as 404 or continue with the existing settings?
What would be the impact on Google rankings when it suddenly starts finding a huge number of 404 errors?
Generally, 404 errors don't impact your site's ranking in Google, and you can safely ignore them.
Q: Do the 404 errors reported in Webmaster Tools affect my site’s ranking?
A: 404s are a perfectly normal part of the web; the Internet is always changing, new content is born, old content dies, and when it dies it (ideally) returns a 404 HTTP response code. Search engines are aware of this; we have 404 errors on our own sites, as you can see above, and we find them all over the web. In fact, we actually prefer that, when you get rid of a page on your site, you make sure that it returns a proper 404 or 410 response code (rather than a “soft 404”). Keep in mind that in order for our crawler to see the HTTP response code of a URL, it has to be able to crawl that URL—if the URL is blocked by your robots.txt file we won’t be able to crawl it and see its response code. The fact that some URLs on your site no longer exist / return 404s does not affect how your site’s other URLs (the ones that return 200 (Successful)) perform in our search results.
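Since the answer above notes that a crawler cannot see a 404/410 on a URL blocked by robots.txt, it may be worth checking removed URLs against the robots.txt rules. The sketch below uses the standard library's `urllib.robotparser`; the function name, the sample rules, and the example.com URLs are assumptions for illustration, and the stdlib parser's matching may not mirror Google's own rule handling in every edge case.

```python
from urllib.robotparser import RobotFileParser


def blocked_removed_urls(robots_txt, removed_urls, user_agent="Googlebot"):
    """Return the removed URLs that robots.txt stops user_agent from
    crawling, i.e. URLs whose 404/410 status the crawler could not see."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in removed_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content and removed URLs.
sample_robots = "User-agent: *\nDisallow: /old-events/\n"
sample_urls = [
    "https://example.com/old-events/fair-2012",
    "https://example.com/city-a/listing-123",
]
blocked = blocked_removed_urls(sample_robots, sample_urls)
```

Any URL that ends up in `blocked` would need its robots.txt rule lifted before the crawler can register that the page is gone.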
We could also add the "Google Webmaster Tools 404 widget" to the 404 page which would show relevant related pages...
Me too... But unless someone is redirecting to a 404/410 rather than serving a custom 404/410 for the requested URL, then I think we're both saying the same thing.
"The URL requested will serve a 404/410 response code with a custom page that also includes a noindex meta tag." -- I think that's what we're both saying we prefer and do.
Having noindex within this HTML that is sent alongside 404 response is obsolete and will not speed up Google removing the original page out of its index.
On the other side, putting noindex in HTML that arrives with 404 response will do no harm, so if you prefer, you can do it but it will make no difference whatsoever.
Either way, my own perfunctory experimentation suggests that g### does not, in fact, index the 404 page. They may not even read it; once a robot has seen the 404 it can choose not to look at the accompanying HTML.
The only way I can get these two lines to make sense is if I read each one as meaning the opposite of what I think it was intended to mean :( (I also can't figure out what word your brain was aiming for when your fingers typed "obsolete", but that's minor.)
Are the two bits I've bolded the same thing or different things?
aakk9999 >> However - one exception - if one of the pages to be removed has very good external links, then in some cases I may decide to leave this page returning 200 OK and add a disclaimer to that page along the lines that this info does not exist any more, please go to city URL to see the full list of currently available services.
netmeg >> I never do any redirects on this stuff. I only redirect for replacements.
And yea, sometimes I have a lot of 404s when I clean out a bunch of old events, but I just have to trust that Google knows how to deal with it. At any rate, traffic on these sites has grown every year since 1999, so it's been okay so far. I'm reasonably sure if you have other quality signals that this one won't hurt you.
If you wait till you're totally sure, you'll be waiting forever. There are no "totally sure"s with Google. But personally, I think a bunch of real 404s are probably better than a bunch of soft ones.
lucy24 >> The rules that apply to the Very, Very Big Names don't necessarily apply to ordinary humans. So unless your name is Amazon, don't assume you can do whatever the competition is doing.
JD_Toims >> Eventually they will, but if the URL is noindexed when they receive a 404 they drop it nearly immediately. (I guess it's possible they've changed something since I've tested it, but a 404 + noindex has worked well for me to get pages dropped sooner in the past.)
phranque >> take a look at what google has said to webmasters recently about soft 404s.