Is it wise to use robots.txt to disallow the old URLs, or will that prevent Google from seeing the 301 redirects that point to the new URLs? Or is it better to keep the disallow so that the defunct URLs are replaced more quickly by the new ones?
<added>If there are significant inbound links to the old URIs, you'll want to keep the 301s in place long term. If not, you can remove the 301s after 3-6 months, at which point the old URIs will 404 out of existence.</added>
Depending on the PageRank of each old URL, it will take anywhere between two weeks and a year to resolve all these redirects. As for requests for URLs returning 404s, it may take even longer.
Note that since you removed those old 404 URLs intentionally, you should be returning a 410-Gone response if they are requested. Search engines currently seem to treat 404 and 410 identically, but 410 is the correct response because it says "We removed this resource and it won't be back," rather than "The requested resource cannot be found, the reason is unknown, and the resource may or may not return."
Jim
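To make the 301 / 410 / 404 distinction above concrete, here is a minimal sketch of how a server might pick the status code for a requested path. The `MOVED` and `REMOVED` tables and the `resolve` function are hypothetical names made up for illustration, not anything from a particular server or framework; on a real site this logic would more likely live in the web server's redirect configuration.

```python
from http import HTTPStatus

# Hypothetical example data: old paths that now live at a new URL,
# and old paths that were removed on purpose with no replacement.
MOVED = {"/old/widgets.html": "/products/widgets"}
REMOVED = {"/old/discontinued.html"}


def resolve(path):
    """Return (status, redirect_target) for a requested path."""
    if path in MOVED:
        # Content still exists elsewhere: permanent redirect.
        return HTTPStatus.MOVED_PERMANENTLY, MOVED[path]
    if path in REMOVED:
        # Removed intentionally and not coming back: 410 Gone.
        return HTTPStatus.GONE, None
    # Unknown path, reason unknown: plain 404.
    return HTTPStatus.NOT_FOUND, None
```

Under these assumptions, `resolve("/old/widgets.html")` yields a 301 with the new location, `resolve("/old/discontinued.html")` yields a 410, and any path in neither table falls through to a 404.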
One exception that I thought of is that brand-new URLs that are not related to any old ones appear to be ranking better, so I thought that if Google just indexed the new URLs from scratch, the result might be better. But it was just a theory.
I will take your advice and remove the disallow for the 301-redirected URLs...
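As a rough illustration of why the disallow gets in the way, the sketch below uses Python's standard `urllib.robotparser` to check whether a compliant crawler would be allowed to request an old URL at all. If it is not allowed to request the URL, it never receives the 301 response. The robots.txt contents, the `/old/` path, and the `www.example.com` host are assumptions for the example, and real crawlers may differ in the details of how they interpret robots.txt.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="Googlebot"):
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt that blocks the old path.
BLOCKED = "User-agent: *\nDisallow: /old/\n"
# Same file with the disallow removed (empty Disallow allows everything).
OPEN = "User-agent: *\nDisallow:\n"

OLD_URL = "http://www.example.com/old/page.html"

# With the disallow in place the old URL may not be fetched,
# so the redirect on it cannot be discovered by crawling.
blocked_result = can_crawl(BLOCKED, OLD_URL)
# With the disallow removed the old URL can be fetched and the 301 seen.
open_result = can_crawl(OPEN, OLD_URL)
```

Here `blocked_result` is `False` and `open_result` is `True`, which matches the advice to drop the disallow on the redirected URLs.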
I've had them blocked via robots.txt from crawling an old path, and Google Webmaster Tools still complains that I have over 30K "URLs restricted by robots.txt".
Doesn't seem to hurt my site any but they won't let those URLs go.
[edited by: incrediBILL at 10:00 pm (utc) on June 23, 2008]
That is NOT a problem, because if anyone clicks any such entry in the SERPs, your redirect will deliver them to the correct URL and to the correct content on your site anyway.
That is what the redirect is for.