I've searched for this answer but haven't found anything definitive.
Matt Cutts wrote, in answer to a question on how URLs can be canonicalized, "Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs". (Source: [mattcutts.com...] )
But that answer seemed to be describing how Google internally canonicalizes URLs. My question is whether, in a pending CMS upgrade, we should force/standardize currently mixed-case URLs to lowercase, given that the mixed-case versions have been indexed and ranked for years.
In other words, if example.com/FOO is currently well-indexed and ranked, should we risk renaming the page to lower-case 'foo'?
WW_Watcher
Edited to add:
"In other words, if example.com/FOO is currently well-indexed and ranked, should we risk renaming the page to lower-case 'foo'?"
If it ain't broke, don't fix it. If you decide to fix it, redirect it.
[edited by: WW_Watcher at 8:06 pm (utc) on Nov. 6, 2007]
There is some spidering evidence of Google trying to discover which sites are on non-case-sensitive servers, but that's a crazy job and I would not depend on Google or any other Search Engine getting it accurately sorted.
Help them out - if you can make all URLs lower case, that is the best practice. If you can configure your server to be case sensitive, that's another best practice. If you have a URL that is already well ranked and it uses some uppercase, then know that changing those letters to lowercase does create alternative URLs.
It is a rare thing to acquire a duplicate "penalty", but when the same content appears on technically different URLs, then that kind of duplication has negative effects. Backlink influence gets split up, one or more of the URL versions gets filtered out of search results and so on. This is not a true penalty, as in a black mark against your domain. However, the ranking and traffic problems that are generated can feel like one.
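As a rough illustration of the "make it lower case and redirect" advice, here is a minimal Python sketch of the decision a server or CMS hook might make. The function name `lowercase_redirect` is hypothetical, and the choice to lowercase only the path (leaving the query string untouched, since parameter values may be case sensitive) is an assumption, not something stated in this thread.

```python
from urllib.parse import urlsplit, urlunsplit

def lowercase_redirect(url):
    """Return (301, target) if the path contains uppercase letters, else None.

    Assumption: only the path is lowercased; the query string is left
    alone because parameter values may be case sensitive.
    """
    parts = urlsplit(url)
    lower_path = parts.path.lower()
    if lower_path == parts.path:
        return None  # already canonical, serve the page normally
    target = urlunsplit(
        (parts.scheme, parts.netloc, lower_path, parts.query, parts.fragment)
    )
    return 301, target

print(lowercase_redirect("http://example.com/FOO?id=AbC"))
# (301, 'http://example.com/foo?id=AbC')
print(lowercase_redirect("http://example.com/foo"))
# None
```

A permanent (301) status is used here because it tells crawlers that the lowercase address replaces the old one, rather than sitting alongside it.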
They could, by requesting every possible uppercase/lowercase variant of all of your pages, and then comparing the content of each of them. That is indeed possible, but your bandwidth costs would rise exponentially with the length of your URLs (each additional letter doubles the number of variants), and they'd be able to update their index at least once every five years...
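To put a number on that: each letter in a path can be upper or lower case, so a path with n letters has 2^n case variants. The short Python sketch below enumerates them; `case_variants` is a hypothetical helper and it assumes plain ASCII letters.

```python
from itertools import product

def case_variants(path):
    """List every upper/lower case spelling of a path (ASCII letters assumed)."""
    # Letters contribute two choices each; other characters contribute one.
    choices = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in path]
    return ["".join(combo) for combo in product(*choices)]

variants = case_variants("/foo")
print(len(variants))   # 8, i.e. 2**3 for the three letters
print(variants[:4])    # ['/foo', '/foO', '/fOo', '/fOO']

# A path with 20 letters would already need 2**20 requests.
print(2 ** 20)         # 1048576
```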
We as Webmasters have to accept some responsibility for the technical correctness of our domains, and cannot rely on Google to fix everything. Besides, who says they'll be the lead search provider forever?
Google did not set the rules. Those rules were set by the RFCs that defined HTTP and the other conventions used on the Web, long before Google's time.
Jim
I just realized another way to confirm this -- Google shows different TBPR values on our site for pages that don't enforce canonical capitalization, e.g.:
example.com/Foo
example.com/FOO
If the TBPR values differ, then clearly Google thinks these pages are distinct.
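A related check you can run yourself, without any toolbar, is to look for URLs that differ only in capitalization in whatever list you have to hand (for example, requested paths pulled from your access logs). This is a minimal Python sketch; `case_duplicates` and the sample list are hypothetical.

```python
from collections import defaultdict

def case_duplicates(urls):
    """Group URLs that are identical apart from letter case."""
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    # Keep only the groups where more than one spelling was seen.
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}

seen = ["example.com/Foo", "example.com/FOO", "example.com/bar"]
print(case_duplicates(seen))
# {'example.com/foo': ['example.com/FOO', 'example.com/Foo']}
```

Any group it reports is a candidate for a redirect to a single spelling.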
- http vs. https
- multiple domains and/or multiple TLDs
- www vs. non-www (the most common problem)
- named index pages vs. "/"
- "with-/" vs. "no-/" (server should redirect to "with-/" version)
- multiple paths to the same content (e.g. virtual topic/directory structure on blogs)
- multiple parameters but with the parameters in a different order
- extra parameters (e.g. for "Print Friendly" pages)
- Capitalisation Issues (IIS only)
and so on. There are very many discussions of these points in the forum archive stretching back four, or more, years.
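Several of the items in that list can be handled by one normalizing function that maps every variant to a single canonical form, which the server then redirects to. The Python sketch below is only an illustration under stated assumptions: the canonical scheme and host, the set of index file names, the parameters to drop, and the rule that a final path segment without a dot is a directory are all hypothetical choices, not anything prescribed in this thread.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# All four constants below are assumptions for illustration only.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.example.com"
INDEX_NAMES = {"index.html", "index.htm", "index.php", "default.aspx"}
DROP_PARAMS = {"print", "sessionid"}

def canonicalize(url):
    """Map a URL variant to one canonical form."""
    parts = urlsplit(url)
    path = parts.path.lower() or "/"           # capitalisation
    head, _, last = path.rpartition("/")
    if last in INDEX_NAMES:                    # named index page vs. "/"
        path = head + "/"
    elif last and "." not in last:             # "no-/" becomes "with-/"
        path += "/"                            # (assumes it is a directory)
    params = sorted(                           # fixed parameter order
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in DROP_PARAMS        # drop extra parameters
    )
    # One scheme and one host cover http/https, www/non-www, extra domains.
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, urlencode(params), ""))

print(canonicalize("https://example.com/Blog/Index.html?b=2&a=1&print=1"))
# http://www.example.com/blog/?a=1&b=2
print(canonicalize("http://www.example.com/blog"))
# http://www.example.com/blog/
```

If the incoming URL differs from the result, a 301 to the result keeps every variant pointing at one address. Multiple paths to the same content are not covered here, since that needs knowledge of the site's own structure.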
Although from a ranking perspective both can do equally well (on our site at least!), the thinking is that from a user perspective, having easy URLs all of one type is much more memorable. (We also have some hyphens and some underscores.)
With the mixed-case URLs which have backlinks to them we've been more reluctant. With the ones where there is no external influence we've just changed the URL and put a 301 on the old URL, with no problem. If there are backlinks, I'd think about trying to get them redirected and then putting a 301 on the old page...
But as was mentioned - if it ain't broke... Are you really sure you want to be tampering with it?!