Forum Moderators: Robert Charlton & goodroi


Case sensitive URLs cause duplicate google cache

         

warth0g

10:12 pm on Feb 1, 2007 (gmt 0)

10+ Year Member



I have a site that uses urls ending like a directory, like so:
/Ohmaha/Nebraska
I was just checking and found that google had a different cache for that page, dated Jan 31, than for /Ohmaha/nebraska (small n), which has a cache date of Jan 28th!

the pages are not case sensitive, yet google is caching them as 2 different pages. is this going to cause a duplicate content penalty?

yikes

tedster

10:47 pm on Feb 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



from the W3C website [w3.org]:

URLs in general are case-sensitive (with the exception of machine names). There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy. Users should always consider that URLs are case-sensitive.

So when your server is set up to ignore case, you are serving the same content for two or more different urls. Google rightly treats different capitalization as different urls, and you can get various complications, as you just discovered with the cache. The two urls will also have different real PR behind the scenes, most likely splitting the possible PR into two or more "piles".

It's not strictly speaking a "penalty" but it sure impairs your ability to rank well, especially if you are casual about capitalization in your own anchor tags.
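
To make the distinction concrete, here is a minimal Python sketch of how a crawler might normalize urls: only the scheme and host are lowercased, because those are the case-insensitive parts, while the path is left alone. The function name normalize_url and the example.com host are made up for illustration, and this is not a claim about how Google actually does it.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase only the case-insensitive parts of a url (scheme and host).

    Hypothetical helper for illustration. It assumes the url carries no
    user:password@ part, since lowercasing netloc would alter that too.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,      # path case is preserved: it may be significant
        parts.query,
        parts.fragment,
    ))


a = normalize_url("http://WWW.Example.com/Ohmaha/Nebraska")
b = normalize_url("http://www.example.com/Ohmaha/nebraska")

# Same host after normalization, but still two distinct urls.
print(a == b)                  # False
print(a.lower() == b.lower())  # True: they differ only by case
```

A server that ignores path case answers both urls with the same page, which is exactly how one page ends up with two cache entries.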

warth0g

11:16 pm on Feb 1, 2007 (gmt 0)

10+ Year Member



the domain names are not case sensitive, and neither are the paths - all microsoft servers are like this:

www.microsoft.com/en/us/default.aspx

is the same as

www.microsoft.com/En/Us/default.aspx

is the same as

www.microsoft.com/En/Us/DeFaUlt.aSpx

this is how the operating system works. i always make sure my links use correct capitalization, but i can't control other people linking to my site with incorrect case

so when i say URLs i actually mean the script name or document path. i don't see any way around this - i'm just surprised google would not ignore case

theBear

11:42 pm on Feb 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are rewrite rulesets that can fix that issue.

While I'm at it, Google is only following the standards.
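
The logic behind such a ruleset is small. Here is a rough Python sketch of the decision a rewrite rule would make, assuming the all-lowercase path is chosen as the canonical one. The function name lowercase_redirect is made up for illustration; a real setup would express this in the server's own rewrite syntax instead.

```python
from typing import Optional, Tuple


def lowercase_redirect(path: str, query: str = "") -> Optional[Tuple[int, str]]:
    """Return (301, location) if the path needs redirecting, else None.

    Hypothetical helper: assumes the canonical form of every path is
    all lowercase. The query string is passed through unchanged because
    its case may matter to the application.
    """
    lowered = path.lower()
    if lowered == path:
        return None  # already canonical, serve the page normally
    location = lowered + ("?" + query if query else "")
    return 301, location


print(lowercase_redirect("/Ohmaha/Nebraska"))   # (301, '/ohmaha/nebraska')
print(lowercase_redirect("/ohmaha/nebraska"))   # None
```

A permanent (301) redirect is used here so that links pointing at a miscapitalized url end up credited to the single canonical one.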

[edited by: theBear at 11:43 pm (utc) on Feb. 1, 2007]

BigDave

12:09 am on Feb 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you identified your own problem:
"all microsoft servers are like this"

Microsoft always has trouble reading specs.

tedster

12:17 am on Feb 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes - by default Microsoft servers are set up in opposition to the web standard on case sensitivity. A wee bit of corporate hubris that everyone who uses their products now suffers from and should account for.

At a minimum, I ask my IIS clients to restrict their urls to lower case. This practice at least helps them not to get so many bad backlinks, or creative variations in url case around the site, written by various members of their web team.

If they're already rolling along with mixed case, then they need to inspect every anchor on the site for incorrect capitalization. It's a stupid, tedious job made necessary by a misguided product development team at Redmond that has long been focused internally. Now they have legacy support to consider if they ever decide to join the rest of the world.
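
The anchor inspection can be partly automated. As a minimal sketch, the Python below scans a page's html and reports every anchor whose url path contains uppercase characters. It assumes lowercase is the canonical form; the names AnchorCaseAuditor and find_mixed_case_links are invented for this example, and a site that standardized on mixed case would need to compare against its own list of canonical paths instead.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCaseAuditor(HTMLParser):
    """Collect href values of <a> tags whose path is not all lowercase."""

    def __init__(self):
        super().__init__()
        self.mixed_case_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Only the path is checked; host case does not matter.
                path = urlsplit(value).path
                if path != path.lower():
                    self.mixed_case_hrefs.append(value)


def find_mixed_case_links(html: str) -> list:
    """Return the hrefs in the given html that need a case fix."""
    auditor = AnchorCaseAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.mixed_case_hrefs


page = (
    '<a href="/Ohmaha/Nebraska">Omaha</a>'
    '<a href="/ohmaha/nebraska">Omaha again</a>'
    '<a href="http://WWW.Example.com/ok">elsewhere</a>'
)
print(find_mixed_case_links(page))  # ['/Ohmaha/Nebraska']
```

Run over every page of the site, this gives the web team a worklist instead of a manual hunt.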

Even better is to install a third party module like ISAPI Rewrite on IIS - something that offers a regular expression engine to emulate the best features of Apache's mod_rewrite to manipulate urls.