Upper vs. Lower Case URLs in Google Sitemaps

Is this an errant Google sitemap/indexing manipulation?

         

Propools

6:50 pm on Oct 30, 2007 (gmt 0)

10+ Year Member



There is a Sitemap that has a URL in it as follows:
A-Domain/Floppy_Widget_Games-Plastic_Xenon_Fixer.php?frm=Floppy Widget Games:Plastic Xenon Fixer

Looking at their log file, I see that Google has tried to access the following page:
A-Domain/floppy_widget_games-plastic_xenon_fixer.php?frm=Floppy Widget Games:Plastic Xenon Fixer

The only difference between these URLs, and there are hundreds of them, is the capitalization.
So it makes me wonder about two things:
1. Is Google indexing the proper URL as in the sitemap and just trying variations of it?
2. Does Google just not like mixed-case letters in a URL? If so, then why does it only change the capital letters before the .php extension and not those in the query string?
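
To see how widespread the pattern is, the requested paths in the log can be compared against the sitemap paths, ignoring case. This is only a rough sketch: the function name and the `a-domain.example` host are made up for illustration, and it looks at the path only, not the query string.

```python
from urllib.parse import urlsplit


def case_only_mismatches(sitemap_urls, requested_urls):
    """Return (requested_path, sitemap_path) pairs that differ only by letter case."""
    # Index the sitemap paths by their lower-cased form.
    by_lower = {}
    for url in sitemap_urls:
        path = urlsplit(url).path
        by_lower.setdefault(path.lower(), set()).add(path)

    pairs = []
    for url in requested_urls:
        path = urlsplit(url).path
        # Same letters, different case: report it. Exact matches are skipped.
        for known in sorted(by_lower.get(path.lower(), ())):
            if known != path:
                pairs.append((path, known))
    return pairs
```

Feeding it the sitemap URLs and the URLs Googlebot requested gives a list of every request that would have matched a sitemap entry if case were ignored.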

tedster

1:28 am on Oct 31, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Any mix of upper and lower case makes for a "different" URL, as you probably know. Trouble can come from any server that is not case sensitive (IIS is a major example). Googlebot runs several kinds of "safety checks" for common server errors these days, and your #1 would line up with that.

g1smd

1:54 am on Oct 31, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Make sure that all links on the site, and in any sitemap, use consistent case.

If possible, always use all lower case for everything.
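
One way to audit a sitemap for this is to list every entry whose path contains upper-case letters. A minimal sketch, assuming the sitemap uses the sitemaps.org 0.9 namespace (older Google-specific sitemap schemas use a different namespace and would need the constant changed); the function name is hypothetical:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Assumed namespace: the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def mixed_case_locs(sitemap_xml):
    """Return the <loc> URLs whose path is not entirely lower case."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        path = urlsplit(url).path
        # Only the path is checked; the host is case-insensitive anyway.
        if path != path.lower():
            flagged.append(url)
    return flagged
```

An empty result means every path in the sitemap is already lower case.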

jomaxx

2:06 am on Oct 31, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I can't think of any logical reason Google would attempt to spider variations on a URL. More likely someone has linked to the site using lowercase instead of mixed case. In my experience this is fairly common.

Robert Charlton

6:18 am on Oct 31, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



"More likely someone has linked to the site using lowercase instead of mixed case. In my experience this is fairly common."

This is one of several good arguments for using all lower case in the first place.

Propools

2:51 pm on Oct 31, 2007 (gmt 0)

10+ Year Member



Well, our back end programmer has done the following:
I just installed an Apache module called mod_speling (that's the correct spelling of it, it's a joke) that makes Apache be case-insensitive, and recovers from minor misspellings (omitted/added characters, transposed characters). That now makes all those lower-case URLs reachable.

This is as opposed to making all of the URLs have a lower case naming structure. Will this Apache module resolve the problems I'm having?

tedster

3:45 pm on Oct 31, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The spec on this module says:

If, after scanning the directory,

  • no matching document was found, Apache will proceed as usual and return a "document not found" error.
  • only one document is found that "almost" matches the request, then it is returned in the form of a redirection response.
  • more than one document with a close match was found, then the list of the matches is returned to the client, and the client can select the correct candidate.

    [httpd.apache.org...]

    My concern is whether the redirection response returns a 301 status code and not a 302. A 302 would open you up to serious duplicate URL problems. Remember, Google IS case sensitive, as are all the major search engines.

    Receptional Andy

    3:47 pm on Oct 31, 2007 (gmt 0)



    mod_speling uses permanent redirects.
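
    Putting the quoted documentation and the permanent redirect together, the three outcomes can be modelled roughly as follows. This is a simplified illustration, not the module's actual code: it only considers case differences (the real module also handles small misspellings), the function name is made up, and the 300 status for the multiple-match case is my understanding rather than something stated in this thread.

```python
def resolve(requested, filenames):
    """Simplified model of the outcomes described above (case differences only)."""
    if requested in filenames:
        # Exact match: served normally, no correction involved.
        return 200, [requested]

    candidates = sorted(n for n in filenames if n.lower() == requested.lower())
    if not candidates:
        return 404, []           # nothing close: ordinary "not found"
    if len(candidates) == 1:
        return 301, candidates   # single near match: permanent redirect to it
    return 300, candidates       # several near matches: list them (assumed status)
```

    For search engines the important case is the middle one: the lower-case request ends up at the one correctly cased URL instead of being served as a second copy.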

    tedster

    4:14 pm on Oct 31, 2007 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Thanks, Andy, I should have known better. We're talking about Apache here, not some IIS oddity.

    "As opposed to making all of the URLs have a lower case naming structure. Will this Apache module resolve the problems I'm having?"

    Sounds like it will, Propools.

    Propools

    4:40 pm on Oct 31, 2007 (gmt 0)

    10+ Year Member



    Thanks to all.

    Receptional Andy

    4:46 pm on Oct 31, 2007 (gmt 0)



    One thing to watch with mod_speling is that it doesn't work with mod_rewritten URLs.

    g1smd

    5:46 pm on Oct 31, 2007 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Some very careful testing is in order here then.

    You'll need a variety of test URLs with a mixture of case types, and some way to see the individual HTTP headers (WebBug or Live HTTP Headers, or some online tool).
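
    If you would rather script the test than use a browser add-on, something along these lines could work. It is only a sketch: both function names are made up, and it relies on Python's standard `http.client`, which does not follow redirects, so the raw status code and `Location` header are visible.

```python
import http.client
from urllib.parse import urlsplit


def case_variants(path):
    """Return the path plus lower-, upper- and swapped-case forms, without duplicates."""
    variants = []
    for candidate in (path, path.lower(), path.upper(), path.swapcase()):
        if candidate not in variants:
            variants.append(candidate)
    return variants


def head_status(url, timeout=10):
    """Send one HEAD request and return (status code, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            # The query string must already be percent-encoded (no raw spaces).
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

    Calling `head_status(base + variant)` for each entry of `case_variants("/Floppy_Widget_Games-Plastic_Xenon_Fixer.php")`, with `base` set to your own site, shows whether each variant answers 200, 301, 302 or 404 and where any redirect points.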

    Propools

    6:42 pm on Oct 31, 2007 (gmt 0)

    10+ Year Member



    Thanks g1smd, I'll check them out.