It's a problem that means we always employ rewrite rules, just to be on the safe side. We prefer the trailing slash to be the correct URL for the page, with a permanent redirect on the other.
It's annoying that search engines should choose to display your URL in a way that can cause issues.
I guess they do it for readability - not expecting people to type it in (and possibly get an error on a system that's not expecting it).
Let's remember that this discussion should probably be centred on the skills of the average site owner (let's use site owner rather than webmaster as the average person who publishes on the web is not the type of person that frequents WebmasterWorld).
The decision by BIG companies to use a system that could easily break URLs is scandalous.
The web should be a place where knowledge of a niche should be enough to rank well, not that in combination with knowledge of how to set up a server.
I know that many people here, including myself, profit from people being unaware of how to rank well. But we are talking about a basic error in the SEs' approach that WILL cause issues.
Just something I've wanted to note for a while...
[webmasterworld.com...]
...takes you to the WebmasterWorld Google forum.
[webmasterworld.com...]
...gives you a 404.
Straight away you can say that scrapers could take the wrong URL and end up creating duplicate content for domains that serve both versions as the same page.
I remember that I used to see a lot of URIs in the SERPs that had spaces in them that weren't in the originals. I haven't noticed these for a while and can't come up with an example now.
Assuming the engines have thought about how they're returning their URI displays, why are they doing it the way they are? And, for that matter, why isn't WebmasterWorld redirecting its forum URIs?
What some developers fail to do is hack the URI and make sure that the proper response is returned as characters are taken away from it. You just back your way up from the final destination.
1. http://www.example.com/sub/
2. http://www.example.com/sub
3. http://www.example.com/su
4. http://www.example.com/s
Number 1 serves a 200.
Number 2 serves a 301 to Number 1.
Number 3 serves a 404.
Number 4 serves a 404.
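The back-your-way-up check above can be sketched in Python. This is only an illustration under stated assumptions: `DIRECTORIES` is a made-up set of directory-style URIs and `respond` is a hypothetical helper, not how any particular server implements it.

```python
# Hypothetical set of directory-style resources the site actually serves.
DIRECTORIES = {"/", "/sub/"}

def respond(path, directories=DIRECTORIES):
    """Return (status, location) for a requested path.

    200 for a known directory URI, 301 to the slash form when only the
    trailing slash is missing, 404 for everything else.
    """
    if path in directories:
        return 200, None
    if not path.endswith("/") and path + "/" in directories:
        return 301, path + "/"
    return 404, None

# Walk backwards from the final destination, one character at a time.
for path in ("/sub/", "/sub", "/su", "/s"):
    print(path, respond(path))
```

Only the exact "missing slash" case earns a redirect; anything shorter falls through to a 404 rather than being guessed at.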
I have to be careful with Number 2. If a server is set up to use Content Negotiation, then I may not take Number 2 and 301 it to Number 1. Now that I've gone extensionless, Number 2 is a valid URI.
So, this is where a potential issue comes in. Google has it right. Yahoo! and MSN, I believe, may be causing harm to site owners by the way they display incomplete URIs.
Did I wake up with a tin-foil hat on this morning or what?
[edited by: pageoneresults at 9:05 pm (utc) on Jan. 15, 2007]
I have to be careful with Number 2. If a server is set up to use Content Negotiation, then I may not take Number 2 and 301 it to Number 1. Now that I've gone extensionless, Number 2 is a valid URI.
Then Content Negotiation has not been set up properly. Content Negotiation, when properly configured, will return a 404 for Number 2, provided there really is no resource at that URI.
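A toy model of that point, in Python, might look like the following. It assumes a made-up file list (`FILES`) and a hypothetical `negotiate` helper that matches an extensionless request against file names with their extension removed. Real servers also apply directory handling and pick among candidates using request headers, which this sketch ignores.

```python
from pathlib import PurePosixPath

# Hypothetical files on disk, for illustration only.
FILES = {"/about.html", "/about.pdf", "/sub/index.html"}

def negotiate(path, files=FILES):
    """Return (status, candidates) for an extensionless request path.

    A file is a candidate when its name minus the extension equals the
    requested path. No candidates means there is no resource: 404.
    """
    candidates = sorted(
        f for f in files
        if str(PurePosixPath(f).with_suffix("")) == path
    )
    return (200, candidates) if candidates else (404, [])
```

Here `negotiate("/about")` finds two candidate representations, while `negotiate("/sub")` finds none, because the only thing under that name is a directory and not a negotiable file.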
And, for that matter, why isn't WebmasterWorld redirecting its forum URIs?
Why? [webmasterworld.com]
Also, a search engine should always obey a redirect by listing the corrected URL. But Yahoo has never been good with redirects anyway, and I wouldn't expect MSN to be much better than Yahoo.
So in summary, Apache and Google have it right, which is what I always expect.
because it's wrong for a missing ending slash to break the url.
This makes it sound like you believe that some specification requires that:
http://www.example.com/resource/
and
http://www.example.com/resource
have to refer to the same resource.
The RFC for URIs [gbiv.com] clearly says that's not a requirement (and specifically says it's not appropriate for a web spider to assume the two refer to the same resource unless it actually gets told that by the given web server).
Since I'm free to decide that, on my web server, http://www.example.com/resource/ is a valid resource and that http://www.example.com/resource is not, I can't see in what sense the word wrong applies. The "missing" slash in this case does not "break the URL", it simply changes the resource specifier into one that does not refer to a valid resource.
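As a small illustration that the two forms are different identifiers, here is a sketch using Python's standard `urllib.parse`. It only shows that the parsed paths differ; whether both map to the same resource remains entirely the server's decision.

```python
from urllib.parse import urlsplit

with_slash = urlsplit("http://www.example.com/resource/")
without_slash = urlsplit("http://www.example.com/resource")

# Same host, but the path components are not the same string.
same_host = with_slash.netloc == without_slash.netloc
same_path = with_slash.path == without_slash.path
print(same_host, same_path)
```

A client that wants to treat them as one resource has to be told so by the server, for example through a redirect.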
[edited by: pageoneresults at 9:25 pm (utc) on Jan. 16, 2007]
[edit reason] Examplified URI References [/edit]
For quite some time (and as recently as a year or so ago) Yahoo was also stripping the trailing slash from the click URL.
I just did a few searches to see what I could uncover in this instance. I found a few Click URIs that were without a trailing forward slash. Fortunately the server where those sites reside handled it correctly and redirected to the trailing forward slash.
There are way too many sites out there, particularly on Windows servers, that don't handle this correctly, and methinks it could be a potential issue.
This makes it sound like you believe ... have to refer to the same resource.
it's not appropriate for a web spider to assume the two refer to the same resource
I think it's valid in this case to say that over 99% of users and Webmasters expect "content" to be the same as "content/", and that Web servers and SEs can handle that so as to cause no problems for those 99%.
I think it's valid in this case to say that over 99% of users and Webmasters expect "content" to be the same as "content/",
I hope not as that would be a mistake.
And that Web servers and SEs can handle that so as to cause no problems for those 99%.
Hmmm, well, there goes all the work I'm getting ready to do in moving to an extensionless environment, also referred to as Content Negotiation. ;)
And even if the slashes are always correct and always obeyed, URLs can still be ambiguous if you allow the ending slash to have meaning.
For example, what if you have a URL like:
example.com/content/parameter
Where "content" is a script. Then suppose someone wants to pass a blank parameter. What's the correct url?
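To make the ambiguity concrete, here is a rough Python sketch. The `split_request` helper and the `/content` script path are hypothetical, chosen only to mirror the example above.

```python
def split_request(path, script="/content"):
    """Split a path into (script, parameter) for a hypothetical script URL.

    The parameter is None when no parameter segment is present at all,
    and "" when the segment is present but empty.
    """
    if path == script:
        return script, None
    if path.startswith(script + "/"):
        return script, path[len(script) + 1:]
    raise ValueError("path does not address the script")
```

With this reading, `/content/` carries a blank parameter and `/content` carries none. If the server redirects `/content` to `/content/`, those two cases can no longer be told apart, which is exactly the ambiguity in question.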
When a resource is requested without a trailing slash, and that resource is a content-negotiated script that would normally return a 200 OK response, you might want to check for that and handle it accordingly. I tend to follow the default action of my HTTP server and 301 redirect to the resource with a trailing slash first. If you choose not to, that is your prerogative.
Although the "Why?" link that I referred to earlier discusses case-sensitivity, the same reasoning applies here to whether the requested resource has a trailing slash or not. If WebmasterWorld decides that they don't need to worry about the trailing slash and return a 404, so be it. If there is a fear that type-in traffic, inbound links, etc. are going to cause some form of loss, then a decision must be made and the server configured to accommodate it: either return a 200 OK and a new resource, or 301 redirect to the intended location.