In the internet world, I know that 99% of people replace spaces with dashes, underscores or plus signs. That makes sense, since different browsers handle the space differently: some show it as %20 while others leave it as a space.
So now I have a client who has rather sloppily just thrown category names, spaces and all, into their URLs. I have pleaded with them to change it as I am certain it will affect spidering somehow (and to be honest, the number of indexed pages has plummeted). The problem is, I am not 100% positive that it really is an issue, and if it is, I am not sure why it affects the spiders or the site's performance.
This is the first time I have run into a client that has done this, honestly. It seems like every other developer just knows you don't leave spaces in the URL. I can't even seem to find any info on this.
Anyway, is the %20 an issue, really? Or am I just being anal? Why exactly is it an issue if it is one?
Google does not treat two_words as being two words, and underscores visually disappear in underlined links.
Spaces get converted to %20 and that makes%20the%20URL%20very%20hard%20to%20read.
Use hyphens, dots, commas, colons, the plus sign, whatever; avoid both spaces and underscores.
As for whether spaces are allowed by the server filesystem, that depends on the operating system running the server. Filenames are not URLs, the two things are only 'associated'. Avoid them there too, if you can.
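To see what the %20 conversion mentioned above looks like in practice, here is a minimal Python sketch using the standard library's urllib.parse. The category name "Green Widgets" is only an illustrative value, not taken from any real site.

```python
from urllib.parse import quote, unquote

category = "Green Widgets"  # illustrative category name containing a space

# Percent-encode the path segment: the space becomes %20.
encoded = quote(category)
print(encoded)            # Green%20Widgets

# Decoding restores the original space.
print(unquote(encoded))   # Green Widgets

# A hyphenated version contains nothing that needs encoding.
hyphenated = category.replace(" ", "-")
print(quote(hyphenated))  # Green-Widgets
```

The hyphenated form survives encoding unchanged, which is why it stays readable wherever the URL is displayed.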
So now I have a client who has rather sloppily just thrown category names, spaces and all, into their URLs. I have pleaded with them to change it.
As a programmer, I always turn these around: if a problem arises out of user error, it means there is a deficiency in my programming I have overlooked. I find it the "path of least resistance" to just program the fix into it. The "average user" doesn't understand half of what you tell them, much less remember it.
$separator = '-';
$url_title =~ s/\s+/$separator/g;
or, in the more popular PHP,
$url_title = preg_replace('/\s+/',$separator,$url_title);
Done.
As for the underscore issue, I have one site that uses underscores, not dashes, and I have read all the points about underscores. They are all perfectly valid and by better coders than I; still, while I agree on the hyperlink complaint, I'm still on the fence about how it affects SEO and user comprehension. This site is doing extremely well "as is."
Most of its users can't even find the address bar, much less remember how to type
example.com/Green Widgets
But even if they can, see point #1 about path of least resistance:
(After unencoding, %20 becomes a regular space)
$separator = '_';
if ($url_title =~ /\s+/) {
$url_title =~ s/\s+/$separator/g;
}
or
if (preg_match('/\s+/', $url_title)) {
$url_title = preg_replace('/\s+/',$separator,$url_title);
}
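For anyone who wants the decode-then-replace step in one place, here is a rough Python sketch of the same idea. The function name normalize_segment and the strip() call are my own assumptions for illustration, not part of the snippets above.

```python
import re
from urllib.parse import unquote


def normalize_segment(raw_segment: str, separator: str = "_") -> str:
    """Decode a percent-encoded URL segment, then replace whitespace runs.

    normalize_segment is a hypothetical helper name used for this sketch.
    """
    decoded = unquote(raw_segment)  # %20 becomes a regular space
    # strip() is an added assumption: drop leading/trailing whitespace first
    # so the result never starts or ends with the separator.
    return re.sub(r"\s+", separator, decoded.strip())


print(normalize_segment("Green%20Widgets"))     # Green_Widgets
print(normalize_segment("Green Widgets", "-"))  # Green-Widgets
```

One possible use, assuming the server lets you hook into request handling: compare the requested segment with its normalized form and redirect when they differ, so old links with spaces keep working.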
<dons flame suit>
Anyway, none of these images were indexed, or at least I haven't received Google traffic for any of them. The only pics I received traffic for were the ones properly named!
Luckily I haven't put spaces in my URLs except for the direct links to the old images: example.com/image%201.jpg