Forum Moderators: phranque

Removing ".html" extension after URL


kylef

8:00 pm on Aug 4, 2008 (gmt 0)

10+ Year Member



Hi

I did search and found a similar thread, but it had numerous responses offering different solutions (some of which looked quite complicated!). First off, I'm a bit of a n00b in this area, but I'll hopefully survive - heh. I currently have a website, and I want to remove the ".html" extension from *all* URLs, i.e. www.mywebsite.com/random instead of www.mywebsite.com/random.html

I understand that some lines have to be written in an .htaccess file. I know how to do this - but does this file simply need to be placed in the root directory (where all my pages are)? What is the necessary code to do this?

Thanks in advance!
Kyle Flanigan

jdMorgan

8:34 pm on Aug 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The necessary code is available in many threads here, as this topic has been well-covered. Try a WebmasterWorld site search (see link at top of this page) for "remove file extensions rewriterule" and similar phrases. Dig into those threads and read the discussions, then post specific questions back here.

More information on how to get the most from this forum is available in our Forum Charter -- Please see the link at the top of this page.

You may place .htaccess files in any HTTP-accessible directory. Depending on whether RewriteOptions inherit is set, a subdirectory's .htaccess either inherits the RewriteRules of the parent-directory .htaccess files or stands alone. The simplest -- if not the most processor-efficient -- implementation is a single, centralized .htaccess file in your site root (home page) directory. You trade some processing efficiency for easier centralized maintenance; which matters more is up to you.
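
As a rough illustration of the inheritance point, here is a minimal sketch of a subdirectory .htaccess that opts in to its parent's rules. The /blog/ directory is a hypothetical name, not something from this thread.

```apache
# Hypothetical file: /blog/.htaccess (the directory name is an assumption)
RewriteEngine On

# Without this line, the mod_rewrite directives in this file replace the
# parent directory's rules for requests under /blog/. With it, the parent
# .htaccess conditions and rules are inherited as well.
RewriteOptions inherit

# Rules specific to this subdirectory would go here.
```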

Jim

kylef

10:49 pm on Aug 4, 2008 (gmt 0)

10+ Year Member



Hi Jim - thanks for the reply!

As I understand it, if I remove all ".html" parts of the necessary links and add the following to my .htaccess file - this would solve the issue? (albeit with a slight performance issue)

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(([^/]+/)*[^./]+)/?$ /$1.html [L]

jdMorgan

11:11 pm on Aug 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, but remove that "/?" in the final part of the pattern, and consistently use URLs like "/dir/page" -- no trailing slash on a "page" URL. That part of the pattern is only needed to allow mixing of "/dir/page" and "/dir/page/" URLs -- and allows two URLs per page -- In other words, duplicate-content.

If you do that, then you can eliminate the two RewriteConds --and their associated performance issue-- if you're willing to say, "Rewrite any and all requests which do not contain a period or slash in the final URL-path-part to add a .html extension."

If your site's URL-architecture supports this convention now, then I'd recommend it because it's more efficient. And there are no "standard or customary" extensionless pages -- /robots.txt, /sitemap.xml, /labels.rdf, and /w3c/p3p.xml all have extensions, as would image and media files.
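
A minimal sketch of that convention, assuming a single .htaccess file in the site root; the example URLs in the comments are illustrative only.

```apache
RewriteEngine On

# Internally rewrite /page or /dir/page to /page.html or /dir/page.html.
# The final path segment must contain neither a period nor a slash, so
# requests such as /robots.txt, /images/logo.png and /dir/ are left alone.
# No RewriteConds are needed because the pattern does all the filtering.
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.html [L]
```

Because the rewritten path ends in ".html", it no longer matches the pattern, so the rule does not fire a second time on its own output.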

Jim

kylef

11:45 pm on Aug 4, 2008 (gmt 0)

10+ Year Member



RewriteRule ^(([^/]+/)*[^./]+)$ /$1.html [L]

That? Because I'm on a dedicated host, my files are in a different directory from most - specifically root/usr/local/apache2/htdocs (my index.html etc. is in there) - would that mean I replace /dir/page with the above?

Thanks for the informative reply

jdMorgan

12:53 am on Aug 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, I simply mean don't put a slash on the end of your extensionless page URLs.

Jim

kylef

8:49 am on Aug 5, 2008 (gmt 0)

10+ Year Member



Ah, sorry - I understand that now thanks!

So - just to confirm, I put that above line in the .htaccess file and remove the ".html" section of all pages (and the navigation URLs within those pages)

Thanks again

jdMorgan

1:42 pm on Aug 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, you put that code into .htaccess, test it thoroughly, and then remove the ".html" extension from the URLs in the links on your pages. <a href="blah.html"> should be changed to <a href="blah">

Jim

kylef

1:46 pm on Aug 5, 2008 (gmt 0)

10+ Year Member



One last question, sorry :) Do I also remove ".html" from the actual filenames? As I understand it, the file still needs an extension (hence file extension) but a URL does not?

Thanks

jdMorgan

1:54 pm on Aug 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



URLs are URLs, and files are files -- Not at all the same thing. The server would have no way to tell how to serve a file if you removed its file extension...

Please do a bit of testing and experimentation. It will save you time waiting on replies here, and you will gain a much better understanding of the issues.

Jim

kylef

2:07 pm on Aug 5, 2008 (gmt 0)

10+ Year Member



Okay ;) Thanks for all your help, I really appreciate it!

Edit: I've done a bit of testing regarding the .htaccess file and file URLs, and the issue is at least 'semi-resolved'. I have routed navigation to simply go to <a href="links"> and the links page *does* work - however the URL still has the .html extension in it? I appreciate your help ..

hochstadt

7:41 pm on Nov 22, 2008 (gmt 0)

10+ Year Member



What if you have already removed the .html extensions from all links on all pages, moved the content of your previously hand-coded, static HTML pages into a CMS (WordPress, which I've set to create extensionless file names), and now want to remove the old static HTML pages from the root directory?

Besides others, I tried the following two options -- to no avail.

First option:


RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+)\.html\ HTTP
RewriteRule ^([^.]+)\.html$ http://www.domain.com/$1 [R=301,L]

Second option:

RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.html [L]

In both cases, the redirection *only* works if the old file (page-name.html) still exists in the root directory of the site. Once that file is removed from the root directory (the page is now simply called page-name), entering www.domain.com/page-name.html into the browser's address bar to check the redirection gives me a 404 error.

Again, the host, the domain name, and all file names are the same; they're all just without the trailing .html now.

Apache version 2.2.9

I'm obviously missing a piece here.

Thank you very much in advance for your help.

~Marcus

g1smd

2:17 am on Nov 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In option 1, you have placed the rewrite first.

That rewrite updates the server's internal pointers to the new filepath - and then you invoke a redirect, which exposes that "secret" pointer to the outside world.

Place the redirect before the rewrite, and the problem will likely go away.

hochstadt

6:14 am on Nov 23, 2008 (gmt 0)

10+ Year Member



Thank you for the pointer, g1smd.

I did what you said, and it still gives a 404 error when the static HTML file doesn't exist.


RewriteEngine on
RewriteBase /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+)\.html\ HTTP
RewriteRule ^([^.]+)\.html$ http://www.domain.com/$1 [R=301,L]
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [L]

~M.

hochstadt

10:08 pm on Nov 26, 2008 (gmt 0)

10+ Year Member



Solution, anyone? :-)

phranque

1:00 am on Nov 27, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], hochstadt!

the server logs should tell you which resource is eventually being requested and that should provide a hint about solving your problem.

hochstadt

1:45 am on Nov 27, 2008 (gmt 0)

10+ Year Member



Thank you for your reply, phranque, although I must confess I haven't completely understood it.

Either way, I found a working .htaccess configuration in the meantime and am posting it below for others to learn from, too.

With the following simple configuration, I can remove the old static HTML files while requests to those old files are being redirected to the respective new locations which are, again, without the .html file extension now:


RewriteEngine On
RewriteRule (.*)\.html $1 [R=301,L]
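
A slightly stricter variant of the rule above, offered as a sketch rather than a drop-in fix: it anchors the pattern so that only URLs actually ending in ".html" are redirected, and it uses a root-relative target so the redirect does not depend on RewriteBase. It assumes the .htaccess file sits in the site root and that no other rule rewrites extensionless URLs back to .html files.

```apache
RewriteEngine On

# Externally redirect /page.html or /dir/page.html to /page or /dir/page.
# Assumes this file is in the site root and nothing rewrites back to .html;
# otherwise a THE_REQUEST condition would be needed to avoid a loop.
RewriteRule ^(.+)\.html$ /$1 [R=301,L]
```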

Anyway, browsing through your forum threads and learning from your posts was a big help, for which I'm very grateful.

Thank you.

~Marcus

phranque

5:54 am on Nov 27, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



"I haven't completely understood it"

always check the server logs for clues when you have a problem.

jdMorgan

3:58 am on Dec 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think the answer to the mystery is simply that the internal rewrite in the original code was no longer needed. It was rewriting extensionless URLs to .html files, but those files no longer existed. Instead, the extensionless URLs should now be rewritten to WordPress using the mod_rewrite rule that is usually added by the WordPress install.
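
Put together, the arrangement described here might look roughly like the sketch below. The redirect is the one posted earlier in this thread, with www.domain.com as the same placeholder; the second block is the rule set the WordPress installer usually writes, so check it against your own install rather than copying it blindly.

```apache
RewriteEngine On
RewriteBase /

# 1. Redirect old .html URLs that clients request directly to the
#    extensionless URL. THE_REQUEST is checked so that only the original
#    client request, not an internally rewritten path, triggers this.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+)\.html\ HTTP
RewriteRule ^([^.]+)\.html$ http://www.domain.com/$1 [R=301,L]

# 2. Hand every request that is not an existing file or directory to
#    WordPress, which then resolves the extensionless URL itself.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```

The old internal rewrite to ".html" files is simply left out, since those files no longer exist.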

Jim