Correct HTTP code for /index.php to / redirect?

         

JAB Creations

2:36 pm on Oct 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am reading through the HTTP status codes and I'm not sure which one to serve when a client requests /index.php and I redirect them to /.

- John

rocknbil

3:53 pm on Oct 12, 2010 (gmt 0)

Wouldn't you want a 301? From the Apache forum archives on this site:

# Externally redirect direct client requests for "/index.html*" in any directory to "/" in that same directory
# Make sure it comes after any other redirects.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]

That's for .html; you can change it to .php, and you probably don't need the full URL. Play with it.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ /$1 [R=301,L]

JAB Creations

4:43 pm on Oct 12, 2010 (gmt 0)

The problem is that I already have a rewrite rule that captures all URLs and has my site's CMS handle everything. I've looked at Apache's site so many times, and it still baffles me to this day.

- John

RewriteRule ^index\.php$ http://%{SERVER_NAME}/ [R=301,L]

jdMorgan

8:47 pm on Oct 12, 2010 (gmt 0)

When used in a .htaccess context, your "simple" rule will create a loop if any other rewrite rule is used to internally rewrite requests for "/" URLs to /index.php filepaths. Therefore, the check of THE_REQUEST is used to distinguish between /index.php requests coming directly from the HTTP client (e.g. browser) and those occurring as a result of an internal rewrite during a previous pass through the Apache API fix-up phase. This is the reason for the phrase "direct client requests" in the comments.

To avoid canonicalization problems when "UseCanonicalName" is set to "On" and ServerName is declared differently from the redirect target hostname in the host configuration files (now or later), specify the full protocol and URL. It's cheap insurance against an unexpected config change by your host.

To avoid a sore brain when you re-visit this code after five years, use correct and comprehensive comments, and leave those comments in place.

# Externally redirect direct client requests for "/index.php*" in any directory to "/"
# in that same directory, retaining any appended query strings or url-fragments.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.php([?#].*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]
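To see why the THE_REQUEST check prevents a loop, here is a minimal sketch in Python rather than Apache config. It assumes that Python's `re` module treats this particular pattern the same way Apache's PCRE does (the pattern uses only syntax common to both), and the function name `is_direct_index_request` is made up for illustration.

```python
import re

# Python translation of the RewriteCond pattern from the rule above.
# Assumption: this pattern behaves identically under Apache's PCRE engine.
DIRECT_INDEX_REQUEST = re.compile(
    r"^[A-Z]+\ /([^/]+/)*index\.php([?#].*)?\ HTTP/"
)


def is_direct_index_request(the_request: str) -> bool:
    """Return True if the client's own request line asked for .../index.php."""
    return DIRECT_INDEX_REQUEST.search(the_request) is not None


# THE_REQUEST holds the original request line sent by the client. It does not
# change when "/" is internally rewritten to "/index.php", so only the first
# two requests below would trigger the external redirect.
print(is_direct_index_request("GET /index.php HTTP/1.1"))              # True
print(is_direct_index_request("GET /blog/index.php?page=2 HTTP/1.1"))  # True
print(is_direct_index_request("GET / HTTP/1.1"))                       # False
```

A request for "/" that is later rewritten internally to /index.php still carries "GET / HTTP/1.1" as its request line, so the condition fails and no second redirect is issued.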

This rule, being more URL-path-specific, should precede other less-specific external redirect rules such as any "catch-all" domain canonicalization rule. And all external redirect rules (ending with [R=30x,L] flags and specifying protocol plus full URLs) should precede any internal rewrite rules (ending with just [L] flags and specifying only internal filepaths) to prevent "exposure" of internal filepaths as URLs to clients -- most importantly, search engines. See the thread "Proper Order for htaccess directives" [webmasterworld.com].

This rule can easily be adapted to handle http/https, and with some work, to handle multiple canonical hostnames as well.

Jim

JAB Creations

1:53 pm on Oct 13, 2010 (gmt 0)

Wow, thanks for the replies! To the best of my knowledge I have already ordered things in my .htaccess that way, though I should double-check since I will be merging two different sets of .htaccess directives from the new and old versions of my site.

With regard to UseCanonicalName, will it conflict with %{SERVER_NAME}?

I spent some time with the code and I'm really close to figuring this out. What does work is a second-level redirect: for example, example.com/blog/index.php redirects to example.com/blog/. On top of that, it works on both localhost and my domain name without needing two copies of the .htaccess file, thanks to %{SERVER_NAME}.

The only thing the code does not do is affect all levels of directories. For example, example.com/level1/level2/level3/index.php redirects to example.com/level1/ when I'd prefer it to redirect to example.com/level1/level2/level3/.

So, keeping in mind that I have a reasonable grasp of regular expressions, I think the most useful question is: is there a way to do a sub-string select with regex/Apache?

For example, with PHP...

echo '<div>'.substr('http://example.com/level1/level2/level3/level4/index.php',0,-9).'</div>';


...will output...

http://example.com/level1/level2/level3/level4/

...regardless of how many levels of directories deep the request is.

Here is what is working, in its entirety, minus most of the file paths and file extensions in the first two lines. I've reduced it to the minimal amount of code that works (only with first-level directories, of course). I understand some of the operators, such as ^ for start and $ for end, though I don't see why all the other bits of code are included. Do they correlate to something like a different server variable that could change this, or are there characters that won't be caught unless they're explicitly declared?

RewriteRule ^(redirect.php|test.php|blog/|forums/) - [L]
RewriteRule !\.(css|cur|gif|gz|html|h3m|ico|jpg)$ index.php

RewriteCond %{THE_REQUEST} index\.php


One thing that has crossed my mind from all of this is having the sub-directory redirect condition placed before the rule that rewrites everything to the main index.php file. I'm pretty sure it would be necessary otherwise my CMS would have to handle the index.php to / redirects (which I could easily do with PHP if I was able to set that up).

I'm also aware that $1 through $9 refer back to the parts of the URL captured by the parenthesized groups of the regex (I'm not entirely sure that is stated correctly), and I have a hunch that there may be a way to match everything in %{THE_REQUEST} except the index.php part of the URL and redirect to it. So are we simply trying to match * before index\.php? If my understanding of the $ back-references is correct, then it may be as simple as creating a regex that performs that kind of match. I know my post is pretty much all over the place, but I'm going to follow my hunches and try some more regular expressions. I know that before this post I did not mention anything about multiple-level sub-directories, so that may change the whole approach.

I'll try creating a regex to match * before index\.php. I use a free program called The Regex Coach, which is awesome as it has a tool for stepping through the matching of a regex against a string. I'll work with that and see if I can figure out the formula against the string "level1/level2/level3/level4/index.php", presuming that is what %{THE_REQUEST} contains.

- John

jdMorgan

2:22 pm on Oct 13, 2010 (gmt 0)

The regex in the code I posted 'selects' any number of subdirectories from zero to the maximum allowed by your server configuration. As a stand-alone rule, it will work fine.
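As a rough check of that claim, the RewriteRule pattern can be tried outside Apache. This is a sketch in Python, assuming its `re` module handles this pattern the same way Apache's PCRE does; `directory_prefix` is a made-up helper name, and its return value plays the role of $1 in the rule's substitution.

```python
import re

# Python translation of the RewriteRule pattern ^(([^/]+/)*)index\.php$
# In .htaccess at the document root, the URL-path is matched without its
# leading slash, so the test strings below have none.
INDEX_RULE = re.compile(r"^(([^/]+/)*)index\.php$")


def directory_prefix(url_path):
    """Return what $1 would hold, or None if the rule would not match."""
    match = INDEX_RULE.search(url_path)
    return match.group(1) if match else None


# The outer group captures every "directory/" segment, however many there are.
print(directory_prefix("index.php"))                             # prints an empty line
print(directory_prefix("blog/index.php"))                        # blog/
print(directory_prefix("level1/level2/level3/level4/index.php")) # level1/level2/level3/level4/
```

For any matching path the captured group equals the path with the trailing "index.php" removed, which is the same result as the PHP substr(..., 0, -9) call in the earlier post.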

So the problem appears to be that your rule order is incorrect, and I commend the linked thread above to you.

Also note that you have un-escaped literal "." characters in the first RewriteRule posted above, which can lead to ambiguity in matching and subtle failures.
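A small illustration of that ambiguity, written in Python on the assumption that "." means the same thing in Python's `re` as in Apache's regex engine (any single character). The sample paths and the helper name `matches` are invented for the example.

```python
import re

# The exceptions pattern as first posted (un-escaped dots) and as corrected.
UNESCAPED = re.compile(r"^(redirect.php|test.php|blog/|forums/)")
ESCAPED = re.compile(r"^(redirect\.php|test\.php|blog/|forums/)")


def matches(pattern, url_path):
    """Return True if the pattern matches at the start of the URL-path."""
    return pattern.search(url_path) is not None


# An un-escaped "." matches any character, so unrelated paths slip through.
for path in ["redirect.php", "redirect-php/notes", "testaphp"]:
    print(path, matches(UNESCAPED, path), matches(ESCAPED, path))
```

Only "redirect.php" matches the escaped pattern, while all three hypothetical paths match the un-escaped one and would therefore be excluded from all later rules.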

As currently structured, your first-posted rule skips all subsequent rules if the request is for any of the URL-paths that it matches.
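The effect of rule order can be sketched with a toy model. The Python below is a deliberately simplified simulation, not a description of Apache's internals: it assumes rules run top to bottom in a single pass, that each rule sees the path as left by the previous rule, and that [L] or a redirect stops processing. It combines the two rules posted above (with the dots escaped) and the index.php redirect rule from earlier in the thread. All function names are hypothetical.

```python
import re

THE_REQUEST = re.compile(r"^[A-Z]+\ /([^/]+/)*index\.php([?#].*)?\ HTTP/")
SKIP = re.compile(r"^(redirect\.php|test\.php|blog/|forums/)")
STATIC = re.compile(r"\.(css|cur|gif|gz|html|h3m|ico|jpg)$")
INDEX = re.compile(r"^(([^/]+/)*)index\.php$")


def exceptions_rule(path, request_line):
    # RewriteRule ^(redirect\.php|test\.php|blog/|forums/) - [L]
    return (path, "stop") if SKIP.search(path) else (path, "next")


def catch_all_rule(path, request_line):
    # RewriteRule !\.(css|...|jpg)$ index.php   (no [L], so processing continues)
    return ("index.php", "next") if not STATIC.search(path) else (path, "next")


def index_redirect_rule(path, request_line):
    # RewriteCond on THE_REQUEST plus RewriteRule ... /$1 [R=301,L]
    match = INDEX.search(path)
    if match and THE_REQUEST.search(request_line):
        return "/" + match.group(1), "redirect"
    return path, "next"


def run(rules, request_line):
    """Apply rules in order; return (outcome, resulting path or redirect target)."""
    # URL-path as seen from .htaccess in the document root: no leading slash.
    path = request_line.split(" ")[1].split("?")[0].lstrip("/")
    for rule in rules:
        path, action = rule(path, request_line)
        if action != "next":
            return action, path
    return "serve", path


posted_order = [exceptions_rule, catch_all_rule, index_redirect_rule]
redirect_first = [index_redirect_rule, exceptions_rule, catch_all_rule]

request = "GET /sub/web/xhtml/index.php HTTP/1.1"
print(run(posted_order, request))    # ('redirect', '/')
print(run(redirect_first, request))  # ('redirect', '/sub/web/xhtml/')
```

In this model, placing the catch-all rewrite first replaces the path with "index.php" before the redirect rule runs, so $1 is empty and the directory part is lost. A request such as "GET /blog/index.php HTTP/1.1" stops at the exceptions rule in the posted order and never reaches the redirect at all.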

Jim

JAB Creations

2:58 pm on Oct 13, 2010 (gmt 0)

I escaped the "." characters in the exceptions rule and updated it in my code and below.

Here are two localhost examples of request/redirects using only the code below...

localhost/Version 2.9.A.5/web/xhtml/index.php
localhost/index.php/xhtml/

localhost/Version 2.9.A.5/web/index.php
localhost/index.php/xhtml/

On my live domain the following happens...

example.com/sub/web/index.php
example.com/

example.com/sub/web/xhtml/index.php
example.com/index.php/

Both my local and live servers are running Apache 2.2 in case that has any bearing on anything.

When I put the regex that both of you posted into The Regex Coach, it says it matches the entire URL string. Presuming your code is right and that the problem is something else (I am doing this with an otherwise empty .htaccess file), I'm not exactly sure what to think, though I do appreciate your help!

- John

AddType application/x-httpd-php .css .html .js .xml

RewriteEngine on
RewriteRule ^(redirect\.php|test\.php|blog/|forums/) - [L]
RewriteRule !\.(css|cur|gif|gz|html|h3m|ico|jpg)$ index.php

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.php([?#].*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://%{SERVER_NAME}/$1 [R=301,L]

JAB Creations

4:16 pm on Oct 13, 2010 (gmt 0)

User error, my bad: the code works, and my PHP was interfering with everything. I resolved the issue and everything works peachy keen now.

I also noticed that the order of things in Apache can greatly affect the outcome. Thanks for your time.

- John