problem with easy mod rewrite

can't figure this out

         

pixeltierra

8:32 am on Dec 4, 2006 (gmt 0)

10+ Year Member



Ok, I give up trying, so I'll ask the masters...

I have multiple domains resolving to one hosting account, and a separate site at each domain. Each site has its own directory, but I don't want it to show in the URL.

So I essentially want the following in the address bar:

www.site1.com/index.php
www.site2.com/index.php

But I want the corresponding pages to actually be these:

www.site1.com/site1/index.php
www.site2.com/site2/index.php

This is the .htaccess code so far that I've tried to make work:

RewriteEngine on
RewriteCond %{HTTP_HOST} site1.com$ [NC]
RewriteRule ^(.*)$ [site1.com...]

I keep getting "Internal Server Error". Someone please tell me what I'm doing wrong.

ashis06

10:26 am on Dec 4, 2006 (gmt 0)

10+ Year Member



Hi,

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www.site1.com/site1/index.php$ [NC]
RewriteRule ^(.*)$ [www.site1.com...] [L,R=301]

Try this. Hope you can get what you want.

jdMorgan

1:16 pm on Dec 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you specify a domain in the substitution, you'll get a redirect rather than the internal rewrite you seek. Also, the code will go into an 'infinite loop' unless you take measures to prevent it. I'd suggest:

RewriteEngine on
#
RewriteCond %{HTTP_HOST} ^(www\.)?site1\.com [NC]
RewriteCond $1 !^site1/
RewriteRule (.*) /site1/$1 [L]
#
RewriteCond %{HTTP_HOST} ^(www\.)?site2\.com [NC]
RewriteCond $1 !^site2/
RewriteRule (.*) /site2/$1 [L]

Jim

pixeltierra

7:48 pm on Dec 4, 2006 (gmt 0)

10+ Year Member



I fiddled with this all night, and this ended up working (the difference from my orig post is in red):

RewriteEngine on
RewriteCond %{HTTP_HOST} site1.com$ [NC]
RewriteCond $1 !^site1/
RewriteRule ^(.*)$ [site1.com...]

jdMorgan: I need the full path or I lose the domain in the address bar, and maybe it's specific to my server, but it doesn't do an external redirect, since the host is technically not an external one.


RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www\.)?site1\.com [NC]
RewriteCond $1 !^site1/
RewriteRule (.*) /site1/$1 [L]

Now the issue is mostly resolved, but the rewrite doesn't work when www.site1.com/site1/file.ext is used as the initial request. It just goes straight through, which confuses users and creates the illusion of duplicate content, because the same file can be accessed at two URLs:

www.site1.com/dogs/poodle.html (internal redirect)
www.site1.com/site1/dogs/poodle.html (no redirect applied)

This makes it hard to change a site that used to be accessed via its directory to one that isn't, i.e., changing this:

www.site1.com/site1/thesite...

to

www.site1.com/thesite...

since the old URLs are out there and unaffected by the rewrite. I tried to write a second rule to stop this, but always got Internal Server Error:

RewriteRule ^site1/(.*) [site1.com...] [R] [L]

RewriteEngine on
RewriteCond %{HTTP_HOST} site1.com$ [NC]
RewriteCond $1 !^site1/
RewriteRule ^(.*)$ [site1.com...]

What could I do to make the old URLs get to the right place?

jdMorgan

10:56 pm on Dec 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteRule ^site1/(.*) [site1.com...] [R] [L]

That won't work, for a couple of reasons. First, the flags syntax is wrong -- it should be [R=301,L]. Second, your other "site1" rule will rewrite the request right back to /site1, and the result will be an "infinite" loop -- actually either a browser timeout or a 500-Server Error, depending on which end gives up first.

To avoid this, you need to check the original client request:


# Externally redirect direct client request for /site1 subdirectory to site1 domain
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /site1/
RewriteRule ^site1/(.*)$ http://www.site1.com/$1 [R=301,L]
#
# Externally redirect direct client request for /site2 subdirectory to site2 domain
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /site2/
RewriteRule ^site2/(.*)$ http://www.site2.com/$1 [R=301,L]
#
# Internally rewrite site1 domain requests to /site1 subdirectory
RewriteCond %{HTTP_HOST} ^(www\.)?site1\.com [NC]
RewriteCond $1 !^site1/
RewriteRule (.*) /site1/$1 [L]
#
# Internally rewrite site2 domain requests to /site2 subdirectory
RewriteCond %{HTTP_HOST} ^(www\.)?site2\.com [NC]
RewriteCond $1 !^site2/
RewriteRule (.*) /site2/$1 [L]

I'm not sure what you mean by "I need the full path or I lose the domain in the address bar" in the previous post. But the full path should not be needed unless you have another rule (in this file or perhaps set by Control Panel) or an Alias directive that is interfering, or a badly-misconfigured server. Perhaps you've got UseCanonicalName set to "on" in which case none of this is going to work.

The rules in this post should work all together as shown, but they won't work if the full path is used in the second two rules, because then you'll end up with conflicting external redirects. Regardless of whether the domain is on a different server, an external redirect (involving the client) is the result of including the [domain...] or an [R] flag in the rule. See the mod_rewrite RewriteRule documentation, where this behaviour is described.

In short, test this code without modification. If it doesn't work, then you'll have to figure out what else is going on with your server, because something this simple should work without any trouble.

In case there is a misunderstanding, your users will now see only www.site1.com and www.site2.com -- They will no longer see, nor be able to directly access, www.domain.com/site1/<anything> or www.domain.com/site2/<anything>

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

Jim

pixeltierra

1:50 am on Dec 5, 2006 (gmt 0)

10+ Year Member



jdMorgan:

Once again, I am in awe of your generosity and knowledge. Thank you.

The re-writes are working for me perfectly. That said, I have a couple of questions.

1) Why do you use:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /hf/

instead of:

RewriteCond %{REQUEST_URI} ^/hf/

I have tried REQUEST_URI and it doesn't work (error). Why does THE_REQUEST work and not REQUEST_URI?

2) I have read many mod-rewrite tutorials on the net, but I don't get some things. Why is the second line needed here:

1. RewriteCond %{HTTP_HOST} ^(www\.)?site1\.com [NC]
2. RewriteCond $1 !^site1/
3. RewriteRule (.*) /site1/$1 [L]

From what I understand the order of line execution is 3, 1, 2, 3 (do redirect). But if $1 is ever going to match ^site1/, and "site1/" was never in the uri, then the above must create some kind of loop that I don't understand. Does the above loop until the conditions no longer hold? Or does it just loop once and then rewrite?

jdMorgan

2:05 am on Dec 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



THE_REQUEST is the original client request, unaffected by either of the internal rewrites. So that's how we tell that a browser requested /site1, and that the URI does not contain /site1 merely because of the internal rewrite to /site1.

An example of what it might look like is:

GET /index.html HTTP/1.0

-or-
PROPFIND /site1/index.html HTTP/1.1

If the URI has /site1 in it as received from the client, we need to redirect the client to the proper site1 domain. If the URI has /site1 in it because of the internal rewrite, we don't want to redirect, because that would cause a loop.
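
As a rough illustration of that distinction, here is how the two variables would look for a direct request and for an internally rewritten one. This is a sketch that assumes the rules live in the document root's .htaccess file; the request lines are made-up examples.

```apache
# Case 1: the client asks for the subdirectory directly.
#   Request line:  GET /site1/index.html HTTP/1.1
#   THE_REQUEST    "GET /site1/index.html HTTP/1.1"   -> condition matches, redirect
#
# Case 2: the client asks for /index.html and the internal rewrite adds /site1.
#   Request line:  GET /index.html HTTP/1.1
#   THE_REQUEST    "GET /index.html HTTP/1.1"         -> no match, no redirect
#   REQUEST_URI    "/site1/index.html"                -> a test on ^/site1/ would match here and loop
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /site1/
RewriteRule ^site1/(.*)$ http://www.site1.com/$1 [R=301,L]
```

In the second case only REQUEST_URI has changed, which is why a condition on REQUEST_URI cannot tell the two cases apart.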

After one of these internal rewrites, the URI seen by RewriteRule will be updated. Since mod_rewrite appears to be recursive, we need to stop /site1/blah from being rewritten to /site1/site1/blah, and thence to /site1/site1/site1/blah, ad infinitum. So, we look at $1, and if it already has "site1" in it, we quit.

The reason mod_rewrite behaves recursively is that it restarts after any internal rewrite, so that the results can be checked for access restrictions or further rewrites. So, in .htaccess, rewrite/redirect loops must be prevented explicitly.
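
The restart behaviour can be traced by hand. The following annotated copy of the site1 rule is a sketch of what probably happens for one request, assuming the .htaccess file sits in the document root (so the RewriteRule pattern sees the path without its leading slash) and a made-up request path:

```apache
# Request: GET /dogs/poodle.html   Host: www.site1.com
#
# Pass 1: RewriteRule sees "dogs/poodle.html", so $1 = "dogs/poodle.html".
#         $1 does not start with "site1/", so the rule rewrites to
#         /site1/dogs/poodle.html and processing restarts.
# Pass 2: RewriteRule sees "site1/dogs/poodle.html".
#         $1 now starts with "site1/", so the second condition fails,
#         nothing is rewritten, and the file is served.
RewriteCond %{HTTP_HOST} ^(www\.)?site1\.com [NC]
RewriteCond $1 !^site1/
RewriteRule (.*) /site1/$1 [L]
```

Without the second RewriteCond, pass 2 would rewrite again to /site1/site1/dogs/poodle.html, and so on until the server gives up.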

Jim

pixeltierra

2:39 am on Dec 5, 2006 (gmt 0)

10+ Year Member



The reason mod_rewrite behaves recursively is that it restarts after any internal rewrite

This is what I haven't been able to understand for 2 years, but may be closer to getting thanks to you. So when it restarts, the previous $N and %N vars are still available? I figured they'd be wiped out.

This might be why I could never figure out the order of processing, even though I've read all the diagrams on apache's website.

Are the rewrite rules processed linearly, top to bottom? And if that is the case, then a [L] would force a break that skips the rules below, and as you say, restart the .htaccess file with the new URI with the old $Ns?

Is it the whole .htaccess that is restarted or just the last rule that was written, or just the rewrite rules, or just to the last RewriteEngine On declaration? Do you need a new RewriteEngine on for every rewrite rule?

Sorry to overload you with questions, but it is really important for me to get this once and for all. I'm a pretty smart guy and have no problem with programming logic in general, but I've been very discouraged every time I've worked with mod_rewrite. There is no way to debug. Just Internal Server Error. Not very helpful.

I really do appreciate your help and your time. You deserve a medal...

jdMorgan

3:12 am on Dec 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When you get a server error, go straight to the server error log. It often tells you exactly what is wrong. No access to the server error log? -- Time to change hosts.
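
If the error log alone is not enough and you control the server configuration, mod_rewrite can also log each step it takes. A sketch, assuming access to the main server or virtual host configuration (these directives are not permitted in .htaccess) and a hypothetical log path:

```apache
# Server config or <VirtualHost> context only.

# Apache 2.4 and later: mod_rewrite writes its trace to the error log.
LogLevel warn rewrite:trace3

# Apache 2.2 and earlier used dedicated directives instead (removed in 2.4):
# RewriteLog "/var/log/apache2/rewrite.log"
# RewriteLogLevel 3
```

Use only the form that matches your Apache version, and turn the logging back down afterwards, since higher levels slow the server noticeably.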

The [L] flag on a rule stops processing if a rule matches, but only for the current pass through this .htaccess file. If an internal rewrite has been invoked, processing re-starts at the top of the top-most-directory's .htaccess file. Local variables, such as $1 and %2, are discarded after each rule. Environment variables, such as those set by RewriteRules using [E=VarName:value], are retained throughout the current HTTP transaction -- and as pointed out in a recent thread [webmasterworld.com], are copied into REDIRECT_VarName when an internal rewrite is done.
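
The REDIRECT_ renaming can itself serve as a loop guard. A minimal sketch, assuming a per-directory (.htaccess) context; the variable name "rewritten" is made up for illustration:

```apache
RewriteEngine on
# Second pass: the variable set below comes back as REDIRECT_rewritten,
# so stop here and leave the already-rewritten URL alone.
RewriteCond %{ENV:REDIRECT_rewritten} =1
RewriteRule ^ - [L]
#
# First pass: rewrite into the subdirectory and mark the request.
RewriteCond %{HTTP_HOST} ^(www\.)?site1\.com [NC]
RewriteRule (.*) /site1/$1 [E=rewritten:1,L]
```

The "-" substitution means "no change", so the first rule only ends the pass. This does the same job as the RewriteCond $1 !^site1/ test, without depending on the directory name.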

I don't know what to tell you about processing order. If the pattern of a RewriteRule matches, then the RewriteConds are evaluated. If all ANDed RewriteCond patterns match, or if any of the ORed RewriteCond patterns match, then the specified RewriteRule action (a redirect, rewrite, error response, etc.) is invoked.

Back-references to up to nine parenthesized pattern matches in the RewriteRule pattern are available for use in RewriteConds and the RewriteRule itself as $1-$9. Back-references to up to nine parenthesized pattern matches in the (single) last-matched RewriteCond are available for use in subsequent RewriteConds and the RewriteRule as %1-%9.
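
To show both kinds of back-reference in one rule, the two per-site rules could probably be folded into one. This is a sketch rather than a drop-in replacement: its loop guard is looser than the per-site version (a request for www.site1.com/site2/... passes through unrewritten), and it matches lower-case host names only.

```apache
RewriteEngine on
# Loop guard: leave the request alone once it already points into a site directory.
RewriteCond $1 !^site[12]/
# Host test placed last so that %1 and %2 refer to its groups.
# No [NC] here, because %2 is reused as a directory name and must be lower-case.
RewriteCond %{HTTP_HOST} ^(www\.)?(site1|site2)\.com
# $1 = requested path (from the RewriteRule pattern), %2 = "site1" or "site2".
RewriteRule (.*) /%2/$1 [L]
```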

In other words, I can't explain it any better than the pictures in the mod_rewrite documentation, accompanied by the narrative text. Some folks get very good at mod_rewrite in only a few months, with experience in writing and testing code as a determining factor. Re-reading the documentation every few months helps: More things will have meaning and make sense each time. I've been using mod_rewrite for more than six years, and I still make mistakes. It's just that now, my mistakes are much more complicated than they used to be. ;)

Jim

pixeltierra

6:39 am on Dec 5, 2006 (gmt 0)

10+ Year Member



The [L] flag on a rule stops processing if a rule matches, but only for the current pass through this .htaccess file. If an internal rewrite has been invoked, processing re-starts at the top of the top-most-directory's .htaccess file. Local variables, such as $1 and %2, are discarded after each rule.

Ok this makes a lot of things make sense. I think I've really got it this time. Thanks a million.

I'll check out the server logs if I have access to them. If I don't, maybe I'll consider other hosting options. Could you sticky me with a couple of host recommendations?

jdMorgan

2:19 pm on Dec 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To avoid potential conflicts of interest, we don't do hosting recommendations here. But let me put it this way: Cheap hosting can be very expensive.

Consider the costs:

  • Higher downtime
  • No or little support
  • No access to critical raw server logs
  • Limited Apache module and application support
  • Bad neighborhoods on shared servers
  • Configuration problems
  • Security holes (Example: I've seen shared servers where I can access other users' files using FTP!)
  • Infrequent applications security upgrades
  • etc...

The biggest change seems to be when going from the $7/month level to the $20/month level. This is a generalization wide enough to be false over a small sample set, but true on a larger scale. Some very good hosts are inexpensive, while some incompetent ones are expensive. Some good hosts go bad, and some bad hosts improve dramatically over time. However, given a large enough sample, the generalization is true, as the majority of the cheap hosts have one or even all of the problems listed above. Because of this, I often use short-term signup offers to check out hosting companies. If it doesn't work out, change the DNS to point to another host, and lesson learned.

If you ask about this on other forums, be sure to consider who is recommending specific hosts -- they may be employees or owners of that company. And as with cars, many people will recommend whatever they chose, simply as a means of psychological self-justification. Only with large numbers of recommendations can any sense be made of it.

Sorry for the "non-answer," but we try to keep this place strictly non-promotional, so that for the most part, you can trust what people post here.

Jim
