perfecting subdomain / directory to query string rewrite

rhbecker

8:46 pm on Nov 13, 2009 (gmt 0)

10+ Year Member



I've spent a LOT of time on the code below, stealing suggestions from various posts (this forum is AWESOME). It seems to do exactly what I want it to, but because I'm new at this, I'm hoping to get some feedback: tweaks, ways to simplify it, potential problems I'm overlooking... anything.

What it's supposed to do:

1. If there's a subdomain, make that the value of a url parameter 'page'. If there is not a subdomain, or it's 'www', make the value of 'page' = 'main'.

2. If subdirectories are given, append the first to the value of 'page' and make any further subdirectories the value of a second parameter named 'filter'.

Examples:

1. sub.example.com/a/b/c
becomes
example.com/index.php?page=suba&filter=/b/c

2. www.example.com/a/b/c OR example.com/a/b/c
becomes
example.com/index.php?page=maina&filter=/b/c

Options +FollowSymlinks
RewriteEngine on

# Has a subdomain that is NOT www
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST}<>%{REQUEST_URI} ^([^.]+)\.example\.com(:[0-9]{1,3})?<>/([^\./]*)(.*) [NC]
# url param 'page' = the subdomain (%1)
# if at least one dir is given in the path, the first (%3) is appended to url param 'page'
# if more than one dirs are given in the path, anything beyond the first (%4) becomes the value of url param 'filter'
RewriteRule ^(.*) index.php?page=%1%3&filter=%4 [NC,QSA]

# No subdomain, or subdomain is www
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com
RewriteCond %{REQUEST_URI} /([^\./]*)(.*) [NC]
# url param 'page' = 'main'
# if at least one dir is given in the path, the first (%1) is appended to url param 'page'
# if more than one dirs are given in the path, anything beyond the first (%2) becomes the value of url param 'filter'
RewriteRule ^(.*) index.php?page=main%1&filter=%2 [NC,QSA]

[edited by: jdMorgan at 1:45 am (utc) on Nov. 14, 2009]
[edit reason] example.com [/edit]
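The third RewriteCond in the first rule joins %{HTTP_HOST} and %{REQUEST_URI} with a "<>" separator so that a single regex can capture pieces of both. A minimal Python sketch of that trick follows. It assumes Python's `re` treats this simple pattern the way mod_rewrite's PCRE engine does, uses `re.IGNORECASE` as a stand-in for [NC], and `match_host_and_uri` is a hypothetical helper name.

```python
import re

# Same pattern as the third RewriteCond of the first rule;
# re.IGNORECASE stands in for the [NC] flag.
HOST_URI = re.compile(
    r"^([^.]+)\.example\.com(:[0-9]{1,3})?<>/([^\./]*)(.*)", re.IGNORECASE
)


def match_host_and_uri(host, uri):
    """Return the groups mod_rewrite would expose as %1..%4, or None."""
    # Join the two variables with "<>", as the RewriteCond test string does,
    # so one regex can capture the subdomain and the path parts together.
    m = HOST_URI.match(host + "<>" + uri)
    return m.groups() if m else None


print(match_host_and_uri("sub.example.com", "/a/b/c"))
print(match_host_and_uri("sub.example.com:8080", "/a/b/c"))
```

The first call yields the subdomain, no port, the first directory and the remainder; the second returns None because `{1,3}` only allows ports of up to three digits, one of the points raised in the reply that follows.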

jdMorgan

2:31 am on Nov 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd suggest:

Options +FollowSymlinks
RewriteEngine on
#
# For non-www subdomain requests, the query string parameter 'page' is taken from
# the requested subdomain (%1). If at least one directory level is present in the
# requested URL-path, the first directory-level ($1) is appended to that 'page'
# parameter. If more than one directory level is present, anything beyond the
# first ($2) becomes the value of the query string 'filter' parameter.
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com [NC]
RewriteCond %1 !^www$ [NC]
RewriteRule ^([^/.]*)(.*)$ index.php?page=%1$1&filter=$2 [NC,QSA,L]
#
# If no subdomain is present in the requested hostname, or if the requested subdomain is
# 'www', then the query string 'page' parameter is set to 'main', and if at least one
# directory level is present in the requested URL-path, the first directory-level ($1) is
# appended to 'main'. If more than one directory level is present in the requested URL-path,
# anything beyond the first ($2) becomes the value of query string 'filter' parameter.
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com
RewriteRule ^([^/.]*)(.*)$ index.php?page=main$1&filter=$2 [NC,QSA,L]
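To check the two rules against the examples in the first post, here is a minimal Python sketch of how they would be evaluated. It assumes the rules sit in an .htaccess file in the document root, where mod_rewrite strips the leading slash before testing the RewriteRule pattern; that %1 in the first substitution still holds the subdomain after the negated `!^www$` condition (the rule itself relies on this); and that `re.IGNORECASE` is a fair stand-in for [NC]. [QSA] is not modeled, and `rewrite` is a hypothetical function name.

```python
import re

# Patterns copied from the two rules; re.IGNORECASE stands in for [NC].
SUB_HOST = re.compile(r"^([^.]+)\.example\.com", re.IGNORECASE)
NOT_WWW = re.compile(r"^www$", re.IGNORECASE)
MAIN_HOST = re.compile(r"^(www\.)?example\.com")  # no [NC] in the original
PATH = re.compile(r"^([^/.]*)(.*)$")


def rewrite(host, path):
    """Return the substituted URL, or None if neither rule applies.

    `path` is the URL-path as a document-root .htaccess RewriteRule
    sees it: without the leading slash.
    """
    # Both rules share the same RewriteRule pattern, which is tested
    # before any of their RewriteConds.
    m = PATH.match(path)
    if m is None:
        return None
    first, rest = m.groups()  # $1 and $2

    # Rule 1: non-www subdomain; %1 is the subdomain from the host test.
    h = SUB_HOST.match(host)
    if h and not NOT_WWW.match(h.group(1)):
        return f"index.php?page={h.group(1)}{first}&filter={rest}"  # [L]

    # Rule 2: bare domain or www.
    if MAIN_HOST.match(host):
        return f"index.php?page=main{first}&filter={rest}"  # [L]
    return None


print(rewrite("sub.example.com", "a/b/c"))
print(rewrite("www.example.com", "a/b/c"))
```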

  • It's not necessary to check for "HTTP_HOST not blank" if there is a positive-true test for specific HTTP_HOST values in the same rule, as is the case in both rules here.

  • Because of simplifications to the code, the following two comments no longer apply to this code, but may apply to other code you write in the future:
    o Allow for an optional trailing period on a fully-qualified domain name (e.g. "example.com."), as well as the existing optional port number.
    o Valid port numbers are 0-65535 (0 through 2^16 - 1), so a port number can have up to 5 decimal digits, not just 3.

  • It is not necessary to escape literal periods within [groups].

  • Whenever possible, use a specific RewriteRule pattern and do not use "RewriteCond %{REQUEST_URI}". Because of the way that RewriteRules are executed, a rule's RewriteConds are not evaluated at all unless and until its RewriteRule pattern matches. Therefore, a good, specific pattern in the RewriteRule is almost always the most efficient solution.
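The regex points in the list above can be checked with a short Python sketch. The host pattern here is a hypothetical illustration (not from either post) of allowing a trailing FQDN period and a port of up to five digits; note that `{1,5}` limits only the digit count, so it would still accept a value like 99999. Python's `re` is assumed to behave like PCRE for these simple patterns.

```python
import re

# Hypothetical host pattern: optional trailing period (FQDN form such as
# "example.com.") plus an optional port of one to five digits. {1,5} only
# limits the digit count; it does not reject values above 65535.
HOST = re.compile(r"^([^.]+)\.example\.com\.?(:[0-9]{1,5})?$", re.IGNORECASE)

# Inside a [group] the period is already literal, so escaping it is redundant.
UNESCAPED = re.compile(r"[^/.]+")
ESCAPED = re.compile(r"[^/\.]+")

# A positive host pattern cannot match an empty Host header, which is why a
# separate "HTTP_HOST is not blank" test adds nothing.
MAIN_HOST = re.compile(r"^(www\.)?example\.com")


def host_ok(host):
    return HOST.match(host) is not None


def same_class(s):
    # True when both character classes agree on whether s fully matches.
    return (UNESCAPED.fullmatch(s) is None) == (ESCAPED.fullmatch(s) is None)


for h in ["sub.example.com", "sub.example.com.", "sub.example.com:65535",
          "sub.example.com.:8080", "sub.example.com:123456"]:
    print(h, host_ok(h))
print(all(same_class(s) for s in ["abc", "a.b", "a/b", "."]))
print(MAIN_HOST.match("") is None)
```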

I have expanded your comments for clarity. However, I do note that your URL-path patterns will match even if no trailing slash is present in the requested URL-path; for example, a request for "example.com/test" will be rewritten to "/index.php?page=maintest&filter=".

Is that what you want? If so, then the code should work, but the comments will need further tweaking for accuracy, since "test" with no trailing slash is not a "directory level." And if it isn't already obvious, my opinion is that it is more important (at least initially) to get the comments right than it is to get the code right: you can't write good code without a very precise idea of what that code is intended to accomplish.
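To see what the shared URL-path pattern actually treats as a "directory level," this Python sketch applies `^([^/.]*)(.*)$` to a few paths as a document-root .htaccess would see them (no leading slash). `page_and_filter` is a hypothetical helper, and the default "main" prefix corresponds to the second rule.

```python
import re

# The URL-path pattern shared by both rules.
PATH = re.compile(r"^([^/.]*)(.*)$")


def page_and_filter(path, prefix="main"):
    # $1 is appended to the 'page' prefix; $2 becomes 'filter'.
    first, rest = PATH.match(path).groups()
    return prefix + first, rest


for p in ["test", "test/", "a/b/c"]:
    print(repr(p), page_and_filter(p))
```

With or without the trailing slash, "test" lands in 'page'; only 'filter' changes, which is exactly the case the comments would need to describe.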

Oh, and don't be shy about writing your comments in complete, precise English either; only the "#" character of a comment line gets processed (the rest of the line is ignored), so use 'em. You'll be really glad you did when you come back to this code after five years and need to remind yourself of its purpose... :)

Anyway, the tweaked code above should do the same thing as what you posted, but faster... :)

Jim

rhbecker

10:47 am on Nov 14, 2009 (gmt 0)

10+ Year Member



Jim, I really appreciate your time. This is exactly what I was hoping for when I posted.

Some of your comments were very educational, particularly the bit about the conditions only being evaluated when the corresponding RewriteRule pattern matches. I had no idea.

Is there a text you recommend for better understanding the nuances of rewriting? Obviously, there's a lot I don't know on this subject, but the bigger problem is that I don't know what I don't know... if that makes any sense.

jdMorgan

2:35 pm on Nov 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The fact that a RewriteRule's conditions are processed only after its pattern matches is described in the Apache mod_rewrite documentation, in the "Ruleset Processing" section. Setting aside for the moment all of the other available books and online tutorials, reading that document at apache.org (several times, and with great attention to detail) is the bare minimum for the safe use of mod_rewrite. As posted here many times in the past, mod_rewrite is a very powerful but dangerous tool: a single typo or minor misunderstanding can sink your site and potentially put you out of business.
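The ordering described in "Ruleset Processing" (pattern first, conditions only on a match) can be modeled in a few lines of Python. This is a simplified, hypothetical model, not mod_rewrite's actual code: `Rule`, `Cond`, and `apply_rules` are made-up names, conditions are ANDed with no [OR] or [NC] support, substitutions are returned literally, and every rule is treated as if it carried [L].

```python
import re
from dataclasses import dataclass, field


@dataclass
class Cond:
    var: str             # server variable to test, e.g. "HTTP_HOST"
    pattern: str
    negate: bool = False  # models a leading "!" on the CondPattern


@dataclass
class Rule:
    pattern: str
    substitution: str
    conds: list = field(default_factory=list)


def apply_rules(rules, path, env):
    """Return (substitution or None, number of conditions evaluated)."""
    cond_evals = 0
    for rule in rules:
        # Step 1: the RewriteRule pattern is tested against the URL-path.
        if re.search(rule.pattern, path) is None:
            continue  # this rule's RewriteConds are skipped entirely
        # Step 2: only now are its RewriteConds evaluated (implicit AND).
        ok = True
        for cond in rule.conds:
            cond_evals += 1
            matched = re.search(cond.pattern, env.get(cond.var, "")) is not None
            if matched == cond.negate:
                ok = False
                break
        if ok:
            return rule.substitution, cond_evals  # as if the rule had [L]
    return None, cond_evals


rules = [Rule(r"^pages/", "script.php", [Cond("HTTP_HOST", r"^www\.", negate=True)])]
print(apply_rules(rules, "images/logo.gif", {"HTTP_HOST": "sub.example.com"}))
print(apply_rules(rules, "pages/a", {"HTTP_HOST": "sub.example.com"}))
```

A request that fails the pattern never touches the condition, which is why a specific RewriteRule pattern saves work on every non-matching request.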

Even simple mistakes can be costly: these two rules do almost the same thing, but for requested URL-paths whose parts are only 10 characters each, the second can be tens of thousands of times faster than the first:

RewriteRule ^pages/(.*)/(.*)/(.*)/(.*)$ script.php?arg1=$1&arg2=$2&arg3=$3&arg4=$4 [L]
RewriteRule ^pages/([^/]+)/([^/]+)/([^/]+)/(.+)$ script.php?arg1=$1&arg2=$2&arg3=$3&arg4=$4 [L]

Unfortunately, you'll find code like that first line all over the Web, even in some very well-known and popular script 'packages'. I suspect that this plays a part in many early, unnecessary, and costly server upgrades.

Jim
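Here is a Python sketch comparing the two patterns in the post above, assuming the path is given without a leading slash as in .htaccess. It shows where they differ ("almost the same thing": with five segments, `(.*)` lets the first group swallow a slash) and times a failing match on three 10-character segments, where the greedy pattern has many more ways to split the string before it can give up. Python's `re` is not the PCRE engine Apache uses, so the printed timings will not reproduce the ratio quoted in the post; treat them as illustrative only.

```python
import re
import timeit

GREEDY = re.compile(r"^pages/(.*)/(.*)/(.*)/(.*)$")
SPECIFIC = re.compile(r"^pages/([^/]+)/([^/]+)/([^/]+)/(.+)$")


def groups(pattern, path):
    m = pattern.match(path)
    return m.groups() if m else None


# Four 10-character segments: both patterns capture the same values.
ok_path = "pages/" + "/".join(c * 10 for c in "abcd")
# Five segments: "(.*)" lets the first group absorb a slash, so captures differ.
five = "pages/a/b/c/d/e"
# Only three segments: neither pattern matches, but the greedy one backtracks
# through more candidate splits before failing.
bad_path = "pages/" + "/".join(c * 10 for c in "abc")

print(groups(GREEDY, five))
print(groups(SPECIFIC, five))

for name, pat in [("greedy", GREEDY), ("specific", SPECIFIC)]:
    t = timeit.timeit(lambda: pat.match(bad_path), number=10000)
    print(f"{name}: {t:.4f}s for 10000 failed matches")
```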