Rewrite if files do not exist

         

tntpower

10:21 pm on Apr 26, 2008 (gmt 0)

10+ Year Member



I am working on a Drupal project. There is a folder "archive" in my root directory.

When people visit my site at, for example, www.example.com/abc.foo and that URL does not exist, I want visitors to be taken automatically to www.example.com/archive/abc.foo instead of getting a 404 "page not found" error. If that does not exist either, then they should get a 404 error.

Some features I'd like to see are:

1) The rewrite to /archive/abc.foo is transparent to visitors: they never see /archive/abc.foo; the URL stays www.example.com/abc.foo.

2) abc.foo is a file here, but the rule is not limited to files; it should also cover directories. For example, www.example.com/directory, www.example.com/directory/subdirectory/file.foo, www.example.com/directory/subdirectory, etc.

3) The rewrite applies only to the root directory. No rewrite inside the "archive" folder.

4) No infinite rewrite (i.e., if /abc.html does not exist, go to /archive/abc.html; if /archive/abc.html does not exist either, do NOT go to /archive/archive/abc.html).

Here is my .htaccess


# Apache/PHP/Drupal settings:
#

# Protect files and directories from prying eyes.
<FilesMatch "\.(engine|inc|info|install|module|profile|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)$|^(code-style\.pl|Entries.*|Repository|Root|Tag|Template)$">
Order allow,deny
</FilesMatch>

# Don't show directory listings for URLs which map to a directory.
Options -Indexes

# Follow symbolic links in this directory.
Options All

# Customized error messages.
ErrorDocument 404 /index.php

# Set the default handler.
DirectoryIndex index.php index.html index.htm

# Override PHP settings. More in sites/default/settings.php
# but the following cannot be changed at runtime.



# PHP 5, Apache 1 and 2.
<IfModule mod_php5.c>
php_value magic_quotes_gpc 0
php_value register_globals 0
php_value session.auto_start 0
php_value mbstring.http_input pass
php_value mbstring.http_output pass
php_value mbstring.encoding_translation 0
</IfModule>

# Requires mod_expires to be enabled.
<IfModule mod_expires.c>
# Enable expirations.
ExpiresActive On
# Cache all files for 2 weeks after access (A).
ExpiresDefault A1209600
# Do not cache dynamically generated pages.
ExpiresByType text/html A1
</IfModule>

# Various rewrite rules.
<IfModule mod_rewrite.c>
RewriteEngine on

# If your site can be accessed both with and without the 'www.' prefix, you
# can use one of the following settings to redirect users to your preferred
# URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
#
# To redirect all users to access the site WITH the 'www.' prefix,
# (http://example.com/... will be redirected to http://www.example.com/...)
# adapt and uncomment the following:
# RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
#
# To redirect all users to access the site WITHOUT the 'www.' prefix,
# (http://www.example.com/... will be redirected to http://example.com/...)
# uncomment and adapt the following:
# RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]

# Modify the RewriteBase if you are using Drupal in a subdirectory or in a
# VirtualDocumentRoot and the rewrite rules are not working properly.
# For example if your site is at http://example.com/drupal uncomment and
# modify the following line:
# RewriteBase /drupal
#
# If your site is running in a VirtualDocumentRoot at http://example.com/,
# uncomment the following line:
# RewriteBase /


# Rewrite current-style URLs of the form 'index.php?q=x'.


RewriteCond %{DOCUMENT_ROOT}/$1 !-f
RewriteCond %{DOCUMENT_ROOT}/$1 !-d
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -d
RewriteRule ^(.*) /archive/$1


RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
</IfModule>

# $Id: .htaccess,v 1.81.2.4 2008/01/22 09:01:39 drumm Exp $

The problem is:

If I visit /abc.php and abc.php is physically located in /archive (there is NO physical /abc.php; it is just a URL generated by Drupal's URL path feature), I get a 404 error.

The same happens if I visit /directory and there is a physical /archive/directory folder.

If I visit /abc or /abc.php and there is no /archive/abc or /archive/abc.php, then it works.

Here is the demo site:

<snip>

For example:

Visiting /mission.php gives 404 (as /archive/mission.php exists)
But /links.php is okay

Visiting /members gives 404 (as /archive/members exists)
But /projects is okay

If I change the rewrite rule order so that it is

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1

RewriteCond %{DOCUMENT_ROOT}/$1 !-f
RewriteCond %{DOCUMENT_ROOT}/$1 !-d
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -d
RewriteRule ^(.*) /archive/$1 [L,QSA]

Then /members/ and URLs like it work. However, a visit to /aborig_new/ gives a 404 error, when it is supposed to be rewritten to /archive/aborig_new/

How can I fix this?

Many thanks,

[edited by: jdMorgan at 1:27 pm (utc) on April 27, 2008]
[edit reason] No URLs, please. See Terms of Service. [/edit]

jdMorgan

1:36 pm on Apr 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please post the relevant contents of your server error log; The information in the error log is often very specific, and will help to clarify the cause of this problem. Make a series of test requests, then correlate the error log entries based on time and your own IP address. Completely flush your browser cache (Temporary Internet Files in Internet Explorer) before each test.

Reversing the rules won't work, simply because the first rule will then match *all* of the URLs that the second rule might have matched -- The second rule is more specific, in that it requires the requested URL-path to exist in the /archive directory, whereas the first rule imposes no such requirement.

You may also wish to review the mod_rewrite documentation, specifically that for the RewriteCond directive used with the -f and -d flags. Doing so will allow you to better understand the code you're using -- a good thing, since we may be able to *help* you here, but it's ultimately going to be up to you to fix this problem.

Jim
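
To illustrate the point about rule order above, here is the reversed rule set from the question with annotations. This is only a reading of the posted rules, not tested configuration. The example path /aborig_new/ comes from the question, and it is assumed to exist only under /archive.

```apache
# Reversed order, annotated for a request to /aborig_new/
# (assumption: only /archive/aborig_new/ exists on disk).

# /aborig_new/ is neither a file nor a directory in the document root, so
# both conditions are true and the request is rewritten to index.php here.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1

# Every path this rule could match is also non-existent in the root, so it
# has already been claimed by the rule above. This rule never gets the
# chance to rewrite the original path to /archive.
RewriteCond %{DOCUMENT_ROOT}/$1 !-f
RewriteCond %{DOCUMENT_ROOT}/$1 !-d
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -d
RewriteRule ^(.*) /archive/$1 [L,QSA]
```

The more specific rule (the one that requires the path to exist in /archive) has to come before the catch-all.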

tntpower

6:08 pm on Apr 27, 2008 (gmt 0)

10+ Year Member



Thank you.

The log message is pretty simple:

154.xx.53.14 - - [27/Apr/2008:18:01:10 +0000] "GET /mission.php HTTP/1.1" 404 3661
154.xx.53.14 - - [27/Apr/2008:18:00:30 +0000] "GET /events/workshops HTTP/1.1" 404 3964

... ...

[edited by: tedster at 1:24 am (utc) on July 16, 2008]
[edit reason] anonymize the IP address [/edit]

jdMorgan

9:01 pm on Apr 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Please post the relevant contents of your server error log
The info you posted from your access log is useful, but not nearly as useful as that from the error log.

Jim

tntpower

9:50 pm on Apr 27, 2008 (gmt 0)

10+ Year Member



>The info you posted from your access log is useful, but not nearly as useful as that from the error log.

I purposely visited a URL that returns a 404 error. The info is logged in the access log, not the error log.

Here are two examples:
154.xx.53.14 - - [27/Apr/2008:21:37:40 +0000] "GET /members/ HTTP/1.1" 404 3661
154.xx.53.14 - - [27/Apr/2008:21:41:11 +0000] "GET /themes/fisheries/img/menu_projects_ovr.gif HTTP/1.1" 200 1037
154.xx.53.14 - - [27/Apr/2008:21:41:12 +0000] "GET /projects/ HTTP/1.1" 200 12399

The log setting for this website is:

ErrorLog "|/usr/sbin/rotatelogs /srv/log/fisheries/error_log.%Y-%m-%d 86400"
CustomLog "|/usr/sbin/rotatelogs /srv/log/fisheries/access_log.%Y-%m-%d 86400" common

I have no idea why the 404 is logged in the access log. Shouldn't it be in the error log?

Thanks,

M.

[edited by: jdMorgan at 10:24 pm (utc) on April 27, 2008]
[edit reason] obscured specifics per TOS. [/edit]

jdMorgan

10:22 pm on Apr 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The contents of the error log should be something like this:
[Sat Apr 19 21:10:05 2008] [error] [client 72.xx.252.136] File does not exist: /u/web/fisheries/<some-rewritten-file-path-which-is-probably-incorrect>.

Note that it is a filepath, and not a URL, which is why we need to see it.

If you're not getting 404 errors logged, then contact your host and ask them to fix it; Chasing a problem with RewriteCond -f/-d is going to be difficult if we cannot see the filepath that the server is trying to access.

Jim
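
If you have access to the server or virtual-host configuration, mod_rewrite's own log is another way to see the filepaths being tested. A minimal sketch, assuming Apache 2.2 or earlier. The file name rewrite_log is hypothetical; the directory is borrowed from the log settings posted above.

```apache
# Goes in httpd.conf or inside the <VirtualHost> block, not in .htaccess.
# Apache 1.3 / 2.0 / 2.2: dedicated rewrite log.
# "rewrite_log" is a made-up file name for this example.
RewriteLog "/srv/log/fisheries/rewrite_log"
RewriteLogLevel 3

# Apache 2.4 and later removed RewriteLog. Per-module LogLevel is used
# instead, and the trace lines are written to the regular ErrorLog:
# LogLevel warn rewrite:trace3
```

Higher log levels slow the server down noticeably, so this is best switched off again once the test requests have been made.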

tntpower

10:34 pm on Apr 27, 2008 (gmt 0)

10+ Year Member



Jim,

Thanks for the reply.

I have no idea why the error log did not log it. It is a dedicated server and I have root access to it.

I double-checked httpd.conf, and it seems okay:


... ...
<VirtualHost *:80>
DocumentRoot "/srv/www/fisheries"
ServerName fisheries.example.com
ErrorLog "|/usr/sbin/rotatelogs /srv/log/fisheries/error_log.%Y-%m-%d 86400"
CustomLog "|/usr/sbin/rotatelogs /srv/log/fisheries/access_log.%Y-%m-%d 86400" common
<Directory "/srv/www/fisheries">
AllowOverride All
allow from all
Options +Indexes
</Directory>
</VirtualHost>
... ...

I do not have today's error log, but I do have yesterday's. It basically just contains errors caused by my own mistakes in .htaccess (spelling errors).

Thanks,

Ming

tntpower

10:48 pm on Apr 27, 2008 (gmt 0)

10+ Year Member



I think I found the reason why it is not in the error log.

// Menu status constants are integers; page content is a string.
if (is_int($return)) {
  switch ($return) {
    case MENU_NOT_FOUND:
      drupal_not_found();
      //echo "You are going to visit an archived page:";
      //echo "http://fisheries.example.com/archive";
      //echo $_SERVER['REQUEST_URI'];
      //$url=" [fishers.example.com...]
      //header ("Location: $url");
      break;
    case MENU_ACCESS_DENIED:
      drupal_access_denied();
      break;
    case MENU_SITE_OFFLINE:
      drupal_site_offline();
      break;
  }
}
elseif (isset($return)) {
  // Print any value (including an empty string) except NULL or undefined:
  print theme('page', $return);
}

It seems that Drupal uses index.php as a handler even when there is a 404 error. In other words, Drupal treats a 404 visit as a valid visit instead of an error.

jdMorgan

12:48 am on Apr 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, but I assume that Drupal only gets control if you successfully rewrite a URL to Drupal.

Here is how I would re-write your code, to save up to two unnecessary 'exists' checks:


# If requested resource exists as a file or directory, skip next two rules
RewriteCond %{DOCUMENT_ROOT}/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/$1 -d
RewriteRule (.*) - [S=2]
#
# Requested resource does not exist, do rewrite if it exists in /archive
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -d
RewriteRule (.*) /archive/$1 [L]
#
# Else rewrite requests for non-existent resources to /index.php
RewriteRule (.*) /index.php?q=$1 [L]

These rules should work if the correct and complete path to an archived page is "/srv/www/fisheries/archive/<page-here>"

If that is not the correct and complete server path, then the code won't work as shown.

I would strongly suggest that you never use a script to handle errors. As shown in this case, it can lead to problems which are impossible to debug. I recommend using only static HTML custom error pages to handle errors. For now, I suggest commenting-out the ErrorDocument 404 directive, so that the server will use its default error document instead.

Jim
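
The ErrorDocument advice above might look like the following in .htaccess. A sketch only: /404.html is a hypothetical static page that would have to be created in the document root.

```apache
# Before: 404 errors handed to a script (from the posted .htaccess)
# ErrorDocument 404 /index.php

# After: a static page. "/404.html" is a made-up file name.
ErrorDocument 404 /404.html
```

Keep the target a local URL-path starting with a slash. If a full http:// URL is given, Apache sends the client a redirect and the original 404 status is lost.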

tntpower

1:17 am on Apr 28, 2008 (gmt 0)

10+ Year Member



Hi jdMorgan,

What a miracle! I really, really appreciate your help!

It's magic!

Thanks again!

Ming

tntpower

2:14 am on Jul 14, 2008 (gmt 0)

10+ Year Member



First, thank you for your help in this thread.

I still have some trouble with this. The problem is:

If I create a Drupal node, say "test", then a visit to mysite/test goes to mysite/archive/test IF /archive/test exists, instead of rendering the Drupal node.

Could you please help?

Thank you very much,

jdMorgan

2:56 am on Jul 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can create exceptions by adding another [OR]ed RewriteCond to the first rule:

RewriteCond %{REQUEST_URI} ^/test [OR]

Alternate form specifically for the code posted above:

RewriteCond $1 ^test [OR]

Jim

tntpower

3:51 am on Jul 14, 2008 (gmt 0)

10+ Year Member



Do you mean:

RewriteCond %{REQUEST_URI} ^/test [OR]
RewriteCond %{DOCUMENT_ROOT}/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/$1 -d
RewriteRule (.*) - [S=2]

I tried this, but with no luck.

Could you please tell us what "/test" means?

Thanks,

Ming.

jdMorgan

2:46 pm on Jul 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"/test" means the URL-path of "mysite/test" in your example preceding my post.

Jim

tntpower

7:10 am on Jul 15, 2008 (gmt 0)

10+ Year Member



jdMorgan, Thank you for your help. Many thanks.

But the problem still exists:

<code>
<IfModule mod_rewrite.c>
RewriteEngine on
# If requested resource exists as a file or directory, skip next two rules
RewriteCond %{REQUEST_URI} ^/archive [OR]
RewriteCond %{DOCUMENT_ROOT}/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/$1 -d
RewriteRule (.*) - [S=2]
#
# Requested resource does not exist, do rewrite if it exists in /archive
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -d
RewriteRule (.*) /archive/$1 [L]
#
# Else rewrite requests for non-existent resources to /index.php
RewriteRule (.*) /index.php?q=$1 [L]

</IfModule>
</code>

When I visit example.com/about, if there is a folder "about" in /archive, the visit will be redirected to example.com/archive/about, instead of rendering example.com/about (a Drupal node)

Ming

jdMorgan

3:04 pm on Jul 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Look at the code -- It is checking REQUEST_URI, not REQUESTED_DIRECTORY. Do not confuse URLs with filenames and directories.

I suspect you need to change

RewriteCond %{REQUEST_URI} ^/archive [OR]
to
RewriteCond %{REQUEST_URI} ^/about [OR]

Jim

tntpower

4:01 pm on Jul 15, 2008 (gmt 0)

10+ Year Member



It now gives an HTTP 404 error.

(I have /archive/about folder, I also have a Drupal node with the URL path example.com/about)

jdMorgan

8:12 pm on Jul 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm sorry, I don't use Drupal, and have no idea what a "Drupal node" is.

I would advise you to name nodes and archive subdirectories so that conflicts do not occur. If you cannot put into words a URL-based method to determine whether the node or the archive should be accessed by a particular URL, then it will also be impossible to code a mod_rewrite solution -- Mod_rewrite works on URL-patterns; If you can define a URL pattern to unambiguously 'map' URLs to server filepaths, then it will work.

Jim
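
One possible reading of the 404 reported above, not confirmed by either poster: [S=2] skips both of the following rules, so a request that matches the ^/about exception skips the /archive rewrite and also the index.php rewrite, and Drupal never sees it. A sketch of an alternative, under that assumption, puts the exception on the /archive rule instead. The !^about(/|$) pattern is only an illustration for the example path in this thread, and the QSA flag is carried over from the stock Drupal rule in the first post.

```apache
# Sketch only. "about" is the example path from the thread; the negated
# condition below is an assumption, not something posted by either participant.

# If requested resource exists as a file or directory, skip next two rules
RewriteCond %{DOCUMENT_ROOT}/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/$1 -d
RewriteRule (.*) - [S=2]
#
# Rewrite to /archive only if the path is NOT reserved for a Drupal node
# and the resource exists in /archive. A condition without [OR] is ANDed
# with the [OR] group that follows it.
RewriteCond $1 !^about(/|$)
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/archive/$1 -d
RewriteRule (.*) /archive/$1 [L]
#
# Else hand the request to Drupal, keeping any query string
RewriteRule (.*) /index.php?q=$1 [L,QSA]
```

With this arrangement a request for /about falls through the /archive rule and reaches index.php, while paths that exist only under /archive are still rewritten there. Each reserved node path needs its own pattern, which is the same URL-pattern requirement described in the post above.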