htaccess and Relative Path for ErrorDocs

Steven Davis

4:54 pm on May 6, 2010 (gmt 0)

10+ Year Member



My htaccess file looks like the example below. I would like to change the ErrorDocuments to work as follows:

ErrorDocument 404 /
ErrorDocument 500 /

I'd like to do this because I have heard that external links back to one's own site are a bad idea. The problem is that when I use the relative path "/" in my ErrorDocuments, I get a 500 Internal Server Error. Is there something in how the rules above the ErrorDocument directives are written that might be causing this?


Options +FollowSymLinks -MultiViews
RewriteEngine on
#
### Disallow Image Hotlinking
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://www\.example\.com
RewriteRule \.(jpe?g¦gif¦bmp¦png)$ - [F]
#
### Externally redirect to remove ".php" if the user adds it
RewriteCond %{THE_REQUEST} ^GET\ /([^/]+/)*[^.]+\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.php$ http://www.example.com/$1 [R=301,L]
#
### Externally redirect to remove double slashes
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . http://www.example.com/%1/%2 [R=301,L]
#
### Externally redirect to remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
### Externally redirect non-canonical hostname requests to canonical
### domain (if not already done by one of the preceding rules)
RewriteCond %{HTTP_HOST} !=www.example.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
### Internally rewrite requests for URLs which do not resolve
### to physically-existing files or directories to a PHP file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ $1.php [L,QSA]

ErrorDocument 404 http://www.example.com
ErrorDocument 500 http://www.example.com


Any Assistance Would Be Greatly Appreciated.

g1smd

6:33 pm on May 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



ErrorDocument 404 /


ErrorDocument directives must contain a filename. The code above does not contain a filename.

ErrorDocument 404 http://www.example.com


ErrorDocument directives must NOT contain a protocol or domain name. Your code above contains only protocol and domain name and therefore sends a 302 redirect instead of the correct 404 response.
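
To make the distinction concrete, here is a minimal Python sketch of how an ErrorDocument target is interpreted. It is a simplified model, not Apache's actual parsing code, and the function name error_document_kind is hypothetical: a value beginning with a slash is served internally (so the client still gets the 404), while a full URL makes Apache send the client a redirect instead.

```python
from urllib.parse import urlsplit


def error_document_kind(target: str) -> str:
    """Roughly classify an ErrorDocument target (simplified model)."""
    if target.startswith("/"):
        # Local URL-path (relative to DocumentRoot): served internally,
        # so the client still receives the original error status.
        return "local"
    if urlsplit(target).scheme in ("http", "https"):
        # Full URL: Apache sends the client a redirect instead, so the
        # original 404/500 status never reaches the client.
        return "redirect"
    # Anything else is treated as a text message (simplified).
    return "message"


print(error_document_kind("/404error.html"))          # local
print(error_document_kind("http://www.example.com"))  # redirect
```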

I must absolutely caution against showing your home page when there is an error. Build a specific error page for these errors and make sure it explains there was an error and that it links to major content sections on your site.

ErrorDocument 404 /page.html


This is the correct syntax.

Steven Davis

7:47 pm on May 6, 2010 (gmt 0)

10+ Year Member



@g1smd

Yes, I agree it should use an actual error page, but the problem is that anything other than an external redirect does not work. In other words, when I try the following

ErrorDocument 404 /404.html
ErrorDocument 500 /500.html

I receive a generic 500 Internal Server Error every time a page cannot be found, and I was wondering whether the preceding rules in my htaccess file could be overriding my ability to set up an error page.

jdMorgan

11:06 pm on May 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What is the content of your error log file when you get an error?

Jim

Steven Davis

11:21 pm on May 6, 2010 (gmt 0)

10+ Year Member



Could it be that my error pages are not working because the server is trying to serve the files from the wrong directory?

For instance if the missing file comes from:

http://www.example.com/articles/no-file

and if my error directives read:

ErrorDocument 404 /error
ErrorDocument 500 /error

Does the server try to resolve:
http://www.example.com/error or...
http://www.example.com/articles/error

and if the latter is the case, how do you fix this (RewriteBase /?)

Note: My htaccess file resides in the root directory

jdMorgan

11:35 pm on May 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



 ErrorDocument 404 /error 

serves http://www.example.com/error regardless of the requested URL.

Mod_rewrite's RewriteBase directive has no effect on ErrorDocument, which is an Apache core directive.

"/error" is not a valid target here. It should be the local path of an actual file (not a full URL), and it needs an extension if you want it to be served with a correct Content-Type response header.

 ErrorDocument 404 /404error.html 

would be valid
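
As a rough analogy for why the extension matters: Python's standard mimetypes module maps filename extensions to content types much as Apache's mod_mime does with its own mime.types file. This sketch only illustrates the idea; the helper name content_type_for is hypothetical and Apache does not use this code.

```python
import mimetypes


def content_type_for(path: str):
    """Guess a Content-Type from the filename extension (None if unknown)."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type


print(content_type_for("/404error.html"))  # text/html
print(content_type_for("/error"))          # None
```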

Again, what is the content of your error log file when you get an error?

Jim

Steven Davis

12:50 am on May 7, 2010 (gmt 0)

10+ Year Member



Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.

jdMorgan

1:40 am on May 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That clarifies things a bit.

Note that your final rule rewrites any and all requests for URLs which do not resolve to existing files or directories to .php scripts, regardless of the URL-path in the request. So if I request /foo/bar/wham from your server, it will be rewritten to /foo/bar/wham.php, which likely does not exist either, and will therefore be rewritten to /foo/bar/wham.php.php, which doesn't exist, and so on, ad infinitum... 'til the server gives up and throws a 500 Error.
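
The loop can be sketched as a toy Python model, assuming a set of existing paths stands in for the !-f and !-d checks and that each rewrite triggers another pass through the rules, up to the default LimitInternalRecursion of 10. This is an illustration of the behaviour described above, not Apache's implementation; rewrite_chain is a hypothetical name.

```python
from typing import List, Set

# Toy model of the original last rule (not Apache's implementation):
#   RewriteCond %{REQUEST_FILENAME} !-f
#   RewriteCond %{REQUEST_FILENAME} !-d
#   RewriteRule ^(.*)$ $1.php [L,QSA]
# In .htaccess context each rewrite starts another pass through the rules,
# so the rule keeps firing for as long as the rewritten path is missing.

MAX_INTERNAL_REDIRECTS = 10  # default value of LimitInternalRecursion


def rewrite_chain(path: str, existing: Set[str]) -> List[str]:
    """Return each path the request passes through; raise once the
    internal-redirect limit is exceeded (Apache answers with a 500)."""
    chain = [path]
    while path not in existing:  # stands in for the !-f and !-d checks
        if len(chain) > MAX_INTERNAL_REDIRECTS:
            raise RuntimeError("exceeded the limit of 10 internal redirects")
        path += ".php"
        chain.append(path)
    return chain


print(rewrite_chain("/about", {"/about.php"}))  # ['/about', '/about.php']
try:
    rewrite_chain("/foo/bar/wham", set())
except RuntimeError as err:
    print("500:", err)
```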

This means several things:

First, your php script(s) must be written to handle any and all missing objects on your server, because all missing object requests are being rewritten to it, and the Apache 404 handler will never be invoked.

Additionally, the conditions under which requests are rewritten to your script must be specified more precisely, by deciding "what kind" of URLs you really want to rewrite to .php. You have to define "what kind of URLs" yourself in terms of their characteristics: common 'directory paths' only, certain "filetypes" only, URLs with no "filetypes" specified (i.e. extensionless URLs), etc. Your definition must be expressible as regular-expression patterns for unambiguous matching.
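
For instance, if you decided that only extensionless URLs should go to your script, the whole test fits in one regular expression. Here is a minimal Python sketch of that idea; the pattern and the function name should_rewrite_to_php are illustrative assumptions. In .htaccess the pattern would sit in the RewriteRule, which sees the URL-path without its leading slash, so the examples omit it too.

```python
import re

# Hypothetical choice: only extensionless URL-paths (no dot in the last
# path segment) are handed to PHP. Paths are written without a leading
# slash, as a RewriteRule pattern sees them in .htaccess context.
EXTENSIONLESS = re.compile(r"^([^/]+/)*[^./]+$")


def should_rewrite_to_php(path: str) -> bool:
    """True if the URL-path belongs to the chosen 'kind' of URL."""
    return EXTENSIONLESS.match(path) is not None


for p in ("articles/no-file", "images/logo.png", "404error.html"):
    print(p, should_rewrite_to_php(p))
```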

Alternatively, you could add a third RewriteCond to make sure that if "x" is requested, and "x" does not resolve to a file or directory AND if "x.php" does exist, then rewrite to it -- else let the request trigger the standard Apache 404-Not Found error handler.

This last option is the worst, performance-wise: each check calls the OS to invoke the file-handler to go check the hard drive to see if something exists, which costs a lot of CPU time, and you will now be doing this three times in a row for every HTTP access to your server. Therefore, I encourage you to use the second method if at all possible. However, since you're stuck now and likely won't make any progress until the loop is fixed, here is an example of how to code the "php file exists" test.

# If requested filepath exists when ".php" is appended to it
RewriteCond %{REQUEST_FILENAME}.php -f

Adding that as the third RewriteCond in your last rule will likely stop the loop.
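
In the same toy-model style as before (a set of existing paths stands in for the disk checks; resolve is a hypothetical name, not anything in Apache), the extra condition means the rule fires at most once, and a request with no matching .php file falls through to the normal 404 handling:

```python
from typing import Optional, Set, Tuple


def resolve(path: str, existing: Set[str]) -> Tuple[Optional[str], int]:
    """Toy model of the last rule with the extra -f check on path + '.php'."""
    if path in existing:
        # File or directory exists: the !-f / !-d conditions fail,
        # so the request is served as-is.
        return path, 200
    if path + ".php" in existing:
        # RewriteCond %{REQUEST_FILENAME}.php -f succeeds: rewrite once.
        # On the next pass the .php file exists, so the rule stops.
        return path + ".php", 200
    # No rewrite: Apache's normal 404 handling (and ErrorDocument 404)
    # takes over instead of looping.
    return None, 404


print(resolve("/about", {"/about.php"}))  # ('/about.php', 200)
print(resolve("/foo/bar/wham", set()))    # (None, 404)
```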

I also suggest that you add at least a few qualifiers to the rule. For example, do your scripts actually handle image, css, and JS file requests -- do your scripts generate objects of this type? Probably not, in which case you can eliminate 90% of these wasteful disk checks by adding this as the first RewriteCond on your last rule:

RewriteCond $1 !\.(gif|jpe?g|png|ico|css|js|php)$

Note that any/all filetypes that will exist on your site but that your scripts *cannot* generate (create) can be added to that "list," while those that you don't use or that your script can generate should be left out. Each filetype you add reduces the number of wasted disk checks, but eventually you will reach a point of diminishing returns -- there is no requirement to include them all. Since 90% of all requests to a typical site's server are for images, they should generally be listed first, followed by other filetypes ordered by frequency of requests (check your stats).
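
A small Python sketch of what that exclusion list does, using the same pattern as the suggested RewriteCond (needs_disk_checks is a hypothetical helper name). Paths matching the list skip the -f, -d and .php existence checks altogether.

```python
import re

# Same pattern as: RewriteCond $1 !\.(gif|jpe?g|png|ico|css|js|php)$
EXCLUDED = re.compile(r"\.(gif|jpe?g|png|ico|css|js|php)$")


def needs_disk_checks(path: str) -> bool:
    """True if the request would still go through the existence checks."""
    return EXCLUDED.search(path) is None


print(needs_disk_checks("images/logo.png"))   # False: excluded
print(needs_disk_checks("articles/no-file"))  # True: still checked
```

Like a RewriteCond without the [NC] flag, this match is case-sensitive, so a request for LOGO.PNG would still go through the disk checks.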

Jim

Steven Davis

11:21 am on May 7, 2010 (gmt 0)

10+ Year Member



Thank You So Much for all Your Help Jim, so the last Rule should look like the following:

RewriteCond $1 !\.(gif|jpe?g|png|ico|css|js|php)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.*)$ $1.php [L,QSA]

jdMorgan

1:16 pm on May 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, but with a few more tweaks for efficiency, consistency, robustness, and security:

ErrorDocument 404 /404error.html
ErrorDocument 500 /500error.html
#
Options +FollowSymLinks -MultiViews
RewriteEngine on
#
# Skip all of the following rules to allow serving
# critical custom error documents to all requestors
RewriteRule ^(403|500)error\.html$ - [L]
#
# Disallow hotlinking
RewriteCond %{HTTP_REFERER} !^(http://www\.example\.com.*)?$
RewriteRule \.(gif|jpe?g|png|bmp|css|js)$ - [F]
#
# Externally redirect to remove ".php" from direct client requests
RewriteCond %{THE_REQUEST} ^GET\ /([^/]+/)*[^.]+\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.php$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to remove double (or multiple) slashes in requested URL-path
RewriteCond %{REQUEST_URI} ^(.*)//+(.*)$
RewriteRule / http://www.example.com/%1/%2 [R=301,L]
#
# Externally redirect to remove trailing slash if the requested
# URL-path does not resolve to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect non-blank non-canonical hostname requests to
# canonical domain (if not already done by one of the preceding rules)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# Internally rewrite requests for URLs which do not resolve
# to physically-existing files or directories to a PHP file
# (Note that images, other included objects, and php scripts
# are excluded to avoid excessive -exists checking)
RewriteCond $1 !\.(gif|jpe?g|png|bmp|ico|css|js|php)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.*)$ /$1.php [L]
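
The rewritten anti-hotlinking condition folds the original two conditions (non-blank referer AND not your own site) into one pattern: the optional group also matches an empty string, so blank referers are allowed. A Python sketch of just that condition (is_blocked is a hypothetical name; the real rule additionally applies only to requests ending in gif, jpe?g, png, bmp, css or js):

```python
import re

# Mirrors: RewriteCond %{HTTP_REFERER} !^(http://www\.example\.com.*)?$
ALLOWED_REFERER = re.compile(r"^(http://www\.example\.com.*)?$")


def is_blocked(referer: str) -> bool:
    """True if a matching image/css/js request with this Referer gets a 403."""
    return ALLOWED_REFERER.match(referer) is None


print(is_blocked(""))                                  # False: blank allowed
print(is_blocked("http://www.example.com/articles/"))  # False: own site
print(is_blocked("http://hotlinker.example.net/"))     # True: blocked
```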

Be aware that "pipe characters" copied from old posts here on WebmasterWorld are broken and must be replaced with solid pipes before use. While the solid pipe characters function as a "local OR" operator in regular expressions patterns, the broken pipe characters are taken as literal characters to be matched. Your previous anti-hotlinking rule likely did not work because of this.
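
The difference is easy to check outside Apache. In this Python sketch the original hotlink pattern is built with U+00A6 (broken bar) as it appears in the copied rule, and compared with the same pattern using the real pipe, U+007C:

```python
import re

# U+00A6 BROKEN BAR is an ordinary literal character in a regex;
# only U+007C VERTICAL LINE means "or".
broken = re.compile("\\.(jpe?g\u00a6gif\u00a6bmp\u00a6png)$")
solid = re.compile(r"\.(jpe?g|gif|bmp|png)$")

print(bool(broken.search("logo.png")))  # False: never matches real images
print(bool(solid.search("logo.png")))   # True
```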

With the added exclusions on the last rule, you may see a noticeable improvement in the performance of your site -- especially when it is very busy.

The leading slash on the substitution in the last rule is strongly recommended for security reasons. Some sites won't work with that leading slash in place, but it is a risk to allow the client to control the initial part of the URL-path, as just about *anything* could then be injected there. As a result, a good rule of thumb is to never allow a RewriteRule substitution to start with a "naked" back-reference to a client-controlled path-part.

The [QSA] flag was removed because it is only needed when the substitution adds new query string data and you want any existing query string from the client request appended to it. Since you are not adding query data with this rule, the client's query string is passed through unchanged anyway, and the [QSA] was just a waste of bytes and time.
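
A simplified Python model of that query-string behaviour, for illustration only (apply_substitution and the example paths /about.php and /index.php?page=about are hypothetical, and edge cases such as a bare trailing "?" are not modelled):

```python
from typing import Tuple


def apply_substitution(substitution: str, original_qs: str, qsa: bool) -> Tuple[str, str]:
    """Simplified model of how RewriteRule handles the query string."""
    path, sep, new_qs = substitution.partition("?")
    if not sep:
        # No "?" in the substitution: the client's original query string
        # is passed through unchanged, whether or not [QSA] is set.
        return path, original_qs
    if qsa:
        # [QSA]: keep the new query data and append the original after it.
        return path, "&".join(part for part in (new_qs, original_qs) if part)
    # Without [QSA], the new query string replaces the original.
    return path, new_qs


print(apply_substitution("/about.php", "id=5", qsa=False))
print(apply_substitution("/index.php?page=about", "id=5", qsa=True))
```

The first call shows why [QSA] adds nothing to a rule like yours: with no "?" in the substitution, id=5 survives either way.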

I moved the ErrorDocument directives to the top. While this is largely a matter of 'style,' I did it because some of the following rules handle error conditions, and it's good for a code reviewer (e.g. you or me) to know/be reminded of the ErrorDocument filenames before running into the mod_rewrite code. Moving these directives changes nothing about how your server will operate.

That's all I could spot right now, and only you can decide if this code is "right" for your site. Test it thoroughly! Always delete your browser cache before testing new code, and after changing "roles" when testing anti-hotlinking routines. Otherwise, your browser will likely show you stale previously-cached objects and server responses, invalidating your test results.

Jim

Steven Davis

3:28 pm on May 7, 2010 (gmt 0)

10+ Year Member



Jim, thanks a lot for walking me through all of that. Everything is testing great and I've learned quite a bit. The great thing about htaccess commands is that you can do so many things, but therein lies their curse as well.