Forum Moderators: phranque


Clean URLs breaking custom 404 page

404s display the words "File not found" but not the custom page


JWJonline

9:27 am on Apr 11, 2021 (gmt 0)

10+ Year Member



Some time ago I decided to make all of my URLs 'clean' and added some appropriate code to my htaccess file. I've only just discovered that my custom 404 error page isn't being displayed when I get a "File not found" condition. After a lot of digging into the problem I've found that the custom 404 works perfectly if I remove the clean-URL code from htaccess. I wondered if the order of the rules in htaccess might be the cause, but I've moved things around quite a lot and the result is always the same. I've also found that if a misspelling is in the suffix of the URL, the custom 404 works fine, but if it's the page name that's misspelt then it doesn't. In other words /paXge or /paXge.php display "file not found" but not the custom 404 page, whereas /page.phXp displays the 404 page correctly. Here is an extract from my htaccess ....
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.net/$1 [R=301,L]

DirectoryIndex index.php

Options +FollowSymLinks

Options -Indexes

Options -MultiViews

AddHandler application/x-httpd-php5 .html .htm
AddHandler server-parsed .htm

ErrorDocument 404 /404.php

RewriteEngine on
RewriteCond %{REQUEST_URI} !^/JWJforum/$
RewriteRule ^([A-Za-z0-9-]+)\/?$ $1.php [NC]


I'd be very grateful for any suggestions as to what might be wrong.
Thank you

[edited by: phranque at 10:45 am (utc) on Apr 11, 2021]
[edit reason] please use example.net [/edit]

phranque

10:49 am on Apr 11, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



which part is the "clean url code" that you removed?

JWJonline

12:26 pm on Apr 11, 2021 (gmt 0)

10+ Year Member



The last 3 lines.

not2easy

12:48 pm on Apr 11, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The https/canonical rewrite rule should be the last rule in your file. Rules that deal with a single URL at a time (such as the clean URL rule) should be before that rule. The error document and Options should not be in the middle of the rewrite rules.

I'm off to see if I can find the thread here that explains .htaccess order in detail - brb...

not2easy

1:10 pm on Apr 11, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



from 2010: Proper Order for htaccess
[webmasterworld.com...]

from 2014: Correct order of .htaccess ?
[webmasterworld.com...]

and 2015: htaccess code order makes a difference?
[webmasterworld.com...]

lucy24

4:28 pm on Apr 11, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{REQUEST_URI} !^/JWJforum/$
RewriteRule ^([A-Za-z0-9-]+)\/?$ $1.php [NC]
It’s been a while since we indulged in a really good chorus of

I really hate this damned machine
I wish that they would sell it
It never does quite what I want
But only what I tell it.


The rule says: If there is a request for any extensionless root-level file or directory except /JWJforum/ then rewrite to that-same-name with .php extension.

OK, and then what? Obviously you are not rewriting to a physical file, because there would have to be an infinite number of them on the server. I would have expected a target more like
/handler.php?url=$1
rewriting to a single php script which then takes action depending on what was requested. And if the request is something it hasn't been taught to handle, then the php file itself should return a 404 response and display the 404 page. (The page-displaying part has to be explicitly written into the php, since the ErrorDocument directive only kicks in when the 404 is generated by the server.)
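A minimal sketch of the single-script approach described above, assuming a hypothetical /handler.php that looks up the requested name (the script name is the one used in this post; it does not exist on the OP's site). The two conditions let real files and directories be served normally:

```apache
RewriteEngine on
# Hypothetical front controller: /handler.php is illustrative only.
# Requests for existing files or directories are served as usual.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Any other single-segment, extensionless URL goes to one script, which receives
# the requested name in the "url" parameter (QSA keeps any existing query string).
RewriteRule ^([A-Za-z0-9-]+)/?$ /handler.php?url=$1 [QSA,L]
```

For names it doesn't recognise, the script itself has to send the 404 status and output the 404 page, since Apache never sees a missing file in this setup.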

w3dk

7:24 pm on Apr 11, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



The https/canonical rewrite rule should be the last rule in your file. Rules that deal with a single URL at a time (such as the clean URL rule) should be before that rule. The error document and Options should not be in the middle of the rewrite rules.


"Redirects" yes; "rewrites" no. The HTTPS/canonical redirect can be after other "redirects". But the HTTPS/canonical "redirect" needs to be before the "clean URL rule" (an internal rewrite) - as is written in the OP. (And this would seem to be the order as stated in those linked threads.) Otherwise, you could end up redirecting to the rewritten URL, exposing the ".php" extension in this case.

It doesn't really matter where the ErrorDocument and Options directives go. But it would be more logical/readable to have them at the top.
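A sketch of that order, using directives from the OP's extract (other directives omitted, one of the mod_alias redirects converted to a RewriteRule as phranque suggests later in the thread, and the redundant port test dropped). This is illustrative, not a tested copy of the OP's file:

```apache
# Non-rewrite directives: their position does not affect mod_rewrite,
# so they are grouped at the top for readability.
DirectoryIndex index.php
Options +FollowSymLinks -Indexes -MultiViews
ErrorDocument 404 /404.php

RewriteEngine on
RewriteBase /

# 1. Specific external redirects first, each pointing straight at the
#    canonical https://www. URL (only one shown here).
RewriteRule ^fuchsia_cuttings\.php$ https://www.example.net/fuchsia-cuttings [R=301,L]

# 2. Then the general HTTPS/www canonical redirect.
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.net/$1 [R=301,L]

# 3. Internal rewrites last, so no redirect can expose the ".php" target.
RewriteCond %{REQUEST_URI} !^/JWJforum/$
RewriteRule ^([A-Za-z0-9-]+)/?$ $1.php [L]
```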

Obviously you are not rewriting to a physical file, because there would have to be an infinite number of them on the server.


Why not? If the ".php" file doesn't exist then you just get a 404. You do need to be a little careful, though, that the custom 404 doesn't tell the user "/does-not-exist.php" was not found when they actually requested "/does-not-exist".

In other words /paXge or /paXge.php display "file not found" but not the custom 404 page, whereas /page.phXp displays the 404 page correctly.


However, when you request "/paXge.php" your "clean URL code" does nothing (the regex does not match), so I can't see how it would affect this.

The problem seems to be ".php" requests... "/paXge" is rewritten to ".php" and "/paXge.php" is already a ".php" request.

How are ".php" files handled on your server? Are they being proxied?

Try setting "ErrorDocument 404 default" immediately before your custom ErrorDocument...


ErrorDocument 404 default
ErrorDocument 404 /404.php

lucy24

10:40 pm on Apr 11, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the ".php" file doesn't exist then you just get a 404.
Oddly enough, in my previous post I wasn't thinking about extensionless URLs, but of an all-purpose rewrite like a CMS would do. Oops. Yeah, if you have a finite number of pages, and those are the only ones you get requests for, then you can certainly rewrite to add/remove an extension.

But there still has to be something else going on. Otherwise the failed rewrite would eventually result in the 404 page.

Looking again at OP:
if a misspelling is in the suffix of the URL, the custom 404 works fine, but if it's the page name that's misspelt then it doesn't. In other words /paXge or /paXge.php display "file not found" but not the custom 404 page, whereas /page.phXp displays the 404 page correctly.
This is puzzling, because that final rule--the one that really shouldn't be the final rule, but that's a different issue--is written to only act on extensionless URLs. Is there another rule somewhere else that deals with php requests? Is there a point where requests get routed to a different directory, with a different htaccess?

w3dk

11:06 pm on Apr 11, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



Here is an extract from my htaccess ....


Ah, this isn't your complete .htaccess file?! And you don't have an "L" flag on that last rewrite....
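For reference, here is the OP's clean-URL rule with the L flag added and each part commented (a sketch; only the flag is new). In .htaccess, L ends the current pass only: the rewritten URL (e.g. /page.php) is run through the rules again, where it no longer matches this pattern because "." is not in the character class.

```apache
RewriteEngine on
# Leave the forum's directory URL "/JWJforum/" alone (exact match only).
RewriteCond %{REQUEST_URI} !^/JWJforum/$
# One path segment of letters, digits or hyphens, optionally followed by "/",
# is rewritten internally to the same name with ".php" appended.
# [L] stops the remaining rules from running in this pass.
RewriteRule ^([A-Za-z0-9-]+)/?$ $1.php [NC,L]
```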

JWJonline

9:45 am on Apr 12, 2021 (gmt 0)

10+ Year Member



Oh dear, I do apologise ... you guys might as well be talking a foreign language for how much I'm understanding. I only posted an extract of the file as I was concerned about posting too much irrelevant stuff but I can see now that the 'whole' needs to be considered. This is the entire file, and I've replaced my domain with 'example' this time.

# added 21/3/17 to redirect all http calls to https
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.net/$1 [R=301,L]

# 480 weeks
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|JPG|png|gif|js|css|swf)$">
Header set Cache-Control "max-age=290304000, public"
</FilesMatch>
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript application/javascript
RewriteEngine on
# Inserted 22/11/07
#This section rewrites calls to example.net/index.htm as example.net/
#having first set the default homepage as 'index.php'.
#It doesn't screw up the subdomains as 301 redirects do.
#This is a link to the original post in WebmasterWorld
# https://www.webmasterworld.com/forum92/6012.htm
#
DirectoryIndex index.php

Options +FollowSymLinks

#next line prevents directory listings of all folders
Options -Indexes

Options -MultiViews

AddHandler application/x-httpd-php5 .html .htm
AddHandler server-parsed .htm

# to set the 404 page to a php version rather than the shtml default
ErrorDocument 404 /404.php

#####################################################
# Make clean url's
#####################################################

RewriteEngine on
RewriteCond %{REQUEST_URI} !^/JWJforum/$
RewriteRule ^([A-Za-z0-9-]+)\/?$ $1.php [NC]

######################################################
#----all in this section due to 404's ---------#
######################################################

Redirect permanent /fuchsia_cuttings.php https://www.example.net/fuchsia-cuttings
Redirect permanent /fuchsia_care.php https://www.example.net/fuchsia-care
Redirect permanent /fuchsia_overwinter.php https://www.example.net/fuchsia-overwinter
Redirect permanent /pencil.php https://www.example.net/pencil3
Redirect permanent /Art-Forum.php https://www.example.net/ArtForum
Redirect permanent /art.php https://www.example.net/galleries
Redirect permanent /wip-bridlifeboat.php https://www.example.net/WIP-BridLifeboat.php
Redirect permanent /penandwash.php https://www.example.net/penandwash2.php

# make SE friendly urls for php pages using variables
RewriteRule poems/(.*)/(.*)/$ /poems.php?$1=$2 [R=301,L]
RewriteRule poems/(.*)/(.*)$ /poems.php?$1=$2 [R=301,L]

# old style art pages with trailing slash
ReWriteRule watercolours1/painting/(.*)/$ https://www.example.net/watercolour1.php?autoload=$1 [R=301,L]
ReWriteRule watercolours2/painting/(.*)/$ https://www.example.net/watercolour1.php?autoload=$1 [R=301,L]
ReWriteRule watercolours3/painting/(.*)/$ https://www.example.net/watercolour2.php?autoload=$1 [R=301,L]
ReWriteRule watercolours4/painting/(.*)/$ https://www.example.net/watercolour2.php?autoload=$1 [R=301,L]
ReWriteRule watercolours5/painting/(.*)/$ https://www.example.net/watercolour3.php?autoload=$1 [R=301,L]
ReWriteRule watercolours6/painting/(.*)/$ https://www.example.net/watercolour3.php?autoload=$1 [R=301,L]
ReWriteRule watercolours7/painting/(.*)/$ https://www.example.net/watercolour4.php?autoload=$1 [R=301,L]
RewriteRule pencil1/painting/(.*)/$ https://www.example.net/pencil1.php?autoload=$1 [R=301,L]
RewriteRule pencil2/painting/(.*)/$ https://www.example.net/pencil1.php?autoload=$1 [R=301,L]
RewriteRule pencil3/painting/(.*)/$ https://www.example.net/pencil2.php?autoload=$1 [R=301,L]
RewriteRule pencil4/painting/(.*)/$ https://www.example.net/pencil2.php?autoload=$1 [R=301,L]
RewriteRule pen1/painting/(.*)/$ https://www.example.net/pen1.php?autoload=$1 [R=301,L]

# old style art pages WITHOUT trailing slash
ReWriteRule watercolours1/painting/(.*)$ https://www.example.net/watercolour1.php?autoload=$1 [R=301,L]
ReWriteRule watercolours2/painting/(.*)$ https://www.example.net/watercolour1.php?autoload=$1 [R=301,L]
ReWriteRule watercolours3/painting/(.*)$ https://www.example.net/watercolour2.php?autoload=$1 [R=301,L]
ReWriteRule watercolours4/painting/(.*)$ https://www.example.net/watercolour2.php?autoload=$1 [R=301,L]
ReWriteRule watercolours5/painting/(.*)$ https://www.example.net/watercolour3.php?autoload=$1 [R=301,L]
ReWriteRule watercolours6/painting/(.*)$ https://www.example.net/watercolour3.php?autoload=$1 [R=301,L]
ReWriteRule watercolours7/painting/(.*)$ https://www.example.net/watercolour4.php?autoload=$1 [R=301,L]
RewriteRule pencil1/painting/(.*)$ https://www.example.net/pencil1.php?autoload=$1 [R=301,L]
RewriteRule pencil2/painting/(.*)$ https://www.example.net/pencil1.php?autoload=$1 [R=301,L]
RewriteRule pencil3/painting/(.*)$ https://www.example.net/pencil2.php?autoload=$1 [R=301,L]
RewriteRule pencil4/painting/(.*)$ https://www.example.net/pencil2.php?autoload=$1 [R=301,L]
RewriteRule pen1/painting/(.*)$ https://www.example.net/pen1.php?autoload=$1 [R=301,L]

# new style art pages without trailing slash
ReWriteRule watercolour1/(.*)$ https://www.example.net/watercolour1.php?autoload=$1 [R=301,L]
ReWriteRule watercolour2/(.*)$ https://www.example.net/watercolour2.php?autoload=$1 [R=301,L]
ReWriteRule watercolour3/(.*)$ https://www.example.net/watercolour3.php?autoload=$1 [R=301,L]
ReWriteRule watercolour4/(.*)$ https://www.example.net/watercolour4.php?autoload=$1 [R=301,L]
RewriteRule pencil1/(.*)$ https://www.example.net/pencil1.php?autoload=$1 [R=301,L]
RewriteRule pencil2/(.*)$ https://www.example.net/pencil2.php?autoload=$1 [R=301,L]
RewriteRule pencil3/(.*)$ https://www.example.net/pencil3.php?autoload=$1 [R=301,L]
RewriteRule pencil4/(.*)$ https://www.example.net/pencil4.php?autoload=$1 [R=301,L]
RewriteRule pen1/(.*)$ https://www.example.net/pen1.php?autoload=$1 [R=301,L]
RewriteRule pen2/(.*)$ https://www.example.net/pen2.php?autoload=$1 [R=301,L]
RewriteRule penandwash1/(.*)$ https://www.example.net/penandwash1.php?autoload=$1 [R=301,L]
RewriteRule penandwash2/(.*)$ https://www.example.net/penandwash2.php?autoload=$1 [R=301,L]
RewriteRule othermedia/(.*)$ https://www.example.net/othermedia.php?autoload=$1 [R=301,L]

# catch all
RewriteRule watercolours.php /watercolour4.php [R=301,L]
RewriteRule watercolours1.php /watercolour1.php [R=301,L]
RewriteRule watercolours2.php /watercolour1.php [R=301,L]
RewriteRule watercolours3.php /watercolour2.php [R=301,L]
RewriteRule watercolours4.php /watercolour2.php [R=301,L]
RewriteRule watercolours5.php /watercolour3.php [R=301,L]
RewriteRule watercolours6.php /watercolour3.php [R=301,L]
RewriteRule watercolours7.php /watercolour4.php [R=301,L]
RewriteRule watercolours8.php /watercolour4.php [R=301,L]

# Block unwanted crawlers/spammers
order allow,deny
deny from 199.187.209.30
deny from 198.144.149.253
allow from all



@w3dk, inserting "ErrorDocument 404 default" makes no difference.

I'm going to check through those threads about the order of things and see if that makes any difference.
Thank you

phranque

10:34 am on Apr 12, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



this isn't necessarily related to the problem described in the OP, but these will cause other problems:

- this ruleset should follow all other rulesets that involve an external redirect:
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{SERVER_PORT} !^443$ [OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.net/$1 [R=301,L]


- also, it's not really necessary to test both the port number and the HTTPS flag in the ruleset above.
one or the other is sufficient.

- i would also suggest specifying a more precise pattern for the hostname test, such as:
RewriteCond %{HTTP_HOST} !^(www\.example\.net)?$ [NC,OR]




- if you are using mod_rewrite anywhere (which you must in your case) then you should not use mod_alias for any redirects.
for example, instead of this:
Redirect permanent /fuchsia_cuttings.php https://www.example.net/fuchsia-cuttings

you should use this:
RewriteRule ^fuchsia_cuttings\.php$ https://www.example.net/fuchsia-cuttings [R=301,L]


see this for why:
https://httpd.apache.org/docs/current/rewrite/avoid.html#redirect
The use of RewriteRule to perform this task may be appropriate if there are other RewriteRule directives in the same scope. This is because, when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first, regardless of the order of appearance in the configuration file.
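Putting these points together, the redirect section might look something like this (a sketch using the OP's own redirect targets, with the canonical redirect placed after the specific redirects):

```apache
RewriteEngine on

# Former "Redirect permanent" (mod_alias) lines as mod_rewrite redirects.
# The anchored patterns match the exact path only.
RewriteRule ^fuchsia_cuttings\.php$ https://www.example.net/fuchsia-cuttings [R=301,L]
RewriteRule ^fuchsia_care\.php$ https://www.example.net/fuchsia-care [R=301,L]
RewriteRule ^fuchsia_overwinter\.php$ https://www.example.net/fuchsia-overwinter [R=301,L]
RewriteRule ^pencil\.php$ https://www.example.net/pencil3 [R=301,L]
RewriteRule ^Art-Forum\.php$ https://www.example.net/ArtForum [R=301,L]
RewriteRule ^art\.php$ https://www.example.net/galleries [R=301,L]
RewriteRule ^wip-bridlifeboat\.php$ https://www.example.net/WIP-BridLifeboat.php [R=301,L]
RewriteRule ^penandwash\.php$ https://www.example.net/penandwash2.php [R=301,L]

# Canonical redirect after all the specific redirects.
# Any Host other than www.example.net is redirected; an empty Host header
# is left alone. Only the HTTPS flag is tested, not the port as well.
RewriteCond %{HTTP_HOST} !^(www\.example\.net)?$ [NC,OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.net/$1 [R=301,L]
```

One behavioural difference to be aware of: Redirect is a URL-path prefix match, so `Redirect permanent /art.php ...` would also catch /art.php/extra and append "/extra" to the target, whereas ^art\.php$ matches only /art.php itself.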

JWJonline

10:49 am on Apr 12, 2021 (gmt 0)

10+ Year Member



Thank you phranque, I'll get those things altered.

To add further information to the original issue, I have found that the problem is not related to the php extension specifically, but to the absence of any extension. In a URL like example.net/folder/page.suffix, a typo in folder or page or suffix will return my custom 404 page; however, with no suffix the custom 404 doesn't get displayed. I've tested this with php pages and jpg images.

I've also tried moving stuff around as best I can but nothing fixes the issue.

Further to this, please ignore what I said about other extensions .... that applied when I had the canonical and clean-url sections at the end of the file. With the file like that jpg extensions are now displaying the custom 404 as they should. Sorry for the confusion. I need a drink.

JWJonline

11:53 am on Apr 12, 2021 (gmt 0)

10+ Year Member



Trying to simplify the problem as much as possible, I've reduced my htaccess to just 3 essential parts and the problem still exists. Without a doubt, having a suffix is relevant. With a suffix, even just the dot, the 404 page works as it should. This is what I have now ...

# added 21/3/17 to redirect all http calls to https
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} !^(www\.example\.net)?$ [NC,OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.net/$1 [R=301,L]

# Make clean url's
RewriteEngine on
RewriteCond %{REQUEST_URI} !^/JWJforum/$
RewriteRule ^([A-Za-z0-9-]+)\/?$ $1.php [NC,L]

# to set the 404 page to a php version
ErrorDocument 404 default
ErrorDocument 404 /404.php

w3dk

12:38 pm on Apr 12, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



We seem to be back to square one...

I have found that the problem is not related to the php extension specifically, but to the absence of any extension. In a URL like example.net/folder/page.suffix, a typo in folder or page or suffix will return my custom 404 page; however, with no suffix the custom 404 doesn't get displayed. I've tested this with php pages and jpg images.


Curiously, when I test it with ".php" pages I get the basic "File not found." message and no custom 404.

This seems to have everything to do with the ".php" extension - as mentioned in my post above.

In summary, I see the following output when requesting...


"/foo" - rewritten to "/foo.php" - results in "File not found."
"/foo.php" - not rewritten - results in "File not found."
"/foo." - not rewritten - results in custom 404.
"/foo.anything" - not rewritten - results in custom 404.

"/JWJforum/foo.php" - not rewritten ("/JWJforum/" is explicitly excluded by the condition) - results in "File not found."
"/JWJforum/foo.anything" - not rewritten - results in custom 404.


If you remove the "make clean URLs" rule then I would expect the following


"/foo" - not rewritten - results in custom 404 ....?
"/foo.php" - not rewritten - results in "File not found." ....?


The common element here is the ".php" extension - either requested directly or rewritten to internally by your "make clean URL" rule.

There would seem to be something fundamental in the way ".php" files are processed on your server that results in this behaviour?

JWJonline

1:12 pm on Apr 12, 2021 (gmt 0)

10+ Year Member



Thank you w3dk. The behaviour after removing the "clean URL" rule is exactly what you expected it to be. So what we're saying here is that this ISN'T an htaccess issue but a server issue that I need to take up with my hosting service. Since I'm out of my depth here, is this issue a problem worth solving or is it just the cosmetics of not always getting a custom 404 displayed?

w3dk

6:27 pm on Apr 12, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



A slight amendment/clarification to my earlier post...



"/JWJforum/foo.php" - not rewritten ("/JWJforum/" is explicitly excluded by the condition) - results in "File not found."


The fact that the condition excludes "/JWJforum/" (only) is not why the request is not rewritten. That condition is actually irrelevant here - the request "/JWJforum/foo.php" simply does not match the `RewriteRule` pattern.


So what we're saying here is that this ISN'T an htaccess issue but a server issue that I need to take up with my hosting service.


Yes, it would seem so. How is PHP installed on your server?


is this issue a problem worth solving


To get an idea of numbers... you could check your server's "access log" to see how many 404s result from requests for non-existent ".php" files. And of those requests, how many are bots and how many are real users?

The fact that you are rewriting the request to append a ".php" extension should be irrelevant in terms of the server's "access log" - since this should log the "requested URL", not the "rewritten URL".

As a "workaround" you could instead try manually rewriting the request to your custom 404 error document (i.e. "/404.php") when directly requesting a ".php" URL that does not exist. And only rewrite the request to append the ".php" extension if that file does exist.

A minor caveat with this is that you do need to manually set the "404 Not Found" header for all requests to "/404.php" - which you appear to be doing anyway. Otherwise the rewrite to "/404.php" will respond with a 200 OK, not a 404.

So, your "make clean URLs" section would become:


# Rewrite direct requests for ".php" files that don't exist to "/404.php"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.php$ 404.php [L]

# Append ".php" to single path segment URLs where the corresponding ".php" file exists
RewriteCond %{REQUEST_URI} !^/JWJforum/$
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^([\w-]+)/?$ $1.php [L]


... which needs to go after your external redirects (especially if you are redirecting non-existent ".php" files).

The "\w" short-hand character class is the same as your existing regex, except that it also includes underscores. There's no need to backslash-escape slashes. The NC flag is superfluous since you are already matching a-z and A-Z in the RewriteRule pattern.
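A possible variation on the first rule above (my own sketch, not what was posted and not tested on the OP's server): mod_rewrite's R flag also accepts non-redirect status codes. With [R=404] the substitution is dropped and Apache itself answers 404 before the PHP handler runs, so the existing ErrorDocument 404 /404.php is used instead of a rewrite to it. The Apache custom error documentation still recommends that a script used as an ErrorDocument send its own status, so keep the 404 header in 404.php.

```apache
# Only for the original request (REDIRECT_STATUS is empty), not for internal
# redirects such as the one that serves the ErrorDocument.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_FILENAME} !-f
# A non-3xx status in [R=...] drops the substitution ("-") and stops
# rewriting as [L] would; Apache then serves the ErrorDocument below.
RewriteRule \.php$ - [R=404]

ErrorDocument 404 /404.php
```

The second rule (appending ".php" only when that file exists) would stay as written above.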

lucy24

6:55 pm on Apr 12, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Incidentally, going back to mid-thread ...

RewriteRule pencil1/painting/(.*)$ https://www.example.net/pencil1.php?autoload=$1 [R=301,L]
and so on and so on. This whole set of rules can be compressed into a much smaller set, along the lines of

RewriteRule ^pencil(\d+)/painting/(.*) https://www.example.net/pencil$1.php?autoload=$2 [R=301,L]

RewriteRule ^watercolours(\d+)/painting/(.*) https://www.example.net/watercolour$1.php?autoload=$2 [R=301,L]
but don’t quote me; that’s off the top of my head. Why the change from watercolours, plural, to watercolour, singular? If you hadn’t done that, the two could have been all one rule, with capture in the form (pencil\d+|watercolours\d+).

Can there both be, and not be, content after /painting/ ? If content is obligatory, .+ might be safer. Either way you don’t need the closing anchor; by default Regular Expressions are greedy (technical term) and will continue to the end. You do need the opening anchor, so the server doesn't have to check the whole request to see if, say, the string “pencil” shows up at some later point in the URL.
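A sketch of that compression applied to the OP's rules. One caveat: the old-style /painting/ URLs do not map one-to-one (watercolours1 and watercolours2 both go to watercolour1.php, pencil3 and pencil4 both go to pencil2.php, and so on), so a target like pencil$1 would send some of them to the wrong page; below they are grouped by target instead. The pattern (.*?)/?$ folds the with- and without-trailing-slash variants into one rule and keeps the slash out of the captured value. The new-style URLs do map by name, so one rule covers them, but it has to stay after the old-style rules because it would also catch e.g. /pencil1/painting/...

```apache
# Old-style URLs: several galleries share one target, so group by target.
RewriteRule ^watercolours[12]/painting/(.*?)/?$ https://www.example.net/watercolour1.php?autoload=$1 [R=301,L]
RewriteRule ^watercolours[34]/painting/(.*?)/?$ https://www.example.net/watercolour2.php?autoload=$1 [R=301,L]
RewriteRule ^watercolours[56]/painting/(.*?)/?$ https://www.example.net/watercolour3.php?autoload=$1 [R=301,L]
RewriteRule ^watercolours7/painting/(.*?)/?$ https://www.example.net/watercolour4.php?autoload=$1 [R=301,L]
RewriteRule ^pencil[12]/painting/(.*?)/?$ https://www.example.net/pencil1.php?autoload=$1 [R=301,L]
RewriteRule ^pencil[34]/painting/(.*?)/?$ https://www.example.net/pencil2.php?autoload=$1 [R=301,L]
RewriteRule ^pen1/painting/(.*?)/?$ https://www.example.net/pen1.php?autoload=$1 [R=301,L]

# New-style URLs map one-to-one by name: /pencil3/x -> /pencil3.php?autoload=x
RewriteRule ^(watercolour[1-4]|pencil[1-4]|pen[12]|penandwash[12]|othermedia)/(.*)$ https://www.example.net/$1.php?autoload=$2 [R=301,L]
```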

RewriteRule \.php$ 404.php [L]
RewriteRule ^([\w-]+)/?$ $1.php [L]
You meant to say /404.php and /$1.php ;) (Yes, the RewriteBase directive will achieve the same end, but I think it’s safer to just put a / at the front of all rewrite targets.)

:: suppressing editorial comment about -f test ::

JWJonline

9:09 am on Apr 13, 2021 (gmt 0)

10+ Year Member



@w3dk, that's amazing. I can't believe how well it works. My custom 404 page displays under every circumstance I throw at it .... couldn't be happier.

@lucy24, a short while ago I did a major restructure of my site to move away from dynamic pages that displayed paintings based on parameters. Those pages were the plural version. When I built my new pages I had so much trouble trying to deal with all the old back links out there and the recursive nature of the rewrites that I simplified the problem by dropping the 's' to create new pages. Now my main issue is fixed I'll tidy up the rest of the file and look at compressing the rules as you've suggested.