Forum Moderators: phranque


403 authorisation problem on RewriteRule which is being redirected

Problem since moving to an upgraded server where directory exists


penguintutor

11:30 am on Jul 6, 2011 (gmt 0)

10+ Year Member



I have a problem with my RewriteRule statements since my hosting provider has moved me to an upgraded server. This is running Apache on Linux.

I requested the upgraded server as the previous server was not up to spec for the new version of Wordpress, but this has changed the way my RewriteRules are being processed.

The site in question is watkissonline.co.uk.

I am using mod_rewrite to provide user-friendly URLs in the main menu that hide the WordPress page locations. (I have other, more complex rules, which is why I'm using mod_rewrite.)

Examples of the rules are:

RewriteRule ^baby-children$ /wordpress/?page_id=272 [L]
RewriteRule ^info$ /wordpress/?page_id=2416 [L]



The first of these works as expected: going to www.watkissonline.co.uk/baby-children returns the page at /wordpress/?page_id=272, but hides the real URL.

The second does not work and instead gives a 403 Forbidden error:
[error] [client xyz.xyz.xyz.xyz] Directory index forbidden by rule: /home/username/public_html/info/, referer: http://www.watkissonline.co.uk/home


It appears to be trying to access /info/?page_id=2416 instead of /wordpress/?page_id=2416.

But if I add the R flag, which shows the user the real URL they are going to, then it works.

The difference between these two entries is that there is no directory for the first, but the second (info) happens to be the name of a directory as well.

e.g.
www.watkissonline.co.uk/info should be rewritten to the wordpress page, but www.watkissonline.co.uk/info/otherfile.jpg would not be redirected and would provide the file within the directory.

I notice that the new cpanel provided by the hosting company includes an index manager applet which allows the directory listing to be turned on and off at a directory level (by creating .htaccess files), but I have just left these at the defaults.

I could rename the directories, but that would mean moving all their files into another folder and would need lots of changes or further Rewrite rules to remap them to their new locations.


Does anyone know what is happening here, and how I can change this behaviour using .htaccess? (This is a hosted site, so I am not able to change the Apache configuration.)

Thanks
Stewart

lucy24

6:59 pm on Jul 6, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The difference between these two entries is that there is no directory for the first, but the second (info) happens to be the name of a directory as well.

Yes, and that's the problem. When a user enters

www.example.com/somename

this will be auto-rewritten before your htaccess ever gets there as

www.example.com/somename/

which in turn will trigger the server and/or the user's browser to search for

www.example.com/somename/index.htm
www.example.com/somename/index.html
www.example.com/somename/index.php

et cetera, where the exact range and sequence of et ceteras depends on things outside your control. But if the directory

www.example.com/somename/

doesn't exist, two entirely different things might happen. The server might deploy mod_negotiation and look to see if there is a file named somename.extension. Or it might proceed directly to htaccess in hopes of finding further instructions.

So your htaccess may need to get more specific about anchors or filename extensions to make sure it picks up what it's supposed to.
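To illustrate what "more specific about anchors" can mean, here is a minimal sketch using the info rule from the first post. It assumes the rules live in the site's root .htaccess, where the pattern is tested against the path without its leading slash; the unanchored line is shown only as a counter-example and is commented out.

```apache
RewriteEngine On

# Unanchored (counter-example): "info" anywhere in the path matches,
# e.g. "moreinfo" or "info/otherfile.jpg", which is rarely what you want.
#RewriteRule info /wordpress/?page_id=2416 [L]

# Anchored at both ends with an optional trailing slash: only "info" and
# "info/" match, so "info/otherfile.jpg" is left alone and served from
# the real directory.
RewriteRule ^info/?$ /wordpress/?page_id=2416 [L]
```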

penguintutor

3:57 pm on Jul 7, 2011 (gmt 0)

10+ Year Member



Thanks - that's helped, but it's still not quite working as expected.

I believe you are correct that the URL is rewritten as a directory name before the RewriteRule directives are applied; that's where I was going wrong before. But I don't believe the search for index/default files happens until after the rewrite rules have been applied.

I've now changed the entry to:

RewriteRule ^info/?$ /wordpress/?page_id=2416 [L]


and I no longer get the 403 message, but this is not hiding the real name as I hoped it would. It's returning a temp redirect (302), but the browser url still changes to the new page.

lucy24

7:23 pm on Jul 7, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's returning a temp redirect (302), but the browser url still changes to the new page.

Those two things go together. A redirect is assumed to be 302 unless you explicitly say 301, and one element of a redirect--as opposed to a rewrite--is that the user will see the new address.
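A minimal sketch of the difference, again assuming the rule sits in the root .htaccess. Without the R flag the substitution is an internal rewrite; with it, mod_rewrite sends the browser a redirect (302 unless a code such as 301 is given), and the address bar changes. The second line is commented out because only one of the two should be active.

```apache
RewriteEngine On

# Internal rewrite: Apache serves /wordpress/?page_id=2416,
# but the browser keeps showing /info.
RewriteRule ^info/?$ /wordpress/?page_id=2416 [L]

# External redirect: R sends a 302 by default, R=301 a permanent redirect,
# and the user sees the new URL.
#RewriteRule ^info/?$ /wordpress/?page_id=2416 [R=301,L]
```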

Why it's redirecting at all, when you haven't told it to, is a whole nother question. (Translation: g1smd is going to answer it ;))

penguintutor

9:54 pm on Jul 7, 2011 (gmt 0)

10+ Year Member



I think I've found the problem.

Why it's redirecting at all, when you haven't told it to, is a whole nother question.


So I've done some more digging and it looks like it's related to the DirectorySlash directive:

DirectorySlash Directive
Description: Toggle trailing slash redirects on or off
Syntax: DirectorySlash On|Off
Default: DirectorySlash On
Context: server config, virtual host, directory, .htaccess
Override: Indexes
Status: Base
Module: mod_dir
Compatibility: Available in version 2.0.51 and later

The DirectorySlash directive determines whether mod_dir should fixup URLs pointing to a directory or not.

Typically if a user requests a resource without a trailing slash, which points to a directory, mod_dir redirects him to the same resource, but with trailing slash.


This particular directive works by sending a redirect back to the browser.

So I added the following line to .htaccess

DirectorySlash Off

and now it's working as expected.

So it seems that this is a feature that has been enabled on the new server, but was not on my old server.

Note that I've also got Options -Indexes in my .htaccess to avoid the potential directory exposure described in the security warning in the Apache documentation (httpd.apache.org).
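Putting the pieces of this post together, the root .htaccess might look roughly like the sketch below. The layout is an assumption; it also assumes the host's AllowOverride setting permits Options and Indexes overrides (the directive summary above lists "Override: Indexes" for DirectorySlash).

```apache
# Never generate automatic directory listings. With DirectorySlash Off,
# a request for /info (no slash) could otherwise list the folder's contents.
Options -Indexes

# Stop mod_dir from redirecting /info to /info/ before the rewrite applies.
DirectorySlash Off

RewriteEngine On
RewriteRule ^baby-children$ /wordpress/?page_id=272 [L]
RewriteRule ^info/?$ /wordpress/?page_id=2416 [L]
```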

Thanks for your help.

g1smd

10:56 pm on Jul 7, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



DirectorySlash is an important directive for fixing requests for bare directory URLs when the slash is missing. Make sure that switching it off doesn't cause any problems with other directories.
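One possible way to keep the slash fix for other folders after switching DirectorySlash off is to recreate the redirect with mod_rewrite, excluding the folders that double as page names. This is a hypothetical sketch: www.example.com is a placeholder hostname, /info is the only exclusion shown, and the -d test adds a filesystem check to every request.

```apache
RewriteEngine On

# If the request maps to a real directory, is not /info, and lacks a
# trailing slash, send a 301 to the same URL with the slash added.
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{REQUEST_URI} !^/info$
RewriteRule ^(.+[^/])$ http://www.example.com/$1/ [R=301,L]
```

Further folders that are also page names would each need their own RewriteCond exclusion line.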

g1smd

11:59 pm on Jul 7, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh and another thing. Although not explicitly banned by the HTTP specs it's a good idea to not have page names that are the same as a folder. This protects your sanity.

penguintutor

7:59 am on Jul 8, 2011 (gmt 0)

10+ Year Member



Whilst I can see DirectorySlash being useful for those running static sites using folders, it isn't so important for dynamic sites.

All my pages are served from WordPress, so there should be no need to perform any directory default lookups. Also, the old server that it was on wasn't adding this trailing slash, so it's just a case of setting the site back to how it was before.

I agree that one should normally avoid using the same names as directories, but it was a decision I made when I switched from hand-edited pages to WordPress. I wanted to keep the page names the same as the old directories and so used rewrite rules to direct them to the WordPress URLs, but I still have some files in the old directories that I link to.

I may try and move the files in those directories in future, but that will need a lot of changes to existing pages and/or rewrite rules to then handle the files that have moved.

On a more recent site I created a completely fake URL path structure and put the linked files in a separate directory, which made writing my rewrite rules much easier. It's much easier to create something like that from scratch than to retrofit it onto an 8-year-old website that has gone through many transformations and structures.

Thanks

lucy24

8:35 am on Jul 8, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Although not explicitly banned by the HTTP specs it's a good idea to not have page names that are the same as a folder. This protects your sanity.

:: unhappily thinking just how many folder/page pairs I have that fit this pattern (that is, foo/bar/exactname/exactname.html) ::

Or did you mean pairs like
foo/bar/exactname/blahblah.html
foo/bar/exactname.html
?
Ouch. That looks suicidal.

g1smd

8:45 am on Jul 8, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I meant having
example.com/somename
- as a "page" while also having
example.com/somename/
- as a "folder" with files like
example.com/somename/filename.png
- within it is a big problem.

Be aware that
example.com/somename/
- is the URL for the folder and is also the canonical URL for the index page within that folder.

However, having
example.com/somename/
as a folder while having
example.com/somename.html
as a page is not such a problem.
Confusing maybe, but not a problem per se.

lucy24

9:08 am on Jul 8, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Be aware that
example.com/somename/ - is the URL for the folder and is also the canonical URL for the index page within that folder.

Well, they're really the same thing, aren't they? That is, your browser can't take you to a folder, it can only go to a page. Doesn't matter if it's a page someone made and named "index.html" or a pseudo-page Apache created on the fly--probably in something goofy like html 3--and put up for your viewing pleasure. You're not "really" looking at the innards of the folder. It's just another kind of page.

Don't hold with this newfangled extensionless-url stuff myself anyway ;-)

g1smd

9:17 am on Jul 8, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They are the same thing, but many sites will link to
example.com/folder/index.html
and then wonder why they have a Duplicate Content problem. The canonical URL is simply
example.com/folder/
here.
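A common way to enforce that canonical form is to 301-redirect explicit index.html requests to the bare folder URL. This is a sketch, not a tested drop-in: www.example.com is a placeholder, and it assumes the index files are named index.html or index.htm. Testing THE_REQUEST (the raw request line the browser sent) keeps Apache's own internal DirectoryIndex lookup from triggering a redirect loop.

```apache
RewriteEngine On

# Only act when the client literally asked for .../index.html (or .htm),
# optionally followed by a query string.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?[\ ?]
# folder/index.html -> http://www.example.com/folder/ (query string kept)
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1 [R=301,L]
```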

As for extensionless URLs: they make URL rewriting much easier. I use
RewriteRule ^([^/.]+)$ /script.php?param=$1 [L]

OR
RewriteRule ^(([^/]+/)*([^/.]+))$ /script.php?param=$1 [L]


rather than
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /script.php?param=$1 [L]

which works a great deal slower, because every request triggers two filesystem checks.
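Annotated, the two extensionless patterns above behave as follows in a root .htaccess (the sample paths are made up for illustration; script.php and param are the placeholders from the post). Because neither pattern accepts a dot in the last segment, requests for real files such as logo.png, and for script.php itself, never match, so no filesystem check or loop guard is needed. They are alternatives, so the first is commented out here.

```apache
RewriteEngine On

# Pattern 1: exactly one path segment containing no "/" and no "."
#   about        -> /script.php?param=about
#   about/       -> no match (trailing slash)
#   logo.png     -> no match (contains a dot)
#RewriteRule ^([^/.]+)$ /script.php?param=$1 [L]

# Pattern 2: any number of folders, last segment without a dot,
# and no trailing slash
#   docs/guide/intro       -> /script.php?param=docs/guide/intro
#   docs/guide/intro.html  -> no match
#   docs/guide/            -> no match
RewriteRule ^(([^/]+/)*([^/.]+))$ /script.php?param=$1 [L]
```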