Redirecting query strings but not all

Allow specific folders and pages

         

Patrick Taylor

10:45 am on Sep 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am not too sure what I'm doing here. I am trying to prevent URLs being viewed with a query string by redirecting them to the same URL without the query string.

But I have some valid URLs that must allow a query string, eg:

https://example.com/e?page=string
https://example.com/preview?page=string
https://example.com/specific-folder/index.php?page=string
https://example.com/specific-folder/?page=string&mode=normal
https://example.com/specific-folder/index.php?status=logout
https://example.com/specific-folder/string.php?status=logout

My .htaccess file contains:

RewriteCond %{REQUEST_URI} !(specific-folder) [NC]
RewriteCond %{QUERY_STRING} .
RewriteRule ^$ /? [R=301,L]

This works to redirect https://example.com/?q=string to: https://example.com/

but it does not redirect https://example.com/page?q=string to https://example.com/page

I have tried:

RewriteCond %{REQUEST_URI} !(specific-folder|page) [NC]
RewriteCond %{QUERY_STRING} .
RewriteRule ^$ /? [R=301,L]

but I still see https://example.com/page?q=string

w3dk

11:16 am on Sep 16, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



RewriteCond %{REQUEST_URI} !(specific-folder|page) [NC]
RewriteCond %{QUERY_STRING} .
RewriteRule ^$ /? [R=301,L]


The first condition isn't doing anything here because the RewriteRule pattern (i.e. ^$) only matches an empty URL-path (i.e. the document root). The RewriteRule pattern is processed first, and the preceding conditions are evaluated only if it matches.

So, you need to match every URL in the RewriteRule directive.

For example:


RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !^/specific-folder [OR]
RewriteCond %{REQUEST_URI} !^/preview$ [OR]
RewriteCond %{REQUEST_URI} !^/e$
RewriteRule ^ %{REQUEST_URI} [QSD,R=301,L]


The pattern ^ matches every request. The QSD flag (Apache 2.4) discards the query string, similar to appending a "?" to the substitution on Apache 2.2.

"/preview" and "/e" are exact matches (according to your examples). Whereas "/specific-folder" just matches that prefix. The "NC" flag shouldn't be used unless you specifically need a case-insensitive match.

Patrick Taylor

11:52 am on Sep 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks.

I tried it. When I navigate to https://example.com/specific-folder/index.php?page=string it takes me to https://example.com/specific-folder/index.php, without the query string it needs for that particular folder.

Something not quite right.

w3dk

1:27 pm on Sep 16, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



Oops, sorry, silly logic error... those conditions should all be AND'd, not OR'd!

Remove the "OR" flag...


RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !^/specific-folder
RewriteCond %{REQUEST_URI} !^/preview$
RewriteCond %{REQUEST_URI} !^/e$
RewriteRule ^ %{REQUEST_URI} [QSD,R=301,L]


And make sure you've cleared your browser cache... any erroneous 301 (permanent) redirects (such as this!) will get persistently cached by the browser. Preferably test with 302 (temp) first to avoid potential caching issues.
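
As a minimal sketch of that testing approach, the same rule can be switched to a temporary redirect while you experiment. The RewriteEngine line is an assumption (it is presumably already present further up the file), and the commented-out line is the Apache 2.2 form mentioned earlier, which uses a trailing "?" instead of QSD.

```apache
# Assumed to be enabled already; harmless to repeat
RewriteEngine On

# Only act when there is a non-empty query string
RewriteCond %{QUERY_STRING} .
# ...and the request is none of the URLs that need one
RewriteCond %{REQUEST_URI} !^/specific-folder
RewriteCond %{REQUEST_URI} !^/preview$
RewriteCond %{REQUEST_URI} !^/e$
# 302 (temporary) while testing, so browsers do not cache a mistake
RewriteRule ^ %{REQUEST_URI} [QSD,R=302,L]

# Apache 2.2 alternative (no QSD flag): the trailing "?" drops the query string
# RewriteRule ^ %{REQUEST_URI}? [R=302,L]
```

Once the behaviour looks right, change R=302 back to R=301.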

Patrick Taylor

3:40 pm on Sep 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks again.

RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !^/specific-folder
RewriteCond %{REQUEST_URI} !^/preview
RewriteCond %{REQUEST_URI} !^/e
RewriteRule ^ %{REQUEST_URI} [QSD,R=301,L]

I had to remove the dollar symbols after preview and e - I think because that signifies the end of the match, doesn't it? And it needs the query string added on? With the dollar symbols at the end, the query string wasn't there and it needs to be for those two conditions.

Anyway great. It seems to be working.

lucy24

4:44 pm on Sep 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I think because that signifies the end of the match, doesn't it? And it needs the query string added on?"

If your URLs end in / slash, then forms ending in blahblah$ will never match. A query string is not considered part of REQUEST_URI (though it would be part of THE_REQUEST) so you need not worry about that aspect.

Incidentally, you could also consolidate them (with a leading slash, since REQUEST_URI always starts with one):
!^/(item1|item2|item3)/

In the specific case of "e" it seems a bit risky to have nothing after it, because who knows what future directories might happen to have names starting in the letter e. Including the / slash will solve this potential problem.
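
One way to write that consolidation, as a rough sketch only: it assumes "preview" and "e" are extensionless pages that should be exempt only as exact matches, and that "specific-folder" is always requested with its trailing slash, as in the examples at the top of the thread. Earlier in the thread the $ anchors did not work on the original setup for reasons that were not pinned down, so treat this as untested there.

```apache
RewriteCond %{QUERY_STRING} .
# One negated pattern instead of three separate conditions:
#   /specific-folder/...  (prefix match, slash included)
#   /preview              (exact match)
#   /e                    (exact match, so /email is NOT exempt)
RewriteCond %{REQUEST_URI} !^/(specific-folder/|preview$|e$)
# 302 while testing; switch to 301 once confirmed
RewriteRule ^ %{REQUEST_URI} [QSD,R=302,L]
```

With this pattern a request such as /email?x=1 would still be redirected to /email, because only the exact path /e is excluded.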

Incidentally ... I notice that some of your examples include "index.php" and some don't. Do you allow people to request "index.php" explicitly? This may be a non-issue, as people don't typically add "index.xtn" of their own volition if it isn't part of the published URL--and search engines don't either, unless it has in fact existed in your URLs at some time in the past.* Matter of fact, on some sites you might choose to explicitly ban any request for "index.php" because nobody but malign robots would ever try it.


* Tangentially: I’ve checked this in logs. The only requests for /realdirectory/index.html from legitimate crawlers are in a couple of extremely old directories that formerly had “index.html” as part of the visible URL. Illegitimate crawlers, meanwhile, may ask for “index.php” since that’s what many CMS use.

Patrick Taylor

8:06 pm on Sep 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The URLs 'e' and 'preview' don't end in slash because they are pages, not folders. I see your point though. A page such as 'email' might show with a query string but there doesn't seem much to be done about it because the rules don't work with a dollar symbol at the end. I might consolidate them as you suggest.

Regarding 'index.php', that is one of a series of URLs which comprise a 'backend' that no-one except admin will access.

Oops, sorry, silly logic error... those conditions should all be AND'd, not OR'd!

Remove the "OR" flag..

It works that way but I don't understand why. Any of those conditions applies in its own right so why do they need each other?

lucy24

8:44 pm on Sep 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"The URLs 'e' and 'preview' don't end in slash because they are pages, not folders."

Huh. If they're extensionless, then a final $ certainly should work. If you use extensions, then obviously you should include the extension in the pattern--or at least
!^/(onepage|otherpage|thirdpage)\.
like that.

"Any of those conditions applies in its own right so why do they need each other?"

Because they're negative conditions: the request is not abc AND ALSO not def AND ALSO not ghi. If you goofed and said [OR], then the conditions would always be met: a request for abc is not def, a request for def is not ghi and so on.

w3dk

9:08 pm on Sep 16, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



It works that way but I don't understand why. Any of those conditions applies in its own right so why do they need each other?


Note that the conditions are negated expressions (! prefix). If they are OR'd then just one needs to be successful for the rule to fire. "Success" in this instance means "not" matching something. The URL is always going to fail to match at least one of those patterns (since it can't match all of them). E.g. if the URL matches "specific-folder" then it's not going to match "preview", so the rule fires anyway.

For example, in pseudo-code...


set A = 1
IF (A != 1) OR (A != 2) OR (A != 3) THEN
  // Always successful
END IF


Regardless of the value of "A", the above expression is always true.

(Aside: I think why I wrote the OR in the first place is because I was thinking about regex alternation - but this is different - since it's the whole regex that is negated, not the individual parts.)
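
To make that concrete, here is a hand trace of the three negated conditions for the request that went wrong earlier in the thread. Nothing here is new configuration; the comments simply walk through how each condition evaluates, and the lines at the end are the AND'd form already posted above.

```apache
# Request: /specific-folder/index.php?page=string
#
#   %{QUERY_STRING} .                    -> true  (query string is present)
#   %{REQUEST_URI} !^/specific-folder    -> false (the path DOES start with it)
#   %{REQUEST_URI} !^/preview$           -> true
#   %{REQUEST_URI} !^/e$                 -> true
#
# With [OR] on the URI conditions:
#   true AND (false OR true OR true) = true   -> rule fires, query string lost
# Without [OR] (implicit AND):
#   true AND false AND true AND true = false  -> rule skipped, query string kept

RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !^/specific-folder
RewriteCond %{REQUEST_URI} !^/preview$
RewriteCond %{REQUEST_URI} !^/e$
RewriteRule ^ %{REQUEST_URI} [QSD,R=301,L]
```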

The URLs 'e' and 'preview' don't end in slash because they are pages, not folders.


How are these URLs routed? There must be some additional routing/processing going on here. Maybe there is a bit of a conflict? Either you are sending these URLs through a front-controller or you are appending a ".php" extension or something?

These redirects to remove the query string would need to go near the top of your .htaccess file, before any other "routing" directives.

As lucy24 mentioned, if the request is "https://example.com/e?page=string" then the REQUEST_URI is simply "/e". (Unless the URL has been rewritten beforehand.)
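
For reference, this is roughly what the relevant server variables hold for that request, assuming a plain GET over HTTP/1.1 and no earlier rewrite having touched the URL:

```apache
# Request line sent by the browser:
#   GET /e?page=string HTTP/1.1
#
#   %{REQUEST_URI}  = /e
#   %{QUERY_STRING} = page=string
#   %{THE_REQUEST}  = GET /e?page=string HTTP/1.1
#
# So an end-anchored pattern is tested against "/e" only,
# and the query string plays no part in whether it matches:
RewriteCond %{REQUEST_URI} ^/e$
```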

EDIT: Overlapped with lucy24's response.

Patrick Taylor

9:25 pm on Sep 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you goofed and said [OR]

I see it now, sort of. (onepage|otherpage|thirdpage) is 'or' (this is harder than Latin)

Maybe there is a bit of a conflict?

There may well be. There are several external redirects above it (canonical, force https etc). Should I put those below it with the query string rules at the top?

Edit: I put the query string rules at the top but the dollar symbol still has to be removed for it to work. There might be another conflict somewhere lower down the file but it gets so complicated I will leave things as they are, since it works well enough. Except maybe the order of the rules which I am not sure about.

I really appreciate the help with this.

w3dk

10:03 pm on Sep 16, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



Should I put those below it with the query string rules at the top?


You could incorporate the canonical "absolute" URL in the redirect to remove the query string and then place the rule at the top (to avoid multiple redirects). ie:


RewriteRule ^ https://www.example.com%{REQUEST_URI} [QSD,R=301,L]


Otherwise, it would generally come after the canonical redirects but before any internal rewrites. (But note that this could result in multiple redirects.)
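
Put together, the first option might look something like the sketch below. The hostname www.example.com and the scheme/host canonical rule in step 2 are assumptions for illustration and are not taken from the actual file; substitute whatever canonical redirects are already in place.

```apache
RewriteEngine On

# 1. Strip unwanted query strings and go straight to the canonical URL,
#    so a request like http://example.com/?q=string needs only one redirect
RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !^/specific-folder
RewriteCond %{REQUEST_URI} !^/preview
RewriteCond %{REQUEST_URI} !^/e
RewriteRule ^ https://www.example.com%{REQUEST_URI} [QSD,R=301,L]

# 2. Canonical scheme/host redirect (hypothetical; assumes HTTPS is
#    terminated by Apache itself rather than by a proxy)
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} !=www.example.com
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]

# 3. Any internal rewrites (front controller, extension handling) go below
```

Requests for the exempt URLs skip step 1 and are still canonicalised by step 2, with their query string left intact because that rule has no QSD flag.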

Edit: I put the query string rules at the top but the dollar symbol still has to be removed for it to work.


Is MultiViews enabled?

Try disabling this at the top of your .htaccess file:

Options -MultiViews


(Although if you are relying on this to append extensions to your files then this will break your site!)

Patrick Taylor

10:32 pm on Sep 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



-MultiViews had no effect either way.

Requesting http://example.com/?q=qwerty does indeed have two redirects to https://example.com/

I will see what I can do with a bit of experimentation. Thanks.

Edit.

Using this at the top (testing in a subfolder):
RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !^/folder/(admin|preview|e)
RewriteRule ^ https://example.com%{REQUEST_URI} [QSD,R=301,L]

there is only one redirect. Good. Thanks again! (learnt a lot with this)

phranque

5:48 am on Sep 17, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



...does indeed have two redirects...


In general, you want the more specific redirect rulesets first and the most general redirect rulesets last.