Forum Moderators: phranque

I need urgent help with %3f removal

         

dolcevita

10:26 am on Mar 22, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have been hosting with a2hosting for 3 years. Today I noticed that every query now ends with %3f at the end of the generated URL, which results in a 404. This happened suddenly.

This is my htaccess code, which has worked for years without any issue. It removes the .php extension, removes ?check= and replaces it with /

# To externally redirect /example.com/foo.php?find=123 to example.com/dir/foo
RewriteCond %{THE_REQUEST} ^GET\s([^.]+)\?check=([^&\s]+) [NC]
RewriteRule ^ %1/%2? [R=301,L]

# To internally forward /example.com/foo.php?find=123 to example.com/dir/foo
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+?)/([^/]+)/?$ $1.php?check=$2 [L,QSA]


# Remove .php extension
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.php [NC,L]



The code changes a query URL, for example

from
https://www.example.org/whatever.php?check=msn.com

to
https://www.example.org/whatever/msn.com


Right now I get
https://www.example.org/whatever/msn.com%3f


I really need to remove %3f



[edited by: not2easy at 1:01 pm (utc) on Mar 22, 2023]
[edit reason] please use 'example' for domains [/edit]

not2easy

1:07 pm on Mar 22, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Have you contacted your host to see if they have a quick fix? We've seen several similar incidents in the past few weeks.

dolcevita

1:40 pm on Mar 22, 2023 (gmt 0)

Yep. They told me that it is probably something related to .htaccess and that they cannot do anything on the server side to solve the issue.
It is strange that %3f is generated only through the form query, and that form is the heart of my website. People enter a domain into the form field and then get information about that domain. Right now they can still see URLs generated in the past, because https://www.example.org/whatever/msn.com works, but if they enter msn.com, for example, into the query field, %3f is appended and the page is not generated.

Maybe someone with coding experience can look at the code posted above and try to solve the issue (if there is one).

Thanks

dolcevita

2:58 pm on Mar 22, 2023 (gmt 0)

Update.

After playing with htaccess code for some time it seems that this works. Changing

# To externally redirect /example.com/foo.php?find=123 to example.com/dir/foo
RewriteCond %{THE_REQUEST} ^GET\s([^.]+)\?check=([^&\s]+) [NC]
RewriteRule ^ %1/%2? [R=301,L]


to

# To externally redirect /dir/foo.php?find=123 to /dir/foo
RewriteCond %{THE_REQUEST} ^GET\s([^.]+)\?check=([^&\s]+)(?:&(\S+))? [NC]
RewriteRule ^ %1/%2?%3 [R=301,L]


solved the issue, and the %3f at the end of the generated query URL is gone. Huuuuhhhhh.. The more I get into coding, the less I seem to know (:

btw

Maybe someone can improve the code above and make it more efficient.
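
One possible variation, offered as an untested sketch rather than a confirmed fix: Apache 2.4 and later provide the QSD (query string discard) flag, which drops the original query string without needing a trailing ? in the target. This assumes the host runs Apache 2.4+ or a server that honors QSD, which nothing in this thread confirms. Unlike the version above, it also discards any extra parameters that follow check=.

```apache
# Sketch only: assumes Apache 2.4+ where the QSD flag is available.
# Redirect /foo?check=value to /foo/value and discard the query string
# without relying on a trailing "?" in the substitution.
RewriteCond %{THE_REQUEST} ^GET\s([^.]+)\?check=([^&\s]+) [NC]
RewriteRule ^ %1/%2 [R=301,L,QSD]
```

If the %3f comes from how the server treats a trailing ? in the target, this form sidesteps it because the target no longer contains a ? at all.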

not2easy

4:25 pm on Mar 22, 2023 (gmt 0)

If you're sure that people aren't requesting upper case or camel case URLs, you might leave off the NC flag so the server doesn't need to check for both/either case. Milliseconds maybe, but more efficient if it is not needed.

Good idea to check back for more helpful ideas later in the day.

lucy24

5:40 pm on Mar 22, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, yikes, this is the third thread in recent weeks where someone's host has been converting a trailing ? --which ought to simply disappear--into the percent-encoded %3F. I hope someone can eventually shed light on what causes it to happen.

dolcevita, I think your fix works because the ? is no longer the last thing in the target--apparently even if the final part of the query doesn't exist.

I must say I don't see how the rule is intended to work, since the # comment line says it's for requests in .php while the Condition says it's for extensionless requests. In any case, the \.php should be in the pattern, so the server doesn't have to evaluate the Condition on every single request ever. And the [^.]+ in the Condition would be more efficient if expressed as [\w/-] using only characters that can occur in URLpaths. Otherwise the server races through the whole request up to the . in HTTP/2.0 and then has to backtrack to see if there was a query. Or you can say [^.?\s] if you prefer to keep it as a negative.
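
As a rough, untested sketch of the two suggestions above, assuming the form submits to an extensionless URL such as /whatever?check=msn.com: the Rule pattern excludes anything containing a dot, and the Condition uses a positive character class for the path. The [NC] flag is left off, which assumes the parameter name is always lower-case check.

```apache
# Sketch only. The Rule pattern ^[^.]+$ means requests with an extension
# (images, stylesheets, .php files) never reach the Condition.
# [\w/-]+ stops at the "?" instead of scanning on to the first dot.
RewriteCond %{THE_REQUEST} ^GET\s([\w/-]+)\?check=([^&\s]+)(?:&(\S+))?
RewriteRule ^[^.]+$ %1/%2?%3 [R=301,L]
```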

dolcevita

5:54 pm on Mar 22, 2023 (gmt 0)

lucy24 and not2easy, thank you both for your replies. I would appreciate it if you could find time to post complete code that I can try out. The code below works right now, but if someone can make it even better I would be happy to try it. Here is the full code that works right now:

# To externally redirect /dir/foo.php?find=123 to /dir/foo
RewriteCond %{THE_REQUEST} ^GET\s([^.]+)\?check=([^&\s]+)(?:&(\S+))? [NC]
RewriteRule ^ %1/%2?%3 [R=301,L]

# To internally forward /dir/foo/12 to /dir/foo.php?find=12
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+?)/([^/]+)/?$ $1.php?check=$2 [L,QSA]


# Remove .php extension
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.php [NC,L]


# Code solve issue with Captcha for contact form
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\s/(.+)\.php[^\s]* [NC]
RewriteRule ^ /%2 [R=301,NE,L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^ %{REQUEST_URI}.php [QSA,NC,L]


btw

Just to add that IPv4, IPv6 addresses and domain names (www.whatever.com or whatever.com without https:// or http:// ) are allowed in the form fields for queries, so that there is no confusion.

lucy24

9:35 pm on Mar 22, 2023 (gmt 0)

Ouch. That is a ### of a lot of -d and -f tests. They should be used as an absolute last resort, because of the extra drag they place on the server: each one makes the server go and check the filesystem.

General principle: never put something in a Condition that can go in the body of the Rule. (Exceptions exist, but they are rare.) Here, this means constraining the pattern to
either
^[^.]+$ (for extensionless URLs)
or
^[^.]+\.php (for URLs containing .php)
In both cases, non-page files--ones with extension other than .php--should bypass the Conditions entirely.

Request methods are always upper-case, so there is no need for the [NC] flag. Again it makes extra work for the server, as it has to first convert both the pattern and the test string to lower-case before it can compare them.

I still don't get the “externally redirect” part. Seems like it should be something along the lines of (don't cut-and-paste, this is off the top of my head)
RewriteCond %{QUERY_STRING} ^check=([^&]*)(?:&(.*))?
RewriteRule ^([^.]+)\.php https://www.example.com/$1/%1?%2 [R=301,L]

It also isn't clear whether the extensionless URLs do or do not end in a / (slash). If they end in / but there are also real physical directories on the site, it is more efficient to express the Condition as something like
%{REQUEST_URI} !(actual|names|of|real|directories)
--again so the server doesn't have to go look.
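
A minimal, untested sketch of that idea applied to the internal rewrite. The directory names images, css and js are placeholders, not taken from the actual site. It also assumes the scripts sit at the top level and that every first path segment that is not a real directory maps to an existing .php file, since the -f test is dropped as well.

```apache
# Sketch only: "images", "css" and "js" are placeholder directory names.
# Skip real directories by name instead of asking the filesystem with -d.
RewriteCond %{REQUEST_URI} !^/(images|css|js)/
RewriteRule ^([^./]+)/([^/]+)/?$ $1.php?check=$2 [L,QSA]
```

With this pattern a request for /whatever/msn.com is served by whatever.php?check=msn.com, while anything under the listed directories is left alone.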

dolcevita

9:00 am on Mar 23, 2023 (gmt 0)

The URLs do not end in a slash. They look like
https://www.example.com/test/whatever.com
or
https://www.example.com/test/111.111.111



You forgot to add GET (the method used in the form) in the code below:

RewriteCond %{QUERY_STRING} ^check=([^&]*)(?:&(.*))?
RewriteRule ^([^.]+)\.php https://www.example.com/$1/%1?%2 [R=301,L]




[edited by: not2easy at 10:34 am (utc) on Mar 23, 2023]
[edit reason] please use 'example' for domains [/edit]

phranque

10:14 pm on Mar 23, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



"Here is the full code that works right now:"

except for very rare cases, the external redirects should precede the internal rewrites.
otherwise, you risk exposing internal urls to external requests.

"You forgot to add GET (the method used in the form) in the code below:"

the "GET" request method would only be in the "%{THE_REQUEST}" string.
however, the code sample supplied by lucy24 is testing the "%{QUERY_STRING}" string, which holds only the part of the url after the "?", so the request method never appears in it.
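
to illustrate the ordering point, here is the posted rule set rearranged so that both external redirects come before any internal rewrite, with RewriteEngine On stated once at the top. this is an untested sketch: the rules themselves are unchanged from the code posted earlier in the thread, only their order differs.

```apache
RewriteEngine On

# --- External redirects (R=301) first ---
# /foo?check=value -> /foo/value
RewriteCond %{THE_REQUEST} ^GET\s([^.]+)\?check=([^&\s]+)(?:&(\S+))? [NC]
RewriteRule ^ %1/%2?%3 [R=301,L]

# /foo.php -> /foo (the redirect from the "Captcha" block)
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\s/(.+)\.php[^\s]* [NC]
RewriteRule ^ /%2 [R=301,NE,L]

# --- Internal rewrites afterwards ---
# /dir/foo/12 -> /dir/foo.php?check=12
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+?)/([^/]+)/?$ $1.php?check=$2 [L,QSA]

# extensionless request -> matching .php file
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.php [NC,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^ %{REQUEST_URI}.php [QSA,NC,L]
```

in the posted version, the .php redirect sits below two internal rewrites, which is the arrangement the ordering advice warns about.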

dolcevita

12:40 am on Mar 24, 2023 (gmt 0)

Thank you for the reply, phranque. I apologize for my dilettante knowledge of this matter, but as I said, the code below works fine in testing and generates the query URL results:


# To externally redirect /dir/foo.php?find=123 to /dir/foo
RewriteCond %{THE_REQUEST} ^GET\s([^.]+)\?check=([^&\s]+)(?:&(\S+))? [NC]
RewriteRule ^ %1/%2?%3 [R=301,L]

# To internally forward /dir/foo/12 to /dir/foo.php?find=12
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+?)/([^/]+)/?$ $1.php?check=$2 [L,QSA]


# Remove .php extension
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.php [NC,L]


# Code solve issue with Captcha for contact form
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\s/(.+)\.php[^\s]* [NC]
RewriteRule ^ /%2 [R=301,NE,L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^ %{REQUEST_URI}.php [QSA,NC,L]


I do not understand this part exactly...

"except for very rare cases, the external redirects should precede the internal rewrites.
otherwise, you risk exposing internal urls to external requests."

...because as far as I can see, the external redirects in the code I use happen first, and only then the internal rewrites.

And what should I do with the code from lucy24? Obviously, because of the GET method, I should have the "%{THE_REQUEST}" string in the first rule.

Thanks for your understanding and sorry for the confusion.