Redirect spoof query string requests for HTML pages

How to stop junk requests for /foo.html?23Abz

         

tigertom

4:46 pm on Dec 1, 2006 (gmt 0)

10+ Year Member



This isn't working:

RewriteCond %{REQUEST_URI} ^/(.*)\.([htm|html|shtml])$ [NC]
RewriteCond %{QUERY_STRING} ^([a-zA-Z0-9]+)$
RewriteRule ^(.*)$ [L,G]

I want to send a 'gone' or 404 response to requests for www.mysite.com/widget.html?Xzy03 and its ilk.

Thank you.

jdMorgan

6:35 pm on Dec 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



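You've got some spurious regex tokens in there (the square brackets turn your list of extensions into a character class that matches only a single character), your RewriteRule is missing its substitution, and the REQUEST_URI RewriteCond is unneeded: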

# If non-blank query string
RewriteCond %{QUERY_STRING} . [NC]
# on static filetype, then rewrite to nonexistent path to force a 404
RewriteRule \.s?html?$ /path_to_file_that_does_not_exist [NC,L]
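
If the server runs Apache 2.2 or later (an assumption, not something stated in this thread), a minimal sketch of the same 404 without a dummy path is to give the R flag a non-redirect status. mod_rewrite then drops the substitution and stops rewriting as if L had been used:

```apache
# If non-blank query string
RewriteCond %{QUERY_STRING} .
# on static filetype, then return a 404 directly (assumes Apache 2.2 or later)
RewriteRule \.s?html?$ - [NC,R=404,L]
```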

Alternately, force a 410-Gone response:

# If non-blank query string
RewriteCond %{QUERY_STRING} . [NC]
# on static filetype, then force a 410 response
RewriteRule \.s?html?$ - [NC,G]

or "correct" the URL with a 301 redirect:

# If non-blank query string
RewriteCond %{QUERY_STRING} . [NC]
# on static filetype, then redirect to the same URL after stripping off the query string
RewriteRule ^([^.]+\.s?html?)$ http://www.example.com/$1? [NC,R=301,L]

I assume that you want to handle *any* query string attached to an .htm, .shtm, .html, or .shtml filetype. If that's not the case, then you can restore your original pattern of "^[a-z0-9]+$" with a NoCase [NC] flag.
However, that will reject a query string containing an "=" or a hyphen, or any other character that is not a-z, A-Z or 0-9, which is probably not what you want.
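
As a rough sketch of that restricted variant, assuming the rules live in the site's root .htaccess and mod_rewrite is already enabled, the 410 version would look something like this. A request such as /widget.html?Xzy03 gets the 410, while /widget.html?id=5 passes through untouched because of the "=":

```apache
# If the query string consists only of letters and digits (e.g. "Xzy03")
RewriteCond %{QUERY_STRING} ^[a-z0-9]+$ [NC]
# on static filetype, then force a 410 response
RewriteRule \.s?html?$ - [NC,G]
```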

Also, if you use filepaths with periods in them other than the final one before the filetype, then change the pattern in the last rule to the much-less-efficient but less-selective "^(.+\.s?html?)$"
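
For illustration, the redirect rule with that looser pattern swapped in might read as follows. The path docs/v1.2/widget.html is a made-up example of a URL it would match; www.example.com is the same placeholder hostname as above:

```apache
# If non-blank query string
RewriteCond %{QUERY_STRING} .
# on static filetype, even with extra periods in the path (e.g. docs/v1.2/widget.html),
# then redirect to the same URL after stripping off the query string
RewriteRule ^(.+\.s?html?)$ http://www.example.com/$1? [NC,R=301,L]
```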

See also this recent thread: [webmasterworld.com...]

Jim

tigertom

7:29 pm on Dec 1, 2006 (gmt 0)

10+ Year Member



Many thanks for such a complete, succinct and considered reply. I'll try it immediately.

----

Later: I tried the 410 Gone code. Worked a treat!
However, it returned a message saying the 'true' file (/foo.htm) had gone, so I've plumped for the 404 response instead.
Just to be safe.

Thank you, JD.