GET /index.php?xml_sitemap=params=pt-post-2014-10
GET /index.php?xml_sitemap=params=pt-post-2013-11
GET /?p=35
GET /?m=201312
GET /?page_id=2
GET /?paged=5&cat=1
# Make sure an error page is defined for 410
ErrorDocument 410 /path/to/custom/410.html
# For any non-empty query string
RewriteCond %{QUERY_STRING} ^.
# Match any path and send a 410 response
# The [L] prevents further matches from being executed
RewriteRule ^ - [R=410,L]

> You don't necessarily need a custom 410 ErrorDocument if these URLs are just being requested by a "crawler".

I always recommend having a 410 document, because the Apache default 410 message is pretty scary--not something you'd want to throw at an unsuspecting human using an old bookmark. But it doesn't necessarily have to be a separate document; on many sites you can say something like
ErrorDocument 410 /404.html
using the same physical file as you use for 404s.

The regex "^." (in the RewriteCond directive) could be simplified to just "." (a single dot), so

RewriteCond %{QUERY_STRING} ^.

becomes

RewriteCond %{QUERY_STRING} .

> Order matters.

Within any given module, that is. My general arrangement within mod_rewrite is:
-- requests that should stop right here with no further handling, such as for robots.txt or error documents ([L] flag alone)
-- requests with 403 response (also some manual 404 or 302 if they're functioning as access control)
-- requests with 410 response
-- requests with 301 response, from most specific to most general
-- requests with internal rewrite alone ([L] flag)
-- requests with no [L] flag (rare, for example setting a cookie)
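A skeleton of that ordering, for illustration only: every path, file name, host name and user-agent string below is hypothetical, and the block is an untested sketch rather than a drop-in file. Error documents are let through first because Apache serves them via an internal redirect that passes through the same rules again, so a 403 or 410 rule could otherwise also match the request for the error page itself.

```apache
RewriteEngine on

# 1. Stop right here: robots.txt and the error documents ([L] alone)
RewriteRule ^(robots\.txt|403\.html|404\.html|410\.html)$ - [L]

# 2. Access control: 403 ([F] implies [L])
RewriteCond %{HTTP_USER_AGENT} badbot [NC]
RewriteRule ^ - [F]

# 3. Gone: 410 ([G] implies [L])
RewriteRule ^old-section/ - [G]

# 4. External redirects, most specific first
RewriteRule ^old-page\.html$ https://example.com/new-page.html [R=301,L]
RewriteCond %{HTTP_HOST} !^example\.com$
RewriteRule (.*) https://example.com/$1 [R=301,L]

# 5. Internal rewrite ([L] alone)
RewriteRule ^articles/([0-9]+)$ /index.php?p=$1 [L]

# 6. No [L] flag (rare): set a cookie on every request
RewriteRule ^ - [CO=visited:1:example.com]
```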
What will be the difference between this
RewriteCond %{QUERY_STRING} ^.
and this then?
RewriteCond %{QUERY_STRING} .
i.e., they both match any string except a null string
http://example.com/?
Options +ExecCGI +FollowSymLinks
AddHandler cgi-script .pl
# Make sure an error page is defined for 410
ErrorDocument 410 /410.shtml
RewriteEngine on
# For any non-empty query string
RewriteCond %{QUERY_STRING} ^.
# Match any path and send a 410 response
# The [L] prevents further matches from being executed
RewriteRule ^ - [R=410,L]
Possibly the ErrorDocument line should be after the mod_rewrite?
[edited by: w3dk at 10:58 pm (utc) on Feb 7, 2021]
> I uploaded the new .htaccess and it works fine, and yes, it doesn't work for http://example.com/?

Right, because the query string has no content and is therefore the same as if there were no query string. (Hence the “non-empty query string” comment.) Does the site actually receive requests in this form? If not, it's a non-issue. If you do get a lot of requests with a null query, you would have to change the condition to
RewriteCond %{THE_REQUEST} \?
meaning “The request contains a literal question mark”.

> Possibly the ErrorDocument line should be after the mod_rewrite?

It makes no difference whatsoever. Each module is an island. RewriteRules are mod_rewrite; ErrorDocument directives are core. Personally I like to put ErrorDocument directives near the top of htaccess because it's just a few lines.
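Putting the two answers above together, the .htaccess posted earlier might look something like this untested sketch, which assumes the /410.shtml error page from that file. One thing to watch: while Apache serves the ErrorDocument, %{THE_REQUEST} still holds the original request line, so without an early exemption the 410 page itself would match the condition; the first rule lets it through, in line with the "stop right here" step of the ordering list. [G] is mod_rewrite's shorthand for a 410 response and implies [L], so it does the same job as [R=410,L].

```apache
Options +ExecCGI +FollowSymLinks
AddHandler cgi-script .pl

# ErrorDocument is a core directive, so its position relative to the
# mod_rewrite block does not matter; it sits at the top for readability
ErrorDocument 410 /410.shtml

RewriteEngine on

# Let the error document itself through: during the internal redirect
# to it, THE_REQUEST still holds the original request line
RewriteRule ^410\.shtml$ - [L]

# THE_REQUEST is the raw request line, e.g. "GET /? HTTP/1.1", so this
# also catches a bare "?" that leaves QUERY_STRING empty
RewriteCond %{THE_REQUEST} \?
RewriteRule ^ - [G]
```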
From a syntactical point of view it doesn't matter. However, for readability, the ErrorDocument directives should be defined first (as you have done).
> Right, because the query string has no content and is therefore the same as if there were no query string. (Hence the “non-empty query string” comment.) Does the site actually receive requests in this form? If not, it's a non-issue.

This is a bad idea IMO: some common referrers, most notably Facebook, add query params to the URLs linked from within their service ... (snip)
Yup, I've got one myself:
RewriteCond %{QUERY_STRING} .
RewriteCond %{QUERY_STRING} !^fbclid
RewriteRule (^|\.html|/)$ - [F]
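A small variant of the rule above, assuming Facebook appends fbclid after any parameters a link already carries, so it is not always the first thing in the query string. The (^|&) group lets the exemption apply wherever fbclid appears, and the trailing = stops it from matching a longer parameter name that merely starts with "fbclid". An untested sketch:

```apache
RewriteCond %{QUERY_STRING} .
# Exempt fbclid whether it is the first parameter or a later one
RewriteCond %{QUERY_STRING} !(^|&)fbclid=
# Only page-like requests: the site root, *.html, or a path ending in "/"
RewriteRule (^|\.html|/)$ - [F]
```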
When I go over my access logs, I look at any requests for the stylesheet that belongs specifically to error documents, because that lets me know when a human has been served a 403 or 404.
Options +ExecCGI +FollowSymLinks
AddHandler cgi-script .pl
# Make sure an error page is defined for 410
ErrorDocument 410 /410.shtml
RewriteEngine on
# stop rewriting for any request under /testpath
RewriteCond %{REQUEST_URI} ^/testpath [NC]
RewriteRule .* - [L]
# For any non-empty query string
RewriteCond %{QUERY_STRING} ^.
# Match any path and send a 410 response
# The [L] prevents further matches from being executed
RewriteRule ^ - [R=410,L]
> Would the following line added up near the top suffice?
> RewriteRule ^(testpath)($|/) - [L]

Yes. In fact you can simplify to

RewriteRule ^testpath - [L]

unless you happen to have a publicly accessible directory called, say, /testpathogen, which the shorter pattern would also match. You don't need the parentheses around (testpath), since you're not capturing.

I tried this and it didn't work
> RewriteRule ^testpath - [L]
yet this works ..
> RewriteCond %{REQUEST_URI} ^/testpath [NC]
> RewriteRule .* - [L]
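The thread doesn't settle why one form worked and the other didn't, but the two forms are matched against different strings, which is easy to trip over. In a per-directory (.htaccess) context, the RewriteRule pattern is tested against the URL-path with the directory prefix (including its leading slash) stripped off, while %{REQUEST_URI} always holds the full path with its leading slash. A sketch of two equivalent exemptions for the document-root .htaccess, using (/|$) to avoid the /testpathogen case; either one on its own should do:

```apache
# Pattern form: in the document-root .htaccess, a request for
# /testpath/foo is matched as "testpath/foo" (no leading slash)
RewriteRule ^testpath(/|$) - [L]

# Condition form: REQUEST_URI is "/testpath/foo" (leading slash included)
RewriteCond %{REQUEST_URI} ^/testpath(/|$)
RewriteRule ^ - [L]
```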
RewriteCond %{QUERY_STRING} .
RewriteRule !^testpath - [NC,G]
Options +ExecCGI +FollowSymLinks
AddHandler cgi-script .pl
# Make sure an error page is defined for 410
ErrorDocument 410 /410.shtml
RewriteEngine on
# For any non-empty query string
RewriteCond %{QUERY_STRING} ^.
# Match any path and send a 410 response
# The [L] prevents further matches from being executed
RewriteRule ^ - [R=410,L]

Options +ExecCGI +FollowSymLinks
AddHandler cgi-script .pl
# Make sure an error page is defined for 410
ErrorDocument 410 /410.shtml
RewriteEngine on
# 410 any request that has a query string, unless it is under /testpath
RewriteCond %{QUERY_STRING} .
RewriteRule !^testpath - [NC,G]

> RewriteCond %{QUERY_STRING} .
> RewriteRule !^testpath - [NC,G]

The one problem with this construction is that the server then has to evaluate the condition on nearly every request, not just pages: a negated pattern like !^testpath matches almost everything, images and stylesheets included. Oh, and do you really need [NC] here? If it's a directory that only you use, surely you can trust yourself not to type TESTPATH by mistake? The drawback to [NC] is that the server then has to do extra work, matching the pattern case-insensitively. Those picoseconds add up.
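One way to address the every-request concern, sketched under assumptions: mod_rewrite tests a rule's pattern before its conditions, so putting the page test in the pattern means images, stylesheets and scripts drop out after a single match attempt, and the /testpath exclusion moves into a condition. Which extensions count as "pages" is an assumption here (.php is included for the index.php requests seen in the logs), and [NC] is left off. Untested:

```apache
RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} !^/testpath(/|$)
# The pattern is checked first: only the site root, *.php, *.html
# or a path ending in "/" ever reaches the conditions above
RewriteRule (^|\.php|\.html|/)$ - [G]
```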
> I don't know why the code I add is not properly formatting in this thread?

I think the [ code ] markup assumes some language or other, so you will see unexpected highlighting. Don't know what it thinks is so special about the numerical string "410", though :)

> If it's a directory that only you use, surely you can trust yourself not to type TESTPATH by mistake?

RewriteEngine off
So, what you are suggesting is to ..
1. Remove the .htaccess in the 'public_html/testpath' folder
2. Modify .htaccess in /public_html as follows ..