This is the file:
Options +FollowSymLinks
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?gregnmary.gotdns.com:8080/.*$ [NC]
RewriteRule \.(gif|jpg)$ - [F]
# No View
<Files .htaccess>
order allow,deny
deny from all
</Files>
IndexIgnore *
# Get Out
#<Limit GET POST>
order deny,allow
deny from 121.218.130.154
deny from 91.149.224.252
deny from 79.116.142.175
deny from 94.23.238.192
deny from 212.95.54.176
deny from 194.8.75.155
deny from 117.195.135.43
deny from 188.92.73.228
deny from 88.234.49.49
deny from 94.23.226.25
deny from 188.92.76.35
deny from 188.165.192.166
deny from 120.28.76.113
deny from .657liyiz.biz
deny from .youdao.com
allow from all
#</Limit>
# Get Out
SetEnvIfNoCase Referer 657liyiz spammer=yes
SetEnvIfNoCase Referer youdao spammer=yes
deny from env=spammer
#AuthUserFile "/srv/www/.htpasswd"
#AuthType Basic
#AuthName "By Invitation Only"
#require valid-user
Here is a section from my Apache access log. The entry where it sees my reCAPTCHA API key is particularly disturbing, even though it is the public key:
188.165.192.168 - - [29/Dec/2009:17:13:33 -0500] "GET /index.php?topic=62.0 HTTP/1.0" 200 53827 "http://unlockiphone22.com" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
188.165.192.168 - - [29/Dec/2009:17:13:40 -0500] "POST /index.php?PHPSESSID=ln2fvs94hpvg1ibmuol0nmm3dhl6kqcj&action=quickmod2;topic=62.0 HTTP/1.0" 302 - "http://gregnmary.gotdns.com/index.php?topic=62.0" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
188.165.192.168 - - [29/Dec/2009:17:13:40 -0500] "GET /index.php/topic,62.0.html HTTP/1.0" 200 50685 "http://gregnmary.gotdns.com/index.php/topic,62.0.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
188.165.192.168 - - [29/Dec/2009:17:13:48 -0500] "GET /index.php?action=register HTTP/1.0" 200 34860 "http://gregnmary.gotdns.com/index.php?action=register" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
188.165.192.168 - - [29/Dec/2009:17:14:46 -0500] "POST /index.php?action=register2 HTTP/1.0" 200 14222 "http://api.recaptcha.net/noscript?k=6Lcf4AQAAAAAAAUR-2l0hCh16aDqWg-ZZj20num_" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
188.165.192.168 - - [29/Dec/2009:17:14:48 -0500] "GET /index.php?action=login HTTP/1.0" 200 13912 "http://gregnmary.gotdns.com/index.php?action=login" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
188.165.192.168 - - [29/Dec/2009:17:14:49 -0500] "POST /index.php?action=login2 HTTP/1.0" 200 16004 "http://gregnmary.gotdns.com/index.php?action=login" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
188.165.192.168 - - [29/Dec/2009:17:14:51 -0500] "GET /index.php?topic=62.0 HTTP/1.0" 200 50684 "http://gregnmary.gotdns.com/index.php?topic=62.0" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
One more thing. Shouldn't I be able to test this by denying my own gateway IP address, which is where my LAN machines access the site from? I have cleared the cache and tried on different machines. Shouldn't that prevent me from accessing the site, thus telling me it's working? If so, that doesn't work either. Other directives from that same file do work, though, and I tested mod_rewrite today and it works too. I should say I've learned just enough to get myself into situations like this and not enough to know how to resolve them.
Any help would be greatly appreciated,
Thanks
In any event, one simple weakness is the following, which is a previous denial of yours (and likely the same pest):
deny from 188.165.192.166
Denying a precise Class D (a single IP address) may come back to bite you in the backside, which is exactly what's happened here. The visitor simply returned from a neighboring address:
188.165.192.168 - - [29/Dec/2009:17:13:33 -0500] "GET /index.php?topic=62.0 HTTP/1.0" 200 53827 "http://unlockiphone22.com" "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
A more reliable solution is either to deny the provider's complete range (deny from 188.165.0.0/16)
or
to use a rewrite with multiple conditions based on several criteria (which may protect some innocents):
1) IP and UA
2) IP and Refer
3) IP, UA and Refer
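A minimal Python sketch of why the single-address ban failed and how a range test, or a range-plus-user-agent test (criterion 1 above), behaves. The addresses come from the 203.0.113.0/24 documentation block and stand in for the real ones; a /24 is used instead of the provider's whole /16 only so the example stays inside that block, and the user-agent fragment is an illustrative assumption.

```python
import ipaddress

# Hypothetical stand-ins from the 203.0.113.0/24 documentation range.
SINGLE_BAN = ipaddress.ip_address("203.0.113.166")  # like "deny from <one address>"
RANGE_BAN = ipaddress.ip_network("203.0.113.0/24")  # like "deny from <network>/<bits>"
SUSPECT_UA = "Firefox/2.0.0.13"                     # hypothetical UA fragment


def denied_by_single(ip: str) -> bool:
    """Exact match on one address only."""
    return ipaddress.ip_address(ip) == SINGLE_BAN


def denied_by_range(ip: str) -> bool:
    """True if the address falls anywhere inside the banned network."""
    return ipaddress.ip_address(ip) in RANGE_BAN


def denied_by_ip_and_ua(ip: str, user_agent: str) -> bool:
    # Both conditions must match, so other visitors from the same
    # range who send a different user-agent are let through.
    return denied_by_range(ip) and SUSPECT_UA in user_agent


returning = "203.0.113.168"  # the same pest, back from a neighboring address
print(denied_by_single(returning))                                # False
print(denied_by_range(returning))                                 # True
print(denied_by_ip_and_ua(returning, "Mozilla/5.0 Opera/9.22"))   # False
```

The combined test trades a little coverage for fewer innocent victims: a range ban alone would also block every other customer of that provider.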
I've no clue how to assist you with this, or whether it's even possible to do so without obscuring the domain in your referrers.
I'm not sure what you mean by the above.
In any event, one simple weakness is the following, which is a previous denial of yours (and likely the same pest):
Fair enough, but today I have hack attempts in my logs from 188.92.76.35, which is definitely a match for one of the denials that is already in the file.
I understand the shortcomings of banning by individual addresses, but I can't figure out why my existing bans are being bypassed when other directives from that same file are working.
Could anything in this part of my httpd.conf file be affecting it?
# forbid access to the entire filesystem by default
<Directory />
Options +FollowSymLinks
AllowOverride All
Order deny,allow
Deny from all
</Directory>
<Directory "/srv/www/htdocs/avs/">
php_admin_flag engine off
</Directory>
<Directory "/srv/www/htdocs/avatars">
php_admin_flag engine off
</Directory>
<Directory "/srv/www/htdocs/attachments">
php_admin_flag engine off
</Directory>
# use .htaccess files for overriding,
AccessFileName .htaccess
# and never show them
<Files ~ "^\.ht">
Order allow,deny
Deny from all
</Files>
Thanks again, I do appreciate your patience. Like I say, I have a lot to learn.
Fair enough, but today I have hack attempts in my logs from 188.92.76.zz, which is definitely a match for one of the denials that is already in the file.
Denying access to an IP range does not prevent the "attempt" from being recorded in your logs; rather, the request is logged with a 403 (Forbidden) status code.
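To illustrate, here is a small Python sketch that scans Apache "combined"-format log lines and picks out the requests answered with 403: a denied visitor still shows up in the log, just with 403 in the status field instead of 200. The log lines are hypothetical (documentation-range IPs, made-up response sizes).

```python
import re

# Apache "combined" LogFormat:
# host ident authuser [time] "request line" status bytes "Referer" "User-Agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def denied_attempts(lines):
    """Return (host, request) for every logged request answered with 403."""
    hits = []
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("status") == "403":
            hits.append((m.group("host"), m.group("request")))
    return hits


# Hypothetical log lines (documentation-range IPs, made-up sizes).
sample = [
    '203.0.113.168 - - [29/Dec/2009:17:13:33 -0500] "GET /index.php?topic=62.0 HTTP/1.0" 403 210 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [29/Dec/2009:17:14:02 -0500] "GET /index.php HTTP/1.1" 200 53827 "-" "Mozilla/5.0"',
]
print(denied_attempts(sample))
# [('203.0.113.168', 'GET /index.php?topic=62.0 HTTP/1.0')]
```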
I've no clue how to assist you with this, or whether it's even possible to do so without obscuring the domain in your referrers.
I'm not sure what you mean by the above.
Within the "Forum Charter" at page top is included the following:
Please do not post specific details such as domain names, full IP addresses, or personally-identifiable information such as name, e-mail address, etc. Such specifics will be edited or removed in accordance with our Terms of Service, and may render your post meaningless. Please replace your domain name with "example.com" before posting.
end of quote
As a result, the log lines in your initial request are a violation of that same charter.
The forum site this is all about is reporting in its log that it is returning a ban message to these addresses, which would indicate that they're reaching the site, no? Also, I have a referrer log in the docroot that keeps showing registration, login and post attempts taking place, like this one this afternoon:
Referrer : myactualsiteaddress.com[...]
User Agent : opera/9.22 (windows nt 5.1; u; cs)
IP Address : [whois.#*$!...]
Date and Time : Thursday 31st December 2009 03:23:10 PM
This is being reported as referred from my actual site address, like from the inside. Am I wrong in believing this indicates that the ban is not working?
I suggest that you declare "Order deny,allow" only once, unconditionally, at the top, and then restructure your code to work with that, adding a SetEnvIf and Allow from directive so that your robots.txt file and custom 403 error page are accessible even to 'banned' IP addresses. If you don't do that, you'll suffer an awful lot of grief...
Jim
The main problem appears to be that you're trying to use both "Order allow,deny" and "Order deny,allow" and their scopes are overlapping in some cases. If you put multiple Order statements in your file, then only the last one found that applies to this request will be applied.
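To make the Order semantics concrete, here is a simplified Python model of how Apache 2.2's mod_authz_host combines Allow and Deny for a single request. It assumes one effective Order and ignores Satisfy, hostname rules and section merging; the rule functions and addresses are hypothetical. It shows why a file with "Order deny,allow", a list of "deny from" lines and "allow from all" bans nobody: under deny,allow a matching Allow wins, and "allow from all" matches everyone.

```python
from typing import Callable, Iterable

Rule = Callable[[str], bool]  # returns True if the client matches the rule


def access_allowed(order: str, client: str,
                   allows: Iterable[Rule], denies: Iterable[Rule]) -> bool:
    """Simplified model of Apache 2.2 'Order' evaluation for one request."""
    matches_allow = any(rule(client) for rule in allows)
    matches_deny = any(rule(client) for rule in denies)
    order = order.lower()
    if order == "deny,allow":
        # Allowed by default; a matching Allow overrides a matching Deny.
        return matches_allow or not matches_deny
    if order == "allow,deny":
        # Denied by default; a matching Deny overrides a matching Allow.
        return matches_allow and not matches_deny
    raise ValueError(f"unsupported Order: {order!r}")


def banned(ip: str) -> bool:    # hypothetical "Deny from 203.0.113.166"
    return ip == "203.0.113.166"


def everyone(ip: str) -> bool:  # "Allow from all"
    return True


print(access_allowed("deny,allow", "203.0.113.166", [everyone], [banned]))  # True: ban ignored
print(access_allowed("allow,deny", "203.0.113.166", [everyone], [banned]))  # False
print(access_allowed("deny,allow", "203.0.113.166", [], [banned]))          # False
```

Apache 2.4 replaces these directives with "Require" (mod_authz_core), keeping Order/Allow/Deny only through mod_access_compat, so this model applies to 2.2-style configs like the ones in this thread.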
I actually did read around this site quite a bit before starting this thread and have seen a lot of your posts. Hence I have no doubt that you are many years ahead of me in your skills in these areas.
So, I'm only asking, but doesn't what you say above apply only to Order statements that are not in containers? In other words, I have two Order statements, but one of them is in a <Files> container and should only apply to what's in that container? Of course, it's entirely possible I have misapprehended that whole concept, and I will take your word without hesitation.
I suggest that you declare "Order deny,allow" only once, unconditionally, at the top, and then restructure your code to work with that, >>>
You mean like this?:
order deny,allow
RewriteEngine On
Options +FollowSymLinks
IndexIgnore *
<Files .htaccess>
deny from all
</Files>
# Get Out
deny from 121.218.130.154
deny from 91.149.224.252
deny from 79.116.142.175
deny from 94.23.238.192
deny from 212.95.54.176
deny from 194.8.75.155
deny from 117.195.135.43
deny from 188.92.73.228
deny from 88.234.49.49
deny from 94.23.226.25
deny from 188.92.76.35
deny from 188.165.192.166
deny from 120.28.76.113
deny from .657liyiz.biz
deny from .youdao.com
allow from all
<<< adding a SetEnvIf and Allow from directive so that your robots.txt file and custom 403 error page are accessible even to 'banned' IP addresses. If you don't do that, you'll suffer an awful lot of grief...
I'm afraid you're over my head here. I don't have a custom 403 error page; I know what one is, but not how to make one. I do have a simple robots.txt file with this in it:
User-agent: *
Disallow: /*?action*
Disallow: /*sort=*
Disallow: /*msg*
but I don't know how to use the directives you mention offhand.
Does the httpd.conf bit I posted interfere with what I'm trying to do? I also wish I knew whether restricting my own gateway address was an effective test, or whether NAT messes that up somehow. I further wish I didn't have to bug you guys with these noob questions, but I have spent hours researching all this, really, and it seems that everybody either assumes too much of their readers or contradicts one another. The way the Apache manual is written mystifies me a lot of the time.
I do appreciate your patience.
Thanks
To allow unconditional access to robots.txt (to prevent robots thinking that they are allowed access to all files because they can't read your "Disallows" in robots.txt) and to prevent an 'infinite' loop on 403 errors, add this code -- adapted to your URL-paths:
SetEnvIf Request_URI "/robots\.txt$" AllowAll
SetEnvIf Request_URI "/403-error-page\.html$" AllowAll
...
Allow from env=AllowAll
Jim
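A minimal Python sketch of how that exemption is meant to work together with "Order deny,allow" when there is no "Allow from all": a request whose URL-path matches one of the SetEnvIf patterns gets AllowAll and is served even to a denied address, while everything else from that address is refused. The banned address and test paths are hypothetical; "/403-error-page.html" is the placeholder name from the post.

```python
import re

# The two patterns from the post. SetEnvIf does an unanchored regex
# search on the requested URL-path, so re.search is used here.
ALLOW_ALL_PATTERNS = [
    re.compile(r"/robots\.txt$"),
    re.compile(r"/403-error-page\.html$"),
]
BANNED_IPS = {"203.0.113.166"}  # hypothetical "Deny from" list


def allow_all_set(request_uri: str) -> bool:
    """Would SetEnvIf set AllowAll for this URL-path?"""
    return any(p.search(request_uri) for p in ALLOW_ALL_PATTERNS)


def allowed(ip: str, request_uri: str) -> bool:
    # Order deny,allow: a matching "Allow from env=AllowAll" overrides any
    # Deny; otherwise only clients that match no Deny line get through.
    return allow_all_set(request_uri) or ip not in BANNED_IPS


print(allowed("203.0.113.166", "/robots.txt"))           # True
print(allowed("203.0.113.166", "/403-error-page.html"))  # True
print(allowed("203.0.113.166", "/index.php"))            # False
```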
You're saying I need the first 2 lines of the file as the SetEnvIfs?
After that, order deny,allow?
After that Allow from env=AllowAll?
Remove the order and allow all directives from the section with the ip denials altogether?
The rest won't matter as far as placement order in the file?
Assuming this is all correct, the next thing I want to be sure about is the paths in the SetEnvIfs.
Go ahead and yell at me. I don't blame ya, but I don't understand the syntax. Does the / begin a folder path from the docroot down? Does this: "/robots\.txt$" represent the file name robots.txt, so that mine, being in the docroot, would be exactly what you already have?
or do I put the path and then also include the filename like this:
SetEnvIf Request_URI "/robots.txt\.txt$" AllowAll
Same thing with the 403 error. Supposing a folder in the docroot named "errors" and an error page named "403.html" do I want this:
SetEnvIf Request_URI "/errors/403\.html$" AllowAll
or this:
SetEnvIf Request_URI "/errors/403.html\.html$" AllowAll
Or am I wrong either way? Is it necessary that I have custom error page? The default one won't work?
I'm pullin' my hair out imagining you pullin' out yours reading these questions, as I'm sure this is second nature to you, but I'm doing the best I can for where I'm at. I can't tell you how much I appreciate the help. I've never had occasion to learn these things until now.
Check out the regex tutorial; It will benefit *all* of your coding projects: .htaccess, PERL, PHP, C, JavaScript, etc. Look up each new .htaccess directive in the Apache module documentation. Make the changes and test, then post for review or if any problems.
Jim
Because I do not know your directory paths, I did not start-anchor either pattern. For info on anchoring, see the Regular Expressions tutorial cited in our Forum Charter. Do not proceed with any .htaccess or script coding until you've digested a good bit of that... What you'll find is that without a start anchor,
SetEnvIf Request_URI "/robots\.txt$" AllowAll
actually matches a robots.txt file in *any* directory on your site.
To specify an exact directory, include the entire path and start-anchor it, as in:
SetEnvIf Request_URI "/errors/403\.html$" AllowAll
Maybe I'm not as smart as I thought I was. I've been reading that regex tutorial since 6 o'clock this morning. I've read every word, and I cannot garner a flickering clue how anything in the anchors section corresponds to the above. They say anchors are: ^, $, \<, \>, \b, and \B. The only one here is $, and that is in both examples you give, one anchored and one not. I'm lost as to where the anchor is or how it's used, even after hours of additional reading.
The SetEnvIf section of the Apache manual uses backslashes where it appears you would have forward slashes, with no indication of whether there's a difference, and it says nothing about how paths are treated that I can discern. Do I need the URL to the files in question or the local path? If local, does Apache recognize locations above its own docroot?
If the docroot is /srv/www/htdocs, does it want that, or does it see / as its docroot? One would think not the latter, because before you said a single / with robots.txt would tell it to look in every directory, so I can't figure out how to tell it to look in /srv/www/htdocs (if it were local) for robots.txt. Also, if this even is the docroot, why do no .htaccess files work unless they are in the directory above /srv/www/htdocs? Maybe SUSE has some weird setup.
Meanwhile, I've had 2 more attempted attacks since last night from that same address. I dunno. Maybe I'm in over my head and should have learned more about foundational stuff like this before I spent a year working on the other components of my site, which is just an educational toybox anyway.
I do so very much appreciate your help, don't get me wrong. I should know better than to trust all these oversimplified tutorials floating around. There are dozens for banning IP addresses with .htaccess that consist of about a paragraph like this:
Blocking users by IP
Is there a pesky person perpetrating pain upon you? Stalking your site from the vastness of the electron void? Blockem! In your htaccess file, add the following code--changing the IPs to suit your needs--each command on one line each:
order allow,deny
deny from 123.45.6.7
deny from 012.34.5.
allow from all
You can deny access based upon IP address or an IP block. The above blocks access to the site from 123.45.6.7, and from any sub domain under the IP block 012.34.5. (012.34.5.1, 012.34.5.2, 012.34.5.3, etc.) I have yet to find a useful application of this, maybe if there is a site scraping your content you can block them, who knows.
You can also set an option for deny from all, which would of course deny everyone. You can also allow or deny by domain name rather than IP address (allow from .javascriptkit.com works for www.javascriptkit.com or virtual.javascriptkit.com, etc.)
Apparently it is decidedly not that simple.
Basic anchoring:
The pattern "^this$" matches only exactly "this"
The pattern "this$" matches anything that ends with "this"
The pattern "^this" matches anything that starts with "this"
The pattern "this" matches anything that *contains* this.
Thus the pattern /robots.txt$ matches a request for "robots.txt" in any directory, but will not match a request for "/not-robots.txt".
Jim
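The four anchoring rules above can be checked with Python's re module, which for simple patterns like these behaves the same way as the regex engine Apache uses. re.search mirrors SetEnvIf in that it looks for a match anywhere in the string unless anchors say otherwise. The test strings are arbitrary examples.

```python
import re


def matches(pattern: str, subject: str) -> bool:
    # Unanchored search: the pattern may match anywhere in the subject
    # unless ^ (start of string) and/or $ (end of string) pin it down.
    return re.search(pattern, subject) is not None


subjects = ("this", "this one", "not this", "is this it")
for pattern in (r"^this$", r"this$", r"^this", r"this"):
    print(pattern, [s for s in subjects if matches(pattern, s)])
# ^this$ ['this']
# this$ ['this', 'not this']
# ^this ['this', 'this one']
# this ['this', 'this one', 'not this', 'is this it']

print(matches(r"/robots\.txt$", "/blog/robots.txt"))   # True: any directory
print(matches(r"/robots\.txt$", "/not-robots.txt"))    # False: no "/" before "robots"
print(matches(r"^/robots\.txt$", "/blog/robots.txt"))  # False: start-anchored
```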
I took you to mean that if I remove the "allow from all" I would then need the SetEnvIfs so I left it. However, get this.
One of the Simple Machines (that's the forum package) honchos suggested I change the order to allow,deny, which didn't work. He then told me to start a new file with only:
<Limit GET PUT POST>
order allow,deny
allow from all
deny from 192.168.1.1
</Limit>
That also didn't work. 192.168.1.1 is my gateway address, which is actually how I've been testing all along.
For the absolute helluvit I put the file one directory down into what my default-server.conf file says is the docroot. That's what I thought it was anyway. The damn thing worked. I got a 403 forbidden error. Then I renamed that file and put my .htaccess file in that directory and it worked too after adding my gateway address. Now I'm sitting there staring blankly at the screen because that's how this all started in the first place.
All I can think of is that through some accidentally precise confluence of my playing with allowoverride in httpd.conf along with the rest of this I got myself offtrack early on and that inadvertently, but adversely colored the way I viewed everything else I was told from then on.
The short version is I shouldn't have moved the file out of the docroot several days ago. I have been so pickled in information overload I don't even remember the order I tried everything in anymore.
I now have this:
order allow,deny
deny from 24.131.181.199
deny from 121.218.130.154
deny from 91.149.224.252
deny from 79.116.142.175
deny from 94.23.238.192
deny from 212.95.54.176
deny from 194.8.75.155
deny from 117.195.135.43
deny from 188.92.73.228
deny from 88.234.49.49
deny from 94.23.226.25
deny from 188.92.76.35
deny from 188.165.192.166
deny from 120.28.76.113
deny from 188.92.73.175
deny from .657liyiz.biz
deny from .youdao.com
allow from all
and it's working... near as I can tell. Should I be concerned about SetEnvIfs still?
BTW, if you made it through this post without pounding your head on your keyboard my hat is off.
Oh yeah, great explanation about anchoring. I don't think the tutorial mentions them being used in combinations like that on the same line. It may give examples that way, but I don't think it declares that to be the case and if it does I missed it. I'm pretty sure that's what threw me off there.
Thanks again.
So in this case, the attempt to fetch the custom error document will result in a second 403-Forbidden error response, followed by another, and another, and another. It will 'loop' like this until the server gives up, but in the meantime, you've essentially created a "self-inflicted denial-of-service" attack.
Also, if a user-agent at a blocked IP address attempts to fetch your robots.txt file, it will get a 403 response, just as it would with any other request. Unfortunately, it may treat this as if your robots.txt file were blank or invalid. And in that case, the Standard for Robot Exclusion does not prohibit treating this situation as carte-blanche to spider your entire site. So the result may well be a flood of requests -- all of which your server will have to respond to.
I don't make recommendations lightly, so I'd urge you to understand the differences and side-effects of various methods before choosing them. Tiny changes can have big effects on server operation, server load, and your rankings in search results -- The scope of such server config changes is not limited to 'just the few lines of code'.
I suggest "Order deny,allow" and the "SetEnvIfs" and "Allow from env=" directives to allow robots.txt and your custom 403 error document to be fetched by *all* requestors.
Jim
Will this:
SetEnvIf Request_URI "/robots\.txt$" AllowAll
match robots.txt in the docroot?
Also, I don't know HTML very well and don't know how to build a custom error page.
In short, you're saying there's no safe way to deny IP addresses without a properly configured custom error page.
I now have this with a 403 error page I "stole" and modified:
SetEnvIf Request_URI "/robots\.txt$" AllowAll
SetEnvIf Request_URI "/errors/403\.html$" AllowAll
Order deny,allow
Allow from env=AllowAll
RewriteEngine On
Options +FollowSymLinks
#RewriteCond %{HTTP_REFERER} !^$
#RewriteCond %{HTTP_REFERER} !^http://(www\.)?example.com:8080/.*$ [NC]
#RewriteRule \.(gif|jpg)$ - [F]
#RewriteRule ^link([^/]*).html$ rewrite.php?link=$1 [L]
# No View
<Files .htaccess>
order allow,deny
deny from all
</Files>
IndexIgnore *
# Get Out
#<Limit GET POST>
#deny from 192.168.1.1
deny from 24.***.181.199
deny from 121.***.130.154
deny from 91.***.224.252
deny from 79.***.142.175
deny from 94.**.238.192
deny from 212.**.54.176
deny from 194.*.75.155
deny from 117.***.135.43
deny from 188.**.73.228
deny from 88.***.49.49
deny from 94.**.226.25
deny from 188.**.76.35
deny from 188.**.73.175
deny from 188.**.192.166
deny from 188.**.192.168
deny from 120.**.76.113
deny from .xyz-***.biz
deny from .youdao.com
#</Limit>
ErrorDocument 403 /errors/403.html
Near as I can tell everything works. Denying myself gets me the custom page which leads me to believe that the SetEnvIf for robots.txt is correct as well. The Google crawler access test works for whatever that's worth, but I'm not thinking that should be affected by this. I did BTW understand your warnings about the 403 loop and deep drill issue without this setup.
How does this look?
[edited by: jdMorgan at 1:12 am (utc) on Jan. 4, 2010]
[edit reason] examplified domains and IP addresses [/edit]
But if you do have a custom 403 error page, then it must be excluded from all access denials because otherwise it can't be served to denied user-agents. They'd try to fetch the custom 403 error page, get denied (triggering another 403), try again, get denied, try again... You'd see this as a very fast sequence of 403'ed requests in your access log, and continuing at a high rate until the server gave up.
And the robots.txt file also needs to be excluded, so that even Disallowed robots can discover that they are in fact Disallowed.
The non-start-anchored pattern "/robots\.txt$" matches robots.txt files in *any* directory. This follows from what I posted above: The pattern matches *anything* that ends with "/robots.txt".
Note on custom 403 pages:
Keep them short.
Do not provide too much information. It will be read by bad guys.
Do not gloat. This will 'challenge' the bad guys to hack your site.
Don't be rude. A few innocent victims may read it if you make an error in your code.
To help the innocent, consider providing a text link to a second 403 page with a full explanation of the problem and some method to resolve it such as an obscured e-mail (use a throw-away address that you can change often!) This additional-403-info page will also need to be excluded from the Deny-from list.
Note on Deny from <hostname> directives:
I note that you have "deny from youdao.com" and another hostname in your list. I believe they've changed the spelling to "yodao", but more importantly, that method of denying access requires your server to make an rDNS request to the DNS system in order to look up the requesting IP address to see whether it belongs to yodao.com or not. Your server must issue an outgoing request and await the response for every request that it receives!
This is highly-inefficient, and *all* requests to your server will stall while waiting for the DNS system to return an answer to this reverse-DNS request. It would be *far* more efficient to deny access based on an IP address range (look up the address range using "WHOIS") or based on the user-agent (if it is consistent). For example:
SetEnvIfNoCase User-Agent "Yodaobot" pesky
SetEnvIfNoCase User-Agent "^Toata\ dragostea" pesky
...
Deny from env=pesky
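A rough Python equivalent of those two SetEnvIfNoCase lines, as a sketch: the check is just a case-insensitive regex search on the User-Agent string the client already sent, so no DNS lookup is involved. The sample user-agent strings are hypothetical.

```python
import re

# Mirrors: SetEnvIfNoCase User-Agent "Yodaobot" pesky
#          SetEnvIfNoCase User-Agent "^Toata\ dragostea" pesky
PESKY_UA = [
    re.compile(r"Yodaobot", re.IGNORECASE),           # anywhere in the UA
    re.compile(r"^Toata\ dragostea", re.IGNORECASE),  # only at the start
]


def is_pesky(user_agent: str) -> bool:
    """Would 'pesky' be set (and the request denied) for this UA?"""
    return any(p.search(user_agent) for p in PESKY_UA)


print(is_pesky("Mozilla/5.0 (compatible; YodaoBot/1.0)"))  # True
print(is_pesky("toata dragostea mea"))                      # True
print(is_pesky("Mozilla/5.0 toata dragostea"))              # False: not at start
```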
This rDNS warning also applies to mod_rewrite-based access controls such as
RewriteCond %{REMOTE_HOST} yodao\.com
RewriteRule . - [F]
Also, look at the several IP-address-range notations that you can use. I note that you have at least one 'pair' of IP addresses which are very close together, and which could both be denied in one line of code similar to
Deny from 188.***.192.166/30
...
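Before collapsing a pair into one CIDR line, it is worth checking which addresses the block really spans. A quick Python sketch, using hypothetical documentation-range addresses in place of the masked ones: a /30 covers just four addresses, so a pair such as .166 and .168 straddles a /30 boundary and needs a wider block such as a /28. Python's strict=False masks off the host bits; writing the network address itself (for example .164/30) avoids depending on how any particular parser treats them.

```python
import ipaddress


def covered(cidr: str, *ips: str) -> dict:
    """Map each address to whether it lies inside the given block."""
    # strict=False masks off host bits, so x.x.x.166/30 becomes x.x.x.164/30.
    net = ipaddress.ip_network(cidr, strict=False)
    return {ip: ipaddress.ip_address(ip) in net for ip in ips}


pair = ("203.0.113.166", "203.0.113.168")  # hypothetical close pair

print(ipaddress.ip_network("203.0.113.166/30", strict=False))  # 203.0.113.164/30
print(covered("203.0.113.166/30", *pair))  # .166 True, .168 False
print(covered("203.0.113.160/28", *pair))  # both True (.160 to .175)
```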
The continuing tales of hammered, shot, and defenestrated computers sadden me... Glad to have helped prevent yet another... :)
Jim
SetEnvIf Request_URI "/^robots.txt$" AllowAll
Unless I'm still misunderstanding, wouldn't that tell it to match exactly robots.txt, but still in every directory?
Or this:?
SetEnvIf Request_URI "/srv/www/htdocs/^robots.txt$" AllowAll
Or this:?
SetEnvIf Request_URI "^/robots.txt$" AllowAll
I suppose worse things could be happening than it looking in every directory, but I'd obviously rather have it look only in the docroot. I guess the pathing thing still has me squinting a bit, but I'm miles ahead of where I was.
Thank you again very much. You've been extremely helpful and patient. I'm pretty sure I get the rest.
That's the simple answer. But do be aware that Request_URI looks at the requested URI -- The "URL" as more-commonly termed, and it doesn't care where the file that corresponds to that URI is located. So, for example, if that URI were to be internally rewritten (e.g. using mod_rewrite) to some other place in the filesystem, then the SetEnvIf Request_URI would still match, but the file corresponding to that URI would no longer be located in DocumentRoot.
This may seem pedantic, but in addition to understanding regular expressions patterns and anchoring, you also need to be deeply aware of the fact that URLs and filepaths are not at all the same thing; They are two very different things with no relationship whatsoever except for the 'association' between them which is provided by the action of the server. This in fact is the basic function of a server: to translate URI location-specifiers used 'out on the Web' into the potentially-completely-different location specifiers used within the server's filesystem.
This is important both conceptually *and* practically, because you can't fully understand mod_rewrite without appreciating this fact, and you may not otherwise notice that some Apache directives work based on filepaths and some work on URL-paths, and their effects can be vastly different because of that.
And it's not so obvious as one might think. See that address in the URL-bar of your browser? I guarantee you that that doesn't resemble the filepath to the actual file holding this thread in any but the most superficial way. For example, this is not really a static .htm file, and there is no directory anywhere in the WebmasterWorld filesystem named "apache". :)
Jim
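The three candidate patterns asked about above (plus a fully escaped version of the last one) can be tried against a few URL-paths with a quick Python sketch. Since Request_URI is the requested URL-path and never contains the filesystem path, a "^" placed after other characters can never match, and an unescaped "." matches any character. The request paths, including the odd "/robotsXtxt", are made up for illustration.

```python
import re

candidates = [
    r"/^robots.txt$",                 # ^ after "/" can never match
    r"/srv/www/htdocs/^robots.txt$",  # filesystem path never appears in a URL
    r"^/robots.txt$",                 # start-anchored, but "." is unescaped
    r"^/robots\.txt$",                # start-anchored, literal dot
]
requests = ["/robots.txt", "/forum/robots.txt", "/robotsXtxt"]

# For each pattern, list the URL-paths it would match.
results = {p: [uri for uri in requests if re.search(p, uri)] for p in candidates}
for pattern, hits in results.items():
    print(pattern, hits)
# /^robots.txt$ []
# /srv/www/htdocs/^robots.txt$ []
# ^/robots.txt$ ['/robots.txt', '/robotsXtxt']
# ^/robots\.txt$ ['/robots.txt']
```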
The SetEnvIf section of the Apache manual uses backslashes where it appears you would have forward slashes, with no indication of whether there's a difference, and it says nothing about how paths are treated that I can discern. Do I need the URL to the files in question or the local path? If local, does Apache recognize locations above its own docroot?
I think you and I have been traveling down different roads and hence have been viewing different scenery from the start here. You have been largely speaking about regular expressions and their application with Apache generally, and I have been stuck on my precise situation.
If that is correct then you have been saying all along that the fact that it's looking in every directory for in this instance robots.txt is how it's supposed to work and I've been looking for a solution to a non existent problem?
In other words:
No slash no match
or / scans every directory down
or /pathname/ etc. does specify a folder, but
there's no way with SetEnvIf to specify docroot only, and no reason to want it to? I say again, you're a patient guy and I do appreciate it. I feel a real forehead smacker coming on for me here. This whole concept is, I'm sure, staring me right in the face and I've been myopically stabbing around it from the start.
The difference in our 'focus' is due to the Charter of this forum. It is a discussion forum, and not a 'help desk.' Therefore, many if not most responses here focus on the 'educational' aspect of things as opposed to just handing out free code patches.
Please understand that this thread will (hopefully) be useful to future readers coming here with the same problem. If not, then the time 'donated' by respondents is not particularly well-spent and amounts only to "free consulting for a single client" and constitutes only a loss of time with little or no benefit to the community.
But to be clear, "SetEnvIf Request_URI ^xyz$" looks only at the incoming client-requested URL-path, and has no comprehension or care as to whether or not that URL-path will resolve to a physical file or a script. It's looking at the "characters typed into the browser address bar or specified in an on-page HTML link following http://example.com and preceding any '?' or '#' characters" and nothing else. There is no 'scanning of the filesystem' involved here, it's just looking at the characters in the client-requested URL-path.
In fact, if this code were placed in a server config file (which it could be, but isn't) then the URL-to-filename translation wouldn't even be done yet. That would take place only in the next phase of the Apache API. However, that wouldn't change anything about what I previously stated. :)
[added] Please see comments in my previous post about the difference between URLs and filepaths. [/added]
Jim
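A small Python sketch of the "characters after the host and before any '?' or '#'" idea, using urllib.parse.urlsplit as a stand-in for what Apache takes from the request line (an approximation, not Apache's own parser). The third URL follows the pattern seen in the access log earlier in this thread: its path ends in ".html" and so matches a "\.html$" pattern, even though no such .html file need exist on disk (presumably the forum's index.php handles it).

```python
import re
from urllib.parse import urlsplit


def url_path(url: str) -> str:
    # The part after the host and before any '?' or '#': roughly what
    # SetEnvIf's Request_URI attribute sees. The '#' fragment is never
    # even sent to the server by the browser.
    return urlsplit(url).path


for url in ("http://example.com/robots.txt",
            "http://example.com/index.php?topic=62.0#msg5",
            "http://example.com/index.php/topic,62.0.html"):
    path = url_path(url)
    print(path, bool(re.search(r"\.html$", path)))
# /robots.txt False
# /index.php False
# /index.php/topic,62.0.html True
```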
But to be clear, "SetEnvIf Request_URI ^xyz$" looks only at the incoming client-requested URL-path, and has no comprehension or care as to whether or not that URL-path will resolve to a physical file or a script. It's looking at the "characters typed into the browser address bar or specified in an on-page HTML link following http://example.com and preceding any '?' or '#' characters" and nothing else. There is no 'scanning of the filesystem' involved here, it's just looking at the characters in the client-requested URL-path.
Smacks forehead.
This is what I was missing. SetEnvIf Request_URI only becomes active at all IF the incoming client request matches a resource it is told to respond to and even then it simply presents the request with a response that the request itself thinks it was asking for in the first place. I think I said that right.
I kept thinking it was: look for "this" "there" with a search to follow when actually "this" may be a virtual resource generated from maybe several actual sources.
I understand completely about the type of environment that this forum wishes to maintain. Sometimes it can be tough to twist oneself away from the prevailing situation of the moment. I was becoming frustrated with having to watch these pests pounding on my doors and windows because I couldn't get the gate locked. Of course I do realize the pests will change form quite often, but it's the principle of the thing.
Thanks once again.
Having done that, future exploits can be shut out immediately by adding just one or a few lines of code. You won't have to replace the hinges, the doorknob, and the knocker before slamming the door any more... :)
Jim
(Whose access controls are at least twenty times more complex than anything discussed in this thread.) ;)