How I can block hosts in .htaccess

e.g. bc.googleusercontent.com

         

klaus100

10:31 am on Jul 14, 2022 (gmt 0)

Top Contributors Of The Month



I want to block hosts because there are too many IP addresses involved, so it does not make sense to block single IP addresses or ranges of IP addresses.
I tried the following:
<IfModule !mod_authz_core.c>
Order allow,deny
Allow from all
Deny from *.bc.googleusercontent.com
Deny from microsoft.com
</IfModule>
<IfModule mod_authz_core.c>
<RequireAll>
Require all granted
Require not host microsoft.com
Require not host *.bc.googleusercontent.com
</RequireAll>
</IfModule>

*.bc.googleusercontent.com does not work. And concerning microsoft.com I am not sure.
Many thanks in advance for the answers.

Brett_Tabke

12:02 pm on Jul 14, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Welcome to the forum.
There is a really good section here that may be of help, on allow/deny:

[inmotionhosting.com...]

My question would be, why do you care if Mod_Authz is enabled? Is this on a pw protected directory? If so, then put the .htaccess in that directory?

I use this here:
deny from googleusercontent.com amazonaws.com
deny from 68.183.245.101 62.149.225.67


And seems to work ok. You have to have HostnameLookups on in your Server (Apache standard syntax) config.
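
For reference, here is a minimal sketch of the same host-based block in Apache 2.4 syntax. It assumes mod_authz_core and mod_authz_host are loaded; the domain names are the ones mentioned in this thread. As far as the Apache documentation describes it, the host argument is matched against whole trailing components of the client's host name, so no leading "*." wildcard is used, and evaluating it makes Apache perform a double reverse DNS lookup whatever HostnameLookups is set to.

```apache
# Apache 2.4+ only (mod_authz_core + mod_authz_host).
<RequireAll>
    Require all granted
    # Matches the domain itself and any host name ending in ".googleusercontent.com".
    Require not host googleusercontent.com
    # Same idea for a second domain.
    Require not host amazonaws.com
</RequireAll>
```

A rule like this can only fire when the client's IP address actually reverse-resolves to a name in that domain. An IP whose WHOIS record says "microsoft.com" but which has no matching reverse DNS entry would, under that assumption, pass straight through.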

klaus100

4:07 pm on Jul 14, 2022 (gmt 0)

Top Contributors Of The Month



Thank you for your answer. I am not an expert in .htaccess.
A part of the text of the link is:
Important! THIS IS NOT RECOMMENDED. If you use a host name in a Deny rule in the .htaccess, Apache will convert your Apache log into host names instead of IP addresses. This will remove your ability to see the logs with IP addresses. You will want to use the IP address instead of host name; unless, you want to check your site access by host name alone.
Can I ignore this text?
No password protected directory.

How can I tell whether Mod_Authz is enabled?

lucy24

5:35 pm on Jul 14, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Heed this:
If you use a host name in a Deny rule in the .htaccess, Apache will convert your Apache log into host names instead of IP addresses.

In fact this is one aspect of a general rule: If your mod_auth-whatsit rules contain anything that doesn't translate to an IP block--I've done it by accidentally leaving a comma in the middle of a list of IPs--it throws the server into lookup mode, and you will have the devil of a time reading your logs. (It also puts the server to more work, since it has to do all those lookups.)

I am also concerned by the <IfModule> structure of the original post. Surely you know what Apache version your server is running? If it is 2.4 or later, use the Require syntax; if not, use Allow/Deny. Apache 2.4 comes with mod_access_compat, which allows the old Allow/Deny rules to continue working. But that's just a placeholder so everyone's sites won't break before you get your htaccess updated. If your host refuses to divulge even this basic information*, there is a simple test that will tell you what mods you are running. But it shouldn't be necessary.

But, to answer the original question: It really doesn't save you any work to go by hostname instead of IP. It is not that difficult to look up the various major entities' IP ranges, and use them numerically. (Besides, why on earth would you want to block, for example, everything microsoft? Have you got something against the bingbot?)


* Mine sent out multiple announcements about one of their periodic php upgrades, but wholly neglected to mention that this time they were also moving to Apache 2.4. This annoyed me very much.
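
One possible form of the "simple test" for loaded modules mentioned above (not necessarily the one the poster had in mind) is sketched here. It assumes mod_headers is loaded and that the host permits the Header directive in .htaccess; the header name X-Authz-Check is made up for illustration.

```apache
# Temporary diagnostic: remove once you know the answer.
<IfModule mod_authz_core.c>
    <IfModule mod_headers.c>
        # Only sent if BOTH modules are loaded.
        Header set X-Authz-Check "mod_authz_core present"
    </IfModule>
</IfModule>
```

If the header shows up in the response (browser developer tools, network tab), mod_authz_core is loaded, which means the server is running 2.4-style authorization and the Require syntax applies. If it is missing, either mod_authz_core or mod_headers is absent, so the test is only conclusive in the positive case.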

klaus100

6:39 pm on Jul 14, 2022 (gmt 0)

Top Contributors Of The Month



Thank you for your answer. I am not an expert in .htaccess.
Why microsoft.com? Because I cannot insert screenshots here, I refer to [abuseipdb.com...]
Two examples of IPs with microsoft.com:
20.68.116.210 - Confidence of Abuse is 99%
20.10.22.160 - Confidence of Abuse is 100%
In both cases Domain Name = microsoft.com
How can I block them without using single IPs or IP ranges?
In the meantime it has become clear that my microsoft.com settings in .htaccess are not working.

klaus100

6:59 pm on Jul 14, 2022 (gmt 0)

Top Contributors Of The Month



Concerning your proposal of IP ranges.
Can I work with the following ranges?
34.0/24
35.0/24

Brett_Tabke

7:08 pm on Jul 14, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



What version of Apache, LiteSpeed, or Nginx?

Judging by the syntax you started with, it looks like you are using Apache 2.4 or later?

not2easy

7:36 pm on Jul 14, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



@klaus100 - for that 20.68.116.210 you could block 20.64.0.0/10
for the 20.10.22.160 you could use 20.0.0.0/11

I do hope you have a custom 403 page to handle accidental blocks though. While those ranges are MSFT Azure Cloud, it is not impossible that some numbers in there could be used for mobile networks and wi-fi.
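
As a sketch, the two ranges above could be written like this in Apache 2.4 syntax, assuming mod_authz_core and mod_authz_host are loaded and no Allow/Deny rules apply to the same directory:

```apache
<RequireAll>
    Require all granted
    # 20.64.0.0 - 20.127.255.255, which contains 20.68.116.210
    Require not ip 20.64.0.0/10
    # 20.0.0.0 - 20.31.255.255, which contains 20.10.22.160
    Require not ip 20.0.0.0/11
</RequireAll>
```

Both ranges are large (roughly 4 million and 2 million addresses respectively), which is why a friendly 403 page for accidental blocks matters.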

lucy24

8:24 pm on Jul 14, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't think 20 is used by humans at all. I have the entire /8 set to bad_range. (This lets me poke holes for DuckDuckGo's favicons-bot. The DuckDuckBot itself uses a few specific IPs in the same range, but if they cannot be bothered to honor robots.txt I see no reason to let them in.)

Within the present calendar year, the only results for my standard test
^20\..+?\.css HTTP
(because robots other than search engines don't generally request stylesheets) are a handful of /errorstyles.css which means exactly what it sounds like: the original page request was blocked on other grounds, generally deficient headers. And that's a literal handful--I think less than a dozen total--as against tens of thousands of other requests from 20, including for things like wp-admin which gives you some idea what kind of neighborhood it is.

klaus100

8:05 am on Jul 15, 2022 (gmt 0)

Top Contributors Of The Month



The Apache version in use is > 2.3
Does this mean
that all "deny from" entries are irrelevant and have no effect?
that I could delete all "deny from" entries in the .htaccess?

klaus100

8:06 am on Jul 15, 2022 (gmt 0)

Top Contributors Of The Month



Yes there is a custom 403 page.

not2easy

12:31 pm on Jul 15, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I don't think that Apache 2.3 uses that "Require not" syntax you have there. I seem to recall that syntax starting with 2.4, though 2.4 can also digest the older syntax.

Edited to add, I finally found one of the more useful explanations: [webmasterworld.com...]
Thank you lucy24

lucy24

2:50 pm on Jul 15, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



“> 2.3” seems a pretty silly way of saying >= [darn those ISO-1252 limitations!] 2.4, since if I remember correctly, odd-numbered versions after 2.0 weren't publicly released at all.

klaus100

6:50 pm on Jul 15, 2022 (gmt 0)

Top Contributors Of The Month



In .htaccess I found
# Apache < 2.3
above the "deny from" section.

Therefore I wrote > 2.3, although it is > 2.4

klaus100

7:42 am on Jul 16, 2022 (gmt 0)

Top Contributors Of The Month



@lucy24
"I have the entire /8 set to bad_range"

Please can you give me your code/syntax for blocking 20?
Many thanks in advance!

not2easy

12:42 pm on Jul 16, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



An entire first-octet range is written n.0.0.0/8, in the same syntax you use to block any other CIDR.

klaus100

2:03 pm on Jul 16, 2022 (gmt 0)

Top Contributors Of The Month



So it is?
required not 20.0.0.0/8
Sorry, I am not an .htaccess expert.

not2easy

2:41 pm on Jul 16, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Yes, 20.0.0.0/8 covers everything in 20.0.0.0 - 20.255.255.255. The directive itself is spelled "Require not ip 20.0.0.0/8", though.

Don't worry klaus100, there are very few .htaccess experts and I'm not one either. Everything I know about .htaccess I learned here reading through the threads, plus a little visiting the Apache site to read up. I find the Apache site often confusing, and it is hard to find what you want to know.

lucy24

5:14 pm on Jul 16, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I have the entire /8 set to bad_range"

Please can you give me your code/syntax for blocking 20?
I use mod_setenvif in conjunction with mod_auth-I-forget. That is, first the environmental variables are set, and then those are used for access control. In the specific case of IP ranges, it looks like this.

setenvif part:
SetEnvIf Remote_Addr ^20\. bad_range=$0
[as one item in a list of large undesirable ranges, such as most of 52 and 54 and The Usual Suspects]

BrowserMatch DuckDuckGo-Favicons !bad_range
[as one item in a list that turns off selected environmental variables for authorized robots]

access part (inside RequireNone envelope)
Require env bad_range

In cases where I don't need to poke holes--i.e. the given IP is not used by any authorized visitor--you can proceed directly to

Require ip 20


Edit: This is identical to
Require ip 20.0.0.0/8

For any /8 or /16 or /24 block you can, if you choose, simply leave off the zero-and-slash part.
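
Put together, the pieces described above might look like this in an Apache 2.4 .htaccess. This is a sketch assembled from the fragments in this post, not a copy of a live configuration. It assumes mod_setenvif and mod_authz_core are loaded, and it wraps the RequireNone envelope in a RequireAll because, as far as I know, a purely negative block cannot authorize a request on its own.

```apache
# 1. setenvif part: flag the unwanted range.
#    $0 stores the matched text ("20.") as the variable's value.
SetEnvIf Remote_Addr ^20\. bad_range=$0

# 2. Poke holes: unset the flag again for authorized robots.
#    This must come AFTER the line that sets the flag.
BrowserMatch DuckDuckGo-Favicons !bad_range

# 3. access part: deny any request that still carries the flag.
<RequireAll>
    Require all granted
    <RequireNone>
        Require env bad_range
    </RequireNone>
</RequireAll>
```

All three parts can sit in the same .htaccess file; only the relative order of the SetEnvIf and BrowserMatch lines matters.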

klaus100

9:07 am on Jul 17, 2022 (gmt 0)

Top Contributors Of The Month



Many thanks!
But I do not understand the following:
mod_auth-I-forget
SetEnvIf Remote_Addr ^20\. bad_range=$0 (Where to insert in .htaccess?)
Require env bad_range (Where to insert in .htaccess?)

I will test the following and report the results.

<IfModule mod_authz_core.c>
<RequireAll>
Require all granted
Require not ip 20.0.0.0/8
</RequireAll>
</IfModule>

As I mentioned, I am not an .htaccess expert.

lucy24

4:14 pm on Jul 17, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If it works, you can get rid of the <IfModule> envelope. (Not its contents! Just the two lines of the envelope itself.) There exist situations where <IfModule> might be appropriate. But in general, find out what you've got, and write the rules accordingly.

The two forms
Require not ip 20
and
Require not ip 20.0.0.0/8
are identical in meaning. The first version saves eight bytes, but you would need to have an absolutely vast htaccess for this to make any perceptible difference. Use whichever one you are comfortable with.

klaus100

7:02 am on Jul 22, 2022 (gmt 0)

Top Contributors Of The Month



It does not work.
If someone gives me the complete syntax/code,
I will try it with
mod_auth-I-forget
SetEnvIf Remote_Addr ^20\. bad_range=$0

lucy24

4:33 pm on Jul 22, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Good grief, if you’re going to take it literally ....

:: detour to Apache docs ::

No wonder I can never remember the name. There are three mods called mod_auth_something, six called mod_authn_something, two called mod_authnz_something (for use, presumably, in New Zealand), and finally seven called mod_authz_something, for a total of eighteen mod_auth-thingummy.

The Require directives belong to mod_authz_core, which indeed is what your IfModule envelope says.

Are you saying that none of your rulesets involving Require work? What about rulesets using Allow/Deny? Is this a brand-new site, or are there existing access-control rules that have worked in the past? In a site using Apache 2.4 it's especially important to make sure you don't have both syntaxes (Allow/Deny and Require) along the same filepath--not because it will break the server but because the execution order means that Allow/Deny is liable to override Require.

If “Require ip” or “Require not ip” doesn’t work, then neither will “Require env”, since they are the same module.

klaus100

8:46 am on Jul 24, 2022 (gmt 0)

Top Contributors Of The Month



A regular "Require not ip" for a single IP works, and a range of IPs, e.g. Require not ip 5.188.210.0/24, works as well.
Okay, it seems there is no workaround for blocking hosts in .htaccess. But I will wait 2 days.
Maybe I will then create "How I can block hosts in .htaccess (2)", because this thread is overloaded. No reproach; it was fine to try to find a workaround.

lucy24

4:47 pm on Jul 24, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A regular "Require not ip" for a single IP works, and a range of IPs, e.g. Require not ip 5.188.210.0/24, works as well.
OK, now I'm confused. What is it that doesn't work?

Brett_Tabke

2:01 pm on Jul 25, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> What is it that doesn't work?

This does not work:

RewriteCond %{REQUEST_FILENAME} ^robots\.txt$
RewriteRule ^(.*)$ - [END]

If you have the ip denied elsewhere in the htaccess file.

The idea is to allow bots and ips to have robots.txt, but ban them from all other files.

RewriteCond %{REQUEST_FILENAME} ^robots\.txt$
RewriteRule ^(.*)$ - [END]

deny from 123.123.123.123..etc
RewriteCond %{HTTP_USER_AGENT} ^.*(DataForSeoBot|AhrefsBot|wp_is_mobile|AppleBot|meerkatseo|LWP|AppleNewsBot|yacy|infotiger|amazonbot|YisouSpider).*$ [NC]
RewriteRule .* - [F,L]

[edited by: Brett_Tabke at 2:06 pm (utc) on Jul 25, 2022]

not2easy

2:25 pm on Jul 25, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You would need to allow denied traffic to your 403 document as well as robots.txt or they can cause server errors trying to process the 403. It is a good idea to create a custom 403 page for accidental blocks so accidentally blocked humans can attempt to request entry.

There are multiple methods of blocking IPs, but the final '.' does not help anything, it is not part of the IP address and can cause errors. OK, I see I was cross posting while you cleared that up. (nevermind)

If you set an environment for deny, then it is simple to poke a hole for robots.txt and your 403 error page. And don't forget to specify the error document:
ErrorDocument 403 /403.php
(or whatever the name of the 403 file is)

You don't need that ^.* before and .*$ after the parentheses but you do need to \. escape all '.' entries contained in |entries| or you can cause errors.
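
A sketch of the environment-based approach described above, for Apache 2.4. The variable name denied_visitor and the file name /403.php are placeholders, the IP address is the dummy one from this thread, and the user-agent list is shortened. It assumes mod_setenvif and mod_authz_core are loaded.

```apache
# Custom error page for blocked (including accidentally blocked) visitors.
ErrorDocument 403 /403.php

# Flag unwanted visitors by IP address ...
SetEnvIf Remote_Addr ^123\.123\.123\.123$ denied_visitor
# ... or by user agent (no leading ^.* or trailing .*$ needed).
SetEnvIfNoCase User-Agent (DataForSeoBot|AhrefsBot|YisouSpider) denied_visitor

# Poke holes: everyone may fetch robots.txt and the 403 page itself.
SetEnvIf Request_URI ^/(robots\.txt|403\.php)$ !denied_visitor

# Deny whatever is still flagged.
<RequireAll>
    Require all granted
    Require not env denied_visitor
</RequireAll>
```

Because flagging and access control both end up in the authorization module, there is no need for mod_rewrite to try to override a denial made elsewhere.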

lucy24

3:46 pm on Jul 25, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you have the ip denied elsewhere in the htaccess file.
As noted elsewhere, each module is an island. Actions taken in mod_authz_core * cannot be overridden with mod_rewrite, and vice versa.

This construction
RewriteCond %{REQUEST_FILENAME} ^robots\.txt$
RewriteRule ^(.*)$ - [END]
just makes extra work for the server. Anything that can go in the body of a RewriteRule, should go in the body of a RewriteRule. Here it would be simply
RewriteRule ^robots\.txt$ - [END]
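
For completeness, a mod_rewrite-only sketch of the "robots.txt for everyone, nothing else for listed robots" idea. It assumes the rules live in the site's root .htaccess (so the pattern is matched without a leading slash), that the [END] flag is available (Apache 2.4), and that the same visitors are not also denied through Allow/Deny or Require, which mod_rewrite cannot undo. The user-agent list is shortened from the one quoted in this thread.

```apache
RewriteEngine On

# Let anyone fetch robots.txt; [END] stops all further rewrite processing.
RewriteRule ^robots\.txt$ - [END]

# Refuse listed user agents everywhere else with a 403.
RewriteCond %{HTTP_USER_AGENT} (DataForSeoBot|AhrefsBot|YisouSpider) [NC]
RewriteRule ^ - [F]
```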

Edit: OK, no wonder it's starting to sound like an echo chamber. There is massive overlap between this thread and the neighboring
[webmasterworld.com...]
(and Brett, why are you wearing a Klaus mask?)


* Struggling heroically to keep this name in the active forebrain.

klaus100

8:50 am on Jul 28, 2022 (gmt 0)

Top Contributors Of The Month



Because no solution is in sight, I will try to find another source of information.
Thank you and goodbye.