Blocking web site access using Sec-Fetch

         

dstiles

9:35 am on Apr 26, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is anyone here blocking by Sec-Fetch? I'm new to this and wondering if I have the right approach and understanding.

As I understand it, in general I can block access to my sites if a header contains...
Sec-Fetch-Site: cross-site
which implies that the access comes from another server and not from a "human". Is that correct? Too simplistic? Stupid?

I get a fair number of cross-site hits that are currently allowed, despite various X-headers etc. Would Sec-Fetch-Site be a way to stop these or would it also stop access for real visitors? As an example, two hits to the same page from the same broadband IP within 3 or 4 seconds...
Sec-Fetch-Site:cross-site
Sec-Fetch-Mode:navigate
Sec-Fetch-Dest:document

Sec-Fetch-Site:cross-site
Sec-Fetch-Mode:no-cors
Sec-Fetch-Dest:empty
And yet again, two hits eleven seconds apart to two different pages give first cross-site and then same-origin...

Sec-Ch-Ua-Platform:"Windows"
Sec-Ch-Ua-Mobile:?0
Sec-Ch-Ua:" Not A;Brand";v="99", "Chromium";v="100", "Microsoft Edge";v="100"
Sec-Fetch-Dest:document
Sec-Fetch-Mode:navigate
Sec-Fetch-Site:cross-site

Sec-Fetch-Dest:document
Sec-Fetch-User:?1
Sec-Fetch-Mode:navigate
Sec-Fetch-Site:same-origin
Sec-Ch-Ua-Platform:"Windows"
Sec-Ch-Ua-Mobile:?0
Sec-Ch-Ua:" Not A;Brand";v="99", "Chromium";v="100", "Microsoft Edge";v="100"
which is from a McAfee IP so possibly a VPN.

There is another pair of hits I've noticed hitting two pages within 5 seconds, from a broadband IP, that suggest I may have got this all wrong.
Sec-Fetch-User:?1
Sec-Fetch-Site:cross-site
Sec-Fetch-Mode:navigate
Sec-Fetch-Dest:document

Sec-Fetch-User:?1
Sec-Fetch-Site:same-origin
Sec-Fetch-Mode:navigate
Sec-Fetch-Dest:document


At present the top "real" bots do not use Sec of any kind. I wonder what effect the above (if implemented) would have on these predators. Presumably they would pretend to be browsers?
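
For anyone trying to read triples like the ones above, here is a rough Python sketch that turns the logged lines into a plain-English guess. The function names and the wording of the descriptions are made up for illustration; only the header names and values come from the logs. The relevant case is cross-site + navigate + document: as far as I understand it, that is also what a browser sends when a person follows a link from another site (a search result, say), so cross-site on its own would not separate visitors from bots.

```python
def parse_logged_headers(block):
    """Turn lines like 'Sec-Fetch-Site:cross-site' into a lowercase-keyed dict."""
    headers = {}
    for line in block.strip().splitlines():
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return headers


def describe(headers):
    """Plain-English guess at the kind of request (illustrative wording only)."""
    site = headers.get("sec-fetch-site")
    mode = headers.get("sec-fetch-mode")
    dest = headers.get("sec-fetch-dest")
    if site is None:
        return "no Sec-Fetch headers (older browser, plain-HTTP request, or non-browser client)"
    if mode == "navigate" and dest == "document":
        if site == "none":
            return "page load not started by any page (typed URL, bookmark, or an automated browser)"
        if site == "cross-site":
            return "page load arriving from a different site (e.g. a followed link)"
        return f"page load initiated from a {site} page"
    return f"background or subresource fetch ({mode}/{dest}) initiated from: {site}"


first_hit = parse_logged_headers("""
Sec-Fetch-Site:cross-site
Sec-Fetch-Mode:navigate
Sec-Fetch-Dest:document
""")
print(describe(first_hit))
```

The second hit in each pair (no-cors / empty) falls into the last branch: a fetch made in the background rather than a page the visitor asked for.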

lucy24

4:06 pm on Apr 26, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: detour to logged headers ::

No luck. Some of those requests with
Sec-Fetch-Site: cross-site
are clearly human, darn it. Granted, about a quarter of them (of the total, not the clear humans) are blocked, but that’s not a very satisfactory proportion for rule-making purposes.

Speaking for myself, it's the package of headers in
Sec-Ch-Ua-blahblah
that I’m currently trying to make sense of.

Remember when you could look at "Mozilla" at the front of the UA and safely assume it was human?

Remember when you could look for the “Upgrade-Insecure-Requests” header and ditto?

Sigh.

dstiles

10:27 am on Apr 27, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I can remember a simple UA with the word "Netscape" in it - and then a completely rubbish browser from some upstart called microsoft whose boss said the internet would never catch on. Happy days. <sigh>

I feel there should be a combination of sec headers that would define a not-very-human visitor. I'm sure some of the cross-site hits are bad and sure that some are probably good, possibly the result of a bad browser interpreter?

This morning I got 7 hits in half an hour from a single UK broadband IP to the home page of a single site. There were a couple of pairs, one at 07:55 the other at 08:31, of...
Sec-Fetch-Site:cross-site
Sec-Fetch-Mode:navigate
Sec-Fetch-Dest:document

Sec-Fetch-Site:cross-site
Sec-Fetch-Mode:no-cors
Sec-Fetch-Dest:empty
then a triplet at 08:32 of...
Sec-Fetch-Site:cross-site
Sec-Fetch-Mode:no-cors
Sec-Fetch-Dest:empty

Sec-Fetch-User:?1
Sec-Fetch-Site:none
Sec-Fetch-Mode:navigate
Sec-Fetch-Dest:document

Sec-Fetch-Site:cross-site
Sec-Fetch-Mode:no-cors
Sec-Fetch-Dest:empty

If even one of those hits was genuine then I should not block the others. I've just checked and there's been 155 of these in the past month. Checking an email from the site's owner, it was sent from the same IP so that defines the hits as genuine, probably from him firing up his browser which is set to that page.

So, what use Sec-Fetch?

User-Agent for all of them was: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0

lucy24

3:43 pm on Apr 27, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"So, what use Sec-Fetch?"
What use indeed, other than to make site administrators tear their hair :)

I too keep suspecting there is some combination of Sec-Fetch headers that is never sent by a human. But that, already, is a problem, since it’s trivial to block on the basis of any one header, but much trickier when you have to look at them in combination.
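
Looking at headers in combination is at least easy to write down once you leave single-header rules behind. A minimal Python sketch, assuming the request headers are available as a dict: each rule lists the values that must ALL be present together. The two rules shown are hypothetical placeholders, picked only because they look self-contradictory on paper; they have not been checked against real traffic and are not known bot signatures.

```python
# Each rule maps header name -> required value; a value of None means
# "this header must be absent". Hypothetical examples, NOT verified signatures.
RULES = {
    "example-rule-a": {"sec-fetch-mode": "navigate", "sec-fetch-dest": "empty"},
    "example-rule-b": {"sec-fetch-user": "?1", "sec-fetch-mode": "no-cors"},
}


def matching_rules(headers, rules=RULES):
    """Return the names of every rule whose conditions all hold for this request."""
    lowered = {name.lower(): value for name, value in headers.items()}
    hits = []
    for rule_name, conditions in rules.items():
        # .get() returns None for a missing header, which is how "must be absent" works
        if all(lowered.get(h) == wanted for h, wanted in conditions.items()):
            hits.append(rule_name)
    return hits
```

Logging which rules fire for a few weeks before blocking on any of them seems the safer order of operations.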

dstiles

4:20 pm on Apr 27, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've just discovered Sec-Fetch-User. If this is present it's "guaranteed" to be "only sent for requests initiated by user activation, and its value will always be ?1".

The problem I have with that is: in the above set of hits they ALL originate with my customer's browser but only ONE has Sec-Fetch-User. Nor can I understand why so many are cross-site and no-cors.

I suspect the specification has been assembled by non-real-world techies. :(
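
Part of the puzzle may be that Sec-Fetch-User is only defined for navigation requests. The no-cors / empty hits are background fetches, so they would never carry it even with a person at the keyboard. A small Python sketch applying that to the 08:32 triplet (the helper name is invented for illustration):

```python
def could_carry_sec_fetch_user(headers):
    """Sec-Fetch-User only applies to navigations, so check the mode first."""
    return headers.get("Sec-Fetch-Mode") == "navigate"


# The three hits logged at 08:32, in order.
hits = [
    {"Sec-Fetch-Site": "cross-site", "Sec-Fetch-Mode": "no-cors", "Sec-Fetch-Dest": "empty"},
    {"Sec-Fetch-User": "?1", "Sec-Fetch-Site": "none",
     "Sec-Fetch-Mode": "navigate", "Sec-Fetch-Dest": "document"},
    {"Sec-Fetch-Site": "cross-site", "Sec-Fetch-Mode": "no-cors", "Sec-Fetch-Dest": "empty"},
]

eligible = [could_carry_sec_fetch_user(h) for h in hits]
print(eligible)  # [False, True, False]
```

This does not explain the earlier cross-site navigate hits that lack the header; it only accounts for the no-cors ones.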

lucy24

7:13 pm on Apr 27, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



only sent for requests initiated by user activation, and its value will always be ?1
Does there exist any header in the world that cannot equally well be falsified by a robot, whether malign or just stupid?

dstiles

8:32 am on Apr 28, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I got fed up with SEs asking for well-known and ads files yesterday and added to all the robots.txt files...
User-agent: *
Disallow: /apple-app-site-association
Disallow: /.well-known/
Disallow: /ads.txt

That seems to have discouraged them... Except for google-proxy which is still asking for /.well-known/traffic-advice. I've semi-blocked proxy but it's awkward because it shares the bot IP /16, which I permit based on UA, but I'm thinking of shoving the proxy part into iptables.

Which has nothing to do with SEC. I will have to look at those in more detail and see if I can discover a killing combination.
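
As a sanity check on the robots.txt lines above, Python's standard `urllib.robotparser` can report which paths they cover. This is only a sketch: "ExampleBot" is a made-up agent name, and it shows what a well-behaved crawler should conclude, not what any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /apple-app-site-association
Disallow: /.well-known/
Disallow: /ads.txt
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """True if the rules above permit this agent to fetch the path."""
    return parser.can_fetch(agent, path)


for path in ("/.well-known/traffic-advice", "/ads.txt", "/index.html"):
    print(path, allowed(path))
```

So /.well-known/traffic-advice is covered by the Disallow, and a requester that keeps asking for it is simply not consulting (or not honouring) robots.txt.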

dstiles

6:33 pm on Apr 28, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, that's pretty well stuffed the notion of allowing users based on the Sec-Fetch-User header. The following came from headless chrome:
Sec-Fetch-Dest:document
Sec-Fetch-User:?1
Sec-Fetch-Mode:navigate
Sec-Fetch-Site:none

An "attack" I had this morning used the UA:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4
with no SEC-anything. I suggest a newish browser that contains no SEC could be rejected?

I've been trying to discover if Apache has anything that could be used for rejection based on SEC. Some of the up-market versions seem to have it but not plain vanilla apache, as far as I can see. So looks like home-grown blockers by people like me with no real knowledge of the finer points. :(
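
The "newish browser with no SEC" idea could be sketched like this in Python. Big caveats: the version thresholds are my assumption of when each browser started sending Sec-Fetch-* and should be verified; as far as I know browsers only send these headers over HTTPS, hence the `is_https` flag; and the function name is invented. Note that the Chrome/5.0 UA above would NOT be flagged by this test, because it claims to be a browser far too old to send the headers anyway.

```python
import re

# ASSUMED first major versions that send Sec-Fetch-* headers; verify before use.
FIRST_VERSION_WITH_SEC_FETCH = {"Chrome": 76, "Firefox": 90}


def claims_modern_browser_without_sec_fetch(user_agent, headers, is_https=True):
    """True when the UA claims a recent browser but no Sec-Fetch-* header arrived."""
    if not is_https:
        return False  # the headers are not expected on plain HTTP
    if any(name.lower().startswith("sec-fetch-") for name in headers):
        return False
    for browser, first_version in FIRST_VERSION_WITH_SEC_FETCH.items():
        match = re.search(browser + r"/(\d+)", user_agent)
        if match and int(match.group(1)) >= first_version:
            return True
    return False
```

Anything that strips headers between the visitor and the server would trip this too, so it is probably better used as one signal among several than as an outright block.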

lucy24

4:24 pm on Apr 29, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I've been trying to discover if Apache has anything that could be used for rejection based on SEC."
mod_rewrite can read any header if you use the syntax
:: shuffling papers ::
%{HTTP:exact-header-name}
and analogously
%{ENV:environmental-variable-name}

You may, of course, decide that this is more trouble than it's worth. But that's how you would look at a package of header fields and/or environmental variables, either to see if they exist at all (set the pattern to . alone) or with some specific value.
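
For a home-grown blocker at the application layer instead of in mod_rewrite, the same two tests (does the header exist at all, does it have a specific value) can be sketched in Python as a WSGI wrapper. Everything here is illustrative: the function names are invented and `should_block` is whatever test the caller supplies. The one standard detail is that WSGI exposes a request header such as Sec-Fetch-Site under the key HTTP_SEC_FETCH_SITE.

```python
def header(environ, name):
    """Look up a request header by its wire name; None if it was not sent."""
    return environ.get("HTTP_" + name.upper().replace("-", "_"))


def block_by_headers(app, should_block):
    """Wrap a WSGI app and answer 403 whenever should_block(environ) is true."""
    def wrapped(environ, start_response):
        if should_block(environ):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapped


def example_rule(environ):
    """Hypothetical rule: one header must exist, another must have a given value."""
    return (header(environ, "Sec-Fetch-User") is not None
            and header(environ, "Sec-Fetch-Mode") == "no-cors")
```

Usage would be along the lines of `application = block_by_headers(application, example_rule)`.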

dstiles

8:48 am on Apr 30, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I can block/enable anything reasonable but I was hoping apache would have something that parsed the headers and came up with something useful like "run away from this evil bot". As it stands it seems difficult to determine what is good and what is not. Cross-site - bad; but genuine visitors' hits are sometimes cross-site so how does that work? Someone must know what/why and can explain it in English rather than Technical, with examples.

lucy24

3:29 pm on Apr 30, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Have you studied mod_security? It only runs in config, so I don't know much about its inner workings--which is just as well, since there are literally entire books [feistyduck.com] covering the details.

But, yeah, it's one thing to learn How To Block. It's another thing--a constantly changing thing, at that--to know What To Block. Especially when any header can be faked to make your nasty scheming robot look like an honest law-abiding human. I imagine there are entire sectors of black-hat forums dedicated to nothing else.

dstiles

9:24 am on May 1, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I used mod_security at one time but found it too much to cope with.

I'll continue studying the SEC headers and see what I can devise.

jay5r

3:19 pm on May 2, 2022 (gmt 0)

10+ Year Member Top Contributors Of The Month



Why was Sec-Fetch-* developed? What problem was it trying to solve?

To me it feels like an anti-hotlinking tool, not a tool to block bad bots. Many anti-hotlinking methods sorta failed after the introduction of the referrer meta tag which could completely block the referrer data for sites that hotlink images. This restores that capacity (at least to the imperfect level it was at before the meta referrer tag). As an anti-hotlinking tool it's "good enough" - you don't need perfection for that - every little bit helps bring down server load and bandwidth charges.
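
The hotlinking case would look something like this as a Python sketch (function name invented; headers assumed to be available as a dict). My understanding is that the headers were designed so a server can tell which site, if any, triggered a request and refuse unwanted cross-site ones, and hotlinked images fit that pattern neatly.

```python
def looks_like_hotlinked_image(headers):
    """True when the browser reports an image fetch made on behalf of another site."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return (lowered.get("sec-fetch-dest") == "image"
            and lowered.get("sec-fetch-site") == "cross-site")


print(looks_like_hotlinked_image(
    {"Sec-Fetch-Site": "cross-site", "Sec-Fetch-Mode": "no-cors", "Sec-Fetch-Dest": "image"}
))
```

Requests with no Sec-Fetch headers at all pass through untouched, which fits the "good enough, not perfect" standard for an anti-hotlinking check.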

dstiles

8:33 am on May 3, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's a view I hadn't thought of, jay5r. It's worth looking into. Thanks.

lucy24

3:08 pm on May 3, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't understand how this applies to page requests.