Using Sec- to block scrapers

Sec-Fetch and Sec-Ch-Ua

         

dstiles

3:32 pm on Nov 17, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



After three major (to me) scrapes using current Chrome user-agents, I'm looking at blocking on the Sec- type headers, which these scrapes lacked. But which ones? Not all are present on every hit (e.g. Sec-Ch-Ua), and some appear to be ambiguous.

Typical hits with Sec- headers:
Sec-Fetch-Dest:document
Sec-Fetch-User:?1
Sec-Fetch-Mode:navigate
Sec-Fetch-Site:same-origin
Sec-Ch-Ua-Platform:"Android"
Sec-Ch-Ua-Mobile:?1
Sec-Ch-Ua:"Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"

Sec-Fetch-Dest:document
Sec-Fetch-User:?1
Sec-Fetch-Mode:navigate
Sec-Fetch-Site:cross-site
Sec-Ch-Ua-Platform:"macOS"
Sec-Ch-Ua-Mobile:?0
Sec-Ch-Ua:"Chromium";v="118", "Google Chrome";v="118", "Not=A?Brand";v="99"

Sec-Fetch-Mode:navigate
Sec-Fetch-Dest:document
Sec-Fetch-Site:cross-site

Sec-Ch-Ua-Platform:"macOS"
Sec-Gpc:1
Sec-Ch-Ua-Mobile:?0
Sec-Ch-Ua:".Not/A)Brand";v="99", "Google Chrome";v="114", "Chromium";v="114"

Sec-Fetch-User:?1
Sec-Fetch-Site:none
Sec-Fetch-Mode:navigate
Sec-Fetch-Dest:document

and so on.

I am looking at [developer.mozilla.org...] to work out options.

I think I'll begin by blocking all non-bots that do not have any Sec- headers (probably by checking for Sec-Fetch-User:?1), followed by a modern UA that does NOT have a Sec-Ch-Ua.

Probably also block Sec-Fetch-Site:cross-site (allowing for local policy that permits certain non-site content).

Does anyone have comments / experience on any of this?
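
A minimal sketch of that opening rule set, written in Python rather than Apache config so the logic can be tested on its own. The function name `first_pass_verdict`, the `is_known_bot` flag and the `modern_chrome_major` cut-off of 100 are assumptions for illustration only, not taken from any live config.

```python
import re


def first_pass_verdict(headers, is_known_bot=False, modern_chrome_major=100):
    """Return 'allow' or a short reason string for blocking.

    headers: dict of request header names to values (any letter case).
    modern_chrome_major: hypothetical cut-off for what counts as a "modern" Chrome UA.
    """
    h = {k.lower(): v for k, v in headers.items()}
    if is_known_bot:
        return "allow"  # recognised bots are assumed to be handled by other rules

    # Rule 1: a non-bot request that carries no Sec- headers at all.
    if not any(name.startswith("sec-") for name in h):
        return "no-sec-headers"

    # Rule 2: a modern Chrome UA that does not send Sec-Ch-Ua.
    m = re.search(r"Chrome/(\d+)", h.get("user-agent", ""))
    if m and int(m.group(1)) >= modern_chrome_major and "sec-ch-ua" not in h:
        return "no-sec-ch-ua"

    return "allow"
```

Whether rule 2 is safe depends on the exceptions that turn up in real logs, so the cut-off is best treated as a tunable value.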

SumGuy

3:22 am on Nov 18, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



Umm - maybe something useful here?

[webmasterworld.com...]

dstiles

9:39 am on Nov 18, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes... I forgot I'd written that - excuse: anno domini. :)

Browsers have come on a bit since then and all (?) the top ones are now Sec- compatible. I'll revisit the earlier thread, anyway. Anyone else?

dstiles

4:24 pm on Nov 18, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have begun blocking Sec-Fetch-User:?1 and Sec-Fetch-Site:cross-site. Still (possibly) a few tech points to iron out.

Problem: Apple's claims for Safari's security and privacy are true for neither. All other current browsers (going back a reasonable time) supply the metadata headers but Safari does not. Up to iPhone OS 16.5 it sends no metadata headers at all and I've had to allow an exception for them. OS 16.6 has three of the headers but neither Sec-Fetch-User nor the Sec-Ch-Ua set.

lucy24

4:31 pm on Nov 18, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> I have begun blocking Sec-Fetch-User:?1 and Sec-Fetch-Site:cross-site
Blocking or whitelisting? I'm confused.

dstiles

11:50 am on Nov 19, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry. Blocking if there is no Sec-Fetch-User and the browser is not Safari; also blocking cross-site.

A further problem I'm programming around is that quite a few browsers with Safari in the UA also have Chrome, which is what the browser actually seems to be according to the Sec- client hints (Sec-Ch-Ua:"Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"), and I've added Firefox for luck.

Still refining the multi-browser bit.

dstiles

3:38 pm on Nov 19, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There seems to be a flaw in Sec-Fetch-Site:cross-site, possibly in the browser implementation?

I have set up to block on Sec-Fetch-Site:cross-site, as I said. I became suspicious when site links originating from G got blocked as cross-site. Technically, of course, they are, but in this circumstance no sensible browsing can occur.

Using Firefox I loaded one of my sites directly - ok. The site has a link in the menu to a sister site. I clicked the link and was refused access until I removed the cross-site test.

The Sec parameters were:
Sec-Gpc:1
Sec-Fetch-User:?1
Sec-Fetch-Site:cross-site
Sec-Fetch-Mode:navigate
Sec-Fetch-Dest:document
(note the extra (experimental) Sec-Gpc header: this is the Global Privacy Control signal, presumably sent here by an advert blocker)

My Apache coding so far is:
# "BOT" is a placeholder for lots of tests for bots
<If " ! (BOT)">
# check for metadata (Sec- etc); the navigate test is needed to permit non-page items (pics etc)
<If "%{HTTP:Sec-Fetch-Mode} =~ m#navigate#">
SetEnvIf ^Sec-Fetch-User$ ^$ metau=no-user
</If>
# remove the env var if the UA is Safari only (SetEnv cannot unset a variable; SetEnvIf with !var can)
<If "(%{HTTP_USER_AGENT} =~ m#Safari#) && ! ((%{HTTP_USER_AGENT} =~ m#Chrome#) || (%{HTTP_USER_AGENT} =~ m#Firefox#))">
SetEnvIf Request_URI . !metau
</If>
# currently disabled as per above text
# SetEnvIf ^Sec-Fetch-Site$ ^cross-site$ metac=cross-site
</If>

I accept I could improve the browser/no-user test

My contention is: if the request originates from a web site link (including SEs) there should be another flag to say so. Yes? Otherwise the cross-site test becomes useless. Or am I missing something?

lucy24

7:34 pm on Nov 19, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The catch with cross-site is that it depends on filetype requested. For images, it could be a hotlink, and then you have to poke holes for legitimate search engines (if and only if your images are indexed, of course). For pages, it means they've been linked from somewhere else, and generally you do want those.

For human browsers it's trivial to send a fake referer, or none at all. So far I don't think it's easy to modify the Sec-Fetch headers.

I took a quick look at logged headers. Some robots send a fake referer (mostly either http://www.example.com when site is correctly https://example.com, or google when requesting nonexistent files) with no Sec-Fetch header. But most of those would be blocked/404'd anyway.
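
To make the filetype point concrete, here is a rough Python sketch that sorts a cross-site request into "possible hotlink" versus "inbound link". The function name and the returned labels are invented for illustration; the header values are the ones quoted in this thread.

```python
def cross_site_kind(headers):
    """Classify a request by what a cross-site Sec-Fetch-Site most likely means.

    headers: dict of request header names to values (any letter case).
    """
    h = {k.lower(): v for k, v in headers.items()}
    if h.get("sec-fetch-site") != "cross-site":
        return "not-cross-site"

    dest = h.get("sec-fetch-dest", "")
    if dest == "image":
        # An image pulled into someone else's page: could be a hotlink,
        # or a legitimate image search engine that needs an exception.
        return "possible-hotlink"
    if dest == "document" and h.get("sec-fetch-mode") == "navigate":
        # A page reached from a link elsewhere: generally wanted traffic.
        return "inbound-link"
    return "other-cross-site"
```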

dstiles

2:28 pm on Nov 20, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> depends on filetype requested

This seems to be covered by testing...
%{HTTP:Sec-Fetch-Mode} =~ m#navigate#
You can probably then neglect document type unless for other situations. I have not looked into image hotlinks.

> trivial to send a fake referer

It's already occurred to me that curl and wget (for example) will soon be updated for this, and no doubt evil bots as well. At the moment, though, all bots seem not to send Sec- headers at all.

But the problem is still with cross-site, which is pretty much useless as far as I can tell. To make it effective as it stands would require a list of referers (google, bing, my other site, my bookmarks etc) in order to test against permitted cross-sites. That assumes the browser sends the referer in the first place, which is no longer certain. Either this has not been thought through or it's not working as planned - or I still have missed something. :(

And, of course, there is still the problem of non-compliant Safari, which is used on most mobiles and Macs, though some are actually Chrome with Safari added to the UA - probably for Apple compliance.

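I have modified the no-user case above to...
<If "((%{HTTP_USER_AGENT} =~ m#Chrome#) || (%{HTTP_USER_AGENT} =~ m#Firefox#)) && (%{HTTP:Sec-Fetch-Mode} =~ m#navigate#) && (%{HTTP:Sec-Fetch-User} =~ m#^$#)">
# flag a Chrome/Firefox navigation with no Sec-Fetch-User; SetEnvIf is used because SetEnv runs too late for access control
SetEnvIf Request_URI . meta=no-user
</If>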

On reflection I can combine chrome and firefox in the same regex.

lucy24

5:04 pm on Nov 20, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> some are actually chrome with Safari added to the UA
I think all webkit-based browsers include “Safari” in the UA string. Opera tops this by having both “Safari” and “Chrome”--in addition to, not instead of, “OPR”. So you get a sequence like

BrowserMatch Safari safari
BrowserMatch Chrome !safari chrome
BrowserMatch OPR !chrome opera
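
The same ordering idea, sketched in Python so it can be tried against logged user-agents. Each later test overrides the earlier one, which is what the `!safari` and `!chrome` removals achieve in the BrowserMatch lines. The function name is made up for this example, and the matching is case-sensitive, as plain BrowserMatch is.

```python
def browser_family(user_agent):
    """Return 'safari', 'chrome', 'opera' or None, most specific token last."""
    family = None
    if "Safari" in user_agent:
        family = "safari"
    if "Chrome" in user_agent:
        family = "chrome"   # replaces 'safari', like "!safari chrome"
    if "OPR" in user_agent:
        family = "opera"    # replaces 'chrome', like "!chrome opera"
    return family
```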

dstiles

2:56 pm on Nov 21, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not so here.
Typical safari-only iphone:
Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1

Safari-only Mac:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15

Chrome/Safari Android (this had fullish Sec- but no Sec-Fetch-User):
User-Agent:Mozilla/5.0 (Linux; Android 9; KFTRWI) AppleWebKit/537.36 (KHTML, like Gecko) Silk/118.1.77 like Chrome/118.0.5993.111 Safari/537.36

This is unusual - Chrome OS (Sec-Ch-Ua-Platform:"Chrome OS") (no Sec-Fetch-User):
Mozilla/5.0 (X11; CrOS x86_64 14541.0.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36

Windows chrome/safari (no Sec-Fetch-User):
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36

My linux Chromium:
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
(which is quoting an obsolete Safari version (as do most of the chrome ones.))

Firefox is, of course, not applewebkit:
Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0

blend27

12:44 pm on Nov 23, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We don't do mod_security and host our sites almost exclusively on latest IIS/ColdFusion Servers, but anyway..

So, I have spoken on this forum about these SEC-FETCH- headers before....

The information about these headers is available here(not my writing but I learned tons from the article): [web.dev...] .

SEC-FETCH- Browser compatibility here: [developer.mozilla.org...] (choose one from the left menu)

@dstiles

Do you have a site with a decent amount of traffic on which you can test your logic?

IF Site Attracts multiverse of browsers, real and not.....

Log all headers including:

Request to a page
Request to almost empty CSS File (this could use a rewrite to capture headers and May give you an indication that this user requests CSS files)
Request to almost empty JS file (this could use a rewrite to capture headers and MAY give you a Browser resolution if you code for it)
Request to almost empty JSON file (this could use a rewrite to capture headers, JS Works! see whats in the headers, make it a POST request triggered by mouse-move event over any link)
Request to a Small Image File, pixel size (this could use a rewrite to capture headers, no one wants you to see your pixel thingy, but if requeeesteeed,....)

Send proper Response Headers and content for above.

Now split that into A/B test.

A test - will contain collection of all headers passed by Browsers that officially support SEC-Fetch according to a second link I provided above, look at a browser number .
B test - will contain collection of all headers passed by Browsers that officially DO NOT support SEC-Fetch according to a second link I provided above.

Then Split between Mobile/Desktop/Browser

For example:

Desktop Browsers:
Chrome 75 and below will go in B Test set(no sec-fetch headers should be present, and if they ARE it is a bot)
Chrome 76 and above will go in A test set(proper sec-fetch headers should be present, and if they ARE NOT its a bot)

Firefox 89 and below will go in B Test set(same as above)
Firefox 90 and above will go in A test set(same as above)

Edge 78 and below will go in B Test set(same as above)
Edge 79 and above will go in A test set(same as above)

IE 11 - not sure if there is much traffic for it, but hey, learning is learning
IE 10 and below, in my book there is a blank page for it for a reason, so is on our sites, it is the end of 2023

Opera.....

Safari is a big one, but none the less same as ABOVE

etc.. etc..

Same for Mobile/Tablet.... based on compatibility at developer.mozilla.org website...

Now when logging headers, write them into separate files based on the following (do a db-thingy if wanted, I do):

DATE
Browser Type(Mobile/tablet/phone)
Old/New based on Browser Version
Actual headers Passed(if you do a db-thingy, one to many or many-to-many setup will do), I personally write a file in FileSystem as JSon, then parse it into DB.

This will give you a BASE understanding of what is what.
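
As a rough illustration of the record described above (date, device type, old/new bucket, raw headers), here is a small Python sketch that serialises one request as a JSON line. The field names and the function name are hypothetical; only the idea of writing JSON first and parsing it into a database later comes from the description.

```python
import json
from datetime import datetime, timezone


def header_log_record(headers, device_type, supports_fetch_metadata, when=None):
    """Build one JSON log line for a request.

    device_type: e.g. 'desktop', 'tablet' or 'phone' (caller decides).
    supports_fetch_metadata: True puts the request in test set A, False in B.
    when: timestamp of the request; defaults to the current UTC time.
    """
    when = when or datetime.now(timezone.utc)
    record = {
        "date": when.isoformat(),
        "device": device_type,
        "bucket": "A" if supports_fetch_metadata else "B",
        "headers": dict(headers),
    }
    return json.dumps(record, sort_keys=True)
```

One line per request keeps the files easy to append to and easy to load into a one-to-many table afterwards.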

NEXT:

Benefiting/Learning from actual scrapers....

Set up a subdomain on your site. Include a HIDDEN "SOMEWHAT" LINK to that on your main domain. Include one page with content related to your website. If your site is about Christmas, the subdomain could be one page about reindeer names and the wood used to build sleighs.

Point of the subdomain: a scraper comes along to the main site, picks up the link... and there are your regular scraper headers, logged if they follow the link...

NEXT:

Hosting IP ranges... Lots of scrapes come from those. Same setup for content served (the 4 files mentioned above) as on the main site, but a friendly one-pager, just to log headers (just SEC-FETCH- if one desires).
--------------------------------------------------
Mix and Match.

Bonus:

Perfect scenario would be to log headers on a corporate Wi-Fi network, no scrapers there, right? Or a free Wi-Fi access point (that does not really work) advertising its SSID as Free WiFi with a Welcome page (to log headers, nothing else ;) )

-----------------------------------------------------------------------------------
The whole point of this is to learn and protect, but AI is coming... :)

dstiles

11:05 am on Nov 25, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> information about these headers is available here

Yes, I've seen that. Not a lot of use in general terms because Sec-Fetch-Site=cross-site is not practical, as noted previously. It's only usable if a list of valid sources can be cross-referenced. I'm looking at a single instance where it may be useful: if cross-site is registered on a hit to a webform page then that is suspicious and could mean a form-spammer. Or simply someone has posted a link on social media, which I've seen happen, so it's necessary to redirect to a "who are you / what are you doing" page. :(

> SEC-FETCH- Browser compatibility

Ok for chrome or firefox but absolutely useless for safari (unless it's chrome in disguise) so I check for Sec-Fetch-User on chrome and firefox but not on Safari unless in disguise. A small number of Safari provide Sec-Fetch-Site, Sec-Fetch-Mode and Sec-Fetch-Dest but not User, which is the only term I've so far managed to implement. Those three terms may be useful in determining valid requests for resources but not as a general blocking mechanism. In general I block browsers older than a specific version anyway.

My sites log everything in the way of headers and ENV values FOR PAGES (not resources) and I block on bad combinations of parameters. I have very few instances of JS across several sites and no JSON. Usually any request for such ends in a block. I check for posts to non-form pages and reject accordingly.

After a couple of days blocking on Sec-Fetch-User I find a small number of hits blocked on that parameter alone; not sure, so far, if it's been worth the coding effort.

blend27

10:41 pm on Nov 28, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@dstiles

The starting point should be that IF a version of browser listed here [developer.mozilla.org...] is not providing SEC-FETCH- headers it is a BOT.

Your first 2 UA's you provided CAN NOT HAVE those headers by design, if they do it is a BOT.

And the other way around, it is a BOT.
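
Spelled out as a Python sketch, the two-way rule looks like this. `browser_supports` is assumed to come from a separate lookup of the UA's claimed version against the compatibility table; the function name and the returned strings are invented for illustration.

```python
def two_way_verdict(browser_supports, headers):
    """Compare what the claimed browser version should send with what arrived.

    browser_supports: True if the claimed version is documented as sending
                      Sec-Fetch-* headers, False if it cannot send them.
    headers: dict of request header names to values (any letter case).
    """
    has_sec_fetch = any(name.lower().startswith("sec-fetch-") for name in headers)
    if browser_supports and not has_sec_fetch:
        return "bot: headers missing"
    if not browser_supports and has_sec_fetch:
        return "bot: headers unexpected"
    return "consistent"
```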

I will post some regex we use to flag those a bit later, but as it stands at this point, SEC-FETCH is a sign from (you guessed it).

blend27

10:56 pm on Nov 28, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Proper iPhones and iPads only provide 3-4 SEC-FETCH headers max as far as I know..

iPhone OS 16.6 (16.2 was the starting point)

Document -
"headers": {
"user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Mobile/15E148 Safari/604.1",
"referer": "https://www.google.com/",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "cross-site",
"Sec-Fetch-Dest": "document"
}
Image (now, that is the image pulled in by the page request before) -
"headers": {
"user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Mobile/15E148 Safari/604.1",
"referer": "https://www.example.com/",
"Sec-Fetch-Mode": "no-cors",
"Sec-Fetch-Site": "same-origin",
"Sec-Fetch-Dest": "image"
}


No SEC - User Header, at all.

Proper iPhones and iPads seem to be on the low side in providing SEC- headers when compared to Chrome, latest Firefox, etc.

blend27

11:29 pm on Nov 28, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Safari on iPhone/iPad is not what you need to look for; it is Version/16.2 and above Mobile (phone) versus 16.1 or less. Then check the SEC- headers: if the version is less than 16.2, no Sec- headers should be there; above 16.1, they should.

3-4 SEC- headers max is your safe bet. Mind-boggling...

Of course, a Cathedral inspection of IP ranges and such must precede the song sung!

dstiles

2:27 pm on Nov 29, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, but I have User pretty well sorted for Chrome, Firefox and their derivatives.

I note, by the way, that while Safari 604 SOMETIMES gives 3 Sec headers, 605 gives none (so far); both are for iPhone AppleWebKit/605.1.

blend27

5:27 pm on Nov 29, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



re: 605 gives none....

Both are AppleWebKit/605.1.15, and both are Safari/604.1 - thousands of hits for both....

This one should have them

Mozilla/5.0 (iPhone; CPU iPhone OS 16_7_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Mobile/15E148 Safari/604.1

This one should NOT

Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1

Again Version/16.2 and Greater will have SEC-FETCH.

I think Safari/605 is for Macs only for now (might be wrong though)
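
A small Python sketch of that version test, keyed on the `Version/` token rather than the `Safari/` token. The 16.2 threshold is the observation reported in this thread, kept as a parameter because published compatibility tables may list a different starting version. The function name is hypothetical.

```python
import re

_VERSION_RE = re.compile(r"Version/(\d+)(?:\.(\d+))?")


def ios_safari_should_send_sec_fetch(user_agent, minimum=(16, 2)):
    """Return True/False for iPhone/iPad Safari, or None if the UA is not one.

    minimum: (major, minor) from which Sec-Fetch-* headers are expected;
             (16, 2) is the value observed in this thread, not an official figure.
    """
    if "iPhone" not in user_agent and "iPad" not in user_agent:
        return None
    m = _VERSION_RE.search(user_agent)
    if not m:
        return None  # e.g. Firefox for iOS, which carries FxiOS/ instead of Version/
    version = (int(m.group(1)), int(m.group(2) or 0))
    return version >= minimum
```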

dstiles

2:01 pm on Nov 30, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> both are Safari/604.1

I was referring to ...
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Safari/605.1.15
... Safari/605.1 which does not have Sec headers.

I am trialling a modification to combat an apparent error in Android chrome:
Sec-Fetch-Dest:document
Sec-Fetch-Mode:navigate
Sec-Fetch-Site:cross-site
Sec-Ch-Ua-Platform:"Android"
Sec-Ch-Ua-Mobile:?1
Sec-Ch-Ua:"Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"

This does not have Sec-Fetch-User but does have Sec-Ch-Ua-Mobile:?1 - ie it is a mobile. It appears this may be a valid condition, although incorrect (or I have mis-read the documentation). I have modified the test to ...
<if " ( (%{HTTP_USER_AGENT} =~ m#Chrome|Firefox# ) && ( %{HTTP:Sec-Fetch-Mode} =~ m#^$|navigate# ) ) && ! ( %{HTTP_USER_AGENT} =~ m#${GoodBotSet}#) ">
SetEnvIf ^Sec-Fetch-User$ ^$ meta=no-user
SetEnvIf ^Sec-Ch-Ua-Mobile$ ^\?1$ !meta
</if>
... where ${GoodBotSet} is a defined variable containing a list of real bots.
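
For anyone who wants to try the same decision outside Apache, here is a Python sketch of what the test above is meant to do. The `good_bot_pattern` default is only a stand-in for `${GoodBotSet}`, and the function name is made up; neither comes from the actual config.

```python
import re


def no_user_flag(headers, good_bot_pattern=r"Googlebot|bingbot"):
    """Return True when the request should be flagged as meta=no-user.

    headers: dict of request header names to values (any letter case).
    good_bot_pattern: hypothetical regex standing in for ${GoodBotSet}.
    """
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "")
    mode = h.get("sec-fetch-mode", "")

    if not re.search(r"Chrome|Firefox", ua):
        return False                      # only Chrome/Firefox families are tested
    if mode and "navigate" not in mode:
        return False                      # sub-resources (images etc) are skipped
    if re.search(good_bot_pattern, ua):
        return False                      # real bots are exempt

    flagged = h.get("sec-fetch-user", "") == ""
    if h.get("sec-ch-ua-mobile") == "?1":
        flagged = False                   # mobile Chrome may omit Sec-Fetch-User
    return flagged
```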

blend27

2:09 pm on Dec 2, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



re: m#Chrome|Firefox

...and then there is Firefox for iOS...

Firefox on iPhone
Mozilla/5.0 (iPhone; CPU iPhone OS 14_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) FxiOS/120.0 Mobile/15E148 Safari/605.1.15

Firefox on iPad
Mozilla/5.0 (iPad; CPU OS 14_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) FxiOS/120.0 Mobile/15E148 Safari/605.1.15

Firefox on iPod
Mozilla/5.0 (iPod touch; CPU iPhone OS 14_1 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) FxiOS/120.0 Mobile/15E148 Safari/605.1.15

dstiles

10:03 am on Dec 3, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Interesting. I haven't seen Fx so far. Thanks.

Although, without that check it would get accepted and I doubt there are that many iPhone fakers... Are there?

dstiles

9:59 am on Dec 8, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just found another use for Sec.

I try to block headless browsers. After noticing one which got through by claiming to be not headless I grep'd the logs and discovered loads of them. The UA says chrome but given away by
Sec-Ch-Ua:"HeadlessChrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"

They all seem to have been blocked for other reasons so far...
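
A quick Python sketch of that check: pull the brand names out of a Sec-Ch-Ua value and look for a headless one. The helper names are invented, and the regex only handles the simple `"brand";v="version"` form seen in the header dumps in this thread, not every syntax the header format might allow.

```python
import re


def client_hint_brands(sec_ch_ua):
    """Return the list of brand names found in a Sec-Ch-Ua header value."""
    return re.findall(r'"([^"]*)";\s*v="[^"]*"', sec_ch_ua)


def is_headless(sec_ch_ua):
    """True if any advertised brand contains 'headless' (case-insensitive)."""
    return any("headless" in brand.lower() for brand in client_hint_brands(sec_ch_ua))
```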

blend27

9:43 pm on Dec 10, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...They...

-- v="24"

...that one stands 24 hour surveillance. she said it not me...

v="119" is most likely One plus One plus Nine, it is LIKE (11) thing all over again... , but who knows, maybe a Sunday night thingy. before Sunday...

, naaaah, just kidding ;)

DS Tiles, As it goes from a while back, Holmes would think anything HEADLESS should not be on a Horse..

dstiles

9:45 am on Jan 15, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




System: The following message was spliced on to this thread from: https://www.webmasterworld.com/search_engine_spiders/5100316.htm [webmasterworld.com] by engine - 10:00 am on Jan 15, 2024 (utc 0)


(unable to add to original posting)
Just discovered one more thing about Sec (https://www.webmasterworld.com/search_engine_spiders/5097442.htm)

The firefox family issues (apparently) correct Sec unless one presses Ctrl-R or Ctrl-Shift-R (forced reload). There is then no Sec-Fetch-User. So, this is useless for firefox and its derivatives.

blend27

5:48 pm on Jan 15, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



-forced reload-

Very Nice catch! Same goes if one just hits F5 on windows machine, no Sec-Fetch-User header. I would NOT label it as useless at all. Mobile users have a setting on Android Phones, I believe it is worded "Pull Down to Refresh"?

In my case if the user hits "refresh", user would already have(must have) session pair of cookie set by our website, so not a big deal.

We use combination/pairs of sec-fetch-dest & sec-fetch-mode on top of that.
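
A sketch of what a dest/mode pair check could look like in Python. The actual pairs used on the site are not listed here, so the allow-list below contains only the two combinations that appear in the header dumps earlier in this thread; treat it as a placeholder to be extended from real logs. All names are hypothetical.

```python
# Pairs of (Sec-Fetch-Dest, Sec-Fetch-Mode) seen in this thread's header dumps.
# Assumption: a real allow-list would be longer and built from logged traffic.
SEEN_PAIRS = {
    ("document", "navigate"),
    ("image", "no-cors"),
}


def pair_is_expected(headers, allowed=SEEN_PAIRS):
    """Return True if the request's dest/mode pair is on the allow-list."""
    h = {k.lower(): v for k, v in headers.items()}
    pair = (h.get("sec-fetch-dest"), h.get("sec-fetch-mode"))
    return pair in allowed
```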

---------------------------------------------------------------------------------------------
BTW, we finished our A/B traffic tests with older browsers on one of our client sites. Took about 3 months to do (test test test), did it over the hot holiday season too.

First: all older browsers (no Sec-Fetch) were presented with "Please upgrade your browser". "People" either switched to a more modern browser of a different flavor (IP, then filled out CAPTCHA) or left and later came back to visit the website with an upgraded modern one (IP + server-side cookie).

Human traffic dipped 0.012% - a good sign for this site. We later compared requests of those that did NOT come back - almost all were either proxies/Tor or unknown hosting ranges (recorded).

Recently pulled the plug at server level (URL Rewrite Module for IIS) to drop all requests from browsers that do not provide Fetch metadata request headers (https://developer.mozilla.org/en-US/docs/Glossary/Fetch_metadata_request_header). Older browsers no more - for this particular site the matter was discussed in detail and the client agreed to No More IE or other Old Junk, so not afraid....

Dropped All requests - based on Browser compatibility (version) , take [developer.mozilla.org...] for example. If the version of one's browser can not provide those headers by design, no Soup for U & no more garbage logs for us to see(!) . Dataset of Browser version usage was based on 1 year of IIS Log files where UAs have been extracted from.

If the "user" is accessing the site via Modern UA - several rules were implemented, UA Sec-Fetch fingerprinting methods based on Browser Make.

So that basic table is:

Desktop Versions:

Chrome: 76
Edge: 79
Firefox: 90
Opera: 63
Safari: 16.4

Mobile Versions:

Chrome Android: 76
Firefox Android: 90
Opera Android: 54
Safari on iOS: 16.4
Samsung Internet: 12
WebView Android: 76

....anything below that gets a 0-byte response.
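
The table above as a lookup, sketched in Python. The dictionary keys and the function name are made up for this example; the version numbers are copied from the table. Parsing the family and version out of the UA string is left to the caller.

```python
# Minimum versions from the table above, as tuples so 16.4 compares correctly.
MIN_VERSION = {
    "chrome": (76,),
    "edge": (79,),
    "firefox": (90,),
    "opera": (63,),
    "safari": (16, 4),
    "chrome_android": (76,),
    "firefox_android": (90,),
    "opera_android": (54,),
    "safari_ios": (16, 4),
    "samsung_internet": (12,),
    "webview_android": (76,),
}


def below_cutoff(family, version):
    """Return True if the version is below the table's minimum for that family.

    family: one of the MIN_VERSION keys (hypothetical naming).
    version: tuple of ints, e.g. (75,) or (16, 3).
    Returns None for a family that is not in the table.
    """
    minimum = MIN_VERSION.get(family)
    if minimum is None:
        return None
    return tuple(version) < minimum
```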

dstiles

3:52 pm on Jan 16, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



blend27, a good test! Must have taken ages. Far more in-depth than I even have time for. :(

> In my case if the user hits "refresh", user would already have(must have) session pair of cookie set by our website, so not a big deal.

I can't guarantee that. In my own case I sometimes come back to a browser tab several days after original use and hit Ctrl-Shift-R; cookies may (and usually have) expired by then.

> We use combination/pairs of sec-fetch-dest & sec-fetch-mode on top of that.

Is that always guaranteed to be a browser? (Apart from deliberate application to a mal bot.)

blend27

8:06 pm on Jan 16, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



-- I sometimes come back to a browser tab several days after original use --

Your headers "interrogation routine" should start from Step 1 dis-entanglements, just like with the Bank App. WebmasterWorld uses a different routine, at least for previously logged-in members from the same IP.

Cookies do expire, go bad. It would be a hassle at first. Use Local Storage Cookie maybe?

blend27

3:53 pm on Apr 21, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Something interesting to add here...

Was trying to verify Google Business Profile the other day for one of the clients...

Exact UA coming from Google:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; Mozilla/5.0, Google-AdWords-Express) Chrome/122.0.6261.94 Safari/537.36


The request was made by a somewhat modern Chrome UA but did not contain any SEC-Fetch headers whatsoever, which threw a monkey wrench into the verification process on my end, because all requests were being blocked by code at first.

There were also requests from this exact UA
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P; Google-AdWords-Express) AppleWebKit/[WEBKIT_VERSION] (KHTML, like Gecko) Chrome/[CHROME_VERSION] Mobile Safari/[WEBKIT_VERSION]
..With no SEC-Fetch headers coming from Goog 66.249.88/24 range.

lucy24

5:17 pm on Apr 21, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> [WEBKIT_VERSION]
Anything in this form--meaning that the botrunner forgot to fill in the blanks in their script--can safely be blocked :) Mine currently says
BrowserMatch _VERSION bad_agent=noversion
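
The equivalent substring test in Python, for checking logged user-agents offline. The function name is invented; the match is the same case-sensitive `_VERSION` substring as the BrowserMatch line.

```python
def has_unfilled_version_placeholder(user_agent):
    """True if the UA still contains a template token such as [CHROME_VERSION]."""
    return "_VERSION" in user_agent
```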

> Exact UA coming from Google:
Somewhere G### has a list of their user-agents, and the IPs they come from, so poking holes should not be difficult. 66.249.80.0/20 annoys me mightily, because it's such a mix of legitimate and illegitimate (“what do those b######s want now?”) requests. I currently have 66.249.84 blocked unconditionally, though I can’t remember exactly what they did to cause offense. As for 66.249.88, I find a lot of this:
66.249.88.77 - - [16/Apr/2024:08:27:27 -0700] "GET /.well-known/traffic-advice HTTP/1.1" 403 7473 "-" "Chrome Privacy Preserving Prefetch Proxy"
whatever the ### that is. Further poking-around tells me they’re blocked for not sending the Language header, not that it matters since they are requesting a non-existent file.