Regarding the request header priority flag
I checked the Priority header last time I ran logs, and there turned out to be nothing diagnostic. Darn.
Is there an X-forwarded-for header that's in use?
It's pretty rare. I’ve got a rule that begins
SetEnvIf X-Forwarded-For ^\D
meaning that if the value starts with anything other than a digit, out they go. But Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:147.0) Gecko/20100101 Firefox/147.0 CAN mean
Also look into "sentry-trace" header.
Checking my logged headers, I find
Sentry-Trace: {long string of hexadecimals, sometimes with -0 at the end}
...
botheader: Sentry-Trace
which tells me that at some time in the past I flagged this header as up to no good ... and then promptly forgot I’d done so ;) Neither Mozilla nor any RFC seems to have anything to say--in fact nobody does except the manufacturer--which does seem to suggest it isn't used for any legitimate purpose.
Accept language is "zh,zh;q=0.8" or "zh,zh;q=0.9"
You’re more tolerant than I am; I unconditionally block ^zh unless it specifies zh-(tw|TW). Sorry, China. Mumble mumble signal-to-noise ratio mumble mumble.
That's it. Not that the accept-language includes those strings, but that they ARE those strings.
^zh,zh;q=0\.[89]$
rather than laying out two separate patterns.
If it is requested within the same second as the page itself >>> 99% a Bot.
But if it is requested three seconds later, it may still be a bot, because there’s often a delay before requesting supporting files. Unfortunately this sometimes happens with humans too--especially when, as would not be the case here, the page is significantly longer than the viewport. (I kinda think some browsers, especially mobiles, behave this way by default.)
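Going back to the Accept-Language pattern, the difference between "includes those strings" and "IS those strings" can be sketched in a few lines of Python. This is only an illustration, assuming the check is run against Accept-Language values already pulled from logs; the names `BARE_ZH` and `is_bare_zh` are made up for the example.

```python
import re

# One anchored pattern covers both values. The ^ and $ anchors turn it
# into an exact-match test rather than a "contains" test.
BARE_ZH = re.compile(r"^zh,zh;q=0\.[89]$")


def is_bare_zh(accept_language: str) -> bool:
    """Return True only when the header value IS one of the two strings."""
    return BARE_ZH.match(accept_language) is not None
```

A value such as `zh-TW,zh;q=0.8` or `en-US,zh,zh;q=0.8` does not match, because the anchors require the whole value to be one of the two strings.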
"User-Agent" is never sent by a normal browser after the "Accept-Language" header
Content-Length: 0
Connection: close
Host: www.example.com
Priority: u=0, i
Sec-Fetch-User: ?1
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-Dest: document
Upgrade-Insecure-Requests: 1
Dnt: 1 *
Accept-Encoding: gzip, deflate, br, zstd
Accept-Language: {PII suppressed}
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:148.0) Gecko/20100101 Firefox/148.0
In that order.
They're not scraping by following links within the file.
Unfortunately, that seems to be exactly what the last few days’ robots are doing. Page, few seconds’ delay, then all supporting files including favicon and analytics, followed by analytics file request (which lives on a different site, so a further few seconds’ delay is never diagnostic). These may or may not be the same robots I formerly flagged as botnets, where supporting files are always requested by someone in a Usual Suspects range. Maybe they finally noticed they weren't getting those other files.
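For after-the-fact checking, the gap between a page request and its first supporting file can be pulled out of parsed log entries. A rough Python sketch, assuming each entry has already been reduced to a (timestamp, path) pair for a single visitor and that paths carry no query string; the names `ASSET_SUFFIXES` and `first_asset_gap` are invented for the example. Neither a short gap nor a long one is conclusive on its own.

```python
from datetime import datetime
from typing import Optional

# Hypothetical list of what counts as a supporting file; adjust to taste.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def first_asset_gap(entries: list[tuple[datetime, str]]) -> Optional[float]:
    """Seconds between the first page request and the first supporting
    file requested after it, or None if either one is missing."""
    page_time = None
    for when, path in sorted(entries):
        is_asset = path.lower().endswith(ASSET_SUFFIXES)
        if page_time is None:
            # Still looking for the first non-asset (page) request.
            if not is_asset:
                page_time = when
        elif is_asset:
            return (when - page_time).total_seconds()
    return None
```

The result is just a number to eyeball alongside everything else in the log; it makes no bot-or-human decision by itself.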
my software has no ability to tell me the order of the fields let alone make decisions based on it.
Likewise. That is, I could look at logged headers and maybe get some further information--beyond the presence/absence/content of individual headers, which I do use for access control--but that only helps in after-the-fact identification. I guess it would be theoretically possible to detour every page request via a php-or-equivalent script that analyzes the headers. But holy cow that would be a lot of extra work, not just for me but for the server.
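As an after-the-fact aid, the field-order observation can be checked against logged headers. A minimal Python sketch, assuming the log preserves header names in the order received and that they have been read into a list; `ua_after_accept_language` is a made-up name, and whether a given order really is abnormal for a given browser is something to confirm against your own logs.

```python
def ua_after_accept_language(header_names: list[str]) -> bool:
    """True if User-Agent appears later in the request than Accept-Language.

    Header names are compared case-insensitively. Returns False when
    either header is absent.
    """
    lowered = [name.lower() for name in header_names]
    if "user-agent" not in lowered or "accept-language" not in lowered:
        return False
    return lowered.index("user-agent") > lowered.index("accept-language")
```

This only flags the one ordering described in the thread; it says nothing about the rest of the header sequence.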
Host: www.webmasterworld.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:147.0) Gecko/20100101 Firefox/147.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br, zstd
Referer: https://www.webmasterworld.com/post-v6.cgi
Connection: keep-alive
Cookie: lastvisitinfo=mana-mama-strip-stiff-curly.01; splorks=googlygook
Upgrade-Insecure-Requests: 1
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Priority: u=0, i
Pragma: no-cache
Cache-Control: no-cache
<?php
// Normalize header names to lowercase: getallheaders() keeps whatever
// capitalization the server hands over, which varies by setup and protocol.
$headers = array_change_key_case(getallheaders(), CASE_LOWER);

// Block any request that carries a 'Tth-Endproxy' header
if (isset($headers['tth-endproxy'])) {
    // do your logging, then block the request
    header('HTTP/1.1 403 Forbidden');
    exit;
}

// User-Agent claims Windows but the platform client hint says Linux: block
if (isset($headers['user-agent'], $headers['sec-ch-ua-platform'])
    && stripos($headers['user-agent'], 'Windows NT') !== false
    && stripos($headers['sec-ch-ua-platform'], 'Linux') !== false) {
    // do your logging, then block the request
    header('HTTP/1.1 403 Forbidden');
    exit;
}
?>