How to prevent browsers from preloading Web pages

         

gronetwork

10:39 pm on Dec 18, 2023 (gmt 0)

Top Contributors Of The Month



Hi,

I improved the "core web vitals" (responsiveness, latency, ...) of my website in October, and I now have many more visitors on mobile devices. They are replacing the users who were on desktop devices.

However, this has had some consequences: I was getting thousands of 408 errors in my logs, and connections to the server were slow. Upgrading the HTTP protocol (HTTP/1.1 > HTTP/2) greatly reduced both the number of 408 errors and the slowness. Then I noticed hundreds of TIME_WAIT connections in netstat (8-10 times more than ESTABLISHED).

I have tried several settings in apache2.conf and sysctl.conf (timeout, keepalive, tcp_tw_reuse) to limit the impact of these connections, but without success. In dmesg, I see this error several times per day: "TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies. Check SNMP counters." But I don't have SYN connections, just a lot of TIME_WAIT.

When I check my Apache log files, I can see hundreds of connections (GET) per minute, each from a different IP (not bots, all coming from Ireland, New Zealand, Australia, Canada, UK, and USA). But in Google Analytics, I have only between 120 and 250 visitors in the last 30 minutes.

I suppose that these connections are preloads (prefetches?) made by browsers. In Chrome (mobile version) this is enabled by default: Chrome will preload the page behind every link on a page that the user visits.

How can I prevent browsers from preloading my website's pages? It is currently causing crazy high CPU usage and bandwidth (4-5 times more than usual). Sometimes it stops for several hours or a day and everything goes back to normal. This happens on days when I get fewer visitors on mobile devices. And then it starts again.

lucy24

11:35 pm on Dec 18, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sec-Purpose: prefetch;anonymous-client-ip
Purpose: prefetch
I block requests with “Purpose: prefetch”. (This may be redundant; a quick riffle through logged headers shows no other possible value of the “Purpose” header, so its bare existence should be enough.) I only find it from Chrome, but that does take care of the Androids, which seem to be your main trouble.

Obligatory caution: Blocking a request won't keep it from being made. But a simple 403 is going to be less work for the server, especially if your pages are large or rely on a CMS.
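
For reference, a minimal sketch of that kind of block, assuming Apache 2.4 with mod_setenvif and mod_authz_core. The variable name is_prefetch is made up for illustration, and the pattern simply looks for the word "prefetch" in either of the two headers shown above:

```apache
# Hypothetical variable name: is_prefetch.
# SetEnvIf tests a request header against a regular expression
# and sets the variable when the header value matches.
SetEnvIf Purpose prefetch is_prefetch
SetEnvIf Sec-Purpose prefetch is_prefetch

# Everyone is allowed in except requests carrying the flag,
# which receive a 403.
<RequireAll>
    Require all granted
    Require not env is_prefetch
</RequireAll>
```

In htaccess this only works if AllowOverride permits FileInfo (for SetEnvIf) and AuthConfig (for Require).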

gronetwork

1:15 am on Dec 19, 2023 (gmt 0)

Top Contributors Of The Month



Hi lucy24,

Thanks for your answer! So, I have added \"%{Purpose}i\" to the LogFormat in apache2.conf to check if they were doing prefetch. But unfortunately, I have only 8 "prefetch" among 1770 Web page requests over 4 minutes.
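
The logging change described above might look roughly like this. It is only a sketch, assuming the standard combined format as a base; the nickname combined_purpose and the log file name are hypothetical, and Sec-Purpose is logged alongside Purpose because both headers were mentioned earlier in the thread:

```apache
# Combined log format plus the two prefetch-related request headers.
# %{Name}i logs the named request header, or "-" if it was not sent.
# "combined_purpose" and the file name below are made-up examples.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Purpose}i\" \"%{Sec-Purpose}i\"" combined_purpose
CustomLog "logs/access_purpose.log" combined_purpose
```

LogFormat and CustomLog belong in the server or virtual host configuration; they are not available in htaccess.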

What would you do to find out what these IPs are, knowing that practically all these webpage requests come from a unique IP and that they are not counted in Google Analytics or Google Search Console? They are sucking up the server resources. The referrer only shows "-" in the log.

lucy24

4:51 am on Dec 19, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Are the requests only for pages, not supporting files? If they are getting your pages but not analytics, that does point to robots. If they were referer-spam robots they would do the opposite--get Analytics but don't bother with the site itself--but the missing referer cuts out that possibility.

The form "-" in logs means that the element in question--most often Referer, sometimes User-Agent--was not sent at all. Rarely you'll see "" which means the element was sent, but is empty. (I've been vexed with a number of those in recent weeks. Unfortunately there doesn't seem to be any way to distinguish between an empty referer and a nonexistent referer, except after the fact, in logs.)

practically all these webpage requests come from a unique IP
Do you mean that each one is different? If so, it's probably not worth blocking by IP. But you might look up a few and see if they're coming from servers/colos or from human ISPs.

A useful first step is to log headers for a few days or even weeks, and see if anything leaps out at you. It doesn't really sound like ordinary prefetch activity, though. (And how infuriating, if it were prefetch, because it would imply that only one in a thousand people who see a link to your site actually go there. Even my own ctr isn’t that dire!) Are the user-agents all Chrome?

for what it’s worth: For the last week or so I have been hit with an inordinate number of robots--say, ten times as many as normal--all claiming to be Androids using relatively elderly Chrome (mostly 40s and 50s). I'm hoping that they will soon get bored and go away, as there's only so much I can block without intercepting law-abiding humans at the same time.

ClosedForLunch

11:33 am on Dec 19, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



gronetwork:

When I check my Apache log files, I can see hundreds of connections (GET) per minute, each from a different IP (not bots, all coming from Ireland, New Zealand, Australia, Canada, UK, and USA).


Lucy24:

For the last week or so I have been hit with an inordinate number of robots--say, ten times as many as normal--all claiming to be Androids using relatively elderly Chrome (mostly 40s and 50s).


This seems related to what I have noticed, which I posted about recently :

"Suspicious hits from Macintosh user agents"

These hits continue, predominately from AU, IE and NZ, always with low Chrome version numbers. However, the OS in the UA changes over time... not so much Macintosh now, but Android

Note that the IPs are residential IPs, but the hits are clearly bots... all of which receive the big [F]

gronetwork

12:38 pm on Dec 19, 2023 (gmt 0)

Top Contributors Of The Month



The requests are for pages supporting files. Yes, every web page request is different and every time the IP address is different. They all come from human ISPs.

At first I thought I was under a Slow HTTP (Slowloris) attack. I tried antiloris, but it cannot act in this case (thousands of IPs, each making fewer than 4-5 connections). However, these requests clearly trigger PHP scripts and SQL connections, because PHP-FPM and MySQL are under much more load. These connections are triggered when I reach 6,000+ daily mobile users. I can find them in the TIME_WAIT connections list.

Bandwidth increased from 2.5 TB per month to 8.5 TB per month. It continues to grow. Fortunately, I am with OVH (unlimited bandwidth). I would have to pay hundreds of dollars more per month with AWS.

User-Agents are practically always the same:

"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2691.1542 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.4061.1315 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.4061.1315 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2425.1227 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.9131.1527 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.4578.1932 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.4578.1932 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.4578.1932 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.4578.1932 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.6026.1344 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.7471.1660 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.6830.1011 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.4015.1616 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.1045.1416 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2444.1604 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.8781.1673 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.8638.1067 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.6077.1923 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.1371.1249 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2856.1017 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2744.1720 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.2787.1209 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.5199.1026 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.4702.1426 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.7652.1913 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.7848.1375 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.9605.1264 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.8505.1911 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.3306.1078 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.5294.1891 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.6871.1827 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.6871.1827 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.7095.1139 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.8195.1756 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2247.1336 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2247.1336 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.6637.1618 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.1923.1904 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.5499.1055 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.1214.1806 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.4776.1127 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.4776.1127 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.6137.1116 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.6137.1116 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.8911.1677 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2412.1508 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.1004.1784 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2300.1888 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.5972.1720 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.6026.1344 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.6026.1344 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.3797.1105 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.3397.1381 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.3646.1456 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.3646.1456 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.4651.1390 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.8721.1968 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2800.1113 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.8193.1247 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2878.1428 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3272.1436 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.5102.1629 Mobile Safari/537.36"
"Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.6683.1747 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.3835.1676 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.3835.1676 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.8942.1265 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.4945.1962 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2635.1109 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.8406.1640 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.4555.1489 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.8164.1752 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.4832.1349 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.8862.1240 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.9731.1481 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.7100.1154 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.9715.1724 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.9715.1724 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.9855.1093 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.8613.1157 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.2034.1419 Mobile Safari/537.36"
"Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3956.1284 Mobile Safari/537.36"


I always see either SM-G900P, Nexus 5, Pixel 2, or iPhone OS 11_0.

Yes, the main countries of origin are IE (24%), NZ (19%), AU (17%), then Canada (14%), UK (10%), USA (9%).

What is the best method to log all the headers ? I have tried DumpIO, but it doesn't seem to work, or I don't have the right settings.

[edited by: gronetwork at 1:32 pm (utc) on Dec 19, 2023]

not2easy

1:29 pm on Dec 19, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Hi gronetwork and welcome to WebmasterWorld [webmasterworld.com]

What is the best method to log all the headers ?
Someone else has asked that before and lucy24 shared basic info here: [webmasterworld.com...] though there may be updates since that 2018 thread. She's far better at this than I am.

If the "SM-G900P" part of the UA is unique to unwanted traffic, you might work with that snippet to block the UA. I would not hesitate to block hundreds/thousands of antique UAs like the "iPhone OS 11" - seriously out of date. My 403 includes a method for accidentally blocked humans to have a 2nd chance.

lucy24

6:39 pm on Dec 19, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ugh, that UA list does look familiar. Now, the exact cutoff will depend on your specific site's audience, but many of those can definitely be blocked. The relevant section of my access controls currently looks like this
BrowserMatch Chrome/[56]\d\. old_chrome=$0
BrowserMatch Android !old_chrome

BrowserMatch Chrome/[1-4]?\d\. old_chrome=$0
(leading to a Require env old_chrome among many others) meaning that Chrome < 50 is unconditionally blocked, while the 50s and 60s get a pass if it's Android. I update this kind of thing every year or so by checking if there have been requests for .css or /piwik/ (the directory still has this name although it's now Matomo); if not, it can be consigned to robot-dom.

fwiw, I just checked my own Android, which I rarely use and have no recollection of ever intentionally upgrading. It comes through as Chrome/95, so values in the 50s and 60s are definitely at the outer limits.
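
For readers following along, here is an annotated sketch of how those three lines might be wired up in Apache 2.4. The BrowserMatch lines are the ones quoted above; the RequireAll wrapper with "Require not env" is an assumption, since the post only says that a Require on old_chrome sits among many others:

```apache
# mod_setenvif processes these lines in the order they appear.
# 1. Flag Chrome 50-69 (two digits, starting with 5 or 6).
BrowserMatch Chrome/[56]\d\. old_chrome=$0
# 2. Clear the flag again if the user-agent mentions Android.
BrowserMatch Android !old_chrome
# 3. Flag Chrome 0-49 on any platform. This runs after step 2,
#    so Android does not rescue these.
BrowserMatch Chrome/[1-4]?\d\. old_chrome=$0

# Assumed wiring: refuse any request where the flag is still set.
<RequireAll>
    Require all granted
    Require not env old_chrome
</RequireAll>
```

Because the value is set to $0, the variable holds the matched text (for example "Chrome/45."), which is handy if the variable is also written to a log.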

lucy24

6:50 pm on Dec 19, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



there may be updates since that 2018 thread
Here's what mine currently looks like. It grew out of something incrediBill posted a few years earlier, so there are parts I personally don't understand. I added output buffering a couple years ago in hopes it would prevent tangles when two requests came too close together. (It didn't; I just haven't got around to removing that part.)

The bit with '/^[abce-z]/' may be server-specific; its purpose is to leave out a bunch of environment variables starting with d (lower-case) that I don't need to know about. The ones I set myself start with other lower-case letters: noagent, badref and so on.

<?php
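// Append the request line, headers and selected environment variables
// of the current request to a daily log file.

// shared header function: returns the $_SERVER value, or false if it is not set
function get_server($var)
{
    return isset($_SERVER[$var]) ? $_SERVER[$var] : false;
}

// fallback for setups where PHP does not provide getallheaders()
if (!function_exists('getallheaders'))
{
    function getallheaders()
    {
        // must start as an array: since PHP 7.1 an empty string is no longer
        // silently converted to an array on the first keyed assignment
        $headers = array();
        foreach ($_SERVER as $name => $value)
        {
            if (substr($name, 0, 5) == 'HTTP_')
            {
                // e.g. HTTP_USER_AGENT becomes User-Agent
                $headers[str_replace(' ', '-', ucwords(strtolower(str_replace('_', ' ', substr($name, 5)))))] = $value;
            }
        }
        return $headers;
    }
}

if (!function_exists('getenvif'))
{
    function getenvif()
    {
        // getenv() without arguments (PHP 7.1+) returns all environment variables
        $envvar = getenv();
        return $envvar;
    }
}

ob_start();
$ip = get_server('REMOTE_ADDR');
$fh = fopen($_SERVER['DOCUMENT_ROOT'] . "/boilerplate/headers-". date('Ymd') . ".log","a");
fwrite($fh, date('Y-m-d:') . date("H:i:s\n"));
$thispage = get_server('REQUEST_URI');
fwrite($fh, "URL: $thispage\n");
// REDIRECT_STATUS and HTTPS are not always set, so read them through get_server()
$status = get_server('REDIRECT_STATUS');
fwrite($fh, "Status: $status\n");
$secure = get_server('HTTPS');
fwrite($fh, "HTTPS: $secure\n");
fwrite($fh, "IP: $ip\n");
fwrite($fh, "----\n");

foreach (getallheaders() as $name => $value)
{
    fwrite($fh, "$name: $value\n");
}
fwrite($fh, "----\n");

foreach (getenvif() as $name => $value)
{
    // if (preg_match ('/(REDIRECT_)?[a-z]/',$name) && $value)
    // exclude gzip_only_text_html
    // keep only non-empty variables whose names start with a lower-case letter other than d
    if (preg_match ('/^[abce-z]/',$name) && $value)
    {
        fwrite($fh, "$name: $value\n");
    }
}
fwrite($fh, "----\n\n");

fclose($fh);
ob_end_flush();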
?>

gronetwork

7:41 pm on Dec 19, 2023 (gmt 0)

Top Contributors Of The Month



Okay, thanks!

I have tried to block the IPs in htaccess by taking a part of the user agent :

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "Pixel 2 Build/OPD3\.170816\.012" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Nexus 5 Build/MRA58N" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "iPhone OS 11_0 like Mac OS X" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "SM-G900P Build/LRX21T" [NC]
RewriteRule ^ - [F,L]


It didn't work. I have tried it differently but without success.

Then, I have tried to filter them with a php script :

<?php

$useragent="";
if(isset($_SERVER['HTTP_USER_AGENT']))$useragent=$_SERVER['HTTP_USER_AGENT'];

//echo "<br>Current user agent : ".$useragent . "<br>";

if (strstr($useragent, 'Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012')
|| strstr($useragent, 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N')
|| strstr($useragent, 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X')
|| strstr($useragent, 'Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T')
) {
die('System error!');
}
else{
//echo '<br>You are OK.';
}

?>


For the moment it only displays a white page to these IPs.

And after a few seconds, the CPU usage, the number of connections (in phpMyAdmin) and the bandwidth got back to normal. The number of TIME_WAIT connections has been divided by 4.

Each request uses one of these 4 device strings, and when I count the occurrences in the log I get practically the same number of requests for each of them (4 x 87 500 = 350 000 webpage requests per day). This suggests that it is possibly a flooding attack meant to saturate the server.

The connections in the log file are still here, but they have decreased (divided by 2).

not2easy

7:55 pm on Dec 19, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You would need to escape spaces, slashes and punctuation, i.e. non-alphanumeric characters. You do not need to use the entire string, and you can specify more than one UA or unique partial UA per line, divided with a vertical line like
(Capture|Client|Copy|crawl|curl)

although it may not be the proper format, I don't know what version of Apache you use or whether that matters for UA blocking. I know that it works fine for my sites. It is worthwhile to hang on a bit and hope lucy24 stops in.. ;)

A very oulde example (May 2013) can be found in this thread: [webmasterworld.com...]

lucy24

9:57 pm on Dec 19, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My two cents: Yes, mod_rewrite can be used for access control--but it's pretty server-intensive, so I prefer to combine mod_setenvif with mod_auth-I-forget-the-rest (Require blahblah, or Allow/Deny if your server is really old). Besides, inheritance is more straightforward, so all the rules can go in a single htaccess or <Directory> to apply to multiple sites.

escape spaces, slashes and punctuation
In mod_setenvif, happily you can deal with spaces by putting the whole thing in quotation marks. In mod_rewrite, you have to escape them. (And beware of trailing spaces, or your server will explode!)
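
To illustrate the quoting point, here is a sketch of the four user-agent fragments from earlier in the thread written for mod_setenvif, assuming Apache 2.4. The variable name bad_ua is hypothetical, and whether these fragments are safe to block depends on your own audience:

```apache
# Quotation marks let the pattern contain spaces.
# Only the literal periods are escaped.
# "bad_ua" is a made-up variable name.
BrowserMatch "SM-G900P Build/LRX21T" bad_ua
BrowserMatch "Nexus 5 Build/MRA58N" bad_ua
BrowserMatch "Pixel 2 Build/OPD3\.170816\.012" bad_ua
BrowserMatch "iPhone OS 11_0 like Mac OS X" bad_ua

# Refuse any request that matched one of the patterns above.
<RequireAll>
    Require all granted
    Require not env bad_ua
</RequireAll>
```

BrowserMatch is shorthand for SetEnvIf User-Agent, so the two spellings can be mixed freely.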

Slashes don't need to be escaped unless you are either in javascript or one of those obscure Apache mods that use /slashes/ as delimiters. In most situations, the only punctuation marks that need escaping are . period/fullstop, brackets [ ] and parentheses ( ). Oh, and ^ and $ when used as literal characters. Escape - (hyphen) only inside grouping brackets, where it has syntactic meaning.

Non-essential escapes won't break anything, but they add a couple of bytes to every request. More of a problem in htaccess where everything has to be recompiled every time, but leaving out the extranea does make your config file easier to read.

Don't use [NC] (or NoCase in mod_setenvif) unless you absolutely have to. It makes an extra bit of work for the server, flattening the pattern and the test string every time. It's very rare for offending strings to come in multiple permutations of casing; at most you’d have something like [Bb]ot with just one variable. And sometimes wrong casing is itself the giveaway, like the “GoogleBot” [sic] that was popular a few years ago.

divided with a vertical line like
Technically known as a pipe ;) If you're checking for more than one possibility of the same thing--user-agent, IP, referer, filename--you can collapse them all into a single line, as long as you don't let the line get too long. This applies both to Conditions and to the Rule itself, as in the typical hole-poking rule along the lines of
RewriteRule ^(forbidden|missing|gone|repairs) - [L]
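
Applied to a Condition, the same collapsing might look like the sketch below. It reuses the four user-agent fragments posted earlier in the thread and assumes mod_rewrite is enabled; it is an illustration of the pipe syntax, not a recommendation to use mod_rewrite over mod_setenvif:

```apache
RewriteEngine On
# One Condition with four alternatives separated by pipes.
# Spaces and literal periods are backslash-escaped; no [NC] and no [OR] needed.
RewriteCond %{HTTP_USER_AGENT} (SM-G900P\ Build/LRX21T|Nexus\ 5\ Build/MRA58N|Pixel\ 2\ Build/OPD3\.170816\.012|iPhone\ OS\ 11_0\ like\ Mac\ OS\ X)
# Serve a 403 for any URL when the Condition matches.
RewriteRule ^ - [F]
```

The [F] flag implies [L], so the rule does not need both.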

Over here it's been about 48 hours since I added Chrome/[34]\d to the unconditional blocks. They continue to come in, but now they are eating a steady stream of 403. If that isn't enough to make them get bored and go away, I'll have to extend upward into [56]\d, though this does shut out a tiny handful of humans.

gronetwork

1:16 am on Dec 20, 2023 (gmt 0)

Top Contributors Of The Month



Thanks for the information.

I did some research: many webmasters have been affected by this quartet of UAs. I suspect these are actually IP addresses hijacked by ByteDance, as this started popping up right after an incident I had with Bytespider. One day I mistakenly removed their IPs from my blocking rules in htaccess, and the next day Bytespider made my server inaccessible. Then I blocked these IPs again. And ultimately this new multitude of IPs began to emerge.

Some webmasters came to the same conclusion here: [webmasterworld.com...]

Bytedance is really trying hard to gobble up the worldwide web (like PetalBot back in the day). I don't understand why the "West" hasn't blocked this company yet. Their malicious intent is clear.

lucy24

4:44 am on Dec 20, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<tangent>
like PetalBot back in the day
I think PetalBot must be under new management. I see them about once every other day--never requesting anything but robots.txt.
</tangent>

gronetwork

1:45 pm on Dec 20, 2023 (gmt 0)

Top Contributors Of The Month



These IPs completely disappeared from the logs after sending 403.

This lasted 2 months; I only noticed their presence when the effects became visible one month ago.

<?php

$useragent="";
if(isset($_SERVER['HTTP_USER_AGENT']))$useragent=$_SERVER['HTTP_USER_AGENT'];

//echo "<br>Current user agent : ".$useragent . "<br>";

if (strstr($useragent, 'Pixel 2 Build/OPD3.170816.012')
|| strstr($useragent, 'Nexus 5 Build/MRA58N')
|| strstr($useragent, 'iPhone OS 11_0 like Mac OS X')
|| strstr($useragent, 'SM-G900P Build/LRX21T')
) {
http_response_code(403);
die('Forbidden');
}
else{
//echo '<br>You are OK.';
}

?>

lucy24

4:54 pm on Dec 20, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Did a brace { get lost in posting, or do I blame my eyesight? Either way, I was struck by the “else”
if(isset($_SERVER['HTTP_USER_AGENT']))
{ do stuff }
else{
//echo '<br>You are OK.';
}
I would think that if there is no ($_SERVER['HTTP_USER_AGENT']) then that would be decidedly not OK, since noagent is one of the most basic blocking criteria.
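
At the server level, a noagent check could be sketched as follows, assuming Apache 2.4. The variable name noagent is the one mentioned earlier in the thread; the claim that a header which was not sent is treated as an empty value is my understanding of mod_setenvif and is worth verifying on your own server:

```apache
# ^$ matches an empty User-Agent value. As far as I know, mod_setenvif
# treats a header that was not sent as empty, so this should cover both
# the "" and the "-" cases seen in logs (assumption, please test).
BrowserMatch ^$ noagent

# Refuse requests that arrive without a usable User-Agent.
<RequireAll>
    Require all granted
    Require not env noagent
</RequireAll>
```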