Cloudflare's Automated Tools
Go to Security >> Settings. Of the tools listed there, the one you are most interested in is "Bot Fight Mode", which automatically blocks some of the most aggressive bot traffic that Cloudflare has identified as malicious. Optionally, you can also enable some of the AI blocking tools.
Cloudflare Custom Rules
Go to Security >> Security Rules >> Create new Rule >> New Custom Rule. CF has an easy-to-use GUI. With the free plan, you get 5 rules. Each rule can have multiple conditions but only one action. Rules are fired in order so make sure the top rules do not interfere with subsequent rules. The following actions can be applied:
Skip - Further rules are skipped, according to whatever you select under "WAF components to skip".
Block - The request is blocked
Managed Challenge - Cloudflare will choose what challenge to issue.
Interactive Challenge - CAPTCHA that requires user interaction
JSChallenge - The "Checking your browser...." page that requires no user interaction.
Rule 1 is for whatever you want to allow through; matching requests skip the rest of the rules. CF maintains a list of known bots that adhere to robots.txt, so you can add that condition if you are using robots.txt. RSS readers cannot pass the Cloudflare check, so feeds are something else you might want to allow through if you have them enabled.
Field: Known Bots Operator: Equals Value: <checked>
OR
Field: URI Full Operator: Wildcard Value: https://example.com/forum/feeds/*
Action: Skip All Remaining Custom Rules
Rule 2 is for what you want to block outright. You can block on a variety of criteria such as ASN, user agent, country, continent and many others. This example blocks the "country" T1, which Cloudflare uses for the Tor network, and the continent of Antarctica. These are just examples; phpBB harbors no ill will toward Tor or penguins :).
Field: Country Operator: Equals Value: Tor
OR
Field: Continent Operator: Equals Value: Antarctica
Action: Block
Rule 3 holds phpBB-specific conditions covering the registration and login pages, to help stop spammers from registering and to slow brute-force login attacks. phpBB has its own brute-force detection, but for the convenience of users it's not that strict.
Field: URI query string Operator: Contains Value: mode=register
OR
Field: URI query string Operator: Contains Value: mode=login
Action: Managed Challenge
Rule 4 covers problematic countries, or any other conditions for which you want to escalate the challenge. For the action, issue an Interactive Challenge, which requires the user to perform some action on screen, usually ticking a check box. In the following example it's issued to India and China.
Field: Country Operator: Equals Value: China
OR
Field: Country Operator: Equals Value: India
Action: Interactive Challenge
Rule 5 allows you to whitelist countries and deploy a blanket policy for the rest of the world. For the action, use the JSChallenge, which is the brief "Checking your browser..." page. Countries listed here will not be challenged, so add the countries where you expect the bulk of your traffic to come from. It's important to note that you need to use the "Does not equal" operator with AND; joined with OR, the conditions would match every request, because any country differs from at least one of the listed values. In the following example the US, Canada and the UK are whitelisted.
Field: Country Operator: Does not equal Value: United States
AND
Field: Country Operator: Does not equal Value: United Kingdom
AND
Field: Country Operator: Does not equal Value: Canada
Action: JSChallenge
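To see how the five rules interact, here is a small Python model of the setup above. It is only a sketch for reasoning about rule order, not how Cloudflare evaluates rules internally. The `Request` fields and the `evaluate` function are made up for illustration, and the two-letter codes (GB for the UK, AN for Antarctica, T1 for Tor) are what I assume Cloudflare reports.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for the fields the rules look at."""
    country: str              # two-letter code; "T1" assumed for Tor
    continent: str = ""       # continent code, "AN" assumed for Antarctica
    query: str = ""           # URI query string
    url: str = ""             # full URI
    known_bot: bool = False   # True if on Cloudflare's known-bots list


# Rules in the same order as above: (action, condition).
RULES = [
    ("skip", lambda r: r.known_bot
        or r.url.startswith("https://example.com/forum/feeds/")),
    ("block", lambda r: r.country == "T1" or r.continent == "AN"),
    ("managed_challenge", lambda r: "mode=register" in r.query
        or "mode=login" in r.query),
    ("interactive_challenge", lambda r: r.country in {"CN", "IN"}),
    # Whitelist: every "does not equal" must hold, hence AND.
    ("js_challenge", lambda r: r.country != "US"
        and r.country != "GB" and r.country != "CA"),
]


def evaluate(req: Request) -> str:
    """Return the action of the first matching rule, or "allow"."""
    for action, matches in RULES:
        if matches(req):
            # Skip means the remaining custom rules are not applied.
            return "allow" if action == "skip" else action
    return "allow"


def or_version_matches(country: str) -> bool:
    """The broken variant of Rule 5: OR instead of AND."""
    return country != "US" or country != "GB" or country != "CA"


print(evaluate(Request(country="DE")))                      # js_challenge
print(evaluate(Request(country="CN", query="mode=login")))  # managed_challenge
print(or_version_matches("US"))                             # True
```

Note that a login request from China gets the Managed Challenge from Rule 3 rather than the Interactive Challenge from Rule 4, because Rule 3 matches first. If you want the stricter challenge to win, move that rule higher. The last line shows why OR fails for the whitelist: even a US visitor "does not equal" the UK, so the rule would challenge everyone.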
No-indexing of pages is absolutely ignored
Now I’m confused. I thought the subject was malign robots, not search engines.
no-index/robots.txt stops SERP exposure but doesn’t ease crawl pressure.
Again, these are different things. The “noindex” flag, whether in a page’s HEAD or in an x-robots tag, will only be seen if the robot first requests the page; its sole function is as an instruction to search engines. And robots.txt is the equivalent of a “No Admittance” or “Employees Only” sign: law-abiding people will heed it (which saves you the annoyance of constantly having the doorknob rattled) but if you absolutely need to keep them out, you need to install a deadbolt. Whether that’s a 403 or a 410 is a matter of personal preference. 404 is a last resort, as the server still has to go looking for the page, unless you choose to return a manual 404 to selected requests. (This approach has its appeal, as you’re not giving the robot any information at all, while a 403 says “I’m onto you”.)
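For what the "deadbolt" might look like in code, here is a rough Python WSGI sketch. On a real phpBB site you would more likely do this in the web server configuration, so treat it purely as an illustration. The blocked user-agent fragments and the `deadbolt`, `forum_app` and `status_for` names are invented for the example.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical list; fill in whatever shows up in your own logs.
BLOCKED_AGENT_FRAGMENTS = ("badbot", "scrapy")


def deadbolt(app, status="403 Forbidden"):
    """Wrap a WSGI app and refuse requests from listed user agents.

    Pass status="410 Gone" (or "404 Not Found") to answer differently.
    """
    def wrapped(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(fragment in agent for fragment in BLOCKED_AGENT_FRAGMENTS):
            # Answer before the application does any work at all.
            start_response(status, [("Content-Type", "text/plain"),
                                    ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return wrapped


def forum_app(environ, start_response):
    """Placeholder for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"forum page"]


def status_for(user_agent, status="403 Forbidden"):
    """Helper for trying the wrapper without starting a server."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []

    def start_response(response_status, headers, exc_info=None):
        seen.append(response_status)

    deadbolt(forum_app, status)(environ, start_response)
    return seen[0]


print(status_for("Mozilla/5.0"))             # 200 OK
print(status_for("BadBot/1.0"))              # 403 Forbidden
print(status_for("BadBot/1.0", "410 Gone"))  # 410 Gone
```

The point of answering inside the wrapper is that the refusal happens before the application runs, which is what actually eases the load. robots.txt and noindex never get that chance.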
Yes, Cloudflare does offer a free outer layer: you get SSL, global CDN, basic WAF, and unmetered DDoS protection out of the box, plus Bot Fight Mode, which automatically challenges known bots at no extra cost.
If you need more control (like blocking AI scrapers or building honeypots), there are new free tools like AI crawler blocking and AI Labyrinth that can drop bots into decoy pages to waste their cycles and help flag them.
That said, the more granular protections, like Super Bot Fight Mode or full Bot Management with analytics and scoring, do require Pro or Enterprise plans. So, right now, we’re evaluating how much we can get using CF’s free tools, and weighing that against building deeper, self-hosted layers (JS challenge hardening, firewall filters, geo-IP cache, etc.).
Might still end up adding CF as the outer shield.
If I were to go any further, they have an API, and one of the things you can do with it is manage the cache: you can invalidate a cached URL when the page changes on the server.
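A minimal Python sketch of that cache invalidation, assuming the v4 `purge_cache` endpoint with a list of files and an API token that has cache purge permission. The zone ID, token and URL below are placeholders, and the function names are mine. Check the current API docs before relying on it.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_purge_request(zone_id, api_token, urls):
    """Build (but do not send) a request that purges specific URLs."""
    body = json.dumps({"files": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def purge(zone_id, api_token, urls):
    """Send the purge request and return the decoded JSON response."""
    request = build_purge_request(zone_id, api_token, urls)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Placeholder values; nothing is sent until purge() is called.
request = build_purge_request(
    "YOUR_ZONE_ID",
    "YOUR_API_TOKEN",
    ["https://example.com/forum/viewtopic.php?t=1"],
)
print(request.get_method(), request.full_url)
```

You would call `purge(...)` from whatever hook runs when a topic or post changes, so visitors get the fresh page while everything else stays cached.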
Unless web hosting provider ranges are identified separately, whitelisting entire US and CA ranges may not be a good approach.
I checked the logs yesterday and noticed heavy bot activity.
The total number of requests was around 350,000 per hour, and the bots are still active today.
Our new defense is working at the moment, but it looks like they may be trying to reverse-engineer our code and specifically target us. Hopefully, our script will continue to block them effectively.
Bots are still active and haven’t stopped, but their request frequency varies. On the production server, we usually see around 12,000 requests per hour as normal traffic. Recently, after we optimized the database server, it has been able to handle more requests without overloading the CPU. As a result, the bots have also increased their request volume. Right now, the number of requests in the last hour is around 27,000.
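For anyone who wants to get hourly figures like these out of their own logs, here is a rough Python sketch. It assumes the common/combined access log format with timestamps like `[10/Oct/2024:13:55:36 +0000]`. The sample lines and function names are invented, and the threshold is whatever you consider normal traffic (around 12,000 in our case).

```python
import re
from collections import Counter
from datetime import datetime

# Captures the date and hour from "[10/Oct/2024:13:55:36 +0000]".
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}):\d{2}:\d{2} [+-]\d{4}\]")


def requests_per_hour(lines):
    """Count log lines per clock hour, keyed as "YYYY-MM-DD HH:00"."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            hour = datetime.strptime(match.group(1), "%d/%b/%Y:%H")
            counts[hour.strftime("%Y-%m-%d %H:00")] += 1
    return counts


def hours_over(counts, threshold):
    """Return the hours whose request count exceeds the threshold."""
    return sorted(hour for hour, n in counts.items() if n > threshold)


# Made-up sample lines; in practice, pass an open log file instead.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /forum/ HTTP/1.1" 200 512',
    '203.0.113.6 - - [10/Oct/2024:13:59:01 +0000] "GET /forum/ HTTP/1.1" 200 512',
    '203.0.113.7 - - [10/Oct/2024:14:00:12 +0000] "GET /forum/ HTTP/1.1" 200 512',
]

counts = requests_per_hour(SAMPLE)
print(dict(counts))
print(hours_over(counts, 1))
```

With a real log you would call `requests_per_hour(open("access.log"))` and then `hours_over(counts, 12000)` to list the hours that exceed the normal level.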