Forum Moderators: phranque
Along with you, I've done a lot of work to create a perfect .htaccess Ban List. There will always be rogues that get through or new and better bots. You can't block all of them but you CAN keep your server load down and your access streamlined to your target audience. I have 228 User Agents blocked and have minimal annoyances. Here is my latest .htaccess code:
## .htaccess Code :: BEGIN
## Block Bad Bots by User-Agent
SetEnvIfNoCase User-Agent ^$ bad_bot
SetEnvIfNoCase User-Agent "^AESOP_com_SpiderMan" bad_bot
SetEnvIfNoCase User-Agent "^Alexibot" bad_bot
SetEnvIfNoCase User-Agent "Anonymouse.org" bad_bot
SetEnvIfNoCase User-Agent "^asterias" bad_bot
SetEnvIfNoCase User-Agent "^attach" bad_bot
SetEnvIfNoCase User-Agent "^BackDoorBot" bad_bot
SetEnvIfNoCase User-Agent "^BackWeb" bad_bot
SetEnvIfNoCase User-Agent "Bandit" bad_bot
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "^BatchFTP" bad_bot
SetEnvIfNoCase User-Agent "^Bigfoot" bad_bot
SetEnvIfNoCase User-Agent "^Black.Hole" bad_bot
SetEnvIfNoCase User-Agent "^BlackWidow" bad_bot
SetEnvIfNoCase User-Agent "^BlowFish" bad_bot
SetEnvIfNoCase User-Agent "^Bot\ mailto:craftbot@yahoo.com" bad_bot
SetEnvIfNoCase User-Agent "^BotALot" bad_bot
SetEnvIfNoCase User-Agent "Buddy" bad_bot
SetEnvIfNoCase User-Agent "^BuiltBotTough" bad_bot
SetEnvIfNoCase User-Agent "^Bullseye" bad_bot
SetEnvIfNoCase User-Agent "^BunnySlippers" bad_bot
SetEnvIfNoCase User-Agent "^Cegbfeieh" bad_bot
SetEnvIfNoCase User-Agent "^CheeseBot" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
SetEnvIfNoCase User-Agent "^ChinaClaw" bad_bot
SetEnvIfNoCase User-Agent "Collector" bad_bot
SetEnvIfNoCase User-Agent "Copier" bad_bot
SetEnvIfNoCase User-Agent "^CopyRightCheck" bad_bot
SetEnvIfNoCase User-Agent "^cosmos" bad_bot
SetEnvIfNoCase User-Agent "^Crescent" bad_bot
SetEnvIfNoCase User-Agent "^Curl" bad_bot
SetEnvIfNoCase User-Agent "^Custo" bad_bot
SetEnvIfNoCase User-Agent "^DA" bad_bot
SetEnvIfNoCase User-Agent "^DISCo" bad_bot
SetEnvIfNoCase User-Agent "^DIIbot" bad_bot
SetEnvIfNoCase User-Agent "^DittoSpyder" bad_bot
SetEnvIfNoCase User-Agent "^Download" bad_bot
SetEnvIfNoCase User-Agent "^Download\ Demon" bad_bot
SetEnvIfNoCase User-Agent "^Download\ Devil" bad_bot
SetEnvIfNoCase User-Agent "^Download\ Wonder" bad_bot
SetEnvIfNoCase User-Agent "Downloader" bad_bot
SetEnvIfNoCase User-Agent "^dragonfly" bad_bot
SetEnvIfNoCase User-Agent "^Drip" bad_bot
SetEnvIfNoCase User-Agent "^eCatch" bad_bot
SetEnvIfNoCase User-Agent "^EasyDL" bad_bot
SetEnvIfNoCase User-Agent "^ebingbong" bad_bot
SetEnvIfNoCase User-Agent "^EirGrabber" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^EroCrawler" bad_bot
SetEnvIfNoCase User-Agent "^Exabot" bad_bot
SetEnvIfNoCase User-Agent "^Express\ WebPictures" bad_bot
SetEnvIfNoCase User-Agent "Extractor" bad_bot
SetEnvIfNoCase User-Agent "^EyeNetIE" bad_bot
SetEnvIfNoCase User-Agent "^FileHound" bad_bot
SetEnvIfNoCase User-Agent "^FlashGet" bad_bot
SetEnvIfNoCase User-Agent "^Foobot" bad_bot
SetEnvIfNoCase User-Agent "^flunky" bad_bot
SetEnvIfNoCase User-Agent "^FrontPage" bad_bot
SetEnvIfNoCase User-Agent "^GetRight" bad_bot
SetEnvIfNoCase User-Agent "^GetSmart" bad_bot
SetEnvIfNoCase User-Agent "^GetWeb!" bad_bot
SetEnvIfNoCase User-Agent "^Go!Zilla" bad_bot
SetEnvIfNoCase User-Agent "Google\ Wireless\ Transcoder" bad_bot
SetEnvIfNoCase User-Agent "^Go-Ahead-Got-It" bad_bot
SetEnvIfNoCase User-Agent "^gotit" bad_bot
SetEnvIfNoCase User-Agent "Grabber" bad_bot
SetEnvIfNoCase User-Agent "^GrabNet" bad_bot
SetEnvIfNoCase User-Agent "^Grafula" bad_bot
SetEnvIfNoCase User-Agent "^Harvest" bad_bot
SetEnvIfNoCase User-Agent "^hloader" bad_bot
SetEnvIfNoCase User-Agent "^HMView" bad_bot
SetEnvIfNoCase User-Agent "^httplib" bad_bot
SetEnvIfNoCase User-Agent "^HTTrack" bad_bot
SetEnvIfNoCase User-Agent "^humanlinks" bad_bot
SetEnvIfNoCase User-Agent "^ia_archiver" bad_bot
SetEnvIfNoCase User-Agent "^IlseBot" bad_bot
SetEnvIfNoCase User-Agent "^Image\ Stripper" bad_bot
SetEnvIfNoCase User-Agent "^Image\ Sucker" bad_bot
SetEnvIfNoCase User-Agent "Indy\ Library" bad_bot
SetEnvIfNoCase User-Agent "^InfoNaviRobot" bad_bot
SetEnvIfNoCase User-Agent "^InfoTekies" bad_bot
SetEnvIfNoCase User-Agent "^Intelliseek" bad_bot
SetEnvIfNoCase User-Agent "^InterGET" bad_bot
SetEnvIfNoCase User-Agent "^Internet\ Ninja" bad_bot
SetEnvIfNoCase User-Agent "^Iria" bad_bot
SetEnvIfNoCase User-Agent "^Jakarta" bad_bot
SetEnvIfNoCase User-Agent "^JennyBot" bad_bot
SetEnvIfNoCase User-Agent "^JetCar" bad_bot
SetEnvIfNoCase User-Agent "^JOC" bad_bot
SetEnvIfNoCase User-Agent "^JustView" bad_bot
SetEnvIfNoCase User-Agent "^Jyxobot" bad_bot
SetEnvIfNoCase User-Agent "^Kenjin.Spider" bad_bot
SetEnvIfNoCase User-Agent "^Keyword.Density" bad_bot
SetEnvIfNoCase User-Agent "^larbin" bad_bot
SetEnvIfNoCase User-Agent "^LeechFTP" bad_bot
SetEnvIfNoCase User-Agent "^LexiBot" bad_bot
SetEnvIfNoCase User-Agent "^lftp" bad_bot
SetEnvIfNoCase User-Agent "^libWeb/clsHTTP" bad_bot
SetEnvIfNoCase User-Agent "^likse" bad_bot
SetEnvIfNoCase User-Agent "^LinkextractorPro" bad_bot
SetEnvIfNoCase User-Agent "^LinkScan/8.1a.Unix" bad_bo
SetEnvIfNoCase User-Agent "^LNSpiderguy" bad_bott
SetEnvIfNoCase User-Agent "^LinkWalker" bad_bot
SetEnvIfNoCase User-Agent "^lwp-trivial" bad_bot
SetEnvIfNoCase User-Agent "^LWP::Simple" bad_bot
SetEnvIfNoCase User-Agent "^Magnet" bad_bot
SetEnvIfNoCase User-Agent "^Mag-Net" bad_bot
SetEnvIfNoCase User-Agent "^MarkWatch" bad_bot
SetEnvIfNoCase User-Agent "^Mass\ Downloader" bad_bot
SetEnvIfNoCase User-Agent "^Mata.Hari" bad_bot
SetEnvIfNoCase User-Agent "^Memo" bad_bot
SetEnvIfNoCase User-Agent "^Microsoft.URL" bad_bot
SetEnvIfNoCase User-Agent "^Microsoft\ URL\ Control" bad_bot
SetEnvIfNoCase User-Agent "^MIDown\ tool" bad_bot
SetEnvIfNoCase User-Agent "^MIIxpc" bad_bot
SetEnvIfNoCase User-Agent "^Mirror" bad_bot
SetEnvIfNoCase User-Agent "^Missigua\ Locator" bad_bot
SetEnvIfNoCase User-Agent "^Mister\ PiX" bad_bot
SetEnvIfNoCase User-Agent "^moget" bad_bot
SetEnvIfNoCase User-Agent "^Mozilla/3.Mozilla/2.01" bad_bot
SetEnvIfNoCase User-Agent "^Mozilla.*NEWT" bad_bot
SetEnvIfNoCase User-Agent "^NAMEPROTECT" bad_bot
SetEnvIfNoCase User-Agent "^Navroad" bad_bot
SetEnvIfNoCase User-Agent "^NearSite" bad_bot
SetEnvIfNoCase User-Agent "^NetAnts" bad_bot
SetEnvIfNoCase User-Agent "^Netcraft" bad_bot
SetEnvIfNoCase User-Agent "^NetMechanic" bad_bot
SetEnvIfNoCase User-Agent "^NetSpider" bad_bot
SetEnvIfNoCase User-Agent "^Net\ Vampire" bad_bot
SetEnvIfNoCase User-Agent "^NetZIP" bad_bot
SetEnvIfNoCase User-Agent "^NextGenSearchBot" bad_bot
SetEnvIfNoCase User-Agent "^NG" bad_bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
SetEnvIfNoCase User-Agent "^NimbleCrawler" bad_bot
SetEnvIfNoCase User-Agent "^Ninja" bad_bot
SetEnvIfNoCase User-Agent "^NPbot" bad_bot
SetEnvIfNoCase User-Agent "^Octopus" bad_bot
SetEnvIfNoCase User-Agent "^Offline\ Explorer" bad_bot
SetEnvIfNoCase User-Agent "^Offline\ Navigator" bad_bot
SetEnvIfNoCase User-Agent "^Openfind" bad_bot
SetEnvIfNoCase User-Agent "^OutfoxBot" bad_bot
SetEnvIfNoCase User-Agent "^PageGrabber" bad_bot
SetEnvIfNoCase User-Agent "^Papa\ Foto" bad_bot
SetEnvIfNoCase User-Agent "^pavuk" bad_bot
SetEnvIfNoCase User-Agent "^pcBrowser" bad_bot
SetEnvIfNoCase User-Agent "^PHP\ version\ tracker" bad_bot
SetEnvIfNoCase User-Agent "^Pockey" bad_bot
SetEnvIfNoCase User-Agent "^ProPowerBot/2.14" bad_bot
SetEnvIfNoCase User-Agent "^ProWebWalker" bad_bot
SetEnvIfNoCase User-Agent "^psbot" bad_bot
SetEnvIfNoCase User-Agent "^Pump" bad_bot
SetEnvIfNoCase User-Agent "^QueryN.Metasearch" bad_bot
SetEnvIfNoCase User-Agent "^RealDownload" bad_bot
SetEnvIfNoCase User-Agent "Reaper" bad_bot
SetEnvIfNoCase User-Agent "Recorder" bad_bot
SetEnvIfNoCase User-Agent "^ReGet" bad_bot
SetEnvIfNoCase User-Agent "^RepoMonkey" bad_bot
SetEnvIfNoCase User-Agent "^RMA" bad_bot
SetEnvIfNoCase User-Agent "Siphon" bad_bot
SetEnvIfNoCase User-Agent "sitecheck.internetseer.com" bad_bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bad_bot
SetEnvIfNoCase User-Agent "^SlySearch" bad_bot
SetEnvIfNoCase User-Agent "^SmartDownload" bad_bot
SetEnvIfNoCase User-Agent "^Snake" bad_bot
SetEnvIfNoCase User-Agent "^Snapbot" bad_bot
SetEnvIfNoCase User-Agent "^Snoopy" bad_bot
SetEnvIfNoCase User-Agent "^sogou" bad_bot
SetEnvIfNoCase User-Agent "^SpaceBison" bad_bot
SetEnvIfNoCase User-Agent "^SpankBot" bad_bot
SetEnvIfNoCase User-Agent "^spanner" bad_bot
SetEnvIfNoCase User-Agent "^Sqworm" bad_bot
SetEnvIfNoCase User-Agent "Stripper" bad_bot
SetEnvIfNoCase User-Agent "Sucker" bad_bot
SetEnvIfNoCase User-Agent "^SuperBot" bad_bot
SetEnvIfNoCase User-Agent "^SuperHTTP" bad_bot
SetEnvIfNoCase User-Agent "^Surfbot" bad_bot
SetEnvIfNoCase User-Agent "^suzuran" bad_bot
SetEnvIfNoCase User-Agent "^Szukacz/1.4" bad_bot
SetEnvIfNoCase User-Agent "^tAkeOut" bad_bot
SetEnvIfNoCase User-Agent "^Teleport" bad_bot
SetEnvIfNoCase User-Agent "^Telesoft" bad_bot
SetEnvIfNoCase User-Agent "^TurnitinBot/1.5" bad_bot
SetEnvIfNoCase User-Agent "^The.Intraformant" bad_bot
SetEnvIfNoCase User-Agent "^TheNomad" bad_bot
SetEnvIfNoCase User-Agent "^TightTwatBot" bad_bot
SetEnvIfNoCase User-Agent "^Titan" bad_bot
SetEnvIfNoCase User-Agent "^toCrawl/UrlDispatcher" bad_bot
SetEnvIfNoCase User-Agent "^True_Robot" bad_bot
SetEnvIfNoCase User-Agent "^turingos" bad_bot
SetEnvIfNoCase User-Agent "^TurnitinBot" bad_bot
SetEnvIfNoCase User-Agent "^URLy.Warning" bad_bot
SetEnvIfNoCase User-Agent "^Vacuum" bad_bot
SetEnvIfNoCase User-Agent "^VCI" bad_bot
SetEnvIfNoCase User-Agent "^VoidEYE" bad_bot
SetEnvIfNoCase User-Agent "^Web\ Image\ Collector" bad_bot
SetEnvIfNoCase User-Agent "^Web\ Sucker" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
SetEnvIfNoCase User-Agent "^Webclipping.com" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier" bad_bot
SetEnvIfNoCase User-Agent "^WebEMailExtrac.*" bad_bot
SetEnvIfNoCase User-Agent "^WebEnhancer" bad_bot
SetEnvIfNoCase User-Agent "^WebFetch" bad_bot
SetEnvIfNoCase User-Agent "^WebGo\ IS" bad_bot
SetEnvIfNoCase User-Agent "^Web.Image.Collector" bad_bot
SetEnvIfNoCase User-Agent "^WebLeacher" bad_bot
SetEnvIfNoCase User-Agent "^WebmasterWorldForumBot" bad_bot
SetEnvIfNoCase User-Agent "^WebReaper" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^WebSite" bad_bot
SetEnvIfNoCase User-Agent "^Website\ eXtractor" bad_bot
SetEnvIfNoCase User-Agent "^Website\ Quester" bad_bot
SetEnvIfNoCase User-Agent "^Webster" bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^WebWhacker" bad_bot
SetEnvIfNoCase User-Agent "^WebZIP" bad_bot
SetEnvIfNoCase User-Agent "^Wget" bad_bot
SetEnvIfNoCase User-Agent "Whacker" bad_bot
SetEnvIfNoCase User-Agent "^Widow" bad_bot
SetEnvIfNoCase User-Agent "^WISENutbot" bad_bot
SetEnvIfNoCase User-Agent "^WWWOFFLE" bad_bot
SetEnvIfNoCase User-Agent "^WWW-Collector-E" bad_bot
SetEnvIfNoCase User-Agent "^Xaldon" bad_bot
SetEnvIfNoCase User-Agent "^Xenu" bad_bot
SetEnvIfNoCase User-Agent "^Zeus" bad_bot
SetEnvIfNoCase User-Agent "^Zyborg" bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
## .htaccess Code :: END
Just insert into your .htaccess file and you'll be in great shape. Let me know if you have updates or questions.
If it is your intent to allow these other methods, then <LIMIT GET POST> is sufficient; As documented, specifying GET implies that HEAD requests are also allowed.
The length of this code can be reduced greatly if desired by combining user-agents. For example, denying from "e-?mail" and "download" using SetEnvIfNoCase will cover many undesirables all at once.
I always suggest that such lists be trimmed to include only those user-agents which are actually problems for your site, plus any newly-reported ones that *may* become a problem for your site. As WebmasterWorld member IncrediBill points out, the problem with such lists is that they grow forever if not trimmed, and you are always playing catch-up to new "bad" user-agents. And if your site is growing in popularity, the length of the list and the number of requests to your site will eventually result in a noticeable performance penalty.
For sites already experiencing that problem, a combined whitelist/blacklist approach often results in smaller overall code size and better performance.
These comments are intended to provide a bit of background info, and not to detract from the work that went into compiling this list of user-agents!
Jim
Does ^badguy = [anythinghere]badguy?
If so, is there anytime that would be bad to have? (Sorry, but searching anywhere for the definition of ^ doesn't exactly bring an answer.)
This is a better way to write for a more complete block?
<Files *>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Files>
See the tutorials listed in our forum charter [webmasterworld.com], especially, the one on Regular Expressions.
Jim
I found a separate prob...I downloaded Offline Explorer and used it on my site to see how it worked. They've gotten smart with recent versions...They don't use their User-Agent by default anymore...they use IE. When I switched it to the Offline Explorer UA my htaccess worked and blocked it, but that's not a real-life scenario...normally it will be set to IE and no one will change it. So that's a problem...any ideas?
SetEnvIfNoCase ...
SetEnvIfNoCase User-Agent "^Black.Hole" bad_bot
SetEnvIfNoCase User-Agent "^Java" bad_bot
SetEnvIfNoCase User-Agent "^Jakarta" bad_bot
SetEnvIfNoCase ...
Order Allow,Deny
Allow from all
Deny from env=bad_bot
2) My awstats reports show also "Acrobat" as a grabber; is it recommendable to add it to the list and if so, how? I fear that it will not be possible to view pdf files from my site if I do that.
3) Somewhere else it is recommended to block also empty user agents. How do I do that with SetEnvIfNoCase? I tried
SetEnvIfNoCase User-Agent "" bad_bot
but I got a 500 error...
Thanks in advance!
[edited by: jdMorgan at 5:42 pm (utc) on May 14, 2008]
[edit reason] No URLs, please. [/edit]
I also wonder why hybrid6studios adds a \ before a space like in "Indy\ Library". would "Indy Library" not be effective?
Thanks!
2) Never heard of it. Many of the user-agents in the threads here are obsolete and no longer active. I suggest you go through your stats, and include blocking lines only for those user-agents that are actually a problem for your site.
3) Empty user-agent:
SetEnvIf User-agent ^$ bad-bot
"Fake" empty user-agent:
SetEnvIf User-agent ^-$
Both:
SetEnvIf User-agent ^-?$
Layered and others can be blocked by IP address
One IP address:
Deny from 12.34.56.78
or 256 IP addresses 12.34.56.0 - 12.34.56.255
Deny from 12.24.56
or
Deny from 12.34.56.0/24
or
Deny from 12.34.56.0/255.255.255.0
(that's an example IP address/address range only)
You could also use reverse-DNS if available on your server, but some servers will then start logging all accesses using hostnames, and reverse-DNS checks are slow and introduce an additional point of failure -- If your server cannot successfully complete a request to the DNS server, the user's request will 'hang'.
The format for that would be:
Deny from example.com
See mod_access and mod_setenvif docs for more details.
Jim
Their are numerous and redundant repetion of excess in these lines.
As well of the repeated use of QUOTES in each line, which in these examples implies EXACTLY AS and is redundant and without understanding of the application.
EX:
"^CherryPicker"
Whereas ^CherryPicker (BEGINS with CherryPicker) is intended, however see later explanation for condesning of lines.
#condenses two lines
SetEnvIfNoCase User-Agent ^BackWeb bad_bot
SetEnvIfNoCase User-Agent ^Black bad_bot
SetEnvIf User-Agent Grab bad_bot
SetEnvIfNoCase User-Agent ^Info bad_bot
SetEnvIfNoCase User-Agent lwp bad_bot
SetEnvIfNoCase User-Agent ^Pro bad_bot
SetEnvIfNoCase User-Agent site bad_bot
#condenses three lines
SetEnvIfNoCase User-Agent ^Email bad_bot
SetEnvIfNoCase User-Agent ^Get bad_bot
SetEnvIfNoCase User-Agent ^Link bad_bot
#condenses five lines
SetEnvIfNoCase User-Agent ^Download bad_bot
#condenses six lines
SetEnvIfNoCase User-Agent ^Net bad_bot
#condenses twenty-two lines
SetEnvIfNoCase User-Agent ^Web bad_bot
There are likely more ways to condense lines, however I've provided enough examples to this rebirth ;)
2) My awstats reports show also "Acrobat" as a grabber; is it recommendable to add it to the list and if so, how? I fear that it will not be possible to view pdf files from my site if I do that.
The full version of Acrobat/Adobe offers a "print" option to PDF which will spider and retrieve entire websites and outbound linked pages (if desired) as well.
The UA escapes me at the moment and I have too much going on presently.
Will provide the UA later on.
In order to detect problematic accesses these days, you need far more sophisticated methods, some of which are "paid options" you can install on your server. Some intermediate solutions are available here at WebmasterWorld in the PERL and PHP forum libraries, generally findable using "bad bot" and "runaway bots" in the WebmasterWorld forum library search facility.
Jim
You can omit <Files *>
Also, I have the following in my .htaccess and wonder if I'm doing it correctly:
<Files ~ "example.php">
Order allow,deny
Deny from all
</Files>
<Files ~ ".htaccess">
Order allow,deny
Deny from all
</Files>
order allow,deny
deny from example.com
allow from all
Regarding upper/lowercase is the following recommended or is there no difference?
Order allow,deny
Deny from example.com
Allow from all
Thanks,
EG