Fail2ban: Blocking Junk Scraper Bots to Protect an Nginx Server

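Copyright notice: this article is licensed under CC 4.0 BY-SA. If you republish it, please include a link to the original source and this notice. Thank you!

Install fail2ban and iptables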

# CentOS's built-in repositories do not include fail2ban, so install the EPEL repository first
yum -y install epel-release
# Install fail2ban
yum -y install fail2ban iptables

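After installation, the configuration lives in /etc/fail2ban:
/etc/fail2ban/action.d        # action directory with default files, e.g. iptables and mail actions
/etc/fail2ban/fail2ban.conf   # fail2ban's log level, log location and socket file location
/etc/fail2ban/filter.d        # filter directory with default files: regexes that pick out the relevant log entries
/etc/fail2ban/jail.conf       # main, modular configuration file: which services get ban actions, and their thresholds
/etc/rc.d/init.d/fail2ban     # init (startup) script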

Put your settings in jail.local so that they override the defaults:

cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
vi /etc/fail2ban/jail.local
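[DEFAULT]
# Global settings
# ignoreip: whitelist of IPs that are never banned. Separate multiple entries
# with spaces; CIDR ranges such as 127.0.0.1/8 are allowed.
ignoreip = 127.0.0.1
# bantime: how long an IP stays banned, in seconds
bantime = 600
# findtime: window in seconds; an IP that reaches maxretry matches within it is banned
findtime = 600
# maxretry: number of matches allowed within findtime before banning
maxretry = 3
# backend: how log file changes are detected (e.g. auto, pyinotify, gamin, polling, systemd)
backend = auto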
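ignoreip accepts both single addresses and CIDR ranges. As a rough illustration of how such a whitelist check works, here is a Python sketch using the standard ipaddress module; parse_ignoreip and is_ignored are hypothetical helper names, not fail2ban functions.

```python
import ipaddress


def parse_ignoreip(value: str):
    """Turn a space-separated ignoreip-style value into network objects."""
    # strict=False accepts host bits, e.g. "127.0.0.1/8" -> 127.0.0.0/8;
    # a bare address such as "127.0.0.1" becomes a /32 network.
    return [ipaddress.ip_network(token, strict=False) for token in value.split()]


def is_ignored(ip: str, networks) -> bool:
    """True if ip lies inside any whitelisted network (IPv4/IPv6 never mix)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


nets = parse_ignoreip("127.0.0.1/8 192.168.1.0/24")
print(is_ignored("127.0.0.53", nets))    # True
print(is_ignored("192.168.1.20", nets))  # True
print(is_ignored("203.0.113.7", nets))   # False
```

Writing 127.0.0.1/8 therefore whitelists the whole loopback range, not only 127.0.0.1.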

[nginx-badbots]

enabled  = true
port     = http,https
filter   = nginx-badbots
action = iptables-multiport[name=nginx-badbots, protocol=tcp]
         sendmail-whois[name=nginx-badbots, dest=root, [email protected]]
logpath = /data/wwwlogs/infvie.com_access.log
bantime = 43600
maxretry = 5
findtime  = 10
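The three numbers act as a sliding window: an IP is banned once it produces maxretry matching log lines within findtime seconds, and stays banned for bantime seconds. The Python sketch below models this with a hypothetical BanTracker class (an illustration only, not fail2ban's internals; its window-boundary handling is an assumption) and plugs in the nginx-badbots values.

```python
from collections import defaultdict, deque


class BanTracker:
    """Hypothetical per-IP sliding-window model of findtime/maxretry/bantime."""

    def __init__(self, maxretry: int, findtime: float, bantime: float):
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.failures = defaultdict(deque)  # ip -> timestamps of recent matches
        self.banned_until = {}              # ip -> time at which the ban expires

    def is_banned(self, ip: str, now: float) -> bool:
        until = self.banned_until.get(ip)
        return until is not None and now < until

    def record_match(self, ip: str, now: float) -> bool:
        """Record one matching log line; return True if it triggers a ban."""
        if self.is_banned(ip, now):
            return False
        window = self.failures[ip]
        window.append(now)
        # Forget matches that are older than findtime seconds.
        while window and now - window[0] > self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned_until[ip] = now + self.bantime
            window.clear()
            return True
        return False


# nginx-badbots values: maxretry = 5, findtime = 10, bantime = 43600
t = BanTracker(maxretry=5, findtime=10, bantime=43600)
print([t.record_match("198.51.100.9", s) for s in (0, 2, 4, 6, 8)])
# [False, False, False, False, True]
print(t.is_banned("198.51.100.9", 3600))  # True

# A crawler that waits 3 seconds between requests
slow = BanTracker(maxretry=5, findtime=10, bantime=43600)
print(any([slow.record_match("203.0.113.5", s) for s in range(0, 60, 3)]))  # False
```

With findtime = 10, a bot that spaces its requests 3 seconds apart never has 5 matches inside one 10-second window, so under this model it is never banned; a wider findtime catches slower crawlers.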

Custom filter rules

vi /etc/fail2ban/filter.d/nginx-badbots.conf

# Fail2Ban configuration file
#
# Regexp to catch known spambots and software alike. Please verify
# that it is your intent to block IPs which were driven by
# above mentioned bots.

[Definition]

badbotscustom = EmailCollector|WebEMailExtrac|TrackBack/1\.02|sogou music spider
badbots = -|Atomic_Email_Hunter/4\.0|atSpider/1\.0|autoemailspider|bwh3_user_agent|China Local Browse 2\.6|ContactBot/0\.2|ContentSmartz|DataCha0s/2\.0|DBrowse 1\.4b|DBrowse 1\.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1\.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1\.00|ESurf15a 15|ExtractorPro|Franklin Locator 1\.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|Guestbook Auto Submitter|Industry Program 1\.0\.x|ISC Systems iRc Search 2\.1|IUPUI Research Bot v 1\.9a|LARBIN-EXPERIMENTAL \(efp@gmx\.net\)|LetsCrawl\.com/1\.0 \+http\://letscrawl\.com/|Lincoln State Web Browser|LMQueueBot/0\.2|LWP\:\:Simple/5\.803|Mac Finder 1\.0\.xx|MFC Foundation Class Library 4\.0|Microsoft URL Control - 6\.00\.8xxx|Missauga Locate 1\.0\.0|Missigua Locator 1\.9|Missouri College Browse|Mizzu Labs 2\.2|Mo College 1\.9|MVAClient|Mozilla/2\.0 \(compatible; NEWT ActiveX; Win32\)|Mozilla/3\.0 \(compatible; Indy Library\)|Mozilla/3\.0 \(compatible; scan4mail \(advanced version\) http\://www\.peterspages\.net/?scan4mail\)|Mozilla/4\.0 \(compatible; Advanced Email Extractor v2\.xx\)|Mozilla/4\.0 \(compatible; Iplexx Spider/1\.0 http\://www\.iplexx\.at\)|Mozilla/4\.0 \(compatible; MSIE 5\.0; Windows NT; DigExt; DTS Agent|Mozilla/4\.0 efp@gmx\.net|Mozilla/5\.0 \(Version\: xxxx Type\:xx\)|NameOfAgent \(CMS Spider\)|NASA Search 1\.0|Nsauditor/1\.x|PBrowse 1\.4b|PEval 1\.4b|Poirot|Port Huron Labs|Production Bot 0116B|Production Bot 2016B|Production Bot DOT 3016B|Program Shareware 1\.0\.2|PSurf15a 11|PSurf15a 51|PSurf15a VA|psycheclone|RSurf15a 41|RSurf15a 51|RSurf15a 81|searchbot admin@google\.com|ShablastBot 1\.0|snap\.com beta crawler v0|Snapbot/1\.0|Snapbot/1\.0 \(Snap Shots, \+http\://www\.snap\.com\)|sogou develop spider|Sogou Orion spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sogou spider|Sogou web spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sohu agent|SSurf15a 11 |TSurf15a 11|Under the Rainbow 2\.2|User-Agent\: Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1\)|VadixBot|WebVulnCrawl\.unknown/1\.0 libwww-perl/5\.803|Wells Search II|WEP Search 00|ZmEu|spiderman|sqlmap|FeedDemon|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|YisouSpider|HttpClient|MJ12bot|heritrix|EasouSpider|LinkpadBot|YandexBot|RU_Bot|200PleaseBot|DuckDuckGo-Favicons-Bot|Wotbox|SeznamBot|Exabot|SemrushBot|PictureBot|SMTBot|SEOkicks-Robot|AdvBot|TrueBot|BLEXBot|WangIDSpider|Ezooms

failregex = ^<HOST> -.*"(GET|POST|HEAD).*HTTP.*" \d+ \d+ ".*" "(?:%(badbots)s|%(badbotscustom)s)" (-|.*)$

ignoreregex =

# DEV Notes:
# List of bad bots fetched from http://www.user-agents.org
# Generated on Thu Nov  7 14:23:35 PST 2013 by files/gen_badbots.
#
# Author: Yaroslav Halchenko

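The default badbots list is short and rather dated, so I added entries to suit my own needs. The leading "-" entry also makes the filter catch requests with an empty user agent (logged by Nginx as "-").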
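To see what this filter actually matches, here is a small Python sketch using the standard re module (fail2ban itself is written in Python, but this is not its matching code). It expands %(badbots)s and %(badbotscustom)s the way the INI interpolation does, uses shortened, hypothetical subsets of the two lists, and approximates fail2ban's <HOST> tag, which every failregex must contain, with a simple (?P<host>\S+) group. Two properties follow from the regex itself: each alternative must equal the whole user-agent string between the quotes, so a UA that merely contains "AhrefsBot" is not caught; and a space plus another field is required after the user agent, so the sample lines assume a log format that appends one (shown here as "-").

```python
import re

# Hypothetical, shortened subsets of the badbots / badbotscustom lists.
BADBOTS = r"-|AhrefsBot|MJ12bot|Python-urllib|sqlmap"
BADBOTS_CUSTOM = r"EmailCollector|WebEMailExtrac|sogou music spider"

# failregex from nginx-badbots.conf, with fail2ban's <HOST> tag.
FAILREGEX = (
    r'^<HOST> -.*"(GET|POST|HEAD).*HTTP.*" \d+ \d+ ".*" '
    r'"(?:%(badbots)s|%(badbotscustom)s)" (-|.*)$'
)

# Expand %(name)s like the INI interpolation, then approximate <HOST>
# with a loose named group (assumption: fail2ban's real one is stricter).
PATTERN = re.compile(
    (FAILREGEX % {"badbots": BADBOTS, "badbotscustom": BADBOTS_CUSTOM})
    .replace("<HOST>", r"(?P<host>\S+)")
)


def match_host(line: str):
    """Return the client IP if the line matches the filter, else None."""
    m = PATTERN.match(line)
    return m.group("host") if m else None


samples = [
    # UA is exactly a listed bot
    '203.0.113.7 - - [10/Mar/2020:10:00:00 +0800] "GET / HTTP/1.1" 200 612 "-" "AhrefsBot" "-"',
    # empty user agent, logged as "-"
    '198.51.100.9 - - [10/Mar/2020:10:00:01 +0800] "GET / HTTP/1.1" 404 0 "-" "-" "-"',
    # ordinary browser
    '192.0.2.10 - - [10/Mar/2020:10:00:02 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" "-"',
    # UA only contains a listed name, so it does not match
    '192.0.2.11 - - [10/Mar/2020:10:00:03 +0800] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0)" "-"',
]

for line in samples:
    print(match_host(line))
# 203.0.113.7
# 198.51.100.9
# None
# None
```

If you want to catch user agents that only contain a bot name, the alternatives would need surrounding wildcards, at the cost of more false positives; fail2ban-regex is the authoritative way to test such changes against your real logs.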

Checking the filter rules

To check whether the regular expression is correct, run fail2ban-regex against the log file and the filter, as follows:

fail2ban-regex /data/wwwlogs/infvie.com_access.log /etc/fail2ban/filter.d/nginx-badbots.conf

Modifying the action rules

View the action settings without comment lines:
grep -v '^#' /etc/fail2ban/action.d/iptables-blocktype.local | less

[INCLUDES]
after = iptables-blocktype.local
port = 80,443
protocol =tcp
[Init]
returntype = RETURN
lockingopt = -w
iptables = iptables <lockingopt>
blocktype = DROP
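Values in [Init] are substituted into the action's command templates through <tag> placeholders, and a value can itself contain tags (iptables = iptables <lockingopt>). The Python sketch below expands a simplified approximation of the iptables-multiport templates to show what these settings turn into: a chain f2b-nginx-badbots that ends in RETURN, a jump to it from INPUT for ports 80 and 443, and one DROP rule per banned IP. The templates, chain = INPUT and the expand helper are assumptions for illustration; the exact commands are in the action.d files installed on your system and vary between fail2ban versions.

```python
import re

# Values from the [Init] section above; "chain" and "name" are assumptions
# (INPUT is the usual default chain, name comes from the jail's action line).
INIT = {
    "name": "nginx-badbots",
    "chain": "INPUT",
    "port": "80,443",
    "protocol": "tcp",
    "returntype": "RETURN",
    "lockingopt": "-w",
    "iptables": "iptables <lockingopt>",
    "blocktype": "DROP",
}

# Simplified approximation of the iptables-multiport templates.
ACTIONSTART = [
    "<iptables> -N f2b-<name>",
    "<iptables> -A f2b-<name> -j <returntype>",
    "<iptables> -I <chain> -p <protocol> -m multiport --dports <port> -j f2b-<name>",
]
ACTIONBAN = "<iptables> -I f2b-<name> 1 -s <ip> -j <blocktype>"

TAG = re.compile(r"<(\w+)>")


def expand(template: str, tags: dict) -> str:
    """Substitute known <tag>s repeatedly, since values may contain tags."""
    while True:
        expanded = TAG.sub(lambda m: tags.get(m.group(1), m.group(0)), template)
        if expanded == template:
            return expanded  # nothing left to substitute (unknown tags stay)
        template = expanded


for cmd in ACTIONSTART:
    print(expand(cmd, INIT))
print(expand(ACTIONBAN, {**INIT, "ip": "203.0.113.7"}))
# iptables -w -N f2b-nginx-badbots
# iptables -w -A f2b-nginx-badbots -j RETURN
# iptables -w -I INPUT -p tcp -m multiport --dports 80,443 -j f2b-nginx-badbots
# iptables -w -I f2b-nginx-badbots 1 -s 203.0.113.7 -j DROP
```

The RETURN rule at the end of the chain sends traffic from IPs that are not banned back to INPUT, while DROP silently discards packets from banned IPs instead of answering them.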

Restarting fail2ban and verifying

service fail2ban restart
Check that the iptables rules have taken effect:
iptables -nL --line-numbers   (look for the chain named f2b- plus the jail name, here f2b-nginx-badbots)

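Common commands

Show a jail's ban status (nginx-badbots is the jail name):
fail2ban-client status nginx-badbots
Remove an IP from the ban list:
fail2ban-client set nginx-badbots unbanip <IP address>
Check a filter against a log file:
fail2ban-regex <log file> /etc/fail2ban/filter.d/nginx-badbots.conf

Viewing and cleaning up firewall rules
iptables -nL --line-numbers   (look for the chain f2b-nginx-badbots)
iptables -D f2b-nginx-badbots <num>   (-D deletes rule number <num> from the chain f2b-nginx-badbots)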
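When deleting rules by number, removing rule N renumbers every rule below it, so it is safest to delete from the highest number down. The Python sketch below picks the rule numbers for given IPs out of iptables -nL f2b-nginx-badbots --line-numbers output and builds the delete commands in that order. The helper names are hypothetical and the listing is illustrative sample output, not captured from a real server; also note this only touches iptables and leaves fail2ban's own ban list unchanged.

```python
# Illustrative `iptables -nL f2b-nginx-badbots --line-numbers` output.
LISTING = """\
Chain f2b-nginx-badbots (1 references)
num  target     prot opt source               destination
1    DROP       all  --  203.0.113.7          0.0.0.0/0
2    DROP       all  --  198.51.100.9         0.0.0.0/0
3    DROP       all  --  192.0.2.44           0.0.0.0/0
4    RETURN     all  --  0.0.0.0/0            0.0.0.0/0
"""


def rule_numbers_for_ips(listing: str, ips) -> list:
    """Rule numbers whose source column is one of ips, highest first."""
    nums = []
    for line in listing.splitlines():
        fields = line.split()
        # Rule lines start with the rule number; "Chain"/"num" headers do not.
        if len(fields) >= 5 and fields[0].isdigit() and fields[4] in ips:
            nums.append(int(fields[0]))
    # Deleting rule N shifts every later rule up by one, so go bottom-up.
    return sorted(nums, reverse=True)


def delete_commands(chain: str, nums) -> list:
    return [f"iptables -D {chain} {n}" for n in nums]


nums = rule_numbers_for_ips(LISTING, {"203.0.113.7", "192.0.2.44"})
for cmd in delete_commands("f2b-nginx-badbots", nums):
    print(cmd)
# iptables -D f2b-nginx-badbots 3
# iptables -D f2b-nginx-badbots 1
```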

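Notes:
1. After restarting iptables, restart fail2ban as well so that it reloads its ban list into iptables; otherwise the iptables rules will be empty.
2. How it works: when a log line matches, fail2ban first adds the IP to its ban list and then adds the firewall rule. If a firewall rule has been deleted but the IP was not removed from the ban list, a new match will not add it to the firewall again.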
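Both notes follow from fail2ban keeping its own ban list separately from the firewall. The toy model below (a hypothetical ToyJail class, not fail2ban's code; it assumes, as note 1 describes, that a restart re-applies the stored bans) shows why a rule deleted by hand is not re-added and why unbanip, which clears both sides, is the cleaner way to lift a ban.

```python
class ToyJail:
    """Hypothetical model: fail2ban's ban list vs. the actual firewall rules."""

    def __init__(self):
        self.ban_list = set()        # what fail2ban believes is banned
        self.firewall_rules = set()  # what iptables actually contains

    def on_match(self, ip: str):
        # Already on the ban list: fail2ban sees no new ban and adds no rule.
        if ip in self.ban_list:
            return
        self.ban_list.add(ip)
        self.firewall_rules.add(ip)

    def unbanip(self, ip: str):
        # Like `fail2ban-client set <jail> unbanip`: clears both sides.
        self.ban_list.discard(ip)
        self.firewall_rules.discard(ip)

    def restart(self):
        # Assumption (note 1): restarting fail2ban re-applies the stored bans.
        self.firewall_rules |= self.ban_list


jail = ToyJail()
ip = "203.0.113.7"
jail.on_match(ip)
jail.firewall_rules.discard(ip)   # rule deleted by hand with iptables -D
jail.on_match(ip)                 # the bot comes back
print(ip in jail.firewall_rules)  # False: still on the ban list, no new rule
jail.restart()
print(ip in jail.firewall_rules)  # True
jail.unbanip(ip)
jail.on_match(ip)
print(ip in jail.firewall_rules)  # True: a fresh ban adds a fresh rule
```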

