First, in my website's log directory, I looked up the IPs the Sogou spider comes from:
# grep -h -F "Sogou web spider" * | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 5
109766 220.181.94.231
26244 220.181.125.69
93 220.181.94.235
90 220.181.125.107
83 220.181.94.236
Then I checked which user agents the requests from the busiest IP carry:
# grep -h -F "220.181.94.231" * | grep -v -F "robots.txt" | awk '{ for (i=12; i<=NF; i++) printf("%s ", $i); printf("\n"); }' | sort | uniq -c | sort -nr
109497 "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
187 "Sogou-Test-Spider/4.0 (compatible; MSIE 5.5; Windows 98)"
109 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Avant Browser; InfoPath.1; .NET CLR 2.0.50727; .NET CLR1.1.4322)"
70 "Tsinghua AI Lab Robot 2.0"
55 "Tsinghua AI Lab Robot"
35 "-"
21 "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.7) Gecko/2009031915 Gentoo Firefox/3.0.7"
18 "Sogou Pic Spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
1 "Sogou Mobile Spider1.0 (http://wap.sogou.com)"
Quite interesting: a single IP sends requests under this many different user agents.
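A side note on the second command: printing from whitespace-delimited field 12 onward only works when the referer contains no spaces. Below is a minimal sketch of an alternative, assuming the logs are in the Apache/nginx "combined" format: split each line on double quotes, so that the user agent is always the sixth field. The function name `ua_by_ip` is made up for illustration. Unlike the command above, it compares the IP against the first field exactly and skips only requests whose request line mentions robots.txt.

```shell
#!/bin/sh
# Count user agents seen from one client IP in combined-format access logs.
# Hypothetical helper, not part of any tool. Usage: ua_by_ip IP FILE...
ua_by_ip() {
    ip=$1
    shift
    # With '"' as the field separator:
    #   $1 = 'IP - - [date] ', $2 = request line, $4 = referer, $6 = user agent
    awk -F'"' -v ip="$ip" '
        {
            split($1, head, " ")    # head[1] is the client IP
            if (head[1] == ip && $2 !~ /robots\.txt/)
                print "\"" $6 "\""
        }' "$@" | sort | uniq -c | sort -nr
}

# Example (run inside the log directory):
#   ua_by_ip 220.181.94.231 *
```

This still breaks if a user agent itself contains an escaped double quote, so treat it as a quick inspection tool rather than a real log parser.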