Petty Tricks

In my website's log directory, first find the IPs that the Sogou spider comes from:

# grep -h -F "Sogou web spider" * | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 5
 109766 220.181.94.231
  26244 220.181.125.69
     93 220.181.94.235
     90 220.181.125.107
     83 220.181.94.236
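
The same count can be sketched as a small shell function that matches the string only inside the user-agent field, instead of anywhere on the line. This is a sketch, not the command used above: the name `top_ips` is hypothetical, and it assumes the logs are in the combined log format, where the user agent is the third double-quoted string on each line.

```shell
# Hypothetical helper: top client IPs whose user agent contains a string.
# Assumes combined log format (user agent is the sixth "-delimited field).
top_ips() {
    agent="$1"; n="$2"; shift 2
    awk -F'"' -v agent="$agent" '
        # index() does a literal substring match, like grep -F.
        index($6, agent) { split($1, head, " "); print head[1] }
    ' "$@" | sort | uniq -c | sort -nr | head -n "$n"
}
```

Under those assumptions, `top_ips "Sogou web spider" 5 *` should list the same kind of per-IP counts. A request line that itself contains an escaped double quote would shift the fields, so the result is only approximate on such logs.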

Then look at which user agents the requests from the busiest IP carry:

# grep -h -F "220.181.94.231" * | grep -v -F "robots.txt" | awk '{ for (i=12; i<=NF; i++) printf("%s ", $i); printf("\n"); }' | sort | uniq -c | sort -nr
 109497 "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)" 
    187 "Sogou-Test-Spider/4.0 (compatible; MSIE 5.5; Windows 98)" 
    109 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Avant Browser; InfoPath.1; .NET CLR 2.0.50727; .NET CLR1.1.4322)" 
     70 "Tsinghua AI Lab Robot 2.0" 
     55 "Tsinghua AI Lab Robot" 
     35 "-" 
     21 "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.7) Gecko/2009031915 Gentoo Firefox/3.0.7" 
     18 "Sogou Pic Spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)" 
      1 "Sogou Mobile Spider1.0 (http://wap.sogou.com)"
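
Printing fields 12 through NF works because the user agent happens to start at the twelfth whitespace-separated field. An alternative sketch splits each line on double quotes instead, which does not depend on counting fields. The function name `ua_by_ip` is hypothetical, and the combined log format is an assumption. It also differs from the command above in two ways: it compares the IP only against the first field, and it skips `robots.txt` only when it appears in the request line.

```shell
# Hypothetical helper: count user agents for one client IP.
# Assumes combined log format, where the user agent is the
# third double-quoted string on each line.
ua_by_ip() {
    ip="$1"; shift
    awk -F'"' -v ip="$ip" '
        # Before the first quote, $1 is "IP ident user [date] "; keep only the IP.
        { split($1, head, " ") }
        # $2 is the request line, $6 is the user agent.
        head[1] == ip && $2 !~ /robots\.txt/ { print "\"" $6 "\"" }
    ' "$@" | sort | uniq -c | sort -nr
}
```

With those assumptions, `ua_by_ip 220.181.94.231 *` should produce a listing of the same shape as the one above. Lines whose request or referer contains an escaped double quote would be split incorrectly.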

How interesting.
