The "apache-loggen" tool linked below makes it easy to generate dummy access logs.
apache_log_gen
On Amazon Linux, it installs with a single command, as shown below.
$ sudo gem install apache-loggen --no-ri --no-rdoc -V
GET https://rubygems.org/latest_specs.4.8.gz
302 Moved Temporarily
GET https://rubygems.global.ssl.fastly.net/latest_specs.4.8.gz
200 OK
GET https://rubygems.org/quick/Marshal.4.8/apache-loggen-0.0.4.gemspec.rz
302 Moved Temporarily
GET https://rubygems.global.ssl.fastly.net/quick/Marshal.4.8/apache-loggen-0.0.4.gemspec.rz
200 OK
GET https://rubygems.org/quick/Marshal.4.8/json-1.8.3.gemspec.rz
302 Moved Temporarily
GET https://rubygems.global.ssl.fastly.net/quick/Marshal.4.8/json-1.8.3.gemspec.rz
200 OK
Installing gem apache-loggen-0.0.4
Downloading gem apache-loggen-0.0.4.gem
GET https://rubygems.org/gems/apache-loggen-0.0.4.gem
302 Moved Temporarily
GET https://rubygems.global.ssl.fastly.net/gems/apache-loggen-0.0.4.gem
Fetching: apache-loggen-0.0.4.gem (100%)
200 OK
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/Gemfile
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/LICENSE.txt
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/README.md
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/Rakefile
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/apache-loggen.gemspec
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/bin/apache-loggen
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/lib/apache-loggen-boot.rb
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/lib/apache-loggen.rb
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/lib/apache-loggen/base.rb
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/lib/apache-loggen/version.rb
/usr/local/bin/apache-loggen
Successfully installed apache-loggen-0.0.4
1 gem installed
Logs can then be generated like this:
$ apache-loggen --rotate=300 test.log
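This keeps writing log lines to "test.log" as fast as it can and rotates the log file (test.log) every 300 seconds.
I ran it on a "t2.micro" instance (gp2, 8 GB), and it produced about 15 million lines like the ones below in 300 seconds, which is roughly 50,000 lines per second.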
$ head test.log
116.168.225.172 - - [19/Oct/2015:18:17:44 +0000] "GET /item/electronics/3607 HTTP/1.1" 200 125 "/item/books/70" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
92.36.171.201 - - [19/Oct/2015:18:17:44 +0000] "GET /item/sports/4471 HTTP/1.1" 200 87 "/category/jewelry" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"
152.174.149.220 - - [19/Oct/2015:18:17:44 +0000] "GET /category/toys HTTP/1.1" 200 72 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; YTB730; GTB7.2; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)"
96.195.53.225 - - [19/Oct/2015:18:17:44 +0000] "GET /category/finance HTTP/1.1" 200 114 "/item/books/597" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
156.48.113.173 - - [19/Oct/2015:18:17:44 +0000] "GET /item/toys/2687 HTTP/1.1" 200 129 "/category/books" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; YTB730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)"
28.192.173.67 - - [19/Oct/2015:18:17:44 +0000] "GET /category/software HTTP/1.1" 200 117 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"
52.36.40.92 - - [19/Oct/2015:18:17:44 +0000] "POST /search/?c=Electronics HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
148.99.222.197 - - [19/Oct/2015:18:17:44 +0000] "GET /category/music HTTP/1.1" 200 128 "/category/finance" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
200.156.28.119 - - [19/Oct/2015:18:17:44 +0000] "GET /category/toys HTTP/1.1" 200 107 "/category/giftcards" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
188.132.216.195 - - [19/Oct/2015:18:17:44 +0000] "GET /item/electronics/3284 HTTP/1.1" 200 123 "/search/?c=Electronics" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
While we're at it, let's use "grep" to pull out every log line except those from specific IPs (the ones in the first two lines).
$ grep -v -e '116.168.225.172' -e '92.36.171.201' test.log | head
152.174.149.220 - - [19/Oct/2015:18:17:44 +0000] "GET /category/toys HTTP/1.1" 200 72 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; YTB730; GTB7.2; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)"
96.195.53.225 - - [19/Oct/2015:18:17:44 +0000] "GET /category/finance HTTP/1.1" 200 114 "/item/books/597" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
156.48.113.173 - - [19/Oct/2015:18:17:44 +0000] "GET /item/toys/2687 HTTP/1.1" 200 129 "/category/books" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; YTB730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)"
28.192.173.67 - - [19/Oct/2015:18:17:44 +0000] "GET /category/software HTTP/1.1" 200 117 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"
52.36.40.92 - - [19/Oct/2015:18:17:44 +0000] "POST /search/?c=Electronics HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
148.99.222.197 - - [19/Oct/2015:18:17:44 +0000] "GET /category/music HTTP/1.1" 200 128 "/category/finance" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
200.156.28.119 - - [19/Oct/2015:18:17:44 +0000] "GET /category/toys HTTP/1.1" 200 107 "/category/giftcards" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
188.132.216.195 - - [19/Oct/2015:18:17:44 +0000] "GET /item/electronics/3284 HTTP/1.1" 200 123 "/search/?c=Electronics" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
176.87.120.37 - - [19/Oct/2015:18:17:44 +0000] "POST /search/?c=Games+Music HTTP/1.1" 200 98 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
64.150.185.107 - - [19/Oct/2015:18:17:44 +0000] "GET /category/books HTTP/1.1" 200 113 "-" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
The lines containing those IPs (the first two lines) were excluded without any problem.
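One caveat with the grep patterns above: an unescaped "." matches any character, and the pattern can match anywhere in the line, so '92.36.171.201' would also drop a line from 192.36.171.201. With 15 million random lines that may matter. The following is a minimal sketch of a stricter variant that compares only the first field (the client IP) exactly. The function names exclude_ips and first_ips are hypothetical helpers made up for this illustration; they are not part of apache-loggen.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a log file whose client IP
# (the first whitespace-separated field) is NOT one of the given IPs.
# Usage: exclude_ips LOG_FILE IP [IP ...]
exclude_ips() {
    log_file=$1
    shift
    # Hand the remaining arguments to awk as one space-separated string.
    awk -v ips="$*" '
        BEGIN {
            # Build a lookup table keyed by the IPs to skip.
            n = split(ips, list, " ")
            for (i = 1; i <= n; i++) skip[list[i]] = 1
        }
        # Print only lines whose first field is not in the table
        # (exact string comparison, no regular expressions involved).
        !($1 in skip)
    ' "$log_file"
}

# Hypothetical helper: print the client IPs of the first N lines.
# Usage: first_ips LOG_FILE N
first_ips() {
    head -n "$2" "$1" | awk '{ print $1 }'
}
```

Assuming test.log is in the current directory, the grep example could then be reproduced with: exclude_ips test.log $(first_ips test.log 2) | head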
All of this is preparation for trying the same kind of processing on a much larger data set with EMR (Hive).
(Ultimately it will be run from Lambda.)