The tool below, "apache-loggen", makes it easy to generate dummy access logs.

apache_log_gen


On Amazon Linux, it installs easily as shown below.

$ sudo gem install apache-loggen --no-ri --no-rdoc -V
GET https://rubygems.org/latest_specs.4.8.gz
302 Moved Temporarily
GET https://rubygems.global.ssl.fastly.net/latest_specs.4.8.gz
200 OK
GET https://rubygems.org/quick/Marshal.4.8/apache-loggen-0.0.4.gemspec.rz
302 Moved Temporarily
GET https://rubygems.global.ssl.fastly.net/quick/Marshal.4.8/apache-loggen-0.0.4.gemspec.rz
200 OK
GET https://rubygems.org/quick/Marshal.4.8/json-1.8.3.gemspec.rz
302 Moved Temporarily
GET https://rubygems.global.ssl.fastly.net/quick/Marshal.4.8/json-1.8.3.gemspec.rz
200 OK
Installing gem apache-loggen-0.0.4
Downloading gem apache-loggen-0.0.4.gem
GET https://rubygems.org/gems/apache-loggen-0.0.4.gem
302 Moved Temporarily
GET https://rubygems.global.ssl.fastly.net/gems/apache-loggen-0.0.4.gem
Fetching: apache-loggen-0.0.4.gem (100%)
200 OK
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/Gemfile
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/LICENSE.txt
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/README.md
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/Rakefile
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/apache-loggen.gemspec
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/bin/apache-loggen
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/lib/apache-loggen-boot.rb
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/lib/apache-loggen.rb
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/lib/apache-loggen/base.rb
/usr/local/share/ruby/gems/2.0/gems/apache-loggen-0.0.4/lib/apache-loggen/version.rb
/usr/local/bin/apache-loggen
Successfully installed apache-loggen-0.0.4
1 gem installed

Logs can then be generated like this.

$ apache-loggen --rotate=300 test.log

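This keeps writing log lines to "test.log" as fast as it can and rotates the log file (test.log) every 300 seconds.
I ran it on a "t2.micro" instance (gp2, 8 GB), and about 15 million lines like the following were generated in 300 seconds.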

$ head test.log
116.168.225.172 - - [19/Oct/2015:18:17:44 +0000] "GET /item/electronics/3607 HTTP/1.1" 200 125 "/item/books/70" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
92.36.171.201 - - [19/Oct/2015:18:17:44 +0000] "GET /item/sports/4471 HTTP/1.1" 200 87 "/category/jewelry" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"
152.174.149.220 - - [19/Oct/2015:18:17:44 +0000] "GET /category/toys HTTP/1.1" 200 72 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; YTB730; GTB7.2; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)"
96.195.53.225 - - [19/Oct/2015:18:17:44 +0000] "GET /category/finance HTTP/1.1" 200 114 "/item/books/597" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
156.48.113.173 - - [19/Oct/2015:18:17:44 +0000] "GET /item/toys/2687 HTTP/1.1" 200 129 "/category/books" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; YTB730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)"
28.192.173.67 - - [19/Oct/2015:18:17:44 +0000] "GET /category/software HTTP/1.1" 200 117 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"
52.36.40.92 - - [19/Oct/2015:18:17:44 +0000] "POST /search/?c=Electronics HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
148.99.222.197 - - [19/Oct/2015:18:17:44 +0000] "GET /category/music HTTP/1.1" 200 128 "/category/finance" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
200.156.28.119 - - [19/Oct/2015:18:17:44 +0000] "GET /category/toys HTTP/1.1" 200 107 "/category/giftcards" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
188.132.216.195 - - [19/Oct/2015:18:17:44 +0000] "GET /item/electronics/3284 HTTP/1.1" 200 123 "/search/?c=Electronics" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
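
To get a quick feel for what the generator produces, the fields of each line can be split with awk. The following is only a minimal sketch, assuming the layout shown above, where the sixth whitespace-separated field is the HTTP method with a leading double quote. The file name sample.log and the temporary directory are hypothetical, and the three lines are copied from the output above purely for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so the real test.log is left untouched.
workdir=$(mktemp -d)
sample="$workdir/sample.log"

# Three lines copied from the head output above.
cat > "$sample" <<'EOF'
116.168.225.172 - - [19/Oct/2015:18:17:44 +0000] "GET /item/electronics/3607 HTTP/1.1" 200 125 "/item/books/70" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
52.36.40.92 - - [19/Oct/2015:18:17:44 +0000] "POST /search/?c=Electronics HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
148.99.222.197 - - [19/Oct/2015:18:17:44 +0000] "GET /category/music HTTP/1.1" 200 128 "/category/finance" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
EOF

# Field 6 looks like "GET (with the opening quote), so strip the quote
# and count how many requests were made with each method.
method_counts=$(awk '{ gsub(/"/, "", $6); count[$6]++ }
                     END { for (m in count) print m, count[m] }' "$sample" | sort)
echo "$method_counts"
```

For these three sample lines it prints "GET 2" and "POST 1". Pointing the same awk command at test.log should work in the same way, as long as the generated lines keep this layout.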

While we're at it, let's use "grep" to pull out every log line except those for specific IPs (the ones on the first two lines).

$ grep -v -e '116.168.225.172' -e '92.36.171.201' test.log | head
152.174.149.220 - - [19/Oct/2015:18:17:44 +0000] "GET /category/toys HTTP/1.1" 200 72 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; YTB730; GTB7.2; EasyBits GO v1.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)"
96.195.53.225 - - [19/Oct/2015:18:17:44 +0000] "GET /category/finance HTTP/1.1" 200 114 "/item/books/597" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
156.48.113.173 - - [19/Oct/2015:18:17:44 +0000] "GET /item/toys/2687 HTTP/1.1" 200 129 "/category/books" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; YTB730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)"
28.192.173.67 - - [19/Oct/2015:18:17:44 +0000] "GET /category/software HTTP/1.1" 200 117 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"
52.36.40.92 - - [19/Oct/2015:18:17:44 +0000] "POST /search/?c=Electronics HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
148.99.222.197 - - [19/Oct/2015:18:17:44 +0000] "GET /category/music HTTP/1.1" 200 128 "/category/finance" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
200.156.28.119 - - [19/Oct/2015:18:17:44 +0000] "GET /category/toys HTTP/1.1" 200 107 "/category/giftcards" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
188.132.216.195 - - [19/Oct/2015:18:17:44 +0000] "GET /item/electronics/3284 HTTP/1.1" 200 123 "/search/?c=Electronics" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
176.87.120.37 - - [19/Oct/2015:18:17:44 +0000] "POST /search/?c=Games+Music HTTP/1.1" 200 98 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
64.150.185.107 - - [19/Oct/2015:18:17:44 +0000] "GET /category/books HTTP/1.1" 200 113 "-" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"

The lines containing those IPs (the first two lines) were excluded without any problem.
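
One caveat with the grep command above: the dots in each pattern are regular-expression wildcards and the patterns are not anchored, so an address that merely contains one of them as a substring would be dropped as well. Below is a minimal sketch of a stricter variant, assuming the client IP is always the first field. The file names sample.log and exclude.txt are hypothetical, the log lines are shortened, and 192.36.171.201 is a made-up address added only to show the difference.

```shell
#!/bin/sh
workdir=$(mktemp -d)
sample="$workdir/sample.log"
exclude="$workdir/exclude.txt"

# Shortened lines in the same layout; the third address is made up and
# happens to contain 92.36.171.201 as a substring.
cat > "$sample" <<'EOF'
116.168.225.172 - - [19/Oct/2015:18:17:44 +0000] "GET /item/electronics/3607 HTTP/1.1" 200 125
92.36.171.201 - - [19/Oct/2015:18:17:44 +0000] "GET /item/sports/4471 HTTP/1.1" 200 87
192.36.171.201 - - [19/Oct/2015:18:17:44 +0000] "GET /category/toys HTTP/1.1" 200 72
96.195.53.225 - - [19/Oct/2015:18:17:44 +0000] "GET /category/finance HTTP/1.1" 200 114
EOF

# One IP address to exclude per line.
printf '%s\n' '116.168.225.172' '92.36.171.201' > "$exclude"

# First pass (NR == FNR) loads the exclusion list; the second pass prints
# only lines whose first field is not an exact match for a listed address.
kept=$(awk 'NR == FNR { skip[$1] = 1; next } !($1 in skip)' "$exclude" "$sample")
echo "$kept"

# For comparison: the unanchored grep also drops 192.36.171.201.
grep_kept=$(grep -v -e '116.168.225.172' -e '92.36.171.201' "$sample")
echo "$grep_kept"
```

With this sample, the awk version keeps the 192.36.171.201 and 96.195.53.225 lines, while the grep version keeps only the 96.195.53.225 line. The awk idiom assumes the exclusion list is not empty.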

All of this is preparation for trying the same kind of processing on a much larger dataset with EMR (Hive).
(Eventually it will be run from Lambda.)

The original article is here:

Generating dummy access logs with "apache-loggen"