I want dummy data for Elasticsearch

With that in mind I looked around and found a tool called apache-loggen, so I gave it a try.

What is apache-loggen?

apache-loggen appears to be a gem that keeps generating dummy Apache access logs.

Installation and help

Installation

sudo gem install apache-loggen --no-ri --no-rdoc -V

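Help (the tool prints the option descriptions in Japanese; they are translated here)

$ apache-loggen -h
Usage: apache-loggen [options]
        --limit=COUNT                Maximum number of records to output. Default is 0 (unlimited).
        --rate=RATE                  Number of records to generate per second. Default is 0 (no rate limit).
        --rotate=SECOND              Rotation interval. Default is 0.
        --progress                   Display the rate.
        --json                       Output in JSON format.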
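
To make the meaning of --rate and --limit concrete, here is a minimal Python sketch of a throttled record stream. This is not how apache-loggen is implemented (I have not read its source); the function name `throttled` and its `sleep` parameter are my own, and the sketch only illustrates "at most COUNT records, RATE records per second, 0 meaning no restriction".

```python
import time


def throttled(records, rate=0, limit=0, sleep=time.sleep):
    """Yield records with an optional cap and an optional pause between them.

    rate  -- records per second; 0 means no pause between records
    limit -- maximum number of records; 0 means unlimited
    sleep -- injectable so the pacing can be checked without waiting
    """
    interval = 1.0 / rate if rate > 0 else 0.0
    for n, record in enumerate(records, start=1):
        yield record
        if limit and n >= limit:
            break  # stop once the cap is reached
        if interval:
            sleep(interval)  # pace the next record


if __name__ == "__main__":
    # Emit 10 placeholder records at roughly 10 records per second.
    for rec in throttled((f"record {i}" for i in range(1000)), rate=10, limit=10):
        print(rec)
```

Unlike the real tool's behavior shown later in this post, this sketch stops at exactly `limit` records.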

Trying it out

Output at 10 records per second

Run it as follows.

$ apache-loggen --rate=10 --limit=10

The records are written to standard output, like this:

108.81.70.158 - - [16/Jun/2015:13:59:34 +0000] "GET /item/electronics/3717 HTTP/1.1" 200 86 "-" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
176.33.96.225 - - [16/Jun/2015:13:59:35 +0000] "GET /category/cameras HTTP/1.1" 200 88 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
32.129.29.109 - - [16/Jun/2015:13:59:35 +0000] "GET /item/toys/2278 HTTP/1.1" 200 129 "/search/?c=Toys" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
192.75.60.93 - - [16/Jun/2015:13:59:35 +0000] "POST /search/?c=Giftcards+Electronics HTTP/1.1" 200 127 "/category/giftcards" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
72.42.173.76 - - [16/Jun/2015:13:59:35 +0000] "GET /category/software HTTP/1.1" 200 66 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
168.180.150.153 - - [16/Jun/2015:13:59:35 +0000] "GET /category/books HTTP/1.1" 200 107 "/item/software/3528" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
172.171.186.182 - - [16/Jun/2015:13:59:35 +0000] "POST /search/?c=Software+Networking HTTP/1.1" 200 80 "-" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
212.147.150.123 - - [16/Jun/2015:13:59:35 +0000] "GET /category/electronics HTTP/1.1" 200 49 "/category/books?from=20" "Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"
140.210.67.42 - - [16/Jun/2015:13:59:35 +0000] "GET /item/electronics/1466 HTTP/1.1" 200 80 "/category/software" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"
104.33.70.151 - - [16/Jun/2015:13:59:35 +0000] "GET /category/garden?from=10 HTTP/1.1" 200 89 "/item/garden/2967" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
152.90.29.208 - - [16/Jun/2015:13:59:35 +0000] "GET /category/electronics HTTP/1.1" 200 41 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"

Nice, that really is an Apache access log. Although --limit=10 asks for 10 records, 11 were output. Let's call that within the margin of error.
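
If you want to check the generated lines programmatically, a regular expression for the combined log format is enough. The following is a rough Python sketch; the pattern and the `parse_line` helper are my own and are only matched against the sample output above, so treat them as assumptions rather than a complete Apache log parser.

```python
import re

# Pattern for the "combined" style lines shown in the sample output.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["code"] = int(record["code"])
    # Apache writes "-" when no body was sent; treat that as 0 bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# First line of the sample output above.
SAMPLE = (
    '108.81.70.158 - - [16/Jun/2015:13:59:34 +0000] '
    '"GET /item/electronics/3717 HTTP/1.1" 200 86 "-" '
    '"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.11 (KHTML, like Gecko) '
    'Chrome/17.0.963.56 Safari/535.11"'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```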

Output in JSON format

apache-loggen can also output the dummy logs in JSON format.

$ apache-loggen --rate=10 --limit=10 --json
{"host":"96.201.193.203","user":"-","method":"GET","path":"/category/finance","code":200,"referer":"http://www.google.com/search?ie=UTF-8&q=google&sclient=psy-ab&q=Finance+Sports&oq=Finance+Sports&aq=f&aqi=g-vL1&aql=&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&biw=3162&bih=221","size":80,"agent":"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; Trident/4.0; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; .NET4.0C)"}
{"host":"132.153.70.200","user":"-","method":"GET","path":"/item/books/687","code":200,"referer":"/category/office","size":71,"agent":"Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"}
{"host":"208.198.30.167","user":"-","method":"GET","path":"/category/giftcards?from=20","code":200,"referer":"/category/software","size":138,"agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"}
{"host":"116.138.59.46","user":"-","method":"GET","path":"/category/sports","code":200,"referer":"/item/toys/2678","size":96,"agent":"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; YTB730; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C)"}
{"host":"96.75.52.63","user":"-","method":"GET","path":"/item/software/2714","code":200,"referer":"/search/?c=Software","size":111,"agent":"Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"}
{"host":"76.213.205.148","user":"-","method":"POST","path":"/search/?c=Electronics+Software","code":200,"referer":"-","size":108,"agent":"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; BTRS122159; GTB7.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; BRI/2)"}
{"host":"144.93.224.58","user":"-","method":"GET","path":"/category/software","code":200,"referer":"/search/?c=Software","size":135,"agent":"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"}
{"host":"124.93.120.210","user":"-","method":"GET","path":"/category/software","code":200,"referer":"-","size":134,"agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"}
{"host":"100.99.79.157","user":"-","method":"GET","path":"/category/games","code":200,"referer":"http://www.google.com/search?ie=UTF-8&q=google&sclient=psy-ab&q=Games&oq=Games&aq=f&aqi=g-vL1&aql=&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&biw=4855&bih=281","size":89,"agent":"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; YTB720; GTB7.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"}
{"host":"132.45.136.52","user":"-","method":"GET","path":"/category/networking","code":200,"referer":"-","size":58,"agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7"}
{"host":"80.24.204.225","user":"-","method":"GET","path":"/category/electronics","code":200,"referer":"/category/cameras","size":73,"agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}

Nice, JSON format.
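
One record per line makes the JSON output easy to consume. As a small illustration, here is a Python sketch that counts requests per path. The `count_paths` helper is hypothetical (my own naming), and the sample lines are three records copied from the output above.

```python
import json
from collections import Counter


def count_paths(lines):
    """Count how many records hit each path in JSON-lines input."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record["path"]] += 1
    return counts


# Three records taken from the sample output above.
SAMPLE_LINES = [
    '{"host":"144.93.224.58","user":"-","method":"GET","path":"/category/software","code":200,"referer":"/search/?c=Software","size":135,"agent":"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"}',
    '{"host":"124.93.120.210","user":"-","method":"GET","path":"/category/software","code":200,"referer":"-","size":134,"agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"}',
    '{"host":"80.24.204.225","user":"-","method":"GET","path":"/category/electronics","code":200,"referer":"/category/cameras","size":73,"agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}',
]

if __name__ == "__main__":
    for path, n in count_paths(SAMPLE_LINES).most_common():
        print(n, path)
```

To use it on live output, you could pass `sys.stdin` instead of `SAMPLE_LINES` and pipe `apache-loggen --json` into the script.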

apache-loggen → fluentd → Elasticsearch

As for setting up Elasticsearch…

  • omitted here

Configuring and starting fluentd

  • Prepare a test.conf like the following

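<source>
  type tail
  format apache
  path /tmp/dummy_access_log
  tag dummy.apache.log
</source>

# The match pattern below is an assumption; any pattern that matches the tag above works.
<match dummy.apache.**>
  index_name adminpack
  type_name apache
  type elasticsearch
  include_tag_key true
  tag_key @log_name
  host localhost
  port 9200
  logstash_format true
  flush_interval 10s
</match>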
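
One thing to be aware of: as far as I understand fluent-plugin-elasticsearch, with logstash_format true the records go to date-based indices named like logstash-YYYY.MM.DD rather than to the configured index_name, which is why the queries later in this post target logstash-2015.06.20. The Python sketch below only reproduces that naming rule; the `logstash_index` function is my own, and it assumes the default "logstash" prefix and UTC-based dates.

```python
from datetime import datetime, timedelta, timezone


def logstash_index(ts, prefix="logstash"):
    """Return the date-based index name for a timezone-aware timestamp.

    Assumes the default prefix and that the date is taken in UTC.
    """
    utc = ts.astimezone(timezone.utc)
    return f"{prefix}-{utc.strftime('%Y.%m.%d')}"


if __name__ == "__main__":
    # The @timestamp of the document fetched later in this post.
    print(logstash_index(datetime(2015, 6, 20, 2, 56, 54, tzinfo=timezone.utc)))

    # A morning timestamp in JST (UTC+9) still falls on the previous UTC date.
    jst = timezone(timedelta(hours=9))
    print(logstash_index(datetime(2015, 6, 20, 8, 0, 0, tzinfo=jst)))
```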

Just to be safe, do a dry run first.

fluentd -c test.conf --dry-run

The output looked like the screenshot below. The green is pretty.

[Screenshot: 20150618222142]

  • Start fluentd

fluentd -c test.conf -l debug.log &

Emit logs with apache-loggen

apache-loggen --rate=10 --limit=100 --progress /tmp/dummy_access_log
100[rec] 9.91[rec/s]

Count the documents indexed in Elasticsearch.

$ curl 'localhost:9200/logstash-2015.06.20/_count?pretty=true'
{
  "count" : 101,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}
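
The same check can be scripted. Below is a rough Python sketch using only the standard library; the helper names (`count_url`, `parse_count`, `fetch_count`) are my own, and it assumes Elasticsearch is listening on localhost:9200 as in the config above. The demo at the bottom only parses a copy of the response shown above, so it runs without a live cluster.

```python
import json
from urllib.request import urlopen


def count_url(index, host="localhost", port=9200):
    """Build the _count endpoint URL for an index."""
    return f"http://{host}:{port}/{index}/_count"


def parse_count(body):
    """Extract the document count from a _count response body."""
    return json.loads(body)["count"]


def fetch_count(index, host="localhost", port=9200, timeout=5):
    """Query a running Elasticsearch and return the document count."""
    with urlopen(count_url(index, host, port), timeout=timeout) as resp:
        return parse_count(resp.read().decode("utf-8"))


# The response shown above, condensed to one line.
SAMPLE_BODY = '{"count": 101, "_shards": {"total": 5, "successful": 5, "failed": 0}}'

if __name__ == "__main__":
    print(count_url("logstash-2015.06.20"))
    print(parse_count(SAMPLE_BODY))
```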

Fetch a single document.

$ curl 'localhost:9200/logstash-2015.06.20/apache/_search?size=1&pretty=true'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 101,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "logstash-2015.06.20",
      "_type" : "apache",
      "_id" : "AU4O5dX5wPtNHwv9m4wo",
      "_score" : 1.0,
      "_source":{"host":"108.201.189.227","user":"-","method":"GET","path":"/category/office?from=10","code":"200","size":"78","referer":"/search/?c=Office+Jewelry","agent":"Mozilla/5.0 (Windows NT 6.0; rv:10.0.1) Gecko/20100101 Firefox/10.0.1","@log_name":"dummy.apache.log","@timestamp":"2015-06-20T02:56:54+00:00"}
    } ]
  }
}
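
Note that in this response code and size came back as strings ("200", "78"), whereas the --json output of apache-loggen had them as numbers. If you post-process search results, you may want to convert them. Here is a minimal Python sketch; `first_source` and `coerce_numeric` are hypothetical helpers of mine, and the sample body is an abbreviated copy of the response above.

```python
import json


def first_source(body):
    """Return the _source of the first hit in a search response, or None."""
    hits = json.loads(body)["hits"]["hits"]
    return hits[0]["_source"] if hits else None


def coerce_numeric(source, fields=("code", "size")):
    """Return a copy of the document with digit-only string fields as ints."""
    out = dict(source)
    for field in fields:
        value = out.get(field)
        if isinstance(value, str) and value.isdigit():
            out[field] = int(value)
    return out


# Abbreviated copy of the search response shown above.
SAMPLE_RESPONSE = """
{"hits": {"total": 101, "hits": [
  {"_index": "logstash-2015.06.20", "_type": "apache",
   "_source": {"host": "108.201.189.227", "method": "GET",
               "path": "/category/office?from=10",
               "code": "200", "size": "78"}}
]}}
"""

if __name__ == "__main__":
    print(coerce_numeric(first_source(SAMPLE_RESPONSE)))
```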

Nice, the documents have been indexed.

Original article:

apache-loggen を使って Apache アクセスログのダミーログを生成する ("Generating dummy Apache access logs with apache-loggen")