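The official Datadog blog has published a post on managing Datadog with Terraform (probably published on 2017/04/07).

The Datadog provider has apparently been available for quite some time, but I had never used it, so I try it out here, largely following the Datadog blog post.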

Datadog Provider

See the official Terraform documentation for the Datadog provider.

At the moment it can manage the following four resources.

  • Downtime
  • Monitor
  • Timeboard
  • User

Datadog API key setup

This assumes Terraform is already installed. The version used here is Terraform v0.9.2.

tfvars

Create a tfvars file containing the Datadog API key and application key.

$ cat terraform.tfvars
datadog_api_key="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
datadog_app_key="YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY"

tf

Read the API key and application key from tfvars. I created this file as main.tf.

$ cat main.tf
# Variables
variable "datadog_api_key" {}
variable "datadog_app_key" {}

# Configure the Datadog provider
provider "datadog" {
  api_key = "${var.datadog_api_key}"
  app_key = "${var.datadog_app_key}"
}
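
As an alternative to tfvars, the provider documentation describes reading the keys from the DATADOG_API_KEY and DATADOG_APP_KEY environment variables. A minimal sketch, assuming both variables are exported in the shell before running Terraform:

```hcl
# Configure the Datadog provider
# Assumption: DATADOG_API_KEY and DATADOG_APP_KEY are exported in the shell,
# so no key material is written to .tf or .tfvars files.
provider "datadog" {}
```

This keeps the keys out of files that might end up in version control.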

plan

After configuring the keys, running plan with no resources defined gives the following result. If an error is printed, something in the configuration is wrong.

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, Terraform
doesn't need to do anything.

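Example run (Monitor)

This section manages a monitor definition.

Running without parameters

Running plan with no parameters specified fails with errors about missing required parameters.

$ cat monitor.tf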
# Monitors
resource "datadog_monitor" "cpumonitor" {
}
$ terraform plan
4 error(s) occurred:

* datadog_monitor.cpumonitor: "message": required field is not set
* datadog_monitor.cpumonitor: "name": required field is not set
* datadog_monitor.cpumonitor: "query": required field is not set
* datadog_monitor.cpumonitor: "type": required field is not set

Running with the required parameters

Specify the required parameters name, type, message, and query, then run plan.

$ cat monitor.tf
# Monitors
resource "datadog_monitor" "cpumonitor" {
  name = "cpu monitor"
  type = "metric alert"
  message = "CPU usage alert"
  query = "avg(last_1m):avg:system.cpu.system{*} by {host} > 60"
}
$ terraform plan
・・・
+ datadog_monitor.cpumonitor
    include_tags:        "true"
    message:             "CPU usage alert"
    name:                "cpu monitor"
    new_host_delay:      ""
    notify_no_data:      "false"
    query:               "avg(last_1m):avg:system.cpu.system{*} by {host} > 60"
    require_full_window: "true"
    type:                "metric alert"


Plan: 1 to add, 0 to change, 0 to destroy.

The plan looks fine, so run apply.

$ terraform apply
datadog_monitor.cpumonitor: Creating...
  include_tags:        "" => "true"
  message:             "" => "CPU usage alert"
  name:                "" => "cpu monitor"
  new_host_delay:      "" => ""
  notify_no_data:      "" => "false"
  query:               "" => "avg(last_1m):avg:system.cpu.system{*} by {host} > 60"
  require_full_window: "" => "true"
  type:                "" => "metric alert"
datadog_monitor.cpumonitor: Creation complete (ID: XXXX732)

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Result

Unless the resource was imported beforehand, it is created as a new one.

$ terraform show
datadog_monitor.cpumonitor:
  id = XXXX732
  include_tags = true
  message = CPU usage alert
  name = cpu monitor
  notify_no_data = false
  query = avg(last_1m):avg:system.cpu.system{*} by {host} > 60
  require_full_window = true
  type = metric alert
$ cat terraform.tfstate
{
    "version": 3,
    "terraform_version": "0.9.2",
    "serial": 0,
    "lineage": "f751ef78-ced3-4035-896b-aa0008b760e3",
    "modules": [
        {
            "path": [
                "root"
            ],
            "outputs": {},
            "resources": {
                "datadog_monitor.cpumonitor": {
                    "type": "datadog_monitor",
                    "depends_on": [],
                    "primary": {
                        "id": "XXXX732",
                        "attributes": {
                            "id": "XXXX732",
                            "include_tags": "true",
                            "message": "CPU usage alert",
                            "name": "cpu monitor",
                            "notify_no_data": "false",
                            "query": "avg(last_1m):avg:system.cpu.system{*} by {host} \u003e 60",
                            "require_full_window": "true",
                            "type": "metric alert"
                        },
                        "meta": {},
                        "tainted": false
                    },
                    "deposed": [],
                    "provider": ""
                }
            },
            "depends_on": []
        }
    ]
}

Checking with no changes

With nothing changed, confirm that no update is triggered.

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

datadog_monitor.cpumonitor: Refreshing state... (ID: XXXX732)
No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, Terraform
doesn't need to do anything.
$ terraform apply
datadog_monitor.cpumonitor: Refreshing state... (ID: XXXX732)

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Update

Add thresholds and apply the update.

$ cat monitor.tf
# Monitors
resource "datadog_monitor" "cpumonitor" {
  name = "cpu monitor"
  type = "metric alert"
  message = "CPU usage alert"
  query = "avg(last_1m):avg:system.cpu.system{*} by {host} > 60"
  thresholds {
    ok = 20
    warning = 50
    critical = 60
  }
}
$ terraform plan
・・・
datadog_monitor.cpumonitor: Refreshing state... (ID: XXXX732)
・・・

~ datadog_monitor.cpumonitor
    thresholds.%:        "0" => "3"
    thresholds.critical: "" => "60"
    thresholds.ok:       "" => "20"
    thresholds.warning:  "" => "50"


Plan: 0 to add, 1 to change, 0 to destroy.
$ terraform apply
datadog_monitor.cpumonitor: Refreshing state... (ID: XXXX732)
datadog_monitor.cpumonitor: Modifying... (ID: XXXX732)
  thresholds.%:        "0" => "3"
  thresholds.critical: "" => "60"
  thresholds.ok:       "" => "20"
  thresholds.warning:  "" => "50"
datadog_monitor.cpumonitor: Modifications complete (ID: XXXX732)

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
・・・

Update result

You can confirm that the thresholds have been set.

$ terraform show
datadog_monitor.cpumonitor:
  id = XXXX732
  escalation_message =
  include_tags = true
  locked = false
  message = CPU usage alert
  name = cpu monitor
  new_host_delay = 300
  no_data_timeframe = 0
  notify_audit = false
  notify_no_data = false
  query = avg(last_1m):avg:system.cpu.system{*} by {host} > 60
  renotify_interval = 0
  require_full_window = true
  silenced.% = 0
  tags.# = 0
  thresholds.% = 3
  thresholds.critical = 60.0
  thresholds.ok = 20.0
  thresholds.warning = 50.0
  timeout_h = 0
  type = metric alert

The show output also lists values for parameters that were not specified.
Note that these default values are set by the Terraform provider, not by the Datadog API.
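
To avoid depending on provider-side defaults, the same values can be written out in the configuration. A minimal sketch, using only attribute names and values that appear in the show output above; which of them are worth pinning is a judgment call:

```hcl
# Monitors
resource "datadog_monitor" "cpumonitor" {
  name    = "cpu monitor"
  type    = "metric alert"
  message = "CPU usage alert"
  query   = "avg(last_1m):avg:system.cpu.system{*} by {host} > 60"

  thresholds {
    ok       = 20
    warning  = 50
    critical = 60
  }

  # The values below mirror what `terraform show` reported as defaults.
  # Writing them explicitly is an illustration, not a requirement.
  notify_no_data      = false
  new_host_delay      = 300
  require_full_window = true
  include_tags        = true
  renotify_interval   = 0
  timeout_h           = 0
}
```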

Delete

Destroy the configuration managed by Terraform.

$ terraform destroy
Do you really want to destroy?
  Terraform will delete all your managed infrastructure.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

datadog_monitor.cpumonitor: Refreshing state... (ID: XXXX732)
datadog_monitor.cpumonitor: Destroying... (ID: XXXX732)
datadog_monitor.cpumonitor: Destruction complete

Destroy complete! Resources: 1 destroyed.

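Creating a Monitor together with launching an AWS EC2 instance

As an example of combining the Datadog provider with another provider, the blog post included an integration with an EC2 instance.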

ec2.tf

$ cat ec2.tf
# Configure the AWS Provider
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "ap-northeast-1"
}

resource "aws_instance" "base" {
  ami = "ami-859bbfe2" # Amazon Linux AMI 2017.03.0 (HVM), SSD Volume Type
  instance_type = "t2.micro"
}

resource "datadog_monitor" "cpumonitor" {
  name = "cpu monitor ${aws_instance.base.id}"
  type = "metric alert"
  message = "CPU usage alert"
  query = "avg(last_1m):avg:system.cpu.system{host:${aws_instance.base.id}} by {host} > 10"
  new_host_delay = 30
}
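
The ec2.tf above references var.aws_access_key and var.aws_secret_key, which are not declared in any of the listings shown. A minimal sketch of the missing declarations, assuming they are added to main.tf next to the Datadog variables:

```hcl
# Assumed additions to main.tf: declare the variables that ec2.tf references
variable "aws_access_key" {}
variable "aws_secret_key" {}
```

The corresponding values would then go into terraform.tfvars in the same way as datadog_api_key and datadog_app_key.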

plan

$ terraform plan
・・・
+ aws_instance.base
    ami:                         "ami-859bbfe2"
    associate_public_ip_address: ""
    availability_zone:           ""
    ebs_block_device.#:          ""
    ephemeral_block_device.#:    ""
    instance_state:              ""
    instance_type:               "t2.micro"
    ipv6_addresses.#:            ""
    key_name:                    ""
    network_interface_id:        ""
    placement_group:             ""
    private_dns:                 ""
    private_ip:                  ""
    public_dns:                  ""
    public_ip:                   ""
    root_block_device.#:         ""
    security_groups.#:           ""
    source_dest_check:           "true"
    subnet_id:                   ""
    tenancy:                     ""
    vpc_security_group_ids.#:    ""

+ datadog_monitor.cpumonitor
    include_tags:        "true"
    message:             "CPU usage alert"
    name:                "cpu monitor ${aws_instance.base.id}"
    new_host_delay:      "30"
    notify_no_data:      "false"
    query:               "avg(last_1m):avg:system.cpu.system{host:${aws_instance.base.id}} by {host} > 10"
    require_full_window: "true"
    type:                "metric alert"


Plan: 2 to add, 0 to change, 0 to destroy.

apply

$ terraform apply
aws_instance.base: Creating...
  ami:                         "" => "ami-859bbfe2"
  associate_public_ip_address: "" => ""
  availability_zone:           "" => ""
  ebs_block_device.#:          "" => ""
  ephemeral_block_device.#:    "" => ""
  instance_state:              "" => ""
  instance_type:               "" => "t2.micro"
  ipv6_addresses.#:            "" => ""
  key_name:                    "" => ""
  network_interface_id:        "" => ""
  placement_group:             "" => ""
  private_dns:                 "" => ""
  private_ip:                  "" => ""
  public_dns:                  "" => ""
  public_ip:                   "" => ""
  root_block_device.#:         "" => ""
  security_groups.#:           "" => ""
  source_dest_check:           "" => "true"
  subnet_id:                   "" => ""
  tenancy:                     "" => ""
  vpc_security_group_ids.#:    "" => ""
aws_instance.base: Still creating... (10s elapsed)
aws_instance.base: Still creating... (20s elapsed)
aws_instance.base: Creation complete (ID: i-0XXXXXXXXXXX6f52e)
datadog_monitor.cpumonitor: Creating...
  include_tags:        "" => "true"
  message:             "" => "CPU usage alert"
  name:                "" => "cpu monitor i-0XXXXXXXXXXX6f52e"
  new_host_delay:      "" => "30"
  notify_no_data:      "" => "false"
  query:               "" => "avg(last_1m):avg:system.cpu.system{host:i-0XXXXXXXXXXX6f52e} by {host} > 10"
  require_full_window: "" => "true"
  type:                "" => "metric alert"
datadog_monitor.cpumonitor: Creation complete (ID: XXXX862)

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
・・・
  • EC2 instance launched
|        AZ        |      InstanceId       | InstanceType  |  State   |
+------------------+-----------------------+---------------+----------+
|  ap-northeast-1a |  i-0XXXXXXXXXXX6f52e  |  t2.micro     |  running |
  • Monitor created with the instance ID specified

Making a manual update in the web UI

First confirm that there are no pending changes.

$ terraform plan
・・・
aws_instance.base: Refreshing state... (ID: i-0XXXXXXXXXXX6f52e)
datadog_monitor.cpumonitor: Refreshing state... (ID: XXXX862)
No changes. Infrastructure is up-to-date.
・・・

Update the Monitor in the Datadog web UI.

Run plan again.

$ terraform plan -target datadog_monitor.cpumonitor
・・・
aws_instance.base: Refreshing state... (ID: i-0XXXXXXXXXXX6f52e)
datadog_monitor.cpumonitor: Refreshing state... (ID: XXXX862)
・・・
~ datadog_monitor.cpumonitor
    name:                "cpu monitor terraform-dd-test" => "cpu monitor i-0XXXXXXXXXXX6f52e"
    no_data_timeframe:   "2" => "0"
    thresholds.%:        "1" => "0"
    thresholds.critical: "10.0" => ""


Plan: 0 to add, 1 to change, 0 to destroy.

Parameters that were not touched in the web UI are now also detected as changes. This is not specific to the Datadog provider, but watch out for unintended updates.
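
One way to reduce this kind of drift might be to declare explicitly the values the web UI saved, so that the configuration and the remote monitor agree. This is a sketch only, assuming the values from the plan output above (critical threshold 10.0, no_data_timeframe 2) are the ones to keep; it is untested against the provider version used here and may not remove every diff:

```hcl
resource "datadog_monitor" "cpumonitor" {
  name           = "cpu monitor ${aws_instance.base.id}"
  type           = "metric alert"
  message        = "CPU usage alert"
  query          = "avg(last_1m):avg:system.cpu.system{host:${aws_instance.base.id}} by {host} > 10"
  new_host_delay = 30

  # Assumption: match what the web UI stored, as seen on the left-hand
  # side of the plan diff (thresholds.critical "10.0", no_data_timeframe "2").
  thresholds {
    critical = 10
  }
  no_data_timeframe = 2
}
```

The name is left as defined in the configuration, so the rename made in the web UI would still show up in plan as a change.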

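Summary

Datadog monitoring settings can be configured alongside host management in Terraform. Resources on the Datadog side are addressed by ID, so managing them does not affect other settings, and I found it convenient to use.
It looks useful, for example, for managing and updating monitors for instances that run only for a limited period, without affecting other settings.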

Original article:

Trying out the Terraform Datadog Provider (Terraform Datadog Provider を試してみる)