EKS CloudWatch Agent Module

This Terraform Module installs and configures the Amazon CloudWatch agent on an EKS cluster so that every worker node runs an agent that collects additional system-level metrics from the underlying Amazon EC2 instances and ships them to Amazon CloudWatch. With this data, CloudWatch Container Insights provides a single pane of glass across application, performance, host, control plane, and data plane insights.

This module deploys the community aws-cloudwatch-metrics Helm chart with a set of best-practice inputs.

This module sets up the CloudWatch agent for EKS clusters whose worker nodes (self-managed or managed node groups) support DaemonSets. CloudWatch Container Insights is not supported on EKS Fargate.
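A minimal sketch of calling the module with only its two required inputs. It assumes the module lives at a hypothetical local path `./modules/eks-cloudwatch-agent` and that the cluster name arrives through a hypothetical `eks_cluster_name` variable; IAM Roles for Service Accounts is disabled by passing null, and every optional input falls back to its default.

```hcl
# Hypothetical input; in practice this usually comes from the output of the
# module that created the EKS cluster.
variable "eks_cluster_name" {
  type = string
}

module "eks_cloudwatch_agent" {
  # Hypothetical source path; point this at wherever the module actually lives.
  source = "./modules/eks-cloudwatch-agent"

  # Required: the cluster to deploy the cloudwatch-agent DaemonSet into.
  eks_cluster_name = var.eks_cluster_name

  # Required, but may be null: do not use IAM Roles for Service Accounts.
  iam_role_for_service_accounts_config = null
}
```

With IRSA disabled, the agent pods presumably rely on whatever permissions the worker nodes' own IAM role grants, so check that role before relying on this minimal form.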

How does this work?

CloudWatch automatically collects metrics for many resources, such as CPU, memory, disk, and network. Container Insights also provides diagnostic information, such as container restart failures, to help you isolate issues and resolve them quickly.

In Amazon EKS and Kubernetes, using Container Insights requires using a containerized version of the CloudWatch agent to discover all of the running containers in a cluster. It collects performance data at every layer of the performance stack as log events using embedded metric format. From this data, CloudWatch creates aggregated metrics at the cluster, node, pod, task, and service level as CloudWatch metrics. The metrics that Container Insights collects are available in CloudWatch automatic dashboards, and also viewable in the Metrics section of the CloudWatch console.

The cloudwatch-agent is installed as a Kubernetes DaemonSet, which runs one cloudwatch-agent Pod on each eligible node (subject to the affinity and toleration settings below). This ensures that every worker in the cluster ships its metric data to CloudWatch.

Note that metrics collected by CloudWatch Agent are charged as custom metrics. For more information about CloudWatch pricing, see Amazon CloudWatch Pricing.

You can read more about cloudwatch-agent in the GitHub repository. You can also learn more about Container Insights in the official AWS docs.

Reference


Required

eks_cluster_name (string, required)

Name of the EKS cluster to deploy the resources into.

iam_role_for_service_accounts_config (object(…), required)

Configuration for using the IAM Roles for Service Accounts (IRSA) feature to grant permissions to the Helm chart's pods. This expects a map with two properties: openid_connect_provider_arn and openid_connect_provider_url. openid_connect_provider_arn is the ARN of the EKS OpenID Connect Provider used to retrieve IAM credentials, and openid_connect_provider_url is that provider's URL. Set to null if you do not want to use IAM Roles for Service Accounts.

Type Details

object({
  openid_connect_provider_arn = string
  openid_connect_provider_url = string
})
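A sketch of enabling IRSA, assuming hypothetical variables `oidc_provider_arn` and `oidc_provider_url` carry the cluster's OpenID Connect provider details and reusing a hypothetical module source path. `iam_role_name_prefix` is set too, since it is recommended whenever this input is configured; the prefix value is illustrative. The exact URL format the module expects (for example, with or without an `https://` scheme) is not stated here, so confirm it in `variables.tf`.

```hcl
# Hypothetical inputs describing the EKS cluster and its OIDC provider.
variable "eks_cluster_name" {
  type = string
}

variable "oidc_provider_arn" {
  type = string
}

variable "oidc_provider_url" {
  type = string
}

module "eks_cloudwatch_agent" {
  # Hypothetical source path.
  source = "./modules/eks-cloudwatch-agent"

  eks_cluster_name = var.eks_cluster_name

  # Both keys are required by the object type of this input.
  iam_role_for_service_accounts_config = {
    openid_connect_provider_arn = var.oidc_provider_arn
    openid_connect_provider_url = var.oidc_provider_url
  }

  # Illustrative prefix used when naming the service account's IAM roles.
  iam_role_name_prefix = "my-eks-cluster"
}
```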

Optional

aws_cloudwatch_agent_image_repository (string, optional)

The container repository from which to pull the cloudwatch-agent container image when deploying the pods. When null, the chart's default repository is used.

Default: null

aws_cloudwatch_agent_version (string, optional)

The version of amazon/cloudwatch-agent to install. When null, the chart's default version is used.

Default: null

aws_cloudwatch_metrics_chart_version (string, optional)

The version of the aws-cloudwatch-metrics Helm chart to deploy. This is distinct from the app/container version, which is controlled by aws_cloudwatch_agent_version.

Default: "0.0.7"
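Because the chart version and the agent version are independent, a caller can pin both. In this sketch the chart version repeats the documented default, and `cloudwatch_agent_version` is a hypothetical variable holding an amazon/cloudwatch-agent image tag (no specific tag is implied); the module source path is also hypothetical.

```hcl
variable "eks_cluster_name" {
  type = string
}

# Hypothetical variable holding the amazon/cloudwatch-agent version to run.
variable "cloudwatch_agent_version" {
  type = string
}

module "eks_cloudwatch_agent" {
  # Hypothetical source path.
  source = "./modules/eks-cloudwatch-agent"

  eks_cluster_name                     = var.eks_cluster_name
  iam_role_for_service_accounts_config = null

  # Version of the aws-cloudwatch-metrics Helm chart (the documented default).
  aws_cloudwatch_metrics_chart_version = "0.0.7"

  # Version of the agent container itself, set independently of the chart.
  aws_cloudwatch_agent_version = var.cloudwatch_agent_version
}
```

Pinning both values makes chart upgrades and agent upgrades separate, explicit changes rather than side effects of each other.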

dependencies (list(string), optional)

Creates a dependency between the resources in this module and the interpolated values in this list (and thus their source resources). The resources backing the values in this list are created before the resources in this module, and the resources in this module are destroyed before the resources in the list.

Default: []
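A sketch of using dependencies so the agent is installed only after a managed node group exists. It assumes a hypothetical `aws_eks_node_group.workers` resource and an `eks_cluster_name` variable declared elsewhere in the same configuration, plus a hypothetical module source path. Interpolating the node group's `arn` attribute into the list gives Terraform the ordering described above, which suits a DaemonSet whose Pods can only run once worker nodes exist.

```hcl
module "eks_cloudwatch_agent" {
  # Hypothetical source path.
  source = "./modules/eks-cloudwatch-agent"

  eks_cluster_name                     = var.eks_cluster_name
  iam_role_for_service_accounts_config = null

  # aws_eks_node_group.workers is a hypothetical resource defined elsewhere.
  # Referencing its ARN creates the node group first and destroys this
  # module's resources first on teardown.
  dependencies = [aws_eks_node_group.workers.arn]
}
```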

iam_role_name_prefix (string, optional)

Prefix used to name the IAM roles for the service account. Recommended when iam_role_for_service_accounts_config is set.

Default: null

namespace (string, optional)

Namespace to create the resources in.

Default: "kube-system"

pod_node_affinity (list(object(…)), optional)

Affinity rules that control which nodes the Pod can be scheduled on. Each item in the list should be a map with the keys key, values, and operator, corresponding to the three properties of a matchExpressions entry. All expressions must be satisfied for the Pod to be scheduled on a node.

Type Details

list(object({
  key      = string
  values   = list(string)
  operator = string
}))

Default: []

Details

   Each item in the list represents a matchExpression for requiredDuringSchedulingIgnoredDuringExecution.
   See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/affinity-and-anti-affinity for the various
   configuration options.
  
   Example:
  
   [
     {
       "key" = "node-label-key"
       "values" = ["node-label-value", "another-node-label-value"]
       "operator" = "In"
     }
   ]
  
   Translates to:
  
   nodeAffinity:
     requiredDuringSchedulingIgnoredDuringExecution:
       nodeSelectorTerms:
       - matchExpressions:
         - key: node-label-key
           operator: In
           values:
           - node-label-value
           - another-node-label-value

pod_resources (any, optional)

Specify the resource limits and requests for the cloudwatch-agent pods. Set to null (default) to use chart defaults.

Type Details

Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.

Default: null

Details

   This object is passed through to the resources section of a pod spec, as described in
   https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
  
   Example:
  
   {
     requests = {
       cpu    = "250m"
       memory = "128Mi"
     }
     limits = {
       cpu    = "500m"
       memory = "256Mi"
     }
   }

pod_tolerations (any, optional)

Toleration rules that allow the Pod to be scheduled on tainted nodes. Each item in the list specifies one toleration rule.

Type Details

Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.

Default: []

Details

   Each item in the list represents a particular toleration. See
   https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/ for the various rules you can specify.
  
   Example:
  
   [
     {
       key = "node.kubernetes.io/unreachable"
       operator = "Exists"
       effect = "NoExecute"
       tolerationSeconds = 6000
     }
   ]
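Since the agent runs as a DaemonSet, a tainted node only receives an agent Pod if the Pod tolerates that node's taints. A common Kubernetes pattern, sketched below, is a single toleration with operator Exists and no key, which matches every taint. This assumes the module passes each map through to the pod spec unchanged (as the any type suggests), that an `eks_cluster_name` variable is declared elsewhere, and that the module source path, which is hypothetical, is adjusted to your layout.

```hcl
module "eks_cloudwatch_agent" {
  # Hypothetical source path.
  source = "./modules/eks-cloudwatch-agent"

  eks_cluster_name                     = var.eks_cluster_name
  iam_role_for_service_accounts_config = null

  # An empty key with operator "Exists" tolerates every taint, so the
  # DaemonSet can place an agent Pod on every node, including tainted ones.
  pod_tolerations = [
    {
      operator = "Exists"
    }
  ]
}
```

The trade-off is that the agent also lands on nodes tainted for dedicated workloads; list specific keys instead if that is undesirable.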
