EKS CloudWatch Agent Module
This Terraform module installs and configures the Amazon CloudWatch agent on an EKS cluster so that each node runs the agent, collects additional system-level metrics from the underlying Amazon EC2 instances, and ships them to Amazon CloudWatch. This extra metric data enables CloudWatch Container Insights, which provides a single pane of glass for application, performance, host, control plane, and data plane insights.
This module uses the community Helm chart with a set of best-practice inputs.
This module is for setting up the CloudWatch agent on EKS clusters with worker nodes (self-managed or managed node groups) that have support for DaemonSets. CloudWatch Container Insights is not supported for EKS Fargate.
How does this work?
CloudWatch automatically collects metrics for many resources, such as CPU, memory, disk, and network. Container Insights also provides diagnostic information, such as container restart failures, to help you isolate issues and resolve them quickly.
In Amazon EKS and Kubernetes, using Container Insights requires using a containerized version of the CloudWatch agent to discover all of the running containers in a cluster. It collects performance data at every layer of the performance stack as log events using embedded metric format. From this data, CloudWatch creates aggregated metrics at the cluster, node, pod, task, and service level as CloudWatch metrics. The metrics that Container Insights collects are available in CloudWatch automatic dashboards, and also viewable in the Metrics section of the CloudWatch console.
cloudwatch-agent is installed as a Kubernetes DaemonSet, which ensures that there is one cloudwatch-agent Pod running per node. In this way, we are able to ensure that all workers in the cluster are running the cloudwatch-agent service for shipping the metric data into CloudWatch.
Note that metrics collected by CloudWatch Agent are charged as custom metrics. For more information about CloudWatch pricing, see Amazon CloudWatch Pricing.
You can read more about cloudwatch-agent in the GitHub repository.
You can also learn more about Container Insights in the official AWS docs.
Reference
- Inputs
- Outputs
Required
eks_cluster_name
string
Name of the EKS cluster where resources are deployed.
iam_role_for_service_accounts_config
object(…)
Configuration for using the IAM role with Service Accounts feature to provide permissions to the Helm charts. This expects a map with two properties: openid_connect_provider_arn and openid_connect_provider_url. The openid_connect_provider_arn is the ARN of the OpenID Connect Provider for EKS to retrieve IAM credentials, while openid_connect_provider_url is the URL. Set to null if you do not wish to use IAM role with Service Accounts.
object({
  openid_connect_provider_arn = string
  openid_connect_provider_url = string
})
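A minimal sketch of wiring the two required inputs into a module call might look like the following. The module source path and the variables oidc_provider_arn and oidc_provider_url are hypothetical names introduced for illustration; in practice the ARN and URL typically come from whatever created the cluster's OpenID Connect provider.

```hcl
# Hypothetical input variables; not part of this module's interface.
variable "eks_cluster_name" {
  type = string
}

variable "oidc_provider_arn" {
  type = string
}

variable "oidc_provider_url" {
  type = string
}

module "cloudwatch_agent" {
  # Hypothetical path; replace with the real source for this module.
  source = "./modules/eks-cloudwatch-agent"

  eks_cluster_name = var.eks_cluster_name

  # Both properties are required by the object type shown above.
  # Pass null for the whole object to skip IAM roles for Service Accounts.
  iam_role_for_service_accounts_config = {
    openid_connect_provider_arn = var.oidc_provider_arn
    openid_connect_provider_url = var.oidc_provider_url
  }
}
```

All other inputs are optional and fall back to the defaults listed in the Optional section.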
Optional
The container repository to use for looking up the cloudwatch-agent container image when deploying the pods. When null, uses the default repository set in the chart.
null
Which version of amazon/cloudwatch-agent to install. When null, uses the default version set in the chart.
null
The version of the aws-cloudwatch-metrics Helm chart to deploy. Note that this is different from the app/container version (use aws_cloudwatch_agent_version to control the app/container version).
"0.0.7"
dependencies
list(string)
Create a dependency between the resources in this module and the interpolated values in this list (and thus the source resources). In other words, the resources in this module depend on the resources backing the values in this list: those resources must be created before the resources in this module, and the resources in this module must be destroyed before the resources in the list.
[]
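As a rough illustration of the ordering described above, the sketch below passes an attribute of a node group into dependencies so that the agent is installed only after the workers exist. The module source path and the aws_eks_node_group.workers resource are hypothetical and assumed to be defined elsewhere in the same configuration.

```hcl
module "cloudwatch_agent" {
  # Hypothetical path; replace with the real source for this module.
  source = "./modules/eks-cloudwatch-agent"

  eks_cluster_name                     = "example-cluster" # placeholder name
  iam_role_for_service_accounts_config = null              # IRSA disabled for brevity

  # Referencing an attribute of another resource makes Terraform create that
  # resource first and destroy it last relative to this module's resources.
  # aws_eks_node_group.workers is assumed to exist elsewhere in the configuration.
  dependencies = [aws_eks_node_group.workers.id]
}
```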
iam_role_name_prefix
string
Used to name IAM roles for the service account. Recommended when iam_role_for_service_accounts_config is configured.
null
namespace
string
Namespace to create the resources in.
"kube-system"
pod_node_affinity
list(object(…))
Configure affinity rules for the Pod to control which nodes to schedule on. Each item in the list should be a map with the keys key, values, and operator, corresponding to the 3 properties of matchExpressions. Note that all expressions must be satisfied to schedule on the node.
list(object({
  key      = string
  values   = list(string)
  operator = string
}))
[]
Details
Each item in the list represents a matchExpression for requiredDuringSchedulingIgnoredDuringExecution. See https://kubernetes.io/docs/concepts/configuration/assign-pod-node/affinity-and-anti-affinity for the various configuration options.
Example:
[
  {
    "key"      = "node-label-key"
    "values"   = ["node-label-value", "another-node-label-value"]
    "operator" = "In"
  }
]
Translates to:
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: node-label-key
        operator: In
        values:
        - node-label-value
        - another-node-label-value
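For context, the same input can be set directly in a module call. The sketch below assumes a hypothetical local module path and uses the well-known kubernetes.io/os node label to keep the agent on Linux nodes; the cluster name is a placeholder.

```hcl
module "cloudwatch_agent" {
  # Hypothetical path; replace with the real source for this module.
  source = "./modules/eks-cloudwatch-agent"

  eks_cluster_name                     = "example-cluster" # placeholder name
  iam_role_for_service_accounts_config = null              # IRSA disabled for brevity

  # Every expression in the list must match for a node to be eligible.
  pod_node_affinity = [
    {
      key      = "kubernetes.io/os"
      values   = ["linux"]
      operator = "In"
    },
  ]
}
```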
Specify the resource limits and requests for the cloudwatch-agent pods. Set to null (default) to use chart defaults.
Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
null
Details
This object is passed through to the resources section of a pod spec as described in https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/.
Example:
{
  requests = {
    cpu    = "250m"
    memory = "128Mi"
  }
  limits = {
    cpu    = "500m"
    memory = "256Mi"
  }
}
Configure toleration rules to allow the Pod to schedule on nodes that have been tainted. Each item in the list specifies a toleration rule.
Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
[]
Details
Each item in the list represents a particular toleration. See https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/ for the various rules you can specify.
Example:
[
  {
    key               = "node.kubernetes.io/unreachable"
    operator          = "Exists"
    effect            = "NoExecute"
    tolerationSeconds = 6000
  }
]