Amazon Elastic Kubernetes Service Monitoring Integration

Amazon Elastic Kubernetes Service (Amazon EKS) enables you to easily deploy, manage, and scale containerized applications using Kubernetes on AWS. With Kubernetes, you can automate the deployment, scaling, and management of containerized applications at scale.

With Site24x7's integration, you can monitor Amazon EKS at the cluster, node, and namespace levels to achieve full-stack visibility into your EKS environment.

Setup and Configuration

1. If you haven't already, enable access to your AWS resources between your AWS account and Site24x7's AWS account by either:

  • Creating Site24x7 as an IAM user.
  • Creating a cross-account IAM role.

2. On the Integrate AWS Account page, select the check box next to Amazon EKS.

Prerequisite

  • Install Container Insights on your Amazon EKS cluster.

Policy and Permissions

Site24x7 uses various Amazon EKS APIs to collect information about your clusters. Assign the AWS managed policy ReadOnlyAccess to the Site24x7 entity (IAM user or IAM role) so that Site24x7 can collect metrics and metadata. If you prefer to assign a custom policy, make sure the following read-level actions are present in the policy JSON:

  • eks:DescribeCluster
  • eks:ListClusters
  • cloudwatch:ListMetrics
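
As an illustration of the custom policy option, the following Python sketch assembles a policy document containing only the three actions listed above. The statement ID and the wildcard resource are assumptions made for this example and are not prescribed by Site24x7; adjust them to your own security requirements.

```python
import json

# Read-level actions listed in this document for a custom policy.
REQUIRED_ACTIONS = [
    "eks:DescribeCluster",
    "eks:ListClusters",
    "cloudwatch:ListMetrics",
]


def build_policy(actions=REQUIRED_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7EksReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # assumption: access is not scoped to specific clusters
            }
        ],
    }


# Serialize the policy so it can be pasted into the IAM policy JSON editor.
policy_json = json.dumps(build_policy(), indent=2)
print(policy_json)
```

The printed JSON can be used as a starting point for a custom policy attached to the Site24x7 IAM user or role.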

Polling Frequency

Site24x7 collects metric data on your clusters, namespaces, and nodes at the configured poll frequency, which can range from one minute to one day.

Cluster-level metrics

CloudWatch metric | Description | Statistic | Data Type
cluster_failed_node_count | Number of failed nodes in a cluster | Maximum | Nodes
cluster_node_count | Total nodes in a cluster | Maximum | Nodes
namespace_number_of_running_pods | Number of pods running in namespaces | Maximum | Pods
service_number_of_running_pods | Number of pods running in services | Maximum | Pods
node_number_of_running_pods | Number of pods running in nodes | Maximum | Pods
node_number_of_running_containers | Number of containers running in nodes | Maximum | Containers
node_cpu_usage_total | CPU used by all nodes | Maximum | Units
node_cpu_limit | CPU assigned to nodes | Maximum | Units
node_cpu_reserved_capacity | CPU reserved for nodes | Average | Percentage
node_cpu_utilization | CPU used by nodes | Average | Percentage
node_filesystem_utilization | File system capacity on nodes | Average | Percentage
node_memory_limit | Memory assigned to nodes | Maximum | MB
node_memory_working_set | Memory used in working sets of nodes | Average | MB
node_memory_reserved_capacity | Memory reserved for nodes | Average | Percentage
node_memory_utilization | Memory utilized by nodes | Average | Percentage
node_network_total_bytes | Total network traffic in nodes | Sum | MB/sec
pod_cpu_reserved_capacity | CPU reserved for pods | Average | Percentage
pod_cpu_utilization | CPU utilized by pods | Average | Percentage
pod_cpu_utilization_over_pod_limit | CPU utilized over pod limit | Average | Percentage
pod_memory_reserved_capacity | Memory reserved for pods | Average | Percentage
pod_memory_utilization | Memory utilized by pods | Average | Percentage
pod_memory_utilization_over_pod_limit | Memory utilized over pod limit | Average | Percentage
pod_network_rx_bytes | Total bytes received by pods | Sum | MB/sec
pod_network_tx_bytes | Total bytes sent by pods | Sum | MB/sec
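
Container Insights publishes these metrics to CloudWatch under the ContainerInsights namespace, with ClusterName as the dimension for cluster-level values. The sketch below shows how one of the metrics above, paired with its listed statistic, could be expressed as a CloudWatch GetMetricData query. It is illustrative only: how Site24x7 queries CloudWatch internally is not described here, and the cluster name, the 300-second period, and the helper function name are assumptions.

```python
def build_metric_query(metric_name, stat, cluster_name, period_seconds=300):
    """Build one MetricDataQueries entry for a cluster-level Container Insights metric."""
    return {
        # Query IDs must start with a lowercase letter; the metric names already do.
        "Id": metric_name.lower(),
        "MetricStat": {
            "Metric": {
                "Namespace": "ContainerInsights",
                "MetricName": metric_name,
                "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
            },
            "Period": period_seconds,  # assumption: five-minute granularity
            "Stat": stat,
        },
        "ReturnData": True,
    }


# "demo-cluster" is a hypothetical cluster name.
query = build_metric_query("cluster_failed_node_count", "Maximum", "demo-cluster")
print(query)
```

A dictionary of this shape can be passed in the MetricDataQueries list of the boto3 CloudWatch client's get_metric_data call, together with StartTime and EndTime.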

Node-level metrics

CloudWatch metric | Description | Statistic | Data Type
node_number_of_running_pods | Number of pods running in nodes | Maximum | Pods
node_number_of_running_containers | Number of containers running in nodes | Maximum | Containers
node_cpu_reserved_capacity | CPU reserved for nodes | Average | Percentage
node_cpu_utilization | CPU used by nodes | Average | Percentage
node_filesystem_utilization | File system capacity on nodes | Average | Percentage
node_memory_reserved_capacity | Memory reserved for nodes | Average | Percentage
node_memory_utilization | Memory utilized by nodes | Average | Percentage
node_network_total_bytes | Total network traffic in nodes | Sum | MB/sec

Namespace-level metrics

CloudWatch metric | Description | Statistic | Data Type
namespace_number_of_running_pods | Number of pods running in namespaces | Maximum | Pods
pod_cpu_utilization | CPU utilized by pods | Average | Percentage
pod_cpu_utilization_over_pod_limit | CPU utilized over pod limit | Average | Percentage
pod_memory_utilization | Memory utilized by pods | Average | Percentage
pod_memory_utilization_over_pod_limit | Memory utilized over pod limit | Average | Percentage
pod_network_rx_bytes | Total bytes received by pods | Sum | MB/sec
pod_network_tx_bytes | Total bytes sent by pods | Sum | MB/sec

Service-level metrics

CloudWatch metric | Description | Statistic | Data Type
service_number_of_running_pods | Number of pods running in services | Maximum | Pods
pod_cpu_utilization | CPU utilized by pods | Average | Percentage
pod_cpu_utilization_over_pod_limit | CPU utilized over pod limit | Average | Percentage
pod_memory_utilization | Memory utilized by pods | Average | Percentage
pod_memory_utilization_over_pod_limit | Memory utilized over pod limit | Average | Percentage
pod_network_rx_bytes | Total bytes received by pods | Sum | MB/sec
pod_network_tx_bytes | Total bytes sent by pods | Sum | MB/sec

Pod-level metrics

CloudWatch metric | Description | Statistic | Data Type
pod_cpu_reserved_capacity | CPU reserved for pods | Average | Percentage
pod_cpu_utilization | CPU utilized by pods | Average | Percentage
pod_cpu_utilization_over_pod_limit | CPU utilized over pod limit | Average | Percentage
pod_memory_reserved_capacity | Memory reserved for pods | Average | Percentage
pod_memory_utilization | Memory utilized by pods | Average | Percentage
pod_memory_utilization_over_pod_limit | Memory utilized over pod limit | Average | Percentage
pod_network_rx_bytes | Total bytes received by pods | Sum | MB/sec
pod_network_tx_bytes | Total bytes sent by pods | Sum | MB/sec
pod_number_of_container_restarts | Number of container restarts | Maximum | Containers

Threshold Configuration

Go to Admin > Configuration Profiles > Threshold and Availability (+), and choose EKS Cluster, EKS Node, or EKS Namespace as the monitor type. You can set threshold values for all the metrics listed above. For EKS Namespace and EKS Node monitors, you can also use the threshold form to place inactive namespaces and nodes, respectively, under maintenance.

Site24x7's EKS monitoring interface

Summary

Time series charts give you an overview of the events occurring within each resource. The charts plot CPU and memory utilization (in percentage) at the pod and node levels, total bytes sent and received, file system capacity, and the number of running containers and pods. Each chart lists the average, minimum, and maximum values.

Node and Namespace Details

Here you can view a list of the nodes and namespaces associated with your Amazon EKS environment. Click an individual listing to see the performance and resource usage stats for that resource. You can also set thresholds and be notified when any of these resources fail by clicking the pencil icon under Action.

Logs

Collect EKS control plane log entries for the selected log types. The logs are fetched from CloudWatch and categorized by log stream name.
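
To illustrate what categorizing by log stream name can look like, the sketch below groups stream names by control plane log type. It assumes the stream naming that Amazon EKS uses in CloudWatch Logs (for example, streams beginning with kube-apiserver-audit- for audit logs); the example stream IDs and the function names are hypothetical, and this is not a description of Site24x7's internal logic.

```python
# Prefixes are checked in order, so the more specific audit prefix
# must come before the general API server prefix.
STREAM_PREFIXES = [
    ("kube-apiserver-audit-", "audit"),
    ("kube-apiserver-", "api"),
    ("authenticator-", "authenticator"),
    ("kube-controller-manager-", "controllerManager"),
    ("kube-scheduler-", "scheduler"),
]


def log_type_for_stream(stream_name):
    """Return the control plane log type for a stream name, or 'unknown'."""
    for prefix, log_type in STREAM_PREFIXES:
        if stream_name.startswith(prefix):
            return log_type
    return "unknown"


def group_by_log_type(stream_names):
    """Group log stream names under their control plane log type."""
    groups = {}
    for name in stream_names:
        groups.setdefault(log_type_for_stream(name), []).append(name)
    return groups


# Hypothetical stream names for demonstration.
streams = [
    "kube-apiserver-audit-0a1b2c",
    "kube-apiserver-0a1b2c",
    "kube-scheduler-0a1b2c",
]
print(group_by_log_type(streams))
```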

Configuration

The configuration details of an EKS cluster are provided under this tab, including the resource name, endpoint URL, region, status, security groups, subnets, VPC ID, and whether public or private access is enabled.
