Key AWS Elastic Load Balancer metrics you need to monitor

02-Jan-2017 11:49 AM by Lakshmi Narayan J

A month back, we announced public beta monitoring for two new AWS services: the Classic Elastic Load Balancer (ELB) and Simple Notification Service (SNS). The release included dashboards, alerting, and performance reports for some of the most important metrics. This post explains why continuous monitoring of an Elastic Load Balancer is essential.

Monitoring your dynamic AWS cloud infrastructure is quite complex. As AWS expands its portfolio of services, it has become essential to monitor not only your cloud server instances but also the dependent services that power your applications in the cloud. One such service is the Elastic Load Balancer (ELB), an integral component of your AWS architecture. It makes your application environment more resilient and fault-tolerant by distributing incoming traffic across multiple EC2 instances in different Availability Zones. Your load balancer, EC2 instances, and Auto Scaling all work together, making it easier for you to build and scale applications.

Now that Site24x7 supports ELB, you can gather and visualize data for your ELB nodes. The ELB metrics you can monitor in Site24x7 include:

Request count

Analyze the volume of inbound traffic your ELB receives. Check whether the number of incoming requests is growing day to day or is highly variable, and use this data to decide whether to add or remove instances or how to configure scaling policies for your Auto Scaling groups.
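The trend check described above can be sketched in a few lines of Python. This assumes you have already exported daily Sum values of the RequestCount metric into a plain list, oldest first; the helper name `summarize_request_counts` and the 0.5 variability cut-off are hypothetical choices for illustration, not Site24x7 or AWS settings.

```python
from statistics import mean, pstdev


def summarize_request_counts(daily_counts, variability_cutoff=0.5):
    """Summarize daily RequestCount sums (hypothetical helper).

    daily_counts: list of daily request totals, oldest first.
    variability_cutoff: coefficient of variation above which traffic is
    treated as "highly variable" (an assumed, tunable threshold).
    """
    if len(daily_counts) < 2:
        raise ValueError("need at least two days of data")
    # Day-over-day growth rates: 0.10 means 10% more requests than the day before
    growth = [
        (today - yesterday) / yesterday
        for yesterday, today in zip(daily_counts, daily_counts[1:])
        if yesterday > 0
    ]
    avg = mean(daily_counts)
    # Coefficient of variation: standard deviation relative to the mean
    cv = pstdev(daily_counts) / avg if avg else 0.0
    return {
        "avg_daily_growth": mean(growth) if growth else 0.0,
        "coefficient_of_variation": cv,
        "highly_variable": cv > variability_cutoff,
    }
```

For example, `summarize_request_counts([1000, 1100, 1210, 1331])` reports an average daily growth of about 10% with low variability, which points toward steadily adding capacity rather than reacting to bursts.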

Latency

From the ELB's perspective, latency is how long your back-end instances take to start responding (the first byte of the response) after the load balancer forwards an HTTP request to them. This metric gives you an idea of the latency your clients experience. Monitor average latency to identify latency patterns, and check latency spikes to correlate them with resource overload on your instances.
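A minimal sketch of the average-versus-spike check follows, assuming you have a list of Latency datapoints in seconds (for example, 1-minute averages). The function name `latency_report` and the "three times the median" spike rule are hypothetical; a median baseline is used here because a single large spike skews the mean.

```python
from statistics import mean, median


def latency_report(samples_sec, spike_factor=3.0):
    """Average latency plus simple spike detection (hypothetical helper).

    samples_sec: ELB Latency datapoints in seconds.
    spike_factor: a sample counts as a spike if it exceeds
    spike_factor x median (an assumed rule of thumb, not an AWS threshold).
    """
    if not samples_sec:
        raise ValueError("no latency samples")
    baseline = median(samples_sec)
    # Keep the index so spikes can be matched to other metrics for the same period
    spikes = [
        (i, s) for i, s in enumerate(samples_sec)
        if s > spike_factor * baseline
    ]
    return {"average": mean(samples_sec), "median": baseline, "spikes": spikes}


report = latency_report([0.12, 0.10, 0.11, 0.95, 0.13, 0.12])
print(report["spikes"])
```

The returned indices can then be lined up with instance metrics such as CPU utilization for the same periods to see whether a spike coincides with resource overload.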

Surge queue length

The surge queue length metric starts to increase when your back-end instances cannot process requests as fast as they arrive. Several factors can cause this, chief among them capacity constraints on your back-end instances. Add computational capacity by scaling up your instances (or adding more of them) so that the surge queue length never reaches the maximum queue capacity, which is 1,024 requests for a Classic Load Balancer.
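To illustrate why the queue grows, here is a toy model, not a description of ELB internals: each second, new requests join the queue, the back ends drain up to a fixed capacity, and anything beyond the 1,024-request limit spills over. The function name `simulate_surge_queue` and the per-second granularity are illustrative assumptions.

```python
SURGE_QUEUE_MAX = 1024  # queue limit for a Classic Load Balancer


def simulate_surge_queue(arrivals_per_sec, capacity_per_sec):
    """Toy model of surge queue growth (illustrative only).

    arrivals_per_sec: list of new requests arriving in each second.
    capacity_per_sec: requests the back ends can process per second.
    Returns one (queue_length, spillover) pair per second.
    """
    queue = 0
    history = []
    for arrivals in arrivals_per_sec:
        queue += arrivals
        # Back ends process as many queued requests as their capacity allows
        queue -= min(queue, capacity_per_sec)
        # Requests beyond the queue limit are rejected (spillover)
        spilled = max(0, queue - SURGE_QUEUE_MAX)
        queue -= spilled
        history.append((queue, spilled))
    return history
```

With 500 requests per second arriving and back ends that handle only 200, the queue climbs by 300 each second and overflows in the fourth second: `simulate_surge_queue([500] * 4, 200)` returns `[(300, 0), (600, 0), (900, 0), (1024, 176)]`.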

Spillover count

Spillover is a direct consequence of a growing surge queue. When the queue reaches its maximum capacity, your ELB starts rejecting new incoming requests, and each rejected request is counted as spillover. Monitor this metric so you are alerted before spillover climbs, since it means dropped requests and a poor end-user experience.
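Below is a sketch of the alerting logic, assuming per-period datapoints that use the Maximum statistic for SurgeQueueLength and the Sum statistic for SpilloverCount, which AWS recommends as the most useful statistics for these metrics. The helper name `spillover_alerts` and the 512-request early-warning level are hypothetical; the idea is to raise a warning on the surge queue before any spillover occurs.

```python
def spillover_alerts(periods, surge_warn=512):
    """Turn per-period ELB datapoints into alerts (hypothetical helper).

    periods: list of (timestamp, surge_queue_max, spillover_sum) tuples.
    surge_warn: assumed early-warning level for SurgeQueueLength, set here
    to half of the 1,024-request Classic Load Balancer queue limit.
    """
    alerts = []
    for ts, surge_max, spill_sum in periods:
        if spill_sum > 0:
            # Any spillover means the load balancer rejected requests
            alerts.append((ts, "critical", f"{spill_sum} requests dropped"))
        elif surge_max >= surge_warn:
            # Queue is filling up: act before it overflows
            alerts.append((ts, "warning", f"surge queue at {surge_max}"))
    return alerts


datapoints = [
    ("10:00", 40, 0),
    ("10:01", 700, 0),
    ("10:02", 1024, 35),
]
for alert in spillover_alerts(datapoints):
    print(alert)
```

In this sample the warning at 10:01 arrives one period before requests are actually dropped, which is the window in which scaling out can still prevent spillover.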

ELB 5XX errors

HTTP 5XX error codes are generated when there is an issue either with your load balancer or with your back-end instances. Troubleshoot these errors by configuring an appropriate idle timeout for your application and by ensuring that enough healthy EC2 instances are registered in each Availability Zone.
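Classic Load Balancers report 5XX responses in two CloudWatch metrics: HTTPCode_ELB_5XX for errors generated by the load balancer itself and HTTPCode_Backend_5XX for errors returned by your instances. The sketch below, built around a hypothetical helper `classify_5xx`, compares their Sum values for one period against RequestCount to suggest where to look first; using RequestCount as the denominator is a simplifying assumption.

```python
def classify_5xx(request_count, elb_5xx, backend_5xx):
    """Split 5XX errors by origin for one period (hypothetical helper).

    Inputs are Sum statistics for RequestCount, HTTPCode_ELB_5XX and
    HTTPCode_Backend_5XX over the same period. Rates are approximate.
    """
    if request_count == 0:
        return {"elb_rate": 0.0, "backend_rate": 0.0, "likely_source": "none"}
    elb_rate = elb_5xx / request_count
    backend_rate = backend_5xx / request_count
    if elb_5xx == 0 and backend_5xx == 0:
        source = "none"
    elif elb_5xx >= backend_5xx:
        source = "load balancer"       # check idle timeout and healthy capacity
    else:
        source = "back-end instances"  # check application logs on the instances
    return {"elb_rate": elb_rate, "backend_rate": backend_rate, "likely_source": source}


print(classify_5xx(10000, 50, 5))
```

If the load balancer is the likely source, the response code narrows things down further: a 504 typically means a back end did not respond within the idle timeout, while a 503 often means there were no healthy instances to route the request to.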

ELB 4XX errors

HTTP 4XX error codes are generated when clients send faulty or malformed requests to the load balancer. Because these errors originate with the client, you can usually only infer their likely causes; there is little to troubleshoot on the load balancer itself.

Back-end connection errors

Back-end connection errors occur when your ELB cannot successfully connect to your EC2 instances. They typically occur when your instances are overloaded or when packets are lost because of a network issue. Diagnose and troubleshoot connection errors by monitoring your application, network, and instance resource utilization.

Healthy and unhealthy host count

Your ELB labels EC2 instances as either "in service" or "out of service" by periodically checking them with pings or requests. In AWS, this is called an "ELB health check." Based on these checks, the load balancer separates unhealthy instances from healthy ones and routes traffic only to the healthy instances.

However, make sure your health check settings aren't too restrictive; overly aggressive settings can cause the ELB to remove healthy instances from the pool, inflating the unhealthy host count. Tune the response timeout and health check interval so that enough healthy back-end EC2 instances remain in service.
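To see how the interval, timeout, and thresholds interact, here is a rough sketch that estimates how long an instance takes to leave and re-enter service. The `HealthCheck` class is hypothetical; it mirrors the Classic Load Balancer health check settings (interval, response timeout, unhealthy and healthy thresholds), and the timing is approximated as threshold × interval, ignoring the timeout of the final check.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    """Hypothetical model of Classic ELB health check settings."""
    interval_sec: int         # time between checks of each instance
    timeout_sec: int          # how long to wait for a check response
    unhealthy_threshold: int  # consecutive failures before "out of service"
    healthy_threshold: int    # consecutive successes before "in service"

    def validate(self):
        # The response timeout must be shorter than the check interval
        if self.timeout_sec >= self.interval_sec:
            raise ValueError("timeout must be less than interval")

    def seconds_to_mark_unhealthy(self):
        # Approximation: N consecutive failed checks spaced one interval apart
        return self.unhealthy_threshold * self.interval_sec

    def seconds_to_restore(self):
        # Approximation: N consecutive successful checks spaced one interval apart
        return self.healthy_threshold * self.interval_sec


relaxed = HealthCheck(interval_sec=30, timeout_sec=5, unhealthy_threshold=2, healthy_threshold=10)
aggressive = HealthCheck(interval_sec=5, timeout_sec=2, unhealthy_threshold=2, healthy_threshold=2)
for hc in (relaxed, aggressive):
    hc.validate()
    print(hc.seconds_to_mark_unhealthy(), hc.seconds_to_restore())
```

The aggressive configuration pulls an instance out after roughly 10 seconds, so a brief slowdown during a traffic spike (any check slower than 2 seconds counts as a failure) can inflate the unhealthy host count even though the instance would have recovered on its own.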

Whether you are a seasoned AWS user or just testing the waters, monitoring these metrics provides a solid baseline for identifying and troubleshooting errors associated with your load balancer. Sign up for a 30-day free trial and start monitoring your AWS services today. If you already have a Site24x7 account, log in and give ELB monitoring a try.