A Complete Overview of HAProxy Monitoring

HAProxy, or High Availability Proxy, is a popular load balancer for web applications. It’s optimized to perform well under high traffic, both on-premise and in the cloud. It’s scalable and secure by design, ships with major Linux distributions, and supports HTTP/3 over QUIC.

HAProxy is integral to routing TCP/HTTP requests in a distributed architecture. To avoid performance bottlenecks in the overall system, monitoring HAProxy is essential. In this article, we share a comprehensive guide to HAProxy monitoring. We’ll explain its architecture, briefly compare it with Nginx, elaborate on why monitoring is essential, and discuss how to monitor key metrics using native tools.

What is HAProxy?

HAProxy is an open-source TCP/HTTP load balancer and reverse proxy. It’s widely used to distribute traffic across multiple servers based on predefined rules and algorithms. This load distribution increases the availability, reliability, and throughput of applications.

HAProxy supports Layer 4 (TCP) and Layer 7 (HTTP) load balancing. It works well with WebSocket and gRPC. It can be used to implement various routing features, such as URL rewriting, IP address load balancing, path-based routing, and rate limiting.

HAProxy also provides SSL/TLS termination out-of-the-box, a feature that encrypts/decrypts traffic at the load balancing layer. SSL/TLS termination keeps client traffic encrypted in transit while sparing the application servers the encryption/decryption overhead.

You can integrate HAProxy with many third-party applications to extend its capabilities. For example, HAProxy can be natively integrated with NS1, a DNS platform. Since it’s open-source, you can add custom functionality using the Lua scripting language.

HAProxy serves a wide range of use cases, including e-commerce, gaming, social media, and streaming services. In e-commerce, for example, HAProxy is deployed as a scalable load balancer to handle high volumes of requests during peak shopping periods. In gaming and streaming, it reduces latency by routing traffic to the nearest server or data center.

HAProxy architecture

HAProxy is known to deliver exceptional performance, reliability, and availability. As per the official website, HAProxy can handle up to 2 million requests per second over SSL and 100 Gbps of forwarded traffic. These impressive figures can be attributed to its flexible, event-driven architecture.

HAProxy is a non-blocking engine that enables concurrent processing of incoming traffic through a fast I/O layer and a multi-threaded, priority-based scheduler. As its primary objective is forwarding requests to application servers, HAProxy is designed to move data with minimal processing.

A typical HAProxy-powered architecture comprises a frontend and a backend. The frontend contains listening sockets that accept incoming requests from clients. The backend defines a group of application servers that will be assigned requests by the frontend. Decoupling the frontend and backend layers also makes it possible to scale each independently.

Multi-threading allows HAProxy to leverage the maximum computational capacity by binding threads to CPU cores and decreasing the need for inter-CPU communication. HAProxy also uses a cache to store responses in RAM, which ensures that subsequent requests for the cached object aren’t forwarded to the backend.

HAProxy vs. Nginx

HAProxy and Nginx are both excellent options for load balancing and reverse proxies. However, there are certain differences between the two solutions that may make you choose one over the other.

Nginx is more than just a load balancer or a reverse proxy. It can also be used as a standalone web server for delivering static or dynamic content. Conversely, HAProxy is predominantly a request-forwarder and can’t be used to serve static or dynamic content.

HAProxy offers a wider range of load balancing algorithms, including least connections, source IP hash, URI hash, header-based, and weighted round robin. Both solutions allow you to enforce rate limiting and SSL termination. However, Nginx offers generic UDP load balancing support, whereas HAProxy doesn’t.
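To make two of these algorithms concrete, here is a minimal Python sketch of least connections and weighted round robin. It is an illustration only, not HAProxy's implementation; the Server class, the server names, and the weights are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Server:
    name: str                    # hypothetical server name
    weight: int = 1              # relative share of traffic
    active_connections: int = 0  # connections currently being served


def pick_least_connections(servers):
    """Return the server with the fewest active connections."""
    return min(servers, key=lambda s: s.active_connections)


def weighted_round_robin(servers):
    """Return an endless iterator in which each server appears `weight` times per cycle."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return cycle(expanded)


servers = [Server("app1", weight=2), Server("app2", weight=1)]
servers[0].active_connections = 5
servers[1].active_connections = 3

chosen = pick_least_connections(servers)      # app2 has fewer connections
rotation = weighted_round_robin(servers)
first_three = [next(rotation).name for _ in range(3)]
```

This naive weighted rotation sends consecutive requests to the same server. A production load balancer would typically interleave the picks more evenly.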

Both web technologies do equally well in the configurability, performance, availability, and security departments. Whether you want to define access control lists (ACLs), increase threads for better throughput, or deploy across cloud and on-premise environments, both Nginx and HAProxy allow you to do so.

To summarize, there is no clear winner in the battle between HAProxy and Nginx. Which one you choose depends on your business use case and protocol requirements. For example, if you need both UDP and TCP load balancing, consider Nginx. If you want to use a specialized routing scheme, HAProxy may be a better choice.

Why is it important to monitor HAProxy?

Here are a few reasons why it’s essential to monitor the health and performance of HAProxy:

Predict malfunctions and avoid downtime

Periodic monitoring will equip you with the insight necessary to predict malfunctions. For example, if you notice a spike in the memory usage of an instance, you can investigate and take remedial action before HAProxy runs out of memory and crashes.

Monitoring can also help identify misconfigurations. For instance, if you increase the number of threads and notice that CPU and memory utilization rise without improving overall throughput, it's a sign that the thread count was set too high.

Instant troubleshooting

When it comes to a complicated, multi-layered system like HAProxy, troubleshooting can prove challenging in the absence of logs and monitoring tools. However, by cross-referencing logged events with crucial metrics, you can put issues in context more quickly.

For example, if the queue depth grows rapidly along with backend latency, it suggests that the application servers are slowing down. Similarly, if a node crashes and the monitoring tool reports high memory usage on the backend, it’s likely that your application logic has memory leaks.

Ensure maximum performance of the larger system

HAProxy serves as the backbone of numerous distributed IT infrastructures, enabling a cluster of application servers to cater to thousands of clients simultaneously. Even minor problems or bottlenecks in HAProxy can significantly impair the overall infrastructure's performance.

For instance, if a listening socket hangs due to overuse, clients may be unable to access their data from the website. It’s therefore crucial to monitor HAProxy’s scalability, performance, and health metrics in the context of the larger system.

Keep your infrastructure secure

Regular monitoring enables the identification of potential security incidents, bugs, or vulnerabilities. For instance, if any security-critical action is performed, or a security control is disabled, the monitoring tool may issue an alert.

Additionally, monitoring allows you to verify adherence to security best practices across all areas, such as SSL/TLS encryption, rate limiting, access control lists, client fingerprinting, and packet analysis.

Key metrics to monitor HAProxy performance

HAProxy exposes a range of metrics to monitor the status, performance, and throughput of the frontend, backend, and overall system. These metrics can also be exported as a comma-separated values (CSV) file.
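As a rough illustration of working with the CSV export, the Python sketch below parses it into dictionaries keyed by metric name. The sample text is made up and heavily abbreviated (real output has many more columns), and parse_stats_csv is a hypothetical helper name.

```python
import csv
import io

# Made-up, abbreviated sample. The real export contains many more columns.
SAMPLE_CSV = """# pxname,svname,qcur,qmax,scur,smax,bin,bout,
web_front,FRONTEND,,,12,40,10240,52100,
web_back,app1,0,3,5,18,5120,26050,
web_back,BACKEND,0,3,5,18,5120,26050,
"""


def parse_stats_csv(text):
    """Parse HAProxy stats CSV text into a list of dicts keyed by column name."""
    # The header line starts with "# ", which is not part of the first column name.
    if text.startswith("# "):
        text = text[2:]
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Each line ends with a trailing comma, which yields an empty column.
        row.pop("", None)
        rows.append(row)
    return rows


rows = parse_stats_csv(SAMPLE_CSV)
frontends = [r for r in rows if r["svname"] == "FRONTEND"]
```

Values arrive as strings, and an empty string means the metric doesn't apply to that row, so convert to numbers only where a value is present.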

Frontend metrics

The frontend is the entry point for all requests. Frontend metrics, like the following, show how well an instance handles client connections.

qcur: The number of currently queued requests. If the system is in a healthy state, this metric’s value shouldn’t grow too large. (The definition of ‘large’ in this case depends on your throughput expectations.)
qmax: The historical max value of qcur.
qtime: The average queue time in milliseconds, calculated for the last 1,024 requests.
ctime: The average connect time in milliseconds, calculated for the last 1,024 requests.
scur: The number of active sessions.
smax: The historical max value of scur.
bin: The total number of incoming bytes.
bout: The total number of outgoing bytes.
dses: The total number of requests rejected due to TCP-request session rules.
dreq: The total number of requests rejected due to security reasons. A sudden increase in the value of this metric can be a sign of an attempted attack, and should be investigated immediately.
dcon: The total number of requests rejected due to TCP-request connection rules.
ereq: The total number of request errors. Potential reasons can be: early termination from the client, client timeout, connection closed by the client, or too many bad requests from the client.
econ: The total number of requests that failed to connect to a backend server. A rise in the value of this metric should be immediately investigated.
wretr: The number of times a server connection was retried. A sudden increase in this metric’s value warrants immediate investigation.
wredis: The number of times a request had to be dispatched to an alternate server. If this metric’s value keeps rising over time, it indicates that some of your application servers are underperforming.
rate: The number of incoming sessions per second, measured over the past second.
rate_max: The max number of incoming sessions per second.
req_rate: The number of requests per second, measured over the past second.
req_rate_max: The max number of incoming requests per second.
req_tot: The total number of received requests since the system started.
connect: The total number of times a connection was attempted.
conn_rate: The number of received connections in the last second.
conn_rate_max: The historical max value of conn_rate.
conn_tot: The total number of connections that this instance has made since startup. Correlate this metric with the connect metric to identify the number of failed connection attempts.
srv_icur: The number of currently idle connections that can be reused.
qtime_max: The maximum queue time, in milliseconds, observed since startup.
ctime_max: The maximum connect time, in milliseconds, observed since startup.
idle_conn_cur: The number of currently idle connections that have been deemed unsafe.
safe_conn_cur: The number of currently idle connections that have been deemed safe.
used_conn_cur: A real-time count of the connections in use.
need_conn_est: An estimated number of required connections.
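Counters such as bin, bout, ereq, and dreq are cumulative, so on their own they mostly reflect uptime. A common approach is to sample them periodically and convert the difference into a per-second rate. The sketch below assumes two samples are available as dictionaries of strings (as they would be after reading the CSV export); per_second_rates is a hypothetical helper and the numbers are made up.

```python
def per_second_rates(previous, current, interval_seconds,
                     counters=("bin", "bout", "ereq", "dreq")):
    """Convert cumulative counters from two samples into per-second rates."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name in counters:
        delta = int(current[name]) - int(previous[name])
        # A negative delta means the counters were reset (for example, after a
        # restart), so treat the current value as the growth since the reset.
        if delta < 0:
            delta = int(current[name])
        rates[name] = delta / interval_seconds
    return rates


# Made-up samples taken 10 seconds apart.
previous = {"bin": "1000", "bout": "5000", "ereq": "2", "dreq": "0"}
current = {"bin": "4000", "bout": "20000", "ereq": "5", "dreq": "0"}
rates = per_second_rates(previous, current, interval_seconds=10)
```

Alerting on these rates rather than on the raw totals makes a sudden jump in ereq or dreq much easier to spot.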

Backend metrics

The backend represents the request processing layer of HAProxy. Use the following backend-specific metrics to gauge performance and throughput of your application servers.

rtime: The average response time in milliseconds, calculated for the last 1,024 requests. Use rtime as an indicator of how quickly your application servers are responding.
rtime_max: The maximum response time, in milliseconds, observed since startup.
dresp: The total number of responses rejected due to security reasons. A sudden rise in this metric’s value should be investigated.
eresp: The total number of response errors.
act: The total number of active servers. For a healthy instance, this metric’s value should be equal to the number of configured application servers.
bck: The total number of backup servers.
chkdown: The number of times the backend has gone from the UP state to the DOWN state. For a healthy instance, this metric should report zero.
lastchg: The number of seconds since the last UP/DOWN state transition of the backend.
downtime: The number of seconds the backend has spent in the DOWN state.
lbtot: A cumulative count of the times a server was selected to process requests. This includes both new sessions and re-dispatched requests.
hrsp_5xx: The total number of HTTP responses with 5xx return codes.
hrsp_4xx: The total number of HTTP responses with 4xx return codes.
hrsp_3xx: The total number of HTTP responses with 3xx return codes.
hrsp_2xx: The total number of HTTP responses with 2xx return codes.
hrsp_1xx: The total number of HTTP responses with 1xx return codes.
hrsp_other: The total number of HTTP responses with other return codes, such as protocol errors.
lastsess: The number of seconds since the last session was sent to the backend.
cookie: The name of the backend cookie.
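The backend metrics above lend themselves to simple health checks. The following Python sketch flags a high share of 5xx responses and a shortfall of active servers for a single backend row. The backend_health function, the 5% threshold, and the sample values are assumptions chosen for illustration, not recommended settings.

```python
STATUS_FIELDS = ("hrsp_1xx", "hrsp_2xx", "hrsp_3xx",
                 "hrsp_4xx", "hrsp_5xx", "hrsp_other")


def backend_health(row, configured_servers, max_5xx_ratio=0.05):
    """Return a list of human-readable warnings for one backend stats row."""
    warnings = []
    # Missing or empty fields are counted as zero.
    total = sum(int(row.get(field) or 0) for field in STATUS_FIELDS)
    errors_5xx = int(row.get("hrsp_5xx") or 0)
    if total and errors_5xx / total > max_5xx_ratio:
        warnings.append(
            f"5xx ratio {errors_5xx / total:.1%} exceeds {max_5xx_ratio:.0%}"
        )
    active = int(row.get("act") or 0)
    if active < configured_servers:
        warnings.append(f"only {active} of {configured_servers} servers active")
    return warnings


# Made-up values for a backend with three configured servers.
row = {"hrsp_2xx": "900", "hrsp_4xx": "20", "hrsp_5xx": "80", "act": "2"}
problems = backend_health(row, configured_servers=3)
```

Because the hrsp_* counters are cumulative, a check like this is more sensitive when applied to the difference between two samples than to totals since startup.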

System metrics

Monitoring the following metrics will give you an idea of the overall system’s health, availability, and performance.

ttime: The average total session time in milliseconds, calculated for the last 1,024 requests.
ttime_max: The maximum session time, in milliseconds, observed since startup.
agent_status: The status of the last agent health check performed by HAProxy. Some of the possible values are: UNK (unknown), INI (initializing), L4TOUT (layer 1-4 timeout), and L7OK (agent is up).
cache_lookups: The total number of times the cache was queried.
cache_hits: The total number of times a response was found in the cache.
chkfail: The total number of failed health checks. If this metric reports a non-zero value, use the hanafail metric to fetch details about the health check failure(s).
hanafail: This metric returns information related to failed health checks. If you are experiencing performance issues, use the output of this metric to debug.
qlimit: The user-defined, maximum queue depth for the server.
throttle: The current throttle percentage. This metric is only available when slowstart is active.
check_status: The status reported by the last health check. Some of the possible values are: UNK (unknown), SOCKERR (socket error), L7TOUT (Layer 7 timeout), L7OK (check succeeded on Layer 7), and L4CON (connection problem in layer 1-4).
check_duration: The time taken by the last health check, reported in milliseconds.
cli_abrt: A cumulative count of the data transfers that were aborted from the client side.
srv_abrt: A cumulative count of the data transfers that were aborted by a server.
comp_in: A cumulative count of the HTTP response bytes that have been sent to the HTTP compressor.
comp_out: A cumulative count of the HTTP response bytes that have been released by the HTTP compressor.
comp_byp: A cumulative count of the HTTP response bytes that didn’t pass through the HTTP compressor.
comp_rsp: A cumulative count of the HTTP responses that were compressed.
algo: The configured load balancing algorithm.
eint: A cumulative count of internal errors. In a healthy system, this metric should report zero.
reuse: The total number of reused connections.
wrew: A cumulative count of warnings related to header rewriting.
mode: The configured proxy mode. Possible values: tcp, http, unknown, health.
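Some of these counters are most useful as ratios. As a small sketch, the cache hit ratio can be derived from cache_lookups and cache_hits, and the space saved by compression from comp_in and comp_out. The helper names and sample values below are hypothetical.

```python
def cache_hit_ratio(row):
    """Return cache_hits / cache_lookups, or None if the cache was never queried."""
    lookups = int(row.get("cache_lookups") or 0)
    hits = int(row.get("cache_hits") or 0)
    return hits / lookups if lookups else None


def compression_savings(row):
    """Return the fraction of bytes saved by the compressor, or None if unused."""
    comp_in = int(row.get("comp_in") or 0)
    comp_out = int(row.get("comp_out") or 0)
    return 1 - comp_out / comp_in if comp_in else None


# Made-up sample values.
row = {"cache_lookups": "200", "cache_hits": "150",
       "comp_in": "1000", "comp_out": "250"}
hit_ratio = cache_hit_ratio(row)      # 0.75
savings = compression_savings(row)    # 0.75
```

Returning None when the denominator is zero keeps "no data yet" distinct from a genuine 0% result.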

Monitoring HAProxy using the stats page

HAProxy offers a built-in web-based dashboard, known as the HAProxy Stats page, which displays several key metrics. By default, this dashboard is disabled. To enable it, add the following lines of code to your haproxy.cfg file.

listen stats
    bind :9001  # Listening port. A different port can also be used.
    mode http
    stats enable  # This enables the stats page
    stats hide-version
    stats realm Haproxy\ Statistics  # A custom title for the stats window
    stats uri /stats  # A custom URI for the stats page
    stats auth Username:Password  # Login credentials

Remember to set your own username and password in the last line. After you have done that, restart the HAProxy service. Then you will be able to log into the stats page at: http://[HAProxy-IP]:9001/stats using the defined login credentials.
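The stats page can also be read programmatically: appending ;csv to the stats URI returns the metrics as CSV instead of HTML. The Python sketch below assumes the port, URI, and placeholder credentials from the configuration above. The host address 192.0.2.10 and the function names are hypothetical.

```python
import base64
import urllib.request


def build_stats_request(host, port=9001, uri="/stats",
                        username="Username", password="Password"):
    """Build an authenticated request for the CSV export of the stats page."""
    # Appending ";csv" to the stats URI asks HAProxy for CSV instead of HTML.
    url = f"http://{host}:{port}{uri};csv"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_stats_csv(host, timeout=5, **kwargs):
    """Download the stats CSV as text. Requires network access to HAProxy."""
    request = build_stats_request(host, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network. 192.0.2.10 is a placeholder.
request = build_stats_request("192.0.2.10")
```

Calling fetch_stats_csv("192.0.2.10") would perform the actual download. Because basic authentication sends credentials with every request, it's worth restricting access to the stats port or serving it over TLS.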

An alternative way to access the stats is via a UNIX socket interface. To enable the socket interface, add the following line to your haproxy.cfg file, under global settings:

stats socket /run/haproxy/haproxy.sock mode 660 level admin
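Once the socket is enabled, sending the show stat command to it returns the same metrics in CSV form. Here is a minimal Python sketch, assuming the socket path configured above and a user with permission to access it. The function names are hypothetical.

```python
import socket


def send_command(sock, command):
    """Send one command over a connected socket and return the full reply."""
    sock.sendall(command.encode("ascii") + b"\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # the peer closed the connection after answering
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def show_stat(socket_path="/run/haproxy/haproxy.sock"):
    """Return the CSV output of "show stat" from the HAProxy socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        return send_command(sock, "show stat")
```

Calling show_stat() on the HAProxy host returns the CSV text. Opening a new connection for each command keeps the logic simple, because the reply is complete once the connection closes.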

Monitoring HAProxy using the Site24x7 monitoring plugin

The HAProxy monitoring plugin offered by Site24x7 is a great tool to monitor HAProxy in real time. The Python-based plugin can be downloaded from GitHub and installed in a few simple steps. You can also visualize the metrics this plugin aggregates on the Site24x7 web client.

Some of the performance metrics you can track using the plugin include:

Number of bytes sent and received by the frontend
Total number of request errors
Number of created sessions per second
Currently queued requests

Conclusion

HAProxy is a leading load balancer and reverse proxy for TCP/HTTP applications. It supports advanced load balancing schemes, scales well in the cloud, offers a diverse routing feature set, and is highly extensible. To maximize its potential, use the information in this article to identify the metrics and tools that are best suited for you to monitor it closely.
