Monitor the performance of your Elasticsearch cluster with details on cluster status, nodes, shards, JVM stats, and more.
Install and configure the Elasticsearch plugin to monitor the open source, distributed document store and search engine. Elasticsearch is built on Apache Lucene, a full-text search library written in Java. Keep a pulse on the performance of your Elasticsearch environment so that you stay up to date with the internals of your working cluster.
This document details how to configure the Elasticsearch plugin and describes the metrics it monitors to provide in-depth visibility into the performance, availability, and usage stats of Elasticsearch clusters.
Elasticsearch performance monitoring metrics:
Use our wide array of metrics and get notified of critical errors that require your attention. Keep track of unexpected trends through our metric graphs and troubleshoot as quickly as possible. The out-of-the-box metrics we support include:
The active_shards is an aggregate total of all active shards across all indices, including replica shards
The initializing_shards is the number of shards that are being freshly created
Number of nodes/data nodes
The number of nodes and data nodes in the cluster are represented by the metrics number_of_nodes and number_of_data_nodes respectively. Data nodes hold data and perform data-related operations such as CRUD, search and aggregations
The relocating_shards is the number of shards that are currently moving from one node to another node
Active primary shards
The active_primary_shards indicates the number of active primary shards in your cluster. This is an aggregate total across all indices and does not include replica shards
From the initializing state, shards move to either a started or an unassigned state as the master node attempts to assign them to the nodes in the cluster. The unassigned_shards exist in the cluster state, but can’t be found in the cluster itself. Shards that stay unassigned for a long time can be a warning sign of an unstable cluster
The status of the cluster is represented by Red: 0, Green: 1 and Yellow: 2. A green status means that all primary and replica shards are allocated. A yellow status indicates that at least one replica shard is unallocated or missing. A red status means one or more primary shards have not been assigned
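The shard, node and status metrics described above correspond to fields of the Elasticsearch cluster health API (_cluster/health). The following is a minimal sketch of how such a response could be turned into the numeric metrics, not the plugin's actual code. The names STATUS_CODES, HEALTH_FIELDS and summarize_health, the sample response and the fallback value of -1 for an unknown status are assumptions made for illustration.

```python
# Numeric encoding taken from the text above: Red: 0, Green: 1, Yellow: 2.
STATUS_CODES = {"red": 0, "green": 1, "yellow": 2}

# Fields of the _cluster/health response that match the metrics described above.
HEALTH_FIELDS = (
    "number_of_nodes",
    "number_of_data_nodes",
    "active_primary_shards",
    "active_shards",
    "relocating_shards",
    "initializing_shards",
    "unassigned_shards",
)


def summarize_health(health):
    """Pick the node and shard counters out of a parsed cluster health response."""
    metrics = {name: health.get(name, 0) for name in HEALTH_FIELDS}
    # An unknown or missing status maps to -1 (an arbitrary choice for this sketch).
    metrics["cluster_status"] = STATUS_CODES.get(health.get("status"), -1)
    return metrics


# Hypothetical response: every primary shard has one replica, and all
# replicas are still unassigned, so the cluster reports "yellow".
sample_health = {
    "status": "yellow",
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    "active_primary_shards": 5,
    "active_shards": 5,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 5,
}

print(summarize_health(sample_health))
```

In a real setup the dictionary would come from parsing the JSON body returned by the cluster health endpoint instead of a hard-coded sample.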
Elasticsearch runs on the Java Virtual Machine (JVM), and one of the ways it uses the RAM on your nodes is via the JVM heap. The metric jvm_mem_pool_old_used_perc is the average, across all nodes, of the percentage of the old generation memory pool in use. The metrics jvm_gc_old_coll_time and jvm_gc_old_coll_count give the old generation Garbage Collection (GC) time (in milliseconds) and count across all nodes since the last poll (5 minutes by default)
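As an illustration of how these JVM metrics could be derived, the sketch below works on a parsed response of the Elasticsearch node stats API (_nodes/stats/jvm), which reports old generation pool usage and cumulative GC counters per node. The function names and the sample data are hypothetical, and the way the plugin itself computes the "since last poll" values is not documented here, so the delta calculation is an assumption.

```python
def old_gen_used_percent(nodes_stats):
    """Average old generation pool usage (in percent) across all nodes."""
    percents = []
    for node in nodes_stats.get("nodes", {}).values():
        pool = node["jvm"]["mem"]["pools"]["old"]
        max_bytes = pool.get("max_in_bytes", 0)
        if max_bytes > 0:  # skip nodes that report no pool limit
            percents.append(100.0 * pool["used_in_bytes"] / max_bytes)
    return sum(percents) / len(percents) if percents else 0.0


def old_gc_totals(nodes_stats):
    """Sum the cumulative old generation GC count and time (ms) over all nodes."""
    count = 0
    time_ms = 0
    for node in nodes_stats.get("nodes", {}).values():
        old = node["jvm"]["gc"]["collectors"]["old"]
        count += old["collection_count"]
        time_ms += old["collection_time_in_millis"]
    return count, time_ms


def old_gc_since_last_poll(previous, current):
    """Difference between two cumulative (count, time_ms) readings."""
    return current[0] - previous[0], current[1] - previous[1]


def _node(used, limit, gc_count, gc_ms):
    """Build a hypothetical single-node entry in the node stats layout."""
    return {
        "jvm": {
            "mem": {"pools": {"old": {"used_in_bytes": used, "max_in_bytes": limit}}},
            "gc": {"collectors": {"old": {
                "collection_count": gc_count,
                "collection_time_in_millis": gc_ms,
            }}},
        }
    }


sample_stats = {"nodes": {"node-a": _node(50, 100, 3, 120), "node-b": _node(25, 100, 2, 80)}}

print(old_gen_used_percent(sample_stats))
print(old_gc_since_last_poll((2, 50), old_gc_totals(sample_stats)))
```

Because the GC counters in the node stats response are cumulative, a per-poll value requires keeping the previous reading between runs.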
Memory and CPU usage
As Elasticsearch depends on the machine it is installed on, it is critical to monitor CPU and memory usage. Monitoring CPU usage for each of your node types helps in studying the distribution of workload between the nodes. Metrics including free (mem_free), used (mem_used), shared (shared_mem), resident (resident_mem) and total virtual memory (virtual_mem) help you keep an eye on memory usage and understand how it loads and impacts the cluster
How it works
Log in to your Site24x7 account. Sign up here if you don't have one
The default python path given in the plugin script is #!/usr/bin/python. If you wish to provide an alternate python path, replace the path that follows the shebang characters "#!".
Change the values of HOST, USERNAME, PORT, PASSWORD to match your configuration. By default, proxy is not configured. Change the proxy settings if needed
The server agent will report stats on the performance of the Elasticsearch cluster under the Plugins tab in the Site24x7 web client. If the plugin is not listed in the Site24x7 web client, restart the agent
sudo /etc/init.d/site24x7monagent restart
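The connection settings mentioned in the configuration step could be used along the following lines. This is a hedged sketch rather than the contents of the plugin script: HOST, PORT, USERNAME and PASSWORD are the names given above, while PROXY, build_request, build_url_opener and all values shown are assumptions for illustration. It also assumes plain HTTP with basic authentication.

```python
import base64
from urllib.request import ProxyHandler, Request, build_opener

# Illustrative values only; replace them with your own configuration.
HOST = "localhost"
PORT = "9200"
USERNAME = None  # set both when the cluster requires authentication
PASSWORD = None
PROXY = None     # hypothetical setting, e.g. "http://proxy.example.com:3128"


def build_request(path, host=HOST, port=PORT, username=USERNAME, password=PASSWORD):
    """Create a request for an Elasticsearch endpoint, adding basic auth if set."""
    request = Request(f"http://{host}:{port}{path}")
    if username and password:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


def build_url_opener(proxy=PROXY):
    """Return a URL opener that goes through the proxy only when one is configured."""
    handlers = [ProxyHandler({"http": proxy, "https": proxy})] if proxy else []
    return build_opener(*handlers)


request = build_request("/_cluster/health")
print(request.full_url)
# To send it: build_url_opener().open(request, timeout=10)
```

Keeping the request construction separate from the network call makes it easy to check the URL and headers before pointing the script at a live cluster.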
Monitoring additional metrics:
To monitor additional metrics, edit the "elasticsearch.py" or "elasticsearchcluster.py" or "elasticsearchnodes.py" file and add the new metrics that need monitoring
Increment the plugin version value in these files to view the newly added metrics (for example, change the default plugin version from PLUGIN_VERSION = "1" to PLUGIN_VERSION = "2")
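A minimal sketch of what adding a metric and bumping the version could look like is shown below. It is not taken from the plugin files: the function collect_metrics, the metric name pending_tasks and the output keys plugin_version and heartbeat_required are assumptions for illustration, while number_of_pending_tasks is a field of the Elasticsearch cluster health response.

```python
import json

PLUGIN_VERSION = "2"  # was "1" before the new metric was added
HEARTBEAT = "true"    # assumed setting for this sketch


def collect_metrics(health):
    """Build the metric dictionary from a parsed cluster health response."""
    data = {
        "active_shards": health.get("active_shards", 0),
        "unassigned_shards": health.get("unassigned_shards", 0),
        # Newly added metric, read from the cluster health response.
        "pending_tasks": health.get("number_of_pending_tasks", 0),
    }
    data["plugin_version"] = PLUGIN_VERSION
    data["heartbeat_required"] = HEARTBEAT
    return data


# Hypothetical cluster health response used to exercise the function.
sample = {"active_shards": 10, "unassigned_shards": 0, "number_of_pending_tasks": 3}
print(json.dumps(collect_metrics(sample), indent=2))
```

The version value changes together with the metric list, so a new metric and the incremented PLUGIN_VERSION belong in the same edit of the script.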
ActiveMQ plugin - Monitor performance metrics of your Apache ActiveMQ instances
Gearman plugin - Monitor performance metrics of your Gearman job servers