Jenkins monitoring guide

Jenkins is an integral part of many modern IT infrastructures. It allows organizations to model CI/CD pipelines via code, automate operational workflows, increase efficiency, and reduce the time it takes to test, ship, and deploy applications.

In this article, we share a comprehensive guide to monitoring Jenkins. We discuss why it’s important to monitor Jenkins, walk through key performance and health metrics, and explore two of the best monitoring plugins.

Why is it important to monitor Jenkins?

Monitoring Jenkins will allow you to predict errors, ensure high availability, optimize your configuration sets, and track the progress of key automation workflows.

Ensure that critical automation workflows are working as expected

Organizations use Jenkins jobs and pipelines to automate various areas of the IT ecosystem. For example, they might have Jenkins pipelines that:

  • Move code from a temporary branch to the master branch, and then to production.
  • Scan for memory leaks in newly committed code.
  • Perform static code analysis on the entire codebase every week.

Monitoring Jenkins is essential to ensure that crucial pipelines and jobs are working optimally. Pipelines on an unhealthy Jenkins instance might fail or get stuck, leading to delays in testing, analysis, and code deployment. Metrics related to memory, CPU usage, build success rate, and queues can be used to gauge the general health of Jenkins in real time.

Predict malfunctions and avoid downtime

Malfunctions can arise due to system errors, issues in any plugins or shared libraries, or bad pipeline code. For example, someone might make a mistake while editing a Jenkinsfile, causing a pipeline to run indefinitely, a bug in a third-party plugin might cause jobs to slow down, or the Jenkins server might be running out of space.

Periodic monitoring helps with timely detection and investigation of such issues and bottlenecks. For example, by setting an alert on your monitoring system, you can get notified as soon as the disk usage on your Jenkins server reaches 80%. This will give you ample time to free up space before it becomes a problem.
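
As an illustration, here is a minimal disk-usage alert sketch in Python. It assumes the script runs on the Jenkins host itself, that JENKINS_HOME lives at /var/lib/jenkins (adjust for your installation), and that the print statement stands in for your real notification channel (email, Slack, PagerDuty, and so on):

    import shutil

    JENKINS_HOME = "/var/lib/jenkins"  # adjust to your installation
    THRESHOLD = 0.80                   # alert once disk usage reaches 80%

    def check_disk_usage(path: str = JENKINS_HOME) -> None:
        usage = shutil.disk_usage(path)
        used_ratio = usage.used / usage.total
        if used_ratio >= THRESHOLD:
            # Replace this print with your real notification mechanism.
            print(f"ALERT: disk usage at {used_ratio:.0%} on {path}")
        else:
            print(f"OK: disk usage at {used_ratio:.0%} on {path}")

    if __name__ == "__main__":
        check_disk_usage()

Scheduled via cron every few minutes, even a script this small buys you the early warning described above.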

Ensure high availability and performance

A well-monitored system is less likely to go down. Tracking key health metrics equips you with the insights you need to keep Jenkins up and running. Metrics can also be used to contextualize issues for easier debugging. For instance, if a rise in build failures coincides with spikes in Java Virtual Machine (JVM) memory utilization, you can surmise that the Jenkins server is running out of memory.

If you are monitoring the right metrics, you can even identify avenues for improving the performance of the overall IT infrastructure. For example, you may notice that an increase in the number of concurrently executing builds slows down the Jenkins server. To resolve this, you can increase the number of executors in the Jenkins configuration, which could lead to faster execution of builds and a performance boost for the larger system.

Find the best set of configurations

Jenkins is highly configurable. To extract maximum performance from a Jenkins server, you must choose the right configuration parameters based on your automation workflows and operational requirements.

One way to do so is by tweaking configurations and monitoring changes in Jenkins performance. For example, you can toggle the number of polling threads and monitor how it impacts pipeline execution or change the number of executors and see how it affects overall performance and throughput.

By monitoring the impact of different configuration sets on the performance of a Jenkins instance, you will be able to identify your optimal configuration set.
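
For instance, if executor count is the parameter you are tuning, you can change it programmatically between measurement windows. The sketch below is a hedged example, not an official API client: it assumes the Python requests library, a Jenkins user with administrative permission and an API token, and uses the script console’s /scriptText endpoint to run a one-line Groovy snippet. The URL and credentials are placeholders.

    import requests

    JENKINS_URL = "http://jenkins.example.com:8080"  # placeholder URL
    USER, TOKEN = "admin", "your-api-token"          # placeholder credentials

    def set_executor_count(count: int) -> None:
        # The script console's /scriptText endpoint executes a Groovy snippet.
        groovy = f"Jenkins.instance.setNumExecutors({count}); Jenkins.instance.save()"
        response = requests.post(
            f"{JENKINS_URL}/scriptText",
            auth=(USER, TOKEN),
            data={"script": groovy},
            timeout=30,
        )
        response.raise_for_status()

    # Apply one configuration, let representative builds run, record the
    # metrics, and only then move on to the next candidate value.
    set_executor_count(4)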

Key performance metrics for Jenkins

Jenkins exposes multiple metrics for real-time monitoring of an instance’s health, performance, and throughput.

Standard health metrics

The Jenkins API exposes a few standard health checks that report PASS or FAIL along with an optional message. They are a great way to get a quick overview of an instance’s status.

The disk-space metric reports a failure if a Jenkins space monitor reports that the disk usage has breached the configured threshold. Tracking this metric will allow you to avoid disk-space-related disasters.

The plugins metric reports a failure if any of the Jenkins plugins failed to start. Such a failure is typically resolved either by disabling the problematic plugin or by fixing the dependency issues causing it to fail. This is another critical metric to watch, as even one plugin failure can cause a Jenkins instance to behave unexpectedly.

The thread-deadlock metric reports a failure when there is a thread deadlock in the JVM. A thread deadlock is a condition where two or more threads may hang indefinitely, waiting for each other. Thread deadlocks can significantly impact Jenkins’ performance, potentially causing it to crash.

The temporary-space metric reports a failure if a Jenkins temporary space monitor reports that the temporary space is lower than the configured threshold. This is also an important metric, as Jenkins needs temporary space to create temporary files during job and build execution.
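
These four checks are exposed over HTTP by the Jenkins Metrics plugin. The following sketch assumes that plugin is installed, that an API access key has been generated in the Jenkins system configuration, and that the Python requests library is available; the URL and key are placeholders:

    import requests

    JENKINS_URL = "http://jenkins.example.com:8080"  # placeholder URL
    METRICS_KEY = "your-metrics-access-key"          # placeholder access key

    resp = requests.get(f"{JENKINS_URL}/metrics/{METRICS_KEY}/healthcheck", timeout=30)
    resp.raise_for_status()

    # The endpoint returns one JSON object per check, e.g. "disk-space",
    # "plugins", "thread-deadlock", and "temporary-space".
    for name, result in resp.json().items():
        status = "PASS" if result.get("healthy") else "FAIL"
        print(f"{name}: {status} {result.get('message') or ''}".strip())

Run on a schedule, this gives you the same PASS/FAIL overview without opening the UI.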

System and JVM-related metrics

As Jenkins runs inside a JVM, it’s crucial to monitor JVM-related metrics when measuring the server’s overall performance. The most important JVM-related metrics are listed below (a sketch for reading them programmatically follows the list):

system.cpu.load This metric returns the overall load on the Jenkins controller as reported by the JVM’s operating system JMX bean. It’s worth noting that the load calculation depends on the operating system. Periodic monitoring of this metric is important, as it tells you exactly how much load the Jenkins server is dealing with at any given time.
vm.uptime.milliseconds The number of milliseconds since the Jenkins JVM was initialized. On a healthy instance, this value reflects the time elapsed since the Jenkins server last started.
vm.count The total number of threads in the JVM. This metric reports the sum of the following six metrics: vm.blocked.count, vm.runnable.count, vm.terminated.count, vm.timed_waiting.count, vm.waiting.count, and vm.new.count.
vm.new.count The total number of JVM threads that have not begun execution yet.
vm.timed_waiting.count The total number of JVM threads that have suspended execution for a specific period. An exceptional rise in the value of this metric may lead to high memory utilization on the Jenkins instance.
vm.blocked.count A count of the blocked threads that are waiting to acquire a monitor lock. Ideally, the value of this metric shouldn’t fluctuate much over time.
vm.deadlocks The total number of threads that are in a deadlock with at least one other thread. Ideally, this metric should always report a value of 0; a rapid increase in its value is an immediate cause for concern.
vm.memory.heap.init The amount of heap memory, in bytes, that the JVM initially requested from the operating system.
vm.memory.heap.committed The amount of heap memory, in bytes, that the operating system has made available to the Jenkins JVM for object allocation. The desirable range for this metric’s value depends on your infrastructure and operational needs.
vm.memory.heap.max The maximum amount of heap memory, in bytes, that the JVM can obtain from the OS. If this value is greater than vm.memory.heap.committed, the OS may not grant the additional memory to the JVM.
vm.memory.heap.usage The ratio of vm.memory.heap.used to vm.memory.heap.max. It’s a great way to track heap usage over time.
vm.memory.non-heap.init The amount of non-heap memory, in bytes, that the JVM initially requested from the operating system.
vm.memory.non-heap.committed The amount of non-heap memory, in bytes, that the operating system guarantees to be available to the Jenkins JVM.
vm.memory.non-heap.max The maximum amount of non-heap memory, in bytes, that the JVM can request from the operating system. This memory is not guaranteed to be available to the JVM if it’s greater than the value of vm.memory.non-heap.committed.
vm.memory.total.committed The total amount of memory (heap and non-heap), in bytes, that the operating system has made available to the Jenkins JVM.
vm.memory.total.max The maximum amount of heap and non-heap memory, in bytes, that the JVM can obtain from the OS. If this value is greater than vm.memory.total.committed, a memory allocation request may fail.
vm.daemon.count The total number of JVM threads that have been marked as daemon threads. Daemon threads run in the background indefinitely.
vm.gc.X.count The total number of times the garbage collector “X” has run.
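
If you use the Metrics plugin mentioned above, all of these values are available as gauges in its JSON endpoint. A minimal sketch, with the same placeholder URL and access-key assumptions as before:

    import requests

    JENKINS_URL = "http://jenkins.example.com:8080"  # placeholder URL
    METRICS_KEY = "your-metrics-access-key"          # placeholder access key

    response = requests.get(f"{JENKINS_URL}/metrics/{METRICS_KEY}/metrics", timeout=30)
    response.raise_for_status()
    gauges = response.json()["gauges"]

    heap_usage = gauges["vm.memory.heap.usage"]["value"]  # ratio of used to max
    blocked = gauges["vm.blocked.count"]["value"]         # threads waiting on locks

    print(f"heap usage: {heap_usage:.0%}, blocked threads: {blocked}")
    if heap_usage > 0.9:
        # Assumed threshold; tune it to your own baseline.
        print("WARNING: heap usage above 90%")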

Web UI metrics

A lot of user interactions with the Jenkins server take place through the web UI. Tracking the following metrics will give you a clear idea of how the web UI is performing (a sketch for reading them follows the list):

http.requests The overall rate at which Jenkins is receiving requests, and the time taken to process them and generate responses.
http.activeRequests A count of the active requests that the Jenkins server is currently processing. This metric shouldn’t grow beyond the server’s typical request-processing capacity.
http.responseCodes.created The response rate of requests with HTTP/201 status codes.
http.responseCodes.ok The response rate of requests with HTTP/200 status codes.
http.responseCodes.badRequest The response rate of requests with HTTP/400 status codes. Track this metric to ensure that bad requests remain rare.
http.responseCodes.noContent The response rate of requests with HTTP/204 status codes.
http.responseCodes.forbidden The response rate of requests with HTTP/403 status codes. A growing value of this metric may indicate attempts to gain unauthorized access to the server.
http.responseCodes.notModified The response rate of requests with HTTP/304 status codes.
http.responseCodes.notFound The response rate of requests with HTTP/404 status codes. A rising value indicates that users are repeatedly requesting resources that don’t exist.
http.responseCodes.serverError The response rate of requests with HTTP/500 status codes. This metric indicates the number of server errors encountered while processing UI requests.
http.responseCodes.serviceUnavailable The response rate of requests with HTTP/503 status codes. A 503 status code indicates that the server isn’t ready to handle the request; a healthy instance would rarely return 503s, if at all.
http.responseCodes.other The rate at which the UI is responding with any other status code, i.e., a status code that’s not in this list: HTTP/200, HTTP/201, HTTP/204, HTTP/304, HTTP/400, HTTP/403, HTTP/404, HTTP/500, and HTTP/503.
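
The same JSON endpoint exposes these response-code metrics as meters, each carrying a total count and moving-average rates. Here is a sketch that compares the one-minute error rates against successful responses, under the same Metrics plugin and placeholder assumptions:

    import requests

    JENKINS_URL = "http://jenkins.example.com:8080"  # placeholder URL
    METRICS_KEY = "your-metrics-access-key"          # placeholder access key

    data = requests.get(f"{JENKINS_URL}/metrics/{METRICS_KEY}/metrics", timeout=30).json()
    meters = data["meters"]

    # Each meter exposes a total count and moving-average rates; m1_rate is
    # the one-minute moving average in events per second.
    ok = meters["http.responseCodes.ok"]["m1_rate"]
    server_error = meters["http.responseCodes.serverError"]["m1_rate"]
    unavailable = meters["http.responseCodes.serviceUnavailable"]["m1_rate"]

    print(f"200/s: {ok:.2f}, 500/s: {server_error:.2f}, 503/s: {unavailable:.2f}")
    if server_error + unavailable > 0:
        print("Investigate: the UI is returning server-side errors")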

Jenkins-specific metrics

In this section, we will look at Jenkins-specific metrics, which give us real-time insight into the performance of Jenkins jobs, executors, nodes, plugins, and queues; a sketch that combines several of them follows the list.

jenkins.executor.count.value The total number of Jenkins executors across all online nodes.
jenkins.executor.free.value The total number of Jenkins executors that are available for use. This metric’s desirable range will depend on your infrastructure and operational needs.
jenkins.executor.in-use.value The total number of Jenkins executors that are currently in use.
jenkins.job.blocked.duration The rate at which jobs are becoming blocked and the time they are spending in the blocked state. Strive to keep this metric’s value to a minimum.
jenkins.job.building.duration The rate at which jobs are being built and the time spent in their execution (building). An unreasonable build duration increase typically indicates something is wrong in the Jenkins ecosystem.
jenkins.job.queuing.duration The rate at which jobs are being queued and the time they spend waiting in the build queue. Ideally, jobs shouldn’t be queued for too long.
jenkins.job.buildable.duration The rate at which jobs from the build queue are assuming the buildable state and the time they spend in that state.
jenkins.job.waiting.duration The rate at which jobs are entering the quiet period and the time they spend in it. The quiet period is a configuration parameter that determines how long a Jenkins instance should wait before triggering a job.
jenkins.job.total.duration The rate at which jobs are entering the queue and the time it takes for them to reach completion.
jenkins.job.count.value The total number of jobs present in the Jenkins instance. The historical value of this metric can be retrieved from the jenkins.job.count.history metric.
jenkins.job.scheduled The rate at which jobs are being scheduled. If a job has already been queued and another request to schedule it is received, Jenkins will combine both requests. If you multiply this metric’s value by the average of jenkins.job.building.duration, you get an estimate of the number of executors required to service all build requests.
jenkins.node.count.value The total number of online and offline build nodes available to the Jenkins server. The historical value of this metric can be analyzed using the jenkins.node.count.history metric.
jenkins.node.offline.value The total number of build nodes that are currently offline. To view the historical stats of this metric, use the jenkins.node.offline.history metric.
jenkins.plugins.active The number of plugins that started successfully. Ideally, this count should be equal to the total number of plugins.
jenkins.plugins.failed The number of plugins that didn’t start successfully or malfunctioned. A value other than zero typically indicates that something is wrong with the Jenkins installation.
jenkins.plugins.withUpdate The number of plugins for which a newer version is available in the Jenkins update center. Strive to keep your plugins up to date to avoid potential bugs or vulnerabilities; ideally, this metric should return 0.
jenkins.queue.size.value The total number of jobs present in the Jenkins build queue. Historical values of this metric can be seen using jenkins.queue.size.history.
jenkins.queue.stuck.value The total number of jobs that are stuck in the build queue. View historical statistics of this metric using the jenkins.queue.stuck.history metric.
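
These metrics combine well. As a hedged sketch (same Metrics plugin endpoint and placeholders as before), the following reads queue and executor gauges and applies the executor estimate described under jenkins.job.scheduled: scheduling rate multiplied by mean build duration approximates the concurrency, and therefore the number of executors, needed to keep up.

    import requests

    JENKINS_URL = "http://jenkins.example.com:8080"  # placeholder URL
    METRICS_KEY = "your-metrics-access-key"          # placeholder access key

    data = requests.get(f"{JENKINS_URL}/metrics/{METRICS_KEY}/metrics", timeout=30).json()

    free = data["gauges"]["jenkins.executor.free.value"]["value"]
    stuck = data["gauges"]["jenkins.queue.stuck.value"]["value"]

    # Scheduling rate (jobs per second, five-minute average) multiplied by
    # mean build duration (in the timer's duration units, seconds by default)
    # estimates how many executors are needed to service all build requests.
    sched_rate = data["meters"]["jenkins.job.scheduled"]["m5_rate"]
    mean_build = data["timers"]["jenkins.job.building.duration"]["mean"]

    print(f"free executors: {free}, stuck queue items: {stuck}")
    print(f"estimated executors needed: {sched_rate * mean_build:.1f}")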

Monitoring Jenkins performance using plugins

Now that we have explored some of the key performance and health metrics of Jenkins, let’s look at two of the best monitoring plugins to measure and track these metrics.

Monitoring Jenkins with JavaMelody

JavaMelody is an open-source plugin for monitoring Java applications. For Jenkins, it’s packaged as the “Monitoring” plugin, which you can install from the plugin manager. Follow these steps to install and enable it:

  • From the Jenkins dashboard, click “Manage Jenkins”.
  • Scroll down to select “Manage Plugins”.
  • Select the “Available” tab.
  • Select the plugin called “Monitoring”.
  • Finally, install the plugin and restart your Jenkins server.

Once installed, you can access the monitoring dashboard by visiting http://jenkins-ip/monitoring. To view the report for all your nodes, visit http://jenkins-ip/monitoring/nodes. Across the different web pages of the monitoring dashboard, you can find various metrics, including threads, current HTTP requests, process list, HTTP sessions, build queue length, build time by period, memory and CPU usage charts, errors, logs, and all metrics exposed by MBeans.
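
JavaMelody also lets you pull values out programmatically rather than reading the dashboard by eye. The sketch below is an assumption-heavy example based on JavaMelody’s documented external API, where the part=lastValue parameter returns the latest value of a named graph as comma-separated numbers; verify the parameter and graph names against your plugin version, and note that the URL and credentials are placeholders:

    import requests

    JENKINS_URL = "http://jenkins.example.com:8080"  # placeholder URL
    USER, TOKEN = "admin", "your-api-token"          # placeholder credentials

    resp = requests.get(
        f"{JENKINS_URL}/monitoring",
        params={"part": "lastValue", "graph": "usedMemory,cpu"},
        auth=(USER, TOKEN),
        timeout=30,
    )
    resp.raise_for_status()
    # The endpoint returns the latest values as comma-separated numbers.
    used_memory, cpu = (float(v) for v in resp.text.split(","))
    print(f"used memory: {used_memory:.0f} bytes, cpu: {cpu:.1f}%")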

Monitoring Jenkins with Site24x7 monitoring plugin

Site24x7’s monitoring plugin offers granular visibility into the health and performance of each Jenkins instance. It has a web-based dashboard that displays various key metrics as graphs and charts. You can start monitoring in two simple steps: download the plugin from GitHub, and configure it as per your needs.

Some of the metrics you can track with the plugin are online nodes, disabled and enabled projects, used and free executors, stuck and blocked queues, active and inactive plugins, job schedule rate, job queuing duration, job waiting duration, JVM-related metrics, and web UI metrics.

Conclusion

Jenkins is the go-to tool for automating testing, analysis, and code deployment. It enables organizations to set up fully automated, multi-step CI/CD pipelines that enhance productivity and reduce time to production. Monitoring a Jenkins instance allows you to increase its performance, predict malfunctions, and avoid downtime.
