
Kubernetes scheduler monitoring

The Kubernetes scheduler is responsible for placing pods onto the most suitable nodes based on resource availability, constraints, and scheduling policies. Monitoring the scheduler helps you troubleshoot pod placement delays, understand scheduling bottlenecks, detect preemption loops, and ensure overall cluster responsiveness.

Site24x7 automatically discovers the scheduler component in your Kubernetes control plane and collects key metrics across scheduling performance, queue latencies, goroutine activity, workqueue operations, and cache health.

Prerequisites

Note

The Site24x7 Kubernetes agent must be installed and running on the cluster before enabling scheduler monitoring.

View your Kubernetes scheduler monitor

After you upgrade the Site24x7 Kubernetes monitoring agent, it automatically starts collecting all scheduler metrics.

To view your Kubernetes scheduler monitor:

  1. Log in to your Site24x7 account.
  2. Navigate to K8s, then select your cluster.
  3. Click Scheduler.

This opens the list of scheduler monitors in the selected cluster. Select a monitor to view its detailed metrics.

Supported metrics

The following tables list all metrics collected from the Kubernetes scheduler.

Utilization

Metric name | Description | Units
Goroutine Runnable Count | The total number of scheduler goroutines in a runnable state during the last poll period. | Count
Preemption Attempts | The total number of preemption attempts the scheduler made to satisfy scheduling constraints during the last poll period. | Count
Scheduler Pending Pods | The number of pending pods in each scheduling state, such as ActiveQ, BackoffQ, Gated, Unschedulable, and Failed Scheduling, captured at the last poll. | Count
Process Open File Descriptors | The number of file descriptors held open by the scheduler process during the last poll period. | Count
Process CPU Time | The CPU time consumed by the scheduler process during the last poll period. | Seconds
Preemption Victim Attempts | The total number of attempts to select preemption victims made during the last poll period. | Count
Go Schedule Latency
Average Goroutine Scheduling Latency | The average time scheduler goroutines spent in a runnable state before actually running, during the last poll period. | Seconds
Total Goroutine Scheduling Latency | The total time scheduler goroutines spent in a runnable state before actually running, during the last poll period. | Seconds
Scheduler Preemption Victims
Average Preemption Victims | The average number of pods preempted per preemption attempt during the last poll period. | Count
Total Preemption Victims | The total number of pods selected as preemption victims during the last poll period. Preemption is the process of terminating lower-priority pods so that higher-priority pods can be scheduled on nodes. | Count
Process Memory Usage
Process Resident Memory | The resident memory size, in bytes, used by the scheduler process during the last poll period. | Bytes
Process Virtual Memory | The virtual memory size, in bytes, used by the scheduler process during the last poll period. | Bytes
Go Usage
Go Threads | The number of OS threads created by the Go runtime of the scheduler process during the last poll period. | Count
Goroutines | The number of goroutines in the scheduler process during the last poll period. | Count
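Average metrics such as Average Goroutine Scheduling Latency and Average Preemption Victims are per-observation averages over the poll period. The following is a minimal sketch of how such an average could be derived, assuming the underlying values are cumulative histogram sum and count counters sampled at the start and end of the poll period. The `HistogramSnapshot` type and `poll_average` function are hypothetical illustrations, not the agent's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HistogramSnapshot:
    """Hypothetical snapshot of a cumulative histogram's sum and count."""
    total: float  # cumulative sum of all observed values (e.g. seconds or victims)
    count: int    # cumulative number of observations


def poll_average(prev: HistogramSnapshot, curr: HistogramSnapshot) -> Optional[float]:
    """Average value per observation made between two polls.

    Returns None when nothing was observed in the interval or when the
    counters went backwards (for example, after a scheduler restart).
    """
    delta_count = curr.count - prev.count
    delta_total = curr.total - prev.total
    if delta_count <= 0 or delta_total < 0:
        return None
    return delta_total / delta_count


# Example: 12 preemption attempts selected 30 victims in total during the poll period.
prev = HistogramSnapshot(total=100.0, count=40)
curr = HistogramSnapshot(total=130.0, count=52)
print(poll_average(prev, curr))  # 2.5
```

Under the same assumption, the matching Total metric (for example, Total Preemption Victims) would simply be `curr.total - prev.total`.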

Operations

Metric name | Description | Units
Successfully Scheduled Pods | The total number of pods successfully scheduled during the last poll period. | Count
Cached Bound Pods | The number of assumed (bound) pods in the scheduler cache during the last poll period. | Count
Total Algorithm Executions | The total number of scheduling algorithm executions during the last poll period. | Count
Pod Scheduling Attempts
Average Successful Scheduling Attempts | The average number of scheduling attempts per successfully scheduled pod during the last poll period. | Count
Total Successful Scheduling Attempts | The total number of attempts made to successfully schedule pods during the last poll period. | Count
Scheduling Attempts
Scheduler Error Attempts | The total number of scheduling attempts that resulted in an error during the last poll period. | Count
Successful Schedule Attempts | The total number of scheduling attempts that resulted in the pod being scheduled during the last poll period. | Count
Scheduler Unscheduled Attempts | The total number of scheduling attempts that left the pod unscheduled during the last poll period. | Count
Scheduler Cached Resources
Cached Nodes | The number of nodes in the scheduler cache during the last poll period. | Count
Cached Pods | The number of pods in the scheduler cache during the last poll period. | Count
Scheduling Algorithm Duration
Average Scheduling Algorithm Duration | The average time the scheduling algorithm took per execution during the last poll period. | Seconds
Total Scheduling Algorithm Duration | The total time spent executing the scheduling algorithm during the last poll period. | Seconds
Average Scheduling Attempt Duration
Average Scheduling Duration for Errors | The average duration of scheduling attempts that resulted in an error during the last poll period. | Seconds
Average Scheduling Duration for Success | The average duration of scheduling attempts that resulted in the pod being scheduled during the last poll period. | Seconds
Average Scheduling Duration for Unscheduled Attempts | The average duration of scheduling attempts that left the pod unscheduled during the last poll period. | Seconds
Total Scheduling Attempt Duration
Total Scheduling Duration for Errors | The total time spent on scheduling and binding that resulted in an error during the last poll period. | Seconds
Total Scheduling Duration for Success | The total time spent on scheduling and binding that resulted in the pod being scheduled during the last poll period. | Seconds
Total Scheduling Duration for Unscheduled Attempts | The total time spent on scheduling and binding that left the pod unscheduled during the last poll period. | Seconds
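Metrics such as Scheduler Error Attempts, Successful Schedule Attempts, and Scheduler Unscheduled Attempts report counts for the last poll period, whereas the scheduler itself typically exposes ever-increasing counters. The sketch below converts cumulative per-result counters into per-poll counts. It assumes counters keyed by a result label (`scheduled`, `unschedulable`, `error`, as used by kube-scheduler's `scheduler_schedule_attempts_total`) and treats a decrease as a counter reset after a restart; whether the agent works this way is an assumption, and `poll_deltas` is a hypothetical helper.

```python
from typing import Dict


def poll_deltas(prev: Dict[str, float], curr: Dict[str, float]) -> Dict[str, float]:
    """Per-poll increase of cumulative counters keyed by result label.

    If a counter is lower than in the previous poll, the process is assumed
    to have restarted, so the current value is used as the increase.
    A result seen for the first time is treated as starting from zero.
    """
    deltas: Dict[str, float] = {}
    for result, value in curr.items():
        previous = prev.get(result, 0)
        deltas[result] = value - previous if value >= previous else value
    return deltas


prev = {"scheduled": 1200, "unschedulable": 40, "error": 2}
curr = {"scheduled": 1250, "unschedulable": 43, "error": 2}
print(poll_deltas(prev, curr))  # {'scheduled': 50, 'unschedulable': 3, 'error': 0}
```

Handling resets explicitly matters here because a scheduler restart or leader change would otherwise show up as a large negative count for the poll period.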

Scheduling Flow

Metric name | Description | Units
Queue Additions by Event Name | The total number of pods added to the scheduling queues during the last poll period, grouped by event name. | Count
Queue Additions by Queue Type | The total number of pods added to the scheduling queues during the last poll period, grouped by queue type. | Count
Average Pod Scheduling Duration by Attempt | The average time taken to schedule a pod during the last poll period, grouped by the number of attempts it took. | Seconds
Successful Schedule Attempts | The total number of pods scheduled during the last poll period, grouped by the number of attempts it took. | Count
Goroutines by Operation | The total number of running goroutines during the last poll period, grouped by the operation (work) they perform. | Count
Unschedulable Pods by Plugin | The total number of unschedulable pods during the last poll period, grouped by plugin name. | Count
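Queue Additions by Event Name and Queue Additions by Queue Type can both be read from a single counter that carries several labels. The sketch below assumes each series is identified by its label set (as with kube-scheduler's `scheduler_queue_incoming_pods_total`, which has `queue` and `event` labels) and that per-poll increases were already computed; summing over the other labels then yields the per-event and per-queue views. The helper names and sample values are hypothetical and for illustration only.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Tuple

# Each series is identified by its label set, stored as a frozenset of (name, value) pairs.
Series = FrozenSet[Tuple[str, str]]


def series(**labels: str) -> Series:
    """Build a hashable label set for one series."""
    return frozenset(labels.items())


def group_by_label(increases: Dict[Series, float], label: str) -> Dict[str, float]:
    """Sum per-poll increases across all series that share the same value for `label`."""
    grouped: Dict[str, float] = defaultdict(float)
    for labelset, value in increases.items():
        grouped[dict(labelset).get(label, "")] += value
    return dict(grouped)


increases = {
    series(queue="active", event="PodAdd"): 8,
    series(queue="backoff", event="ScheduleAttemptFailure"): 3,
    series(queue="unschedulable", event="ScheduleAttemptFailure"): 2,
}
print(group_by_label(increases, "event"))  # {'PodAdd': 8.0, 'ScheduleAttemptFailure': 5.0}
print(group_by_label(increases, "queue"))  # {'active': 8.0, 'backoff': 3.0, 'unschedulable': 2.0}
```

The same grouping applies to other labeled metrics in this table, such as Goroutines by Operation or Unschedulable Pods by Plugin.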

Work Queue

Metric name | Description | Units
Total Workqueue Adds | The total number of add events handled by the workqueue during the last poll period, grouped by action name. | Count
Workqueue Depth | The number of actions or tasks waiting in the workqueue to be processed during the last poll period. | Count
Average Workqueue Queue Duration | The average time an item stayed in the workqueue before being requested during the last poll period. | Seconds
Average Workqueue Work Duration | The average time taken to process an item from the workqueue during the last poll period. | Seconds
Workqueue Retries | The total number of retries handled by the workqueue during the last poll period, grouped by name. | Count
Workqueue Unfinished Work Duration | The total number of seconds of in-progress work that has not yet been observed by the work duration metric. High values may indicate stuck threads. | Seconds
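Because high Workqueue Unfinished Work Duration values may indicate stuck threads, one way to spot them is to look for values that keep rising across consecutive polls rather than returning toward zero. The sketch below is a hypothetical heuristic, not a Site24x7 feature: it assumes you have a short history of this metric per workqueue name, and the window size `polls` is an arbitrary choice.

```python
from typing import Dict, List


def possibly_stuck(history: Dict[str, List[float]], polls: int = 3) -> List[str]:
    """Names of workqueues whose unfinished work seconds rose on each of the last `polls` polls.

    `history` maps a workqueue name to its Workqueue Unfinished Work Duration
    values, oldest first. `polls` is an arbitrary, hypothetical window size.
    """
    if polls < 1:
        raise ValueError("polls must be at least 1")
    flagged: List[str] = []
    for name, values in history.items():
        recent = values[-(polls + 1):]
        # Require a full window and a strict increase between every pair of polls.
        if len(recent) == polls + 1 and all(b > a for a, b in zip(recent, recent[1:])):
            flagged.append(name)
    return flagged


history = {
    "queue-a": [0.0, 0.0, 0.2, 0.0],     # work finishes between polls
    "queue-b": [5.0, 35.0, 65.0, 95.0],  # in-progress work keeps accumulating
}
print(possibly_stuck(history))  # ['queue-b']
```

A steadily growing value is only a signal to investigate; correlating it with Workqueue Depth and Average Workqueue Work Duration helps distinguish a stuck worker from a burst of slow but progressing work.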
