Kubernetes scheduler monitoring
The Kubernetes scheduler is responsible for placing pods onto the most suitable nodes based on resource availability, constraints, and scheduling policies. Monitoring the scheduler helps you troubleshoot pod placement delays, understand scheduling bottlenecks, detect preemption loops, and ensure overall cluster responsiveness.
Site24x7 automatically discovers the scheduler component in your Kubernetes control plane and collects key metrics across scheduling performance, queue latencies, goroutine activity, workqueue operations, and cache health.
Prerequisites
- Install Site24x7 Kubernetes agent version 22.1.00 or later.
- If you are an existing customer, upgrade your Kubernetes agent to version 22.1.00 or later.
The Site24x7 Kubernetes agent must be installed and running on the cluster before enabling scheduler monitoring.
View your Kubernetes scheduler monitor
As soon as you upgrade your agent, the Site24x7 Kubernetes monitoring agent will fetch all the scheduler monitoring metrics.
To view your Kubernetes scheduler monitor:
- Log in to your Site24x7 account.
- Navigate to K8s. Then, select the cluster name.
- Click Scheduler.
This opens the list of scheduler monitors in the selected cluster. Select a monitor to view its detailed insights.

Supported metrics
The following tables list all metrics collected from the Kubernetes scheduler.
Utilization
| Metric name | Description | Units |
|---|---|---|
| Goroutine Runnable Count | The total count of goroutines in the scheduler in a runnable state during the last poll period. | Count |
| Preemption Attempts | The total number of attempts the scheduler makes to preempt resources in order to satisfy the scheduling constraints during the last poll period. | Count |
| Scheduler Pending Pods | The number of pending pods across scheduling states such as ActiveQ, BackoffQ, Gated, Unschedulable, and Failed Scheduling, as captured at the last poll. | Count |
| Process Open File Descriptors | The number of file descriptors held open by the scheduler process at the last poll. | Count |
| Process CPU Time | The CPU time consumed by the scheduler process during the last poll period. | Seconds |
| Preemption Victim Attempts | The total number of preemption victim attempts made during the last poll period. | Count |
| Go Schedule Latency | ||
| Average Goroutine Scheduling Latency | The average time a goroutine spent in the Go scheduler in a runnable state before actually running during the last poll period. | Seconds |
| Total Goroutine Scheduling Latency | The total time goroutines spent in the Go scheduler in a runnable state before actually running during the last poll period. | Seconds |
| Scheduler Preemption Victims | ||
| Average Preemption Victims | The average number of pods being preempted per attempt during the last poll period. | Count |
| Total Preemption Victims | The total number of pods selected as preemption victims during the last poll period. Preemption terminates lower-priority pods so that higher-priority pods can be scheduled on nodes. | Count |
| Process Memory Usage | ||
| Process Resident Memory | The resident memory used by the scheduler process during the last poll period. | Bytes |
| Process Virtual Memory | The virtual memory used by the scheduler process during the last poll period. | Bytes |
| Go Usage | ||
| Go Threads | The number of OS threads created by the Go runtime of the scheduler process during the last poll period. | Count |
| Goroutines | The number of goroutines that existed in the scheduler process at the last poll. | Count |
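
Several metrics above come in average and total pairs. One plausible way to derive such a pair, assuming the underlying data is a cumulative Prometheus-style histogram that exposes `_sum` and `_count` series, is to divide the change in the sum by the change in the count between two polls. The sketch below is illustrative only: `HistogramSample` and `average_over_poll` are hypothetical names, and the calculation is an assumption rather than a description of how the Site24x7 agent computes these values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HistogramSample:
    """Cumulative histogram values captured at one poll (hypothetical type)."""
    total: float  # value of the <metric>_sum series
    count: float  # value of the <metric>_count series


def average_over_poll(previous: HistogramSample,
                      current: HistogramSample) -> Optional[float]:
    """Return the mean observation recorded between two polls.

    Returns None when nothing was observed in the interval or when the
    cumulative values went backwards (for example, after a process restart).
    """
    delta_count = current.count - previous.count
    delta_total = current.total - previous.total
    if delta_count <= 0 or delta_total < 0:
        return None
    return delta_total / delta_count


# Example: 3 preemption attempts selected 6 victims in total during the interval.
previous_poll = HistogramSample(total=12.0, count=4)
current_poll = HistogramSample(total=18.0, count=7)
print(average_over_poll(previous_poll, current_poll))  # 2.0
```

Returning `None` instead of `0` keeps "no activity in this interval" distinguishable from "an average of zero", which matters when charting sparse events such as preemption.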
Operations
| Metric name | Description | Units |
|---|---|---|
| Successfully Scheduled Pods | The total number of pods successfully scheduled during the last poll period. | Count |
| Cached Bound Pods | The number of assumed (bound) pods in the scheduler cache during the last poll period. | Count |
| Total Algorithm Executions | The total count of attempts received for the scheduling algorithm during the last poll period. | Count |
| Pod Scheduling Attempts | ||
| Average Successful Scheduling Attempts | The average number of successfully scheduled pods per attempt during the last poll period. | Count |
| Total Successful Scheduling Attempts | The total number of attempts made to successfully schedule pods during the last poll period. | Count |
| Scheduling Attempts | ||
| Scheduler Error Attempts | The total number of scheduling attempts that resulted in Error during the last poll period. | Count |
| Successful Schedule Attempts | The total number of scheduling attempts that resulted in Scheduled during the last poll period. | Count |
| Scheduler Unscheduled Attempts | The total number of scheduling attempts that resulted in Unscheduled during the last poll period. | Count |
| Scheduler Cached Resources | ||
| Cached Nodes | The number of nodes in the scheduler cache during the last poll period. | Count |
| Cached Pods | The number of pods in the scheduler cache during the last poll period. | Count |
| Scheduling Algorithm Duration | ||
| Average Scheduling Algorithm Duration | The average time taken by the scheduling algorithm per request during the last poll period. | Seconds |
| Total Scheduling Algorithm Duration | The total time spent in the scheduling algorithm during the last poll period. | Seconds |
| Average Scheduling Attempt Duration | ||
| Average Scheduling Duration for Errors | The average time taken by scheduling attempts that resulted in Error during the last poll period. | Seconds |
| Average Scheduling Duration for Success | The average time taken by scheduling attempts that resulted in Scheduled during the last poll period. | Seconds |
| Average Scheduling Duration for Unscheduled Attempts | The average time taken by scheduling attempts that resulted in Unscheduled during the last poll period. | Seconds |
| Total Scheduling Attempt Duration | ||
| Total Scheduling Duration for Errors | The total time spent on scheduling and binding that resulted in an error state during the last poll period. | Seconds |
| Total Scheduling Duration for Success | The total time spent on scheduling and binding that resulted in a scheduled state during the last poll period. | Seconds |
| Total Scheduling Duration for Unscheduled Attempts | The total time spent on scheduling and binding that resulted in an unscheduled state during the last poll period. | Seconds |
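
The attempt counts above are reported per poll period, whereas Prometheus-style counters are cumulative. A minimal sketch of turning two cumulative readings into a per-poll increase, including the case where a counter resets because the scheduler restarted, might look like the following. The function name `increase_per_result` is hypothetical, and the result keys `scheduled`, `unschedulable`, and `error` are assumed to follow the `result` label values of the upstream kube-scheduler metrics; this is not a statement of how the Site24x7 agent works internally.

```python
from typing import Dict


def increase_per_result(previous: Dict[str, float],
                        current: Dict[str, float]) -> Dict[str, float]:
    """Return how much each cumulative counter grew between two polls."""
    increases = {}
    for result, value in current.items():
        before = previous.get(result, 0.0)
        # A counter that went backwards was reset, so count from zero.
        increases[result] = value if value < before else value - before
    return increases


previous_poll = {"scheduled": 120.0, "unschedulable": 7.0, "error": 1.0}
current_poll = {"scheduled": 150.0, "unschedulable": 9.0, "error": 1.0}
print(increase_per_result(previous_poll, current_poll))
# {'scheduled': 30.0, 'unschedulable': 2.0, 'error': 0.0}
```

A steadily rising `unschedulable` or `error` increase alongside a flat `scheduled` increase is usually a more useful signal than any single cumulative value.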
Scheduling Flow
| Metric name | Description | Units |
|---|---|---|
| Queue Additions by Event Name | The total number of pods added to scheduling queues grouped by event name during the last poll period. | Count |
| Queue Additions by Queue Type | The total number of pods added to scheduling queues grouped by queue type during the last poll period. | Count |
| Average Pod Scheduling Duration by Attempt | The average time taken for a pod to be scheduled, grouped by the number of attempts, during the last poll period. | Seconds |
| Successful Schedule Attempts | The total number of pods scheduled, grouped by the number of attempts, during the last poll period. | Count |
| Goroutines by Operation | The total number of running goroutines, grouped by the operation they perform, during the last poll period. | Count |
| Unschedulable Pods by Plugin | The total number of unschedulable pods grouped by plugin name during the last poll period. | Count |
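
Most metrics in this table are grouped by a label such as event name, queue type, or plugin. To illustrate what that grouping means, the sketch below parses a few lines in the Prometheus text exposition format and sums the samples by one label. It assumes the upstream kube-scheduler metric `scheduler_queue_incoming_pods_total` with `event` and `queue` labels; the sample values are invented for the example, `sum_by_label` is a hypothetical helper, and the simplified parser is not meant to cover every corner of the exposition format.

```python
import re
from collections import defaultdict
from typing import Dict

# Simplified patterns for "name{labels} value" lines in the text format.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Illustrative sample values only, not real cluster output.
SAMPLE_TEXT = """
# TYPE scheduler_queue_incoming_pods_total counter
scheduler_queue_incoming_pods_total{event="PodAdd",queue="active"} 40
scheduler_queue_incoming_pods_total{event="ScheduleAttemptFailure",queue="unschedulable"} 5
scheduler_queue_incoming_pods_total{event="ScheduleAttemptFailure",queue="backoff"} 3
"""


def sum_by_label(exposition: str, metric: str, label: str) -> Dict[str, float]:
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)


print(sum_by_label(SAMPLE_TEXT, "scheduler_queue_incoming_pods_total", "event"))
# {'PodAdd': 40.0, 'ScheduleAttemptFailure': 8.0}
print(sum_by_label(SAMPLE_TEXT, "scheduler_queue_incoming_pods_total", "queue"))
# {'active': 40.0, 'unschedulable': 5.0, 'backoff': 3.0}
```

The same samples produce both the by-event and the by-queue views; only the grouping label changes.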
Work Queue
| Metric name | Description | Units |
|---|---|---|
| Total Workqueue Adds | The total number of add events handled by the workqueue grouped by action name during the last poll period. | Count |
| Workqueue Depth | The number of actions or tasks waiting in the workqueue to be processed during the last poll period. | Count |
| Average Workqueue Queue Duration | The average time an item remained in the workqueue before being requested during the last poll period. | Seconds |
| Average Workqueue Work Duration | The average time taken to process an item from the workqueue during the last poll period. | Seconds |
| Workqueue Retries | The total number of retries handled by the workqueue grouped by name during the last poll period. | Count |
| Workqueue Unfinished Work Duration | The total number of seconds of in-progress work not yet captured by work duration. High values may indicate stuck threads. | Seconds |
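
Because a high Workqueue Unfinished Work Duration may indicate stuck threads, it can help to look at the trend over several polls rather than a single reading. The following is a minimal sketch of such a check. The function `looks_stuck`, the 30-second threshold, and the three-poll window are all hypothetical choices for illustration and are not Site24x7 defaults.

```python
from typing import Sequence


def looks_stuck(unfinished_seconds: Sequence[float],
                threshold_seconds: float,
                min_polls: int = 3) -> bool:
    """Return True when the most recent `min_polls` readings all exceed the
    threshold and never decrease, which suggests work is not completing."""
    if min_polls < 1 or len(unfinished_seconds) < min_polls:
        return False
    recent = list(unfinished_seconds[-min_polls:])
    above = all(value > threshold_seconds for value in recent)
    non_decreasing = all(later >= earlier
                         for earlier, later in zip(recent, recent[1:]))
    return above and non_decreasing


# Readings that keep growing past the threshold are flagged.
print(looks_stuck([0.1, 35.0, 65.0, 95.0], threshold_seconds=30.0))  # True
# A reading that drops back below the threshold is not.
print(looks_stuck([0.1, 35.0, 2.0, 40.0], threshold_seconds=30.0))   # False
```

Requiring several consecutive readings avoids flagging a single long-running item that completes normally before the next poll.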
