Deciphering Kubernetes autoscaling issues before they escalate


Kubernetes autoscaling is critical for maintaining app performance and managing infrastructure costs. It ensures that workloads receive the necessary resources based on demand fluctuations. Yet, misconfigurations and blind spots in your Kubernetes cluster often result in performance hiccups and unnecessary cloud spend.

Overspending is largely driven by resource-heavy workloads and inconsistent usage patterns. To keep your clusters lean and efficient, it's essential to address autoscaling challenges proactively, and this is where Kubernetes monitoring can make a difference.

What is autoscaling in Kubernetes?

Autoscaling in Kubernetes refers to the dynamic adjustment of compute resources—like pods or nodes—based on real-time demand. Instead of provisioning resources manually, Kubernetes automatically scales workloads up or down to ensure optimal performance and resource efficiency. This means your applications can handle traffic spikes smoothly and scale back during idle periods, reducing costs and preventing resource waste.

Behind the scenes: Kubernetes autoscaling mechanisms

Kubernetes provides three core autoscaling mechanisms:

  • Horizontal Pod Autoscaler (HPA): Adjusts the number of pod replicas based on CPU, memory, or external metrics.
  • Vertical Pod Autoscaler (VPA): Dynamically adjusts resource requests and limits (i.e., for CPU or memory) for each pod.
  • Cluster Autoscaler (CA): Automatically adds or removes worker nodes depending on whether pods can be scheduled or not.
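For reference, a minimal HPA manifest using the `autoscaling/v2` API might look like the following; the deployment name `web` and the 70% CPU target are placeholder values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # placeholder deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU utilization exceeds 70%
```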

Despite these built-in features, autoscaling can go wrong—often due to improper setups or monitoring gaps. In fact, over half of Kubernetes users report concerns over misconfigurations impacting scaling effectiveness.

Real-world autoscaling hurdles and how to fix them

Problem 1: Pods don’t scale during resource spikes

The issue:

Despite a sudden increase in demand—such as a surge in user traffic or processing workload—your application fails to scale out as expected. Kubernetes doesn’t automatically spin up additional pods to handle the load, even though resource usage (CPU or memory) is clearly above threshold levels.

Symptoms:

  • Consistently high CPU or memory usage, yet no increase in pod replicas
  • Slow API or service responses during traffic surges, especially during peak business hours
  • Increased error rates or timeouts due to overwhelmed pods
  • No events logged for HPA activity when spikes occur
  • User complaints about sluggish performance during known high-demand windows

Likely causes:

  • HPA thresholds are off.
  • Metrics aren’t collected properly.
  • Resource quotas are too tight.

Steps to fix it:

kubectl describe hpa <hpa-name>

Look for Current CPU utilization versus Target.

Check if it's trying to scale and hitting limits (minimum or maximum pods).

kubectl get apiservice | grep metrics.k8s.io

Ensure the metrics.k8s.io API service is available.

The AVAILABLE column should show True.

kubectl get deployment metrics-server -n kube-system
kubectl describe deployment metrics-server -n kube-system
kubectl logs -n kube-system deployment/metrics-server

Check for readiness, restarts, and any log errors. If the metrics server is missing, install it using:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Then check the resource quotas in the namespace:

kubectl describe resourcequota

Look for CPU or memory limits that could block scaling.
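As an illustration, a quota like the sketch below caps the namespace at a pod count and CPU total that new replicas would exceed, silently blocking HPA scale-out; the name and numbers here are hypothetical:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota      # hypothetical quota name
spec:
  hard:
    pods: "10"             # the HPA cannot scale past 10 pods in this namespace
    requests.cpu: "4"      # total CPU requests capped at 4 cores
    requests.memory: 8Gi   # total memory requests capped at 8 GiB
```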

How Site24x7 helps:

With detailed HPA insights and real-time metric tracking, Site24x7 makes it easy to spot missing or misconfigured settings, improving response time by up to 40%.


Problem 2: Inefficient vertical scaling

The issue:

Pods are either starved of resources or overprovisioned. VPA settings are not aligned with actual usage patterns, leading to frequent out-of-memory (OOM) errors or unnecessary resource allocation.

Symptoms:

  • Pods frequently crash or restart due to OOM errors
  • Low resource utilization despite high CPU or memory requests being allocated
  • Inconsistent application performance, especially during moderate-to-high load periods
  • VPA recommendations not being applied, or a VPA running in Off (recommendation-only) mode with no effect
  • Higher cloud costs due to overprovisioned memory and CPU requests

Likely causes:

  • A VPA is inactive or misconfigured.
  • CPU or memory requests aren't aligned with actual usage.

Steps to fix:

kubectl get vpa 

Check the VPA object to see its recommendations for CPU and memory:

kubectl describe vpa <vpa-name> 

This command displays the current resource recommendations for the target deployment, including CPU and memory requests.
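If the recommendations look sound but aren't being applied, check the VPA's update mode. A sketch of a VPA manifest that actively applies recommendations, with placeholder names and hypothetical bounds:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa            # placeholder name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # placeholder deployment
  updatePolicy:
    updateMode: "Auto"     # apply recommendations by evicting and recreating pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:        # floor to avoid starving pods
          cpu: 100m
          memory: 128Mi
        maxAllowed:        # ceiling to avoid overprovisioning
          cpu: "2"
          memory: 2Gi
```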

How Site24x7 helps:

Site24x7 surfaces pod-level trends that help right-size your workloads, reducing OOM errors by 60% and improving cost efficiency.


Problem 3: CA fails to add nodes

The issue:

When workloads outgrow existing node capacity, Kubernetes doesn’t add new nodes as expected. A CA may be blocked due to configuration issues, restrictive policies, or cloud provider limits.

Symptoms:

  • Pods stuck in the Pending state with reasons like Insufficient cpu or Insufficient memory
  • No new nodes added, even though there’s clearly a need for more capacity
  • CA logs show scaling attempts blocked due to PodDisruptionBudget (PDB), node taints, or max node limits
  • Increased latency or service unavailability during traffic peaks
  • Resource utilization on existing nodes maxed out, while new workloads can’t be scheduled

Likely causes:

  • There are restrictive PDBs.
  • Node pool limits have been reached.
  • There are cloud provider constraints.

Steps to fix:

kubectl logs -n kube-system deployment/cluster-autoscaler

Look for messages indicating issues like insufficient resources or failed scaling attempts.

kubectl get pods --field-selector=status.phase=Pending -n <your-namespace>

If there are pending pods, investigate their resource requests and constraints.

Check the PDBs in your cluster to ensure they are not too restrictive.

kubectl get pdb --all-namespaces

If necessary, adjust the maxUnavailable field to allow for more flexibility during scaling operations.
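A PDB that permits at least one voluntary disruption gives the autoscaler room to drain and rebalance nodes. A sketch with placeholder names and labels:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # placeholder name
spec:
  maxUnavailable: 1        # allow one pod to be evicted at a time during node drains
  selector:
    matchLabels:
      app: web             # placeholder label selector
```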

Check your cloud provider's console for any errors related to node provisioning or resource limits.

  • Google Cloud (GKE): Check the Kubernetes Engine page for any scaling issues.
  • AWS (EKS): Review the EC2 Auto Scaling Group settings.
  • Azure (AKS): Examine the Virtual Machine Scale Set configurations.

Ensure that your cloud provider's quotas for resources like CPU, memory, and instances are not exceeded.

How Site24x7 helps:

Monitor node status, pending pods, and autoscaler behavior all in one place. Site24x7 also alerts you when node provisioning is blocked due to policy limits.


Problem 4: Conflicts between an HPA and VPA

The issue:

When both an HPA and a VPA target the same deployment, conflicting recommendations lead to unstable scaling behavior and frequent pod disruptions.

Symptoms:

  • Pods restarting unexpectedly due to rapid resource reconfiguration
  • Scaling loops, where pod counts and resource requests keep fluctuating
  • Erratic performance metrics with no clear trend
  • Unstable deployment behavior, especially after introducing both autoscalers
  • Logs showing conflicting decisions or errors related to resource recommendations

Likely causes:

  • Both an HPA and a VPA are managing the same metrics.
  • There are overlapping metric targets or unclear scaling triggers.

Steps to fix:

  • Avoid running an HPA and a VPA in Auto mode on the same deployment unless you have carefully configured them to work together (for example, with the HPA scaling on custom or external metrics).
  • A VPA in Auto mode can conflict with an HPA that scales on the same CPU or memory metrics, leading to unpredictable scaling behavior. Consider using the VPA in Initial or Off mode when the HPA is enabled.
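A common pattern is to keep the VPA in recommendation-only mode alongside an HPA, so you still get sizing guidance without competing updates. A sketch, with placeholder names:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa            # placeholder name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # placeholder deployment; the HPA targets this same deployment
  updatePolicy:
    updateMode: "Off"      # compute recommendations only; never evict or resize pods
```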

How Site24x7 helps:

Visualize autoscaler policies across your deployments and detect misalignments before they cause chaos. Site24x7 helps teams improve deployment stability by up to 50%.


Smarter scaling starts with observability

Kubernetes autoscaling isn’t just about automation—it’s about control. Organizations using advanced observability tools like Site24x7 have seen a considerable drop in scaling-related outages.

Here’s what you can expect from Site24x7's Kubernetes monitoring:

  • End-to-end visibility into pod, node, and autoscaler behavior
  • Intelligent alerts for misconfigurations and scaling bottlenecks
  • Resource usage trends to support HPA and VPA tuning
  • Cluster-wide health dashboards and reports

Final word: Let Site24x7 simplify your scaling strategy

Autoscaling works best when paired with continuous monitoring. By integrating Site24x7 into your Kubernetes workflows, you gain full control over your autoscalers, ensure high availability, and keep your cloud costs predictable.
