Scenario #463
Scaling & Load
Kubernetes v1.24, Google Cloud

Insufficient Node Pools During Sudden Pod Scaling

Insufficient node pool capacity caused pod scheduling failures during sudden scaling events.

What Happened

During a sudden traffic surge, the Horizontal Pod Autoscaler (HPA) increased the number of pod replicas, but the cluster did not have enough node capacity to schedule the new pods.

Diagnosis Steps
  1. Checked the available resources on the nodes and found that the node pools had too little free capacity to accommodate the newly scaled pods.
  2. Cluster logs revealed that the cluster autoscaler did not add nodes promptly.
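The capacity check in step 1 can be approximated with simple arithmetic. The following is a minimal sketch, not the tooling used in this incident: the `Node` fields, the node figures, and the pod requests are hypothetical values that would normally be read from the cluster. It assumes identical pods and counts only CPU and memory requests, ignoring taints, affinity, and per-node pod limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical snapshot of one node's CPU (millicores) and memory (Mi)."""
    name: str
    allocatable_cpu_m: int
    allocatable_mem_mi: int
    requested_cpu_m: int
    requested_mem_mi: int


def pods_that_fit(node: Node, pod_cpu_m: int, pod_mem_mi: int) -> int:
    """How many more identical pods fit on this node, limited by the scarcer resource."""
    free_cpu = max(node.allocatable_cpu_m - node.requested_cpu_m, 0)
    free_mem = max(node.allocatable_mem_mi - node.requested_mem_mi, 0)
    return min(free_cpu // pod_cpu_m, free_mem // pod_mem_mi)


def unschedulable_pods(nodes: List[Node], new_pods: int,
                       pod_cpu_m: int, pod_mem_mi: int) -> int:
    """Number of new pods that cannot be placed on the existing nodes."""
    capacity = sum(pods_that_fit(n, pod_cpu_m, pod_mem_mi) for n in nodes)
    return max(new_pods - capacity, 0)


# Illustrative numbers only (assumed, not taken from the incident).
nodes = [
    Node("node-a", 4000, 16000, 3000, 8000),
    Node("node-b", 4000, 16000, 3800, 4000),
]
print(unschedulable_pods(nodes, new_pods=10, pod_cpu_m=500, pod_mem_mi=512))  # 8
```

A non-zero result means the HPA's new replicas will wait until the node pool grows.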
Root Cause

Node pool capacity was insufficient for the scaled-up workload, and the cluster autoscaler did not add nodes quickly enough.

Fix/Workaround
• Expanded node pool size to accommodate more pods.
• Adjusted autoscaling policies to trigger faster node provisioning during scaling events.
Lessons Learned

Autoscaling node pools must be able to respond quickly during sudden traffic surges.

How to Avoid
  • Pre-configure node pools to handle expected traffic growth, and ensure autoscalers are tuned to scale quickly.
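One way to reason about pre-configuring a node pool is to derive its minimum and maximum node counts from the expected peak replica count and the HPA's replica ceiling. The sketch below is an illustration under stated assumptions rather than a recommended formula: the function names, the 25% headroom, and all resource figures are hypothetical, pods are assumed identical, and system or DaemonSet overhead is assumed to be already subtracted from the per-node allocatable values.

```python
import math
from typing import Tuple


def nodes_needed(replicas: int, pod_cpu_m: int, pod_mem_mi: int,
                 node_cpu_m: int, node_mem_mi: int) -> int:
    """Nodes required to hold `replicas` identical pods (CPU in millicores, memory in Mi)."""
    pods_per_node = min(node_cpu_m // pod_cpu_m, node_mem_mi // pod_mem_mi)
    if pods_per_node == 0:
        raise ValueError("a single pod does not fit on one node")
    return math.ceil(replicas / pods_per_node)


def pool_bounds(expected_peak_replicas: int, hpa_max_replicas: int,
                pod_cpu_m: int, pod_mem_mi: int,
                node_cpu_m: int, node_mem_mi: int,
                headroom_pct: int = 25) -> Tuple[int, int]:
    """Return (min_nodes, max_nodes) for the node pool.

    min_nodes covers the expected peak plus headroom, so a surge up to that
    level does not wait for new nodes; max_nodes covers the HPA ceiling.
    """
    # Integer ceiling of expected_peak_replicas * (1 + headroom_pct / 100).
    padded = -(-expected_peak_replicas * (100 + headroom_pct) // 100)
    min_nodes = nodes_needed(padded, pod_cpu_m, pod_mem_mi, node_cpu_m, node_mem_mi)
    max_nodes = nodes_needed(hpa_max_replicas, pod_cpu_m, pod_mem_mi,
                             node_cpu_m, node_mem_mi)
    return min_nodes, max(min_nodes, max_nodes)


# Illustrative numbers only (assumed): 7 pods fit per node, limited by CPU.
print(pool_bounds(expected_peak_replicas=40, hpa_max_replicas=100,
                  pod_cpu_m=500, pod_mem_mi=1024,
                  node_cpu_m=3900, node_mem_mi=14000))  # (8, 15)
```

Keeping the minimum above the bare expected peak trades some idle capacity for faster scheduling during a surge. The maximum only guarantees that the pool is allowed to grow far enough to hold every replica the HPA can request.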