Scenario #463
Scaling & Load
Kubernetes v1.24, Google Cloud
Insufficient Node Pools During Sudden Pod Scaling
Insufficient node pool capacity caused pod scheduling failures during sudden scaling events.
What Happened
During a sudden traffic surge, the Horizontal Pod Autoscaler (HPA) scaled out the workload, but there weren't enough nodes available in the cluster to schedule the new pods.
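One quick way to confirm this state is to compare the HPA's desired replica count with what is actually running; a minimal sketch, assuming a workload named web-frontend (a hypothetical name):

    # Desired vs. current replicas reported by the HPA
    kubectl get hpa web-frontend

    # Replicas that could not be placed on any node stay Pending
    kubectl get pods -l app=web-frontend --field-selector=status.phase=Pending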
Diagnosis Steps
1. Checked the available resources on the nodes and found that the node pools were insufficient to accommodate the newly scaled pods.
2. Cluster logs revealed the autoscaler did not add more nodes promptly (see the commands after this list).
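The steps above map roughly to commands like the following; the pod and namespace names are placeholders:

    # Find pods that failed to schedule and read the scheduler's reason
    kubectl get pods -A --field-selector=status.phase=Pending
    kubectl describe pod <pending-pod> -n <namespace>   # look for FailedScheduling under Events

    # Compare each node's allocatable capacity with what is already requested
    kubectl describe nodes | grep -A 8 "Allocated resources"

    # Scale-up decisions (or their absence) also show up as cluster events
    kubectl get events -A --field-selector reason=TriggeredScaleUp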
Root Cause
Node pool capacity was insufficient, and the autoscaler did not scale the cluster quickly enough.
Fix/Workaround
• Expanded the node pool size to accommodate more pods.
• Adjusted autoscaling policies to trigger faster node provisioning during scaling events (sketched below).
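On GKE these two changes correspond roughly to the gcloud calls below; the cluster name, pool name, zone, and node counts are illustrative, not the values from the incident:

    # Grow the node pool immediately to absorb the backlog of Pending pods
    gcloud container clusters resize my-cluster \
      --node-pool default-pool --num-nodes 10 --zone us-central1-a

    # Give the node pool autoscaler more headroom for future surges
    gcloud container clusters update my-cluster \
      --enable-autoscaling --node-pool default-pool \
      --min-nodes 3 --max-nodes 20 --zone us-central1-a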
Lessons Learned
Autoscaling node pools must be able to respond quickly during sudden traffic surges.
How to Avoid
1. Pre-configure node pools to handle expected traffic growth, and ensure autoscalers are tuned to scale quickly; one common pattern for keeping spare capacity warm is sketched below.
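Because provisioning a new node takes minutes while the HPA reacts in seconds, one common mitigation (a general pattern, not something confirmed from this incident) is to keep a small buffer of spare capacity warm with low-priority placeholder pods that are evicted the moment real workloads need the room. A minimal sketch; the names, replica count, and resource requests are assumptions to tune per cluster:

    # Low (negative) priority so these pods are preempted as soon as real workloads need room
    kubectl create priorityclass overprovisioning --value=-10 \
      --description="Placeholder pods that yield to normal workloads"

    # Pause pods that reserve roughly one surge-worth of capacity
    kubectl create deployment capacity-reservation --image=registry.k8s.io/pause:3.9 --replicas=3
    kubectl set resources deployment capacity-reservation --requests=cpu=500m,memory=512Mi
    kubectl patch deployment capacity-reservation --patch \
      '{"spec":{"template":{"spec":{"priorityClassName":"overprovisioning"}}}}'

When the HPA scales the real workload, the scheduler preempts these pause pods first, and the displaced placeholders then trigger node scale-up in the background instead of user-facing pods waiting for new nodes.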