Scenario #492
Scaling & Load
Kubernetes v1.22, Google Kubernetes Engine (GKE)
Unstable Node Performance During Rapid Scaling
Rapid node scaling led to unstable node performance, impacting pod stability.
What Happened
A sudden scaling event added new nodes faster than the cluster could absorb them. The Kubernetes scheduler did not distribute workloads evenly across the new nodes, causing resource contention and instability.
Diagnosis Steps
1. Checked the GKE scaling settings and identified that the node pool autoscaling was triggered aggressively (see the diagnostic commands sketched below).
2. Found that the new nodes lacked proper configuration for high-demand workloads.
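As a rough diagnostic sketch, the commands below show how such a check might look. The cluster name, node pool name, zone, and node name are placeholders, not values from this incident.

```bash
# Inspect the node pool's autoscaling bounds and machine type
# (my-cluster, default-pool and us-central1-a are assumed placeholder names).
gcloud container node-pools describe default-pool \
  --cluster=my-cluster --zone=us-central1-a \
  --format="yaml(autoscaling,config.machineType)"

# See how evenly CPU/memory load actually landed across nodes.
kubectl top nodes

# Look for scheduling pressure and contention on the newly added nodes.
kubectl get events -A --field-selector=reason=FailedScheduling
kubectl describe node <new-node-name> | grep -A 8 "Allocated resources"
```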
Root Cause
New nodes created during the rapid scale-up lacked the resource configuration needed for the workloads scheduled onto them.
Fix/Workaround
• Adjusted the autoscaler settings so the node pool scales up more gradually, and verified that new nodes are properly configured before they receive workloads (see the sketch below).
• Reviewed and adjusted pod scheduling policies so that new pods are distributed evenly across nodes.
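A minimal sketch of the autoscaler side of the fix, assuming placeholder cluster, node pool, and zone names; the node-count bounds are illustrative and would need to match the actual workload. The bounds mainly cap how far a single scale-up can run, and GKE's "balanced" profile is its default autoscaling behaviour.

```bash
# Tighten the node pool's autoscaling bounds so a burst cannot add an
# unbounded number of nodes at once (names and counts are assumed values).
gcloud container clusters update my-cluster \
  --zone=us-central1-a \
  --node-pool=default-pool \
  --enable-autoscaling --min-nodes=3 --max-nodes=8

# Keep the cluster autoscaler on the default "balanced" profile rather than
# the more aggressive "optimize-utilization" profile.
gcloud container clusters update my-cluster \
  --zone=us-central1-a \
  --autoscaling-profile=balanced
```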
Lessons Learned
Scaling should be gradual, and new nodes must be properly configured and provisioned with adequate resources before workloads land on them.
How to Avoid
1. Implement a more conservative autoscaling policy.
2. Add resource requests/limits and pod spreading rules (affinity or topology spread constraints) so workloads are distributed across nodes efficiently (see the manifest sketch after this list).
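As a sketch of point 2, the manifest below shows per-container requests/limits plus a topology spread constraint, which achieves the even spread described above (podAntiAffinity is an alternative mechanism). The Deployment name, image, replica count, and resource values are illustrative assumptions, not taken from the incident.

```yaml
# Illustrative only: names, image and sizing are assumed values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.21
          resources:
            requests:          # capacity the scheduler reserves on a node for this pod
              cpu: 250m
              memory: 256Mi
            limits:            # hard ceiling to contain noisy neighbours
              cpu: 500m
              memory: 512Mi
      topologySpreadConstraints:
        - maxSkew: 1                           # allow at most 1 pod of imbalance per node
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway    # soft constraint: prefer spreading, never block scheduling
          labelSelector:
            matchLabels:
              app: web
```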