Scenario #492
Scaling & Load
Kubernetes v1.22, Google Kubernetes Engine (GKE)
Unstable Node Performance During Rapid Scaling
Rapid node scaling led to unstable node performance, impacting pod stability.
What Happened
A sudden scaling event added new nodes faster than the cluster could absorb them. The Kubernetes scheduler did not distribute workloads evenly across the new nodes, causing resource contention and instability.
Diagnosis Steps
1. Checked the GKE scaling settings and identified that the node pool autoscaling was triggered aggressively (see the diagnostic commands sketched below).
2. Found that the new nodes lacked proper configuration for high-demand workloads.
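As a rough diagnostic sketch, the commands below show how such a check might look. The cluster name, node pool name, zone, and node name are placeholders, not values from this incident.

```bash
# Inspect the node pool's autoscaling bounds and machine type
# (my-cluster, default-pool and us-central1-a are assumed placeholder names).
gcloud container node-pools describe default-pool \
  --cluster=my-cluster --zone=us-central1-a \
  --format="yaml(autoscaling,config.machineType)"

# See how evenly CPU/memory load actually landed across nodes.
kubectl top nodes

# Look for scheduling pressure and contention on the newly added nodes.
kubectl get events -A --field-selector=reason=FailedScheduling
kubectl describe node <new-node-name> | grep -A 8 "Allocated resources"
```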
Root Cause
New nodes created during the rapid scale-up lacked the resource configuration needed for the workloads scheduled onto them.
Fix/Workaround
• Adjusted the autoscaler settings so the node pool scales up more gradually, and verified that new nodes are properly configured before they receive workloads (see the sketch below).
• Reviewed and adjusted pod scheduling policies so that new pods are distributed evenly across nodes.
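A minimal sketch of the autoscaler side of the fix, assuming placeholder cluster, node pool, and zone names; the node-count bounds are illustrative and would need to match the actual workload. The bounds mainly cap how far a single scale-up can run, and GKE's "balanced" profile is its default autoscaling behaviour.

```bash
# Tighten the node pool's autoscaling bounds so a burst cannot add an
# unbounded number of nodes at once (names and counts are assumed values).
gcloud container clusters update my-cluster \
  --zone=us-central1-a \
  --node-pool=default-pool \
  --enable-autoscaling --min-nodes=3 --max-nodes=8

# Keep the cluster autoscaler on the default "balanced" profile rather than
# the more aggressive "optimize-utilization" profile.
gcloud container clusters update my-cluster \
  --zone=us-central1-a \
  --autoscaling-profile=balanced
```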
Lessons Learned
Scaling should be gradual, and new nodes must be properly configured and provisioned with adequate resources before workloads land on them.
How to Avoid
1. Implement a more conservative autoscaling policy.
2. Add resource requests/limits and pod spreading rules (affinity or topology spread constraints) so workloads are distributed across nodes efficiently (see the manifest sketch after this list).
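As a sketch of point 2, the manifest below shows per-container requests/limits plus a topology spread constraint, which achieves the even spread described above (podAntiAffinity is an alternative mechanism). The Deployment name, image, replica count, and resource values are illustrative assumptions, not taken from the incident.

```yaml
# Illustrative only: names, image and sizing are assumed values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.21
          resources:
            requests:          # capacity the scheduler reserves on a node for this pod
              cpu: 250m
              memory: 256Mi
            limits:            # hard ceiling to contain noisy neighbours
              cpu: 500m
              memory: 512Mi
      topologySpreadConstraints:
        - maxSkew: 1                           # allow at most 1 pod of imbalance per node
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway    # soft constraint: prefer spreading, never block scheduling
          labelSelector:
            matchLabels:
              app: web
```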