Scenario #489
Scaling & Load
Kubernetes v1.22, GKE
Uncontrolled Resource Spikes After Scaling Large StatefulSets
Scaling large StatefulSets led to resource spikes that caused system instability.
What Happened
Scaling up a large StatefulSet resulted in CPU and memory spikes that overwhelmed the cluster, causing instability and outages.
Diagnosis Steps
1. Monitored CPU and memory usage and found that the new StatefulSet pods were consuming more resources than anticipated.
2. Examined the pod specs and found that their resource requests and limits did not reflect what the pods actually consumed.
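The comparison in these diagnosis steps can be sketched as a ratio check between observed usage and configured requests. This is a minimal sketch, assuming usage has already been collected (for example from `kubectl top pods`) and converted to millicores and MiB; the `PodSample` type, the `find_underprovisioned` helper and the pod names and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodSample:
    """Observed usage and configured requests for one pod (hypothetical data)."""
    name: str
    cpu_used_m: int      # observed CPU usage, millicores
    mem_used_mi: int     # observed memory usage, MiB
    cpu_request_m: int   # CPU request from the pod spec, millicores
    mem_request_mi: int  # memory request from the pod spec, MiB


def find_underprovisioned(samples, tolerance=1.0):
    """Return (name, cpu_ratio, mem_ratio) for pods whose usage/request ratio
    exceeds `tolerance` for CPU or memory (1.0 means usage above request)."""
    flagged = []
    for s in samples:
        cpu_ratio = s.cpu_used_m / s.cpu_request_m
        mem_ratio = s.mem_used_mi / s.mem_request_mi
        if cpu_ratio > tolerance or mem_ratio > tolerance:
            flagged.append((s.name, round(cpu_ratio, 2), round(mem_ratio, 2)))
    return flagged


# Illustrative numbers only; not taken from the incident.
samples = [
    PodSample("db-0", cpu_used_m=240, mem_used_mi=480, cpu_request_m=250, mem_request_mi=512),
    PodSample("db-1", cpu_used_m=610, mem_used_mi=900, cpu_request_m=250, mem_request_mi=512),
]
print(find_underprovisioned(samples))  # [('db-1', 2.44, 1.76)]
```

A ratio well above 1.0 on freshly scaled pods is the signal described above: the pods use more than the scheduler was told to reserve for them.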
Root Cause
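Resource requests and limits for the StatefulSet pods did not match their actual usage, so capacity planning (which the scheduler bases on requests) underestimated the load that each new replica added during scaling.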
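Why undersized requests overwhelm nodes can be shown with simple arithmetic: the scheduler fits pods onto a node by their requests, not by what they really consume. The sketch below uses hypothetical numbers (a node with 4000m allocatable CPU, pods requesting 250m but using about 600m) and assumes the node is otherwise empty; `node_overcommit` is an illustrative helper, not a Kubernetes API.

```python
def node_overcommit(allocatable, request, actual):
    """Return (pods the scheduler fits on one node by request alone,
    real demand as a multiple of allocatable once those pods run at `actual`).
    Units only need to match (e.g. millicores for CPU, MiB for memory)."""
    pods_per_node = allocatable // request  # assumes the node is otherwise empty
    demand = pods_per_node * actual
    return pods_per_node, demand / allocatable


# Hypothetical node: 4000m allocatable CPU, pods request 250m but use ~600m.
print(node_overcommit(4000, 250, 600))  # (16, 2.4)
# Same node once the request is raised above actual usage.
print(node_overcommit(4000, 650, 600))  # (6, 0.9)
```

With the low request, the node ends up asked for 2.4 times the CPU it has. For CPU this appears as contention or throttling; for memory the same overcommit leads to OOM kills or node-pressure evictions, which is why requests close to real usage matter most for memory.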
Fix/Workaround
• Adjusted resource requests and limits for the StatefulSet pods to match their observed usage.
• Used a rolling update to spread the scaling load more evenly over time.
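One way to derive the adjusted values is to size requests from a high percentile of observed usage and limits from the observed peak, each with some headroom. This is a minimal sketch, assuming per-pod usage samples in millicores and MiB; the nearest-rank p95, the 20% headroom and the helper names (`percentile`, `with_headroom`, `recommend`) are illustrative assumptions rather than the method used in this incident.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 1..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def with_headroom(value, headroom_pct):
    """Add headroom_pct percent to a non-negative integer, rounding up."""
    return (value * (100 + headroom_pct) + 99) // 100


def recommend(cpu_samples_m, mem_samples_mi, headroom_pct=20):
    """Requests from p95 usage plus headroom; limits from peak usage plus headroom.
    The p95 choice and the 20% default are illustrative, not tuned values."""
    return {
        "requests": {
            "cpu": f"{with_headroom(percentile(cpu_samples_m, 95), headroom_pct)}m",
            "memory": f"{with_headroom(percentile(mem_samples_mi, 95), headroom_pct)}Mi",
        },
        "limits": {
            "cpu": f"{with_headroom(max(cpu_samples_m), headroom_pct)}m",
            "memory": f"{with_headroom(max(mem_samples_mi), headroom_pct)}Mi",
        },
    }


# Hypothetical per-pod usage samples (millicores, MiB) gathered during a test scale-up.
cpu = list(range(400, 580, 10)) + [600, 700]
mem = list(range(800, 980, 10)) + [1000, 1200]
print(recommend(cpu, mem))
# {'requests': {'cpu': '720m', 'memory': '1200Mi'}, 'limits': {'cpu': '840m', 'memory': '1440Mi'}}
```

The resulting strings correspond to the container's `resources.requests` and `resources.limits` fields in the StatefulSet pod template. Integer arithmetic is used for the headroom so the rounding is exact.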
Lessons Learned
Account for the resource spikes that accompany scale-up, and size requests and limits for large StatefulSets from measured usage.
How to Avoid
1. Set resource requests and limits for StatefulSet pods from measured usage, including the peaks seen during scaling events.
2. Test scaling of large StatefulSets in a staging environment to measure the resource impact before scaling in production.
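A staging test of this kind can start with a rough capacity check and then scale in stages, applying each step with `kubectl scale statefulset <name> --replicas=<n>` and watching usage before the next one. This is a minimal sketch under stated assumptions: free capacity per node is known (allocatable minus already-requested), and scheduling constraints such as affinity, taints, topology spread and volume zones are ignored. The helpers `replicas_that_fit` and `scale_steps` and all numbers are hypothetical.

```python
def replicas_that_fit(free_capacity, cpu_request_m, mem_request_mi):
    """Count pods that fit by request, node by node.
    free_capacity: list of (free_cpu_m, free_mem_mi) per node, i.e. allocatable
    minus what is already requested. Ignores affinity, taints, topology spread
    and volume zone constraints."""
    return sum(
        min(cpu // cpu_request_m, mem // mem_request_mi) for cpu, mem in free_capacity
    )


def scale_steps(current, target, step):
    """Intermediate replica counts for a staged scale-up from current to target."""
    if step < 1:
        raise ValueError("step must be positive")
    steps = []
    replicas = current
    while replicas < target:
        replicas = min(replicas + step, target)
        steps.append(replicas)
    return steps


# Hypothetical staging cluster and per-pod requests.
nodes = [(2000, 4096), (1500, 2048), (500, 8192)]
fits = replicas_that_fit(nodes, cpu_request_m=720, mem_request_mi=1200)
current, target = 10, 30
print(fits, fits >= target - current)   # 3 False
print(scale_steps(current, target, 5))  # [15, 20, 25, 30]
```

Here the check shows that the hypothetical cluster cannot absorb 20 more replicas at these requests, which is the kind of result a staging test should surface before production. Note that with the default `OrderedReady` pod management policy the StatefulSet controller already creates pods one at a time, while `Parallel` launches them together, so staged steps matter most in the latter case.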