Scenario #489
Scaling & Load
Kubernetes v1.22, GKE

Uncontrolled Resource Spikes After Scaling Large StatefulSets

Scaling large StatefulSets led to resource spikes that caused system instability.

What Happened

Scaling up a large StatefulSet resulted in CPU and memory spikes that overwhelmed the cluster, causing instability and outages.

Diagnosis Steps
  1. Monitored CPU and memory usage and found that the new StatefulSet pods consumed more resources than anticipated.
  2. Examined the pod configurations and found that their resource requests and limits did not reflect what the pods actually used.
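The comparison in the diagnosis steps can be sketched in Python, assuming per-pod requests and observed usage have already been exported (for example from the cluster's monitoring system). The pod names and figures are hypothetical, and the `tolerance` threshold is an illustrative choice, not a value from this incident.

```python
from dataclasses import dataclass


@dataclass
class PodSample:
    """Declared requests and observed usage for one pod (hypothetical data)."""
    name: str
    cpu_request_m: int   # CPU request, millicores
    mem_request_mi: int  # memory request, MiB
    cpu_usage_m: int     # observed CPU usage, millicores
    mem_usage_mi: int    # observed memory usage, MiB


def ratio(usage: int, request: int) -> float:
    # A missing (zero) request means nothing was reserved for this pod,
    # so treat it as infinitely under-requested.
    return usage / request if request else float("inf")


def under_requested(pods: list[PodSample], tolerance: float = 1.2) -> list[tuple[str, float, float]]:
    """Return (name, cpu_ratio, mem_ratio) for pods whose usage exceeds requests by more than `tolerance`."""
    flagged = []
    for p in pods:
        cpu = ratio(p.cpu_usage_m, p.cpu_request_m)
        mem = ratio(p.mem_usage_mi, p.mem_request_mi)
        if cpu > tolerance or mem > tolerance:
            flagged.append((p.name, round(cpu, 2), round(mem, 2)))
    return flagged


if __name__ == "__main__":
    pods = [
        PodSample("db-0", 500, 1024, 450, 900),    # within its requests
        PodSample("db-7", 500, 1024, 1400, 2600),  # new replica far above its requests
    ]
    print(under_requested(pods))  # [('db-7', 2.8, 2.54)]
```

This matters because the Kubernetes scheduler places pods according to their requests, so pods that use far more than they request can overload a node that looked like it had room.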
Root Cause

The StatefulSet pods' resource requests and limits were poorly sized for their actual consumption during scaling.

Fix/Workaround
• Adjusted resource requests and limits for StatefulSet pods to better match the actual usage.
• Implemented a rolling upgrade so that the scaling load was spread more evenly over time.
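One way to size requests to "match the actual usage" is sketched below in Python, under assumptions: the request is a high percentile of observed usage plus headroom, and the limit is a multiple of the request. The percentile, headroom, and limit factor are hypothetical tuning knobs, not values from this incident; the Vertical Pod Autoscaler with `updateMode: "Off"` can produce comparable recommendations without applying them.

```python
import math


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def nearest_rank_percentile(samples: list[int], pct: int) -> int:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, ceil_div(pct * len(ordered), 100))
    return ordered[rank - 1]


def recommend(samples: list[int], pct: int = 90, headroom_pct: int = 15,
              limit_pct: int = 150) -> tuple[int, int]:
    """Return (request, limit) in the samples' unit, e.g. millicores or MiB."""
    base = nearest_rank_percentile(samples, pct)
    request = ceil_div(base * (100 + headroom_pct), 100)  # percentile plus headroom
    limit = ceil_div(request * limit_pct, 100)            # limit as a multiple of request
    return request, limit


if __name__ == "__main__":
    # Hypothetical CPU usage samples (millicores) for one StatefulSet pod.
    cpu_samples_m = [300, 320, 350, 400, 420, 450, 480, 500, 520, 900]
    print(recommend(cpu_samples_m))  # (598, 897)
```

Memory deserves more caution than CPU here: a container that exceeds its memory limit is OOM-killed, while one that exceeds its CPU limit is throttled, so some teams set the memory limit equal to the request or well above the observed peak.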
Lessons Learned

Always account for the resource spikes that scaling large StatefulSets can cause, and size their requests to match measured usage.

How to Avoid
  1. Set resource requests and limits for StatefulSets that cover the peaks seen during scaling events, not only steady-state usage.
  2. Test scaling of large StatefulSets in a staging environment to measure the resource impact before scaling in production.
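A pre-scaling check along these lines is sketched in Python under simplifying assumptions: it estimates how many new replicas fit into each node's free allocatable CPU and memory based on the pods' requests, and it ignores affinity, taints, topology spread, and persistent-volume zone constraints. The node figures are hypothetical, and `check_scale_up` is an illustrative helper, not a Kubernetes API. It also splits a large scale-up into stages; note that a StatefulSet with the default `podManagementPolicy: OrderedReady` already creates pods one at a time, while `Parallel` starts them together.

```python
def replicas_that_fit(free_per_node: list[tuple[int, int]], pod_cpu_m: int, pod_mem_mi: int) -> int:
    """Count pods that fit node by node, given (free CPU millicores, free memory MiB) per node."""
    return sum(min(cpu // pod_cpu_m, mem // pod_mem_mi) for cpu, mem in free_per_node)


def staged_plan(current: int, target: int, step: int) -> list[int]:
    """Replica counts to apply in order, adding at most `step` pods per stage."""
    if step < 1:
        raise ValueError("step must be >= 1")
    plan = []
    replicas = current
    while replicas < target:
        replicas = min(replicas + step, target)
        plan.append(replicas)
    return plan


def check_scale_up(current: int, target: int, free_per_node: list[tuple[int, int]],
                   pod_cpu_m: int, pod_mem_mi: int, step: int) -> list[int]:
    """Hypothetical helper: refuse a scale-up the free capacity cannot hold, else return stages."""
    fits = replicas_that_fit(free_per_node, pod_cpu_m, pod_mem_mi)
    needed = target - current
    if needed > fits:
        raise RuntimeError(f"need {needed} new pods but only {fits} fit")
    return staged_plan(current, target, step)


if __name__ == "__main__":
    # Hypothetical free allocatable capacity per node: (CPU millicores, memory MiB).
    nodes = [(1800, 4096), (700, 6144), (2500, 2048)]
    print(check_scale_up(10, 14, nodes, pod_cpu_m=600, pod_mem_mi=1536, step=2))  # [12, 14]
```

Each stage could then be applied with `kubectl scale statefulset <name> --replicas=<n>`, waiting for the new pods to become Ready and their usage to settle before moving on. Running the same check in staging shows whether the per-pod requests used here reflect what the pods really consume.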