Back to all scenarios
Scenario #406
Scaling & Load
Kubernetes v1.24, AKS
Load Test Crashed Cluster Due to Insufficient Node Quotas
Stress test resulted in API server crash due to unthrottled pod burst.
Find this helpful?
What Happened
Locust load test created hundreds of pods, exceeding node count limits.
Diagnosis Steps
- 1API server latency spiked, etcd logs flooded.
- 2Cluster hit node quota limit on Azure.
Root Cause
No upper limit on replica count during load test ; hit cloud provider limits.
Fix/Workaround
• Added maxReplicas to HPA.
• Throttled CI tests.
Lessons Learned
CI/CD and load tests should obey cluster quotas.
How to Avoid
- 1Monitor node count vs quota in metrics.
- 2Set maxReplicas in HPA and cap CI workloads.