Back to all scenarios
Scenario #406
Scaling & Load
Kubernetes v1.24, AKS

Load Test Crashed Cluster Due to Insufficient Node Quotas

Stress test resulted in API server crash due to unthrottled pod burst.

Find this helpful?
What Happened

Locust load test created hundreds of pods, exceeding node count limits.

Diagnosis Steps
  • 1API server latency spiked, etcd logs flooded.
  • 2Cluster hit node quota limit on Azure.
Root Cause

No upper limit on replica count during load test ; hit cloud provider limits.

Fix/Workaround
• Added maxReplicas to HPA.
• Throttled CI tests.
Lessons Learned

CI/CD and load tests should obey cluster quotas.

How to Avoid
  • 1Monitor node count vs quota in metrics.
  • 2Set maxReplicas in HPA and cap CI workloads.