Scenario #55
Cluster Management
K8s v1.22, AWS EKS
Resource Exhaustion Due to Misconfigured Horizontal Pod Autoscaler
Cluster resources were exhausted due to misconfiguration in the Horizontal Pod Autoscaler (HPA), resulting in excessive pod scaling.
What Happened
The HPA was configured to scale pods on CPU utilization, but its target threshold was overly sensitive, so the application scaled out rapidly and exhausted cluster resources.
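To see why an overly sensitive CPU target scales out so quickly, the replica calculation documented for the HPA can be sketched in Python. This is a simplified sketch: the replica count, observed utilization, and the two targets (30% and 70%) are illustrative assumptions, not values recorded for this incident, and the function name is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_util: float,
                     target_util: float, tolerance: float = 0.1) -> int:
    """Approximate the documented HPA rule:
    desired = ceil(current_replicas * current_metric / target_metric).
    No scaling happens while the ratio is within the tolerance (default 0.1).
    """
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Illustrative values only: the same load evaluated against two CPU targets.
aggressive = desired_replicas(4, current_util=90, target_util=30)
moderate = desired_replicas(4, current_util=90, target_util=70)
print(aggressive, moderate)
```

With the lower target the same spike triples the deployment in a single evaluation, and because new pods are evaluated again on the next sync, the count keeps climbing for as long as utilization stays above the target.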
Diagnosis Steps
1. Analyzed HPA metrics and found excessive scaling actions.
2. Verified CPU utilization metrics and observed that they were consistently above the threshold due to a sudden workload spike.
Root Cause
HPA was too aggressive in scaling up based on CPU utilization, without considering other metrics like memory usage or custom metrics.
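One caveat worth keeping in mind when adding metrics: per the Kubernetes HPA documentation, when several metrics are configured the controller computes a replica count for each one and uses the largest, then clamps it to maxReplicas. Extra metrics therefore do not slow scale-out on their own. A minimal sketch of that selection follows, with hypothetical function names and illustrative numbers.

```python
import math


def replicas_for_metric(current_replicas: int, current: float, target: float) -> int:
    """Replica count a single metric would ask for."""
    return math.ceil(current_replicas * current / target)


def hpa_proposal(current_replicas: int, metrics: dict, max_replicas: int):
    """metrics maps a metric name to a (current, target) pair.

    Returns the clamped replica count and the per-metric proposals.
    """
    proposals = {
        name: replicas_for_metric(current_replicas, current, target)
        for name, (current, target) in metrics.items()
    }
    # The largest proposal wins, then maxReplicas caps it.
    return min(max(proposals.values()), max_replicas), proposals


# Illustrative values only: CPU is hot, memory is comfortable.
chosen, proposals = hpa_proposal(
    5, {"cpu": (95, 50), "memory": (60, 75)}, max_replicas=8
)
print(chosen, proposals)
```

In this example CPU alone would ask for 10 replicas and memory for 4; the CPU proposal wins and only maxReplicas holds the deployment at 8.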
Fix/Workaround
• Adjusted HPA configuration to scale based on a combination of CPU and memory usage.
• Set more appropriate scaling thresholds.
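A sketch of what the adjusted configuration might look like, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts). Everything specific here is an assumption for illustration: the names `web-hpa` and `web`, the replica bounds, the 70% and 80% targets, and the scale-up policy. The `autoscaling/v2beta2` API version is used because the cluster ran Kubernetes v1.22. Memory utilization targets only work if the pods declare memory requests.

```python
import json


def build_hpa(name: str, deployment: str, min_replicas: int, max_replicas: int,
              cpu_target: int, memory_target: int) -> dict:
    """Build an HPA manifest with CPU and memory targets and a bounded scale-up rate."""

    def resource_metric(resource: str, utilization: int) -> dict:
        return {
            "type": "Resource",
            "resource": {
                "name": resource,
                "target": {"type": "Utilization", "averageUtilization": utilization},
            },
        }

    return {
        "apiVersion": "autoscaling/v2beta2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1", "kind": "Deployment", "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,  # hard cap on scale-out
            "metrics": [
                resource_metric("cpu", cpu_target),
                resource_metric("memory", memory_target),
            ],
            "behavior": {
                "scaleUp": {
                    # Wait before acting on a spike; add at most 2 pods per minute.
                    "stabilizationWindowSeconds": 60,
                    "policies": [{"type": "Pods", "value": 2, "periodSeconds": 60}],
                }
            },
        },
    }


# Hypothetical names and thresholds, chosen only to make the sketch concrete.
manifest = build_hpa("web-hpa", "web", min_replicas=2, max_replicas=10,
                     cpu_target=70, memory_target=80)
print(json.dumps(manifest, indent=2))
```

The `maxReplicas` cap and the `behavior.scaleUp` policy are what actually bound the scale-out rate; the thresholds decide when scaling starts.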
Lessons Learned
Scaling on a single metric (e.g., CPU) with an aggressive threshold can over-provision pods during workload spikes and exhaust cluster capacity.
How to Avoid
1. Use multiple metrics for autoscaling (e.g., CPU, memory, and custom metrics).
2. Set more conservative scaling thresholds to prevent resource exhaustion.
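One way to pick a conservative upper bound is to check how many replicas the cluster can actually hold before setting maxReplicas. The sketch below is a rough estimate under stated assumptions: the allocatable capacity, per-pod requests, and the 20% reserve are illustrative, the function name is hypothetical, and other workloads sharing the nodes are ignored.

```python
def max_replicas_that_fit(allocatable: dict, per_pod_request: dict,
                          reserve_percent: int = 20) -> int:
    """Largest replica count whose requests fit in every resource,
    after holding back reserve_percent of capacity as headroom."""
    fits = []
    for resource, request in per_pod_request.items():
        usable = allocatable[resource] * (100 - reserve_percent) // 100
        fits.append(usable // request)
    # The scarcest resource sets the limit.
    return min(fits)


# Illustrative capacity: 16 vCPU and 64 GiB allocatable across the node group.
limit = max_replicas_that_fit(
    allocatable={"cpu_millicores": 16000, "memory_mib": 65536},
    per_pod_request={"cpu_millicores": 250, "memory_mib": 512},
)
print(limit)
```

A maxReplicas value at or below this estimate keeps a runaway scale-out from consuming the whole cluster.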