Scenario #457
Scaling & Load
Kubernetes v1.22, AWS EKS

Failure to Scale Due to Horizontal Pod Autoscaler Anomaly

Horizontal Pod Autoscaler (HPA) failed to scale up due to a temporary anomaly in the resource metrics.

What Happened

HPA failed to trigger a scale-up action during a high-traffic period because resource metrics were temporarily inaccurate.

Diagnosis Steps
  • Checked the metrics-server logs and found a temporary issue in the metric collection process.
  • Confirmed that the collected metrics did not reflect true resource usage during the short-lived anomaly, so HPA saw no reason to scale up.
Root Cause

Temporary anomaly in the metric collection system led to inaccurate scaling decisions.
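
The impact of a bad reading follows directly from the HPA scaling formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal sketch (the utilization numbers below are illustrative, not from this incident):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA formula: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# Accurate reading under load: 4 pods at 90% CPU against a 50% target -> scale to 8.
print(desired_replicas(4, 90, 50))  # 8

# Anomalous reading during the same load spike: metrics report only 20% CPU,
# so HPA computes 2 replicas and never triggers the needed scale-up.
print(desired_replicas(4, 20, 50))  # 2
```

An under-reported metric therefore doesn't just delay scaling; it can make HPA conclude the workload is over-provisioned.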

Fix/Workaround
  • Implemented a fallback mechanism to trigger scaling based on last known good metrics.
  • Used a more robust monitoring system to track resource usage in real time.
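
The fallback described above can be sketched as follows. This is a hypothetical design, not a real controller API: it keeps the last plausible sample and serves it when a fresh reading is missing or implausible, within a staleness limit.

```python
import time

STALENESS_LIMIT_S = 60  # hypothetical threshold for trusting a cached sample

class MetricFallback:
    """Sketch of a last-known-good metric fallback (names and thresholds
    are illustrative)."""

    def __init__(self):
        self._last_good = None  # (timestamp, value)

    def observe(self, value, now=None):
        now = time.time() if now is None else now
        # Treat obviously implausible readings (missing, zero, or negative
        # utilization) as anomalies and keep the previous sample instead.
        if value is not None and value > 0:
            self._last_good = (now, value)

    def current(self, fresh_value, now=None):
        now = time.time() if now is None else now
        if fresh_value is not None and fresh_value > 0:
            return fresh_value
        # Fall back to the last known good value if it is recent enough.
        if self._last_good and now - self._last_good[0] <= STALENESS_LIMIT_S:
            return self._last_good[1]
        return None  # no trustworthy signal; caller should hold replicas steady
```

With this in place, a brief metric dropout returns the cached utilization instead of a misleading zero, so scaling decisions stay anchored to the last real observation.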
Lessons Learned

Autoscalers should have fallback mechanisms for temporary metric anomalies.

How to Avoid
  • Set up fallback mechanisms and monitoring alerts to handle metric inconsistencies.
  • Regularly test autoscaling responses to ensure reliability.
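
One way to catch metric inconsistencies before they distort scaling is an alert rule on sample freshness and plausibility. The thresholds and function below are hypothetical, sketched for illustration:

```python
def should_alert(last_sample_age_s: float, reported_utilization: float,
                 max_sample_age_s: float = 120.0,
                 plausible_range: tuple = (1.0, 100.0)) -> bool:
    """Hypothetical alert rule for metric inconsistencies: fire when the
    newest sample is stale, or when reported utilization falls outside a
    range that is plausible for a workload known to be serving traffic."""
    if last_sample_age_s > max_sample_age_s:
        return True  # metrics pipeline has stopped delivering samples
    lo, hi = plausible_range
    return not (lo <= reported_utilization <= hi)

# Healthy sample: recent and in range -> no alert.
print(should_alert(30, 55.0))   # False
# Stale pipeline: no fresh sample for 5 minutes -> alert.
print(should_alert(300, 55.0))  # True
```

Firing on staleness as well as out-of-range values matters here: in this incident the metrics were present but wrong, which a freshness check alone would have missed.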