Scenario #473
Scaling & Load
Kubernetes v1.24, IBM Cloud
Failure to Scale Down Due to Persistent Idle Pods
Pods failed to scale down during low traffic periods, leading to idle resources consuming cluster capacity.
What Happened
During low traffic periods, the Horizontal Pod Autoscaler (HPA) failed to scale down pods because some pods were marked as "not ready" but still consuming resources.
Diagnosis Steps
1. Checked the HPA configuration and found that some pods were stuck in a "not ready" state.
2. Identified that these not-ready pods were preventing the autoscaler from scaling down.
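The checks above can be reproduced with standard kubectl commands (the HPA name below is a placeholder):

```shell
# Show each pod's Ready condition to spot pods stuck in "not ready".
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'

# Inspect the HPA's current metrics, desired replicas, and recent events.
kubectl describe hpa <hpa-name>
```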
Root Cause
Misconfigured readiness probes left pods stuck in a "not ready" state; those pods continued to consume cluster resources and distorted the HPA's replica calculations, blocking scale-down.
Fix/Workaround
• Updated the readiness probe configuration so that pods were marked ready or not ready based on their actual state.
• Tuned the HPA's scale-down behavior so that replicas were reduced once pods reported accurate readiness.
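A minimal sketch of both changes, assuming an HTTP service (the probe path, port, resource names, and thresholds are illustrative, not from the incident):

```yaml
# Readiness probe that reflects the pod's actual ability to serve traffic.
# Path and port are assumptions for illustration.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
---
# HPA (autoscaling/v2 is available on Kubernetes v1.24) with explicit
# scale-down behavior; the stabilization window damps flapping replicas.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # illustrative name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
```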
Lessons Learned
Autoscaling can be disrupted by incorrectly configured readiness probes or failing pods.
How to Avoid
1. Regularly review and adjust readiness probes to ensure they reflect the actual health of pods.
2. Set up alerts for unresponsive pods that could block scaling.
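For the alerting step, one possible sketch is a Prometheus rule built on the `kube_pod_status_ready` metric from kube-state-metrics (the duration, labels, and severity are assumptions):

```yaml
groups:
  - name: pod-readiness
    rules:
      - alert: PodNotReady
        # Fires when a pod has reported Ready=false for 15 minutes,
        # which is the condition that blocked scale-down in this scenario.
        expr: kube_pod_status_ready{condition="false"} == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} not ready for 15m; may block HPA scale-down."
```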