Scenario #440
Scaling & Load
Kubernetes v1.25, Azure AKS
Scaling Inhibited Due to Pending Jobs in Queue
Pods did not scale out in time because the autoscaler did not react to a growing backlog of queued jobs.
What Happened
Jobs accumulated in the queue faster than the running pods could process them, and the workload was slow to scale out in response.
Diagnosis Steps
1. Examined job logs, which confirmed long processing times for queued tasks.
2. Found that the HPA didn’t account for the job queue backlog.
Root Cause
The HPA had no signal for job queue size, so the pod count did not grow as the backlog did.
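To illustrate why the missing queue signal matters, here is a minimal sketch of the core HPA scaling rule, desiredReplicas = ceil(currentReplicas * currentValue / targetValue). The replica count, CPU figures, and queue figures are hypothetical and not taken from this incident, and the sketch ignores the HPA's tolerance band and stabilization window.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=20):
    """Core HPA rule: ceil(currentReplicas * currentValue / targetValue),
    clamped to [min_replicas, max_replicas]. Tolerance and stabilization
    behavior of the real controller are deliberately left out."""
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical situation: 3 pods, each busy but below the CPU target,
# while 360 jobs wait in the queue (120 per pod on average).
cpu_based = desired_replicas(3, current_value=55, target_value=60)
queue_based = desired_replicas(3, current_value=120, target_value=30)

print("replicas proposed from CPU utilization:", cpu_based)
print("replicas proposed from queue backlog:", queue_based)
```

With these assumed numbers the CPU signal proposes no change, while the per-pod backlog proposes a fourfold scale-out, which is the kind of gap described in the root cause.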
Fix/Workaround
• Added job queue monitoring metrics to scaling triggers.
• Adjusted HPA to trigger based on job queue size and pod workload.
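The adjusted HPA might look roughly like the manifest built below. This is a sketch under assumptions, not the configuration used in the incident: the Deployment name `job-worker`, the metric name `queue_length`, the targets, and the replica bounds are all hypothetical, and it presumes an external metrics adapter is already exposing the queue metric to the cluster. It uses the `autoscaling/v2` API, which is available on Kubernetes v1.25.

```python
import json


def build_queue_hpa(deployment, queue_metric, jobs_per_pod, cpu_target,
                    min_replicas, max_replicas):
    """Return an autoscaling/v2 HPA manifest that scales on both an
    external queue-size metric and CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {   # queue backlog, averaged over the current pods
                    "type": "External",
                    "external": {
                        "metric": {"name": queue_metric},
                        "target": {"type": "AverageValue",
                                   "averageValue": str(jobs_per_pod)},
                    },
                },
                {   # pod workload, expressed as CPU utilization
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization",
                                   "averageUtilization": cpu_target},
                    },
                },
            ],
        },
    }


# All values below are hypothetical placeholders.
manifest = build_queue_hpa("job-worker", "queue_length", jobs_per_pod=30,
                           cpu_target=60, min_replicas=2, max_replicas=20)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`. When several metrics are listed, the HPA computes a proposed replica count for each and uses the largest, so a deep queue can trigger scale-out even while CPU stays under its target.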
Lessons Learned
Scale on queue depth and pod workload, not just incoming traffic.
How to Avoid
1. Implement queue size-based scaling triggers.
2. Use advanced metrics for autoscaling decisions.