Scenario #456
Scaling & Load
Kubernetes v1.23, Azure AKS
CPU Resource Over-Commitment During Scale-Up
During a scale-up event, CPU resources were over-committed, causing pod performance degradation.
What Happened
During scale-up, the new pods were allocated more CPU than the nodes could spare. Existing pods then had to share CPU cores with them, and their performance degraded.
Diagnosis Steps
1. Checked CPU resource allocation and found that the new pods had been allocated higher CPU shares than the existing pods, causing resource contention.
2. Observed significant latency and degraded performance in the cluster.
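The allocation check in step 1 can be approximated with a small calculation: sum the CPU requests and limits of the pods on a node and compare them with the node's allocatable CPU. The sketch below is illustrative only. The pod names and quantities are hypothetical, and in practice the inputs would come from the cluster (for example, the "Allocated resources" section of `kubectl describe node`).

```python
from decimal import Decimal


def parse_cpu(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity ("250m", "2", "0.5") to cores."""
    q = quantity.strip()
    if q.endswith("m"):
        return Decimal(q[:-1]) / 1000
    return Decimal(q)


def cpu_commitment(allocatable: str, pods: list[dict]) -> dict:
    """Return summed CPU requests and limits as a fraction of allocatable CPU."""
    alloc = parse_cpu(allocatable)
    requests = sum((parse_cpu(p["request"]) for p in pods), Decimal(0))
    limits = sum((parse_cpu(p["limit"]) for p in pods), Decimal(0))
    return {
        "requests_ratio": float(requests / alloc),
        "limits_ratio": float(limits / alloc),
    }


# Hypothetical node with 4 allocatable cores after the scale-up.
node_pods = [
    {"name": "existing-a", "request": "500m", "limit": "1"},
    {"name": "existing-b", "request": "500m", "limit": "1"},
    {"name": "new-a", "request": "1", "limit": "2"},
    {"name": "new-b", "request": "1", "limit": "2"},
]
report = cpu_commitment("4", node_pods)
print(report)
```

A limits ratio above 1.0 means the pods can collectively ask for more CPU than the node has, which is the over-commitment described in this scenario.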
Root Cause
Resource allocation was not adjusted for existing pods, causing CPU contention during scale-up.
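To see why larger requests on new pods squeeze the existing ones, it helps to look at how CPU time is divided when every pod wants more than it can get. On cgroup v1 nodes, the kubelet turns a CPU request into a `cpu.shares` weight (1024 per core, minimum 2), and contended CPU is split in proportion to those weights. The sketch below is a simplified model under stated assumptions: cgroup v1, one container per pod, all pods CPU-bound at once, and no QoS-level cgroup nesting. The pod names and request values are hypothetical.

```python
def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Map a CPU request in millicores to a cgroup v1 cpu.shares weight."""
    shares = milli_cpu * 1024 // 1000
    return max(shares, 2)  # the kernel's minimum weight is 2


def contended_split(node_cores: float, requests_milli: dict[str, int]) -> dict[str, float]:
    """Estimate cores per pod when all pods are CPU-bound at the same time."""
    shares = {name: milli_cpu_to_shares(m) for name, m in requests_milli.items()}
    total = sum(shares.values())
    return {name: node_cores * s / total for name, s in shares.items()}


# Hypothetical 4-core node: two small existing pods, two larger new pods.
split = contended_split(
    4,
    {"existing-a": 250, "existing-b": 250, "new-a": 1500, "new-b": 1500},
)
for name, cores in split.items():
    print(f"{name}: {cores:.2f} cores")
```

In this model, an existing pod that could burst freely before the scale-up is held to roughly its proportional slice once the new pods are busy.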
Fix/Workaround
• Adjusted the CPU resource limits and requests for new pods to avoid over-commitment.
• Implemented resource isolation policies to prevent CPU contention.
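The write-up does not give the adjusted values. One common way to apply this kind of fix, assumed here, is to set CPU and memory requests equal to limits on the new pods so that they land in the Guaranteed QoS class and cannot burst into CPU the scheduler never reserved for them. The sketch below mirrors the Kubernetes QoS rules for `cpu` and `memory` only. The container specs are hypothetical, and quantities are compared as literal strings, so "1" and "1000m" would not match here.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort (cpu/memory only)."""
    any_set = False
    guaranteed = True
    for c in containers:
        res = c.get("resources", {})
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # When only a limit is set, Kubernetes defaults the request to it.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Hypothetical container specs before and after the adjustment.
before = [{"name": "api", "resources": {"requests": {"cpu": "250m"},
                                        "limits": {"cpu": "2"}}}]
after = [{"name": "api", "resources": {
    "requests": {"cpu": "500m", "memory": "512Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}}]
print(qos_class(before), "->", qos_class(after))
```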
Lessons Learned
Proper resource allocation strategies are essential during scale-up to avoid resource contention.
How to Avoid
1. Set CPU and memory requests and limits on every pod to avoid resource over-commitment.
2. Implement resource isolation techniques like CPU pinning or dedicated nodes for specific workloads.
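For CPU pinning, the kubelet's static CPU manager policy (`--cpu-manager-policy=static`) gives exclusive cores only to containers in Guaranteed pods whose CPU request is a whole number of cores. The sketch below checks that condition for a single container. It is a rough pre-flight check, not the kubelet's own logic, and it assumes the rest of the pod already meets the Guaranteed QoS requirements (memory requests equal to limits as well).

```python
from decimal import Decimal


def parse_cpu(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity ("250m", "2", "0.5") to cores."""
    q = quantity.strip()
    if q.endswith("m"):
        return Decimal(q[:-1]) / 1000
    return Decimal(q)


def exclusive_cores(cpu_request: str, cpu_limit: str) -> int:
    """Return how many cores a container could be pinned to, or 0 if none.

    Assumes the pod is otherwise Guaranteed; only the CPU fields are checked.
    """
    request = parse_cpu(cpu_request)
    limit = parse_cpu(cpu_limit)
    if request != limit:
        return 0  # CPU request must equal CPU limit
    if request < 1 or request != request.to_integral_value():
        return 0  # fractional requests stay in the shared CPU pool
    return int(request)


print(exclusive_cores("2", "2"))          # whole cores, request == limit
print(exclusive_cores("1500m", "1500m"))  # fractional, shared pool
```

Containers that return 0 here still run, but on the shared pool of CPUs, where they remain subject to the contention described above.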