Scenario #456
Scaling & Load
Kubernetes v1.23, Azure AKS

CPU Resource Over-Commitment During Scale-Up

During a scale-up event, CPU resources were over-committed, causing pod performance degradation.

What Happened

When the workload scaled up, the new pods were allocated more CPU than the nodes could supply alongside the pods already running. The existing pods then had to share CPU cores with the new ones, and their performance degraded.

Diagnosis Steps
  1. Checked CPU resource allocation and found that the new pods had been allocated higher CPU shares than the existing pods, causing resource contention.
  2. Observed significant latency and degraded performance in the cluster.
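
The allocation check in the first diagnosis step can be sketched as a small calculation: sum the CPU requests and limits of the pods on a node and compare them with the node's allocatable CPU. This is a minimal sketch, not the tooling used in the incident. The node size, pod names, and request/limit values are hypothetical, and the input is assumed to be CPU quantities in the usual Kubernetes notation (`"500m"`, `"2"`).

```python
from decimal import Decimal


def parse_cpu(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity ("500m", "2", "0.5") to cores."""
    q = quantity.strip()
    if q.endswith("m"):
        # Millicores: 1000m equals one core.
        return Decimal(q[:-1]) / Decimal(1000)
    return Decimal(q)


def cpu_commitment(allocatable: str, pods: list[dict]) -> dict:
    """Sum CPU requests and limits on one node and compare with allocatable CPU."""
    capacity = parse_cpu(allocatable)
    requests = sum((parse_cpu(p["request"]) for p in pods), Decimal(0))
    limits = sum((parse_cpu(p["limit"]) for p in pods), Decimal(0))
    return {
        "requests": requests,
        "limits": limits,
        # A ratio above 1 means the node is committed beyond its capacity.
        "request_ratio": requests / capacity,
        "limit_ratio": limits / capacity,
    }


# Hypothetical node with 4 allocatable cores (illustrative values only).
existing = [
    {"name": "api-1", "request": "500m", "limit": "1"},
    {"name": "api-2", "request": "500m", "limit": "1"},
]
# Hypothetical scale-up pods with larger requests and limits than the existing ones.
new = [
    {"name": "api-3", "request": "1", "limit": "2"},
    {"name": "api-4", "request": "1", "limit": "2"},
]

before = cpu_commitment("4", existing)
after = cpu_commitment("4", existing + new)
print("limit ratio before scale-up:", before["limit_ratio"])
print("limit ratio after scale-up: ", after["limit_ratio"])
```

In this made-up case the requests still fit on the node after scale-up, but the limits add up to more than the allocatable CPU, so the pods compete for cores whenever they burst at the same time.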
Root Cause

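CPU allocation for the existing pods was not revisited when the new pods were added. The new pods received larger CPU shares, so the existing pods lost out when both competed for the same cores during scale-up.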

Fix/Workaround
• Adjusted the CPU resource limits and requests for new pods to avoid over-commitment.
• Implemented resource isolation policies to prevent CPU contention.
Lessons Learned

A scale-up changes how CPU is divided among all pods on a node, not only the new ones. Plan requests and limits for new and existing pods together to avoid resource contention.

How to Avoid
  1. Set CPU and memory requests and limits on all pods to avoid resource over-commitment.
  2. Implement resource isolation techniques such as CPU pinning or dedicated nodes for specific workloads.
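
As one way to reason about the CPU pinning option, the sketch below checks which containers in a pod would qualify for exclusive cores. It assumes the kubelet's static CPU Manager policy is enabled on the node, which the scenario does not confirm. Under that policy, exclusive CPUs go only to containers in Guaranteed QoS pods whose CPU request is a whole number of cores. The function names and the sample pods are hypothetical, and memory quantities are compared as plain strings to keep the example short.

```python
from decimal import Decimal


def parse_cpu(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity ("500m", "2") to cores."""
    q = quantity.strip()
    if q.endswith("m"):
        return Decimal(q[:-1]) / Decimal(1000)
    return Decimal(q)


def is_guaranteed(containers: list[dict]) -> bool:
    """Return True if every container sets CPU and memory limits equal to its requests."""
    if not containers:
        return False
    for c in containers:
        limits = c.get("limits", {})
        # Requests that are omitted default to the limits.
        requests = {**limits, **c.get("requests", {})}
        if "cpu" not in limits or "memory" not in limits:
            return False
        if parse_cpu(requests["cpu"]) != parse_cpu(limits["cpu"]):
            return False
        # Simplification: string comparison, so "1Gi" and "1024Mi" count as different.
        if requests["memory"] != limits["memory"]:
            return False
    return True


def eligible_for_exclusive_cpus(containers: list[dict]) -> list[str]:
    """Names of containers that would get exclusive cores under the static policy."""
    if not is_guaranteed(containers):
        return []
    eligible = []
    for c in containers:
        cores = parse_cpu(c["limits"]["cpu"])
        # Only whole-core requests of at least one core qualify.
        if cores >= 1 and cores % 1 == 0:
            eligible.append(c["name"])
    return eligible


# Hypothetical pods (illustrative values only).
pinned = [{"name": "worker",
           "requests": {"cpu": "2", "memory": "4Gi"},
           "limits": {"cpu": "2", "memory": "4Gi"}}]
burstable = [{"name": "web",
              "requests": {"cpu": "500m", "memory": "1Gi"},
              "limits": {"cpu": "1", "memory": "1Gi"}}]
fractional = [{"name": "batch",
               "limits": {"cpu": "1500m", "memory": "2Gi"}}]

print(eligible_for_exclusive_cpus(pinned))
print(eligible_for_exclusive_cpus(burstable))
print(eligible_for_exclusive_cpus(fractional))
```

Pods that fail this check still run, but they draw from the node's shared CPU pool and remain exposed to the contention described in this scenario.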