Scenario #72
Cluster Management
K8s v1.22, Azure AKS
Pod Disruption Due to Insufficient Node Resources
Pods experienced disruptions as nodes ran out of CPU and memory, causing evictions.
What Happened
During a period of high load, nodes ran out of CPU and memory. The kubelet on each affected node evicted pods under node pressure, which disrupted the workloads.
Diagnosis Steps
1. Monitored node resource usage and identified CPU and memory exhaustion.
2. Reviewed pod events and noticed pod evictions due to resource pressure.
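One way to carry out the event review is to export cluster events, for example with `kubectl get events --all-namespaces -o json`, and count those whose reason is `Evicted`. The sketch below assumes that JSON shape (an `items` list of Event objects carrying `reason` and `involvedObject`). The function name `evicted_pods` and the sample data are hypothetical and are not taken from the incident.

```python
import json
from collections import Counter


def evicted_pods(events_json: str) -> Counter:
    """Count eviction events per (namespace, pod name) in an Event list."""
    events = json.loads(events_json).get("items", [])
    counts = Counter()
    for event in events:
        # Node-pressure evictions are reported with reason "Evicted".
        if event.get("reason") != "Evicted":
            continue
        obj = event.get("involvedObject", {})
        if obj.get("kind") != "Pod":
            continue
        counts[(obj.get("namespace", ""), obj.get("name", ""))] += 1
    return counts


# Hypothetical sample standing in for real `kubectl get events -o json` output.
SAMPLE_EVENTS = json.dumps({
    "items": [
        {"reason": "Evicted",
         "involvedObject": {"kind": "Pod", "namespace": "shop", "name": "api-1"},
         "message": "The node was low on resource: memory."},
        {"reason": "Evicted",
         "involvedObject": {"kind": "Pod", "namespace": "shop", "name": "api-1"},
         "message": "The node was low on resource: memory."},
        {"reason": "Scheduled",
         "involvedObject": {"kind": "Pod", "namespace": "shop", "name": "api-2"},
         "message": "Successfully assigned shop/api-2 to a node."},
        {"reason": "Evicted",
         "involvedObject": {"kind": "Pod", "namespace": "shop", "name": "worker-1"},
         "message": "The node was low on resource: memory."},
    ]
})

if __name__ == "__main__":
    for (namespace, name), count in evicted_pods(SAMPLE_EVENTS).most_common():
        print(f"{namespace}/{name}: {count} eviction event(s)")
```

Pods that appear repeatedly in this count are good candidates for a closer look at their memory requests and at the nodes they were scheduled on.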
Root Cause
The nodes did not have enough CPU and memory for the workload being run, which led to resource contention and pod evictions.
Fix/Workaround
• Added more nodes to the cluster to meet resource requirements.
• Adjusted pod resource requests/limits to be more aligned with node resources.
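To reason about how many nodes to add, it can help to compare the sum of pod requests against per-node allocatable capacity. The following is a rough sketch under stated assumptions: the helper names (`cpu_millicores`, `memory_bytes`, `nodes_needed`), the 20% headroom, and all sample numbers are illustrative and do not come from the incident. It returns a lower bound only, since it ignores bin-packing fragmentation, DaemonSets, and system pods.

```python
import math

# Binary suffixes are listed first so "Mi" is matched before "M".
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '1.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '2Gi' to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return round(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain byte count


def nodes_needed(pod_requests, node_allocatable, headroom=0.2):
    """Lower bound on node count so summed requests fit with spare headroom."""
    total_cpu = sum(cpu_millicores(p["cpu"]) for p in pod_requests)
    total_mem = sum(memory_bytes(p["memory"]) for p in pod_requests)
    usable_cpu = cpu_millicores(node_allocatable["cpu"]) * (1 - headroom)
    usable_mem = memory_bytes(node_allocatable["memory"]) * (1 - headroom)
    return max(math.ceil(total_cpu / usable_cpu),
               math.ceil(total_mem / usable_mem))


if __name__ == "__main__":
    # Hypothetical workload: 20 pods, each requesting 500m CPU and 1Gi memory.
    pods = [{"cpu": "500m", "memory": "1Gi"}] * 20
    # Hypothetical allocatable capacity of one node.
    node = {"cpu": "3860m", "memory": "12Gi"}
    print("nodes needed (lower bound):", nodes_needed(pods, node))
```

In this example CPU is the binding resource, so lowering inflated CPU requests would reduce the node count more than lowering memory requests would.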
Lessons Learned
Regularly monitor and scale nodes to ensure sufficient resources during peak workloads.
How to Avoid
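1. Enable the cluster autoscaler so that nodes are added automatically when pods cannot be scheduled for lack of resources.
2. Set appropriate resource requests and limits for pods. The scheduler and the cluster autoscaler both work from requests, so inaccurate requests lead to overcommitted nodes.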
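A simple guard for the requests-and-limits recommendation is to check pod specs for containers that omit them before deployment. The sketch below assumes a pod spec already loaded as a Python dictionary (for instance from a manifest). The function name `missing_resources` and the sample spec are hypothetical, and which fields to require is a policy choice rather than a Kubernetes rule.

```python
from typing import List

# Fields this sketch treats as required; adjust to local policy.
REQUIRED_FIELDS = (
    ("requests", "cpu"),
    ("requests", "memory"),
    ("limits", "cpu"),
    ("limits", "memory"),
)


def missing_resources(pod_spec: dict) -> List[str]:
    """List every required resource field that a container does not set."""
    problems = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        for section, name in REQUIRED_FIELDS:
            if name not in (resources.get(section) or {}):
                problems.append(f"{container.get('name', '?')}: {section}.{name}")
    return problems


if __name__ == "__main__":
    # Hypothetical pod spec: "api" is fully specified, "sidecar" is not.
    spec = {
        "containers": [
            {"name": "api",
             "resources": {"requests": {"cpu": "250m", "memory": "256Mi"},
                           "limits": {"cpu": "500m", "memory": "512Mi"}}},
            {"name": "sidecar",
             "resources": {"requests": {"cpu": "50m"}}},
        ]
    }
    for problem in missing_resources(spec):
        print("missing", problem)
```

A check like this could run in CI against rendered manifests, so that containers without requests never reach the cluster.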