Scenario #62
Cluster Management
K8s v1.21, AWS EKS
Failed Pod Rescheduling Due to Node Affinity Misconfiguration
Pods failed to reschedule after a node failure due to improper node affinity rules.
What Happened
When a node was taken down for maintenance, the pods that had been running on it failed to reschedule because their node affinity settings were too restrictive to match any other node.
Diagnosis Steps
1. Checked pod events and found scheduling failures caused by affinity rules that prevented the pods from being placed on other nodes.
2. Analyzed the node affinity configuration in the pod spec.
Root Cause
Node affinity rules were set too restrictively, preventing the pod from being scheduled on other nodes.
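The source does not show the actual rule, so the following is only a sketch of what an overly restrictive configuration can look like. It assumes a hard (required) rule pinned to a single node's hostname label; the node name is hypothetical. If that one node goes away, no other node can satisfy the rule.

```yaml
# Hypothetical pod template fragment: illustrates an overly restrictive rule.
# The hostname value below is a made-up example, not taken from the incident.
spec:
  affinity:
    nodeAffinity:
      # "required" rules are hard constraints: the scheduler will not place
      # the pod on any node that fails to match.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - ip-10-0-1-23.ec2.internal   # only one node can ever match
```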
Fix/Workaround
• Adjusted the node affinity rules to be less restrictive.
• Re-scheduled the pods to available nodes.
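As an illustration of a less restrictive rule, the sketch below keeps a hard requirement but matches on a label shared by several nodes, and expresses zone placement only as a preference. The node group name and zone are hypothetical, and the choice of labels is an assumption; the source does not state which labels were used in the actual fix.

```yaml
# Hypothetical pod template fragment: a more flexible variant.
# Node group name and zone are made-up examples.
spec:
  affinity:
    nodeAffinity:
      # Hard constraint on a label carried by every node in an EKS managed
      # node group, so any surviving node in the group remains eligible.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: eks.amazonaws.com/nodegroup
                operator: In
                values:
                  - general-purpose
      # Soft constraint: the scheduler favors this zone but can fall back
      # to other matching nodes when none are available there.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a
```

Using a preferred rule for anything that is not a strict requirement leaves the scheduler room to place pods elsewhere during maintenance.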
Lessons Learned
Affinity rules should be configured to provide sufficient flexibility for pod rescheduling.
How to Avoid
1. Set node affinity rules based on node availability and workload requirements, so that more than one node can satisfy each rule.
2. Regularly test affinity and anti-affinity rules during node maintenance windows.