Scenario #85
Cluster Management
K8s v1.22, Azure AKS
Failed Node Drain Due to In-Use Pods
A node drain could not complete because some pods were still busy and did not terminate in time.
What Happened
When attempting to drain a node, the operation failed because some pods were still running long tasks or were still within their termination grace periods.
Diagnosis Steps
1. Ran kubectl describe node and checked pod evictions.
2. Identified pods that were in the middle of long-running processes or had insufficient termination grace periods.
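The diagnosis steps above could be partly automated. The following is a minimal sketch, not the commands used in this incident: it assumes the pod list has been exported as JSON (for example with kubectl get pods --all-namespaces -o json) and lists each pod on a given node with its termination grace period and whether it is already terminating. The function name summarize_pods_on_node is hypothetical.

```python
DEFAULT_GRACE_SECONDS = 30  # Kubernetes default when the field is unset


def summarize_pods_on_node(pod_list, node_name):
    """Return one row per pod scheduled on node_name.

    pod_list is the parsed JSON of a Kubernetes PodList
    (the object with an "items" array).
    """
    rows = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        if spec.get("nodeName") != node_name:
            continue  # pod runs on a different node
        meta = pod.get("metadata", {})
        rows.append({
            "namespace": meta.get("namespace", "default"),
            "name": meta.get("name", ""),
            "grace_seconds": spec.get(
                "terminationGracePeriodSeconds", DEFAULT_GRACE_SECONDS
            ),
            # A deletionTimestamp means deletion was requested and the
            # pod is still shutting down.
            "terminating": "deletionTimestamp" in meta,
        })
    return rows
```

Pods reported as terminating for a long time, or with a grace period much shorter than their typical shutdown time, are the likely candidates for blocking the drain.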
Root Cause
Pods with long-running tasks or improper termination grace periods caused the drain to hang.
Fix/Workaround
• Increased termination grace periods for the affected pods.
• Forced the node drain operation after ensuring that the pods could safely terminate.
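The drain step could be scripted along these lines. This is a sketch under assumptions: the source does not say which flags were used, so the choice of --ignore-daemonsets, --delete-emptydir-data, --timeout, --grace-period, and --force here is illustrative, and build_drain_command is a hypothetical helper. Note that --force in kubectl drain allows removal of pods not managed by a controller, so it should be enabled only after confirming those pods can safely terminate.

```python
import shlex


def build_drain_command(node, timeout_seconds=600, grace_period=None, force=False):
    """Build an argument list for `kubectl drain`.

    grace_period=None leaves each pod's own terminationGracePeriodSeconds
    in effect; an integer overrides it for this drain only.
    """
    cmd = [
        "kubectl", "drain", node,
        "--ignore-daemonsets",        # DaemonSet pods cannot be evicted
        "--delete-emptydir-data",     # allow evicting pods using emptyDir
        f"--timeout={timeout_seconds}s",  # give up instead of hanging forever
    ]
    if grace_period is not None:
        cmd.append(f"--grace-period={grace_period}")
    if force:
        cmd.append("--force")  # also remove pods with no controller
    return cmd


if __name__ == "__main__":
    # Print the command for review rather than running it.
    print(shlex.join(build_drain_command("node-1", grace_period=120)))
```

Setting a timeout means a stuck drain fails visibly rather than hanging, which makes it easier to notice and investigate the blocking pods.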
Lessons Learned
Ensure that pods with long-running tasks have adequate termination grace periods.
How to Avoid
1. Configure appropriate termination grace periods for all pods.
2. Monitor node draining and ensure pods can gracefully shut down.
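As an illustration of the first point, the grace period is set through the terminationGracePeriodSeconds field of the pod spec. The sketch below assumes the workload is a Deployment and builds a merge patch for its pod template; the helper names and the example values are hypothetical, and the right number of seconds depends on how long the workload actually needs to shut down.

```python
import json


def grace_period_patch(seconds):
    """Return a merge patch that sets the pod template's grace period."""
    if seconds < 0:
        raise ValueError("grace period must be non-negative")
    return {
        "spec": {
            "template": {
                "spec": {"terminationGracePeriodSeconds": int(seconds)}
            }
        }
    }


def patch_command(deployment, namespace, seconds):
    """Build an argument list for `kubectl patch deployment`."""
    return [
        "kubectl", "patch", "deployment", deployment,
        "-n", namespace,
        "--type=merge",
        "-p", json.dumps(grace_period_patch(seconds)),
    ]
```

Because the patch changes the pod template, applying it triggers a rollout of the Deployment, so new pods pick up the longer grace period before the next drain.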