Scenario #52
Cluster Management
K8s v1.21, GKE
Node Draining Delay During Maintenance
Node draining took an unusually long time during maintenance because overly strict PodDisruptionBudgets blocked pod evictions.
What Happened
During a scheduled node maintenance, draining took longer than expected because pod evictions were blocked by PodDisruptionBudgets that left no allowed disruptions.
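For context, a drain for this kind of maintenance typically looks like the following; the node name and flags are illustrative assumptions, not taken from the incident.

```sh
# Cordon the node and evict its pods ahead of maintenance.
# The node name is a placeholder; --ignore-daemonsets is needed
# on nodes that run DaemonSet-managed pods.
kubectl drain gke-prod-pool-1-abcd1234 \
  --ignore-daemonsets \
  --timeout=10m
```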
Diagnosis Steps
1. Checked kubectl describe output for the affected pods and identified PodDisruptionBudget violations (inspection commands are sketched below).
2. Observed that some pods had hard disruption constraints due to attached storage.
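A minimal set of inspection commands for this diagnosis might look like the following; the names in angle brackets are placeholders.

```sh
# ALLOWED DISRUPTIONS of 0 means every eviction request is
# rejected and the drain keeps retrying.
kubectl get pdb --all-namespaces

# See which selector and threshold a specific budget applies.
kubectl describe pdb <pdb-name> -n <namespace>

# Eviction failures show up as events on the stuck pod.
kubectl describe pod <pod-name> -n <namespace>
```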
Root Cause
PodDisruptionBudget was too strict, preventing pods from being evicted quickly.
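As an illustration (not the incident's actual manifest), a budget like this blocks draining entirely when the workload runs exactly three replicas:

```yaml
# Hypothetical manifest: with minAvailable equal to the replica
# count (3), allowed disruptions is always 0, so every eviction
# during a drain is rejected.
apiVersion: policy/v1   # policy/v1 is GA as of Kubernetes v1.21
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app: my-app
```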
Fix/Workaround
• Relaxed the PodDisruptionBudget to allow more eviction headroom (see the example below).
• Manually evicted the blocking pods to speed up the node drain.
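A sketch of the adjusted budget, reusing the hypothetical app-pdb from above; maxUnavailable gives the drain room to evict one pod at a time instead of blocking entirely:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  maxUnavailable: 1   # always leaves headroom for one eviction
  selector:
    matchLabels:
      app: my-app
```

For the manual eviction, note that PodDisruptionBudgets are enforced only by the eviction API, so a direct kubectl delete pod bypasses the budget; this is best reserved for maintenance emergencies, since it skips the application's disruption safeguards.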
Lessons Learned
PodDisruptionBudgets should reflect an application's actual disruption tolerance rather than defaulting to zero allowed disruptions.
How to Avoid
1. Set reasonable disruption budgets for critical applications.
2. Test disruption scenarios during maintenance windows to surface issues early (a dry-run rehearsal is sketched below).
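One way to rehearse a drain before the maintenance window, again with a placeholder node name:

```sh
# List the pods a drain would evict without actually evicting them.
kubectl drain gke-prod-pool-1-abcd1234 --ignore-daemonsets --dry-run=client

# Confirm every budget leaves eviction headroom before maintenance:
# ALLOWED DISRUPTIONS should be >= 1 for workloads on the node.
kubectl get pdb --all-namespaces
```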