Scenario #52
Cluster Management
K8s v1.21, GKE

Node Draining Delay During Maintenance

Node draining took an unusually long time during maintenance because overly strict PodDisruptionBudgets blocked pod evictions.

What Happened

During scheduled node maintenance, draining took longer than expected because pod evictions were being blocked by restrictive PodDisruptionBudgets.

Diagnosis Steps
  1. Ran kubectl describe on the affected pods and identified evictions blocked by PodDisruptionBudget violations (see the command sketch after this list).
  2. Observed that some pods had hard disruption constraints because of attached storage.
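The checks roughly corresponded to the commands below; node, pod, and PDB names in angle brackets are placeholders.

  # Which PDBs cover the stuck pods, and do they allow any disruptions right now?
  kubectl get pdb --all-namespaces
  kubectl describe pdb <pdb-name> -n <namespace>

  # The drain output itself reports evictions blocked by a budget
  # ("Cannot evict pod as it would violate the pod's disruption budget")
  kubectl drain <node-name> --ignore-daemonsets
  kubectl describe pod <pod-name> -n <namespace>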
Root Cause

The PodDisruptionBudget was too strict, preventing the drain from evicting the protected pods in a timely way.
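For illustration only (names and counts are hypothetical), a budget like the one below blocks every voluntary eviction of a two-replica workload, which is enough to stall a drain indefinitely:

  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: db-pdb
  spec:
    minAvailable: 2          # equal to the replica count: no pod may ever be evicted voluntarily
    selector:
      matchLabels:
        app: db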

Fix/Workaround
• Adjusted the PodDisruptionBudget to allow more flexibility for evictions (see the sketch below).
• Manually removed the remaining pods to speed up the node drain.
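A minimal sketch of the workaround, reusing the hypothetical db-pdb budget above; switching to maxUnavailable: 1 lets the drain evict one pod at a time:

  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: db-pdb
  spec:
    maxUnavailable: 1        # permit one voluntary eviction at a time
    selector:
      matchLabels:
        app: db

  # Retry the drain after applying the relaxed budget; as a last resort,
  # deleting a pod directly bypasses the PDB entirely.
  kubectl drain <node-name> --ignore-daemonsets
  kubectl delete pod <pod-name> -n <namespace>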
Lessons Learned

PodDisruptionBudgets should reflect an application's actual disruption tolerance rather than blocking all voluntary evictions.

How to Avoid
  1. Set realistic disruption budgets for critical applications so drains can still make progress.
  2. Test disruption scenarios during maintenance windows to surface blocking PDBs early (see the sketch after this list).
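One way to rehearse a drain ahead of a maintenance window (node name is a placeholder; the --dry-run flag assumes a reasonably recent kubectl):

  # ALLOWED DISRUPTIONS of 0 means a drain will block on that application
  kubectl get pdb --all-namespaces

  # List what the drain would evict without actually evicting anything
  kubectl drain <node-name> --ignore-daemonsets --dry-run=client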