Scenario #421
Scaling & Load
Kubernetes v1.23, GKE
Node Drain Race Condition During Scale Down
Node drain raced with pod termination, causing pod loss.
What Happened
Pods were terminated while the node was still draining, leading to data loss.
Diagnosis Steps
1. kubectl describe node showed multiple eviction races.
2. Pod logs showed abrupt termination without a graceful shutdown.
Root Cause
The scale-down process did not wait for the node drain to complete before proceeding.
Fix/Workaround
• Adjusted terminationGracePeriodSeconds for pods.
• Introduced a node-draining delay in the scaling policy.
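Raising terminationGracePeriodSeconds only helps if the application actually uses that window. Kubernetes runs any preStop hook, sends SIGTERM, and sends SIGKILL once the grace period expires. That period is 30 seconds by default and the countdown includes the preStop hook. The Python sketch below assumes a hypothetical Worker class that buffers records in memory. Its signal handler only sets a flag, and the main loop flushes the buffer before returning.

```python
import signal
import threading


class Worker:
    """Hypothetical worker that buffers records in memory and persists them on exit."""

    def __init__(self):
        self.stop = threading.Event()
        self.buffer = []
        self.flushed = []

    def handle_sigterm(self, signum, frame):
        # Keep the handler minimal: just request shutdown.
        self.stop.set()

    def flush(self):
        # Placeholder for durable writes (database, object store, etc.).
        self.flushed.extend(self.buffer)
        self.buffer.clear()

    def run(self, poll_s=0.05):
        # Must be called from the main thread; signal.signal() requires it.
        signal.signal(signal.SIGTERM, self.handle_sigterm)
        n = 0
        while not self.stop.is_set():
            self.buffer.append(n)  # simulated unit of incoming work
            n += 1
            self.stop.wait(poll_s)
        self.flush()  # must finish before terminationGracePeriodSeconds runs out
        return len(self.flushed)
```

In a container, the process that calls run() must receive the signal directly. For example, it can run as PID 1 with an exec-form entrypoint. A wrapper shell that does not forward SIGTERM leaves the handler unused until SIGKILL arrives.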
Lessons Learned
Node draining should be synchronized with pod termination.
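One way to enforce this synchronization in a scale-down step works in three stages. It evicts every pod, waits until all of them have terminated, and only then removes the node. The Python sketch below models this with hypothetical Pod and Node classes and caller-supplied evict and is_terminated callbacks. The wait is bounded by the longest terminationGracePeriodSeconds on the node plus a safety buffer (buffer_s, an assumed value). On timeout, the node is kept rather than removed. The clock and sleep parameters exist only so that the logic can be exercised without real waiting.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    grace_period_s: int = 30  # terminationGracePeriodSeconds; Kubernetes default is 30


@dataclass
class Node:
    name: str
    pods: list = field(default_factory=list)
    deleted: bool = False


def drain_and_remove(node, evict, is_terminated, buffer_s=10.0, poll_s=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Hypothetical scale-down step: delete the node only after every pod on it
    has terminated. Returns False (and keeps the node) if the deadline passes."""
    for pod in node.pods:
        evict(pod)
    # Allow at least the longest grace period on the node, plus a safety buffer.
    longest = max((p.grace_period_s for p in node.pods), default=0)
    deadline = clock() + longest + buffer_s
    while not all(is_terminated(p) for p in node.pods):
        if clock() >= deadline:
            return False  # do not remove a node whose pods are still shutting down
        sleep(poll_s)
    node.deleted = True  # stands in for the real node/VM removal
    return True
```

kubectl drain itself evicts pods and waits for them to be deleted. What the sketch adds is the rule that whatever removes the node must not proceed until that wait has finished.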
How to Avoid
1. Use a PodDisruptionBudget (PDB) to limit how many pods can be evicted at once during scale-down.
2. Implement graceful shutdown in pods (SIGTERM handling and preStop hooks).
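A PodDisruptionBudget caps voluntary disruptions. The Eviction API refuses an eviction (HTTP 429) that would drop a workload below its budget, and kubectl drain retries until the eviction is allowed. The Python sketch below approximates how the number of allowed disruptions follows from a budget's minAvailable or maxUnavailable. It assumes the rounding rule described in the Kubernetes documentation, under which percentages round up. The function names are hypothetical, and this is a simplified model rather than the disruption controller's code.

```python
def resolve(value, total):
    """Resolve an int or a percentage string such as "50%" against total.
    Percentages round up, as the Kubernetes PDB docs describe."""
    if isinstance(value, str) and value.endswith("%"):
        pct = int(value[:-1])
        return (total * pct + 99) // 100  # integer ceiling of total * pct / 100
    return int(value)


def disruptions_allowed(expected, healthy, min_available=None, max_unavailable=None):
    """Simplified model of how many more pods may be evicted right now.

    expected: replicas the owning controller wants (e.g. Deployment spec.replicas)
    healthy:  replicas currently ready
    This sketch requires exactly one of min_available / max_unavailable;
    a real PDB also allows only one of the two fields.
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = resolve(min_available, expected)
    else:
        desired_healthy = expected - resolve(max_unavailable, expected)
    return max(0, healthy - desired_healthy)
```

For example, a budget of minAvailable: "50%" over 7 healthy replicas requires 4 pods to stay up, so at most 3 can be evicted at a time.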