Scenario #421
Scaling & Load
Kubernetes v1.23, GKE

Node Drain Race Condition During Scale Down

Node drain raced with pod termination, causing pod loss.

What Happened

Pods were terminated before the node had finished draining, so they never got a chance to shut down gracefully, which led to data loss.

Diagnosis Steps
  1. kubectl describe node showed multiple eviction races.
  2. Pod logs showed abrupt termination without a graceful shutdown.
Root Cause

The scale-down process did not wait for the node drain to complete before removing the node.
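The ordering that was missing can be sketched as "drain first, delete only when empty". This is a minimal Python sketch, not GKE's autoscaler logic: start_drain, pods_remaining and delete_node are hypothetical hooks into whatever tooling performs the scale-down, and the timeout values are illustrative.

```python
import time
from typing import Callable


def scale_down_node(
    node: str,
    start_drain: Callable[[str], None],
    pods_remaining: Callable[[str], int],
    delete_node: Callable[[str], None],
    timeout_s: float = 300.0,
    poll_interval_s: float = 5.0,
) -> bool:
    """Remove `node` only after its drain has evicted every pod.

    The three callables are hypothetical hooks (assumptions, not a real API):
    start_drain cordons the node and begins evictions, pods_remaining reports
    how many evictable pods are still on it, and delete_node removes the node.
    Returns True if the node was deleted, False if the drain timed out.
    """
    start_drain(node)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if pods_remaining(node) == 0:
            # Drain finished: no pod can be cut off mid-shutdown any more.
            delete_node(node)
            return True
        time.sleep(poll_interval_s)
    # Drain did not complete in time; keep the node instead of racing it.
    return False
```

pods_remaining is assumed to ignore DaemonSet-managed pods, as kubectl drain does with --ignore-daemonsets. What to do after a timeout (alert, retry, or force) is a policy decision; this sketch deliberately leaves the node in place.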

Fix/Workaround
• Adjusted terminationGracePeriodSeconds so pods had enough time to shut down cleanly.
• Introduced a node-draining delay in the scaling policy.
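A longer terminationGracePeriodSeconds only helps if the application uses that window. Kubernetes sends SIGTERM when a pod is evicted and sends SIGKILL only after the grace period (30 seconds by default) expires. The Python sketch below assumes a single-process worker that holds in-flight work in memory; persist, run_worker and the list-based store are hypothetical stand-ins for real persistence.

```python
import signal
import threading

# Set when Kubernetes asks the container to stop. SIGKILL follows once
# terminationGracePeriodSeconds expires, so cleanup must finish before then.
shutdown_requested = threading.Event()


def _handle_sigterm(signum, frame):
    shutdown_requested.set()


def install_sigterm_handler() -> None:
    # Python only allows installing signal handlers from the main thread.
    signal.signal(signal.SIGTERM, _handle_sigterm)


def persist(pending: list, store: list) -> None:
    # Hypothetical durability step: move in-flight work somewhere safe
    # (database, object storage, queue) before the process exits.
    store.extend(pending)
    pending.clear()


def run_worker(pending: list, store: list, poll_s: float = 0.1) -> None:
    """Process work until SIGTERM arrives, then flush it and return."""
    while not shutdown_requested.is_set():
        # ... accept and process new work here, appending to `pending` ...
        shutdown_requested.wait(poll_s)
    persist(pending, store)  # runs after SIGTERM, before the grace period ends
```

In a container entrypoint you would call install_sigterm_handler() and then run_worker(...) in the main thread. The grace period should comfortably exceed the time persist needs, and any preStop hook's runtime counts against the same window.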
Lessons Learned

Node draining should be synchronized with pod termination.

How to Avoid
  1. Use a PodDisruptionBudget to limit how many pods can be evicted at once during scale-down.
  2. Implement graceful shutdown hooks in pods (for example, a preStop hook and SIGTERM handling).
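A PodDisruptionBudget works by capping voluntary evictions. The Python sketch below is a simplified model of that check, not the disruption controller's code. It assumes the documented rules: a budget is either minAvailable or maxUnavailable, given as a pod count or a percentage, with percentages rounded up. disruptions_allowed is a hypothetical helper name.

```python
from typing import Optional, Union

# A budget value is either an absolute pod count (2) or a percentage ("50%").
Budget = Union[int, str]


def _resolve(value: Budget, expected_pods: int) -> int:
    """Turn a budget into a pod count; percentages are rounded up."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return (percent * expected_pods + 99) // 100  # integer ceiling
    return int(value)


def disruptions_allowed(
    expected_pods: int,
    healthy_pods: int,
    min_available: Optional[Budget] = None,
    max_unavailable: Optional[Budget] = None,
) -> int:
    """How many more pods may be voluntarily evicted right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = _resolve(min_available, expected_pods)
    else:
        desired_healthy = expected_pods - _resolve(max_unavailable, expected_pods)
    return max(0, healthy_pods - desired_healthy)
```

For example, with 3 replicas and minAvailable of 2, one eviction is allowed while all 3 are healthy and none once only 2 are. When the value is zero, the Eviction API rejects the request with HTTP 429 and kubectl drain keeps retrying. This pacing is what prevents a scale-down from removing too many replicas at once.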