Incomplete Volume Detach Breaks Node Scheduling

Scheduler skipped a healthy node due to a ghost VolumeAttachment that was never cleaned up.

What Happened

The node was marked Ready, but new pods were not scheduled onto it because volumes from an already-deleted pod were still flagged as "in use".

Diagnosis Steps

Root Cause

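A restart of the CSI controller dropped its queue of pending detach requests, so the VolumeAttachment left behind by the deleted pod was never cleaned up.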

Fix/Workaround

• Recreated the CSI controller pod.
• Requeued the detach operation by manually deleting the stale VolumeAttachment.
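Before deleting anything by hand, it helps to list which VolumeAttachment objects no longer match a pod that still needs them. The following is a minimal sketch, assuming the attachments were already fetched (for example with `kubectl get volumeattachments -o json`, whose `items` carry `spec.nodeName` and `spec.source.persistentVolumeName`) and that the set of (node, PersistentVolume) pairs still required by running pods was computed separately. The function name `find_stale_attachments` and the sample data are hypothetical.

```python
def find_stale_attachments(
    attachments: list[dict],
    in_use: set[tuple[str, str]],
) -> list[str]:
    """Return names of VolumeAttachments that no running pod still needs.

    attachments: items from `kubectl get volumeattachments -o json`.
    in_use: hypothetical precomputed set of (node_name, pv_name) pairs
            still required by running pods.
    """
    stale = []
    for va in attachments:
        spec = va.get("spec", {})
        pv = spec.get("source", {}).get("persistentVolumeName")
        node = spec.get("nodeName")
        # Inline (non-PV) volumes have no persistentVolumeName; skip them.
        if pv is None:
            continue
        if (node, pv) not in in_use:
            stale.append(va["metadata"]["name"])
    return stale


# Hypothetical example: one attachment backs a live pod, one is a ghost.
attachments = [
    {"metadata": {"name": "csi-aaa"},
     "spec": {"nodeName": "node-1", "source": {"persistentVolumeName": "pv-live"}}},
    {"metadata": {"name": "csi-bbb"},
     "spec": {"nodeName": "node-1", "source": {"persistentVolumeName": "pv-ghost"}}},
]
print(find_stale_attachments(attachments, {("node-1", "pv-live")}))  # ['csi-bbb']
```

Each name returned is only a candidate; it should be reviewed before running `kubectl delete volumeattachment <name>`.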

Lessons Learned

A CSI controller must recover cleanly from a crash in the middle of an operation; pending work that exists only in memory is lost on restart.
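One common way to make this kind of controller crash-safe is to be level-triggered: on startup, re-derive pending detaches from observed state instead of trusting an in-memory queue. The toy model below illustrates that idea under stated assumptions; `DetachController`, `resync`, and the `attached` dictionary (standing in for cluster state that survives a restart) are hypothetical and are not the API of any real CSI component.

```python
from collections import deque


class DetachController:
    """Toy controller (hypothetical) whose detach queue lives only in memory."""

    def __init__(self, attached: dict[str, tuple[str, str]]) -> None:
        # attached: attachment name -> (node, pv). It stands in for cluster
        # state, which survives a controller restart; the queue does not.
        self.attached = attached
        self.queue: deque[str] = deque()

    def resync(self, in_use: set[tuple[str, str]]) -> None:
        """Level-triggered recovery: requeue every attachment no pod needs."""
        for name, key in sorted(self.attached.items()):
            if key not in in_use and name not in self.queue:
                self.queue.append(name)

    def process(self) -> None:
        """Drain the queue, detaching (here: forgetting) each attachment."""
        while self.queue:
            self.attached.pop(self.queue.popleft(), None)


state = {"va-1": ("node-1", "pv-a"), "va-2": ("node-1", "pv-b")}
ctrl = DetachController(state)
ctrl.queue.append("va-2")        # detach requested for the deleted pod's volume
ctrl = DetachController(state)   # simulated restart: the in-memory queue is lost
ctrl.resync(in_use={("node-1", "pv-a")})  # rebuild pending work from state
ctrl.process()
print(sorted(state))  # ['va-1']
```

Without the `resync` call after the simulated restart, `va-2` would stay in `state` indefinitely, which mirrors the ghost VolumeAttachment described above.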

How to Avoid