Volume Mount Delays Due to Node Drain Stale Attachment

After pods were rescheduled, volumes were slow to attach on the new nodes because stale attachment metadata still tied them to the old node.

What Happened

After a node was drained for maintenance, workloads failed over to other nodes, but their volume attachments still pointed to the old node, which delayed remounting on the new nodes.
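One way to spot this state is to look for VolumeAttachment objects that still name the drained node. The following is a minimal sketch, assuming the input is the parsed JSON from `kubectl get volumeattachments -o json`; the helper name `stale_attachments` and all sample values (node-a, node-b, csi-abc, pv-data-1, and so on) are hypothetical and only illustrate the shape of the data.

```python
def stale_attachments(volume_attachments, drained_node):
    """Return (attachment name, PV name) pairs still attached to drained_node."""
    stale = []
    for va in volume_attachments.get("items", []):
        spec = va.get("spec", {})
        status = va.get("status", {})
        # An attachment is suspicious if it is still marked attached
        # on a node that has already been drained.
        if spec.get("nodeName") == drained_node and status.get("attached", False):
            pv = spec.get("source", {}).get("persistentVolumeName")
            stale.append((va["metadata"]["name"], pv))
    return stale


# Hypothetical sample mirroring the VolumeAttachment list structure.
sample = {
    "items": [
        {
            "metadata": {"name": "csi-abc"},
            "spec": {"nodeName": "node-a",
                     "source": {"persistentVolumeName": "pv-data-1"}},
            "status": {"attached": True},
        },
        {
            "metadata": {"name": "csi-def"},
            "spec": {"nodeName": "node-b",
                     "source": {"persistentVolumeName": "pv-data-2"}},
            "status": {"attached": True},
        },
    ]
}

print(stale_attachments(sample, "node-a"))
```

Any entry returned for a drained node is a candidate for the stale attachment described above; whether it is truly stale still has to be confirmed against the cloud provider's view of the disk.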

Diagnosis Steps

Root Cause

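The controller applied exponential backoff to detach retries, so each failed detach from the drained node pushed the next attempt further out and the stale attachment stayed in place longer.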
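To see why backoff on detach retries turns into minutes of delay, here is a rough sketch of a capped exponential backoff schedule. The initial delay, factor, cap, and attempt count are illustrative assumptions, not the controller's actual settings, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(initial=1.0, factor=2.0, cap=300.0, attempts=10):
    """Return the wait (in seconds) before each retry under capped exponential backoff."""
    delays = []
    delay = initial
    for _ in range(attempts):
        # Each wait grows by `factor` until it hits the cap.
        delays.append(min(delay, cap))
        delay *= factor
    return delays


default_schedule = backoff_delays()
reduced_schedule = backoff_delays(cap=30.0)

print(default_schedule)
print(f"total wait with cap=300s: {sum(default_schedule):.0f}s")
print(f"total wait with cap=30s:  {sum(reduced_schedule):.0f}s")
```

Under these assumed values, ten failed detach attempts add up to 811 seconds of waiting with a 300-second cap, versus 181 seconds with a 30-second cap, which is the intuition behind lowering the backoff limit.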

Fix/Workaround

• Reduced backoff limit in CSI controller config.
• Used manual detach via cloud CLI in emergencies.

Lessons Learned

Volume operations can get stuck in edge cases around node drains.

How to Avoid