Scenario #322
Storage
Kubernetes v1.23, AWS EBS CSI
Volume Mount Delays Due to Node Drain Stale Attachment
Volumes took too long to attach on new nodes after pod rescheduling due to stale attachment metadata.
What Happened
After a node was drained for maintenance, workloads failed over to other nodes, but the volume attachments still pointed to the old node, which delayed remounting.
Diagnosis Steps
1. Described the PV: the volume was still recorded as attached to the drained node.
2. Cloud logs showed volume in-use errors.
3. The CSI controller did not retry the detach quickly enough.
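A minimal sketch of the first check, assuming the output of `kubectl get volumeattachments -o json` has been captured as a string. The helper name `stale_attachments` and the sample object (node, PV, and error text) are hypothetical; only the field paths (`spec.nodeName`, `spec.source.persistentVolumeName`, `status.attached`, `status.detachError.message`) come from the VolumeAttachment API.

```python
import json


def stale_attachments(volume_attachments_json: str, drained_node: str) -> list:
    """Return a summary of VolumeAttachments that still reference drained_node."""
    items = json.loads(volume_attachments_json).get("items", [])
    stale = []
    for item in items:
        spec = item.get("spec") or {}
        status = item.get("status") or {}
        if spec.get("nodeName") != drained_node:
            continue  # attachment belongs to another node
        stale.append({
            "name": (item.get("metadata") or {}).get("name"),
            "pv": (spec.get("source") or {}).get("persistentVolumeName"),
            "attached": status.get("attached", False),
            "detach_error": (status.get("detachError") or {}).get("message"),
        })
    return stale


# Hypothetical sample data, shaped like `kubectl get volumeattachments -o json`.
SAMPLE = json.dumps({
    "items": [
        {
            "metadata": {"name": "csi-example-1"},
            "spec": {
                "attacher": "ebs.csi.aws.com",
                "nodeName": "node-a",
                "source": {"persistentVolumeName": "pv-data-0"},
            },
            "status": {
                "attached": True,
                "detachError": {"message": "volume is in use"},
            },
        },
        {
            "metadata": {"name": "csi-example-2"},
            "spec": {
                "attacher": "ebs.csi.aws.com",
                "nodeName": "node-b",
                "source": {"persistentVolumeName": "pv-data-1"},
            },
            "status": {"attached": True},
        },
    ]
})

for entry in stale_attachments(SAMPLE, "node-a"):
    print(entry)
```

Any entry returned for a node that has already been drained is a candidate stale attachment worth investigating.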
Root Cause
The controller applied exponential backoff to detach retries, so each failed detach pushed the next attempt further out.
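To see why backoff can turn a brief in-use error into a long delay, the sketch below adds up the waits for a doubling retry schedule with a cap. The starting delay, factor, cap, and attempt count are illustrative assumptions, not values taken from this incident or from the driver's configuration.

```python
def backoff_schedule(initial_s: float, factor: float, max_s: float, attempts: int) -> list:
    """Return the wait before each retry: initial_s, then multiplied by factor, capped at max_s."""
    delays = []
    delay = initial_s
    for _ in range(attempts):
        delays.append(min(delay, max_s))
        delay *= factor
    return delays


# Assumed values: start at 1 s, double each time, cap at 300 s, 10 failed attempts.
long_cap = backoff_schedule(initial_s=1, factor=2, max_s=300, attempts=10)
# Same schedule with an assumed lower cap of 30 s.
short_cap = backoff_schedule(initial_s=1, factor=2, max_s=30, attempts=10)

print("cap 300 s:", long_cap, "total", sum(long_cap), "s")
print("cap  30 s:", short_cap, "total", sum(short_cap), "s")
```

Under these assumptions, ten consecutive failures add up to 811 seconds of waiting with the 300-second cap, against 181 seconds with the 30-second cap, which is why lowering the limit shortens recovery.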
Fix/Workaround
• Reduced the backoff limit in the CSI controller configuration.
• Detached volumes manually with the cloud CLI in emergencies.
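For the emergency path, a small sketch of assembling the AWS CLI call for a manual detach. The helper name `build_detach_command` and the volume and instance IDs are hypothetical placeholders; the command only prints here and is not executed. `--force` is left off by default because a forced detach can lose or corrupt data that the instance has not yet flushed.

```python
import shlex


def build_detach_command(volume_id: str, instance_id: str, force: bool = False) -> list:
    """Build the argument list for `aws ec2 detach-volume` (does not run it)."""
    cmd = [
        "aws", "ec2", "detach-volume",
        "--volume-id", volume_id,
        "--instance-id", instance_id,
    ]
    if force:
        # Last resort only: the instance gets no chance to flush caches or metadata.
        cmd.append("--force")
    return cmd


# Placeholder IDs; substitute the EBS volume and the drained node's instance.
command = build_detach_command("vol-0123456789abcdef0", "i-0123456789abcdef0")
print(shlex.join(command))
```

Passing the returned list to `subprocess.run(command, check=True)` would execute it, assuming the AWS CLI is installed and credentials are configured.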
Lessons Learned
Volume operations can get stuck in edge cases around node operations such as drains.
How to Avoid
1. Use health checks to confirm that volumes have detached before treating a node drain as complete.
2. Monitor VolumeAttachment objects during node operations.
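One way to combine both points is a polling check that waits until no VolumeAttachment references the drained node, and gives up after a timeout. This is a sketch under stated assumptions: `wait_for_detach` and the `list_attached_nodes` callback are hypothetical names, and the callback is expected to return the `spec.nodeName` of every current VolumeAttachment (for example by parsing `kubectl get volumeattachments -o json`).

```python
import time
from typing import Callable, List


def wait_for_detach(
    list_attached_nodes: Callable[[], List[str]],
    drained_node: str,
    timeout_s: float = 300,
    poll_s: float = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until no attachment references drained_node; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if drained_node not in list_attached_nodes():
            return True  # every volume has detached from the drained node
        if clock() >= deadline:
            return False  # still attached: alert or fall back to manual detach
        sleep(poll_s)


# Simulated poll results standing in for real cluster queries.
polls = iter([["node-a", "node-b"], ["node-a"], ["node-b"]])
ok = wait_for_detach(lambda: next(polls), "node-a", timeout_s=60, poll_s=0)
print("detached" if ok else "timed out")
```

A `False` result is the signal to raise an alert or investigate before the node is terminated or returned to service.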