Scenario #305
Storage
Kubernetes v1.24, Rook Ceph
Long PVC Rebinding Time on StatefulSet Restart
Restarting a StatefulSet with many PVCs caused long downtime due to slow rebinding.
What Happened
A 20-replica StatefulSet was restarted, and each pod had to wait for its PVC's volume to be reattached and mounted before it could start. Ceph mount operations ran sequentially and were slow.
Diagnosis Steps
1. Pods were stuck in the Init stage for 15–20 minutes.
2. Ceph logs showed delayed attachment for each volume.
3. Describing the PVCs showed them bound, but the volumes were not yet mounted.
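The first diagnosis step can be quantified rather than eyeballed. The following is a minimal sketch, assuming the pod list has been exported with kubectl get pods -o json and parsed into a dictionary. The helper names condition_time and startup_delays are hypothetical and not part of any Kubernetes tooling. It reports, per pod, the seconds between the PodScheduled and Ready conditions.

```python
from datetime import datetime

# Timestamp layout used by lastTransitionTime in pod status conditions.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def condition_time(pod: dict, condition_type: str):
    """Return when the given condition last became True, or None if it is not True."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == condition_type and cond.get("status") == "True":
            return datetime.strptime(cond["lastTransitionTime"], TIME_FORMAT)
    return None


def startup_delays(pod_list: dict) -> dict:
    """Map pod name to seconds between PodScheduled and Ready (None if not Ready)."""
    delays = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        scheduled = condition_time(pod, "PodScheduled")
        ready = condition_time(pod, "Ready")
        if scheduled is None or ready is None:
            delays[name] = None  # pod is still waiting, e.g. on its volume
        else:
            delays[name] = (ready - scheduled).total_seconds()
    return delays
```

The dictionary can be produced by piping the kubectl JSON output into json.load(sys.stdin). Note that this interval also includes image pulls and init containers, so it is an upper bound on the time spent attaching and mounting the volume, not a direct measurement of it.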
Root Cause
Volume mounts were throttled to run sequentially, and the CSI attach settings allowed too little concurrency for the number of volumes involved.
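A rough model shows how sequential attaches add up to the observed delay. This is only a back-of-the-envelope sketch: the 50 seconds per volume used in the usage note is an assumed figure chosen to land in the reported 15–20 minute range, not a measured value, and the function name total_attach_seconds is hypothetical.

```python
import math


def total_attach_seconds(volumes: int, seconds_per_volume: float, concurrency: int) -> float:
    """Estimate the time to attach all volumes when at most `concurrency` run at once.

    Assumes every attach takes the same time and volumes are processed in waves.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    waves = math.ceil(volumes / concurrency)  # number of back-to-back batches
    return waves * seconds_per_volume
```

Under these assumptions, total_attach_seconds(20, 50, 1) gives 1000 seconds (about 17 minutes) for fully sequential attaches, while total_attach_seconds(20, 50, 5) gives 200 seconds, which illustrates why raising attach concurrency was an effective fix.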
Fix/Workaround
• Tuned CSI attach concurrency.
• Split the StatefulSet into smaller chunks.
Lessons Learned
Large-scale StatefulSets need volume attach tuning.
How to Avoid
1. Restart pods in smaller batches using partitioned rollouts rather than restarting all replicas at once.
2. Monitor CSI mount throughput.
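For partitioned rollouts, the StatefulSet field spec.updateStrategy.rollingUpdate.partition controls which pods are updated: only pods whose ordinal is greater than or equal to the partition value are rolled. The sketch below, assuming a hypothetical helper named partition_steps, computes the sequence of partition values to apply so that each step rolls at most one batch of pods.

```python
def partition_steps(replicas: int, batch_size: int) -> list:
    """Return descending partition values that roll `batch_size` pods per step.

    Pods with ordinal >= partition are updated, so lowering the partition
    step by step rolls the highest ordinals first and ends at 0 (all pods).
    """
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    steps = []
    partition = replicas
    while partition > 0:
        partition = max(0, partition - batch_size)
        steps.append(partition)
    return steps
```

For the 20-replica StatefulSet in this scenario and a batch size of 5, partition_steps(20, 5) returns [15, 10, 5, 0]. Each value would be applied in turn, for example with kubectl patch on the StatefulSet, waiting for the batch to become Ready and for its volumes to mount before moving to the next value. The batch size of 5 is an illustrative assumption and should be chosen based on the attach throughput actually observed.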