Scenario #305
Storage
Kubernetes v1.24, Rook Ceph

Long PVC Rebinding Time on StatefulSet Restart

Restarting a StatefulSet with many PVCs caused long downtime due to slow rebinding.

What Happened

A 20-replica StatefulSet was restarted, and each pod had to wait for its PVC to be reattached and mounted before it could start. The Ceph mount operations ran sequentially and slowly.

Diagnosis Steps
  1. Pods were stuck in the Init stage for 15–20 minutes.
  2. Ceph logs showed a delay in attaching each volume.
  3. `kubectl describe pvc` showed the PVCs as Bound but not yet mounted.
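To put numbers on the per-volume delay from the cluster side, one option is to compare each pod's creation time with its `SuccessfulAttachVolume` event. The sketch below assumes the inputs are the parsed output of `kubectl get pods -o json` and `kubectl get events -o json`, and that events carry `firstTimestamp` (some newer event sources only set `eventTime`); the function names are hypothetical.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # Kubernetes serializes these timestamps as UTC, e.g. "2024-05-01T10:00:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def attach_latencies(pods: dict, events: dict) -> dict:
    """Seconds from each pod's creation to its last SuccessfulAttachVolume event.

    Assumption: `pods` and `events` are the parsed JSON of
    `kubectl get pods -o json` and `kubectl get events -o json`.
    """
    created = {
        p["metadata"]["name"]: parse_ts(p["metadata"]["creationTimestamp"])
        for p in pods["items"]
    }
    latencies = {}
    for ev in events["items"]:
        obj = ev.get("involvedObject", {})
        name = obj.get("name")
        if ev.get("reason") != "SuccessfulAttachVolume" or obj.get("kind") != "Pod":
            continue
        if name not in created:
            continue
        # Assumption: firstTimestamp (or lastTimestamp) is populated on the event.
        ts = ev.get("firstTimestamp") or ev.get("lastTimestamp")
        if not ts:
            continue
        delta = (parse_ts(ts) - created[name]).total_seconds()
        # A pod can only start once all of its volumes are attached, so keep the latest.
        latencies[name] = max(delta, latencies.get(name, delta))
    return latencies
```

Sorting the result by value highlights the slowest pods; if the latencies grow roughly linearly with the pod ordinal, that points to attachments being processed one at a time.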
Root Cause

Volume attach and mount operations were throttled so that they ran one after another, and the CSI attach settings were not tuned for this many volumes.
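A back-of-the-envelope model shows why serialized attachment dominates the restart time. This is a simplification, assuming every attach/mount takes the same fixed time and that the backend does not slow down as concurrency rises; the 50-second figure below is hypothetical, picked only because 20 sequential attaches at that rate land inside the observed 15–20 minute window.

```python
import math


def estimated_attach_wait(volumes: int, seconds_per_attach: float, concurrency: int) -> float:
    """Rough wall-clock time to attach `volumes` volumes when at most
    `concurrency` attach/mount operations run at the same time."""
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    # Volumes are processed in ceil(volumes / concurrency) batches.
    batches = math.ceil(volumes / concurrency)
    return batches * seconds_per_attach


# Hypothetical 50 s per attach, 20 volumes:
sequential = estimated_attach_wait(20, 50, 1)   # 1000 s, about 16.7 minutes
concurrent = estimated_attach_wait(20, 50, 5)   # 200 s
```

In practice, Ceph-side contention means raising concurrency gives diminishing returns, so treat the model as an upper bound on the benefit.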

Fix/Workaround
• Tuned CSI attach concurrency.
• Split the StatefulSet into several smaller StatefulSets.
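Splitting can be planned with a small helper that divides the replica count into StatefulSets of bounded size. The helper and the `-partN` naming scheme are hypothetical, a sketch rather than how this split was actually done.

```python
def split_replicas(total: int, max_per_set: int, base_name: str) -> list:
    """Plan (name, replicas) pairs for smaller StatefulSets covering `total` replicas."""
    if total < 0 or max_per_set < 1:
        raise ValueError("total must be >= 0 and max_per_set >= 1")
    plan = []
    remaining = total
    part = 0
    while remaining > 0:
        size = min(max_per_set, remaining)
        # Hypothetical naming scheme: db-part0, db-part1, ...
        plan.append((f"{base_name}-part{part}", size))
        remaining -= size
        part += 1
    return plan


plan = split_replicas(20, 5, "db")
```

One caveat: StatefulSet PVCs are named `<claim-template>-<statefulset>-<ordinal>`, so the new StatefulSets will create new claims (for example `data-db-part0-0`) instead of reusing the existing ones. Data has to be migrated, or the old volumes rebound, as part of the split.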
Lessons Learned

Large-scale StatefulSets need volume attach tuning.

How to Avoid
  1. Restart pods in controlled batches using partitioned rolling updates (`spec.updateStrategy.rollingUpdate.partition`).
  2. Monitor CSI attach and mount throughput.
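A partitioned rollout can be driven by lowering the partition in steps. A StatefulSet RollingUpdate only updates pods whose ordinal is greater than or equal to the partition, so each step releases one batch. The sketch below generates the `kubectl patch` commands; the StatefulSet name `web` and the batch size are hypothetical. It assumes the partition was set to the replica count before the pod template was changed, and that you wait for each batch to become Ready with its volumes mounted before running the next command.

```python
import json


def partition_steps(replicas: int, batch: int) -> list:
    """Partition values for a staged rollout, highest ordinals first."""
    if replicas < 0 or batch < 1:
        raise ValueError("replicas must be >= 0 and batch >= 1")
    steps = []
    partition = replicas
    while partition > 0:
        partition = max(partition - batch, 0)
        steps.append(partition)
    return steps


def patch_commands(statefulset: str, replicas: int, batch: int) -> list:
    """kubectl patch commands that walk the partition down to 0, one batch at a time."""
    commands = []
    for partition in partition_steps(replicas, batch):
        patch = {"spec": {"updateStrategy": {"type": "RollingUpdate",
                                             "rollingUpdate": {"partition": partition}}}}
        commands.append(f"kubectl patch statefulset {statefulset} -p '{json.dumps(patch)}'")
    return commands


cmds = patch_commands("web", 20, 5)  # partitions 15, 10, 5, 0
```

Smaller batches keep the number of volumes attaching at once within what the CSI driver and Ceph can handle; larger batches finish sooner only if attach concurrency has been raised to match.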