Back to all scenarios
Scenario #342
Storage
Kubernetes v1.23, CSI Snapshot Controller
Volume Snapshot Controller Race Condition
Rapid creation/deletion of snapshots caused the controller to panic due to race conditions in snapshot finalizers.
Find this helpful?
What Happened
Automation created/deleted hundreds of snapshots per minute. The controller panicked due to concurrent finalizer modifications.
Diagnosis Steps
- 1Observed controller crash loop in logs.
- 2Snapshot objects stuck in Terminating state.
- 3Controller logs: resourceVersion conflict.
Root Cause
Finalizer updates not serialized under high load.
Fix/Workaround
• Throttled snapshot requests.
• Patched controller deployment to limit concurrency.
Lessons Learned
High snapshot churn breaks stability.
How to Avoid
- 1Monitor snapshot queue metrics.
- 2Apply rate limits in CI/CD snapshot tests.