Back to all scenarios
Scenario #342
Storage
Kubernetes v1.23, CSI Snapshot Controller

Volume Snapshot Controller Race Condition

Rapid creation/deletion of snapshots caused the controller to panic due to race conditions in snapshot finalizers.

Find this helpful?
What Happened

Automation created/deleted hundreds of snapshots per minute. The controller panicked due to concurrent finalizer modifications.

Diagnosis Steps
  • 1Observed controller crash loop in logs.
  • 2Snapshot objects stuck in Terminating state.
  • 3Controller logs: resourceVersion conflict.
Root Cause

Finalizer updates not serialized under high load.

Fix/Workaround
• Throttled snapshot requests.
• Patched controller deployment to limit concurrency.
Lessons Learned

High snapshot churn breaks stability.

How to Avoid
  • 1Monitor snapshot queue metrics.
  • 2Apply rate limits in CI/CD snapshot tests.