Scenario #346
Storage
Kubernetes v1.24, in-cluster NFS server
NFS Server Restart Crashes Pods
The NFS server was restarted for an upgrade. All dependent pods crashed due to stale file handles and unmount errors.
What Happened
NFS mounts became stale after the server restart. Pods using those volumes got stuck in crash loops.
Diagnosis Steps
- 1. Pod logs showed "Stale file handle" and I/O errors.
- 2. Kernel logs showed NFS timeouts.
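The two checks above can be run with `kubectl logs` and the node's kernel ring buffer. A sketch follows; the pod name is a placeholder, and the grep filter is demonstrated against simulated log lines so it can be verified without a cluster:

```shell
# On a real cluster (names are hypothetical):
#   kubectl logs my-app-pod | grep -E 'Stale file handle|Input/output error'
#   dmesg | grep -i nfs   # look for server-not-responding / timeout messages
# Simulated pod log lines standing in for `kubectl logs` output:
logs='read /data/cache.db: Stale file handle
write /data/out.log: Input/output error'
# Count the lines matching the NFS-staleness symptoms.
echo "$logs" | grep -cE 'Stale file handle|Input/output error'
```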
Root Cause
NFS server state (locks and open file handles) does not survive a restart unless stateless operation or client recovery is configured, so existing client mounts go stale.
Fix/Workaround
• Enabled NFSv4 stateless mode so client mounts can survive server restarts.
• Recovered the affected pods by restarting them after the server came back up.
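The pod-restart step can be sketched as below, assuming the workloads are managed by a Deployment (so deleted pods are recreated with fresh NFS mounts) and selected by a hypothetical label `app=nfs-client`; the real `kubectl` calls are shown in comments, with a simulated pod list so the loop itself runs anywhere:

```shell
# Real command (cluster required), assuming label "app=nfs-client":
#   kubectl delete pod -l app=nfs-client   # Deployment recreates pods, remounting NFS
# Simulated pod list standing in for `kubectl get pods -l app=nfs-client`:
pods='nfs-client-1 nfs-client-2'
for p in $pods; do
  # kubectl delete pod "$p"   # the actual restart in a real cluster
  echo "restarting $p"
done
```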
Lessons Learned
In-cluster storage servers need an HA design; a single NFS server restart can take down every dependent workload.
How to Avoid
- 1. Use managed NFS services or replicated storage.
- 2. Add pod liveness checks for filesystem readiness.
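Point 2 can be implemented as an exec liveness probe that touches the NFS-backed mount path. A minimal sketch, assuming the volume is mounted at `/data` (the path and timings are placeholders):

```yaml
livenessProbe:
  exec:
    # `ls` fails when the mount goes stale, so kubelet restarts
    # the container instead of leaving it wedged on I/O errors.
    command: ["ls", "/data"]
  periodSeconds: 30
  failureThreshold: 3
```

Note that volume mounts are set up at the pod level, so a container restart alone may not refresh a stale NFS mount; the probe mainly surfaces the failure quickly, and a pod-level restart may still be needed.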