Scenario #346
Storage
Kubernetes v1.24, in-cluster NFS server

NFS Server Restart Crashes Pods

The NFS server was restarted for an upgrade. All dependent pods crashed due to stale file handles and unmount errors.

What Happened

NFS mounts became stale after the server restart. Pods using those volumes got stuck in crash loops.

Diagnosis Steps
  1. Pod logs showed Stale file handle and I/O errors.
  2. Kernel logs showed NFS timeouts.
Root Cause

NFS file handles are tied to server-side state. That state is lost when the server restarts, so clients are left holding stale handles unless the server is configured to recover it.
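A stale handle surfaces to an application as an `OSError` with `errno.ESTALE`, which is why the pods crash-looped instead of recovering on their own. A minimal Python sketch of detecting that condition (the `read_dir` helper is illustrative, not from the incident):

```python
import errno
import os

def read_dir(path):
    """List a directory, surfacing a stale NFS handle as an explicit error."""
    try:
        return os.listdir(path)
    except OSError as e:
        if e.errno == errno.ESTALE:
            # The server no longer recognizes this file handle; a remount
            # (or a pod restart) is the only way to get a fresh one.
            raise RuntimeError(f"stale NFS handle on {path}") from e
        raise
```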

Fix/Workaround
• Moved mounts to NFSv4, which can re-establish client state after a server restart.
• Restarted the affected pods once the server was back up.
Lessons Learned

In-cluster storage servers need an HA design: a single NFS server is a single point of failure for every pod that mounts it.

How to Avoid
  1. Use a managed NFS service or replicated storage.
  2. Add pod liveness checks for filesystem readiness.
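The liveness check in point 2 can be a small exec probe that forces an NFS round trip and exits nonzero when the mount has gone stale. A sketch under assumed names (`/data` is a placeholder for your volumeMount path, `nfs_ready.py` a hypothetical file name):

```python
#!/usr/bin/env python3
"""Exec liveness probe: report whether an NFS mount is still readable."""
import os

MOUNT_PATH = "/data"  # placeholder: substitute your NFS volumeMount path

def volume_ready(path: str) -> bool:
    """Force an NFS round trip; stale handles and I/O timeouts raise OSError."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False

# In the probe container, exit with the check's result, e.g.:
#   import sys; sys.exit(0 if volume_ready(MOUNT_PATH) else 1)
```

Wired into the pod spec as `livenessProbe.exec.command: ["python3", "/probes/nfs_ready.py"]`, the kubelet then restarts the pod automatically when the volume goes stale instead of leaving it to crash-loop on application errors.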