Scenario #1
Cluster Management
K8s v1.23, On-prem bare metal, Systemd cgroups

Zombie Pods Causing NodeDrain to Hang

Node drain stuck indefinitely due to unresponsive terminating pod.

What Happened

A pod with a custom finalizer never completed termination, blocking kubectl drain. Even after the pod was marked for deletion, the API server kept waiting because the finalizer wasn’t removed.
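A quick way to confirm this symptom from the command line (the drain flags and the node/pod/namespace names below are illustrative placeholders, not values from the incident):

  # Drain hangs while waiting on the terminating pod
  kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

  # The pod already has a deletionTimestamp, yet a finalizer is still listed
  kubectl get pod <pod-name> -n <namespace> \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'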

Diagnosis Steps
  1. Checked kubectl get pods --all-namespaces -o wide to find lingering pods.
  2. Found the pod stuck in Terminating state for over 20 minutes.
  3. Used kubectl describe pod <pod> to identify the presence of a custom finalizer.
  4. Investigated the logs of the controller responsible for the finalizer; the controller had crashed.
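
Put together, the diagnosis amounted to commands along these lines (the controller deployment name and its namespace are assumptions for illustration):

  # Step 1: find pods lingering in Terminating
  kubectl get pods --all-namespaces -o wide

  # Steps 2-3: inspect the stuck pod; metadata.finalizers and the events show what blocks deletion
  kubectl describe pod <pod-name> -n <namespace>

  # Step 4: check the controller that owns the finalizer; here it had crashed
  kubectl -n <controller-namespace> get pods
  kubectl -n <controller-namespace> logs deploy/<finalizer-controller> --previous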
Root Cause

Finalizer logic was never executed because its controller was down, leaving the pod undeletable.

Fix/Workaround
kubectl patch pod <pod-name> -p '{"metadata":{"finalizers":[]}}' --type=merge
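
Clearing the whole finalizers list is a blunt instrument. If the pod carries other finalizers that should survive, a JSON patch that removes only the stuck entry is safer; the index 0 below is an assumption and must match the position of the stuck finalizer in the list:

  # Inspect the current finalizer list first
  kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.finalizers}'

  # Remove only the entry at index 0
  kubectl patch pod <pod-name> -n <namespace> --type=json \
    -p '[{"op":"remove","path":"/metadata/finalizers/0"}]'

Either way, only remove a finalizer after confirming its cleanup work is no longer needed or has been done manually; deleting the finalizer skips whatever the controller was supposed to do.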
Lessons Learned

Finalizer handling needs timeout or fail-safe logic; otherwise a single crashed controller can block pod deletion, and node drains, indefinitely.

How to Avoid
  1. Avoid finalizers unless absolutely necessary.
  2. Add monitoring for pods stuck in Terminating (see the sketch after this list).
  3. Implement retry/timeout logic in finalizer controllers.
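
For item 2, a minimal monitoring sketch: a one-off check (or cron job) that flags pods whose deletionTimestamp is older than a threshold. The five-minute cutoff and the GNU date syntax are assumptions; adapt both to your environment:

  # Pods whose deletionTimestamp is older than ~5 minutes are likely stuck Terminating
  kubectl get pods --all-namespaces -o json \
    | jq -r --arg cutoff "$(date -u -d '-5 minutes' +%Y-%m-%dT%H:%M:%SZ)" \
        '.items[]
         | select(.metadata.deletionTimestamp != null and .metadata.deletionTimestamp < $cutoff)
         | "\(.metadata.namespace)/\(.metadata.name) terminating since \(.metadata.deletionTimestamp)"'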