Scenario #156
Networking
K8s v1.18, on-premise

Flannel Overlay Network Interruption Due to Node Failure

The Flannel overlay network was interrupted after a node failure, causing pod-to-pod communication failures across nodes.

What Happened

A node failure caused the Flannel CNI plugin to lose its network routes, disrupting communication between pods on different nodes.

Diagnosis Steps
  1. Used kubectl get pods -o wide to identify the affected pods and the nodes they were scheduled on.
  2. Checked the Flannel daemon logs and found errors related to missing network routes (see the command sketch below).
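
A sketch of the diagnosis commands, assuming Flannel runs as a DaemonSet in the kube-system namespace with the label app=flannel (namespace and labels vary by installation):

    # Map the affected pods to the nodes they run on (NODE column).
    kubectl get pods --all-namespaces -o wide

    # Locate the Flannel pod on the affected node and read its logs.
    kubectl -n kube-system get pods -l app=flannel -o wide
    kubectl -n kube-system logs <flannel-pod-on-affected-node>
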
Root Cause

The Flannel CNI plugin did not re-establish its network routes after the node failure, leaving traffic destined for pods on other nodes without a valid route.
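
The missing routes can be confirmed directly in the node's routing table. A minimal check, assuming Flannel's default VXLAN backend (interface flannel.1) and a 10.244.0.0/16 pod CIDR:

    # Each healthy peer node should contribute one route for its pod subnet
    # via the flannel.1 interface; missing entries are routes that were
    # never re-established after the failure.
    ip route show | grep flannel.1

    # Expected shape, one line per remote node:
    #   10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink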

Fix/Workaround
• Restarted the Flannel pods on the affected nodes to re-establish network routes (see the command sketch below).
• Verified that pod-to-pod communication between nodes was restored.
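
A sketch of the restart and verification, under the same namespace/label assumptions; deleting a Flannel pod lets its DaemonSet recreate it, which rewrites the node's routes:

    # Delete the Flannel pod on the affected node; the DaemonSet recreates it.
    kubectl -n kube-system delete pod <flannel-pod-on-affected-node>

    # Confirm the replacement pod is Running.
    kubectl -n kube-system get pods -l app=flannel -o wide

    # Spot-check cross-node pod-to-pod connectivity (busybox image assumed reachable).
    kubectl run net-test --rm -it --image=busybox --restart=Never -- \
      ping -c 3 <pod-ip-on-another-node>
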
Lessons Learned

Ensure that CNI plugins can gracefully handle node failures and re-establish connectivity.

How to Avoid
  1. Implement automatic recovery or self-healing mechanisms for CNI plugins.
  2. Monitor CNI plugin logs and pod status to detect issues early (e.g., the watchdog sketch below).
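
One illustrative approach to the monitoring point, assuming the same app=flannel label in kube-system: a small watchdog, run from cron or a monitoring job, that flags Flannel pods that are not Running or have restarted, which tends to surface route-loss problems early:

    #!/bin/sh
    # Warn about Flannel pods that are not Running or have restarted.
    # Adjust the namespace and label selector to match your installation.
    kubectl -n kube-system get pods -l app=flannel \
      -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{" "}{.status.containerStatuses[0].restartCount}{"\n"}{end}' \
      | awk '$2 != "Running" || $3 > 0 { print "WARN: flannel pod", $1, "phase", $2, "restarts", $3 }'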