Scenario #156
Networking
K8s v1.18, on-premise
Flannel Overlay Network Interruption Due to Node Failure
Flannel overlay network was interrupted after a node failure, causing pod-to-pod communication issues.
What Happened
A node failure caused the Flannel CNI plugin to lose its network routes, disrupting communication between pods on different nodes.
Diagnosis Steps
1. Used `kubectl get pods -o wide` to identify the affected pods and the nodes they were scheduled on.
2. Checked the Flannel daemon logs on the affected node and found errors about missing network routes.
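The diagnosis steps above can be sketched as the following commands. This is a hedged sketch: the `app=flannel` label and the `kube-system` namespace are assumptions that depend on how Flannel was installed, so adjust them to your deployment.

```shell
# Step 1: list pods with their node placement to spot pods stuck on the failed node.
kubectl get pods --all-namespaces -o wide

# Step 2: pull recent Flannel daemon logs and filter for route-related errors.
# The label selector (app=flannel) is an assumption; check your DaemonSet's labels.
kubectl -n kube-system logs -l app=flannel --tail=200 | grep -i route
```

Cross-referencing the pod IPs from step 1 with the route errors in step 2 narrows the problem to the node whose overlay routes were lost.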
Root Cause
Flannel CNI plugin was not re-establishing network routes after the node failure.
Fix/Workaround
• Restarted the Flannel pods on the affected nodes to re-establish network routes.
• Verified that communication between pods was restored.
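A minimal sketch of the restart, assuming Flannel runs as a DaemonSet labelled `app=flannel` in `kube-system`; the node name and pod IP below are placeholders, not values from the incident.

```shell
NODE="worker-2"   # hypothetical affected node

# Delete the Flannel pod on the affected node; the DaemonSet controller
# recreates it, which re-establishes the overlay network routes.
kubectl -n kube-system delete pod \
  -l app=flannel \
  --field-selector spec.nodeName="$NODE"

# Verify cross-node pod-to-pod communication, e.g. by pinging a pod IP
# on another node from inside a test pod (names/IPs are placeholders).
kubectl exec -it some-pod -- ping -c 3 10.244.2.5
```

Deleting the pod rather than the DaemonSet keeps the rollout scoped to the one broken node.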
Lessons Learned
Ensure that CNI plugins can gracefully handle node failures and re-establish connectivity.
How to Avoid
1. Implement automatic recovery or self-healing mechanisms for CNI plugins.
2. Monitor CNI plugin logs to detect issues early.
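The log-monitoring idea can be sketched as a simple scan that an alerting job might run over Flannel log output. This is a hypothetical check, and the sample line is illustrative only, not a real captured Flannel log message.

```shell
# Count log lines that look like route failures; a nonzero count could
# feed an alert. The regex is a rough heuristic, not an exhaustive match.
scan_flannel_logs() {
  grep -ciE 'fail[a-z]* to (add|replace) route|route .*(missing|not found)'
}

# Illustrative sample input (hypothetical log line).
sample='failed to add route to subnet 10.244.2.0/24 via flannel.1'
hits=$(printf '%s\n' "$sample" | scan_flannel_logs)
if [ "$hits" -gt 0 ]; then
  echo "route-error detected"
fi
```

In practice the input would come from `kubectl logs` on the Flannel DaemonSet rather than a fixed string, with the alert wired into whatever log pipeline the cluster already uses.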