Scenario #161
Networking
K8s v1.21, AWS EKS
Network Performance Degradation Due to Overloaded CNI Plugin
Network performance degraded due to the CNI plugin being overwhelmed by high traffic volume.
What Happened
A sudden spike in traffic caused the CNI plugin to become overloaded, resulting in significant packet loss and network latency between pods.
Diagnosis Steps
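1. Checked pod load with kubectl top pods and identified a few specific pods handling unusually high traffic. kubectl top reports only CPU and memory usage, so per-pod traffic volume has to be confirmed from network metrics such as the container_network_* counters exposed by cAdvisor.
2. Inspected the CNI plugin logs and found errors related to resource exhaustion.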
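Finding the pods with unusually high traffic comes down to comparing cumulative byte counters across a time interval. The following is a minimal sketch, assuming the counters have already been collected into plain dictionaries keyed by pod name (for instance from cAdvisor's container_network_receive_bytes_total and container_network_transmit_bytes_total). The function name top_talkers and the sample format are hypothetical and not part of any Kubernetes tool.

```python
from typing import Dict, List, Tuple

# Hypothetical sample format: pod name -> cumulative bytes sent + received.
Sample = Dict[str, int]


def top_talkers(before: Sample, after: Sample, interval_s: float,
                n: int = 3) -> List[Tuple[str, float]]:
    """Return the n pods with the highest byte rate between two samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = []
    for pod, end in after.items():
        start = before.get(pod)
        if start is None or end < start:
            # Pod appeared mid-interval or its counter reset (restart): skip it.
            continue
        rates.append((pod, (end - start) / interval_s))
    # Highest rate first.
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates[:n]
```

Calling top_talkers with two samples taken a known number of seconds apart yields (pod, bytes per second) pairs, which can then be matched against the CNI log timestamps.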
Root Cause
The CNI plugin lacked sufficient resources to handle the spike in traffic, leading to packet loss and network degradation.
Fix/Workaround
• Increased resource limits for the CNI plugin pods.
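• Used network policies to restrict which sources could reach the affected services. Network policies allow or deny traffic rather than throttle it, so this reduced the spike by cutting off non-essential senders.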
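Both fixes can be expressed as small Kubernetes objects. The sketch below builds them as Python dictionaries and serializes them to JSON, which kubectl accepts. It assumes the CNI runs as the aws-node DaemonSet in kube-system with a container also named aws-node, as is typical on EKS. The limit values, policy name, namespace, and labels shown in the usage note are hypothetical placeholders, and a NetworkPolicy only takes effect if the cluster runs a component that enforces it.

```python
import json


def cni_resource_patch(container: str, cpu_limit: str, memory_limit: str) -> dict:
    """Strategic-merge patch body that sets limits on one named container."""
    return {
        "spec": {"template": {"spec": {"containers": [
            {
                # Containers are merged by name, so other fields stay untouched.
                "name": container,
                "resources": {"limits": {"cpu": cpu_limit, "memory": memory_limit}},
            }
        ]}}}
    }


def allow_only_from(name: str, namespace: str,
                    target_labels: dict, allowed_labels: dict) -> dict:
    """NetworkPolicy admitting ingress to the target pods only from allowed pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {"matchLabels": allowed_labels}}]}],
        },
    }


def render(obj: dict) -> str:
    """Serialize a manifest or patch as JSON for use with kubectl."""
    return json.dumps(obj, indent=2)
```

For example, render(cni_resource_patch("aws-node", "500m", "512Mi")) produces a body that could be passed to kubectl patch daemonset aws-node -n kube-system -p, and render(allow_only_from("limit-checkout", "shop", {"app": "checkout"}, {"app": "frontend"})) produces a manifest for kubectl apply -f.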
Lessons Learned
Ensure that the CNI plugin is properly sized to handle peak traffic loads, and monitor its health regularly.
How to Avoid
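1. Set up traffic rate limiting, for example at the ingress controller or service mesh, to prevent sudden spikes from overwhelming the network.
2. Set resource requests and limits for critical CNI components with headroom for peak load. CNI agents such as aws-node on EKS run as a DaemonSet (one pod per node), which a HorizontalPodAutoscaler cannot target, so extra capacity has to come from larger per-pod resources or additional nodes.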
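Rate limiting of the kind described above is commonly built on a token bucket: requests spend tokens, tokens refill at a fixed rate, and a burst is capped by the bucket size. The sketch below illustrates the mechanism only. The class name and parameters are hypothetical, time is passed in explicitly to keep the example deterministic, and in a real cluster the limiting would be enforced by an ingress controller, service mesh, or similar component rather than by application code like this.

```python
class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilling `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request fits, else False."""
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With rate=2.0 and capacity=3.0, three back-to-back requests are admitted, a fourth is rejected, and one more is admitted after half a second of refill.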