Scenario #133
Networking
K8s v1.20, Google Cloud

Network Latency Spikes During Pod Autoscaling

Network latency spiked whenever pods were autoscaled during traffic surges.

What Happened

As the number of pods increased due to autoscaling, network latency between pods and services spiked, causing slow response times.

Diagnosis Steps
  1. Monitored pod-to-pod network latency using kubectl and found high latencies during autoscaling events.
  2. Investigated pod distribution and found that new pods were being scheduled on nodes with insufficient network capacity.
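
The pod-distribution check in step 2 can be sketched as a small script. This is a minimal illustration, not the tooling used in the incident: it assumes the pod list has been exported with `kubectl get pods -o json` and parsed into a dictionary, and the pod and node names in `sample` are hypothetical.

```python
from collections import Counter


def pods_per_node(pod_list):
    """Count scheduled pods per node from a parsed `kubectl get pods -o json` document."""
    counts = Counter()
    for pod in pod_list.get("items", []):
        node = pod.get("spec", {}).get("nodeName")
        if node:  # pods that are still Pending have no nodeName yet
            counts[node] += 1
    return counts


# Hypothetical sample in the shape kubectl emits (only the fields used here).
sample = {
    "items": [
        {"metadata": {"name": "web-1"}, "spec": {"nodeName": "node-a"}},
        {"metadata": {"name": "web-2"}, "spec": {"nodeName": "node-a"}},
        {"metadata": {"name": "web-3"}, "spec": {"nodeName": "node-b"}},
        {"metadata": {"name": "web-4"}, "spec": {}},  # not yet scheduled
    ]
}

# List nodes from most to least densely packed.
for node, count in pods_per_node(sample).most_common():
    print(f"{node}: {count} pods")
```

A node that accumulates most of the new replicas during a scale-up is a candidate for the kind of network saturation described here.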
Root Cause

Insufficient network capacity on newly provisioned nodes during autoscaling.

Fix/Workaround
• Adjusted the autoscaling configuration to ensure new pods are distributed across nodes with better network resources.
• Increased network capacity for nodes with higher pod density.
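
The scenario does not say how the configuration was adjusted. One possible approach, shown here only as an assumption, is a pod topology spread constraint that keeps replicas from piling onto a single node. The sketch builds the fragment as a Python dictionary and prints it as JSON; the `app: web` label is hypothetical. Note that this spreads pods evenly by count and does not make the scheduler aware of network bandwidth.

```python
import json


def spread_constraint(app_label, max_skew=1):
    """Build a topologySpreadConstraints entry that spreads matching pods across nodes."""
    return {
        "maxSkew": max_skew,  # max allowed difference in matching pod count between nodes
        "topologyKey": "kubernetes.io/hostname",  # treat each node as its own domain
        "whenUnsatisfiable": "ScheduleAnyway",  # prefer spreading, but never block scheduling
        "labelSelector": {"matchLabels": {"app": app_label}},
    }


# Fragment shaped like the pod template portion of a Deployment (hypothetical app label).
patch = {
    "spec": {
        "template": {
            "spec": {"topologySpreadConstraints": [spread_constraint("web")]}
        }
    }
}

print(json.dumps(patch, indent=2))
```

`ScheduleAnyway` is chosen so that a traffic surge never leaves pods unschedulable; `DoNotSchedule` enforces the skew strictly at the cost of possible Pending pods.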
Lessons Learned

Autoscaling decisions should account for network capacity, not only CPU and memory.

How to Avoid
  1. Use network resource metrics to guide autoscaling decisions.
  2. Continuously monitor and adjust network resources for autoscaling scenarios.
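
To illustrate how a network metric could drive scaling, the sketch below applies the replica calculation documented for the Horizontal Pod Autoscaler, `ceil(currentReplicas * currentValue / targetValue)`, with its default 10% tolerance. The metric itself (average per-pod throughput in Mbit/s) and the numbers are assumptions for the example; exposing such a metric to the autoscaler requires a custom metrics pipeline that is not covered here.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Replica count suggested by the HPA formula for a per-pod average metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling action
    return math.ceil(current_replicas * ratio)


# Hypothetical: 4 pods averaging 150 Mbit/s each against a 100 Mbit/s target.
print(desired_replicas(4, 150.0, 100.0))
```

Scaling on a network metric adds replicas before per-pod throughput saturates, but the new pods still need nodes with spare bandwidth, so this complements rather than replaces node-level capacity planning.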