Scenario #175
Networking
K8s v1.19, AWS EKS

Service Discovery Failures Due to CoreDNS Pod Crash

Service discovery failures occurred when CoreDNS pods crashed due to resource exhaustion, causing DNS resolution issues.

What Happened

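CoreDNS pods were overloaded by excessive DNS queries: CPU utilization spiked and the pods crashed repeatedly with out-of-memory (OOM) errors. With CoreDNS unavailable, service names could not be resolved, which broke service discovery and caused communication failures between services.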

Diagnosis Steps
  1. Checked pod logs and observed frequent crashes related to out-of-memory (OOM) errors.
  2. Monitored CoreDNS resource utilization and confirmed CPU spikes from DNS queries.
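
The first diagnosis step can be partly automated. The sketch below is a minimal example, not the procedure used in this incident: it scans a pod list (the JSON shape returned by the Kubernetes API) and reports containers whose last termination reason was OOMKilled. The function name and the sample data are hypothetical.

```python
def find_oom_killed(pod_list):
    """Return (pod, container, restart_count) for containers last killed by OOM."""
    findings = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            # lastState.terminated is only present if the container was restarted.
            terminated = status.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                findings.append(
                    (pod_name, status["name"], status.get("restartCount", 0))
                )
    return findings


# Hypothetical sample, trimmed to the fields the function reads.
sample = {
    "items": [
        {
            "metadata": {"name": "coredns-aaaa"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "coredns",
                        "restartCount": 7,
                        "lastState": {"terminated": {"reason": "OOMKilled"}},
                    }
                ]
            },
        },
        {
            "metadata": {"name": "coredns-bbbb"},
            "status": {
                "containerStatuses": [
                    {"name": "coredns", "restartCount": 0, "lastState": {}}
                ]
            },
        },
    ]
}

oom_findings = find_oom_killed(sample)
```

On a live cluster, the input could come from parsing the output of `kubectl get pods -n kube-system -l k8s-app=kube-dns -o json` with `json.load`, assuming CoreDNS carries the usual `k8s-app=kube-dns` label.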
Root Cause

Resource exhaustion in CoreDNS due to an overload of DNS queries.

Fix/Workaround
• Increased CPU and memory resources for CoreDNS pods.
• Optimized the DNS query patterns from applications to reduce the load.
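
The source does not say which query patterns were optimized. One common source of excess load, shown here only as an illustration, is search-path expansion: Kubernetes pods default to `ndots:5`, so a name with fewer than five dots is tried against each search domain before it is tried as written. The sketch below models that ordering as a simplified glibc-style resolver would apply it; the function name and the search list are assumptions.

```python
def candidate_names(name, search, ndots=5):
    """List the names a stub resolver may try, in order (worst case)."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no expansion.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as written first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then the name as written.
    return expanded + [absolute]


# Typical search list for a pod in the "default" namespace (assumed here).
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

external = candidate_names("api.example.com", search)
qualified = candidate_names("api.example.com.", search)
```

In this model an external name such as `api.example.com` can cost up to four lookups (often doubled by parallel A and AAAA queries), while the trailing-dot form costs one. Using fully qualified names, or lowering `ndots` in the pod's `dnsConfig`, are two ways applications can reduce that load.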
Lessons Learned

Ensure that DNS services like CoreDNS are properly resourced and monitored.

How to Avoid
  1. Set up monitoring for DNS query rates and resource utilization.
  2. Scale CoreDNS horizontally to distribute the load.
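
For horizontal scaling, a replica count can be derived from cluster size. The sketch below is modeled on the linear mode of the cluster-proportional-autoscaler, which takes the larger of a per-core and a per-node estimate. It is an approximation for sizing discussions, not that tool's code; the function name, the floor of two replicas, and the example numbers are assumptions.

```python
import math


def desired_replicas(cores, nodes, cores_per_replica, nodes_per_replica,
                     min_replicas=2, max_replicas=None):
    """Estimate a CoreDNS replica count from cluster size (linear scaling)."""
    by_cores = math.ceil(cores / cores_per_replica)
    by_nodes = math.ceil(nodes / nodes_per_replica)
    # Take whichever estimate is larger, but never drop below the floor.
    replicas = max(by_cores, by_nodes, min_replicas)
    if max_replicas is not None:
        replicas = min(replicas, max_replicas)
    return replicas


# Hypothetical cluster: 40 nodes with 64 cores in total.
replicas_for_example = desired_replicas(
    cores=64, nodes=40, cores_per_replica=256, nodes_per_replica=16
)
```

Cluster size is only a proxy for DNS load. If monitoring shows a query rate that is high relative to node count, the per-replica ratios would need to be lowered accordingly.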