Scenario #73
Cluster Management
K8s v1.21, AWS EKS

Service Discovery Issues Due to DNS Resolution Failures

Services could not discover each other due to DNS resolution failures, affecting internal communication.

What Happened

Pods were unable to resolve internal service names due to DNS failures, leading to broken inter-service communication.

Diagnosis Steps
  1. Checked DNS logs and found dnsmasq errors.
  2. Investigated CoreDNS logs and found insufficient resources allocated to the DNS pods.
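
A resolution probe run from inside an affected pod can confirm the symptom before digging into logs. The following is a minimal sketch, not the procedure used in this incident: the function name `probe_dns` and the service name in `DEFAULT_NAMES` are illustrative assumptions, and the list should be replaced with Services that exist in your cluster.

```python
import socket
from typing import Callable, Dict, Iterable, Optional

# Hypothetical list of names to check; replace with Services from your cluster.
DEFAULT_NAMES = ("kubernetes.default.svc.cluster.local",)


def probe_dns(
    names: Iterable[str],
    resolve: Callable[[str], object] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Try to resolve each name; map it to None on success or to the error text."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            resolve(name)
            results[name] = None
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in probe_dns(DEFAULT_NAMES).items():
        status = "ok" if error is None else f"FAILED ({error})"
        print(f"{name}: {status}")
```

The `resolve` parameter exists only so the probe can be exercised with a stub resolver outside a cluster.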
Root Cause

CoreDNS pods were running out of resources (CPU/memory), causing DNS resolution failures.

Fix/Workaround
• Increased resource limits for the CoreDNS pods.
• Restarted CoreDNS pods to apply the new resource settings.
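
The resource change can be expressed as a patch against the CoreDNS Deployment. The sketch below only builds the patch document. The container name `coredns` and all CPU and memory values are assumptions chosen for illustration, not the values applied in this incident.

```python
import json


def coredns_resource_patch(cpu_request, memory_request, cpu_limit, memory_limit,
                           container="coredns"):
    """Build a patch that sets requests and limits on one container.

    The container name defaults to "coredns", which is an assumption;
    check the name in your own Deployment before applying anything.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "resources": {
                                "requests": {"cpu": cpu_request, "memory": memory_request},
                                "limits": {"cpu": cpu_limit, "memory": memory_limit},
                            },
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # Illustrative values only; size them from observed usage.
    print(json.dumps(coredns_resource_patch("200m", "128Mi", "500m", "256Mi")))
```

The printed JSON could then be passed to `kubectl patch deployment` with the `--patch` flag, assuming CoreDNS runs as a Deployment in `kube-system`. Changing a Deployment's pod template triggers a rollout, so the pods are replaced with the new settings.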
Lessons Learned

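Size CoreDNS CPU and memory for the cluster's actual DNS query load. When the DNS pods are under-provisioned, name resolution fails and service discovery breaks for every workload that depends on it.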

How to Avoid
  1. Monitor CoreDNS pod resource usage.
  2. Allocate adequate resources based on cluster size and workload.
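
One way to act on the monitoring step is to compare observed usage against the configured limits and flag anything close to the ceiling. The following is a rough sketch under stated assumptions: the helper names, the 80% threshold, and the sample figures are hypothetical, and the parsers cover only the common Kubernetes quantity suffixes (`n`, `m` for CPU; `Ki`, `Mi`, `Gi`, `k`, `M`, `G` for memory).

```python
from typing import Dict

# Multipliers for common memory suffixes (not the full Kubernetes quantity grammar).
_MEMORY_FACTORS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' or '250000000n' to cores."""
    if quantity.endswith("n"):
        return float(quantity[:-1]) / 1e9
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> float:
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)


def over_threshold(usage: Dict[str, str], limits: Dict[str, str],
                   threshold: float = 0.8) -> Dict[str, float]:
    """Return the resources whose usage/limit ratio meets or exceeds the threshold."""
    ratios = {
        "cpu": parse_cpu(usage["cpu"]) / parse_cpu(limits["cpu"]),
        "memory": parse_memory(usage["memory"]) / parse_memory(limits["memory"]),
    }
    return {name: round(ratio, 2) for name, ratio in ratios.items() if ratio >= threshold}


if __name__ == "__main__":
    # Hypothetical sample: usage as a metrics tool might report it, limits from the pod spec.
    print(over_threshold({"cpu": "450m", "memory": "100Mi"},
                         {"cpu": "500m", "memory": "256Mi"}))
```

In practice the usage figures would come from whatever metrics source the cluster already has; the check itself is just the ratio comparison.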