Scenario #73
Cluster Management
K8s v1.21, AWS EKS
Service Discovery Issues Due to DNS Resolution Failures
Services could not discover each other due to DNS resolution failures, affecting internal communication.
What Happened
Pods were unable to resolve internal service names due to DNS failures, leading to broken inter-service communication.
Diagnosis Steps
1. Checked DNS logs and found dnsmasq errors.
2. Investigated the CoreDNS pods and their logs and found that the DNS pods had insufficient resources allocated.
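One way to back up the second step is to look for CoreDNS containers that were killed for exceeding their memory limit. The sketch below is illustrative only: it assumes memory exhaustion shows up as `OOMKilled` terminations (the scenario does not say this explicitly), and the pod names in `sample` are hypothetical. CPU starvation would not appear here, since it causes throttling rather than container kills.

```python
def find_oom_killed(pod_list):
    """Return (pod, container, restart_count) for every container whose
    last termination reason was OOMKilled."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            terminated = cs.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod_name, cs["name"], cs.get("restartCount", 0)))
    return hits


# Hypothetical sample shaped like `kubectl get pods -o json` output.
sample = {
    "items": [
        {
            "metadata": {"name": "coredns-aaaa"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "coredns",
                        "restartCount": 4,
                        "lastState": {"terminated": {"reason": "OOMKilled"}},
                    }
                ]
            },
        },
        {
            "metadata": {"name": "coredns-bbbb"},
            "status": {
                "containerStatuses": [
                    {"name": "coredns", "restartCount": 0, "lastState": {}}
                ]
            },
        },
    ]
}

print(find_oom_killed(sample))
```

To run it against a live cluster, the output of `kubectl get pods -n kube-system -l k8s-app=kube-dns -o json` could be loaded with `json.load` and passed in place of `sample`.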
Root Cause
CoreDNS pods were running out of resources (CPU/memory), causing DNS resolution failures.
Fix/Workaround
• Increased resource limits for the CoreDNS pods.
• Restarted CoreDNS pods to apply the new resource settings.
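The resource change can be expressed as a patch against the CoreDNS Deployment. The following is a minimal sketch, assuming the Deployment and its container are both named `coredns` in the `kube-system` namespace; the helper name and the resource values are placeholders, not the values used in this incident.

```python
import json


def build_coredns_resource_patch(cpu_request, memory_request, memory_limit,
                                 cpu_limit=None):
    """Build a strategic merge patch that sets resources on the container
    named "coredns" (assumed name) in the Deployment's pod template."""
    limits = {"memory": memory_limit}
    if cpu_limit is not None:
        limits["cpu"] = cpu_limit
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "coredns",
                            "resources": {
                                "requests": {
                                    "cpu": cpu_request,
                                    "memory": memory_request,
                                },
                                "limits": limits,
                            },
                        }
                    ]
                }
            }
        }
    }


# Placeholder values; size them from observed usage, not from this example.
patch = build_coredns_resource_patch("200m", "128Mi", "256Mi")
print("kubectl -n kube-system patch deployment coredns --patch '"
      + json.dumps(patch) + "'")
```

Because the patch changes the pod template, the Deployment replaces its pods through a rolling update, which is how the new settings take effect.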
Lessons Learned
Ensure that CoreDNS has enough resources to handle DNS requests efficiently.
How to Avoid
1. Monitor CoreDNS pod CPU and memory usage against the configured requests and limits.
2. Allocate adequate resources based on cluster size (number of pods and services) and DNS query load.
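Both points can be roughed out in a few lines. The sketch below flags pods whose memory usage is close to an assumed limit, using text shaped like `kubectl top pod --no-headers` output, and estimates a starting memory figure from cluster size. The formula `(pods + services) / 1000 + 54` MB is assumed here from the CoreDNS project's scaling guidance for default settings and should be verified before it is relied on; the pod names, the `170Mi` limit, and the 0.8 threshold are all hypothetical.

```python
# Conversion factors from Kubernetes binary suffixes to MiB.
UNITS_MIB = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}


def parse_memory_mib(quantity):
    """Convert a quantity such as "150Mi" or "1Gi" to MiB."""
    for suffix, factor in UNITS_MIB.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def pods_near_memory_limit(top_output, limit, threshold=0.8):
    """Return (pod, usage/limit ratio) for pods at or above the threshold.

    Expects lines of the form "<name> <cpu> <memory>".
    """
    limit_mib = parse_memory_mib(limit)
    flagged = []
    for line in top_output.strip().splitlines():
        name, _cpu, memory = line.split()
        ratio = parse_memory_mib(memory) / limit_mib
        if ratio >= threshold:
            flagged.append((name, round(ratio, 2)))
    return flagged


def estimate_coredns_memory_mb(pods, services):
    """Rule-of-thumb memory estimate in MB (assumed formula, see lead-in)."""
    return (pods + services) / 1000 + 54


# Hypothetical usage sample.
sample_top = """
coredns-aaaa   3m   150Mi
coredns-bbbb   2m   60Mi
"""

print(pods_near_memory_limit(sample_top, "170Mi"))
print(estimate_coredns_memory_mb(5000, 1000))
```

The estimate is only a starting point; observed usage under real query load should drive the final requests and limits.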