Back to all scenarios
Scenario #105
Networking
K8s v1.19, Azure AKS

Service Discovery Failure Due to DNS Pod Resource Limits

Service discovery failed across the cluster due to DNS pod resource limits being exceeded.

Find this helpful?
What Happened

The DNS service was unable to resolve names due to resource limits being hit on the CoreDNS pods, causing failures in service discovery.

Diagnosis Steps
  • 1Checked CoreDNS pod resource usage and logs, revealing that the memory limit was being exceeded.
  • 2Found that DNS requests were timing out, and pods were unable to discover services.
Root Cause

CoreDNS pods hit resource limits, leading to failures in service resolution.

Fix/Workaround
• Increased memory and CPU limits for CoreDNS pods.
• Restarted CoreDNS pods and verified that DNS resolution was restored.
Lessons Learned

Service discovery requires sufficient resources to avoid failure.

How to Avoid
  • 1Regularly monitor CoreDNS metrics and adjust resource limits accordingly.
  • 2Scale CoreDNS replicas based on cluster size and traffic.