Service Discovery Issues Due to DNS Cache Staleness

Service discovery failed due to stale DNS cache entries that were not updated when services changed IPs.

Find this helpful?

What Happened

The DNS resolver cached the old IP addresses for services, causing service discovery failures when the IPs of the services changed.

Diagnosis Steps

1Used kubectl exec to verify DNS cache entries.
2Observed that the cached IPs were outdated and did not reflect the current service IPs.

Root Cause

The DNS cache was not being properly refreshed, causing stale DNS entries.

Fix/Workaround

• Cleared the DNS cache manually and implemented shorter TTL (Time-To-Live) values for DNS records.
• Restarted CoreDNS pods to apply changes.

Lessons Learned

Ensure that DNS TTL values are appropriately set to avoid stale cache issues.

How to Avoid

1Regularly monitor DNS cache and refresh TTL values to ensure up-to-date resolution.
2Implement a caching strategy that works well with Kubernetes service discovery.