Back to all scenarios
Scenario #155
Networking
K8s v1.21, GCP GKE

Intermittent Service Disruptions Due to DNS Caching Issue

Intermittent service disruptions occurred due to stale DNS cache in CoreDNS.

Find this helpful?
What Happened

Services failed intermittently because CoreDNS had cached stale DNS records, causing them to resolve incorrectly.

Diagnosis Steps
  • 1Verified DNS resolution using nslookup and found incorrect IP addresses being returned.
  • 2Cleared the DNS cache in CoreDNS and noticed that the issue was temporarily resolved.
Root Cause

CoreDNS was caching stale DNS records due to incorrect TTL settings.

Fix/Workaround
• Reduced the TTL value in CoreDNS configuration.
• Restarted CoreDNS pods to apply the new TTL setting.
Lessons Learned

Be cautious of DNS TTL settings, especially in dynamic environments where IP addresses change frequently.

How to Avoid
  • 1Monitor DNS records and TTL values actively.
  • 2Implement cache invalidation or reduce TTL for critical services.