Back to all scenarios
Scenario #155
Networking
K8s v1.21, GCP GKE
Intermittent Service Disruptions Due to DNS Caching Issue
Intermittent service disruptions occurred due to stale DNS cache in CoreDNS.
Find this helpful?
What Happened
Services failed intermittently because CoreDNS had cached stale DNS records, causing them to resolve incorrectly.
Diagnosis Steps
- 1Verified DNS resolution using nslookup and found incorrect IP addresses being returned.
- 2Cleared the DNS cache in CoreDNS and noticed that the issue was temporarily resolved.
Root Cause
CoreDNS was caching stale DNS records due to incorrect TTL settings.
Fix/Workaround
• Reduced the TTL value in CoreDNS configuration.
• Restarted CoreDNS pods to apply the new TTL setting.
Lessons Learned
Be cautious of DNS TTL settings, especially in dynamic environments where IP addresses change frequently.
How to Avoid
- 1Monitor DNS records and TTL values actively.
- 2Implement cache invalidation or reduce TTL for critical services.