Scenario #8
Cluster Management
K8s v1.23, Azure AKS

API Server High Latency Due to Event Flooding

A custom controller spamming Kubernetes events slowed down the entire API server.

What Happened

A custom controller emitted Kubernetes events at ~50/second, flooding the etcd event store and driving up API server latency.
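For illustration, here is a minimal Go sketch of the anti-pattern, assuming a hypothetical Controller type, recorder field, and SyncFailed reason (none of these are the actual controller's code):

```go
package controller

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
)

// Controller stands in for the misbehaving custom controller.
type Controller struct {
	recorder record.EventRecorder
}

// reconcile emits a Warning event on every failed pass. The timestamp baked
// into the message makes each event unique, which also defeats client-go's
// built-in event aggregation, so nothing collapses the flood.
func (c *Controller) reconcile(obj runtime.Object, err error) {
	if err != nil {
		// In a hot requeue loop (~50 reconciles/second) this line becomes
		// ~50 Event writes/second against the API server and etcd.
		c.recorder.Eventf(obj, v1.EventTypeWarning, "SyncFailed",
			"sync failed at %s: %v", time.Now().Format(time.RFC3339Nano), err)
	}
}
```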

Diagnosis Steps
  1. Prometheus showed a spike in event count.
  2. kubectl get events --sort-by=.metadata.creationTimestamp showed massive spam (a programmatic version of this check is sketched below).
  3. Found a misbehaving controller repeating the same failure events.
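As a complement to the kubectl check, a small client-go program can tally events by emitting component and reason to pinpoint the spammer. This is an illustrative sketch; the kubeconfig loading and ranking logic are assumptions, not part of the original diagnosis:

```go
package main

import (
	"context"
	"fmt"
	"sort"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// List events across all namespaces and tally them by source + reason.
	events, err := clientset.CoreV1().Events("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	counts := map[string]int32{}
	for _, e := range events.Items {
		counts[e.Source.Component+"/"+e.Reason] += e.Count
	}

	// Rank noisiest sources first.
	type tally struct {
		key   string
		count int32
	}
	ranked := make([]tally, 0, len(counts))
	for k, n := range counts {
		ranked = append(ranked, tally{k, n})
	}
	sort.Slice(ranked, func(i, j int) bool { return ranked[i].count > ranked[j].count })
	for _, t := range ranked {
		fmt.Printf("%8d  %s\n", t.count, t.key)
	}
}
```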
Root Cause

No rate limiting on event creation in controller logic.

Fix/Workaround
• Patched the controller to rate-limit record.Eventf calls (see the sketch below).
• Cleaned up the accumulated old events.
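A minimal sketch of such a patch, assuming a token-bucket wrapper around the recorder; the rateLimitedRecorder type and the 1 event/second budget are illustrative choices, not the actual fix:

```go
package controller

import (
	"golang.org/x/time/rate"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
)

// rateLimitedRecorder drops events once its token bucket is empty. Only
// Eventf is wrapped here; Event and AnnotatedEventf pass through unchanged.
type rateLimitedRecorder struct {
	record.EventRecorder
	limiter *rate.Limiter
}

func (r *rateLimitedRecorder) Eventf(object runtime.Object, eventtype, reason, messageFmt string, args ...interface{}) {
	// Allow is non-blocking: when the budget is exhausted, the event is
	// silently dropped instead of becoming another API write.
	if r.limiter.Allow() {
		r.EventRecorder.Eventf(object, eventtype, reason, messageFmt, args...)
	}
}

// NewRateLimitedRecorder caps emission at ~1 event/second with a burst of 5,
// versus the ~50/second the misbehaving controller was producing.
func NewRateLimitedRecorder(inner record.EventRecorder) record.EventRecorder {
	return &rateLimitedRecorder{
		EventRecorder: inner,
		limiter:       rate.NewLimiter(rate.Limit(1), 5),
	}
}
```

Dropping surplus events is acceptable here because events are best-effort diagnostics, not state the controller depends on.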
Lessons Learned

Events are not free: each one is an API object persisted in etcd, so high event volume degrades both etcd and the API server.

How to Avoid
  1. Use deduplicated or summarized event logic (see the correlator sketch below).
  2. Set the API server's --event-ttl=1h and enable the EventRateLimit admission plugin (configured via --admission-control-config-file).
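For the deduplication side, client-go's event broadcaster already ships a correlator that aggregates similar events and filters spam before anything reaches the API server. A sketch of tightening it; the specific values and the my-controller component name are assumptions:

```go
package controller

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/tools/record"
)

// NewDedupedRecorder builds a recorder whose broadcaster aggregates similar
// events and rate-limits per-object spam at the source.
func NewDedupedRecorder() (record.EventBroadcaster, record.EventRecorder) {
	broadcaster := record.NewBroadcasterWithCorrelatorOptions(record.CorrelatorOptions{
		QPS:                  1.0 / 60.0, // refill ~1 token/minute per key once the burst is spent
		BurstSize:            10,         // tolerate short bursts before the spam filter engages
		MaxEvents:            5,          // after 5 similar events, aggregate into one summary event
		MaxIntervalInSeconds: 600,
	})
	// Wire the broadcaster to the API server separately, e.g. via
	// broadcaster.StartRecordingToSink(...) once a clientset is available.
	recorder := broadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "my-controller"})
	return broadcaster, recorder
}
```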