Scenario #8
Cluster Management
K8s v1.23, Azure AKS
API Server High Latency Due to Event Flooding
An app spamming Kubernetes events slowed down the entire API server.
What Happened
A custom controller emitted Kubernetes events at roughly 50 per second. Every event is persisted through the API server into etcd, so the flood saturated the event store and drove up API server latency cluster-wide.
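In controller terms, the anti-pattern looks roughly like this (a hypothetical sketch, not the incident's actual code; Controller, doSync, and the event reason are illustrative):

```go
package controller

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
)

// Controller is a minimal stand-in for the misbehaving custom controller.
type Controller struct {
	recorder record.EventRecorder
	doSync   func(runtime.Object) error
}

// syncHandler emits a fresh Warning event on every failed sync. With no
// backoff or deduplication, a tight retry loop turns into a steady stream
// of event writes against the API server (~50/s in this incident).
func (c *Controller) syncHandler(obj runtime.Object) error {
	if err := c.doSync(obj); err != nil {
		// No rate limiting or deduplication here -- this fires on every retry.
		c.recorder.Eventf(obj, corev1.EventTypeWarning, "SyncFailed", "sync failed: %v", err)
		return err
	}
	return nil
}
```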
Diagnosis Steps
1. Prometheus showed a sharp spike in event object counts on the API server.
2. kubectl get events --sort-by=.metadata.creationTimestamp showed massive spam from a single component.
3. Traced the spam to a misbehaving controller emitting the same failure event in a tight loop (a client-go sketch for tallying emitters follows this list).
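Eyeballing kubectl get events gets slow on a busy cluster; a minimal client-go sketch like the one below (assuming a standard kubeconfig; everything here is illustrative, not the tooling used in the incident) tallies events by reporting component so the spammer stands out:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// List events across all namespaces and tally by reporting component;
	// the misbehaving controller shows up as an outlier.
	events, err := clientset.CoreV1().Events("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	counts := map[string]int{}
	for _, e := range events.Items {
		counts[e.Source.Component]++
	}
	for component, n := range counts {
		fmt.Printf("%-40s %d\n", component, n)
	}
}
```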
Root Cause
No rate limiting on event creation in controller logic.
Fix/Workaround
• Patched the controller to rate-limit its record.Eventf calls (a sketch of the approach follows below).
• Cleaned out the backlog of stale events (e.g., kubectl delete events --all -n <namespace>).
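The scenario doesn't include the actual patch; a minimal sketch of the idea is to wrap the controller's record.EventRecorder in a token bucket from golang.org/x/time/rate so excess events are dropped client-side (the constructor name and the limits are illustrative):

```go
package events

import (
	"golang.org/x/time/rate"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
)

// rateLimitedRecorder drops events once its token bucket is exhausted, so a
// tight failure loop can no longer flood the API server with event writes.
type rateLimitedRecorder struct {
	record.EventRecorder // Event/AnnotatedEventf pass through; a real patch wraps them too.
	limiter *rate.Limiter
}

// NewRateLimitedRecorder allows roughly eventsPerSecond sustained, with the
// given burst headroom (e.g., 1 event/s with a burst of 5).
func NewRateLimitedRecorder(inner record.EventRecorder, eventsPerSecond float64, burst int) record.EventRecorder {
	return &rateLimitedRecorder{
		EventRecorder: inner,
		limiter:       rate.NewLimiter(rate.Limit(eventsPerSecond), burst),
	}
}

func (r *rateLimitedRecorder) Eventf(object runtime.Object, eventtype, reason, messageFmt string, args ...interface{}) {
	if !r.limiter.Allow() {
		return // over budget: drop the event instead of hitting the API server
	}
	r.EventRecorder.Eventf(object, eventtype, reason, messageFmt, args...)
}
```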
Lessons Learned
Events are not free: each one is an API object persisted in etcd and served by the API server, so unbounded event creation degrades both.
How to Avoid
1. Use deduplicated or summarized event logic in controllers; client-go's event correlator can do this for you (see the sketch after this list).
2. Keep the API server's --event-ttl short (1h is the default) and enable the EventRateLimit admission plugin (--enable-admission-plugins=EventRateLimit plus an admission configuration file; there is no --eventRateLimit flag). On a managed control plane such as AKS, these flags are set by the provider rather than by users.
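On the first point: client-go's event broadcaster already ships a correlator that aggregates similar events and spam-filters noisy sources before anything reaches the API server. A minimal sketch of wiring it up (the option values are illustrative, not recommendations):

```go
package events

import (
	typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
	"k8s.io/client-go/tools/record"
)

// newCorrelatedBroadcaster builds an event broadcaster whose built-in
// correlator deduplicates and rate-limits events client-side.
func newCorrelatedBroadcaster(sink typedcorev1.EventInterface) record.EventBroadcaster {
	b := record.NewBroadcasterWithCorrelatorOptions(record.CorrelatorOptions{
		// Spam filter: a token bucket per event key (source + object).
		QPS:       1.0 / 300, // sustained: one event per key every 5 minutes
		BurstSize: 25,        // initial burst allowed before throttling
		// Aggregation: collapse similar events into one after 10 occurrences
		// inside a 10-minute window.
		MaxEvents:            10,
		MaxIntervalInSeconds: 600,
	})
	b.StartRecordingToSink(&typedcorev1.EventSinkImpl{Interface: sink})
	return b
}
```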