Scenario #33
Cluster Management
K8s v1.23, OpenShift
API Server Slowdowns from High Watch Connection Count
API latency rose sharply due to thousands of watch connections from misbehaving clients.
What Happened
Multiple pods opened persistent watch connections and never closed them, overloading the API server.
Diagnosis Steps
1. Monitored the API server's /metrics endpoint for the apiserver_registered_watchers gauge (see the sketch after this list).
2. Identified the top offenders by connection source IP.
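The watcher gauge can be read straight from the API server's /metrics endpoint. A minimal sketch, assuming client-go and a local kubeconfig; the kubeconfig path and the string filtering are illustrative:

```go
// Dump apiserver_registered_watchers lines from the API server's /metrics
// endpoint to see which resource kinds carry the most watchers.
package main

import (
	"context"
	"fmt"
	"strings"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (adjust for your environment).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// GET /metrics and keep only the registered-watchers gauge lines.
	raw, err := clientset.CoreV1().RESTClient().
		Get().AbsPath("/metrics").DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}
	for _, line := range strings.Split(string(raw), "\n") {
		if strings.HasPrefix(line, "apiserver_registered_watchers") {
			fmt.Println(line)
		}
	}
}
```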
Root Cause
A custom controller with flawed watch logic kept opening new watch connections and never closed the old ones.
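A hypothetical reconstruction of that logic, assuming a client-go based controller (the function name and structure are assumptions, not the actual code):

```go
// Hypothetical leak pattern: a new raw watch is opened on every pass, only
// partially consumed, and never stopped, so each iteration leaves another
// long-lived connection open on the API server.
package controller

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func leakyLoop(ctx context.Context, clientset kubernetes.Interface) {
	for {
		w, err := clientset.CoreV1().Pods(metav1.NamespaceAll).
			Watch(ctx, metav1.ListOptions{})
		if err != nil {
			time.Sleep(time.Second)
			continue
		}
		// Reads a single event and moves on. w.Stop() is never called and the
		// channel is never drained, so the connection stays open server-side.
		<-w.ResultChan()
	}
}
```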
Fix/Workaround
• Restarted offending pods.
• Updated the controller to reuse watches (see the sketch below).
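One way to implement the watch reuse, assuming the controller is built on client-go, is to move it onto a shared informer so a single watch per resource type serves every handler; the function and handler names below are illustrative:

```go
// Shared-informer sketch: the factory maintains one watch per resource type
// and fans events out to all registered handlers, instead of each caller
// opening its own watch connection.
package controller

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func runWithSharedInformer(clientset kubernetes.Interface, stopCh <-chan struct{}) {
	// One factory (and therefore one watch per resource) shared by every handler.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Minute)

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { /* handle add */ },
		UpdateFunc: func(oldObj, newObj interface{}) { /* handle update */ },
		DeleteFunc: func(obj interface{}) { /* handle delete */ },
	})

	// Start all informers and wait for their caches to sync; the underlying
	// watch is closed cleanly when stopCh is closed.
	factory.Start(stopCh)
	cache.WaitForCacheSync(stopCh, podInformer.HasSynced)
}
```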
Lessons Learned
Unbounded watch connections can exhaust API server resources.
How to Avoid
1. Use client-go with resync periods and connection limits (see the sketch after this list).
2. Enable metrics to detect watch leaks early.
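A rough sketch of item 1, assuming client-go: cap the client's request rate and give shared informers a resync period so one misbehaving controller cannot flood the API server. The QPS, burst, and resync values are illustrative, not recommendations.

```go
// Client-side guardrails: request-rate limits plus a resync period on a
// shared informer factory.
package controller

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func newBoundedFactory() (informers.SharedInformerFactory, error) {
	// In-cluster config; swap for clientcmd loading outside the cluster.
	config, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}

	// Client-side throttling: limit sustained and burst request rates.
	config.QPS = 20
	config.Burst = 40

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		return nil, err
	}

	// The resync period periodically replays cached state to event handlers,
	// so controllers reconcile drift without opening extra watches.
	return informers.NewSharedInformerFactory(clientset, 10*time.Minute), nil
}
```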