Scenario #33
Cluster Management
K8s v1.23, OpenShift

API Server Slowdowns from High Watch Connection Count

API latency rose sharply due to thousands of watch connections from misbehaving clients.

What Happened

Multiple pods opened persistent watch connections and never closed them, overloading the API server.

Diagnosis Steps
1. Monitored the API server's /metrics endpoint for the apiserver_registered_watchers gauge (see the sketch after this list).
2. Identified the top offending clients by their watch connection source IPs.
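
A minimal Go sketch of the check in step 1, assuming a local kubeconfig with permission to read the API server's /metrics endpoint; it fetches the raw Prometheus exposition text and prints only the apiserver_registered_watchers lines. The exact way the team inspected the metric isn't stated in this scenario.

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (assumes read access to /metrics).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Fetch the API server's raw Prometheus metrics text.
	raw, err := clientset.CoreV1().RESTClient().Get().AbsPath("/metrics").DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}

	// Keep only the registered-watchers gauge, which is labelled by resource kind.
	for _, line := range strings.Split(string(raw), "\n") {
		if strings.Contains(line, "apiserver_registered_watchers") {
			fmt.Println(line)
		}
	}
}
```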
Root Cause

A custom controller with faulty watch logic opened new watch connections and never closed them.
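
The controller's actual code isn't shown in this scenario, so the following is only an illustrative Go sketch of the kind of leak described: a hypothetical helper (watchPodOnce) that opens a fresh watch per work item and returns without calling Stop(), leaving the streaming connection open.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// watchPodOnce reproduces the leak pattern: every call opens a brand-new watch
// and returns without watcher.Stop(), so each call leaves one more open
// streaming connection on the API server.
func watchPodOnce(clientset kubernetes.Interface, namespace, name string) {
	watcher, err := clientset.CoreV1().Pods(namespace).Watch(context.TODO(), metav1.ListOptions{
		FieldSelector: "metadata.name=" + name,
	})
	if err != nil {
		fmt.Println("watch failed:", err)
		return
	}
	// The result channel is abandoned and Stop() is never called.
	_ = watcher.ResultChan()
}

func main() {
	cfg, err := rest.InClusterConfig() // assumes the code runs inside a pod
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Each iteration leaks one watch connection; with many work items this is
	// how connection counts climb into the thousands.
	for i := 0; i < 3; i++ {
		watchPodOnce(clientset, "default", "example-pod")
	}
}
```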

Fix/Workaround
• Restarted the offending pods to release the leaked connections.
• Updated the controller to reuse watches instead of opening new ones (see the sketch below).
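
A hedged client-go sketch of the "reuse watches" fix, assuming the controller can be rebuilt around a SharedInformerFactory: all event handlers share a single list+watch per resource type, and the resync period re-delivers cached objects without opening extra connections. The 10-minute resync and handler body are illustrative, not taken from the scenario.

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig() // assumes the controller runs in-cluster
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// One shared factory multiplexes a single list+watch per resource type
	// across every handler, instead of each caller opening its own watch.
	factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute)

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if pod, ok := obj.(*corev1.Pod); ok {
				fmt.Println("pod added:", pod.Name)
			}
		},
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)            // starts the shared list+watch loops
	factory.WaitForCacheSync(stopCh) // wait for the initial list to complete

	<-stopCh // block; a real controller would close stopCh on shutdown signals
}
```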
Lessons Learned

Unbounded watch connections can exhaust API server resources.

How to Avoid
1. Use client-go with resync periods and connection limits (see the sketch after this list).
2. Enable API server metrics to detect watch leaks early.
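
client-go has no single "maximum watch connections" knob, so one hedged reading of item 1 is to combine shared informers (as in the fix above) with client-side rate limits on rest.Config; the QPS and Burst values below are illustrative, not recommendations from this scenario.

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}

	// Client-side throttling: cap how fast this controller can hit the API server.
	cfg.QPS = 20
	cfg.Burst = 40

	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Shared informers with a periodic resync re-deliver cached objects to
	// handlers without opening additional watch connections.
	factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute)
	_ = factory // register informers and handlers here, then factory.Start(...)
}
```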