Back to all scenarios
Scenario #419
Scaling & Load
Kubernetes v1.21, bare metal
Overuse of Liveness Probes Disrupted Load Balance
Misfiring liveness probes killed healthy pods during load test.
Find this helpful?
What Happened
Sudden scale-out introduced new pods, which were killed due to false negatives on liveness probes.
Diagnosis Steps
- 1Pod logs showed probe failures under high CPU.
- 2Readiness was OK, liveness killed them anyway.
Root Cause
CPU starvation during load caused probe timeouts.
Fix/Workaround
• Increased probe timeoutSeconds and failureThreshold.
Lessons Learned
Under load, even health checks need headroom.
How to Avoid
- 1Separate readiness from liveness logic.
- 2Gracefully handle CPU-heavy workloads.