Back to all scenarios
Scenario #419
Scaling & Load
Kubernetes v1.21, bare metal

Overuse of Liveness Probes Disrupted Load Balance

Misfiring liveness probes killed healthy pods during load test.

Find this helpful?
What Happened

Sudden scale-out introduced new pods, which were killed due to false negatives on liveness probes.

Diagnosis Steps
  • 1Pod logs showed probe failures under high CPU.
  • 2Readiness was OK, liveness killed them anyway.
Root Cause

CPU starvation during load caused probe timeouts.

Fix/Workaround
• Increased probe timeoutSeconds and failureThreshold.
Lessons Learned

Under load, even health checks need headroom.

How to Avoid
  • 1Separate readiness from liveness logic.
  • 2Gracefully handle CPU-heavy workloads.