Scenario #26
Cluster Management
K8s v1.22, managed AKS

Taints and Tolerations Mismatch Prevented Workload Scheduling

Workloads failed to schedule on new nodes that had a taint the workloads didn’t tolerate.

What Happened

The platform team added a new node pool whose nodes carried the taint node-role.kubernetes.io/gpu:NoSchedule, but forgot to add matching tolerations to the GPU workloads, so the scheduler refused to place them on the new nodes.
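For context, a taint like this is applied either at pool creation (az aks nodepool add supports a --node-taints flag) or manually with kubectl; a minimal sketch of the manual form, with a hypothetical node name:

```bash
# Apply the NoSchedule taint to a node (node name is hypothetical).
kubectl taint nodes aks-gpupool-12345678-vmss000000 \
  node-role.kubernetes.io/gpu=:NoSchedule

# Confirm the taint is present.
kubectl describe node aks-gpupool-12345678-vmss000000 | grep Taints
```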

Diagnosis Steps
  1. kubectl describe pod showed the reason: "0/3 nodes are available: node(s) had taints."
  2. Checked node taints via kubectl get nodes -o json (a more targeted query is sketched below).
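The full JSON dump is noisy; a custom-columns query surfaces just the taints per node (this exact invocation is my addition, not from the original diagnosis):

```bash
# One line per node, listing the keys and effects of its taints.
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key,EFFECTS:.spec.taints[*].effect'
```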
Root Cause

Taints on the new node pool weren't matched by any tolerations in the GPU pods, so the scheduler filtered those nodes out.

Fix/Workaround
• Added the proper toleration to the GPU workloads:

```yaml
tolerations:
- key: "node-role.kubernetes.io/gpu"
  operator: "Exists"
  effect: "NoSchedule"
```
Lessons Learned

Node taints must be rolled out together with the tolerations and scheduling policies that match them; a taint added in isolation silently blocks workloads from the new nodes.

How to Avoid
  1. Use preset toleration templates in CI/CD pipelines.
  2. Test new node pools with dummy workloads (a probe pod is sketched below).
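A minimal probe for such a test could be a pause pod that tolerates the pool's taint and targets its nodes (the pod name and agentpool label are assumptions):

```yaml
# Throwaway pod to verify that a new tainted pool accepts tolerated workloads.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pool-probe            # hypothetical name
spec:
  nodeSelector:
    agentpool: gpupool            # hypothetical label on the new pool's nodes
  tolerations:
  - key: "node-role.kubernetes.io/gpu"
    operator: "Exists"
    effect: "NoSchedule"
  containers:
  - name: probe
    image: registry.k8s.io/pause:3.9   # tiny no-op container
```

If the probe stays Pending, kubectl describe pod gpu-pool-probe shows which taint or selector is blocking it.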