Scenario #110
Cluster Management
K8s v1.21, AWS EKS

Cluster Autoscaler Fails to Scale Nodes Due to Incorrect IAM Role Permissions

The cluster autoscaler failed to scale the number of nodes in response to resource shortages due to missing IAM role permissions for managing EC2 instances.

What Happened

The cluster autoscaler detected unschedulable pods and tried to add nodes to the cluster, but its IAM role lacked the permissions needed to call the EC2 and Auto Scaling APIs, so no new instances were provisioned. Pods remained unscheduled for lack of capacity.

Diagnosis Steps
  1. Ran kubectl describe pod and noted that pods were stuck in the Pending state due to insufficient cluster capacity.
  2. Reviewed the IAM role used by the cluster autoscaler and found that the permissions required to manage EC2 instances were missing.
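The diagnosis steps above can be sketched as the following commands. The deployment name (`cluster-autoscaler`) and role name (`eks-cluster-autoscaler-role`) are assumptions; substitute your own.

```shell
# Step 1: pods stuck in Pending show scheduling failures in their Events
kubectl describe pod <pod-name> | grep -A 5 Events

# Autoscaler logs typically contain AccessDenied / UnauthorizedOperation
# errors when IAM permissions are missing (deployment name assumed)
kubectl -n kube-system logs deployment/cluster-autoscaler | grep -iE "accessdenied|unauthorized"

# Step 2: inspect which policies are attached to the autoscaler's role
# (role name assumed)
aws iam list-attached-role-policies --role-name eks-cluster-autoscaler-role
```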
Root Cause

Missing IAM role permissions for the cluster autoscaler prevented node scaling.

Fix/Workaround
• Updated the IAM role associated with the cluster autoscaler to include the necessary permissions for EC2 instance provisioning.
• Restarted the autoscaler and confirmed that new nodes were added successfully.
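For reference, a minimal IAM policy for the cluster autoscaler on AWS looks like the sketch below, based on the permissions the upstream cluster-autoscaler documentation lists for EKS. Your environment may require scoping `Resource` more tightly than `*`.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}
```

After attaching the policy, restarting the autoscaler pod (e.g. `kubectl -n kube-system rollout restart deployment cluster-autoscaler`, deployment name assumed) picks up the new permissions.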
Lessons Learned

Ensure that the cluster autoscaler has the required permissions to scale nodes in cloud environments.

How to Avoid
  1. Regularly review IAM permissions and role configurations for essential services like the cluster autoscaler.
  2. Automate IAM permission audits to catch configuration issues early.
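A permission audit like step 2 can be sketched in a few lines. The check below is a hypothetical illustration: it verifies that a policy document grants the actions the autoscaler needs. In practice you would fetch the document with boto3 (e.g. via `iam.get_policy_version`); it is passed in directly here so the check itself stays self-contained.

```python
# Minimal IAM policy audit sketch: report which actions required by the
# cluster autoscaler are NOT granted by any Allow statement in a policy
# document. Action list and function name are illustrative assumptions.

REQUIRED_ACTIONS = {
    "autoscaling:DescribeAutoScalingGroups",
    "autoscaling:SetDesiredCapacity",
    "autoscaling:TerminateInstanceInAutoScalingGroup",
    "ec2:DescribeLaunchTemplateVersions",
}

def missing_autoscaler_actions(policy_doc: dict) -> set:
    """Return required actions that no Allow statement grants."""
    allowed = set()
    for stmt in policy_doc.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a bare string
            actions = [actions]
        allowed.update(actions)
    if "*" in allowed:  # wildcard grants everything
        return set()
    return REQUIRED_ACTIONS - allowed
```

Running a check like this on a schedule (or in CI when roles change) surfaces a missing permission before the autoscaler fails under load.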