Scenario #57
Cluster Management
K8s v1.21, AWS EKS

Cluster-wide Service Outage Due to Missing ClusterRoleBinding

A cluster-wide service outage occurred after an automated change removed a critical ClusterRoleBinding.

What Happened

A misconfigured automation pipeline removed a ClusterRoleBinding that certain critical services required to function.

Diagnosis Steps
  1. Analyzed service logs and found permission-related errors.
  2. Checked the RBAC configuration and found the missing ClusterRoleBinding.
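
The log analysis step above can be sketched as a small script that scans service logs for RBAC denials and groups them by identity. This is a minimal sketch, not the method used in the incident: it assumes the logs contain the API server's standard "is forbidden" message, and the sample log lines and the service account name `metrics-agent` are hypothetical.

```python
import re
from collections import Counter

# Matches the message the Kubernetes API server returns when RBAC denies a request.
FORBIDDEN = re.compile(
    r'is forbidden: User "(?P<user>[^"]+)" '
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)"'
)


def summarize_denials(log_lines):
    """Count RBAC denials per (user, verb, resource) tuple."""
    counts = Counter()
    for line in log_lines:
        match = FORBIDDEN.search(line)
        if match:
            counts[(match["user"], match["verb"], match["resource"])] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample_logs = [
    'Failed to list *v1.Pod: pods is forbidden: User '
    '"system:serviceaccount:monitoring:metrics-agent" cannot list resource '
    '"pods" in API group "" at the cluster scope',
    'Failed to watch *v1.Pod: pods is forbidden: User '
    '"system:serviceaccount:monitoring:metrics-agent" cannot watch resource '
    '"pods" in API group "" at the cluster scope',
    'Started metrics collection loop',
]

for (user, verb, resource), count in summarize_denials(sample_logs).items():
    print(f"{count}x {user} cannot {verb} {resource}")
```

Many denials for the same service account at the cluster scope suggest that a ClusterRoleBinding for that account is missing, which narrows down what to look for in the RBAC configuration.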
Root Cause

The automated pipeline incorrectly removed the ClusterRoleBinding, which revoked the permissions that the affected services depended on.

Fix/Workaround
• Restored the missing ClusterRoleBinding.
• Manually verified that affected services were functioning correctly.
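
As an illustration of the restore step, the sketch below rebuilds a ClusterRoleBinding manifest and prints it as JSON. The original binding is not shown in this scenario, so every name here (`metrics-agent-view`, the `view` ClusterRole, the `metrics-agent` service account, the `monitoring` namespace) is a hypothetical placeholder; in practice the manifest would come from version control or a backup.

```python
import json


def build_cluster_role_binding(name, cluster_role, service_account, namespace):
    """Return a ClusterRoleBinding manifest that binds one ServiceAccount to a ClusterRole."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": namespace,
            }
        ],
    }


# Hypothetical values; replace with the binding that was actually removed.
manifest = build_cluster_role_binding(
    name="metrics-agent-view",
    cluster_role="view",
    service_account="metrics-agent",
    namespace="monitoring",
)
print(json.dumps(manifest, indent=2))
```

The JSON output can be piped to `kubectl apply -f -`. Afterward, a command such as `kubectl auth can-i list pods --as=system:serviceaccount:monitoring:metrics-agent` is one way to confirm that the service account has its access back before checking the services themselves.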
Lessons Learned

Automation changes must be reviewed and tested to prevent accidental permission misconfigurations.

How to Avoid
  1. Use automated tests and checks for RBAC changes.
  2. Implement safeguards and approval workflows for automated configuration changes.
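
One possible form for such a check is a pipeline gate that compares the RBAC manifests before and after a change and blocks any ClusterRoleBinding removal that has not been explicitly approved. The sketch below assumes the pipeline can supply both sets of manifests as parsed dictionaries; the function names, the `approved_removals` parameter, and the sample binding names are hypothetical.

```python
def binding_names(manifests):
    """Collect the names of all ClusterRoleBindings in a list of manifests."""
    return {
        m["metadata"]["name"]
        for m in manifests
        if m.get("kind") == "ClusterRoleBinding"
    }


def removed_bindings(current, proposed):
    """Return the ClusterRoleBindings present in `current` but absent from `proposed`."""
    return sorted(binding_names(current) - binding_names(proposed))


def check_change(current, proposed, approved_removals=()):
    """Raise ValueError if the change removes a binding that was not approved."""
    unapproved = [
        name
        for name in removed_bindings(current, proposed)
        if name not in approved_removals
    ]
    if unapproved:
        raise ValueError(
            "Unapproved ClusterRoleBinding removal: " + ", ".join(unapproved)
        )


# Hypothetical manifests, reduced to the fields the check reads.
current = [
    {"kind": "ClusterRoleBinding", "metadata": {"name": "metrics-agent-view"}},
    {"kind": "ClusterRoleBinding", "metadata": {"name": "ingress-controller"}},
]
proposed = [
    {"kind": "ClusterRoleBinding", "metadata": {"name": "ingress-controller"}},
]

try:
    check_change(current, proposed)
except ValueError as err:
    print(f"Blocked: {err}")
```

Requiring removals to be listed in `approved_removals` turns an accidental deletion into a failed pipeline run, and a reviewer adds the name only when the removal is intended.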