Kubernetes Production Issues
A collection of real-world Kubernetes production issues and their solutions, presented in an easy-to-navigate format.
Node drain stuck indefinitely due to unresponsive terminating pod.
API server crashed due to flooding by a malfunctioning controller creating too many custom resources.
A rebooted node failed to rejoin the cluster due to kubelet identity mismatch.
etcd ran out of disk space, making API server unresponsive.
Critical workloads weren’t getting scheduled due to incorrect node taints.
Continuous pod evictions caused by DiskPressure due to image bloating.
One node dropped from the cluster due to TLS errors from time skew.
An app spamming Kubernetes events slowed down the entire API server.
CoreDNS pods kept crashing due to a misconfigured Corefile.
Misaligned pod CIDRs caused overlay misrouting and API server failure.
Services became unreachable because custom iptables rules overlapped with kube-proxy's rules.
New nodes couldn’t join due to a backlog of unapproved CSRs.
Upgrade failed when static control plane pods weren’t ready due to invalid manifests.
Application pods generated excessive logs, filling up node /var/log.
kubectl drain never completed because PDBs blocked eviction.
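A minimal sketch of the kind of PodDisruptionBudget that can block `kubectl drain` indefinitely: when `minAvailable` equals the workload's replica count, every eviction request is denied (names and numbers are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # illustrative name
spec:
  minAvailable: 2          # equal to the Deployment's replica count -> no voluntary eviction is ever allowed
  selector:
    matchLabels:
      app: web
```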
Controller-manager crashed on startup due to outdated admission controller configuration.
A partial etcd restore led to stale object references and broken dependencies.
Nodes failed to pull images from DockerHub due to incorrect proxy environment configuration.
Flapping interface on switch caused nodes to be marked NotReady intermittently.
A DaemonSet used for node labeling overwrote existing labels used by schedulers.
The cluster was rapidly scaling up and down, creating instability in workloads.
A namespace remained in “Terminating” state indefinitely.
CoreDNS stopped resolving names cluster-wide after a config update.
A sudden spike in image pulls caused all nodes to hit disk pressure, leading to massive pod evictions.
PVCs were stuck in Pending state due to existing orphaned PVs in Released state.
Workloads failed to schedule on new nodes that had a taint the workloads didn’t tolerate.
New nodes failed to join the cluster due to container runtime timeout when pulling base images.
Several nodes went NotReady after reboot due to kubelet failing to start with expired client certs.
kube-scheduler pod failed with panic due to misconfigured leader election flags.
DNS resolution broke after Calico CNI update due to iptables policy drop changes.
Authentication tokens failed across the cluster due to node clock skew.
Zone-aware workloads failed to schedule due to missing zone labels on some nodes.
API latency rose sharply due to thousands of watch connections from misbehaving clients.
Entire control plane crashed due to etcd disk running out of space.
A user accidentally deleted the kube-root-ca.crt ConfigMap, which many workloads relied on.
A critical deployment was unschedulable due to strict nodeAffinity rules.
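An illustrative affinity block of the kind that can leave a deployment Pending: a hard `requiredDuringSchedulingIgnoredDuringExecution` rule keyed on a label value no node actually carries (label and value are assumptions):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement: no matching node -> pod stays Pending
      nodeSelectorTerms:
        - matchExpressions:
            - key: disktype           # illustrative node label
              operator: In
              values: ["nvme-gen4"]   # value present on no node in the cluster
```

Switching to `preferredDuringSchedulingIgnoredDuringExecution` keeps the placement preference without making the pods unschedulable.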
A stale mutating webhook caused all deployments to fail due to TLS certificate errors.
After 1 year of uptime, API server certificate expired, blocking access to all components.
kubelet failed to start after switching from Docker to containerd due to incorrect CRI socket path.
Cluster workloads failed after applying overly strict resource quotas that denied new pod creation.
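A hedged example of an overly strict quota: once existing workloads consume the hard limits, every new pod in the namespace is rejected at admission with an "exceeded quota" error (all values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tight-quota        # illustrative
  namespace: team-a        # illustrative
spec:
  hard:
    pods: "10"             # the 11th pod is denied at admission
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
```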
Cluster upgrade failed due to an incompatible version of the CNI plugin.
Privileged containers were able to run despite Pod Security Policy enforcement.
StatefulSet pods were rescheduled across different nodes, breaking persistent volume bindings.
Kubelet crashed after running out of memory due to excessive pod resource usage.
DNS resolution failed between two federated clusters due to missing DNS records.
Horizontal Pod Autoscaler did not scale pods up as expected due to insufficient resource limits.
The control plane became overloaded and slow due to excessive audit log volume.
Resource fragmentation due to unbalanced pod distribution led to cluster instability.
Cluster backup failed due to a misconfigured volume snapshot driver.
Deployment failed due to image pulling issues from a custom Docker registry.
Ingress controller configuration caused high network latency due to inefficient routing rules.
Node draining took an unusually long time during maintenance due to unscheduled pod disruption.
Cluster became unresponsive after deploying a large number of pods in a single batch.
Node failed to recover after being drained due to a corrupt kubelet configuration.
Cluster resources were exhausted due to misconfiguration in the Horizontal Pod Autoscaler (HPA), resulting in excessive pod scaling.
Application behavior became inconsistent after pod restarts due to improper state handling.
Cluster-wide service outage occurred after an automated change removed a critical ClusterRoleBinding.
Node overcommitment led to pod evictions, causing application downtime.
Pods failed to start because the image pull policy was misconfigured.
Control plane resources were excessively utilized during pod scheduling, leading to slow deployments.
Persistent Volume Claims (PVCs) failed due to exceeding the resource quota for storage in the namespace.
Pods failed to reschedule after a node failure due to improper node affinity rules.
Network latency issues occurred intermittently due to misconfiguration in the CNI (Container Network Interface) plugin.
A pod was restarting frequently due to resource limits being too low, causing the container to be killed.
Cluster performance degraded because of excessive logs being generated by applications, leading to high disk usage.
The cluster experienced resource exhaustion because CronJobs were running in parallel without proper capacity checks.
Pod scaling failed due to conflicting affinity/anti-affinity rules that prevented pods from being scheduled.
Cluster became inaccessible due to excessive API server throttling caused by too many concurrent requests.
Expansion of a Persistent Volume (PV) failed due to improper storage class settings.
Unauthorized users gained access to sensitive resources due to misconfigured RBAC roles and bindings.
Pods entered an inconsistent state because the container image failed to pull due to incorrect image tag.
Pods experienced disruptions as nodes ran out of CPU and memory, causing evictions.
Services could not discover each other due to DNS resolution failures, affecting internal communication.
Persistent volume provisioning was delayed due to an issue with the dynamic provisioner.
A deployment rollback failed due to the rollback image version no longer being available in the container registry.
The Kubernetes master node became unresponsive under high load due to excessive API server calls and high memory usage.
Pods failed to restart on available nodes due to overly strict node affinity rules.
The ReplicaSet failed to scale due to insufficient resources on the nodes.
A namespace was missing after performing a cluster upgrade.
The Horizontal Pod Autoscaler (HPA) was inefficiently scaling due to misconfigured metrics.
Pods could not start because the image registry was temporarily unavailable, causing image pull failures.
Pods failed to start because their resource requests were too low, preventing the scheduler from assigning them to nodes.
HPA failed to scale the pods appropriately during a sudden spike in load.
Pods were evicted due to disk pressure on the node, causing service interruptions.
A node failed to drain due to pods that were in use, preventing the drain operation from completing.
The cluster autoscaler failed to scale up the node pool despite high resource demand.
Pods lost network connectivity after a node reboot, causing communication failures between services.
Unauthorized access errors occurred due to missing permissions in RBAC configurations.
A pod upgrade failed because it was using deprecated APIs not supported in the new version.
A container's high CPU usage was caused by inefficient application code, leading to resource exhaustion.
Resource starvation occurred on nodes because pods were over-provisioned, consuming more resources than expected.
Pods were not scheduled due to overly strict affinity rules that limited the nodes available for deployment.
Pods failed their readiness probes during initialization, causing traffic to be routed to unhealthy instances.
Incorrect path configuration in the ingress resource resulted in 404 errors for certain API routes.
Node pool scaling failed because the account exceeded resource quotas in AWS.
Pods entered a crash loop because a required ConfigMap was not present in the namespace.
The Kubernetes API server became slow due to excessive log generation from the kubelet and other components.
Pods failed to schedule because the taints and tolerations were misconfigured, preventing the scheduler from placing them on nodes.
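For the taint/toleration mismatch above, a minimal sketch: the pod's toleration must match the node taint on key, value, and effect (values are assumptions):

```yaml
# Node taint applied with, e.g.: kubectl taint nodes node-1 dedicated=batch:NoSchedule
# Matching toleration in the pod spec:
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "batch"
    effect: "NoSchedule"
```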
The Kubernetes dashboard became unresponsive due to high resource usage caused by a large number of requests.
Containers kept crashing due to hitting resource limits set in their configurations.
Pods failed to communicate due to a misconfigured NetworkPolicy that blocked ingress traffic.
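An illustrative remedy for an ingress-blocking policy: explicitly allow traffic from the client pods' labels on the service port (namespace, labels, and port are assumptions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # illustrative
  namespace: prod                   # illustrative
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```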
DNS resolution failed across the cluster after CoreDNS pods crashed unexpectedly.
High network latency occurred because a service was incorrectly configured as a NodePort instead of a LoadBalancer.
Pod-to-pod communication became inconsistent due to a mismatch in Maximum Transmission Unit (MTU) settings across nodes.
Service discovery failed across the cluster due to DNS pod resource limits being exceeded.
Pod IP collisions occurred due to insufficient IP range allocation for the cluster.
A network bottleneck occurred due to excessive traffic being handled by a single node in the node pool.
A network partition occurred when the CNI plugin failed, preventing pods from communicating with each other.
SSL certificate errors occurred due to a misconfigured Ingress resource.
The cluster autoscaler failed to scale the number of nodes in response to resource shortages due to missing IAM role permissions for managing EC2 instances.
DNS resolution failed due to incorrect IP allocation in the cluster’s CNI plugin.
Pods couldn’t communicate with services because of a port binding conflict.
A pod was evicted due to network resource constraints, specifically limited bandwidth.
Intermittent network disconnects occurred due to MTU mismatches between different nodes in the cluster.
Service load balancer failed to route traffic to new pods after scaling up.
Network traffic dropped due to overlapping CIDR blocks between the VPC and Kubernetes pod network.
Service discovery failed due to misconfigured DNS resolvers.
Intermittent network latency occurred due to an overloaded network interface on a single node.
Pods were disconnected during a network partition between nodes in the cluster.
Pod-to-pod communication was blocked due to overly restrictive network policies.
External API calls from the pods failed due to DNS resolution issues for the external domain.
Load balancer health checks failed after updating a pod due to incorrect readiness probe configuration.
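A hedged sketch of a readiness probe tuned so the load balancer only sends traffic to pods that actually answer; the endpoint, port, and timings are assumptions:

```yaml
readinessProbe:
  httpGet:
    path: /healthz          # illustrative health endpoint
    port: 8080              # illustrative container port
  initialDelaySeconds: 10   # let the application finish booting before the first check
  periodSeconds: 5
  failureThreshold: 3       # mark the pod unready only after sustained failures
```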
Network performance degraded after an automatic node upgrade, causing latency in pod communication.
A service IP conflict occurred due to overlapping CIDR blocks, preventing correct routing of traffic to the service.
High latency observed in inter-namespace communication, leading to application timeouts.
Pods experienced network disruptions after updating the CNI plugin to a newer version.
Loss of service traffic after ingress annotations were incorrectly set, causing the ingress controller to misroute traffic.
The node pool draining process timed out during upgrades due to pods taking longer than expected to terminate.
The cluster upgrade failed because certain deprecated API versions were still in use, causing compatibility issues with the new K8s version.
DNS resolution failed for services after restarting a pod, causing internal communication issues.
Application failed after a pod IP address changed unexpectedly, breaking communication between services.
A service exposure attempt failed due to incorrect configuration of the AWS load balancer.
Network latency spikes occurred when autoscaling pods during traffic surges.
A service was not accessible due to a misconfigured namespace selector in the service definition.
Pods experienced intermittent connectivity issues due to a bug in the CNI network plugin.
Ingress traffic was not properly routed to services due to missing annotations in the ingress resource.
A pod IP conflict caused service downtime and application crashes.
Increased latency in service-to-service communication due to suboptimal configuration of Istio service mesh.
DNS resolution failures occurred across pods after a Kubernetes cluster upgrade.
Sidecar injection failed for some pods in the service mesh, preventing communication between services.
Network bandwidth was saturated during a large-scale deployment, affecting cluster communication.
Internal pod-to-pod traffic was unexpectedly blocked due to inconsistent network policies.
Pod network latency increased due to an overloaded CNI plugin.
TCP retransmissions increased due to network saturation, leading to degraded pod-to-pod communication.
DNS lookup failures occurred due to resource limits on the CoreDNS pods.
A service was not accessible externally due to incorrect ingress configuration.
Pod-to-pod communication failed due to an overly restrictive network policy.
The overlay network was misconfigured, leading to instability in pod communication.
Pod network connectivity was intermittent due to issues with the cloud provider's network infrastructure.
Port conflicts between services in different namespaces led to communication failures.
A NodePort service became inaccessible due to restrictive firewall rules on the cloud provider.
CoreDNS latency increased due to resource constraints on the CoreDNS pods.
Network performance degraded due to an incorrect Maximum Transmission Unit (MTU) setting.
Application traffic was routed incorrectly due to an error in the ingress resource definition.
Intermittent service disruptions occurred due to stale DNS cache in CoreDNS.
Flannel overlay network was interrupted after a node failure, causing pod-to-pod communication issues.
Network traffic was lost due to a port collision in the network policy, affecting application availability.
CoreDNS service failed due to resource exhaustion, causing DNS resolution failures.
Pod network partition occurred due to an incorrectly configured IP Address Management (IPAM) in the CNI plugin.
Network performance degraded due to the CNI plugin being overwhelmed by high traffic volume.
DNS resolution failures due to misconfigured CoreDNS, leading to application errors.
Network partitioning due to incorrect Calico CNI configuration, resulting in pods being unable to communicate with each other.
Pods failed to communicate due to IP address overlap caused by an incorrect subnet configuration.
Pod network latency increased due to an overloaded network interface on the Kubernetes nodes.
Intermittent connectivity failures due to pod DNS cache expiry, leading to failed DNS lookups for external services.
Network connections between pods were intermittently dropping due to misconfigured network policies, causing application instability.
Cluster network downtime occurred during a CNI plugin upgrade, affecting pod-to-pod communication.
Pods in a multi-region cluster experienced inconsistent network connectivity between regions due to misconfigured VPC peering.
Pods were unable to resolve DNS due to a network policy blocking DNS traffic, causing service failures.
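For DNS traffic blocked by a network policy, a common remedy (sketched here with assumed labels) is an explicit egress rule allowing UDP/TCP 53 to the cluster DNS pods in kube-system:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress    # illustrative
spec:
  podSelector: {}           # every pod in this namespace
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # standard CoreDNS label; verify in your cluster
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```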
Network bottleneck occurred due to overutilization of a single network interface on the worker nodes.
Network latency increased due to an overloaded VPN tunnel between the Kubernetes cluster and an on-premise data center.
Network packets were dropped due to a mismatch in Maximum Transmission Unit (MTU) settings across different network components.
Pods in a specific namespace were unable to communicate due to an incorrectly applied network policy blocking traffic between namespaces.
Service discovery failures occurred when CoreDNS pods crashed due to resource exhaustion, causing DNS resolution issues.
DNS resolution failures occurred within pods due to a misconfiguration in the CoreDNS config map.
CoreDNS pods experienced high latency and timeouts due to resource overutilization, causing slow DNS resolution for applications.
Network degradation occurred due to overlapping CIDR blocks between VPCs in a hybrid cloud setup, causing routing issues.
Service discovery failed when a network policy was mistakenly applied to block DNS traffic, preventing pods from resolving services within the cluster.
Pods experienced intermittent network connectivity issues due to an overloaded overlay network that could not handle the traffic.
Pods were unable to communicate with each other due to a misconfiguration in the CNI plugin.
Sporadic DNS resolution failures occurred due to resource contention in CoreDNS pods, which were not allocated enough CPU resources.
High latency was observed in pod-to-node communication due to network overhead introduced by the overlay network.
Service discovery failed due to stale DNS cache entries that were not updated when services changed IPs.
Pods in different node pools located in different zones experienced network partitioning due to a misconfigured regional load balancer.
Pods that were intended to be isolated from each other could communicate freely due to a missing NetworkPolicy.
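A minimal default-deny sketch that makes the intended isolation explicit; once it is in place, any traffic that should flow needs its own allow policy (namespace name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all    # illustrative
  namespace: restricted     # illustrative
spec:
  podSelector: {}           # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
```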
Nodes in the cluster were flapping due to mismatched MTU settings between Kubernetes and the underlying physical network, causing intermittent network connectivity issues.
DNS queries were timing out in the cluster, causing delays in service discovery, due to unoptimized CoreDNS configuration.
Traffic splitting between two microservices failed due to a misconfiguration in the Service LoadBalancer.
Pods in different Azure regions experienced high network latency, affecting application performance.
Two services attempted to bind to the same port, causing a port collision and service failures.
Pods failed to connect to an external service due to an overly restrictive egress network policy.
Pods lost connectivity after an upgrade of the Calico network plugin due to misconfigured IP pool settings.
External DNS resolution stopped working after changes were made to the cluster network configuration.
Pod-to-pod communication was slow due to an incorrect MTU setting in the network plugin.
Nodes experienced high CPU usage due to an overloaded network plugin that couldn’t handle traffic spikes effectively.
Network isolation between namespaces failed due to an incorrectly applied NetworkPolicy.
Service discovery was inconsistent due to misconfigured CoreDNS settings.
Network segmentation between clusters failed due to incorrect CNI (Container Network Interface) plugin configuration.
DNS cache poisoning occurred in CoreDNS, leading to incorrect IP resolution for services.
Unauthorized users were able to access Kubernetes secrets due to overly permissive RBAC roles.
Pods intended to be isolated were exposed to unauthorized traffic due to misconfigured network policies.
A container running with elevated privileges due to an incorrect security context exposed the cluster to potential privilege escalation attacks.
The Kubernetes dashboard was exposed to the public internet due to a misconfigured Ingress resource.
Communication between microservices in the cluster was not encrypted due to missing TLS configuration, exposing data to potential interception.
Sensitive data, such as API keys and passwords, was logged due to improper sanitization in application logs.
Privilege escalation was possible due to insufficiently restrictive PodSecurityPolicies (PSPs).
A compromised service account token was used to gain unauthorized access to the cluster's API server.
The container images used in the cluster were not regularly scanned for vulnerabilities, leading to deployment of vulnerable images.
Unverified container images were deployed due to the lack of image signing, exposing the cluster to potential malicious code.
Unauthorized users gained access to resources in the default namespace due to lack of namespace isolation.
A container image was using an outdated and vulnerable version of OpenSSL, exposing the cluster to known security vulnerabilities.
API server authentication was misconfigured, allowing external unauthenticated users to access the Kubernetes API.
Nodes in the cluster were insecure due to a lack of proper OS hardening, making them vulnerable to attacks.
Sensitive services were exposed to the public internet due to unrestricted ingress rules.
Sensitive data, such as database credentials, was exposed through environment variables in container configurations.
A lack of resource limits on containers allowed a denial-of-service (DoS) attack to disrupt services by consuming excessive CPU and memory.
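A hedged example of a namespace LimitRange that backstops missing container limits so one workload cannot exhaust a node (all numbers are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults  # illustrative
spec:
  limits:
    - type: Container
      default:              # applied when a container declares no limits
        cpu: 500m
        memory: 512Mi
      defaultRequest:       # applied when a container declares no requests
        cpu: 100m
        memory: 128Mi
```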
Container logs were exposed to unauthorized users due to insufficient log management controls.
The cluster was pulling container images from an insecure, untrusted Docker registry, exposing the system to the risk of malicious images.
Privileged containers were deployed due to weak or missing Pod Security Policies (PSPs), exposing the cluster to security risks.
The Kubernetes Dashboard was exposed to the public internet without proper authentication or access controls, allowing unauthorized users to access sensitive cluster information.
Sensitive applications were exposed using HTTP instead of HTTPS, leaving communication vulnerable to eavesdropping and man-in-the-middle attacks.
Network policies were too permissive, exposing internal services to unnecessary access, increasing the risk of lateral movement within the cluster.
Sensitive credentials were stored in environment variables within the pod specification, exposing them to potential attackers.
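A minimal sketch (names assumed) of referencing a Secret instead of embedding the credential directly in the pod spec; even this still surfaces the value as an environment variable, so a mounted Secret volume or an external secret manager may be preferable:

```yaml
containers:
  - name: app                      # illustrative
    image: example.com/app:1.0     # illustrative
    env:
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: db-credentials   # Secret created and managed separately
            key: password
```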
Insufficient Role-Based Access Control (RBAC) configurations allowed unauthorized users to access and modify sensitive resources within the cluster.
An insecure ingress controller was exposed to the internet, allowing attackers to exploit vulnerabilities in the controller.
The cluster was running outdated container images without the latest security patches, exposing it to known vulnerabilities.
The Kubelet API was exposed without proper authentication or authorization, allowing external users to query cluster node details.
Sensitive security events were not logged, preventing detection of potential security breaches or misconfigurations.
Developers were mistakenly granted cluster admin privileges due to misconfigured RBAC roles, which gave them the ability to modify sensitive resources.
Service accounts were granted excessive permissions, giving pods access to resources they did not require, leading to a potential security risk.
Kubernetes secrets were mounted into pods insecurely, exposing sensitive information to unauthorized users.
The Kubernetes API server was improperly configured, allowing unauthorized users to make API calls without proper authorization.
The image registry access credentials were compromised, allowing attackers to pull and run malicious images in the cluster.
The API server was exposed with insufficient security, allowing unauthorized external access and increasing the risk of exploitation.
Admission controllers were misconfigured, allowing the creation of insecure or non-compliant resources.
The lack of proper auditing and monitoring allowed security events to go undetected, resulting in delayed response to potential security threats.
Internal services were inadvertently exposed to the public due to incorrect load balancer configurations, leading to potential security risks.
Kubernetes secrets were accessed via an insecure network connection, exposing sensitive information to unauthorized parties.
Pod security policies were not enforced, allowing the deployment of pods with unsafe configurations, such as privileged access and host network use.
Cluster nodes were not regularly patched, exposing known vulnerabilities that were later exploited by attackers.
Network policies were not properly configured, allowing unrestricted traffic between pods, which led to lateral movement by attackers after a pod was compromised.
Kubernetes dashboard was exposed to the internet without authentication, allowing unauthorized users to access cluster information and potentially take control.
Insecure container images were used in production, leading to the deployment of containers with known vulnerabilities.
Misconfigured TLS certificates led to insecure communication between Kubernetes components, exposing the cluster to potential attacks.
Service accounts were granted excessive privileges, allowing them to perform operations outside their intended scope, increasing the risk of compromise.
Sensitive logs, such as those containing authentication tokens and private keys, were exposed due to a misconfigured logging setup.
The cluster was using deprecated Kubernetes APIs that contained known security vulnerabilities, which were exploited by attackers.
Pods were deployed without defining appropriate security contexts, resulting in privileged containers and access to host resources.
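An illustrative restrictive security context of the kind these scenarios call for; the values are a sketch, not a universal policy:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 10001                 # illustrative non-root UID
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```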
The container runtime (Docker) was compromised, allowing an attacker to gain control over the containers running on the node.
A cluster administrator was mistakenly assigned insufficient RBAC permissions, preventing them from performing essential management tasks.
Insufficiently restrictive PodSecurityPolicies (PSPs) allowed the deployment of privileged pods, which were later exploited by attackers.
A service account token was mistakenly exposed in a pod, allowing attackers to gain unauthorized access to the Kubernetes API.
A compromised container running a known exploit executed malicious code that allowed the attacker to gain access to the underlying node.
Network policies were not restrictive enough, allowing compromised pods to move laterally across the cluster and access other services.
Sensitive data was transmitted in plaintext between services, exposing it to potential eavesdropping and data breaches.
A service was exposed to the public internet via a LoadBalancer without proper access control, making it vulnerable to attacks.
Privileged containers were running without seccomp or AppArmor profiles, leaving the host vulnerable to attacks.
A malicious container image from an untrusted source was deployed, leading to a security breach in the cluster.
The ingress controller was misconfigured, allowing external attackers to bypass network security controls and exploit internal services.
An Ingress controller was misconfigured, inadvertently exposing internal services to the public internet.
Containers were running with elevated privileges without defined security contexts, increasing the risk of host compromise.
Lack of restrictive network policies permitted lateral movement within the cluster after a pod compromise.
The Kubernetes Dashboard was exposed without authentication, allowing unauthorized access to cluster resources.
Deployment of container images with known vulnerabilities led to potential exploitation risks.
Overly permissive RBAC configurations granted users more access than necessary, posing security risks.
Secrets were stored in plaintext within configuration files, leading to potential exposure.
Absence of audit logging hindered the ability to detect and investigate security incidents.
The etcd datastore was accessible without authentication, risking exposure of sensitive cluster data.
Without Pod Security Policies, pods were deployed with insecure configurations, increasing the attack surface.
All pods had default service account tokens mounted, increasing the risk of credential leakage.
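A sketch of opting out of the default token mount for pods that never talk to the API server; the same field can also be set on the ServiceAccount itself:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-token-pod               # illustrative
spec:
  automountServiceAccountToken: false
  containers:
    - name: app
      image: example.com/app:1.0   # illustrative
```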
Secrets and passwords were accidentally logged and shipped to a centralized logging service accessible to many teams.
A malicious container escaped to host level due to an unpatched kernel, but went undetected due to insufficient monitoring.
A pod was able to access the EC2 metadata API and retrieve IAM credentials due to open network access.
A developer accidentally committed a kubeconfig file with full admin access into a public Git repository.
Reused JWT tokens from intercepted API requests were used to impersonate authorized users.
A base image included hardcoded SSH keys which allowed attackers lateral access between environments.
A popular Helm chart had insecure defaults, like exposing dashboards or running as root.
Multiple teams used the same namespace naming conventions, causing RBAC overlaps and security concerns.
A known CVE affecting the base image used by multiple services remained unpatched due to no alerting.
Pods were running with privileged: true due to a permissive PodSecurityPolicy (PSP) left enabled during testing.
GitLab runners were configured to run privileged containers to support Docker-in-Docker (DinD), leading to a high-risk setup.
Secret volumes were mounted with 0644 permissions, allowing any user process inside the container to read them.
Kubelet was accidentally exposed on port 10250 to the public internet, allowing unauthenticated metrics and logs access.
A ClusterRoleBinding accidentally granted cluster-admin to all authenticated users due to system:authenticated group.
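A hedged reconstruction of what such a binding typically looks like; the name is illustrative, but a subject of `system:authenticated` bound to `cluster-admin` is the exact pattern to audit for:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: debug-binding              # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:authenticated     # every authenticated identity, including all service accounts
```

Deleting this binding, or scoping it to a narrowly defined group, restores least privilege.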
Authentication webhook for custom RBAC timed out under load, rejecting valid users and causing cluster-wide issues.
Misconfigured CSI driver exposed secrets on hostPath mount accessible to privileged pods.
A compromised user deployed ephemeral containers to inspect and copy secrets from running pods.
Malicious pod used hostAliases to spoof internal service hostnames and intercept requests.
A third-party Helm chart allowed setting arbitrary securityContext, letting users run pods as root in production.
Application inadvertently logged its mounted service account token, exposing it to log aggregation systems.
User with edit rights on a validating webhook modified it to bypass critical security policies.
A node was rejoined to the cluster using a stale certificate, giving it access it shouldn't have.
ArgoCD deployed a malicious Helm chart that added privileged pods and container escape backdoors.
A CVE in the container runtime allowed a container breakout, leading to full node compromise.
A microservice was granted get and list access to all secrets cluster-wide using *.
A pod was launched with a benign main container and a malicious init container that copied node metadata.
Prometheus scraping endpoint /metrics was exposed without authentication and revealed sensitive internal details.
A sensitive API key was accidentally stored in a ConfigMap instead of a Secret, making it visible in plain text.
A previously deleted namespace was recreated, and old tokens (from backups) were still valid and worked.
A node crash caused a PersistentVolumeClaim (PVC) to be stuck in Terminating, blocking pod deletion.
Multiple pods sharing a HostPath volume led to inconsistent file states and eventual corruption.
A pod was scheduled on a node that couldn’t access the persistent disk due to zone mismatch.
A StatefulSet pod failed to reschedule after its node was deleted, due to Azure disk still being attached.
Restarting a StatefulSet with many PVCs caused long downtime due to slow rebinding.
Volume plugin entered crash loop after secret provider’s token was rotated unexpectedly.
Heavy read/write on a shared PVC caused file IO contention and throttling across pods.
A pod couldn’t mount a volume because PodSecurityPolicy (PSP) rejected required fsGroup.
Deleting a namespace did not clean up PersistentVolumes, leading to leaked storage.
New PVCs failed to bind due to a broken default StorageClass with incorrect parameters.
Cloning PVCs between StatefulSet pods led to shared data unexpectedly appearing in new replicas.
Volume expansion was successful on the PV, but pods didn’t see the increased space.
The CSI controller crashed repeatedly due to unbounded logging filling up ephemeral storage.
PVCs were deleted, but PVs remained stuck in Released, preventing reuse.
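For PVs stuck in Released, one common path (a sketch, not necessarily the fix used in this scenario) is clearing the stale claimRef so the volume returns to Available:

```yaml
# Fragment of a Released PV (illustrative names). The stale claimRef pins it to a
# PVC that no longer exists; removing it, e.g. with
#   kubectl patch pv pv-data-01 --type=json -p '[{"op":"remove","path":"/spec/claimRef"}]'
# returns the PV to Available so it can bind again.
spec:
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    namespace: team-a              # illustrative
    name: data-old                 # deleted PVC
```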
CSI Node plugin DaemonSet didn’t deploy on all nodes due to missing taint tolerations.
Sidecar containers didn’t see mounted volumes due to incorrect mountPropagation settings.
Pod volume permissions reset after each restart, breaking application logic.
Volume mounted correctly, but application failed to write due to filesystem mismatch.
Snapshot-based restore brought back corrupted state due to hot snapshot timing.
Deleted PVCs didn’t release volumes due to failed detach steps, leading to quota exhaustion.
Volume snapshots piled up because snapshot objects were not getting garbage collected after use.
Volumes took too long to attach on new nodes after pod rescheduling due to stale attachment metadata.
After a node reboot, pod restarted, but wrote to a different volume path, resulting in apparent data loss.
Pod volume was remounted as read-only after a transient network disconnect, breaking app write logic.
Two pods writing to the same volume caused inconsistent files and data loss.
Pod remained stuck in ContainerCreating state because volume mount operations timed out.
A misconfigured static PV got bound to the wrong PVC, exposing sensitive data.
Node evicted pods due to DiskPressure, even though app used dedicated PVC backed by a separate disk.
Pod failed to start because the mount point was partially deleted, leaving the system confused.
When resizing PVCs, StatefulSet pods restarted in parallel, violating ordinal guarantees.
Applications experienced stale reads immediately after writing to the same file via CSI mount backed by an S3-like object store.
After a node reboot, a PVC resize request remained pending, blocking pod start.
CSI node plugin entered CrashLoopBackOff due to panic during volume attach, halting all storage provisioning.
PVC creation failed intermittently because the cluster had two storage classes marked as default.
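For the duplicate-default case, a sketch of the annotation involved: exactly one StorageClass should carry `storageclass.kubernetes.io/is-default-class: "true"` (class name and provisioner are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                                          # illustrative
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # set to "false" on every other class
provisioner: ebs.csi.aws.com                              # illustrative provisioner
```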
After a node crash, a VolumeAttachment object was not garbage collected, blocking new PVCs from attaching.
Pod entered Running state, but data was missing because PV was bound but not properly mounted.
User triggered a snapshot restore to an existing PVC, unintentionally overwriting live data.
Scheduler skipped a healthy node due to a ghost VolumeAttachment that was never cleaned up.
After PVC expansion, the mount inside pod pointed to root of volume, not the expected subPath.
A namespace restore from backup recreated PVCs that had no matching PVs, blocking further deployment.
Pods in a StatefulSet failed to start due to volume binding constraints when spread across zones.
Rapid creation/deletion of snapshots caused the controller to panic due to race conditions in snapshot finalizers.
Deployment rollout got stuck because one of the pods couldn’t start due to a failed volume expansion.
Node drained for maintenance led to permanent data loss for apps using hostPath volumes.
After restoring from backup, the volume was attached as read-only, causing application crashes.
NFS server restarted for upgrade. All dependent pods crashed due to stale file handles and unmount errors.
Scheduler skipped over pods with pending PVCs due to VolumeBindingBlocked status, even though volumes were eventually created.
Under heavy load, pods reported data corruption. Storage layer had thinly provisioned LVM volumes that overcommitted disk.
CSI driver failed to provision new volumes due to missing IAM permissions, even though StorageClass was valid.
After a node crash, volume remount loop occurred due to conflicting device paths.
Init container and main container used the same volume path but with different modes, causing the main container to crash.
After deleting PVCs, they remained in Terminating state indefinitely due to stuck finalizers.
Volume mounted as ReadOnlyMany blocked necessary write operations, despite NFS server allowing writes.
Existing in-tree volumes became unrecognized after enabling CSI migration.
Pod was force-deleted, but its volume wasn’t unmounted from the node, blocking future pod scheduling.
Under heavy I/O, Ceph volumes became unresponsive, leading to kernel-level I/O errors in pods.
Developer tried using volumeClaimTemplates in a ReplicaSet manifest, which isn’t supported.
A pod failed to start because the PV expected ext4 but the node formatted it as xfs.
Post-upgrade, all pods using iSCSI volumes failed to mount due to kernel module incompatibility.
After PVCs were deleted, underlying PVs and disks remained, leading to cloud resource sprawl.
Two pods attempted to use the same PVC simultaneously, causing one pod to be stuck in ContainerCreating.
Deleted a StatefulSet pod manually, but new pod failed due to existing PVC conflict.
HostPath volume mounted the wrong directory, exposing sensitive host data to the container.
Deleting a node object before the CSI driver detached volumes caused crash loops.
A PV stuck in Released state with Retain policy blocked new PVCs from binding with the same name.
Missing mountOptions in StorageClass led to runtime nil pointer exception in CSI driver.
Pod failed to mount volume due to denied SELinux permissions.
PVC resize operation failed because the pod using it was still running.
CSI plugin leaked memory due to improper garbage collection on detach failure loop.
During a cloud outage, Azure Disk operations timed out, blocking pod mounts.
Snapshot restore completed successfully, but restored app data was corrupt.
Two pods wrote to the same file concurrently, causing lock conflicts and data loss.
Pod restarts caused in-memory volume to be wiped, resulting in lost logs.
PVC expansion failed for a block device while pod was still running.
A PVC remained in Pending because the default StorageClass kept getting assigned instead of a custom one.
Mounting Ceph RBD volume failed after a node kernel upgrade.
Volume deletion left orphaned devices on the node, consuming disk space.
CSI sidecar depended on a ConfigMap that was updated, but volume behavior didn’t change.
Pod failed to mount PVC due to restricted SELinux type in pod’s security context.
Simultaneous provisioning requests created duplicate PVs for a single PVC.
Restored PVC bound to a PV that no longer existed, causing stuck pods.
Volumes defaulted to HDD even though workloads needed SSD.
Deleting PVCs left behind unused PVs and disks.
Attempt to mount a ReadWriteOnce PVC on two pods in different AZs failed silently.
Volume attach operations failed during parallel pod updates.
CSI sidecars failed to initialize due to missing node topology labels.
PVC deletion didn’t unmount volume due to finalizer stuck on pod.
GCE PD plugin migration to CSI caused volume mount errors.
Thin-provisioned volumes ran out of physical space, affecting all pods.
PVCs failed to provision silently due to exhausted storage quota.
PVC was resized but the pod’s filesystem didn’t reflect the new size.
Node reboots caused StatefulSet volumes to disappear due to ephemeral local storage.
Restore operation failed due to immutable PVC spec fields like access mode.
PVCs remained in Pending state due to missing resource class binding.
Pods failed to schedule because volumes were provisioned in a different zone than the node.
Finalizers on PVCs blocked namespace deletion for hours.
CSI driver upgrade introduced a regression causing volume mounts to fail.
Stale volume handles caused new PVCs to fail provisioning.
Application wrote logs to /tmp, not mounted volume, causing data loss on pod eviction.
Cluster Autoscaler aggressively removed nodes with attached volumes, causing workload restarts.
Horizontal Pod Autoscaler (HPA) didn’t scale pods as expected.
Application CPU throttled even under low usage, leading to delayed scaling.
Aggressively overprovisioned pod resources led to failed scheduling and throttling.
HPA scaled replicas based on CPU while VPA changed pod resources dynamically, creating instability.
Workloads couldn't be scheduled and CA didn’t scale nodes because affinity rules restricted placement.
Stress test resulted in API server crash due to unthrottled pod burst.
Pods scaled to zero, but requests during cold start breached SLA.
HPA didn’t scale pods because readiness probes failed and metrics were not reported.
Custom HPA didn’t function after metrics adapter pod crashed silently.
App lost in-flight requests during scale-down, causing 5xx spikes.
Low-priority workloads blocked scaling of high-priority ones due to misconfigured Cluster Autoscaler.
A stale ReplicaSet with label mismatches caused duplicate pod scale-out.
StatefulSet couldn’t scale-in during node pressure due to a restrictive PDB.
HPA used memory instead of CPU, causing unnecessary scale-ups.
Delays in Prometheus scraping caused lag in HPA reactions.
Pods were prematurely scaled down during rolling deployment.
Consumers didn’t scale out despite Kafka topic lag.
Sudden burst of traffic overwhelmed services due to slow pod boot time.
Misfiring liveness probes killed healthy pods during load test.
Consumers scaled in while queue still had unprocessed messages.
Node drain raced with pod termination, causing pod loss.
Horizontal Pod Autoscaler (HPA) failed to trigger because resource requests weren’t set.
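For this case, a minimal sketch: percentage-based CPU targets only work when the target's containers declare `resources.requests.cpu`; names and thresholds are assumptions:

```yaml
# The target Deployment's containers must declare requests, e.g.:
#   resources:
#     requests:
#       cpu: 250m
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa               # illustrative
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                 # illustrative
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```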
Unnecessary pod scaling due to misconfigured resource limits.
Horizontal scaling issues occurred during rolling upgrade of StatefulSet.
Load balancing wasn’t even across availability zones, leading to inefficient scaling.
Autoscaler scaled down too aggressively during short traffic dips, causing pod churn.
Pod autoscaling didn’t trigger in time to handle high ingress traffic.
Rate limits were hit on an external API during traffic surge, affecting service scaling.
Pod scaling failed due to resource constraints on nodes during high load.
A memory leak in the app led to unnecessary scaling, causing resource exhaustion.
Pod scaling inconsistently triggered during burst traffic spikes, causing service delays.
StatefulSet scaling hit limits due to pod affinity constraints.
Autoscaling failed across clusters due to inconsistent resource availability between regions.
StatefulSet failed to scale properly during maintenance, causing service disruption.
Autoscaler scaled down too aggressively during periods of low traffic, leading to resource shortages during traffic bursts.
Cluster Autoscaler failed to trigger due to node pool constraints.
Pod autoscaling in a StatefulSet led to disrupted service due to the stateful nature of the application.
Autoscaling pods didn’t trigger quickly enough during sudden high-load events, causing delays.
Autoscaler skipped scale-up because it was using the wrong metric for scaling.
Pod scaling was delayed because jobs in the queue were not processed fast enough.
Pod scaling was delayed because of incorrectly set resource requests, leading to resource over-provisioning.
Pods were unexpectedly terminated during scale-down due to aggressive scaling policies.
Load balancing issues surfaced during scaling, leading to uneven distribution of traffic.
Resource quotas prevented autoscaling from triggering despite high load.
Scaling took too long to respond during a traffic spike, leading to degraded service.
Scaling based on CPU utilization did not trigger when the issue was related to high memory usage.
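When memory rather than CPU is the real bottleneck, the HPA metric block can target memory utilization instead; a fragment sketch with an assumed threshold:

```yaml
metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75   # illustrative threshold
```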
Horizontal scaling of StatefulSets was inefficient due to StatefulSet’s inherent limitations.
Autoscaler skipped scaling events due to unreliable metrics from external monitoring tools.
Pods were delayed in being created due to misconfigured node affinity rules during scaling events.
Autoscaling triggered excessive scaling during short-term traffic spikes, leading to unnecessary resource usage.
Horizontal Pod Autoscaler (HPA) inconsistently scaled pods based on incorrect metric definitions.
Load balancer failed to distribute traffic effectively after a large pod scaling event, leading to overloaded pods.
Autoscaling was ineffective during peak traffic periods, leading to degraded performance.
Node resources were insufficient during scaling, leading to pod scheduling failures.
Pod scaling was unpredictable during a Cluster Autoscaler event due to a sudden increase in node availability.
During a scale-up event, CPU resources were over-committed, causing pod performance degradation.
Horizontal Pod Autoscaler (HPA) failed to scale up due to a temporary anomaly in the resource metrics.
Pod scaling was delayed due to memory pressure in the cluster, causing performance bottlenecks.
Nodes were over-provisioned, leading to unnecessary resource wastage during scaling.
Autoscaler did not handle node termination events properly, leading to pod disruptions.
Scaling up pods failed when a node was unexpectedly terminated, preventing proper pod scheduling.
Pod scaling became unstable during traffic spikes due to delayed scaling responses.
Insufficient node pool capacity caused pod scheduling failures during sudden scaling events.
Latency spikes occurred during horizontal pod scaling due to inefficient pod distribution.
During infrequent scaling events, resource starvation occurred due to improper resource allocation.
The autoscaler was slow to scale down after a drop in traffic, causing resource wastage.
Node resource exhaustion occurred when too many pods were scheduled on a single node, leading to instability.
Pod scaling failed due to memory pressure on nodes, preventing new pods from being scheduled.
Pod scaling was delayed due to slow node provisioning during cluster scaling events.
The autoscaling mechanism responded slowly to traffic changes because of insufficient metrics collection.
Node scaling was delayed because the cloud provider’s API rate limits were exceeded, preventing automatic node provisioning.
Pod scaling led to resource overload on nodes due to an excessively high replica count.
Pods failed to scale down during low traffic periods, leading to idle resources consuming cluster capacity.
The load balancer routed traffic unevenly after scaling up, causing some pods to become overloaded.
The Cluster Autoscaler failed to trigger under high load due to misconfiguration in resource requests.
Pod scaling was delayed due to cloud provider API delays during scaling events.
During a scaling event, resources were over-provisioned, causing unnecessary resource consumption and cost.
After node scaling, the load balancer failed to distribute traffic correctly due to misconfigured settings.
Autoscaling was disabled due to resource constraints on the cluster.
Fragmentation of resources across nodes led to scaling delays as new pods could not be scheduled efficiently.
The HPA scaled pods incorrectly because the metrics server was misconfigured, leading to wrong scaling triggers.
The Cluster Autoscaler failed to scale due to network configuration constraints that prevented communication between nodes.
Pod scaling was delayed due to exhausted resource quotas, preventing new pods from being scheduled.
Node memory resources were exhausted during a scaling event, causing pods to crash.
HPA scaling was delayed due to incorrect aggregation of metrics, leading to slower response to traffic spikes.
Pods became unbalanced across availability zones during scaling, leading to higher latency for some traffic.
Scaling failed because the node pool did not have sufficient capacity to accommodate new StatefulSets.
Scaling large StatefulSets led to resource spikes that caused system instability.
The Cluster Autoscaler prevented scaling because nodes with low utilization were not being considered for scaling.
Horizontal Pod Autoscaler (HPA) overloaded the system with pods during a traffic spike, leading to resource exhaustion.
Rapid node scaling led to unstable node performance, impacting pod stability.
Load balancer configurations failed to scale with the increased number of pods, causing traffic routing issues.
Pods were not evenly distributed across node pools after scaling, leading to uneven resource utilization.
Horizontal Pod Autoscaler (HPA) conflicted with Node Pool autoscaling, causing resource exhaustion.
HPA scaled too slowly during a traffic surge, leading to application unavailability.
Pod affinity settings caused workload imbalance and overloading in specific nodes.
Pods were not scaling properly due to overly restrictive resource limits.
Cluster Autoscaler failed to scale the nodes appropriately due to fluctuating load, causing resource shortages.
After scaling up the deployment, resource starvation led to pod evictions, resulting in service instability.
The Horizontal Pod Autoscaler (HPA) was slow to respond because it lacked sufficient metric collection.
Load balancer failed to redistribute traffic effectively when scaling pods, causing uneven distribution and degraded service.