Kubernetes Production Issues

A collection of real-world Kubernetes production issues and their solutions, presented in an easy-to-navigate format.

#1
Cluster Management
Zombie Pods Causing NodeDrain to Hang
K8s v1.23, On-prem bare metal, Systemd cgroups

Node drain stuck indefinitely due to unresponsive terminating pod.

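A minimal triage sketch for a drain blocked by a pod stuck in Terminating (node and pod names are placeholders; force-deleting is a last resort once you have confirmed the workload tolerates it):

```bash
# Find pods stuck in Terminating on the node being drained
kubectl get pods --all-namespaces --field-selector spec.nodeName=node-1 | grep Terminating

# Force-remove the stuck pod object; the kubelet can no longer confirm termination,
# so only do this when the container is known to be dead or the node is being retired
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force

# Retry the drain with a bounded timeout so it fails fast instead of hanging
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data --timeout=120s
```
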
#2
Cluster Management
API Server Crash Due to Excessive CRD Writes
K8s v1.24, GKE, heavy use of custom controllers

API server crashed due to flooding by a malfunctioning controller creating too many custom resources.

#3
Cluster Management
Node Not Rejoining After Reboot
K8s v1.21, Self-managed cluster, Static nodes

A rebooted node failed to rejoin the cluster due to kubelet identity mismatch.

#4
Cluster Management
Etcd Disk Full Causing API Server Timeout
K8s v1.25, Bare-metal cluster

etcd ran out of disk space, making API server unresponsive.

#5
Cluster Management
Misconfigured Taints Blocking Pod Scheduling
K8s v1.26, Multi-tenant cluster

Critical workloads weren’t getting scheduled due to incorrect node taints.

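A hedged sketch of how such a taint is usually found and corrected (node name, taint key, and values are placeholders):

```bash
# List taints on every node to spot unexpected NoSchedule/NoExecute entries
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Remove a taint that was applied by mistake (the trailing '-' deletes it)
kubectl taint nodes worker-1 dedicated=batch:NoSchedule-

# If the taint is intentional, give the critical workload a matching toleration instead:
#   tolerations:
#   - key: dedicated
#     operator: Equal
#     value: batch
#     effect: NoSchedule
```
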
#6
Cluster Management
Kubelet DiskPressure Loop on Large Image Pulls
K8s v1.22, EKS

Continuous pod evictions were triggered by DiskPressure after large image pulls filled the node disks.

#7
Cluster Management
Node Goes NotReady Due to Clock Skew
K8s v1.20, On-prem

One node dropped from the cluster due to TLS errors from time skew.

#8
Cluster Management
API Server High Latency Due to Event Flooding
K8s v1.23, Azure AKS

An app spamming Kubernetes events slowed down the entire API server.

#9
Cluster Management
CoreDNS CrashLoop on Startup
K8s v1.24, DigitalOcean

CoreDNS pods kept crashing due to a misconfigured Corefile.

#10
Cluster Management
Control Plane Unavailable After Flannel Misconfiguration
K8s v1.18, On-prem, Flannel CNI

Misaligned pod CIDRs caused overlay misrouting and API server failure.

#11
Cluster Management
kube-proxy IPTables Rules Overlap Breaking Networking
K8s v1.22, On-prem with kube-proxy in IPTables mode

Services became unreachable due to overlapping custom IPTables rules with kube-proxy rules.

#12
Cluster Management
Stuck CSR Requests Blocking New Node Joins
K8s v1.20, kubeadm cluster

New nodes couldn’t join due to a backlog of unapproved CSRs.

#13
Cluster Management
Failed Cluster Upgrade Due to Unready Static Pods
K8s v1.21 → v1.23 upgrade, kubeadm

Upgrade failed when static control plane pods weren’t ready due to invalid manifests.

#14
Cluster Management
Uncontrolled Logs Filled Disk on All Nodes
K8s v1.24, AWS EKS, containerd

Application pods generated excessive logs, filling up node /var/log.

#15
Cluster Management
Node Drain Fails Due to PodDisruptionBudget Deadlock
K8s v1.21, production cluster with HPA and PDB

kubectl drain never completed because PDBs blocked eviction.

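One way such a deadlock is typically confirmed and broken, sketched with placeholder names:

```bash
# A PDB showing ALLOWED DISRUPTIONS = 0 while all of its pods sit on the drained node
# can never satisfy an eviction request
kubectl get pdb --all-namespaces
kubectl describe pdb my-app-pdb -n my-namespace

# Either relax the PDB temporarily, or add replicas on other nodes first,
# then retry the drain with a timeout
kubectl scale deployment my-app -n my-namespace --replicas=3
kubectl drain node-1 --ignore-daemonsets --timeout=180s
```
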
#16
Cluster Management
CrashLoop of Kube-Controller-Manager on Boot
K8s v1.23, self-hosted control plane

Controller-manager crashed on startup due to outdated admission controller configuration.

#17
Cluster Management
Inconsistent Cluster State After Partial Backup Restore
K8s v1.24, Velero-based etcd backup

A partial etcd restore led to stale object references and broken dependencies.

#18
Cluster Management
kubelet Unable to Pull Images Due to Proxy Misconfig
K8s v1.25, Corporate proxy network

Nodes failed to pull images from DockerHub due to incorrect proxy environment configuration.

#19
Cluster Management
Multiple Nodes Marked Unreachable Due to Flaky Network Interface
K8s v1.22, Bare-metal, bonded NICs

Flapping interface on switch caused nodes to be marked NotReady intermittently.

#20
Cluster Management
Node Labels Accidentally Overwritten by DaemonSet
K8s v1.24, DaemonSet-based node config

A DaemonSet used for node labeling overwrote existing labels used by schedulers.

#21
Cluster Management
Cluster Autoscaler Continuously Spawning and Deleting Nodes
K8s v1.24, AWS EKS with Cluster Autoscaler

The cluster was rapidly scaling up and down, creating instability in workloads.

#22
Cluster Management
Stale Finalizers Preventing Namespace Deletion
K8s v1.21, self-managed

A namespace remained in “Terminating” state indefinitely.

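A common investigation path for a namespace stuck in Terminating, sketched with a placeholder namespace name (requires jq; clearing finalizers is a last resort once the owning controller is confirmed gone):

```bash
# List anything still left in the namespace
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n1 kubectl get -n stuck-namespace --ignore-not-found --show-kind

# See which finalizers are holding the namespace open
kubectl get namespace stuck-namespace -o jsonpath='{.spec.finalizers}'

# Clear the finalizers via the finalize subresource
kubectl get namespace stuck-namespace -o json \
  | jq '.spec.finalizers = []' \
  | kubectl replace --raw /api/v1/namespaces/stuck-namespace/finalize -f -
```
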
#23
Cluster Management
CoreDNS CrashLoop Due to Invalid ConfigMap Update
K8s v1.23, managed GKE

CoreDNS stopped resolving names cluster-wide after a config update.

#24
Cluster Management
Pod Eviction Storm Due to DiskPressure
K8s v1.25, self-managed, containerd

A sudden spike in image pulls caused all nodes to hit disk pressure, leading to massive pod evictions.

#25
Cluster Management
Orphaned PVs Causing Unscheduled Pods
K8s v1.20, CSI storage on vSphere

PVCs were stuck in Pending state due to existing orphaned PVs in Released state.

#26
Cluster Management
Taints and Tolerations Mismatch Prevented Workload Scheduling
K8s v1.22, managed AKS

Workloads failed to schedule on new nodes that had a taint the workloads didn’t tolerate.

#27
Cluster Management
Node Bootstrap Failure Due to Unavailable Container Registry
K8s v1.21, on-prem, private registry

New nodes failed to join the cluster due to container runtime timeout when pulling base images.

#28
Cluster Management
kubelet Fails to Start Due to Expired TLS Certs
K8s v1.19, kubeadm cluster

Several nodes went NotReady after reboot due to kubelet failing to start with expired client certs.

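A rough recovery sketch for expired kubeadm-managed certificates (on releases before the `kubeadm certs` command graduated, the same subcommands lived under `kubeadm alpha certs`):

```bash
# On a control-plane node: check which certificates have expired
kubeadm certs check-expiration

# Renew everything kubeadm manages, then restart the affected components
kubeadm certs renew all
systemctl restart kubelet

# For worker nodes whose kubelet client cert expired, re-joining with a fresh
# bootstrap token is one common path
kubeadm token create --print-join-command
```
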
#29
Cluster Management
kube-scheduler Crash Due to Invalid Leader Election Config
K8s v1.24, custom scheduler deployment

kube-scheduler pod failed with panic due to misconfigured leader election flags.

#30
Cluster Management
Cluster DNS Resolution Broken After Calico CNI Update
K8s v1.23, self-hosted Calico

DNS resolution broke after Calico CNI update due to iptables policy drop changes.

#31
Cluster Management
Node Clock Drift Causing Authentication Failures
K8s v1.22, on-prem, kubeadm

Authentication tokens failed across the cluster due to node clock skew.

#32
Cluster Management
Inconsistent Node Labels Causing Scheduling Bugs
K8s v1.24, multi-zone GKE

Zone-aware workloads failed to schedule due to missing zone labels on some nodes.

#33
Cluster Management
API Server Slowdowns from High Watch Connection Count
K8s v1.23, OpenShift

API latency rose sharply due to thousands of watch connections from misbehaving clients.

#34
Cluster Management
Etcd Disk Full Crashing the Cluster
K8s v1.21, self-managed with local etcd

Entire control plane crashed due to etcd disk running out of space.

#35
Cluster Management
ClusterConfigMap Deleted by Accident Bringing Down Addons
K8s v1.24, Rancher

A user accidentally deleted the kube-root-ca.crt ConfigMap, which many workloads relied on.

#36
Cluster Management
Misconfigured NodeAffinity Excluding All Nodes
K8s v1.22, Azure AKS

A critical deployment was unschedulable due to strict nodeAffinity rules.

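A short diagnostic sketch for affinity rules that match no nodes (pod, node, and label names are placeholders):

```bash
# The FailedScheduling event states that no node matched the pod's affinity/selector
kubectl describe pod pending-pod-xyz | grep -A10 Events

# Check whether any node actually carries the label the rule requires
kubectl get nodes --show-labels | grep disktype

# Fix: label a node to satisfy the rule, or soften requiredDuringScheduling...
# to preferredDuringScheduling... in the deployment's affinity stanza
kubectl label nodes worker-1 disktype=ssd
```
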
#37
Cluster Management
Outdated Admission Webhook Blocking All Deployments
K8s v1.25, self-hosted

A stale mutating webhook caused all deployments to fail due to TLS certificate errors.

#38
Cluster Management
API Server Certificate Expiry Blocking Cluster Access
K8s v1.19, kubeadm

After a year of uptime, the API server certificate expired, blocking API access for all components.

#39
Cluster Management
CRI Socket Mismatch Preventing kubelet Startup
K8s v1.22, containerd switch

kubelet failed to start after switching from Docker to containerd due to incorrect CRI socket path.

#40
Cluster Management
Cluster-Wide Crash Due to Misconfigured Resource Quotas
K8s v1.24, multi-tenant namespace setup

Cluster workloads failed after applying overly strict resource quotas that denied new pod creation.

#41
Cluster Management
Cluster Upgrade Failing Due to CNI Compatibility
K8s v1.21 to v1.22, custom CNI plugin

Cluster upgrade failed due to an incompatible version of the CNI plugin.

#42
Cluster Management
Failed Pod Security Policy Enforcement Causing Privileged Container Launch
K8s v1.22, AWS EKS

Privileged containers were able to run despite Pod Security Policy enforcement.

#43
Cluster Management
Node Pool Scaling Impacting StatefulSets
K8s v1.24, GKE

StatefulSet pods were rescheduled across different nodes, breaking persistent volume bindings.

#44
Cluster Management
Kubelet Crash Due to Out of Memory (OOM) Errors
K8s v1.20, bare metal

Kubelet crashed after running out of memory due to excessive pod resource usage.

#45
Cluster Management
DNS Resolution Failure in Multi-Cluster Setup
K8s v1.23, multi-cluster federation

DNS resolution failed between two federated clusters due to missing DNS records.

#46
Cluster Management
Insufficient Resource Limits in Autoscaling Setup
K8s v1.21, GKE with Horizontal Pod Autoscaler (HPA)

Horizontal Pod Autoscaler did not scale pods up as expected due to insufficient resource limits.

#47
Cluster Management
Control Plane Overload Due to High Audit Log Volume
K8s v1.22, Azure AKS

The control plane became overloaded and slow due to excessive audit log volume.

#48
Cluster Management
Resource Fragmentation Causing Cluster Instability
K8s v1.23, bare metal

Resource fragmentation due to unbalanced pod distribution led to cluster instability.

#49
Cluster Management
Failed Cluster Backup Due to Misconfigured Volume Snapshots
K8s v1.21, AWS EBS

Cluster backup failed due to a misconfigured volume snapshot driver.

#50
Cluster Management
Failed Deployment Due to Image Pulling Issues
K8s v1.22, custom Docker registry

Deployment failed due to image pulling issues from a custom Docker registry.

#51
Cluster Management
High Latency Due to Inefficient Ingress Controller Configuration
K8s v1.20, AWS EKS

Ingress controller configuration caused high network latency due to inefficient routing rules.

#52
Cluster Management
Node Draining Delay During Maintenance
K8s v1.21, GKE

Node draining took an unusually long time during maintenance due to unscheduled pod disruption.

#53
Cluster Management
Unresponsive Cluster After Large-Scale Deployment
K8s v1.19, Azure AKS

Cluster became unresponsive after deploying a large number of pods in a single batch.

#54
Cluster Management
Failed Node Recovery Due to Corrupt Kubelet Configuration
K8s v1.23, Bare Metal

Node failed to recover after being drained due to a corrupt kubelet configuration.

#55
Cluster Management
Resource Exhaustion Due to Misconfigured Horizontal Pod Autoscaler
K8s v1.22, AWS EKS

Cluster resources were exhausted due to misconfiguration in the Horizontal Pod Autoscaler (HPA), resulting in excessive pod scaling.

#56
Cluster Management
Inconsistent Application Behavior After Pod Restart
K8s v1.20, GKE

Application behavior became inconsistent after pod restarts due to improper state handling.

#57
Cluster Management
Cluster-wide Service Outage Due to Missing ClusterRoleBinding
K8s v1.21, AWS EKS

Cluster-wide service outage occurred after an automated change removed a critical ClusterRoleBinding.

#58
Cluster Management
Node Overcommitment Leading to Pod Evictions
K8s v1.19, Bare Metal

Node overcommitment led to pod evictions, causing application downtime.

#59
Cluster Management
Failed Pod Startup Due to Image Pull Policy Misconfiguration
K8s v1.23, Azure AKS

Pods failed to start because the image pull policy was misconfigured.

#60
Cluster Management
Excessive Control Plane Resource Usage During Pod Scheduling
K8s v1.24, AWS EKS

Control plane resources were excessively utilized during pod scheduling, leading to slow deployments.

#61
Cluster Management
Persistent Volume Claim Failure Due to Resource Quota Exceedance
K8s v1.22, GKE

Persistent Volume Claims (PVCs) failed due to exceeding the resource quota for storage in the namespace.

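A quick way to confirm and relieve a storage-quota failure, sketched with placeholder names and sizes:

```bash
# Compare used vs. hard limits for storage in the namespace
kubectl describe resourcequota -n my-namespace

# The Pending PVC's events show the exceeded-quota message
kubectl describe pvc data-pvc -n my-namespace

# Either delete unused PVCs, or raise the quota (a JSON merge patch only touches this key)
kubectl patch resourcequota storage-quota -n my-namespace --type merge \
  -p '{"spec":{"hard":{"requests.storage":"500Gi"}}}'
```
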
#62
Cluster Management
Failed Pod Rescheduling Due to Node Affinity Misconfiguration
K8s v1.21, AWS EKS

Pods failed to reschedule after a node failure due to improper node affinity rules.

#63
Cluster Management
Intermittent Network Latency Due to Misconfigured CNI Plugin
K8s v1.24, Azure AKS

Network latency issues occurred intermittently due to misconfiguration in the CNI (Container Network Interface) plugin.

#64
Cluster Management
Excessive Pod Restarts Due to Resource Limits
K8s v1.19, GKE

A pod was restarting frequently due to resource limits being too low, causing the container to be killed.

#65
Cluster Management
Cluster Performance Degradation Due to Excessive Logs
K8s v1.22, AWS EKS

Cluster performance degraded because of excessive logs being generated by applications, leading to high disk usage.

#66
Cluster Management
Insufficient Cluster Capacity Due to Unchecked CronJobs
K8s v1.21, GKE

The cluster experienced resource exhaustion because CronJobs were running in parallel without proper capacity checks.

#67
Cluster Management
Unsuccessful Pod Scaling Due to Affinity/Anti-Affinity Conflict
K8s v1.23, Azure AKS

Pod scaling failed due to conflicting affinity/anti-affinity rules that prevented pods from being scheduled.

#68
Cluster Management
Cluster Inaccessibility Due to API Server Throttling
K8s v1.22, AWS EKS

Cluster became inaccessible due to excessive API server throttling caused by too many concurrent requests.

#69
Cluster Management
Persistent Volume Expansion Failure
K8s v1.20, GKE

Expansion of a Persistent Volume (PV) failed due to improper storage class settings.

#70
Cluster Management
Unauthorized Access to Cluster Resources Due to RBAC Misconfiguration
K8s v1.22, AWS EKS

Unauthorized users gained access to sensitive resources due to misconfigured RBAC roles and bindings.

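A brief audit-and-tighten sketch (service account, namespace, and role names are placeholders):

```bash
# See what a given service account can actually do
kubectl auth can-i --list --as=system:serviceaccount:dev:app-sa

# Look for overly broad grants, e.g. cluster-admin handed out cluster-wide
kubectl get clusterrolebindings -o wide | grep cluster-admin

# Replace a broad grant with a namespace-scoped role limited to what is needed
kubectl create role app-reader --verb=get,list,watch --resource=pods,configmaps -n dev
kubectl create rolebinding app-reader-binding --role=app-reader \
  --serviceaccount=dev:app-sa -n dev
```
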
#71
Cluster Management
Inconsistent Pod State Due to Image Pull Failures
K8s v1.20, GKE

Pods entered an inconsistent state because the container image failed to pull due to incorrect image tag.

#72
Cluster Management
Pod Disruption Due to Insufficient Node Resources
K8s v1.22, Azure AKS

Pods experienced disruptions as nodes ran out of CPU and memory, causing evictions.

#73
Cluster Management
Service Discovery Issues Due to DNS Resolution Failures
K8s v1.21, AWS EKS

Services could not discover each other due to DNS resolution failures, affecting internal communication.

#74
Cluster Management
Persistent Volume Provisioning Delays
K8s v1.22, GKE

Persistent volume provisioning was delayed due to an issue with the dynamic provisioner.

#75
Cluster Management
Deployment Rollback Failure Due to Missing Image
K8s v1.21, Azure AKS

A deployment rollback failed due to the rollback image version no longer being available in the container registry.

#76
Cluster Management
Kubernetes Master Node Unresponsive After High Load
K8s v1.22, AWS EKS

The Kubernetes master node became unresponsive under high load due to excessive API server calls and high memory usage.

#77
Cluster Management
Failed Pod Restart Due to Inadequate Node Affinity
K8s v1.24, GKE

Pods failed to restart on available nodes due to overly strict node affinity rules.

#78
Cluster Management
ReplicaSet Scaling Issues Due to Resource Limits
K8s v1.19, AWS EKS

The ReplicaSet failed to scale due to insufficient resources on the nodes.

#79
Cluster Management
Missing Namespace After Cluster Upgrade
K8s v1.21, GKE

A namespace was missing after performing a cluster upgrade.

#80
Cluster Management
Inefficient Resource Usage Due to Misconfigured Horizontal Pod Autoscaler
K8s v1.23, Azure AKS

The Horizontal Pod Autoscaler (HPA) was inefficiently scaling due to misconfigured metrics.

#81
Cluster Management
Pod Disruption Due to Unavailable Image Registry
K8s v1.21, GKE

Pods could not start because the image registry was temporarily unavailable, causing image pull failures.

#82
Cluster Management
Pod Fails to Start Due to Insufficient Resource Requests
K8s v1.20, AWS EKS

Pods failed to start because their resource requests were too low, preventing the scheduler from assigning them to nodes.

#83
Cluster Management
Horizontal Pod Autoscaler Under-Scaling During Peak Load
K8s v1.22, GKE

HPA failed to scale the pods appropriately during a sudden spike in load.

#84
Cluster Management
Pod Eviction Due to Node Disk Pressure
K8s v1.21, AWS EKS

Pods were evicted due to disk pressure on the node, causing service interruptions.

#85
Cluster Management
Failed Node Drain Due to In-Use Pods
K8s v1.22, Azure AKS

A node failed to drain due to pods that were in use, preventing the drain operation from completing.

#86
Cluster Management
Cluster Autoscaler Not Scaling Up
K8s v1.20, GKE

The cluster autoscaler failed to scale up the node pool despite high resource demand.

#87
Cluster Management
Pod Network Connectivity Issues After Node Reboot
K8s v1.21, AWS EKS

Pods lost network connectivity after a node reboot, causing communication failures between services.

#88
Cluster Management
Insufficient Permissions Leading to Unauthorized Access Errors
K8s v1.22, GKE

Unauthorized access errors occurred due to missing permissions in RBAC configurations.

#89
Cluster Management
Failed Pod Upgrade Due to Incompatible API Versions
K8s v1.19, AWS EKS

A pod upgrade failed because it was using deprecated APIs not supported in the new version.

#90
Cluster Management
High CPU Utilization Due to Inefficient Application Code
K8s v1.21, Azure AKS

A container's high CPU usage was caused by inefficient application code, leading to resource exhaustion.

#91
Cluster Management
Resource Starvation Due to Over-provisioned Pods
K8s v1.20, AWS EKS

Resource starvation occurred on nodes because pods were over-provisioned, consuming more resources than expected.

#92
Cluster Management
Unscheduled Pods Due to Insufficient Affinity Constraints
K8s v1.21, GKE

Pods were not scheduled due to overly strict affinity rules that limited the nodes available for deployment.

#93
Cluster Management
Pod Readiness Probe Failure Due to Slow Initialization
K8s v1.22, Azure AKS

Pods failed their readiness probes during initialization, causing traffic to be routed to unhealthy instances.

#94
Cluster Management
Incorrect Ingress Path Handling Leading to 404 Errors
K8s v1.19, GKE

Incorrect path configuration in the ingress resource resulted in 404 errors for certain API routes.

#95
Cluster Management
Node Pool Scaling Failure Due to Insufficient Quotas
K8s v1.20, AWS EKS

Node pool scaling failed because the account exceeded resource quotas in AWS.

#96
Cluster Management
Pod Crash Loop Due to Missing ConfigMap
K8s v1.21, Azure AKS

Pods entered a crash loop because a required ConfigMap was not present in the namespace.

#97
Cluster Management
Kubernetes API Server Slowness Due to Excessive Logging
K8s v1.22, GKE

The Kubernetes API server became slow due to excessive log generation from the kubelet and other components.

#98
Cluster Management
Pod Scheduling Failure Due to Taints and Tolerations Misconfiguration
K8s v1.19, AWS EKS

Pods failed to schedule because the taints and tolerations were misconfigured, preventing the scheduler from placing them on nodes.

#99
Cluster Management
Unresponsive Dashboard Due to High Resource Usage
K8s v1.20, Azure AKS

The Kubernetes dashboard became unresponsive due to high resource usage caused by a large number of requests.

#100
Cluster Management
Resource Limits Causing Container Crashes
K8s v1.21, GKE

Containers kept crashing due to hitting resource limits set in their configurations.

#101
Networking
Pod Communication Failure Due to Network Policy Misconfiguration
K8s v1.22, GKE

Pods failed to communicate due to a misconfigured NetworkPolicy that blocked ingress traffic.

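A minimal sketch of the usual fix, an explicit allow rule alongside the default-deny policy (namespace, labels, and port are placeholders):

```bash
kubectl get networkpolicy -n my-namespace

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
EOF
```
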
#102
Networking
DNS Resolution Failure Due to CoreDNS Pod Crash
K8s v1.21, Azure AKS

DNS resolution failed across the cluster after CoreDNS pods crashed unexpectedly.

#103
Networking
Network Latency Due to Misconfigured Service Type
K8s v1.18, AWS EKS

High network latency occurred because a service was incorrectly configured as a NodePort instead of a LoadBalancer.

#104
Networking
Inconsistent Pod-to-Pod Communication Due to MTU Mismatch
K8s v1.20, GKE

Pod-to-pod communication became inconsistent due to a mismatch in Maximum Transmission Unit (MTU) settings across nodes.

#105
Networking
Service Discovery Failure Due to DNS Pod Resource Limits
K8s v1.19, Azure AKS

Service discovery failed across the cluster due to DNS pod resource limits being exceeded.

#106
Networking
Pod IP Collision Due to Insufficient IP Range
K8s v1.21, GKE

Pod IP collisions occurred due to insufficient IP range allocation for the cluster.

#107
Networking
Network Bottleneck Due to Single Node in NodePool
K8s v1.23, AWS EKS

A network bottleneck occurred due to excessive traffic being handled by a single node in the node pool.

#108
Networking
Network Partitioning Due to CNI Plugin Failure
K8s v1.18, GKE

A network partition occurred when the CNI plugin failed, preventing pods from communicating with each other.

#109
Networking
Misconfigured Ingress Resource Causing SSL Errors
K8s v1.22, Azure AKS

SSL certificate errors occurred due to a misconfigured Ingress resource.

#110
Cluster Management
Cluster Autoscaler Fails to Scale Nodes Due to Incorrect IAM Role Permissions
K8s v1.21, AWS EKS

The cluster autoscaler failed to scale the number of nodes in response to resource shortages due to missing IAM role permissions for managing EC2 instances.

#111
Networking
DNS Resolution Failure Due to Incorrect Pod IP Allocation
K8s v1.21, GKE

DNS resolution failed due to incorrect IP allocation in the cluster’s CNI plugin.

#112
Networking
Failed Pod-to-Service Communication Due to Port Binding Conflict
K8s v1.18, AWS EKS

Pods couldn’t communicate with services because of a port binding conflict.

#113
Networking
Pod Eviction Due to Network Resource Constraints
K8s v1.19, GKE

A pod was evicted due to network resource constraints, specifically limited bandwidth.

#114
Networking
Intermittent Network Disconnects Due to MTU Mismatch Between Nodes
K8s v1.20, Azure AKS

Intermittent network disconnects occurred due to MTU mismatches between different nodes in the cluster.

#115
Networking
Service Load Balancer Failing to Route Traffic to New Pods
K8s v1.22, Google GKE

Service load balancer failed to route traffic to new pods after scaling up.

#116
Networking
Network Traffic Drop Due to Overlapping CIDR Blocks
K8s v1.19, AWS EKS

Network traffic dropped due to overlapping CIDR blocks between the VPC and Kubernetes pod network.

#117
Networking
Misconfigured DNS Resolvers Leading to Service Discovery Failure
K8s v1.21, DigitalOcean Kubernetes

Service discovery failed due to misconfigured DNS resolvers.

#118
Networking
Intermittent Latency Due to Overloaded Network Interface
K8s v1.22, AWS EKS

Intermittent network latency occurred due to an overloaded network interface on a single node.

#119
Networking
Pod Disconnection During Network Partition
K8s v1.20, Google GKE

Pods were disconnected during a network partition between nodes in the cluster.

#120
Networking
Pod-to-Pod Communication Blocked by Network Policies
K8s v1.21, AWS EKS

Pod-to-pod communication was blocked due to overly restrictive network policies.

#121
Networking
Unresponsive External API Due to DNS Resolution Failure
K8s v1.22, DigitalOcean Kubernetes

External API calls from the pods failed due to DNS resolution issues for the external domain.

#122
Networking
Load Balancer Health Checks Failing After Pod Update
K8s v1.19, GCP Kubernetes Engine

Load balancer health checks failed after updating a pod due to incorrect readiness probe configuration.

#123
Networking
Pod Network Performance Degradation After Node Upgrade
K8s v1.21, Azure AKS

Network performance degraded after an automatic node upgrade, causing latency in pod communication.

#124
Networking
Service IP Conflict Due to CIDR Overlap
K8s v1.20, GKE

A service IP conflict occurred due to overlapping CIDR blocks, preventing correct routing of traffic to the service.

#125
Networking
High Latency in Inter-Namespace Communication
K8s v1.22, AWS EKS

High latency observed in inter-namespace communication, leading to application timeouts.

#126
Networking
Pod Network Disruptions Due to CNI Plugin Update
K8s v1.19, DigitalOcean Kubernetes

Pods experienced network disruptions after updating the CNI plugin to a newer version.

#127
Networking
Loss of Service Traffic Due to Missing Ingress Annotations
K8s v1.21, GKE

Service traffic was lost after ingress annotations were set incorrectly, causing the ingress controller to misroute traffic.

#128
Cluster Management
Node Pool Draining Timeout Due to Slow Pod Termination
K8s v1.19, GKE

The node pool draining process timed out during upgrades due to pods taking longer than expected to terminate.

#129
Cluster Management
Failed Cluster Upgrade Due to Incompatible API Versions
K8s v1.17, Azure AKS

The cluster upgrade failed because certain deprecated API versions were still in use, causing compatibility issues with the new K8s version.

#130
Networking
DNS Resolution Failure for Services After Pod Restart
K8s v1.19, Azure AKS

DNS resolution failed for services after restarting a pod, causing internal communication issues.

#131
Networking
Pod IP Address Changes Causing Application Failures
K8s v1.21, GKE

Application failed after a pod IP address changed unexpectedly, breaking communication between services.

#132
Networking
Service Exposure Failed Due to Misconfigured Load Balancer
K8s v1.22, AWS EKS

A service exposure attempt failed due to incorrect configuration of the AWS load balancer.

#133
Networking
Network Latency Spikes During Pod Autoscaling
K8s v1.20, Google Cloud

Network latency spikes occurred when autoscaling pods during traffic surges.

#134
Networking
Service Not Accessible Due to Incorrect Namespace Selector
K8s v1.18, on-premise

A service was not accessible due to a misconfigured namespace selector in the service definition.

#135
Networking
Intermittent Pod Connectivity Due to Network Plugin Bug
K8s v1.23, DigitalOcean Kubernetes

Pods experienced intermittent connectivity issues due to a bug in the CNI network plugin.

#136
Networking
Failed Ingress Traffic Routing Due to Missing Annotations
K8s v1.21, AWS EKS

Ingress traffic was not properly routed to services due to missing annotations in the ingress resource.

#137
Networking
Pod IP Conflict Causing Service Downtime
K8s v1.19, GKE

A pod IP conflict caused service downtime and application crashes.

#138
Networking
Latency Due to Unoptimized Service Mesh Configuration
K8s v1.21, Istio

Increased latency in service-to-service communication due to suboptimal configuration of Istio service mesh.

#139
Networking
DNS Resolution Failure After Cluster Upgrade
K8s v1.20 to v1.21, AWS EKS

DNS resolution failures occurred across pods after a Kubernetes cluster upgrade.

#140
Networking
Service Mesh Sidecar Injection Failure
K8s v1.19, Istio 1.8

Sidecar injection failed for some pods in the service mesh, preventing communication between services.

#141
Networking
Network Bandwidth Saturation During Large-Scale Deployments
K8s v1.21, Azure AKS

Network bandwidth was saturated during a large-scale deployment, affecting cluster communication.

#142
Networking
Inconsistent Network Policies Blocking Internal Traffic
K8s v1.18, GKE

Internal pod-to-pod traffic was unexpectedly blocked due to inconsistent network policies.

#143
Networking
Pod Network Latency Caused by Overloaded CNI Plugin
K8s v1.19, on-premise

Pod network latency increased due to an overloaded CNI plugin.

#144
Networking
TCP Retransmissions Due to Network Saturation
K8s v1.22, DigitalOcean Kubernetes

TCP retransmissions increased due to network saturation, leading to degraded pod-to-pod communication.

#145
Networking
DNS Lookup Failures Due to Resource Limits
K8s v1.20, AWS EKS

DNS lookup failures occurred due to resource limits on the CoreDNS pods.

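A hedged sketch of how constrained CoreDNS pods are usually confirmed and relieved (resource values are illustrative, not recommendations; `kubectl top` needs metrics-server):

```bash
# Check throttling and OOM kills on the CoreDNS pods
kubectl top pods -n kube-system -l k8s-app=kube-dns
kubectl describe pods -n kube-system -l k8s-app=kube-dns | grep -A3 'Last State'

# Raise requests/limits; the default strategic merge patch only updates the named container
kubectl -n kube-system patch deployment coredns -p '
{"spec":{"template":{"spec":{"containers":[{"name":"coredns",
  "resources":{"requests":{"cpu":"200m","memory":"128Mi"},
               "limits":{"cpu":"500m","memory":"256Mi"}}}]}}}}'

# Scale out if aggregate query volume, not per-pod limits, is the bottleneck
kubectl -n kube-system scale deployment coredns --replicas=3
```
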
#146
Networking
Service Exposure Issues Due to Incorrect Ingress Configuration
K8s v1.22, Azure AKS

A service was not accessible externally due to incorrect ingress configuration.

#147
Networking
Pod-to-Pod Communication Failure Due to Network Policy
K8s v1.19, on-premise

Pod-to-pod communication failed due to an overly restrictive network policy.

#148
Networking
Unstable Network Due to Overlay Network Misconfiguration
K8s v1.18, VMware Tanzu

The overlay network was misconfigured, leading to instability in pod communication.

#149
Networking
Intermittent Pod Network Connectivity Due to Cloud Provider Issues
K8s v1.21, AWS EKS

Pod network connectivity was intermittent due to issues with the cloud provider's network infrastructure.

#150
Networking
Port Conflicts Between Services in Different Namespaces
K8s v1.22, Google GKE

Port conflicts between services in different namespaces led to communication failures.

#151
Networking
NodePort Service Not Accessible Due to Firewall Rules
K8s v1.23, Google GKE

A NodePort service became inaccessible due to restrictive firewall rules on the cloud provider.

#152
Networking
DNS Latency Due to Overloaded CoreDNS Pods
K8s v1.19, AWS EKS

CoreDNS latency increased due to resource constraints on the CoreDNS pods.

#153
Networking
Network Performance Degradation Due to Misconfigured MTU
K8s v1.20, on-premise

Network performance degraded due to an incorrect Maximum Transmission Unit (MTU) setting.

#154
Networking
Application Traffic Routing Issue Due to Incorrect Ingress Resource
K8s v1.22, Azure AKS

Application traffic was routed incorrectly due to an error in the ingress resource definition.

#155
Networking
Intermittent Service Disruptions Due to DNS Caching Issue
K8s v1.21, GCP GKE

Intermittent service disruptions occurred due to stale DNS cache in CoreDNS.

#156
Networking
Flannel Overlay Network Interruption Due to Node Failure
K8s v1.18, on-premise

Flannel overlay network was interrupted after a node failure, causing pod-to-pod communication issues.

#157
Networking
Network Traffic Loss Due to Port Collision in Network Policy
K8s v1.19, GKE

Network traffic was lost due to a port collision in the network policy, affecting application availability.

#158
Networking
CoreDNS Service Failures Due to Resource Exhaustion
K8s v1.20, Azure AKS

CoreDNS service failed due to resource exhaustion, causing DNS resolution failures.

#159
Networking
Pod Network Partition Due to Misconfigured IPAM
K8s v1.22, VMware Tanzu

Pod network partition occurred due to an incorrectly configured IP Address Management (IPAM) in the CNI plugin.

#160
Networking
Network Performance Degradation Due to Overloaded CNI Plugin
K8s v1.21, AWS EKS

Network performance degraded due to the CNI plugin being overwhelmed by high traffic volume.

#161
Networking
Network Performance Degradation Due to Overloaded CNI Plugin
K8s v1.21, AWS EKS

Network performance degraded due to the CNI plugin being overwhelmed by high traffic volume.

#162
Networking
DNS Resolution Failures Due to Misconfigured CoreDNS
K8s v1.19, Google GKE

DNS resolution failed due to misconfigured CoreDNS, leading to application errors.

#163
Networking
Network Partition Due to Incorrect Calico Configuration
K8s v1.20, Azure AKS

Network partitioning occurred due to an incorrect Calico CNI configuration, leaving pods unable to communicate with each other.

#164
Networking
IP Overlap Leading to Communication Failure Between Pods
K8s v1.19, On-premise

Pods failed to communicate due to IP address overlap caused by an incorrect subnet configuration.

#165
Networking
Pod Network Latency Due to Overloaded Kubernetes Network Interface
K8s v1.21, AWS EKS

Pod network latency increased due to an overloaded network interface on the Kubernetes nodes.

#166
Networking
Intermittent Connectivity Failures Due to Pod DNS Cache Expiry
K8s v1.22, Google GKE

Intermittent connectivity failures due to pod DNS cache expiry, leading to failed DNS lookups for external services.

#167
Networking
Flapping Network Connections Due to Misconfigured Network Policies
K8s v1.20, Azure AKS

Network connections between pods were intermittently dropping due to misconfigured network policies, causing application instability.

#168
Networking
Cluster Network Downtime Due to CNI Plugin Upgrade
K8s v1.22, On-premise

Cluster network downtime occurred during a CNI plugin upgrade, affecting pod-to-pod communication.

#169
Networking
Inconsistent Pod Network Connectivity in Multi-Region Cluster
K8s v1.21, GCP

Pods in a multi-region cluster experienced inconsistent network connectivity between regions due to misconfigured VPC peering.

#170
Networking
Pod Network Partition Due to Network Policy Blocking DNS Requests
K8s v1.19, Azure AKS

Pods were unable to resolve DNS due to a network policy blocking DNS traffic, causing service failures.

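A sketch of the usual remedy, an egress rule that always permits DNS to kube-system (namespace name is a placeholder; on clusters older than v1.21 the `kubernetes.io/metadata.name` label is not set automatically, so label kube-system yourself or target the kube-dns pods directly):

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: my-namespace
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
EOF
```
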
#171
Networking
Network Bottleneck Due to Overutilized Network Interface
K8s v1.22, AWS EKS

Network bottleneck occurred due to overutilization of a single network interface on the worker nodes.

#172
Networking
Network Latency Caused by Overloaded VPN Tunnel
K8s v1.20, On-premise

Network latency increased due to an overloaded VPN tunnel between the Kubernetes cluster and an on-premise data center.

#173
Networking
Dropped Network Packets Due to MTU Mismatch
K8s v1.21, GKE

Network packets were dropped due to a mismatch in Maximum Transmission Unit (MTU) settings across different network components.

#174
Networking
Pod Network Isolation Due to Misconfigured Network Policy
K8s v1.20, Azure AKS

Pods in a specific namespace were unable to communicate due to an incorrectly applied network policy blocking traffic between namespaces.

#175
Networking
Service Discovery Failures Due to CoreDNS Pod Crash
K8s v1.19, AWS EKS

Service discovery failures occurred when CoreDNS pods crashed due to resource exhaustion, causing DNS resolution issues.

#176
Networking
Pod DNS Resolution Failure Due to CoreDNS Configuration Issue
K8s v1.18, On-premise

DNS resolution failures occurred within pods due to a misconfiguration in the CoreDNS config map.

#177
Networking
DNS Latency Due to Overloaded CoreDNS Pods
K8s v1.19, GKE

CoreDNS pods experienced high latency and timeouts due to resource overutilization, causing slow DNS resolution for applications.

#178
Networking
Pod Network Degradation Due to Overlapping CIDR Blocks
K8s v1.21, AWS EKS

Network degradation occurred due to overlapping CIDR blocks between VPCs in a hybrid cloud setup, causing routing issues.

#179
Networking
Service Discovery Failures Due to Network Policy Blocking DNS Traffic
K8s v1.22, Azure AKS

Service discovery failed when a network policy was mistakenly applied to block DNS traffic, preventing pods from resolving services within the cluster.

#180
Networking
Intermittent Network Connectivity Due to Overloaded Overlay Network
K8s v1.19, OpenStack

Pods experienced intermittent network connectivity issues due to an overloaded overlay network that could not handle the traffic.

#181
Networking
Pod-to-Pod Communication Failure Due to CNI Plugin Configuration Issue
K8s v1.22, AWS EKS

Pods were unable to communicate with each other due to a misconfiguration in the CNI plugin.

#182
Networking
Sporadic DNS Failures Due to Resource Contention in CoreDNS Pods
K8s v1.19, GKE

Sporadic DNS resolution failures occurred due to resource contention in CoreDNS pods, which were not allocated enough CPU resources.

#183
Networking
High Latency in Pod-to-Node Communication Due to Overlay Network
K8s v1.21, OpenShift

High latency was observed in pod-to-node communication due to network overhead introduced by the overlay network.

#184
Networking
Service Discovery Issues Due to DNS Cache Staleness
K8s v1.20, On-premise

Service discovery failed due to stale DNS cache entries that were not updated when services changed IPs.

#185
Networking
Network Partition Between Node Pools in Multi-Zone Cluster
K8s v1.18, GKE

Pods in different node pools located in different zones experienced network partitioning due to a misconfigured regional load balancer.

#186
Networking
Pod Network Isolation Failure Due to Missing NetworkPolicy
K8s v1.21, AKS

Pods that were intended to be isolated from each other could communicate freely due to a missing NetworkPolicy.

#187
Networking
Flapping Node Network Connectivity Due to MTU Mismatch
K8s v1.20, On-Premise

Nodes in the cluster were flapping due to mismatched MTU settings between Kubernetes and the underlying physical network, causing intermittent network connectivity issues.

#188
Networking
DNS Query Timeout Due to Unoptimized CoreDNS Config
K8s v1.18, GKE

DNS queries were timing out in the cluster, causing delays in service discovery, due to unoptimized CoreDNS configuration.

#189
Networking
Traffic Splitting Failure Due to Incorrect Service LoadBalancer Configuration
K8s v1.22, AWS EKS

Traffic splitting between two microservices failed due to a misconfiguration in the Service LoadBalancer.

#190
Networking
Network Latency Between Pods in Different Regions
K8s v1.19, Azure AKS

Pods in different Azure regions experienced high network latency, affecting application performance.

#191
Networking
Port Collision Between Services Due to Missing Port Ranges
K8s v1.21, AKS

Two services attempted to bind to the same port, causing a port collision and service failures.

#192
Networking
Pod-to-External Service Connectivity Failures Due to Egress Network Policy
K8s v1.20, AWS EKS

Pods failed to connect to an external service due to an overly restrictive egress network policy.

#193
Networking
Pod Connectivity Loss After Network Plugin Upgrade
K8s v1.18, GKE

Pods lost connectivity after an upgrade of the Calico network plugin due to misconfigured IP pool settings.

#194
Networking
External DNS Not Resolving After Cluster Network Changes
K8s v1.19, DigitalOcean

External DNS resolution stopped working after changes were made to the cluster network configuration.

#195
Networking
Slow Pod Communication Due to Misconfigured MTU in Network Plugin
K8s v1.22, On-premise

Pod-to-pod communication was slow due to an incorrect MTU setting in the network plugin.

#196
Networking
High CPU Usage in Nodes Due to Overloaded Network Plugin
K8s v1.22, AWS EKS

Nodes experienced high CPU usage due to an overloaded network plugin that couldn’t handle traffic spikes effectively.

#197
Networking
Cross-Namespace Network Isolation Not Enforced
K8s v1.19, OpenShift

Network isolation between namespaces failed due to an incorrectly applied NetworkPolicy.

#198
Networking
Inconsistent Service Discovery Due to CoreDNS Misconfiguration
K8s v1.20, GKE

Service discovery was inconsistent due to misconfigured CoreDNS settings.

#199
Networking
Network Segmentation Issues Due to Misconfigured CNI
K8s v1.18, IBM Cloud

Network segmentation between clusters failed due to incorrect CNI (Container Network Interface) plugin configuration.

#200
Networking
DNS Cache Poisoning in CoreDNS
K8s v1.23, DigitalOcean

DNS cache poisoning occurred in CoreDNS, leading to incorrect IP resolution for services.

#201
Security
Unauthorized Access to Secrets Due to Incorrect RBAC Permissions
K8s v1.22, GKE

Unauthorized users were able to access Kubernetes secrets due to overly permissive RBAC roles.

#202
Security
Insecure Network Policies Leading to Pod Exposure
K8s v1.19, AWS EKS

Pods intended to be isolated were exposed to unauthorized traffic due to misconfigured network policies.

#203
Security
Privileged Container Vulnerability Due to Incorrect Security Context
K8s v1.21, Azure AKS

A container running with elevated privileges due to an incorrect security context exposed the cluster to potential privilege escalation attacks.

#204
Security
Exposed Kubernetes Dashboard Due to Misconfigured Ingress
K8s v1.20, GKE

The Kubernetes dashboard was exposed to the public internet due to a misconfigured Ingress resource.

#205
Security
Unencrypted Communication Between Pods Due to Missing TLS Configuration
K8s v1.18, On-Premise

Communication between microservices in the cluster was not encrypted due to missing TLS configuration, exposing data to potential interception.

#206
Security
Sensitive Data in Logs Due to Improper Log Sanitization
K8s v1.23, Azure AKS

Sensitive data, such as API keys and passwords, was logged due to improper sanitization in application logs.

#207
Security
Insufficient Pod Security Policies Leading to Privilege Escalation
K8s v1.21, GKE

Privilege escalation was possible due to insufficiently restrictive PodSecurityPolicies (PSPs).

#208
Security
Service Account Token Compromise
K8s v1.22, DigitalOcean

A compromised service account token was used to gain unauthorized access to the cluster's API server.

#209
Security
Lack of Regular Vulnerability Scanning in Container Images
K8s v1.19, On-Premise

The container images used in the cluster were not regularly scanned for vulnerabilities, leading to deployment of vulnerable images.

#210
Security
Insufficient Container Image Signing Leading to Unverified Deployments
K8s v1.20, Google Cloud

Unverified container images were deployed due to the lack of image signing, exposing the cluster to potential malicious code.

#211
Security
Insecure Default Namespace Leading to Unauthorized Access
K8s v1.22, AWS EKS

Unauthorized users gained access to resources in the default namespace due to lack of namespace isolation.

#212
Security
Vulnerable OpenSSL Version in Container Images
K8s v1.21, DigitalOcean

A container image was using an outdated and vulnerable version of OpenSSL, exposing the cluster to known security vulnerabilities.

#213
Security
Misconfigured API Server Authentication Allowing External Access
K8s v1.20, GKE

API server authentication was misconfigured, allowing external unauthenticated users to access the Kubernetes API.

#214
Security
Insufficient Node Security Due to Lack of OS Hardening
K8s v1.22, Azure AKS

Nodes in the cluster were insecure due to a lack of proper OS hardening, making them vulnerable to attacks.

#215
Security
Unrestricted Ingress Access to Sensitive Resources
K8s v1.21, GKE

Sensitive services were exposed to the public internet due to unrestricted ingress rules.

#216
Security
Exposure of Sensitive Data in Container Environment Variables
K8s v1.19, AWS EKS

Sensitive data, such as database credentials, was exposed through environment variables in container configurations.

#217
Security
Inadequate Container Resource Limits Leading to DoS Attacks
K8s v1.20, On-Premise

A lack of resource limits on containers allowed a denial-of-service (DoS) attack to disrupt services by consuming excessive CPU and memory.

#218
Security
Exposure of Container Logs Due to Insufficient Log Management
K8s v1.21, Google Cloud

Container logs were exposed to unauthorized users due to insufficient log management controls.

#219
Security
Using Insecure Docker Registry for Container Images
K8s v1.18, On-Premise

The cluster was pulling container images from an insecure, untrusted Docker registry, exposing the system to the risk of malicious images.

#220
Security
Weak Pod Security Policies Leading to Privileged Containers
K8s v1.19, AWS EKS

Privileged containers were deployed due to weak or missing Pod Security Policies (PSPs), exposing the cluster to security risks.

#221
Security
Unsecured Kubernetes Dashboard
K8s v1.21, GKE

The Kubernetes Dashboard was exposed to the public internet without proper authentication or access controls, allowing unauthorized users to access sensitive cluster information.

#222
Security
Using HTTP Instead of HTTPS for Ingress Resources
K8s v1.22, Google Cloud

Sensitive applications were exposed using HTTP instead of HTTPS, leaving communication vulnerable to eavesdropping and man-in-the-middle attacks.

#223
Security
Insecure Network Policies Exposing Internal Services
K8s v1.20, On-Premise

Network policies were too permissive, exposing internal services to unnecessary access, increasing the risk of lateral movement within the cluster.

#224
Security
Exposing Sensitive Secrets in Environment Variables
K8s v1.21, AWS EKS

Sensitive credentials were stored in environment variables within the pod specification, exposing them to potential attackers.

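A hedged sketch of moving credentials out of environment variables and into a mounted Secret (all names and the image are placeholders); env vars are readable by anything that can inspect the container's environment, whereas a read-only volume narrows that exposure:

```bash
# Find workloads that inject secrets through env vars
kubectl get deployments -A -o yaml | grep -n secretKeyRef

# Mount the secret as a read-only file instead
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: app-with-secret-volume
spec:
  containers:
  - name: app
    image: nginx:1.25
    volumeMounts:
    - name: db-creds
      mountPath: /etc/db-creds
      readOnly: true
  volumes:
  - name: db-creds
    secret:
      secretName: db-credentials
EOF
```
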
#225
Security
Insufficient RBAC Permissions Leading to Unauthorized Access
K8s v1.20, On-Premise

Insufficient Role-Based Access Control (RBAC) configurations allowed unauthorized users to access and modify sensitive resources within the cluster.

#226
Security
Insecure Ingress Controller Exposed to the Internet
K8s v1.22, Google Cloud

An insecure ingress controller was exposed to the internet, allowing attackers to exploit vulnerabilities in the controller.

#227
Security
Lack of Security Updates in Container Images
K8s v1.19, DigitalOcean

The cluster was running outdated container images without the latest security patches, exposing it to known vulnerabilities.

#228
Security
Exposed Kubelet API Without Authentication
K8s v1.21, AWS EKS

The Kubelet API was exposed without proper authentication or authorization, allowing external users to query cluster node details.

#229
Security
Inadequate Logging of Sensitive Events
K8s v1.22, Google Cloud

Sensitive security events were not logged, preventing detection of potential security breaches or misconfigurations.

#230
Security
Misconfigured RBAC Allowing Cluster Admin Privileges to Developers
K8s v1.19, On-Premise

Developers were mistakenly granted cluster admin privileges due to misconfigured RBAC roles, which gave them the ability to modify sensitive resources.

#231
Security
Insufficiently Secured Service Account Permissions
K8s v1.20, AWS EKS

Service accounts were granted excessive permissions, giving pods access to resources they did not require, leading to a potential security risk.

#232
Security
Cluster Secrets Exposed Due to Insecure Mounting
K8s v1.21, On-Premise

Kubernetes secrets were mounted into pods insecurely, exposing sensitive information to unauthorized users.

#233
Security
Improperly Configured API Server Authorization
K8s v1.22, Azure AKS

The Kubernetes API server was improperly configured, allowing unauthorized users to make API calls without proper authorization.

#234
Security
Compromised Image Registry Access Credentials
K8s v1.19, On-Premise

The image registry access credentials were compromised, allowing attackers to pull and run malicious images in the cluster.

#235
Security
Insufficiently Secured Cluster API Server Access
K8s v1.23, Google Cloud

The API server was exposed with insufficient security, allowing unauthorized external access and increasing the risk of exploitation.

#236
Security
Misconfigured Admission Controllers Allowing Insecure Resources
K8s v1.21, AWS EKS

Admission controllers were misconfigured, allowing the creation of insecure or non-compliant resources.

#237
Security
Lack of Security Auditing and Monitoring in Cluster
K8s v1.22, DigitalOcean

The lack of proper auditing and monitoring allowed security events to go undetected, resulting in delayed response to potential security threats.

#238
Security
Exposed Internal Services Due to Misconfigured Load Balancer
K8s v1.19, On-Premise

Internal services were inadvertently exposed to the public due to incorrect load balancer configurations, leading to potential security risks.

#239
Security
Kubernetes Secrets Accessed via Insecure Network
K8s v1.20, GKE

Kubernetes secrets were accessed via an insecure network connection, exposing sensitive information to unauthorized parties.

#240
Security
Pod Security Policies Not Enforced
K8s v1.21, On-Premise

Pod security policies were not enforced, allowing the deployment of pods with unsafe configurations, such as privileged access and host network use.

#241
Security
Unpatched Vulnerabilities in Cluster Nodes
K8s v1.22, Azure AKS

Cluster nodes were not regularly patched, exposing known vulnerabilities that were later exploited by attackers.

#242
Security
Weak Network Policies Allowing Unrestricted Traffic
K8s v1.18, On-Premise

Network policies were not properly configured, allowing unrestricted traffic between pods, which led to lateral movement by attackers after a pod was compromised.

#243
Security
Exposed Dashboard Without Authentication
K8s v1.19, GKE

Kubernetes dashboard was exposed to the internet without authentication, allowing unauthorized users to access cluster information and potentially take control.

#244
Security
Use of Insecure Container Images
K8s v1.20, AWS EKS

Insecure container images were used in production, leading to the deployment of containers with known vulnerabilities.

Click to view details
Security
#245
Security
Misconfigured TLS Certificates
K8s v1.23, Azure AKS

Misconfigured TLS certificates led to insecure communication between Kubernetes components, exposing the cluster to potential attacks.

Click to view details
Security
#246
Security
Excessive Privileges for Service Accounts
K8s v1.22, Google Cloud

Service accounts were granted excessive privileges, allowing them to perform operations outside their intended scope, increasing the risk of compromise.

Click to view details
Security
#247
Security
Exposure of Sensitive Logs Due to Misconfigured Logging Setup
K8s v1.21, DigitalOcean

Sensitive logs, such as those containing authentication tokens and private keys, were exposed due to a misconfigured logging setup.

Click to view details
Security
#248
Security
Use of Deprecated APIs with Known Vulnerabilities
K8s v1.19, AWS EKS

The cluster was using deprecated Kubernetes APIs that contained known security vulnerabilities, which were exploited by attackers.

Click to view details
Security
#249
Security
Lack of Security Context in Pod Specifications
K8s v1.22, Google Cloud

Pods were deployed without defining appropriate security contexts, resulting in privileged containers and access to host resources.

Click to view details
Security
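The kind of settings such incidents usually call for can be expressed directly in the pod spec. A minimal, illustrative securityContext sketch; the pod name and image are placeholders, not the workload from this incident.

```yaml
# Minimal sketch of a restrictive pod and container securityContext.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app                          # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app:1.0     # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```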
#250
Security
Compromised Container Runtime
K8s v1.21, On-Premise

The container runtime (Docker) was compromised, allowing an attacker to gain control over the containers running on the node.

Click to view details
Security
#251
Security
Insufficient RBAC Permissions for Cluster Admin
K8s v1.22, GKE

A cluster administrator's role binding was misconfigured with insufficient RBAC permissions, preventing them from performing essential management tasks.

Click to view details
Security
#252
Security
Insufficient Pod Security Policies Leading to Privilege Escalation
K8s v1.21, AWS EKS

Insufficiently restrictive PodSecurityPolicies (PSPs) allowed the deployment of privileged pods, which were later exploited by attackers.

Click to view details
Security
#253
Security
Exposed Service Account Token in Pod
K8s v1.20, On-Premise

A service account token was mistakenly exposed in a pod, allowing attackers to gain unauthorized access to the Kubernetes API.

Click to view details
Security
#254
Security
Rogue Container Executing Malicious Code
K8s v1.22, Azure AKS

A compromised container running a known exploit executed malicious code that allowed the attacker to gain access to the underlying node.

Click to view details
Security
#255
Security
Overly Permissive Network Policies Allowing Lateral Movement
K8s v1.19, Google Cloud

Network policies were not restrictive enough, allowing compromised pods to move laterally across the cluster and access other services.

Click to view details
Security
#256
Security
Insufficient Encryption for In-Transit Data
K8s v1.23, AWS EKS

Sensitive data was transmitted in plaintext between services, exposing it to potential eavesdropping and data breaches.

Click to view details
Security
#257
Security
Exposing Cluster Services via LoadBalancer with Public IP
K8s v1.21, Google Cloud

A service was exposed to the public internet via a LoadBalancer without proper access control, making it vulnerable to attacks.

Click to view details
Security
#258
Security
Privileged Containers Running Without Seccomp or AppArmor Profiles
K8s v1.20, On-Premise

Privileged containers were running without seccomp or AppArmor profiles, leaving the host vulnerable to attacks.

Click to view details
Security
#259
Security
Malicious Container Image from Untrusted Source
K8s v1.19, Azure AKS

A malicious container image from an untrusted source was deployed, leading to a security breach in the cluster.

Click to view details
Security
#260
Security
Unrestricted Ingress Controller Allowing External Attacks
K8s v1.24, GKE

The ingress controller was misconfigured, allowing external attackers to bypass network security controls and exploit internal services.

Click to view details
Security
#261
Security
Misconfigured Ingress Controller Exposing Internal Services
Kubernetes v1.24, GKE

An Ingress controller was misconfigured, inadvertently exposing internal services to the public internet.

Click to view details
Security
#262
Security
Privileged Containers Without Security Context
Kubernetes v1.22, EKS

Containers were running with elevated privileges without defined security contexts, increasing the risk of host compromise.

Click to view details
Security
#263
Security
Unrestricted Network Policies Allowing Lateral Movement
Kubernetes v1.21, Azure AKS

Lack of restrictive network policies permitted lateral movement within the cluster after a pod compromise.

Click to view details
Security
#264
Security
Exposed Kubernetes Dashboard Without Authentication
Kubernetes v1.20, On-Premise

The Kubernetes Dashboard was exposed without authentication, allowing unauthorized access to cluster resources.

Click to view details
Security
#265
Security
Use of Vulnerable Container Images
Kubernetes v1.23, AWS EKS

Deployment of container images with known vulnerabilities led to potential exploitation risks.

Click to view details
Security
#266
Security
Misconfigured Role-Based Access Control (RBAC)
Kubernetes v1.22, GKE

Overly permissive RBAC configurations granted users more access than necessary, posing security risks.

Click to view details
Security
#267
Security
Insecure Secrets Management
Kubernetes v1.21, On-Premise

Secrets were stored in plaintext within configuration files, leading to potential exposure.

Click to view details
Security
#268
Security
Lack of Audit Logging
Kubernetes v1.24, Azure AKS

Absence of audit logging hindered the ability to detect and investigate security incidents.

Click to view details
Security
#269
Security
Unrestricted Access to etcd
Kubernetes v1.20, On-Premise

The etcd datastore was accessible without authentication, risking exposure of sensitive cluster data.

Click to view details
Security
#270
Security
Absence of Pod Security Policies
Kubernetes v1.23, AWS EKS

Without Pod Security Policies, pods were deployed with insecure configurations, increasing the attack surface.

Click to view details
Security
#271
Security
Service Account Token Mounted in All Pods
Kubernetes v1.23, AKS

All pods had default service account tokens mounted, increasing the risk of credential leakage.

Click to view details
Security
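Token auto-mounting can be switched off either on the ServiceAccount or per pod. A minimal sketch with placeholder names; workloads that genuinely need API access keep an explicitly scoped service account instead.

```yaml
# Minimal sketch: disable automatic service account token mounting.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: prod                     # placeholder namespace
automountServiceAccountToken: false
---
apiVersion: v1
kind: Pod
metadata:
  name: no-token-pod                  # placeholder name
  namespace: prod
spec:
  automountServiceAccountToken: false   # per-pod override
  containers:
    - name: app
      image: registry.example.com/app:1.0
```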
#272
Security
Sensitive Logs Exposed via Centralized Logging
Kubernetes v1.22, EKS with Fluentd

Secrets and passwords were accidentally logged and shipped to a centralized logging service accessible to many teams.

Click to view details
Security
#273
Security
Broken Container Escape Detection
Kubernetes v1.24, GKE

A malicious container escaped to host level due to an unpatched kernel, but went undetected due to insufficient monitoring.

Click to view details
Security
#274
Security
Unauthorized Cloud Metadata API Access
Kubernetes v1.22, AWS

A pod was able to access the EC2 metadata API and retrieve IAM credentials due to open network access.

Click to view details
Security
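One common mitigation, in addition to enforcing IMDSv2 with a hop limit of 1 on the instances, is an egress NetworkPolicy that excludes the metadata endpoint. A minimal sketch; the namespace and scope are placeholders.

```yaml
# Minimal sketch: allow general egress but exclude the EC2 metadata endpoint.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-metadata-egress
  namespace: prod                      # placeholder namespace
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32     # instance metadata endpoint
```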
#275
Security
Admin Kubeconfig Checked into Git
Kubernetes v1.23, On-Prem

A developer accidentally committed a kubeconfig file with full admin access into a public Git repository.

Click to view details
Security
#276
Security
JWT Token Replay Attack in Webhook Auth
Kubernetes v1.21, AKS

JWT tokens captured from intercepted API requests were replayed to impersonate authorized users.

Click to view details
Security
#277
Security
Container With Hardcoded SSH Keys
Kubernetes v1.20, On-Prem

A base image included hardcoded SSH keys which allowed attackers lateral access between environments.

Click to view details
Security
#278
Security
Insecure Helm Chart Defaults
Kubernetes v1.24, GKE

A popular Helm chart had insecure defaults, like exposing dashboards or running as root.

Click to view details
Security
#279
Security
Shared Cluster with Overlapping Namespaces
Kubernetes v1.22, Shared Dev Cluster

Multiple teams used the same namespace naming conventions, causing RBAC overlaps and security concerns.

Click to view details
Security
#280
Security
CVE Ignored in Base Image for Months
Kubernetes v1.23, AWS

A known CVE affecting the base image used by multiple services remained unpatched due to no alerting.

Click to view details
Security
#281
Security
Misconfigured PodSecurityPolicy Allowed Privileged Containers
Kubernetes v1.21, On-Prem Cluster

Pods were running with privileged: true due to a permissive PodSecurityPolicy (PSP) left enabled during testing.

Click to view details
Security
#282
Security
GitLab Runners Spawning Privileged Containers
Kubernetes v1.23, GitLab CI on EKS

GitLab runners were configured to run privileged containers to support Docker-in-Docker (DinD), leading to a high-risk setup.

Click to view details
Security
#283
Security
Kubernetes Secrets Mounted in World-Readable Volumes
Kubernetes v1.24, GKE

Secret volumes were mounted with 0644 permissions, allowing any user process inside the container to read them.

Click to view details
Security
#284
Security
Kubelet Port Exposed on Public Interface
Kubernetes v1.20, Bare Metal

Kubelet was accidentally exposed on port 10250 to the public internet, allowing unauthenticated metrics and logs access.

Click to view details
Security
#285
Security
Cluster Admin Bound to All Authenticated Users
Kubernetes v1.21, AKS

A ClusterRoleBinding accidentally granted cluster-admin to all authenticated users due to system:authenticated group.

Click to view details
Security
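For illustration, this is the shape of binding that causes the problem: granting cluster-admin to the built-in system:authenticated group covers every authenticated identity, including service accounts. The binding name below is a placeholder.

```yaml
# The problematic pattern, shown for illustration only.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: all-users-admin            # hypothetical name
subjects:
  - kind: Group
    name: system:authenticated     # built-in group: every authenticated identity
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

Existing bindings can be reviewed with kubectl get clusterrolebindings -o wide to spot subjects that reference broad built-in groups.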
#286
Security
Webhook Authentication Timing Out, Causing Denial of Service
Kubernetes v1.22, EKS

Authentication webhook for custom RBAC timed out under load, rejecting valid users and causing cluster-wide issues.

Click to view details
Security
#287
Security
CSI Driver Exposing Node Secrets
Kubernetes v1.24, CSI Plugin (AWS Secrets Store)

Misconfigured CSI driver exposed secrets on hostPath mount accessible to privileged pods.

Click to view details
Security
#288
Security
EphemeralContainers Used for Reconnaissance
Kubernetes v1.25, GKE

A compromised user deployed ephemeral containers to inspect and copy secrets from running pods.

Click to view details
Security
#289
Security
hostAliases Used for Spoofing Internal Services
Kubernetes v1.22, On-Prem

Malicious pod used hostAliases to spoof internal service hostnames and intercept requests.

Click to view details
Security
#290
Security
Privilege Escalation via Unchecked securityContext in Helm Chart
Kubernetes v1.21, Helm v3.8

A third-party Helm chart allowed setting arbitrary securityContext, letting users run pods as root in production.

Click to view details
Security
#291
Security
Service Account Token Leakage via Logs
Kubernetes v1.23, AKS

Application inadvertently logged its mounted service account token, exposing it to log aggregation systems.

Click to view details
Security
#292
Security
Escalation via Editable Validating WebhookConfiguration
Kubernetes v1.24, EKS

User with edit rights on a validating webhook modified it to bypass critical security policies.

Click to view details
Security
#293
Security
Stale Node Certificates After Rejoining Cluster
Kubernetes v1.21, Kubeadm-based cluster

A node was rejoined to the cluster using a stale certificate, giving it access it shouldn't have.

Click to view details
Security
#294
Security
ArgoCD Exploit via Unverified Helm Charts
Kubernetes v1.24, ArgoCD

ArgoCD deployed a malicious Helm chart that added privileged pods and container escape backdoors.

Click to view details
Security
#295
Security
Node Compromise via Insecure Container Runtime
Kubernetes v1.22, CRI-O on Bare Metal

A CVE in the container runtime allowed a container breakout, leading to full node compromise.

Click to view details
Security
#296
Security
Workload with Wildcard RBAC Access to All Secrets
Kubernetes v1.23, Self-Hosted

A microservice was granted get and list access to all secrets cluster-wide using *.

Click to view details
Security
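A narrower alternative is to scope secret access to named secrets in a single namespace. A minimal sketch with placeholder names; note that resourceNames cannot constrain list or watch, so those verbs are omitted here.

```yaml
# Minimal sketch: read access to one named secret in one namespace,
# instead of a cluster-wide wildcard on all secrets.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-db-credentials
  namespace: payments                    # placeholder namespace
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["db-credentials"]    # placeholder secret name
    verbs: ["get"]
```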
#297
Security
Malicious Init Container Used for Reconnaissance
Kubernetes v1.25, GKE

A pod was launched with a benign main container and a malicious init container that copied node metadata.

Click to view details
Security
#298
Security
Ingress Controller Exposed /metrics Without Auth
Kubernetes v1.24, NGINX Ingress

Prometheus scraping endpoint /metrics was exposed without authentication and revealed sensitive internal details.

Click to view details
Security
#299
Security
Secret Stored in ConfigMap by Mistake
Kubernetes v1.23, AKS

A sensitive API key was accidentally stored in a ConfigMap instead of a Secret, making it visible in plain text.

Click to view details
Security
#300
Security
Token Reuse After Namespace Deletion and Recreation
Kubernetes v1.24, Self-Hosted

A previously deleted namespace was recreated, and old tokens (restored from backups) were still accepted as valid.

Click to view details
Security
#301
Storage
PVC Stuck in Terminating State After Node Crash
Kubernetes v1.22, EBS CSI Driver on EKS

A node crash caused a PersistentVolumeClaim (PVC) to be stuck in Terminating, blocking pod deletion.

Click to view details
Storage
#302
Storage
Data Corruption on HostPath Volumes
Kubernetes v1.20, Bare Metal

Multiple pods sharing a HostPath volume led to inconsistent file states and eventual corruption.

Click to view details
Storage
#303
Storage
Volume Mount Fails Due to Node Affinity Mismatch
Kubernetes v1.23, GCE PD on GKE

A pod was scheduled on a node that couldn’t access the persistent disk due to zone mismatch.

Click to view details
Storage
#304
Storage
PVC Not Rescheduled After Node Deletion
Kubernetes v1.21, Azure Disk CSI

A StatefulSet pod failed to reschedule after its node was deleted, due to Azure disk still being attached.

Click to view details
Storage
#305
Storage
Long PVC Rebinding Time on StatefulSet Restart
Kubernetes v1.24, Rook Ceph

Restarting a StatefulSet with many PVCs caused long downtime due to slow rebinding.

Click to view details
Storage
#306
Storage
CSI Volume Plugin Crash Loops Due to Secret Rotation
Kubernetes v1.25, Vault CSI Provider

Volume plugin entered crash loop after secret provider’s token was rotated unexpectedly.

Click to view details
Storage
#307
Storage
ReadWriteMany PVCs Cause IO Bottlenecks
Kubernetes v1.23, NFS-backed PVCs

Heavy read/write on a shared PVC caused file IO contention and throttling across pods.

Click to view details
Storage
#308
Storage
PVC Mount Timeout Due to PodSecurityPolicy
Kubernetes v1.21, PSP Enabled Cluster

A pod couldn’t mount a volume because PodSecurityPolicy (PSP) rejected required fsGroup.

Click to view details
Storage
#309
Storage
Orphaned PVs After Namespace Deletion
Kubernetes v1.20, Self-Hosted

Deleting a namespace did not clean up PersistentVolumes, leading to leaked storage.

Click to view details
Storage
#310
Storage
StorageClass Misconfiguration Blocks Dynamic Provisioning
Kubernetes v1.25, GKE

New PVCs failed to bind due to a broken default StorageClass with incorrect parameters.

Click to view details
Storage
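For comparison, a working default StorageClass for the GKE CSI driver might look like the sketch below; the name and parameters are illustrative and must match what the cluster actually supports.

```yaml
# Minimal sketch of a valid default StorageClass on GKE.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-rwo                   # placeholder name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced                    # illustrative disk type
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
```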
#311
Storage
StatefulSet Volume Cloning Results in Data Leakage
Kubernetes v1.24, CSI Volume Cloning enabled

Cloning PVCs between StatefulSet pods led to shared data unexpectedly appearing in new replicas.

Click to view details
Storage
#312
Storage
Volume Resize Not Reflected in Mounted Filesystem
Kubernetes v1.22, OpenEBS

Volume expansion was successful on the PV, but pods didn’t see the increased space.

Click to view details
Storage
#313
Storage
CSI Controller Pod Crash Due to Log Overflow
Kubernetes v1.23, Longhorn

The CSI controller crashed repeatedly due to unbounded logging filling up ephemeral storage.

Click to view details
Storage
#314
Storage
PVs Stuck in Released Due to Missing Finalizer Removal
Kubernetes v1.21, NFS

PVCs were deleted, but PVs remained stuck in Released, preventing reuse.

Click to view details
Storage
#315
Storage
CSI Driver DaemonSet Deployment Missing Tolerations for Taints
Kubernetes v1.25, Bare Metal

CSI Node plugin DaemonSet didn’t deploy on all nodes due to missing taint tolerations.

Click to view details
Storage
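Node-level agents such as CSI node plugins typically carry broad tolerations so they land on every node, including tainted ones. A minimal DaemonSet sketch with placeholder names and image; the actual driver manifest will carry more containers and volumes.

```yaml
# Minimal sketch: CSI node plugin DaemonSet that tolerates all node taints.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-node-plugin                # placeholder name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-node-plugin
  template:
    metadata:
      labels:
        app: csi-node-plugin
    spec:
      tolerations:
        - operator: Exists             # tolerate any taint (NoSchedule/NoExecute included)
      containers:
        - name: csi-driver
          image: registry.example.com/csi-node:1.0   # placeholder image
```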
#316
Storage
Mount Propagation Issues with Sidecar Containers
Kubernetes v1.22, GKE

Sidecar containers didn’t see mounted volumes due to incorrect mountPropagation settings.

Click to view details
Storage
#317
Storage
File Permissions Reset on Pod Restart
Kubernetes v1.20, CephFS

Pod volume permissions reset after each restart, breaking application logic.

Click to view details
Storage
#318
Storage
Volume Mount Succeeds but Application Can't Write
Kubernetes v1.23, EBS

Volume mounted correctly, but application failed to write due to filesystem mismatch.

Click to view details
Storage
#319
Storage
Volume Snapshot Restore Includes Corrupt Data
Kubernetes v1.24, Velero + CSI Snapshots

Snapshot-based restore brought back corrupted state due to hot snapshot timing.

Click to view details
Storage
#320
Storage
Zombie Volumes Occupying Cloud Quota
Kubernetes v1.25, AWS EBS

Deleted PVCs didn’t release volumes due to failed detach steps, leading to quota exhaustion.

Click to view details
Storage
#321
Storage
Volume Snapshot Garbage Collection Fails
Kubernetes v1.25, CSI Snapshotter with Velero

Volume snapshots piled up because snapshot objects were not getting garbage collected after use.

Click to view details
Storage
#322
Storage
Volume Mount Delays Due to Node Drain Stale Attachment
Kubernetes v1.23, AWS EBS CSI

Volumes took too long to attach on new nodes after pod rescheduling due to stale attachment metadata.

Click to view details
Storage
#323
Storage
Application Writes Lost After Node Reboot
Kubernetes v1.21, Local Persistent Volumes

After a node reboot, the pod restarted but wrote to a different volume path, resulting in apparent data loss.

Click to view details
Storage
#324
Storage
Pod CrashLoop Due to Read-Only Volume Remount
Kubernetes v1.22, GCP Filestore

Pod volume was remounted as read-only after a transient network disconnect, breaking app write logic.

Click to view details
Storage
#325
Storage
Data Corruption on Shared Volume With Two Pods
Kubernetes v1.23, NFS PVC shared by 2 pods

Two pods writing to the same volume caused inconsistent files and data loss.

Click to view details
Storage
#326
Storage
Mount Volume Exceeded Timeout
Kubernetes v1.26, Azure Disk CSI

Pod remained stuck in ContainerCreating state because volume mount operations timed out.

Click to view details
Storage
#327
Storage
Static PV Bound to Wrong PVC
Kubernetes v1.21, Manually created PVs

A misconfigured static PV got bound to the wrong PVC, exposing sensitive data.

Click to view details
Storage
#328
Storage
Pod Eviction Due to DiskPressure Despite PVC
Kubernetes v1.22, Local PVs

The node evicted pods due to DiskPressure, even though the app used a dedicated PVC backed by a separate disk.

Click to view details
Storage
#329
Storage
Pod Gets Stuck Due to Ghost Mount Point
Kubernetes v1.20, iSCSI volumes

The pod failed to start because the mount point was only partially deleted, leaving a stale (ghost) mount entry on the node.

Click to view details
Storage
#330
Storage
PVC Resize Broke StatefulSet Ordering
Kubernetes v1.24, StatefulSets + RWO PVCs

When resizing PVCs, StatefulSet pods restarted in parallel, violating ordinal guarantees.

Click to view details
Storage
#331
Storage
ReadAfterWrite Inconsistency on Object Store-Backed CSI
Kubernetes v1.26, MinIO CSI driver, Ceph RGW backend

Applications experienced stale reads immediately after writing to the same file via CSI mount backed by an S3-like object store.

Click to view details
Storage
#332
Storage
PV Resize Fails After Node Reboot
Kubernetes v1.24, AWS EBS

After a node reboot, a PVC resize request remained pending, blocking pod start.

Click to view details
Storage
#333
Storage
CSI Driver Crash Loops on VolumeAttach
Kubernetes v1.22, OpenEBS Jiva CSI

CSI node plugin entered CrashLoopBackOff due to panic during volume attach, halting all storage provisioning.

Click to view details
Storage
#334
Storage
PVC Binding Fails Due to Multiple Default StorageClasses
Kubernetes v1.23

PVC creation failed intermittently because the cluster had two storage classes marked as default.

Click to view details
Storage
#335
Storage
Zombie VolumeAttachment Blocks New PVC
Kubernetes v1.21, Longhorn

After a node crash, a VolumeAttachment object was not garbage collected, blocking new PVCs from attaching.

Click to view details
Storage
#336
Storage
Persistent Volume Bound But Not Mounted
Kubernetes v1.25, NFS

Pod entered Running state, but data was missing because PV was bound but not properly mounted.

Click to view details
Storage
#337
Storage
CSI Snapshot Restore Overwrites Active Data
Kubernetes v1.26, CSI snapshots (v1beta1)

User triggered a snapshot restore to an existing PVC, unintentionally overwriting live data.

Click to view details
Storage
#338
Storage
Incomplete Volume Detach Breaks Node Scheduling
Kubernetes v1.22, iSCSI

Scheduler skipped a healthy node due to a ghost VolumeAttachment that was never cleaned up.

Click to view details
Storage
#339
Storage
App Breaks Due to Missing SubPath After Volume Expansion
Kubernetes v1.24, PVC with subPath

After PVC expansion, the mount inside the pod pointed to the root of the volume instead of the expected subPath.

Click to view details
Storage
#340
Storage
Backup Restore Process Created Orphaned PVCs
Kubernetes v1.23, Velero

A namespace restore from backup recreated PVCs that had no matching PVs, blocking further deployment.

Click to view details
Storage
#341
Storage
Cross-Zone Volume Binding Fails with StatefulSet
Kubernetes v1.25, AWS EBS, StatefulSet with anti-affinity

Pods in a StatefulSet failed to start due to volume binding constraints when spread across zones.

Click to view details
Storage
#342
Storage
Volume Snapshot Controller Race Condition
Kubernetes v1.23, CSI Snapshot Controller

Rapid creation/deletion of snapshots caused the controller to panic due to race conditions in snapshot finalizers.

Click to view details
Storage
#343
Storage
Failed Volume Resize Blocks Rollout
Kubernetes v1.24, CSI VolumeExpansion enabled

Deployment rollout got stuck because one of the pods couldn’t start due to a failed volume expansion.

Click to view details
Storage
#344
Storage
Application Data Lost After Node Eviction
Kubernetes v1.23, hostPath volumes

Node drained for maintenance led to permanent data loss for apps using hostPath volumes.

Click to view details
Storage
#345
Storage
Read-Only PV Caused Write Failures After Restore
Kubernetes v1.22, Velero, AWS EBS

After restoring from backup, the volume was attached as read-only, causing application crashes.

Click to view details
Storage
#346
Storage
NFS Server Restart Crashes Pods
Kubernetes v1.24, in-cluster NFS server

The NFS server was restarted for an upgrade, and all dependent pods crashed due to stale file handles and unmount errors.

Click to view details
Storage
#347
Storage
VolumeBindingBlocked Condition Causes Pod Scheduling Delay
Kubernetes v1.25, dynamic provisioning

Scheduler skipped over pods with pending PVCs due to VolumeBindingBlocked status, even though volumes were eventually created.

Click to view details
Storage
#348
Storage
Data Corruption from Overprovisioned Thin Volumes
Kubernetes v1.22, LVM-CSI thin provisioning

Under heavy load, pods reported data corruption; the storage layer used thin-provisioned LVM volumes that overcommitted the underlying disk.

Click to view details
Storage
#349
Storage
VolumeProvisioningFailure on GKE Due to IAM Misconfiguration
GKE, Workload Identity enabled

CSI driver failed to provision new volumes due to missing IAM permissions, even though StorageClass was valid.

Click to view details
Storage
#350
Storage
Node Crash Triggers Volume Remount Loop
Kubernetes v1.26, CSI, NVMe disks

After a node crash, volume remount loop occurred due to conflicting device paths.

Click to view details
Storage
#351
Storage
VolumeMount Conflict Between Init and Main Containers
Kubernetes v1.25, containerized database restore job

Init container and main container used the same volume path but with different modes, causing the main container to crash.

Click to view details
Storage
#352
Storage
PVCs Stuck in “Terminating” Due to Finalizers
Kubernetes v1.24, CSI driver with finalizer

After deleting PVCs, they remained in Terminating state indefinitely due to stuck finalizers.

Click to view details
Storage
#353
Storage
Misconfigured ReadOnlyMany Mount Blocks Write Operations
Kubernetes v1.23, NFS volume

Volume mounted as ReadOnlyMany blocked necessary write operations, despite NFS server allowing writes.

Click to view details
Storage
#354
Storage
In-Tree Plugin PVs Lost After Driver Migration
Kubernetes v1.26, in-tree to CSI migration

Existing in-tree volumes were no longer recognized after CSI migration was enabled.

Click to view details
Storage
#355
Storage
Pod Deleted but Volume Still Mounted on Node
Kubernetes v1.24, CSI

Pod was force-deleted, but its volume wasn’t unmounted from the node, blocking future pod scheduling.

Click to view details
Storage
#356
Storage
Ceph RBD Volume Crashes Pods Under IOPS Saturation
Kubernetes v1.23, Ceph CSI

Under heavy I/O, Ceph volumes became unresponsive, leading to kernel-level I/O errors in pods.

Click to view details
Storage
#357
Storage
ReplicaSet Using PVCs Fails Due to VolumeClaimTemplate Misuse
Kubernetes v1.25

A developer tried to use volumeClaimTemplates in a ReplicaSet manifest, which isn’t supported.

Click to view details
Storage
#358
Storage
Filesystem Type Mismatch During Volume Attach
Kubernetes v1.24, ext4 vs xfs

A pod failed to start because the PV expected ext4 but the node formatted it as xfs.

Click to view details
Storage
#359
Storage
iSCSI Volumes Fail After Node Kernel Upgrade
Kubernetes v1.26, CSI iSCSI plugin

Post-upgrade, all pods using iSCSI volumes failed to mount due to kernel module incompatibility.

Click to view details
Storage
#360
Storage
PVs Not Deleted After PVC Cleanup Due to Retain Policy
Kubernetes v1.23, AWS EBS

After PVCs were deleted, underlying PVs and disks remained, leading to cloud resource sprawl.

Click to view details
Storage
#361
Storage
Concurrent Pod Scheduling on the Same PVC Causes Mount Conflict
Kubernetes v1.24, AWS EBS, ReadWriteOnce PVC

Two pods attempted to use the same PVC simultaneously, causing one pod to be stuck in ContainerCreating.

Click to view details
Storage
#362
Storage
StatefulSet Pod Replacement Fails Due to PVC Retention
Kubernetes v1.23, StatefulSet with volumeClaimTemplates

A StatefulSet pod was deleted manually, but the replacement pod failed due to a conflict with the retained PVC.

Click to view details
Storage
#363
Storage
HostPath Volume Access Leaks Host Data into Container
Kubernetes v1.22, single-node dev cluster

HostPath volume mounted the wrong directory, exposing sensitive host data to the container.

Click to view details
Storage
#364
Storage
CSI Driver Crashes When Node Resource Is Deleted Prematurely
Kubernetes v1.25, custom CSI driver

Deleting a node object before the CSI driver detached volumes caused crash loops.

Click to view details
Storage
#365
Storage
Retained PV Blocks New Claim Binding with Identical Name
Kubernetes v1.21, NFS

A PV stuck in Released state with Retain policy blocked new PVCs from binding with the same name.

Click to view details
Storage
#366
Storage
CSI Plugin Panic on Missing Mount Option
Kubernetes v1.26, custom CSI plugin

Missing mountOptions in StorageClass led to runtime nil pointer exception in CSI driver.

Click to view details
Storage
#367
Storage
Pod Fails to Mount Volume Due to SELinux Context Mismatch
Kubernetes v1.24, RHEL with SELinux enforcing

Pod failed to mount volume due to denied SELinux permissions.

Click to view details
Storage
#368
Storage
VolumeExpansion on Bound PVC Fails Due to Pod Running
Kubernetes v1.25, GCP PD

PVC resize operation failed because the pod using it was still running.

Click to view details
Storage
#369
Storage
CSI Driver Memory Leak on Volume Detach Loop
Kubernetes v1.24, external CSI

CSI plugin leaked memory due to improper garbage collection on detach failure loop.

Click to view details
Storage
#370
Storage
Volume Mount Timeout Due to Slow Cloud API
Kubernetes v1.23, Azure Disk CSI

During a cloud outage, Azure Disk operations timed out, blocking pod mounts.

Click to view details
Storage
#371
Storage
Volume Snapshot Restore Misses Application Consistency
Kubernetes v1.26, Velero with CSI VolumeSnapshot

Snapshot restore completed successfully, but restored app data was corrupt.

Click to view details
Storage
#372
Storage
File Locking Issue Between Multiple Pods on NFS
Kubernetes v1.22, NFS with ReadWriteMany

Two pods wrote to the same file concurrently, causing lock conflicts and data loss.

Click to view details
Storage
#373
Storage
Pod Reboots Erase Data on EmptyDir Volume
Kubernetes v1.24, default EmptyDir

Pod restarts caused the EmptyDir volume to be wiped, resulting in lost logs.

Click to view details
Storage
#374
Storage
PVC Resize Fails on In-Use Block Device
Kubernetes v1.25, CSI with block mode

PVC expansion failed for a block device while pod was still running.

Click to view details
Storage
#375
Storage
Default StorageClass Prevents PVC Binding to Custom Class
Kubernetes v1.23, GKE

A PVC remained in Pending because the default StorageClass kept getting assigned instead of a custom one.

Click to view details
Storage
#376
Storage
Ceph RBD Volume Mount Failure Due to Kernel Mismatch
Kubernetes v1.21, Rook-Ceph

Mounting Ceph RBD volume failed after a node kernel upgrade.

Click to view details
Storage
#377
Storage
CSI Volume Cleanup Delay Leaves Orphaned Devices
Kubernetes v1.24, Azure Disk CSI

Volume deletion left orphaned devices on the node, consuming disk space.

Click to view details
Storage
#378
Storage
Immutable ConfigMap Used in CSI Sidecar Volume Mount
Kubernetes v1.23, EKS

CSI sidecar depended on a ConfigMap that was updated, but volume behavior didn’t change.

Click to view details
Storage
#379
Storage
PodMount Denied Due to SecurityContext Constraints
Kubernetes v1.25, OpenShift with SCCs

Pod failed to mount PVC due to restricted SELinux type in pod’s security context.

Click to view details
Storage
#380
Storage
VolumeProvisioner Race Condition Leads to Duplicated PVC
Kubernetes v1.24, CSI with dynamic provisioning

Simultaneous provisioning requests created duplicate PVs for a single PVC.

Click to view details
Storage
#381
Storage
PVC Bound to Deleted PV After Restore
Kubernetes v1.25, Velero restore with CSI driver

Restored PVC bound to a PV that no longer existed, causing stuck pods.

Click to view details
Storage
#382
Storage
Unexpected Volume Type Defaults to HDD Instead of SSD
Kubernetes v1.24, GKE with dynamic provisioning

Volumes defaulted to HDD even though workloads needed SSD.

Click to view details
Storage
#383
Storage
ReclaimPolicy Retain Caused Resource Leaks
Kubernetes v1.22, bare-metal CSI

Deleting PVCs left behind unused PVs and disks.

Click to view details
Storage
#384
Storage
ReadWriteOnce PVC Mounted by Multiple Pods
Kubernetes v1.23, AWS EBS

Attempt to mount a ReadWriteOnce PVC on two pods in different AZs failed silently.

Click to view details
Storage
#385
Storage
VolumeAttach Race on StatefulSet Rolling Update
Kubernetes v1.26, StatefulSet with CSI driver

Volume attach operations failed during parallel pod updates.

Click to view details
Storage
#386
Storage
CSI Driver CrashLoop Due to Missing Node Labels
Kubernetes v1.24, OpenEBS CSI

CSI sidecars failed to initialize due to missing node topology labels.

Click to view details
Storage
#387
Storage
PVC Deleted While Volume Still Mounted
Kubernetes v1.22, on-prem CSI

PVC deletion didn’t unmount the volume because a finalizer remained stuck on the pod.

Click to view details
Storage
#388
Storage
In-Tree Volume Plugin Migration Caused Downtime
Kubernetes v1.25, GKE

GCE PD plugin migration to CSI caused volume mount errors.

Click to view details
Storage
#389
Storage
Overprovisioned Thin Volumes Hit Underlying Limit
Kubernetes v1.24, LVM-based CSI

Thin-provisioned volumes ran out of physical space, affecting all pods.

Click to view details
Storage
#390
Storage
Dynamic Provisioning Failure Due to Quota Exhaustion
Kubernetes v1.26, vSphere CSI

PVCs failed to provision silently due to exhausted storage quota.

Click to view details
Storage
#391
Storage
PVC Resizing Didn’t Expand Filesystem Automatically
Kubernetes v1.24, AWS EBS, ext4 filesystem

PVC was resized but the pod’s filesystem didn’t reflect the new size.

Click to view details
Storage
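Filesystem expansion generally requires allowVolumeExpansion: true on the StorageClass, and with some drivers the filesystem is only grown when the pod is restarted (the PVC then reports a FileSystemResizePending condition). A minimal sketch of the resize request; the names, class, and sizes are placeholders, not values from this incident.

```yaml
# Minimal sketch: requesting a larger size on an existing PVC.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc                        # placeholder name
  namespace: prod
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-expandable      # hypothetical class with allowVolumeExpansion: true
  resources:
    requests:
      storage: 50Gi                     # increased from the original request
```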
#392
Storage
StatefulSet Pods Lost Volume Data After Node Reboot
Kubernetes v1.22, local-path-provisioner

Node reboots caused StatefulSet volumes to disappear due to ephemeral local storage.

Click to view details
Storage
#393
Storage
VolumeSnapshots Failed to Restore with Immutable Fields
Kubernetes v1.25, VolumeSnapshot API

Restore operation failed due to immutable PVC spec fields like access mode.

Click to view details
Storage
#394
Storage
GKE Autopilot PVCs Stuck Due to Resource Class Conflict
GKE Autopilot, dynamic PVC provisioning

PVCs remained in Pending state due to missing resource class binding.

Click to view details
Storage
#395
Storage
Cross-Zone Volume Scheduling Failed in Regional Cluster
Kubernetes v1.24, GKE regional cluster

Pods failed to schedule because volumes were provisioned in a different zone than the node.

Click to view details
Storage
#396
Storage
Stuck Finalizers on Deleted PVCs Blocking Namespace Deletion
Kubernetes v1.22, CSI driver

Finalizers on PVCs blocked namespace deletion for hours.

Click to view details
Storage
#397
Storage
CSI Driver Upgrade Corrupted Volume Attachments
Kubernetes v1.23, OpenEBS

CSI driver upgrade introduced a regression causing volume mounts to fail.

Click to view details
Storage
#398
Storage
Stale Volume Handles After Disaster Recovery Cutover
Kubernetes v1.25, Velero restore to DR cluster

Stale volume handles caused new PVCs to fail provisioning.

Click to view details
Storage
#399
Storage
Application Wrote Outside Mounted Path and Lost Data
Kubernetes v1.24, default mountPath

The application wrote logs to /tmp instead of the mounted volume, causing data loss on pod eviction.

Click to view details
Storage
#400
Storage
Cluster Autoscaler Deleted Nodes with Mounted Volumes
Kubernetes v1.23, AWS EKS with CA

Cluster Autoscaler aggressively removed nodes with attached volumes, causing workload restarts.

Click to view details
Storage
#401
Scaling & Load
HPA Didn't Scale Due to Missing Metrics Server
Kubernetes v1.22, Minikube

Horizontal Pod Autoscaler (HPA) didn’t scale pods as expected.

Click to view details
Scaling & Load
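The HPA depends on the resource metrics API; on Minikube that usually means enabling the metrics-server addon (minikube addons enable metrics-server). A minimal HPA sketch with a hypothetical Deployment named web; on v1.22 the equivalent spec is served under autoscaling/v2beta2.

```yaml
# Minimal sketch: CPU-based HPA that only functions once metrics-server is running.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                    # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # hypothetical target Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```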
#402
Scaling & Load
CPU Throttling Prevented Effective Autoscaling
Kubernetes v1.24, EKS, Burstable QoS

Application CPU throttled even under low usage, leading to delayed scaling.

Click to view details
Scaling & Load
#403
Scaling & Load
Overprovisioned Pods Starved the Cluster
Kubernetes v1.21, on-prem

Aggressively overprovisioned pod resources led to failed scheduling and throttling.

Click to view details
Scaling & Load
#404
Scaling & Load
HPA and VPA Conflicted, Causing Flapping
Kubernetes v1.25, GKE

HPA scaled replicas based on CPU while VPA changed pod resources dynamically, creating instability.

Click to view details
Scaling & Load
#405
Scaling & Load
Cluster Autoscaler Didn't Scale Due to Pod Affinity Rules
Kubernetes v1.23, AWS EKS

Workloads couldn't be scheduled and the Cluster Autoscaler didn’t add nodes because affinity rules restricted placement.

Click to view details
Scaling & Load
#406
Scaling & Load
Load Test Crashed Cluster Due to Insufficient Node Quotas
Kubernetes v1.24, AKS

A stress test crashed the API server due to an unthrottled burst of pod creations.

Click to view details
Scaling & Load
#407
Scaling & Load
Scale-To-Zero Caused Cold Starts and SLA Violations
Kubernetes v1.25, KEDA + Knative

Pods scaled to zero, but requests during cold start breached SLA.

Click to view details
Scaling & Load
#408
Scaling & Load
Misconfigured Readiness Probe Blocked HPA Scaling
Kubernetes v1.24, DigitalOcean

HPA didn’t scale pods because readiness probes failed and metrics were not reported.

Click to view details
Scaling & Load
#409
Scaling & Load
Custom Metrics Adapter Crashed, Breaking Custom HPA
Kubernetes v1.25, Prometheus Adapter

Custom HPA didn’t function after metrics adapter pod crashed silently.

Click to view details
Scaling & Load
#410
Scaling & Load
Application Didn’t Handle Scale-In Gracefully
Kubernetes v1.22, Azure AKS

App lost in-flight requests during scale-down, causing 5xx spikes.

Click to view details
Scaling & Load
#411
Scaling & Load
Cluster Autoscaler Ignored Pod PriorityClasses
Kubernetes v1.25, AWS EKS with PriorityClasses

Low-priority workloads blocked scaling of high-priority ones due to misconfigured Cluster Autoscaler.

Click to view details
Scaling & Load
#412
Scaling & Load
ReplicaSet Misalignment Led to Excessive Scale-Out
Kubernetes v1.23, GKE

A stale ReplicaSet with label mismatches caused duplicate pod scale-out.

Click to view details
Scaling & Load
#413
Scaling & Load
StatefulSet Didn't Scale Due to PodDisruptionBudget
Kubernetes v1.26, AKS

StatefulSet couldn’t scale-in during node pressure due to a restrictive PDB.

Click to view details
Scaling & Load
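A PDB that pins maxUnavailable to 0 (or sets minAvailable equal to the full replica count) blocks voluntary evictions entirely. A less restrictive sketch, with a placeholder name and selector.

```yaml
# Minimal sketch: allow one pod at a time to be evicted while still
# protecting overall availability.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: db-pdb                     # placeholder name
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: db                      # placeholder label
```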
#414
Scaling & Load
Horizontal Pod Autoscaler Triggered by Wrong Metric
Kubernetes v1.24, DigitalOcean

HPA used memory instead of CPU, causing unnecessary scale-ups.

Click to view details
Scaling & Load
#415
Scaling & Load
Prometheus Scraper Bottlenecked Custom HPA Metrics
Kubernetes v1.25, custom metrics + Prometheus Adapter

Delays in Prometheus scraping caused lag in HPA reactions.

Click to view details
Scaling & Load
#416
Scaling & Load
Kubernetes Downscaled During Rolling Update
Kubernetes v1.23, on-prem

Pods were prematurely scaled down during rolling deployment.

Click to view details
Scaling & Load
#417
Scaling & Load
KEDA Failed to Scale on Kafka Lag Metric
Kubernetes v1.26, KEDA + Kafka

Consumers didn’t scale out despite Kafka topic lag.

Click to view details
Scaling & Load
#418
Scaling & Load
Spike in Load Exceeded Pod Init Time
Kubernetes v1.24, self-hosted

Sudden burst of traffic overwhelmed services due to slow pod boot time.

Click to view details
Scaling & Load
#419
Scaling & Load
Overuse of Liveness Probes Disrupted Load Balance
Kubernetes v1.21, bare metal

Misfiring liveness probes killed healthy pods during load test.

Click to view details
Scaling & Load
#420
Scaling & Load
Scale-In Happened Before Queue Was Drained
Kubernetes v1.26, RabbitMQ + consumers

Consumers scaled in while queue still had unprocessed messages.

Click to view details
Scaling & Load
#421
Scaling & Load
Node Drain Race Condition During Scale Down
Kubernetes v1.23, GKE

Node drain raced with pod termination, causing pod loss.

Click to view details
Scaling & Load
#422
Scaling & Load
HPA Disabled Due to Missing Resource Requests
Kubernetes v1.22, AWS EKS

Horizontal Pod Autoscaler (HPA) failed to trigger because resource requests weren’t set.

Click to view details
Scaling & Load
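CPU-utilization-based HPA can only work when containers declare CPU requests, since utilization is computed against the request. A minimal sketch; the names, image, and values are placeholders.

```yaml
# Minimal sketch: Deployment with resource requests set so the HPA can
# compute CPU utilization.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                                    # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0  # placeholder image
          resources:
            requests:
              cpu: 250m                        # required for CPU-utilization HPA
              memory: 256Mi
            limits:
              memory: 512Mi
```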
#423
Scaling & Load
Unexpected Overprovisioning of Pods
Kubernetes v1.24, DigitalOcean

Unnecessary pod scaling due to misconfigured resource limits.

Click to view details
Scaling & Load
#424
Scaling & Load
Autoscaler Failed During StatefulSet Upgrade
Kubernetes v1.25, AKS

Horizontal scaling issues occurred during rolling upgrade of StatefulSet.

Click to view details
Scaling & Load
#425
Scaling & Load
Inadequate Load Distribution in a Multi-AZ Setup
Kubernetes v1.27, AWS EKS

Load balancing wasn’t even across availability zones, leading to inefficient scaling.

Click to view details
Scaling & Load
#426
Scaling & Load
Downscale Too Aggressive During Traffic Dips
Kubernetes v1.22, GCP

Autoscaler scaled down too aggressively during short traffic dips, causing pod churn.

Click to view details
Scaling & Load
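Scale-in churn during short dips can be damped with the HPA behavior field (autoscaling/v2, or v2beta2 on clusters before v1.23). A minimal sketch with illustrative values and a hypothetical target Deployment.

```yaml
# Minimal sketch: slow down scale-in so brief traffic dips don't churn pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                          # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                            # hypothetical target
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # wait 5 minutes before scaling in
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60              # remove at most 50% of pods per minute
```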
#427
Scaling & Load
Insufficient Scaling Under High Ingress Traffic
Kubernetes v1.26, NGINX Ingress Controller

Pod autoscaling didn’t trigger in time to handle high ingress traffic.

Click to view details
Scaling & Load
#428
Scaling & Load
Nginx Ingress Controller Hit Rate Limit on External API
Kubernetes v1.25, AWS EKS

Rate limits were hit on an external API during traffic surge, affecting service scaling.

Click to view details
Scaling & Load
#429
Scaling & Load
Resource Constraints on Node Impacted Pod Scaling
Kubernetes v1.24, on-prem

Pod scaling failed due to resource constraints on nodes during high load.

Click to view details
Scaling & Load
#430
Scaling & Load
Memory Leak in Application Led to Excessive Scaling
Kubernetes v1.23, Azure AKS

A memory leak in the app led to unnecessary scaling, causing resource exhaustion.

Click to view details
Scaling & Load
#431
Scaling & Load
Inconsistent Pod Scaling During Burst Traffic
Kubernetes v1.24, AWS EKS

Pod scaling inconsistently triggered during burst traffic spikes, causing service delays.

Click to view details
Scaling & Load
#432
Scaling & Load
Auto-Scaling Hit Limits with StatefulSet
Kubernetes v1.22, GCP

StatefulSet scaling hit limits due to pod affinity constraints.

Click to view details
Scaling & Load
#433
Scaling & Load
Cross-Cluster Autoscaling Failures
Kubernetes v1.21, Azure AKS

Autoscaling failed across clusters due to inconsistent resource availability between regions.

Click to view details
Scaling & Load
#434
Scaling & Load
Service Disruption During Auto-Scaling of StatefulSet
Kubernetes v1.24, AWS EKS

StatefulSet failed to scale properly during maintenance, causing service disruption.

Click to view details
Scaling & Load
#435
Scaling & Load
Unwanted Pod Scale-down During Quiet Periods
Kubernetes v1.23, GKE

Autoscaler scaled down too aggressively during periods of low traffic, leading to resource shortages during traffic bursts.

Click to view details
Scaling & Load
#436
Scaling & Load
Cluster Autoscaler Inconsistencies with Node Pools
Kubernetes v1.25, GCP

Cluster Autoscaler failed to trigger due to node pool constraints.

Click to view details
Scaling & Load
#437
Scaling & Load
Disrupted Service During Pod Autoscaling in StatefulSet
Kubernetes v1.22, AWS EKS

Pod autoscaling in a StatefulSet led to disrupted service due to the stateful nature of the application.

Click to view details
Scaling & Load
#438
Scaling & Load
Slow Pod Scaling During High Load
Kubernetes v1.26, DigitalOcean

Autoscaling pods didn’t trigger quickly enough during sudden high-load events, causing delays.

Click to view details
Scaling & Load
#439
Scaling & Load
Autoscaler Skipped Scale-up Due to Incorrect Metric
Kubernetes v1.23, AWS EKS

Autoscaler skipped scale-up because it was using the wrong metric for scaling.

Click to view details
Scaling & Load
#440
Scaling & Load
Scaling Inhibited Due to Pending Jobs in Queue
Kubernetes v1.25, Azure AKS

Pod scaling was delayed because jobs in the queue were not processed fast enough.

Click to view details
Scaling & Load
#441
Scaling & Load
Scaling Delayed Due to Incorrect Resource Requests
Kubernetes v1.24, AWS EKS

Pod scaling was delayed because of incorrectly set resource requests, leading to resource over-provisioning.

Click to view details
Scaling & Load
#442
Scaling & Load
Unexpected Pod Termination Due to Scaling Policy
Kubernetes v1.23, Google Cloud

Pods were unexpectedly terminated during scale-down due to aggressive scaling policies.

Click to view details
Scaling & Load
#443
Scaling & Load
Unstable Load Balancing During Scaling Events
Kubernetes v1.25, Azure AKS

Load balancing issues surfaced during scaling, leading to uneven distribution of traffic.

Click to view details
Scaling & Load
#444
Scaling & Load
Autoscaling Ignored Due to Resource Quotas
Kubernetes v1.26, IBM Cloud

Resource quotas prevented autoscaling from triggering despite high load.

Click to view details
Scaling & Load
#445
Scaling & Load
Delayed Scaling Response to Traffic Spike
Kubernetes v1.24, GCP

Scaling took too long to respond during a traffic spike, leading to degraded service.

Click to view details
Scaling & Load
#446
Scaling & Load
CPU Utilization-Based Scaling Did Not Trigger for High Memory Usage
Kubernetes v1.22, Azure AKS

Scaling based on CPU utilization did not trigger when the issue was related to high memory usage.

Click to view details
Scaling & Load
#447
Scaling & Load
Inefficient Horizontal Scaling of StatefulSets
Kubernetes v1.25, GKE

Horizontal scaling of StatefulSets was inefficient due to the controller's inherent limitations, such as ordered, one-at-a-time pod creation.

Click to view details
Scaling & Load
#448
Scaling & Load
Autoscaler Skipped Scaling Events Due to Flaky Metrics
Kubernetes v1.23, AWS EKS

Autoscaler skipped scaling events due to unreliable metrics from external monitoring tools.

Click to view details
Scaling & Load
#449
Scaling & Load
Delayed Pod Creation Due to Node Affinity Misconfigurations
Kubernetes v1.24, Google Cloud

Pods were delayed in being created due to misconfigured node affinity rules during scaling events.

Click to view details
Scaling & Load
#450
Scaling & Load
Excessive Scaling During Short-Term Traffic Spikes
Kubernetes v1.25, AWS EKS

Autoscaling triggered excessive scaling during short-term traffic spikes, leading to unnecessary resource usage.

Click to view details
Scaling & Load
#451
Scaling & Load
Inconsistent Scaling Due to Misconfigured Horizontal Pod Autoscaler
Kubernetes v1.26, Azure AKS

Horizontal Pod Autoscaler (HPA) inconsistently scaled pods based on incorrect metric definitions.

Click to view details
Scaling & Load
#452
Scaling & Load
Load Balancer Overload After Quick Pod Scaling
Kubernetes v1.25, Google Cloud

Load balancer failed to distribute traffic effectively after a large pod scaling event, leading to overloaded pods.

Click to view details
Scaling & Load
#453
Scaling & Load
Autoscaling Failed During Peak Traffic Periods
Kubernetes v1.24, AWS EKS

Autoscaling was ineffective during peak traffic periods, leading to degraded performance.

Click to view details
Scaling & Load
#454
Scaling & Load
Insufficient Node Resources During Scaling
Kubernetes v1.23, IBM Cloud

Node resources were insufficient during scaling, leading to pod scheduling failures.

Click to view details
Scaling & Load
#455
Scaling & Load
Unpredictable Pod Scaling During Cluster Autoscaler Event
Kubernetes v1.25, Google Cloud

Pod scaling was unpredictable during a Cluster Autoscaler event due to a sudden increase in node availability.

Click to view details
Scaling & Load
#456
Scaling & Load
CPU Resource Over-Commitment During Scale-Up
Kubernetes v1.23, Azure AKS

During a scale-up event, CPU resources were over-committed, causing pod performance degradation.

Click to view details
Scaling & Load
#457
Scaling & Load
Failure to Scale Due to Horizontal Pod Autoscaler Anomaly
Kubernetes v1.22, AWS EKS

Horizontal Pod Autoscaler (HPA) failed to scale up due to a temporary anomaly in the resource metrics.

Click to view details
Scaling & Load
#458
Scaling & Load
Memory Pressure Causing Slow Pod Scaling
Kubernetes v1.24, IBM Cloud

Pod scaling was delayed due to memory pressure in the cluster, causing performance bottlenecks.

Click to view details
Scaling & Load
#459
Scaling & Load
Node Over-Provisioning During Cluster Scaling
Kubernetes v1.25, Google Cloud

Nodes were over-provisioned, leading to unnecessary resource wastage during scaling.

Click to view details
Scaling & Load
#460
Scaling & Load
Autoscaler Fails to Handle Node Termination Events Properly
Kubernetes v1.26, Azure AKS

Autoscaler did not handle node termination events properly, leading to pod disruptions.

Click to view details
Scaling & Load
#461
Scaling & Load
Node Failure During Pod Scaling Up
Kubernetes v1.25, AWS EKS

Scaling up pods failed when a node was unexpectedly terminated, preventing proper pod scheduling.

Click to view details
Scaling & Load
#462
Scaling & Load
Unstable Scaling During Traffic Spikes
Kubernetes v1.26, Azure AKS

Pod scaling became unstable during traffic spikes due to delayed scaling responses.

Click to view details
Scaling & Load
#463
Scaling & Load
Insufficient Node Pools During Sudden Pod Scaling
Kubernetes v1.24, Google Cloud

Insufficient node pool capacity caused pod scheduling failures during sudden scaling events.

Click to view details
Scaling & Load
#464
Scaling & Load
Latency Spikes During Horizontal Pod Scaling
Kubernetes v1.25, IBM Cloud

Latency spikes occurred during horizontal pod scaling due to inefficient pod distribution.

Click to view details
Scaling & Load
#465
Scaling & Load
Resource Starvation During Infrequent Scaling Events
Kubernetes v1.23, AWS EKS

During infrequent scaling events, resource starvation occurred due to improper resource allocation.

Click to view details
Scaling & Load
#466
Scaling & Load
Autoscaler Delayed Reaction to Load Decrease
Kubernetes v1.22, Google Cloud

The autoscaler was slow to scale down after a drop in traffic, causing resource wastage.

Click to view details
Scaling & Load
#467
Scaling & Load
Node Resource Exhaustion Due to High Pod Density
Kubernetes v1.24, Azure AKS

Node resource exhaustion occurred when too many pods were scheduled on a single node, leading to instability.

Click to view details
Scaling & Load
#468
Scaling & Load
Scaling Failure Due to Node Memory Pressure
Kubernetes v1.25, Google Cloud

Pod scaling failed due to memory pressure on nodes, preventing new pods from being scheduled.

Click to view details
Scaling & Load
#469
Scaling & Load
Scaling Latency Due to Slow Node Provisioning
Kubernetes v1.26, IBM Cloud

Pod scaling was delayed due to slow node provisioning during cluster scaling events.

Click to view details
Scaling & Load
#470
Scaling & Load
Slow Scaling Response Due to Insufficient Metrics Collection
Kubernetes v1.23, AWS EKS

The autoscaling mechanism responded slowly to traffic changes because of insufficient metrics collection.

Click to view details
Scaling & Load
#471
Scaling & Load
Node Scaling Delayed Due to Cloud Provider API Limits
Kubernetes v1.24, Google Cloud

Node scaling was delayed because the cloud provider’s API rate limits were exceeded, preventing automatic node provisioning.

Click to view details
Scaling & Load
#472
Scaling & Load
Scaling Overload Due to High Replica Count
Kubernetes v1.25, Azure AKS

Pod scaling led to resource overload on nodes due to an excessively high replica count.

Click to view details
Scaling & Load
#473
Scaling & Load
Failure to Scale Down Due to Persistent Idle Pods
Kubernetes v1.24, IBM Cloud

Pods failed to scale down during low traffic periods, leading to idle resources consuming cluster capacity.

Click to view details
Scaling & Load
#474
Scaling & Load
Load Balancer Misrouting After Pod Scaling
Kubernetes v1.26, AWS EKS

The load balancer routed traffic unevenly after scaling up, causing some pods to become overloaded.

Click to view details
Scaling & Load
#475
Scaling & Load
Cluster Autoscaler Not Triggering Under High Load
Kubernetes v1.22, Google Cloud

The Cluster Autoscaler failed to trigger under high load due to misconfiguration in resource requests.

Click to view details
Scaling & Load
#476
Scaling & Load
Autoscaling Slow Due to Cloud Provider API Delay
Kubernetes v1.25, Azure AKS

Pod scaling was delayed due to cloud provider API delays during scaling events.

Click to view details
Scaling & Load
#477
Scaling & Load
Over-provisioning Resources During Scaling
Kubernetes v1.24, IBM Cloud

During a scaling event, resources were over-provisioned, causing unnecessary resource consumption and cost.

Click to view details
Scaling & Load
#478
Scaling & Load
Incorrect Load Balancer Configuration After Node Scaling
Kubernetes v1.25, Google Cloud

After node scaling, the load balancer failed to distribute traffic correctly due to misconfigured settings.

Click to view details
Scaling & Load
#480
Scaling & Load
Autoscaling Disabled Due to Resource Constraints
Kubernetes v1.22, AWS EKS

Autoscaling was disabled due to resource constraints on the cluster.

Click to view details
Scaling & Load
#481
Scaling & Load
Resource Fragmentation Leading to Scaling Delays
Kubernetes v1.24, Azure AKS

Fragmentation of resources across nodes led to scaling delays as new pods could not be scheduled efficiently.

Click to view details
Scaling & Load
#482
Scaling & Load
Incorrect Scaling Triggers Due to Misconfigured Metrics Server
Kubernetes v1.26, IBM Cloud

The HPA scaled pods incorrectly because the metrics server was misconfigured, leading to wrong scaling triggers.

Click to view details
Scaling & Load
#483
Scaling & Load
Autoscaler Misconfigured with Cluster Network Constraints
Kubernetes v1.25, Google Cloud

The Cluster Autoscaler failed to scale due to network configuration constraints that prevented communication between nodes.

Click to view details
Scaling & Load
#484
Scaling & Load
Scaling Delays Due to Resource Quota Exhaustion
Kubernetes v1.23, AWS EKS

Pod scaling was delayed due to exhausted resource quotas, preventing new pods from being scheduled.

Click to view details
Scaling & Load
#485
Scaling & Load
Memory Resource Overload During Scaling
Kubernetes v1.24, Azure AKS

Node memory resources were exhausted during a scaling event, causing pods to crash.

Click to view details
Scaling & Load
#486
Scaling & Load
HPA Scaling Delays Due to Incorrect Metric Aggregation
Kubernetes v1.26, Google Cloud

HPA scaling was delayed due to incorrect aggregation of metrics, leading to slower response to traffic spikes.

Click to view details
Scaling & Load
#487
Scaling & Load
Scaling Causing Unbalanced Pods Across Availability Zones
Kubernetes v1.25, AWS EKS

Pods became unbalanced across availability zones during scaling, leading to higher latency for some traffic.

Click to view details
Scaling & Load
#488
Scaling & Load
Failed Scaling due to Insufficient Node Capacity for StatefulSets
Kubernetes v1.23, AWS EKS

Scaling failed because the node pool did not have sufficient capacity to accommodate new StatefulSets.

Click to view details
Scaling & Load
#489
Scaling & Load
Uncontrolled Resource Spikes After Scaling Large StatefulSets
Kubernetes v1.22, GKE

Scaling large StatefulSets led to resource spikes that caused system instability.

Click to view details
Scaling & Load
#490
Scaling & Load
Cluster Autoscaler Preventing Scaling Due to Underutilized Nodes
Kubernetes v1.24, AWS EKS

The Cluster Autoscaler prevented scaling because nodes with low utilization were not being considered for scaling.

Click to view details
Scaling & Load
#491
Scaling & Load
Pod Overload During Horizontal Pod Autoscaling Event
Kubernetes v1.25, Azure AKS

Horizontal Pod Autoscaler (HPA) overloaded the system with pods during a traffic spike, leading to resource exhaustion.

Click to view details
Scaling & Load
#492
Scaling & Load
Unstable Node Performance During Rapid Scaling
Kubernetes v1.22, Google Kubernetes Engine (GKE)

Rapid node scaling led to unstable node performance, impacting pod stability.

Click to view details
Scaling & Load
#493
Scaling & Load
Insufficient Load Balancer Configuration After Scaling Pods
Kubernetes v1.23, Azure Kubernetes Service (AKS)

Load balancer configurations failed to scale with the increased number of pods, causing traffic routing issues.

Click to view details
Scaling & Load
#494
Scaling & Load
Inconsistent Pod Distribution Across Node Pools
Kubernetes v1.21, Google Kubernetes Engine (GKE)

Pods were not evenly distributed across node pools after scaling, leading to uneven resource utilization.

Click to view details
Scaling & Load
#495
Scaling & Load
HPA and Node Pool Scaling Conflict
Kubernetes v1.22, AWS EKS

Horizontal Pod Autoscaler (HPA) conflicted with Node Pool autoscaling, causing resource exhaustion.

Click to view details
Scaling & Load
#496
Scaling & Load
Delayed Horizontal Pod Scaling During Peak Load
Kubernetes v1.20, DigitalOcean Kubernetes (DOKS)

HPA scaled too slowly during a traffic surge, leading to application unavailability.

Click to view details
Scaling & Load
#497
Scaling & Load
Ineffective Pod Affinity Leading to Overload in Specific Nodes
Kubernetes v1.21, AWS EKS

Pod affinity settings caused workload imbalance and overloading in specific nodes.

Click to view details
Scaling & Load
#498
Scaling & Load
Inconsistent Pod Scaling Due to Resource Limits
Kubernetes v1.24, Google Kubernetes Engine (GKE)

Pods were not scaling properly due to overly restrictive resource limits.

Click to view details
Scaling & Load
#499
Scaling & Load
Kubernetes Autoscaler Misbehaving Under Variable Load
Kubernetes v1.23, AWS EKS

Cluster Autoscaler failed to scale the nodes appropriately due to fluctuating load, causing resource shortages.

Click to view details
Scaling & Load
#500
Scaling & Load
Pod Evictions Due to Resource Starvation After Scaling
Kubernetes v1.21, Azure Kubernetes Service (AKS)

After scaling up the deployment, resource starvation led to pod evictions, resulting in service instability.

Click to view details
Scaling & Load
#501
Scaling & Load
Slow Pod Scaling Due to Insufficient Metrics Collection
Kubernetes v1.22, Google Kubernetes Engine (GKE)

The Horizontal Pod Autoscaler (HPA) was slow to respond because it lacked sufficient metric collection.

Click to view details
Scaling & Load
#502
Scaling & Load
Inconsistent Load Balancing During Pod Scaling Events
Kubernetes v1.20, AWS EKS

Load balancer failed to redistribute traffic effectively when scaling pods, causing uneven distribution and degraded service.

Click to view details
Scaling & Load