Scenario #34
Cluster Management
K8s v1.21, self-managed with local etcd
Etcd Disk Full Crashing the Cluster
The entire control plane crashed because the disk holding etcd data ran out of space.
What Happened
Continuous writes from custom resources filled the disk where etcd data was stored.
Diagnosis Steps
1. Observed "etcdserver: mvcc: database space exceeded" errors.
2. Checked disk usage: df -h showed the etcd data disk at 100%.
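The two checks above can be sketched in Python. This is a minimal illustration, not the tooling used in the incident: the helper names `has_space_exceeded` and `disk_used_percent` are hypothetical, and the percentage is computed from `shutil.disk_usage`, which is close to but not identical to what df -h prints.

```python
import shutil

# Error text quoted in the diagnosis steps.
SPACE_EXCEEDED = "etcdserver: mvcc: database space exceeded"


def has_space_exceeded(log_lines):
    """Return True if any log line contains the etcd space error."""
    return any(SPACE_EXCEEDED in line for line in log_lines)


def disk_used_percent(path):
    """Return the used percentage of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

On a control plane node, `disk_used_percent` would be pointed at the etcd data directory (often /var/lib/etcd on kubeadm clusters, which is an assumption about this setup), and `has_space_exceeded` at lines read from the etcd or kube-apiserver logs.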
Root Cause
No compaction or defragmentation had been run on etcd for weeks, so old revisions and freed pages kept accumulating in the database file.
Fix/Workaround
• Performed etcd compaction and defragmentation.
• Added disk space temporarily.
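The compaction and defragmentation step can be sketched as a small Python helper that reads the current revision from `etcdctl endpoint status --write-out=json` and builds the follow-up commands. The function names and the endpoint URL are assumptions for illustration, TLS flags (`--cacert`, `--cert`, `--key`) are omitted, and the final `alarm disarm` is not mentioned in the write-up; it is included because etcd normally keeps a NOSPACE alarm raised until it is cleared.

```python
import json


def current_revision(status_json):
    """Extract the highest revision from `etcdctl endpoint status -w json` output."""
    entries = json.loads(status_json)
    return max(entry["Status"]["header"]["revision"] for entry in entries)


def maintenance_commands(revision, endpoint="https://127.0.0.1:2379"):
    """Build the etcdctl commands to compact, defragment, and clear alarms.

    The endpoint default is an assumption; adjust it for the actual cluster.
    """
    base = ["etcdctl", f"--endpoints={endpoint}"]
    return [
        base + ["compact", str(revision)],  # drop history older than `revision`
        base + ["defrag"],                  # return freed pages to the filesystem
        base + ["alarm", "disarm"],         # clear a NOSPACE alarm if one was raised
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` with `ETCDCTL_API=3` in the environment. Defragmentation blocks the member while it runs, so on a multi-member cluster it is usually done one member at a time.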
Lessons Learned
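etcd needs regular maintenance: compaction to discard old revisions, and defragmentation to return the freed space to the filesystem.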
How to Avoid
1. Set up cron jobs or alerts for etcd health.
2. Monitor disk usage and enable etcd auto-compaction.
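A periodic health check along these lines could be sketched as follows. It takes the `Status` object from `etcdctl endpoint status --write-out=json` and flags a database that is close to its backend quota or heavily fragmented. The function name and both thresholds are assumptions chosen for illustration, and the 2 GiB quota is etcd's default when `--quota-backend-bytes` is not set. Auto-compaction itself is configured on the etcd server with `--auto-compaction-mode` and `--auto-compaction-retention`.

```python
DEFAULT_QUOTA_BYTES = 2 * 1024**3  # etcd default backend quota (assumed unchanged)


def check_member(status, quota_bytes=DEFAULT_QUOTA_BYTES,
                 quota_warn=0.8, frag_warn=0.5):
    """Return a list of warnings for one member's endpoint status.

    `status` is the "Status" object from `etcdctl endpoint status -w json`.
    The thresholds are illustrative, not recommended values.
    """
    db_size = status["dbSize"]
    in_use = status.get("dbSizeInUse", db_size)
    warnings = []
    # Database file is approaching the backend quota.
    if db_size >= quota_warn * quota_bytes:
        warnings.append("db size near quota")
    # A large share of the file is free pages that defrag could reclaim.
    if db_size > 0 and (db_size - in_use) / db_size >= frag_warn:
        warnings.append("defragmentation recommended")
    return warnings
```

A cron job could run this against every member and send any non-empty result to the alerting system, alongside a plain disk usage check on the etcd data volume.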