Component(s)
No response
What happened?
Description
I am unaware of a way to disable PodDisruptionBudgets (PDBs) altogether for our OpenTelemetryCollector deployment resources, so please let me know if there is a way to do that and we can close this issue.
The issue:
When a user has a working OpenTelemetry Collector deployment (managed by the operator) and then applies an update to the OpenTelemetryCollector resource that contains a breaking change (e.g. bad permissions, a misconfiguration, etc.), the new deployment rollout stays in a crash loop, while the previous deployment stays up and cannot be removed (due to the default PDB of MaxUnavailable: 1). This prevents node drains from removing the old deployment and causes Kubernetes node upgrades/rollouts to time out.
Steps to Reproduce
- Have a working OpenTelemetryCollector resource that is managed by opentelemetry-operator
- Update that resource with a change that produces a misconfigured new deployment (e.g. a bad feature-gate flag would probably work)
- The new deployment goes into a crash loop while the old deployment stays up (expected, but it blocks node drains)
Resolution / Suggestion:
The workaround I have found is to have users of the OpenTelemetryCollector resource set:
spec:
  podDisruptionBudget:
    minAvailable: 0
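For context, here is a rough sketch of where that field sits in a complete resource. Only the `podDisruptionBudget` stanza comes from this report; the `v1beta1` API version, the resource name, the namespace, and the collector pipeline are placeholder assumptions and would need to match your own setup.

```yaml
# Sketch only: metadata and collector config are placeholders, not from the report.
apiVersion: opentelemetry.io/v1beta1   # assumed API version
kind: OpenTelemetryCollector
metadata:
  name: example-collector              # hypothetical name
  namespace: observability             # hypothetical namespace
spec:
  mode: deployment
  podDisruptionBudget:
    minAvailable: 0                    # the workaround described above
  config:                              # placeholder pipeline
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
```

With `minAvailable: 0` the budget no longer requires any collector pod to stay up, so it should stop blocking evictions during a drain, at the cost of no longer protecting the collector from voluntary disruptions.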
I'm hoping an easy solution would be to change the default from MaxUnavailable: 1 to MinAvailable: 0 instead... code link:
Is this possible? If not, could an option be added to disable setting PDBs entirely at the operator level?
I understand that users of the OpenTelemetryCollector resource should make sure their updates/rollouts work, but unfortunately that's not something I've been able to force users to do 😄 ...so having a less restrictive default seems like the easier option.
Kubernetes Version
1.31
Operator version
0.117
Collector version
0.117.0
Environment information
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")
Log output
Additional context
No response