
PodDisruptionBudget defaults can still block node drain #4114

Open
@ryanohnemus


Component(s)

No response

What happened?

Description

I am unaware of a way to disable PodDisruptionBudgets (PDBs) altogether for our OpenTelemetryCollector deployment resources, so please let me know if there is a way to do that and we can close this issue.

The issue:

When a user has a working deployment of the OpenTelemetry Collector (managed by the operator) and then applies an update to the OpenTelemetryCollector resource that contains a breaking change (e.g. a bad permission or a misconfiguration), the new rollout stays in a crash loop while the previous deployment stays up and cannot be removed (due to the default PDB of MaxUnavailable: 1). This prevents node drains from removing the old deployment and causes k8s node upgrades/rollouts to time out.

Steps to Reproduce

  1. Have a working OpenTelemetryCollector resource that is managed by opentelemetry-operator.
  2. Update that resource with a change that produces a misconfigured new deployment (e.g. a bad feature-gate flag would probably work).
  3. The new deployment goes into a crash loop while the old deployment stays up (expected, but it blocks node drain).

Resolution / Suggestion:

The workaround I have found is to have users of the OpenTelemetryCollector resource set:

spec:
  podDisruptionBudget:
    minAvailable: 0

I'm hoping an easy solution would be to change the default from MaxUnavailable: 1 to MinAvailable: 0 instead. Code link:

MaxUnavailable: &intstr.IntOrString{
	Type:   intstr.Int,
	IntVal: 1,
},

Is this possible? If not, could an option be added at the operator level to disable setting PDBs altogether?
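
A rough sketch of what the two suggestions could look like in the defaulting logic, assuming the default is only applied when the user has not set `podDisruptionBudget`. Everything here is hypothetical: `pdbSpec` is a simplified stand-in for the real spec type (plain integers instead of `intstr.IntOrString`), and `disableDefault` is an imagined operator-level switch, not an existing flag.

```go
package main

import "fmt"

// pdbSpec is a hypothetical, simplified stand-in for the collector's
// podDisruptionBudget settings (integer values only).
type pdbSpec struct {
	MinAvailable   *int
	MaxUnavailable *int
}

// defaultPDB returns the budget to apply to a collector, or nil for none.
// disableDefault is a hypothetical operator-level opt-out, not a real flag.
func defaultPDB(userSpec *pdbSpec, disableDefault bool) *pdbSpec {
	if userSpec != nil {
		return userSpec // explicit user settings always win
	}
	if disableDefault {
		return nil // operator configured to create no default PDB
	}
	zero := 0
	return &pdbSpec{MinAvailable: &zero} // proposed less restrictive default
}

func main() {
	// No user setting, defaults enabled: falls back to minAvailable: 0.
	d := defaultPDB(nil, false)
	fmt.Println(*d.MinAvailable, d.MaxUnavailable == nil)

	// No user setting, defaults disabled: no PDB at all.
	fmt.Println(defaultPDB(nil, true) == nil)

	// A user-provided budget is returned unchanged.
	one := 1
	u := defaultPDB(&pdbSpec{MaxUnavailable: &one}, true)
	fmt.Println(*u.MaxUnavailable)
}
```

Either branch alone would address the problem described here: the first changes the default, the second lets cluster operators opt out of default PDBs entirely.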

I understand that users of the OpenTelemetryCollector resource should ensure their updates/rollouts work when changing it, but unfortunately that's not something I've been able to force users to do 😄, so a less restrictive default seems like the easier option.

Kubernetes Version

1.31

Operator version

0.117

Collector version

0.117.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

Log output

Additional context

No response

Metadata

Assignees

No one assigned

    Labels

    area:collector (Issues for deploying collector), discuss-at-sig (This issue or PR should be discussed at the next SIG meeting), enhancement (New feature or request)
