Improve Operator Pod Mutation Observability #3702

Open
@cjp421

Component(s)

auto-instrumentation

Is your feature request related to a problem? Please describe.

I'm currently rolling out the OpenTelemetry Operator across all of the Kubernetes (OpenShift) clusters in our environment. Auto-instrumenting our application workloads will become crucial to our ability to support our systems. If something happens to the operator that results in pods NOT getting auto-instrumented, we'd potentially be "flying blind".

I'd like finer-grained insight into the counts of auto-instrumentation attempts and failures so that we can build proper alerting (SLOs).

Describe the solution you'd like

Instrument the pod mutator to create/increment metrics that indicate when a pod contains the instrumentation annotation and is therefore eligible for auto-instrumentation. Some initial ideas on the types of scenarios/metrics to expose:

  • pod contained instrumentation/sidecar annotation (may or may not be valid config) -> increment some counter saying "the podmutator will attempt to process"
  • pod contained invalid "inject" type -> pod mutation didn't happen, increment a counter to reflect this scenario
  • pod contained invalid instrumentation or sidecar reference in the annotation value -> pod mutation didn't happen, increment a counter to reflect this scenario
  • pod contained valid instrumentation or sidecar annotation/reference, but an unexpected error occurred -> pod mutation failed, increment a counter to reflect

I know some of these scenarios may be visible in container or Kubernetes logs, but managing a fleet of operators across multiple clusters is much easier with aggregate metrics that we can feed to our alerting infrastructure.

Describe alternatives you've considered

I'm currently using the metrics provided by the Kubernetes API server admission controller to see the counts of webhook invocations sent to mpod.kb.io. This provides some insight, but not all pod creations are eligible for OTel instrumentation (i.e. they may or may not have the instrumentation.opentelemetry.io annotations).

Additional context

No response

Labels

enhancement (New feature or request), help wanted (Extra attention is needed)
