Add serviceName Field to OpenTelemetryCollector CRD for StatefulSet Deployments #4029

Open
@rewt

Description

Component(s)

collector

Is your feature request related to a problem? Please describe.

When the OpenTelemetry Operator deploys the OpenTelemetry Collector as a StatefulSet (in my case on a GKE cluster), the OpenTelemetryCollector CRD offers no way to set statefulset.serviceName to the name of the headless Service, so the two names differ. Kubernetes requires statefulset.serviceName to match the headless Service name for pods to register DNS hostnames (e.g., ...svc.cluster.local). Without a match, the loadbalancing exporter’s Kubernetes (k8s) resolver with return_hostnames: true cannot resolve pod hostnames, resulting in:

error: couldn't find the exporter for the endpoint ""

when configured as

exporters:
  loadbalancing:
    routing_key: "traceID"
    protocol:
      otlp:
        timeout: 1s
        tls:
          insecure: true
    resolver:
      k8s:
        service: opentelemetry-backend-collector-headless.plat-observe-dev
        timeout: 3s
        return_hostnames: true

For example, my headless Service (opentelemetry-backend-collector-headless.plat-observe-dev) doesn’t match the StatefulSet’s default serviceName (opentelemetry-backend-collector), preventing hostname resolution:

kubectl get statefulset opentelemetry-backend-collector -n plat-observe-dev -o jsonpath='{.spec.serviceName}'
opentelemetry-backend-collector

nslookup opentelemetry-backend-collector-0.plat-observe-dev.svc.cluster.local
** server can't find ...: NXDOMAIN
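The naming rule behind this failure can be sketched in a few lines of Python. This is only an illustration, assuming the standard StatefulSet pattern `<pod>.<serviceName>.<namespace>.svc.<cluster domain>` and the default cluster domain `cluster.local`; the function names are hypothetical and are not part of the Operator or of Kubernetes.

```python
def pod_dns_name(statefulset_name, ordinal, service_name, namespace,
                 cluster_domain="cluster.local"):
    """Build the stable DNS name a StatefulSet pod is expected to get.

    Assumes the usual pattern <pod>.<serviceName>.<namespace>.svc.<domain>;
    cluster_domain is an assumption and may differ per cluster.
    """
    pod_name = f"{statefulset_name}-{ordinal}"
    return f"{pod_name}.{service_name}.{namespace}.svc.{cluster_domain}"


def per_pod_records_expected(statefulset_service_name, headless_service_name):
    """Per-pod records are only expected when statefulset.serviceName
    names the headless Service that governs the pods."""
    return statefulset_service_name == headless_service_name


if __name__ == "__main__":
    # Values taken from the issue above.
    current = "opentelemetry-backend-collector"
    headless = "opentelemetry-backend-collector-headless"
    print(per_pod_records_expected(current, headless))   # names differ
    print(pod_dns_name("opentelemetry-backend-collector", 0,
                       headless, "plat-observe-dev"))
```

With a matching serviceName, the first pod would be expected to appear under the headless Service's name rather than directly under the namespace.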

This breaks traceID routing, which is critical for tail-based sampling in my setup: every span of a trace has to reach the same backend collector pod.
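To illustrate why stable per-pod endpoints matter here, the sketch below maps a trace ID to one backend deterministically. It is a deliberately simplified hash-modulo stand-in, not the loadbalancing exporter's actual routing algorithm, and `pick_backend` and the example hostnames are hypothetical.

```python
import hashlib


def pick_backend(trace_id, backends):
    """Deterministically map a trace ID to one backend endpoint.

    Simplified illustration (hash modulo over a sorted list); this is an
    assumption for demonstration, not the exporter's real implementation.
    """
    if not backends:
        # Mirrors the failure mode in the issue: no resolvable endpoints.
        raise ValueError("no backends resolved")
    ordered = sorted(backends)  # make the result independent of input order
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]


if __name__ == "__main__":
    # Hypothetical per-pod hostnames; real ones depend on the cluster.
    pods = [f"collector-{i}.collector-headless.ns.svc.cluster.local"
            for i in range(3)]
    # Every span carrying this trace ID is sent to the same pod.
    print(pick_backend("4bf92f3577b34da6a3ce929d0e0e4736", pods))
```

If the resolver returns no endpoints, there is nothing to route to, which matches the empty endpoint in the error message above.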

Describe the solution you'd like

Add a serviceName field to the OpenTelemetryCollector CRD that specifies the headless Service name when mode: statefulset. The Operator should set statefulset.serviceName to this value so that Kubernetes registers the pod DNS hostnames. For example:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-backend
  namespace: plat-observe-dev
spec:
  name: "opentelemetry-backend-collector"
  serviceName: "opentelemetry-backend-collector-headless"  # proposed new field
  mode: statefulset
  replicas: 9
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        send_batch_size: 20000
        timeout: 5s
        send_batch_max_size: 25000
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
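The requested behavior could look roughly like the following. This is a hypothetical sketch in Python for readability (the Operator itself is written in Go); `resolve_statefulset_service_name` and the idea of falling back to the current default name are assumptions about how the field might be wired in, not existing Operator code.

```python
def resolve_statefulset_service_name(spec, default_service_name):
    """Choose the value for statefulset.serviceName.

    spec is the OpenTelemetryCollector spec as a dict. An explicit
    spec["serviceName"] (the proposed field) wins; otherwise the existing
    default is kept so current deployments are unaffected.
    """
    if spec.get("mode") != "statefulset":
        return None  # the field only applies to StatefulSet deployments
    override = spec.get("serviceName")
    return override if override else default_service_name


if __name__ == "__main__":
    spec = {
        "mode": "statefulset",
        "serviceName": "opentelemetry-backend-collector-headless",
    }
    print(resolve_statefulset_service_name(
        spec, "opentelemetry-backend-collector"))
```

Keeping the current name as the fallback would make the field opt-in and avoid changing existing StatefulSets.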

Describe alternatives you've considered

Manual workaround: Deploying the StatefulSet outside the Operator (using the official collector Helm chart) with a matching serviceName works, but loses the Operator’s benefits.

Static resolver: Failed because of an incorrect DNS domain (plat-observe.dev vs. svc.cluster.local).

cert-manager: Issuing certificates that include the Service DNS name resolved the TLS errors, but it doesn’t fix hostname resolution.

Istio Gateway: Considered for ingress, but it doesn’t support the pod-specific routing needed for traceID routing.

Additional context

Environment: GKE, Istio 1.19.10-asm.33 (CSM auto mode), mTLS STRICT, OpenTelemetry Collector 0.126.0, Helm chart open-telemetry/opentelemetry-collector.

Setup: StatefulSet with replicas: 9, autoscaling (minReplicas: 3, maxReplicas: 10), headless Service (opentelemetry-backend-collector-headless), loadbalancing exporter (routing_key: "traceID", kubernetes resolver).

Issue: The mismatch between statefulset.serviceName (opentelemetry-backend-collector) and the headless Service name prevents DNS hostname registration, which breaks the Kubernetes resolver.

TLS: Certificates lacked the Service DNS name. This was addressed with cert-manager, but hostname resolution remains critical.

Labels

area:collector (Issues for deploying collector), bug (Something isn't working), enhancement (New feature or request), help wanted (Extra attention is needed)
