
[receiver/prometheus] Invalid metric name validation scheme error in 0.128.0 #40788

Closed
@stephen-herd-eb


Component(s)

No response

What happened?

Description

After upgrading from OpenTelemetry Collector 0.127.0 to 0.128.0, the Prometheus receiver fails to create scrape pools for various ServiceMonitors and PodMonitors with the error "invalid metric name validation scheme". This appears to be a breaking change in metric name validation that affects commonly used Prometheus exporters including Kyverno, Loki, kube-state-metrics, and others.

Steps to Reproduce

  1. Deploy OpenTelemetry Collector 0.128.0 with Prometheus receiver and Target Allocator enabled
  2. Configure Target Allocator to discover ServiceMonitors and PodMonitors from various namespaces
  3. Deploy common Kubernetes monitoring components (Kyverno, Loki, kube-state-metrics, etc.) that ship ServiceMonitors/PodMonitors (a minimal ServiceMonitor sketch follows this list)
  4. Observe collector logs
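
For reproduction, any monitor picked up by the Target Allocator appears to trigger the error. A minimal, hypothetical ServiceMonitor sketch (name, namespace, and port are placeholders, not taken from the affected clusters):

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: example-app            # placeholder name
      namespace: example-namespace # placeholder namespace
    spec:
      selector:
        matchLabels:
          app: example-app         # must match the target Service's labels
      endpoints:
        - port: metrics            # named Service port exposing /metrics
          interval: 30s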

Expected Result

The Prometheus receiver should successfully create scrape pools and collect metrics from all discovered targets, as it did in version 0.127.0.

Actual Result

The Prometheus receiver fails to create scrape pools with the following error:

    error creating new scrape pool {"err": "invalid metric name validation scheme: invalid metric name validation scheme, ", "scrape_pool": "serviceMonitor/kdp-core-kyverno/kyverno-background-controller/0"}

This affects multiple scrape targets including:

  • serviceMonitor/kdp-core-kyverno/*
  • podMonitor/loki/*
  • serviceMonitor/kube-state-metrics/*
  • podMonitor/opentelemetry-collectors/*

The same configuration works without issues in version 0.127.0.

Collector version

0.128.0

Environment information

Environment

OS: Kubernetes (Linux containers)
OpenTelemetry Collector: 0.128.0 (official container image)
Target Allocator: 0.127.0
OpenTelemetry Operator: 0.127.0
Deployment: Kubernetes StatefulSet via OpenTelemetry Operator

OpenTelemetry Collector configuration

---
# Source: opentelemetry-collectors/templates/otel-metrics-collector/otel-metrics-collector.yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-metrics
  labels:
    eb/envtype: dev
    eb/owner: MyTeam
    eb/service: opentelemetry
spec:
  mode: statefulset
  replicas: 3
  ports:
    - appProtocol: grpc
      name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
    - appProtocol: http
      name: otlp-http
      port: 4318
      protocol: TCP
      targetPort: 4318
  targetAllocator:
    enabled: true
    replicas: 1
    resources:
      limits:
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi
    prometheusCR:
      enabled: true
      podMonitorSelector: {}
      serviceMonitorSelector: {}
  resources:
    limits:
      memory: 3Gi
    requests:
      cpu: 750m
      memory: 2Gi
  config:
    receivers:
      prometheus:
        config:
          global: {}
          scrape_configs: []

    exporters:
      otlphttp/metrics:
        endpoint: http://otel-metrics-gateway.example.com:4318        

    processors:
      batch/metrics:
        send_batch_max_size: 1024
        send_batch_size: 512
        timeout: 2s
      memory_limiter:
        check_interval: 1s
        limit_percentage: 80
        spike_limit_percentage: 15

    service:
      pipelines:
        metrics:
          receivers: 
            - prometheus
          processors:
            - batch/metrics
            - memory_limiter
          exporters: 
            - otlphttp/metrics
      telemetry:
        metrics:
          level: detailed

Log output

2025-06-18T00:26:32.298Z    info    service@v0.128.0/service.go:199    Setting up own telemetry...    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}}
2025-06-18T00:26:32.298Z    info    memorylimiter@v0.128.0/memorylimiter.go:149    Using percentage memory limiter    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.kind": "processor", "total_memory_mib": 3072, "limit_percentage": 80, "spike_limit_percentage": 15}
2025-06-18T00:26:32.298Z    info    prometheusreceiver@v0.128.0/metrics_receiver.go:157    Starting discovery manager    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics"}
2025-06-18T00:26:32.299Z    info    targetallocator/manager.go:69    Starting target allocator discovery    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics"}
2025-06-18T00:26:37.299Z    error    error creating new scrape pool    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics", "err": "invalid metric name validation scheme: invalid metric name validation scheme, ", "scrape_pool": "serviceMonitor/example-namespace/example-service/0"}
github.com/prometheus/prometheus/scrape.(*Manager).reload
    github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:188
github.com/prometheus/prometheus/scrape.(*Manager).reloader
    github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:161
2025-06-18T00:26:37.299Z    error    error creating new scrape pool    {"resource": {"service.instance.id": "65910152-bdde-4a41-ad96-99a45d4f8aaf", "service.name": "otelcol-contrib", "service.version": "0.128.0"}, "otelcol.component.id": "prometheus", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics", "err": "invalid metric name validation scheme: invalid metric name validation scheme, ", "scrape_pool": "podMonitor/monitoring-namespace/metrics-exporter/0"}
github.com/prometheus/prometheus/scrape.(*Manager).reload
    github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:188
github.com/prometheus/prometheus/scrape.(*Manager).reloader
    github.com/prometheus/prometheus@v0.304.1/scrape/manager.go:161

Additional context

Version Matrix

  • OpenTelemetry Collector: 0.128.0 (fails)
  • OpenTelemetry Collector: 0.127.0 (works)
  • Target Allocator: 0.127.0
  • OpenTelemetry Operator: 0.127.0

Affected Exporters/Components

This issue affects multiple common Kubernetes monitoring components:

  • Kyverno (policy engine)
  • Loki (log aggregation)
  • kube-state-metrics
  • CloudWatch exporter
  • Custom application metrics

Prometheus Version Context

Prometheus library used by the collector: github.com/prometheus/prometheus@v0.304.1 (from the stack trace), i.e. Prometheus 3.4.1.

Workaround Status

No configuration workaround has been found. Attempted configurations:

  • Adding metric_name_validation_scheme: "utf8" to the receiver's global config (failed; see the sketch after this list)
  • Various Prometheus receiver flags (failed)
  • Transform processors (ineffective: the validation failure occurs when the scrape pool is created, before any processor runs)
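
A sketch of the first attempted workaround, reconstructed from the description above (the field name follows the upstream Prometheus scrape configuration; it did not resolve the error here):

    receivers:
      prometheus:
        config:
          global:
            # attempted: explicitly select the UTF-8 metric name validation scheme
            metric_name_validation_scheme: "utf8"
          scrape_configs: []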

Impact Assessment

This is a blocking issue for upgrading to 0.128.0 in environments with:

  • ServiceMonitor/PodMonitor auto-discovery enabled
  • Common Kubernetes monitoring stack components

It also prevents adoption of the security fixes and features in 0.128.0.

Regression Confirmation

Confirmed regression: the exact same configuration works in 0.127.0 and fails in 0.128.0.
