
enhancement(tag_cardinality_limit transform): A setting for per-metric vs global tag cardinality tracking#25372

Open
ArunPiduguDD wants to merge 1 commit into master from
arun.pidugu/tag-cardinality-tracking-scope

Conversation

@ArunPiduguDD
Contributor

@ArunPiduguDD ArunPiduguDD commented May 5, 2026

Summary

When metrics do not have an explicit per_metric_limits entry, their tag values were always pooled into a single shared bucket. This can lead to scenarios such as:

  • If metric1 and metric2 both carry the host tag, and metric1 pushes the host tag above the cardinality limit, the host tag is dropped on metric2 as well, even if metric2 only has 1-2 unique values for it.
  • If ~100 metrics carry the host tag with 1-2 unique values each, a cardinality limit of 50 drops the tag across all of those metrics.

The new tracking_scope setting lets users opt into per-metric tracking buckets instead, providing isolation at the cost of higher memory.

Default is global (current behavior); per_metric gives every distinct (namespace, name) its own bucket regardless of per_metric_limits membership.
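The keying difference can be sketched in a few lines of Python. This is an illustrative model only; names like bucket_key and accept are hypothetical and do not reflect Vector's actual internals:

```python
# Sketch of how tag values are bucketed under each tracking scope.
# global: all metrics share one bucket per tag key.
# per_metric: the bucket is keyed by (namespace, name) as well.
from collections import defaultdict

VALUE_LIMIT = 5

def bucket_key(scope, namespace, name):
    # Hypothetical helper: per_metric keys on the metric identity,
    # global collapses every metric into a single shared key.
    return (namespace, name) if scope == "per_metric" else None

def accept(seen, scope, namespace, name, tag, value):
    values = seen[(bucket_key(scope, namespace, name), tag)]
    if value in values or len(values) < VALUE_LIMIT:
        values.add(value)
        return True
    return False  # over the limit: the tag (or event) would be dropped

seen = defaultdict(set)
# metric1 blows past the limit on the "host" tag...
for i in range(10):
    accept(seen, "global", None, "metric1", "host", f"h{i}")
# ...so under global scope metric2's single host value is now rejected,
assert accept(seen, "global", None, "metric2", "host", "h-only") is False
# while per_metric scope keeps metric2 isolated in its own bucket.
assert accept(seen, "per_metric", None, "metric2", "host", "h-only") is True
```

This mirrors the first scenario above: under global scope metric1's cardinality exhausts the shared bucket, while per_metric scope isolates metric2 at the cost of one value set per (metric, tag) pair.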

Vector configuration

sources:
  otel:
    type: opentelemetry
    grpc:
      address: "0.0.0.0:4317"
    http:
      address: "0.0.0.0:4318"

  cardinality:
    type: tag_cardinality_limit
    inputs: ["otel.metrics"]
    value_limit: 5
    mode: exact
    limit_exceeded_action: drop_event

    # The new setting under test. Try toggling between `global` (current behavior:
    # all metrics without a `per_metric_limits` entry share one bucket) and
    # `per_metric` (every metric name gets its own bucket).
    tracking_scope: per_metric

    per_metric_limits:
      # Tighter override on this specific metric — applies regardless of `tracking_scope`.
      demo_value_gauge:
        value_limit: 2
        mode: exact
        limit_exceeded_action: drop_event

      demo_value_counter:
        value_limit: 6
        mode: exact
        limit_exceeded_action: drop_tag

sinks:
  console:
    type: console
    inputs: ["cardinality"]
    encoding:
      codec: json

How did you test this PR?

Tested with the above configuration. Simulated an OTel Collector with the following Python script:

import random
import string
from uuid import uuid4

from opentelemetry import metrics
from opentelemetry.metrics import Observation
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import InMemoryMetricReader, MetricExportResult
from opentelemetry.sdk.metrics.view import View, DropAggregation
from opentelemetry.sdk.resources import Resource


VECTOR_METRICS_ENDPOINT = "http://localhost:4318/v1/metrics"


def rand_token(prefix: str, n: int = 8) -> str:
    return f"{prefix}-{''.join(random.choices(string.ascii_lowercase + string.digits, k=n))}"


def random_trace_id() -> str:
    return uuid4().hex + uuid4().hex


def random_environment() -> str:
    return rand_token(random.choice(["dev", "staging", "prod", "local", "qa"]))


def build_common_tags() -> dict[str, str]:
    return {
        "trace_id": random_trace_id(),
        "environment": random_environment(),
    }


def build_system_process_tags() -> dict[str, str]:
    tags = build_common_tags()
    tags.update(
        {
            "host_id": rand_token("host"),
            "process_group": rand_token("pg"),
            "shard": rand_token("shard"),
            "worker": rand_token("worker"),
        }
    )
    return tags


def main() -> None:
    resource = Resource.create(
        {
            "service.name": "vector-http-metrics-demo",
            "service.version": "1.0.0",
        }
    )

    reader = InMemoryMetricReader()

    provider = MeterProvider(
        resource=resource,
        metric_readers=[reader],
        views=[
            View(instrument_name="*", aggregation=DropAggregation()),
            View(instrument_name="demo_value_gauge"),
            View(instrument_name="system.process.count"),
            View(instrument_name="demo_value_counter"),
            View(instrument_name="demo_value_secondary_gauge"),
            View(instrument_name="demo_value_secondary_counter"),
        ],
    )
    metrics.set_meter_provider(provider)

    exporter = OTLPMetricExporter(
        endpoint=VECTOR_METRICS_ENDPOINT,
        timeout=3000,
    )

    meter = metrics.get_meter("demo-meter")

    state = {
        "demo_value_gauge": {
            "value": 0.0,
            "tags": build_common_tags(),
        },
        "system.process.count": {
            "value": 0.0,
            "tags": build_system_process_tags(),
        },
        "demo_value_counter": {
            "value": 0.0,
            "tags": build_common_tags(),
        },
        "demo_value_secondary_gauge": {
            "value": 0.0,
            "tags": build_common_tags(),
        },
        "demo_value_secondary_counter": {
            "value": 0.0,
            "tags": build_common_tags(),
        },
    }

    def demo_value_gauge_callback(_options):
        s = state["demo_value_gauge"]
        return [Observation(s["value"], s["tags"])]

    def system_process_count_callback(_options):
        s = state["system.process.count"]
        return [Observation(s["value"], s["tags"])]

    def demo_value_counter_callback(_options):
        s = state["demo_value_counter"]
        return [Observation(s["value"], s["tags"])]

    def demo_value_secondary_gauge_callback(_options):
        s = state["demo_value_secondary_gauge"]
        return [Observation(s["value"], s["tags"])]

    def demo_value_secondary_counter_callback(_options):
        s = state["demo_value_secondary_counter"]
        return [Observation(s["value"], s["tags"])]

    meter.create_observable_gauge(
        name="demo_value_gauge",
        callbacks=[demo_value_gauge_callback],
        description="Gauge metric exported to Vector over OTLP/HTTP",
        unit="1",
    )

    meter.create_observable_gauge(
        name="system.process.count",
        callbacks=[system_process_count_callback],
        description="Process count metric exported to Vector over OTLP/HTTP",
        unit="1",
    )

    meter.create_observable_counter(
        name="demo_value_counter",
        callbacks=[demo_value_counter_callback],
        description="Counter metric exported to Vector over OTLP/HTTP",
        unit="1",
    )

    meter.create_observable_gauge(
        name="demo_value_secondary_gauge",
        callbacks=[demo_value_secondary_gauge_callback],
        description="Second demo gauge metric exported to Vector over OTLP/HTTP",
        unit="1",
    )

    meter.create_observable_counter(
        name="demo_value_secondary_counter",
        callbacks=[demo_value_secondary_counter_callback],
        description="Second demo counter metric exported to Vector over OTLP/HTTP",
        unit="1",
    )

    print(f"Configured OTLP/HTTP metrics endpoint: {VECTOR_METRICS_ENDPOINT}")
    print("Press Enter to send all five metrics with random values and random tags.")
    print("Type q and press Enter to quit.")

    try:
        while True:
            user_input = input("> ").strip().lower()
            if user_input in {"q", "quit", "exit"}:
                break

            state["demo_value_gauge"]["value"] = round(random.uniform(0, 100), 2)
            state["system.process.count"]["value"] = float(random.randint(1, 500))
            state["demo_value_counter"]["value"] += float(random.randint(1, 20))
            state["demo_value_secondary_gauge"]["value"] = round(random.uniform(-50, 50), 2)
            state["demo_value_secondary_counter"]["value"] += float(random.randint(1, 10))

            state["demo_value_gauge"]["tags"] = build_common_tags()
            state["system.process.count"]["tags"] = build_system_process_tags()
            state["demo_value_counter"]["tags"] = build_common_tags()
            state["demo_value_secondary_gauge"]["tags"] = build_common_tags()
            state["demo_value_secondary_counter"]["tags"] = build_common_tags()

            metrics_data = reader.get_metrics_data()
            if metrics_data is None:
                print("send failed: no metrics data collected")
                continue

            result = exporter.export(metrics_data)

            if result is MetricExportResult.SUCCESS:
                print("sent all metrics")
                print(
                    f"  demo_value_gauge value={state['demo_value_gauge']['value']} "
                    f"tags={state['demo_value_gauge']['tags']}"
                )
                print(
                    f"  system.process.count value={state['system.process.count']['value']} "
                    f"tags={state['system.process.count']['tags']}"
                )
                print(
                    f"  demo_value_counter value={state['demo_value_counter']['value']} "
                    f"tags={state['demo_value_counter']['tags']}"
                )
                print(
                    f"  demo_value_secondary_gauge value={state['demo_value_secondary_gauge']['value']} "
                    f"tags={state['demo_value_secondary_gauge']['tags']}"
                )
                print(
                    f"  demo_value_secondary_counter value={state['demo_value_secondary_counter']['value']} "
                    f"tags={state['demo_value_secondary_counter']['tags']}"
                )
            else:
                print("send failed for this export batch")

    finally:
        exporter.shutdown()
        provider.shutdown()


if __name__ == "__main__":
    main()

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

@ArunPiduguDD ArunPiduguDD requested review from a team as code owners May 5, 2026 17:24
@github-actions github-actions bot added the docs review on hold, domain: transforms, and domain: external docs labels May 5, 2026
@ArunPiduguDD ArunPiduguDD marked this pull request as draft May 5, 2026 17:25
@ArunPiduguDD ArunPiduguDD changed the title feat(tag_cardinality_limit transform): add tracking_scope setting f… feat(tag_cardinality_limit transform): A setting for per-metric vs global tag cardinality tracking May 5, 2026
@ArunPiduguDD ArunPiduguDD changed the title feat(tag_cardinality_limit transform): A setting for per-metric vs global tag cardinality tracking enhancement(tag_cardinality_limit transform): A setting for per-metric vs global tag cardinality tracking May 5, 2026
@ArunPiduguDD ArunPiduguDD force-pushed the arun.pidugu/tag-cardinality-tracking-scope branch from 8044081 to 9a136f2 May 5, 2026 19:03
@ArunPiduguDD ArunPiduguDD marked this pull request as ready for review May 5, 2026 19:22
Diff excerpt the comment is anchored to:

-             Some((metric_namespace, metric_name.clone()))
-         } else {
-             None
+         let metric_key = match self.config.tracking_scope {
Member

One concern about tracking_scope: per_metric: the accepted_tags map can only grow (no cap, TTL, or eviction). In per_metric mode every distinct (namespace, name) seen on the wire becomes a permanent bucket, and within each bucket every tag key allocates its own AcceptedTagValueSet.

The pre-existing code had this growth pattern too, but it was bounded by the user's per_metric_limits config. With per_metric scope the bound becomes dynamic and controlled by upstream metric names, so if a source emits high cardinality metric names (an anti-pattern but one we see in the wild), the transform's memory grows monotonically for the lifetime of the process.

There are a few options here but adding a max_tracked_metrics (or similar) knob seems reasonable. When we hit this limit, we can reject new metric IDs. I am open to discussing an LRU strategy too.

Contributor Author


@pront I'd say technically this problem also existed before, because even with a global tag counter the number of tags being tracked is still unbounded (though I definitely agree it's more likely to be an issue with the new per-metric tracking scope).

Will go with a "max tracked tags" approach that works for either tracking scope: it caps the total number of items tracked, whether those are (metric, tag) pairs under per-metric scope or (None, tag) entries under global scope.

Will leave any strategies like LRU cache out for now
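The proposed cap could look roughly like this. A sketch only, under the assumptions in the comment above; MAX_TRACKED, trackers, and get_tracker are placeholder names, not the merged API:

```python
# Hypothetical sketch of a "max tracked tags" cap applied to the tracking
# map itself: once the total number of (metric_key, tag) entries reaches
# the cap, new entries are rejected instead of allocating another value set.
MAX_TRACKED = 3

trackers = {}  # (metric_key, tag) -> set of accepted tag values

def get_tracker(metric_key, tag):
    entry = (metric_key, tag)
    if entry not in trackers:
        if len(trackers) >= MAX_TRACKED:
            return None  # cap hit: refuse to track a new entry
        trackers[entry] = set()
    return trackers[entry]

# Three entries fit under the cap; the fourth is rejected.
assert get_tracker(("ns", "m1"), "host") is not None
assert get_tracker(("ns", "m2"), "host") is not None
assert get_tracker(("ns", "m3"), "host") is not None
assert get_tracker(("ns", "m4"), "host") is None
```

Existing entries keep working after the cap is hit; only net-new (metric, tag) combinations are refused, which bounds memory without evicting anything (no LRU, per the comment above).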

Contributor Author


Also will leave this field as optional for those that do not want to set it

@ArunPiduguDD ArunPiduguDD force-pushed the arun.pidugu/tag-cardinality-tracking-scope branch from 9a136f2 to b402066 May 7, 2026 19:57
…or per-metric vs global tag tracking

When metrics do not have an explicit `per_metric_limits` entry, their tag values
were always pooled into a single shared bucket. The new `tracking_scope` setting
lets users opt into per-metric tracking buckets instead, providing isolation at
the cost of higher memory.

Default is `global` (current behavior); `per_metric` gives every distinct
(namespace, name) its own bucket regardless of `per_metric_limits` membership.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@ArunPiduguDD ArunPiduguDD force-pushed the arun.pidugu/tag-cardinality-tracking-scope branch from b402066 to 445abd6 May 8, 2026 02:33