enhancement(tag_cardinality_limit transform): Remove HashSet initialization with capacity to improve memory usage #25480

Merged
ArunPiduguDD merged 1 commit into master from optimize-tag-cardinality-memory-usage
May 21, 2026
Conversation

@ArunPiduguDD
Contributor

@ArunPiduguDD ArunPiduguDD commented May 20, 2026

Summary

When the transform creates a new in-memory set to track the values seen for a new tag key, the HashSet is initialized with a capacity of value_limit. However, this causes unnecessary memory usage, especially when value_limit is high and the actual cardinality of each tag key is low.

This PR initializes the HashSet with the default capacity instead, which avoids that unnecessary memory usage.
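
The difference between the two initialization strategies can be sketched in plain Rust. This is an illustrative sketch only, not the transform's actual code: `AcceptedValues`, `try_accept`, `eager_capacity`, and `lazy_capacity` are hypothetical names. It relies on documented standard-library behavior: `HashSet::with_capacity(n)` reserves room for at least `n` elements up front, while `HashSet::new()` starts with a capacity of 0 and does not allocate until the first insert.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for per-tag-key value tracking (not Vector's real type).
struct AcceptedValues {
    value_limit: usize,
    seen: HashMap<String, HashSet<String>>,
}

impl AcceptedValues {
    fn new(value_limit: usize) -> Self {
        Self {
            value_limit,
            seen: HashMap::new(),
        }
    }

    /// Returns true if the value was already seen or still fits under the limit.
    fn try_accept(&mut self, key: &str, value: &str) -> bool {
        let limit = self.value_limit;
        // `or_default` builds the set with `HashSet::default()`, i.e. capacity 0,
        // so a tag key with few distinct values never reserves `value_limit` slots.
        let set = self.seen.entry(key.to_string()).or_default();
        if set.contains(value) {
            return true;
        }
        if set.len() >= limit {
            return false;
        }
        set.insert(value.to_string());
        true
    }
}

/// Capacity reserved up front when the set is sized to `value_limit`.
fn eager_capacity(value_limit: usize) -> usize {
    HashSet::<String>::with_capacity(value_limit).capacity()
}

/// Capacity of a freshly created default set (no allocation yet).
fn lazy_capacity() -> usize {
    HashSet::<String>::new().capacity()
}

fn main() {
    let value_limit = 100_000;
    println!("with_capacity: room for {} values", eager_capacity(value_limit));
    println!("new: room for {} values", lazy_capacity());

    let mut tracker = AcceptedValues::new(value_limit);
    tracker.try_accept("host", "a");
    tracker.try_accept("host", "b");
    // Only two values were seen, so the set has grown just enough to hold them.
    println!("after 2 inserts: room for {} values", tracker.seen["host"].capacity());
}
```

The trade-off is that a set created with the default capacity has to grow and rehash as values arrive, whereas the pre-sized set pays its full memory cost for every tag key even if only a handful of values ever show up.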

Vector configuration

sources:
  otel:
    type: opentelemetry
    grpc:
      address: "0.0.0.0:4317"
    http:
      address: "0.0.0.0:4318"

transforms:
  strip_resource_tags:
    type: remap
    inputs: ["otel.metrics"]
    source: |
      del(.tags."resource.telemetry.sdk.version")
      del(.tags."resource.telemetry.sdk.language")
      del(.tags."resource.telemetry.sdk.name")
      del(.tags."resource.service.name")
      del(.tags."resource.service.version")

  cardinality:
    type: tag_cardinality_limit
    inputs: ["strip_resource_tags"]
    value_limit: 5
    mode: exact
    limit_exceeded_action: drop_event

    # The new setting under test. Try toggling between `global` (current behavior:
    # all metrics without a `per_metric_limits` entry share one bucket) and
    # `per_metric` (every metric name gets its own bucket).
    tracking_scope: per_metric

    per_metric_limits:
      # Tighter override on this specific metric — applies regardless of `tracking_scope`.
      demo_value_gauge:
        value_limit: 2
        mode: exact
        limit_exceeded_action: drop_event

      demo_value_counter:
        value_limit: 6
        mode: exact
        limit_exceeded_action: drop_tag

sinks:
  console:
    type: console
    inputs: ["cardinality"]
    encoding:
      codec: json

How did you test this PR?

Tested with the Vector configuration above while monitoring memory usage.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

@github-actions github-actions Bot added the domain: transforms Anything related to Vector's transform components label May 20, 2026
@ArunPiduguDD ArunPiduguDD changed the title from "Remove HashSet initialization with capacity" to "enhancement(tag_cardinality_limit transform): Remove HashSet initialization with capacity to improve memory usage" May 20, 2026
@ArunPiduguDD ArunPiduguDD force-pushed the optimize-tag-cardinality-memory-usage branch from 1becb46 to fa6e8b7 Compare May 20, 2026 20:29
@ArunPiduguDD ArunPiduguDD marked this pull request as ready for review May 20, 2026 20:30
@ArunPiduguDD ArunPiduguDD requested a review from a team as a code owner May 20, 2026 20:30
@ArunPiduguDD ArunPiduguDD force-pushed the optimize-tag-cardinality-memory-usage branch from fa6e8b7 to d19c326 Compare May 20, 2026 21:43
@ArunPiduguDD ArunPiduguDD added this pull request to the merge queue May 21, 2026
Merged via the queue into master with commit c5959d7 May 21, 2026
82 checks passed
@ArunPiduguDD ArunPiduguDD deleted the optimize-tag-cardinality-memory-usage branch May 21, 2026 13:10
@github-actions github-actions Bot locked and limited conversation to collaborators May 21, 2026

Labels

domain: transforms Anything related to Vector's transform components

2 participants