fix(buffers): Optimize buffer usage metric tracking #24911

Open
bruceg wants to merge 1 commit into master from bruceg/optimize-buffer-usage-data

Conversation

@bruceg
Member

@bruceg bruceg commented Mar 12, 2026

Summary

The buffer usage metrics, in particular the current utilization levels, were tracked using an atomic `u64` that was updated with `fetch_update` to protect against underflow. The same mechanism was then extended to all of the other atomics for consistency. The problem is that `fetch_update` internally loops around a compare-and-exchange operation, which is expensive, particularly under contention. In comparison, the base `fetch_add` is typically a single locked instruction that completes in far fewer cycles.
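For illustration, the two update styles contrasted above can be sketched roughly as follows. This is a minimal sketch and not the code from this PR: the function names `saturating_decrement` and `increment` are hypothetical, and `Ordering::Relaxed` is an assumption made for simplicity.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: lower `level` by `amount`, saturating at zero.
/// Returns the new level.
fn saturating_decrement(level: &AtomicU64, amount: u64) -> u64 {
    // `fetch_update` re-runs the closure inside a compare-and-exchange loop
    // until the stored value was not changed by another thread in between.
    let previous = level
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |current| {
            Some(current.saturating_sub(amount))
        })
        .expect("the closure always returns Some");
    previous.saturating_sub(amount)
}

/// Hypothetical helper: an unconditional add with no retry loop.
/// Returns the new total.
fn increment(total: &AtomicU64, amount: u64) -> u64 {
    // `fetch_add` returns the previous value, so add `amount` to get the new one.
    total.fetch_add(amount, Ordering::Relaxed) + amount
}
```

The first helper cannot underflow, but every contended update may have to retry. The second never retries, but on its own it offers no underflow protection if used for subtraction.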

This change returns these atomics to only ever using `fetch_add`, and then calculates the current level by subtracting the count of decrements from the count of increments.
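A rough sketch of that approach, assuming two monotonically increasing counters, might look like the following. The type `UsageCounters` and its method names are hypothetical stand-ins and are not taken from the PR. The use of `saturating_sub` when reading is also an assumption: it guards against a reader briefly observing more removals than additions, because the two loads are not a single atomic snapshot.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the buffer usage data; names are illustrative.
#[derive(Default)]
struct UsageCounters {
    /// Total amount ever added to the buffer (only grows).
    received: AtomicU64,
    /// Total amount ever removed from the buffer (only grows).
    removed: AtomicU64,
}

impl UsageCounters {
    fn record_received(&self, count: u64) {
        // Hot path: a single atomic add, no compare-and-exchange loop.
        self.received.fetch_add(count, Ordering::Relaxed);
    }

    fn record_removed(&self, count: u64) {
        // Removals are also recorded as an add, on a separate counter.
        self.removed.fetch_add(count, Ordering::Relaxed);
    }

    fn current_level(&self) -> u64 {
        let removed = self.removed.load(Ordering::Relaxed);
        let received = self.received.load(Ordering::Relaxed);
        // Clamp at zero instead of wrapping if `removed` appears larger.
        received.saturating_sub(removed)
    }
}
```

With this layout the cost of underflow protection moves from every update to the comparatively rare point where the level is read for reporting.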

Vector configuration

N/A

How did you test this PR?

Unit tests

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

#24058

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@bruceg bruceg requested a review from a team as a code owner March 12, 2026 21:02
@bruceg bruceg added the meta: regression (This issue represents a regression) and domain: performance (Anything related to Vector's performance) labels Mar 12, 2026
@bruceg bruceg force-pushed the bruceg/optimize-buffer-usage-data branch from 7e9a011 to 4a15afc Compare March 12, 2026 21:04
@bruceg bruceg force-pushed the bruceg/optimize-buffer-usage-data branch from 4a15afc to 3936121 Compare March 12, 2026 21:06
@bruceg
Member Author

bruceg commented Mar 12, 2026

@@ -0,0 +1,3 @@
Fixed regression in performance of buffer usage metric tracking.
Member

Did we validate if this PR fixes the regression?

Ref #24911 (comment)

@pznamensky

Thanks for the effort, @bruceg!
I'd be happy to try this out in our production env with real workload if needed.
However I have some problems building proper docker image. So if you could trigger a docker build job, I could use that image to use in our setup.

@pront
Member

pront commented Mar 16, 2026

Thanks for the effort, @bruceg! I'd be happy to try this out in our production env with real workload if needed. However I have some problems building proper docker image. So if you could trigger a docker build job, I could use that image to use in our setup.

Hi @pznamensky I kicked off a custom build: https://github.com/vectordotdev/vector/actions/runs/23146056109

You can use those builds once they are published. Looking forward to hearing back from you after you test this 🤞

@pznamensky

@pront, thank you for preparing the images.
Bad news is that in our case Vector from this PR uses CPU on the same level as v0.54.0.
Average CPU usage in our cluster:
[Image: average CPU usage in the cluster]
So it looks like the original issue might not be in the metrics.

Labels

domain: performance (Anything related to Vector's performance)
meta: regression (This issue represents a regression)
