
Add ability to throttle exports when reading from disk. #663


Open · wants to merge 8 commits into base: main

Conversation

Victorsesan

This adds an implementation that provides a flexible way to manage bandwidth usage when exporting spans, allowing for smoother data flow and preventing resource hogging. The size-estimation logic can be further refined for a specific use case.
Relates to #638
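
A minimal sketch of the shape such a wrapper could take, assuming a token-bucket-style budget per time window; the class name, the size heuristic, and all details below are illustrative assumptions, not necessarily the PR's actual code:

```java
import io.opentelemetry.sdk.common.CompletableResultCode;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SpanExporter;
import java.util.Collection;

/** Illustrative throttling wrapper around a delegate exporter. */
final class ThrottlingSpanExporter implements SpanExporter {
  private final SpanExporter delegate;
  private final long maxBytesPerSecond;
  private final long timeWindowInMillis;

  private long windowStart = System.currentTimeMillis();
  private long bytesThisWindow = 0;

  ThrottlingSpanExporter(SpanExporter delegate, long maxBytesPerSecond, long timeWindowInMillis) {
    this.delegate = delegate;
    this.maxBytesPerSecond = maxBytesPerSecond;
    this.timeWindowInMillis = timeWindowInMillis;
  }

  @Override
  public synchronized CompletableResultCode export(Collection<SpanData> spans) {
    long now = System.currentTimeMillis();
    if (now - windowStart >= timeWindowInMillis) {
      windowStart = now; // new window: reset the byte budget
      bytesThisWindow = 0;
    }
    long estimated = spans.stream().mapToLong(ThrottlingSpanExporter::estimateSize).sum();
    long budget = maxBytesPerSecond * timeWindowInMillis / 1000;
    if (bytesThisWindow + estimated > budget) {
      // Over budget: this sketch refuses the batch, which mirrors the
      // dropping concern raised later in the review.
      return CompletableResultCode.ofFailure();
    }
    bytesThisWindow += estimated;
    return delegate.export(spans);
  }

  private static long estimateSize(SpanData span) {
    // Placeholder heuristic only; real size depends on the wire encoding.
    return span.getName().length() + 64L * span.getTotalAttributeCount();
  }

  @Override
  public CompletableResultCode flush() {
    return delegate.flush();
  }

  @Override
  public CompletableResultCode shutdown() {
    return delegate.shutdown();
  }
}
```

It would be installed in place of the plain exporter, e.g. `new ThrottlingSpanExporter(otlpExporter, 1024, 1000)`.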

Victorsesan and others added 3 commits October 27, 2024 03:16
…idth usage when exporting spans, allowing for smoother data flow and preventing resource hogging. It can further refine the size estimation logic based on a specific use case.

Relate to Add ability to throttle exports when reading from disk. open-telemetry#638
@Victorsesan requested a review from a team as a code owner on October 27, 2024 02:57
…nterface and Duration with a plain long value for timeWindowInMillis

Ref: Add ability to throttle exports when reading from disk. open-telemetry#638
Comment on lines 89 to 92
final SpanExporter delegate;
CategoryFunction categoryFunction = span -> "default";
long maxBytesPerSecond = 1024; // Default to 1 KB/s
long timeWindowInMillis = 1000; // Default to 1 second
Member

should they be private since they have a setter anyway (using the builder pattern)?

Author

Since the fields are intended to be set only through the builder methods, I have made them private so they can only be modified through the provided builder methods. This enhances encapsulation and helps maintain the integrity of the object's state.

Member

delegate can also be private I guess

@breedx-splk
Contributor

@Victorsesan are you able to come back to this any time soon? Thanks!

@breedx-splk added the 'needs author feedback' label (Waiting for additional feedback from the author) on Jan 21, 2025
@Victorsesan
Author

Hey @breedx-splk, yes I will. I think my last change needed a maintainer review; still waiting on that.

@github-actions bot removed the 'needs author feedback' label (Waiting for additional feedback from the author) on Jan 21, 2025
}

static class Builder {
final SpanExporter delegate;
Member

Suggested change
- final SpanExporter delegate;
+ private final SpanExporter delegate;
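
For illustration, the builder might end up shaped roughly like this after the suggested change; the constructor and setter names below are assumptions, not the PR's exact code:

```java
static class Builder {
  private final SpanExporter delegate;
  private CategoryFunction categoryFunction = span -> "default";
  private long maxBytesPerSecond = 1024; // Default to 1 KB/s
  private long timeWindowInMillis = 1000; // Default to 1 second

  Builder(SpanExporter delegate) {
    this.delegate = delegate;
  }

  Builder setCategoryFunction(CategoryFunction categoryFunction) {
    this.categoryFunction = categoryFunction;
    return this;
  }

  Builder setMaxBytesPerSecond(long maxBytesPerSecond) {
    this.maxBytesPerSecond = maxBytesPerSecond;
    return this;
  }

  Builder setTimeWindowInMillis(long timeWindowInMillis) {
    this.timeWindowInMillis = timeWindowInMillis;
    return this;
  }

  // build() hands the private fields to the exporter's constructor, so the
  // fields are never mutated from outside the builder.
}
```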

@marandaneto
Member

@marandaneto left a comment

#663 (comment) pending, otherwise LGTM

@breedx-splk
Contributor

@Victorsesan seems like we're close, but the build is broken again.

@marandaneto
Member

@Victorsesan, let us know if you can fix CI and rebase as well. Otherwise, @bidetofevil will 'hijack' it in good faith and get it mergeable.

@bidetofevil
Contributor

So I had a look at the PR, and I think it needs a few additional changes to be production-ready: namely, the algorithm to determine the size of a span in bytes is just a placeholder, and when the threshold is reached, the exported spans are not cached but simply dropped and never passed on to the delegate.

If it were still an in-progress change, it might be reasonable to merge, but unless there is a commitment to get this production-ready, I don't think it should be in the repo.
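
To make the size-estimation concern concrete, a heuristic like the one below is about as far as a placeholder gets; the helper is an assumption for illustration, not the PR's code, and the real encoded size depends on the protocol (e.g. OTLP protobuf):

```java
import io.opentelemetry.sdk.trace.data.SpanData;

final class SpanSizeEstimator {
  private SpanSizeEstimator() {}

  // Crude guess at a span's size; the per-item byte counts are made up.
  static long roughSizeInBytes(SpanData span) {
    long size = span.getName().length();
    size += 16 + 8; // trace id + span id
    size += 32L * span.getTotalAttributeCount(); // per-attribute guess
    size += 64L * span.getTotalRecordedEvents(); // per-event guess
    size += 32L * span.getTotalRecordedLinks(); // per-link guess
    return size;
  }
}
```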

@LikeTheSalad
Contributor

> So I had a look at the PR, and I think it needs a few additional changes to be production-ready: namely, the algorithm to determine the size of a span in bytes is just a placeholder, and when the threshold is reached, the exported spans are not cached but simply dropped and never passed on to the delegate.
>
> If it were still an in-progress change, it might be reasonable to merge, but unless there is a commitment to get this production-ready, I don't think it should be in the repo.

I agree. Also, to add to those points:

  • The algorithm to determine the size of a span might not be straightforward to create, and even if we come up with a nice one, it might not be as processing-friendly as other options, such as the batch/time approach mentioned in the issue.
  • Dropping data should not be part of this solution. The closest I think we can get to an implementation that addresses this issue without dropping data would be to somehow break the export loop before all the available data on disk is exported (see the sketch below).
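
One way to picture "breaking the loop": give each export cycle an explicit batch budget so it stops before draining the disk, leaving the remainder for a later cycle instead of dropping it. Everything below, including the DiskReader and Batch types, is a hypothetical sketch rather than this repo's actual API:

```java
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SpanExporter;
import java.util.Collection;

// Placeholder types for the sketch, not this repo's actual classes.
interface Batch {
  Collection<SpanData> spans();
}

interface DiskReader {
  /** Next pending batch, or null when nothing is left on disk. */
  Batch readNextBatch();

  /** Whether more batches remain on disk. */
  boolean hasPending();
}

final class CappedExportLoop {
  private CappedExportLoop() {}

  /** One export cycle that stops after a fixed number of batches. */
  static void exportCycle(DiskReader reader, SpanExporter delegate, int maxBatchesPerCycle) {
    for (int i = 0; i < maxBatchesPerCycle; i++) {
      Batch batch = reader.readNextBatch();
      if (batch == null) {
        break; // nothing left on disk
      }
      delegate.export(batch.spans());
      // Batches beyond the cap stay on disk for a later cycle, so no data
      // is dropped; delivery is only deferred.
    }
  }
}
```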

Contributor

@LikeTheSalad left a comment

Thank you for creating this PR, @Victorsesan. The approach proposed here brings some important concerns, mentioned in the latest comments, that make it infeasible to merge unless we change the overall approach.

Going with a different approach would most likely require discarding all the existing changes in this PR, which is totally understandable if that’s more work than you planned for. So please let us know if you’re up for spending more time on it — if not, no worries, we can close this one and revisit it in a future PR.

@Victorsesan
Author

> Thank you for creating this PR, @Victorsesan. The approach proposed here brings some important concerns, mentioned in the latest comments, that make it infeasible to merge unless we change the overall approach.
>
> Going with a different approach would most likely require discarding all the existing changes in this PR, which is totally understandable if that’s more work than you planned for. So please let us know if you’re up for spending more time on it — if not, no worries, we can close this one and revisit it in a future PR.

Hi @LikeTheSalad, I can give it another go. Since the PR has been open for so long, I would be happy to see it completed regardless.

@LikeTheSalad
Contributor

> Hi @LikeTheSalad, I can give it another go. Since the PR has been open for so long, I would be happy to see it completed regardless.

Got it, thank you @Victorsesan. If I understood correctly, it seems like you would like to try a different approach within this same PR; if that's the case, I'll keep it open. Cheers!

@bidetofevil
Contributor

A couple of suggestions that I think might simplify the solution:

  1. We can approach this from the read-from-disk side of the house.
  • Basically, replace the timed-job mechanism for exporting batches and instead have the read side be triggered on demand, reading from disk when it's ready. When a batch is written to disk, the writer informs the reader that there is a batch ready to go. The reader can decide whether it's ready to process it, and do so when it is. Once it exports a batch, it can schedule itself to determine when it should check next, and so on, until there are no more batches to read. The reader will be triggered again when a new batch is written to disk.
  • The advantage of this is that you won't read from disk until you're ready to send, thereby limiting data loss if there's a crash during export, say, if you were using a buffering exporter to send data out.
  2. Instead of counting by bytes, just count by spans. And instead of cutting it off right at the limit, just let the last batch go through.
  • We are just approximating things here to limit data flow, so there's no need to eat the complexity of trying to be that fine-grained. I think counting logs and spans is sufficient for most cases, even if not entirely accurate. (A sketch combining both suggestions follows this list.)
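
A sketch combining both suggestions, reusing the hypothetical DiskReader and Batch types from the earlier sketch: the write side calls onBatchWritten() after persisting a batch, and the reader drains up to a span-count budget, letting the batch that crosses the limit through in full before rescheduling itself. None of these names are this repo's actual classes:

```java
import io.opentelemetry.sdk.trace.export.SpanExporter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class OnDemandReader {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final AtomicBoolean scheduled = new AtomicBoolean(false);
  private final DiskReader reader;
  private final SpanExporter delegate;
  private final int maxSpansPerCycle;
  private final long delayBetweenCyclesMillis;

  OnDemandReader(DiskReader reader, SpanExporter delegate,
      int maxSpansPerCycle, long delayBetweenCyclesMillis) {
    this.reader = reader;
    this.delegate = delegate;
    this.maxSpansPerCycle = maxSpansPerCycle;
    this.delayBetweenCyclesMillis = delayBetweenCyclesMillis;
  }

  /** Called by the write side whenever a new batch has been persisted. */
  void onBatchWritten() {
    if (scheduled.compareAndSet(false, true)) {
      scheduler.execute(this::drain);
    }
  }

  private void drain() {
    int exported = 0;
    // Count spans rather than bytes; the batch that crosses the limit is
    // still exported in full, per suggestion 2 above.
    while (exported < maxSpansPerCycle) {
      Batch batch = reader.readNextBatch();
      if (batch == null) {
        break; // nothing left on disk
      }
      delegate.export(batch.spans());
      exported += batch.spans().size();
    }
    scheduled.set(false);
    // If batches remain, check again later instead of draining everything.
    if (reader.hasPending() && scheduled.compareAndSet(false, true)) {
      scheduler.schedule(this::drain, delayBetweenCyclesMillis, TimeUnit.MILLISECONDS);
    }
  }
}
```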

@Victorsesan
Author

Thanks for the suggestions, @bidetofevil. I'll keep that in mind while working on it.
