Tail sampling #1768
Comments
I ended up putting together a proof of concept using a different approach than described above. That commit included two relevant changes. The first was a new layer which attaches shared trace data to every span in a hierarchy. This enables sharing data across spans, or aggregating spans for export by a collector like opentelemetry. The second was basically a flag that can be attached to that shared trace data to indicate to the collection layer that the trace should be dropped or exported. Naturally, this sort of all-on-or-all-off approach is limiting, but I think it's a reasonable demonstration of something that works and could be extended without adding much overhead. If you end up jumping over to that repository to read the code, all the interesting stuff is in the copy of the `OpenTelemetryLayer` borrowed from here.
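As a rough, std-only sketch of the shared-trace-data idea described above (the struct name, method names, and the HTTP-status condition are illustrative assumptions, not taken from the actual proof of concept):

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

/// Hypothetical per-trace state shared by every span in a hierarchy.
/// In the proof of concept this would be attached by a dedicated layer;
/// here it is a plain struct so the idea can run standalone.
#[derive(Clone)]
struct SharedTraceData {
    /// All-on-or-all-off flag read by the collection layer at export time.
    should_export: Arc<AtomicBool>,
}

impl SharedTraceData {
    fn new() -> Self {
        Self {
            should_export: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Any span in the trace can mark the whole trace for export,
    /// e.g. on an error response or an error-level event.
    fn mark_for_export(&self) {
        self.should_export.store(true, Ordering::Relaxed);
    }

    fn should_export(&self) -> bool {
        self.should_export.load(Ordering::Relaxed)
    }
}

fn main() {
    let trace = SharedTraceData::new();
    // A child span observes something interesting (hypothetical condition):
    let http_status = 500;
    if http_status >= 400 {
        trace.mark_for_export();
    }
    println!("export? {}", trace.should_export());
}
```

Because the flag lives behind an `Arc`, clones of the handle stored in parent and child spans all observe the same decision, which is what makes the drop-or-export call available at trace close rather than at span creation.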
Hi all, I was considering opening a fresh issue with a proposal closer to our current tail sampling implementation on our tracing-opentelemetry fork. Before doing so, I wanted to ask: is there any interest in adding support for tail sampling in mainline tracing? I believe the tail sampling behavior could be completely opt-in and zero-overhead for those that don't need it. We certainly need it, and I'd love to contribute here if there's interest.
Feature Request
Better support for tail sampling based on properties computed late in a request's lifecycle, such as HTTP response codes, as opposed to upfront criteria like level filters.
Motivation
High-volume systems might want to drop uninteresting collections of spans to reduce load on downstream systems. This is already possible to some extent: filtering can do probabilistic sampling up front, or a level filter can reduce the number of spans for a trace. But for decisions based on things like HTTP response codes, e.g. dropping nominal 2xx responses, there doesn't seem to be much support at all.
My particular use case is actually for a Kafka consumer which is uninterested in a large fraction of messages on the topic.
Proposal
The way I've been thinking about this problem is that a `Subscriber`/`Collector` implementation could buffer spans, events, etc. before handing off to a new `Export`/`Report` type to actually report on the trace. In some ways, this is not too dissimilar from what the `tracing-opentelemetry::Layer` impl is already doing -- data is buffered in the `Layer` until the span is closed, at which point the opentelemetry builder is finalized, the span is started/dropped, and then handed off to the opentelemetry library to export.

This could be achieved without any additional traits by implementing a `Layer` as described and chaining it with other reporting layers. However, that is merely convention rather than a hard abstraction. Based on my current understanding of the code, it seems to me the best approach would be to separate trace collection from trace exporting more formally.
The benefit of an extra `Report` trait that operates on groups of spans is that it makes it easy to implement arbitrary tail-sampling logic between, e.g., the collection step and the opentelemetry reporter.

Alternatives
`tracing-opentelemetry::Layer`, however.

Closing notes
I actually came here to ask whether this was already possible and I was just missing something, but that wasn't one of the options in the issue templates 🙃. I had done enough code reading and thinking about the problem, however, to make a proposal, so here it is. If this is something the project is interested in, I believe I could provide an implementation, given a bit of guidance.
Thanks!