Call tree summary, e.g. after each request #639
Comments
|
Thanks for the great feature request! I really appreciate the level of thought and detail that you've put into this issue; this is very helpful! :)
I believe @davidbarsky has been experimenting with something similar. I'm not sure how much additional work is needed to get that crate ready for release; it may just be adding documentation and polish.
I think David's crate is currently closer to a tree-structured log than a flamegraph-like profile: it performs no aggregation of multiple instances of the same span, and includes events as children of spans. However, I think we could add a lot of the features you're proposing on top of what he's got, and it could probably expose some configuration options to change the level of aggregation, etc.
Here are some assorted notes on what you've written up:
Note that
Span metadata already has a
A much simpler option: we could just start by tracking any span that is a root (i.e. doesn't have a parent). We could also add the ability to layer in a filter using the existing filter infrastructure, to allow selecting which roots should be tracked by disabling those we don't care about.
I think users could easily configure the behavior you're describing, with a special
And some general open questions:
|
Yup. I can release it now as an alpha to crates.io with minimal documentation, but before I feel comfortable saying it's "ready", I'd like to add additional documentation.
I'd gladly accept these PRs. The proposal @kolloch has laid out is a direction I'd like to see |
|
Thank you for your interest and your thoughtful comments!
@hawkw @davidbarsky I just had a look at the code and I don't see much overlap, unfortunately. No aggregation, the output format is quite different, ... I might be wrong! Maybe when I start implementing I will realize that I need exactly what David wrote already ;)
Yes, I am aware.
Very good suggestion, sounds like a match. Thanks!
That is a good default, true.
Nice, I have to read up on the code to fully grasp what you mean.
I am thinking of true call paths in which path entries correspond to code locations. Probably I could do the matching on callsite. It might be sometimes useful to distinguish spans by further means but I would only implement it if that happens to be common in practice. I'd guess by default not and there might be an option or so to include certain fields in the call path element identity?
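A minimal sketch of that identity question, using only the standard library. The names `PathElement`, `SpanInfo`, `path_element` and `distinct_elements` are hypothetical and not part of tracing; a file and line pair stands in for the callsite, and the `include` list models an opt-in option for extra fields.

```rust
use std::collections::HashMap;

/// Hypothetical identity of one call path element.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct PathElement {
    /// Code location (file, line) standing in for a callsite identifier.
    callsite: (&'static str, u32),
    /// Empty by default; only fields that were opted in end up here.
    fields: Vec<(&'static str, String)>,
}

/// Hypothetical, simplified view of what a subscriber knows about a span.
struct SpanInfo {
    file: &'static str,
    line: u32,
    fields: Vec<(&'static str, String)>,
}

/// Builds the identity of a span: always the callsite, plus any field
/// whose key is listed in `include`.
fn path_element(span: &SpanInfo, include: &[&str]) -> PathElement {
    let fields = span
        .fields
        .iter()
        .filter(|(key, _)| include.contains(key))
        .cloned()
        .collect();
    PathElement { callsite: (span.file, span.line), fields }
}

/// Counts how often each distinct path element occurs.
fn distinct_elements(spans: &[SpanInfo], include: &[&str]) -> HashMap<PathElement, u64> {
    let mut counts = HashMap::new();
    for span in spans {
        *counts.entry(path_element(span, include)).or_insert(0) += 1;
    }
    counts
}

/// Three spans created at the same (made-up) code location with different fields.
fn sample_spans() -> Vec<SpanInfo> {
    ["elections", "election_options", "election_options"]
        .iter()
        .map(|table| SpanInfo {
            file: "src/db.rs",
            line: 42,
            fields: vec![("table", table.to_string())],
        })
        .collect()
}

fn main() {
    let spans = sample_spans();
    // Default: callsite only, so all three spans share one path element.
    println!("{}", distinct_elements(&spans, &[]).len());
    // Opt-in: the "table" field becomes part of the identity.
    println!("{}", distinct_elements(&spans, &["table"]).len());
}
```

With the default (no extra fields) the three sample spans collapse into one element; opting in to the `table` field splits them by table.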
Yeah, true. We could e.g. count all warnings/errors at a call path. The name of events is not very useful without having the source code side-by-side: definition of the name. Maybe we could use a certain special field value or do you have other ideas? In similar cases, I have seen the formatting string used as informal "name" -- with the placeholders still shown as placeholders. E.g. Unfortunately, that is not yet available within the |
|
Thanks to your encouragement, I finally got around to putting something together: https://github.com/kolloch/reqray
I think it should be useful in its current form. One implementation detail that I am uncomfortable with is the use of locking through the span extensions: Since this involves locking under the hood and I use extensions not only from the current span, but also from the parent and the root spans, I could imagine this results in deadlocks if another layer also uses these extensions. Can I assume that if I only use my own crate types in the extensions map, I am free from such interference from other extensions? Thanks!
|
This looks really cool, @kolloch!
Yup. While the types placed within an extension can be shared between layers, I'm not sure how often it happens in practice due to those types being private. |
|
Thank you! The
    fn extensions(&self) -> Extensions<'_> {
        Extensions::new(self.inner.extensions.read().expect("Mutex poisoned"))
    }
    fn extensions_mut(&self) -> ExtensionsMut<'_> {
        ExtensionsMut::new(self.inner.extensions.write().expect("Mutex poisoned"))
    }
The code I wrote originally acquired the extensions of the root/parent spans in addition to the "active" span. If other extensions did that as well and tried to acquire the locks in opposite order, it would result in deadlocks. I avoid that now by always releasing the lock before acquiring other extensions.
I guess that this should probably be documented in the
Also, I am quite open to contribute the code of |
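The "release before acquiring" rule described above can be sketched with plain standard library locks. This is only an illustration under assumptions: `SpanData` and `propagate` are hypothetical names, and the `RwLock<HashMap<..>>` merely stands in for a span's extension storage; it is not the code of reqray or tracing-subscriber.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the per-span extension storage.
struct SpanData {
    extensions: RwLock<HashMap<&'static str, u64>>,
}

impl SpanData {
    fn new() -> Self {
        SpanData { extensions: RwLock::new(HashMap::new()) }
    }
}

/// Adds the child's "own_ms" value to the parent's "children_ms" total
/// without ever holding both locks at the same time.
fn propagate(child: &SpanData, parent: &SpanData) {
    // Step 1: copy the needed value out of the child. The read guard is a
    // temporary, so it is dropped at the end of this statement.
    let own_ms = child
        .extensions
        .read()
        .expect("lock poisoned")
        .get("own_ms")
        .copied()
        .unwrap_or(0);

    // Step 2: only now lock the parent. Since no other lock is held here,
    // the order in which other code locks spans cannot deadlock with us.
    let mut parent_ext = parent.extensions.write().expect("lock poisoned");
    *parent_ext.entry("children_ms").or_insert(0) += own_ms;
}

fn main() {
    let parent = SpanData::new();
    let child = SpanData::new();
    child.extensions.write().expect("lock poisoned").insert("own_ms", 7);

    propagate(&child, &parent);
    propagate(&child, &parent);

    let total = parent
        .extensions
        .read()
        .expect("lock poisoned")
        .get("children_ms")
        .copied();
    println!("{:?}", total);
}
```

The trade-off is that the values seen under the two locks are not read atomically together, which seems acceptable for summary statistics.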
Feature Request
Provide an easy way to print a call tree summary after each processed request or another meaningful unit of work in your application. The call tree should be based on the tracing spans. Events are ignored.
(I am writing this feature proposal after the nice encouragement by @hawkw)
Output
In the past, a text rendering of roughly the information that you would find in a flame graph has proven very helpful. Here is my first proposal for an output format:
- [# calls]: the total number of calls for this call tree path.
- wall ms: the total wall time that a span with this call path was alive in ms (Subscriber::new_span until try_close). Edit: was "sum ms" before but this is misleading.
- own ms: the total wall time during which execution was inside a span with this call path (Subscriber::enter until Subscriber::exit).
- span name: The name from the Metadata. We could also add the callsite for disambiguation but this is probably not necessary. some relatively short identifier for the span -- probably we should allow customizing this so that each user can create a function that creates an appropriate short name for each span.
The order of tree nodes: There should be only one entry for each call path but the spans should be sorted by the first time they were seen. Think of storing the children of each call path in something like a linked_hash_map. That way, the order of the children resembles the order in which they were called. For repeated calls, the order is often still quite readable since it is typical that some subsequence of calls is simply repeated in a loop.
That, in practice, gives you a lovely outline of how your request was processed.
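The aggregation and ordering described above can be sketched as follows. This is a minimal sketch under assumptions, not a proposed implementation: `CallPathNode`, `record`, `render` and `summarize` are hypothetical names, the timings are made up, and a linearly scanned `Vec` stands in for a linked hash map to keep children in first-seen order.

```rust
use std::time::Duration;

/// One node per distinct call path; children keep first-seen order.
#[derive(Debug, Default)]
struct CallPathNode {
    name: &'static str,
    calls: u64,
    wall: Duration,
    own: Duration,
    // A Vec preserves insertion order; a linear scan is fine for the small
    // fan-out of a typical span tree.
    children: Vec<CallPathNode>,
}

impl CallPathNode {
    fn new(name: &'static str) -> Self {
        CallPathNode { name, ..Default::default() }
    }

    /// Returns the child with this name, appending it if not seen before.
    fn child_mut(&mut self, name: &'static str) -> &mut CallPathNode {
        let idx = match self.children.iter().position(|c| c.name == name) {
            Some(i) => i,
            None => {
                self.children.push(CallPathNode::new(name));
                self.children.len() - 1
            }
        };
        &mut self.children[idx]
    }

    /// Records one closed span whose call path below this node is `path`.
    fn record(&mut self, path: &[&'static str], wall: Duration, own: Duration) {
        match path.split_first() {
            None => {
                self.calls += 1;
                self.wall += wall;
                self.own += own;
            }
            Some((head, rest)) => self.child_mut(*head).record(rest, wall, own),
        }
    }

    /// Appends one line per call path, indented by depth.
    fn render(&self, depth: usize, out: &mut String) {
        out.push_str(&format!(
            "{:>7} | {:>7} | {:>6} | {}{}\n",
            self.calls,
            self.wall.as_millis(),
            self.own.as_millis(),
            "  ".repeat(depth),
            self.name
        ));
        for child in &self.children {
            child.render(depth + 1, out);
        }
    }
}

/// Builds a summary for one made-up request.
fn summarize() -> String {
    let ms = Duration::from_millis;
    let mut root = CallPathNode::new("request");
    root.record(&[], ms(50), ms(2));
    root.record(&["elections.query"], ms(10), ms(1));
    // Repeated calls on the same path are aggregated into one entry.
    for _ in 0..3 {
        root.record(&["election_options.query"], ms(5), ms(1));
    }
    let mut out = String::from("# calls | wall ms | own ms | span name\n");
    root.render(0, &mut out);
    out
}

fn main() {
    print!("{}", summarize());
}
```

Here the three `election_options.query` calls collapse into a single line with a call count of 3, listed after `elections.query` because that path was seen first.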
Crates
I think that we should create a new sub crate for this since the functionality is orthogonal to the rest and only users who want this should pay the price. Alternatively, it might be enabled by a feature on the tracing-subscriber crate.
The data model built by this subscriber might be useful for other summaries and might be extracted into another crate or a lib in the tracing-subscriber crate. I'd start with keeping the code together in one place, though.
Motivation
Let's assume a simple data model for the above example. Every election has several election_options for voting. Users can comment on every election_option with an election_comment.
election_options.query calls for only one elections.query call. Fortunately, the comments do not seem to be queried individually since their cardinality is the same as the parent.
wall ms is useful: We do see where the majority of the latency comes from.
own ms is useful: We can confirm that our app mostly waits for the database. At least for async this should work well, for sync code we should rely on sub spans.
Proposal
A new configurable subscriber should be created for this. I assume that the Layer/Subscriber infrastructure in tracing-subscriber is a good match but I haven't looked into the details.
Introspection into spans:
- summary roots: By default, the subscriber should start tracking at all root spans. the subscriber should start tracking call paths for any spans that have a marker field, like e.g. summary_root. This should be overridable.
- short names: The subscriber needs to get a short name for each span. This should be user configurable. (not necessary, the metadata for spans already includes a "name" that is suitable for this)
New spans which are not children of summary roots can be completely ignored.
When a summary root is try_closed, the summary as defined above should be printed.
Synthetic example
should result in the following tree (times are obviously unrealistic):