Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tokio-trace: how to use Spans to represent concurrency? #80

Closed
jsgf opened this issue Feb 20, 2019 · 9 comments
Closed

tokio-trace: how to use Spans to represent concurrency? #80

jsgf opened this issue Feb 20, 2019 · 9 comments
Assignees
Labels
crate/core Related to the `tracing-core` crate kind/docs

Comments

@jsgf
Copy link
Contributor

jsgf commented Feb 20, 2019

I've been reading the code for tokio-trace, and I've got some questions about Spans and how they're intended to be used.

It looks like they're pretty good at representing strictly nested time-periods, as you'd get with a function call tree. But its unclear to me if the direct children of a parent Span can overlap with each other (ie, representing long-term concurrent operations).

For example, if I have a Future which is managing 10 concurrent connections which it initiates at once and then waits for each of them to produce a result, could I represent that as a parent Span with 10 sub-Span representing each connection (which in turn may have sub-spans, etc)? It seems like that would't work because there's always a notion of "the" current Span.

Or is this not the intended level of granularity? Should I be using Span just to track specific calls to Future/Stream::poll and derive tracing at that level? I can see how that would be possible, but it seems like it would be hard to reconstruct application-level states from that level of granularity.

The example doesn't really help clarify this, because it omits all actual async behaviour.

On a more general level, is there a way to represent a dependency relationship in the trace? For example, if task A is waiting for something from task B, can that be recorded?

@jsgf
Copy link
Contributor Author

jsgf commented Feb 20, 2019

Ah, I just noticed follows_from. I'm guessing the intention is that each individual state transition is recorded with Span (ie, the actual poll calls), but they're linked together into a high-level operation via follows_from. Is that correct?

@carllerche
Copy link
Member

I don't think follows_from is the best fit in this case. A good example for follows_from is a task sending a message to a worker and the worker does some work to process that message. The span representing the work to process that message has a parent that is the worker task but follows from the span that sent the message.

The case you describe does sound like a parent / child relationship. The work in the spawned task is entirely in service of the spawner. Being able to represent this as a parent / child was an original design goal. Thinking about it, I wonder if it was lost a bit over iterations.

IIRC, my thinking was that the spawner and the spawned task would get linked via a manual step. There would be an API to say that span A is a child of span B. However, parent / child was removed from -core. My thought was that it could be entirely handled by the subscriber by observing how spans are entered / exited.

Perhaps this isn't sufficient.

There are a few questions in my mind:

  • How does the instrumentation side of tokio-trace say Span A is a child of Span B.
  • Is observing parent/child via span enter / exit correct? I don't think not doing this is an option.
    • What happens if a child span is nested in Span A, then it is moved and is re-rooted in Span B? I assume a span cannot have multiple parents? But how would this be handled?

@hawkw
Copy link
Member

hawkw commented Feb 20, 2019

@jsgf this is an excellent question, thanks!

The answer right now is that this is something that the core tokio-trace libraries are unopinionated about. Since tracking the current span is something that can be done solely by a Subscriber and isn't necessary to implement anything in tokio-trace-core, we thought this was best left up to Subscriber implementations to determine the correct behaviour for their particular use-case. This is why the docs mention a notion of a "current span" but don't actually define what this means (something which I think ought to be resolved by improving the documentation).

With that said, here are some of my thoughts that have fallen out of experimenting with subscriber implementations:

  • You're right that if a single global current span is tracked for the entire process, we can't really represent concurrently executing contexts on multiple threads.

    I think that in most cases, subscribers which expect to be used in multithreaded contexts should track a separate current span per thread. In my subscriber implementations, I've used something like this:

      thread_local! {
          static CURRENT_SPAN: RefCell<Vec<Span>> = RefCell::new(Vec::new());
      }
    
      // This can be used for determining the parent of new spans, for determining 
      // the current span for formatting events, etc...
      fn current_span() -> Option<Span> {
          CURRENT_SPAN.with(|stack| {
              stack.borrow().last().map(Span::clone)
          })
      }
    
      impl Subscriber for MySubscriber {
    
          fn enter(&self, span: &Span) {
              CURRENT_SPAN.with(|stack| {
                  stack.borrow_mut().push(span.clone());
              })
          }
    
          fn exit(&self, _span: &Span) {
              CURRENT_SPAN.with(|stack| {
                  let _ = stack.borrow_mut().pop();
              })
          }
    
          // ....
      } 

    Eventually, I'd like to ship a library of composable utilities for writing subscribers, with code for things like "tracking the current span on a per-thread basis" to be used in subscriber implementations. That's something I'm still experimenting with, though.

  • Typically, I think of the "parent" of a span as being the span in which that span was created, meaning that in most cases, it's likely that there'll be both a stack of the spans the current thread has entered (as in the example above) and a tree of spans with parent-child relationships, but again, this is left up to the subscriber.

    I've found that when printing events in a context, the context that makes the most logical sense to someone consuming logs is to print the thread's current span and all its parents. This means that (using the example of a Hyper server) we'll get logs that look like this:

     INFO server{local=V4(127.0.0.1:3000)}: hyper_echo: listening...
     TRACE server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50498)}: hyper_echo: accepted connection
     INFO server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50498)}:request{method=POST uri=/echo/}: hyper_echo: received request
     TRACE server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50500)}: hyper_echo: accepted connection
     INFO server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50500)}:request{method=POST uri=/echo/reversed}: hyper_echo: received request
     DEBUG server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50500)}:request{method=POST uri=/echo/reversed}:response{response_kind="reversed"}: hyper_echo: reversed request body chunk=Ok("hello world") body=Ok("dlrow olleh")
     TRACE server{local=V4(127.0.0.1:3000)}:connection{remote=V4(127.0.0.1:50500)}:request{method=POST uri=/echo/reversed}:response{response_kind="reversed"}: hyper_echo: response completed
    

    even if the accepting task and the spawned request-handler futures are running on different threads in a threadpool. However, if we care about what individual worker threads are doing, we might also want to inspect the stack of spans that thread has entered as well.

In the use-case you described, representing the parent future with a span and having each spawned future correspond to a new span inside the spawning future's span definitely seems like what I would do.

I would probably do something like this (using the Future::instrument combinator from the tokio-trace-futures crate in the nursery):

use futures::future::{self, Future};
use tokio_trace_futures::Instrument;

fn parent_task(how_many: usize) -> impl Future<Item=(), Error=()> {
    future::lazy(move || {
        info!("spawning subtasks...");
        let subtasks = (1..=how_many).map(|i| {
            debug!(message = "creating subtask;", number = i);
            subtask(i)
        }).collect::<Vec<_>>();
        future::join_all(subtasks)
    })
    .map(|result| {
        debug!("all subtasks completed");
        let sum: usize = result.into_iter().sum();
        info!(sum = sum);
    })
    .instrument(span!("parent_task", subtasks = how_many))
}

fn subtask(number: usize) -> impl Future<Item=usize, Error=()> {
    future::lazy(move || {
        info!("polling subtask...");
        Ok(number)
    }).instrument(span!("subtask", number = number))
}

Running this as an example (using the tokio-trace-fmt subscriber from the nursery) gets logs like this:

 INFO parent_task{subtasks=10}: spawny_thing: spawning subtasks...
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=1
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=2
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=3
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=4
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=5
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=6
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=7
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=8
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=9
DEBUG parent_task{subtasks=10}: spawny_thing: creating subtask; number=10
DEBUG parent_task{subtasks=10}:subtask{number=1}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=2}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=3}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=4}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=5}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=6}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=7}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=8}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=9}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}:subtask{number=10}: spawny_thing: polling subtask...
DEBUG parent_task{subtasks=10}: spawny_thing: all subtasks completed
 INFO parent_task{subtasks=10}: spawny_thing: sum=55

which seems about like what I would expect to see from this code.

@hawkw
Copy link
Member

hawkw commented Feb 20, 2019

As a side note, sorry that the nursery crates I mentioned above are not currently very well documented or discoverable; they're still works in progress (is is what happens when one tries to materialize an entire library ecosystem from scratch. :P).

@jsgf
Copy link
Contributor Author

jsgf commented Feb 21, 2019

Thanks for your explanations. I think I have a clearer idea about how things fit together. Specifically the understanding I have now is:

  • A Span corresponds to an instance of a Future (or Stream), with entry periods corresponding to calls to poll
    • Though Spans are more general and could be used to model similar things
  • Sub-spans correspond to sub-Future/Streams for sub-parts of the overall job
  • Span trees are scoped to a Task - it doesn't make sense for a Span to reference one in another task (except by follows_from)
  • Tasks are always bound to a single thread at a time, so only one Span at a given level of the tree can be entered at a time

Does that sound about right?

@carllerche
Copy link
Member

@hawkw

My gut is that that using a thread local would be a good default for a subscriber to use if no parent / child relationship is specified, but there should be a way using the instrumentation API to explicitly create a parent / child relationship or to create an unrooted span.

I don't think we use span creation to determine parent / child given that spans are not entered when created. Subscribers probably have to observe the first time a span is entered and find the parent then.

@hawkw
Copy link
Member

hawkw commented Feb 22, 2019

@carllerche

there should be a way using the instrumentation API to explicitly create a parent / child relationship or to create an unrooted span.

I'm open to adding this.

I don't think we use span creation to determine parent / child given that spans are not entered when created. Subscribers probably have to observe the first time a span is entered and find the parent then.

I don't agree --- in my experience, considering the span in which a span was created to be its parent leads to behaviour more closely in line with the user's expectations. For example, consider code like

span!("span_1").enter(|| {

    let fut = foo
        .and_then(...)
         // ... 
        .instrument(span!("span 2"));

    tokio::spawn(fut);
});

Logically, I would expect span_1 to be the parent of span_2. However, if we determine span parenthood based on the current span when a span is entered, then span_2 will likely be rootless*. In my experience, in order to get the nice trace trees we expect to see, where i.e. the task spawned to handle a request is a child of the task that accepts incoming connections, we probably want to determine parenthood based on the current span when a span is created.

But, regardless, this is behaviour that ought to be determined by the subscriber.


*Or, will be a child of whatever span the worker thread that began executing it was currently inside of when the future was polled for the first time, which could be very surprising indeed.

@carllerche
Copy link
Member

@hawkw

When would such a snippet ever happen? Spans are created in future constructors. In practice, it would be something like this:

TcpStream::connect(...)
    .instrument(span!("child"))
    .and_then(|...| {
    })
    .instrument(span!("parent"))

If the parent / child link happens at construction time, it will fail here.

@carllerche carllerche transferred this issue from tokio-rs/tokio Jun 24, 2019
@hawkw hawkw added crate/core Related to the `tracing-core` crate kind/docs labels Jun 24, 2019
@hawkw
Copy link
Member

hawkw commented Jul 3, 2019

I think we've put some work into fixing up the docs so that this is a little clearer. I'm closing this ticket for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate/core Related to the `tracing-core` crate kind/docs
Projects
None yet
Development

No branches or pull requests

3 participants