Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rt: do not trace tasks while locking OwnedTasks #6036

Merged
merged 9 commits into from Oct 6, 2023
28 changes: 19 additions & 9 deletions tokio/src/runtime/task/trace/mod.rs
Expand Up @@ -275,6 +275,8 @@ pub(in crate::runtime) fn trace_current_thread(
task.as_raw().state().transition_to_notified_for_tracing();
// store the raw tasks into a vec
tasks.push(task.as_raw());
// do NOT poll `task` here, since we hold a lock on `owned` and the task
// may complete and need to remove itself from `owned`.
});

tasks
Expand Down Expand Up @@ -317,20 +319,28 @@ cfg_rt_multi_thread! {
drop(synced);

// notify each task
let mut traces = vec![];
let mut tasks = vec![];
owned.for_each(|task| {
// set the notified bit
task.as_raw().state().transition_to_notified_for_tracing();
// store the raw tasks into a vec
tasks.push(task.as_raw());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be taking a refcount to the task instead of using as_raw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the API for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add this:

impl<S: 'static> Clone for Task<S> {
    fn clone(&self) -> Task<S> {
        self.raw.ref_inc();
        Task::new(self.raw)
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. I realized that transition_to_notified_for_tracing actually performs a refcount increment. Because of that, you don't need the additional refcount increment. However, I would probably prefer to see this refcount management a bit more wrapped up in a less error-prone api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Unfortunately, now I'm quite confused by how refcount management is supposed to work. I removed only the increment in transition_to_notified_for_tracing, but doing so causes the tests to SIGABRT:

free(): invalid pointer

What's the relationship between notifying a task, polling a task, and its refcount?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior of each function is documented in harness.rs. There's also a bunch of documentation at the top of mod.rs.

Basically, each Task (or Notified) object owns a refcount. Various methods will consume the refcount. We want the refcount that transition_to_notified_for_tracing creates to be owned by a Task somehow, rather than implicitly via a RawTask. This is because it generally makes code much more clear, and it also makes the code more robust towards error paths due to RAII cleanup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks! So, to summarize, the crash occurred because:

  • cloning Task incremented the refcount
  • polling the task decremented the refcount
  • dropping Task decremented the refcount

...which is one too many decrements. transition_to_notified_for_tracing needs to increment the refcount (much like transition_to_notified_by_ref), because polling is going to consume that count.

// do NOT poll `task` here, since we hold a lock on `owned` and the task
// may complete and need to remove itself from `owned`.
});

// trace the task
let ((), trace) = Trace::capture(|| task.as_raw().poll());
traces.push(trace);
tasks
.into_iter()
.map(|task| {
// trace the task
let ((), trace) = Trace::capture(|| task.poll());

// reschedule the task
let _ = task.as_raw().state().transition_to_notified_by_ref();
task.as_raw().schedule();
});
// reschedule the task
let _ = task.state().transition_to_notified_by_ref();
task.schedule();
jswrenn marked this conversation as resolved.
Show resolved Hide resolved

traces
trace
})
.collect()
}
}