task: fix LocalSet having a single shared task budget #2462

hawkw · 2020-04-29T19:31:00Z

Motivation

Currently, an issue exists where a LocalSet has a single cooperative
task budget that's shared across all futures spawned on the LocalSet
and by any future passed to LocalSet::run_until or
LocalSet::block_on. Because these methods will poll the run_until
future before polling spawned tasks, it is possible for that task to
always deterministically starve the entire LocalSet so that no local
tasks can proceed. When the completion of that future itself depends
on other tasks on the LocalSet, this will then result in a deadlock,
as in issue #2460.

A detailed description of why this is the case, taken from this
comment:

LocalSet wraps each time a local task is run in budget:

tokio/tokio/src/task/local.rs

Line 406 in 947045b

Some(task) => crate::coop::budget(|| task.run()),

This is identical to what tokio's other schedulers do when running
tasks, and in theory should give each task its own budget every time
it's polled.

However, LocalSet is different from other schedulers. Unlike the
runtime schedulers, a LocalSet is itself a future that's run on
another scheduler, in block_on. block_on also sets a budget:

tokio/tokio/src/runtime/basic_scheduler.rs

Line 131 in 947045b

if let Ready(v) = crate::coop::budget(|| future.as_mut().poll(&mut cx)) {

The docs for budget state that:

tokio/tokio/src/coop.rs

Line 73 in 947045b

/// Enabling budgeting when it is already enabled is a no-op.

This means that inside of a LocalSet, the calls to budget are
no-ops. Instead, each future polled by the LocalSet is subtracting
from a single global budget.

LocalSet's RunUntil future polls the provided future before polling
any other tasks spawned on the local set:

tokio/tokio/src/task/local.rs

Lines 525 to 535 in 947045b

    
           if let Poll::Ready(output) = me.future.poll(cx) { 
        
               return Poll::Ready(output); 
        
           } 
        
           if me.local_set.tick() { 
        
               // If `tick` returns `true`, we need to notify the local future again: 
        
               // there are still tasks remaining in the run queue. 
        
               cx.waker().wake_by_ref(); 
        
           } 
        
           Poll::Pending

In this case, the provided future is JoinAll. Unfortunately, every
time a JoinAll is polled, it polls every joined future that has not
yet completed. When the number of futures in the JoinAll is >= 128,
this means that the JoinAll immediately exhausts the task budget. This
would, in theory, be a good thing --- if the JoinAll had a huge
number of JoinHandles in it and none of them are ready, it would limit
the time we spend polling those join handles.

However, because the LocalSet actually has a single shared task
budget, this means polling the JoinAll always exhausts the entire
budget. There is now no budget remaining to poll any other tasks spawned
on the LocalSet, and they are never able to complete.

Solution

This branch solves this issue by resetting the task budget when polling
a LocalSet. I've added a new function to coop for resetting the task
budget to UNCONSTRAINED for the duration of a closure, and thus
allowing the budget calls in LocalSet to actually create a new
budget for each spawned local task. Additionally, I've changed
LocalSet to also ensure that a separate task budget is applied to
any future passed to block_on/run_until.

Additionally, I've added a test reproducing the issue described in
#2460. This test fails prior to this change, and passes after it.

Fixes #2460

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

kleimkuhler

Just a heads up there are also a few dbg!s left in this change

hawkw · 2020-04-29T22:40:12Z

Just a heads up there are also a few dbg!s left in this change

agh! thanks for noticing!

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

jonhoo · 2020-04-30T13:06:26Z

Hmm, it's definitely a little weird for LocalSet to be allowed to effectively boundlessly increase the size of its budget. This change would mean that a LocalSet can basically run forever, never yielding back to the runtime, as long as its inner futures keep spawning things and the block_on future does not resolve. That doesn't seem right?

Is there a reason to prefer this way of fixing it over, say, using coop::limit to only give the block_on future access to something like half the remaining budget?

hawkw · 2020-04-30T16:10:25Z

@jonhoo

This change would mean that a LocalSet can basically run forever, never yielding back to the runtime, as long as its inner futures keep spawning things and the block_on future does not resolve.

Hmm. It also seemed wrong, to me, for all tasks on the LocalSet to share a single budget, even if we restrict the amount of that budget that's available to the block_on future? When a future is running on a real scheduler and it spawns a new task, the spawned task will have its own budget independent of the parent, so I didn't think it was correct for a locally spawned future to share budget with its parent. I'm open to reevaluating that decision — this was just my thought process.

jonhoo · 2020-04-30T17:59:05Z

Yeah, I honestly don't know what the right thing to do is here. My instinct is that LocalSet is no less exempt from "you must yield the top-level task every so often" than any other task is. Maybe there's an argument for LocalSet being allowed a larger budget, but even that I'm not sure about? What is the argument for doing so?

If we were to go the route this PR proposes, we would at the very least need to special-case LocalSet further, such as by not allowing the "unstealable slot" optimization when the current task is a LocalSet. Otherwise, such a task may be perpetually starved.

hawkw · 2020-04-30T18:29:03Z

Maybe there's an argument for LocalSet being allowed a larger budget, but even that I'm not sure about? What is the argument for doing so?

As an example of the kind of scenario I was concerned about, consider a case where I've spawned, say, a third of my futures on a LocalSet, and the remainder on the threadpool. Each spawned task has its own budget of 128 polls. However, the LocalSet tasks share a single budget of 128 polls. Even if we used limit to avoid one future starving the rest of the LocalSet, any given local future is allowed to proceed significantly less often than a future running on the threadpool.

When the threadpool scheduler is used, no tasks other than the one passed to block_on is executing on the main thread; all spawned tasks are scheduled on worker threads. So, the local tasks are not competing with other tasks, only with each other. If they all share a single budget of 128 polls, the local set will spend a considerably larger fraction of its time round-tripping tasks through the scheduler than the threadpool workers.

And, it's worth remembering that the local set already has its own limit on the number of tasks it will poll per tick before yielding, separately from the task budget — if more than that number of tasks have been notified since the last local set tick, it will return Pending and wake itself. This already ensures that when running a local set side-by-side with other tasks (e.g. on the basic scheduler), the local set will yield to other tasks rather than constantly polling its subtasks. If that wasn't the case, it would make more sense for the local set to have its own budget, IMO.

I wonder if @carllerche has an opinion?

jonhoo · 2020-04-30T18:34:55Z

Hmm, that's interesting. The observation that LocalSet is only ever used in block_on or on basic_scheduler helps. There are, I suppose, still cases where a LocalSet could be contained within another task in those contexts, but that seems much rarer. I was thinking of a case where a LocalSet was somehow running on a worker thread, but that can't actually happen today, so my thinking there was just wrong.

In light of ^, I think I agree with you that unconstrained budgeting probably makes sense for LocalSet itself.

hawkw added 2 commits April 29, 2020 11:26

add test reproducing #2460

7f71fb7

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

task: fix LocalSets having a single shared budget

30e9f97

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

hawkw requested review from carllerche and jonhoo April 29, 2020 19:31

hawkw self-assigned this Apr 29, 2020

fix 'unused' code on some feature flags

691678d

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

kleimkuhler reviewed Apr 29, 2020

View reviewed changes

remove dbg

ee81cba

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

jonhoo approved these changes Apr 30, 2020

View reviewed changes

hawkw merged commit 20b5df9 into master Apr 30, 2020

carllerche deleted the eliza/localset-joinall branch May 2, 2020 20:19

leoyvens mentioned this pull request Jun 10, 2020

calling futures::executor::block_on in block_in_place may hang #2603

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

task: fix LocalSet having a single shared task budget #2462

task: fix LocalSet having a single shared task budget #2462

hawkw commented Apr 29, 2020

kleimkuhler left a comment

hawkw commented Apr 29, 2020

jonhoo commented Apr 30, 2020

hawkw commented Apr 30, 2020

jonhoo commented Apr 30, 2020

hawkw commented Apr 30, 2020

jonhoo commented Apr 30, 2020

	if let Poll::Ready(output) = me.future.poll(cx) {
	return Poll::Ready(output);
	}

	if me.local_set.tick() {
	// If `tick` returns `true`, we need to notify the local future again:
	// there are still tasks remaining in the run queue.
	cx.waker().wake_by_ref();
	}

	Poll::Pending

task: fix LocalSet having a single shared task budget #2462

task: fix LocalSet having a single shared task budget #2462

Conversation

hawkw commented Apr 29, 2020

Motivation

Solution

kleimkuhler left a comment

Choose a reason for hiding this comment

hawkw commented Apr 29, 2020

jonhoo commented Apr 30, 2020

hawkw commented Apr 30, 2020

jonhoo commented Apr 30, 2020

hawkw commented Apr 30, 2020

jonhoo commented Apr 30, 2020