
Dispatcher: check for outstanding actions before release #3510

Merged: 8 commits merged into typelevel:series/3.4.x from sam/3506/dispatcher-await on Apr 15, 2023

Conversation

@samspills (Contributor) commented Mar 22, 2023

resolves #3506

c0ed15a adds a test demonstrating that a Dispatcher with await = true can finish without completing a submitted effect (so the new test fails before the fix).

Ultimately this is caused by the dispatcher's finalizer running and releasing even though new actions may have been enqueued since the worker last checked.

I paired with @djspiewak and we came to the solution in 0b6eaa4: we now call step one last time, after the dispatcher is done but before releasing, so those last-minute actions are forked appropriately. With this change the new test passes 🎉
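
For illustration, this is roughly the shape of the scenario (an illustrative sketch, not the exact test code; the names are made up):

```scala
import cats.effect.IO
import cats.effect.std.Dispatcher

// Illustrative sketch only: with await = true, an action submitted just before
// `use` completes should still run before the dispatcher's finalizer lets `use`
// return, so `result` should end up true.
val awaitsOutstandingAction: IO[Boolean] =
  for {
    resultR <- IO.ref(false)
    _ <- Dispatcher.sequential[IO](await = true).use { runner =>
           // enqueue an action and immediately fall off the end of `use`,
           // which triggers the dispatcher's release
           IO(runner.unsafeRunAndForget(resultR.set(true)))
         }
    result <- resultR.get // was false before the fix, when the action got dropped
  } yield result
```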

samspills and others added 2 commits March 22, 2023 08:52
Co-authored-by: Daniel Spiewak <djspiewak@gmail.com>
@samspills samspills self-assigned this Mar 22, 2023
@armanbilge (Member) left a comment


Andrew, Sam, and I tripled up to review this PR 😂 and added some notes below.

We also produced a new test in 4374e98 that verifies that dispatchers "reject new tasks while shutting down". It is currently failing for some permutations, but may be fixable by setting alive to false before doing the final step(...) in the release process.

This issue is relevant to this PR, since we want to be sure that the outstanding actions can't keep growing once we start the release process.

Comment on lines +316 to +317
_ <- dispatcher.use { runner => IO(runner.unsafeRunAndForget(resultR.set(true))) }
result <- resultR.get
Member

Because of the race condition, this test does not fail reliably for the parallel dispatcher. We might want to run it for several iterations or use the ticked runtime.

Contributor Author

I reran the test (without the fix) a bunch of times but couldn't reproduce the flake. I also rewrote it using ticked and didn't see any flakes there either, so hopefully that's enough without running it multiple times?
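
(For context, ticked is the repo-internal helper that drives a test on a deterministic TestContext scheduler; a rough public analogue is the testkit's TestControl, sketched here purely for illustration.)

```scala
import cats.effect.IO
import cats.effect.testkit.TestControl

// Illustrative only: run a timing-sensitive program on a deterministic, mocked
// scheduler so the assertion cannot flake. `awaitsOutstandingAction` is the
// hypothetical program sketched in the PR description above.
val deterministicRun: IO[Boolean] =
  TestControl.executeEmbed(awaitsOutstandingAction)
```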

Comment on lines 306 to 307
// published by release
- F.delay(doneR.lazySet(true)) *> release
+ F.delay(doneR.lazySet(true)) *> step(states(n), F.unit) *> release
Member

I'm not sure if lazySet makes sense anymore, since it's not immediately published by release.

Contributor Author

I agree, I don't think it makes sense anymore. Now that we're doing work before the release, I think we want the doneR value to be immediately available.
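
In AtomicBoolean terms, the distinction being discussed (a generic illustration, not the Dispatcher internals):

```scala
import java.util.concurrent.atomic.AtomicBoolean

// lazySet is an ordered, non-volatile write: it is not guaranteed to be visible
// to other threads right away (the old code relied on the release that followed
// it to publish the write). set is a full volatile write, visible immediately.
val doneR = new AtomicBoolean(false)
doneR.lazySet(true) // cheap, but visibility piggybacks on a later publishing operation
doneR.set(true)     // the worker's doneR.get() observes this at once
```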

Comment on lines 295 to +297
// if we're marked as done, yield immediately to give other fibers a chance to shut us down
// we might loop on this a few times since we're marked as done before the supervisor is canceled
- F.delay(doneR.get()).ifM(F.cede, step)
+ F.delay(doneR.get()).ifM(F.cede, step(state, await))
Member

Tangential, but I wonder if we can complete this done loop by returning a sentinel F[Unit] value that the supervisor can use as an indication that it should no longer restart this fiber.

Contributor Author

Oh I forgot to answer here :) When we talked about this it made sense to me, so I do want to try experimenting with that as a follow-up!
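
For the record, a purely hypothetical sketch of the general shape of that idea (nothing in this PR or in Dispatcher today):

```scala
import cats.effect.IO

// Hypothetical sketch: the worker loop returns a distinguished value once it
// observes the done flag, and whatever restarts the loop stops resubmitting
// when it sees that value.
sealed trait WorkerSignal
case object KeepGoing extends WorkerSignal
case object Halt extends WorkerSignal

def superviseWorker(step: IO[WorkerSignal]): IO[Unit] =
  step.flatMap {
    case KeepGoing => superviseWorker(step) // restart the worker
    case Halt      => IO.unit               // done flag observed: stop restarting
  }
```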

@samspills (Contributor Author)

Heads up: I have a draft PR up (#3521) with the alive test and a potential fix. It includes the commits from this PR, but I kept it separate so the concerns don't get muddled :)

@samspills (Contributor Author) commented Apr 5, 2023

The work on alive turned out to be pretty modular and not terribly hairy, so I've merged it into this PR 🎉

More details are in the writeup for #3521, but the tl;dr:
I added a test that submits a 1.second task to a dispatcher, and then concurrently releases the dispatcher while submitting a second (rogue) task (roughly sketched after the list below). This test always failed for sequential dispatchers, but highlighted a race condition with alive for parallel dispatchers:

  • wait long enough before submitting the rogue task (100 milliseconds): the dispatcher is not alive and the rogue task isn't submitted at all
  • wait not at all: the dispatcher is alive and kicking and the rogue task is submitted and completed
  • wait a goldilocks period of 500-900ish nanos: the rogue task can be submitted but not completed. This case shouldn't be allowed to happen.
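
Roughly the shape of that test (illustrative names, await mode, and timings; not the exact code merged here):

```scala
import scala.concurrent.duration._
import cats.effect.IO
import cats.effect.std.Dispatcher
import cats.syntax.all._

// Illustrative only: release the dispatcher while a long task is in flight and,
// after a tunable delay, try to sneak in a rogue task. The rogue submission
// should either be rejected outright or run to completion; "accepted but never
// run" is the broken middle case described in the list above.
def rogueTaskRace(delay: FiniteDuration): IO[Unit] =
  Dispatcher.parallel[IO].allocated.flatMap {
    case (dispatcher, release) =>
      for {
        _ <- IO(dispatcher.unsafeRunAndForget(IO.sleep(1.second)))
        _ <- (release, IO.sleep(delay) *> IO(dispatcher.unsafeRunAndForget(IO.unit)).attempt).parTupled.void
      } yield ()
  }
```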

The fix is moving the alive resource so that it is the last thing initialized. This means it will be the first to release, setting alive to false even before the workers start releasing. I think this is good: it means the dispatcher stops accepting new tasks before the workers start releasing, so no new tasks can be enqueued while the workers are running their final steps.
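
The ordering argument in miniature (a minimal sketch, not the actual Dispatcher internals):

```scala
import cats.effect.{IO, Resource}

// Resource finalizers run in reverse acquisition order, so acquiring the alive
// flag last means it is flipped back to false before the workers release.
val orderingSketch: Resource[IO, Unit] =
  for {
    _        <- Resource.make(IO.println("workers started"))(_ =>
                  IO.println("workers releasing (final step(...) runs here)"))
    aliveRef <- Resource.eval(IO.ref(false))
    _        <- Resource.make(aliveRef.set(true))(_ =>
                  aliveRef.set(false) *> IO.println("alive = false (first finalizer to run)"))
  } yield ()
```

Running orderingSketch.use(_ => IO.unit) prints the alive = false line before the worker-release line, mirroring the release order described above.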

@armanbilge (Member)

Sam points out that the alive changes appear to have fixed #3501 as well 🎉 We should capture that in a test; I'm happy to do that in a follow-up PR as well.

@djspiewak (Member) left a comment


Super sorry for taking so long to get back to this! Trying to climb out of the hole I fell into…

Thank you for taking this on!

@djspiewak djspiewak merged commit 5516608 into typelevel:series/3.4.x Apr 15, 2023
51 checks passed
@samspills samspills deleted the sam/3506/dispatcher-await branch April 30, 2023 16:29