
Improve Semaphore.release behavior #380

Closed
yarhrn opened this issue Sep 27, 2018 · 10 comments · Fixed by #424

Comments

@yarhrn commented Sep 27, 2018

This program

for {
  semaphore <- Semaphore[IO](0)
  _ <- {
    for {
      _ <- semaphore.acquire
      _ <- IO.unit.foreverM
    } yield ()
  }.start
  _ <- IO(Thread.sleep(1000))
  _ <- semaphore.release
} yield ()

hangs forever (for example, when run from a main method) because semaphore.release adds permits and completes waiting readers synchronously, on the releasing fiber.

(We discovered this while debugging flaky tests. If we remove IO(Thread.sleep(1000)) from the example above, there is a race between semaphore.acquire and semaphore.release, which is even harder to detect and quite error-prone.)

In our application I am trying to build a consumer-like loop based on Semaphore.acquire and Monad.iterateWhile. Concurrent is required to construct a Semaphore, so the synchronous completion logic is quite unexpected.

My proposal is:

  1. make cats.effect.concurrent.Semaphore.AbstractSemaphore#open abstract
  2. add a default implementation to AsyncSemaphore
  3. add an implementation to ConcurrentSemaphore that wraps the wake-up in Concurrent.start

As a result, waking up waiting readers would be asynchronous, which should handle situations like this one.

WDYT?

P.S. Thank you for the amazing lib!

@pchlupacek

Do I understand correctly that the problem is that the semaphore blocks when releasing at 0?

@SystemFw (Contributor) commented Sep 28, 2018

The "problem" is that readers are woken in order and without a further shift, so by waiting on the semaphore you eliminate the effect of start, unless you shift again.
I've dealt with the same problem in this conversation https://gitter.im/typelevel/cats-effect?at=5ba5119833da0f649e3257df (warning: it's long).

Note that this behaviour is consistent:

  def p1 = {
    for {
      sem <- Semaphore[IO](0)
      _ <- (sem.acquire *> IO.unit.foreverM).start
      _ <- timer.sleep(1.second)
      _ <- sem.release
    } yield ()
  }.timeout(5.seconds).unsafeRunSync

  def p2 = {
    for {
      d <- Deferred[IO, Unit]
      _ <- (d.get *> IO.unit.foreverM).start
      _ <- timer.sleep(1.second)
      _ <- d.complete(())
    } yield ()
  }.timeout(5.seconds).unsafeRunSync

  def p3 = {
    for {
      p <- IO(scala.concurrent.Promise[Unit]())
      _ <- (IO.fromFuture(IO(p.future)) *> IO.unit.foreverM).start
      _ <- timer.sleep(1.second)
      _ <- IO(p.success(()))
    } yield ()
  }.timeout(5.seconds).unsafeRunSync

They all time out.
I'm not sure this should be treated as a "problem", though. Waking up in order is important, and you can recover what you need simply with release.start.
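The release.start suggestion can be illustrated outside cats-effect with a toy model (all names here are illustrative, not the cats-effect API): if release runs the waiter's callback synchronously, the releaser can keep control by performing the release on a separate "fiber", modelled below with a plain Thread.

```scala
import scala.collection.mutable

// Toy model, not the cats-effect implementation: release runs the
// oldest waiter's callback synchronously on the releasing thread.
class ToySem {
  private val waiters = mutable.Queue.empty[() => Unit]

  def acquire(onAcquired: () => Unit): Unit =
    waiters.enqueue(onAcquired)

  def release(): Unit =
    if (waiters.nonEmpty) waiters.dequeue()()
}

// The analogue of `sem.release.start`: perform the release on another
// thread, so a non-terminating waiter cannot hijack the caller.
def releaseStarted(sem: ToySem): Thread = {
  val t = new Thread(() => sem.release())
  t.setDaemon(true) // a looping waiter must not keep the JVM alive
  t.start()
  t
}
```

With a direct sem.release(), a waiter that loops forever would never return control to the releaser; with releaseStarted the caller proceeds immediately, which is what release.start buys in IO.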

On a separate note, do we have a problem with MVar, @alexandru @oleg-py?
This code doesn't time out but blocks forever; am I missing something?

  def p4 = {
    for {
      mvar <- MVar[IO].empty[Unit]
      _ <- (mvar.take *> IO.unit.foreverM).start
      _ <- timer.sleep(1.second)
      _ <- mvar.put(())
    } yield ()
  }.timeout(5.seconds).unsafeRunSync

It seems like it's uncancelable, which it shouldn't be.

EDIT:
To be clear, I didn't put "problem" in quotes to be dismissive; I'm genuinely unsure whether this should just be archived as expected behaviour or addressed.

@alexandru (Member)

I don't understand the problem.

release shouldn't block on anything and it shouldn't be asynchronous either.

@alexandru (Member)

Ah, OK, I understood the problem.

First of all, in all samples, if you replace IO.unit.foreverM with IO.never they work.

What happens is that release calls the callback registered by acquire. Unfortunately this is problematic when that callback invokes something that never terminates and contains no async boundaries.

This problem does not happen with Monix's Task due to its default auto-forking behavior 😉

The fix could be to introduce an async boundary when calling those releases. Unfortunately we need an ExecutionContext or ContextShift that we don't have and introducing one breaks binary compatibility.

I'm also unsure if we should fix it, but yes, we probably need to improve the behavior in version 2.0.
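The mechanism described in this comment can be sketched in plain Scala (a toy model with illustrative names, not the actual cats-effect internals): release invokes the callback that acquire registered, directly on the releasing thread, so a callback that never returns hangs the releaser.

```scala
import scala.collection.mutable

// Toy callback-based semaphore (illustrative only). The key detail
// mirrors the real one: release calls the waiter's callback directly,
// on the releasing thread, with no async boundary in between.
class CallbackSemaphore(private var permits: Int) {
  private val waiters = mutable.Queue.empty[() => Unit]

  def acquire(onAcquired: () => Unit): Unit =
    if (permits > 0) { permits -= 1; onAcquired() }
    else waiters.enqueue(onAcquired)

  def release(): Unit =
    if (waiters.nonEmpty) waiters.dequeue()() // runs waiter code synchronously!
    else permits += 1
}
```

If the registered callback enters an infinite loop, the call to release() above never returns, which is the hang in the original snippet.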

@alexandru (Member)

@SystemFw to answer your question: no, there's nothing wrong with MVar; it's the same problem 😉

@alexandru (Member)

Btw, I just modified Monix's MVar and Semaphore: both now take a ContextShift parameter in their builders.

This is the right approach for as long as calling the callback injected in Async.async doesn't fork — and the designers of scala.concurrent.Future would be smiling on reading this issue 🙂

@SystemFw (Contributor) commented Oct 28, 2018

First of all, in all samples, if you replace IO.unit.foreverM with IO.never they work.

I put that there on purpose: the original example I linked was dealing with genuinely blocking code (albeit a long-running CPU-bound computation), and when I created that snippet IO wasn't auto-cancelable, I don't think (and it's still not fair), so it was a way of reproducing the issue.

This problem does not happen with Monix's Task due to its default auto-forking behavior

Yeah, also mentioned that in the linked convo, don't worry ;) (although we had to lower the batch size)

there's nothing wrong with MVar, it's the same problem

Wdym? Why doesn't it time out like the other implementations?

Both now take a ContextShift parameter in their builders.
This is the right approach for as long as calling the callback injected in Async.async doesn't fork — and the designers of scala.concurrent.Future would be smiling on reading this issue

Yeah, the original fs2 Promise also shifted (using an EC) when waking up callbacks, I think, but then we lost sight of when that might be useful and it went away at some point. That being said, this case is imho a lot rarer than with start, so I'm not entirely sure whether shift (or start) should be left to the user or embedded in the constructors.

@alexandru (Member)

@SystemFw I discovered a partial fix and implemented it for MVar: #402

This fixes this particular scenario. I don't know if there are other scenarios we should take care of, but it fixes this one.

What I did was to insert "light async boundaries" before put and take. In essence the scenario stops happening because calling that callback in put ends up pushing a Runnable onto the internal TrampolineEC, delaying it just enough that what comes after put() gets executed first.

This is probably not sufficient. One other thing we could do is make the IO run-loop more cooperative. We already check for cancelation in fibers, and we do so in batches; we could also insert these light async boundaries, so the run-loop would cooperate automatically without auto-forking. That too would have solved this issue, and we really need it.
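The "light async boundary" idea can be sketched with a minimal trampoline in plain Scala (hypothetical names; the real one is the internal TrampolineEC mentioned above): a task submitted while the trampoline is already running is queued rather than invoked nested, so the current caller finishes first.

```scala
import scala.collection.mutable

// Minimal single-threaded trampoline (illustrative sketch). A task
// submitted from inside another task is queued and run after the
// submitter returns, instead of being called synchronously.
object Trampoline {
  private val queue = mutable.Queue.empty[Runnable]
  private var running = false

  def execute(task: Runnable): Unit = {
    queue.enqueue(task)
    if (!running) {
      running = true
      try while (queue.nonEmpty) queue.dequeue().run()
      finally running = false
    }
  }
}
```

Pushing the waiter's callback through such a trampoline is what delays it just enough that the code following put() (or release) runs before the waiter does.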

Again this isn't an issue with Monix's Task because Monix's Task already has a more fair implementation.

But I don't have time for IO's run-loop right now, so that will have to wait.

@alexandru (Member)

Also, if any of you could implement what I did for MVar in Semaphore, that would be great.

alexandru added a commit to alexandru/monix that referenced this issue Oct 29, 2018
alexandru added a commit to monix/monix that referenced this issue Oct 30, 2018
* Semaphore changes

* Test for typelevel/cats-effect#380
@SystemFw (Contributor)

One other thing we could do is to make the IO run-loop more cooperative. We are already checking for cancelation in fibers and we do so in batches. However what we could do is to also insert these light async boundaries and thus the run-loop could cooperate automatically, without auto-forking. That too would have solved this issue and we really need it.

I somehow missed this when reading, but I wholeheartedly agree.

SystemFw added a commit that referenced this issue Nov 11, 2018
Issue #380: Improve MVar by adding light async boundaries before put and take
SystemFw added a commit that referenced this issue Nov 30, 2018