
Improve Semaphore.release behavior #380

Closed
yarhrn opened this issue Sep 27, 2018 · 10 comments · Fixed by #424

Comments

@yarhrn commented Sep 27, 2018

This program

for {
  semaphore <- Semaphore[IO](0)
  _ <- {
    for {
      _ <- semaphore.acquire
      _ <- IO.unit.foreverM
    } yield ()
  }.start
  _ <- IO(Thread.sleep(1000))
  _ <- semaphore.release
} yield ()

hangs forever (for example, when run from a main method) because semaphore.release adds permits and completes waiting readers synchronously, on the releasing fiber.

(We discovered this while debugging flaky tests. If we remove IO(Thread.sleep(1000)) from the example above, there is a race between semaphore.acquire and semaphore.release, which is even harder to detect and quite error-prone.)

In our application I am trying to build a consumer-like loop based on Semaphore.acquire and Monad.iterateWhile. Concurrent is required to construct a Semaphore, so the synchronous completion logic is quite unexpected.

My proposal is:

  1. make cats.effect.concurrent.Semaphore.AbstractSemaphore#open abstract
  2. add a default implementation to AsyncSemaphore
  3. add an implementation to ConcurrentSemaphore that wraps the wake-up in Concurrent.start

As a result, waking up waiting readers would be asynchronous, which should handle situations like this one.

WDYT?

P.S. Thank you for the amazing lib!

@pchlupacek

Do I understand correctly that the problem is that the semaphore blocks when releasing at 0?

@SystemFw (Contributor) commented Sep 28, 2018

The "problem" is that readers are woken in order and without a further shift, so by waiting on the semaphore you eliminate the effect of start, unless you shift again.
I've dealt with the same problem in this conversation https://gitter.im/typelevel/cats-effect?at=5ba5119833da0f649e3257df (warning: it's long).

Note that this behaviour is consistent:

  def p1 = {
    for {
      sem <- Semaphore[IO](0)
      _ <- (sem.acquire *> IO.unit.foreverM).start
      _ <- timer.sleep(1.second)
      _ <- sem.release
    } yield ()
  }.timeout(5.seconds).unsafeRunSync

  def p2 = {
    for {
      d <- Deferred[IO, Unit]
      _ <- (d.get *> IO.unit.foreverM).start
      _ <- timer.sleep(1.second)
      _ <- d.complete(())
    } yield ()
  }.timeout(5.seconds).unsafeRunSync

  def p3 = {
    for {
      p <- IO(scala.concurrent.Promise[Unit]())
      _ <- (IO.fromFuture(IO(p.future)) *> IO.unit.foreverM).start
      _ <- timer.sleep(1.second)
      _ <- IO(p.success(()))
    } yield ()
  }.timeout(5.seconds).unsafeRunSync

They all time out.
I'm not sure this should be treated as a "problem", though. Waking up in order is important, and you can recover what you need simply with release.start.
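The release.start suggestion can be illustrated outside cats-effect with a toy model (all names here are illustrative, not the cats-effect API): if release runs the waiter's callback synchronously, the releaser can keep control by performing the release on a separate "fiber", modelled below with a plain Thread.

```scala
import scala.collection.mutable

// Toy model, not the cats-effect implementation: release runs the
// oldest waiter's callback synchronously on the releasing thread.
class ToySem {
  private val waiters = mutable.Queue.empty[() => Unit]

  def acquire(onAcquired: () => Unit): Unit =
    waiters.enqueue(onAcquired)

  def release(): Unit =
    if (waiters.nonEmpty) waiters.dequeue()()
}

// The analogue of `sem.release.start`: perform the release on another
// thread, so a non-terminating waiter cannot hijack the caller.
def releaseStarted(sem: ToySem): Thread = {
  val t = new Thread(() => sem.release())
  t.setDaemon(true) // a looping waiter must not keep the JVM alive
  t.start()
  t
}
```

With a direct sem.release(), a waiter that loops forever would never return control to the releaser; with releaseStarted the caller proceeds immediately, which is what release.start buys in IO.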

On a separate note, do we have a problem with MVar, @alexandru @oleg-py?
This code doesn't time out but blocks forever; am I missing something?

  def p4 = {
    for {
      mvar <- MVar[IO].empty[Unit]
      _ <- (mvar.take *> IO.unit.foreverM).start
      _ <- timer.sleep(1.second)
      _ <- mvar.put(())
    } yield ()
  }.timeout(5.seconds).unsafeRunSync

It seems like it's uncancelable, which it shouldn't be.

EDIT:
To be clear, I didn't put "problem" in quotes to be dismissive; I'm genuinely unsure whether this should just be archived as expected behaviour or addressed.

@alexandru (Member)

I don't understand the problem.

release shouldn't block on anything and it shouldn't be asynchronous either.

@alexandru (Member)

Ah, OK, I understood the problem.

First of all, in all samples, if you replace IO.unit.foreverM with IO.never they work.

What happens is that release calls the callback registered by acquire. Unfortunately this is problematic when that callback invokes something that never terminates and contains no async boundaries.

This problem does not happen with Monix's Task due to its default auto-forking behavior 😉

The fix could be to introduce an async boundary when calling those releases. Unfortunately we need an ExecutionContext or ContextShift that we don't have and introducing one breaks binary compatibility.

I'm also unsure if we should fix it, but yes, we probably need to improve the behavior in version 2.0.
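The mechanism described in this comment can be sketched in plain Scala (a toy model with illustrative names, not the actual cats-effect internals): release invokes the callback that acquire registered, directly on the releasing thread, so a callback that never returns hangs the releaser.

```scala
import scala.collection.mutable

// Toy callback-based semaphore (illustrative only). The key detail
// mirrors the real one: release calls the waiter's callback directly,
// on the releasing thread, with no async boundary in between.
class CallbackSemaphore(private var permits: Int) {
  private val waiters = mutable.Queue.empty[() => Unit]

  def acquire(onAcquired: () => Unit): Unit =
    if (permits > 0) { permits -= 1; onAcquired() }
    else waiters.enqueue(onAcquired)

  def release(): Unit =
    if (waiters.nonEmpty) waiters.dequeue()() // runs waiter code synchronously!
    else permits += 1
}
```

If the registered callback enters an infinite loop, the call to release() above never returns, which is the hang in the original snippet.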

@alexandru (Member)

@SystemFw to answer your question: no, there's nothing wrong with MVar; it's the same problem 😉

@alexandru (Member)

Btw, I just modified Monix's MVar and Semaphore: both now take a ContextShift parameter in their builders.

This is the right approach for as long as calling the callback injected in Async.async doesn't fork — and the designers of scala.concurrent.Future would be smiling on reading this issue 🙂

@SystemFw (Contributor) commented Oct 28, 2018

First of all, in all samples, if you replace IO.unit.foreverM with IO.never they work.

I put that there on purpose: the original example I linked was dealing with genuinely blocking code (albeit a long-running CPU-bound computation), and when I created that snippet IO wasn't auto-cancelable, I don't think (and it's still not fair), so it was a way of reproducing the issue.

This problem does not happen with Monix's Task due to its default auto-forking behavior

Yeah, also mentioned that in the linked convo, don't worry ;) (although we had to lower the batch size)

there's nothing wrong with MVar, it's the same problem

Wdym? Why doesn't it time out like the other implementations?

Both now take a ContextShift parameter in their builders.
This is the right approach for as long as calling the callback injected in Async.async doesn't fork — and the designers of scala.concurrent.Future would be smiling on reading this issue

Yeah, the original fs2 Promise also shifted (using an EC) when waking up callbacks, I think, but then we lost sight of when that might be useful and it went away at some point. That being said, this case is imho a lot rarer than with start, so I'm not entirely sure whether shift (or start) should be left to the user or embedded in the constructors.

@alexandru (Member)

@SystemFw I discovered a partial fix and implemented it for MVar: #402

This fixes this particular scenario. I don't know if there are other scenarios we should take care of, but it fixes this one.

What I did was to insert "light async boundaries" before put and take. In essence the scenario stops happening because calling that callback in put ends up pushing a Runnable onto the internal TrampolineEC, delaying it just enough that what comes after put() gets executed first.

This is probably not sufficient. One other thing we could do is make the IO run-loop more cooperative. We already check for cancelation in fibers, and we do so in batches; we could also insert these light async boundaries, so the run-loop would cooperate automatically without auto-forking. That too would have solved this issue, and we really need it.
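The "light async boundary" idea can be sketched with a minimal trampoline in plain Scala (hypothetical names; the real one is the internal TrampolineEC mentioned above): a task submitted while the trampoline is already running is queued rather than invoked nested, so the current caller finishes first.

```scala
import scala.collection.mutable

// Minimal single-threaded trampoline (illustrative sketch). A task
// submitted from inside another task is queued and run after the
// submitter returns, instead of being called synchronously.
object Trampoline {
  private val queue = mutable.Queue.empty[Runnable]
  private var running = false

  def execute(task: Runnable): Unit = {
    queue.enqueue(task)
    if (!running) {
      running = true
      try while (queue.nonEmpty) queue.dequeue().run()
      finally running = false
    }
  }
}
```

Pushing the waiter's callback through such a trampoline is what delays it just enough that the code following put() (or release) runs before the waiter does.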

Again this isn't an issue with Monix's Task because Monix's Task already has a more fair implementation.

But I don't have time for IO's run-loop right now, so that will have to wait.

@alexandru (Member)

Also, if any of you could implement what I did for MVar in Semaphore, that would be great.

alexandru added a commit to alexandru/monix that referenced this issue Oct 29, 2018
alexandru added a commit to monix/monix that referenced this issue Oct 30, 2018
* Semaphore changes

* Test for typelevel/cats-effect#380
@SystemFw (Contributor)

One other thing we could do is to make the IO run-loop more cooperative. We are already checking for cancelation in fibers and we do so in batches. However what we could do is to also insert these light async boundaries and thus the run-loop could cooperate automatically, without auto-forking. That too would have solved this issue and we really need it.

I somehow missed this when reading, but I wholeheartedly agree.

SystemFw added a commit that referenced this issue Nov 11, 2018
Issue #380: Improve MVar by adding light async boundaries before put and take
SystemFw added a commit that referenced this issue Nov 30, 2018