New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve API for doing parallel execution #229

Closed
oleg-py opened this Issue May 17, 2018 · 10 comments

Comments

5 participants
@oleg-py
Contributor

oleg-py commented May 17, 2018

Manual shift model requires a user to keep in a head if the operation is async or not. Irrelevant of whether or not it's a good idea, API of IO fails short, requiring users to:

  • Sprinkle cumbersome timer.shift calls
  • Know where these are necessary

It would provide a much better user experience if we provide concurrent/parallel semantics out of the box, even with a perf hit of extra scheduling.

As a starting point, any of these changes would greatly improve using IO:

  • Add these methods to IO:
// An infix version of doing manual shift. Less cumbersome to use for a pretty common operation
def shifted(implicit timer: Timer[IO]): IO[A] = timer.shift *> this

// A `start` that shifts automatically. Typically it's what people would want
def fork(implicit timer: Timer[IO]): IO[Fiber[IO, A]] = shifted.start

  • Add suffixed variants to race operators creating async boundaries for users, e.g.:
def raceFork[A, B](fa: IO[A], fb: IO[B])(implicit timer: Timer[IO]): IO[Either[A, B]] =
  IO.race(fa.shifted, fb)

(Alternatively, we can change the base methods and Concurrent[IO] instance to do that for us. Concurrent[IO] instance will then require Timer[IO])


  • Require Timer[IO] for Parallel[IO]. Do timer.shift ourselves in parSequence and parTraverse.
@mpilquist

This comment has been minimized.

Show comment
Hide comment
@mpilquist

mpilquist May 17, 2018

Member

Big 👍 on shifted and fork, though note IO.shift takes an ExecutionContext so perhaps shifted should as well?

The other additions sound good to me too.

Member

mpilquist commented May 17, 2018

Big 👍 on shifted and fork, though note IO.shift takes an ExecutionContext so perhaps shifted should as well?

The other additions sound good to me too.

@alexandru

This comment has been minimized.

Show comment
Hide comment
@alexandru

alexandru May 17, 2018

Member

Sprinkle cumbersome timer.shift calls
Know where these are necessary

I think that with a shifted and fork and a raceFork that won't change. The users will still have to do the manual work of introducing such async boundaries by themselves, whenever they feel necessary

Task was designed for such smartness, however Task was designed to handle fairness concerns in general and parallelism out of the box, whereas IO was only meant to be a down to earth implementation that only handles effects and leaves the fancy stuff to other libraries. We have diverged from that vision along with cancelability, race and Concurrent of course. But we still have to draw the line somewhere.

I do not see an improvement in providing fork and shifted, if the operations are basically flatmapping an IO.shift(implicit timer: Timer[IO]), which is already there.

And here I have a double standard. It is true that in Monix's Task we provided such helpers, of which I'm quite fond, but to tell of feedback I got from the trenches, one of the IntelliJ IDEA developers that is looking at introducing Monix and Scalameta stuff in their IDE told me that he finds it confusing to see multiple operators in Task that do the same thing. And this feeling does have merit, it's not the first time I hear this and it's the reason for why I tried cleaning up Task's API.

With Cats-Effect on the other hand we've got a bigger responsibility to keep the API stable. We cannot retract operations that we introduce so easily, because Cats-Effect will probably end up being used as a transitive dependency in more projects than fs2 or Monix combined.

Require Timer[IO] for Parallel[IO]. Do timer.shift ourselves in parSequence and parTraverse.

Personally I don't think that IO forking is such a good idea. That IO is very explicit in whatever it does, I find that to be a virtue.

It's why I started using it in some instances in which I would have used Task. It's why I find it acceptable for example that we're using it in the signatures exposed by Concurrent, Effect and ConcurrentEffect. Because it's a simple data type with a WYSIWYG philosophy.

We must also consider that these problems only come up because of blocking IO. We wouldn't be talking about this problem if it weren't for the blocking IO required with JDBC for example. However this is the responsibility of the libraries wrapping JDBC or whatever blocking APIs are still in use.

Member

alexandru commented May 17, 2018

Sprinkle cumbersome timer.shift calls
Know where these are necessary

I think that with a shifted and fork and a raceFork that won't change. The users will still have to do the manual work of introducing such async boundaries by themselves, whenever they feel necessary

Task was designed for such smartness, however Task was designed to handle fairness concerns in general and parallelism out of the box, whereas IO was only meant to be a down to earth implementation that only handles effects and leaves the fancy stuff to other libraries. We have diverged from that vision along with cancelability, race and Concurrent of course. But we still have to draw the line somewhere.

I do not see an improvement in providing fork and shifted, if the operations are basically flatmapping an IO.shift(implicit timer: Timer[IO]), which is already there.

And here I have a double standard. It is true that in Monix's Task we provided such helpers, of which I'm quite fond, but to tell of feedback I got from the trenches, one of the IntelliJ IDEA developers that is looking at introducing Monix and Scalameta stuff in their IDE told me that he finds it confusing to see multiple operators in Task that do the same thing. And this feeling does have merit, it's not the first time I hear this and it's the reason for why I tried cleaning up Task's API.

With Cats-Effect on the other hand we've got a bigger responsibility to keep the API stable. We cannot retract operations that we introduce so easily, because Cats-Effect will probably end up being used as a transitive dependency in more projects than fs2 or Monix combined.

Require Timer[IO] for Parallel[IO]. Do timer.shift ourselves in parSequence and parTraverse.

Personally I don't think that IO forking is such a good idea. That IO is very explicit in whatever it does, I find that to be a virtue.

It's why I started using it in some instances in which I would have used Task. It's why I find it acceptable for example that we're using it in the signatures exposed by Concurrent, Effect and ConcurrentEffect. Because it's a simple data type with a WYSIWYG philosophy.

We must also consider that these problems only come up because of blocking IO. We wouldn't be talking about this problem if it weren't for the blocking IO required with JDBC for example. However this is the responsibility of the libraries wrapping JDBC or whatever blocking APIs are still in use.

@oleg-py

This comment has been minimized.

Show comment
Hide comment
@oleg-py

oleg-py May 17, 2018

Contributor

@alexandru I'll answer to your points in random order, hope you don't mind :)

WYSIWYG

Because it's a simple data type with a WYSIWYG philosophy.

Not quite. It's similar to what we had discussed previously about Task.executeOn of Monix. I'll illustrate:


What I see:

val crunchNumbers: IO[Long] = ??? // anything CPU-intensive.

List.fill(6)(crunchNumbers).parSequence

What I think: execute things in parallel, using available system resources
What I get: maybe you don't really want concurrency here ¯\_(ツ)_/¯
What I actually wanted:

List.fill(6)(crunchNumbers).parTraverse(IO.shift *> _)
  • Needlessly verbose. I already said I want parallel. Had I wanted sequentiality, I'd use .sequence.
  • I'll make a point about shifts later.

What I see:

val ioA: IO[String] = ???
val ioB: IO[String] = ???

IO.race(ioA, ioB)

What I think: execute things in parallel, give me the first to complete, cancel the other.
What I get: maybe you want to execute ioA and then afterwards cancel ioB immediately ¯\_(ツ)_/¯
What I actually wanted:

IO.race(IO.shift *> ioA, ioB)
  • Did I not already said, by using race, that I want concurrency? It doesn't come
  • This is not equivalent to doing IO.race(ioB, IO.shift *> ioA).map(_.swap). I thought it would be, but when I checked the impl, first argument must have an async boundary or you get serial behavior.

Shifts and cats.effect.IO role

Shifts are great, I think :) Super-handy for switching from polling thread-pool. For making fairness expicit, too. I don't think IO needs to be anyhow "smarter" about fairness.

For concurrency, I almost always want these. Now I think I know all the cases I need to do shifting explicitly, but it took me a while to learn and just today I misstepped with that race gotcha trying to help a user on Gitter who was frustrated by shifts :D

With Cats-Effect on the other hand we've got a bigger responsibility to keep the API stable. We cannot retract operations that we introduce so easily, because Cats-Effect will probably end up being used as a transitive dependency in more projects than fs2 or Monix combined.

Yep. And also cats.effect.IO will probably be used in more learning projects and code snippets in Scala-FP world. Which is why I think it's not less important that c.e.IO provides an intuitive API, and here the best solution is providing a Concurrent and Parallel that do shifting for user. We can forget fork and shifted if we add these, though I can imagine fork would be used way more often that start.

Because newcomers will want to try doing stuff in parallel and then they would have to learn a concept of async boundaries in addition to already trying to learn purely functional IO. And async boundaries are not quite a trivial concept.

It's fine if there is a library with an effect type that does not do shifting for you. I don't think cats.effect.IO should be that type. I vote for newcomers intuitiveness even if it costs us some overhead.

Optionality of shifting is a lie

We must also consider that these problems only come up because of blocking IO.

And blocking IO is a fact of life today. We're not going to magically get rid of it soon.

Having to know if the IO has an async boundary or not means non-local reasoning. It's not encoded in types. It's not possible to even ask an IO if it has an async boundary. The only way to figure that out is to inspect the call graph for how the IO value is constructed. Or you could defensively add an IO.shift *>.

In fact, writing any generic code that accepts IOs and does concurrent stuff means defensive shifts. And in using such code, I need to be aware whether the implementation shifts for me (or just defensively shift another time). Non-local reasoning, again.

Contributor

oleg-py commented May 17, 2018

@alexandru I'll answer to your points in random order, hope you don't mind :)

WYSIWYG

Because it's a simple data type with a WYSIWYG philosophy.

Not quite. It's similar to what we had discussed previously about Task.executeOn of Monix. I'll illustrate:


What I see:

val crunchNumbers: IO[Long] = ??? // anything CPU-intensive.

List.fill(6)(crunchNumbers).parSequence

What I think: execute things in parallel, using available system resources
What I get: maybe you don't really want concurrency here ¯\_(ツ)_/¯
What I actually wanted:

List.fill(6)(crunchNumbers).parTraverse(IO.shift *> _)
  • Needlessly verbose. I already said I want parallel. Had I wanted sequentiality, I'd use .sequence.
  • I'll make a point about shifts later.

What I see:

val ioA: IO[String] = ???
val ioB: IO[String] = ???

IO.race(ioA, ioB)

What I think: execute things in parallel, give me the first to complete, cancel the other.
What I get: maybe you want to execute ioA and then afterwards cancel ioB immediately ¯\_(ツ)_/¯
What I actually wanted:

IO.race(IO.shift *> ioA, ioB)
  • Did I not already said, by using race, that I want concurrency? It doesn't come
  • This is not equivalent to doing IO.race(ioB, IO.shift *> ioA).map(_.swap). I thought it would be, but when I checked the impl, first argument must have an async boundary or you get serial behavior.

Shifts and cats.effect.IO role

Shifts are great, I think :) Super-handy for switching from polling thread-pool. For making fairness expicit, too. I don't think IO needs to be anyhow "smarter" about fairness.

For concurrency, I almost always want these. Now I think I know all the cases I need to do shifting explicitly, but it took me a while to learn and just today I misstepped with that race gotcha trying to help a user on Gitter who was frustrated by shifts :D

With Cats-Effect on the other hand we've got a bigger responsibility to keep the API stable. We cannot retract operations that we introduce so easily, because Cats-Effect will probably end up being used as a transitive dependency in more projects than fs2 or Monix combined.

Yep. And also cats.effect.IO will probably be used in more learning projects and code snippets in Scala-FP world. Which is why I think it's not less important that c.e.IO provides an intuitive API, and here the best solution is providing a Concurrent and Parallel that do shifting for user. We can forget fork and shifted if we add these, though I can imagine fork would be used way more often that start.

Because newcomers will want to try doing stuff in parallel and then they would have to learn a concept of async boundaries in addition to already trying to learn purely functional IO. And async boundaries are not quite a trivial concept.

It's fine if there is a library with an effect type that does not do shifting for you. I don't think cats.effect.IO should be that type. I vote for newcomers intuitiveness even if it costs us some overhead.

Optionality of shifting is a lie

We must also consider that these problems only come up because of blocking IO.

And blocking IO is a fact of life today. We're not going to magically get rid of it soon.

Having to know if the IO has an async boundary or not means non-local reasoning. It's not encoded in types. It's not possible to even ask an IO if it has an async boundary. The only way to figure that out is to inspect the call graph for how the IO value is constructed. Or you could defensively add an IO.shift *>.

In fact, writing any generic code that accepts IOs and does concurrent stuff means defensive shifts. And in using such code, I need to be aware whether the implementation shifts for me (or just defensively shift another time). Non-local reasoning, again.

@johnynek

This comment has been minimized.

Show comment
Hide comment
@johnynek

johnynek May 17, 2018

would IO.race be lawful if we force the shift on the left?

If so, that seems like the way to go. I didn't know I had to do the extra work to actually use race.

johnynek commented May 17, 2018

would IO.race be lawful if we force the shift on the left?

If so, that seems like the way to go. I didn't know I had to do the extra work to actually use race.

@oleg-py

This comment has been minimized.

Show comment
Hide comment
@oleg-py

oleg-py May 18, 2018

Contributor

We can forgo the shifted and fork if we make the following changes:

  • Parallel[IO.Par], Concurrent[IO] and ConcurrentEffect[IO] would require a Timer[IO] in scope. All .par* operators, start, race and racePair contain a timer.shift call for IO implementations.
  • Methods on IO companion object which imply concurrency - race, and racePair - take implicit timer: Timer[IO] as a parameter. Same goes for IO#start instance method.
  • No changes to type-classes or their companion objects. This change only affects instances for cats.effect.IO and the signatures.

IMO this gives us a huge win for intuitiveness of making IO concurrent/parallel

Contributor

oleg-py commented May 18, 2018

We can forgo the shifted and fork if we make the following changes:

  • Parallel[IO.Par], Concurrent[IO] and ConcurrentEffect[IO] would require a Timer[IO] in scope. All .par* operators, start, race and racePair contain a timer.shift call for IO implementations.
  • Methods on IO companion object which imply concurrency - race, and racePair - take implicit timer: Timer[IO] as a parameter. Same goes for IO#start instance method.
  • No changes to type-classes or their companion objects. This change only affects instances for cats.effect.IO and the signatures.

IMO this gives us a huge win for intuitiveness of making IO concurrent/parallel

@alexandru

This comment has been minimized.

Show comment
Hide comment
@alexandru

alexandru May 18, 2018

Member

So I still don’t like raceFork because once IO is understood, it doesn’t help much.

Requiring Timer in IO’s instances is a solution but I’d would like buyin from fs2 and Http4s, as they are the biggest users of Concurrent. Yes, @SystemFw’s heart helps 🙂

Otherwise whatever we agree on is cool.

Member

alexandru commented May 18, 2018

So I still don’t like raceFork because once IO is understood, it doesn’t help much.

Requiring Timer in IO’s instances is a solution but I’d would like buyin from fs2 and Http4s, as they are the biggest users of Concurrent. Yes, @SystemFw’s heart helps 🙂

Otherwise whatever we agree on is cool.

@oleg-py

This comment has been minimized.

Show comment
Hide comment
@oleg-py

oleg-py May 18, 2018

Contributor

@alexandru we won't add raceFork, we change race instead.

Currently it's perfectly lawful to have an implementation without auto-fork. If we ever reach a non-blocking fantasy land, a lawful implementation can exist that makes advantage of it 😄

Contributor

oleg-py commented May 18, 2018

@alexandru we won't add raceFork, we change race instead.

Currently it's perfectly lawful to have an implementation without auto-fork. If we ever reach a non-blocking fantasy land, a lawful implementation can exist that makes advantage of it 😄

@mpilquist

This comment has been minimized.

Show comment
Hide comment
@mpilquist

mpilquist May 18, 2018

Member

👍 for requiring race to evaluate both arguments concurrently. This would effectively remove ExecutionContext from FS2's API.

Member

mpilquist commented May 18, 2018

👍 for requiring race to evaluate both arguments concurrently. This would effectively remove ExecutionContext from FS2's API.

@alexandru

This comment has been minimized.

Show comment
Hide comment
@alexandru

alexandru May 19, 2018

Member

@oleg-py will you do a PR?

Member

alexandru commented May 19, 2018

@oleg-py will you do a PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment