Optimize traverseWithChain for StackSafeMonad(s) #3790

djspiewak · 2021-02-12T13:54:05Z

The traverseWithChain implementation is really good and performs well in benchmarks on downstream datatypes, but it has a meaningful flaw: it forces added allocations through Eval at every step of the iteration. This is something that is definitely necessary when you're trying to implement it on an arbitrary applicative, but that really isn't the case most of the time.

The common scenario I'm thinking about is traversing List with IO, which happens both in the context of parTraverse and the more conventional traverse, and it happens very frequently. When you instantiate G = IO within traverseWithChain, the map2Eval and corresponding machinery is no longer necessary because IO itself is stack-safe, and thus we can entirely remove that wrapping allocation in each step.

My proposal would be to do this by checking to see if G.isInstanceOf[StackSafeMonad[G]]. The reasoning here is that most such implementations will take advantage of the StackSafeMonad utility mixin. For those that don't, the old Eval implementation will still work, but for those that do, we have the guarantee (based on their own tailRecM laws) that their flatMap is capable of suspending the laziness just as safely as Eval's. This covers the common case of IO as well as several other similar cases, and achieves a significant speed-up in downstream performance without sacrificing anything other than ugliness of implementation.

The text was updated successfully, but these errors were encountered:

johnynek · 2021-08-09T19:55:39Z

Forgive the verbose reply (much known to @djspiewak but I am adding for additional context for other readers).

This is an important question. Arguably, traverse is the most important function in cats since it is the functional programming answer to effectful loops, which are ubiquitous in programming. Cats has always been trying to serve two goals: 1. pure functional programming on scala 2. but fast.

If we want fast, we generally want to avoid laziness since that generally adds boxing and wrapping. But also, we want to preserve "short circuiting" error semantics when we can. Since we want fast, cats defaults to strict arguments, thus def map2[A, B](fa: F[A], fb: F[B])... requires the left and right to have been evaluated to F[_] before we can merge them. Early in cats, Eval[A] was the answer when we wanted targeted laziness since it was explicit, it can be composed (unlike call-by-name), and is reasonably fast for an pure execution monad (it is not for side effects). Thus we have both map2 and map2Eval, the latter taking an Eval[F[B]] to perserve laziness.

The problem comes when F[_] is already a non-strict type constructor, and we have to wrap with laziness twice: once in F[_] and once in Eval[_].

Here I want to point out a big challenge we have: optimizing traverse is hard. Generally it depends on both the particular data structure we are traversing (say T[_]) AND the applicative context F[_] we are in. For some F[_]s, the eval is a total waste: we always have to evaluate it (think Writer, Id, Eval, TailRec, Function0, Function1, NonEmptyList, NonEmptyVector...). For other F[_]s the Eval is a waste because it is already lazy: IO[_], Kliesli where the inner F has error semantics, EitherT with a lazy inner F[_]. I think our default to map2Eval is privileging strict, but error-like, applicatives: Option, OptionT, Either, EitherT, possibly empty collections (List, Vector) which can short-circuit if the left hand side is empty.

Also, it is key to note that due to our laws, the result must be the same if we use map2Eval or map2, the only possible difference is the cost to compute in the presence of some short circuits. So, ideally, we would want the applicative to be able to signal it should use map2 or map2Eval, but currently there is no way.

What is missing is a kind of complete taxonomy of applicatives that might shed some light. E.g. we have lazy function-like applicatives (Function0, Function1, Kleisli, State, Writer, Eval, TailRec, Parsers, ...), we have error-like applicatives: Either, Try, Future. We have effectful applicatives: IO, IO.Par, ... we have possibly empty collection applicatives (List, Vector, ZipList,...), we have non-empty collections (NonEmptyList, NonEmptyVector...). It seems to me our current choice is helping strict error-like and possibly empty collections at the expense of the other kinds.

Now, you can argue that the other kinds are the main uses: State, Writer, Parsers, IO, IO.Par. Further, you can ask why we are trying so hard to avoid calling the A => F[B] function in traverse if we are in a pure context anyway. Generally we write code with the assumption that the error cases are rare, yet here we seem to be optimizing for the error case and making that faster. I think it may be that in the early days of cats, before cats-effect, people were smuggling effects into that function call (think A => Future[B] or A => Try[B] that were actually having some side-effect) and they wanted laziness there to fail fast and avoid effects that were wasted. Are there other motivations for minimizing calls other than optimizing for errors, or reducing effectful calls (which shouldn't be there in modern cats code).

Lastly, @djspiewak suggests a very practical solution: if we check that the Applicative is a StackSafeMonad, we have a very good hint that we are not in the cases above that we are optimizing for: e.g. Strict but error-like monads (Try, Either, possibly empty collections). I believe a strict monad can never be a StackSafeMonad, but I don't see a proof immediately.

A last note: a few years ago, I noticed a typeclass called Defer: #2279. This represents something we can suspend without having to evaluate. This isn't something that comes up in Haskell because of the non-strict nature of the language, but in scala it does. I wonder if all StackSafeMonad must also be Defer since they could implement def defer[A](a: => F[A]): F[A] = monad.flatMap(monad.unit)(_ => a). If this is the case, then we could consider forcing trait StackSafeMonad[M[_]] extends Monad[M] with Defer[M]. In such a case, we can explicitly leverage the defer implementation of the monad instead of Eval.defer.

I think this is an interesting area for discussion and research on FP in strict languages. Maybe something is already known related to what we are currently groping in the dark for. I am not opposed to Daniel's suggestion, but given we have waited this long, I do think it is worth a bit of discussion and thinking to see if we can synthesize any new knowledge here in 2021 that we didn't really have when cats made the map2Eval design in the first place.

non · 2021-08-10T03:11:35Z

Just on the surface I think trying to unify Defer and StackSafeMonad sounds really compelling.

(That said if it proves thorny I think @djspiewak's tactical suggestion also works.)

johnynek · 2021-08-10T05:36:37Z

see #3962 for a stab at some of what I wrote.

johnynek · 2021-08-10T19:42:26Z

@djspiewak

What do you think of something like this:

trait TraverseStrategy[F[_]] {
  type Rhs[_]
  
  def map2[A, B, C](fa: F[A], fb: Rhs[B])(fn: (A, B) => C): Rhs[C]
  def build[A, B](a: A)(fn: A => F[B]): Rhs[B]
  def toF[A](f: Rhs[A]): F[A]
}

object TraverseStrategy {
  def viaEval[F[_]](ap: Apply[F]): TraverseStrategy[F] =
    new TraverseStrategy[F] {
      type Rhs[A] = Eval[F[A]]

      def map2[A, B, C](fa: F[A], fb: Rhs[B])(fn: (A, B) => C): Rhs[C] =
        ap.map2Eval(fa, fb)(fn)

      def build[A, B](a: A)(fn: A => F[B]): Rhs[B] =
        Eval.always(fn(a))

      def toF[A](f: Rhs[A]): F[A] = f.value
    }

  def direct[F[_]](ap: Apply[F]): TraverseStrategy[F] =
    new TraverseStrategy[F] {
      type Rhs[A] = F[A]

      def map2[A, B, C](fa: F[A], fb: Rhs[B])(fn: (A, B) => C): Rhs[C] =
        ap.map2(fa, fb)(fn)

      def build[A, B](a: A)(fn: A => F[B]): Rhs[B] = fn(a)

      def toF[A](f: Rhs[A]): F[A] = f
    }

  def default[F[_]](ap: Apply[F]): TraverseStrategy[F] =
    ap match {
      case _: StackSafeMonad[F] @ unchecked => direct(ap)
      case _ => viaEval(ap)
   }
}

trait Apply[F[_]] {
  // override this to change the traverse strategy
  lazy val traverseStrategy[F]: TraverseStrategy[F] = TraverseStrategy[F].default(this)
}

Or something close to that. Then we could override the traverseStrategy for some non-stacksafe monads that
would still be better with a direct traversal.

I'm not in love with the circular dependency here (which I think lazy val resolves).

djspiewak · 2021-08-10T20:57:53Z

I don't believe you need the lazy val actually; this is just definitionally circular. The bigger concern I have is this feels very ad hoc. Like, I get that Traverse itself is actually somewhat ad hoc, but I really would love to find something a little bit more from first principles if possible here, otherwise I have very little confidence that we're actually covering the full problem space. Also it doesn't help that this reminds me of CanBuildFrom :-P

johnynek · 2021-08-10T21:04:51Z

say more about your thoughts about the ad-hoc-ness... My goal here is to allow F[_] to communicate how map2 is done (basically one of two strategies: use map2 or use map2Eval). To me, we already have those two strategies baked into Apply, we just can't communicate up which one is more appropriate to use.

In some sense, I think that's the main problem: Apply defines both map2 and map2Eval, but can't say anything about which strategy is better for F, and in an abstract context, you have no idea which one is the right one to choose.

djspiewak mentioned this issue Feb 12, 2021

Consider providing optimized versions of IO.sequence typelevel/cats-effect#369

Closed

johnynek mentioned this issue Aug 10, 2021

Make StackSafeMonad extend Defer #3962

Open

johnynek mentioned this issue Sep 13, 2021

Introduce Apply.TraverseStrategy #3993

Closed

djspiewak mentioned this issue Mar 6, 2023

Improve traverse performance through increased specialization #4408

Open

TonioGela mentioned this issue Jun 21, 2023

Making Map.unorderedTraverse stack safe #4463

Merged

armanbilge mentioned this issue Mar 30, 2024

Should Monad extend Defer? #4579

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize traverseWithChain for StackSafeMonad(s) #3790

Optimize traverseWithChain for StackSafeMonad(s) #3790

djspiewak commented Feb 12, 2021

johnynek commented Aug 9, 2021 •

edited

Loading

non commented Aug 10, 2021

johnynek commented Aug 10, 2021

johnynek commented Aug 10, 2021

djspiewak commented Aug 10, 2021

johnynek commented Aug 10, 2021

Optimize traverseWithChain for StackSafeMonad(s) #3790

Optimize traverseWithChain for StackSafeMonad(s) #3790

Comments

djspiewak commented Feb 12, 2021

johnynek commented Aug 9, 2021 • edited Loading

non commented Aug 10, 2021

johnynek commented Aug 10, 2021

johnynek commented Aug 10, 2021

djspiewak commented Aug 10, 2021

johnynek commented Aug 10, 2021

johnynek commented Aug 9, 2021 •

edited

Loading