Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add combineAllOption to Foldable #2380

Merged
merged 11 commits into from
Nov 14, 2019
Merged

Add combineAllOption to Foldable #2380

merged 11 commits into from
Nov 14, 2019

Conversation

barambani
Copy link
Contributor

Issue #2363

@codecov-io
Copy link

codecov-io commented Aug 10, 2018

Codecov Report

Merging #2380 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #2380      +/-   ##
=========================================
+ Coverage   93.19%   93.2%   +<.01%     
=========================================
  Files         376     376              
  Lines        7323    7324       +1     
  Branches      190     187       -3     
=========================================
+ Hits         6825    6826       +1     
  Misses        498     498
Flag Coverage Δ
#scala_version_212 93.53% <100%> (ø) ⬆️
#scala_version_213 90.86% <100%> (ø) ⬆️
Impacted Files Coverage Δ
core/src/main/scala/cats/syntax/foldable.scala 100% <ø> (ø) ⬆️
...src/main/scala-2.13+/cats/instances/lazyList.scala 100% <ø> (ø) ⬆️
core/src/main/scala/cats/Foldable.scala 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7fb1327...f951eef. Read the comment docs.

* given a Monoid evidence for `A`, it returns None if the foldable is empty or combines all the `A`s if it's not
*/
def combineAllOption(implicit ev: Monoid[A], F: Foldable[F]): Option[A] =
if (F.isEmpty(fa)) None else Some(F.combineAll(fa))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be using reduceLeftOption and Semigroup.combine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't use Monoid.combineAll or Semigroup.combineAllOption we are missing the optimization. This is for cases when we can use internal mutability to aggregate many fast, which is pretty critical in some applications (e.g. spark/scalding).

If we just call out to combine we are destroying optimizations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's tricky here, we don't want to impose Monoidand I wasn't aware there is optimization for Semigroup.combineAllOption (actually I still can't find it), but even it does, we are in a syntax extension, so unlike fold, we can't optimize combineAllOption in Foldable instances.

So to retain the optimization we probably need the keep the Monoid requirement which is rather unfortunate, we'd better document the rationale. Another option is to wait until Cats 2.0 with which we'll be able to add a combineAllOption to Foldable

In the meantime @johnynek would you mind point me to the where the Semigroup.combineAllOption's optimization is? we should probably document it somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, here is the method:

https://github.com/typelevel/cats/blob/master/kernel/src/main/scala/cats/kernel/Semigroup.scala#L39

note any semigroup can override that (as we do in other typeclasses for things like map2, product, etc.. in Applicative).

In algebird, which uses cats.kernel, we do this very frequently.

e.g.:
https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/HyperLogLog.scala#L582

https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/CountMinSketch.scala#L141

there are more. The point is the person implementing the Semigroup knows if there is a more efficient way to combine many of them, and should control that. Other typeclasses ideally should delegate to that to not undo the optimization.

Copy link
Contributor Author

@barambani barambani Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about this optimization and the practice of overriding Semigroup's combineAllOption (and others) to address specific domain's performance gains. Thanks. What I can think about to exploit it is something like

def combineAllOption(implicit ev: Semigroup[A], F: Foldable[F]): Option[A] =
   if (F.isEmpty(fa)) None else ev.combineAllOption(F.toList(fa))

bit it still adds a traversal of the foldable for the conversion to TraversableOnce. I'm not sure it's acceptable.
Anyway I'm also ok to abort this and wait for Cats 2.0 like @kailuowang mentioned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tough in this case since here you are trading off the potential added cost of materializing a list with the potential benefit of reducing an entire group together.

Many Foldables could actually create an Iterator to pass into combineAllOption on Semigroup, but since we don't have a method like that we can only make the List.

We could imagine:

def onIterator[B](fa: F[A])(itFn: Iterator[A] => B): B =
  itFn(toList(fa).iterator)

on Foldable in cats 2.0. In this world, if you can make an Iterator for the caller to consume. Alternatively, we could add Fold[A, B] to cats:
https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/Fold.scala#L40

so we could have def foldWith[A, B](fa: F[A])(fold: Fold[A, B]): B on Foldable. Since folds ca potentially have internal mutable state, they can do much the same optimization of combineAllOption, although, not in parallel (since you can't split over trees).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot say if we could add Fold so I would leave the decision to others. About the other option, how could I use the onIterator if it were available ? Still with the semigroup combineAllOption or that would be alternative to depending on overriding that in Semigroup ? Because if onIterator wouldn't need to be overridden, could it be another syntax ? Thanks for the explanation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def combineAllOption(implicit ev: Semigroup[A], F: Foldable[F]): Option[A] =
   if (F.isEmpty(fa)) None else F.onIterator(ev.combineAllOption(_))

Copy link
Contributor Author

@barambani barambani Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Right. Still aiming at mitigating the possible cost of building a list, it could be delayed up to when the iterator is traversing it. Something like

def combineAllOption(implicit ev: Semigroup[A], F: Foldable[F]): Option[A] =
  if (F.isEmpty(fa)) None else ev.combineAllOption(toIterator)

def toIterator(implicit F: Foldable[F]): Iterator[A] =
  F.foldRight[A, Stream[A]](fa, Eval.later(Stream.empty)) {
    (a, eb) => eb map (Stream.cons(a, _))
  }.value.iterator

I'm not sure actually it be a step forward and is getting late. I will keep looking.

@plippe
Copy link

plippe commented May 7, 2019

Do we have any estimates when this could/would be merged ?

@barambani would you mind rebasing to get rid of the conflicts

@kailuowang kailuowang added this to the 2.1-RC1 milestone May 7, 2019
@kailuowang
Copy link
Contributor

Now that we are close to 2.0 release (after which we can add new methods to type classes on master), we should probably schedule this to 2.1 release.

@barambani
Copy link
Contributor Author

Good point @kailuowang, thanks. I moved the changes to the type class. I started with a compromise that uses Semigroup's combineAllOption building an iterator without materialising eagerly a list. Clearly MiMa check will fail for the moment.
@plippe, the conflict should be resolved now. Thanks.

def combineAllOption[A](fa: F[A])(implicit ev: Semigroup[A]): Option[A] =
if (isEmpty(fa)) None else ev.combineAllOption(toIterator(fa))

def toIterator[A, B](fa: F[A]): Iterator[A] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you override the implementations in the List, etc. instances to use the standard methods? We'll also need to change the implementation here now that Stream is deprecated in 2.13.

if (isEmpty(fa)) None else ev.combineAllOption(toIterator(fa))

def toIterator[A, B](fa: F[A]): Iterator[A] =
foldRight[A, Stream[A]](fa, Eval.later(Stream.empty)) { (a, eb) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this one can be now, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I'll change it. Thanks.

@travisbrown
Copy link
Contributor

travisbrown commented Nov 8, 2019

Thanks much, @barambani! This looks good to me, but we should get another sign-off.

@@ -1,4 +1,5 @@
package cats.compat
package cats
package compat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nitpicky but I don't think we should change these, especially since there are no other changes in this file.

@@ -76,6 +76,7 @@ package object cats {
override def get[A](fa: Id[A])(idx: Long): Option[A] =
if (idx == 0L) Some(fa) else None
override def isEmpty[A](fa: Id[A]): Boolean = false
override def iterator[A](fa: Id[A]): Iterator[A] = Some(fa).iterator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal but you can just write Iterator(fa).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Iterator.single(a) is even better because it doesn’t use varargs.

@barambani
Copy link
Contributor Author

My pleasure. Thank you.

@travisbrown
Copy link
Contributor

One other thing: personally I think I prefer the change from toIterator to iterator, but I'm a little unsure about using the same name as the Iterable API method (since this will add a iterator syntax method to anything with a Foldable instance that doesn't already have one). I'm curious what @LukaJCB and @kailuowang think.

@barambani
Copy link
Contributor Author

I was uncertain until the last. What drove my decision is that we already have toList that has a very similar limitation, so if that's not an issue I decided to go with iterator that's more consistent with the current trend in the collections. Anyway I'm more than happy to change it back. I'll wait to see where the discussion will end up. Thanks.

@rossabaker
Copy link
Member

I agree with the preference for iterator. The toList precedent is compelling. Or think of all the functors with .map syntax in addition to a native .map. Furthermore, toIterator is also present on Iterable, not even deprecated in 2.12.

@johnynek
Copy link
Contributor

johnynek commented Nov 8, 2019

I’m -1 on returning Iterator because I think it will be the first time a typeclass returns a mutable value.

I’d rather return Iterable or Stream on 2.12 and LazyList in 2.13. The thing that would sway me is if the cost is very different.

@travisbrown
Copy link
Contributor

I don’t see a big difference between .iterator and .toList.iterator. If we don’t want Iterator in the public API I’d vote to make this private[cats].

@barambani
Copy link
Contributor Author

barambani commented Nov 8, 2019

Actually, I added toIterator some time ago on different assumptions. Now, considering that we have to have version specific implementations, it's just a forwarder, so it doesn't even need to be part of the public tc interface. We could use cats.compat.Foldable.iterator(fa)(self) straight away that's already private[cats].

@johnynek
Copy link
Contributor

johnynek commented Nov 8, 2019

@travisbrown .toList.iterator forces a full materialization, and we don't control the API of List. So, the uses of a lazy structure (like Iterator, Stream, LazyList) are very different since we can interact with other tools that will not force a full materialization of the data. This is useful in things like spark or scalding.

That said, I think in a library focused on FP, we should use an immutable lazy structure if possible, so I would lean towards Stream/LazyList. Users can then call .iterator on that, but we don't control that API. Alternatively, they can use foldLeft to tear down the Stream without materializing the whole thing in memory at once.

@barambani
Copy link
Contributor Author

As suggested, I changed it to expose LazyList or Stream. I'm not very happy with the name iterable so I might still change it. Please have another look. Thanks for the review.


private[cats] object FoldableCompat {

trait ToFoldableCompatOps {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't make simulacrum generate the syntax for this so I added it manually.

@@ -275,6 +275,9 @@ import Foldable.sentinel
*/
def combineAll[A: Monoid](fa: F[A]): A = fold(fa)

def combineAllOption[A](fa: F[A])(implicit ev: Semigroup[A]): Option[A] =
if (isEmpty(fa)) None else ev.combineAllOption(iterable(fa))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have iterable now, should we use it in fold for the default implementation? I think we should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it could be implemented in terms of the Monoid's combineAll like

A.combineAll(iterable(fa))

this would allow to optimize the fold providing a specialization for Monoid[A].
I'll add it in.

@johnynek
Copy link
Contributor

johnynek commented Nov 9, 2019

I like this latest revision, personally.

package cats
package compat

private[cats] trait FoldableCompat[F[_]] { this: Foldable[F] =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is private[cats] can people override outside of cats? I don't know how this works when it is mixed into Foldble. Is it even publicly visible at all?

Copy link
Contributor Author

@barambani barambani Nov 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not accessible outside cats so FoldableCompat cannot be extended but iterable can be overridden through Foldable.
I ran a quick test for this extending from outside the cats package with

object test {

  trait T1[F[_]] extends cats.Foldable[F] {
    override def iterable[A](fa: F[A]): Stream[A] = ???
  }

  trait T2[F[_]] extends cats.compat.FoldableCompat[F] {
    override def iterable[A](fa: F[A]): Stream[A] = ???
  }
}

and it seems that T1 is ok and T2 fails as expected on 2.12 and 2.13

[error] /Users/fmariotti/open-source/cats/core/src/main/scala/test.scala:7:38: trait FoldableCompat in package compat cannot be accessed in package cats.compat
[error]   trait T2[F[_]] extends cats.compat.FoldableCompat[F] {
[error]                                      ^
[error] one error found
[error] (coreJVM / Compile / compileIncremental) Compilation failed

@travisbrown
Copy link
Contributor

I'm sorry to drag out this review, but having methods with the same name but a static return type of Stream on 2.12 and LazyList on 2.13 is a road that we decided not to go down in the 2.0.0 pre-releases, and I'd strongly prefer not to reintroduce it here.

I think there's a pretty straightforward way to make everyone happy: we change the return type of iterable to Iterable. No introducing mutable stuff into the public API, no methods with the same names but different signatures on different Scala versions, it's super straightforward to have version-specific implementations in an object (not a trait), etc.

I'll put together a quick example of what this would look like.

@travisbrown
Copy link
Contributor

Okay, I've done that here: 736de07

One additional nice thing is that this approach doesn't require the explicit syntax support.

I've also changed the method name from iterable to toIterable for consistency with toList and the standard library methods. I'm happy to change it back—I think that's a lot less important than the return type.

If this looks okay to everyone you could fetch and git cherry-pick 736de0751c99878744563f3bcc38c5c4e577a799 to add it to this PR, @barambani.

Copy link
Contributor

@travisbrown travisbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but don't really understand why you didn't just cherry-pick my commit? Anyway this looks good to me.

@barambani
Copy link
Contributor Author

You're right, I probably should have done it, it's just that the change was so small that when I came to your second commit I had mostly implemented it. I realized at the end that you already did it.

@travisbrown
Copy link
Contributor

Makes sense, I didn't realize you were already working on it—sorry to nitpick!

@barambani
Copy link
Contributor Author

No apologies needed. Thanks a lot to all involved for the great review actually.

@travisbrown travisbrown merged commit 8d67ec6 into typelevel:master Nov 14, 2019
gagandeepkalra added a commit to gagandeepkalra/cats that referenced this pull request Mar 2, 2020
travisbrown pushed a commit that referenced this pull request Mar 11, 2020
* backported #2380 Added combineAllOption to Foldable

* addressed review feedback, removed FoldableCompat

* removed toIterable override for `Stream`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants