-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add combineAllOption to Foldable #2380
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2380 +/- ##
=========================================
+ Coverage 93.19% 93.2% +<.01%
=========================================
Files 376 376
Lines 7323 7324 +1
Branches 190 187 -3
=========================================
+ Hits 6825 6826 +1
Misses 498 498
Continue to review full report at Codecov.
|
* given a Monoid evidence for `A`, it returns None if the foldable is empty or combines all the `A`s if it's not | ||
*/ | ||
def combineAllOption(implicit ev: Monoid[A], F: Foldable[F]): Option[A] = | ||
if (F.isEmpty(fa)) None else Some(F.combineAll(fa)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be using reduceLeftOption
and Semigroup.combine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't use Monoid.combineAll
or Semigroup.combineAllOption
we are missing the optimization. This is for cases when we can use internal mutability to aggregate many fast, which is pretty critical in some applications (e.g. spark/scalding).
If we just call out to combine
we are destroying optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's tricky here, we don't want to impose Monoid
and I wasn't aware there is optimization for Semigroup.combineAllOption
(actually I still can't find it), but even it does, we are in a syntax extension, so unlike fold
, we can't optimize combineAllOption
in Foldable
instances.
So to retain the optimization we probably need the keep the Monoid
requirement which is rather unfortunate, we'd better document the rationale. Another option is to wait until Cats 2.0 with which we'll be able to add a combineAllOption to Foldable
In the meantime @johnynek would you mind point me to the where the Semigroup.combineAllOption
's optimization is? we should probably document it somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, here is the method:
https://github.com/typelevel/cats/blob/master/kernel/src/main/scala/cats/kernel/Semigroup.scala#L39
note any semigroup can override that (as we do in other typeclasses for things like map2, product, etc.. in Applicative).
In algebird, which uses cats.kernel, we do this very frequently.
there are more. The point is the person implementing the Semigroup knows if there is a more efficient way to combine many of them, and should control that. Other typeclasses ideally should delegate to that to not undo the optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know about this optimization and the practice of overriding Semigroup
's combineAllOption
(and others) to address specific domain's performance gains. Thanks. What I can think about to exploit it is something like
def combineAllOption(implicit ev: Semigroup[A], F: Foldable[F]): Option[A] =
if (F.isEmpty(fa)) None else ev.combineAllOption(F.toList(fa))
bit it still adds a traversal of the foldable for the conversion to TraversableOnce
. I'm not sure it's acceptable.
Anyway I'm also ok to abort this and wait for Cats 2.0 like @kailuowang mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tough in this case since here you are trading off the potential added cost of materializing a list with the potential benefit of reducing an entire group together.
Many Foldables could actually create an Iterator
to pass into combineAllOption
on Semigroup, but since we don't have a method like that we can only make the List.
We could imagine:
def onIterator[B](fa: F[A])(itFn: Iterator[A] => B): B =
itFn(toList(fa).iterator)
on Foldable in cats 2.0. In this world, if you can make an Iterator for the caller to consume. Alternatively, we could add Fold[A, B]
to cats:
https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/Fold.scala#L40
so we could have def foldWith[A, B](fa: F[A])(fold: Fold[A, B]): B
on Foldable
. Since folds ca potentially have internal mutable state, they can do much the same optimization of combineAllOption, although, not in parallel (since you can't split over trees).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot say if we could add Fold
so I would leave the decision to others. About the other option, how could I use the onIterator
if it were available ? Still with the semigroup combineAllOption
or that would be alternative to depending on overriding that in Semigroup ? Because if onIterator
wouldn't need to be overridden, could it be another syntax ? Thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def combineAllOption(implicit ev: Semigroup[A], F: Foldable[F]): Option[A] =
if (F.isEmpty(fa)) None else F.onIterator(ev.combineAllOption(_))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Right. Still aiming at mitigating the possible cost of building a list, it could be delayed up to when the iterator is traversing it. Something like
def combineAllOption(implicit ev: Semigroup[A], F: Foldable[F]): Option[A] =
if (F.isEmpty(fa)) None else ev.combineAllOption(toIterator)
def toIterator(implicit F: Foldable[F]): Iterator[A] =
F.foldRight[A, Stream[A]](fa, Eval.later(Stream.empty)) {
(a, eb) => eb map (Stream.cons(a, _))
}.value.iterator
I'm not sure actually it be a step forward and is getting late. I will keep looking.
Do we have any estimates when this could/would be merged ? @barambani would you mind rebasing to get rid of the conflicts |
Now that we are close to 2.0 release (after which we can add new methods to type classes on master), we should probably schedule this to 2.1 release. |
Good point @kailuowang, thanks. I moved the changes to the type class. I started with a compromise that uses |
def combineAllOption[A](fa: F[A])(implicit ev: Semigroup[A]): Option[A] = | ||
if (isEmpty(fa)) None else ev.combineAllOption(toIterator(fa)) | ||
|
||
def toIterator[A, B](fa: F[A]): Iterator[A] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you override the implementations in the List
, etc. instances to use the standard methods? We'll also need to change the implementation here now that Stream
is deprecated in 2.13.
if (isEmpty(fa)) None else ev.combineAllOption(toIterator(fa)) | ||
|
||
def toIterator[A, B](fa: F[A]): Iterator[A] = | ||
foldRight[A, Stream[A]](fa, Eval.later(Stream.empty)) { (a, eb) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this one can be now
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I'll change it. Thanks.
Thanks much, @barambani! This looks good to me, but we should get another sign-off. |
@@ -1,4 +1,5 @@ | |||
package cats.compat | |||
package cats | |||
package compat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nitpicky but I don't think we should change these, especially since there are no other changes in this file.
@@ -76,6 +76,7 @@ package object cats { | |||
override def get[A](fa: Id[A])(idx: Long): Option[A] = | |||
if (idx == 0L) Some(fa) else None | |||
override def isEmpty[A](fa: Id[A]): Boolean = false | |||
override def iterator[A](fa: Id[A]): Iterator[A] = Some(fa).iterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal but you can just write Iterator(fa)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Iterator.single(a)
is even better because it doesn’t use varargs.
My pleasure. Thank you. |
One other thing: personally I think I prefer the change from |
I was uncertain until the last. What drove my decision is that we already have |
I agree with the preference for |
I’m -1 on returning Iterator because I think it will be the first time a typeclass returns a mutable value. I’d rather return Iterable or Stream on 2.12 and LazyList in 2.13. The thing that would sway me is if the cost is very different. |
I don’t see a big difference between |
Actually, I added |
@travisbrown That said, I think in a library focused on FP, we should use an immutable lazy structure if possible, so I would lean towards Stream/LazyList. Users can then call |
As suggested, I changed it to expose |
|
||
private[cats] object FoldableCompat { | ||
|
||
trait ToFoldableCompatOps { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't make simulacrum generate the syntax for this so I added it manually.
@@ -275,6 +275,9 @@ import Foldable.sentinel | |||
*/ | |||
def combineAll[A: Monoid](fa: F[A]): A = fold(fa) | |||
|
|||
def combineAllOption[A](fa: F[A])(implicit ev: Semigroup[A]): Option[A] = | |||
if (isEmpty(fa)) None else ev.combineAllOption(iterable(fa)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have iterable now, should we use it in fold for the default implementation? I think we should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it could be implemented in terms of the Monoid's combineAll
like
A.combineAll(iterable(fa))
this would allow to optimize the fold
providing a specialization for Monoid[A]
.
I'll add it in.
I like this latest revision, personally. |
package cats | ||
package compat | ||
|
||
private[cats] trait FoldableCompat[F[_]] { this: Foldable[F] => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is private[cats]
can people override outside of cats? I don't know how this works when it is mixed into Foldble
. Is it even publicly visible at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not accessible outside cats
so FoldableCompat
cannot be extended but iterable
can be overridden through Foldable
.
I ran a quick test for this extending from outside the cats
package with
object test {
trait T1[F[_]] extends cats.Foldable[F] {
override def iterable[A](fa: F[A]): Stream[A] = ???
}
trait T2[F[_]] extends cats.compat.FoldableCompat[F] {
override def iterable[A](fa: F[A]): Stream[A] = ???
}
}
and it seems that T1
is ok and T2
fails as expected on 2.12 and 2.13
[error] /Users/fmariotti/open-source/cats/core/src/main/scala/test.scala:7:38: trait FoldableCompat in package compat cannot be accessed in package cats.compat
[error] trait T2[F[_]] extends cats.compat.FoldableCompat[F] {
[error] ^
[error] one error found
[error] (coreJVM / Compile / compileIncremental) Compilation failed
I'm sorry to drag out this review, but having methods with the same name but a static return type of I think there's a pretty straightforward way to make everyone happy: we change the return type of I'll put together a quick example of what this would look like. |
Okay, I've done that here: 736de07 One additional nice thing is that this approach doesn't require the explicit syntax support. I've also changed the method name from If this looks okay to everyone you could fetch and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but don't really understand why you didn't just cherry-pick my commit? Anyway this looks good to me.
You're right, I probably should have done it, it's just that the change was so small that when I came to your second commit I had mostly implemented it. I realized at the end that you already did it. |
Makes sense, I didn't realize you were already working on it—sorry to nitpick! |
No apologies needed. Thanks a lot to all involved for the great review actually. |
* backported #2380 Added combineAllOption to Foldable * addressed review feedback, removed FoldableCompat * removed toIterable override for `Stream`
Issue #2363