Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'csum' and 'csumByKey' should use a CommutativeMonoid #117

Closed
alonsodomin opened this issue Apr 21, 2017 · 7 comments
Closed

'csum' and 'csumByKey' should use a CommutativeMonoid #117

alonsodomin opened this issue Apr 21, 2017 · 7 comments

Comments

@alonsodomin
Copy link
Contributor

alonsodomin commented Apr 21, 2017

According to Spark docs, reduce, reduceByKey, fold and foldByKey operations in RDDs should pass in a binary commutative and associative operation. This is an excerpt from Spark 2.1.0 code:

/**
   * Reduces the elements of this RDD using the specified commutative and
   * associative binary operator.
   */
def reduce(f: (T, T) => T): T = ...

So constraining the type to a Monoid is not enough as this only garantees associativity but not commutativity. These methods should be constraining in a cats.kernel.CommutativeMonoid in order to be safer.

Is also arguably whether they also need a Monoid at all as they do not make use of the empty operation and potentially a CommutativeSemigroup could suffice...

@OlivierBlanvillain
Copy link
Contributor

OlivierBlanvillain commented Apr 24, 2017

Nice, I didn't know that Monoid and Semigroup had a Commutative version in cats, that would indeed be more appropriate! This like a simple thing to change, would you like to give it a shot?

@alonsodomin
Copy link
Contributor Author

Yes, happy to PR it, shall I go for a version using the CommutativeSemigroup instead of the Monoid? ... was also wondering whether it would be good as well to have similar safer version for the fold operation in the RDD but struggling to find an appropriate name for it ...

@OlivierBlanvillain
Copy link
Contributor

CommutativeSemigroup looks good to me 😄

At the moment we don't have custom API for RDD like we do for Dataset, so I'm not sure where you would put something like a reduceOption...

@alonsodomin
Copy link
Contributor Author

Apparently cats is missing commutative semigroup instances for tuples in which their elements also form a commutative semigroup...

@OlivierBlanvillain
Copy link
Contributor

OlivierBlanvillain commented Apr 28, 2017

Do they have that for Monoid but not CommutativeSemigroup? oO

If so, that's probably a straightforward change to get in.

@alonsodomin
Copy link
Contributor Author

yeah, it should be, will try to chat with them in gitter to understand if there is a reason beyond my understanding to not have those

@alonsodomin
Copy link
Contributor Author

we need typelevel/cats#1527 merged into cats to have the semigroup derived instances needed to implement this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants