'csum' and 'csumByKey' should use a CommutativeMonoid #117

alonsodomin · 2017-04-21T23:06:30Z

According to the Spark docs, the reduce, reduceByKey, fold and foldByKey operations on RDDs must be given a binary operation that is both commutative and associative. This is an excerpt from the Spark 2.1.0 source:

/**
 * Reduces the elements of this RDD using the specified commutative and
 * associative binary operator.
 */
def reduce(f: (T, T) => T): T = ...

So constraining the type to a Monoid is not enough, as this only guarantees associativity but not commutativity. These methods should be constrained by a cats.kernel.CommutativeMonoid in order to be safer.

It is also arguable whether they need a Monoid at all, as they do not make use of the empty operation, so potentially a CommutativeSemigroup could suffice...
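To illustrate the concern, here is a minimal sketch that uses plain Scala collections in place of RDD partitions. The CommutativeSemigroup trait below is a local stand-in for cats.kernel.CommutativeSemigroup so the snippet has no dependencies, and mergePartials and reduceNonEmpty are hypothetical helper names, not part of Spark or frameless. It assumes that partial results from different partitions may be merged in any order, which is why a merely associative operation such as string concatenation is unsafe.

```scala
object ReduceSketch {
  // Local stand-in for cats.kernel.CommutativeSemigroup (assumption: same shape,
  // a single associative and commutative `combine`).
  trait CommutativeSemigroup[A] {
    def combine(x: A, y: A): A
  }

  // Integer addition is associative and commutative, so it qualifies.
  implicit val intAddition: CommutativeSemigroup[Int] =
    new CommutativeSemigroup[Int] {
      def combine(x: Int, y: Int): Int = x + y
    }

  // Hypothetical helper: merges per-partition partial results left to right,
  // in whatever order they are handed over.
  def mergePartials[A](partials: Seq[A])(f: (A, A) => A): A =
    partials.reduce(f)

  // Hypothetical helper: a reduce that only needs `combine`, never `empty`,
  // which is why a semigroup is enough. Empty input yields None.
  def reduceNonEmpty[A](xs: Seq[A])(implicit S: CommutativeSemigroup[A]): Option[A] =
    xs.reduceOption(S.combine)

  // String concatenation is associative but NOT commutative.
  val concat: (String, String) => String = _ + _

  val partials: Seq[String] = Seq("ab", "cd")
  val inOrder: String  = mergePartials(partials)(concat)         // "abcd"
  val reversed: String = mergePartials(partials.reverse)(concat) // "cdab"

  val sum: Option[Int]   = reduceNonEmpty(Seq(1, 2, 3))       // Some(6)
  val empty: Option[Int] = reduceNonEmpty(Seq.empty[Int])     // None
}
```

With a Monoid[String] constraint the concatenation case would compile, while a CommutativeSemigroup constraint rules it out because no lawful instance exists for it.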


OlivierBlanvillain · 2017-04-24T18:47:41Z

Nice, I didn't know that Monoid and Semigroup had a Commutative version in cats, that would indeed be more appropriate! This looks like a simple thing to change, would you like to give it a shot?

alonsodomin · 2017-04-24T20:15:31Z

Yes, happy to PR it. Shall I go for a version using the CommutativeSemigroup instead of the Monoid? ... I was also wondering whether it would be good to have a similar safer version of the fold operation in the RDD, but I'm struggling to find an appropriate name for it ...

OlivierBlanvillain · 2017-04-25T04:43:57Z

CommutativeSemigroup looks good to me 😄

At the moment we don't have custom API for RDD like we do for Dataset, so I'm not sure where you would put something like a reduceOption...

alonsodomin · 2017-04-28T09:25:15Z

Apparently cats is missing commutative semigroup instances for tuples whose elements also form a commutative semigroup...
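For context, this is roughly the kind of derived instance that is missing, sketched against a local stand-in trait rather than the real cats.kernel.CommutativeSemigroup. The names tuple2Instance and combineAll are hypothetical and chosen only for this illustration; this is not how cats defines or names its instances.

```scala
object TupleInstanceSketch {
  // Local stand-in for cats.kernel.CommutativeSemigroup (assumption).
  trait CommutativeSemigroup[A] {
    def combine(x: A, y: A): A
  }

  implicit val intAddition: CommutativeSemigroup[Int] =
    new CommutativeSemigroup[Int] {
      def combine(x: Int, y: Int): Int = x + y
    }

  // Hypothetical derived instance: combining component-wise stays commutative
  // as long as both component instances are commutative.
  implicit def tuple2Instance[A, B](
      implicit A: CommutativeSemigroup[A],
      B: CommutativeSemigroup[B]
  ): CommutativeSemigroup[(A, B)] =
    new CommutativeSemigroup[(A, B)] {
      def combine(x: (A, B), y: (A, B)): (A, B) =
        (A.combine(x._1, y._1), B.combine(x._2, y._2))
    }

  // Hypothetical helper that needs an instance for the whole element type.
  def combineAll[A](xs: Seq[A])(implicit S: CommutativeSemigroup[A]): Option[A] =
    xs.reduceOption(S.combine)

  // Resolves CommutativeSemigroup[(Int, Int)] through tuple2Instance.
  val total: Option[(Int, Int)] = combineAll(Seq((1, 10), (2, 20), (3, 30)))
}
```

Without such a derivation in scope, any call that sums tuple-valued data under a CommutativeSemigroup constraint fails to compile for lack of an implicit instance.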

OlivierBlanvillain · 2017-04-28T09:31:16Z

Do they have that for Monoid but not CommutativeSemigroup? oO

If so, that's probably a straightforward change to get in.

alonsodomin · 2017-04-28T09:50:20Z

Yeah, it should be. I will try to chat with them on Gitter to understand if there is a reason beyond my understanding for not having those.

alonsodomin · 2017-04-28T10:11:33Z

We need typelevel/cats#1527 merged into cats to have the derived semigroup instances needed to implement this.

imarios added the enhancement label May 17, 2017

alonsodomin mentioned this issue Nov 1, 2017

Replace Monoid constraint by CommutativeSemigroup in the reduce syntax #203

Merged

imarios closed this as completed in #203 Jan 16, 2018
