'csum' and 'csumByKey' should use a CommutativeMonoid #117
Nice, I didn't know that Monoid and Semigroup had a Commutative version in cats; that would indeed be more appropriate! This seems like a simple thing to change. Would you like to give it a shot?
Yes, happy to PR it. Shall I go for a version using the …
At the moment we don't have a custom API for RDD like we do for Dataset, so I'm not sure where you would put something like a …
Apparently cats is missing commutative semigroup instances for tuples whose elements also form a commutative semigroup...
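For context, the missing derivation is the pointwise one: a pair combines component-wise, and the result commutes whenever every component does. A self-contained sketch of that instance (the traits below are simplified stand-ins for `cats.kernel.Semigroup`/`CommutativeSemigroup`, not the actual cats source):

```scala
// Simplified stand-ins for the cats.kernel type classes, so the sketch
// compiles on its own without a cats dependency.
trait Semigroup[A] { def combine(x: A, y: A): A }
trait CommutativeSemigroup[A] extends Semigroup[A]

object CommutativeSemigroup {
  // Int addition is commutative.
  implicit val intAdd: CommutativeSemigroup[Int] =
    new CommutativeSemigroup[Int] { def combine(x: Int, y: Int): Int = x + y }

  // A tuple combines pointwise; it is commutative when both components are.
  implicit def tuple2[A, B](implicit A: CommutativeSemigroup[A],
                            B: CommutativeSemigroup[B]): CommutativeSemigroup[(A, B)] =
    new CommutativeSemigroup[(A, B)] {
      def combine(x: (A, B), y: (A, B)): (A, B) =
        (A.combine(x._1, y._1), B.combine(x._2, y._2))
    }
}

object TupleDemo extends App {
  val cs = implicitly[CommutativeSemigroup[(Int, Int)]]
  println(cs.combine((1, 2), (3, 4))) // prints (4,6)
}
```

The same pattern extends mechanically to higher tuple arities, which is why it lends itself to generated/derived instances in cats-kernel.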
Do they have that for …? If so, that's probably a straightforward change to get in.
Yeah, it should be. I'll try to chat with them on Gitter to understand whether there is a reason beyond my understanding not to have those.
We need typelevel/cats#1527 merged into cats to get the derived semigroup instances needed to implement this.
According to the Spark docs, the `reduce`, `reduceByKey`, `fold`, and `foldByKey` operations on RDDs should be passed a binary operation that is both commutative and associative. This is an excerpt from Spark 2.1.0 code:

So constraining the type to a `Monoid` is not enough, as this only guarantees associativity but not commutativity. These methods should instead be constrained to a `cats.kernel.CommutativeMonoid` in order to be safer.

It is also arguable whether they need a `Monoid` at all: they do not make use of the `empty` operation, so a `CommutativeSemigroup` could potentially suffice.