Aggregator trait provides a generic mapPlusMap-like abstraction #31

Merged
merged 3 commits into from Nov 2, 2012

Collaborator

avibryant commented Oct 13, 2012

A common, but inconsistent, pattern that we're seeing is to have methods for preparing members of the monoid from simple objects like numbers and strings, and sometimes methods for presenting the results of computations on the members (that is, ways of getting simple objects back out).

Some examples currently: AveragedValue (which has a companion object for preparation, no method for presentation), DecayedValue (ditto), HLL (companion object for preparation, method on the monoid for presentation), CountMin (methods on the monoid for preparation, methods on the members for presentation)...

Aggregator is a first attempt to standardize all of this as a higher-level API built on top of algebird's abstract algebra. It aims to be suitable for use in Scalding or similar frameworks along with mapReduceMap-type aggregation APIs, encapsulated in a simple aggregate() call along the same lines as plus().

Averager is provided as a minimal example. If you think this is a good idea, I'm happy to provide implementations for the other monoids mentioned.
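
For illustration, an averaging aggregator in this style might look roughly like the sketch below. It is not the Averager from this PR: `SimpleAverager` is a hypothetical name, the (count, sum) pair is an assumed stand-in for AveragedValue, and the trait is a simplified local copy of the one in the diff (Iterable instead of TraversableOnce, with java.io.Serializable mixed in as requested in review) so the block compiles without algebird.

```scala
// Simplified local copy of the Aggregator trait from the diff, so this
// sketch is self-contained. Iterable replaces TraversableOnce here.
trait Aggregator[A, B, C] extends (Iterable[A] => C) with java.io.Serializable {
  def prepare(input: A): B       // simple value -> member of the algebra
  def reduce(l: B, r: B): B      // combine two members
  def present(reduction: B): C   // member -> simple value

  // Only defined for non-empty input, like the base trait in the diff.
  def reduce(items: Iterable[B]): B = items.reduce(reduce(_, _))
  def apply(inputs: Iterable[A]): C = present(reduce(inputs.map(prepare(_))))
}

// Hypothetical example: tracks (count, sum) and presents the mean.
object SimpleAverager extends Aggregator[Double, (Long, Double), Double] {
  def prepare(input: Double): (Long, Double) = (1L, input)
  def reduce(l: (Long, Double), r: (Long, Double)): (Long, Double) =
    (l._1 + r._1, l._2 + r._2)
  def present(reduction: (Long, Double)): Double = reduction._2 / reduction._1
}
```

With these definitions, `SimpleAverager(List(1.0, 2.0, 3.0))` prepares three (1, x) pairs, reduces them to (3, 6.0), and presents 2.0.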

We need to add the license header or @cra will complain

I think it should extend java.io.Serializable because in practice all these are going to be used in scalding and storm, both of which require it.

Maybe this should be moved to AveragedValue.scala?

johnynek commented on the diff Oct 15, 2012

src/main/scala/com/twitter/algebird/Aggregator.scala
@@ -0,0 +1,22 @@
+package com.twitter.algebird
+
+trait Aggregator[A,B,C] extends Function1[TraversableOnce[A], C]{
+ def prepare(input : A) : B
+ def reduce(l : B, r : B) : B
+ def present(reduction : B) : C
+
+ def reduce(items : TraversableOnce[B]) : B = items.reduce{reduce(_,_)}
+ def apply(inputs : TraversableOnce[A]) : C = present(reduce(inputs.map{prepare(_)}))
+}
+
+trait MonoidAggregator[A,B,C] extends Aggregator[A,B,C] {
+ def monoid : Monoid[B]
+ def reduce(l : B, r : B) : B = monoid.plus(l, r)
+ override def reduce(items : TraversableOnce[B]) : B = items.foldLeft(monoid.zero){reduce(_,_)}

johnynek Oct 15, 2012

Collaborator

Actually, if we want to be slightly more lax here, we could do:

if (items.isEmpty) { monoid.zero }
else { items.reduce { reduce(_,_) } }

This allows us to work with non-empty lists of semi-groups (things, such as Either[L,R], that don't have valid zeros).

avibryant Oct 15, 2012

Collaborator

That's what the implementation in the base Aggregator trait is for; if you have a semi-group, don't use MonoidAggregator.
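
The distinction discussed in this thread can be sketched as follows. Everything here is an assumption made for illustration: `Monoid` is a minimal local stand-in for algebird's (only `zero` and `plus`), the traits are simplified copies of the ones in the diff using Iterable, and `MaxOf` and `SumOf` are hypothetical examples. The base trait reduces without a zero, so it suits semi-groups but fails on empty input; MonoidAggregator folds from `monoid.zero`, so empty input is fine.

```scala
// Minimal stand-in for algebird's Monoid, so the sketch is self-contained.
trait Monoid[T] { def zero: T; def plus(l: T, r: T): T }

trait Aggregator[A, B, C] extends (Iterable[A] => C) with java.io.Serializable {
  def prepare(input: A): B
  def reduce(l: B, r: B): B
  def present(reduction: B): C
  // Semi-group style: no zero is needed, but empty input throws.
  def reduce(items: Iterable[B]): B = items.reduce(reduce(_, _))
  def apply(inputs: Iterable[A]): C = present(reduce(inputs.map(prepare(_))))
}

trait MonoidAggregator[A, B, C] extends Aggregator[A, B, C] {
  def monoid: Monoid[B]
  def reduce(l: B, r: B): B = monoid.plus(l, r)
  // Monoid style: folding from zero makes empty input well defined.
  override def reduce(items: Iterable[B]): B =
    items.foldLeft(monoid.zero)(reduce(_, _))
}

// Hypothetical semi-group example: max, treated as having no zero.
object MaxOf extends Aggregator[Int, Int, Int] {
  def prepare(input: Int): Int = input
  def reduce(l: Int, r: Int): Int = math.max(l, r)
  def present(reduction: Int): Int = reduction
}

// Hypothetical monoid example: integer sum with zero = 0.
object SumOf extends MonoidAggregator[Int, Int, Int] {
  val monoid: Monoid[Int] = new Monoid[Int] {
    def zero: Int = 0
    def plus(l: Int, r: Int): Int = l + r
  }
  def prepare(input: Int): Int = input
  def present(reduction: Int): Int = reduction
}
```

Under these assumptions, `SumOf(List.empty[Int])` yields 0, while `MaxOf(List.empty[Int])` throws because reducing an empty collection has no result; `MaxOf(List(3, 9, 4))` yields 9.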

Collaborator

johnynek commented Oct 30, 2012

again ping: can you add the serializable? I want to get scalding 0.8.1 out and for that we'll probably publish a new algebird (so we can get your new monoid).

Collaborator

avibryant commented Oct 30, 2012

Sure, done. We should probably define Aggregators for a number of these monoids but that takes some thought and if you're in a hurry to get this out, no need to do it right away.

johnynek added a commit that referenced this pull request Nov 2, 2012

Merge pull request #31 from avibryant/aggregator
Aggregator trait provides a generic mapPlusMap-like abstraction
6b7cf26

johnynek merged commit 6b7cf26 into twitter:develop Nov 2, 2012