-
Notifications
You must be signed in to change notification settings - Fork 343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EventuallyAggregator and variants #407
EventuallyAggregator and variants #407
Conversation
@@ -451,3 +451,15 @@ case class HyperLogLogAggregator(val hllMonoid: HyperLogLogMonoid) extends Monoi | |||
def prepare(value: Array[Byte]) = monoid.create(value) | |||
def present(hll: HLL) = hll | |||
} | |||
|
|||
case class SetSizeAggregator[A](hllBits: Int, maxSetSize: Int = 10)(implicit toBytes: A => Array[Byte]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might make sense to make the default maxSetSize to be something that depends on hllBits. At the extreme you could serialize every item coming in and increment a byte counter (although the in-memory may be much less or more, but still), then when it exceeds 2^hllBits
, switch over. I would think something like maxSetSize: Int = math.max(10, 1 << (hllBits - 5))
might be a decent default.
This looks great, but you're pushing our test coverage down here. Can we get at least a simple test to exercise this a bit? Also, can you add some comments to the type so the scaladoc is not bare? |
ping for at least a test or two that exercises this. |
ping again for a test. |
Sorry, been busy - I'll try to get one in, but there's always the next release. |
@DanielleSucher thanks for the tests! |
EventuallyAggregator and variants
An Aggregator that wraps the
EventuallySemigroup
(and one forEventuallyMonoid
), as well as aSetSizeAggregator
that starts with aSet
and ends with aHLL
.