-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make HLL easier to use, add Hash128 typeclass #440
Conversation
what do you think? Got some comments HLL is too hard to use. Trying to make it easier. |
@@ -662,3 +687,15 @@ case class SetSizeAggregator[A](hllBits: Int, maxSetSize: Int = 10)(implicit toB | |||
val leftSemigroup = new HyperLogLogMonoid(hllBits) | |||
val rightAggregator = Aggregator.uniqueCount[A].andThenPresent { _.toLong } | |||
} | |||
|
|||
case class SetSizeHashAggregator[A](hllBits: Int, maxSetSize: Int = 10)(implicit hash: Hash128[A]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The level of duplication here with SetSizeAggregator
bothers me. Can we not just introduce a backwards-compatible apply
that takes a fn and wraps it in a Hash128
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. I was just lazy really (but also, SetSizeAggregator could have serialized the intermediate state with what it has on hand, using the Hash you can't, but that should be an orthogonal concern).
Looks worthwhile. |
Seems like a good thing to have, its come up a few times in various forms. Once we are confident that no-hashes have changed(i see you've a test) then it seems like it should be a low impact thing. It doesn't change any existing binary API's does it? just extends them? |
@ianoc yes, it only adds APIs, and since HLL actually natively works on |
Don't merge this. It changes the the type signature of two methods, which will introduce a binary incompatibility with 0.10. We can try to find a new name so we won't need to republish everything above. Those methods were added in 0.10, so they were safe to change, but now since 0.10 is out it is no longer safe to change them. |
@johnynek Want to update this to develop? looks like with the extra method we are already going to be binary incompatible so good to go i think |
Are we already binary incompatible with 0.10? |
The addition of level in #444 makes us binary incompatible i believe. |
yeah, that's not worth it for a version number bump if you ask me. It is almost as convenient to override prepare as it is to override level. Once again, I wish we had mima wired into our tests and we actually tried to minimize non-patch bumps. In fact, almost no one would hit such a binary incompatibility, but since versions are so coarse, consumers need to consider if the bump is safe (which is expensive), which means slower adoption of never code, which means fewer people testing and contributing to the latest version. |
Conflicts: algebird-core/src/main/scala/com/twitter/algebird/Aggregator.scala
|
Make HLL easier to use, add Hash128 typeclass
Old one... but why was batch create deprecated as part of this? potential performance regression? |
@ianoc because it requires an implicit conversion in scope, which is now generally viewed as bad hygiene. |
Ah ok, so we probably need to add a new one to replicate the old batch behavior with the better usage of hasher. |
Yeah, that would be good. Sorry I didn't do that at the time. I guess I overlooked that there was an optimization in there. |
No description provided.