Make HLL easier to use, add Hash128 typeclass #440

Merged
merged 6 commits into from
May 21, 2015

johnynek
Collaborator

@johnynek johnynek commented May 6, 2015

No description provided.

@johnynek
Collaborator Author

johnynek commented May 6, 2015

@ianoc @avibryant

what do you think?

Got some comments that HLL is too hard to use. Trying to make it easier.

@@ -662,3 +687,15 @@ case class SetSizeAggregator[A](hllBits: Int, maxSetSize: Int = 10)(implicit toB
val leftSemigroup = new HyperLogLogMonoid(hllBits)
val rightAggregator = Aggregator.uniqueCount[A].andThenPresent { _.toLong }
}

case class SetSizeHashAggregator[A](hllBits: Int, maxSetSize: Int = 10)(implicit hash: Hash128[A])
Contributor

The level of duplication here with SetSizeAggregator bothers me. Can we not just introduce a backwards-compatible apply that takes a fn and wraps it in a Hash128?

Collaborator Author

Sounds good to me. I was just being lazy, really. (Also, SetSizeAggregator could have serialized the intermediate state with what it has on hand, whereas with the Hash you can't, but that should be an orthogonal concern.)
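A minimal sketch of the suggestion above, assuming a simplified stand-in for the typeclass: the `Hash128` trait, `fromBytesFn`, `SetSizeSketch`, and `withBytes` below are hypothetical names defined only for this illustration and are not Algebird's actual definitions. MD5 is used purely because it ships with the JDK; it is not the hash Algebird uses.

```scala
import java.nio.ByteBuffer
import java.security.MessageDigest

// Simplified stand-in for the typeclass discussed in this PR (hypothetical,
// not Algebird's actual trait).
trait Hash128[-A] {
  def hash(a: A): (Long, Long)
}

object Hash128 {
  // Hash raw bytes to 128 bits. MD5 is an assumption of this sketch.
  implicit val bytesHash: Hash128[Array[Byte]] = new Hash128[Array[Byte]] {
    def hash(bytes: Array[Byte]): (Long, Long) = {
      val buf = ByteBuffer.wrap(MessageDigest.getInstance("MD5").digest(bytes))
      (buf.getLong(), buf.getLong())
    }
  }

  // Wrap an old-style serialization function in a Hash128 instance.
  def fromBytesFn[A](toBytes: A => Array[Byte]): Hash128[A] = new Hash128[A] {
    def hash(a: A): (Long, Long) = bytesHash.hash(toBytes(a))
  }
}

// Hypothetical aggregator-like class with one Hash128-based implementation.
class SetSizeSketch[A](val hllBits: Int)(implicit val hasher: Hash128[A]) {
  def hashOf(a: A): (Long, Long) = hasher.hash(a)
}

object SetSizeSketch {
  // Backwards-compatible entry point: callers that only have an
  // A => Array[Byte] function reuse the same implementation.
  def withBytes[A](hllBits: Int)(toBytes: A => Array[Byte]): SetSizeSketch[A] =
    new SetSizeSketch[A](hllBits)(Hash128.fromBytesFn(toBytes))
}
```

With this shape there is a single implementation, and the function-based constructor is only a thin adapter, which is what removes the duplication.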

@avibryant
Contributor

Looks worthwhile.

@ianoc
Collaborator

ianoc commented May 7, 2015

Seems like a good thing to have; it's come up a few times in various forms. Once we are confident that no hashes have changed (I see you have a test), it seems like it should be a low-impact thing. It doesn't change any existing binary APIs, does it? Just extends them?

@johnynek
Collaborator Author

johnynek commented May 7, 2015

@ianoc yes, it only adds APIs. Since HLL natively works on Array[Byte], the create method was just serializing the values for you, so it was easy to make sure we didn't change that.

@johnynek
Collaborator Author

Don't merge this. It changes the type signature of two methods, which will introduce a binary incompatibility with 0.10. We can try to find a new name so we won't need to republish everything above.

Those methods were added in 0.10, so they were safe to change, but now since 0.10 is out it is no longer safe to change them.

@ianoc
Collaborator

ianoc commented May 18, 2015

@johnynek Want to update this to develop? Looks like with the extra method we are already going to be binary incompatible, so good to go I think.

@johnynek
Collaborator Author

Are we already binary incompatible with 0.10?

@ianoc
Collaborator

ianoc commented May 18, 2015

The addition of level in #444 makes us binary incompatible, I believe.

@johnynek
Collaborator Author

yeah, that's not worth it for a version number bump if you ask me. It is almost as convenient to override prepare as it is to override level.

Once again, I wish we had mima wired into our tests and we actually tried to minimize non-patch bumps.

In fact, almost no one would hit such a binary incompatibility, but since versions are so coarse, consumers need to consider whether the bump is safe (which is expensive). That means slower adoption of newer code, which means fewer people testing and contributing to the latest version.
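For reference, wiring MiMa into an sbt build might look roughly like the fragment below. This is a sketch under assumptions: it uses the setting and task names from recent sbt-mima-plugin releases (`mimaPreviousArtifacts`, `mimaReportBinaryIssues`), whereas the console log in this thread shows the older `mima-report-binary-issues` task name. The plugin version is left as a placeholder, and the `0.10.0` baseline is assumed from the "0.10" mentioned above.

```scala
// project/plugins.sbt
// "<plugin-version>" is a placeholder; substitute a released sbt-mima-plugin version.
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "<plugin-version>")

// build.sbt (settings for the module being checked)
// Baseline artifact to compare against; the exact version is an assumption.
mimaPreviousArtifacts := Set("com.twitter" %% "algebird-core" % "0.10.0")

// Make the test task fail when binary incompatibilities are reported,
// so the check runs whenever the tests run.
Test / test := (Test / test).dependsOn(mimaReportBinaryIssues).value
```

Running the check as part of `test` is one possible way to keep accidental signature changes from reaching a release; running `mimaReportBinaryIssues` as a separate CI step would work as well.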

Conflicts:
	algebird-core/src/main/scala/com/twitter/algebird/Aggregator.scala
@johnynek
Collaborator Author

[tw-mbp-oscar algebird]$ ./sbt algebird-core/mima-report-binary-issues
[info] Loading global plugins from /Users/oscar/.sbt/0.13/plugins
[info] Loading project definition from /Users/oscar/workspace/algebird/project
[info] Set current project to algebird (in build file:/Users/oscar/workspace/algebird/)
[info] Resolving com.googlecode.javaewah#JavaEWAH;0.6.6 ...
[info] algebird-core: found 0 potential binary incompatibilities
[success] Total time: 3 s, completed May 20, 2015 1:55:41 PM

egonina added a commit that referenced this pull request May 21, 2015
Make HLL easier to use, add Hash128 typeclass
@egonina egonina merged commit 78d1d68 into develop May 21, 2015
@egonina egonina deleted the oscar/more-hll branch May 21, 2015 00:36
@johnynek johnynek mentioned this pull request May 21, 2015
@ianoc
Collaborator

ianoc commented Aug 8, 2015

Old one... but why was batch create deprecated as part of this? Potential performance regression?

@johnynek
Collaborator Author

johnynek commented Aug 9, 2015

@ianoc because it requires an implicit conversion in scope, which is now generally viewed as bad hygiene.

@ianoc
Collaborator

ianoc commented Aug 9, 2015

Ah ok, so we probably need to add a new one to replicate the old batch behavior with the better usage of hasher.

@johnynek
Collaborator Author

johnynek commented Aug 9, 2015

Yeah, that would be good. Sorry I didn't do that at the time. I guess I overlooked that there was an optimization in there.
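To illustrate the difference discussed above, here is a hedged sketch contrasting a batch method that needs an implicit conversion with one that takes a hasher typeclass. Everything here is hypothetical: the `Hash128` trait is a simplified stand-in, `BatchSketch` and its methods are made-up names, and the register layout is a toy HLL-like structure rather than Algebird's. The thread does not say what the original optimization was, so the single-pass fold below is only an assumption about the general idea.

```scala
import java.nio.ByteBuffer
import java.security.MessageDigest

// Simplified stand-in typeclass (hypothetical, not Algebird's definition).
trait Hash128[-A] { def hash(a: A): (Long, Long) }

object Hash128 {
  // Build an instance from a serialization function. MD5 is used only
  // because it ships with the JDK; it is an assumption of this sketch.
  def fromBytesFn[A](toBytes: A => Array[Byte]): Hash128[A] = new Hash128[A] {
    def hash(a: A): (Long, Long) = {
      val buf = ByteBuffer.wrap(MessageDigest.getInstance("MD5").digest(toBytes(a)))
      (buf.getLong(), buf.getLong())
    }
  }
  implicit val stringHash: Hash128[String] = fromBytesFn[String](_.getBytes("UTF-8"))
}

object BatchSketch {
  // Typeclass style: one pass over the items, updating toy registers in place
  // instead of building one sketch per item and merging them.
  def batchRegisters[T](bits: Int, items: Iterable[T])(implicit h: Hash128[T]): Array[Byte] = {
    require(bits >= 1 && bits <= 16, "bits must be in 1..16 for this sketch")
    val regs = new Array[Byte](1 << bits)
    items.foreach { t =>
      val (hi, lo) = h.hash(t)
      val idx = (hi >>> (64 - bits)).toInt // top bits pick the register
      val rho = (java.lang.Long.numberOfLeadingZeros(lo) + 1).toByte
      if (rho > regs(idx)) regs(idx) = rho // keep the max per register
    }
    regs
  }

  // Old style, for contrast: the caller must have a T => Array[Byte]
  // implicit conversion in scope.
  def batchViaConversion[T](bits: Int, items: Iterable[T])(implicit toBytes: T => Array[Byte]): Array[Byte] =
    batchRegisters(bits, items)(Hash128.fromBytesFn(toBytes))
}
```

The typeclass version resolves `Hash128[String]` from the companion object, so call sites need no conversion in scope; the batch result should match an element-wise max over per-item registers.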

@johnynek johnynek mentioned this pull request Oct 24, 2016