-
Notifications
You must be signed in to change notification settings - Fork 703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use latest algebird, bijection, chill, elephantbird, and scala 2.11.5 #1174
Conversation
|
||
val reducers = Some(numReducers) | ||
|
||
private lazy val murmurHash = MurmurHash128(seed) | ||
def hash(key: K): Long = murmurHash(serialization(key))._1 | ||
|
||
private lazy implicit val cms = CMS.monoid(eps, delta, seed) | ||
lazy val sketch: TypedPipe[CMS] = | ||
private lazy implicit val cms = CMS.monoid[Long](eps, delta, seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually it is better to fix this to be a CMS[Array[Byte]]
and make implicit hasher following this approach:
The way we had it before is because we didn't have this ability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK thanks will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have any implications for persisted data, or is this OK to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those CMS values are not persisted beyond a single join. Of course,
someone could do something crazy and save them, but the migration is their
problem at that point.
On Tue, Jan 27, 2015 at 11:06 AM, Alex Levenson notifications@github.com
wrote:
In scalding-core/src/main/scala/com/twitter/scalding/typed/Sketched.scala
#1174 (comment):val reducers = Some(numReducers)
private lazy val murmurHash = MurmurHash128(seed)
def hash(key: K): Long = murmurHash(serialization(key))._1
- private lazy implicit val cms = CMS.monoid(eps, delta, seed)
- lazy val sketch: TypedPipe[CMS] =
- private lazy implicit val cms = CMS.monoid[Long](eps, delta, seed)
Does this have any implications for persisted data, or is this OK to
change?—
Reply to this email directly or view it on GitHub
https://github.com/twitter/scalding/pull/1174/files#r23642791.
Oscar Boykin :: @posco :: http://twitter.com/posco
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K thanks just wanted to double check
Just did a minor fix I forgot about now that we are on the latest algebird. |
looks like the build is failing due to something with elephantbird. Did they change their packaging? Looks like a bunch of types are not found. |
I had that issue locally, and it went away after a |
This is strange, EB didn't change it's packaging. Those missing classes are from elephantbird-core, and the test failures are in scalding-commons, which depends on elephantbird-cascading2, and elephantbird-cascading2 depends on elephantbird-core. I don't see anything turning off transitive dependencies, so that is strange. At least locally, adding EB-core explicitly to scalding-commons seems to fix it. We should be listing this explicitly because we use it directly anyway, but I am puzzled as to why this didn't work as is (or what change triggered it). |
Use latest algebird, bijection, chill, elephantbird, and scala 2.11.5
|
||
val reducers = Some(numReducers) | ||
def serialize(k: K): Array[Byte] = serialization(k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, because hash is no longer public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I removed the hash() method entirely, we actually need to share the serialization function with the estimation code that happens below so we can pass the right key into the cms
(creating this also so I can see the tests run)