Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use latest algebird, bijection, chill, elephantbird, and scala 2.11.5 #1174

Merged
merged 7 commits into from
Jan 28, 2015

Conversation

isnotinvain
Copy link
Contributor

(creating this also so I can see the tests run)


val reducers = Some(numReducers)

private lazy val murmurHash = MurmurHash128(seed)
def hash(key: K): Long = murmurHash(serialization(key))._1

private lazy implicit val cms = CMS.monoid(eps, delta, seed)
lazy val sketch: TypedPipe[CMS] =
private lazy implicit val cms = CMS.monoid[Long](eps, delta, seed)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it is better to fix this to be a CMS[Array[Byte]] and make implicit hasher following this approach:

https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/CountMinSketch.scala#L1113

The way we had it before is because we didn't have this ability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have any implications for persisted data, or is this OK to change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those CMS values are not persisted beyond a single join. Of course,
someone could do something crazy and save them, but the migration is their
problem at that point.

On Tue, Jan 27, 2015 at 11:06 AM, Alex Levenson notifications@github.com
wrote:

In scalding-core/src/main/scala/com/twitter/scalding/typed/Sketched.scala
#1174 (comment):

val reducers = Some(numReducers)

private lazy val murmurHash = MurmurHash128(seed)
def hash(key: K): Long = murmurHash(serialization(key))._1

  • private lazy implicit val cms = CMS.monoid(eps, delta, seed)
  • lazy val sketch: TypedPipe[CMS] =
  • private lazy implicit val cms = CMS.monoid[Long](eps, delta, seed)

Does this have any implications for persisted data, or is this OK to
change?


Reply to this email directly or view it on GitHub
https://github.com/twitter/scalding/pull/1174/files#r23642791.

Oscar Boykin :: @posco :: http://twitter.com/posco

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K thanks just wanted to double check

@johnynek
Copy link
Collaborator

Just did a minor fix I forgot about now that we are on the latest algebird.

@johnynek
Copy link
Collaborator

looks like the build is failing due to something with elephantbird. Did they change their packaging? Looks like a bunch of types are not found.

@isnotinvain
Copy link
Contributor Author

I had that issue locally, and it went away after a git clean -fdx so I thought it was a local environment issue. I'll take a look

@isnotinvain
Copy link
Contributor Author

This is strange, EB didn't change it's packaging. Those missing classes are from elephantbird-core, and the test failures are in scalding-commons, which depends on elephantbird-cascading2, and elephantbird-cascading2 depends on elephantbird-core. I don't see anything turning off transitive dependencies, so that is strange.

At least locally, adding EB-core explicitly to scalding-commons seems to fix it. We should be listing this explicitly because we use it directly anyway, but I am puzzled as to why this didn't work as is (or what change triggered it).

@isnotinvain
Copy link
Contributor Author

@johnynek @ianoc I think this is ready assuming the tests pass, let me know what you think, thanks!

ianoc added a commit that referenced this pull request Jan 28, 2015
 Use latest algebird, bijection, chill, elephantbird, and scala 2.11.5
@ianoc ianoc merged commit f0d7952 into develop Jan 28, 2015
@ianoc ianoc deleted the alexlevenson/bump-deps branch January 28, 2015 01:49

val reducers = Some(numReducers)
def serialize(k: K): Array[Byte] = serialization(k)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, because hash is no longer public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I removed the hash() method entirely, we actually need to share the serialization function with the estimation code that happens below so we can pass the right key into the cms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants