Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix serialization of KryoHadoop #1685

Merged
merged 4 commits into from
May 15, 2017

Conversation

ttim
Copy link
Collaborator

@ttim ttim commented May 12, 2017

In process of Summingbird migration I found that serialization of KryoHadoop class was broken recently, this PR adds test for that and fixes it.

@johnynek
Copy link
Collaborator

Lgtm. We can also add a serializer for Config by just converting to and from a Map[String, String] I think.

@ttim
Copy link
Collaborator Author

ttim commented May 12, 2017

@johnynek Do you mean to keep Config serializable instead and therefore there will be no need to have parts of it as vals?

I guess it was more or less this way before, but after @ianoc changed behavior to what we see now 2c72bac.

@johnynek
Copy link
Collaborator

I think we should make your change AND make Config Kryo serializable. Belt and suspenders.

@ttim
Copy link
Collaborator Author

ttim commented May 13, 2017

Now I see. It's not simple (at least in this PR) for now because Config doesn't expose a way to iterate over keys, so it's impossible to convert it into Map[String, String].

@dieu
Copy link
Contributor

dieu commented May 13, 2017

@ttim look on trait Config in scalding, it has methodtoMap

@ttim
Copy link
Collaborator Author

ttim commented May 13, 2017

@dieu small clarification - what we pass there isn't scalding config but chill's one, which doesn't contain this method.

@johnynek
Copy link
Collaborator

ahh yes. We should open an issue on Chill to make all of the Config Serializable and Kryo serializable.

@ttim
Copy link
Collaborator Author

ttim commented May 13, 2017

@johnynek I'm wondering how was it working before @ianoc change (2c72bac)? For me it looks like Config is serializable already (or maybe HadoopConfig, I don't know)?

@dieu
Copy link
Contributor

dieu commented May 13, 2017

@ttim everything is serializable in chill.Config, but only by kryo I think, because chill.Config is not marked as serializable.

@ttim
Copy link
Collaborator Author

ttim commented May 13, 2017

@johnynek So I guess we don't need to create an issue in chill because Config is already kryo serializable.

Also I did a small refactoring to make logic clearer (and avoid transient fields), what do you think?

Copy link
Collaborator

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not crazy about this at all. Multiple constructors is non-idiomatic in scala and many folks subclass this as it is. I'd rather do the first fix personally.

Copy link
Collaborator

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To put it another way: this type, and hose like them, are created with reflection and expected to have a constructor that accepts a Config. This is part of chill-hadoop.

Also, if we add another option we would be adding another parameter to the constructor vs just reading more from the Config.

@ttim
Copy link
Collaborator Author

ttim commented May 15, 2017

@johnynek my main motivation was to make object's state more clear (and I guess transient plays against it) and do separation between internal state and how do we create KryoHadoop clearer.
At the same time I don't like two constructors either and missed a point of creation by reflection, so I'm happy to revert this back.

@pankajroark
Copy link
Contributor

@johnynek Do you think this PR is ready to go? We need this fix for upgrading summingbird to 2.12 twitter/summingbird#721. Thanks

@johnynek
Copy link
Collaborator

👍

@johnynek johnynek merged commit 808ecb2 into twitter:develop May 15, 2017
ttim added a commit to ttim/scalding that referenced this pull request May 16, 2017
* Fix serialization of KryoHadoop

* Increase buffer size for serialization test

* Small refactoring to make serialization logic more direct

* Revert: Small refactoring to make serialization logic more direct
jcdavis pushed a commit to jcdavis/scalding that referenced this pull request May 23, 2017
* Fix serialization of KryoHadoop

* Increase buffer size for serialization test

* Small refactoring to make serialization logic more direct

* Revert: Small refactoring to make serialization logic more direct
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants