JSON serializable registration model #106
This adds some level of support for putting serializable objects inside each other. There are unit tests that make sensible assertions and do not fail, but beyond that I am not yet super confident that this is totally solid.
Can now use the at-serializable decorator to serialize nested custom objects; however it turns out objects with this decorator can no longer be pickled. Applying this to the RegistrationModel currently breaks lots of tests, since transformations have to be picklable in order to be distributed by Spark.
The former at-serializable decorator is now an abstract ThunderSerializable class. Classes wrapped by the previous decorator are not pickleable, since their class is not exposed at the top level of a module. Also fixes recursive serialization of non-basic types nested within lists, tuples, and so on.
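The pickling problem described above can be reproduced in a few lines. This is a minimal sketch, not the Thunder code: the `serializable` decorator, the `Serializable` mixin, and the point classes are hypothetical stand-ins, chosen only to show why a class created inside a function cannot be found by pickle while a top-level mixin subclass can.

```python
import pickle


def serializable(cls):
    """Hypothetical decorator: wraps cls in a class defined inside a function."""
    class Wrapper(cls):
        pass
    return Wrapper


class Serializable(object):
    """Hypothetical mixin, defined at the top level of the module."""
    def to_dict(self):
        return dict(self.__dict__)


@serializable
class WrappedPoint(object):
    def __init__(self, x, y):
        self.x, self.y = x, y


class MixinPoint(Serializable, object):
    def __init__(self, x, y):
        self.x, self.y = x, y


def can_pickle(obj):
    """Return True if pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # pickle stores classes by name; a class local to a function
        # cannot be looked up again by that name.
        return False
```

Under these assumptions, `can_pickle(WrappedPoint(1, 2))` is false because the instance's class is the function-local `Wrapper`, while `can_pickle(MixinPoint(1, 2))` is true because `MixinPoint` is reachable by name from its module.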
RegistrationModels have a transformations dictionary where all values are instances of Displacements. This special case allows the name of the Displacement subclass to be serialized once for the dict, rather than once per value.
no special encoding is needed for plain lists
Only reason to have this at all is to make clear that Serializable is intended as a mixin, and not a "real" base class.
Previously, the serialization special case handling of homogeneously typed lists and dicts would only be triggered for contained objects with regular __dict__s, not with __slots__. The homogeneous container logic now supports both types of objects.
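A rough sketch of the idea, for illustration only: the helper names (`instance_state`, `is_homogeneous`, `encode_dict`) and the exact layout of the encoded dictionary are assumptions, not the code in this PR. Only the "py/homogeneousDict" tag is taken from the discussion here. The sketch records the class name once when all values share a type, and reads attributes from either `__dict__` or `__slots__`.

```python
class DictPoint(object):
    def __init__(self, x, y):
        self.x, self.y = x, y


class SlotPoint(object):
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y


def instance_state(obj):
    """Return obj's attributes as a dict, for __dict__ or __slots__ objects."""
    if hasattr(obj, "__dict__"):
        return dict(obj.__dict__)
    # Assumes __slots__ is a tuple or list of names; unset slots are skipped.
    return {name: getattr(obj, name)
            for name in obj.__slots__ if hasattr(obj, name)}


def is_homogeneous(values):
    """True if values is non-empty and every item has exactly the same type."""
    values = list(values)
    return len(values) > 0 and all(type(v) is type(values[0]) for v in values)


def encode_dict(d):
    """Encode a dict of custom objects, naming the type once if homogeneous."""
    values = list(d.values())
    if is_homogeneous(values):
        return {"py/homogeneousDict": {
            "type": type(values[0]).__name__,
            "data": {key: instance_state(val) for key, val in d.items()},
        }}
    # Mixed types: fall back to tagging every value with its own type name.
    return {key: {"type": type(val).__name__, "data": instance_state(val)}
            for key, val in d.items()}
```

With this layout, a dictionary whose values are all `SlotPoint` instances carries the string "SlotPoint" once, whereas a dictionary mixing `DictPoint` and `SlotPoint` repeats a type name for each value.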
Conflicts: python/test/test_imgprocessing.py python/thunder/imgprocessing/regmethods/utils.py
Ok, I think this should be ready for a potential merge now. Here's an example of a JSON-serialized
Here If we wanted to (say) convert |
Wow, this is a fantastic overhaul, @industrial-sloth! I suppose we should have tested the decorator approach more carefully before merging it in, but now that we have switched from container classes over to base classes, I cannot really think of any real drawbacks to this new approach. It did indeed feel like I was going to great lengths to present the inner class's true identity (via complicated overriding of attribute access methods, Just a few quick comments: My recommendation would be to rename Your tests for checking whether the list or dict is homogeneous are a little verbose, especially since it needs to be repeated twice for list and for dict. Here are some helper functions you could place at the top of the file to streamline that code in
With these methods so defined, you could simply check for homogeneous serializability using Everything else is looking fantastic. Totally awesome! :) |
Brilliant, thanks @broxtronix! Agreed re: the homogeneity testing, I'll take those functions and merge those in, looks like a clear win. Funny about the homogeneous container tags - I usually end up erring on the side of verbosity, the original tag strings were something like "py/homogenousThunderSerializableList" or something equally ridiculous. So I cut them way way back, and may have gone too far. :) There's a bit of a tradeoff between human-readability and efficient encoding here that's hard for me to judge, but if you think the more legible strings are worth the extra characters, I'm happy to put that in. Thanks again for the review!
Sounds great! Thanks for the major overhaul of the code!! One other quick thing occurred to me... in addition to the |
Awesome work guys! I agree with @broxtronix about changing the name to |
@@ -138,8 +139,11 @@ def fitandtransform(im, reg):
 return Images(newrdd).__finalize__(images)


-class RegistrationModel(object):
+class RegistrationModel(Serializable, object):
put object first?
See comments on the next 4 lines. :) Basically, you can't put object first, much as one might want to. See for instance: http://stackoverflow.com/questions/3003053/metaclass-multiple-inheritance-inconsistency
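A simplified illustration of the ordering constraint, assuming a plain `Serializable` class that itself derives from `object`. The real class in this PR may differ, and the linked question concerns the metaclass variant of the same problem. The `try_define` helper is hypothetical and exists only so both orderings can be attempted without crashing the script.

```python
class Serializable(object):
    """Hypothetical stand-in for the serialization mixin."""


def try_define(bases):
    """Try to build a RegistrationModel class with the given base order.

    Returns the new class, or the TypeError raised while creating it.
    """
    try:
        return type("RegistrationModel", bases, {})
    except TypeError as err:
        return err


# Mixin first: object stays last in the method resolution order.
ok = try_define((Serializable, object))

# object first: Python cannot place object before its own subclass
# Serializable, so class creation fails with a TypeError.
bad = try_define((object, Serializable))
```

Here `ok` is a usable class, while `bad` holds the `TypeError` raised because no consistent method resolution order exists when `object` is listed before one of its subclasses.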
was "py/hmgList" and "py/hmgDict", now "py/homogeneousList" and "py/homogeneousDict".
checking for homogeneous serializable data values is now pulled out into a separate method, as is building a dictionary equivalent to an instance's __slots__.
also fix issue where strings were decoded as Unicode. Unicode strings are not currently supported, only plain strings.
Ok, I think I have all those comments addressed. I ended up going with a This last was unexpectedly useful, as it exposed an issue where the full roundtrip through the JSON module ends up converting plain strings to Unicode strings - arguably exactly what it should be doing but not I think what we want in Thunder, which generally speaking is not big on internationalization. Have now added a couple of hooks to the JSON string decoding logic that convert unicode strings back to plain strings, as suggested by Stack Overflow. |
LGTM! The unicode / string thing is a little icky, it looks like we could also get around by using a different json library (it's handled more uniformly in
JSON serializable registration model
This PR modifies the existing JSON serialization code quite heavily, with the end goal of having this be usable to serialize RegistrationModel objects from the imgprocessing image registration code.
This gets around a couple issues with the previous serialization code:
ThunderSerializableObjectWrapper) was defined inside a function rather than at the top level of a module, and thus pickle could not dynamically instantiate them. RegistrationModels need to be pickleable, since they are broadcast by pyspark, which uses pickle to do so. This PR moves the serialization logic into an abstract base class rather than a decorator, so serializable classes must now extend ThunderSerializable (can be multiple inheritance) rather than being wrapped by the @serializable decorator.
At present this is still a little messy. I'm opening this PR right now for visibility and comment, but I don't yet consider it ready to be merged in.