Serializable #96
Changes from 3 commits
bee4a79
c8f7d08
354286f
c6613d1
a0d2719
0e8ac6d
fe9e7d5
3b2a247
ffc0447
3f9867d
3bd4fb6
4a8967c
fecb887
7f4bf5a
3882a7c
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,235 @@ | ||
""" Useful decorators that are used throughout the library """ | ||
|
||
def _isnamedtuple(obj): | ||
"""Heuristic check if an object is a namedtuple.""" | ||
return isinstance(obj, tuple) \ | ||
and hasattr(obj, "_fields") \ | ||
and hasattr(obj, "_asdict") \ | ||
and callable(obj._asdict) | ||
|
||
def serializable(cls): | ||
Review comment: Two lines before def |
||
'''The @serializable decorator can decorate any class to make it easy to store | ||
Review comment: Use triple double quotes |
||
that class in a human readable JSON format and then recall it and recover | ||
the original object instance. Class instances that are wrapped in this | ||
decorator gain the serialize() method, and the class also gains a | ||
deserialize() static method that can automatically "pickle" and "unpickle" a | ||
wide variety of objects like so: | ||
|
||
@serializable | ||
Review comment: Would be best to reformat a bit so this shows up as an example when building the docs, should be:
as in the numpy documentation (e.g. here). I think this might be the first proper "example" anywhere in the codebase, so might need to do some trial and error building the docs to get the formatting right, I'm happy to try that myself after this is merged =) |
||
class Visitor(): | ||
Review comment: Need an object here so it's |
||
def __init__(self, ip_addr = None, agent = None, referrer = None): | ||
self.ip = ip_addr | ||
self.ua = agent | ||
self.referrer= referrer | ||
self.time = datetime.datetime.now() | ||
|
||
orig_visitor = Visitor('192.168', 'UA-1', 'http://www.google.com') | ||
|
||
#serialize the object | ||
pickled_visitor = orig_visitor.serialize() | ||
|
||
#restore object | ||
recov_visitor = Visitor.deserialize(pickled_visitor) | ||
|
||
Note that this decorator is NOT designed to provide generalized pickling | ||
capabilities. Rather, it is designed to make it very easy to convert small | ||
classes containing model properties to a human and machine parsable format | ||
for later analysis or visualization. A few classes under consideration for | ||
Review comment: Maybe drop this last sentence, great for our own planning, but will (hopefully!) be only a transient description of the state of things and thus not really for the docs. |
||
such decorating include the Transformation class for image alignment and the | ||
Source classes for source extraction. | ||
|
||
A key feature of the @serializable decorator is that it can "pickle" data | ||
types that are not normally supported by Python's stock JSON dump() and | ||
load() methods. Supported datatypes include: list, set, tuple, namedtuple, | ||
OrderedDict, datetime objects, numpy ndarrays, and dicts with non-string | ||
(but still data) keys. Serialization is performed recursively, and descends | ||
into the standard python container types (list, dict, tuple, set). | ||
|
||
Some of this code was adapted from these fantastic blog posts by Chris | ||
Wagner and Sunil Arora: | ||
|
||
http://robotfantastic.org/serializing-python-data-to-json-some-edge-cases.html | ||
http://sunilarora.org/serializable-decorator-for-python-class/ | ||
|
||
''' | ||
|
||
class ThunderSerializeableObjectWrapper(object): | ||
|
||
def __init__(self, *args, **kwargs): | ||
self.wrapped = cls(*args, **kwargs) | ||
|
||
# Allows transparent access to the attributes of the wrapped class | ||
def __getattr__(self, *args): | ||
if args[0] != 'wrapped': | ||
return getattr(self.wrapped, *args) | ||
else: | ||
return self.__dict__['wrapped'] | ||
|
||
# Allows transparent access to the attributes of the wrapped class | ||
def __setattr__(self, *args): | ||
if args[0] != 'wrapped': | ||
return setattr(self.wrapped, *args) | ||
else: | ||
self.__dict__['wrapped'] = args[1] | ||
|
||
# Delegate to wrapped class for special python object-->string methods | ||
def __str__(self): | ||
return self.wrapped.__str__() | ||
def __repr__(self): | ||
Review comment: Add a single blank line before def |
||
return self.wrapped.__repr__() | ||
def __unicode__(self): | ||
Review comment: Add a single blank line |
||
return self.wrapped.__unicode__() | ||
|
||
# Delegate to wrapped class for special python methods | ||
def __call__(self, *args, **kwargs): | ||
return self.wrapped(*args, **kwargs) | ||
|
||
# ------------------------------------------------------------------------------ | ||
# SERIALIZE() | ||
Review comment: minor style nit, but I'd drop this heading and the one below |
||
|
||
def serialize(self, numpy_storage='auto'): | ||
''' | ||
Serialize this object to a python dictionary that can easily be converted | ||
to/from JSON using Python's standard JSON library. | ||
|
||
Arguments | ||
Review comment: We've been using a slightly different format for arguments (it's the same format used by
Will add this to the style guide! |
||
|
||
numpy-storage: choose one of ['auto', 'ascii', 'base64'] (default: auto) | ||
Review comment: should be numpyStorage to follow the camelCase guidelines |
||
|
||
Use the 'nmupy_storage' argument to select whether numpy arrays | ||
Review comment: s/nmupy/numpy |
||
will be encoded in ASCII (as a list of lists), in Base64 (i.e. | ||
space efficient binary), or to select automatically (the default) | ||
depending on the size of the array. Currently the Base64 encoding | ||
is selected if the array has 1000 or more elements. | ||
|
||
Returns | ||
|
||
The object encoded as a python dictionary with "JSON-safe" datatypes that is ready to | ||
be converted to a string using Python's standard JSON library (or another library of | ||
your choice. | ||
Review comment: ) |
||
''' | ||
from collections import namedtuple, Iterable, OrderedDict | ||
import numpy as np | ||
|
||
def serialize_recursively(data): | ||
import datetime | ||
|
||
if data is None or isinstance(data, (bool, int, long, float, basestring)): | ||
return data | ||
if isinstance(data, list): | ||
return [serialize_recursively(val) for val in data] # Recurse into lists | ||
if isinstance(data, OrderedDict): | ||
return {"py/collections.OrderedDict": | ||
[[serialize_recursively(k), serialize_recursively(v)] for k, v in data.iteritems()]} | ||
if _isnamedtuple(data): | ||
return {"py/collections.namedtuple": { | ||
"type": type(data).__name__, | ||
"fields": list(data._fields), | ||
"values": [serialize_recursively(getattr(data, f)) for f in data._fields]}} | ||
if isinstance(data, dict): | ||
if all(isinstance(k, basestring) for k in data): # Recurse into dicts | ||
return {k: serialize_recursively(v) for k, v in data.iteritems()} | ||
else: | ||
return {"py/dict": [[serialize_recursively(k), serialize_recursively(v)] for k, v in data.iteritems()]} | ||
if isinstance(data, tuple): # Recurse into tuples | ||
return {"py/tuple": [serialize_recursively(val) for val in data]} | ||
if isinstance(data, set): # Recurse into sets | ||
return {"py/set": [serialize_recursively(val) for val in data]} | ||
if isinstance(data, datetime.datetime): | ||
return {"py/datetime": str(data)} | ||
if isinstance(data, np.ndarray): | ||
if numpy_storage == 'ascii' or (numpy_storage == 'auto' and data.size < 1000): | ||
return {"py/numpy.ndarray.ascii": { | ||
Review comment: Any concern that this "type" is of our own invention and not a real python or numpy type, unlike say |
||
"shape": data.shape, | ||
"values": data.tolist(), | ||
"dtype": str(data.dtype)}} | ||
else: | ||
import base64 | ||
return {"py/numpy.ndarray.base64": { | ||
"shape": data.shape, | ||
"values": base64.b64encode(data), | ||
"dtype": str(data.dtype)}} | ||
|
||
raise TypeError("Type %s not data-serializable" % type(data)) | ||
|
||
# Start serializing from the top level object dictionary | ||
return serialize_recursively(self.wrapped.__dict__) | ||
|
||
# ------------------------------------------------------------------------------ | ||
# DESERIALIZE() | ||
|
||
@staticmethod | ||
def deserialize(serialized_dict): | ||
''' | ||
Restore the object that has been converted to a python dictionary using an @serializable | ||
class's serialize() method. | ||
|
||
Arguments | ||
|
||
serialized_dict: a python dictionary returned by serialize() | ||
|
||
Returns | ||
|
||
A reconstituted class instance | ||
''' | ||
|
||
def restore_recursively(dct): | ||
''' | ||
This object hook helps to deserialize objects encoded using the | ||
serialize() method above. | ||
''' | ||
from collections import namedtuple, OrderedDict | ||
import numpy as np | ||
import base64 | ||
|
||
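# Only dicts can carry a "py/..." tag: recurse into lists, pass scalars through | ||
if isinstance(dct, list): | ||
return [restore_recursively(val) for val in dct] | ||
if not isinstance(dct, dict): | ||
return dct | ||
if "py/dict" in dct: | ||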
return dict(restore_recursively(dct["py/dict"])) | ||
if "py/tuple" in dct: | ||
return tuple(restore_recursively(dct["py/tuple"])) | ||
if "py/set" in dct: | ||
return set(restore_recursively(dct["py/set"])) | ||
if "py/collections.namedtuple" in dct: | ||
data = restore_recursively(dct["py/collections.namedtuple"]) | ||
return namedtuple(data["type"], data["fields"])(*data["values"]) | ||
if "py/collections.OrderedDict" in dct: | ||
return OrderedDict(restore_recursively(dct["py/collections.OrderedDict"])) | ||
if "py/datetime" in dct: | ||
from dateutil import parser | ||
return parser.parse(dct["py/datetime"]) | ||
if "py/numpy.ndarray.ascii" in dct: | ||
data = dct["py/numpy.ndarray.ascii"] | ||
return np.array(data["values"], dtype=data["dtype"]) | ||
if "py/numpy.ndarray.base64" in dct: | ||
data = dct["py/numpy.ndarray.base64"] | ||
arr = np.frombuffer(base64.decodestring(data["values"]), np.dtype(data["dtype"])) | ||
return arr.reshape(data["shape"]) | ||
|
||
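# Untagged dicts may still hold tagged values, so recurse into them. | ||
if isinstance(dct, dict): | ||
return {k: restore_recursively(v) for k, v in dct.iteritems()} | ||
# Base case: data type needs no further decoding. | ||
return dct | ||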
|
||
# First we must restore the object's dictionary entries. These are decoded recursively | ||
# using the helper function above. | ||
restored_dict = {} | ||
for k in serialized_dict.keys(): | ||
restored_dict[k] = restore_recursively(serialized_dict[k]) | ||
|
||
# Next we recreate the object. Calling the __new__() function here creates | ||
# an empty object without calling __init__(). We then take this empty | ||
# shell of an object, and set its dictionary to the reconstructed | ||
# dictionary we pulled from the JSON file. | ||
thawed_object = cls.__new__(cls) | ||
thawed_object.__dict__ = restored_dict | ||
|
||
# Finally, we would like this re-hydrated object to also be @serializable, so we re-wrap it | ||
# in the ThunderSerializeableObjectWrapper using the same trick with __new__(). | ||
rewrapped_object = ThunderSerializeableObjectWrapper.__new__(ThunderSerializeableObjectWrapper) | ||
rewrapped_object.__dict__['wrapped'] = thawed_object | ||
|
||
# Return the re-constituted class | ||
return rewrapped_object | ||
|
||
# End of decorator. Return the wrapper class from inside this closure. | ||
return ThunderSerializeableObjectWrapper | ||
|
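Editorial note: the tagging scheme used by serialize() and deserialize() above can be illustrated with a small standalone sketch. This is not the code from this pull request. It is a simplified Python 3 illustration that covers only list, dict, OrderedDict, tuple and set, and the function names `encode` and `decode` are hypothetical. It assumes that no real data dict uses a key such as "py/tuple", since such a key would be mistaken for a type tag.

```python
import json
from collections import OrderedDict


def encode(data):
    """Recursively convert data to JSON-safe values, tagging types JSON would lose."""
    if data is None or isinstance(data, (bool, int, float, str)):
        return data
    if isinstance(data, list):
        return [encode(v) for v in data]
    if isinstance(data, OrderedDict):
        return {"py/collections.OrderedDict": [[encode(k), encode(v)] for k, v in data.items()]}
    if isinstance(data, dict):
        if all(isinstance(k, str) for k in data):
            return {k: encode(v) for k, v in data.items()}
        # Non-string keys are stored as a list of [key, value] pairs
        return {"py/dict": [[encode(k), encode(v)] for k, v in data.items()]}
    if isinstance(data, tuple):
        return {"py/tuple": [encode(v) for v in data]}
    if isinstance(data, set):
        return {"py/set": [encode(v) for v in data]}
    raise TypeError("Type %s not data-serializable" % type(data))


def decode(data):
    """Invert encode(): only dicts can carry a tag, so check the type first."""
    if isinstance(data, list):
        return [decode(v) for v in data]
    if not isinstance(data, dict):
        return data
    if "py/tuple" in data:
        return tuple(decode(data["py/tuple"]))
    if "py/set" in data:
        return set(decode(data["py/set"]))
    if "py/dict" in data:
        return dict((decode(k), decode(v)) for k, v in data["py/dict"])
    if "py/collections.OrderedDict" in data:
        return OrderedDict((decode(k), decode(v)) for k, v in data["py/collections.OrderedDict"])
    # Untagged dict: its values may still contain tagged entries
    return {k: decode(v) for k, v in data.items()}


original = {"shape": (2, 3), "labels": {"a", "b"}, "lookup": {1: "one", (0, 1): "pair"}}
text = json.dumps(encode(original))   # plain JSON string
restored = decode(json.loads(text))   # tuples, sets and non-string keys come back
```

Checking the type before looking for a tag matters: `"py/dict" in 5` raises a TypeError, and on a string the same expression is a substring test.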
||
|
||
|
||
|
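Editorial note: the `__new__()` step described in the deserialize() comments can be shown in isolation. The sketch below is a minimal illustration, assuming a plain class that stores its state in `__dict__` (classes using `__slots__` would need different handling). The `Visitor` class mirrors the docstring example, and `saved_state` is a made-up stand-in for an already-decoded dictionary.

```python
import datetime


class Visitor(object):
    def __init__(self, ip_addr=None, agent=None):
        self.ip = ip_addr
        self.ua = agent
        # Side effect we do not want to repeat when restoring a saved object
        self.time = datetime.datetime.now()


# Hypothetical state, as it might look after the decoding step
saved_state = {
    "ip": "192.168.0.1",
    "ua": "UA-1",
    "time": datetime.datetime(2015, 1, 1, 12, 0),
}

# __new__ allocates the instance without running __init__,
# so the stored timestamp is not overwritten with the current time
restored = Visitor.__new__(Visitor)
restored.__dict__.update(saved_state)
```

Because `__init__` never runs, any invariants it would normally establish are skipped, so this approach fits best with small classes that only hold data.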
Review comment: Let's move these tests into a separate test module test_decorators.py