
enable byref=False for classes defined in modules #128

Open
mmckerns opened this issue Sep 2, 2015 · 11 comments

Comments

@mmckerns
Member

mmckerns commented Sep 2, 2015

See: http://stackoverflow.com/questions/32363312/why-dill-dumps-external-classes-by-reference-no-matter-what

# file: foo.py
class Foo:
    y = 1
    def bar(self, x):
        return x + Foo.y

We also define Foo dynamically, for comparison.

>>> import dill
>>> import foo
>>> 
>>> class Foo:
...     y = 1
...     def bar(self, x):
...         return x + Foo.y
... 
>>> f = Foo()
>>> ff = foo.Foo()

So when Foo is defined in __main__, byref is respected.

>>> dill.dumps(f, byref=False)              
b'\x80\x03cdill.dill\n_create_type\nq\x00(cdill.dill\n_load_type\nq\x01X\x04\x00\x00\x00typeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05h\x01X\x06\x00\x00\x00objectq\x06\x85q\x07Rq\x08\x85q\t}q\n(X\r\x00\x00\x00__slotnames__q\x0b]q\x0cX\x03\x00\x00\x00barq\rcdill.dill\n_create_function\nq\x0e(cdill.dill\n_unmarshal\nq\x0fC]\xe3\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x01\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x03Foo\xda\x01y)\x02\xda\x04self\xda\x01x\xa9\x00r\x05\x00\x00\x00\xfa\x07<stdin>\xda\x03bar\x03\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x10\x85q\x11Rq\x12c__builtin__\n__main__\nh\rNN}q\x13tq\x14Rq\x15X\x07\x00\x00\x00__doc__q\x16NX\n\x00\x00\x00__module__q\x17X\x08\x00\x00\x00__main__q\x18X\x01\x00\x00\x00yq\x19K\x01utq\x1aRq\x1b)\x81q\x1c.'
>>> dill.dumps(f, byref=True)
b'\x80\x03c__main__\nFoo\nq\x00)\x81q\x01.'
>>>

However, when the class is defined in a module, byref is not respected.

>>> dill.dumps(ff, byref=False)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
>>> dill.dumps(ff, byref=True)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'

Note that I wouldn't use the recurse option in this case, as Foo.y will likely recurse infinitely. I believe there's already a ticket for that too; if there isn't, there should be.
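To see why byref makes no difference for ff, note that the payload is nothing but a module name and a class name. A minimal sketch with the standard-library pickle module (not dill) reproduces the same `cfoo\nFoo\n` reference; it assumes an in-memory stand-in for foo.py built with types.ModuleType, so it runs without the file on disk:

```python
import pickle
import pickletools
import sys
import types

# In-memory stand-in for foo.py (an assumption, so the example is self-contained).
foo = types.ModuleType("foo")
exec(
    "class Foo:\n"
    "    y = 1\n"
    "    def bar(self, x):\n"
    "        return x + Foo.y\n",
    foo.__dict__,
)
sys.modules["foo"] = foo

payload = pickle.dumps(foo.Foo(), protocol=3)

# Gather every string argument in the opcode stream: the class appears only
# as the pair "module name, class name"; y and bar() are not stored at all.
strings = [arg for _op, arg, _pos in pickletools.genops(payload) if isinstance(arg, str)]
print(strings)  # ['foo Foo']
```

Whatever Foo looks like on the receiving side is what the instance gets, which is exactly what byref=True would do anyway.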

Let's dig a little deeper. What if we modify the instance?

>>> ff.zap = lambda x: x + ff.y
>>> _ff = dill.loads(dill.dumps(ff))
>>> _ff.zap(2)
3
>>> dill.dumps(ff, byref=True)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01}q\x02X\x03\x00\x00\x00zapq\x03cdill.dill\n_create_function\nq\x04(cdill.dill\n_unmarshal\nq\x05CY\xe3\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x00\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x02ff\xda\x01y)\x01\xda\x01x\xa9\x00r\x04\x00\x00\x00\xfa\x07<stdin>\xda\x08<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x06\x85q\x07Rq\x08c__builtin__\n__main__\nX\x08\x00\x00\x00<lambda>q\tNN}q\ntq\x0bRq\x0csb.'
>>> dill.dumps(ff, byref=False)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01}q\x02X\x03\x00\x00\x00zapq\x03cdill.dill\n_create_function\nq\x04(cdill.dill\n_unmarshal\nq\x05CY\xe3\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x00\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x02ff\xda\x01y)\x01\xda\x01x\xa9\x00r\x04\x00\x00\x00\xfa\x07<stdin>\xda\x08<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x06\x85q\x07Rq\x08c__builtin__\n__main__\nX\x08\x00\x00\x00<lambda>q\tNN}q\ntq\x0bRq\x0csb.'
>>> 

No biggie, it pulls in the dynamically added code. However, we'd probably like to modify Foo and not the instance.

>>> Foo.zap = lambda self,x: x + Foo.y
>>> dill.dumps(f, byref=True)
b'\x80\x03c__main__\nFoo\nq\x00)\x81q\x01.'
>>> dill.dumps(f, byref=False)
b'\x80\x03cdill.dill\n_create_type\nq\x00(cdill.dill\n_load_type\nq\x01X\x04\x00\x00\x00typeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05h\x01X\x06\x00\x00\x00objectq\x06\x85q\x07Rq\x08\x85q\t}q\n(X\x03\x00\x00\x00barq\x0bcdill.dill\n_create_function\nq\x0c(cdill.dill\n_unmarshal\nq\rC]\xe3\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x01\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x03Foo\xda\x01y)\x02\xda\x04self\xda\x01x\xa9\x00r\x05\x00\x00\x00\xfa\x07<stdin>\xda\x03bar\x03\x00\x00\x00s\x02\x00\x00\x00\x00\x01q\x0e\x85q\x0fRq\x10c__builtin__\n__main__\nh\x0bNN}q\x11tq\x12Rq\x13X\x07\x00\x00\x00__doc__q\x14NX\r\x00\x00\x00__slotnames__q\x15]q\x16X\n\x00\x00\x00__module__q\x17X\x08\x00\x00\x00__main__q\x18X\x01\x00\x00\x00yq\x19K\x01X\x03\x00\x00\x00zapq\x1ah\x0c(h\rC`\xe3\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00C\x00\x00\x00s\x0b\x00\x00\x00|\x01\x00t\x00\x00j\x01\x00\x17S)\x01N)\x02\xda\x03Foo\xda\x01y)\x02\xda\x04self\xda\x01x\xa9\x00r\x05\x00\x00\x00\xfa\x07<stdin>\xda\x08<lambda>\x01\x00\x00\x00s\x00\x00\x00\x00q\x1b\x85q\x1cRq\x1dc__builtin__\n__main__\nX\x08\x00\x00\x00<lambda>q\x1eNN}q\x1ftq Rq!utq"Rq#)\x81q$.'

Ok, that's fine, but what about the Foo in our external module?

>>> ff = foo.Foo()
>>> 
>>> foo.Foo.zap = lambda self,x: x + foo.Foo.y
>>> dill.dumps(ff, byref=False)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
>>> dill.dumps(ff, byref=True)
b'\x80\x03cfoo\nFoo\nq\x00)\x81q\x01.'
>>> 

Hmmm… not good. The above is a pretty compelling use case for changing how dill handles classes defined in modules, or at least for adding a setting that enables the better behavior.
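The lost patch can be reproduced with plain pickle too. This sketch again assumes a hypothetical in-memory foo module and simulates a fresh process by rebuilding that module from its original source; it shows that monkey-patching foo.Foo leaves the bytes unchanged and that the receiving side never sees zap:

```python
import pickle
import sys
import types

FOO_SOURCE = (
    "class Foo:\n"
    "    y = 1\n"
    "    def bar(self, x):\n"
    "        return x + Foo.y\n"
)


def load_foo():
    """(Re)build the in-memory foo module from its original source."""
    module = types.ModuleType("foo")
    exec(FOO_SOURCE, module.__dict__)
    sys.modules["foo"] = module
    return module


foo = load_foo()
ff = foo.Foo()
before = pickle.dumps(ff, protocol=3)

# Patch the class itself, as in the session above.
foo.Foo.zap = lambda self, x: x + foo.Foo.y
after = pickle.dumps(ff, protocol=3)
print(before == after)  # True: the patch never reaches the payload

# A fresh process only has the original foo.py, so rebuild the module.
load_foo()
restored = pickle.loads(after)
print(hasattr(restored, "zap"))  # False: zap was lost
```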

@mmckerns
Member Author

mmckerns commented Sep 2, 2015

Related to #123, somewhat, I think.

Also note that using recurse=True gives a RecursionError… which should probably be its own separate issue.

@fake-name

Has anything happened with this?

I'm in a context where I need to serialize classes to pass them over an RPC layer, and the fact that classes seem to be serialized by reference is a problem.

@matsjoyce
Contributor

Ooo, looks like I need to resurrect #47. @mmckerns Do we want to do this before or after making diff official?

@mmckerns
Member Author

@matsjoyce: I went back and looked at the timings, and diff just kills speed... however, it's really useful and central to a few issues. I think I was leaning toward making diff a dill.settings option, much like byref. That would let the user choose either performance or capability, at least in the near term, until there's a faster implementation of what's in diff.

I'd be happy to see #47 (and related) resurrected, and have them go in at least in a dill.settings type of capacity. Would you be interested in that approach?

@kjanko

kjanko commented Nov 20, 2018

Has there been any progress on this?

@matsjoyce
Contributor

No. The way dill currently handles objects is a mix of by-reference and complete serialization (modifiable to some extent with flags). I think fully covering cases like this requires a bit of a redesign of dill, because if we keep adding flags we're going to end up with a mess. Ideally we'd write several serialization functions for each type (for complete and by-reference serialization; also see all the different ways we serialize files) and have some general way to switch between modes. However, doing this in dill is difficult due to all the compatibility code for various old Python versions and the fact that people don't always like breaking API changes. I was considering writing a new serialization module that did this, but since I don't need it at the moment, not much has been done.

Anyway, in the short term, if you need to adjust the behaviour of dill you can override the pickler for a particular type quite easily using dill._dill.register.
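dill._dill.register is dill's own hook. As a rough standard-library analogue (an assumption for illustration, not how dill implements it), Python 3.8+ lets a pickle.Pickler subclass define reducer_override, which is consulted for classes too. The sketch below uses it to ship classes from chosen modules by value: plain attributes as-is, methods as marshalled code objects, loosely mirroring the _create_function/_unmarshal calls visible in dill's output above. ByValuePickler, _rebuild_class and dumps_by_value are hypothetical names:

```python
import builtins
import io
import marshal
import pickle
import sys
import types


def _rebuild_class(name, bases, attrs, method_code):
    """Hypothetical helper that recreates a class shipped by value.

    The receiving side must be able to import it by name, much as dill's
    output refers to its own _create_function helper.
    """
    globs = {"__builtins__": builtins}
    namespace = dict(attrs)
    for meth_name, code_bytes in method_code.items():
        namespace[meth_name] = types.FunctionType(marshal.loads(code_bytes), globs, meth_name)
    cls = type(name, bases, namespace)
    # Methods resolve globals in `globs`; registering the class there lets
    # self-references such as Foo.y work without recursing while pickling.
    globs[name] = cls
    return cls


class ByValuePickler(pickle.Pickler):
    """Pickle classes from selected modules by value instead of by reference.

    Sketch only: closures (including super()), staticmethod/property and
    other descriptors, default arguments and other module globals are not handled.
    """

    def __init__(self, file, by_value_modules, **kwargs):
        super().__init__(file, **kwargs)
        self.by_value_modules = set(by_value_modules)

    def reducer_override(self, obj):
        if isinstance(obj, type) and obj.__module__ in self.by_value_modules:
            attrs, method_code = {}, {}
            for key, value in vars(obj).items():
                if key in ("__dict__", "__weakref__"):
                    continue  # type() recreates these descriptors itself
                if isinstance(value, types.FunctionType):
                    method_code[key] = marshal.dumps(value.__code__)
                else:
                    attrs[key] = value
            return _rebuild_class, (obj.__name__, obj.__bases__, attrs, method_code)
        return NotImplemented  # everything else: normal pickle behaviour


def dumps_by_value(obj, modules):
    buffer = io.BytesIO()
    ByValuePickler(buffer, modules).dump(obj)
    return buffer.getvalue()


# Demo with an in-memory stand-in for foo.py, patched at runtime.
foo = types.ModuleType("foo")
exec(
    "class Foo:\n"
    "    y = 1\n"
    "    def bar(self, x):\n"
    "        return x + Foo.y\n",
    foo.__dict__,
)
sys.modules["foo"] = foo
foo.Foo.zap = lambda self, x: x * 10  # a patch that by-reference pickling would lose

payload = dumps_by_value(foo.Foo(), {"foo"})
del sys.modules["foo"]  # the receiving side has no foo module at all
restored = pickle.loads(payload)
print(restored.bar(2), restored.zap(2))  # 3 20
```

Because a class's bases also pass through reducer_override, listing a parent class's module in by_value_modules would ship the parent as well. Marshalled code is tied to the Python version, so both ends need the same interpreter.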

@make-ing

I don't want to be pushy, but has there been any progress since then? Or is there a way to make it work with flags, so that I can send, via dill, an object whose class I imported from a module?

@mmckerns
Member Author

mmckerns commented Dec 19, 2018

@make-ing: It's the same answer as less than a month ago. If you'd like the flags to work for this particular case, you either need to make some small hacks to turn on dill.diff (which is currently disabled), or you can use dill._dill.register to support the specific class you are interested in. If it's one of your own classes, you can also add a __reduce__ method. And of course, you can always fork the repo and contribute a patch...

@mmckerns
Member Author

@matsjoyce: We don't store classes defined in __main__ by reference when byref=False. To do the same for classes defined in other modules, we'd have to store either the accompanying module with the class, or the class plus any global references (the latter, I think, was your approach)... or there's the hack of temporarily setting the class's __module__ attribute to __main__ and storing the original module name somewhere else. The first approach is not very efficient, but it seems like it would actually be fairly simple. Not sure it makes sense to do so, however.
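The first option (shipping the accompanying module) can be sketched in a few lines of standard-library code. dumps_with_module and loads_with_module are hypothetical helpers, plain pickle stands in for dill, and the module source is passed in explicitly; for a file-backed module, inspect.getsource(module) could supply it:

```python
import pickle
import sys
import types


def dumps_with_module(obj, module_name, source):
    """Bundle a module's source text with a by-reference pickle of obj."""
    return pickle.dumps((module_name, source, pickle.dumps(obj)))


def loads_with_module(blob):
    """Recreate the bundled module if it is missing, then unpickle obj."""
    module_name, source, payload = pickle.loads(blob)
    if module_name not in sys.modules:
        module = types.ModuleType(module_name)
        exec(source, module.__dict__)  # runs the shipped code: trusted input only
        sys.modules[module_name] = module
    return pickle.loads(payload)


FOO_SOURCE = (
    "class Foo:\n"
    "    y = 1\n"
    "    def bar(self, x):\n"
    "        return x + Foo.y\n"
)

# Sender: foo exists (built in memory here instead of importing foo.py).
foo = types.ModuleType("foo")
exec(FOO_SOURCE, foo.__dict__)
sys.modules["foo"] = foo
blob = dumps_with_module(foo.Foo(), "foo", FOO_SOURCE)

# Receiver: no foo module yet, so it is rebuilt from the bundled source.
del sys.modules["foo"]
restored = loads_with_module(blob)
print(restored.bar(2))  # 3
```

Since the receiver re-executes the original source, runtime patches such as foo.Foo.zap are still lost, and the whole module travels with every object, which is where the inefficiency comes from.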

@zabodek

zabodek commented Apr 22, 2020

Any update on this feature? My use case is that I have a parent class and a child class. When I dill the child class and then undill it in a location where the parent doesn't exist, I get an error when calling methods of the parent class, because only references to those methods were dilled, not the methods themselves.

@mr-easy

mr-easy commented Aug 31, 2023

Is there any other way to save the class definition along with the pickled object?
