dill can't pickle functions wrapped with IPython's @require #9

Closed
roryk opened this issue Oct 23, 2013 · 12 comments

@roryk
Contributor

roryk commented Oct 23, 2013

If you have a function like this:

@require("module_name")
def require_test(x):
return True

And you try to use IPython parallel's parallel map, you get this error:

  File "/n/home05/kirchner/anaconda/envs/gemini/lib/python2.7/site-packages/IPython/kernel/zmq/serialize.py", line 102, in serialize_object
    buffers.insert(0, pickle.dumps(cobj,-1))
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 419, in save_reduce
    save(state)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/site-packages/dill-0.2a2.dev-py2.7.egg/dill/dill.py", line 443, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 419, in save_reduce
    save(state)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/site-packages/dill-0.2a2.dev-py2.7.egg/dill/dill.py", line 443, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/site-packages/dill-0.2a2.dev-py2.7.egg/dill/dill.py", line 421, in save_function
    obj.func_closure), obj=obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 405, in save_reduce
    self.memoize(obj)
  File "/n/home05/kirchner/anaconda/envs/ipc/lib/python2.7/pickle.py", line 244, in memoize
    assert id(obj) not in self.memo

But if you just use the regular pickle, it works fine. I have a minimal example here:

https://github.com/roryk/ipython-cluster-helper/blob/master/example/example.py
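
In outline, the failing pattern is something like the sketch below (the cluster setup and view calls here are just illustrative; the linked file is the authoritative version):

# assumes an IPython 1.x cluster is already running (e.g. via ipcluster start)
from IPython.parallel import Client, require

@require("module_name")
def require_test(x):
    return True

rc = Client()                      # connect to the running cluster
view = rc.load_balanced_view()
view.map_sync(require_test, [1])   # serializing require_test fails here when dill is in play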

Do you have any thoughts about why? I dug around a bunch but quickly got out of my depth. :)

@mmckerns
Member

IPython has some hacks on pickle, and I'd guess that require is leveraging some of them... so, most likely, it's conflicting with my hacks. IPython parallel works similarly to my own pathos framework, so I can probably digest the code and find out. I'll take a peek into why... but it might take a bit for me to get to it. This might also be a good thing to ping the IPython devs about, especially MinRK, who wrote most of the IPython parallel code... they are aware of dill, and have used it. When I follow up on this ticket, I'll probably do that.

@roryk
Contributor Author

roryk commented Oct 24, 2013

Pinging @minrk. Hey Ben, do you have any idea what could be the issue here?

@mmckerns
Member

@roryk: It looks from the traceback like we are hitting something that didn't get stuffed into pickle's registry... so that makes me think that either (1) I've missed adding some of my recent additions to the pickle registry, or (2) @require is maybe extending a pickler, but not the registry table... so, like I said, a mismatch between a pickle extension done in IPython and one in dill. If you look at the first line in the traceback, you see IPython/kernel/zmq/serialize... and then a call to Pickler... so I'm guessing it's (2). I'll have a quick look after I get all the IPython stuff installed again. I'd like to make sure that ipython-cluster-helper gets what it needs. I'm booked pretty hard until SC13... but I'll try to find some time for it. Maybe you can pass me the versions of whatever relevant packages you are using...? If you want to clang around with it in the meantime, try the stuff in dill.detect, like dill.detect.errors...
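
For instance, a minimal sketch of that kind of poking (the depth argument and the target object are just illustrative):

import dill.detect

# list any sub-objects of require_test that fail to pickle, one level deep
print(dill.detect.badobjects(require_test, depth=1))

# show the error raised when pickling require_test, if any
print(dill.detect.errors(require_test))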

BTW, I never really looked at ipython-cluster-helper, but it looks like there's some similar stuff in my pyina package. Mine is a couple of years old, and in need of some serious work... or maybe scrapping altogether, I don't know. I'd love to pull development toward each other at least... pathos/pyina is planning to do so with the gc3pie guys, if possible.

@minrk

minrk commented Oct 25, 2013

@require is actually very simple. The one thing that it does do is set __module__ = '__main__' on one or two things, which may be what is confusing dill.
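
For illustration, here's a standalone sketch of why a misleading __module__ can trip up a pickler (the module name below is deliberately bogus; @require uses '__main__', which can mislead in the same way when the function isn't actually defined there):

import pickle

def wrapped(x):
    return True

# roughly the kind of rewrite described above: the function now claims
# to live somewhere it doesn't
wrapped.__module__ = 'not_a_real_module'

try:
    pickle.dumps(wrapped)
except pickle.PicklingError as e:
    # stock pickle serializes functions by reference (module + name),
    # so the lookup fails
    print(e)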

@mmckerns
Member

Thanks for the link. Dude... that's not really very simple. Setting __module__ or __name__ is definitely a potential issue. I'll have a crack at it soon and get back to you.

@roryk
Contributor Author

roryk commented Oct 25, 2013

@mmckerns Thanks for the follow-up, Mike; this is from IPython 1.1.0. I poked around a little bit more but haven't made any progress. I found another edge case where dill is not happy even without using @require, but I haven't gotten it down to a simple test case yet.

I agree pyina looks similar to what we are doing with ipython-cluster-helper. There is another drmaa package that does something similar as well. gc3pie looks pretty sweet, but, dang, that is a lot of configuration setup. We hacked together ipython-cluster-helper to make it as easy as possible for end users to talk to the scheduler: just say which scheduler you have, which queue to use, and the maximum number of jobs you want, and go from there.
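
i.e., the usage is basically this (scheduler and queue names are site-specific placeholders):

from cluster_helper.cluster import cluster_view

# say which scheduler, which queue, and the max number of jobs, then go
with cluster_view(scheduler="lsf", queue="general", num_jobs=5) as view:
    results = view.map(some_function, [1, 2, 3])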

Thanks for jumping in @minrk, you are like Beetlejuice, say your name a couple of times and poof. I'd hate to see your inbox.

@mmckerns
Member

@roryk: Please do send me any failure cases for dill that you find. I appreciate the feedback, bug reports, etc.

I agree about gc3pie... I'm hoping I can reduce that for them a bit with my stuff. As for ipython-cluster-helper, mine is pretty minimal too. I do have Python-C bindings for torque and pbs in one of my svn branches of pyina, but I haven't merged them into the trunk (I want to keep the build overhead low). We should at least steal each other's cases that we haven't covered yet. https://github.com/uqfoundation/pyina/blob/master/pyina/schedulers.py

@roryk
Contributor Author

roryk commented Dec 1, 2013

Hi Mike,

Sweet, it is awesome that you are banging on this. Just wanted to a) give you a heads-up that the same IPython issue exists with the more robust pickling, since it was tagged as maybe related, and b) ping here so you know I'm still interested. :)

@mmckerns
Member

mmckerns commented Dec 1, 2013

@roryk: I haven't forgotten about this either. After a full month of travel, workshops, etc., I'm back into it. The goal is to keep converting packages to Python 3.x and to keep rolling out the releases. I'm pretty near final on dill, and just about to pull the trigger on klepto. pathos should be next, then pyina. That's when I should start a serious press on seeing what crossover we have -- probably later this month.

As far as this particular issue is concerned (with @require), there's a general failure mode that I see in dill... it's when the name or module of an object does not correspond to where the code actually lives. Python does this a good bit, and several common Python constructs produce it. For example, dill can pickle this:

>>> X = namedtuple("Y", ['a','b'])
>>> X.__name__ = 'X'

However, without the last line, it can't -- unless you do this:

>>> X = namedtuple("X", ['a','b'])

This (figuring out the correct name for deceptively named objects) and making iterators picklable are my two next big targets for dill. If the @require stuff ends up being a consequence of the above, then it'll take some time. If not, maybe it'll be easy. I'll find out this month.
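
As a quick check of the above (dill.pickles just reports whether an object survives a round-trip):

>>> import dill
>>> from collections import namedtuple
>>> X = namedtuple("Y", ['a','b'])   # class named "Y", bound to X
>>> dill.pickles(X)
False
>>> X.__name__ = 'X'                 # make the name match the binding
>>> dill.pickles(X)
True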

@mmckerns
Member

mmckerns commented Apr 1, 2014

@roryk: I've added a patch for some similar in-decorator stuff. It might apply… I can serialize require and require_test. Not sure if it works in IPython.parallel, and with your cluster_helper stuff, however.

If you have a chance, can you spin up your test again, and let me know if it passes or fails?

Otherwise, I'll get to it as I start getting into some of IPython.parallel and the cluster_helper stuff for overlap purposes.

@mmckerns
Member

mmckerns commented Apr 1, 2014

I just tested it with pathos.multiprocessing and it works. Can you do the same for your cluster_helper or IPython.parallel?

In [1]: import dill

In [2]: from IPython.parallel import require

In [3]: @require('time')
   ...: def require_test(x):
   ...:     return True
   ...: 

In [4]: dill.loads(dill.dumps(require_test))
Out[4]: <IPython.parallel.controller.dependency.dependent at 0x106112d50>

In [5]: from pathos.multiprocessing import ProcessingPool as Pool

In [6]: p = Pool() 

In [7]: def squared(x):
   ....:     return x**2
   ....: 

In [8]: p.map(squared, [1,2,3,4,5])
Out[8]: [1, 4, 9, 16, 25]

In [9]: p.map(require_test, [3])
Out[9]: [True]

@mmckerns
Member

Works for IPython "using" dill.
