Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dump of decorated function in module fails #27

Closed
matsjoyce opened this issue Mar 31, 2014 · 14 comments
Closed

Dump of decorated function in module fails #27

matsjoyce opened this issue Mar 31, 2014 · 14 comments
Labels

Comments

@matsjoyce
Copy link
Contributor

To reproduce:
Place in this code in a file:

def f(func):
    def w(*args):
        return func(*args)
    return w

@f
def f2(): pass

import dill

print(dill.dumps(f2))

In a interactive session or another file:
import (your file name)
I have a crash in both versions of python with the lastest source which looks like:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./debug.py", line 11, in <module>
    print(dill.dumps(f2))
  File "./dill/dill.py", line 130, in dumps
    dump(obj, file, protocol, byref)
  File "./dill/dill.py", line 123, in dump
    pik.dump(obj)
  File "/usr/lib/python3.3/pickle.py", line 235, in dump
    self.save(obj)
  File "/usr/lib/python3.3/pickle.py", line 297, in save
    f(self, obj) # Call unbound method with explicit self
  File "./dill/dill.py", line 438, in save_function
    obj.__closure__), obj=obj)
  File "/usr/lib/python3.3/pickle.py", line 416, in save_reduce
    self.memoize(obj)
  File "/usr/lib/python3.3/pickle.py", line 255, in memoize
    assert id(obj) not in self.memo
AssertionError

It looks like something to do with pickling the __code__ attribute, then repickling it when doing the __globals__ attribute, as if you comment out the __globals__ attribute in dill.py, there is no crash (just an incomplete pickle, presumably).

@matsjoyce
Copy link
Contributor Author

Replacing the last line of the file with dill.dumps([f2.__globals__]) does not result in a crash, but

from io import BytesIO
file = BytesIO()
pik = dill.Pickler(file)
pik.save_reduce(dill.dill.FunctionType, (f2.__globals__,), obj=f2)

does and it does crash for all functions.

@mmckerns
Copy link
Member

@matsjoyce: thanks for the report and the test code. I think this issue is exactly the same as the one identified in issue #18. I think the solution involves passing in a "globals dict" where anything that the function doesn't require has been removed.

Interestingly, if you just run your code with python (your file name).py, then it pickles.

@matsjoyce
Copy link
Contributor Author

Looking at the dill logging, it runs as __main__ due to:

#from dill.py
def save_module_dict(pickler, obj):
    if is_dill(pickler) and obj is pickler._main_module.__dict__:

When run as __main__,__globals__ is stored here

        log.info("D1: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        if PYTHON3:
            pickler.write(bytes('c__builtin__\n__main__\n', 'UTF-8'))
        else:
            pickler.write('c__builtin__\n__main__\n')
    elif not is_dill(pickler) and obj is _main_module.__dict__:
        log.info("D3: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        if PYTHON3:
            pickler.write(bytes('c__main__\n__dict__\n', 'UTF-8'))
        else:
            pickler.write('c__main__\n__dict__\n')   #XXX: works in general?
    else: 

When run as module, __globals__ is stored here

        log.info("D2: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        StockPickler.save_dict(pickler, obj)
    return

Which means that when the file is run as __main__, the __globals__ attribute is not actually stored, so there is no crash.

f2.__globals__ is debug.__dict__
>>> True

Maybe inject 'c%s\n__dict__\n' % obj.__module__ in somehow?

@mmckerns
Copy link
Member

Absolutely.

I think the fix, however, is to pass in a copy of __globals__, where everything that's not required to build the function has been popped out.

I think I have bits of the code needed to do that in dill.pointers and dill.detect.

@matsjoyce
Copy link
Contributor Author

I've got some code that seems to work:

@register(dict)
def save_module_dict(pickler, obj):
    if is_dill(pickler) and obj is pickler._main_module.__dict__:
        log.info("D1: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        if PYTHON3:
            pickler.write(bytes('c__builtin__\n__main__\n', 'UTF-8'))
        else:
            pickler.write('c__builtin__\n__main__\n')
    elif not is_dill(pickler) and obj is _main_module.__dict__:
        log.info("D3: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        if PYTHON3:
            pickler.write(bytes('c__main__\n__dict__\n', 'UTF-8'))
        else:
            pickler.write('c__main__\n__dict__\n')   #XXX: works in general?
    elif "__name__" in obj:
        try:
            module = _import_module(obj["__name__"])
        except:
            pass
        if module.__dict__ is obj:
            if PYTHON3:
                pickler.write(bytes('c%s\n__dict__\n' % obj['__name__'], 'UTF-8'))
            else:
                pickler.write('c%s\n__dict__\n' % obj['__name__'])
    else:
        log.info("D2: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        StockPickler.save_dict(pickler, obj)
    return

The new bit (in the middle) tries to find a module that has a __dict__ that is the obj, and if that can be found, deal with it the same way as for __main__. Haven't fully tested it yet though.

@matsjoyce
Copy link
Contributor Author

That fix works for individual objects, but load_session fails with:

Traceback (most recent call last):
  File "debug2.py", line 1, in <module>
    from debug import *
  File "/home/matthew/Programming/C++/Python/eggs/debug.py", line 17, in <module>
    b=dill.load_session()
  File "/home/matthew/Programming/C++/Python/eggs/dill/dill.py", line 183, in load_session
    module = unpickler.load()
  File "/home/matthew/Programming/C++/Python/eggs/pickle.py", line 911, in load
    dispatch[key[0]](self)
  File "/home/matthew/Programming/C++/Python/eggs/pickle.py", line 1314, in load_build
    inst = stack[-1]
IndexError: list index out of range

Might just be a silly mistake on my side, though.

@mmckerns
Copy link
Member

Ah… that's a good idea.

@matsjoyce
Copy link
Contributor Author

The above exception can be fixed by checking if it is the main module.

@register(dict)
def save_module_dict(pickler, obj):
    if is_dill(pickler) and obj is pickler._main_module.__dict__:
        log.info("D1: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        if PYTHON3:
            pickler.write(bytes('c__builtin__\n__main__\n', 'UTF-8'))
        else:
            pickler.write('c__builtin__\n__main__\n')
    elif not is_dill(pickler) and obj is _main_module.__dict__:
        log.info("D3: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        if PYTHON3:
            pickler.write(bytes('c__main__\n__dict__\n', 'UTF-8'))
        else:
            pickler.write('c__main__\n__dict__\n')   #XXX: works in general?
    elif '__name__' in obj and obj != _main_module.__dict__:
        try:
            module = _import_module(obj['__name__'])
        except:
            pass
        if module.__dict__ is obj:
            if PYTHON3:
                pickler.write(bytes('c%s\n__dict__\n' % obj['__name__'], 'UTF-8'))
            else:
                pickler.write('c%s\n__dict__\n' % obj['__name__'])
    else:
        log.info("D2: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        StockPickler.save_dict(pickler, obj)
    return

Shall I make a PR?

@mmckerns
Copy link
Member

mmckerns commented Apr 1, 2014

There are a few small things above that would need to be addressed first:

  1. if your try throws an exception, you pass, then module is undefined.
  2. if module.__dict__ is not object, then nothing gets serialized

Both (1) and (2) above should be rolled up into the elif, so where those conditions aren't met, something else happens (such as 'D2': save_dict)

@mmckerns
Copy link
Member

mmckerns commented Apr 1, 2014

Just a suggestion:

    elif '__name__' in obj and obj is not _main_module.__dict__ \
        and obj is getattr(_import_module(obj['__name__']),'__dict__',None):
        log.info("D4: <dict%s" % str(obj.__repr__).split('dict')[-1]) # obj
        if PYTHON3:
            pickler.write(bytes('c%s\n__dict__\n' % obj['__name__'], 'UTF-8'))
        else:
            pickler.write('c%s\n__dict__\n' % obj['__name__'])

Then, yes, do a pull request.

@matsjoyce
Copy link
Contributor Author

The problem with that is if _import_module throws. I'll cook something up.

@mmckerns
Copy link
Member

mmckerns commented Apr 1, 2014

Yep. So possible fix would be to catch errors in _import_module and return a None in that case. Could add a safe=True, flag to _import_module if didn't want to deal with changes elsewhere, but I think that the other places it's used it'd be consistent. (safe=True would catch errors and return None, safe=False would throw errors)

@mmckerns mmckerns added the bug label Apr 1, 2014
@mmckerns
Copy link
Member

mmckerns commented Apr 1, 2014

Something like this is python2/3 agnostic:

def _import_module(import_name, safe=False):
    try:
        if '.' in import_name:
            items = import_name.split('.')
            module = '.'.join(items[:-1])
            obj = items[-1]
        else:
            return __import__(import_name)
        return getattr(__import__(module, None, None, [obj]), obj)
    except:
        if safe: return None
        raise sys.exc_info()[1]

But handle it as you like.

@matsjoyce matsjoyce mentioned this issue Apr 1, 2014
@mmckerns
Copy link
Member

mmckerns commented Apr 1, 2014

closed by #28

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants