Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialize only the global variables used by function #86

Closed
wants to merge 1 commit into from

Conversation

davies
Copy link

@davies davies commented Mar 14, 2015

This PR try to fix the issue in #50

def capture_output(f):
stdout = sys.stdout
sys.stdout = StringIO()
f()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ugly, should we parse the byte code by ourselves?

@davies
Copy link
Author

davies commented Mar 14, 2015

cc @JoshRosen

@mmckerns
Copy link
Member

@davies: Ok, interesting… If I get the time to play with it today, I will try to get back to you this evening. From what I see, I expect there will be a few small cleaning/changes before it goes in.

@davies
Copy link
Author

davies commented Mar 16, 2015

@mmckerns Thanks, any comments are welcome. detect.py depends on dill.py, so I'm not detect.py is the perfect place to put the change in.

@matsjoyce
Copy link
Contributor

In python 3 you could use dis.get_instructions, but, for compatability with 2, you could use (modified from dis.py):

import dis

def to_int(x):
    if not isinstance(x, int):
        return ord(x)
    return x

def find_globals(co):
    globs = set()
    code = co.co_code
    n = len(code)
    i = 0
    extended_arg = 0
    while i < n:
        op = to_int(code[i])
        i = i+1
        if op >= dis.HAVE_ARGUMENT:
            oparg = to_int(code[i]) + to_int(code[i+1]) * 256 + extended_arg
            extended_arg = 0
            i = i+2
            if op == dis.EXTENDED_ARG:
                extended_arg = oparg * 65536
            if op in dis.hasname and dis.opname[op].endswith("_GLOBAL"):
                globs.add(co.co_names[oparg])
    return globs

@mmckerns
Copy link
Member

@davies: yeah, that's part of what should be resolved, along with some naming, and as @matsjoyce points out, some comparability issues.

@davies
Copy link
Author

davies commented Mar 18, 2015

@matsjoyce @mmckerns Could you take over this patch? I think that will be easier to merge.

@mmckerns
Copy link
Member

@davies: Ok, sure, I/we will do that. I'm a bit backed up at the moment, but will get to it hopefully soon. One thing you can do in the meantime is to make sure your patch works for test cases in #50, provide any further cases it does/doesn't work, and any further feedback on the proposed changes/requirements from #50. Thanks for the initial submission. I'll merge it when changes/updates ready are to go.

@mmckerns
Copy link
Member

@matsjoyce: I don't get the same results from your code as I do from @davies. Just FYI.

@davies: I have a question for you… what would you expect from something like this:

a = 1
def squared(x):
  return a+x**2

def foo(x):
  def bar(y):
    return squared(x)+y
  return bar

res = globalvars(foo, shallow=False)

Would you expect: assert set(res) == set(['squared', 'a']) or assert set(res) == set(['squared'])? Your code finds the latter, but it's easy to edit the code (see below) to find the former. Do you want the former or the latter?

            func = find_globals(getattr(func, func_code)) #, depth=None)
            # find globals for all entries of func
            for key in func.copy():
                func.update(globalvars(globs.get(key), shallow=False).keys())

I definitely think what is currently in the dill trunk (i.e. find all global variables that are referenced in the local scope only) is a desired and consistent use case. I think the use case of finding all globals that are required to serialize a function is another -- and is what you were shooting at. They should probably be separate dill functions, and not one replacing the other.

@mmckerns
Copy link
Member

Ok, obviously this is a high-priority to wrap-up. I'll finish this off first opportunity.

@mmckerns
Copy link
Member

mmckerns commented Jun 3, 2015

edited this PR heavily, then added the 'essence' of it in: fadbffc

Issues arose that removed functionality from dill in certain corner cases (e.g. test_classdef among other tests failed), so I opted for a serialization flag on the Pickler. You get the behavior you desired with recurse=True and it returns to the default with recurse=False.

If I can remove the conflict, I can remove the flag (or make the default recurse=True).
Closing this PR, and will open an issue for failures with the recurse flag.

@mmckerns mmckerns closed this Jun 3, 2015
@mmckerns
Copy link
Member

mmckerns commented Jun 3, 2015

@mmckerns
Copy link
Member

mmckerns commented Jun 3, 2015

added global setting dill.settings['recurse']

@matsjoyce
Copy link
Contributor

Looks good. Just one question: why is it called recurse for dump?

@mmckerns
Copy link
Member

mmckerns commented Jun 3, 2015

The major difference is whether or not dump does a "shallow" pickling of globals or a "deep" pickling of globals. This flag is also relevant for dill.source.globalvars and other items in dill.source, where the choice comes down to whether one wants to do a "shallow" pass at getting the globals or a "deep" one (using recursion). I started with shallow=False to have the code recursively find the referred-to globals, then didn't like the name, so opted for recurse=True.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants