should dump_session accept a list/dict of objects to ignore? #66

Open
mmckerns opened this issue Oct 5, 2014 · 9 comments · May be fixed by #475

Comments

@mmckerns
Member

mmckerns commented Oct 5, 2014

I'm not sure what kind of impact ignoring an object might have if one then expects to start up the session again and have everything work. Maybe it's not up to dill to care… and it's the user's problem if things blow up in the dump/load of the session.

@matsjoyce
Contributor

We could replace the objects with a dill.Ignored singleton. It'd be a relatively simple change, but I'm having difficulty visualising a circumstance where it's OK to have a load_session where half the things don't work... But, as you say, it's the user's problem. It would just have to come with a big docstring saying "Use as last resort! The object will NOT work on the other end!"

@mmckerns
Member Author

mmckerns commented Oct 8, 2014

It might make sense when there's an isolated object, such as a generator that was created but not used… but when it's a matplotlib plot in an IPython session that doesn't serialize, and capturing it is the user's main goal… I think it's not so good.

Maybe a better alternative would be dump_session(ignore=True), where dill just "skips" anything that it can't serialize… (i.e. catch all serialization errors and move on). Then only some of the corner cases would blow up on load… and the same could be done there. Is a session with missing bits then worthwhile for the user? That's for the user to decide.
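The "skip on dump, and the same on load" idea can be sketched by pickling each variable into its own blob, so that one bad entry can be skipped on either side without spoiling the rest. This is an illustrative sketch with stdlib `pickle` standing in for dill; `dump_namespace`, `load_namespace` and their `ignore` flag are hypothetical names, not dill's API.

```python
import pickle


def dump_namespace(namespace, ignore=True):
    """Pickle each variable separately; with ignore=True, skip the ones that fail."""
    blobs, skipped = {}, []
    for name, value in namespace.items():
        try:
            blobs[name] = pickle.dumps(value)
        except Exception:
            if not ignore:
                raise
            skipped.append(name)
    # The outer pickle only contains str -> bytes, so it cannot fail.
    return pickle.dumps(blobs), skipped


def load_namespace(data, ignore=True):
    """Unpickle each blob separately; with ignore=True, skip the ones that fail."""
    blobs = pickle.loads(data)
    namespace, failed = {}, []
    for name, blob in blobs.items():
        try:
            namespace[name] = pickle.loads(blob)
        except Exception:
            if not ignore:
                raise
            failed.append(name)
    return namespace, failed
```

One trade-off of per-variable blobs: references shared between variables are not preserved, so two names bound to the same list come back as two separate copies.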

@matsjoyce
Contributor

Yup, I suppose so. What would be the best way to implement that? Overload Pickler.save and Unpickler.load?

@mmckerns
Member Author

mmckerns commented Oct 8, 2014

I think either overload the dump method of the dill.Pickler, or wrap the behavior into the dump_session function call -- probably the former. I could see a pure try-except approach, or an approach driven by the methods in dill.detect. Similarly for load.
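Following the suggestion to overload the pickler's dump method, here is a sketch of the pure try-except approach on a subclass of the standard-library `pickle.Pickler` (dill.Pickler would be the real target). `SkippingPickler` and its `skipped` attribute are hypothetical names. Each top-level value is trial-pickled before the real dump, because catching errors deeper inside `save` can leave a partially written object in the output stream.

```python
import io
import pickle


class SkippingPickler(pickle.Pickler):
    """Pickler whose dump() drops top-level dict entries that fail to pickle."""

    def __init__(self, file, *args, **kwargs):
        super().__init__(file, *args, **kwargs)
        self.skipped = []  # names dropped by the most recent dump()

    def dump(self, obj):
        if isinstance(obj, dict):
            kept = {}
            for name, value in obj.items():
                try:
                    # Trial run into a throwaway buffer, so a failure
                    # never touches the real output file.
                    pickle.dumps(value)
                except Exception:
                    self.skipped.append(name)
                else:
                    kept[name] = value
            obj = kept
        super().dump(obj)
```

The cost of this design is that every kept value is pickled twice; an approach driven by dill.detect would replace the trial run with a check.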

@mittenchops

I just asked a StackOverflow question that might be a use case for this:
http://stackoverflow.com/questions/27351980/how-to-add-a-custom-type-to-dills-pickleable-types

@matsjoyce
Contributor

OK, @mittenchops, question answered. It could be a use case, but remember that if anything on the other end needs your collection, it wouldn't be there.

@mmckerns
Member Author

mmckerns commented Mar 9, 2016

Currently when I do a dump_session, I do something like this:

map(globals().pop, tuple(i for (i,j) in globals().iteritems() if not dill.pickles(j) and i not in ('__builtins__',)))

to remove all objects that will not pickle. There's probably a better way to do it, but I tend to do at least some variant of the above on the fly. It's not the most efficient approach, but it works as long as dill.pickles is correct (which it overwhelmingly is).

@RuneScape314159265

RuneScape314159265 commented Jun 1, 2021

Firstly, what an epic tool! It's super useful when working with Jupyter notebooks that take a long time to run, where recomputing everything is either a) impossible or b) merely a massive pain. Thank you for making it!

I think it would be great if the hack above were incorporated into the package itself, i.e.

dill.dump_session(fileName, ignore=True)

which would print:

Variables:
- var1
- var3
- var5 
could not be pickled and will not be restored. Do you wish to continue (everything else that can be will be stored)? [y/n]: 

The user answers yes to continue, and dill saves everything in the current environment that it can, skipping the listed variables.

In a large file with a lot of globals, I imagine that even running the check might take a while (it does in some of my files). So there should probably be an additional flag letting the user choose whether dill automatically pickles whatever it can without asking, defaulting to yes. If the user sets it to no and dill can't pickle everything, they are shown the prompt above and asked whether to continue.
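The proposed prompt plus an auto-continue flag might look roughly like the sketch below. `select_for_dump`, `auto_continue` and the injectable `ask` callable are hypothetical names chosen for illustration (passing `ask` just makes the prompt testable), and stdlib `pickle` stands in for `dill.pickles` as the picklability check.

```python
import pickle


def _picklable(obj):
    """Stand-in for dill.pickles: True if obj survives pickle.dumps."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


def select_for_dump(namespace, auto_continue=True, ask=input):
    """Return the picklable part of namespace, or None if the user declines.

    With auto_continue=True (the proposed default) unpicklable names are
    skipped silently; otherwise they are listed and ask() must answer y/yes.
    """
    bad = {name for name, value in namespace.items() if not _picklable(value)}
    if bad and not auto_continue:
        listing = "\n".join(f"- {name}" for name in sorted(bad))
        answer = ask(
            f"Variables:\n{listing}\ncould not be pickled and will not be restored. "
            "Do you wish to continue? [y/n]: "
        )
        if answer.strip().lower() not in ("y", "yes"):
            return None
    return {name: value for name, value in namespace.items() if name not in bad}
```

Note that the check itself still runs in both modes; the flag only decides whether the user is asked.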

It would be a big quality-of-life improvement for me and, since I'm hardly unique, I'm guessing for many others too.

Hunting online for a workaround isn't easy: this thread, which has the best solution, is quite hidden away, and I'm guessing many people miss it or don't read through it.

######################################################

P.S.: Until then, @mmckerns's fix needs a little updating. dict.iteritems() no longer exists in Python 3 (use items() instead). Also, map is lazy in Python 3, so on its own the line doesn't actually do anything; an easy way to force execution is to wrap it in list(). In all:

list(map(globals().pop, tuple(i for (i,j) in globals().items() if not dill.pickles(j) and i not in ('__builtins__',))))
dill.dump_session("testing.db")
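The one-liner above permanently pops the unpicklable objects out of `globals()`, which hurts if they are still needed after saving. A non-destructive variant removes them only for the duration of the dump and puts them back afterwards. `without_unpicklable` and its parameters are hypothetical names; the default check uses stdlib `pickle`, and with dill installed you would pass `check=dill.pickles` instead.

```python
import pickle
from contextlib import contextmanager


def _picklable(obj):
    """Default check using stdlib pickle; dill.pickles is the dill equivalent."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


@contextmanager
def without_unpicklable(namespace, check=_picklable, keep=("__builtins__",)):
    """Temporarily delete entries whose values fail check(), restoring them on exit.

    Names in keep are never checked or removed (mirroring the __builtins__
    exemption in the one-liner).
    """
    removed = {
        name: value
        for name, value in namespace.items()
        if name not in keep and not check(value)
    }
    for name in removed:
        del namespace[name]
    try:
        yield list(removed)
    finally:
        # Put the objects back even if the dump raised.
        namespace.update(removed)


# Usage in a live session (requires dill):
#     with without_unpicklable(globals(), check=dill.pickles):
#         dill.dump_session("testing.db")
```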

@leogama
Contributor

leogama commented May 3, 2022

Hello, I'm working on a feature like this for dump_session(). It is not always possible or convenient to delete unpicklable objects (or large objects that are cheap to regenerate) from the namespace before saving the session, since they may be needed afterwards, so I consider this a relevant feature.

Before I submit a draft PR, what do you think the API should look like? And how should load_session() behave in this case? Should it simply ignore the variables that weren't saved, restore them as a dill singleton as suggested, or do that only for variables not already defined in the namespace?
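The three load-time options in the question can be compared side by side in a small merge helper. This is a hypothetical sketch: `restore`, its `mode` values and the `Missing` placeholder class are illustrative names, not part of dill, and it assumes the session file records which names were skipped at dump time.

```python
class Missing:
    """Hypothetical placeholder for a variable that was skipped when saving."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<{self.name!r} was not saved>"


MODES = ("ignore", "placeholder", "fill")


def restore(namespace, saved, skipped, mode="ignore"):
    """Merge a loaded session into namespace.

    saved:   dict of variables that were pickled successfully
    skipped: names that were left out at dump time
    mode:
      "ignore"      - restore saved variables only; skipped names untouched
      "placeholder" - bind every skipped name to a Missing placeholder
      "fill"        - use a placeholder only where the name is not already defined
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    namespace.update(saved)
    for name in skipped:
        if mode == "placeholder" or (mode == "fill" and name not in namespace):
            namespace[name] = Missing(name)
    return namespace
```

The "fill" mode matches the last option in the question: a live object the user already has (say, a figure) is left alone, while a name that would otherwise be undefined gets a placeholder that fails loudly when inspected.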

I already have a working prototype that can deal with IPython's command history variables. 👌🏼
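For reference, IPython's documented history names include `_`, `__` and `___` for recent outputs, `_i`, `_ii` and `_iii` for recent inputs, `_N` and `_iN` for numbered entries, plus `_ih`, `_oh`, `_dh`, `In` and `Out`; IPython may inject other helpers too. A filter for these could be as simple as the sketch below. `is_history_name` and `session_candidates` are hypothetical helpers, not the prototype mentioned above.

```python
import re

# IPython input/output history names (assumed list, based on IPython's
# caching docs; not necessarily exhaustive).
_HISTORY = re.compile(r"_{1,3}|_i{1,3}|_i?\d+|_ih|_oh|_dh|In|Out")


def is_history_name(name):
    """True if name looks like one of IPython's history variables."""
    return _HISTORY.fullmatch(name) is not None


def session_candidates(namespace):
    """Namespace entries left after dropping IPython history variables."""
    return {k: v for k, v in namespace.items() if not is_history_name(k)}
```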

5 participants