Documentation: Writing persistent objects: Be careful about __eq__ and __hash__ #106
Comments
SGTM |
On Sat, Sep 10, 2016 at 1:08 PM, Jason Madden notifications@github.com
Maybe in "Other things you can do, but shouldn't". Jim Fulton |
As in, you can add custom |
Yup. It occurs to me that it might be nice to have mix-in classes (or maybe just one) that implement identity-based hash and comparison based on OIDs. In the past, I wanted PxBTrees, but maybe IdentityHashablePersistent and IdentyComparablePersistent (or maybe just the latter and maybe with better names :)) |
Ok, I'll write this up and submit a PR. (I think it could also use something about implementing comparable methods to be used in a BTree, at least a pointer.) Those sound like pretty good mixin classes. What would you do before the object is assigned an OID though? Use its |
I would error. Using its id would be a disaster. |
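A minimal sketch of the mix-in idea discussed above, assuming the object exposes its OID as `_p_oid` and that `_p_oid` is `None` until an OID is assigned. The names `OIDIdentityMixin` and `FakeObject` are hypothetical, and nothing here is an existing ZODB API. Following the comment above, it raises an error rather than falling back to `id()` when there is no OID yet.

```python
class OIDIdentityMixin:
    """Hypothetical mix-in: hash and equality based on the object's OID."""

    def _require_oid(self):
        # Assumption: the OID lives in ``_p_oid`` and is None until assigned.
        oid = getattr(self, "_p_oid", None)
        if oid is None:
            raise TypeError("object has no OID yet; cannot hash or compare it")
        return oid

    def __hash__(self):
        return hash(self._require_oid())

    def __eq__(self, other):
        if not isinstance(other, OIDIdentityMixin):
            return NotImplemented
        # Note: OIDs are only unique within one database, so a real
        # implementation would probably need to account for that too.
        return self._require_oid() == other._require_oid()


class FakeObject(OIDIdentityMixin):
    """Stand-in for a persistent object so the sketch runs without ZODB."""

    def __init__(self, oid=None):
        self._p_oid = oid


first = FakeObject(b"\x00" * 7 + b"\x01")
same_oid = FakeObject(b"\x00" * 7 + b"\x01")
unsaved = FakeObject()
```

Here `first == same_oid` holds and both hash alike, while `hash(unsaved)` raises `TypeError`. The appeal of the design is that neither method needs to read ordinary instance state, so in principle it should not have to unghost anything.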
I had cause to remember about |
Opened PR #118 for this. |
Add a section on the pitfalls of __eq__/__hash__. Fixes #106.
In general, objects without custom `__eq__` and `__hash__` methods are going to be friendlier on the DB and the cache (if you can get away with identity semantics).

Suppose you have two classes:
Now if you had some dictionaries using those classes as keys:
Doing something like:
is going to unghost 5000 `WithHash` objects, whereas the same operation on the dictionary keyed by the other class isn't going to unghost any objects. Unpickling a dictionary re-hashes all the keys, and `hash()` looks up `__hash__` on the class, not the instance, so if the `__hash__` method doesn't access any attributes (like the default one), nothing has to be unghosted.

Even if your object cache is sized appropriately, large-ish dictionaries can take a long time to unpickle when accessed for the first time in a particular connection, adding lots of load to the DB and/or cache system; the same happens when first creating an in-memory dictionary of such persistent objects.
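The effect can be imitated without ZODB. The following is a minimal stand-in, not the `persistent` implementation: `FakeGhost`, `NoHash`, the `loaded` counter, and `activations_when_keyed` are hypothetical names, and the lazy `name` property only loosely imitates a ghost loading its state on first attribute access. Building the dictionary in memory stands in for what unpickling does when it re-inserts each key.

```python
class FakeGhost:
    """Hypothetical stand-in for a ghosted persistent object."""

    loaded = 0  # number of simulated activations ("unghostings")

    def __init__(self, stored_name):
        # Pretend this value still lives in the database.
        self._stored_name = stored_name

    @property
    def name(self):
        # First access "loads" the state, like activating a ghost.
        if "_name" not in self.__dict__:
            FakeGhost.loaded += 1
            self._name = self._stored_name
        return self._name


class NoHash(FakeGhost):
    """Identity semantics: inherits object.__eq__ and object.__hash__."""


class WithHash(FakeGhost):
    """Value semantics: hashing and comparing read instance state."""

    def __eq__(self, other):
        return isinstance(other, WithHash) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


def activations_when_keyed(cls, count=5000):
    """Key a dict by fresh "ghosts" and report how many got loaded."""
    ghosts = [cls(str(i)) for i in range(count)]
    FakeGhost.loaded = 0
    # Inserting a key calls hash(key), just as rebuilding a dict does.
    keyed = dict.fromkeys(ghosts)
    assert len(keyed) == count
    return FakeGhost.loaded
```

Under these assumptions, `activations_when_keyed(WithHash)` reports 5000 and `activations_when_keyed(NoHash)` reports 0, because the inherited `__hash__` never touches the instance's state.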
If you can accept identity semantics (and for persistent objects, you surprisingly often can), it's better to avoid custom `__eq__` and `__hash__` methods if you'll ever be creating dictionaries or sets of your persistent objects.

This is a lesson we learned the hard way; coming from a Java background, almost all of our objects defined custom `__eq__` and `__hash__` methods, and that was fine until we started to get a lot of objects, when it became a performance burden. Now it turns out that many of those objects don't need these methods.

Worth adding to the docs?