lookup and hash is slow #178
Comments
If
So simply setting |
Well, in fact nothing is broken. I know the language specification and I can not see how in our case |
Yes, it is. First, let's break things and establish that
>>> from zope.interface.interface import InterfaceClass
>>> InterfaceClass.__hash__ = object.__hash__
>>> from zope.interface import Interface
>>> i1 = InterfaceClass('I1', (Interface,), {})
>>> i1_2 = InterfaceClass('I1', (Interface,), {})
>>> i1 == i1_2
True
>>> hash(i1) == hash(i1_2)
False
Lists still work, because they only test
But
>>> int_1 = 234563
>>> int_1_2 = int(str(int_1))
# Not the same object instance, but equal, and same hash code.
>>> int_1 is int_1_2
False
>>> int_1 == int_1_2
True
>>> int_1_2 == int_1
True
>>> d = {int_1: "Yes"}
>>> int_1_2 in d
True
>>> d.pop(int_1_2)
'Yes'
But our interface violates this.
>>> i1 == i1_2
True
>>> d = {i1: "Yes"}
>>> i1 in d
True
# What, no, they're equal, why isn't it there?
>>> i1_2 in d
False
>>> d.pop(i1_2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: <InterfaceClass __main__.I1>
It is often the case that the simplest uses of |
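The same effect can be reproduced without zope.interface at all. The following is only a minimal sketch: `Named` and `IdentityHashed` are hypothetical stand-in classes, not part of any library, and the dict result shown relies on CPython's identity-based `object.__hash__`.

```python
class Named:
    """Hypothetical stand-in: equality by (name, module), like the interfaces above."""

    def __init__(self, name, module):
        self.name = name
        self.module = module

    def __eq__(self, other):
        if not isinstance(other, Named):
            return NotImplemented
        return (self.name, self.module) == (other.name, other.module)

    def __hash__(self):
        # Consistent with __eq__: equal objects produce equal hashes.
        return hash((self.name, self.module))


class IdentityHashed(Named):
    """Inherits the same __eq__, but takes the identity-based hash from object."""

    __hash__ = object.__hash__


a, b = Named("I1", "m"), Named("I1", "m")
assert a == b and hash(a) == hash(b)
assert b in {a: "Yes"}        # dict lookup finds the equal key

c, d = IdentityHashed("I1", "m"), IdentityHashed("I1", "m")
assert c == d                 # still equal
assert d in [c]               # list membership only needs ==
assert d not in {c: "Yes"}    # dict lookup compares hashes first and misses
```

The last assertion is the surprising one: the two objects compare equal, yet one cannot be used to look up the other in a dict or set.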
I cannot imagine any non-insane use case for having two Interface classes with the same name and module (at the moment this may happen by accident, because of the way we detect
We can ensure that no two of the same exist by using a global and some validation code. And given that we then never have more than one InterfaceClass instance with the same name and module, we can use object.__hash__, because __eq__ and __hash__ then always match the same object.
Unless someone comes up with a sane use case for having two InterfaceClass instances with the same module and name, I would propose, for the sake of performance, to make this breaking change with this kind of singleton-by-module-name-pair InterfaceClass instances. |
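To make the proposal concrete, here is a rough sketch of what "one instance per module/name pair" could look like. `UniqueByName` and its `_registry` are hypothetical names for illustration only; this is not how InterfaceClass is implemented, and returning the existing instance (rather than raising on a duplicate) is just one possible choice.

```python
import threading


class UniqueByName:
    """Hypothetical class: at most one instance per (module, name) pair."""

    _registry = {}              # the "global": (module, name) -> instance
    _lock = threading.Lock()

    def __new__(cls, name, module):
        key = (module, name)
        with cls._lock:
            inst = cls._registry.get(key)
            if inst is None:
                # First request for this pair: create and remember it.
                inst = super().__new__(cls)
                inst.name = name
                inst.module = module
                cls._registry[key] = inst
            return inst

    # No __eq__ or __hash__ is defined on purpose: the defaults inherited
    # from object compare and hash by identity, which is consistent as long
    # as there is only one instance per (module, name) pair.


first = UniqueByName("I1", "my.module")
second = UniqueByName("I1", "my.module")
other = UniqueByName("I2", "my.module")

assert first is second
assert first == second and hash(first) == hash(second)
assert second in {first: "Yes"}
assert other is not first
```

Note that a plain dict registry keeps every instance alive for the lifetime of the process; whether that is acceptable would need to be decided separately.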
ok, I kick the idea, I found another micro-optimization, so the influence of |
This performance problem has bugged me for a while, and I have already done some optimization, primarily in `InterfaceClass`.
I took the long-running Plone tests, which use the component architecture intensively, as a measure here.
I used Py-Spy to get numbers by letting it sample for ~100s. The "Own Time" of a function or line is of interest here: where is time wasted?
Two functions always in the top 5 are `zope.interface.interface._lookup` and `zope.interface.interface.InterfaceClass.__hash__`. Together they take … of the test time.
For `__hash__` I looked into the code. It is primarily there because if `__eq__` is implemented, `__hash__` also needs to be implemented. However, our `__eq__` does not use `__hash__` (which is fine).
Our current `__hash__` hashes the tuple of name and module. To be sortable it would need to be swapped anyway.
So what if `InterfaceClass.__hash__` used the very fast `object.__hash__`?
I tried this and it breaks one test in zope.interface. And I question whether this test makes sense at all.
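To get a feeling for the difference in isolation, a micro-benchmark along these lines can be used. It is only a sketch: `TupleHashed` and `IdentityHashed` are hypothetical stand-ins, not the real `InterfaceClass`, and the absolute numbers depend on the machine and Python version.

```python
import timeit


class TupleHashed:
    """Hypothetical stand-in that hashes a (name, module) tuple on every call."""

    def __init__(self, name, module):
        self.name = name
        self.module = module

    def __hash__(self):
        return hash((self.name, self.module))


class IdentityHashed:
    """Hypothetical stand-in that keeps the default object.__hash__."""

    def __init__(self, name, module):
        self.name = name
        self.module = module


def time_hash(obj, number=200_000):
    """Return the seconds needed for `number` calls of hash(obj)."""
    return timeit.timeit(lambda: hash(obj), number=number)


tuple_hashed = TupleHashed("IFoo", "my.package")
identity_hashed = IdentityHashed("IFoo", "my.package")

print("tuple hash:   ", time_hash(tuple_hashed))
print("identity hash:", time_hash(identity_hashed))
```

This only measures the cost of a single `hash()` call; how much it matters in practice depends on how often interfaces are used as dict keys, which is what the Py-Spy sampling above is meant to show.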
I'll prepare a PR to discuss it.
I used it with the Plone tests and nothing breaks so far.
The combined time of `_lookup` and `__hash__` shrinks to about 5%.
I created a Plone site, added content, and restarted Zope: all fine.
So it doubles the speed compared to the current state.
For `_lookup` I have no further ideas for optimization. Maybe by shifting it over to C code? But I would keep that for a different discussion.
I really appreciate feedback here.