
performance: store cached hashvalue in slot #179

Closed
wants to merge 1 commit

Conversation

Member

@jensens jensens commented Feb 16, 2020

Micro-optimization. In fact, this only saves a few percent:

  • Py-Spy on Plone, running all tests.
  • 20k samples (200sec)
  • with current master __hash__ takes 2.6% of time
  • with this branch __hash__ takes 1.6% of time

I did not find much information on slots, Python C extensions, and inheritance. I hope I got it right.
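For readers unfamiliar with the pattern: the idea is to reserve a fixed slot for the cached hash value so that `__hash__` becomes a plain attribute read. A minimal pure-Python sketch (illustrative names only, not the actual zope.interface code):

```python
class SlotCachedHash:
    """Sketch of caching a hash in a __slots__ attribute.

    Illustrative only; not the actual zope.interface implementation.
    """
    __slots__ = ('_name', '_module', '_hashvalue')

    def __init__(self, name, module):
        self._name = name
        self._module = module
        # Compute once up front; every later hash() call is just a slot read.
        self._hashvalue = hash((name, module))

    def __hash__(self):
        return self._hashvalue

obj = SlotCachedHash('IFoo', 'my.module')
assert hash(obj) == hash(('IFoo', 'my.module'))
```

In the C extension the slot becomes a struct member instead of a `__slots__` entry, but the access pattern is the same.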

Member

@jamadden jamadden left a comment


I cannot reproduce any measurable performance improvement in hashing or dict/set performance with this PR (script at the end). I would be surprised to find that accessing a single instance attribute out of a dictionary was the performance bottleneck, but I also don't detect any significant differences there (there is a tiny difference, but like all the differences, the tight standard deviations overlap).

Hashing:

normal: Mean +- std dev: 147 ns +- 6 ns
deferred: Mean +- std dev: 152 ns +- 4 ns
slotted: Mean +- std dev: 138 ns +- 4 ns

Interface (master):  Mean +- std dev: 180 ns +- 5 ns
Interface (this PR): Mean +- std dev: 181 ns +- 5 ns

What is surprising is that I would expect the steady-state performance of either Interface to match that of deferred, but it doesn't. Where's that 30ns of overhead coming from?

Set/Dict performance:

Interface lookup: set (master):  Mean +- std dev: 135 ns +- 10 ns
Interface lookup: set (this PR): Mean +- std dev: 141 ns +- 8 ns
Interface lookup: dict (master):  Mean +- std dev: 141 ns +- 6 ns
Interface lookup: dict (this PR): Mean +- std dev: 132 ns +- 7 ns
Interface dict assign: (master):  Mean +- std dev: 142 ns +- 5 ns
Interface dict assign: (this PR): Mean +- std dev: 143 ns +- 4 ns

Raw ivar access:

Interface ivar (master):  Mean +- std dev: 26.7 ns +- 1.4 ns
Interface ivar (this PR): Mean +- std dev: 24.6 ns +- 0.6 ns

Script:

import pyperf
from zope.interface.interface import InterfaceClass

class Normal:
    def __init__(self):
        self._hash = 42
    def __hash__(self):
        return self._hash

class Deferred:
    def __hash__(self):
        try:
            return self._hash
        except AttributeError:
            self._hash = 42
            return self._hash

class Slotted:
    __slots__ = ('_hash',)
    def __init__(self):
        self._hash = 42
    def __hash__(self):
        return self._hash

class OldInterfaceClass(InterfaceClass):
    def __hash__(self):
        try:
            return self._v_hashvalue
        except AttributeError:
            self._v_hashvalue = hash((self.__name__, self.__module__))
        return self._v_hashvalue


runner = pyperf.Runner()
kw = dict(duplicate=1000, globals=dict(globals()))
runner.timeit('normal', 'hash(it)',
              'it = Normal()',
              **kw)
runner.timeit('deferred', 'hash(it)',
              'it = Deferred()',
              **kw)
runner.timeit('slotted', 'hash(it)',
              'it = Slotted()',
              **kw)
runner.timeit('Interface (master)', 'hash(it)',
              'it = OldInterfaceClass("name", (), {})',
              **kw)
runner.timeit('Interface (this PR)', 'hash(it)',
              'it = InterfaceClass("name", (), {})',
              **kw)

runner.timeit('Interface lookup: set (master)', 'it in s',
              'it = OldInterfaceClass("name", (), {}); s = set()',
              **kw)
runner.timeit('Interface lookup: set (this PR)', 'it in s',
              'it = InterfaceClass("name", (), {}); s = set()',
              **kw)
runner.timeit('Interface lookup: dict (master)', 'it in d',
              'it = OldInterfaceClass("name", (), {}); d = {}',
              **kw)
runner.timeit('Interface lookup: dict (this PR)', 'it in d',
              'it = InterfaceClass("name", (), {}); d = {}',
              **kw)
runner.timeit('Interface dict assign: (master)', 'd[it] = 42',
              'it = OldInterfaceClass("name", (), {}); d = {}',
              **kw)
runner.timeit('Interface dict assign: (this PR)', 'd[it] = 42',
              'it = InterfaceClass("name", (), {}); d = {}',
              **kw)
runner.timeit('Interface ivar (master)', 'it._v_hashvalue',
              'it = OldInterfaceClass("name", (), {}); hash(it)',
              **kw)
runner.timeit('Interface ivar (this PR)', 'it._hashvalue',
              'it = InterfaceClass("name", (), {}); hash(it)',
              **kw)

@@ -277,6 +277,7 @@ typedef struct {
PyObject* _v_attrs;
PyObject* __iro__;
PyObject* __sro__;
PyObject* _hashvalue;

This adds 8 bytes of bloat to all objects derived from Spec, even though most objects that derive from Spec do not use it. (You get instances of Spec when you write @implementer or classProvides(), among others.) In the example app I used to test #155, there are twice as many ClassProvides and Implements instances as there are InterfaceClass instances.

Does that matter? In the example app I cite, it amounts to an extra 36KB of memory allocated. Out of 230MB, that's not much. However, it has other, harder-to-measure effects, such as reducing CPU cache efficiency (because objects are spread further apart).
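The per-instance cost of one extra pointer-sized member can be seen even from pure Python with `__slots__` (a rough stand-in for the C struct member; the class names here are made up):

```python
import sys

class WithoutCache:
    __slots__ = ('a', 'b')

class WithCache:
    __slots__ = ('a', 'b', 'cached_hash')

# Each slot adds one pointer to every instance (8 bytes on 64-bit CPython),
# whether or not the slot is ever filled in.
extra = sys.getsizeof(WithCache()) - sys.getsizeof(WithoutCache())
print(extra)  # 8 on a typical 64-bit build
```

That fixed 8 bytes is paid by every `Spec`-derived instance, which is the bloat being described above.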

@@ -413,6 +416,7 @@ static PyMemberDef Spec_members[] = {
{"_v_attrs", T_OBJECT_EX, offsetof(Spec, _v_attrs), 0, ""},
{"__iro__", T_OBJECT_EX, offsetof(Spec, __iro__), 0, ""},
{"__sro__", T_OBJECT_EX, offsetof(Spec, __sro__), 0, ""},
{"_hashvalue", T_OBJECT_EX, offsetof(Spec, _hashvalue), 0, ""},

I think this should be T_LONG on Python 2 and (I think) T_PYSSIZET on Python 3. That unwraps the integer into its native value and removes the overhead of keeping the int object around, so in theory it could actually wind up reducing memory usage a tiny bit.

On the other hand, the rewrapping back to an object might actually slow things down a bit.
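To make the tradeoff concrete: `T_OBJECT_EX` stores a pointer to a full Python int object, whereas `T_PYSSIZET` would store the bare machine word in the struct and box it into an int only on attribute access. The boxing overhead is visible from Python (sizes are CPython-specific; the interface name and module are invented):

```python
import sys

# Hypothetical cached hash value for an interface.
h = hash(('ISomeInterface', 'some.module'))
# Kept as a Python object, the cached hash pays the int object's header
# cost; a native Py_ssize_t struct field would be one machine word.
print(sys.getsizeof(h))  # noticeably more than the 8 bytes of a raw word
```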

Member Author

jensens commented Mar 2, 2020

Scrap it; if anything helps, it would be a better algorithm/C implementation.

@jensens jensens closed this Mar 2, 2020
@jensens jensens deleted the hash-performance branch March 2, 2020 10:33