
performance: store cached hashvalue in slot #179

Closed
wants to merge 1 commit

Conversation

Member

@jensens jensens commented Feb 16, 2020

Micro-optimization. In fact, this only saves a few percent:

  • Py-Spy on Plone, running all tests.
  • 20k samples (200sec)
  • with current master __hash__ takes 2.6% of time
  • with this branch __hash__ takes 1.6% of time

I did not find much information on slots, Python C extensions, and inheritance. I hope I got it right.
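For readers unfamiliar with the pattern: the idea is to reserve a fixed slot for the cached hash value so that `__hash__` becomes a plain attribute read. A minimal pure-Python sketch (illustrative names only, not the actual zope.interface code):

```python
class SlotCachedHash:
    """Sketch of caching a hash in a __slots__ attribute.

    Illustrative only; not the actual zope.interface implementation.
    """
    __slots__ = ('_name', '_module', '_hashvalue')

    def __init__(self, name, module):
        self._name = name
        self._module = module
        # Compute once up front; every later hash() call is just a slot read.
        self._hashvalue = hash((name, module))

    def __hash__(self):
        return self._hashvalue

obj = SlotCachedHash('IFoo', 'my.module')
assert hash(obj) == hash(('IFoo', 'my.module'))
```

In the C extension the slot becomes a struct member instead of a `__slots__` entry, but the access pattern is the same.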

Member

@jamadden jamadden left a comment


I cannot reproduce any measurable performance improvement in hashing or dict/set performance with this PR (script at the end). I would be surprised to find that accessing a single instance attribute out of a dictionary was the performance bottleneck, but I also don't detect any significant differences there (there is a tiny difference, but like all the differences, the tight standard deviations overlap).

Hashing:

normal: Mean +- std dev: 147 ns +- 6 ns
deferred: Mean +- std dev: 152 ns +- 4 ns
slotted: Mean +- std dev: 138 ns +- 4 ns

Interface (master):  Mean +- std dev: 180 ns +- 5 ns
Interface (this PR): Mean +- std dev: 181 ns +- 5 ns

What is surprising is that I would expect the steady-state performance of either Interface to match that of deferred, but it doesn't. Where's that 30ns of overhead coming from?

Set/Dict performance:

Interface lookup: set (master):  Mean +- std dev: 135 ns +- 10 ns
Interface lookup: set (this PR): Mean +- std dev: 141 ns +- 8 ns
Interface lookup: dict (master):  Mean +- std dev: 141 ns +- 6 ns
Interface lookup: dict (this PR): Mean +- std dev: 132 ns +- 7 ns
Interface dict assign: (master):  Mean +- std dev: 142 ns +- 5 ns
Interface dict assign: (this PR): Mean +- std dev: 143 ns +- 4 ns

Raw ivar access:

Interface ivar (master):  Mean +- std dev: 26.7 ns +- 1.4 ns
Interface ivar (this PR): Mean +- std dev: 24.6 ns +- 0.6 ns

Script:

import pyperf
from zope.interface.interface import InterfaceClass

class Normal:
    def __init__(self):
        self._hash = 42
    def __hash__(self):
        return self._hash

class Deferred:
    def __hash__(self):
        try:
            return self._hash
        except AttributeError:
            self._hash = 42
            return self._hash

class Slotted:
    __slots__ = ('_hash',)
    def __init__(self):
        self._hash = 42
    def __hash__(self):
        return self._hash

class OldInterfaceClass(InterfaceClass):
    def __hash__(self):
        try:
            return self._v_hashvalue
        except AttributeError:
            self._v_hashvalue = hash((self.__name__, self.__module__))
        return self._v_hashvalue


runner = pyperf.Runner()
kw = dict(duplicate=1000, globals=dict(globals()))
runner.timeit('normal', 'hash(it)',
              'it = Normal()',
              **kw)
runner.timeit('deferred', 'hash(it)',
              'it = Deferred()',
              **kw)
runner.timeit('slotted', 'hash(it)',
              'it = Slotted()',
              **kw)
runner.timeit('Interface (master)', 'hash(it)',
              'it = OldInterfaceClass("name", (), {})',
              **kw)
runner.timeit('Interface (this PR)', 'hash(it)',
              'it = InterfaceClass("name", (), {})',
              **kw)

runner.timeit('Interface lookup: set (master)', 'it in s',
              'it = OldInterfaceClass("name", (), {}); s = set()',
              **kw)
runner.timeit('Interface lookup: set (this PR)', 'it in s',
              'it = InterfaceClass("name", (), {}); s = set()',
              **kw)
runner.timeit('Interface lookup: dict (master)', 'it in d',
              'it = OldInterfaceClass("name", (), {}); d = {}',
              **kw)
runner.timeit('Interface lookup: dict (this PR)', 'it in d',
              'it = InterfaceClass("name", (), {}); d = {}',
              **kw)
runner.timeit('Interface dict assign: (master)', 'd[it] = 42',
              'it = OldInterfaceClass("name", (), {}); d = {}',
              **kw)
runner.timeit('Interface dict assign: (this PR)', 'd[it] = 42',
              'it = InterfaceClass("name", (), {}); d = {}',
              **kw)
runner.timeit('Interface ivar (master)', 'it._v_hashvalue',
              'it = OldInterfaceClass("name", (), {}); hash(it)',
              **kw)
runner.timeit('Interface ivar (this PR)', 'it._hashvalue',
              'it = InterfaceClass("name", (), {}); hash(it)',
              **kw)

@@ -277,6 +277,7 @@ typedef struct {
PyObject* _v_attrs;
PyObject* __iro__;
PyObject* __sro__;
PyObject* _hashvalue;

This adds 8 bytes of bloat to all objects derived from Spec, even though most objects that derive from Spec do not use it. (You get instances of Spec when you write @implementer or classProvides(), among others.) In the example app I used to test #155, there are twice as many ClassProvides and Implements instances as there are InterfaceClass instances.

Does that matter? In the example app I cite, it amounts to an extra 36KB of memory allocated. Out of 230MB, that's not much. However, it has other, harder-to-measure effects, such as reducing CPU cache efficiency (because objects are spread further apart).
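The per-instance cost of one extra pointer-sized member can be seen even from pure Python with `__slots__` (a rough stand-in for the C struct member; the class names here are made up):

```python
import sys

class WithoutCache:
    __slots__ = ('a', 'b')

class WithCache:
    __slots__ = ('a', 'b', 'cached_hash')

# Each slot adds one pointer to every instance (8 bytes on 64-bit CPython),
# whether or not the slot is ever filled in.
extra = sys.getsizeof(WithCache()) - sys.getsizeof(WithoutCache())
print(extra)  # 8 on a typical 64-bit build
```

That fixed 8 bytes is paid by every `Spec`-derived instance, which is the bloat being described above.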

@@ -413,6 +416,7 @@ static PyMemberDef Spec_members[] = {
{"_v_attrs", T_OBJECT_EX, offsetof(Spec, _v_attrs), 0, ""},
{"__iro__", T_OBJECT_EX, offsetof(Spec, __iro__), 0, ""},
{"__sro__", T_OBJECT_EX, offsetof(Spec, __sro__), 0, ""},
{"_hashvalue", T_OBJECT_EX, offsetof(Spec, _hashvalue), 0, ""},

I think this should be T_LONG on Python 2 and (I think) T_PYSSIZET on Python 3. That unwraps the integer into its native value and removes the overhead of keeping the int object around, so in theory it could actually wind up reducing memory usage a tiny bit.

On the other hand, the rewrapping back to an object might actually slow things down a bit.
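To make the tradeoff concrete: `T_OBJECT_EX` stores a pointer to a full Python int object, whereas `T_PYSSIZET` would store the bare machine word in the struct and box it into an int only on attribute access. The boxing overhead is visible from Python (sizes are CPython-specific; the interface name and module are invented):

```python
import sys

# Hypothetical cached hash value for an interface.
h = hash(('ISomeInterface', 'some.module'))
# Kept as a Python object, the cached hash pays the int object's header
# cost; a native Py_ssize_t struct field would be one machine word.
print(sys.getsizeof(h))  # noticeably more than the 8 bytes of a raw word
```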

Member Author

jensens commented Mar 2, 2020

Scrap it; if anything helps, it would be a better algorithm/C implementation.

@jensens jensens closed this Mar 2, 2020
@jensens jensens deleted the hash-performance branch March 2, 2020 10:33