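Instances of `zodbpickle.binary` on CPython 2.7 are at least 32 bytes larger than the equivalent `bytes`/`str` object. They are also tracked by the garbage collector, whereas `bytes` objects (which are known to be immutable) are not.

Adding `__slots__ = ()` changes none of this. (16 bytes of the overhead would be for the two GC pointers, and another 8 for the `__dict__` pointer, if present. I can't explain the final 8. Perhaps alignment? Perhaps the `char*` is no longer stored at the end of the object when subclassed, so there's an extra pointer involved? I haven't looked into it.)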
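A minimal sketch of how the size and GC-tracking difference can be observed with the standard library (`sys.getsizeof`, `gc.is_tracked`). `Binary` here is a hypothetical stand-in for `zodbpickle.binary` (a plain `bytes` subclass with no `__slots__`), not the library's own class, and the exact byte counts will vary with interpreter version and platform.

```python
import gc
import sys


# Hypothetical stand-in for zodbpickle.binary on CPython 2.7: a plain
# bytes subclass without __slots__, so each instance also gets a
# __dict__ pointer and the type gets GC support.
class Binary(bytes):
    """Marker subclass of bytes (illustrative only)."""


def compare(payload=b"\x00" * 8):
    """Return {name: (size_in_bytes, is_gc_tracked)} for bytes vs. the subclass."""
    plain = bytes(payload)
    sub = Binary(payload)
    return {
        "bytes": (sys.getsizeof(plain), gc.is_tracked(plain)),
        "Binary": (sys.getsizeof(sub), gc.is_tracked(sub)),
    }


if __name__ == "__main__":
    for name, (size, tracked) in sorted(compare().items()):
        print("%-6s size=%d tracked=%s" % (name, size, tracked))
```

`sys.getsizeof` includes the GC header for GC-enabled types, so it reflects part of the overhead described above.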
This overhead adds up surprisingly quickly because ZODB uses `zodbpickle.binary` to store OIDs. They get turned into `str` in some cases, but in ghosts you can see the `binary` objects:
```
>>> import persistent
>>> import ZODB
>>> db = ZODB.DB(None)
>>> with db.transaction() as c:
...     c.root.key = persistent.Persistent()
...
>>> with db.transaction() as c:
...     type(c.root.key._p_oid)
...
<type 'str'>
>>> db.cacheMinimize()
None
>>> with db.transaction() as c:
...     type(c.root.key._p_oid)
...
<class 'zodbpickle.binary'>
```
In one application, `binary` was the largest type of object tracked by the GC, by an order of magnitude (according to objgraph):

```
binary      1141836
LOBucket     316823
tuple        282777
LLBucket     236532
dict         233084
list         159828
function     124778
```
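For readers without objgraph installed, a rough stdlib-only approximation of this kind of per-type census is to count `gc.get_objects()` by type name. This is a sketch under that assumption, not objgraph's implementation, and `tracked_type_counts` is a hypothetical helper name. Because it only sees GC-tracked objects, plain `str`/`bytes` OIDs never show up in it, which is exactly the contrast the table above illustrates.

```python
import collections
import gc


def tracked_type_counts(limit=10):
    """Count GC-tracked objects by type name, most common first.

    Hypothetical helper; only objects tracked by the cyclic GC are
    visible via gc.get_objects(), so untracked bytes/str never appear.
    Pass limit=None to get every type.
    """
    counts = collections.Counter(type(o).__name__ for o in gc.get_objects())
    return counts.most_common(limit)


if __name__ == "__main__":
    for name, count in tracked_type_counts():
        print("%-10s %d" % (name, count))
```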
That's about a 35MB difference in memory used compared to `str`, but even worse, because all those objects are tracked by the GC, GC times increase by 7x (the relative impact diminishes as other objects are added, but the constant cost remains):

```
$ python -m pyperf timeit \
    -s "strs = [str(i) for i in range(1141836)]; import gc" \
    "gc.collect()"
.....................
Mean +- std dev: 10.5 ms +- 0.9 ms
$ python -m pyperf timeit \
    -s "from zodbpickle import binary; strs = [binary(i) for i in range(1141836)]; import gc" \
    "gc.collect()"
.....................
Mean +- std dev: 69.8 ms +- 3.0 ms
```
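The quoted figures can be cross-checked with simple arithmetic, assuming the 32-byte minimum overhead applies to every one of the 1,141,836 `binary` objects: that comes to 36,538,752 bytes (roughly 34.8 MiB), and the two pyperf means give a slowdown of about 6.6x, which rounds to the 7x above.

```python
# Back-of-the-envelope check of the numbers quoted in this issue.
# Assumption: the "at least 32 bytes" per-instance overhead applies
# uniformly to every binary object counted by objgraph.
BINARY_COUNT = 1141836   # binary objects reported by objgraph
OVERHEAD_BYTES = 32      # minimum extra bytes per binary vs. str

# Mean gc.collect() times from the two pyperf runs, in milliseconds.
STR_GC_MS = 10.5
BINARY_GC_MS = 69.8

extra_bytes = BINARY_COUNT * OVERHEAD_BYTES
extra_mib = extra_bytes / (1024.0 * 1024.0)
slowdown = BINARY_GC_MS / STR_GC_MS

print("extra memory: %d bytes (~%.1f MiB)" % (extra_bytes, extra_mib))
print("gc.collect() slowdown: ~%.1fx" % slowdown)
```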
I don't know of a way to solve these problems in Python, but I'm guessing/hoping it should be pretty simple to solve them by implementing `binary` using a C extension.