Skip to content

gh-135552: Clear weakrefs to types in GC after garbage finalization not before #135728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

sergey-miryanov
Copy link
Contributor

@sergey-miryanov sergey-miryanov commented Jun 19, 2025

While trying to find the root cause of the linked issue, I observed that the type cache was holding an obsolete reference. Because the type cache holds borrowed references, this leads to a use-after-free error.

I'm resetting the cache in this PR. But I suspect that this may affect overall performance.

@sergey-miryanov
Copy link
Contributor Author

@ZeroIntensity @sobolevn @Eclips4 @efimov-mikhail @fxeqxmulfx Please take a look.

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add a test case and a NEWS entry.
I think that there's a minimal repro available in the issue.

The solution seems reasonable!

@Zheaoli
Copy link
Contributor

Zheaoli commented Jun 20, 2025

Would you mind adding a benchmark for this commit by comment?

@sergey-miryanov
Copy link
Contributor Author

@sobolevn Thanks for the review! I will do soon. However, as with all simple solutions, this one doesn't seem to be the right one. I'm digging further.

@Zheaoli Yes, I'm planning to do perf benchmarks.

@ZeroIntensity
Copy link
Member

However, as with all simple solutions, this one doesn't seem to be the right one. I'm digging further.

Yeah, as I said, this seems to be a problem with the GIL-ful garbage collector, not the object model. Something like this would also cause problems on the free-threaded build, but FT is working perfectly.

@sergey-miryanov
Copy link
Contributor Author

@ZeroIntensity The root cause of this error is the absence of subclasses for BaseNode class. Therefore, the STORE_ATTR (->PyObject_SetAttr->tp_setattro->type_update_dict->type_modified_unlocked) can't reset the cache for subclasses. As a result, the Node type cache contains obsolete entries.

I have the following code, and I believe the problem is common to both:

class XXX:
    def __init__(self):
        self.x = weakref.ref(self)
        self.z = self

    def __del__(self):
        assert self.x() == self, (self.x(), self.z, self)

XXX()
XXX()

➜ .\python.bat -X faulthandler x.py 
Exception ignored while calling deallocator <function XXX.__del__ at 0x000001E0085F5010>:
Traceback (most recent call last):
  File "D:\Sources\_pythonish\cpython\main\x.py", line 29, in __del__
    assert self.x() == self, (self.x(), self.z, self)
AssertionError: (None, <__main__.XXX object at 0x000001E00838B760>, <__main__.XXX object at 0x000001E00838B760>)
Exception ignored while calling deallocator <function XXX.__del__ at 0x000001E0085F5010>:
Traceback (most recent call last):
  File "D:\Sources\_pythonish\cpython\main\x.py", line 29, in __del__
    assert self.x() == self, (self.x(), self.z, self)
AssertionError: (None, <__main__.XXX object at 0x000001E00838B8C0>, <__main__.XXX object at 0x000001E00838B8C0>)

As you can see, the weakref is None, but self has not been destroyed yet. I believe the garbage collector has a mechanism to handle weakrefs, but it runs too early, before _Py_Finalize (where tp_finalize is called).

@ZeroIntensity
Copy link
Member

The root cause of this error is the absence of subclasses for BaseNode class. Therefore, the STORE_ATTR (->PyObject_SetAttr->tp_setattro->type_update_dict->type_modified_unlocked) can't reset the cache for subclasses. As a result, the Node type cache contains obsolete entries.

Ah, and there's no type cache under free-threading. Interesting!

@sergey-miryanov
Copy link
Contributor Author

This fails on gc.collect and is not tied to interpreter finalization.

import gc

def x():

    class BaseNode:
        next = None


    class Node(BaseNode):
        def __init__(self) -> None:
            if BaseNode.next is None:
                BaseNode.next = self
            else:
                BaseNode.next.next = self

        def __del__(self) -> None:
            print(BaseNode.__subclasses__())
            BaseNode.next = BaseNode.next.next


    Node()
    Node()

x()
gc.collect()

➜ .\python.bat -X faulthandler x.py 
Running Debug|x64 interpreter...
[]
[]
Windows fatal exception: access violation

Current thread 0x000127b0 (most recent call first):
  Garbage-collecting
  File "D:\Sources\_pythonish\cpython\main\x.py", line 18 in __del__
  File "D:\Sources\_pythonish\cpython\main\x.py", line 25 in <module>

Current thread's C stack trace (most recent call first):
  <cannot get C stack on this system>

The call to gc.collect without arguments do a full collection and handle_weakrefs is called in this case. handle_weakrefs clears weakrefs and calls any callbacks if they exist. And moves some refs to the reachable list.

@efimov-mikhail
Copy link
Contributor

In this variant I have crash even with proposed fix:

import gc

def x():
    class BaseNode:
        next = None

    class Node(BaseNode):
        def __init__(self) -> None:
            if BaseNode.next is None:
                BaseNode.next = self
            else:
                BaseNode.next.next = self

        def __del__(self) -> None:
            BaseNode.next.next
            BaseNode.next = self
            BaseNode.next.next

    Node()
    Node()

x()
print("Before GC")
gc.collect()
print("No segfault")

@sergey-miryanov
Copy link
Contributor Author

@efimov-mikhail Yeah, thanks! This is great example that proposed solution is wrong.

@python-cla-bot
Copy link

python-cla-bot bot commented Jun 22, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@sergey-miryanov
Copy link
Contributor Author

sergey-miryanov commented Jun 23, 2025

On my local machine I have following results:

Geometric mean - 1.01x slower
All benchmarks:
===============

+--------------------------+----------+------------------------+
| Benchmark                | ref      | fix                    |
+==========================+==========+========================+
| async_generators         | 421 ms   | 427 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| asyncio_tcp              | 769 ms   | 741 ms: 1.04x faster   |
+--------------------------+----------+------------------------+
| asyncio_tcp_ssl          | 1.92 sec | 1.88 sec: 1.02x faster |
+--------------------------+----------+------------------------+
| chaos                    | 79.7 ms  | 81.4 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| coverage                 | 86.3 ms  | 85.0 ms: 1.02x faster  |
+--------------------------+----------+------------------------+
| deepcopy                 | 317 us   | 323 us: 1.02x slower   |
+--------------------------+----------+------------------------+
| deepcopy_reduce          | 3.34 us  | 3.42 us: 1.02x slower  |
+--------------------------+----------+------------------------+
| deepcopy_memo            | 36.3 us  | 36.7 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| deltablue                | 4.71 ms  | 4.81 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| django_template          | 44.7 ms  | 45.8 ms: 1.03x slower  |
+--------------------------+----------+------------------------+
| docutils                 | 2.46 sec | 2.48 sec: 1.01x slower |
+--------------------------+----------+------------------------+
| fannkuch                 | 429 ms   | 435 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| genshi_text              | 28.5 ms  | 29.9 ms: 1.05x slower  |
+--------------------------+----------+------------------------+
| genshi_xml               | 61.8 ms  | 63.7 ms: 1.03x slower  |
+--------------------------+----------+------------------------+
| logging_simple           | 12.1 us  | 12.2 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| meteor_contest           | 101 ms   | 103 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| nqueens                  | 103 ms   | 106 ms: 1.03x slower   |
+--------------------------+----------+------------------------+
| pickle                   | 11.3 us  | 11.2 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| pickle_dict              | 25.8 us  | 25.3 us: 1.02x faster  |
+--------------------------+----------+------------------------+
| pickle_pure_python       | 411 us   | 415 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| pprint_safe_repr         | 1.03 sec | 1.03 sec: 1.00x slower |
+--------------------------+----------+------------------------+
| pprint_pformat           | 2.09 sec | 2.11 sec: 1.01x slower |
+--------------------------+----------+------------------------+
| pyflate                  | 486 ms   | 489 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| regex_compile            | 144 ms   | 145 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| regex_effbot             | 2.00 ms  | 2.02 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| richards                 | 59.1 ms  | 60.1 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| richards_super           | 67.5 ms  | 69.1 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| scimark_fft              | 312 ms   | 316 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| scimark_sor              | 148 ms   | 149 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| spectral_norm            | 115 ms   | 116 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| sqlglot_normalize        | 129 ms   | 132 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| sqlglot_optimize         | 60.0 ms  | 61.0 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| sqlglot_parse            | 1.53 ms  | 1.54 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| sqlite_synth             | 2.33 us  | 2.38 us: 1.02x slower  |
+--------------------------+----------+------------------------+
| sympy_expand             | 495 ms   | 507 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| sympy_integrate          | 19.0 ms  | 19.7 ms: 1.04x slower  |
+--------------------------+----------+------------------------+
| sympy_sum                | 143 ms   | 145 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| sympy_str                | 285 ms   | 289 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| telco                    | 8.19 ms  | 8.11 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| typing_runtime_protocols | 184 us   | 186 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| unpickle                 | 14.3 us  | 14.1 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| unpickle_pure_python     | 302 us   | 306 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| xml_etree_parse          | 137 ms   | 133 ms: 1.03x faster   |
+--------------------------+----------+------------------------+
| xml_etree_iterparse      | 108 ms   | 106 ms: 1.02x faster   |
+--------------------------+----------+------------------------+
| xml_etree_generate       | 107 ms   | 109 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| xml_etree_process        | 76.4 ms  | 77.7 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| Geometric mean           | (ref)    | 1.01x slower           |
+--------------------------+----------+------------------------+

Benchmark hidden because not significant (37): 2to3, comprehensions, bench_mp_pool, bench_thread_pool, coroutines, crypto_pyaes, dask, dulwich_log, float, create_gc_cycles, gc_traversal, generators, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_silent, mako, mdp, nbody, pathlib, pickle_list, pidigits, python_startup, python_startup_no_site, raytrace, regex_dna, regex_v8, scimark_lu, scimark_monte_carlo, scimark_sparse_mat_mult, sqlglot_transpile, tomli_loads, unpack_sequence, unpickle_list

CPU

CPU

11th Gen Intel(R) Core(TM) i5-11600K @ 3.90GHz

Base speed:	3.91 GHz
Sockets:	1
Cores:	6
Logical processors:	12
Virtualization:	Enabled
L1 cache:	480 KB
L2 cache:	3.0 MB
L3 cache:	12.0 MB

Utilization	3%
Speed	2.38 GHz
Up time	11:08:47:59
Processes	470
Threads	7904
Handles	253279

@sergey-miryanov
Copy link
Contributor Author

@Zheaoli @ZeroIntensity @sobolevn @Eclips4 @efimov-mikhail @fxeqxmulfx Please take a look, it is ready to review.

@sergey-miryanov sergey-miryanov changed the title gh-135552: Reset type cache after finalization to prevent possible segfault while garbage collecting gh-135552: Clear weakrefs to types in GC after garbage finalization Jun 23, 2025
@sergey-miryanov sergey-miryanov changed the title gh-135552: Clear weakrefs to types in GC after garbage finalization gh-135552: Clear weakrefs to types in GC after garbage finalization not before Jun 23, 2025
…e-135552.eG9vzP.rst

Co-authored-by: sobolevn <mail@sobolevn.me>
…e-135552.eG9vzP.rst

Co-authored-by: Mikhail Efimov <efimov.mikhail@gmail.com>
@sergey-miryanov
Copy link
Contributor Author

Updated results:

Geometric mean - 1.00x faster
All benchmarks:
===============

+--------------------------+----------+------------------------+
| Benchmark                | ref      | fix2                   |
+==========================+==========+========================+
| 2to3                     | 341 ms   | 345 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| asyncio_tcp              | 769 ms   | 745 ms: 1.03x faster   |
+--------------------------+----------+------------------------+
| asyncio_tcp_ssl          | 1.92 sec | 1.89 sec: 1.01x faster |
+--------------------------+----------+------------------------+
| chaos                    | 79.7 ms  | 79.0 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| coroutines               | 30.5 ms  | 31.0 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| dulwich_log              | 85.9 ms  | 85.4 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| fannkuch                 | 429 ms   | 434 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| gc_traversal             | 2.91 ms  | 2.93 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| generators               | 42.8 ms  | 43.1 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| mdp                      | 1.36 sec | 1.37 sec: 1.00x slower |
+--------------------------+----------+------------------------+
| meteor_contest           | 101 ms   | 100 ms: 1.00x faster   |
+--------------------------+----------+------------------------+
| pickle                   | 11.3 us  | 11.2 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| pickle_dict              | 25.8 us  | 25.4 us: 1.02x faster  |
+--------------------------+----------+------------------------+
| pprint_pformat           | 2.09 sec | 2.10 sec: 1.00x slower |
+--------------------------+----------+------------------------+
| scimark_fft              | 312 ms   | 314 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| scimark_lu               | 128 ms   | 129 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| scimark_sor              | 148 ms   | 149 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| spectral_norm            | 115 ms   | 117 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| sqlglot_parse            | 1.53 ms  | 1.54 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| sqlite_synth             | 2.33 us  | 2.35 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| typing_runtime_protocols | 184 us   | 185 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| unpickle                 | 14.3 us  | 14.2 us: 1.01x faster  |
+--------------------------+----------+------------------------+
| unpickle_list            | 4.09 us  | 4.13 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| unpickle_pure_python     | 302 us   | 300 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| xml_etree_parse          | 137 ms   | 133 ms: 1.03x faster   |
+--------------------------+----------+------------------------+
| xml_etree_iterparse      | 108 ms   | 107 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| xml_etree_process        | 76.4 ms  | 77.4 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| Geometric mean           | (ref)    | 1.00x faster           |
+--------------------------+----------+------------------------+

Benchmark hidden because not significant (56): async_generators, comprehensions, bench_mp_pool, bench_thread_pool, coverage, crypto_pyaes, dask, deepcopy, deepcopy_reduce, deepcopy_memo, deltablue, django_template, docutils, float, create_gc_cycles, genshi_text, genshi_xml, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_silent, logging_simple, mako, nbody, nqueens, pathlib, pickle_list, pickle_pure_python, pidigits, pprint_safe_repr, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, regex_dna, regex_effbot, regex_v8, richards, richards_super, scimark_monte_carlo, scimark_sparse_mat_mult, sqlglot_normalize, sqlglot_optimize, sqlglot_transpile, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tomli_loads, unpack_sequence, xml_etree_generate

@sobolevn sobolevn requested a review from nascheme June 24, 2025 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants