-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-135552: Clear weakrefs to types in GC after garbage finalization not before #135728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
gh-135552: Clear weakrefs to types in GC after garbage finalization not before #135728
Conversation
@ZeroIntensity @sobolevn @Eclips4 @efimov-mikhail @fxeqxmulfx Please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add a test case and a NEWS entry.
I think that there's a minimal repro available in the issue.
The solution seems reasonable!
Would you mind adding a benchmark for this commit by comment? |
Yeah, as I said, this seems to be a problem with the GIL-ful garbage collector, not the object model. Something like this would also cause problems on the free-threaded build, but FT is working perfectly. |
@ZeroIntensity The root cause of this error is the absence of subclasses for BaseNode class. Therefore, the I have the following code, and I believe the problem is common to both:
As you can see, the weakref is None, but self has not been destroyed yet. I believe the garbage collector has a mechanism to handle weakrefs, but it runs too early, before |
Ah, and there's no type cache under free-threading. Interesting! |
This fails on
The call to |
In this variant I have crash even with proposed fix:
|
@efimov-mikhail Yeah, thanks! This is great example that proposed solution is wrong. |
On my local machine I have following results: Geometric mean - 1.01x slower
Benchmark hidden because not significant (37): 2to3, comprehensions, bench_mp_pool, bench_thread_pool, coroutines, crypto_pyaes, dask, dulwich_log, float, create_gc_cycles, gc_traversal, generators, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_silent, mako, mdp, nbody, pathlib, pickle_list, pidigits, python_startup, python_startup_no_site, raytrace, regex_dna, regex_v8, scimark_lu, scimark_monte_carlo, scimark_sparse_mat_mult, sqlglot_transpile, tomli_loads, unpack_sequence, unpickle_list CPUCPU
|
@Zheaoli @ZeroIntensity @sobolevn @Eclips4 @efimov-mikhail @fxeqxmulfx Please take a look, it is ready to review. |
Misc/NEWS.d/next/Core_and_Builtins/2025-06-23-01-07-09.gh-issue-135552.eG9vzP.rst
Outdated
Show resolved
Hide resolved
…e-135552.eG9vzP.rst Co-authored-by: sobolevn <mail@sobolevn.me>
Misc/NEWS.d/next/Core_and_Builtins/2025-06-23-01-07-09.gh-issue-135552.eG9vzP.rst
Outdated
Show resolved
Hide resolved
…e-135552.eG9vzP.rst Co-authored-by: Mikhail Efimov <efimov.mikhail@gmail.com>
…iryanov/cpython into pythongh-135552-fix-gc-segfault
Updated results: Geometric mean - 1.00x faster
Benchmark hidden because not significant (56): async_generators, comprehensions, bench_mp_pool, bench_thread_pool, coverage, crypto_pyaes, dask, deepcopy, deepcopy_reduce, deepcopy_memo, deltablue, django_template, docutils, float, create_gc_cycles, genshi_text, genshi_xml, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_silent, logging_simple, mako, nbody, nqueens, pathlib, pickle_list, pickle_pure_python, pidigits, pprint_safe_repr, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, regex_dna, regex_effbot, regex_v8, richards, richards_super, scimark_monte_carlo, scimark_sparse_mat_mult, sqlglot_normalize, sqlglot_optimize, sqlglot_transpile, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tomli_loads, unpack_sequence, xml_etree_generate |
While trying to find the root cause of the linked issue, I observed that the type cache was holding an obsolete reference. Because the type cache holds borrowed references, this leads to a use-after-free error.
I'm resetting the cache in this PR. But I suspect that this may affect overall performance.