Skip to content

Crash due to racy read in dictobject do_lookup under free threading #132869

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hawkinsp opened this issue Apr 24, 2025 · 1 comment
Closed

Crash due to racy read in dictobject do_lookup under free threading #132869

hawkinsp opened this issue Apr 24, 2025 · 1 comment
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@hawkinsp
Copy link
Contributor

hawkinsp commented Apr 24, 2025

Crash report

What happened?

Repro:

import concurrent.futures
import functools
import threading
import typing

def closure(b, c):
  b.wait()
  for i in range(10):
    getattr(c, str(i), None)
    setattr(c, str(i), 99)

with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
  for _ in range(100):
    class MyClass:
      pass
    b = threading.Barrier(32)
    o = MyClass()
    for j in range(32):
      executor.submit(functools.partial(closure, b, o))

with an interpreter built with free threading and --with-pydebug this will eventually crash with:

python: Objects/dictobject.c:1139: int compare_unicode_unicode(PyDictObject *, PyDictKeysObject *, void *, Py_ssize_t, PyObject *, Py_hash_t): Assertion `ep_key != NULL' failed.

From a JAX CI job that exhibited the problem, I have this stack trace:

    #5 __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) tsan_interceptors_posix.cpp.o (python3.13+0xe5db5) (BuildId: 9b1024b81603c5cc913dc4b031a6f189f128dafb)
    #6 _PyDictKeys_StringLookup /usr/local/google/home/phawkins/p/cpython/Objects/dictobject.c:1209:12 (python3.13+0x2f4fc5) (BuildId: 9b1024b81603c5cc913dc4b031a6f189f128dafb)
    #7 _PyObject_TryGetInstanceAttribute /usr/local/google/home/phawkins/p/cpython/Objects/dictobject.c:6975:21 (python3.13+0x2f4fc5)
    #8 _PyObject_GenericGetAttrWithDict /usr/local/google/home/phawkins/p/cpython/Objects/object.c:1676:17 (python3.13+0x320157) (BuildId: 9b1024b81603c5cc913dc4b031a6f189f128dafb)
    #9 PyObject_GenericGetAttr /usr/local/google/home/phawkins/p/cpython/Objects/object.c:1751:12 (python3.13+0x31ff14) (BuildId: 9b1024b81603c5cc913dc4b031a6f189f128dafb)
    #10 PyObject_GetAttr /usr/local/google/home/phawkins/p/cpython/Objects/object.c:1261:18 (python3.13+0x31f44a) (BuildId: 9b1024b81603c5cc913dc4b031a6f189f128dafb)
    #11 _PyEval_EvalFrameDefault /usr/local/google/home/phawkins/p/cpython/Python/generated_cases.c.h:3770:28 (python3.13+0x4c00a0) (BuildId: 9b1024b81603c5cc913dc4b031a6f189f128dafb)
    #12 _PyEval_EvalFrame /usr/local/google/home/phawkins/p/cpython/./Include/internal/pycore_ceval.h:119:16 (python3.13+0x4a8283) (BuildId: 9b1024b81603c5cc913dc4b031a6f189f128dafb)
    #13 _PyEval_Vector /usr/local/google/home/phawkins/p/cpython/Python/ceval.c:1816:12 (python3.13+0x4a8283)
    #14 _PyFunction_Vectorcall /usr/local/google/home/phawkins/p/cpython/Objects/call.c (python3.13+0x2595dc) (BuildId: 9b1024b81603c5cc913dc4b031a6f189f128dafb)
    #15 _PyObject_VectorcallDictTstate /usr/local/google/home/phawkins/p/cpython/Objects/call.c:146:15 (python3.13+0x257864) (BuildId: 9b1024b81603c5cc913dc4b031a6f189f128dafb)
    #16 _PyObject_Call_Prepend /usr/local/google/home/phawkins/p/cpython/Objects/call.c:504:24 (python3.13+0x259a42) (BuildId: 9b1024b81603c5cc913dc4b031a6f189f128dafb)

I think what is happening here is that do_lookup does an unlocked read of the index:

ix = dictkeys_get_index(dk, i);

and then the key:
PyObject *ep_key = FT_ATOMIC_LOAD_PTR_RELAXED(ep->me_key);

but a writer may set the index of a fresh entry before populating the key. For example, this writer would do that:

dictkeys_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);

(I don't know which writer is involved here.)

This is from the 3.13 branch at commit 341b86e .

CPython versions tested on:

3.13

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.13.3+ experimental free-threading build (heads/3.13:341b86e095e, Apr 23 2025, 21:51:53) [Clang 18.1.8 (16+build1)]

Linked PRs

@hawkinsp hawkinsp added the type-crash A hard crash of the interpreter, possibly with a core dump label Apr 24, 2025
@picnixz picnixz added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading labels Apr 24, 2025
hawkinsp added a commit to hawkinsp/cpython that referenced this issue May 7, 2025
…ct under free threading.

Currently the reads and writes to a dictionary entry and to its contents are
unordered. For example in `do_lookup` we have the following code:
```
    for (;;) {
        ix = dictkeys_get_index(dk, i);
        if (ix >= 0) {
            int cmp = check_lookup(mp, dk, ep0, ix, key, hash);
```
where `dictkeys_get_index` performs a relaxed atomic read of `dk_indices[i]`,
where the `check_lookup` function might be, say, compare_unicode_unicode which
makes a relaxed load of the me_key value in that index.
```
    PyDictUnicodeEntry *ep = &((PyDictUnicodeEntry *)ep0)[ix];
    PyObject *ep_key = FT_ATOMIC_LOAD_PTR_RELAXED(ep->me_key);
    assert(ep_key != NULL);
```

However, the writer also does not order these two writes appropriately; for
example `insert_combined_dict` does the following:

```
    Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash);
    dictkeys_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);

    if (DK_IS_UNICODE(mp->ma_keys)) {
        PyDictUnicodeEntry *ep;
        ep = &DK_UNICODE_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
        STORE_KEY(ep, key);
        STORE_VALUE(ep, value);
    }
    else {
        PyDictKeyEntry *ep;
        ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
        STORE_KEY(ep, key);
        STORE_VALUE(ep, value);
        STORE_HASH(ep, hash);
    }
    mp->ma_version_tag = new_version;
```

where the `dk_indices` value is set first, followed by setting the `me_key`,
both as relaxed writes. This is problematic because the write to
`dk_indices` may be ordered first, either by the program order on x86, or
allowed by the relaxed memory ordering semantics of ARM. The reader above
will be able to observe a state where the index has been set but the key value
is still null, leading to a crash in the reproducer of python#132869.

The fix is two-fold:
* order the index write after the the write to its contents, and
* use sequentially consistent reads and writes. It would suffice to use
  load-acquire and store-release here but those atomic operations do not exist
  in the CPython atomic headers at the necessary types.

I was only able to reproduce the crash under CPython 3.13, but I do not
see any reason the bug is fixed on the 3.14 branch either since the code
does not seem to have changed.

Fixes python#132869
hawkinsp added a commit to hawkinsp/cpython that referenced this issue May 7, 2025
…ct under free threading.

Currently the reads and writes to a dictionary entry and to its contents are
unordered. For example in `do_lookup` we have the following code:
```
    for (;;) {
        ix = dictkeys_get_index(dk, i);
        if (ix >= 0) {
            int cmp = check_lookup(mp, dk, ep0, ix, key, hash);
```
where `dictkeys_get_index` performs a relaxed atomic read of `dk_indices[i]`,
where the `check_lookup` function might be, say, compare_unicode_unicode which
makes a relaxed load of the me_key value in that index.
```
    PyDictUnicodeEntry *ep = &((PyDictUnicodeEntry *)ep0)[ix];
    PyObject *ep_key = FT_ATOMIC_LOAD_PTR_RELAXED(ep->me_key);
    assert(ep_key != NULL);
```

However, the writer also does not order these two writes appropriately; for
example `insert_combined_dict` does the following:

```
    Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash);
    dictkeys_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);

    if (DK_IS_UNICODE(mp->ma_keys)) {
        PyDictUnicodeEntry *ep;
        ep = &DK_UNICODE_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
        STORE_KEY(ep, key);
        STORE_VALUE(ep, value);
    }
    else {
        PyDictKeyEntry *ep;
        ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
        STORE_KEY(ep, key);
        STORE_VALUE(ep, value);
        STORE_HASH(ep, hash);
    }
    mp->ma_version_tag = new_version;
```

where the `dk_indices` value is set first, followed by setting the `me_key`,
both as relaxed writes. This is problematic because the write to
`dk_indices` may be ordered first, either by the program order on x86, or
allowed by the relaxed memory ordering semantics of ARM. The reader above
will be able to observe a state where the index has been set but the key value
is still null, leading to a crash in the reproducer of python#132869.

The fix is two-fold:
* order the index write after the the write to its contents, and
* use sequentially consistent reads and writes. It would suffice to use
  load-acquire and store-release here but those atomic operations do not exist
  in the CPython atomic headers at the necessary types.

I was only able to reproduce the crash under CPython 3.13, but I do not
see any reason the bug is fixed on the 3.14 branch either since the code
does not seem to have changed.

Fixes python#132869
hawkinsp added a commit to hawkinsp/cpython that referenced this issue May 8, 2025
…ct under free threading.

Currently the reads and writes to a dictionary entry and to its contents are
unordered. For example in `do_lookup` we have the following code:
```
    for (;;) {
        ix = dictkeys_get_index(dk, i);
        if (ix >= 0) {
            int cmp = check_lookup(mp, dk, ep0, ix, key, hash);
```
where `dictkeys_get_index` performs a relaxed atomic read of `dk_indices[i]`,
where the `check_lookup` function might be, say, compare_unicode_unicode which
makes a relaxed load of the me_key value in that index.
```
    PyDictUnicodeEntry *ep = &((PyDictUnicodeEntry *)ep0)[ix];
    PyObject *ep_key = FT_ATOMIC_LOAD_PTR_RELAXED(ep->me_key);
    assert(ep_key != NULL);
```

However, the writer also does not order these two writes appropriately; for
example `insert_combined_dict` does the following:

```
    Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash);
    dictkeys_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);

    if (DK_IS_UNICODE(mp->ma_keys)) {
        PyDictUnicodeEntry *ep;
        ep = &DK_UNICODE_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
        STORE_KEY(ep, key);
        STORE_VALUE(ep, value);
    }
    else {
        PyDictKeyEntry *ep;
        ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
        STORE_KEY(ep, key);
        STORE_VALUE(ep, value);
        STORE_HASH(ep, hash);
    }
    mp->ma_version_tag = new_version;
```

where the `dk_indices` value is set first, followed by setting the `me_key`,
both as relaxed writes. This is problematic because the write to
`dk_indices` may be ordered first, either by the program order on x86, or
allowed by the relaxed memory ordering semantics of ARM. The reader above
will be able to observe a state where the index has been set but the key value
is still null, leading to a crash in the reproducer of python#132869.

The fix is two-fold:
* order the index write after the the write to its contents, and
* use sequentially consistent reads and writes. It would suffice to use
  load-acquire and store-release here but those atomic operations do not exist
  in the CPython atomic headers at the necessary types.

I was only able to reproduce the crash under CPython 3.13, but I do not
see any reason the bug is fixed on the 3.14 branch either since the code
does not seem to have changed.

Fixes python#132869
@colesbury colesbury added the 3.13 bugs and security fixes label May 8, 2025
colesbury added a commit to colesbury/cpython that referenced this issue May 8, 2025
This fixes a crash in `_PyObject_TryGetInstanceAttribute` due to the use
of `_PyDictKeys_StringLookup` on an unlocked dictionary that may be
concurrently modified.

The underlying bug was already fixed in 3.14 and the main branch.

(partially cherry picked from commit 1b15c89)
methane pushed a commit that referenced this issue May 14, 2025
This fixes a crash in `_PyObject_TryGetInstanceAttribute` due to the use
of `_PyDictKeys_StringLookup` on an unlocked dictionary that may be
concurrently modified.

The underlying bug was already fixed in 3.14 and the main branch.

(partially cherry picked from commit 1b15c89)
@kumaraditya303
Copy link
Contributor

Fixed by #133700

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants