-
-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-128942: make arraymodule.c free-thread safe (lock-free) #130771
base: main
Are you sure you want to change the base?
Conversation
ping @colesbury |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Disclaimer: I'm not an expert on the FT list implementation, so take some of my comments with a grain of salt.
Seeing good single-threaded performance is nice, but what about multi-threaded scaling? The number of locks that are still here scare me a little--it would be nice if this scaled well for concurrent use as well, especially for operations that don't require concurrent writes (e.g., comparisons and copies).
Note, this is not ready to go, there is the memory issue which needs resolving. |
@ZeroIntensity you can remove the do-not-merge, its not an |
The main thing here for acceptance is a benchmark run which I am not able to start (I only did local pyperformance check against main), so someone with access will have to initiate that to compare with main. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't gotten a chance to look through arraymodule.c
yet. I'll review that later this week.
Include/internal/pycore_pymem.h
Outdated
@@ -117,7 +117,7 @@ extern wchar_t *_PyMem_DefaultRawWcsdup(const wchar_t *str); | |||
extern int _PyMem_DebugEnabled(void); | |||
|
|||
// Enqueue a pointer to be freed possibly after some delay. | |||
extern void _PyMem_FreeDelayed(void *ptr); | |||
PyAPI_FUNC(void) _PyMem_FreeDelayed(void *ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move arraymodule
from Setup.stdlib.in to Setup.bootstrap.in so that it's statically linked to CPython. That will:
- avoid the need to make this function visible externally
- Make the critical section API uses faster because
PyThreadState_GET()
will be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there possible repercussions to some weird software out there depending on array being an external module? Like some instrumentation package or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall approach here seems good. A few comments below.
@@ -1673,5 +1677,266 @@ def test_gh_128961(self): | |||
self.assertRaises(StopIteration, next, it) | |||
|
|||
|
|||
class FreeThreadingTest(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add test_array
to TSAN_TESTS
in Lib/test/libregrtest/tsan.py
. The tests should be able to pass with thread sanitizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thing is I don't think the test infrastructure used here and / or the other tests (global vars (probably more this)) are parallel safe yet (will look into it):
$ ./python -m test --parallel-threads=4 -j4 test_array
\Using random seed: 2880345231
0:00:00 load avg: 0.37 Run 1 test in parallel using 1 worker process
0:00:02 load avg: 0.37 [1/1/1] test_array failed (35 errors, 17 failures)
Warning -- Uncaught thread exception: IndexErrorWarning -- Uncaught thread exception: IndexError
Exception in thread test_cmp (test.test_array.ByteTest.test_cmp)-0:
Exception in thread test_cmp (test.test_array.ByteTest.test_cmp)-2:
Traceback (most recent call last):
Traceback (most recent call last):
File "/home/tom/work/cpython/arraylockfree/cp/Lib/threading.py", line 1054, in _bootstrap_inner
self.run()
~~~~~~~~^^
File "/home/tom/work/cpython/arraylockfree/cp/Lib/threading.py", line 996, in run
self._target(*self._args, **self._kwargs)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tom/work/cpython/arraylockfree/cp/Lib/threading.py", line 1054, in _bootstrap_inner
self.run()
~~~~~~~~^^
File "/home/tom/work/cpython/arraylockfree/cp/Lib/test/libregrtest/parallel_case.py", line 24, in run_worker
test_case.run(result)
~~~~~~~~~~~~~^^^^^^^^
File "/home/tom/work/cpython/arraylockfree/cp/Lib/unittest/case.py", line 664, in run
self.doCleanups()
~~~~~~~~~~~~~~~^^
File "/home/tom/work/cpython/arraylockfree/cp/Lib/threading.py", line 996, in run
self._target(*self._args, **self._kwargs)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tom/work/cpython/arraylockfree/cp/Lib/test/libregrtest/parallel_case.py", line 24, in run_worker
test_case.run(result)
~~~~~~~~~~~~~^^^^^^^^
File "/home/tom/work/cpython/arraylockfree/cp/Lib/unittest/case.py", line 664, in run
self.doCleanups()
~~~~~~~~~~~~~~~^^
File "/home/tom/work/cpython/arraylockfree/cp/Lib/unittest/case.py", line 695, in doCleanups
function, args, kwargs = self._cleanups.pop()
~~~~~~~~~~~~~~~~~~^^
File "/home/tom/work/cpython/arraylockfree/cp/Lib/unittest/case.py", line 695, in doCleanups
function, args, kwargs = self._cleanups.pop()
~~~~~~~~~~~~~~~~~~^^
IndexError: pop from empty list
IndexError: pop from empty list
Warning -- Uncaught thread exception: IndexError
Exception in thread test_count (test.test_array.ByteTest.test_count)-3:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I am not suggesting using --parallel-threads=N
. Just adding it TSAN_TESTS
so it's run with Python built with TSAN. To get the same effect locally, you would build Python with TSAN and run:
./python -m test --tsan -j0
Which runs the tests in the TSAN_TESTS
list. It doesn't run with --parallel-threads=N
.
The --parallel-threads=N
option is of limited use right now. It doesn't work with warnings (yet), and may not work with some cleanups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it did help me find two races so it is helpful. Still need to add extra individual tests for the standard tsan run.
@unittest.skipUnless(support.Py_GIL_DISABLED, 'this test can only possibly fail with GIL disabled') | ||
@threading_helper.reap_threads | ||
@threading_helper.requires_working_threading() | ||
def test_free_threading(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is not catching some basic data races. See https://gist.github.com/colesbury/358feeb3a9fc628f63f5d22a2f33916d for an example.
I think you'll want:
- More subscript accesses (reads or writes) concurrent with other operations, especially things that resize the array
- Do things in a short loop, not just once.
Modules/arraymodule.c
Outdated
@@ -130,9 +144,89 @@ enum machine_format_code { | |||
|
|||
#define array_Check(op, state) PyObject_TypeCheck(op, state->ArrayType) | |||
|
|||
#ifdef Py_GIL_DISABLED | |||
typedef struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably use the same structure for the free threading and default builds. We didn't do that in PyListObject because it was constrained by the public API.
What I mean is something like:
typedef struct arrayobject {
PyObject_VAR_HEAD
_PyArrayArray *data;
const struct arraydescr *ob_descr;
PyObject *weakreflist; /* List of weak references */
Py_ssize_t ob_exports; /* Number of exported buffers */
} arrayobject;
And accesses to ob_item
would look like arr->data.ob_item
and capacity as arr->data.capacity
. So no more need to mess with _Py_CONTAINER_OF()
. And allocated
is stored in only one place (in _PyArrayArray.allocated
).
_PyArrayArray
is a bit confusing of a name and there's already other functions with the array_array_
prefix in this file. Maybe PyArrayData
or PyArrayItems
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went for minimal reasonable changes, but if you want the full cleanup can certainly do that.
Modules/arraymodule.c
Outdated
@@ -231,37 +350,37 @@ b_setitem(arrayobject *ap, Py_ssize_t i, PyObject *v) | |||
return -1; | |||
} | |||
if (i >= 0) | |||
((char *)ap->ob_item)[i] = (char)x; | |||
((char *)ob_item)[i] = (char)x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these need to be FT_ATOMIC_STORE_CHAR_RELAXED
and similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not here as these can be called multiple times from a single locked function, better acquire / release (for set) in calling lock-free function?
The actual
Are there any other places where this needs to take place? Its the test and trying to run it with
Which is not Left the bad |
I'd like
Yes, |
Ok, I've been playing with the races and have been able to shut up the ones between reads and writes by using atomic operations in get/set, but not between reads/writes and resize without a mutex. Are you sure that tsan is supposed to be quiet here? After all, tsan detects when multiple threads access the same memory without synchronization, which is the whole point of lockfree operation, to allow multiple threads to access the same memory without synchronization. Which makes me think the tsan warnings should be expected? I'm sending up a version with atomic reads/writes in get and set so you can have a look, but I don't think they should be there. They don't slow anything down on Intel but could be bad on other architectures (ARM? not an expert). Adding links to three places in Thoughts? List behavior:
|
Are you using the TSan suppressions? |
Yes. And also no warnings if you run for fewer iterations. List tsan warnings:
To be clear, I am not saying this is an error in |
|
I don't think the memory fence is necessary either, but doesn't hurt. Will remove anyway. Point is, are you sure its a bug. Would you not expect tsan to complain with lock-free operation? |
Yes, it's a bug because it's a data race and data races are undefined behavior. More concretely, it's important that we write the list element with at least
You need to use atomic operations with the appropriate memory order. In some cases, Footnotes
|
Don't we want acquire ordering, not release? That's what I originally suggested before the impl switched to relaxed. |
Yes you were right about that one, but he's talking about writing the list element in |
Issue posted #130920. |
So if you insist on appeasing the tsan gods then as far as I can see its either this or locked Relevant timings:
Note: This instead of locked writes could have data integrity isssues. |
I added lock-free single element reads and writes by mostly copying the
list
object's homework. TL;DR: pyperformance scimark seems to be back to about what it was without the free-thread safe stuff (pending confirmation of course). Tried a few other things but the list strategy seems good enough (except for the negative index thing I mentioned in #130744, if that is an issue).Timings, the relevant ones are "OLD" - non free-thread safe arraymodule, "SLOW" - the previous slower PR and the last two "LFREERW".
array
module is not free-thread safe. #128942