-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Concurrency] Fix a race when using cancelAll on a task group concurrently with child tasks being removed. #80096
[Concurrency] Fix a race when using cancelAll on a task group concurrently with child tasks being removed. #80096
Conversation
…ently with child tasks being removed. _swift_taskGroup_cancelAllChildren relies on there being no concurrent modification when called from the owning task, but this is not guaranteed. Rearrange things to always take the owning task's status record lock when walking the group's children. Split _swift_taskGroup_cancelAllChildren into two functions, one which assumes/requires the lock is already held, and one which acquires the lock. We don't have the owning task in this case, but we can either get it from the current task, or by looking at the parent of the child task we're working on. rdar://147172991
@swift-ci please test |
withStatusRecordLock(owningTask, [&group](ActiveTaskStatus status) { | ||
_swift_taskGroup_cancelAllChildren(group); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we use a different naming scheme with the status records in TaskPrivate.h
so the locked ones are without suffix and the ones "assume to call with a lock" are Locked, but I guess _unlocked
is fine, since we want to keep cancelAllChildren
I guess.
Sounds okey, just got me thinking about naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit of a tangent, but the ObjC runtime has a crazy scheme where debug builds wrap all locks with a debug lock type which tracks which locks are owned by the current thread, and then functions can easily assert that a given lock is held or not held when it expects. That sort of thing could be helpful here. (Of course, you still have to realize that you require that lock to be held, and write the assert.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that, we could sprinkle them over some methods which access records etc. Could be something nice to follow up on
asBaseImpl(group)->cancelAll(); | ||
// TaskGroup is not a Sendable type, so this can only be called from the | ||
// owning task. | ||
asBaseImpl(group)->cancelAll(swift_task_getCurrent()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm yeah that's true today. It's fine to make this explicit here this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is repeated in a few places where we make this assumption, so I figured I'd put it here too.
@@ -176,15 +181,18 @@ class TaskGroupTaskStatusRecord : public TaskStatusRecord { | |||
|
|||
void detachChild(AsyncTask *child) { | |||
assert(child && "cannot remove a null child from group"); | |||
if (FirstChild == child) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that would have been our offending not locked access AFAICS? (well and the other one just above in attach perhaps).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is OK, the task status record lock is held when calling this. The non-locked access was:
static AsyncTask *getNextChildTask(AsyncTask *task) {
return task->childFragment()->getNextChild();
}
Which gets called from cancelAll
. We can run this, get the pointer to the next child, then another thread can detachChild
and then free the task. If that happens before this thread uses the child pointer, it will end up with a dangling pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the details!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a surprise we accessed the status but I guess it's the attach/detach that were the offenders here huh? Thanks for the fix Mike, LGTM
_swift_taskGroup_cancelAllChildren relies on there being no concurrent modification when called from the owning task, but this is not guaranteed.
Rearrange things to always take the owning task's status record lock when walking the group's children. Split _swift_taskGroup_cancelAllChildren into two functions, one which assumes/requires the lock is already held, and one which acquires the lock. We don't have the owning task in this case, but we can either get it from the current task, or by looking at the parent of the child task we're working on.
rdar://147172991