Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Concurrency] Fix a race when using cancelAll on a task group concurrently with child tasks being removed. #80096

Merged
merged 1 commit into from
Mar 20, 2025

Conversation

mikeash
Copy link
Contributor

@mikeash mikeash commented Mar 18, 2025

_swift_taskGroup_cancelAllChildren relies on there being no concurrent modification when called from the owning task, but this is not guaranteed.

Rearrange things to always take the owning task's status record lock when walking the group's children. Split _swift_taskGroup_cancelAllChildren into two functions, one which assumes/requires the lock is already held, and one which acquires the lock. We don't have the owning task in this case, but we can either get it from the current task, or by looking at the parent of the child task we're working on.

rdar://147172991

…ently with child tasks being removed.

_swift_taskGroup_cancelAllChildren relies on there being no concurrent modification when called from the owning task, but this is not guaranteed.

Rearrange things to always take the owning task's status record lock when walking the group's children. Split _swift_taskGroup_cancelAllChildren into two functions, one which assumes/requires the lock is already held, and one which acquires the lock. We don't have the owning task in this case, but we can either get it from the current task, or by looking at the parent of the child task we're working on.

rdar://147172991
@mikeash mikeash requested a review from ktoso as a code owner March 18, 2025 18:52
@mikeash
Copy link
Contributor Author

mikeash commented Mar 18, 2025

@swift-ci please test

withStatusRecordLock(owningTask, [&group](ActiveTaskStatus status) {
_swift_taskGroup_cancelAllChildren(group);
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we use a different naming scheme with the status records in TaskPrivate.h so the locked ones are without suffix and the ones "assume to call with a lock" are Locked, but I guess _unlocked is fine, since we want to keep cancelAllChildren I guess.

Sounds okey, just got me thinking about naming

Copy link
Contributor Author

@mikeash mikeash Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a tangent, but the ObjC runtime has a crazy scheme where debug builds wrap all locks with a debug lock type which tracks which locks are owned by the current thread, and then functions can easily assert that a given lock is held or not held when it expects. That sort of thing could be helpful here. (Of course, you still have to realize that you require that lock to be held, and write the assert.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that, we could sprinkle them over some methods which access records etc. Could be something nice to follow up on

asBaseImpl(group)->cancelAll();
// TaskGroup is not a Sendable type, so this can only be called from the
// owning task.
asBaseImpl(group)->cancelAll(swift_task_getCurrent());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah that's true today. It's fine to make this explicit here this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is repeated in a few places where we make this assumption, so I figured I'd put it here too.

@@ -176,15 +181,18 @@ class TaskGroupTaskStatusRecord : public TaskStatusRecord {

void detachChild(AsyncTask *child) {
assert(child && "cannot remove a null child from group");
if (FirstChild == child) {
Copy link
Contributor

@ktoso ktoso Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that would have been our offending not locked access AFAICS? (well and the other one just above in attach perhaps).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is OK, the task status record lock is held when calling this. The non-locked access was:

  static AsyncTask *getNextChildTask(AsyncTask *task) {
    return task->childFragment()->getNextChild();
  }

Which gets called from cancelAll. We can run this, get the pointer to the next child, then another thread can detachChild and then free the task. If that happens before this thread uses the child pointer, it will end up with a dangling pointer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the details!

Copy link
Contributor

@ktoso ktoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a surprise we accessed the status but I guess it's the attach/detach that were the offenders here huh? Thanks for the fix Mike, LGTM

@mikeash mikeash merged commit 1e265c2 into swiftlang:main Mar 20, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants