Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: allow cleanups to run concurrently #71825

Closed
mknyszek opened this issue Feb 18, 2025 · 21 comments
Closed

runtime: allow cleanups to run concurrently #71825

mknyszek opened this issue Feb 18, 2025 · 21 comments
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal Proposal-Accepted
Milestone

Comments

@mknyszek
Copy link
Contributor

mknyszek commented Feb 18, 2025

Proposal Details

Currently the docs for runtime.AddCleanup state:

A single goroutine runs all cleanup calls for a program, sequentially.

But this was copied from runtime.SetFinalizer because it was convenient. It really deserved more thought.

Since runtime.AddCleanup just debuted in Go 1.24, and thus there probably aren't too many programs actually depending on this behavior, I propose we remove this sentence from the docs and intentionally run cleanups concurrently (on two goroutines to start with) to prevent dependence on this behavior.

We can always change the behavior back to sequential in a backwards-compatible way. We can still recommend that people shunt long-running work to a separate goroutine.

This proposal is partially motivated by #71772, in which sub-optimal usage of unique.Make may result in a rate of insertions into unique's internal map that exceeds the pace of deletions, resulting in unbounded memory growth. The current deletion strategy involves a goroutine that wastefully walks over the whole map. Using runtime.AddCleanup mitigates this problem by directly targeting dead entries in the map. However, this is only a mitigation, and not a full solution. Today, to fully resolve the issue, we will have to shunt work to a global work queue that's specific to the unique package using runtime.AddCleanup, and introduce a new backpressure mechanism in unique.Make. Even so, the cleanup/finalizer goroutine can still theoretically become a bottleneck at high GOMAXPROCS (think 128+). Worse still, anyone else using runtime.AddCleanup to build a canonicalization map or cache has a chance of running into the same problem, and must roll their own solution. This issue has been demonstrated in a small reproducer, where GOMAXPROCS doesn't have to be very high at all, though it hasn't yet been demonstrated in production.

Even if we don't have a concrete plan yet on how exactly to support such uses of runtime.AddCleanup, I think we should at least open the door to alternative cleanup strategies.

@gopherbot gopherbot added this to the Proposal milestone Feb 18, 2025
@gabyhelp
Copy link

Related Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@gabyhelp gabyhelp added the LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool label Feb 19, 2025
@aclements
Copy link
Member

Because this affects the 1.24 release that just happened, we're planning to fast-track this proposal. It doesn't seem too controversial, and the impact should be minimal because this API was only introduced in 1.24.

@aclements
Copy link
Member

@mknyszek , can you write exactly what that paragraph of the AddCleanup documentation would be after this change?

@mknyszek
Copy link
Contributor Author

Here's a draft diff:

-// A single goroutine runs all cleanup calls for a program, sequentially. If a
-// cleanup function must run for a long time, it should create a new goroutine.
+// Cleanups may run concurrently with one another, unlike finalizers. However, if
+// a cleanup function must run for a long time, it should create a new goroutine
+// to avoid blocking the execution of other cleanups.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/650697 mentions this issue: runtime: schedule cleanups across multiple goroutines

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/650696 mentions this issue: runtime: document that cleanups can run concurrently with each other

@aclements
Copy link
Member

@cherrymui notes that this text allows cleanups to run on a regular user goroutine (e.g., the one calling AddCleanup). It's tempting to allow this because it could be a useful back-pressure mechanism. However, if we were to run a cleanup on a regular user goroutine, it could cause deadlocks; for example, if a goroutine is holding lock L and calls AddCleanup, which then synchronously runs a cleanup that itself tries to acquire lock L. Thus, we should be specific that that the cleanups run concurrently with all user-created goroutines.

I think other than that detail, the proposal committee is good with this change.

@aclements aclements moved this to Incoming in Proposals Feb 19, 2025
@aclements
Copy link
Member

Based on the discussion above, this proposal seems like a likely accept.
— aclements for the proposal review group

The proposal is to remove the following sentences from the runtime.AddCleanup documentation:

// A single goroutine runs all cleanup calls for a program, sequentially. If a
// cleanup function must run for a long time, it should create a new goroutine.

and replace them with

// Cleanups run concurrently with any user-created goroutines.
// Cleanups may also run concurrently with one another (unlike finalizers).
// If a cleanup function must run for a long time, it should create a new goroutine
// to avoid blocking the execution of other cleanups.

@aclements aclements moved this from Incoming to Likely Accept in Proposals Feb 19, 2025
@mateusz834
Copy link
Member

Since runtime.AddCleanup just debuted in Go 1.24, and thus there probably aren't too many programs actually depending on this behavior, I propose we remove this sentence from the docs and intentionally run cleanups concurrently (on two goroutines to start with) to prevent dependence on this behavior.

Should there be a backport, at least for the change in the doc comment?

@mknyszek
Copy link
Contributor Author

@cherrymui notes that this text allows cleanups to run on a regular user goroutine (e.g., the one calling AddCleanup). It's tempting to allow this because it could be a useful back-pressure mechanism. However, if we were to run a cleanup on a regular user goroutine, it could cause deadlocks; for example, if a goroutine is holding lock L and calls AddCleanup, which then synchronously runs a cleanup that itself tries to acquire lock L.

That's a very good point. Requiring it to be totally concurrent SGTM.

@mknyszek
Copy link
Contributor Author

mknyszek commented Feb 19, 2025

Should there be a backport, at least for the change in the doc comment?

Yes, definitely. I'll backport the doc change at the very least. Actually running cleanups on more than 2 goroutines concurrently might be too complicated to backport, but just for the sake of preventing a dependency on the current behavior, it might be fairly simple to just occasionally shunt cleanup work off to another goroutine.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/651659 mentions this issue: [release-branch.go1.24] runtime: run every 4th cleanup on a separate goroutine

@mknyszek
Copy link
Contributor Author

@gopherbot Please open a backport issue for Go 1.24.

This API is new to Go 1.24. At present, we only plan to backport the documentation change, since https://go.dev/cl/651659 is a bit hacky and thus risky. I haven't been able yet to come up with a simpler way to do this. There may be something we can do with the race detector.

(This is all assuming there's isn't a last-minute reason why we shouldn't approve this proposal.)

@mknyszek
Copy link
Contributor Author

I'm going to work on a patch in Go 1.25 to start enforcing this in race detector by treating cleanup calls as executing in different race contexts, even if they're executed from the same goroutine, which will help identify any latent data races.

Hopefully this will be easy to backport as well.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #71955 (for 1.24).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/652636 mentions this issue: [release-branch.go1.24] runtime: help the race detector detect possible concurrent cleanups

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/652637 mentions this issue: [release-branch.go1.24] runtime: document that cleanups can run concurrently with each other

@aclements
Copy link
Member

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— aclements for the proposal review group

The proposal is to remove the following sentences from the runtime.AddCleanup documentation:

// A single goroutine runs all cleanup calls for a program, sequentially. If a
// cleanup function must run for a long time, it should create a new goroutine.

and replace them with

// Cleanups run concurrently with any user-created goroutines.
// Cleanups may also run concurrently with one another (unlike finalizers).
// If a cleanup function must run for a long time, it should create a new goroutine
// to avoid blocking the execution of other cleanups.

@aclements aclements moved this from Likely Accept to Accepted in Proposals Feb 26, 2025
@aclements aclements changed the title proposal: runtime: allow cleanups to run concurrently runtime: allow cleanups to run concurrently Feb 26, 2025
@aclements aclements removed this from the Proposal milestone Feb 26, 2025
@aclements aclements added this to the Backlog milestone Feb 26, 2025
gopherbot pushed a commit that referenced this issue Feb 26, 2025
…rrently with each other

For #71825.
Fixes #71955.

Change-Id: I25af19eb72d75f13cf661fc47ee5717782785326
Reviewed-on: https://go-review.googlesource.com/c/go/+/652637
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
@mateusz834
Copy link
Member

Lets say that at some point we would like to optimize AddCleanup, such that if ptr does not escape we just defer (block scoped) the cleanup (for inlined functions). I am not saying that this is a good idea, but then "Cleanups run concurrently with any user-created goroutines.", does not permit this.

Any thoughts?

@mknyszek
Copy link
Contributor Author

This implementation strategy is attractive but does fall into the same issue as:

However, if we were to run a cleanup on a regular user goroutine, it could cause deadlocks; for example, if a goroutine is holding lock L and calls AddCleanup, which then synchronously runs a cleanup that itself tries to acquire lock L.

This could be a surprising source of deadlocks, though I suppose in this case it would trigger deterministically.

I personally would like it to be the case that we can run cleanups on user goroutines for other reasons (backpressure), I'm not sure how we do this without adding another footgun to AddCleanup.

(As cleanups approach "can run anywhere, any time on user goroutines," they start to look more like Unix signals, which are notoriously hard to reason about. This is of course an extreme comparison though. There might be a way to leave this particular door open.)

@aclements
Copy link
Member

In limited cases we may be able to statically prove that there's no way to distinguish whether the cleanup is running on the same goroutine or not, and transform it into a defer. This would be quite limited, but could probably capture simple cleanups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LibraryProposal Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool Proposal Proposal-Accepted
Projects
Status: Accepted
Development

No branches or pull requests

5 participants