Skip to content

fix(matching): leaked RLock in describe() blocks all writers indefinitely#10077

Merged
dnr merged 4 commits into
temporalio:mainfrom
sanbricio:fix/release-rwmutex-lock-on-describe-early-return
May 18, 2026
Merged

fix(matching): leaked RLock in describe() blocks all writers indefinitely#10077
dnr merged 4 commits into
temporalio:mainfrom
sanbricio:fix/release-rwmutex-lock-on-describe-early-return

Conversation

@sanbricio
Copy link
Copy Markdown
Contributor

@sanbricio sanbricio commented Apr 26, 2026

When buildIds contains an empty string and defaultQueue() returns nil, describe() returned errDefaultQueueNotInit without first releasing the RLock acquired at the start of the function, leaving the mutex permanently locked and blocking any concurrent writers. I've detected these bug using my own linter for concurrency https://github.com/sanbricio/goconcurrencylint

What changed?

Added missing versionedQueuesLock.RUnlock() in describe() before the early return when defaultQueue() returns nil.

Why?

The function acquires RLock at the top but does not use defer instead releases the lock manually at each exit point. One early return path was missing the RUnlock: when buildIds contains "" and the default queue is not yet initialized, the function returned errDefaultQueueNotInit without releasing the lock, leaving the mutex permanently locked and blocking any concurrent writer.

How did you test it?

  • built
  • added new unit test(s) - TestDescribeReleasesVersionedQueuesLockWhenDefaultQueueNotInitialized

Potential risks

No impact only solving a deadlock bug

When buildIds contains an empty string and defaultQueue() returns nil,
describe() returned errDefaultQueueNotInit without first releasing the
RLock acquired at the start of the function, leaving the mutex
permanently locked and blocking any concurrent writers.
@sanbricio sanbricio requested a review from a team as a code owner April 26, 2026 15:34
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 26, 2026

CLA assistant check
All committers have signed the CLA.

@sanbricio
Copy link
Copy Markdown
Contributor Author

sanbricio commented May 10, 2026

Friendly ping @temporalio/oss-matching @temporalio/oss-foundations this fixes a real deadlock
in describe() where an early-return path skips RUnlock. Small change,
covered by a new unit test. Could someone take a look when you have a
moment? 🙏

@sanbricio sanbricio changed the title fix: release versionedQueuesLock on early return in describe() fix(matching): leaked RLock in describe() blocks all writers indefinitely May 14, 2026
@dnr
Copy link
Copy Markdown
Contributor

dnr commented May 16, 2026

It's not a real deadlock, all code paths that get there wait until the default queue is initialized. The check is there just to turn a panic into an error in case there's a logic bug.

We should have the RUnlock for consistency though, but I don't think we need the test. If you remove the test from the PR I can merge it.

@sanbricio
Copy link
Copy Markdown
Contributor Author

It's not a real deadlock, all code paths that get there wait until the default queue is initialized. The check is there just to turn a panic into an error in case there's a logic bug.

We should have the RUnlock for consistency though, but I don't think we need the test. If you remove the test from the PR I can merge it.

@dnr Done! thanks for the feedback!

@dnr dnr enabled auto-merge (squash) May 18, 2026 18:40
@dnr dnr merged commit 69d2f05 into temporalio:main May 18, 2026
48 checks passed
@sanbricio sanbricio deleted the fix/release-rwmutex-lock-on-describe-early-return branch May 30, 2026 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants