Skip to content

kvserver: do not cancel while in HandleRaftResponse #148965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 27, 2025

Conversation

pav-kv
Copy link
Collaborator

@pav-kv pav-kv commented Jun 26, 2025

If the context is canceled in the middle of HandleRaftResponse (after processQueue returns), a Replica can be left in a broken state, e.g. its destruction can be partially completed. This can brick a RangeID, e.g. cause a subsequent call to getOrCreateReplica to be stuck in an infinite loop.

This commit allows the HandleRaftResponse loop to terminate gracefully, by using a higher-level context which is not cancelable.

Fixes #140958, #145030

@pav-kv pav-kv requested a review from tbg June 26, 2025 21:07
@pav-kv pav-kv requested a review from a team as a code owner June 26, 2025 21:07
Copy link

blathers-crl bot commented Jun 26, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pav-kv pav-kv changed the title kvserver: move context shenanigans to processQueue kvserver: do not cancel while in HandleRaftResponse Jun 26, 2025
@pav-kv pav-kv added the backport-all Flags PRs that need to be backported to all supported release branches label Jun 26, 2025
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for looking into this.

@pav-kv pav-kv force-pushed the raft-transport-fix-context-cancel branch from db8b50c to b965278 Compare June 27, 2025 13:36
@pav-kv
Copy link
Collaborator Author

pav-kv commented Jun 27, 2025

TFTR!

bors r=tbg

@pav-kv
Copy link
Collaborator Author

pav-kv commented Jun 27, 2025

bors cancel

@craig
Copy link
Contributor

craig bot commented Jun 27, 2025

Canceled.

If the context is canceled in the middle of HandleRaftResponse, a
Replica can be left in a broken state, e.g. its destruction can be
partially completed. This can brick a RangeID, e.g. cause a subsequent
call to getOrCreateReplica to be stuck in an infinite loop.

Epic: none
Release note: none
@pav-kv pav-kv force-pushed the raft-transport-fix-context-cancel branch from b965278 to 1f63750 Compare June 27, 2025 13:47
@pav-kv
Copy link
Collaborator Author

pav-kv commented Jun 27, 2025

bors r=tbg

@craig
Copy link
Contributor

craig bot commented Jun 27, 2025

🕐 Waiting for PR status (GitHub check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

@craig
Copy link
Contributor

craig bot commented Jun 27, 2025

@craig craig bot merged commit 301cc3e into cockroachdb:master Jun 27, 2025
21 of 22 checks passed
Copy link

blathers-crl bot commented Jun 27, 2025

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

error creating merge commit from 4273ead to blathers/backport-release-23.2-148965: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch release-23.2 failed. See errors above.


💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

error creating merge commit from 4273ead to blathers/backport-release-24.1-148965: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch release-24.1 failed. See errors above.


💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

error creating merge commit from 4273ead to blathers/backport-release-24.3-148965: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch release-24.3 failed. See errors above.


💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

error creating merge commit from 4273ead to blathers/backport-release-25.1-148965: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch release-25.1 failed. See errors above.


💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

error creating merge commit from 4273ead to blathers/backport-release-25.2-148965: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch release-25.2 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Jul 1, 2025

Backporting this fix to 25.x versions. It could come up in earlier versions - if it does, we'll reconsider backporting further back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-all Flags PRs that need to be backported to all supported release branches backport-failed target-release-25.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv/kvserver: TestBackpressureNotAppliedWhenReducingRangeSize failed
3 participants