Fix potential race condition between evaluator and partition #1295
From looking at the changes, this seems correct, and I do believe the issue is real. I think one potential symptom is that a query_supervisor would miss a response from the cleaned-up partition and stay in limbo forever, which in turn would block the corresponding exporter from iterating through the remaining candidate partitions.
This is missing a changelog entry.
9190d36 to 8ecbd84
8ecbd84 to 49f46e2
@lava can you rebase onto master?
The partition currently assumes strict ownership of its indexers, and terminates them when it shuts down. However, the evaluator did not store a partition handle but just the handles of the relevant indexers. This could lead to a race condition between an evaluator and a partition when an active partition finished persisting right when a new query was spawned, or when a query took long enough to see its associated passive partition evicted from the LRU cache.
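The ownership problem described above can be illustrated with a deliberately simplified model. The sketch below is not the project's actor-based code: `Indexer`, `Partition`, `IndexerOnlyEvaluator`, and `PartitionHoldingEvaluator` are hypothetical stand-ins, and `std::shared_ptr` is only assumed to approximate the effect of holding a handle. It shows why an evaluator that keeps only indexer handles can end up with terminated indexers once the partition goes away, while an evaluator that also holds the partition keeps it alive for the duration of the query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an indexer; `alive` models whether it still runs.
struct Indexer {
  bool alive = true;
};

// Hypothetical stand-in for a partition that strictly owns its indexers
// and terminates them when it shuts down.
struct Partition {
  std::vector<std::shared_ptr<Indexer>> indexers;

  explicit Partition(std::size_t num_indexers) {
    for (std::size_t i = 0; i < num_indexers; ++i)
      indexers.push_back(std::make_shared<Indexer>());
  }

  ~Partition() {
    // Mimic the partition terminating its indexers on shutdown.
    for (auto& indexer : indexers)
      indexer->alive = false;
  }
};

bool all_alive(const std::vector<std::shared_ptr<Indexer>>& indexers) {
  return std::all_of(indexers.begin(), indexers.end(),
                     [](const auto& indexer) { return indexer->alive; });
}

// Models the problematic setup: only the indexer handles are stored, so
// nothing prevents the partition from shutting down mid-query.
struct IndexerOnlyEvaluator {
  std::vector<std::shared_ptr<Indexer>> indexers;

  explicit IndexerOnlyEvaluator(const std::shared_ptr<Partition>& partition)
    : indexers(partition->indexers) {}

  bool can_evaluate() const { return all_alive(indexers); }
};

// Models the assumed fix: the evaluator also holds the partition, which keeps
// the partition (and therefore its indexers) alive while the query runs.
struct PartitionHoldingEvaluator {
  std::shared_ptr<Partition> partition;

  explicit PartitionHoldingEvaluator(std::shared_ptr<Partition> p)
    : partition(std::move(p)) {}

  bool can_evaluate() const { return all_alive(partition->indexers); }
};

bool survives_eviction_indexers_only() {
  auto partition = std::make_shared<Partition>(2);
  IndexerOnlyEvaluator evaluator{partition};
  partition.reset(); // e.g. evicted from the cache: last owner is gone
  return evaluator.can_evaluate();
}

bool survives_eviction_with_partition() {
  auto partition = std::make_shared<Partition>(2);
  PartitionHoldingEvaluator evaluator{partition};
  partition.reset(); // the evaluator still holds a reference
  return evaluator.can_evaluate();
}
```

In this model, `survives_eviction_indexers_only()` returns `false` because dropping the last partition reference terminates the indexers, whereas `survives_eviction_with_partition()` returns `true`. Whether the real race can be triggered in practice depends on actor scheduling, which this sketch does not attempt to capture.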
49f46e2 to 60c135d
@dominiklohmann Sorry, didn't notice until after I pushed. Done.
No functional changes happened since I last reviewed this. CI should catch rebase mistakes.
Ignoring the mac CI fault since it is https://app.clubhouse.io/tenzir/story/20882/race-condition-in-node-queries-unit-test again.
📔 Description
This came up while pairing on an unrelated PR with @tobim. I have no proof that the described race can actually happen in practice, but intuitively it makes sense to me that it would.
📝 Checklist