Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential race condition between evaluator and partition #1295

Merged
merged 2 commits into from
Jan 25, 2021

Conversation

lava
Copy link
Member

@lava lava commented Jan 20, 2021

📔 Description

This came up while pairing on on unrelated PR with @tobim . I have no proof that the described race can actually happen in practice, but intuitively it makes sense to me that it would.

📝 Checklist

  • All user-facing changes have changelog entries.
  • The changes are reflected on docs.tenzir.com/vast, if necessary.
  • The PR description contains instructions for the reviewer, if necessary.

tobim
tobim previously approved these changes Jan 21, 2021
Copy link
Member

@tobim tobim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From looking at the changes, this seems to be correct and I do believe that the issue is real. I think one potential symptom is that a query_supervisor would miss a response from the cleaned-up partition and stay in limbo eternally, which in turn would block the corresponding exporter from iterating through the remaining candidate partitions.

@tobim tobim dismissed their stale review January 21, 2021 08:16

Missing changelog entry

Copy link
Member

@tobim tobim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing a changlog entry.

@lava lava requested a review from tobim January 21, 2021 11:14
@lava lava force-pushed the topic/partition-stream-shutdown branch from 9190d36 to 8ecbd84 Compare January 22, 2021 11:47
CHANGELOG.md Outdated Show resolved Hide resolved
@dominiklohmann dominiklohmann added the bug Incorrect behavior label Jan 23, 2021
@lava lava force-pushed the topic/partition-stream-shutdown branch from 8ecbd84 to 49f46e2 Compare January 25, 2021 09:01
@dominiklohmann
Copy link
Member

@lava can you rebase onto master?

The partition currently assumes strict ownership of its
indexers, and terminates them when it shuts down.

However, the evaluator did not store a partition handle but
just the handles of the relevant indexers.

This could lead to a race condition between an evaluator and a
partition when an active partition finished persisting right when
a new query was spawned, or when a query took long enough to see
its associated passive partition evicted from the LRU cache.
@lava lava force-pushed the topic/partition-stream-shutdown branch from 49f46e2 to 60c135d Compare January 25, 2021 09:09
@lava
Copy link
Member Author

lava commented Jan 25, 2021

@dominiklohmann Sorry, didn't notice until after I pushed. Done.

Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No functional changes happened since I last reviewed this. CI should catch rebase mistakes.

@lava
Copy link
Member Author

lava commented Jan 25, 2021

Ignoring the mac CI fault since it is https://app.clubhouse.io/tenzir/story/20882/race-condition-in-node-queries-unit-test again.

@lava lava merged commit 3fcb4dc into master Jan 25, 2021
@lava lava deleted the topic/partition-stream-shutdown branch January 25, 2021 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants