Skip to content

Stop accepting new queries after initiating shutdown #2215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 13, 2022

Conversation

dominiklohmann
Copy link
Member

@dominiklohmann dominiklohmann commented Apr 13, 2022

Before this change, the index continued accepting queries after initiating shutdown, which could lead to an infinite hang because queries were coming in faster than the index was able to process them.

📝 Checklist

  • All user-facing changes have changelog entries.
  • The changes are reflected on docs.tenzir.com/vast, if necessary.
  • The PR description contains instructions for the reviewer, if necessary.

🎯 Review Instructions

Try locally:

T1> vast [--use-legacy-query-scheduler] start
T2> while true; do { sleep 0.5; vast export null & }; done
T3> vast stop

Before this change, the index continued accepting queries after initiating
shutdown, which could lead to an infinite hang because queries were coming in
faster than the index was able to process them.
@dominiklohmann dominiklohmann added the bug Incorrect behavior label Apr 13, 2022
@dominiklohmann dominiklohmann requested a review from tobim April 13, 2022 08:30
Copy link
Member

@tobim tobim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a live demo that the fix works.

The sink command must link itself to the exporter until it knows that the
exporter monitors the sink command's scoped actor in order to avoid a dead
window where ungraceful exits of the sink command leave dangling exporter actors
in the VAST server.
@dominiklohmann dominiklohmann requested a review from tobim April 13, 2022 09:42
@dominiklohmann
Copy link
Member Author

@tobim I've re-requested your review after the last two commits.

Copy link
Member

@tobim tobim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we accept this additional complexity, can we test whether bidirectional linkage between all 3 actors would work? I mean the sink command calls self->link_to(snk) and self->link_to(exporter), and the exporter links itself to the sink? The rest of the lifetime management in the sink command can probably ripped out.

@dominiklohmann
Copy link
Member Author

Before we accept this additional complexity, can we test whether bidirectional linkage between all 3 actors would work?

It does work, but we lose the ability to give a good exit code and error message if there's an unexpected shutdown. So I think listening for down messages is still preferred.

Copy link
Member

@tobim tobim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that this resolves a timing issue with early shutdowns or crashes of the sink process. We should evaluate a solution based on the remote spawn functionality in CAF 0.18 once we are able to upgrade.

@dominiklohmann dominiklohmann merged commit 0d30678 into master Apr 13, 2022
@dominiklohmann dominiklohmann deleted the story/sc-32453/new-queries-after-stop branch April 13, 2022 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants