Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple node shutdown fixes #1563

Merged
merged 5 commits into from Apr 21, 2021
Merged

Multiple node shutdown fixes #1563

merged 5 commits into from Apr 21, 2021

Conversation

tobim
Copy link
Member

@tobim tobim commented Apr 20, 2021

馃摂 Description

  • Don't bail when receiving a down message during teardown
  • We've been terminating non-essential components after the main
    ones, but now we do it before.
  • We don't try to terminate remote actors any more.
  • The shutdown sequence used to treat labels as types, which means
    the actor of the label wouldn't be terminated at all and keeping
    the process alive.

馃摑 Checklist

  • All user-facing changes have changelog entries.
  • The PR description contains instructions for the reviewer, if necessary.

馃幆 Review Instructions

By commit.

@tobim tobim added the bug Incorrect behavior label Apr 20, 2021
@tobim tobim requested a review from a team April 20, 2021 21:15
@dominiklohmann dominiklohmann self-assigned this Apr 21, 2021
Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looking good. Ran this locally and it reduced the shutdown time with plugins a lot. Your reasoning why this fixes the rare assertion failure in CI seems sound to me.

This PR needs two changelog entries:

  • bug-fixes for the rare assertion failure fix
  • features for the parallel shutdown of auxiliary actors

libvast/src/system/node.cpp Outdated Show resolved Hide resolved
libvast/src/system/node.cpp Outdated Show resolved Hide resolved
libvast/src/system/node.cpp Outdated Show resolved Hide resolved
- We've been terminating non-essential components after the main
  ones, but now we do it before.
- We don't try to terminate remote actors any more.
- The shutdown sequence used to treat labels as types, which means
  the actor of the label wouldn't be terminated at all and keeping
  the process alive.
@tobim tobim force-pushed the story/ch24577/shutdown-fixes branch from 6eae71a to 597385f Compare April 21, 2021 15:40
@tobim tobim force-pushed the story/ch24577/shutdown-fixes branch from 59cbb94 to df7e4c3 Compare April 21, 2021 15:49
@tobim
Copy link
Member Author

tobim commented Apr 21, 2021

This PR needs two changelog entries:

* `bug-fixes` for the rare assertion failure fix

-> done.

* `features` for the parallel shutdown of auxiliary actors

I doubt this will really be noticeable under normal circumstances. That change was mostly done to speed things up in case on of the components is not responsive.

The shutdown logic contained a bug that would make the node fail to terminate
in case a plugin actor is registered at said node.

A race condition in the shutdown logic that caused an assertion was fixed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every paragraph is a changelog entry of its own, I doubt this is what you wanted to achieve here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are 2 separate Bugs that are both fixed in this PR.

@tobim tobim merged commit 74958da into master Apr 21, 2021
@tobim tobim deleted the story/ch24577/shutdown-fixes branch April 21, 2021 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior
Projects
None yet
2 participants