Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to fix assertition failure in component shutdown handler #1473

Merged
merged 10 commits into from Mar 20, 2021

Conversation

mavam
Copy link
Member

@mavam mavam commented Mar 18, 2021

馃摂 Description

This PR attempts to fix an assertition failure in the DOWN handler in the node. The invariant should be that components in the registry are always monitored.

馃摑 Checklist

  • All user-facing changes have changelog entries.
  • The changes are reflected on docs.tenzir.com/vast, if necessary.
  • The PR description contains instructions for the reviewer, if necessary.

馃幆 Review Instructions

Commit-by-commit.

@mavam mavam added the bug Incorrect behavior label Mar 19, 2021
@mavam mavam marked this pull request as draft March 19, 2021 08:18
@mavam mavam marked this pull request as ready for review March 19, 2021 12:59
@mavam mavam force-pushed the story/ch23390 branch 2 times, most recently from 47dd15a to 072a4bd Compare March 19, 2021 13:28
Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks alright to me. I have a few minor requests. Let's hope this fixes the actual issue.

libvast/src/system/node.cpp Outdated Show resolved Hide resolved
libvast/src/system/node.cpp Show resolved Hide resolved
libvast/vast/system/component_registry.hpp Outdated Show resolved Hide resolved
@mavam
Copy link
Member Author

mavam commented Mar 19, 2021

On my local machine, this works as well:

--- a/libvast/src/system/node.cpp
+++ b/libvast/src/system/node.cpp
@@ -611,16 +611,7 @@ node(node_actor::stateful_pointer<node_state> self, std::string name, path dir,
     for (const auto& label : remaining)
       schedule_teardown(label);
     // Finally, bring down the filesystem.
-    // FIXME: there's a super-annoying bug that makes it impossible to receive a
-    // DOWN message from the filesystem during shutdown, but *only* when the
-    // filesystem is detached! This might be related to a bug we experienced
-    // earlier: https://github.com/actor-framework/actor-framework/issues/1110.
-    // Until it gets fixed, we cannot add the filesystem to the set of
-    // sequentially terminated actors but instead let it implicitly terminate
-    // after the node exits when the filesystem ref count goes to 0. (A
-    // shutdown after the node won't be an issue because the filesystem is
-    // currently stateless, but this needs to be reconsidered when it changes.)
-    // components.push_back(std::move(*filesystem));
+    components.push_back(std::move(*filesystem));
     auto shutdown_kill_timeout = shutdown_grace_period / 5;
     shutdown<policy::sequential>(self, std::move(components),
                                  shutdown_grace_period, shutdown_kill_timeout);

Let's see if CI is happy. I might add this as separate commit afterwards.

@dominiklohmann
Copy link
Member

This also still needs a changelog entry

@dominiklohmann
Copy link
Member

dominiklohmann commented Mar 19, 2021

I might add this as separate commit afterwards.

I think this should go into a separate follow-up PR. I'd rather not merge that change in the freeze period. Resolved in a call.

@dominiklohmann
Copy link
Member

This just needs the changelog entry, then it's ready to go imo.

mavam added 10 commits March 19, 2021 20:22
Killed components were never removed from the registry because
demonitoring means their DOWN handler doesn't get invoked. We remove the
call to demonitor in order to make sure the cleanup code in the DOWN
handler gets executed.
This commit introduces utility functions to register/deregister
components in the common case to prevent unmatched monitor/demonitor
calls.
@mavam mavam enabled auto-merge March 19, 2021 20:05
@mavam mavam merged commit 6da87b8 into master Mar 20, 2021
@mavam mavam deleted the story/ch23390 branch March 20, 2021 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior
Projects
None yet
2 participants