Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WFLY-16279] Fixing intermittent test failure #15444

Merged
merged 1 commit into from Apr 22, 2022

Conversation

ropalka
Copy link
Contributor

@ropalka ropalka commented Apr 20, 2022

@github-actions github-actions bot added the deps-ok Dependencies have been checked, and there are no significant changes label Apr 20, 2022
@@ -82,7 +82,7 @@ private Server load(String xmlFile, String serverName) throws Exception {

assertNotNull(serverService);
serverService.setMode(ServiceController.Mode.ACTIVE);
final Server server = serverService.getValue();
final Server server = serverService.awaitValue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ropalka This change makes sense given line 84 above, but I don't understand why L84 was needed. ServerAdd installs the service without calling ServiceBuilder.setInitialMode, so it should default to ACTIVE. And then the boot of the test ModelController should result in the normal wait for MSC stability when the boot op runs.

Copy link
Contributor Author

@ropalka ropalka Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The observed regression might happen because of https://issues.redhat.com/browse/MSC-255 @bstansberry . I just little bit enhanced the code. What about line #84 it is of course useless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the MSC-255 scenario, i.e.

After step 6) user code might expect Service A is in stable state.

But since StabilityMonitor doesn't block services in any way this might not be true.

Do you mean that something else might cause the service to change state? If so, that's understood, and I can see how in some scenarios that could be a problem. I don't really see it for the OperationContext tracking configuration write ops case though, at least not as something that shouldn't be very rare. We don't allow concurrent mgmt ops that alter MSC state.

We need to be sure that MSC provides something that gives us the current semantics that an OperationContext relies on when executing management operations.

That said, this change is fine, as awaitValue seems like a better choice. I think we should leave the JIRA open though until we see evidence it fixes the problem. I still don't see what logic ever ensured that the Host service was started, as nothing demands it. Unless I forgot something that depends on it and thus demands it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the StabilityMonitor is tracking only subset of transtivie closure of service dependencies graph then any service that isn't registered with StabilityMonitor and is part of that transitive closure can cause the state change (concurrently). So even if concurrent mgmt ops are forbidden iff mgmg op is not tracking all services (via StabilityMonitor) from transitive closure of dependencies graph there are no 'stability' guaranties of mgmt ops mentioned above @bstansberry .

@bstansberry bstansberry merged commit 67c1246 into wildfly:main Apr 22, 2022
@ropalka ropalka deleted the WFLY-16279 branch April 23, 2022 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deps-ok Dependencies have been checked, and there are no significant changes
Projects
None yet
2 participants