[WFLY-16279] Fixing intermittent test failure #15444

ropalka · 2022-04-20T11:28:37Z

https://issues.redhat.com/browse/WFLY-16279

bstansberry · 2022-04-20T12:20:34Z

undertow/src/test/java/org/wildfly/extension/undertow/ServerServiceTestCase.java

@@ -82,7 +82,7 @@ private Server load(String xmlFile, String serverName) throws Exception {

        assertNotNull(serverService);
        serverService.setMode(ServiceController.Mode.ACTIVE);
-        final Server server = serverService.getValue();
+        final Server server = serverService.awaitValue();


@ropalka This change makes sense given line 84 above, but I don't understand why L84 was needed. ServerAdd installs the service without calling ServiceBuilder.setInitialMode, so it should default to ACTIVE. And then the boot of the test ModelController should result in the normal wait for MSC stability when the boot op runs.

The observed regression might happen because of https://issues.redhat.com/browse/MSC-255 @bstansberry . I just little bit enhanced the code. What about line #84 it is of course useless.

I don't understand the MSC-255 scenario, i.e.

After step 6) user code might expect Service A is in stable state. But since StabilityMonitor doesn't block services in any way this might not be true.

Do you mean that something else might cause the service to change state? If so, that's understood, and I can see how in some scenarios that could be a problem. I don't really see it for the OperationContext tracking configuration write ops case though, at least not as something that shouldn't be very rare. We don't allow concurrent mgmt ops that alter MSC state.

We need to be sure that MSC provides something that gives us the current semantics that an OperationContext relies on when executing management operations.

That said, this change is fine, as awaitValue seems like a better choice. I think we should leave the JIRA open though until we see evidence it fixes the problem. I still don't see what logic ever ensured that the Host service was started, as nothing demands it. Unless I forgot something that depends on it and thus demands it.

If the StabilityMonitor is tracking only subset of transtivie closure of service dependencies graph then any service that isn't registered with StabilityMonitor and is part of that transitive closure can cause the state change (concurrently). So even if concurrent mgmt ops are forbidden iff mgmg op is not tracking all services (via StabilityMonitor) from transitive closure of dependencies graph there are no 'stability' guaranties of mgmt ops mentioned above @bstansberry .

[WFLY-16279] Fixing intermittent test failure

f1530a5

github-actions bot added the deps-ok Dependencies have been checked, and there are no significant changes label Apr 20, 2022

bstansberry reviewed Apr 20, 2022

View reviewed changes

bstansberry merged commit 67c1246 into wildfly:main Apr 22, 2022

ropalka deleted the WFLY-16279 branch April 23, 2022 13:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WFLY-16279] Fixing intermittent test failure #15444

[WFLY-16279] Fixing intermittent test failure #15444

ropalka commented Apr 20, 2022

bstansberry Apr 20, 2022

ropalka Apr 20, 2022 •

edited

bstansberry Apr 20, 2022 •

edited

ropalka Apr 22, 2022

[WFLY-16279] Fixing intermittent test failure #15444

[WFLY-16279] Fixing intermittent test failure #15444

Conversation

ropalka commented Apr 20, 2022

bstansberry Apr 20, 2022

Choose a reason for hiding this comment

ropalka Apr 20, 2022 • edited

Choose a reason for hiding this comment

bstansberry Apr 20, 2022 • edited

Choose a reason for hiding this comment

ropalka Apr 22, 2022

Choose a reason for hiding this comment

ropalka Apr 20, 2022 •

edited

bstansberry Apr 20, 2022 •

edited