Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#163] shutdown thread may point to a wrong process #164

Closed
wants to merge 1 commit into from

Conversation

ochaloup
Copy link
Contributor

fixes #163

@ochaloup
Copy link
Contributor Author

ochaloup commented Jun 8, 2020

hi @jamezp , do you think this fix could be considered for merging?

@jamezp
Copy link
Member

jamezp commented Jun 9, 2020

Sorry @ochaloup. I've been meaning to look at this and haven't had a change. Have you seen this issue happen or do you have a way to reproduce it?

To my knowledge these were not meant to be thread safe, but if they need to be we need to fully analyze as I can see some other issues as well.

@ochaloup
Copy link
Contributor Author

@jamezp I see. I focused only on this one issue as I was struggling with it. I haven't deep dived to other code parts to help with the analysis.

The whole context is that I now realized that transaction related testsuites need to workaround this deficit of Arquillian for the time being. These testsuites often use Byteman to kill the container in the background. Then later we have to ask Arquillian to start the killed container again to run transaction recovery and to check if transactions are recovered.

E.g. JBoss EAP transaction recovery testsuite workarouned it and the Narayana testsuite as well. There is created an extension which changes the kill functionality (e.g. https://github.com/jbosstm/narayana/blob/5.10.5.Final/XTS/localjunit/crash-recovery-tests/src/main/java/com/arjuna/qa/extension/BaseServerKillProcessor.java#L41). The WFLY Arquillian handler then does not run its own kill/stop at all.

With issue JBEAP-19408/WFLY-13516 I needed to add a new test with the similar behaviour to WildFly testsuite - app conainer is JVM crashed at particular time and then restarted and transaction recovery should fix the transaction state of the app after restart.
I was stuck with the instability of the test (at that time I haven't realized that Narayana testsuite already handles the kill with an extension) and I digged down and I find the reason is the Arquillian's behaviour and I created this issue (aka. there is ticking time which when timeout elapses kills "any available" container).

For now this issue could be verified with the WFLY testsuite where I added one test: https://github.com/wildfly/wildfly/blob/20.0.0.Final/testsuite/integration/manualmode/src/test/java/org/jboss/as/test/manualmode/ejb/client/outbound/connection/transaction/preparehalt/TransactionPropagationPrepareHaltTestCase.java#L180
There the JVM is crashed during the test (on call bean.twoPhaseCommitCrashAtClient(SERVER_DEPLOYMENT)), then there is invoked ContainerController.kill to provide to the Arquillian information that the container "is killed" and then the container may be started again and recovery could be checked.
When there is added one more test and the container is started at the beginning of the second test and then it could be checked that's killed when timeout elapses.

As I need to add more tests (https://issues.redhat.com/browse/WFLY-13578) to this particular class I was concerned about it.
Then I found I may use the similar workaround as used in Narayana testsuite. I'm in progress of creating a new testcases and for them I will create a PR for the manualmode testsuite with the kill behaviour like this
ochaloup/wildfly@776c467#diff-f02c57ff9ca074ca651be68495527ba1R31

I was just thinking that such extension would not be needed if the Arquillian would be handling the container kill with the thread safely.
I you consider the current wfly arq behaviour is right then currently I'm fine as the workaround works for me and this PR could be then closed.

@jamezp
Copy link
Member

jamezp commented Jun 16, 2020

@ochaloup I think I understand what you're trying to do. Your concern seems to be that during stop a start may be invoked while a stop is already executing which could be a race. I'm not sure if base Arquillian containers are meant to be thread-safe or not. If they are we need to make it thread-safe we may be able to, but I'll need to review all the code.

Is this something you'd like me to look at or have you found a workaround?

@ochaloup
Copy link
Contributor Author

@jamezp my idea was to provide the fix which is here in the PR. The point is to just not sharing the same variable as the 'process' holder. That would work for the later use and make me a chance for not using any workaround. If the change in this PR is not a) right, b) appropriate as a single change in area of complex issue or c) may cause some backward compatibility issues, then just let's close this PR and probably the issue #163 as well.
I hope I have a workaround.

@jamezp
Copy link
Member

jamezp commented Jun 18, 2020

@ochaloup My concern is it seems like a workaround for a use case that shouldn't be used. What I mean is start() and stop() should only ever be invoked by the same thread.

I guess my concern is that if stop is completed the thread should be interrupted and stopped. So start must be getting called while the stop method is still executing.

@ochaloup
Copy link
Contributor Author

@jamezp but the start() and stop() is invoked by the same thread :)
The issue is that the stop() operation launches the separate thread (see https://github.com/wildfly/wildfly-arquillian/blob/2.2.0.Final/container-managed/src/main/java/org/jboss/as/arquillian/container/managed/ManagedDeployableContainer.java#L259). The Arquillian is the executor of the thread, not the test.

The stop() operation is invoked by test (on the same thread), but the stop() operation fails to interrupt it (https://github.com/wildfly/wildfly-arquillian/blob/2.2.0.Final/container-managed/src/main/java/org/jboss/as/arquillian/container/managed/ManagedDeployableContainer.java#L290). The thread started by the stop() operation is still in execution. Then the start() is executed (on the same thread). The shutdown thread started by stop() kills the started container.

To be honest, I can't find how attaching the process holder (https://github.com/wildfly/wildfly-arquillian/blob/2.2.0.Final/container-managed/src/main/java/org/jboss/as/arquillian/container/managed/ManagedDeployableContainer.java#L258) to the started shutdown thread for not being changed arbitrarily in the background, could be dangerous.

@jamezp
Copy link
Member

jamezp commented Jun 18, 2020

Got it. Thanks for the detailed answer @ochaloup. I was missing the part where the thread was not interrupted :) TBH I kind of wonder if we should just remove the thread all together and just use something like:

if (process.waitFor(getContainerConfiguration().getStopTimeoutInSeconds(), TimeUnit.SECONDS)) {
    process.destroyForcibly();
}

@ochaloup
Copy link
Contributor Author

@jamezp I see, that should fine (at least from the usecase perspective I have where I want to start the second container as soon as possible, as the process.waitFor exits immediately). Just the code block will need to be adjusted cosiderably. Should I propose a new PR or should I just close this PR?

@jamezp
Copy link
Member

jamezp commented Jun 19, 2020

@ochaloup I'm happy to throw up a PR if you'd like or feel free to update this or close it and open a new one. If you'd like me to do it I'll at all the places where this pattern may be used.

@ochaloup
Copy link
Contributor Author

ochaloup commented Jun 22, 2020

@jamezp ok. I'm fine to close this and leave with you ;-) I would not be able to cover all the places where the fix is needed (at least for sure not for the first time).
I'm closing this and in case let me know if I can help with #163 (e.g. testing or something) if there is a fix.

@ochaloup ochaloup closed this Jun 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants