Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WFLY-12610] adding properties to setup transaction recovery and operation to run recovery #14773

Conversation

ochaloup
Copy link
Contributor

@ochaloup ochaloup commented Oct 7, 2021

https://issues.redhat.com/browse/WFLY-12610

The point of this PR is giving a way to execute Narayana recovery processing from jboss cli without a need to open socket + to make recovery configuration changes without a need to restart JVM + do not suspend the recovery manager when WildFly is suspended (e.g. WFLY suspend happens on call of the graceful shutdown).

@ochaloup
Copy link
Contributor Author

ochaloup commented Oct 7, 2021

This PR should be a working concept but I would like to hear from you - @mmusgrov @jmfinelli @mnovak1 @istraka what do you think.

With these changes the idea of the graceful shutdown recovery speedup processing would be like
./bin/jboss-cli.sh -c --commands="/subsystem=transactions:write-attribute(name=stop-recovery-when-suspended, value=false),/subsystem=transactions:write-attribute(name=recovery-period, value=1),/subsystem=transactions:write-attribute(name=orphan-safety-interval, value=1),:shutdown(timeout=300)"

@github-actions github-actions bot added the deps-ok Dependencies have been checked, and there are no significant changes label Oct 7, 2021
@ochaloup
Copy link
Contributor Author

ochaloup commented Oct 8, 2021

I need to fix the two failing tests but the concept of the change should be working.

@kabir kabir added Feature PR provides a new feature missing-reqs Features missing one or more of the pre-merge requirements labels Oct 8, 2021
Copy link
Contributor

@mmusgrov mmusgrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functionality looks very good indeed.

I had a number of minor non-functional comments/queries (which, apart from my comment about TransactionSubsystem61Parser.java, you may ignore) so I am approving the PR.

private static final String ORPHAN_SAFETY_INTERVAL_ATTRIBUTE_NAME = "orphan-safety-interval";
private static final String RECOVERY_PERIOD_ATTRIBUTE_NAME = "recovery-period";
private static final String RECOVERY_BACKOFF_PERIOD_ATTRIBUTE_NAME = "recovery-backoff-period";
private static final String STOP_RECOVERY_WHEN_SUSPENDED_ATTRIBUTE_NAME = "stop-recovery-when-suspended";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be stop-recovery-when-suspending or disable-recovery-before-suspend etc because there cannot be any processing when the server is suspended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, you're right while my initial idea was letting the server being suspended while leaving the chance for recovery manager to finish the transactions. That way the whole server would be "sleeping" and recovery manager could finish anything necessary. Maybe it's not what's needed.

With this comment I realized you are right that it should be this way at least for graceful shutdown. I wrote the :shutdown(timeout=300) will be finishing with recovery but it's not true. I need to implement something like what EJB suspend controller does
https://github.com/wildfly/wildfly/blob/main/ejb3/src/main/java/org/jboss/as/ejb3/suspend/EJBSuspendHandlerService.java#L209
That's a missing piece and it's the functionality where attribute of the name stop-recovery-when-suspending makes sense. Maybe the attribute should be rather named like `suspend-with-no-indoubt-transactions" or like that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave this conversation unresolved until you know what you want to do then.

// of the XAResource than the one which is used during 2PC processing
private static final List<Xid> preparedXids = new ArrayList<>();
// this is an global in-VM storage as the recovery processing (XAResourceRecoveryHelper) creates a brand new XAResource
// and this recovery purpose resource needs to access the list of the prepared Xids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remark: I don't understand the comment but I do have access to the source code so I guess I can read that and ignore the comment.

transactions.recovery-period=Number of seconds to wait before launching next periodic recovery scan.
transactions.recovery-backoff-period=Number of seconds between the first (top-down) and the second (bottom-up) part of the periodic recovery scan.
transactions.orphan-safety-interval=Number of milliseconds which defines safety interval timeout during any participant is not considered as orphan during recovery processing. Orphaned participants are considered for rollback.
transactions.stop-recovery-when-suspended=When true the periodic recovery is stopped when the application server is suspended. When true the periodic recovery is *not* stopped.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can periodic recovery continue when the application server is suspended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the server is suspended but recovery may be processing. The incoming connections are not permitted but the outgoing should be possible.
I need to reconsider this as per the first comment. Probably the serve should shutdown the recovery when suspended and just there should be a chance to not permit suspending when there are in-doubt transactions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I will leave you to decide what the best approach is.

~
-->

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff between this and the previous version (wildfly-txn_6_0.xsd) is too large for me to review it completely (personally I'd have taken copy and just added the new definitions). But I did check that it contained definitions for
recovery-period, recovery-backoff-period and stop-recovery-when-suspended which looked okay apart from the confusing name for stop-recovery-when-suspended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed above. Change probably needed.

@ochaloup ochaloup force-pushed the WFLY-12610-wfly-recover-operation-for-narayana branch 2 times, most recently from ffa52a6 to 2b29502 Compare October 12, 2021 22:10
@ochaloup ochaloup force-pushed the WFLY-12610-wfly-recover-operation-for-narayana branch from 2b29502 to 2ccf03d Compare October 20, 2021 15:47
@ochaloup
Copy link
Contributor Author

I updated the code based of the points in the comment. Still missing to change the suspend handler to run recovery instead of not stopping it.

@mmusgrov
Copy link
Contributor

I am still okay to merge this PR.

@ochaloup ochaloup force-pushed the WFLY-12610-wfly-recover-operation-for-narayana branch from 2ccf03d to 1ace3b5 Compare October 21, 2021 13:05
@bstansberry
Copy link
Contributor

Superseded by #15382

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deps-ok Dependencies have been checked, and there are no significant changes Feature PR provides a new feature missing-reqs Features missing one or more of the pre-merge requirements
Projects
None yet
4 participants