
[WFL-Operator-100] adding a way to report heuristic transaction state during scaledown #295

Open · wants to merge 1 commit into main

Conversation

ochaloup (Contributor)
When the transaction scaledown process reviews the state of transactions in the pods, it checks whether there are any unfinished prepared transactions.
If some are found, the pod is not permitted to terminate and is marked with the _state_ `SCALING_DOWN_RECOVERY_DIRTY`.
It stays marked as dirty until all transactions are safely recovered.
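The check described above can be sketched in Go (the operator's language). This is a minimal, self-contained illustration, not the operator's actual code: the `podTxnInfo` type, the `scaleDownState` function, and the `SCALING_DOWN_CLEAN` state name are assumptions; only `SCALING_DOWN_RECOVERY_DIRTY` comes from this discussion.

```go
package main

import "fmt"

// Pod states used by the scaledown logic. SCALING_DOWN_RECOVERY_DIRTY is the
// state discussed in this PR; the clean-state name is illustrative.
const (
	stateScalingDownDirty = "SCALING_DOWN_RECOVERY_DIRTY"
	stateScalingDownClean = "SCALING_DOWN_CLEAN" // illustrative name
)

// podTxnInfo is a simplified stand-in for what the operator learns by
// probing a pod's transaction subsystem during scaledown.
type podTxnInfo struct {
	Name                  string
	UnfinishedPreparedTxn int // count of in-doubt (prepared) transactions
}

// scaleDownState decides whether a pod may be terminated during scaledown:
// a pod with unfinished prepared transactions is kept alive and marked dirty.
func scaleDownState(p podTxnInfo) (state string, allowTermination bool) {
	if p.UnfinishedPreparedTxn > 0 {
		return stateScalingDownDirty, false
	}
	return stateScalingDownClean, true
}

func main() {
	s, ok := scaleDownState(podTxnInfo{Name: "eap-app-1", UnfinishedPreparedTxn: 2})
	fmt.Println(s, ok) // SCALING_DOWN_RECOVERY_DIRTY false
}
```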
Reviewer:
Is a pod fully working in the dirty state? Is the load balancer forwarding traffic to such a pod, and is it handling requests?
What happens when I scale back up to the original number of pods? I understand the dirty state of the pod is not changed.
Can I scale up to a bigger number of pods?

I am thinking of a use case where an autoscaler is involved, driven by current cluster utilization, so pods get scaled down and up continually. What happens in that situation?

ochaloup (Contributor Author)

This summarizes how the current functionality already works.

  • When in the dirty state the pod is not capable of starting a new transaction, but it otherwise works. There is an issue to manage this in a better way: https://issues.redhat.com/browse/EAP7-1355
  • No, the load balancer does not forward traffic to the pod.
  • The pod should be moved back to the state ACTIVE. The pod is no longer dirty, since it is operating normally, and a normally operating pod may process recovery.
  • Yes, you can scale up as you wish. Scaledown processing does not touch it.
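The scale-up behavior described above (a dirty pod that is no longer scheduled for termination returns to ACTIVE) can be sketched as follows. The `reconcilePodState` function name is an assumption for illustration; only the state names `SCALING_DOWN_RECOVERY_DIRTY` and `ACTIVE` come from this discussion.

```go
package main

import "fmt"

// reconcilePodState returns the next state for a pod, following the
// behavior described above: a SCALING_DOWN_RECOVERY_DIRTY pod that is no
// longer scheduled for termination (e.g. after scaling back up) becomes
// ACTIVE again; otherwise the current state is kept.
func reconcilePodState(current string, scheduledForTermination bool) string {
	if current == "SCALING_DOWN_RECOVERY_DIRTY" && !scheduledForTermination {
		return "ACTIVE"
	}
	return current
}

func main() {
	// Scale-up: the pod is inside the desired replica count again.
	fmt.Println(reconcilePodState("SCALING_DOWN_RECOVERY_DIRTY", false)) // ACTIVE
	// Still scaling down: the pod stays dirty until recovery finishes.
	fmt.Println(reconcilePodState("SCALING_DOWN_RECOVERY_DIRTY", true)) // SCALING_DOWN_RECOVERY_DIRTY
}
```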

All of these situations should be covered by the EAP xPaaS testsuite.

I assume scaledown and scaleup processing at a faster pace should work fine. It could just happen that a pod is never really scaled down, since scaling down takes time. That needs to be checked.

The goal of this feature change is for the WildFly Operator to better report heuristic transactions to the user/administrator.

* A new _pod state_ should be created. This _state_ would mean there are some transactions in a heuristic state which need manual recovery (because otherwise the pod is never terminated).
* When the pod is marked with this heuristic _state_, an event about it should be emitted (the event should be bound to the pod and to the operator object as well). The event announces that a pod in the scaledown process requires manual assistance.
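The proposed event emission could look roughly like the following sketch. In the real operator this would go through client-go's `EventRecorder`; here a minimal stand-in `event` type is used, and the `WildFlyServer` resource name, the `ScaledownHeuristic` reason string, and the `emitHeuristicEvents` function are all illustrative assumptions.

```go
package main

import "fmt"

// event is a simplified stand-in for a Kubernetes corev1.Event.
type event struct {
	InvolvedObject string // the object the event is bound to
	Reason         string
	Message        string
}

// emitHeuristicEvents produces one event bound to the dirty pod and one
// bound to the operator's custom resource, as proposed above.
func emitHeuristicEvents(podName, operatorCR string) []event {
	msg := fmt.Sprintf(
		"pod %s is in SCALING_DOWN_RECOVERY_DIRTY: heuristic transactions require manual recovery",
		podName)
	return []event{
		{InvolvedObject: "Pod/" + podName, Reason: "ScaledownHeuristic", Message: msg},
		{InvolvedObject: "WildFlyServer/" + operatorCR, Reason: "ScaledownHeuristic", Message: msg},
	}
}

func main() {
	for _, e := range emitHeuristicEvents("eap-app-1", "eap-app") {
		fmt.Printf("%s: %s: %s\n", e.InvolvedObject, e.Reason, e.Message)
	}
}
```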
Reviewer:

I am thinking about how to notify the user that something is going on. As an administrator, it would be nice to have an OpenShift alert fired; I don't know how feasible that is, though.

With the new state and event, the administrator must be notified that something needing their attention has happened. In case of a manual scaledown this can be included in their operational steps, but in case of an autoscaler they must be notified somehow.

ochaloup (Contributor Author)

I don't know of any other Kubernetes primitives that could be used to announce that something has happened. I only know about events, so they are expected to be used here.
Do you have any other ideas on how to notify the user?

Reviewer:

I meant providing an alert rule, but as I read in [1], "Currently you cannot add custom alerting rules." So it is up to administrators to add the pod state to whatever monitoring they are using.

[1] https://docs.openshift.com/container-platform/4.3/monitoring/cluster_monitoring/configuring-the-monitoring-stack.html#applying-custom-alertmanager-configuration_configuring-monitoring

== Test Plan

The tests are meant to automate the checks for the newly created state and counter, and to verify that the flag works.
The EAP QE testsuite for OpenShift should be used for this: https://gitlab.mw.lab.eng.bos.redhat.com/jbossqe-eap/openshift-eap-tests
Reviewer:

Could you remove the internal URL, please?

@ochaloup force-pushed the WFLY-OPERATOR-100-report-better-heuristic-txn-state branch from f0b421a to 6d7943e on March 19, 2020.