
[WFL-Operator-100] adding a way to report heuristic transaction state during scaledown #295

Open · wants to merge 1 commit into main

Conversation

ochaloup (Contributor)
When the transaction scaledown process reviews the state of transactions in the pods, it checks whether there are any unfinished prepared transactions.
If some are found, the pod is not permitted to terminate and is marked with the _state_ `SCALING_DOWN_RECOVERY_DIRTY`.
It stays marked as dirty until all transactions are safely recovered.
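The check described above can be sketched in Go (the operator's language). This is a minimal, self-contained illustration, not the operator's actual code: the `podTxnInfo` type, the `scaleDownState` function, and the `SCALING_DOWN_CLEAN` state name are assumptions; only `SCALING_DOWN_RECOVERY_DIRTY` comes from this discussion.

```go
package main

import "fmt"

// Pod states used by the scaledown logic. SCALING_DOWN_RECOVERY_DIRTY is the
// state discussed in this PR; the clean-state name is illustrative.
const (
	stateScalingDownDirty = "SCALING_DOWN_RECOVERY_DIRTY"
	stateScalingDownClean = "SCALING_DOWN_CLEAN" // illustrative name
)

// podTxnInfo is a simplified stand-in for what the operator learns by
// probing a pod's transaction subsystem during scaledown.
type podTxnInfo struct {
	Name                  string
	UnfinishedPreparedTxn int // count of in-doubt (prepared) transactions
}

// scaleDownState decides whether a pod may be terminated during scaledown:
// a pod with unfinished prepared transactions is kept alive and marked dirty.
func scaleDownState(p podTxnInfo) (state string, allowTermination bool) {
	if p.UnfinishedPreparedTxn > 0 {
		return stateScalingDownDirty, false
	}
	return stateScalingDownClean, true
}

func main() {
	s, ok := scaleDownState(podTxnInfo{Name: "eap-app-1", UnfinishedPreparedTxn: 2})
	fmt.Println(s, ok) // SCALING_DOWN_RECOVERY_DIRTY false
}
```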
Reviewer:
Is a pod fully working in the dirty state? Is the load balancer forwarding traffic to such a pod, and is it handling requests?
What happens when I scale back up to the original number of pods? I understand the dirty state of the pod is not changed.
Can I scale up to a bigger number of pods?

I am thinking of a use case where an autoscaler is involved, driven by current cluster utilization, so pods get scaled down and up continually. What happens in that situation?

ochaloup (Contributor Author)

This summarizes how the current functionality already works.

  • When in the dirty state the pod is not capable of starting a new transaction, but it otherwise works. There is an issue to manage this in a better way: https://issues.redhat.com/browse/EAP7-1355
  • No, the load balancer does not forward traffic to the pod.
  • The pod should be moved back to the state ACTIVE. The pod is no longer dirty, since it is operating normally, and a normally operating pod may process recovery.
  • Yes, you can scale up as you wish. Scaledown processing does not touch it.
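The scale-up behavior described above (a dirty pod that is no longer scheduled for termination returns to ACTIVE) can be sketched as follows. The `reconcilePodState` function name is an assumption for illustration; only the state names `SCALING_DOWN_RECOVERY_DIRTY` and `ACTIVE` come from this discussion.

```go
package main

import "fmt"

// reconcilePodState returns the next state for a pod, following the
// behavior described above: a SCALING_DOWN_RECOVERY_DIRTY pod that is no
// longer scheduled for termination (e.g. after scaling back up) becomes
// ACTIVE again; otherwise the current state is kept.
func reconcilePodState(current string, scheduledForTermination bool) string {
	if current == "SCALING_DOWN_RECOVERY_DIRTY" && !scheduledForTermination {
		return "ACTIVE"
	}
	return current
}

func main() {
	// Scale-up: the pod is inside the desired replica count again.
	fmt.Println(reconcilePodState("SCALING_DOWN_RECOVERY_DIRTY", false)) // ACTIVE
	// Still scaling down: the pod stays dirty until recovery finishes.
	fmt.Println(reconcilePodState("SCALING_DOWN_RECOVERY_DIRTY", true)) // SCALING_DOWN_RECOVERY_DIRTY
}
```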

All of these situations should be covered by the EAP xPaaS testsuite.

I assume scaledown and scaleup processing at a faster pace should work fine. It could just happen that a pod is never really scaled down, since scaling down takes time. That needs to be checked.

The goal of this feature change is for the WildFly Operator to better report heuristic transactions to the user/administrator.

* A new _pod state_ should be created. This _state_ would mean there are some transactions in a heuristic state which need manual recovery (because otherwise the pod is never terminated).
* When the pod is marked with this heuristic _state_, an event about it should be emitted (the event should be bound to the pod and to the operator object as well). The event announces that a pod in the scaledown process requires manual assistance.
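The proposed event emission could look roughly like the following sketch. In the real operator this would go through client-go's `EventRecorder`; here a minimal stand-in `event` type is used, and the `WildFlyServer` resource name, the `ScaledownHeuristic` reason string, and the `emitHeuristicEvents` function are all illustrative assumptions.

```go
package main

import "fmt"

// event is a simplified stand-in for a Kubernetes corev1.Event.
type event struct {
	InvolvedObject string // the object the event is bound to
	Reason         string
	Message        string
}

// emitHeuristicEvents produces one event bound to the dirty pod and one
// bound to the operator's custom resource, as proposed above.
func emitHeuristicEvents(podName, operatorCR string) []event {
	msg := fmt.Sprintf(
		"pod %s is in SCALING_DOWN_RECOVERY_DIRTY: heuristic transactions require manual recovery",
		podName)
	return []event{
		{InvolvedObject: "Pod/" + podName, Reason: "ScaledownHeuristic", Message: msg},
		{InvolvedObject: "WildFlyServer/" + operatorCR, Reason: "ScaledownHeuristic", Message: msg},
	}
}

func main() {
	for _, e := range emitHeuristicEvents("eap-app-1", "eap-app") {
		fmt.Printf("%s: %s: %s\n", e.InvolvedObject, e.Reason, e.Message)
	}
}
```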
Reviewer:

I am thinking about how to notify the user that something is going on. As an administrator, it would be nice to have an OpenShift alert fired; I don't know how feasible that is, though.

With the new state and event, the administrator must be notified that something needing their attention has happened. In case of a manual scaledown this can be included in their operational steps, but in case of an autoscaler they must be notified somehow.

ochaloup (Contributor Author)

I don't know of any other Kubernetes primitives that could be used to announce that something has happened. I only know about events, so they are expected to be used here.
Do you have any other ideas on how to notify the user?

Reviewer:

I meant providing an alert rule, but as I read in [1], "Currently you cannot add custom alerting rules." So it is up to administrators to add the pod state to whatever monitoring they are using.

[1] https://docs.openshift.com/container-platform/4.3/monitoring/cluster_monitoring/configuring-the-monitoring-stack.html#applying-custom-alertmanager-configuration_configuring-monitoring

== Test Plan

The tests are meant to automate the checks for the newly created state and counter, and to verify that the flag works.
The EAP QE testsuite for OpenShift should be used for this: https://gitlab.mw.lab.eng.bos.redhat.com/jbossqe-eap/openshift-eap-tests
Reviewer:

Could you remove the internal URL, please?

@ochaloup force-pushed the WFLY-OPERATOR-100-report-better-heuristic-txn-state branch from f0b421a to 6d7943e on March 19, 2020.