Add basic retry around switchover #1510
Merged
The operator should retry switchovers before giving up on moving master pods off non-ready nodes.
That is currently not the case: the operator attempts to move the master pods once and then leaves them as is, potentially blocking k8s cluster-wide processes such as node rotation. Retries avoid some of that blocking, namely the cases where a replica was moved shortly before the master and is not yet ready at the time of the operator's first switchover attempt.
To test, build and start the operator and one PG cluster in `kind` as normal. Then: [...] `nofailover` tag [...]

The operator will log unsuccessful attempts to do a switchover at 1-minute intervals for 5 minutes.