No switchover candidate found #1992
Are both instances healthy? "no switchover candidate found" suggests that no healthy replica was found.
I'm not entirely sure whether the second one has successfully restarted (and is healthy) at that moment, but it is definitely healthy shortly afterwards. If I do a manual switchover using patronictl, everything works well. I haven't managed to get a successful switchover while updating; it fails every time.
I am also running into this issue, but I'm not sure it's exactly the same as the OP's. Like @dobrac, I can exec into the pod and force a switchover and all is well, but the operator fails to do the switchover when pods need to be evicted:
The cluster status from the URI mentioned is:
{
"members": [
{
"name": "gitlab-postgres-0",
"role": "replica",
"state": "streaming",
"api_url": "http://10.44.22.248:8008/patroni",
"host": "10.44.22.248",
"port": 5432,
"timeline": 63,
"lag": 0
},
{
"name": "gitlab-postgres-1",
"role": "leader",
"state": "running",
"api_url": "http://10.44.95.149:8008/patroni",
"host": "10.44.95.149",
"port": 5432,
"timeline": 63
},
{
"name": "gitlab-postgres-2",
"role": "replica",
"state": "streaming",
"api_url": "http://10.44.98.184:8008/patroni",
"host": "10.44.98.184",
"port": 5432,
"timeline": 63,
"lag": 0
}
]
}
This is with Postgres Operator 1.10.1, and it looks like the …
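In the cluster status above, both replicas report the state "streaming" rather than "running", which matches the Patroni change mentioned later in the thread. A minimal Go sketch of candidate selection that accepts both states follows. It is not the operator's actual implementation: the types, the helper pickSwitchoverCandidate, and the assumption that Patroni reports a synchronous replica with role "sync_standby" are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterMember holds the fields used here from one "members" entry.
type clusterMember struct {
	Name  string `json:"name"`
	Role  string `json:"role"`
	State string `json:"state"`
}

type clusterStatus struct {
	Members []clusterMember `json:"members"`
}

// Assumption based on this thread: newer Patroni reports "streaming" where
// older versions reported "running", so both count as healthy.
var healthyReplicaStates = map[string]bool{"running": true, "streaming": true}

// pickSwitchoverCandidate (hypothetical) skips leaders and unhealthy members,
// prefers a member with role "sync_standby" (assumed role name), and
// otherwise falls back to the first healthy replica.
func pickSwitchoverCandidate(status clusterStatus) (string, bool) {
	var fallback string
	for _, m := range status.Members {
		if m.Role == "leader" || m.Role == "standby_leader" || !healthyReplicaStates[m.State] {
			continue
		}
		if m.Role == "sync_standby" {
			return m.Name, true
		}
		if fallback == "" {
			fallback = m.Name
		}
	}
	return fallback, fallback != ""
}

func main() {
	raw := `{"members": [
		{"name": "gitlab-postgres-0", "role": "replica", "state": "streaming"},
		{"name": "gitlab-postgres-1", "role": "leader", "state": "running"},
		{"name": "gitlab-postgres-2", "role": "replica", "state": "streaming"}
	]}`
	var status clusterStatus
	if err := json.Unmarshal([]byte(raw), &status); err != nil {
		panic(err)
	}
	name, ok := pickSwitchoverCandidate(status)
	fmt.Println(name, ok) // gitlab-postgres-0 true
}
```

With a filter that accepted only "running", every replica above would be skipped and no candidate would be returned, which is consistent with the error in this issue. The same holds for the two-instance case from the original report: the only replica is the synchronous one, so rejecting its state leaves nothing to switch over to.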
Thank you @bootc for picking this up and providing a solution 👍 It seems to trace back to a change introduced in patroni/patroni@d46ca88. We'll aim to merge the fix next week.
Thank you @bootc for your contribution!
We still see this problem with operator version v1.11.0 and the "ghcr.io/zalando/spilo-15:3.2-p1" image, where the patronictl list output shows "streaming" instead of "running" in the state column. Going back to the "ghcr.io/zalando/spilo-15:3.0-p1" image solves the problem.
Edit: Never mind, we updated the operator chart but had explicitly set the old operator image.
Hello,
we are encountering an error when updating the cluster (any configuration change that requires a pod restart): the master is not correctly switched over to the synchronous replica. We have 2 instances, one master and one synchronous replica.
Here is the log from the operator; the problem occurs every time:
Any idea what could be wrong? It could well be a configuration mistake (maybe three instances are the minimum?), but I can't figure out what it is.