failover fails with synchronous_mode and 2 instances #2276

Closed
koelnconcert opened this issue Mar 28, 2023 · 3 comments · Fixed by #2278
@koelnconcert

Please answer some short questions which should help us to understand your problem / question better:

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.9.0
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8s
  • Are you running Postgres Operator in production? yes
  • Type of issue? Bug report

We installed postgres-operator 1.9.0 via Helm with default values and created a cluster with synchronous_mode: true and numberOfInstances: 2.
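
For reference, a minimal sketch of the manifest settings involved (field names assumed from the acid.zalan.do/v1 postgresql CRD as documented for the operator; the full manifest is in the gist linked further down):

# Sketch only: minimal postgresql manifest with the two settings relevant here.
kubectl apply -f - <<'EOF'
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: postgres-testdb
  namespace: default
spec:
  teamId: "postgres"          # cluster name is prefixed with the teamId
  numberOfInstances: 2
  volume:
    size: 1Gi
  postgresql:
    version: "15"
  patroni:
    synchronous_mode: true    # Patroni synchronous replication
EOF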

On initial deployment everything is okay. postgres-testdb-0 is the leader, postgres-testdb-1 is following.

When we force a cluster update, e.g. by updating podAnnotations (foo: "2"; a sample patch command follows the list below):

  1. kubectl get postgresqls shows Updating
  2. postgres-testdb-1 is restarted
  3. kubectl get postgresqls shows UpdateFailed
  4. kubectl describe postgresql shows

    Switchover from "postgres-testdb-1" to "default/postgres-testdb-0" FAILED: could not switch over from "postgres-testdb-1" to "default/postgres-testdb-0": patroni returned 'Failover failed'

  5. postgres-testdb-0 is not restarted
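
The patch we use to force such an update looks roughly like this (a sketch; podAnnotations is a plain string map under spec, resource name and namespace as above):

# Merge-patch the podAnnotations field to trigger the rolling update described above.
kubectl -n default patch postgresql postgres-testdb --type merge \
  -p '{"spec": {"podAnnotations": {"foo": "2"}}}'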

Combined log of the postgres-operator (lines starting with "time=") and the leader pod:

time="2023-03-27T14:32:27Z" level=info msg="there are 2 pods in the cluster to recreate"
2023-03-27 14:32:33,814 INFO: Updating synchronous privilege temporarily from ['postgres-testdb-1'] to []
time="2023-03-27T14:32:52Z" level=info msg="pod \"default/postgres-testdb-1\" has been recreated
time="2023-03-27T14:32:52Z" level=debug msg="making POST http request: http://10.147.180.36:8008/failover"
2023-03-27 14:32:52,621 WARNING: Failover candidate=postgres-testdb-1 does not match with sync_standbys=None
2023-03-27 14:32:52,621 WARNING: manual failover: members list is empty
2023-03-27 14:32:52,621 WARNING: manual failover: no healthy members found, failover is not possible
2023-03-27 14:32:52,669 INFO: Assigning synchronous standby status to ['postgres-testdb-1']
2023-03-27 14:32:54,732 INFO: Synchronous standby status assigned to ['postgres-testdb-1']

It looks to me like the postgres-operator initiates the failover to the (only possible) replica before the restarted replica has been re-assigned synchronous standby status; judging from the log, that assignment only completes about 2 seconds after the failover attempt.
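
For illustration, the sync state can be checked by hand against the Patroni REST API (IP and port taken from the operator debug log above; jq is only used for readability):

# List cluster members and their roles as Patroni sees them. A replica that is
# eligible for a failover in synchronous_mode should appear with role "sync_standby";
# right after the pod recreation it is still a plain "replica".
curl -s http://10.147.180.36:8008/cluster | jq '.members[] | {name, role, state}'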

For more extensive logs and cluster yaml see https://gist.github.com/koelnconcert/30f541aee49b0de163faeefa1bce74a7

Remarks:

  • this worked absolutely fine with postgres-operator 1.6.3
  • with postgres-operator 1.8.0 we have a very similar (probably the same) issue
  • from the release notes of 1.7.0: "Add basic retry around switchover". We do not see any retry here, but we are not sure whether that feature is even meant to apply in this situation (an illustration of the kind of retry we mean follows this list).
  • when using numberOfInstances: 3 we do not face this issue
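
To be explicit about the retry remark above: the following is only an illustration (not the operator's actual code) of retrying the failover call until Patroni accepts it, using the endpoint and candidate from the logs above:

# Illustration only: retry the manual failover a few times instead of giving up
# after the first 'Failover failed' response.
for i in 1 2 3; do
  curl -sf -XPOST http://10.147.180.36:8008/failover \
    -H 'Content-Type: application/json' \
    -d '{"candidate": "postgres-testdb-1"}' && break
  sleep 5
done
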
@kkrasnov1

I have the same problem.
I found the following workarounds:

  • wait for the operator to retry synchronization
  • restart the operator (example command below)
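
For the second workaround, the command looks like this (assuming the operator runs as a Deployment named postgres-operator in the default namespace; adjust to your Helm release):

# Restart the operator pod; on startup it re-syncs the cluster and retries the update.
kubectl -n default rollout restart deployment/postgres-operator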

FxKu added the bug label Mar 30, 2023
@FxKu (Member) commented Mar 30, 2023

Thanks for raising this, @koelnconcert and @kkrasnov1. It indeed looks like we do not wait long enough. I will try to fix it for the next release.

@kkrasnov1

I updated the operator to version 1.10.0, but the problem remains.
