Please, answer some short questions which should help us to understand your problem / question better?
Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.9.0
Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8s
Are you running Postgres Operator in production? yes
Type of issue? Bug report
We have installed postgres-operator 1.9.0 via helm with default values and a cluster with synchronous_mode: true and numberOfInstances: 2.
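For reference, a minimal sketch of the kind of manifest we deploy (cluster name, teamId, Postgres version and volume size below are illustrative assumptions; the relevant settings are numberOfInstances and synchronous_mode):

```yaml
# Minimal sketch of the cluster manifest, assuming default operator settings.
# Only numberOfInstances and synchronous_mode matter for this report.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: postgres-testdb
  namespace: default
spec:
  teamId: "postgres"       # illustrative team id
  numberOfInstances: 2
  postgresql:
    version: "14"          # assumption, any supported version
  volume:
    size: 1Gi              # assumption
  patroni:
    synchronous_mode: true
```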
On initial deployment everything is okay. postgres-testdb-0 is the leader, postgres-testdb-1 is following.
When we force a cluster update, e.g. by updating podAnnotations (foo: "2", see the snippet after this list), the following happens:
- kubectl get postgresqls shows Updating
- postgres-testdb-1 is restarted
- kubectl get postgresqls shows UpdateFailed
- kubectl describe postgresql shows: Switchover from "postgres-testdb-1" to "default/postgres-testdb-0" FAILED: could not switch over from "postgres-testdb-1" to "default/postgres-testdb-0": patroni returned 'Failover failed'
- postgres-testdb-0 is not restarted
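The change that triggers the rolling update is nothing more than bumping an arbitrary pod annotation in the manifest, roughly like this (annotation key and value are arbitrary):

```yaml
# Excerpt from the same manifest; bumping the value (e.g. "1" -> "2") makes the
# operator recreate the pods one by one, which in turn triggers the switchover.
spec:
  podAnnotations:
    foo: "2"
```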
Combined log of postgres-operator ("time=") and leader pod:
time="2023-03-27T14:32:27Z" level=info msg="there are 2 pods in the cluster to recreate"
2023-03-27 14:32:33,814 INFO: Updating synchronous privilege temporarily from ['postgres-testdb-1'] to []
time="2023-03-27T14:32:52Z" level=info msg="pod \"default/postgres-testdb-1\" has been recreated
time="2023-03-27T14:32:52Z" level=debug msg="making POST http request: http://10.147.180.36:8008/failover"
2023-03-27 14:32:52,621 WARNING: Failover candidate=postgres-testdb-1 does not match with sync_standbys=None
2023-03-27 14:32:52,621 WARNING: manual failover: members list is empty
2023-03-27 14:32:52,621 WARNING: manual failover: no healthy members found, failover is not possible
2023-03-27 14:32:52,669 INFO: Assigning synchronous standby status to ['postgres-testdb-1']
2023-03-27 14:32:54,732 INFO: Synchronous standby status assigned to ['postgres-testdb-1']
It looks to me like postgres-operator initiates the failover to the (only possible) replica while the restarted replica has not yet been re-assigned synchronous standby status; according to the log this only happens about two seconds too late.
For more extensive logs and the cluster yaml see https://gist.github.com/koelnconcert/30f541aee49b0de163faeefa1bce74a7
Remarks:
- this worked absolutely fine with postgres-operator 1.6.3
- with postgres-operator 1.8.0 we see a very similar (probably the same) issue
- the release notes of 1.7.0 mention "Add basic retry around switchover"; we do not see such a retry here, but we are not sure whether that feature is even meant to apply to this situation
- when using numberOfInstances: 3 we do not face this issue