Closed
How to replicate:
- Start a new operator
- Create cluster with 2+ instances
- Trigger a switchover by changing the manifest
- Observe that the operator pod crashes in the middle of the rolling update
time="2022-04-25T16:29:56Z" level=debug msg="performing rolling update" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:29:56Z" level=info msg="there are 2 pods in the cluster to recreate" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:29:56Z" level=debug msg="subscribing to pod \"default/acid-minimal-cluster-1\"" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:30:21Z" level=info msg="pod \"default/acid-minimal-cluster-1\" has been recreated" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:30:21Z" level=debug msg="unsubscribing from pod \"default/acid-minimal-cluster-1\" events" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:30:21Z" level=debug msg="making GET http request: http://10.2.16.115:8008/cluster" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:30:21Z" level=debug msg="switching over from \"acid-minimal-cluster-0\" to \"default/acid-minimal-cluster-1\"" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:30:21Z" level=debug msg="making POST http request: http://10.2.16.115:8008/failover" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:30:21Z" level=debug msg="subscribing to pod \"default/acid-minimal-cluster-1\"" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:30:23Z" level=debug msg="successfully switched over from \"acid-minimal-cluster-0\" to \"default/acid-minimal-cluster-1\"" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:30:23Z" level=debug msg="unsubscribing from pod \"default/acid-minimal-cluster-1\" events" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:30:23Z" level=info msg="recreating old master pod \"default/acid-minimal-cluster-0\"" cluster-name=default/acid-minimal-cluster pkg=cluster
time="2022-04-25T16:30:23Z" level=debug msg="subscribing to pod \"default/acid-minimal-cluster-0\"" cluster-name=default/acid-minimal-cluster pkg=cluster
panic: send on closed channel
goroutine 196 [running]:
github.com/zalando/postgres-operator/pkg/cluster.(*Cluster).processPodEvent(0xc0005e0000, {0x1bdc360, 0xc002eb03c0})
/workspace/pkg/cluster/cluster.go:1039 +0x205
k8s.io/client-go/tools/cache.(*FIFO).Pop(0xc000143680, 0xc0008f7790)
/workspace/vendor/k8s.io/client-go/tools/cache/fifo.go:300 +0x1fc
github.com/zalando/postgres-operator/pkg/cluster.(*Cluster).processPodEventQueue(0xc0005e0000, 0x0)
/workspace/pkg/cluster/cluster.go:1056 +0x65
created by github.com/zalando/postgres-operator/pkg/cluster.(*Cluster).Run
/workspace/pkg/cluster/cluster.go:1047 +0x77
The operator pod comes back immediately and finishes the rolling update as expected (since it finds the corresponding annotations on the pods to be rotated). Looks similar to #342
The panic happens when the operator tries to delete the old master pod. The pod is not deleted, though. After restarting, the operator still finds the rolling update flag on one pod, meaning it wasn't replaced yet. The replacement of the replica then works as expected.
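For illustration, here is a minimal sketch of how a `send on closed channel` panic can arise when one code path closes a subscriber channel while another is still delivering events to it, and one possible way to guard against it. This is not the operator's actual code: `PodEvent`, `registry`, `subscribe`, `unsubscribe`, `deliver` and `unsafeSend` are hypothetical names, and the assumption is that subscribers are kept in a map protected by a read-write mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// PodEvent is a hypothetical stand-in for a pod event.
type PodEvent struct {
	PodName string
}

// registry is a hypothetical, simplified subscriber registry.
type registry struct {
	mu   sync.RWMutex
	subs map[string]chan PodEvent
}

func newRegistry() *registry {
	return &registry{subs: make(map[string]chan PodEvent)}
}

// subscribe registers a buffered channel for the given pod name.
func (r *registry) subscribe(pod string) <-chan PodEvent {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan PodEvent, 1)
	r.subs[pod] = ch
	return ch
}

// unsubscribe removes the map entry and closes the channel under the
// write lock, so no sender can still be looking at the channel.
func (r *registry) unsubscribe(pod string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if ch, ok := r.subs[pod]; ok {
		delete(r.subs, pod)
		close(ch)
	}
}

// deliver sends while holding the read lock, so the channel cannot be
// closed mid-send. The send is non-blocking to avoid holding the lock
// indefinitely. It returns false if there is no subscriber or the
// buffer is full.
func (r *registry) deliver(ev PodEvent) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ch, ok := r.subs[ev.PodName]
	if !ok {
		return false
	}
	select {
	case ch <- ev:
		return true
	default:
		return false
	}
}

// unsafeSend reproduces the failure mode in isolation: sending on a
// channel that has already been closed panics. The panic is recovered
// here only so that its message can be returned.
func unsafeSend() (msg string) {
	defer func() {
		if rec := recover(); rec != nil {
			msg = fmt.Sprint(rec)
		}
	}()
	ch := make(chan PodEvent, 1)
	close(ch)
	ch <- PodEvent{PodName: "acid-minimal-cluster-0"}
	return ""
}

func main() {
	fmt.Println("unsafe send:", unsafeSend())

	r := newRegistry()
	r.subscribe("acid-minimal-cluster-0")
	ev := PodEvent{PodName: "acid-minimal-cluster-0"}
	fmt.Println("delivered while subscribed:", r.deliver(ev))
	r.unsubscribe("acid-minimal-cluster-0")
	fmt.Println("delivered after unsubscribe:", r.deliver(ev))
}
```

The key point of the sketch is that closing the channel and looking it up for a send are serialized by the same lock. Whether this matches the race actually hit in `processPodEvent` would need to be confirmed against `pkg/cluster/cluster.go`.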