
databases keep restarting after the upgrade from 5.3 to 5.7.2 #4069

Open
@batulziiy

Description


Questions

I upgraded postgres-operator from 5.2 to 5.7.2 yesterday, and it seemed to be working fine afterwards: I was able to connect to the database and run queries. However, this morning I found that the clusters keep restarting after running normally for ~1-2 minutes. I performed the exact same upgrade in two different environments; one works fine while the other keeps restarting.

The only difference I found is the Kubernetes version: the cluster running v1.26.3+k3s1 works fine, while the issue occurs on the cluster running v1.25.4+k3s1. I'm not sure whether that makes much of a difference.

In the pgo pod log, I can see the following error:

time="2025-01-13T09:11:21Z" level=error msg="Reconciler error" PostgresCluster=postgres-operator/hippo controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster error="Operation cannot be fulfilled on Pod \"hippo-instance1-f9rv-0\": the ResourceVersion in the precondition (1043902053) does not match the ResourceVersion in record (1043902088). The object might have been modified" file="internal/controller/postgrescluster/instance.go:879" func="postgrescluster.(*Reconciler).rolloutInstance" name=hippo namespace=postgres-operator reconcileID=6f692192-9b74-4921-891e-ac91a86b00ea version=5.7.2-0

Also, in the database pod log I see exit code 137, which is a bit strange since I have enough memory available on the cluster.

2025-01-13 09:19:49.116 UTC [95] LOG:  received SIGHUP, reloading configuration files
2025-01-13 09:19:59.121 UTC [95] LOG:  received SIGHUP, reloading configuration files
2025-01-13 09:20:09.127 UTC [95] LOG:  received SIGHUP, reloading configuration files
2025-01-13 09:20:14.758 UTC [98] LOG:  checkpoint complete: wrote 90256 buffers (2.9%); 0 WAL file(s) added, 0 removed, 58 recycled; write=159.616 s, sync=0.007 s, total=159.711 s; sync files=70, longest=0.004 s, average=0.001 s; distance=695145 kB, estimate=695145 kB
2025-01-13 09:20:14.759 UTC [98] LOG:  checkpoint starting: immediate force wait
2025-01-13 09:20:15.001 UTC [98] LOG:  checkpoint complete: wrote 11836 buffers (0.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.233 s, sync=0.005 s, total=0.243 s; sync files=10, longest=0.003 s, average=0.001 s; distance=100283 kB, estimate=635659 kB
2025-01-13 09:20:15.417 UTC [95] LOG:  received fast shutdown request
2025-01-13 09:20:15.418 UTC [95] LOG:  aborting any active transactions
2025-01-13 09:20:15.418 UTC [752] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [211] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [210] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [181] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [187] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [179] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [175] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [177] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [173] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [169] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [171] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [167] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [161] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [115] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.418 UTC [163] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.419 UTC [165] FATAL:  terminating connection due to administrator command
2025-01-13 09:20:15.420 UTC [95] LOG:  background worker "logical replication launcher" (PID 131) exited with exit code 1
2025-01-13 09:20:15.427 UTC [98] LOG:  shutting down
2025-01-13 09:20:15.456 UTC [98] LOG:  checkpoint starting: shutdown immediate
2025-01-13 09:20:15.477 UTC [98] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.015 s, sync=0.001 s, total=0.022 s; sync files=0, longest=0.000 s, average=0.000 s; distance=7386 kB, estimate=572832 kB
2025-01-13 09:20:16.278 UTC [95] LOG:  database system is shut down
command terminated with exit code 137
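From what I understand, exit code 137 means the container received SIGKILL (128 + 9), which is usually either an OOM kill or the kubelet force-killing the container after the termination grace period when the pod is deleted. Since the log above shows Postgres handling a fast shutdown and reporting "database system is shut down" before the 137, I suspect the pod deletion during the operator's rollout rather than an OOM, but to rule out an OOM kill I intend to check the container's last terminated state with something like this (pod and container names from my cluster; adjust as needed):

  # Reason/exit code of the previous termination of the "database" container
  kubectl -n postgres-operator get pod hippo-instance1-f9rv-0 \
    -o jsonpath='{.status.containerStatuses[?(@.name=="database")].lastState.terminated}'

  # Same information in human-readable form
  kubectl -n postgres-operator describe pod hippo-instance1-f9rv-0 | grep -A 6 'Last State'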

Has anyone had the same experience? I would appreciate it if you could share your thoughts here. Thanks.

Environment

Please provide the following details:

  • Platform: Kubernetes
  • Platform Version: v1.25.4+k3s1
  • PGO Image Tag: postgres-operator:ubi8-5.7.2-0
  • Postgres Version: 15
  • Storage: Longhorn
