restart instances via rest api instead of recreating pods, fixes bug with being unable to decrease some values, like max_connections #1103

yanchenko-igor · 2020-08-12T13:57:49Z

This PR is intended to solve an issue with decreasing values like max_connections. At the moment, a decreased value is not applied after the pod restarts unless the instance is restarted via the API. When we compare statefulsets, we now ignore the bootstrap.dcs section, so pods are not restarted after changing parameters that matter only at cluster bootstrap.
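
For illustration, the comparison described above could look roughly like the sketch below. It is written in Python for brevity (the operator itself is written in Go) and assumes that SPILO_CONFIGURATION is a JSON document with a top-level bootstrap object containing a dcs key. The names strip_bootstrap_dcs and spilo_config_differs are hypothetical and are not the operator's actual functions.

```python
import copy
import json


def strip_bootstrap_dcs(config: dict) -> dict:
    """Return a copy of the config without the bootstrap.dcs section."""
    stripped = copy.deepcopy(config)
    bootstrap = stripped.get("bootstrap")
    if isinstance(bootstrap, dict):
        bootstrap.pop("dcs", None)
    return stripped


def spilo_config_differs(old_json: str, new_json: str) -> bool:
    """Compare two SPILO_CONFIGURATION values, ignoring bootstrap.dcs.

    Assumption: both values are JSON strings. A difference that exists only
    under bootstrap.dcs is not reported, so it would not trigger a pod restart.
    """
    old = strip_bootstrap_dcs(json.loads(old_json))
    new = strip_bootstrap_dcs(json.loads(new_json))
    return old != new
```

With this approach, a change to max_connections under bootstrap.dcs would have to be applied through Patroni rather than by recreating the pod.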

yanchenko-igor · 2020-08-18T06:20:49Z

Can this PR be reviewed? If there is another way to fix the bug, let's discuss it.

sdudoladov · 2020-08-18T07:35:02Z

hello @yanchenko-igor

I will have a look today

Jan-M · 2020-08-18T08:53:54Z

pkg/cluster/k8sres.go

I wonder whether this list should live in the config map, with the current list as the hard-coded default. The Spilo image may change, and I am not sure that always warrants a new operator build.

I am not sure I understand what you mean. This list is what @CyberDem0n gave me when we discussed how it should work.

@yanchenko-igor could you share a link to the discussion?

@sdudoladov it was a private discussion in Slack.

Jan-M · 2020-08-18T08:55:07Z

Thanks for chipping in on this important issue.

For such a change to rolling upgrades, it may be worth adding an end-to-end test, or confirming that an existing one already covers the respective changes.

sdudoladov

Please also comment in the admin docs on when restarts should happen instead of pod re-creation

e2e/tests/test_e2e.py

pkg/cluster/cluster.go

pkg/cluster/cluster_test.go

sdudoladov · 2020-08-24T12:34:20Z

pkg/cluster/k8sres.go

@yanchenko-igor could you share a link to the discussion ?

pkg/cluster/sync.go

CyberDem0n · 2020-08-27T08:45:25Z

pkg/cluster/sync.go

Let's say that we changed work_mem from '4MB' to '16MB'.
Patroni will update postgresql.conf and do pg_ctl reload.
That doesn't require a restart.

How can we distinguish parameters that require a restart from those that don't?

You have to wait for a while (ttl seconds should be enough), and call the Patroni API on every pod:

$ curl -s http://localhost:8008/patroni | jq .
{
  "pending_restart": true,
  ...

If the pending_restart flag is set, postgres in this pod must be restarted.

Also, the /restart endpoint expects a JSON object in which you can specify {"restart_pending":true}, so that Patroni restarts postgres only if it is required.

Waiting for ttl seconds is mandatory.

Added {"restart_pending":true} and waiting after restart.
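
The flow discussed in this thread can be sketched as follows. This is a minimal Python illustration, not the operator's Go implementation. It assumes the Patroni REST API is reachable at a base URL such as http://localhost:8008, and the helper names get_json, post_json and restart_if_pending are hypothetical. The HTTP helpers are passed in as parameters so the logic can be exercised without a running Patroni.

```python
import json
import time
import urllib.request


def get_json(url: str) -> dict:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def post_json(url: str, payload: dict) -> int:
    """POST a JSON payload and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def restart_if_pending(base_url, ttl, get=get_json, post=post_json, sleep=time.sleep):
    """Wait ttl seconds, then restart postgres only if Patroni reports pending_restart.

    Returns True if a restart was requested, False otherwise.
    """
    sleep(ttl)  # give Patroni time to pick up and apply the new configuration
    status = get(base_url + "/patroni")
    if not status.get("pending_restart", False):
        return False  # a reload was enough, e.g. after changing work_mem
    # restart_pending asks Patroni to restart only when a restart is pending
    post(base_url + "/restart", {"restart_pending": True})
    return True
```

An operator would call something like restart_if_pending for every pod of the cluster. How failures of the /restart call are handled is left out of this sketch.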

yanchenko-igor · 2020-09-09T14:39:55Z

Hello, I need help with the e2e tests. I don't understand why they fail. Can someone support me? The failures seem to be unrelated to my changes.

Jan-M · 2020-09-09T15:27:01Z

Hi! Right now you should not spend time on the e2e tests. We are aware that they are not as reliable as we want them to be, and the feedback time is also really slow.

We are looking into this in a separate PR. Let's continue once we have this sorted out.

FxKu · 2021-06-07T13:25:54Z

Did some local tests and it works. I had to ensure that other custom changes to the config are not overwritten by the restart. However, I noticed that a restart happens on each sync if parameters are defined. I think checkAndSetGlobalPostgreSQLConfiguration should read the config, compare it with the existing parameters in the manifest, and only add parameters to optionsToSet if they differ.

FxKu · 2021-06-09T10:59:58Z

@yanchenko-igor with pending_restart an actual restart will not happen on each sync, but still seeing the messages in the logs and events makes it quite confusing:

Events:
  Type    Reason  Age                  From               Message
  ----    ------  ----                 ----               -------
  Normal  Update  119s (x79 over 44h)  postgres-operator  restarting Postgres server within pods
  Normal  Update  87s (x79 over 44h)   postgres-operator  Postgres server restart done - all instances have been restarted

I think it would be better to read the config from Patroni and compare it with what's in the manifest under postgresql.parameters.
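
The comparison suggested here could be sketched as below, in Python for brevity. The function name parameters_to_set is hypothetical. It assumes the manifest's postgresql.parameters and the parameters read from Patroni are both available as flat dictionaries, and it compares values as strings on the assumption that the manifest holds strings while Patroni's config may hold numbers.

```python
def parameters_to_set(desired: dict, current: dict) -> dict:
    """Return only the manifest parameters that are missing from, or differ
    from, the configuration currently held by Patroni."""
    return {
        name: value
        for name, value in desired.items()
        # compare as strings so that "50" in the manifest matches 50 in Patroni
        if name not in current or str(current[name]) != str(value)
    }
```

If the result is empty, nothing needs to be patched, and no restart message or event would be emitted for that sync.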

FxKu · 2021-06-09T14:14:59Z

@yanchenko-igor Before you start working on it again: maybe we will merge the PR now as is, because I have to build on it anyway and could then refactor that part.

FxKu · 2021-06-14T08:48:35Z

👍

sdudoladov · 2021-06-14T08:59:17Z

👍

FxKu · 2021-06-14T09:02:09Z

Thanks again @yanchenko-igor for this important contribution and sorry it took so long to merge. Thanks for your patience and responsiveness.

yanchenko-igor marked this pull request as ready for review August 13, 2020 14:46

yanchenko-igor requested review from CyberDem0n, FxKu, Jan-M, RafiaSabih, avaczi, erthalion and sdudoladov as code owners August 13, 2020 14:46

yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from 0b79c26 to 2965f42 Compare August 14, 2020 09:15

Jan-M reviewed Aug 18, 2020

Jan-M added the enhancement label Aug 18, 2020

yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from ace7e05 to 9afd724 Compare August 18, 2020 15:32

sdudoladov reviewed Aug 18, 2020

3 review comments on e2e/tests/test_e2e.py (outdated, resolved)

sdudoladov reviewed Aug 18, 2020

1 review comment on e2e/tests/test_e2e.py (outdated, resolved)

yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch 2 times, most recently from b8b7b15 to 4316988 Compare August 19, 2020 05:59

yanchenko-igor requested review from Jan-M and sdudoladov August 19, 2020 06:57

sdudoladov reviewed Aug 24, 2020

yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from dffb061 to 2b0ce99 Compare August 25, 2020 11:34

CyberDem0n reviewed Aug 27, 2020

yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from 2b0ce99 to 853c011 Compare August 27, 2020 13:36

yanchenko-igor requested review from CyberDem0n and sdudoladov September 1, 2020 13:47

yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from 853c011 to 4fcb754 Compare September 3, 2020 12:38

yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from cad63c9 to 7e28c6c Compare March 4, 2021 06:49

FxKu modified the milestones: 1.7, 1.8 Mar 26, 2021

yanchenko-igor added 16 commits June 7, 2021 16:17

restart instances via rest api instead of recreating pods

da2993e

Ignore differences in bootstrap.dcs when compare SPILO_CONFIGURATION

94b64c2

isBootstrapOnlyParameter is rewritten, instead of whitelist it uses blacklist

9012a96

added tests

4e2586b

style fix

3620e63

added e2e test for max_connections decreasing

cad8833

test fixed

61d542d

documentation updated

ffcf68b

review fixes

0dab347

pending_restart flag added to restart api call, wait for ttl seconds after restart

8c27f8d

code coverage increased

0f72ad4

refactoring, /restart returns error if pending_restart is set to true and patroni is not pending restart

754e313

restart postgresql instances within pods only if pod's restart is not required

85946a0

patroni might need to restart postgresql after pods were recreated if values like max_connections decreased

6ea6b77

instancesRestart is not critical, try to restart pods if not successful

cf4783d

cleanup

16661df

yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from 7e28c6c to 16661df Compare June 7, 2021 13:17

Merge branch 'master' into restart_instances_via_api_instead_of_pods

5a1b462

FxKu merged commit ebb3204 into zalando:master Jun 14, 2021

This was referenced Aug 23, 2021

postgres-operator is restarting pods while not necessary #501

Closed

Fix patroni config updates #513

Closed

6 participants