Skip to content

Conversation

yanchenko-igor
Copy link
Contributor

@yanchenko-igor yanchenko-igor commented Aug 12, 2020

This PR is intended to solve an issue with values like max_connections being decreased, at the moment after the pod restarting the new decreased value doesn't apply unless instance is restarted by api. When we compare statefulsets we ignore bootstrap.dcs section so pod will not restart after changing parameters which matters only on cluster bootstrap.

@yanchenko-igor yanchenko-igor marked this pull request as ready for review August 13, 2020 14:46
@yanchenko-igor yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from 0b79c26 to 2965f42 Compare August 14, 2020 09:15
@yanchenko-igor
Copy link
Contributor Author

Can this PR be reviewed? if there other way to fix the bug let's discuss it.

@sdudoladov
Copy link
Member

hello @yanchenko-igor

I will have a look today

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if this list should not be in the config map, this could be the hard coded default. Spilo image may change and not sure this always warrants a new operator build

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand what you mean, this list is what @CyberDem0n gave me when we discussed how it should work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanchenko-igor could you share a link to the discussion ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sdudoladov it was private discussion in Slack.

@Jan-M
Copy link
Member

Jan-M commented Aug 18, 2020

Thanks for chipping in on this important issue.

For such a change to rolling upgrades maybe worth having an end 2 end test added or confirm there is one checking respective changes.

@yanchenko-igor yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from ace7e05 to 9afd724 Compare August 18, 2020 15:32
Copy link
Member

@sdudoladov sdudoladov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also comment in the admin docs on when restarts should happen instead of pod re-creation

@yanchenko-igor yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch 2 times, most recently from b8b7b15 to 4316988 Compare August 19, 2020 05:59
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanchenko-igor could you share a link to the discussion ?

@yanchenko-igor yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from dffb061 to 2b0ce99 Compare August 25, 2020 11:34
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say that we changed work_mem from '4MB' to '16MB'.
Patroni will update postgresql.conf and do pg_ctl reload.
That doesn't require a restart.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How we can distinguish parameters which require restart from those which don't?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to wait for a while (ttl seconds should be enough), and call Patroni API on every pod:

$ curl -s http://localhost:8008/patroni | jq .
{
  "pending_restart": true,

If there is pending_restart flag is set - postgres in this pod must be restarted.

Also, the /restart endpoint expects to get the json object, where you can specify {"restart_pending":true}, so Patroni will restart postgres only if it is required.

Waiting for ttl seconds is mandatory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added {"restart_pending":true} and waiting after restart.

@yanchenko-igor yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from 2b0ce99 to 853c011 Compare August 27, 2020 13:36
@yanchenko-igor yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from 853c011 to 4fcb754 Compare September 3, 2020 12:38
@yanchenko-igor
Copy link
Contributor Author

Hello, I need help with e2e tests, I don't understand why they fail, can someone support me? They seem to be unrelated to my changes.

@Jan-M
Copy link
Member

Jan-M commented Sep 9, 2020

Hi! Right now you should not spend time on the e2e test, we are aware of this not being as reliable as we want it to be and also the feedback time seems really slow.

We are looking into this in a separate PR, lets continue this once we have this sorted out.

@yanchenko-igor yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from cad63c9 to 7e28c6c Compare March 4, 2021 06:49
@FxKu FxKu modified the milestones: 1.7, 1.8 Mar 26, 2021
@yanchenko-igor yanchenko-igor force-pushed the restart_instances_via_api_instead_of_pods branch from 7e28c6c to 16661df Compare June 7, 2021 13:17
@FxKu
Copy link
Member

FxKu commented Jun 7, 2021

Did some local tests and it works. Had to ensure that other custom changes so the config are not overwritten by the restart. However, I noticed that a restart happens on each sync if parameters are defined. I think, checkAndSetGlobalPostgreSQLConfiguration should read the config and compare with existing parameters in the manifest and only add them to optionsToSet if they differ.

@FxKu
Copy link
Member

FxKu commented Jun 9, 2021

@yanchenko-igor with pending_restart an actual restart will not happen on each sync, but still seeing the messages in the logs and events makes it quite confusing:

Events:
  Type    Reason  Age                  From               Message
  ----    ------  ----                 ----               -------
  Normal  Update  119s (x79 over 44h)  postgres-operator  restarting Postgres server within pods
  Normal  Update  87s (x79 over 44h)   postgres-operator  Postgres server restart done - all instances have been restarted

I think, it's better if we read the config from patroni and compare it with what's in the manifest under postgresql.parameters.

@FxKu
Copy link
Member

FxKu commented Jun 9, 2021

@yanchenko-igor Before you start working on it again, maybe we will merge the PR now as is, because I have to build up on it anyway and could then refactor that part.

@FxKu
Copy link
Member

FxKu commented Jun 14, 2021

👍

1 similar comment
@sdudoladov
Copy link
Member

👍

@FxKu FxKu merged commit ebb3204 into zalando:master Jun 14, 2021
@FxKu
Copy link
Member

FxKu commented Jun 14, 2021

Thanks again @yanchenko-igor for this important contribution and sorry it took so long to merge. Thanks for your patience and responsiveness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants