restart instances via rest api instead of recreating pods, fixes bug with being unable to decrease some values, like max_connections #1103
Conversation
Force-pushed 0b79c26 to 2965f42
Can this PR be reviewed? If there is another way to fix the bug, let's discuss it.
Hello @yanchenko-igor, I will have a look today.
pkg/cluster/k8sres.go (outdated)
I wonder if this list should not be in the config map instead; the hard-coded version could then serve as the default. The Spilo image may change, and I'm not sure that always warrants a new operator build.
I am not sure I understand what you mean; this list is what @CyberDem0n gave me when we discussed how it should work.
@yanchenko-igor could you share a link to the discussion?
@sdudoladov it was a private discussion in Slack.
Thanks for chipping in on this important issue. For such a change to rolling upgrades it may be worth adding an end-to-end test, or confirming there is already one that checks the respective changes.
Force-pushed ace7e05 to 9afd724
Please also comment in the admin docs on when restarts should happen instead of pod re-creation.
Force-pushed b8b7b15 to 4316988
pkg/cluster/k8sres.go (outdated)
@yanchenko-igor could you share a link to the discussion?
Force-pushed dffb061 to 2b0ce99
pkg/cluster/sync.go (outdated)
Let's say that we changed work_mem from '4MB' to '16MB'. Patroni will update postgresql.conf and do pg_ctl reload. That doesn't require a restart.
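The distinction hinges on the parameter's context in pg_settings: postmaster-context parameters (such as max_connections) only take effect on a full postgres restart, while user- or sighup-context ones (such as work_mem) are applied by a reload. A minimal sketch of that rule — the parameter list below is an illustrative subset, not the operator's actual hard-coded list:

```go
package main

import "fmt"

// Parameters with context=postmaster in pg_settings can only change on a
// full postgres restart; everything else is picked up by pg_ctl reload.
// Illustrative subset only, not the operator's actual list.
var restartOnly = map[string]bool{
	"max_connections":           true,
	"shared_buffers":            true,
	"wal_level":                 true,
	"max_wal_senders":           true,
	"max_prepared_transactions": true,
	"max_locks_per_transaction": true,
}

func requiresRestart(param string) bool { return restartOnly[param] }

func main() {
	fmt.Println(requiresRestart("max_connections")) // true: postmaster context
	fmt.Println(requiresRestart("work_mem"))        // false: reload is enough
}
```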
How can we distinguish parameters which require a restart from those which don't?
You have to wait for a while (ttl seconds should be enough), and call the Patroni API on every pod:
$ curl -s http://localhost:8008/patroni | jq .
{
  "pending_restart": true,
If the pending_restart flag is set, postgres in this pod must be restarted.
Also, the /restart endpoint expects a JSON object where you can specify {"restart_pending":true}, so Patroni will restart postgres only if it is required.
Waiting for ttl seconds is mandatory.
Added {"restart_pending":true} and waiting after restart.
Force-pushed 2b0ce99 to 853c011
Force-pushed 853c011 to 4fcb754
Hello, I need help with the e2e tests; I don't understand why they fail. Can someone support me? They seem to be unrelated to my changes.
Hi! Right now you should not spend time on the e2e tests; we are aware they are not as reliable as we want them to be, and the feedback time also seems really slow. We are looking into this in a separate PR, so let's continue here once that is sorted out.
Force-pushed cad63c9 to 7e28c6c
… and patroni is not pending restart
… values like max_connections decreased
Force-pushed 7e28c6c to 16661df
Did some local tests and it works. Had to ensure that other custom changes to the config are not overwritten by the restart. However, I noticed that a restart happens on each sync if
@yanchenko-igor with
I think, it's better if we read the config from patroni and compare it with what's in the manifest under |
@yanchenko-igor Before you start working on it again: maybe we will merge the PR now as is, because I have to build on it anyway and could then refactor that part.
👍
Thanks again @yanchenko-igor for this important contribution, and sorry it took so long to merge. Thanks for your patience and responsiveness.
This PR is intended to solve an issue with values like max_connections being decreased: at the moment, after the pod is recreated, the new decreased value doesn't apply unless the instance is restarted via the API. When we compare statefulsets we ignore the bootstrap.dcs section, so pods will not be restarted after changing parameters which matter only at cluster bootstrap.
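The comparison rule from the description can be sketched as follows. The spiloConfig type and its fields are hypothetical stand-ins for the configuration blob the operator stores in the statefulset, not the operator's actual types:

```go
package main

import (
	"fmt"
	"reflect"
)

// spiloConfig is a simplified, hypothetical stand-in for the configuration
// the operator embeds in the statefulset spec.
type spiloConfig struct {
	Bootstrap  map[string]interface{} // bootstrap.dcs: applied only at cluster init
	Postgresql map[string]interface{} // runtime settings
}

// configsDiffer ignores the bootstrap section when diffing, so changes that
// matter only at cluster bootstrap never trigger a rolling update of pods.
func configsDiffer(a, b spiloConfig) bool {
	return !reflect.DeepEqual(a.Postgresql, b.Postgresql)
}

func main() {
	current := spiloConfig{
		Bootstrap:  map[string]interface{}{"ttl": 30},
		Postgresql: map[string]interface{}{"max_connections": 100},
	}
	desired := spiloConfig{
		Bootstrap:  map[string]interface{}{"ttl": 60}, // bootstrap-only change
		Postgresql: map[string]interface{}{"max_connections": 100},
	}
	fmt.Println(configsDiffer(current, desired)) // false: no pod restart needed
}
```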