Add restart option in config_pgcluster.yml #354

Merged
merged 18 commits into from
Jun 7, 2023


artemsafiyulin
Contributor

This MR adds the following functionality:
If a postgresql_variable parameter that requires a restart of the PostgreSQL service is changed, and the "pending_restart" variable is set to True, then the Patroni cluster will be restarted.
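
As a rough illustration of the condition described above, here is a minimal Python sketch. It assumes the check is based on the `pending_restart` column of PostgreSQL's `pg_settings` view; the function name `restart_needed` and its arguments are hypothetical and are not taken from the playbook itself.

```python
# Hypothetical sketch of the decision described above, not the playbook's code.
# pg_settings.pending_restart is true for parameters whose new value
# only takes effect after a PostgreSQL restart.
PENDING_RESTART_QUERY = "SELECT name FROM pg_settings WHERE pending_restart"


def restart_needed(pending_params, pending_restart):
    """Return True only if the operator opted in (pending_restart is true)
    and at least one changed parameter is waiting for a restart."""
    return bool(pending_restart) and len(pending_params) > 0


if __name__ == "__main__":
    # pending_params would be the rows returned by PENDING_RESTART_QUERY on a node.
    print(restart_needed(["max_connections"], pending_restart=True))
    print(restart_needed([], pending_restart=True))
```

With this shape, an empty result from the query means no restart is triggered even when the option is enabled.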

@ThomasSanson
Sponsor Contributor

Thank you for your contribution, @artemsafiyulin. We appreciate the effort you put into this.

Before we move forward, could you please run make lint on your code? This should help identify and fix any minor linting errors and ensure the code adheres to our project's coding style.

We're looking forward to further reviewing this new functionality you've proposed. In the meantime, we'll wait for @vitabaks's feedback and analysis on this pull request.

Thanks again for your contribution.

artemsafiyulin and others added 10 commits May 29, 2023 20:56
Co-authored-by: Vitaliy Kukharik <37010174+vitabaks@users.noreply.github.com>
@artemsafiyulin
Contributor Author

@vitabaks, your changes have been committed.
All checks passed (including lint).

@vitabaks
Owner

OK, thank you. I need to further test this code.

@vitabaks
Owner

vitabaks commented Jun 3, 2023

Test 1

PLAY [config_pgcluster.yml | Check needed restart cluster and prepare for it] ***

TASK [Gathering Facts] *********************************************************
ok: [10.172.0.21]
ok: [10.172.0.22]
ok: [10.172.0.20]

TASK [[Prepare] Get Patroni Cluster Leader Node] *******************************
ok: [10.172.0.22]
ok: [10.172.0.20]
ok: [10.172.0.21]

TASK [[Prepare] Add host to group "primary" (in-memory inventory)] *************
ok: [10.172.0.20] => (item=10.172.0.20)

TASK [[Prepare] Add hosts to group "secondary" (in-memory inventory)] **********
ok: [10.172.0.20] => (item=10.172.0.21)
ok: [10.172.0.20] => (item=10.172.0.22)

TASK [Print Patroni Cluster info] **********************************************
ok: [10.172.0.20] => {
    "msg": [
        "Cluster Name: postgres-cluster",
        "Cluster Leader: pgnode01"
    ]
}
fatal: [10.172.0.20]: FAILED! => {"changed": false, "msg": "unable to connect to database: connection to server at \"localhost\" (::1), port 5432 failed: Connection refused\n\tIs the server running on that host and accepting TCP/IP connections?\nconnection to server at \"localhost\" (127.0.0.1), port 5432 failed: fe_sendauth: no password supplied\n"}

TASK [Get count pg settings needed restart] ************************************

not passed

@vitabaks
Owner

vitabaks commented Jun 3, 2023

Re-test

Fix: c4382f8

result:


TASK [Get count pg settings needed restart] ************************************
ok: [10.172.0.20]
ok: [10.172.0.22]
ok: [10.172.0.21]

TASK [Set pg_pending_restart_count variable] ***********************************
ok: [10.172.0.20]
ok: [10.172.0.21]
ok: [10.172.0.22]

TASK [Print pg_pending_restart_count] ******************************************
ok: [10.172.0.20] => {
    "pg_pending_restart_count": "0"
}
ok: [10.172.0.21] => {
    "pg_pending_restart_count": "0"
}
ok: [10.172.0.22] => {
    "pg_pending_restart_count": "0"
}

PLAY [config_pgcluster.yml | Restart patroni on secondary after config settings if need] ***

TASK [Include OS-specific variables] *******************************************
ok: [10.172.0.21]

PLAY [config_pgcluster.yml | Restart patroni on secondary after config settings if need] ***
ok: [10.172.0.22]

PLAY [config_pgcluster.yml | Restart patroni on master after config settings if need] ***

TASK [Include OS-specific variables] *******************************************
ok: [10.172.0.20]

PLAY RECAP *********************************************************************
10.172.0.20                : ok=70   changed=1    unreachable=0    failed=0    skipped=191  rescued=0    ignored=0
10.172.0.21                : ok=65   changed=1    unreachable=0    failed=0    skipped=184  rescued=0    ignored=0
10.172.0.22                : ok=65   changed=1    unreachable=0    failed=0    skipped=184  rescued=0    ignored=0
localhost                  : ok=0    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

passed

Fixed (example):

ERROR! The requested handler 'reload patroni' was not found in either the main handlers list nor in the listening handlers list
@vitabaks
Owner

vitabaks commented Jun 3, 2023

Test with restart

result:

TASK [Check if there are any changed parameters that require a restart] ********
ok: [10.172.0.21]
ok: [10.172.0.22]
ok: [10.172.0.20]

TASK [Set pg_pending_restart_settings variable] ********************************
ok: [10.172.0.20]
ok: [10.172.0.21]
ok: [10.172.0.22]

TASK [Display parameters requiring PostgreSQL restart] *************************
ok: [10.172.0.20] => {
    "msg": [
        "Parameters changed that require PostgreSQL to restart:",
        [
            {
                "name": "autovacuum_max_workers",
                "setting": "5"
            },
            {
                "name": "max_connections",
                "setting": "500"
            },
            {
                "name": "max_worker_processes",
                "setting": "24"
            }
        ]
    ]
}
ok: [10.172.0.21] => {
    "msg": [
        "Parameters changed that require PostgreSQL to restart:",
        [
            {
                "name": "autovacuum_max_workers",
                "setting": "5"
            },
            {
                "name": "max_connections",
                "setting": "500"
            },
            {
                "name": "max_worker_processes",
                "setting": "24"
            }
        ]
    ]
}
ok: [10.172.0.22] => {
    "msg": [
        "Parameters changed that require PostgreSQL to restart:",
        [
            {
                "name": "autovacuum_max_workers",
                "setting": "5"
            },
            {
                "name": "max_connections",
                "setting": "500"
            },
            {
                "name": "max_worker_processes",
                "setting": "24"
            }
        ]
    ]
}

PLAY [config_pgcluster.yml | Restart patroni on secondary after config settings if need] ***

TASK [Include OS-specific variables] *******************************************
ok: [10.172.0.21]

TASK [Stop read-only traffic] **************************************************

TASK [update : Edit patroni.yml | enable noloadbalance, nosync, nofailover] ****
changed: [10.172.0.21] => (item=noloadbalance: true)
changed: [10.172.0.21] => (item=nosync: true)
changed: [10.172.0.21] => (item=nofailover: true)

TASK [update : Reload patroni service] *****************************************
changed: [10.172.0.21]
FAILED - RETRYING: [10.172.0.21]: Make sure replica endpoint is unavailable (30 retries left).
FAILED - RETRYING: [10.172.0.21]: Make sure replica endpoint is unavailable (29 retries left).

TASK [update : Make sure replica endpoint is unavailable] **********************
ok: [10.172.0.21]

TASK [update : Wait for active transactions to complete] ***********************
ok: [10.172.0.21]

TASK [Stop Services] ***********************************************************

TASK [update : Check PostgreSQL is started and accepting connections] **********
ok: [10.172.0.21]

TASK [update : Execute CHECKPOINT before stopping PostgreSQL] ******************
changed: [10.172.0.21]

TASK [update : Stop Patroni service on the Cluster Replica (pgnode02)] *********
changed: [10.172.0.21]

TASK [Start Services] **********************************************************

TASK [update : Start Patroni service] ******************************************
changed: [10.172.0.21]

TASK [update : Wait for port 8008 to become open on the host] ******************
ok: [10.172.0.21]

TASK [update : Check that the Patroni is healthy] ******************************
ok: [10.172.0.21]

TASK [update : Check PostgreSQL is started and accepting connections] **********
ok: [10.172.0.21]

TASK [Start read-only traffic] *************************************************

TASK [update : Edit patroni.yml | disable noloadbalance, nosync, nofailover] ***
changed: [10.172.0.21] => (item=noloadbalance: false)
changed: [10.172.0.21] => (item=nosync: false)
changed: [10.172.0.21] => (item=nofailover: false)

TASK [update : Reload patroni service] *****************************************
changed: [10.172.0.21]
FAILED - RETRYING: [10.172.0.21]: Make sure replica endpoint is available (30 retries left).
FAILED - RETRYING: [10.172.0.21]: Make sure replica endpoint is available (29 retries left).
FAILED - RETRYING: [10.172.0.21]: Make sure replica endpoint is available (28 retries left).

TASK [update : Make sure replica endpoint is available] ************************
ok: [10.172.0.21]

PLAY [config_pgcluster.yml | Restart patroni on secondary after config settings if need] ***

TASK [Include OS-specific variables] *******************************************
ok: [10.172.0.22]

TASK [Stop read-only traffic] **************************************************

TASK [update : Edit patroni.yml | enable noloadbalance, nosync, nofailover] ****
changed: [10.172.0.22] => (item=noloadbalance: true)
changed: [10.172.0.22] => (item=nosync: true)
changed: [10.172.0.22] => (item=nofailover: true)

TASK [update : Reload patroni service] *****************************************
changed: [10.172.0.22]

TASK [update : Make sure replica endpoint is unavailable] **********************
ok: [10.172.0.22]

TASK [update : Wait for active transactions to complete] ***********************
ok: [10.172.0.22]

TASK [Stop Services] ***********************************************************

TASK [update : Check PostgreSQL is started and accepting connections] **********
ok: [10.172.0.22]

TASK [update : Execute CHECKPOINT before stopping PostgreSQL] ******************
changed: [10.172.0.22]

TASK [update : Stop Patroni service on the Cluster Replica (pgnode03)] *********
changed: [10.172.0.22]

TASK [Start Services] **********************************************************

TASK [update : Start Patroni service] ******************************************
changed: [10.172.0.22]

TASK [update : Wait for port 8008 to become open on the host] ******************
ok: [10.172.0.22]

TASK [update : Check that the Patroni is healthy] ******************************
ok: [10.172.0.22]

TASK [update : Check PostgreSQL is started and accepting connections] **********
ok: [10.172.0.22]

TASK [Start read-only traffic] *************************************************

TASK [update : Edit patroni.yml | disable noloadbalance, nosync, nofailover] ***
changed: [10.172.0.22] => (item=noloadbalance: false)
changed: [10.172.0.22] => (item=nosync: false)
changed: [10.172.0.22] => (item=nofailover: false)

TASK [update : Reload patroni service] *****************************************
changed: [10.172.0.22]
FAILED - RETRYING: [10.172.0.22]: Make sure replica endpoint is available (30 retries left).
FAILED - RETRYING: [10.172.0.22]: Make sure replica endpoint is available (29 retries left).
FAILED - RETRYING: [10.172.0.22]: Make sure replica endpoint is available (28 retries left).

TASK [update : Make sure replica endpoint is available] ************************
ok: [10.172.0.22]

PLAY [config_pgcluster.yml | Restart patroni on master after config settings if need] ***

TASK [Include OS-specific variables] *******************************************
ok: [10.172.0.20]

TASK [Switchover Patroni leader role] ******************************************

TASK [update : Perform switchover of the leader for the Patroni cluster "postgres-cluster"] ***
changed: [10.172.0.20]
FAILED - RETRYING: [10.172.0.20]: Make sure that the Patroni is healthy and is a replica (300 retries left).

TASK [update : Make sure that the Patroni is healthy and is a replica] *********
ok: [10.172.0.20]

TASK [Stop read-only traffic] **************************************************

TASK [update : Edit patroni.yml | enable noloadbalance, nosync, nofailover] ****
changed: [10.172.0.20] => (item=noloadbalance: true)
changed: [10.172.0.20] => (item=nosync: true)
changed: [10.172.0.20] => (item=nofailover: true)

TASK [update : Reload patroni service] *****************************************
changed: [10.172.0.20]
FAILED - RETRYING: [10.172.0.20]: Make sure replica endpoint is unavailable (30 retries left).
FAILED - RETRYING: [10.172.0.20]: Make sure replica endpoint is unavailable (29 retries left).

TASK [update : Make sure replica endpoint is unavailable] **********************
ok: [10.172.0.20]

TASK [update : Wait for active transactions to complete] ***********************
ok: [10.172.0.20]

TASK [Stop Services] ***********************************************************

TASK [update : Check PostgreSQL is started and accepting connections] **********
ok: [10.172.0.20]

TASK [update : Execute CHECKPOINT before stopping PostgreSQL] ******************
changed: [10.172.0.20]

TASK [update : Stop Patroni service on the old Cluster Leader (pgnode01)] ******
changed: [10.172.0.20]

TASK [Start Services] **********************************************************

TASK [update : Start Patroni service] ******************************************
changed: [10.172.0.20]

TASK [update : Wait for port 8008 to become open on the host] ******************
ok: [10.172.0.20]

TASK [update : Check that the Patroni is healthy] ******************************
ok: [10.172.0.20]

TASK [update : Check PostgreSQL is started and accepting connections] **********
ok: [10.172.0.20]

TASK [Start read-only traffic] *************************************************

TASK [update : Edit patroni.yml | disable noloadbalance, nosync, nofailover] ***
changed: [10.172.0.20] => (item=noloadbalance: false)
changed: [10.172.0.20] => (item=nosync: false)
changed: [10.172.0.20] => (item=nofailover: false)

TASK [update : Reload patroni service] *****************************************
changed: [10.172.0.20]
FAILED - RETRYING: [10.172.0.20]: Make sure replica endpoint is available (30 retries left).
FAILED - RETRYING: [10.172.0.20]: Make sure replica endpoint is available (29 retries left).
FAILED - RETRYING: [10.172.0.20]: Make sure replica endpoint is available (28 retries left).

TASK [update : Make sure replica endpoint is available] ************************
ok: [10.172.0.20]

PLAY RECAP *********************************************************************
10.172.0.20                : ok=86   changed=9    unreachable=0    failed=0    skipped=188  rescued=0    ignored=0
10.172.0.21                : ok=79   changed=8    unreachable=0    failed=0    skipped=182  rescued=0    ignored=0
10.172.0.22                : ok=79   changed=8    unreachable=0    failed=0    skipped=182  rescued=0    ignored=0
localhost 

passed
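
The order of operations visible in the log above (replicas one at a time, then a switchover and the former leader last) can be summarised in a small Python sketch. The step names are copied from the task titles in the log; the function `plan_rolling_restart` and the data layout are hypothetical and only illustrate the sequencing, not how the playbook implements it.

```python
# Hypothetical sketch of the restart order seen in the log; it does not call Patroni.
PER_NODE_STEPS = [
    "Edit patroni.yml | enable noloadbalance, nosync, nofailover",
    "Reload patroni service",
    "Make sure replica endpoint is unavailable",
    "Wait for active transactions to complete",
    "Check PostgreSQL is started and accepting connections",
    "Execute CHECKPOINT before stopping PostgreSQL",
    "Stop Patroni service",
    "Start Patroni service",
    "Wait for port 8008 to become open on the host",
    "Check that the Patroni is healthy",
    "Check PostgreSQL is started and accepting connections",
    "Edit patroni.yml | disable noloadbalance, nosync, nofailover",
    "Reload patroni service",
    "Make sure replica endpoint is available",
]

LEADER_PRE_STEPS = [
    "Perform switchover of the leader",
    "Make sure that the Patroni is healthy and is a replica",
]


def plan_rolling_restart(leader, replicas):
    """Return an ordered list of (host, step) pairs: each replica in turn,
    then the leader after it has handed over its role."""
    plan = []
    for host in replicas:
        plan.extend((host, step) for step in PER_NODE_STEPS)
    plan.extend((leader, step) for step in LEADER_PRE_STEPS)
    plan.extend((leader, step) for step in PER_NODE_STEPS)
    return plan


if __name__ == "__main__":
    for host, step in plan_rolling_restart("10.172.0.20", ["10.172.0.21", "10.172.0.22"]):
        print(f"{host}: {step}")
```

Handling the leader last, after a switchover, means each node is restarted while it is a replica.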

Since the setting field shows only the current value of the parameter and not the new one, we do not need to output this field in this case. A list of the parameter names that require a restart is sufficient.
@vitabaks
Owner

vitabaks commented Jun 3, 2023

Display only parameter names that require a restart

Commits: 799b626, ea1ca14

In this example, server 20 has a different list of parameters requiring a restart than the other servers. This is possible if parameters are set at the level of an individual host (for example, using ALTER SYSTEM), so it is important to output the information for each server.

TASK [Display parameters requiring PostgreSQL restart] *************************
ok: [10.172.0.20] => {
    "msg": [
        "On server pgnode01, the following parameters have changed and require PostgreSQL to restart:",
        [
            "autovacuum_max_workers",
            "max_connections",
            "max_locks_per_transaction",
            "max_worker_processes"
        ]
    ]
}
ok: [10.172.0.21] => {
    "msg": [
        "On server pgnode02, the following parameters have changed and require PostgreSQL to restart:",
        [
            "autovacuum_max_workers",
            "max_connections",
            "max_worker_processes"
        ]
    ]
}
ok: [10.172.0.22] => {
    "msg": [
        "On server pgnode03, the following parameters have changed and require PostgreSQL to restart:",
        [
            "autovacuum_max_workers",
            "max_connections",
            "max_worker_processes"
        ]
    ]
}
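
For illustration, a per-server message like the ones above could be assembled as in the following Python sketch. It assumes each pending setting arrives as a dictionary with "name" and "setting" keys, as in the earlier output; the function name `restart_message` is hypothetical.

```python
# Hypothetical sketch: build the per-server message using parameter names only.
def restart_message(server_name, pending_settings):
    """Drop the 'setting' field (it holds the current value, not the new one)
    and keep a sorted list of parameter names for this server."""
    names = sorted(row["name"] for row in pending_settings)
    return [
        f"On server {server_name}, the following parameters have changed "
        "and require PostgreSQL to restart:",
        names,
    ]


if __name__ == "__main__":
    rows = [
        {"name": "max_connections", "setting": "500"},
        {"name": "autovacuum_max_workers", "setting": "5"},
    ]
    print(restart_message("pgnode02", rows))
```

Building the message separately for each host keeps host-level differences visible, such as the extra max_locks_per_transaction entry on pgnode01.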

@vitabaks vitabaks merged commit e57b07f into vitabaks:master Jun 7, 2023
17 checks passed