in place major upgrade #488

Merged
merged 45 commits into feature/pg13 from feature/in-place-upgrade on Nov 30, 2020
Conversation

@CyberDem0n (Contributor) commented Sep 7, 2020

  • make configure_spilo ignore PGVERSION when generating postgres.yml if it doesn't match $PGDATA/PG_VERSION
  • configure the backup path dependent on the major version ($cluster_name/wal/$PGVERSION)
  • move functions used across different modules to the spilo_commons
  • add rsync to Dockerfile
  • implemented inplace_upgrade script (WIP)

How to trigger the upgrade? This is a two-step process:

  1. Update the configuration (version) and rotate all pods. On start, configure_spilo will notice the version mismatch and start the old version (a sketch of this check follows the list).
  2. When all pods are rotated, exec into the master container and call python3 /scripts/inplace_upgrade.py N, where N is the capacity of the PostgreSQL cluster.
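
For illustration, here is a minimal sketch of the version-mismatch check mentioned in step 1, assuming the logic lives in configure_spilo; the function name and return convention are made up for the example and are not the actual code:

```python
import os


def effective_version(pgversion_env, pgdata):
    """Hypothetical sketch: prefer the major version recorded in
    $PGDATA/PG_VERSION over the PGVERSION environment variable when they
    disagree, so the old binaries keep running until the upgrade is triggered."""
    try:
        with open(os.path.join(pgdata, 'PG_VERSION')) as f:
            data_version = f.read().strip()
    except OSError:
        return pgversion_env  # no data directory yet -> use the configured version

    if data_version and data_version != pgversion_env:
        # PGVERSION doesn't match the data directory: ignore it and keep the
        # old major version until inplace_upgrade.py is executed.
        return data_version
    return pgversion_env
```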

What inplace_upgrade.py does:

  1. Safety checks (a minimal sketch of these checks follows the list):
     • the new version must be bigger than the old one
     • the current node must be running as the master and holding the leader lock
     • the current number of members must match N
     • the cluster must not be running in maintenance mode
     • all replicas must be streaming from the master with a small lag
  2. Prepare data_new by running initdb with matching parameters
  3. Run pg_upgrade --check. If it fails, abort and clean up.
  4. Drop objects from the database which could be incompatible with the new version (e.g. the pg_stat_statements wrapper, the postgres_log fdw)
  5. Enable maintenance mode (patronictl pause --wait)
  6. Do a clean shutdown of postgres
  7. Get the latest checkpoint location from pg_controldata
  8. Wait for replicas to receive/apply the latest checkpoint location
  9. Start rsyncd, listening on port 5432 (we know that it is exposed!)
  10. If all previous steps succeeded, call pg_upgrade -k
  11. If pg_upgrade succeeded, we have reached the point of no return! If it failed, we need to roll back the previous steps.
  12. Rename the data directories: data -> data_old and data_new -> data
  13. Update the configuration files (postgres.yml and wal-e envdir)
  14. Call CHECKPOINT on replicas (for a predictable shutdown time)
  15. Trigger rsync on replicas (COPY (SELECT) TO PROGRAM)
  16. Wait for the replicas' rsync to complete. The feedback status is generated by the post-xfer exec script. Wait timeout: 300 seconds.
  17. Stop rsyncd
  18. Remove the initialize key from DCS (it contains the old sysid)
  19. Restart Patroni on the master with the new configuration
  20. Start the local postgres up as the master by calling the REST API POST /restart
  21. Memorize and reset custom statistics targets
  22. Start the ANALYZE in stages in a separate thread
  23. Wait until Patroni on the replicas is restarted
  24. Disable maintenance mode (patronictl resume)
  25. Wait until the analyze in stages finishes
  26. Restore custom statistics targets and analyze those tables
  27. Call the post_bootstrap script (restore the dropped objects)
  28. Remove data_old
  29. Trigger creation of a new backup
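
For illustration, a minimal sketch of the pre-flight checks from step 1. It is not the actual implementation: the attributes on `cluster`, the 16 MB lag threshold, and the helper names are assumptions made for the example.

```python
def sanity_checks(cluster, desired_version, expected_members):
    """Hypothetical sketch of the step-1 safety checks; the attributes on
    `cluster` are assumptions, not the real Patroni/Spilo API."""
    if not version_is_newer(desired_version, cluster.current_version):
        return False                    # the new version must be bigger than the old one
    if not (cluster.is_leader and cluster.has_leader_lock):
        return False                    # must run on the master holding the leader lock
    if len(cluster.members) != expected_members:
        return False                    # the member count must match N
    if cluster.is_paused:
        return False                    # refuse to run in maintenance mode
    # all replicas must be streaming with a small lag (16 MB is an arbitrary example)
    return all(m.is_streaming and m.lag_bytes < 16 * 1024 * 1024
               for m in cluster.replicas)


def version_is_newer(new, old):
    """Compare major versions numerically, so that '10' > '9.6'."""
    as_tuple = lambda v: tuple(int(x) for x in v.split('.'))
    return as_tuple(new) > as_tuple(old)
```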

Rollback:

  1. Stop rsyncd if it is running
  2. Disable maintenance mode (patronictl resume)
  3. Remove data_new if it exists
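
A minimal sketch of what such a rollback could look like; the data directory path and the way rsyncd is stopped are assumptions for the example, not the actual script:

```python
import os
import shutil
import subprocess


def rollback(data_new='/home/postgres/pgdata/pgroot/data_new'):
    """Hypothetical rollback sketch: undo the preparation steps in reverse order."""
    subprocess.call(['pkill', '-f', 'rsync --daemon'])   # 1. stop rsyncd if it is running
    subprocess.call(['patronictl', 'resume'])            # 2. leave maintenance mode
    if os.path.exists(data_new):                         # 3. remove the half-built cluster
        shutil.rmtree(data_new)
```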

Replicas upgrade with rsync

There are many options on how to call the script:

  1. Start a separate REST API for such maintenance tasks (requires opening a new port and some changes in infrastructure)
  2. Allow pod/exec (works only on K8s, not desirable)
  3. Use COPY TO PROGRAM "hack"

The COPY TO PROGRAM option seems to be low-hanging fruit. It only requires postgres to be up and running, which is already one of the requirements for the in-place upgrade to start. When started, the script does some sanity checks based on its input parameters.
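
As an illustration, the primary could trigger the replica-side script via COPY TO PROGRAM roughly as sketched below; the script path, argument order, and the use of psycopg2 are assumptions based on the parameter description that follows:

```python
import psycopg2


def trigger_replica_upgrade(replica_conn_str, new_version, primary_ip):
    """Hypothetical sketch of the COPY TO PROGRAM trigger executed against a replica."""
    with psycopg2.connect(replica_conn_str) as conn:
        with conn.cursor() as cur:
            # The backend pid is passed along so the replica-side script can wait
            # for this very backend to exit before shutting postgres down.
            cur.execute('SELECT pg_backend_pid()')
            backend_pid = cur.fetchone()[0]
            # The replica-side script is expected to detach itself; otherwise the
            # COPY statement would block until the whole upgrade finished.
            cur.execute('COPY (SELECT) TO PROGRAM %s',
                        ('/scripts/inplace_upgrade.py {0} {1} {2}'.format(
                            new_version, primary_ip, backend_pid),))
```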

There are three parameters required: new_version, primary_ip, and PID.

  • new_version - the version we are upgrading to
  • primary_ip - where to rsync from
  • PID - the pid of postgres backend that executed COPY TO PROGRAM.
    The script must wait until the backend exits before continuing. The script must also check that its parent (or maybe grandparent) process has a PID matching the argument.
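
A sketch of how the replica-side script might handle the PID parameter, assuming a Linux /proc filesystem (available inside the container); the exact checks in the real script may differ:

```python
import os
import time


def wait_for_calling_backend(expected_pid, timeout=300):
    """Hypothetical sketch: verify that the postgres backend which executed
    COPY TO PROGRAM is an ancestor of this process, then wait for it to exit."""
    ancestors = {os.getppid()}
    # COPY TO PROGRAM runs the command through a shell, so the backend may be
    # the grandparent rather than the direct parent (Linux /proc assumed).
    with open('/proc/{0}/stat'.format(os.getppid())) as f:
        ancestors.add(int(f.read().rsplit(')', 1)[1].split()[1]))
    if expected_pid not in ancestors:
        return False

    deadline = time.time() + timeout
    while time.time() < deadline:
        if not os.path.exists('/proc/{0}'.format(expected_pid)):
            return True  # the backend is gone; it is safe to shut postgres down
        time.sleep(1)
    return False
```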

There are some problems with the COPY TO PROGRAM approach. The Patroni, and therefore PostgreSQL, environment is cleared before start. As a result, the script started by the postgres backend will not see, for example, $KUBERNETES_SERVICE_HOST and won't be able to work with the DCS in all cases.

Once it has made sure that the client backend is gone, the script will:

  1. Remember the old sysid
  2. Do a clean shutdown of the postgres
  3. Rename data directory data -> data_old
  4. Update the configuration files (postgres.yaml and wal-e envdir). We do it before rsync because the initialize key could be cleaned up right after rsync has completed and Patroni would exit!
  5. Call rsync. If it fails, rename the data directory back.
  6. Wait until the initialize key has been removed from DCS. Since we know this happens before postgres on the master is started, we try to connect to the master via the replication protocol and check the sysid (see the sketch after this list).
  7. Restart Patroni.
  8. Remove data_old
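
A sketch of the check from step 6 above, assuming psycopg2 and a replication-capable user; the user name and port are examples, not the actual configuration:

```python
import time

import psycopg2
from psycopg2.extras import PhysicalReplicationConnection


def wait_for_new_sysid(primary_ip, old_sysid, timeout=300):
    """Hypothetical sketch: poll the master over the replication protocol until
    IDENTIFY_SYSTEM reports a system identifier different from the pre-upgrade
    one, which implies the initialize key was already removed from DCS."""
    dsn = 'host={0} port=5432 user=standby'.format(primary_ip)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            conn = psycopg2.connect(dsn, connect_timeout=3,
                                    connection_factory=PhysicalReplicationConnection)
            try:
                cur = conn.cursor()
                cur.execute('IDENTIFY_SYSTEM')
                if cur.fetchone()[0] != old_sysid:
                    return True
            finally:
                conn.close()
        except psycopg2.Error:
            pass  # the master may still be restarting
        time.sleep(1)
    return False
```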

In addition to that, integration tests are implemented. They mostly test happy-case scenarios, like:

  1. Successful in-place upgrade from 9.6 to 10
  2. Successful in-place upgrade from 10 to 12
  3. Major upgrade after the custom bootstrap with wal-g
  4. Major upgrade after the custom bootstrap with pg_basebackup
  5. Bootstrap of a new replica with wal-g

The tests also cover a few unhappy cases, for example that the in-place upgrade doesn't start if the pre-conditions are not met.

Alexander Kukushkin added 24 commits August 21, 2020 07:43
* handle custom statistics target (speed up analyze)
* remove more incompatible objects (pg_stat_statements)
* truncate unlogged tables (should we do that?)
* update extensions after upgrade
* exclude pg_wal/* from rsync
* CHECKPOINT on replica before shutdown to make rsync time predictable
* Unpause when we know that Patroni on replicas was restarted
* run pg_upgrade --check after initdb
* wal-e 1.1.1
* wal-g 0.2.17
* timescaledb 1.7.3
* refactor DCS configuration (close #468)
@CyberDem0n changed the title from "[WiP] in place upgrade" to "[WiP] in place major upgrade" on Sep 8, 2020
@CyberDem0n changed the title from "[WiP] in place major upgrade" to "in place major upgrade" on Sep 24, 2020
```python
backup = choose_backup(backup_list, recovery_target_time)
if backup:
    return backup, (name if value != old_value else None)
else:  # We assume that the LATEST backup will be for the biggest postgres version!
```
Member

To what does "biggest" refer in this comment? Does it mean the LATEST backup has to be for PG v12 when spilo-13 is deployed?

Contributor Author

If we don't have the major version of the source cluster specified explicitly, we try all postgres versions starting from the biggest. I.e., the get_wale_environments() function yields tuples:

  • ('WALE_S3_PREFIX', 's3://$bucket/spilo/cluster-name/$uid/wal/12')
  • ('WALE_S3_PREFIX', 's3://$bucket/spilo/cluster-name/$uid/wal/11')
  • ('WALE_S3_PREFIX', 's3://$bucket/spilo/cluster-name/$uid/wal/10')
and so on.

For every prefix we call wal-e backup-list and try to find a backup suitable for the given recovery_target_time.
If recovery_target_time is not specified, we just pick the LATEST backup.

But! It might be that under the s3://$bucket/spilo/cluster-name/$uid/wal/ path there are backups for 12 and, let's say, 10. The correct way of selecting the latest backup across two (or more) versions would be to list the backups for all versions and choose between them. That is too much work for too little benefit. Therefore I made the assumption that if a backup for version 12 is there, we do not continue with the other versions, because the backup for 10 would most likely be older.
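
A sketch of that selection loop, reusing the choose_backup call from the hunk above; get_wale_environments and list_backups here stand in for the real wal-e plumbing and are not the actual signatures:

```python
def find_backup(recovery_target_time=None):
    """Hypothetical sketch: walk the WAL prefixes from the newest major version
    down and stop at the first prefix that contains a usable backup."""
    for name, value in get_wale_environments():   # e.g. ('WALE_S3_PREFIX', '.../wal/12')
        backup_list = list_backups(value)         # wrapper around `wal-e backup-list`
        backup = choose_backup(backup_list, recovery_target_time)
        if backup:
            # Stop here: a backup under a newer major version's prefix is assumed
            # to also be newer than anything left under the older prefixes.
            return backup, value
    return None, None
```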

@CyberDem0n changed the base branch from master to feature/pg13 on September 29, 2020
@Jan-M (Member) commented Nov 30, 2020

👍

1 similar comment
@CyberDem0n (Contributor, Author)

👍

@CyberDem0n merged commit 852c17f into feature/pg13 on Nov 30, 2020
@CyberDem0n deleted the feature/in-place-upgrade branch on November 30, 2020
@anasanjaria (Contributor) commented Mar 1, 2024

@CyberDem0n I have a question: why does this script not enforce a "write lock" before starting the upgrade process?

My story

I forgot to enforce a write lock during a minor version upgrade and ended up with corrupted indices. I had to resolve those corruptions manually, and this SO post [1] was very helpful.

[1] https://stackoverflow.com/a/45317850/665905
