Summary:
This PR introduces support for PG-15 upgrades on Kubernetes (K8s) universes, catering to both auth-enabled and auth-disabled configurations.
The upgrade process involves a series of common steps applicable to all universes, along with additional steps specific to auth-enabled setups. Key improvements and workflows are outlined below:
Common Upgrade Steps (Auth Enabled/Disabled):
- Set the ysql_yb_major_version_upgrade_compatibility flag to 11 on all tservers.
- Execute pg_upgrade check.
- Upgrade only master nodes.
- Re-run pg_upgrade check.
- Perform catalog upgrade.
- Upgrade tserver nodes.
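The ordering of the common steps matters: tservers are upgraded only after the masters, both pg_upgrade checks, and the catalog upgrade have run. A minimal Java sketch of that sequence follows. The `UpgradeStep` enum, its constant names, and the `remaining()` helper are hypothetical illustrations, not YBA's actual task classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the common PG-15 upgrade sequence; the names are
// illustrative and do not correspond to real YBA task classes.
enum UpgradeStep {
  SET_COMPATIBILITY_FLAG, // ysql_yb_major_version_upgrade_compatibility=11 on all tservers
  PG_UPGRADE_CHECK,       // first pg_upgrade check
  UPGRADE_MASTERS,        // upgrade master nodes only
  RERUN_PG_UPGRADE_CHECK, // check again once the masters run the new version
  CATALOG_UPGRADE,        // upgrade the YSQL catalog
  UPGRADE_TSERVERS;       // finally upgrade the tserver nodes

  /** Returns this step and every later step, in execution order. */
  List<UpgradeStep> remaining() {
    UpgradeStep[] all = values();
    return Arrays.asList(Arrays.copyOfRange(all, ordinal(), all.length));
  }
}
```

For example, `UpgradeStep.CATALOG_UPGRADE.remaining()` yields the catalog upgrade followed by the tserver upgrade.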
Additional Steps for Auth-Enabled Universes:
- For catalog upgrades, YBDB requires the upgrade user's username and password to be stored in a .pgpass file in the node's home directory.
- YBA creates the upgrade user before the master upgrade and generates the .pgpass file only on the master leader node, and only after the master upgrade, because a file created earlier could be lost when the pods restart.
- After the catalog upgrade, YBA removes the .pgpass file but keeps the upgrade user until rollback or finalize completes: DDLs are blocked during the upgrade, so the user cannot be dropped before then.
- In case of task failure, YBA ensures the .pgpass file is removed from the master leader node.
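The .pgpass lifecycle above (write it just before the catalog upgrade, remove it afterwards even on failure) can be sketched in Java. The line format `hostname:port:database:username:password`, the backslash escaping of `:` and `\`, and the rule that libpq ignores a .pgpass file readable by group or others all come from the PostgreSQL password-file documentation. The `PgpassFile` class, the `*` database wildcard choice, and the use of a local path instead of the master leader node's home directory are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical helper. Only the line format and the owner-only permission rule
// come from libpq; in the PR the file lives on the master leader node, while
// this sketch writes to a local path.
final class PgpassFile {
  private PgpassFile() {}

  /** libpq requires ':' and '\' inside a field to be escaped with '\'. */
  static String escape(String field) {
    return field.replace("\\", "\\\\").replace(":", "\\:");
  }

  /** Builds one hostname:port:database:username:password line; '*' matches any database. */
  static String entry(String host, int port, String user, String password) {
    return String.join(":", escape(host), Integer.toString(port), "*", escape(user), escape(password));
  }

  /** Creates the file as owner read/write only (0600), since libpq ignores looser files. */
  static void write(Path path, String entry) throws IOException {
    Files.deleteIfExists(path);
    Files.createFile(path, PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rw-------")));
    Files.writeString(path, entry + "\n", StandardCharsets.UTF_8);
  }

  /** Keeps the file only while the catalog upgrade runs and removes it even if the upgrade fails. */
  static void withPgpass(Path path, String entry, Runnable catalogUpgrade) throws IOException {
    try {
      write(path, entry);
      catalogUpgrade.run();
    } finally {
      Files.deleteIfExists(path);
    }
  }
}
```

The `finally` block mirrors the behavior described above: the credentials file is removed whether the catalog upgrade succeeds or the task fails, while the user itself is left in place.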
Retry-Ability Enhancements:
- Upgrades, rollbacks, and finalize tasks are now retryable for both VM and K8s universes.
- If an upgrade superuser is required, YBA supports retrying from any state between the first master upgrade and the completion of the catalog upgrade. If necessary, YBA rolls back the catalog upgrade to re-enable DDLs so that the superuser details can be updated.
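The retry rule for universes that need an upgrade superuser can be read as a small decision: if the superuser must be created or changed and DDLs are already blocked by an unfinished catalog upgrade, roll the catalog back first and resume from the point where all masters are upgraded; otherwise resume from where the task stopped. The Java sketch below assumes that DDLs become blocked once the catalog upgrade starts; that boundary, the `UpgradeState` values, and the `RetryPlan` class are hypothetical and not YBA's task framework.

```java
// Hypothetical retry planner for auth-enabled universes; state names and the
// point at which DDLs become blocked are assumptions for illustration.
enum UpgradeState {
  MASTERS_PARTIALLY_UPGRADED,
  MASTERS_UPGRADED,
  CATALOG_UPGRADE_STARTED,   // assumption: DDLs are blocked from this state on
  CATALOG_UPGRADED,
  TSERVERS_PARTIALLY_UPGRADED
}

final class RetryPlan {
  final boolean rollbackCatalogFirst;
  final UpgradeState resumeFrom;

  RetryPlan(boolean rollbackCatalogFirst, UpgradeState resumeFrom) {
    this.rollbackCatalogFirst = rollbackCatalogFirst;
    this.resumeFrom = resumeFrom;
  }

  /** superuserNeedsDdl: the upgrade superuser must be created or altered before retrying. */
  static RetryPlan decide(UpgradeState failedAt, boolean superuserNeedsDdl) {
    // The superuser only matters until the catalog upgrade has completed.
    boolean catalogPending = failedAt.compareTo(UpgradeState.CATALOG_UPGRADED) < 0;
    // Assumed: once the catalog upgrade has started, the superuser DDL cannot run.
    boolean ddlBlocked = failedAt.compareTo(UpgradeState.CATALOG_UPGRADE_STARTED) >= 0;
    boolean rollback = superuserNeedsDdl && catalogPending && ddlBlocked;
    // After a rollback, all masters are still upgraded, so the retry resumes there.
    return new RetryPlan(rollback, rollback ? UpgradeState.MASTERS_UPGRADED : failedAt);
  }
}
```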
PG Upgrade TServer Check (K8s-Specific):
- On VMs, the old and new DB versions coexist on the same node before the upgrade. K8s pods do not have the new version, so the DB package must be copied onto the tserver pod to run the pg_upgrade check.
- YBA downloads the package to a temporary directory on the pod, performs the check, and cleans up the downloaded package afterward.
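The download, check, and clean-up sequence amounts to a try/finally around the check, so that a failed pg_upgrade check never leaves the package behind on the pod. The Java sketch below expresses this against a hypothetical `TserverPod` interface; its three method names are illustrative stand-ins for whatever pod commands YBA actually issues, and no real kubectl or pg_upgrade arguments are implied.

```java
// Hypothetical operations on a tserver pod; the names are illustrative only.
interface TserverPod {
  void downloadPackage(String targetDir) throws Exception;    // copy the new DB package onto the pod
  void runPgUpgradeCheck(String packageDir) throws Exception; // run the pg_upgrade check with that package
  void removeDir(String dir) throws Exception;                // delete the temporary directory
}

final class PodPgUpgradeCheck {
  private PodPgUpgradeCheck() {}

  /** Downloads, checks, and always cleans up, so a failed check leaves no package on the pod. */
  static void run(TserverPod pod, String tmpDir) throws Exception {
    try {
      pod.downloadPackage(tmpDir);
      pod.runPgUpgradeCheck(tmpDir);
    } finally {
      pod.removeDir(tmpDir);
    }
  }
}
```

A fake `TserverPod` that records calls is enough to verify the clean-up path without a cluster.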
Additional Updates:
- Updated master default flags to include YSQL_HBA_CONF_CSV and YSQL_ENABLE_AUTH so that masters can determine whether authentication is enabled.
- Enhanced KubernetesPartitions logic so that tservers or masters are updated only when changes are detected, ensuring that only the required servers are upgraded.
- **Blocked non-rolling software upgrades: they were never officially supported, and a rolling upgrade was performed even when users selected the non-rolling option.**
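The KubernetesPartitions change can be illustrated as a per-server-type diff: compare the currently deployed spec with the desired one and update only the server types that differ. In the Java sketch below, the `ServerType` enum, the `Map<String, String>` spec representation, and `PartitionPlanner` are hypothetical simplifications, not the actual KubernetesPartitions API.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration of "only touch servers whose spec changed";
// the types here are assumptions, not YBA's KubernetesPartitions classes.
enum ServerType { MASTER, TSERVER }

final class PartitionPlanner {
  private PartitionPlanner() {}

  /** Returns the server types whose desired spec differs from the deployed one. */
  static Set<ServerType> serversToUpdate(Map<ServerType, Map<String, String>> current,
                                         Map<ServerType, Map<String, String>> desired) {
    Set<ServerType> changed = EnumSet.noneOf(ServerType.class);
    for (ServerType type : ServerType.values()) {
      // A missing entry on either side counts as a change unless both are missing.
      if (!Objects.equals(current.get(type), desired.get(type))) {
        changed.add(type);
      }
    }
    return changed;
  }
}
```

During the master-only phase of the upgrade, for instance, only the master spec changes, so only `MASTER` is returned and the tservers are left alone.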
Test Plan:
Tested manually by performing upgrade, rollback, and finalize on VM and K8s universes with auth both enabled and disabled.
Additionally, the retry-ability and rollback functionality were rigorously tested by simulating failures at the following critical stages:
- Partial Master Upgrades: When only a subset of master nodes were upgraded.
- Complete Master Upgrades: After all master nodes were successfully upgraded.
- Catalog Upgrade Failures: Scenarios where the catalog upgrade process encountered errors.
- Post-Catalog Upgrade: After the catalog upgrade was completed successfully.
- Partial TServer Upgrades: When only a portion of tserver nodes were upgraded.
- Complete TServer Upgrades: After all tserver nodes were upgraded.
Unit and local provider tests will be added in a separate diff.
Reviewers: #yba-api-review, sneelakantan, sanketh, anijhawan, vkumar, hsunder
Reviewed By: #yba-api-review, sanketh, anijhawan
Subscribers: yugaware
Differential Revision: https://phorge.dev.yugabyte.com/D41637