[Docdb] Master catalog corruption: Backup failed with RuntimeException: ERROR: function yb_catalog_version() does not exist #18507
Comments
We saw this bug reproduce on a recent customer cluster.
We can see that the V1, V2, V3, and V4 migration scripts were not applied. The function yb_catalog_version is created by the V1 migration script. I have found the bug; the relevant code is here:
This function tries to determine which version we should start from. For example, it checks for the existence of the table pg_yb_catalog_version to decide whether we need to skip the V1__ migration script. Let’s look at the V1__ script:
This script has two blocks. The first block creates the table pg_yb_catalog_version if it does not exist. The second block creates the function yb_catalog_version. In an old cluster where pg_yb_catalog_version already exists and is not empty, the C++ code above skips the V1 migration script entirely, so the second block, which creates the function yb_catalog_version, is never executed.
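The skip logic described above can be modelled in a few lines. This is only a sketch under stated assumptions: `CatalogState`, `ShouldSkipV1`, and `UpgradeFrom` are hypothetical names invented for illustration and are not part of `ysql_upgrade.cc`; the model simply treats the two blocks of V1 as two flags.

```cpp
#include <cassert>

// Hypothetical, simplified model of the two objects the V1 script creates.
struct CatalogState {
  bool version_table_has_rows = false;   // pg_yb_catalog_version exists and is non-empty
  bool version_function_exists = false;  // yb_catalog_version() exists
};

// Mirrors the check described above: only the table is probed,
// the function is never looked at.
bool ShouldSkipV1(const CatalogState& state) {
  return state.version_table_has_rows;
}

// Applies both blocks of V1 unless the probe says the script can be skipped.
CatalogState UpgradeFrom(CatalogState state) {
  if (!ShouldSkipV1(state)) {
    state.version_table_has_rows = true;   // first block of V1
    state.version_function_exists = true;  // second block of V1
  }
  return state;
}
```

In this model, a cluster that starts with the table but without the function comes out of `UpgradeFrom` still lacking the function, which matches the symptom reported in this issue.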
Let’s look at what can happen when an old 2.4.x cluster is upgraded:
If, in a 2.4.x cluster, the pg_yb_catalog_version table isn’t empty, the pg_tablegroup table exists, the pg_stat_statements table exists, and the function jsonb_path_query exists, then we will skip V1, V2, V3, and V4, and that’s the symptom we see on the customer’s cluster. I have started yugabyted on a 2.4 cluster, and all of the above are true:
That’s why we skipped the first 4 migration scripts, leading to this bug. Had the customer cluster been created on a 2.2 release, we would not see this bug because none of the above is true:
Further analysis indicates that if the cluster was created on a release < 2.4 (e.g., 2.2 or earlier) or > 2.6 (e.g., 2.8 or beyond), we will not hit this bug. If < 2.4, none of those conditions is true, so the upgrade goes through V1, V2, ... without missing any. If > 2.6, all of the hard-coded checks (V1 to V8) in the C++ code are true, so skipping all of V1 to V8 is correct.
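To make the release-by-release reasoning concrete, here is a minimal sketch of the "count consecutive successful probes" behaviour. It is an illustration, not the real implementation: `ProbeResults`, `DetectMajorVersion`, and `FirstMigrationToApply` are hypothetical names, only the first four probes are modelled, and the mapping of each probe to V1..V4 is assumed from the order in which the conditions are listed in this thread.

```cpp
#include <array>
#include <cassert>

// Results of the first four probes, in the order listed in this thread.
// The V1..V4 mapping is an assumption based on that order.
struct ProbeResults {
  bool catalog_version_table_has_rows;  // V1: pg_yb_catalog_version is non-empty
  bool tablegroup_table_exists;         // V2: pg_tablegroup exists
  bool stat_statements_exists;          // V3: pg_stat_statements exists
  bool jsonb_path_query_exists;         // V4: function jsonb_path_query exists
};

// Counts consecutive successful probes and stops at the first failure,
// which is the behaviour of the increment-or-return pattern in the C++ code.
int DetectMajorVersion(const ProbeResults& p) {
  const std::array<bool, 4> probes = {
      p.catalog_version_table_has_rows, p.tablegroup_table_exists,
      p.stat_statements_exists, p.jsonb_path_query_exists};
  int major_version = 0;
  for (bool ok : probes) {
    if (!ok) break;
    ++major_version;
  }
  return major_version;
}

// The first migration script that will actually be applied.
int FirstMigrationToApply(const ProbeResults& p) {
  return DetectMajorVersion(p) + 1;
}
```

With all four probes false (the 2.2 case described above) the upgrade starts at V1. With all four true (the 2.4 case) it starts at V5, so the function-creating block of V1 is never run.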
According to the comment, those 8 (V1 to V8) migration scripts represent catalog features released before the ysql_upgrade feature landed. So if we see that V1, V2, V3, and V4 are not applied, it logically means that they represent features already present in the existing cluster and therefore they should be skipped. The only bug is that the second block in V1 which creates the function
Summary:
A customer reported that backup failed because the function `yb_catalog_version()` does not exist. This function is introduced in the migration script `V1__3979__pg_yb_catalog_version.sql`. After debugging I found a bug in the YSQL upgrade code. Specifically:

```
Result<int> GetMajorVersionFromSystemCatalogState(PGConn* pgconn) {
  int major_version = 0;

  // Helper macro removing boilerplate.
#define INCREMENT_VERSION_OR_RETURN_IT(oneliner_with_result) \
  if (VERIFY_RESULT(oneliner_with_result)) { \
    ++major_version; \
  } else { \
    return major_version; \
  }

  // V1: #3979 introducing pg_yb_catalog_version table.
  INCREMENT_VERSION_OR_RETURN_IT(SystemTableHasRows(pgconn, "pg_yb_catalog_version"))
```

This function tries to determine which version we should start from. For example, it checks for the existence of a non-empty table `pg_yb_catalog_version` to decide whether we need to skip the V1 migration script.

The V1 migration script has two blocks:
* The first block introduces the table `pg_catalog.pg_yb_catalog_version`
* The second block introduces the function `yb_catalog_version()`

In a 2.4.x or 2.6.x cluster, `pg_catalog.pg_yb_catalog_version` already exists but `yb_catalog_version()` does not. As a result, the above code skips the V1 migration script, and that is why the function `yb_catalog_version` isn't introduced.

I have verified that older releases < 2.4 (e.g., 2.2.7.0) and newer releases > 2.6 (e.g., 2.8.0.0) do not have this bug. In an older release < 2.4, `pg_catalog.pg_yb_catalog_version` does not exist, so the above code will not skip the V1 migration script. In a newer release > 2.6, `yb_catalog_version()` also exists, so it is correct to skip the V1 migration script.

The customer cluster was manually fixed by reapplying the V1 migration script.

The fix: after the normal migration has completed, check whether `yb_catalog_version()` exists. If it is missing, apply the V1 migration script.

Test Plan:
(1)
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsByNonSuperuser'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSharedRelsCreatesThemEverywhere'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSharedRelsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsDontFireTriggers'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsAfterFailure'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#sharedRelsIndexesWork'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemViewsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#viewReloptionsAreFilteredOnReplace'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#replacingViewKeepsCacheConsistent'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#insertOnConflictWithOidsWorks'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#dmlsUpdatePgCache'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#pinnedObjectsCacheIsUpdated'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#upgradeIsIdempotent'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#upgradeIsIdempotentSingleConn'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migratingIsEquivalentToReinitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migrationInGeoPartitionedSetup'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migrationFilenameComment'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#invalidUpgradeActions'

(2) Download old release packages such as
yugabyte-2.2.7.0
yugabyte-2.4.0.0
yugabyte-2.4.8.0
yugabyte-2.6.0.0
yugabyte-2.6.20.0
yugabyte-2.8.0.0
yugabyte-2.8.12.0

Run the following test manually against each of the old releases. For example:

```
./bin/yb-ctl create --timeout-yb-admin-sec 180 --rf 1 --master_flags initial_sys_catalog_snapshot_path=$HOME/tmp/yugabyte-2.4.0.0/share/initial_sys_catalog_snapshot
./build/latest/bin/yb-admin --timeout_ms=720000 upgrade_ysql
```

Look at yb-tserver.INFO:

```
W0118 17:16:55.246408 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in template1
I0118 17:16:55.246526 11555 ysql_upgrade.cc:473] template1: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.246542 11555 ysql_upgrade.cc:481] Found pg_global in migration file V1__3979__pg_yb_catalog_version.sql when applying to template1
I0118 17:16:55.341861 11555 ysql_upgrade.cc:517] template1: migration successfully applied without version bump
W0118 17:16:55.478569 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in template0
I0118 17:16:55.478693 11555 ysql_upgrade.cc:473] template0: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.563812 11555 ysql_upgrade.cc:517] template0: migration successfully applied without version bump
W0118 17:16:55.700487 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in postgres
I0118 17:16:55.700606 11555 ysql_upgrade.cc:473] postgres: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.783318 11555 ysql_upgrade.cc:517] postgres: migration successfully applied without version bump
W0118 17:16:55.802189 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in yugabyte
I0118 17:16:55.802299 11555 ysql_upgrade.cc:473] yugabyte: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.826416 11555 ysql_upgrade.cc:517] yugabyte: migration successfully applied without version bump
W0118 17:16:55.845325 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in system_platform
I0118 17:16:55.845443 11555 ysql_upgrade.cc:473] system_platform: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.865423 11555 ysql_upgrade.cc:517] system_platform: migration successfully applied without version bump
```

Rerun the upgrade command:

```
./build/latest/bin/yb-admin --timeout_ms=720000 upgrade_ysql
```

Look at yb-tserver.INFO again:

```
I0118 17:34:51.982327 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in template1
I0118 17:34:51.985905 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in template0
I0118 17:34:51.989349 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in postgres
I0118 17:34:51.992571 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in yugabyte
I0118 17:34:51.997953 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in system_platform
```

Reviewers: jason, tverona
Reviewed By: jason
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D31793
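The repair step and its idempotence on a second run, as seen in the logs above, can be sketched as follows. This is a hedged, in-memory model only: `FunctionPresence` and `ReapplyV1WhereFunctionMissing` are hypothetical names, and setting a flag stands in for re-running the V1 script "without version bump". It does not reproduce the actual code in `ysql_upgrade.cc`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for per-database catalog state:
// database name -> whether yb_catalog_version() exists there.
using FunctionPresence = std::map<std::string, bool>;

// After the normal migration has completed, visit every database and
// re-apply V1 wherever the function is missing. Returns the databases
// that were repaired, so a second run returns an empty list.
std::vector<std::string> ReapplyV1WhereFunctionMissing(FunctionPresence& dbs) {
  std::vector<std::string> repaired;
  for (auto& entry : dbs) {
    if (!entry.second) {
      entry.second = true;  // stands in for applying V1 without a version bump
      repaired.push_back(entry.first);
    }
  }
  return repaired;
}
```

Calling the function twice on the same state repairs the affected databases the first time and does nothing the second time, which corresponds to the "is missing" warnings followed by "Found function" messages on the rerun.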
…exist

Summary and test plan are the same as in the original commit (D31793) above.

Original commit: 55aac07 / D31793
Jira: DB-7466
Reviewers: jason, tverona
Reviewed By: jason
Subscribers: yql
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D31898
…exist

Summary and test plan are the same as in the original commit (D31793) above.

Original commit: 55aac07 / D31793
Jira: DB-7466
Reviewers: jason, tverona
Reviewed By: jason
Subscribers: yql
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D31929
…exist

Summary and test plan are the same as in the original commit (D31793) above.

Original commit: 55aac07 / D31793
Jira: DB-7466
Reviewers: jason, tverona
Reviewed By: jason
Subscribers: yql
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D31939
Jira Link: DB-7466