Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Docdb] Master catalog corruption: Backup failed with RuntimeException: ERROR: function yb_catalog_version() does not exist #18507

Closed
yugabyte-ci opened this issue Aug 2, 2023 · 3 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) jira-originated kind/bug This issue is a bug priority/high High Priority

Comments

@yugabyte-ci
Copy link
Contributor

yugabyte-ci commented Aug 2, 2023

Jira Link: DB-7466

@yugabyte-ci yugabyte-ci added area/docdb YugabyteDB core features jira-originated kind/bug This issue is a bug priority/high High Priority status/awaiting-triage Issue awaiting triage labels Aug 2, 2023
@yugabyte-ci yugabyte-ci added area/ysql Yugabyte SQL (YSQL) and removed area/docdb YugabyteDB core features labels Aug 15, 2023
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Nov 13, 2023
@yugabyte-ci yugabyte-ci assigned myang2021 and unassigned tverona1 Nov 13, 2023
@myang2021
Copy link
Contributor

On a recent customer cluster we saw the repro of this bug.

# select * from pg_yb_migration;
 major | minor |                              name                               | time_applied
-------+-------+-----------------------------------------------------------------+---------------
     9 |     1 | V9.1__10204__yb_fdw_role.sql                                    | 1651092957096
    20 |     4 | V20.4__10595__pg_stat_progress_create_index.sql                 | 1681393723619
    24 |     1 | V24.1__14939__yb_profiles.sql                                   | 1695230137160
    31 |     0 | V31__10595__pg_stat_progress_create_index.sql                   | 1701788305793
    33 |     0 | V33__14209__yb_terminated_queries.sql                           | 1701788483926
    29 |     0 | V29__14562__update_yb_is_database_colocated.sql                 | 1701788212076
    24 |     0 | V24__13609__yb_geo_distribution_helper_functions.sql            | 1695230135673
    22 |     0 | V22__11906__yb_servers_function_uuid.sql                        | 1695230133771
    18 |     0 | V18__11944__yb_servers_function_proretset_true.sql              | 1681393671899
    20 |     3 | V20.3__14939__yb_profiles.sql                                   | 1681393720468
     6 |     0 | V6__7879__yb_servers_function.sql                               | 1651092956202
    23 |     0 | V23__12919__query_terminations.sql                              | 1695230134556
    21 |     0 | V21__4873__yb_get_range_split_clause.sql                        | 1695230129067
    26 |     0 | V26__14066__yb_pg_backend_memory_functions.sql                  | 1701788200911
    30 |     0 | V30__14939__yb_profiles.sql                                     | 1701788216066
    35 |     0 | V35__13568__pg_get_backend_memory_contexts.sql                  | 1701788654795
    15 |     0 | V15__1979__text_search_configuration.sql                        | 1674075205047
     8 |     0 | V8__7850__ybgin_access_method.sql                               | 1651092956888
    35 |     2 | V35.2__17700__yb_percentile.sql                                 | 1701788832455
    20 |     1 | V20.1__12919__query_terminations.sql                            | 1681393717999
    19 |     0 | V19__6560__pg_collation_icu_70.sql                              | 1681393712694
    17 |     0 | V17__7378__colocation_id.sql                                    | 1674075205677
    25 |     0 | V25__11632__binary_upgrade_set_next_tablegroup_oid.sql          | 1701788196812
    20 |     2 | V20.2__13609__yb_geo_distribution_helper_functions.sql          | 1681393718374
    13 |     0 | V13__9665__yb_obj_properties_functions.sql                      | 1674075197302
     7 |     0 | V7__8719__yb_hash_code_function.sql                             | 1651092956363
     9 |     0 | V9__10150__yb_extension_role.sql                                | 1651092957011
    11 |     0 | V11__10204__yb_fdw_role.sql                                     | 1674075151767
    34 |     0 | V34__15085__pg_stat_activity_mem_cols.sql                       | 1701788573749
     5 |     0 | V5__6509__yb_mem_functions.sql                                  | 1651092955994
    10 |     0 | V10__9178__yb_is_local_table_function.sql                       | 1674075151640
    24 |     2 | V24.2__10595__pg_stat_progress_create_index.sql                 | 1695230195601
    14 |     0 | V14__10190__pg_yb_tablegroup.sql                                | 1674075201815
    20 |     0 | V20__11928__stat_progress_copy.sql                              | 1681393716145
    28 |     0 | V28__13762__yb_transaction_priority_and_effective_isolation_lev | 1701788207278
    27 |     0 | V27__14704__yb_heap_stats_function.sql                          | 1701788203233
    35 |     1 | V35.1__14445__alter_pg_stat_statements.sql                      | 1701788778263
    16 |     0 | V16__11551__yb_db_admin_role.sql                                | 1674075205267
    12 |     0 | V12__1127__pg_collation.sql                                     | 1674075196954
    32 |     0 | V32__7376__pg_stat_activity_catalog_version.sql                 | 1701788390885
     4 |     0 | <baseline>                                                      |
(41 rows)

We can see that V1, V2, V3 and V4 migration scripts were missing. The function yb_catalog_version is created by V1 migration script.

I have found the bug, the relevant code is here:

Result<int> GetMajorVersionFromSystemCatalogState(PGConn* pgconn) {
  int major_version = 0;

  // Helper macro removing boilerplate.
#define INCREMENT_VERSION_OR_RETURN_IT(oneliner_with_result) \
    if (VERIFY_RESULT(oneliner_with_result)) { \
      ++major_version; \
    } else { \
      return major_version; \
    }

  // V1: #3979 introducing pg_yb_catalog_version table.
  INCREMENT_VERSION_OR_RETURN_IT(SystemTableHasRows(pgconn, "pg_yb_catalog_version"))

This function tries to determine which version we should start from, for example, it checks the existence of the table pg_yb_catalog_version to decide whether we need to skip V1__ migration script or not. Let’s look at V1__ script:

[myang@dev-server-myang ysql_migrations]$ cat V1__3979__pg_yb_catalog_version.sql 
BEGIN;
  CREATE TABLE IF NOT EXISTS pg_catalog.pg_yb_catalog_version (
    db_oid                oid    NOT NULL,
    current_version       bigint NOT NULL,
    last_breaking_version bigint NOT NULL,
    CONSTRAINT pg_yb_catalog_version_db_oid_index PRIMARY KEY (db_oid ASC)
      WITH (table_oid = 8012)
  ) WITH (
    oids = false,
    table_oid = 8010,
    row_type_oid = 8011
  ) TABLESPACE pg_global;
COMMIT;

BEGIN;
  SET LOCAL yb_non_ddl_txn_for_sys_tables_allowed TO true;

  INSERT INTO pg_catalog.pg_proc (
    oid, proname, pronamespace, proowner, prolang, procost, prorows, provariadic, protransform,
    prokind, prosecdef, proleakproof, proisstrict, proretset, provolatile, proparallel, pronargs,
    pronargdefaults, prorettype, proargtypes, proallargtypes, proargmodes, proargnames,
    proargdefaults, protrftypes, prosrc, probin, proconfig, proacl
  ) VALUES
    (8029, 'yb_catalog_version', 11, 10, 12, 1, 0, 0, '-', 'f', false, false, true, false,
     'v', 's', 0, 0, 20, '', NULL, NULL, NULL, NULL, NULL, 'yb_catalog_version', NULL, NULL, NULL)
  ON CONFLICT DO NOTHING;

  -- Create dependency records for everything we (possibly) created.
  -- Since pg_depend has no OID or unique constraint, using PL/pgSQL instead.
  DO $$
  BEGIN
    IF NOT EXISTS (
      SELECT FROM pg_catalog.pg_depend
        WHERE refclassid = 1255 AND refobjid = 8029
    ) THEN
      INSERT INTO pg_catalog.pg_depend (
        classid, objid, objsubid, refclassid, refobjid, refobjsubid, deptype
      ) VALUES
        (0, 0, 0, 1255, 8029, 0, 'p');
    END IF;
  END $$;

  -- Insert a version which would immediately follow the one we track using legacy approach.
  WITH catalog_version AS (
    SELECT pg_catalog.yb_catalog_version() AS v
  )
  INSERT INTO pg_yb_catalog_version(
    db_oid, current_version, last_breaking_version
  ) SELECT 1, v, v FROM catalog_version
  ON CONFLICT DO NOTHING;
COMMIT;

This script has two blocks, the first block creates the table pg_yb_catalog_version if it does not exist. The second block creates the function yb_catalog_version. If in an old cluster where pg_yb_catalog_version already existed and is not empty, then according to the above C++ code we will skip V1 migration script entirely, therefore not executing the second block that creates the function yb_catalog_version.

// Verify that system table exists and is not empty.
Result<bool> SystemTableHasRows(PGConn* pgconn, const std::string& table_name) {
  if (!VERIFY_RESULT(SystemTableExists(pgconn, table_name)))
    return false;
  return VERIFY_RESULT(SelectCountStar(pgconn, table_name)) > 0;
}

Let’s look at what can happen when an old cluster 2.4.x is upgraded:

  // V1: #3979 introducing pg_yb_catalog_version table.
  INCREMENT_VERSION_OR_RETURN_IT(SystemTableHasRows(pgconn, "pg_yb_catalog_version"))

  // V2: #4525 which creates pg_tablegroup.
  INCREMENT_VERSION_OR_RETURN_IT(SystemTableExists(pgconn, "pg_tablegroup"))

  // V3: #5478 installing pg_stat_statements.
  INCREMENT_VERSION_OR_RETURN_IT(SystemTableExists(pgconn, "pg_stat_statements"))

  // V4: #5408 introducing a bunch of JSONB functions.
  INCREMENT_VERSION_OR_RETURN_IT(FunctionExists(pgconn, "jsonb_path_query"))

If in a 2.4.x cluster, pg_yb_catalog_version table isn’t empty, pg_tablegroup table exists, pg_stat_statements table exists, and function jsonb_path_query exists, then we will skip V1, V2, V3, V4 and that’s the symptom we see on the customer’s cluster.

I have started yugabyted on a 2.4 cluster, and all the above are true:

yugabyte=# select version();
                                                  version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.4.8.0-b0 on x86_64-pc-linux-gnu, compiled by gcc (Homebrew gcc 5.5.0_4) 5.5.0, 64-bit
(1 row)

yugabyte=# select count(*) from pg_yb_catalog_version;
 count
-------
     1
(1 row)

yugabyte=# select relname from pg_class where relname = 'pg_tablegroup';
    relname
---------------
 pg_tablegroup
(1 row)

yugabyte=# select relname from pg_class where relname = 'pg_stat_statements';
      relname
--------------------
 pg_stat_statements
(1 row)

yugabyte=# select proname from pg_proc where proname = 'jsonb_path_query';
     proname
------------------
 jsonb_path_query
(1 row)

That’s why we skipped the first 4 migration scripts. Leading to this bug.

Had the customer cluster was created on a 2.2 release, we will not see this bug because none of the above is true:

select version();
                                                  version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.2.7.0-b0 on x86_64-pc-linux-gnu, compiled by gcc (Homebrew gcc 5.5.0_4) 5.5.0, 64-bit
(1 row)

select count(*) from pg_yb_catalog_version;
ysqlsh:/tmp/t.sql:2: ERROR:  relation "pg_yb_catalog_version" does not exist
LINE 1: select count(*) from pg_yb_catalog_version;
                             ^
select relname from pg_class where relname = 'pg_tablegroup';
 relname
---------
(0 rows)

select relname from pg_class where relname = 'pg_stat_statements';
 relname
---------
(0 rows)

select proname from pg_proc where proname = 'jsonb_path_query';
 proname
---------
(0 rows)

@myang2021
Copy link
Contributor

Further analysis indicates that if the cluster was created on a release that is < 2.4 (e.g., 2.2 or earlier), or > 2.6 (e.g., 2.8 or beyond), we will not have this bug. If < 2.4, then none of those is true so the upgrade will go through V1, V2, ..., without missing any. If > 2.6, then all those hard-coded (V1 to V8) in C++ are true so we should be skipping all of V1 to V8.

@myang2021
Copy link
Contributor

// Analyze pg_catalog state of a database to determine a current major version of a catalog state
// by checking presence of catalog changing features released before the migrations feature landed.
// 0 means that no migrations were applied yet.
Result<int> GetMajorVersionFromSystemCatalogState(PGConn* pgconn) {

According to the comment, those 8 (V1 to V8) migration scripts represent catalog features released before the ysql_upgrade feature landed. So if we see V1, V2, V3, V4 are not applied, it logically means that they represent features already present in the existing cluster and therefore they should be skipped. The only bug is that the second block in V1 which creates the function yb_catalog_version cannot be represented by the existence of a non-empty pg_yb_catalog_version table.

myang2021 added a commit that referenced this issue Jan 22, 2024
Summary:
A customer reported that backup failed and the reason is that the function
`yb_catalog_version()` does not exist. This function is introduced in the
migration script `V1__3979__pg_yb_catalog_version.sql`.

After debugging I found there is a code bug in the YSQL upgrade code.
Specifically:

```
Result<int> GetMajorVersionFromSystemCatalogState(PGConn* pgconn) {
  int major_version = 0;

  // Helper macro removing boilerplate.
    if (VERIFY_RESULT(oneliner_with_result)) { \
      ++major_version; \
    } else { \
      return major_version; \
    }

  // V1: #3979 introducing pg_yb_catalog_version table.
  INCREMENT_VERSION_OR_RETURN_IT(SystemTableHasRows(pgconn, "pg_yb_catalog_version"))
```
This function tries to determine which version we should start from. For
example, it checks the existence of a non-empty table `pg_yb_catalog_version` to
decide whether we need to skip V1 migration script or not. The V1 migration
script has two blocks:

* The first block introduces the table `pg_catalog.pg_yb_catalog_version`
* The second block introduces the function `yb_catalog_version()`

In a 2.4.x or 2.6.x cluster, `pg_catalog.pg_yb_catalog_version` already exists
but `yb_catalog_version()` does not. As a result, the above code will skip V1
migration script and that's why function `yb_catalog_version` isn't introduced.

I have verified that any older release < 2.4 (e.g., 2.2.7.0), or any newer
release > 2.6 (e.g., 2.8.0.0) do not have this bug. In an older release < 2.4,
`pg_catalog.pg_yb_catalog_version` does not exist so the above code
will not skip V1 migration script. In a newer release > 2.6, `yb_catalog_version()`
also exists so it is correct to skip V1 migration script.

The customer cluster was manually fixed by reapplying V1 migration script. I
made a fix such that after the normal migration has completed, check whether
`yb_catalog_version()` exists or not. If it is missing then apply V1 migration
script.

Test Plan:
(1)

./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsByNonSuperuser'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSharedRelsCreatesThemEverywhere'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSharedRelsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsDontFireTriggers'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsAfterFailure'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#sharedRelsIndexesWork'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemViewsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#viewReloptionsAreFilteredOnReplace'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#replacingViewKeepsCacheConsistent'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#insertOnConflictWithOidsWorks'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#dmlsUpdatePgCache'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#pinnedObjectsCacheIsUpdated'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#upgradeIsIdempotent'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#upgradeIsIdempotentSingleConn'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migratingIsEquivalentToReinitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migrationInGeoPartitionedSetup'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migrationFilenameComment'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#invalidUpgradeActions'

(2) Download old release packages such as
yugabyte-2.2.7.0
yugabyte-2.4.0.0
yugabyte-2.4.8.0
yugabyte-2.6.0.0,
yugabyte-2.6.20.0
yugabyte-2.8.0.0
yugabyte-2.8.12.0

Run the following test manually against each of the old releases. For example:

```
./bin/yb-ctl create --timeout-yb-admin-sec 180 --rf 1 --master_flags initial_sys_catalog_snapshot_path=$HOME/tmp/yugabyte-2.4.0.0/share/initial_sys_catalog_snapshot
./build/latest/bin/yb-admin --timeout_ms=720000 upgrade_ysql
```

Look at yb-tserver.INFO
```
W0118 17:16:55.246408 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in template1
I0118 17:16:55.246526 11555 ysql_upgrade.cc:473] template1: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.246542 11555 ysql_upgrade.cc:481] Found pg_global in migration file V1__3979__pg_yb_catalog_version.sql when applying to template1
I0118 17:16:55.341861 11555 ysql_upgrade.cc:517] template1: migration successfully applied without version bump
W0118 17:16:55.478569 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in template0
I0118 17:16:55.478693 11555 ysql_upgrade.cc:473] template0: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.563812 11555 ysql_upgrade.cc:517] template0: migration successfully applied without version bump
W0118 17:16:55.700487 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in postgres
I0118 17:16:55.700606 11555 ysql_upgrade.cc:473] postgres: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.783318 11555 ysql_upgrade.cc:517] postgres: migration successfully applied without version bump
W0118 17:16:55.802189 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in yugabyte
I0118 17:16:55.802299 11555 ysql_upgrade.cc:473] yugabyte: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.826416 11555 ysql_upgrade.cc:517] yugabyte: migration successfully applied without version bump
W0118 17:16:55.845325 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in system_platform
I0118 17:16:55.845443 11555 ysql_upgrade.cc:473] system_platform: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.865423 11555 ysql_upgrade.cc:517] system_platform: migration successfully applied without version bump
```

Rerun the upgrade command
```
./build/latest/bin/yb-admin --timeout_ms=720000 upgrade_ysql
```

Look at the yb-tserver.INFO again:
```
I0118 17:34:51.982327 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in template1
I0118 17:34:51.985905 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in template0
I0118 17:34:51.989349 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in postgres
I0118 17:34:51.992571 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in yugabyte
I0118 17:34:51.997953 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in system_platform
```

Reviewers: jason, tverona

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D31793
myang2021 added a commit that referenced this issue Jan 24, 2024
…exist

Summary:
A customer reported that backup failed and the reason is that the function
`yb_catalog_version()` does not exist. This function is introduced in the
migration script `V1__3979__pg_yb_catalog_version.sql`.

After debugging I found there is a code bug in the YSQL upgrade code.
Specifically:

```
Result<int> GetMajorVersionFromSystemCatalogState(PGConn* pgconn) {
  int major_version = 0;

  // Helper macro removing boilerplate.
    if (VERIFY_RESULT(oneliner_with_result)) { \
      ++major_version; \
    } else { \
      return major_version; \
    }

  // V1: #3979 introducing pg_yb_catalog_version table.
  INCREMENT_VERSION_OR_RETURN_IT(SystemTableHasRows(pgconn, "pg_yb_catalog_version"))
```
This function tries to determine which version we should start from. For
example, it checks the existence of a non-empty table `pg_yb_catalog_version` to
decide whether we need to skip V1 migration script or not. The V1 migration
script has two blocks:

* The first block introduces the table `pg_catalog.pg_yb_catalog_version`
* The second block introduces the function `yb_catalog_version()`

In a 2.4.x or 2.6.x cluster, `pg_catalog.pg_yb_catalog_version` already exists
but `yb_catalog_version()` does not. As a result, the above code will skip V1
migration script and that's why function `yb_catalog_version` isn't introduced.

I have verified that any older release < 2.4 (e.g., 2.2.7.0), or any newer
release > 2.6 (e.g., 2.8.0.0) do not have this bug. In an older release < 2.4,
`pg_catalog.pg_yb_catalog_version` does not exist so the above code
will not skip V1 migration script. In a newer release > 2.6, `yb_catalog_version()`
also exists so it is correct to skip V1 migration script.

The customer cluster was manually fixed by reapplying V1 migration script. I
made a fix such that after the normal migration has completed, check whether
`yb_catalog_version()` exists or not. If it is missing then apply V1 migration
script.

Original commit: 55aac07 / D31793
Jira: DB-7466

Test Plan:
(1)

./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsByNonSuperuser'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSharedRelsCreatesThemEverywhere'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSharedRelsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsDontFireTriggers'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsAfterFailure'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#sharedRelsIndexesWork'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemViewsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#viewReloptionsAreFilteredOnReplace'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#replacingViewKeepsCacheConsistent'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#insertOnConflictWithOidsWorks'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#dmlsUpdatePgCache'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#pinnedObjectsCacheIsUpdated'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#upgradeIsIdempotent'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#upgradeIsIdempotentSingleConn'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migratingIsEquivalentToReinitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migrationInGeoPartitionedSetup'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migrationFilenameComment'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#invalidUpgradeActions'

(2) Download old release packages such as
yugabyte-2.2.7.0
yugabyte-2.4.0.0
yugabyte-2.4.8.0
yugabyte-2.6.0.0,
yugabyte-2.6.20.0
yugabyte-2.8.0.0
yugabyte-2.8.12.0

Run the following test manually against each of the old releases. For example:

```
./bin/yb-ctl create --timeout-yb-admin-sec 180 --rf 1 --master_flags initial_sys_catalog_snapshot_path=$HOME/tmp/yugabyte-2.4.0.0/share/initial_sys_catalog_snapshot
./build/latest/bin/yb-admin --timeout_ms=720000 upgrade_ysql
```

Look at yb-tserver.INFO
```
W0118 17:16:55.246408 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in template1
I0118 17:16:55.246526 11555 ysql_upgrade.cc:473] template1: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.246542 11555 ysql_upgrade.cc:481] Found pg_global in migration file V1__3979__pg_yb_catalog_version.sql when applying to template1
I0118 17:16:55.341861 11555 ysql_upgrade.cc:517] template1: migration successfully applied without version bump
W0118 17:16:55.478569 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in template0
I0118 17:16:55.478693 11555 ysql_upgrade.cc:473] template0: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.563812 11555 ysql_upgrade.cc:517] template0: migration successfully applied without version bump
W0118 17:16:55.700487 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in postgres
I0118 17:16:55.700606 11555 ysql_upgrade.cc:473] postgres: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.783318 11555 ysql_upgrade.cc:517] postgres: migration successfully applied without version bump
W0118 17:16:55.802189 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in yugabyte
I0118 17:16:55.802299 11555 ysql_upgrade.cc:473] yugabyte: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.826416 11555 ysql_upgrade.cc:517] yugabyte: migration successfully applied without version bump
W0118 17:16:55.845325 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in system_platform
I0118 17:16:55.845443 11555 ysql_upgrade.cc:473] system_platform: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.865423 11555 ysql_upgrade.cc:517] system_platform: migration successfully applied without version bump
```

Rerun the upgrade command
```
./build/latest/bin/yb-admin --timeout_ms=720000 upgrade_ysql
```

Look at the yb-tserver.INFO again:
```
I0118 17:34:51.982327 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in template1
I0118 17:34:51.985905 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in template0
I0118 17:34:51.989349 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in postgres
I0118 17:34:51.992571 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in yugabyte
I0118 17:34:51.997953 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in system_platform
```

Reviewers: jason, tverona

Reviewed By: jason

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31898
myang2021 added a commit that referenced this issue Jan 24, 2024
…exist

Summary:
A customer reported that backup failed and the reason is that the function
`yb_catalog_version()` does not exist. This function is introduced in the
migration script `V1__3979__pg_yb_catalog_version.sql`.

After debugging I found there is a code bug in the YSQL upgrade code.
Specifically:

```
Result<int> GetMajorVersionFromSystemCatalogState(PGConn* pgconn) {
  int major_version = 0;

  // Helper macro removing boilerplate.
    if (VERIFY_RESULT(oneliner_with_result)) { \
      ++major_version; \
    } else { \
      return major_version; \
    }

  // V1: #3979 introducing pg_yb_catalog_version table.
  INCREMENT_VERSION_OR_RETURN_IT(SystemTableHasRows(pgconn, "pg_yb_catalog_version"))
```
This function tries to determine which version we should start from. For
example, it checks the existence of a non-empty table `pg_yb_catalog_version` to
decide whether we need to skip V1 migration script or not. The V1 migration
script has two blocks:

* The first block introduces the table `pg_catalog.pg_yb_catalog_version`
* The second block introduces the function `yb_catalog_version()`

In a 2.4.x or 2.6.x cluster, `pg_catalog.pg_yb_catalog_version` already exists
but `yb_catalog_version()` does not. As a result, the above code will skip V1
migration script and that's why function `yb_catalog_version` isn't introduced.

I have verified that any older release < 2.4 (e.g., 2.2.7.0), or any newer
release > 2.6 (e.g., 2.8.0.0) do not have this bug. In an older release < 2.4,
`pg_catalog.pg_yb_catalog_version` does not exist so the above code
will not skip V1 migration script. In a newer release > 2.6, `yb_catalog_version()`
also exists so it is correct to skip V1 migration script.

The customer cluster was manually fixed by reapplying V1 migration script. I
made a fix such that after the normal migration has completed, check whether
`yb_catalog_version()` exists or not. If it is missing then apply V1 migration
script.

Original commit: 55aac07 / D31793
Jira: DB-7466

Test Plan:
(1)

./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsByNonSuperuser'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSharedRelsCreatesThemEverywhere'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSharedRelsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsDontFireTriggers'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsAfterFailure'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#sharedRelsIndexesWork'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemViewsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#viewReloptionsAreFilteredOnReplace'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#replacingViewKeepsCacheConsistent'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#insertOnConflictWithOidsWorks'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#dmlsUpdatePgCache'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#pinnedObjectsCacheIsUpdated'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#upgradeIsIdempotent'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#upgradeIsIdempotentSingleConn'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migratingIsEquivalentToReinitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migrationInGeoPartitionedSetup'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migrationFilenameComment'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#invalidUpgradeActions'

(2) Download old release packages such as
yugabyte-2.2.7.0
yugabyte-2.4.0.0
yugabyte-2.4.8.0
yugabyte-2.6.0.0,
yugabyte-2.6.20.0
yugabyte-2.8.0.0
yugabyte-2.8.12.0

Run the following test manually against each of the old releases. For example:

```
./bin/yb-ctl create --timeout-yb-admin-sec 180 --rf 1 --master_flags initial_sys_catalog_snapshot_path=$HOME/tmp/yugabyte-2.4.0.0/share/initial_sys_catalog_snapshot
./build/latest/bin/yb-admin --timeout_ms=720000 upgrade_ysql
```

Look at yb-tserver.INFO
```
W0118 17:16:55.246408 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in template1
I0118 17:16:55.246526 11555 ysql_upgrade.cc:473] template1: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.246542 11555 ysql_upgrade.cc:481] Found pg_global in migration file V1__3979__pg_yb_catalog_version.sql when applying to template1
I0118 17:16:55.341861 11555 ysql_upgrade.cc:517] template1: migration successfully applied without version bump
W0118 17:16:55.478569 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in template0
I0118 17:16:55.478693 11555 ysql_upgrade.cc:473] template0: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.563812 11555 ysql_upgrade.cc:517] template0: migration successfully applied without version bump
W0118 17:16:55.700487 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in postgres
I0118 17:16:55.700606 11555 ysql_upgrade.cc:473] postgres: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.783318 11555 ysql_upgrade.cc:517] postgres: migration successfully applied without version bump
W0118 17:16:55.802189 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in yugabyte
I0118 17:16:55.802299 11555 ysql_upgrade.cc:473] yugabyte: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.826416 11555 ysql_upgrade.cc:517] yugabyte: migration successfully applied without version bump
W0118 17:16:55.845325 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in system_platform
I0118 17:16:55.845443 11555 ysql_upgrade.cc:473] system_platform: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.865423 11555 ysql_upgrade.cc:517] system_platform: migration successfully applied without version bump
```

Rerun the upgrade command
```
./build/latest/bin/yb-admin --timeout_ms=720000 upgrade_ysql
```

Look at the yb-tserver.INFO again:
```
I0118 17:34:51.982327 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in template1
I0118 17:34:51.985905 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in template0
I0118 17:34:51.989349 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in postgres
I0118 17:34:51.992571 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in yugabyte
I0118 17:34:51.997953 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in system_platform
```

Reviewers: jason, tverona

Reviewed By: jason

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31929
myang2021 added a commit that referenced this issue Jan 25, 2024
…exist

Summary:
A customer reported that backup failed and the reason is that the function
`yb_catalog_version()` does not exist. This function is introduced in the
migration script `V1__3979__pg_yb_catalog_version.sql`.

After debugging I found there is a code bug in the YSQL upgrade code.
Specifically:

```
Result<int> GetMajorVersionFromSystemCatalogState(PGConn* pgconn) {
  int major_version = 0;

  // Helper macro removing boilerplate.
    if (VERIFY_RESULT(oneliner_with_result)) { \
      ++major_version; \
    } else { \
      return major_version; \
    }

  // V1: #3979 introducing pg_yb_catalog_version table.
  INCREMENT_VERSION_OR_RETURN_IT(SystemTableHasRows(pgconn, "pg_yb_catalog_version"))
```
This function tries to determine which version we should start from. For
example, it checks the existence of a non-empty table `pg_yb_catalog_version` to
decide whether we need to skip V1 migration script or not. The V1 migration
script has two blocks:

* The first block introduces the table `pg_catalog.pg_yb_catalog_version`
* The second block introduces the function `yb_catalog_version()`

In a 2.4.x or 2.6.x cluster, `pg_catalog.pg_yb_catalog_version` already exists
but `yb_catalog_version()` does not. As a result, the above code will skip V1
migration script and that's why function `yb_catalog_version` isn't introduced.

I have verified that any older release < 2.4 (e.g., 2.2.7.0), or any newer
release > 2.6 (e.g., 2.8.0.0) do not have this bug. In an older release < 2.4,
`pg_catalog.pg_yb_catalog_version` does not exist so the above code
will not skip V1 migration script. In a newer release > 2.6, `yb_catalog_version()`
also exists so it is correct to skip V1 migration script.

The customer cluster was manually fixed by reapplying V1 migration script. I
made a fix such that after the normal migration has completed, check whether
`yb_catalog_version()` exists or not. If it is missing then apply V1 migration
script.

Original commit: 55aac07 / D31793
Jira: DB-7466

Test Plan:
(1)

./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsByNonSuperuser'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSharedRelsCreatesThemEverywhere'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSharedRelsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsDontFireTriggers'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemRelsAfterFailure'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#sharedRelsIndexesWork'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#creatingSystemViewsIsLikeInitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#viewReloptionsAreFilteredOnReplace'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#replacingViewKeepsCacheConsistent'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#insertOnConflictWithOidsWorks'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#dmlsUpdatePgCache'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#pinnedObjectsCacheIsUpdated'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#upgradeIsIdempotent'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#upgradeIsIdempotentSingleConn'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migratingIsEquivalentToReinitdb'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migrationInGeoPartitionedSetup'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#migrationFilenameComment'
./yb_build.sh release --sj --java-test 'org.yb.pgsql.TestYsqlUpgrade#invalidUpgradeActions'

(2) Download old release packages such as
yugabyte-2.2.7.0
yugabyte-2.4.0.0
yugabyte-2.4.8.0
yugabyte-2.6.0.0,
yugabyte-2.6.20.0
yugabyte-2.8.0.0
yugabyte-2.8.12.0

Run the following test manually against each of the old releases. For example:

```
./bin/yb-ctl create --timeout-yb-admin-sec 180 --rf 1 --master_flags initial_sys_catalog_snapshot_path=$HOME/tmp/yugabyte-2.4.0.0/share/initial_sys_catalog_snapshot
./build/latest/bin/yb-admin --timeout_ms=720000 upgrade_ysql
```

Look at yb-tserver.INFO
```
W0118 17:16:55.246408 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in template1
I0118 17:16:55.246526 11555 ysql_upgrade.cc:473] template1: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.246542 11555 ysql_upgrade.cc:481] Found pg_global in migration file V1__3979__pg_yb_catalog_version.sql when applying to template1
I0118 17:16:55.341861 11555 ysql_upgrade.cc:517] template1: migration successfully applied without version bump
W0118 17:16:55.478569 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in template0
I0118 17:16:55.478693 11555 ysql_upgrade.cc:473] template0: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.563812 11555 ysql_upgrade.cc:517] template0: migration successfully applied without version bump
W0118 17:16:55.700487 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in postgres
I0118 17:16:55.700606 11555 ysql_upgrade.cc:473] postgres: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.783318 11555 ysql_upgrade.cc:517] postgres: migration successfully applied without version bump
W0118 17:16:55.802189 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in yugabyte
I0118 17:16:55.802299 11555 ysql_upgrade.cc:473] yugabyte: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.826416 11555 ysql_upgrade.cc:517] yugabyte: migration successfully applied without version bump
W0118 17:16:55.845325 11555 ysql_upgrade.cc:438] Function yb_catalog_version is missing in system_platform
I0118 17:16:55.845443 11555 ysql_upgrade.cc:473] system_platform: applying migration 'V1__3979__pg_yb_catalog_version.sql'
I0118 17:16:55.865423 11555 ysql_upgrade.cc:517] system_platform: migration successfully applied without version bump
```

Rerun the upgrade command
```
./build/latest/bin/yb-admin --timeout_ms=720000 upgrade_ysql
```

Look at the yb-tserver.INFO again:
```
I0118 17:34:51.982327 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in template1
I0118 17:34:51.985905 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in template0
I0118 17:34:51.989349 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in postgres
I0118 17:34:51.992571 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in yugabyte
I0118 17:34:51.997953 11558 ysql_upgrade.cc:443] Found function yb_catalog_version in system_platform
```

Reviewers: jason, tverona

Reviewed By: jason

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31939
@yugabyte-ci yugabyte-ci closed this as not planned Won't fix, can't repro, duplicate, stale Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) jira-originated kind/bug This issue is a bug priority/high High Priority
Projects
None yet
Development

No branches or pull requests

3 participants