2024.2.6.0-b89

@myang2021 myang2021 tagged this 14 Oct 15:56
Summary:
The symptom of this bug:
```
yugabyte=# create table foo(id int, id2 int);
CREATE TABLE
yugabyte=# insert into foo values (1, 1), (2, 2);
INSERT 0 2
yugabyte=# insert into foo values (1, 3), (2, 4);
INSERT 0 2
yugabyte=# create unique index concurrently id_idx on foo(id); -- id_idx is invalid
ERROR:  ERROR:  duplicate key value violates unique constraint "id_idx"
yugabyte=# create unique index concurrently id_idx2 on foo(id2); -- moves id_idx permission from INDEX_PERM_WRITE_AND_DELETE_WHILE_REMOVING to INDEX_PERM_DELETE_ONLY_WHILE_REMOVING
CREATE INDEX
yugabyte=# create unique index concurrently id_idx3 on foo(id2); -- moves id_idx permission from INDEX_PERM_DELETE_ONLY_WHILE_REMOVING to INDEX_PERM_INDEX_UNUSED, which causes master to delete the docdb table for id_idx
CREATE INDEX
yugabyte=# insert into foo values (5, 5); -- OBJECT_NOT_FOUND error
ERROR:  Table with identifier 000034d4000030008000000000004003 not found: OBJECT_NOT_FOUND
```

This bug did not exist in 2024.2.4.1. The regression was introduced by commit
e01d18cce6760f7c9d4fd5b5336a9ceea4584ecd.

In function `MultiStageAlterTable::LaunchNextTableInfoVersionIfNecessary`:

Before the commit:

```
 466       } else if (idx_pb.index_permissions() != INDEX_PERM_READ_WRITE_AND_DELETE && !is_ysql_table) {
 467         indexes_to_update.emplace(idx_pb.table_id(), NextPermission(idx_pb.index_permissions()));
 468       }
```

After the commit:

```
 472       } else if (
 473           idx_pb.index_permissions() != INDEX_PERM_READ_WRITE_AND_DELETE && update_to_backfill) {
 474         indexes_to_update.emplace(idx_pb.table_id(), NextPermission(idx_pb.index_permissions()));
 475       }
```

Before the commit, this branch never updated YSQL index permissions; after the commit, it does.

So when id_idx2 is processed, line 473 is also reached for id_idx, which moves its permission to
the next state (`NextPermission`). When id_idx3 is processed, line 473 is reached for id_idx again,
which moves its permission one more step, to `INDEX_PERM_INDEX_UNUSED`. It is
`INDEX_PERM_INDEX_UNUSED` that causes master to delete id_idx's docdb table, and that leads to the
OBJECT_NOT_FOUND error. If id_idx3 is not created as above, id_idx never reaches that state and the
last insert statement succeeds.
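
The double advance can be illustrated with a toy model. This is only a sketch, not the master code: the enum is reduced to the states named in this message, the names `Perm`, `NextPermissionModel`, `AdvanceOnceBuggy` and `AfterLaterIndexCreations` are hypothetical, and it assumes `update_to_backfill` is true each time a later index is created.

```cpp
#include <cassert>

// Simplified model: only the permission states named in this commit message.
// The real enum has more states; these names are illustrative.
enum class Perm {
  kReadWriteAndDelete,
  kWriteAndDeleteWhileRemoving,
  kDeleteOnlyWhileRemoving,
  kIndexUnused,
};

// Hypothetical stand-in for NextPermission, restricted to the removal path.
Perm NextPermissionModel(Perm p) {
  switch (p) {
    case Perm::kWriteAndDeleteWhileRemoving:
      return Perm::kDeleteOnlyWhileRemoving;
    case Perm::kDeleteOnlyWhileRemoving:
      return Perm::kIndexUnused;
    default:
      return p;  // Other transitions are not modeled here.
  }
}

// Models the post-regression branch (line 473), assuming update_to_backfill
// is true: any index that is not fully usable gets advanced.
Perm AdvanceOnceBuggy(Perm p) {
  if (p != Perm::kReadWriteAndDelete) {
    return NextPermissionModel(p);
  }
  return p;
}

// Applies the buggy branch once per later CREATE INDEX on the same table.
Perm AfterLaterIndexCreations(Perm start, int later_creations) {
  assert(later_creations >= 0);
  Perm p = start;
  for (int i = 0; i < later_creations; ++i) {
    p = AdvanceOnceBuggy(p);
  }
  return p;
}
```

In this model, starting from `kWriteAndDeleteWhileRemoving`, one later index creation (id_idx2) leaves id_idx in `kDeleteOnlyWhileRemoving`, and a second one (id_idx3) takes it to `kIndexUnused`.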

id_idx is in INDEX_PERM_WRITE_AND_DELETE_WHILE_REMOVING to begin with because, when backfill fails,
we set the permission to that state:
```
      permissions_to_set.emplace(
          kv_pair.first,
          success ? INDEX_PERM_READ_WRITE_AND_DELETE : INDEX_PERM_WRITE_AND_DELETE_WHILE_REMOVING);
```

I made two changes:
(1) Separate the YCQL and YSQL cases.
(2) For YSQL, do not move to the next state from INDEX_PERM_WRITE_AND_DELETE_WHILE_REMOVING.
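
One possible reading of these two changes, written as a standalone predicate, is sketched below. It is not the actual diff: `Api`, `Perm`, `ShouldAdvance` and `RunExamples` are hypothetical names, the YCQL arm is assumed to keep the pre-regression condition, and the YSQL arm is assumed to be the post-regression condition plus the new exclusion.

```cpp
#include <cassert>

// Illustrative subset of the permission states named in this message.
enum class Perm {
  kReadWriteAndDelete,
  kWriteAndDeleteWhileRemoving,
  kDeleteOnlyWhileRemoving,
  kIndexUnused,
};

enum class Api { kYcql, kYsql };

// Hypothetical guard deciding whether an existing index's permission should
// be moved to the next state while another index on the table is processed.
bool ShouldAdvance(Api api, Perm p, bool update_to_backfill) {
  if (p == Perm::kReadWriteAndDelete) {
    return false;  // Fully usable index: nothing to advance in either API.
  }
  if (api == Api::kYcql) {
    // Assumption: YCQL keeps the behavior it had before the regression.
    return true;
  }
  // YSQL: an index whose backfill failed stays in
  // kWriteAndDeleteWhileRemoving instead of drifting toward kIndexUnused.
  return update_to_backfill && p != Perm::kWriteAndDeleteWhileRemoving;
}

// Small self-check of the cases discussed in this message.
void RunExamples() {
  assert(!ShouldAdvance(Api::kYsql, Perm::kWriteAndDeleteWhileRemoving, true));
  assert(ShouldAdvance(Api::kYcql, Perm::kWriteAndDeleteWhileRemoving, false));
  assert(!ShouldAdvance(Api::kYsql, Perm::kReadWriteAndDelete, true));
}
```

With a guard of this shape, the invalid id_idx from the reproduction is left in place when id_idx2 and id_idx3 are created, so its docdb table is never deleted behind the table's back.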

I added a new unit test that would fail with the following error before the fix:

```
[ts-1] 2025-10-09 18:05:51.320 UTC [3707396] ERROR:  Table with identifier 000034d4000030008000000000004003 not found: OBJECT_NOT_FOUND
[ts-1] 2025-10-09 18:05:51.320 UTC [3707396] STATEMENT:  INSERT INTO foo VALUES (5, 5)
```
Jira: DB-18567

Original commit: b6efa55e7247da85f50ab77ed4d1151e49fd9101 / D47307

Test Plan:
./yb_build.sh release --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.MultipleIndexesFirstOneInvalid/0
./yb_build.sh release --cxx-test pg_index_backfill-test --gtest_filter PgIndexBackfillTest.MultipleIndexesFirstOneInvalid/1

Reviewers: #db-approvers, amitanand

Reviewed By: amitanand

Subscribers: yql, sanketh

Differential Revision: https://phorge.dev.yugabyte.com/D47433