Skip to content

2024.2.3.0-b73

@arpang arpang tagged this 02 Apr 06:45
Summary:
This revision adds the procedure `yb_index_check()` that checks whether the given index is consistent with its base relation or not. This operation doesn't support GIN indexes yet.

An index row has the following components:
- index attributes: key and non-key columns (if any)
- ybbasectid: system attribute storing the ybctid of base relation row
- ybuniqueidxkeysuffix: system attribute, present only for unique indexes (it is non-null only when the index is in null-are-distinct mode and the key columns contain at least one NULL)

The structure of an index row is as follows (in `<DocKey>` -> `<DocValue>` format):
- Non-unique index: (key cols, ybbasectid) -> (non-key cols)
- Unique index: (key cols, ybuniqueidxkeysuffix) -> (non-key cols, ybbasectid)

An index row is consistent if all of its attributes are consistent. An index attribute is consistent if its value and the corresponding base relation value are
- for key attributes: binary equal (semantic equality without binary equality runs the risk of allowing multiple index rows for a given base table row if the key column can have multiple binary representations).
- for non-key attributes: binary or semantically equal.

Note: if both the values are NULL, they are consistent.

Index consistency check is done in two steps:
  # Check for spurious index rows
  # Check for missing index rows

**Part 1: Check for spurious index rows**
Here, we check if the index contains a row that it should not. To do this:

For every index row, fetch the row in the base table (filtered by partial index predicate) such that baserow.ybctid == indexrow.ybbasectid. If such a row doesn’t exist, use a baserow with all NULL values. The result will be the same as a LEFT join on indexrow.ybbasectid = baserow.ybctid with the index table on the left (if the index was a regular relation).

Fetch the following columns as the join targetlist:
- from index row: ybbasectid, index attributes, ybuniqueidxkeysuffix (only for unique indexes)
- from base table row: ybctid, columns/expressions corresponding to index attributes

On the joined result, make the following checks:

  # ybbasectid should be non-null
  # ybbasectid should be equal to ybctid
  # index attributes and the corresponding base relation column/expression should be consistent as per the above definition
  # for unique indexes, ybuniqueidxkeysuffix should be non-null iff the index uses null-are-distinct mode and key columns contain at least one null. When non-null, it should be equal to ybbasectid

If the above checks pass for every row in the index, it implies that the index does not contain any spurious rows. This can be proved by contradiction as follows:

Let’s assume that the above checks passed for every row in the index, yet it contains a spurious row, namely indexrow1. This index row must satisfy the following:
- indexrow1.ybbasectid != null (from check #1)
- base table has a row, namely baserow, such that baserow.ybctid == indexrow1.ybbasectid (otherwise ybctid would be null and check #2 would have failed)
- index attributes of indexrow1 are consistent with baserow (from check #3)
- If the index is unique, indexrow1.ybuniqueidxkeysuffix is either null or equal to ybbasectid, depending on the index mode and key cols (from check #4)

The above shows that indexrow1 has a valid counterpart in the baserow. Given this, the only possible reason why indexrow1 should not have been present in the index is that another index row, namely indexrow2, must exist such that the pair (indexrow2, baserow) also satisfies the above checks.

We can say that indexrow1 and indexrow2
- have the same ybbasectid (baserow.ybctid == indexrow2.ybbasectid == indexrow1.ybbasectid).
- have binary equal values for key columns. This is because key cols of both index rows are binary equal to the corresponding baserow values (from check #3 and definition of consistency).
- have identical ybuniqueidxkeysuffix (it depends on index type, mode, and key cols - all of these are already established to be the same for the two index rows).

The DocKey of the index row is created by a subset of (key cols, ybbasectid, ybuniqueidxkeysuffix). Each component is identical for the two index rows, implying identical DocKeys. This is not possible because DocDB does not allow duplicate DocKeys. Hence, such an indexrow1 does not exist.

**Part 2: Check for missing index rows**
This part checks if no entries are missing from the index. Given that it is already established that the index does not contain any spurious rows, it suffices to check if the index row count is what it should be. That is, for every qualifying row in the base table (filtered by partial index predicate), the index should contain one row.

- To fetch the index row and the corresponding base table tow efficiently, batch nested loop join is used (details below)
- Both parts of the check use a single read time. This works out of the box because the entire check is executed as a single YSQL statement.

**Batch Nested Loop Join usage**
Batchable join clauses must be of the form `inner_indexed_var = expression on (outer_vars)` and the expression must not involve functions.

To satisfy the above requirement,
- join condition: baserow.ybctid == indexrow.ybbasectid.
- outer subplan: index relation scan
- inner subplan: base relation scan. BNL expects an index on the var referenced in the join clause (ybctid, in this case). So, a dummy primary key index object on the ybctid column is temporarily created (not persisted in the PG catalog). Like any other PK index in YB, this index points to the base relation and doesn't have a separate docdb table. Because such an index object doesn't actually exist, the planner was bypassed and the join plan was hardcoded.
Jira: DB-15118

Backport summary:
- `V73__25820__yb_index_check.sql`
    - Rename it as per the migration backport standard.
    - Rename `prosupport` field to `protransform`
    - Add pg_depend entry
    - Remove pg_description entry
- pg_yb_migration.dat: Specify the correct migration filename and major/minor version
- catalog.h: Set YB_LAST_USED_OID to 8090 (same as in master branch)
- pggate.cc, pg_dml_read.h, pg_dml_read.cc: YbctidProvider is not available in 2024.2 and earlier branches. Introduce a new field, `requested_ybctids_owned_`, in `PgDml` to hold the container.
- indexam.c:
     - rename `rd_indam` to  `rd_amroutine`
     - Additional imports `access/sysattr.h` and `catalog/pg_type.h`
     - Adjacent line conflicts
- yb_scan.c:
   - SystemAttributeDefinition() takes an additional argument, relhasoids
   - Use YBSystemFirstLowInvalidAttributeNumber instead of YBFirstLowInvalidAttributeNumber in ybcSetupTargets() because index check involves attributes with attnum <  YBFirstLowInvalidAttributeNumber. This change is inline with D40090 on the master branch.
   - adjacent line conflicts
- heap.c:
  - YbSystemAttributeDefinition(): change return type from `const FormData_pg_attribute *` to `Form_pg_attribute` (inline with SystemAttributeDefinition())
  - definitions TYPALIGN_INT and TYPSTORAGE_EXTENDED are not available, use the underlying char values directly
  - Unlike master branch, ybctid column is not exposed in this version. Update SystemAttributeDefinition() such that it returns dummy pg_attribute entry for it.
- nodeIndexOnlyScan.c: macro definitions of TABLETUPLE_YBCTID and INDEXTUPLE_YBCTID are not available, expand them.
- datum.c: add missing function datum_image_eq(). It is required by yb_index_check().
- yb_index_check.c:
  - indnullsnotdistinct is not applicable as the NULLS NOT DISTINCT feature is not available in 2024.2 and earlier branches
  - SystemAttributeDefinition() function call takes addition `relhasoids` argument
  - lnext()'s definition is different in this and earlier branches
  - ExecTypeFromTL() function call takes an additional `hasoid` argument
  - additional imports: executor/ybcExpr.h, optimizer/clauses.h, access/htup_details.h, pg_yb_utils.h, catalog/pg_type.h, access/sysattr.h
  - ExecInitRangeTable() is not available, set the range table manually
  - ExecCloseResultRelations() and ExecCloseRangeTableRelations() are neither available nor applicable in this branch
- yb_index_check.sql/yb_index_check.out: remove nulls not distinct tests as the feature is not available in this branch
- pg_operator.dat: add symbol `ByteaEqualOperator` to equality operator on bytea type
- nodeIndexScan.c
  -  lockmode when calling index_open() on this branch differs from the master branch, the same change applies when calling yb_dummy_baserel_index_open().
  - adjacent line conflicts
- execExprInterp.c:
    - ExecEvalSysVar() is not available on this branch and the related code change is not applicable either.
    - ExecInterpExpr(): Handle scan of sysvar YBIdxBaseTupleIdAttributeNumber and YBUniqueIdxKeySuffixAttributeNumber
- ybc_pggate.cc/ybc_pggate.h:
   - YbcStatus -> YBCStatus
   - YbcPgStatement -> YBCPgStatement
- Adjacent line conflicts in: misc/Makefile, itup.h, pg_proc.dat, tuptable.h, ybc_pg_typedefs.h, ybc_pggate.h, pggate.cc, pg_dml_read.h, pg_expr.cc
- Adjacent line conflicts in guc.h. This change is not even required and is being removed from master (D42398), skip it.
Original commit: 10de037ad68e8ca068d648de9ad41ac1727d9e29 / D41376

Test Plan:
   ./yb_build.sh --java-test org.yb.pgsql.TestPgRegressYbIndexCheck

Reviewers: amartsinchyk, tnayak

Reviewed By: amartsinchyk

Subscribers: yql, smishra

Differential Revision: https://phorge.dev.yugabyte.com/D42444
Assets 2
Loading