
2.25.1.0-b263

spolitov tagged this 30 Jan 07:04
Summary:
Currently, we ignore data that was already present in the table before the vector index was created.
This should be fixed by backfilling the existing data during index creation.

This diff implements the first phase of backfill.
The backfill process starts as soon as the vector index is added to the tablet.
During nonconcurrent index creation, we wait until the backfill process finishes.
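The lifecycle described above (backfill starts when the index is added to the tablet, and nonconcurrent creation blocks until it completes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `Row`, `VectorIndexSketch`, `StartBackfill`, and `WaitForBackfill` are hypothetical names, and an in-memory vector stands in for the real index.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical row type: a primary key plus the indexed vector column.
struct Row {
  std::string key;
  std::vector<float> vec;
};

// Hypothetical stand-in for a tablet's vector index (not the YugabyteDB API).
class VectorIndexSketch {
 public:
  // Called once, when the index is added to the tablet: backfills rows that
  // existed before the index was created, on a background thread.
  void StartBackfill(std::vector<Row> existing_rows) {
    worker_ = std::thread([this, rows = std::move(existing_rows)] {
      std::lock_guard<std::mutex> lock(mutex_);
      // Phase one: the whole backfill is applied as a single write.
      entries_.insert(entries_.end(), rows.begin(), rows.end());
      backfill_done_ = true;
      cv_.notify_all();
    });
  }

  // Nonconcurrent index creation blocks here until backfill finishes.
  void WaitForBackfill() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return backfill_done_; });
  }

  std::size_t Size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return entries_.size();
  }

  ~VectorIndexSketch() {
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool backfill_done_ = false;
  std::vector<Row> entries_;
  std::thread worker_;
};
```

The condition variable's predicate form of `wait` handles both spurious wakeups and the case where backfill finishes before the creator starts waiting.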

Remaining work:
1) In this diff, index backfill happens in a single write. The indexed table could contain a lot of data, so writes should be split into multiple chunks.
2) The TServer could be restarted during the backfill procedure, so resuming index backfill should be implemented.
3) When checking whether the index is ready, only the leader state is checked. It would be preferable to check at least a majority of replicas.
4) Concurrent index backfill. Concurrent index creation may not work either; this part has not been checked.
5) Backfill is implemented and tested for the nonconcurrent case only.
6) The behavior in the concurrent case is undefined.
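Items 1 and 2 fit together naturally: if backfill proceeds in key-ordered chunks and records the last key written after each chunk, a restarted TServer can resume from that checkpoint. The sketch below assumes a `std::map` ordered by primary key stands in for the indexed table, and `BackfillState` and `BackfillNextChunk` are hypothetical names; a real implementation would persist the checkpoint durably rather than in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: rows ordered by primary key stand in for the indexed table.
using Table = std::map<std::string, std::vector<float>>;

struct BackfillState {
  // Last key written to the index. Assumed to be persisted so a restarted
  // TServer can resume from it (kept in memory in this sketch).
  std::optional<std::string> last_key;
  bool done = false;
};

// Writes at most `chunk_size` rows following state.last_key to `index`
// as one batch, then advances the checkpoint. Returns rows written.
std::size_t BackfillNextChunk(const Table& table, std::size_t chunk_size,
                              BackfillState& state, Table& index) {
  auto it = state.last_key ? table.upper_bound(*state.last_key) : table.begin();
  Table batch;
  for (; it != table.end() && batch.size() < chunk_size; ++it) {
    batch.insert(*it);
  }
  if (batch.empty()) {
    state.done = true;
    return 0;
  }
  index.insert(batch.begin(), batch.end());  // one write per chunk
  state.last_key = batch.rbegin()->first;    // checkpoint only after the write
  if (it == table.end()) state.done = true;
  return batch.size();
}
```

Advancing the checkpoint only after the chunk is written means a crash between the two steps re-applies at most one chunk, which is harmless as long as index writes are idempotent per key.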
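Item 3 contrasts the current leader-only readiness check with a majority check. A minimal sketch, assuming each replica reports a hypothetical `ReplicaBackfillStatus`; the function names are illustrative and not part of the YugabyteDB codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-replica report of vector index backfill status.
struct ReplicaBackfillStatus {
  std::string replica_id;
  bool is_leader = false;
  bool backfill_done = false;
};

// Current behavior described above: only the leader's state is consulted.
bool IsIndexReadyLeaderOnly(const std::vector<ReplicaBackfillStatus>& replicas) {
  for (const auto& r : replicas) {
    if (r.is_leader) return r.backfill_done;
  }
  return false;  // no known leader: treat the index as not ready
}

// Stricter check: a majority of the Raft group must have finished backfill.
bool IsIndexReadyMajority(const std::vector<ReplicaBackfillStatus>& replicas,
                          std::size_t replication_factor) {
  std::size_t done = 0;
  for (const auto& r : replicas) {
    if (r.backfill_done) ++done;
  }
  return done >= replication_factor / 2 + 1;
}
```

The majority is computed from the replication factor rather than from `replicas.size()`, so replicas that have not reported any status count as not done.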

Upgrade/Rollback safety: Safe to upgrade and roll back.

Jira: DB-14932

Test Plan: PgVectorIndexTest.ManyRowsWithBackfill

Reviewers: arybochkin, tnayak, jason

Reviewed By: arybochkin, jason

Subscribers: jason, yql, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D41326