
2.25.0.0-b132

@spolitov spolitov tagged this 14 Oct 09:22
Summary:
This diff implements vector LSM persistence. Each vector LSM index chunk is stored in a separate file. To maintain the "live" set of those vector index chunk files, we store vector LSM metadata in newly introduced meta files, conceptually similar to RocksDB manifest files.

Vector LSM meta files are named with sequential numbers. Each meta file is itself a sequence of updates, and each update is serialized in the following format:
1) 4 bytes - size of the serialized update body.
2) Serialized update body (protobufs are used for serializing updates). Currently this can be an addition or removal of a vector index chunk, identified by a numeric id.
3) CRC checksum of the serialized update body.
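
A minimal Python sketch of the record framing listed above, for illustration only. The summary does not specify the byte order of the size field, the CRC variant, or the CRC width, so little-endian, CRC-32 (zlib), and 4 bytes are assumptions here. The update body is treated as opaque bytes rather than a real protobuf message, and the function names are hypothetical.

```python
import struct
import zlib


def encode_update(body):
    """Frame one update: size prefix, body, then checksum of the body."""
    size = struct.pack("<I", len(body))        # assumed little-endian, 4 bytes
    crc = struct.pack("<I", zlib.crc32(body))  # assumed CRC-32, 4 bytes
    return size + body + crc


def decode_updates(data):
    """Return the update bodies stored in a meta file's raw bytes."""
    bodies = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated size field")
        (size,) = struct.unpack_from("<I", data, offset)
        body_start = offset + 4
        body_end = body_start + size
        if body_end + 4 > len(data):
            raise ValueError("truncated update record")
        body = data[body_start:body_end]
        (stored_crc,) = struct.unpack_from("<I", data, body_end)
        if zlib.crc32(body) != stored_crc:
            raise ValueError("CRC mismatch in update record")
        bodies.append(body)
        offset = body_end + 4
    return bodies
```

Because every record carries its own length and checksum, a reader can detect a torn or corrupted record without any external index.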

During bootstrap we read the list of available meta files, sort them by the sequence number encoded in their names, and replay their updates in that order to construct the complete set of vector index chunk ids representing the current state of the vector LSM.
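
The bootstrap replay can be sketched in Python as follows. This is an illustration under assumptions, not the actual implementation: meta files are modeled as an in-memory mapping from file name to a list of already-decoded `(operation, chunk_id)` pairs, and the names `bootstrap`, `"add"`, and `"remove"` are hypothetical.

```python
def bootstrap(meta_files):
    """Rebuild the live set of chunk ids from decoded meta files.

    meta_files maps a file name (a decimal sequence number, as a string)
    to that file's updates, each a (operation, chunk_id) pair.
    """
    live_chunks = set()
    # Sort numerically: a plain string sort would place "10" before "9".
    for name in sorted(meta_files, key=int):
        for operation, chunk_id in meta_files[name]:
            if operation == "add":
                live_chunks.add(chunk_id)
            elif operation == "remove":
                live_chunks.discard(chunk_id)
            else:
                raise ValueError("unknown operation: %r" % (operation,))
    return live_chunks
```

For example, `bootstrap({"9": [("add", 1), ("add", 2)], "10": [("remove", 1)]})` yields `{2}`, since file 9 is replayed before file 10.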

An update record can also carry a reset mark flag to support compacting metadata. If the reset mark flag is set, we ignore all preceding updates. This allows us to compact vector LSM metadata by combining the final set of vector index chunk ids into one update, setting the reset mark on it, writing it to a new file, and deleting all preceding files.
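
A short Python sketch of the reset mark semantics and the compaction step described above. The `Update` class and its field names are hypothetical stand-ins for the protobuf message, whose actual schema is not given in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    added: list = field(default_factory=list)    # chunk ids added
    removed: list = field(default_factory=list)  # chunk ids removed
    reset: bool = False                          # reset mark flag


def replay(updates):
    """Apply updates in order; a reset mark discards everything before it."""
    live_chunks = set()
    for update in updates:
        if update.reset:
            live_chunks.clear()
        live_chunks.update(update.added)
        live_chunks.difference_update(update.removed)
    return live_chunks


def compact(updates):
    """Fold a history of updates into one update carrying the reset mark."""
    return Update(added=sorted(replay(updates)), reset=True)
```

Since the compacted update carries the reset mark, replaying it after the old history or on its own yields the same chunk set, so the older files can be deleted once the new file has been written.
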
Jira: DB-13283

Test Plan: VectorLSMTest.Bootstrap

Reviewers: mbautin, aleksandr.ponomarenko, arybochkin

Reviewed By: mbautin

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D38911