
2025.1.0.0-b17

@spolitov spolitov tagged this 29 Apr 19:05
Summary:
A vector index query may include complex filtering conditions that cannot be pushed down to the index level. A common example is a query containing inner joins or other non-trivial relational operations.

This presents a challenge: when such filters are applied after the vector search, we cannot determine in advance how many candidate vectors from the index will ultimately satisfy the complete query conditions. Consequently, we may need to retrieve significantly more vectors than the user-specified limit to ensure we return the correct number of valid results after filtering.

To address this issue robustly, vector index operations should support paginated querying. This approach would allow us to:
1. Fetch vectors in manageable batches
2. Apply the complex filters incrementally
3. Continue retrieving additional pages until either:
   - We satisfy the user's requested limit
   - We exhaust the relevant portion of the index

This pagination mechanism would provide both correctness (ensuring we respect all query conditions) and efficiency (avoiding loading excessive unnecessary vectors into memory at once).
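
The loop described above can be sketched roughly as follows. This is only an illustration in Python, not the code from this change: `paged_index_scan`, `filtered_search`, and the `(row id, distance)` candidate shape are hypothetical names chosen for the example, and the index is simulated by a pre-sorted list.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Assumed candidate shape for this sketch: (row id, distance to the query vector).
Candidate = Tuple[int, float]


def paged_index_scan(
    sorted_candidates: Sequence[Candidate], page_size: int
) -> Iterator[List[Candidate]]:
    """Hypothetical stand-in for a paginated vector index search.

    Lazily yields successive pages of candidates in ascending distance order,
    so a later page is only produced if the caller asks for it.
    """
    for start in range(0, len(sorted_candidates), page_size):
        yield list(sorted_candidates[start:start + page_size])


def filtered_search(
    pages: Iterator[List[Candidate]],
    passes_filter: Callable[[Candidate], bool],
    limit: int,
) -> List[Candidate]:
    """Apply the filter page by page until `limit` rows pass or pages run out."""
    results: List[Candidate] = []
    if limit <= 0:
        return results
    for page in pages:
        for candidate in page:
            if passes_filter(candidate):
                results.append(candidate)
                if len(results) == limit:
                    # Requested limit satisfied: stop fetching further pages.
                    return results
    # Index exhausted before the limit was reached.
    return results


# Simulated index: 20 candidates already ordered by distance.
index = [(row_id, float(row_id)) for row_id in range(20)]

# A filter the index cannot evaluate itself (here: keep even row ids only).
top3 = filtered_search(paged_index_scan(index, page_size=4),
                       lambda c: c[0] % 2 == 0, limit=3)

# A filter that nothing satisfies forces a scan of every page.
none_match = filtered_search(paged_index_scan(index, page_size=4),
                             lambda c: c[0] > 100, limit=3)
```

Because the pages come from a generator, the first call stops after the second page, while the second call walks the whole simulated index and returns an empty list.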

**Upgrade / Downgrade Safety**
Safe, since the PgVector feature has not been released yet.

Original commit: 94ac3c01a41ea70efdfeef24c19e3fe371d025f2 / D43353

Jira: DB-16298

Test Plan: PgVectorIndexTest.Paging/*

Reviewers: arybochkin

Reviewed By: arybochkin

Subscribers: yql, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D43572