Clarification about pre-filtering performance #113

Closed
mertalev opened this issue Nov 1, 2023 · 10 comments

@mertalev

mertalev commented Nov 1, 2023

Hi! I'm looking at both pgvecto.rs and pgvector, and I think pre-filtering support is a killer feature that makes this project appealing.

But the filtering section of the README seems to recommend against using it, suggesting post-filtering instead for better optimization.

Could you clarify what optimization issues there are with pre-filtering? Relatedly, I think it would be helpful to add a performance comparison between pre-filtering and post-filtering.

@VoVAllen
Member

VoVAllen commented Nov 2, 2023

Thanks for your interest! The actual performance of filtering depends on how tight your condition is:

  • If your filter is loose, say 90% of the data satisfies it, performing an ANN search first and then applying the filter to the results yields the fastest outcome with good quality. This is post-filtering.
  • If your filter is very tight, say only 100 rows satisfy it, the optimal approach is to apply the filter first to obtain those 100 rows and then compute the distances directly, without using the vector index at all. This is brute force.
  • If your filter is moderately tight, say 20% of the data satisfies it, post-filtering may run into trouble because the ANN search might not return enough results that pass the filter. The optimal approach here is pre-filtering: as the algorithm traverses the HNSW graph to discover new points, it checks the filter at the same time until there are enough candidates, so every result returned by the vector index already satisfies the filter.
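
For context, the kind of statement all three strategies apply to is an ordinary filtered nearest-neighbor query; here is a minimal sketch, with a made-up table, column, and filter:

```sql
-- Hypothetical schema: items(id, category_id, embedding vector(3)).
-- The WHERE clause is the filter; ORDER BY ... LIMIT is the ANN search.
SELECT id
FROM items
WHERE category_id = 42
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 10;
```

The three strategies differ only in whether the filter is applied after, before, or during the vector search.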

Currently, pgvector's HNSW index does post-filtering. When the filter is not loose enough, recall drops rapidly because the index cannot provide enough results; users need to manually increase the ef_search parameter to make the index return more candidates for filtering in order to get reasonable results.
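
For reference, that tuning in pgvector is a session setting; a rough sketch (the value is made up) looks like:

```sql
-- pgvector workaround: ask the HNSW index for more candidates so that
-- enough of them survive the filter (hnsw.ef_search defaults to 40).
SET hnsw.ef_search = 200;
-- ...then re-run the same filtered ORDER BY ... LIMIT query as above.
```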

We have benchmarked on the laion dataset with a filter; it is the only real-world dataset we could find that has a filtering test. Basically, you don't need to manually tune any parameters to get reasonably good results with filtering. Under the same configuration, pgvector only achieves about 50% precision, while pgvecto.rs achieves 95% precision with higher QPS.

@VoVAllen
Member

VoVAllen commented Nov 2, 2023

Currently we don't have time to carefully benchmark each mode, so we let users select one on their own based on their data. Ideally we could select the best plan for users automatically in the future.

To select a different mode:

  • Pre-filtering (default mode): SET vectors.enable_vector_index=on; SET vectors.enable_prefilter=on;
  • Post-filtering: SET vectors.enable_vector_index=on; SET vectors.enable_prefilter=off;
  • Brute force: SET vectors.enable_vector_index=off;
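
As a usage sketch (the table and filter below are hypothetical; the settings can also be scoped with standard SET LOCAL inside a transaction):

```sql
-- Hypothetical example: force brute force for a very selective filter,
-- then restore the default pre-filtering mode.
SET vectors.enable_vector_index=off;   -- brute force: exact distances, no index
SELECT id
FROM items
WHERE user_id = 123                    -- matches only a handful of rows
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 10;

SET vectors.enable_vector_index=on;    -- back to the vector index
SET vectors.enable_prefilter=on;       -- pre-filtering (the default)
```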

@VoVAllen
Member

VoVAllen commented Nov 2, 2023

We are also implementing a new method that builds a bitmap from the filter condition on an indexed column, so the bitmap can be pushed down into the vector search to improve performance. Please stay tuned!

@mertalev
Author

mertalev commented Nov 3, 2023

Thank you for the detailed explanation (and of course for working on this great project)! I have a better sense of the pros/cons now.

I'm planning on testing pgvecto.rs soon and will let you know how it goes. Besides performance, there are some things in particular I want to make sure get handled, like changing the dimension size and not creating the index when the table is empty (I read that the index crashes in this case).

@VoVAllen
Member

VoVAllen commented Nov 3, 2023

@mertalev We're looking forward to your feedback! Yes, currently creating an index on an empty table doesn't work well, and the dimension size cannot be changed after column creation. Another point worth attention is the capacity parameter: pgvecto.rs currently needs a predefined capacity as the maximum number of vectors. If you want to change it, you can do so by recreating the index, as in #101.
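
A minimal sketch of that recreate-the-index step, assuming pgvecto.rs 0.1.x syntax (the index name, operator class, and options string here are assumptions, not taken from the thread; check the docs for your version):

```sql
-- Hypothetical: rebuild the index with a larger capacity.
-- Operator class and options syntax are assumptions and may differ by version.
DROP INDEX IF EXISTS items_embedding_idx;
CREATE INDEX items_embedding_idx ON items
    USING vectors (embedding l2_ops)
    WITH (options = $$capacity = 1048576$$);
```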

@mertalev
Author

I've been testing pgvecto.rs over the last few days and it's very nice! With the latest release, it now handles empty tables so that makes migration easier.

From comparison with pgvector:

  • Brute-force performance is about 5% higher
  • Basic HNSW queries are about 5% slower
  • Indexing speed is night and day in favor of pgvecto.rs
  • Filtering works very well
    • With pgvector, we have to think about partitioning an existing table, what to do if we need to have a second filter later, etc.
    • pgvecto.rs gives an easy and scalable way to filter without any DBMS complexity

Ease-of-use improvements:

  • Automatically increasing capacity as the index grows
  • Not requiring vectors_load to use the index
  • More documentation
    • Besides capacity (which I made an issue for), I'm also not sure what to expect from quantization. It seems to be about as fast, takes close to the same amount of time to build, and uses the same amount of memory. This is with 200k vectors, so does it only make a difference at a larger scale?

@usamoi
Collaborator

usamoi commented Nov 12, 2023

vectors_load is no longer needed, and the capacity increases automatically in the latest release.

@usamoi
Collaborator

usamoi commented Nov 12, 2023

Quantization uses less memory. By default it uses 1/4 of the memory compared to before.
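
As a rough back-of-the-envelope illustration (the 512 dimensions below are an assumption, not something stated in the thread): 200k float32 vectors at 512 dimensions is about 200,000 × 512 × 4 bytes ≈ 400 MB of raw vector data, so a 1/4 ratio brings that to roughly 100 MB and x16 to roughly 25 MB. At 200k vectors the savings can be masked by the graph structure and other fixed overhead, but they grow linearly with the number of vectors.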

@mertalev
Author

I was using the latest tagged image, but after setting it to pg15-v0.1.6-amd64 you're right on both points. With x16 quantization, latency is more than halved and it uses much less memory.

@VoVAllen
Member

Thanks for your detailed feedback; it's super valuable to us! As usamoi explained, we fixed the capacity problem this week, so users don't need to take care of it any more. We'll also work on preparing the arm64 image and supporting more pg versions.

For quantization, you may also want to check the precision if needed. This can be done by running the same queries in brute-force mode and comparing the results.
