Questions: How do we run this in supabase and drizzle? #109

Open
ShravanSunder opened this issue Oct 30, 2023 · 5 comments
Labels
type/question 🙋 Further information is requested

Comments

@ShravanSunder

I want to explore running pgvecto.rs on Supabase with Drizzle ORM. Do you have any guidance on how to do that?

pgvector has https://github.com/pgvector/pgvector-node, which connects it with Drizzle ORM.

relevant links:

@VoVAllen
Member

Unfortunately, there is no way to install a custom extension on Supabase's Postgres (see supabase/supabase#14235). I would suggest trying pgvecto.rs through Docker first to check it against your requirements, for example:

docker run --name pgvecto-rs-demo -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d tensorchord/pgvecto-rs:latest

I believe we can easily support this in drizzle-orm. The only syntax that differs from pgvector is the index creation command; all other query commands are exactly the same as in pgvector. Let me give it a try, and I will submit a pull request to drizzle soon.
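To make the "only the index creation differs" point concrete, here is a minimal sketch in TypeScript that just builds SQL strings. The helper names, the `items` table, and the `embedding` column are hypothetical. The pgvecto.rs access method name `vectors` and its operator class are assumptions that should be checked against the installed pgvecto.rs version, so the operator class is passed in rather than hard-coded.

```typescript
// Illustrative only: identifiers are interpolated without escaping,
// so this is not safe for untrusted table or column names.
type Backend = "pgvector" | "pgvecto.rs";

function createIndexSql(
  backend: Backend,
  table: string,
  column: string,
  opclass: string, // e.g. "vector_l2_ops" for pgvector; check your pgvecto.rs docs for its name
): string {
  // pgvector names the access method after the algorithm (hnsw here).
  // Assumption: pgvecto.rs exposes a single access method called "vectors".
  const method = backend === "pgvector" ? "hnsw" : "vectors";
  return `CREATE INDEX ON ${table} USING ${method} (${column} ${opclass});`;
}

// The nearest-neighbour query is the same for both backends:
// order by the distance operator and take the first rows.
function nearestNeighborSql(table: string, column: string): string {
  return `SELECT * FROM ${table} ORDER BY ${column} <-> $1 LIMIT $2;`;
}
```

Calling `createIndexSql("pgvector", "items", "embedding", "vector_l2_ops")` and `createIndexSql("pgvecto.rs", "items", "embedding", opclass)` yields statements that differ only in the `USING` clause (and possibly the operator class), while `nearestNeighborSql` is shared.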

@VoVAllen
Member

I would also like to hear about your scenarios (what kind of filter conditions you'd like to use). pgvecto.rs has made significant efforts to support various filter modes (prefilter, postfilter, and brute force; we are also working on bitmap pushdown to use Postgres indexes on other columns) and to optimize performance. If it helps, I can provide further guidance on performance optimization. Thank you!

@ShravanSunder
Author

I would need cross filtering for my use cases. Mostly, my use cases filter by normal SQL columns, with vector similarity as one more condition in the WHERE clause.

Pre/post filtering has a higher probability of returning no results, sparse results, or irrelevant results.

For example, searching for documents that have a given tag and are assigned to a particular team, ranked by similarity (vector column).
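A query of that shape could be sketched as follows. This is only an illustration: the `documents` table and its `id`, `title`, `tag`, `team_id`, and `embedding` columns are hypothetical, and the `{ text, values }` object follows the parameterized query config accepted by node-postgres.

```typescript
interface FilteredSearch {
  tag: string;
  teamId: number;
  embedding: number[];
  limit: number;
}

// Builds a parameterized query: ordinary SQL predicates in WHERE,
// vector distance in ORDER BY. Table and column names are made up.
function buildFilteredSearch(s: FilteredSearch): { text: string; values: unknown[] } {
  // Vectors are passed as a text literal like "[1,2,3]" and cast to vector.
  const vectorLiteral = `[${s.embedding.join(",")}]`;
  return {
    text:
      "SELECT id, title FROM documents " +
      "WHERE tag = $1 AND team_id = $2 " +
      "ORDER BY embedding <-> $3::vector LIMIT $4",
    values: [s.tag, s.teamId, vectorLiteral, s.limit],
  };
}
```

The resulting object can be handed to a Postgres client as-is; whether the planner filters before, during, or after the vector index scan is up to the extension and its settings.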

@VoVAllen
Member

We are also building an example (https://github.com/kemingy/ragen/blob/main/ragen/client.py#L77-L82) similar to your scenario, using vector search with a tag filter, and it has worked well in our example.

For pgvecto.rs, the default prefilter mode ensures that the vector index returns vectors.k results, all of which meet the specified filter condition.

May I ask what the typical selectivity of your filter condition is (what percentage of the data satisfies it)? Also, what is the "cross filtering" method you mentioned?
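The difference between filtering after the top-k search and checking the filter while scanning can be illustrated with a toy brute-force model. This is not how pgvecto.rs is implemented; the `Row` shape, function names, and sample data are all hypothetical, and distances are precomputed to keep the sketch short.

```typescript
interface Row {
  id: number;
  tag: string;
  distance: number; // precomputed distance to the query vector
}

const byDistance = (a: Row, b: Row): number => a.distance - b.distance;

// Post-filter: take the k nearest rows first, then drop rows failing the filter.
// Can return fewer than k rows, or none at all.
function postFilter(rows: Row[], k: number, keep: (r: Row) => boolean): Row[] {
  return [...rows].sort(byDistance).slice(0, k).filter(keep);
}

// Filter during the scan: walk rows in distance order and collect only
// matching rows until k are found (or the data runs out).
function prefilterScan(rows: Row[], k: number, keep: (r: Row) => boolean): Row[] {
  const out: Row[] = [];
  for (const r of [...rows].sort(byDistance)) {
    if (out.length === k) break;
    if (keep(r)) out.push(r);
  }
  return out;
}

const sampleRows: Row[] = [
  { id: 1, tag: "other", distance: 0.1 },
  { id: 2, tag: "other", distance: 0.2 },
  { id: 3, tag: "billing", distance: 0.3 },
  { id: 4, tag: "billing", distance: 0.4 },
];
const isBilling = (r: Row): boolean => r.tag === "billing";
```

With `k = 2`, `postFilter(sampleRows, 2, isBilling)` is empty because the two nearest rows carry the wrong tag, while `prefilterScan(sampleRows, 2, isBilling)` returns rows 3 and 4. The lower the share of rows that satisfy the filter, the more the two strategies diverge, which is why the selectivity question matters.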

@cutecutecat
Member

cutecutecat commented Nov 1, 2023

For security reasons, Supabase only supports Trusted Language Extensions (TLE) as custom extensions; see supabase/supabase#14600 (comment).

Many extensions provide functions whose implementation is written in C, and creating them in a database means that the compiled C code is “dynamically linked” into your running Postgres process. These dynamically-loaded libraries can now access every aspect of your running database process, right down to raw memory. They are essentially database superusers on steroids. Because of this, C is an “untrusted language” and installing extensions written in C requires filesystem access.

There is a Rust TLE implementation: https://github.com/tcdi/plrust
However, pgvecto.rs is hard to convert to a TLE because it uses IPC/mmap, which is absolutely forbidden.

4 participants