Vectors & Cosine Similarity #207
Hi @mickdelaney, we've been talking about doing something similar as part of #130. Could you expand on how you're doing this with ES currently? Happy to jump on a quick call to discuss this if that's quicker.
Well, actually we've moved off ES due to issues with maintaining our plugin. We've also used Postgres for many years in the past to achieve the same goal of query + vector math. Basically, you have some model that produces vectors (it doesn't really matter how; it could be proprietary, or vanilla word2vec, etc.), and you need to index those vectors and also use the same model to create a search vector from the query. Then you end up with vector-to-vector operations, for example cosine similarity. Does that explain it?
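A minimal sketch of the pipeline described above, assuming a toy stand-in for the model: the hypothetical `embed` function just counts words from a fixed five-word vocabulary, where a real deployment would call word2vec or a proprietary encoder. The point it illustrates is that documents (at index time) and the query (at search time) go through the same `embed` function before cosine similarity ranks them.

```python
import math

# Hypothetical vocabulary for the toy embedding below; a real model learns its own dimensions.
VOCAB = ["vector", "search", "keyword", "filter", "region"]


def embed(text: str) -> list[float]:
    """Toy stand-in for a real model (e.g. word2vec): one count per vocabulary word."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Brute-force scan: score every indexed vector and keep the k most similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Index time: encode every document with the model.
DOCS = {
    "a": "vector search with cosine",
    "b": "keyword filter by region",
    "c": "vector vector filter",
}
INDEX = {doc_id: embed(text) for doc_id, text in DOCS.items()}

# Query time: encode the query with the same model, then rank by cosine similarity.
results = top_k(embed("vector search"), INDEX, k=2)
print(results)  # "a" first, then "c"; "b" shares no vocabulary words with the query
```

The linear scan is only for illustration; at scale, engines typically use an approximate nearest-neighbour index, but the similarity measure is the same.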
@mickdelaney Thank you for the details. I've already done some work on this front, and your suggestion is something we're interested in pursuing. I have one further question:
It is straightforward to fetch the nearest K results for a given search query vector, but that would just be a search in the vector space. Are you also looking to mix these vector search results with keyword-based search results? One approach I've seen used is to treat the vector dimensions as weights (like weighted TF-IDF) so you get both keyword matching and semantic relevancy.
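The weighted TF-IDF approach mentioned above folds the vector into the keyword scoring itself. A simpler and different technique, sketched here only for comparison, is linear score fusion: run the keyword and vector searches separately, min-max normalize each score list to [0, 1], and blend them with a weight `alpha`. The function names, the `alpha` parameter, and the example scores are all hypothetical.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so keyword and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(keyword_scores: dict[str, float], vector_scores: dict[str, float],
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend alpha * keyword + (1 - alpha) * vector; a doc missing from one list scores 0 there."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    blended = {
        doc_id: alpha * kw.get(doc_id, 0.0) + (1 - alpha) * vec.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    # Highest blended score first; ties broken by document id for a stable order.
    return sorted(blended.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical scores: keyword relevance (BM25-like) and cosine similarity from a vector search.
keyword_scores = {"a": 12.0, "b": 3.0, "c": 7.5}
vector_scores = {"a": 0.2, "b": 0.9, "d": 0.8}
print(hybrid_rank(keyword_scores, vector_scores, alpha=0.7))
```

Min-max normalization is sensitive to outliers in either list, and `alpha` would need tuning per dataset.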
Yeah, that sounds interesting. I think if you can support that plus top N and query filters, you're good. The issue is that you often want to reduce the vector space as much as possible, e.g. filter by region, then by property X or Y, then use the vector space.
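A minimal sketch of the "filter first, then use the vector space" flow, assuming a hypothetical `Doc` record with a `region` attribute and vectors that are unit-normalized at index time, so that a plain dot product equals cosine similarity.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Doc:
    doc_id: str
    region: str
    vector: list[float]  # assumed unit-normalized at index time


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def filtered_top_n(docs: list[Doc], query_vec: list[float], n: int,
                   predicate: Callable[[Doc], bool]) -> list[str]:
    """Apply attribute filters first, then rank only the surviving docs by cosine similarity."""
    q = normalize(query_vec)
    candidates = [d for d in docs if predicate(d)]
    # With unit-length vectors, the dot product equals cosine similarity.
    scored = [(sum(x * y for x, y in zip(q, d.vector)), d.doc_id) for d in candidates]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:n]]


DOCS = [
    Doc("a", "eu", normalize([1.0, 0.0])),
    Doc("b", "us", normalize([1.0, 0.1])),
    Doc("c", "eu", normalize([0.6, 0.8])),
    Doc("d", "eu", normalize([0.0, 1.0])),
]

# Only EU docs are scored; "b" would rank second overall but is filtered out.
print(filtered_top_n(DOCS, [1.0, 0.0], n=2, predicate=lambda d: d.region == "eu"))
```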
As I was asked to add my use case here: it would be useful to be able to filter/sort results based on the Euclidean distance between vectors.
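For the Euclidean variant, a sketch using the standard library's `math.dist` (Python 3.8+): keep only vectors within a hypothetical `max_dist` radius of the query, nearest first.

```python
import math


def within_radius(query: list[float], points: dict[str, list[float]],
                  max_dist: float) -> list[tuple[str, float]]:
    """Filter to points within max_dist of query and sort them by Euclidean distance."""
    hits = sorted((math.dist(query, vec), pid) for pid, vec in points.items())
    return [(pid, dist) for dist, pid in hits if dist <= max_dist]


POINTS = {"p1": [0.0, 0.0], "p2": [3.0, 4.0], "p3": [1.0, 1.0], "p4": [10.0, 0.0]}

# p4 (distance 10) is dropped; the remaining points come back nearest first.
print(within_radius([0.0, 0.0], POINTS, max_dist=5.0))
```

Unlike cosine similarity, a smaller Euclidean distance means a closer match, so the sort is ascending.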
@kishorenc any updates on this? I was trying to find my thread on this topic. I think it was on Slack and is now beyond the retention period. IIRC my thought was that it would be a great start if Typesense exposed the index and vector data via APIs. I feel like there's an opportunity for an extension system that enables different engines to be plugged in for semantic-search-type use cases.
Found the Slack thread which explains my use case. This other thread is also highly related. FYI I'm not an expert by any means on this topic. :)
@janaka This is still something that we wish to tackle some time, but it's not on our immediate roadmap.
We've now added support for vector search in [...]. Here are instructions on how to use the feature: https://gist.github.com/kishorenc/f008c3a60ee58cb084b0c33c0dbce148
See practical usage example here: #130 (comment)
v0.24 is now available with this feature: https://typesense.org/docs/0.24.0/api/vector-search.html
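A sketch of the request payloads for the v0.24 feature, based on our reading of the linked vector-search docs; field names should be checked against those docs before relying on them. The collection name `docs`, the fields `title`, `region` and `vec`, and both helper functions are hypothetical. The code only builds the JSON bodies (a collection schema with a `float[]` field that declares `num_dim`, and a `/multi_search` body carrying a `vector_query` parameter); it does not contact a server.

```python
import json


def vector_field(name: str, num_dim: int) -> dict:
    """Schema entry for a float-array field with a fixed number of dimensions."""
    return {"name": name, "type": "float[]", "num_dim": num_dim}


def vector_query(field: str, vec: list[float], k: int) -> str:
    """Format the vector_query parameter, e.g. 'vec:([0.1, 0.2], k:10)'."""
    values = ", ".join(repr(float(x)) for x in vec)
    return f"{field}:([{values}], k:{k})"


# Body for creating the (hypothetical) collection.
schema = {
    "name": "docs",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "region", "type": "string"},
        vector_field("vec", 4),
    ],
}

# Long query vectors go in the POST body of /multi_search rather than a GET URL.
multi_search_body = {
    "searches": [
        {
            "collection": "docs",
            "q": "*",
            "filter_by": "region:eu",  # assumed to combine with vector_query; confirm in the docs
            "vector_query": vector_query("vec", [0.96826, 0.94, 0.39557, 0.306488], k=10),
        }
    ]
}

print(json.dumps(schema, indent=2))
print(json.dumps(multi_search_body, indent=2))
```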
Hi,
This is a feature request/roadmap question.
Maybe this is the wrong place?
I was wondering if any thought has gone into supporting a numeric vector field with cosine similarity indexing?
Modern NLP leverages these vectors as inputs and outputs (e.g. Word2Vec), and a common deployment story is to encode the text query as a vector, encode the documents at index time, and then compute the cosine similarity between the document and query vectors.
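For reference, the standard definition of cosine similarity between a query vector q and a document vector d, both with n dimensions, is shown below with a small worked example. The last line shows that for unit-length vectors it is tied directly to Euclidean distance, so ranking by descending cosine gives the same order as ranking by ascending Euclidean distance.

```latex
\cos(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} d_i^2}}

\text{e.g. } \mathbf{q} = (1, 1, 0),\ \mathbf{d} = (2, 0, 0):\quad
  \cos(\mathbf{q}, \mathbf{d}) = \frac{2}{\sqrt{2} \cdot 2} = \frac{1}{\sqrt{2}} \approx 0.707

\lVert \mathbf{q} \rVert = \lVert \mathbf{d} \rVert = 1 \implies
  \lVert \mathbf{q} - \mathbf{d} \rVert^2 = 2 - 2\cos(\mathbf{q}, \mathbf{d})
```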
We've done this in Elasticsearch, and more recently in the vector database Milvus, but having a combined search and vector index allows you to combine NLP/machine learning and information retrieval techniques.
Anyways, just an idea.
Regards