fix(opensearch): Return Distance and respect the provided ScoreThreshold#112
HavenDV merged 1 commit into tryAGI:main from dandeto:main
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/OpenSearch/src/OpenSearchVectorCollection.cs (1)
121-127: Clarify the distinction between score and distance.
You are assigning hit.Score to Distance, but in most vector search paradigms a higher score implies higher similarity, which is the opposite of a distance under many metrics. If your goal is to track similarity rather than an actual distance, consider renaming the property for clarity. Otherwise, ensure this matches your end users' expectations of "distance."
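To illustrate why a score and a distance can point in opposite directions, here is a minimal sketch that inverts a score of the form 1 / (1 + d). This formula is an assumption: OpenSearch documents a score of this shape for the l2 space type in approximate k-NN, but other space types and engines use different formulas, so check the k-NN documentation for the index in question. The class and method names are hypothetical and are not part of the PR.

```csharp
using System;

// Hypothetical helper, not part of the PR or of any OpenSearch client library.
public static class ScoreConversionSketch
{
    // Assumes score = 1 / (1 + d), where d is the raw distance.
    // Under that assumption a higher score means a smaller distance.
    public static double ScoreToDistance(double score)
    {
        if (score <= 0.0 || score > 1.0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(score), "Expected a score in the range (0, 1].");
        }

        return 1.0 / score - 1.0;
    }
}
```

Under this assumption a score of 1.0 corresponds to a distance of 0, and the distance grows as the score falls, which is the inversion the comment above warns about.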
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/OpenSearch/src/OpenSearchVectorCollection.cs(2 hunks)
🔇 Additional comments (1)
src/OpenSearch/src/OpenSearchVectorCollection.cs (1)
101-101: Check the default ScoreThreshold value.
Defaulting ScoreThreshold to 0.0f allows all hits to pass the threshold. Consider whether this aligns with your desired logic. If you intend to exclude marginally relevant hits, setting a slightly higher default might be beneficial.
Could you confirm that 0.0f is the intended default, or would you like a different default (e.g., 0.5f)?
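To make the effect of the default concrete, here is a minimal sketch of the threshold filter, mirroring the shape of the Where clause in the PR. SketchHit and ThresholdSketch are hypothetical stand-ins for illustration, not the OpenSearch client's types or the library's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search hit; not the OpenSearch client's type.
public sealed record SketchHit(string Text, float Score);

public static class ThresholdSketch
{
    // Keeps hits that have non-blank text and a score at or above the threshold.
    public static List<SketchHit> Filter(IEnumerable<SketchHit> hits, float scoreThreshold) =>
        hits.Where(hit => !string.IsNullOrWhiteSpace(hit.Text) && hit.Score >= scoreThreshold)
            .ToList();
}
```

Given hits with non-blank text and scores of 0.9, 0.4, and 0.1, a threshold of 0.0f keeps all three, while 0.5f keeps only the first.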
Items = response.Hits
    .Where(hit => !string.IsNullOrWhiteSpace(hit.Source.Text) && hit.Score >= settings.ScoreThreshold)
Handle empty or null embeddings edge case.
The code calls request.Embeddings.First(), but there's no check to ensure that request.Embeddings is not empty. If request.Embeddings is empty, this could lead to a runtime exception. You might want to add a guard clause or throw an argument exception if no embeddings are provided.
+ if (request.Embeddings == null || !request.Embeddings.Any())
+ {
+ throw new ArgumentException("At least one embedding must be provided.", nameof(request.Embeddings));
+ }
Committable suggestion skipped: line range outside the PR's diff.
I don't see the "Ready for review" template mentioned in contributing.md, but this is a simple PR, so it shouldn't require too much explanation. The only modifications were to SearchAsync in OpenSearchVectorCollection.
The Distance property is filled out in the Postgres and MongoDB abstractions, but it always returns 0 when using OpenSearch. I have a use case for wanting to know the distance. Rather than using OpenSearch's convenient "Documents" API, I switched to the "Hits" API, which provides some additional information, such as the score for each hit.
Another feature that was not implemented was using the ScoreThreshold optionally passed in by the user to filter out hits that are below the desired threshold. I added a second condition to the LINQ Where clause to implement this.
I have an application using LangChain that I used to test this function before and after my adjustments, and I can verify that the Distance is passed back and the ScoreThreshold is respected.
Summary by CodeRabbit
Bug Fixes
Performance
Distance property to capture hit scores more effectively