Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: OpenSearch. added image search by text and image similarity. added integration tests #197

Merged
merged 21 commits into from
Apr 8, 2024

Conversation

curlyfro
Copy link
Contributor

@curlyfro curlyfro commented Apr 8, 2024

Summary by CodeRabbit

  • New Features

    • Introduced an advanced method for adding images to the database, including image processing and vector indexing.
    • Added dynamic dimension setting for vector indexing based on configuration.
    • Implemented a new extension method for converting file paths to binary data, enhancing file handling capabilities.
  • Enhancements

    • Improved embedding model functionality, including dynamic input size calculation and better settings management.
    • Updated the AWS provider configuration process, making it easier to set up AWS services.
  • Refactor

    • Streamlined the request creation process in Amazon Bedrock by renaming classes and methods for clarity.
    • Removed redundant properties from embedding models to optimize performance.
  • Documentation

    • Added guidance on configuring AWS access and setting up the Amazon Bedrock provider.
  • Dependencies

    • Introduced new package dependencies to enhance functionality and support new features.

Copy link
Contributor

sweep-ai bot commented Apr 8, 2024

Apply Sweep Rules to your PR?

  • Apply: All new business logic should have corresponding unit tests.
  • Apply: Refactor large functions to be more modular.
  • Apply: Add docstrings to all functions and file headers.

This is an automated message generated by Sweep AI.

Copy link
Contributor

coderabbitai bot commented Apr 8, 2024

Warning

Rate Limit Exceeded

@HavenDV has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 22 minutes and 29 seconds before requesting another review.

How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.
Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.
Please see our FAQ for further information.

Commits Files that changed from the base of the PR and between 2f59b28 and 3d7b161.

Walkthrough

The recent updates enhance OpenSearch with image and document indexing and querying capabilities. A new utility simplifies converting file paths to binary data, boosting content management efficiency and enabling versatile search operations for textual and visual data.

Changes

File Path Changes
src/Databases/OpenSearch/test/OpenSearchTests.cs Added image/document testing support, setup methods, and new test cases for indexing and querying.
src/Sources/Abstractions/src/EnumerableExtensions.cs
src/Sources/Abstractions/src/LangChain.Sources.Abstractions.csproj
Introduced ToBinaryData extension method and added System.Memory.Data package for file to binary data conversion.
src/Providers/Abstractions/src/Embedding/IEmbeddingModel.cs Removed MaximumInputLength property from IEmbeddingModel in LangChain.Providers.

🐰✨
In the fields of data, where queries run deep,
A rabbit hops in, no secrets to keep.
With images, texts, in binary streams,
It crafts and it builds, where innovation gleams.
🌟 To search and to find, in the vast data night,
Guided by code, in the soft glow of light.
📚✨🖼️

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between ddc5cf7 and f3257bb.
Files selected for processing (19)
  • src/Databases/OpenSearch/src/OpenSearchVectorStore.cs (2 hunks)
  • src/Databases/OpenSearch/src/OpenSearchVectorStoreOptions.cs (1 hunks)
  • src/Databases/OpenSearch/test/OpenSearchTests.cs (5 hunks)
  • src/Directory.Packages.props (1 hunks)
  • src/Providers/Abstractions/src/Embedding/IEmbeddingModel.cs (1 hunks)
  • src/Providers/Amazon.Bedrock/src/BedrockModelRequest.cs (1 hunks)
  • src/Providers/Amazon.Bedrock/src/Chat/AmazonTitanChatModel.cs (1 hunks)
  • src/Providers/Amazon.Bedrock/src/Chat/AnthropicClaude3ChatModel.cs (1 hunks)
  • src/Providers/Amazon.Bedrock/src/Chat/AnthropicClaudeChatModel.cs (1 hunks)
  • src/Providers/Amazon.Bedrock/src/Chat/CohereCommandChatModel.cs (1 hunks)
  • src/Providers/Amazon.Bedrock/src/Chat/MetaLlama2ChatModel.cs (1 hunks)
  • src/Providers/Amazon.Bedrock/src/Chat/MistralInstructChatModel.cs (1 hunks)
  • src/Providers/Amazon.Bedrock/src/Embedding/AmazonTitanEmbeddingModel.cs (3 hunks)
  • src/Providers/Amazon.Bedrock/src/Embedding/AmazonTitanImageEmbeddingModel.cs (3 hunks)
  • src/Providers/Amazon.Bedrock/src/Embedding/BedrockEmbeddingSettings.cs (1 hunks)
  • src/Providers/Amazon.Bedrock/test/BedrockTests.cs (3 hunks)
  • src/Sources/Abstractions/src/EnumerableExtensions.cs (2 hunks)
  • src/Sources/Abstractions/src/LangChain.Sources.Abstractions.csproj (1 hunks)
  • usingBedrock.md (1 hunks)
Additional comments not posted (20)
src/Databases/OpenSearch/src/OpenSearchVectorStoreOptions.cs (1)

9-9: Consider adding documentation for the Dimensions property to explain its purpose and usage.

src/Providers/Abstractions/src/Embedding/IEmbeddingModel.cs (1)

6-11: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1-13]

Ensure all implementations of IEmbeddingModel are updated to reflect the removal of MaximumInputLength.

src/Providers/Amazon.Bedrock/src/BedrockModelRequest.cs (1)

7-7: Ensure all references to BedrockModelStreamRequest and its Create method are updated to BedrockModelRequest and CreateStreamRequest, respectively.

Verification successful

The search did not find any occurrences of 'BedrockModelStreamRequest', indicating that references to the old class name have likely been updated. The results for 'Create(' appear to be unrelated to the specific method renaming in question, suggesting that the search was too broad. Based on this, it seems the specific concern of updating references to 'BedrockModelStreamRequest' and its 'Create' method has been addressed.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for old class and method names to ensure they've been updated.
rg 'BedrockModelStreamRequest' || rg 'Create('

Length of output: 110



Script:

#!/bin/bash
# Corrected search for old class and method names to ensure they've been updated.
rg 'BedrockModelStreamRequest' || rg 'Create\('

Length of output: 1261

src/Sources/Abstractions/src/LangChain.Sources.Abstractions.csproj (1)

18-19: Package references added. Ensure their functionalities are utilized appropriately in the codebase.

Verification successful

The search results confirm the usage of Microsoft.AspNetCore.StaticFiles within the codebase, specifically in the file EnumerableExtensions.cs. However, there is no output related to System.Memory.Data, indicating that its usage within the codebase might not be evident from the search results provided. Given this information, it's clear that Microsoft.AspNetCore.StaticFiles is indeed utilized, but further investigation is required to ascertain the usage of System.Memory.Data.

To proceed, I will generate a script to search for any potential indirect references or usage patterns related to System.Memory.Data that might not be captured by a direct string search. This could include looking for specific classes or methods from the package that might be used in the code.


The search results clearly demonstrate the usage of System.Memory.Data through the BinaryData class across various parts of the codebase, including tests, examples, provider abstractions, and core functionalities. This confirms that both Microsoft.AspNetCore.StaticFiles and System.Memory.Data are indeed utilized within the codebase, aligning with the purpose of their inclusion as indicated by the package references.

Given this evidence, the addition of these package references is justified by their active use in the project. Therefore, the initial review comment is supported by the findings from the executed scripts.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify the usage of newly added packages in the codebase.
rg 'Microsoft.AspNetCore.StaticFiles' || rg 'System.Memory.Data'

Length of output: 378



Script:

#!/bin/bash
# Search for indirect usage patterns of System.Memory.Data, such as BinaryData
rg 'BinaryData'

Length of output: 1986

src/Sources/Abstractions/src/EnumerableExtensions.cs (1)

32-47: Consider adding a file size check before reading files in ToBinaryData to prevent memory issues with large files.

src/Providers/Amazon.Bedrock/src/Embedding/AmazonTitanEmbeddingModel.cs (1)

18-42: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [22-59]

Ensure the calculated settings in CreateEmbeddingsAsync are validated to prevent invalid configurations.

src/Providers/Amazon.Bedrock/src/Embedding/BedrockEmbeddingSettings.cs (1)

4-56: Consider providing more specific error messages in the Calculate method of BedrockEmbeddingSettings for better user experience.

src/Providers/Amazon.Bedrock/src/Embedding/AmazonTitanImageEmbeddingModel.cs (1)

58-73: Document the behavior of processing only the first image in the CreateBodyJson method for clarity.

src/Providers/Amazon.Bedrock/src/Chat/MetaLlama2ChatModel.cs (1)

38-38: Updated method call from BedrockModelStreamRequest.Create to BedrockModelRequest.CreateStreamRequest aligns with PR objectives.

src/Providers/Amazon.Bedrock/src/Chat/CohereCommandChatModel.cs (1)

38-38: Updated method call from BedrockModelStreamRequest.Create to BedrockModelRequest.CreateStreamRequest aligns with PR objectives.

src/Providers/Amazon.Bedrock/src/Chat/MistralInstructChatModel.cs (1)

38-38: Updated method call from BedrockModelStreamRequest.Create to BedrockModelRequest.CreateStreamRequest aligns with PR objectives.

src/Providers/Amazon.Bedrock/src/Chat/AnthropicClaudeChatModel.cs (1)

38-39: Updated method call from BedrockModelStreamRequest.Create to BedrockModelRequest.CreateStreamRequest and the return type change align with PR objectives.

src/Providers/Amazon.Bedrock/src/Chat/AmazonTitanChatModel.cs (1)

38-38: Updated method call from BedrockModelStreamRequest.Create to BedrockModelRequest.CreateStreamRequest aligns with PR objectives.

src/Providers/Amazon.Bedrock/src/Chat/AnthropicClaude3ChatModel.cs (1)

38-38: Updated method call from BedrockModelStreamRequest.Create to BedrockModelRequest.CreateStreamRequest aligns with PR objectives.

src/Directory.Packages.props (1)

33-33: Added package reference to "Microsoft.AspNetCore.StaticFiles" version "2.2.0" aligns with PR objectives.

src/Databases/OpenSearch/src/OpenSearchVectorStore.cs (1)

196-196: LGTM! Dynamic setting of KNN vector dimension in CreateIndex.

The dynamic setting of the KNN vector dimension based on options.Dimensions is correctly implemented and aligns with the PR objectives.

src/Databases/OpenSearch/test/OpenSearchTests.cs (2)

146-175: Verify consistency in test setup and teardown.

Ensure that the setup_document_tests method and other test setup methods maintain consistency in how test environments are configured and cleaned up. This includes verifying that indices are correctly created and deleted to prevent test environment pollution.


57-139: Enhance test coverage for image functionalities.

Consider enhancing the test coverage for image functionalities to include more diverse scenarios, such as handling invalid image formats or empty responses. Additionally, ensure that exceptions are properly handled and logged to facilitate debugging.

src/Providers/Amazon.Bedrock/test/BedrockTests.cs (2)

120-121: Ensure proper integration of new models.

Verify that the new models, such as Claude3HaikuModel and TitanEmbedImageV1Model, are properly integrated and tested within the BedrockTests.cs file. This includes ensuring that the models are correctly instantiated and that their functionalities are fully utilized in the tests.


248-255: Validate the ClaudeImageToText test functionality.

Ensure that the ClaudeImageToText test accurately reflects the capabilities of the Claude3HaikuModel and that the modified message content is appropriate for the test scenario. This includes verifying that the test covers relevant scenarios and properly handles exceptions.

src/Databases/OpenSearch/test/OpenSearchTests.cs Outdated Show resolved Hide resolved
usingBedrock.md Outdated Show resolved Hide resolved
usingBedrock.md Outdated Show resolved Hide resolved
usingBedrock.md Outdated Show resolved Hide resolved
HavenDV and others added 3 commits April 8, 2024 16:45
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between f3257bb and f262d95.
Files selected for processing (1)
  • usingBedrock.md (1 hunks)
Additional comments not posted (6)
usingBedrock.md (6)

2-2: Correct the spelling of "serveral" to "several".


2-2: Change "way" to "ways" to correct the grammatical number.


2-2: Consider adding a comma after "ways" for better readability.


2-2: Use "an" instead of "a" before "AWS Profile".


53-53: Replace "you're" with "your" to correct the possessive form.


61-61: Use "an" instead of "a" before "AWS Access Key ID".

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between f262d95 and 7220c64.
Files selected for processing (1)
  • usingBedrock.md (1 hunks)
Additional comments not posted (1)
usingBedrock.md (1)

68-69: Consider adding content or instructions on how to use Amazon Bedrock LLMs, or clarify if this section is a placeholder for future updates.

usingBedrock.md Outdated Show resolved Hide resolved
usingBedrock.md Show resolved Hide resolved
usingBedrock.md Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 7220c64 and 1e65fcb.
Files selected for processing (1)
  • usingBedrock.md (1 hunks)

usingBedrock.md Outdated Show resolved Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 1e65fcb and 5125285.
Files selected for processing (1)
  • usingBedrock.md (1 hunks)

usingBedrock.md Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 5125285 and d9f1c72.
Files selected for processing (3)
  • src/Databases/OpenSearch/test/OpenSearchTests.cs (5 hunks)
  • src/Sources/Abstractions/src/EnumerableExtensions.cs (2 hunks)
  • src/Sources/Abstractions/src/LangChain.Sources.Abstractions.csproj (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • src/Sources/Abstractions/src/EnumerableExtensions.cs
  • src/Sources/Abstractions/src/LangChain.Sources.Abstractions.csproj
Additional comments not posted (1)
src/Databases/OpenSearch/test/OpenSearchTests.cs (1)

118-141: Consider externalizing the hardcoded prompt text for better maintainability and flexibility. Also, ensure that the chain of operations is correctly handling errors and edge cases.

src/Databases/OpenSearch/test/OpenSearchTests.cs Outdated Show resolved Hide resolved
src/Databases/OpenSearch/test/OpenSearchTests.cs Outdated Show resolved Hide resolved
src/Databases/OpenSearch/test/OpenSearchTests.cs Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between d9f1c72 and 66e463b.
Files selected for processing (1)
  • src/Databases/OpenSearch/test/OpenSearchTests.cs (5 hunks)
Additional comments not posted (2)
src/Databases/OpenSearch/test/OpenSearchTests.cs (2)

2-2: Consider organizing imports to maintain code cleanliness.


22-22: Introduction of _embeddings field enhances modularity by allowing reuse across tests.

@HavenDV HavenDV enabled auto-merge (squash) April 8, 2024 21:52
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 66e463b and 8f37069.
Files selected for processing (1)
  • src/Providers/Abstractions/src/Embedding/IEmbeddingModel.cs (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/Providers/Abstractions/src/Embedding/IEmbeddingModel.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 8f37069 and 36f6b8f.
Files selected for processing (1)
  • src/Providers/Abstractions/src/Embedding/IEmbeddingModel.cs (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/Providers/Abstractions/src/Embedding/IEmbeddingModel.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review Status

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 36f6b8f and 2f59b28.
Files selected for processing (1)
  • src/Providers/Abstractions/src/Embedding/IEmbeddingModel.cs (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/Providers/Abstractions/src/Embedding/IEmbeddingModel.cs

@HavenDV HavenDV merged commit d3d563e into tryAGI:main Apr 8, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants