
Feature/chromadb embedding functions #6267 #6648

tejas-dharani
Contributor

Why are these changes needed?

This PR adds support for configurable embedding functions in ChromaDBVectorMemory, letting users customize how embeddings are generated for vector similarity search. Currently, ChromaDB memory is limited to the default embedding function, which restricts flexibility for use cases that require a specific embedding model or custom embedding logic.

The implementation allows users to:

  • Use different SentenceTransformer models for domain-specific embeddings
  • Integrate with OpenAI's embedding API for consistent embedding generation
  • Define custom embedding functions for specialized requirements
  • Maintain backward compatibility with existing default behavior
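Taken together, these options amount to a configuration object that selects an embedding backend, where "no configuration" means "use the default". The following is a minimal sketch of that dispatch pattern under stated assumptions. Every class and function name here (`DefaultConfig`, `SentenceTransformerConfig`, `OpenAIConfig`, `CustomConfig`, `MemoryConfig`, `resolve_embedding`, `make_length_embedder`) is hypothetical and is not the autogen_ext API. The model-name defaults are also assumptions. Only the custom branch builds a real callable, because the other branches would need sentence-transformers installed or an OpenAI key.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Literal, Optional, Union

# Batch call shape: a list of documents in, one vector per document out.
EmbedFn = Callable[[List[str]], List[List[float]]]


@dataclass
class DefaultConfig:
    function_type: Literal["default"] = "default"


@dataclass
class SentenceTransformerConfig:
    model_name: str = "all-MiniLM-L6-v2"  # assumed default model name
    function_type: Literal["sentence_transformer"] = "sentence_transformer"


@dataclass
class OpenAIConfig:
    model_name: str = "text-embedding-3-small"  # assumed default model name
    api_key: str = ""
    function_type: Literal["openai"] = "openai"


@dataclass
class CustomConfig:
    function: Callable[..., EmbedFn]  # factory that builds the embedder
    params: Dict[str, Any] = field(default_factory=dict)
    function_type: Literal["custom"] = "custom"


EmbeddingConfig = Union[DefaultConfig, SentenceTransformerConfig, OpenAIConfig, CustomConfig]


@dataclass
class MemoryConfig:
    collection_name: str = "memory_store"
    # None preserves the pre-existing behaviour (the store's default embedder).
    embedding_function_config: Optional[EmbeddingConfig] = None


@dataclass
class ResolvedEmbedding:
    backend: str
    # Only the custom branch builds a callable in this sketch; a real store
    # would construct a provider-specific embedder for the other branches.
    fn: Optional[EmbedFn] = None


def resolve_embedding(config: MemoryConfig) -> ResolvedEmbedding:
    cfg = config.embedding_function_config
    if cfg is None or isinstance(cfg, DefaultConfig):
        return ResolvedEmbedding("default")
    if isinstance(cfg, SentenceTransformerConfig):
        return ResolvedEmbedding(f"sentence_transformer:{cfg.model_name}")
    if isinstance(cfg, OpenAIConfig):
        return ResolvedEmbedding(f"openai:{cfg.model_name}")
    if isinstance(cfg, CustomConfig):
        # Call the user-supplied factory with its params to obtain the embedder.
        return ResolvedEmbedding("custom", cfg.function(**cfg.params))
    raise TypeError(f"unsupported embedding config: {type(cfg).__name__}")


def make_length_embedder(scale: float) -> EmbedFn:
    # Hypothetical toy factory: embeds each document as [len(doc) * scale].
    return lambda docs: [[len(doc) * scale] for doc in docs]


legacy = resolve_embedding(MemoryConfig())
custom = resolve_embedding(
    MemoryConfig(embedding_function_config=CustomConfig(make_length_embedder, {"scale": 0.5}))
)
```

Leaving `embedding_function_config` as `None` maps to the default backend, which is one way to meet the backward-compatibility requirement. In this sketch the custom variant takes a factory plus keyword parameters rather than a ready-made function. One plausible benefit of that choice is that the embedder can be rebuilt from a stored configuration.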

Related issue number

Closes #6267

Checks

…on guide
- Fix version format from 0.4.0-dev-1 to 0.4.0-dev.1 for all packages
- Remove reference to non-existent Microsoft.AutoGen.Extensions package
- Add correct extension packages: Aspire, MEAI, and SemanticKernel
- Fix typo: RuntimeGatewway -> RuntimeGateway
- Improve documentation structure with clear section headers

Fixes microsoft#6244

Fix issue microsoft#6277 where TextMessage was used but not imported in three code cells of the custom agents documentation, causing a NameError when users ran the examples.

Changes:
- Add TextMessage to imports in ArithmeticAgent section
- Add TextMessage to imports in GeminiAssistantAgent section
- Add TextMessage to imports in Declarative GeminiAssistantAgent section

The CountDownAgent section already had the correct import.

Fixes microsoft#6277
…osoft#6267)

- Add embedding function configuration classes (Default, SentenceTransformer, OpenAI, Custom)
- Extend ChromaDBVectorMemoryConfig with embedding_function_config field
- Update collection initialization to use custom embedding functions
- Add comprehensive tests and demo examples
- Maintain backward compatibility with existing code

Resolves microsoft#6267
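The commit lists a Custom option alongside the built-in embedding configurations. As a hedged illustration of what "custom embedding logic" can look like, here is a stdlib-only embedder that follows the batch call shape ChromaDB embedding functions use: a list of documents in, one vector per document out. Feature hashing is a stand-in technique chosen for illustration, not something this PR uses. `HashingEmbeddingFunction`, `cosine`, and `top_k` are hypothetical names. An embedder handed to ChromaDBVectorMemory would have to satisfy whatever interface the merged code actually expects.

```python
import hashlib
import math
import re
from typing import List, Sequence


class HashingEmbeddingFunction:
    """Toy custom embedder: bag-of-words feature hashing, L2-normalised."""

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim

    def _embed_one(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # hashlib is deterministic across processes, unlike the built-in hash().
            digest = hashlib.md5(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "little") % self.dim
            sign = 1.0 if digest[4] % 2 == 0 else -1.0
            vec[index] += sign
        norm = math.sqrt(sum(v * v for v in vec))
        # Text with no tokens stays the zero vector.
        return [v / norm for v in vec] if norm else vec

    # Parameter named `input` to mirror ChromaDB's embedding-function call shape.
    def __call__(self, input: List[str]) -> List[List[float]]:
        return [self._embed_one(doc) for doc in input]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Vectors are unit length (or zero, which scores 0), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def top_k(query: str, docs: List[str], embed: HashingEmbeddingFunction, k: int = 1) -> List[str]:
    doc_vecs = embed(docs)
    query_vec = embed([query])[0]
    ranked = sorted(zip(docs, doc_vecs), key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```

For example, `top_k("paris weather", docs, HashingEmbeddingFunction())` ranks documents by hashed-token overlap. Unlike a SentenceTransformer or OpenAI model, this embedder captures no synonyms or semantics, which is why a configurable embedding backend matters for real use.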
@victordibia victordibia self-assigned this Jun 9, 2025

codecov bot commented Jun 10, 2025

Codecov Report

Attention: Patch coverage is 86.36364% with 9 lines in your changes missing coverage. Please review.

Project coverage is 79.67%. Comparing base (150ea01) to head (49de936).
Report is 1 commit behind head on main.

Files with missing lines                                  Patch %   Lines
...n-ext/src/autogen_ext/memory/chromadb/_chromadb.py     66.66%    9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6648      +/-   ##
==========================================
+ Coverage   79.65%   79.67%   +0.01%     
==========================================
  Files         229      231       +2     
  Lines       17126    17172      +46     
==========================================
+ Hits        13642    13682      +40     
- Misses       3484     3490       +6     
Flag        Coverage Δ
unittests   79.67% <86.36%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

tejas-dharani and others added 6 commits June 12, 2025 15:20
- Add explicit type annotation for embedding function config variable
- Add return statement in test helper function for better type checking
- Ensure all type checkers can properly infer types in edge cases
…m:tejas-dharani/autogen into feature/chromadb-embedding-functions-6267
@victordibia
Collaborator

@tejas-dharani ,

Thanks again for the PR.
I made some changes; feel free to test them, and then we can merge it in:

  • Removed changes to files unrelated to the PR
  • Refactored to a folder structure (provides a better structure to extend in the future)
  • Updated the Memory notebook to include an example (instead of a standalone example file)
  • Updated the tests.

@tejas-dharani
Contributor Author

@victordibia ,

Thanks for the updates!

  • Fixed the syntax error validation in tests (made it platform-agnostic)
  • The refactoring to folder structure looks great
  • Memory notebook integration is a nice improvement

@victordibia
Collaborator

Thanks for the work on this @tejas-dharani , much appreciated.
For future PRs, it would be great if we ONLY added commits related to the focus/topic of the PR.
For example, the commit to the dotnet package or the change to the code executor tests are unrelated to this PR and should not be part of it. In these cases, we should ideally create separate PRs.

Thanks again!

@victordibia victordibia merged commit 67ebeed into microsoft:main Jun 13, 2025
65 checks passed
@tejas-dharani
Contributor Author

@victordibia Thanks for the feedback and for taking the time to review the PR!

You're absolutely right about keeping commits focused - I'll make sure to separate unrelated changes into their own PRs going forward.

Development

Successfully merging this pull request may close these issues.

Support embedding func in ChromaDB memory
2 participants