A powerful Model Context Protocol (MCP) server implementation that provides seamless vector database capabilities through ChromaDB. Enhance your AI applications with semantic document search, metadata filtering, and persistent document storage.
- Overview
- How ChromaDB Integration Works
- Features
- Requirements
- Quick Start
- Installation
- Configuration
- Usage Examples
- API Reference
- Development
- Troubleshooting
- Contributing
- License
The ChromaDB MCP Server bridges the gap between Language Models and vector databases. By implementing the Model Context Protocol (MCP), this server allows AI assistants to:
- Store and retrieve documents with semantic understanding
- Search for related content based on meaning rather than keywords
- Filter results using metadata and content specifications
- Maintain persistent storage across sessions
- Manage document versions to track changes over time
- Organize content in collections for better document management
MCP (Model Context Protocol) is an open protocol that standardizes how AI models interact with external tools and resources. This implementation focuses on document storage and retrieval powered by ChromaDB's vector database capabilities.
This MCP server uses ChromaDB in embedded mode, which means:
- No Separate Service Required: ChromaDB runs directly within the MCP server process
- File-Based Storage: Data is stored as files in the configured data directory
- Single Process Deployment: You only need to run the MCP server; ChromaDB is embedded within it
- Automatic Persistence: Data is automatically persisted to disk between server restarts
The data directory (configurable as described below) contains:
- Vector embeddings of your documents
- Metadata and content storage
- Index structures for efficient similarity search
- Collection information
Behind the scenes, this implementation:
- Creates text embeddings using the Sentence Transformers library
- Stores these embeddings with metadata in ChromaDB collections
- Provides search capabilities using vector similarity
- Handles document versioning and collection management on top of ChromaDB's features
Note: While ChromaDB can also be run as a separate service, this implementation uses the embedded mode for simplicity and ease of deployment.
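To make the "vector similarity" idea concrete, here is a toy, self-contained sketch of cosine-similarity ranking over embedding vectors. It is purely illustrative: the real server uses Sentence Transformers embeddings (hundreds of dimensions) and ChromaDB's optimized indexes, and the 3-dimensional vectors and helper names below are made up for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search_similar(query_vec, doc_vecs, num_results=3):
    """Rank documents by cosine similarity to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num_results]

# Toy 3-dimensional "embeddings"; real embeddings are much larger
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
print(search_similar([1.0, 0.05, 0.0], docs, num_results=2))
```

Documents whose vectors point in nearly the same direction as the query rank highest, which is why semantically related text matches even without shared keywords.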
Feature | Description |
---|---|
Semantic Search | Find documents based on meaning rather than exact matches using ChromaDB's embedding models |
Hybrid Search | Combine semantic and keyword search with adjustable weights for optimal results |
Multi-Query Search | Search with multiple queries at once and aggregate results with union or intersection |
Metadata Filtering | Narrow search results by specific metadata fields (e.g., date, author, category) |
Content Filtering | Apply additional text-based filters to search results |
Document Versioning | Track document changes over time with version history and retrieval |
Collection Management | Organize documents into separate collections for better content organization |
Bulk Operations | Create multiple documents in a single operation for improved performance |
Persistent Storage | Documents persist in local storage between server restarts |
Comprehensive Error Handling | Clear error messages and automatic retry mechanisms for reliability |
Cross-Platform Support | Works on Windows, macOS, and Linux |
MCP Integration | Seamlessly works with Claude and other MCP-compatible AI assistants |
- Python 3.12 or higher
- ChromaDB 0.4.22 or higher
- MCP SDK 1.1.2 or higher
- Sentence Transformers 2.2.2 or higher
1. Install the package:

   ```bash
   pip install .
   ```

2. Start the server:

   ```bash
   chroma-mcp
   ```

3. Configure your AI assistant to use the server (see Configuration)

4. Start storing and retrieving documents!
```bash
# Clone the repository
git clone https://github.com/humainlabs/chromadb-mcp.git
cd chromadb-mcp

# Install directly
pip install .

# Or install in development mode
pip install -e .
```

```bash
# Create and activate virtual environment
uv venv

# On Windows
.venv\Scripts\activate

# On macOS/Linux
source .venv/bin/activate

# Install dependencies
uv sync --dev --all-extras
```
Note: The module has been renamed from `chroma` to `chroma_mcp` to avoid conflicts with the official ChromaDB package.
If you're upgrading from a previous version, the data directory has moved from `src/chroma/data` to `src/chroma_mcp/data`. To migrate your existing data, run the included migration script:

```bash
python migrate_data.py
```
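MCP-compatible assistants are typically pointed at the server through a JSON configuration entry along the lines of the sketch below. The exact file location and key names depend on your assistant (Claude Desktop, for example, uses `claude_desktop_config.json`); the server name `"chroma"` here is an arbitrary label, and you may need to supply the full path to the `chroma-mcp` executable:

```json
{
  "mcpServers": {
    "chroma": {
      "command": "chroma-mcp",
      "args": []
    }
  }
}
```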
```python
# Create a new document with metadata
create_document({
    "document_id": "research_paper_123",
    "content": "Recent advancements in transformer models have revolutionized NLP tasks.",
    "metadata": {
        "year": 2023,
        "author": "Smith et al.",
        "field": "natural language processing",
        "tags": ["transformers", "deep learning"]
    }
})
```
```python
# Bulk create multiple documents
bulk_create_documents({
    "documents": [
        {
            "document_id": "research_paper_123",
            "content": "Transformer models have revolutionized NLP.",
            "metadata": {"year": 2023, "field": "NLP"}
        },
        {
            "document_id": "research_paper_124",
            "content": "Reinforcement learning from human feedback improves model alignment.",
            "metadata": {"year": 2023, "field": "AI alignment"}
        },
        {
            "document_id": "research_paper_125",
            "content": "Multimodal models can process both text and images.",
            "metadata": {"year": 2023, "field": "multimodal AI"}
        }
    ]
})
```
```python
# Find documents using a combination of semantic and keyword matching
hybrid_search({
    "query": "transformer architecture improvements",
    "keyword_weight": 0.3,  # 0.0 for pure semantic, 1.0 for pure keyword
    "num_results": 3,
    "metadata_filter": {
        "year": 2023,
        "field": "natural language processing"
    }
})
```
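To clarify what `keyword_weight` controls, one common way to blend the two signals is a weighted sum of normalized per-document scores. This sketch is illustrative only and is not necessarily this server's exact scoring formula:

```python
def combine_scores(semantic_score, keyword_score, keyword_weight=0.3):
    """Blend normalized semantic and keyword scores into one relevance score.

    keyword_weight=0.0 gives pure semantic ranking;
    keyword_weight=1.0 gives pure keyword ranking.
    Both inputs are assumed normalized to [0, 1].
    """
    return (1.0 - keyword_weight) * semantic_score + keyword_weight * keyword_score

# A document with strong semantic relevance but weak keyword overlap
print(combine_scores(0.9, 0.2, keyword_weight=0.3))  # close to 0.69
```

With a low `keyword_weight`, semantically relevant documents still rank well even when they share few exact terms with the query.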
```python
# Search with multiple related queries and combine the results
multi_query_search({
    "queries": [
        "transformer architecture",
        "attention mechanisms",
        "language model training"
    ],
    "aggregation": "union",  # or "intersection" for documents matching all queries
    "num_results": 5,
    "metadata_filter": {"field": "NLP"}
})
```
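The `aggregation` parameter behaves like set union versus set intersection over the per-query result sets. The following sketch illustrates the semantics with made-up result IDs; it is not the server's internal code:

```python
def aggregate_results(per_query_ids, mode="union"):
    """Combine result-ID lists from multiple queries.

    "union": documents matching any query.
    "intersection": documents matching every query.
    """
    sets = [set(ids) for ids in per_query_ids]
    if mode == "union":
        combined = set.union(*sets)
    elif mode == "intersection":
        combined = set.intersection(*sets)
    else:
        raise ValueError(f"unknown aggregation mode: {mode}")
    return sorted(combined)

hits = [
    ["paper_1", "paper_2"],  # results for "transformer architecture"
    ["paper_2", "paper_3"],  # results for "attention mechanisms"
]
print(aggregate_results(hits, "union"))         # ['paper_1', 'paper_2', 'paper_3']
print(aggregate_results(hits, "intersection"))  # ['paper_2']
```

Use `"union"` for broad recall across related phrasings and `"intersection"` to keep only documents relevant to every query.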
```python
# Create a new version of a document
create_document_version({
    "document_id": "research_paper_123",
    "content": "Updated research on transformer architectures and their applications.",
    "version_note": "Added new section on emerging applications",
    "metadata": {"last_updated": "2023-12-01"}
})

# List all versions of a document
list_document_versions({
    "document_id": "research_paper_123"
})

# Retrieve a specific version
get_document_version({
    "document_id": "research_paper_123",
    "version": 2  # or "latest"
})
```
```python
# Create a new collection
create_collection({
    "collection_name": "research_papers",
    "description": "Academic research papers on AI topics",
    "metadata": {"category": "academic", "field": "AI"}
})

# List all collections
list_collections()

# Delete a collection
delete_collection({
    "collection_name": "research_papers"
})
```
Tool | Description | Parameters | Returns |
---|---|---|---|
`create_document` | Add a new document | `document_id`, `content`, optional `metadata` | Success confirmation |
`read_document` | Retrieve a document | `document_id` | Document with content and metadata |
`update_document` | Modify existing document | `document_id`, `content`, optional `metadata` | Success confirmation |
`delete_document` | Remove a document | `document_id` | Success confirmation |
`list_documents` | Get all documents | optional `limit`, `offset` | List of documents |
`bulk_create_documents` | Add multiple documents at once | `documents` (array of document objects) | Success confirmation with count |
Tool | Description | Parameters | Returns |
---|---|---|---|
`search_similar` | Find semantically similar documents | `query`, optional `num_results`, `metadata_filter`, `content_filter` | Ranked documents with similarity scores |
`hybrid_search` | Search with combined semantic and keyword matching | `query`, optional `keyword_weight` (0-1), `num_results`, `metadata_filter` | Ranked documents with relevance scores |
`multi_query_search` | Search with multiple queries | `queries` (array), optional `aggregation` ("union"/"intersection"), `num_results`, `metadata_filter` | Combined ranked results from all queries |
Tool | Description | Parameters | Returns |
---|---|---|---|
`create_collection` | Create a new collection | `collection_name`, optional `description`, `metadata` | Success confirmation |
`list_collections` | List all collections | None | List of collections with descriptions |
`delete_collection` | Delete a collection | `collection_name` | Success confirmation |
Tool | Description | Parameters | Returns |
---|---|---|---|
`create_document_version` | Create a new version of a document | `document_id`, `content`, optional `version_note`, `metadata` | Success confirmation with version number |
`list_document_versions` | List all versions of a document | `document_id` | Version history with timestamps and notes |
`get_document_version` | Retrieve a specific version of a document | `document_id`, `version` (number or "latest") | Document content and version metadata |
The server provides clear error messages for common scenarios:
- `Document already exists [id=X]`
- `Document not found [id=X]`
- `Invalid input: Missing document_id or content`
- `Invalid filter`
- `Operation failed: [details]`
- `Collection already exists [name=X]`
- `Collection not found [name=X]`
- `Version X not found for document 'Y'`
The MCP Inspector provides a web interface for testing server functionality:

```bash
npx @modelcontextprotocol/inspector chroma-mcp
```

For uv installations:

```bash
npx @modelcontextprotocol/inspector uv --directory C:/PATH/TO/YOUR/PROJECT run chroma-mcp
```
```bash
# Update pinned dependencies from pyproject.toml
uv pip compile pyproject.toml

# Build package
uv build
```
Issue: Server fails to start
Solution: Check your Python version (`python --version`) and ensure 3.12 or higher is installed
Issue: Document search returns unexpected results
Solution: Verify embedding model is loaded correctly; check query formatting
Issue: Cannot connect from AI assistant
Solution: Verify MCP configuration in assistant settings, check server logs for connection attempts
Issue: Document versions not appearing
Solution: Ensure you're using the correct document_id and check that versions were created successfully
If you encounter issues not covered here:
- Check server logs for detailed error messages
- Open an issue on the GitHub repository
- Contact HumainLabs.ai support
Contributions are welcome! Please read our Contributing Guidelines for details on:
- Code style and conventions
- Testing requirements
- Pull request process
- Bug reporting
This project is licensed under the MIT License - see the LICENSE file for details.
Maintained with ❤️ by HumainLabs.ai
Built on ChromaDB and Model Context Protocol