A Model Context Protocol (MCP) server for Zotero that provides semantic search capabilities using PostgreSQL with pg-vector and OpenAI/Ollama embeddings.
This is a fork of the excellent zotero-mcp project with modifications to match my personal workflow (pg-vector instead of Chroma, Ollama and OpenAI backends instead of local transformers, etc.). I am still refactoring the project to fit my personal needs.
THIS IS NOT THE OFFICIAL PROJECT AND MY MODIFICATIONS MAY HAVE BUGS. I just use this version for my personal research projects.
At the moment I use the version in this repository against my own OpenAI-compatible API gateway.
- Full Zotero Integration: Access your Zotero library through MCP tools
- Semantic Search: AI-powered semantic search using PostgreSQL + pg-vector
- Multiple Embedding Providers: Support for OpenAI and Ollama embeddings
- Lightweight Architecture: Removed heavy ML dependencies (torch, transformers)
- High Performance: PostgreSQL backend with optimized vector operations
- Flexible Configuration: Support for local and remote database instances
- Python 3.10+
- PostgreSQL 15+ with pg-vector extension
- Zotero desktop application or Zotero Web API credentials
- OpenAI API key or Ollama installation
pip install -e .
If you have access to a PostgreSQL instance with pg-vector:
-- Connect to your PostgreSQL instance
CREATE DATABASE zotero_mcp;
CREATE USER zotero_user WITH PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE zotero_mcp TO zotero_user;
-- Enable pg-vector extension
\c zotero_mcp
CREATE EXTENSION vector;

Run the interactive setup:
zotero-mcp setup

{
  "mcpServers": {
    "zotero": {
      "command": "/path/to/zotero-mcp",
      "env": {
        "ZOTERO_DB_HOST": "your_host",
        "ZOTERO_DB_NAME": "zotero_mcp",
        "ZOTERO_EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_HOST": "your_ollama_host"
      }
    }
  }
}

Create ~/.config/zotero-mcp/config.json:
{
  "database": {
    "host": "localhost",
    "port": 5432,
    "database": "zotero_mcp",
    "username": "zotero_user",
    "password": "your_password",
    "schema": "public",
    "pool_size": 5
  },
  "embedding": {
    "provider": "ollama",
    "openai": {
      "api_key": "sk-...",
      "model": "text-embedding-3-small",
      "batch_size": 100
    },
    "ollama": {
      "host": "192.168.1.189:8182",
      "model": "nomic-embed-text",
      "timeout": 60
    }
  },
  "chunking": {
    "chunk_size": 1000,
    "overlap": 100,
    "min_chunk_size": 100,
    "max_chunks_per_item": 10,
    "chunking_strategy": "sentences"
  },
  "semantic_search": {
    "similarity_threshold": 0.7,
    "max_results": 50,
    "update_config": {
      "auto_update": false,
      "update_frequency": "manual",
      "batch_size": 50,
      "parallel_workers": 4
    }
  }
}

- zotero_search_items - Search items by text query
- zotero_search_by_tag - Search items by tags
- zotero_get_item_metadata - Get item details and metadata
- zotero_get_item_fulltext - Extract full text from attachments
- zotero_get_collections - List all collections
- zotero_get_collection_items - Get items in a collection
- zotero_get_recent - Get recently added items
- zotero_get_tags - List all tags
- zotero_batch_update_tags - Bulk update tags

- zotero_semantic_search - AI-powered semantic search
- zotero_update_search_database - Update embedding database
- zotero_get_search_database_status - Check database status

- zotero_get_annotations - Extract annotations from PDFs
- zotero_get_notes - Retrieve notes
- zotero_search_notes - Search through notes
- zotero_create_note - Create new notes
- zotero_advanced_search - Complex multi-criteria search
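
MCP clients invoke the tools listed above through JSON-RPC 2.0 requests with the method tools/call. As a rough illustration, the sketch below builds such a request for zotero_semantic_search. The argument names query and limit are assumptions made for this example and are not confirmed by this README; check the tool's actual input schema before relying on them.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: "query" and "limit" are illustrative only.
request = build_tool_call(
    "zotero_semantic_search",
    {"query": "transformer models for protein folding", "limit": 5},
)
print(request)
```

In practice an MCP client such as Claude builds these messages itself; the sketch is only meant to show what travels over the wire.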
The semantic search uses PostgreSQL with pg-vector for efficient vector similarity search:
# Initial database population
zotero-mcp update-db --force-rebuild
# Incremental updates
zotero-mcp update-db
# Update with limit (for testing)
zotero-mcp update-db --limit 100
# Check status
zotero-mcp status

{
  "embedding": {
    "provider": "openai",
    "openai": {
      "api_key": "sk-...",
      "model": "text-embedding-3-small",
      "batch_size": 100,
      "rate_limit_rpm": 3000
    }
  }
}

Models Available:
- text-embedding-3-small (1536 dimensions) - Fast and efficient
- text-embedding-3-large (3072 dimensions) - Higher quality
- text-embedding-ada-002 (1536 dimensions) - Legacy model
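
The embedding dimension matters because a pgvector column has a fixed size, and the schema in this README declares vector(1536). The following is a minimal sketch of a pre-flight check that uses only the dimensions listed above. The function name and the idea of running such a check are illustrative assumptions, not part of this project's code.

```python
# Dimensions as listed in this README for the OpenAI embedding models.
OPENAI_EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def fits_column(model: str, column_dims: int = 1536) -> bool:
    """Return True if the model's output size matches the vector column size."""
    try:
        return OPENAI_EMBEDDING_DIMS[model] == column_dims
    except KeyError:
        raise ValueError(f"unknown embedding model: {model}") from None


for name in OPENAI_EMBEDDING_DIMS:
    status = "ok" if fits_column(name) else "needs a different column size"
    print(f"{name}: {status}")
```

Under this assumption, text-embedding-3-large would likely need a wider column than vector(1536) before its embeddings could be stored.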
{
  "embedding": {
    "provider": "ollama",
    "ollama": {
      "host": "http://localhost:11434",
      "model": "nomic-embed-text",
      "timeout": 60
    }
  }
}

Popular Models:
- nomic-embed-text - Good general purpose embeddings
- all-minilm - Lightweight and fast
- mxbai-embed-large - High quality embeddings
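
For orientation, here is a hedged sketch of how a single embedding could be requested from Ollama over HTTP. It assumes Ollama's /api/embeddings endpoint, which takes a model name and a prompt and returns an embedding array. The helper names are hypothetical, and this is not necessarily how this project's embedding service is implemented.

```python
import json
import urllib.request


def build_embedding_request(host: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/embeddings."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(host: str, model: str, text: str, timeout: float = 60) -> list[float]:
    """Send the request and return the embedding vector from the response."""
    request = build_embedding_request(host, model, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["embedding"]


# Example (requires a running Ollama instance with the model already pulled):
# vector = fetch_embedding("http://localhost:11434", "nomic-embed-text", "hello")
# print(len(vector))
```

The host, model, and timeout arguments correspond to the values in the ollama section of the config shown above.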
To install Ollama models:
ollama pull nomic-embed-text

┌─────────────────┐    ┌─────────────────┐
│   Claude MCP    │───▶│ FastMCP Server  │
│     Client      │    │   (server.py)   │
└─────────────────┘    └─────────────────┘
                                │
                                ▼
                    ┌───────────────────────┐
                    │    Semantic Search    │
                    │ (semantic_search.py)  │
                    └───────────────────────┘
                                │
                     ┌──────────┴──────────┐
                     ▼                     ▼
            ┌─────────────────┐   ┌─────────────────┐
            │  Vector Client  │   │    Embedding    │
            │ (vector_client) │   │     Service     │
            └─────────────────┘   │   (embedding_   │
                     │            │   service.py)   │
                     ▼            └─────────────────┘
            ┌─────────────────┐            │
            │   PostgreSQL    │            ▼
            │   + pgvector    │   ┌─────────────────┐
            └─────────────────┘   │  OpenAI/Ollama  │
                                  │      APIs       │
                                  └─────────────────┘
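
To make the Vector Client box concrete, the following sketch shows how a cosine-similarity lookup against pgvector could be phrased. It assumes the zotero_embeddings table from the schema in this README and pgvector's <=> cosine-distance operator. The function names are hypothetical, and the default threshold and limit mirror similarity_threshold and max_results from the sample config. The code only builds the SQL and its parameters, which could then be passed to a driver such as psycopg.

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_search_query(
    embedding: list[float], threshold: float = 0.7, limit: int = 50
) -> tuple[str, tuple]:
    """Return (sql, params) for a cosine-similarity search.

    pgvector's <=> operator yields cosine distance, so similarity is
    computed as 1 - distance.
    """
    sql = (
        "SELECT item_key, title, 1 - (embedding <=> %s::vector) AS similarity "
        "FROM zotero_embeddings "
        "WHERE 1 - (embedding <=> %s::vector) >= %s "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_vector_literal(embedding)
    return sql, (literal, literal, threshold, literal, limit)


sql, params = build_search_query([0.1, 0.2, 0.3], limit=5)
print(sql)
print(params)
```

Ordering by the raw distance, rather than by the computed similarity, is the form that pgvector's approximate indexes (such as ivfflat with vector_cosine_ops) are designed to accelerate.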
-- Core embeddings table
CREATE TABLE zotero_embeddings (
    id SERIAL PRIMARY KEY,
    item_key VARCHAR(50) UNIQUE NOT NULL,
    item_type VARCHAR(50) NOT NULL,
    title TEXT,
    content TEXT NOT NULL,
    content_hash VARCHAR(64) NOT NULL,
    embedding vector(1536),
    embedding_model VARCHAR(100) NOT NULL,
    embedding_provider VARCHAR(50) NOT NULL,
    metadata JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Optimized indexes
CREATE INDEX idx_zotero_embedding_cosine
    ON zotero_embeddings USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

CREATE INDEX idx_zotero_metadata_gin
    ON zotero_embeddings USING gin(metadata);

MIT License - see LICENSE file for details.