A zotero-mcp fork that runs on PostgreSQL and Ollama instead of transformers and Chroma, and supports full-text indexing

tspspi/zotero-mcp-postgres-ollama-fulltext

Zotero MCP Server

A Model Context Protocol (MCP) server for Zotero that provides semantic search using PostgreSQL with pgvector and OpenAI or Ollama embeddings.

This is a fork of the excellent zotero-mcp project, modified to match my personal workflow (pgvector instead of Chroma, Ollama and OpenAI backends instead of local transformers, etc.). I am still in the process of refactoring it to fit my personal needs.

THIS IS NOT THE OFFICIAL PROJECT AND MY MODIFICATIONS MAY HAVE BUGS. I only use this version for my personal research projects.

At the moment I use the version in this repository against my own OpenAI-compatible API gateway.

Features

  • Full Zotero Integration: Access your Zotero library through MCP tools
  • Semantic Search: AI-powered semantic search using PostgreSQL + pgvector
  • Multiple Embedding Providers: Support for OpenAI and Ollama embeddings
  • Lightweight Architecture: Removed heavy ML dependencies (torch, transformers)
  • High Performance: Approximate nearest-neighbor search through a pgvector IVFFlat cosine index
  • Flexible Configuration: Support for local and remote database instances

Quick Start

Prerequisites

  • Python 3.10+
  • PostgreSQL 15+ with the pgvector extension
  • Zotero desktop application or Zotero Web API credentials
  • OpenAI API key or Ollama installation

Installation

pip install -e .

PostgreSQL Setup

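If you have access to a PostgreSQL instance with the pgvector extension installed, create the database and user as a superuser (or a role allowed to create databases and extensions):

-- Connect to your PostgreSQL instance as a superuser
CREATE DATABASE zotero_mcp;
CREATE USER zotero_user WITH PASSWORD 'your_password';
GRANT ALL PRIVILEGES ON DATABASE zotero_mcp TO zotero_user;

-- Switch to the new database (psql meta-command) and enable pgvector
\c zotero_mcp
CREATE EXTENSION IF NOT EXISTS vector;

-- Since PostgreSQL 15, ordinary users can no longer create tables in "public" by default
GRANT ALL ON SCHEMA public TO zotero_user;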

Configuration

Run the interactive setup:

zotero-mcp setup

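Usage with Claude Desktop

Add the server to Claude Desktop's claude_desktop_config.json, adjusting the command path and hosts to your setup: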

{
  "mcpServers": {
    "zotero": {
      "command": "/path/to/zotero-mcp",
      "env": {
        "ZOTERO_DB_HOST": "your_host",
        "ZOTERO_DB_NAME": "zotero_mcp",
        "ZOTERO_EMBEDDING_PROVIDER": "ollama",
        "OLLAMA_HOST": "your_ollama_host"
      }
    }
  }
}
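The env block above configures the server through environment variables. How these interact with config.json is not spelled out here; the sketch below assumes they override the matching config keys. Both `apply_env_overrides` and the variable-to-key mapping are hypothetical.

```python
# Hypothetical sketch: apply the environment variables from the Claude Desktop
# config on top of config.json. Assumption: env vars take precedence.
import copy
from typing import Any, Mapping

# Assumed mapping from environment variable to a path in config.json
ENV_TO_CONFIG = {
    "ZOTERO_DB_HOST": ("database", "host"),
    "ZOTERO_DB_NAME": ("database", "database"),
    "ZOTERO_EMBEDDING_PROVIDER": ("embedding", "provider"),
    "OLLAMA_HOST": ("embedding", "ollama", "host"),
}


def apply_env_overrides(config: dict[str, Any], env: Mapping[str, str]) -> dict[str, Any]:
    """Return a copy of `config` with any matching environment variables applied."""
    merged = copy.deepcopy(config)  # never mutate the caller's config
    for var, path in ENV_TO_CONFIG.items():
        if var not in env:
            continue
        node = merged
        for key in path[:-1]:
            node = node.setdefault(key, {})  # create missing sections on the way
        node[path[-1]] = env[var]
    return merged
```

In practice this would be called as `apply_env_overrides(config, os.environ)`.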

Configuration

Database Configuration

Create ~/.config/zotero-mcp/config.json:

{
  "database": {
    "host": "localhost",
    "port": 5432,
    "database": "zotero_mcp",
    "username": "zotero_user",
    "password": "your_password",
    "schema": "public",
    "pool_size": 5
  },
  "embedding": {
    "provider": "ollama",
    "openai": {
      "api_key": "sk-...",
      "model": "text-embedding-3-small",
      "batch_size": 100
    },
    "ollama": {
      "host": "192.168.1.189:8182",
      "model": "nomic-embed-text",
      "timeout": 60
    }
  },
  "chunking": {
    "chunk_size": 1000,
    "overlap": 100,
    "min_chunk_size": 100,
    "max_chunks_per_item": 10,
    "chunking_strategy": "sentences"
  },
  "semantic_search": {
    "similarity_threshold": 0.7,
    "max_results": 50,
    "update_config": {
      "auto_update": false,
      "update_frequency": "manual",
      "batch_size": 50,
      "parallel_workers": 4
    }
  }
}
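The "chunking" settings control how long texts are split before they are embedded. Below is a minimal sketch of a "sentences" strategy, assuming sizes are counted in characters and the overlap is made of whole trailing sentences from the previous chunk. `chunk_sentences` is hypothetical and may differ from the project's actual chunker.

```python
# Sketch of sentence-based chunking driven by the "chunking" config section.
# Assumptions: sizes are in characters; overlap = whole trailing sentences.
import re

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_sentences(
    text: str,
    chunk_size: int = 1000,
    overlap: int = 100,
    min_chunk_size: int = 100,
    max_chunks_per_item: int = 10,
) -> list[str]:
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []

    def flush() -> None:
        if current:
            chunks.append(" ".join(current))

    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > chunk_size:
            flush()
            # Carry trailing sentences (at most `overlap` chars) into the next chunk.
            carried: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > overlap:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(sentence)
    flush()

    # Drop fragments too short to embed usefully, but keep at least one chunk.
    kept = [c for c in chunks if len(c) >= min_chunk_size] or chunks[:1]
    return kept[:max_chunks_per_item]
```

Carrying whole sentences rather than raw characters keeps each chunk readable at its boundaries, at the cost of overlaps that are not exactly `overlap` characters long.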

Available Tools

Core Zotero Tools

  • zotero_search_items - Search items by text query
  • zotero_search_by_tag - Search items by tags
  • zotero_get_item_metadata - Get item details and metadata
  • zotero_get_item_fulltext - Extract full text from attachments
  • zotero_get_collections - List all collections
  • zotero_get_collection_items - Get items in a collection
  • zotero_get_recent - Get recently added items
  • zotero_get_tags - List all tags
  • zotero_batch_update_tags - Bulk update tags

Semantic Search Tools

  • zotero_semantic_search - AI-powered semantic search
  • zotero_update_search_database - Update embedding database
  • zotero_get_search_database_status - Check database status

Advanced Tools

  • zotero_get_annotations - Extract annotations from PDFs
  • zotero_get_notes - Retrieve notes
  • zotero_search_notes - Search through notes
  • zotero_create_note - Create new notes
  • zotero_advanced_search - Complex multi-criteria search
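To illustrate what a bulk tag update such as zotero_batch_update_tags does to each item, here is a hypothetical sketch. It assumes the Zotero Web API item JSON shape, where tags are stored as `{"tag": "..."}` objects; `update_tags` is illustrative and not the tool's real implementation.

```python
# Hypothetical sketch of a per-item tag update. Assumes Zotero Web API item data,
# where "tags" is a list of {"tag": name, optional "type": n} objects.
from typing import Any, Iterable


def update_tags(
    item_data: dict[str, Any],
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
) -> dict[str, Any]:
    """Return a copy of `item_data` with tags added/removed, preserving order."""
    to_remove = set(remove)
    # Keep existing tag objects (including their "type") unless removed.
    tags = [t for t in item_data.get("tags", []) if t["tag"] not in to_remove]
    existing = {t["tag"] for t in tags}
    for name in add:
        # Removal wins if a tag appears in both lists; duplicates are skipped.
        if name not in existing and name not in to_remove:
            tags.append({"tag": name})
            existing.add(name)
    return {**item_data, "tags": tags}
```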

Semantic Search

The semantic search uses PostgreSQL with pgvector for efficient vector similarity search.

Database Population

# Initial database population
zotero-mcp update-db --force-rebuild

# Incremental updates
zotero-mcp update-db

# Update with limit (for testing)
zotero-mcp update-db --limit 100

# Check status
zotero-mcp status

Embedding Providers

OpenAI (Recommended)

{
  "embedding": {
    "provider": "openai",
    "openai": {
      "api_key": "sk-...",
      "model": "text-embedding-3-small",
      "batch_size": 100,
      "rate_limit_rpm": 3000
    }
  }
}

Available models:

  • text-embedding-3-small (1536 dimensions) - Fast and efficient
  • text-embedding-3-large (3072 dimensions) - Higher quality
  • text-embedding-ada-002 (1536 dimensions) - Legacy model
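The batch_size and rate_limit_rpm settings suggest that texts are grouped into batches (the OpenAI embeddings endpoint accepts a list of strings as `input`) and that requests are paced. The sketch below assumes one request per batch spread evenly over a minute; `plan_requests` is hypothetical. A custom `batched` is used because `itertools.batched` only exists from Python 3.12.

```python
# Sketch: split texts into request batches and compute request pacing.
# Assumption: one embeddings request per batch; `plan_requests` is hypothetical.
from typing import Iterator, Sequence


def batched(texts: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def plan_requests(
    texts: Sequence[str], batch_size: int = 100, rate_limit_rpm: int = 3000
) -> tuple[list[list[str]], float]:
    """Return the batches and the minimum delay in seconds between requests."""
    batches = list(batched(texts, batch_size))
    min_interval = 60.0 / rate_limit_rpm  # 3000 requests/min -> 0.02 s apart
    return batches, min_interval
```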

Ollama (Local)

{
  "embedding": {
    "provider": "ollama", 
    "ollama": {
      "host": "http://localhost:11434",
      "model": "nomic-embed-text",
      "timeout": 60
    }
  }
}

Popular Models:

  • nomic-embed-text (768 dimensions) - Good general-purpose embeddings
  • all-minilm (384 dimensions) - Lightweight and fast
  • mxbai-embed-large (1024 dimensions) - High-quality embeddings

To install Ollama models:

ollama pull nomic-embed-text
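Once a model is pulled, embeddings are requested over Ollama's HTTP API. The sketch below builds (but does not send) a request to Ollama's /api/embed endpoint using only the standard library. It assumes that is the endpoint the server uses and that a host without a scheme, like 192.168.1.189:8182 in the config example, should be treated as plain http://.

```python
# Sketch: build an Ollama embedding request without sending it.
# Assumptions: the /api/embed endpoint is used, and scheme-less hosts mean http://.
import json
import urllib.request


def normalize_host(host: str) -> str:
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")


def build_embed_request(host: str, model: str, texts: list[str]) -> urllib.request.Request:
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        normalize_host(host) + "/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires a running Ollama server:
# req = build_embed_request("http://localhost:11434", "nomic-embed-text", ["hello"])
# with urllib.request.urlopen(req, timeout=60) as resp:
#     vectors = json.load(resp)["embeddings"]
```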

Architecture

Component Overview

┌──────────────────────┐      ┌──────────────────────┐
│      Claude MCP      │─────▶│    FastMCP Server    │
│        Client        │      │     (server.py)      │
└──────────────────────┘      └──────────────────────┘
                                         │
                                         ▼
                              ┌──────────────────────┐
                              │   Semantic Search    │
                              │ (semantic_search.py) │
                              └──────────────────────┘
                                         │
                           ┌─────────────┴────────────┐
                           ▼                          ▼
                ┌──────────────────────┐   ┌──────────────────────┐
                │    Vector Client     │   │  Embedding Service   │
                │   (vector_client)    │   │(embedding_service.py)│
                └──────────────────────┘   └──────────────────────┘
                           │                          │
                           ▼                          ▼
                ┌──────────────────────┐   ┌──────────────────────┐
                │      PostgreSQL      │   │    OpenAI/Ollama     │
                │      + pgvector      │   │         APIs         │
                └──────────────────────┘   └──────────────────────┘
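The diagram shows the semantic search layer sitting between the MCP server and two backends: a vector client and an embedding service. As a rough illustration of that wiring, here is a hypothetical sketch with structural interfaces and an in-memory vector client standing in for PostgreSQL + pgvector. The class and method names are illustrative, not the project's real API. The threshold and result limit default to the similarity_threshold and max_results values from the config example.

```python
# Hypothetical sketch of the component wiring in the diagram. Names are
# illustrative; the in-memory client stands in for PostgreSQL + pgvector.
import math
from typing import Protocol


class EmbeddingService(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class VectorClient(Protocol):
    def upsert(self, item_key: str, embedding: list[float]) -> None: ...
    def query(self, embedding: list[float], limit: int) -> list[tuple[str, float]]: ...


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return sum(x * y for x, y in zip(a, b)) / norm


class InMemoryVectorClient:
    def __init__(self) -> None:
        self.rows: dict[str, list[float]] = {}

    def upsert(self, item_key: str, embedding: list[float]) -> None:
        self.rows[item_key] = embedding

    def query(self, embedding: list[float], limit: int) -> list[tuple[str, float]]:
        scored = [(key, cosine(embedding, vec)) for key, vec in self.rows.items()]
        return sorted(scored, key=lambda kv: kv[1], reverse=True)[:limit]


class SemanticSearch:
    """Embeds the query, asks the vector client, then applies the threshold."""

    def __init__(self, embedder: EmbeddingService, store: VectorClient,
                 similarity_threshold: float = 0.7, max_results: int = 50) -> None:
        self.embedder = embedder
        self.store = store
        self.similarity_threshold = similarity_threshold
        self.max_results = max_results

    def search(self, query: str) -> list[tuple[str, float]]:
        [vector] = self.embedder.embed([query])
        hits = self.store.query(vector, self.max_results)
        return [(k, s) for k, s in hits if s >= self.similarity_threshold]
```

Keeping the embedder and store behind small interfaces is what makes swapping Chroma for pgvector, or local transformers for OpenAI/Ollama, a local change.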

Database Schema

-- Core embeddings table
CREATE TABLE zotero_embeddings (
    id SERIAL PRIMARY KEY,
    item_key VARCHAR(50) UNIQUE NOT NULL,
    item_type VARCHAR(50) NOT NULL,
    title TEXT,
    content TEXT NOT NULL,
    content_hash VARCHAR(64) NOT NULL,
    -- The dimension must match the embedding model: 1536 fits text-embedding-3-small,
    -- while e.g. nomic-embed-text produces 768-dimensional vectors
    embedding vector(1536),
    embedding_model VARCHAR(100) NOT NULL,
    embedding_provider VARCHAR(50) NOT NULL,
    metadata JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Approximate nearest-neighbor index for cosine distance (the <=> operator);
-- pgvector recommends building IVFFlat indexes once the table contains data
CREATE INDEX idx_zotero_embedding_cosine 
    ON zotero_embeddings USING ivfflat (embedding vector_cosine_ops) 
    WITH (lists = 100);
-- GIN index for filtering on JSONB metadata
CREATE INDEX idx_zotero_metadata_gin 
    ON zotero_embeddings USING gin(metadata);
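The cosine index above serves queries ordered by pgvector's `<=>` (cosine distance) operator, and `1 - distance` gives the cosine similarity that a threshold such as similarity_threshold can be compared against. The sketch below builds such a query and its parameters. It assumes psycopg-style `%s` placeholders and passes the vector in pgvector's text form `[x,y,...]`. `to_pgvector_literal` also checks the dimension against the vector(1536) column, so a 768-dimensional nomic-embed-text vector would be rejected. All names here are hypothetical.

```python
# Sketch of a nearest-neighbor query against zotero_embeddings.
# Assumptions: psycopg-style %s placeholders, vectors sent as '[x,y,...]' text,
# and EXPECTED_DIM mirroring the vector(1536) column. Names are hypothetical.
EXPECTED_DIM = 1536

SEARCH_SQL = """
SELECT item_key, title, 1 - (embedding <=> %s::vector) AS similarity
FROM zotero_embeddings
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_pgvector_literal(embedding: list[float], expected_dim: int = EXPECTED_DIM) -> str:
    """Format an embedding as pgvector text input, rejecting wrong dimensions."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, column expects {expected_dim}"
        )
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def search_params(embedding: list[float], max_results: int = 50) -> tuple[str, str, int]:
    """Parameters for SEARCH_SQL: the vector twice (SELECT and ORDER BY), then LIMIT."""
    literal = to_pgvector_literal(embedding)
    return literal, literal, max_results
```

Ordering by the raw `<=>` expression, rather than by the computed similarity column, is what lets PostgreSQL use the IVFFlat index; the similarity threshold can then be applied to the returned rows.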

License

MIT License - see LICENSE file for details.
