CodeMem

CodeMem is an MCP (Model Context Protocol) server that provides semantic search and querying capabilities over your AI chat conversation history. It enables AI assistants like Claude to search through past conversations, retrieve relevant context, and answer questions about your interaction history.

Features

🔍 Semantic Search: BM25-based semantic search with smart tokenization (supports English and Chinese)
💾 SQL Queries: Direct SQL access for complex data analysis and custom queries
🎯 Regex Search: Pattern-based search for precise matching
⚡ High Performance: Async I/O, connection pooling, and query caching
📝 Markdown Export: Exports conversations to readable Markdown files
🔄 Multi-format Support: Handles JSON, JSONL, and various chat history formats

Installation

Prerequisites

Python 3.10 or higher
pip or uv package manager

Install from Source

Clone the repository:

git clone <repository-url>
cd CodeMem

Install dependencies using uv (recommended):

uv pip install -e .

Or using pip:

pip install -e .

Dependencies

The following packages will be installed automatically:

pydantic>=2.0.0 - Data validation
rank-bm25>=0.2.2 - BM25 search algorithm
tiktoken>=0.5.0 - Smart tokenization
aiosqlite>=0.22.0 - Async SQLite operations
mcp>=0.9.0 - Model Context Protocol

Usage

1. Configure MCP Client

Add CodeMem to your MCP client configuration (e.g., Claude Desktop):

For Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "codemem": {
      "command": "python",
      "args": [
        "/path/to/CodeMem/mcp_server.py",
        "--db-path", "/path/to/your/chat_history.sqlite"
      ]
    }
  }
}

Command-line options:

--db-path: Path to SQLite database (default: ~/.codemem/chat_history.sqlite)
--data-dir: Directory containing chat history files to import

2. Prepare Your Chat History

CodeMem can import chat history from various sources:

# Import from a directory containing JSON/JSONL files
python mcp_server.py --data-dir /path/to/chat/logs --db-path ~/.codemem/chat_history.sqlite

The server will:

Scan the data directory for chat history files
Build a SQLite database with indexed conversations
Export sessions to Markdown files in ~/.codemem/md_sessions/
Build BM25 search indexes

3. Use the Tools

Once configured, your AI assistant can use three main tools:

semantic.search - Semantic Search

Search conversation history using natural language queries:

# Example query from Claude
semantic.search(
    query="How do I implement authentication?",
    top_k=10,
    mode="refs"  # Options: summary, refs, preview, full
)

Modes:

summary: Returns statistics and top 3 samples (most context-efficient)
refs: Returns reference IDs and metadata only (recommended)
preview: Returns first 100 characters preview
full: Returns complete content (use sparingly)

sql.query - SQL Queries

Execute SQL queries for complex analysis:

# Example queries
sql.query(
    query="SELECT * FROM events WHERE text LIKE '%authentication%' LIMIT 10",
    mode="summary"
)

sql.query(
    query="SELECT session_id, COUNT(*) as msg_count FROM events GROUP BY session_id ORDER BY msg_count DESC",
    mode="full"
)

Common queries:

SELECT * FROM events WHERE role='user' LIMIT 10 - Get user messages
SELECT COUNT(*) FROM events WHERE text LIKE '%keyword%' - Count matches
SELECT session_id, MIN(timestamp) as start_time FROM events GROUP BY session_id - Session stats

regex.search - Pattern Matching

Search using regular expressions:

# Example patterns
regex.search(
    pattern=r"async def \w+\(.*\):",  # Find async function definitions
    mode="summary"
)

regex.search(
    pattern=r"https?://\S+",  # Find URLs
    mode="preview"
)

How It Works

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      MCP Client (Claude)                     │
└───────────────────────────┬─────────────────────────────────┘
                            │ MCP Protocol
┌───────────────────────────▼─────────────────────────────────┐
│                    CodeMem MCP Server                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Semantic   │  │  SQL Query   │  │    Regex     │     │
│  │    Search    │  │   Engine     │  │   Search     │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
│         │                  │                  │              │
│         └──────────────────┼──────────────────┘              │
│                            │                                 │
│  ┌─────────────────────────▼──────────────────────────┐    │
│  │           SQLite Database + BM25 Index             │    │
│  │  • events table (session_id, role, text, etc.)    │    │
│  │  • BM25 index for semantic search                 │    │
│  │  • Query cache for performance                    │    │
│  └────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│              Data Sources & Exports                          │
│  • JSON/JSONL chat logs                                     │
│  • Markdown exports (~/.codemem/md_sessions/)              │
└─────────────────────────────────────────────────────────────┘

Workflow

Initialization:
- Server starts and loads configuration
- Scans data directory for chat history files
- Parses JSON/JSONL files and normalizes data
Database Building:
- Creates SQLite database with events table
- Stores: session_id, timestamp, role, content, metadata
- Builds indexes for fast querying
Markdown Export:
- Exports each session to a formatted Markdown file
- Stored in ~/.codemem/md_sessions/
- Used for BM25 search indexing
BM25 Index Building:
- Tokenizes Markdown content using tiktoken
- Builds BM25Okapi index for semantic search
- Supports multilingual tokenization
Query Processing:
- Receives tool calls from MCP client
- Checks query cache for recent results
- Executes search/query with appropriate strategy
- Returns results in requested format

Data Model

The SQLite database uses the following schema:

CREATE TABLE events (
    session_id TEXT,      -- Unique session identifier
    timestamp INTEGER,    -- Unix timestamp
    role TEXT,           -- 'user' or 'assistant'
    text TEXT,           -- Message content
    content_json TEXT,   -- Full content structure (JSON)
    source TEXT,         -- Source file path
    content_hash TEXT    -- Deduplication hash
);

Performance Optimizations

Async I/O: Non-blocking database operations with aiosqlite
Connection Pooling: Reuses database connections
Query Caching: 1-hour TTL cache for repeated queries (max 100 entries)
Parallel Processing: Multi-threaded index building
Smart Tokenization: tiktoken-based tokenization for better search quality

Configuration

Environment Variables

CODEMEM_DB_PATH: Default database path (default: ~/.codemem/chat_history.sqlite)
CODEMEM_CACHE_SIZE: Query cache size (default: 100)
CODEMEM_CACHE_TTL: Cache TTL in seconds (default: 3600)

Directory Structure

~/.codemem/
├── chat_history.sqlite       # Main database
├── chat_history.sqlite-journal  # SQLite journal
└── md_sessions/              # Exported Markdown files
    ├── session_abc123.md
    ├── session_def456.md
    └── ...

Troubleshooting

Database not found

Error: Markdown sessions directory not found

Solution: Restart the MCP server to rebuild the database and export markdown files.

BM25 index not ready

Error: BM25 index not built yet

Solution: Wait a few seconds for initialization to complete. The index is built in the background.

No results found

Check that your data directory contains valid JSON/JSONL files
Verify the database path is correct
Try using SQL queries to inspect the database directly

Development

Running Tests

# Run all tests (if available)
python -m pytest tests/

# Test database building
python unified_history.py --data-dir /path/to/data --db-path test.sqlite

# Test markdown export
python export_sessions_md.py --db-path test.sqlite --output-dir ./test_md

Project Structure

CodeMem/
├── mcp_server.py           # Main MCP server
├── unified_history.py      # Chat history loader
├── export_sessions_md.py   # Markdown exporter
├── models.py              # Data models
├── pyproject.toml         # Project configuration
└── README.md              # This file

License

[Add your license here]

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Acknowledgments

Built with MCP (Model Context Protocol)
Uses BM25 for semantic search
Powered by tiktoken for tokenization

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
.gitignore		.gitignore
README.md		README.md
export_sessions_md.py		export_sessions_md.py
mcp_server.py		mcp_server.py
models.py		models.py
pyproject.toml		pyproject.toml
unified_history.py		unified_history.py
uv.lock		uv.lock
uv.toml		uv.toml

Folders and files

Latest commit

History

Repository files navigation

CodeMem

Features

Installation

Prerequisites

Install from Source

Dependencies

Usage

1. Configure MCP Client

2. Prepare Your Chat History

3. Use the Tools

semantic.search - Semantic Search

sql.query - SQL Queries

regex.search - Pattern Matching

How It Works

Architecture

Workflow

Data Model

Performance Optimizations

Configuration

Environment Variables

Directory Structure

Troubleshooting

Database not found

BM25 index not ready

No results found

Development

Running Tests

Project Structure

License

Contributing

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages