CodeMem is an MCP (Model Context Protocol) server that provides semantic search and querying capabilities over your AI chat conversation history. It enables AI assistants like Claude to search through past conversations, retrieve relevant context, and answer questions about your interaction history.
- 🔍 Semantic Search: BM25-based semantic search with smart tokenization (supports English and Chinese)
- 💾 SQL Queries: Direct SQL access for complex data analysis and custom queries
- 🎯 Regex Search: Pattern-based search for precise matching
- ⚡ High Performance: Async I/O, connection pooling, and query caching
- 📝 Markdown Export: Exports conversations to readable Markdown files
- 🔄 Multi-format Support: Handles JSON, JSONL, and various chat history formats
- Python 3.10 or higher
- pip or uv package manager
- Clone the repository:
git clone <repository-url>
cd CodeMem- Install dependencies using uv (recommended):
uv pip install -e .Or using pip:
pip install -e .The following packages will be installed automatically:
pydantic>=2.0.0- Data validationrank-bm25>=0.2.2- BM25 search algorithmtiktoken>=0.5.0- Smart tokenizationaiosqlite>=0.22.0- Async SQLite operationsmcp>=0.9.0- Model Context Protocol
Add CodeMem to your MCP client configuration (e.g., Claude Desktop):
For Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"codemem": {
"command": "python",
"args": [
"/path/to/CodeMem/mcp_server.py",
"--db-path", "/path/to/your/chat_history.sqlite"
]
}
}
}Command-line options:
--db-path: Path to SQLite database (default:~/.codemem/chat_history.sqlite)--data-dir: Directory containing chat history files to import
CodeMem can import chat history from various sources:
# Import from a directory containing JSON/JSONL files
python mcp_server.py --data-dir /path/to/chat/logs --db-path ~/.codemem/chat_history.sqliteThe server will:
- Scan the data directory for chat history files
- Build a SQLite database with indexed conversations
- Export sessions to Markdown files in
~/.codemem/md_sessions/ - Build BM25 search indexes
Once configured, your AI assistant can use three main tools:
Search conversation history using natural language queries:
# Example query from Claude
semantic.search(
query="How do I implement authentication?",
top_k=10,
mode="refs" # Options: summary, refs, preview, full
)Modes:
summary: Returns statistics and top 3 samples (most context-efficient)refs: Returns reference IDs and metadata only (recommended)preview: Returns first 100 characters previewfull: Returns complete content (use sparingly)
Execute SQL queries for complex analysis:
# Example queries
sql.query(
query="SELECT * FROM events WHERE text LIKE '%authentication%' LIMIT 10",
mode="summary"
)
sql.query(
query="SELECT session_id, COUNT(*) as msg_count FROM events GROUP BY session_id ORDER BY msg_count DESC",
mode="full"
)Common queries:
SELECT * FROM events WHERE role='user' LIMIT 10- Get user messagesSELECT COUNT(*) FROM events WHERE text LIKE '%keyword%'- Count matchesSELECT session_id, MIN(timestamp) as start_time FROM events GROUP BY session_id- Session stats
Search using regular expressions:
# Example patterns
regex.search(
pattern=r"async def \w+\(.*\):", # Find async function definitions
mode="summary"
)
regex.search(
pattern=r"https?://\S+", # Find URLs
mode="preview"
)┌─────────────────────────────────────────────────────────────┐
│ MCP Client (Claude) │
└───────────────────────────┬─────────────────────────────────┘
│ MCP Protocol
┌───────────────────────────▼─────────────────────────────────┐
│ CodeMem MCP Server │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Semantic │ │ SQL Query │ │ Regex │ │
│ │ Search │ │ Engine │ │ Search │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └──────────────────┼──────────────────┘ │
│ │ │
│ ┌─────────────────────────▼──────────────────────────┐ │
│ │ SQLite Database + BM25 Index │ │
│ │ • events table (session_id, role, text, etc.) │ │
│ │ • BM25 index for semantic search │ │
│ │ • Query cache for performance │ │
│ └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
┌───────────────────────────▼─────────────────────────────────┐
│ Data Sources & Exports │
│ • JSON/JSONL chat logs │
│ • Markdown exports (~/.codemem/md_sessions/) │
└─────────────────────────────────────────────────────────────┘
-
Initialization:
- Server starts and loads configuration
- Scans data directory for chat history files
- Parses JSON/JSONL files and normalizes data
-
Database Building:
- Creates SQLite database with
eventstable - Stores: session_id, timestamp, role, content, metadata
- Builds indexes for fast querying
- Creates SQLite database with
-
Markdown Export:
- Exports each session to a formatted Markdown file
- Stored in
~/.codemem/md_sessions/ - Used for BM25 search indexing
-
BM25 Index Building:
- Tokenizes Markdown content using tiktoken
- Builds BM25Okapi index for semantic search
- Supports multilingual tokenization
-
Query Processing:
- Receives tool calls from MCP client
- Checks query cache for recent results
- Executes search/query with appropriate strategy
- Returns results in requested format
The SQLite database uses the following schema:
CREATE TABLE events (
session_id TEXT, -- Unique session identifier
timestamp INTEGER, -- Unix timestamp
role TEXT, -- 'user' or 'assistant'
text TEXT, -- Message content
content_json TEXT, -- Full content structure (JSON)
source TEXT, -- Source file path
content_hash TEXT -- Deduplication hash
);- Async I/O: Non-blocking database operations with aiosqlite
- Connection Pooling: Reuses database connections
- Query Caching: 1-hour TTL cache for repeated queries (max 100 entries)
- Parallel Processing: Multi-threaded index building
- Smart Tokenization: tiktoken-based tokenization for better search quality
CODEMEM_DB_PATH: Default database path (default:~/.codemem/chat_history.sqlite)CODEMEM_CACHE_SIZE: Query cache size (default: 100)CODEMEM_CACHE_TTL: Cache TTL in seconds (default: 3600)
~/.codemem/
├── chat_history.sqlite # Main database
├── chat_history.sqlite-journal # SQLite journal
└── md_sessions/ # Exported Markdown files
├── session_abc123.md
├── session_def456.md
└── ...
Error: Markdown sessions directory not found
Solution: Restart the MCP server to rebuild the database and export markdown files.
Error: BM25 index not built yet
Solution: Wait a few seconds for initialization to complete. The index is built in the background.
- Check that your data directory contains valid JSON/JSONL files
- Verify the database path is correct
- Try using SQL queries to inspect the database directly
# Run all tests (if available)
python -m pytest tests/
# Test database building
python unified_history.py --data-dir /path/to/data --db-path test.sqlite
# Test markdown export
python export_sessions_md.py --db-path test.sqlite --output-dir ./test_mdCodeMem/
├── mcp_server.py # Main MCP server
├── unified_history.py # Chat history loader
├── export_sessions_md.py # Markdown exporter
├── models.py # Data models
├── pyproject.toml # Project configuration
└── README.md # This file
[Add your license here]
Contributions are welcome! Please feel free to submit issues or pull requests.
- Built with MCP (Model Context Protocol)
- Uses BM25 for semantic search
- Powered by tiktoken for tokenization