A FastAPI-based MCP-compatible server that provides:
- Web search via LangSearch
- Live webpage scraping via an existing Chrome instance (remote debugging)
- Readable content extraction (Reader-mode quality)
- Persistent memory storage with SQLite
- RAG-powered semantic search using ChromaDB and embeddings
This server is designed to be used as a tool backend for LLM agents (MCP-style), RAG pipelines, or automated research workflows.
- 🔍 Web search using LangSearch API
- 🌐 Attach to a running Chrome browser via DevTools (CDP)
- 🔐 Reuse logged-in sessions, cookies, and profiles
- 📰 High-quality article extraction using Mozilla Readability
- 🧠 Persistent memory system with dual storage (SQLite + ChromaDB)
- 🔮 RAG-powered semantic search using sentence-transformers
- ⚡ Async FastAPI endpoints
- 🧩 Clean, deterministic JSON responses
- 🔄 Easy to extend with any summarization model (OpenAI, Ollama, local LLMs)
```
┌────────────┐
│   Client   │  (LLM / MCP Agent / curl)
└─────┬──────┘
      │
      ▼
┌────────────────────┐
│   FastAPI Server   │
├────────────────────┤
│ /search            │───▶ LangSearch API
│ /scrape            │───▶ Chrome (CDP 9222)
│ /memory/store      │───▶ SQLite + ChromaDB
│ /memory/retrieve   │───▶ Vector Search (semantic)
│                    │───▶ Keyword Search (SQL)
│ /memory/forget     │───▶ Delete from both DBs
└────────────────────┘
```
```
Memory Store Request
        ↓
┌─────────────────────┐
│ /memory/store       │
├─────────────────────┤
├─ SQLite             │ ← Metadata, tags, timestamps
├─ ChromaDB           │ ← Embeddings for semantic search
└─────────────────────┘
```

```
Memory Retrieve Request
        ↓
┌─────────────────────┐
│ /memory/retrieve    │
├─────────────────────┤
├─ Semantic Search    │ ← Vector similarity (default)
├─ Keyword Search     │ ← SQL LIKE fallback
└─────────────────────┘
```
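The dual-write path above can be sketched in a few lines. This is a toy illustration only: an in-memory dict and a bag-of-characters `embed()` stand in for ChromaDB and sentence-transformers, and all names here are illustrative, not the server's actual code.

```python
# Toy sketch of the dual-write memory store: SQLite holds metadata,
# a vector store holds embeddings. Illustrative names only.
import sqlite3
import uuid
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE memories (id TEXT PRIMARY KEY, content TEXT, "
    "type TEXT, created_at TEXT)"
)
vector_store: dict[str, list[float]] = {}  # stand-in for a ChromaDB collection

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; the real server uses all-MiniLM-L6-v2.
    return [text.lower().count(c) / max(len(text), 1) for c in "abcdefghij"]

def store_memory(content: str, mem_type: str = "fact") -> str:
    memory_id = str(uuid.uuid4())
    # Write 1: metadata row in SQLite.
    db.execute(
        "INSERT INTO memories VALUES (?, ?, ?, ?)",
        (memory_id, content, mem_type, datetime.now(timezone.utc).isoformat()),
    )
    # Write 2: embedding in the vector store, keyed by the same id.
    vector_store[memory_id] = embed(content)
    return memory_id

mid = store_memory("Python is a high-level programming language")
print(mid in vector_store)  # True: both stores share the same id
```

The shared `memory_id` key is what lets `/memory/forget` delete from both databases in one call.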
- Python 3.10+
- FastAPI – API framework
- Playwright – Chrome DevTools Protocol client
- Mozilla Readability – main content extraction
- BeautifulSoup – HTML cleaning
- Requests – LangSearch HTTP client
- Pydantic – request/response validation
- SQLite – metadata storage
- ChromaDB – vector database for embeddings
- sentence-transformers – local embedding generation (all-MiniLM-L6-v2)
```bash
git clone <your-repo-url>
cd mcp-server
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
playwright install chromium
```
The following dependencies are included:
- `fastapi`, `uvicorn` – web framework
- `playwright` – browser automation
- `chromadb` – vector database
- `sentence-transformers` – embedding generation
- `requests`, `beautifulsoup4`, `readability` – web scraping
Create a `.env` file in the project root:
```
LANGSEARCH_API_KEY=your_langsearch_api_key
```
This server does not launch Chrome itself. It attaches to an existing instance.
```bash
google-chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-profile
```
✅ Benefits:
- Reuses logins (Google, Medium, etc.)
- Works with authenticated pages
- Mirrors real user browsing
⚠️ Ensure no other Chrome instance is already using that profile directory.
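Before starting the server, you can sanity-check that the debugging port is actually reachable. A stdlib-only sketch, assuming the default port 9222 and Chrome's standard DevTools `/json/version` HTTP endpoint:

```python
# Quick check that an existing Chrome instance is listening on the CDP port.
import json
import urllib.request

def chrome_debug_available(port: int = 9222) -> bool:
    """True if a Chrome instance answers on the DevTools HTTP endpoint."""
    try:
        with urllib.request.urlopen(
            f"http://localhost:{port}/json/version", timeout=2
        ) as resp:
            print("Attached to:", json.load(resp).get("Browser"))
            return True
    except OSError:
        # Connection refused / timeout: no debuggable Chrome on this port.
        return False

print(chrome_debug_available())
```

If this prints `False`, start Chrome with the flags shown above before launching the server.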
```bash
uvicorn app:app --host 0.0.0.0 --port 4444 --reload
```
Open Swagger UI:
http://localhost:4444/docs
Endpoint
POST /search
Request Body
```json
{
  "query": "run llm on android phone",
  "max_results": 3
}
```
Response
```json
{
  "query": "run llm on android phone",
  "results": [
    {
      "title": "How I ran a local LLM on my Android phone",
      "url": "https://example.com",
      "snippet": "I experimented with running LLMs locally..."
    }
  ]
}
```
Endpoint
POST /scrape
Request Body
```json
{
  "url": "https://en.wikipedia.org/wiki/William_Anderson_(RAAF_officer)",
  "max_chars": 8000
}
```
Response
```json
{
  "url": "https://en.wikipedia.org/wiki/...",
  "title": "William Anderson (RAAF officer)",
  "extracted_text": "William Anderson was born..."
}
```
Endpoint
POST /memory/store
Request Body
```json
{
  "content": "Python is a high-level programming language known for its simplicity",
  "type": "fact",
  "tags": ["programming", "python"],
  "source": "manual",
  "confidence": 0.9
}
```
Response
```json
{
  "memory_id": "550e8400-e29b-41d4-a716-446655440000",
  "stored_at": "2026-01-13T10:30:00",
  "message": "Memory stored successfully"
}
```
Endpoint
POST /memory/retrieve
Request Body (Semantic Search - Default)
```json
{
  "query": "coding languages",
  "limit": 5,
  "search_type": "semantic"
}
```
This will find memories that are semantically similar to "coding languages" (e.g., memories about Python, JavaScript, or general programming concepts).
Request Body (Keyword Search - Fallback)
```json
{
  "query": "Python",
  "limit": 5,
  "search_type": "keyword"
}
```
This performs a traditional SQL LIKE search (exact keyword matching).
Response
```json
{
  "query": "coding languages",
  "count": 2,
  "memories": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "content": "Python is a high-level programming language...",
      "type": "fact",
      "tags": ["programming", "python"],
      "source": "manual",
      "confidence": 0.9,
      "created_at": "2026-01-13T10:30:00"
    }
  ]
}
```
Endpoint
POST /memory/forget
Request Body
```json
{
  "memory_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
Response
```json
{
  "memory_id": "550e8400-e29b-41d4-a716-446655440000",
  "deleted": true,
  "message": "Memory deleted"
}
```
Endpoint
GET /memory/stats
Response
```json
{
  "total_memories": 42,
  "by_type": {
    "fact": 20,
    "episodic": 15,
    "semantic": 7
  }
}
```
- ChromaDB: Persistent vector storage in the `./chroma_db/` directory
- Embedding Model: `all-MiniLM-L6-v2` (~80 MB, downloaded on first run)
- Embedding Dimension: 384
- Similarity Metric: Cosine similarity
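Cosine similarity itself is easy to state: the dot product of two vectors divided by the product of their lengths. A toy version on short vectors, standing in for the 384-dimensional MiniLM embeddings:

```python
# Toy cosine similarity (the metric used for semantic retrieval here),
# shown on 2-d vectors instead of real 384-d embeddings.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine([1.0, 0.0], [1.0, 0.0]), 3))  # same direction → 1.0
print(round(cosine([1.0, 0.0], [0.0, 1.0]), 3))  # orthogonal → 0.0
```

Memories are returned in descending order of this score, so conceptually related content ranks high even with no shared keywords.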
- SQLite: Stores metadata (tags, source, confidence, timestamps)
- ChromaDB: Stores embeddings for semantic search
- Semantic Search (default): Uses vector embeddings to find conceptually similar memories
- Keyword Search: Traditional SQL LIKE pattern matching
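The keyword fallback boils down to a `LIKE` substring query. A minimal sqlite3 sketch (the table layout is illustrative, not the server's actual schema):

```python
# Minimal sketch of the keyword fallback: plain SQL LIKE matching,
# with no notion of semantic similarity.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id TEXT, content TEXT)")
db.execute(
    "INSERT INTO memories VALUES (?, ?)",
    ("m1", "Python uses indentation for code blocks"),
)

def keyword_search(query: str) -> list[str]:
    # %query% matches the literal substring anywhere in the content.
    rows = db.execute(
        "SELECT content FROM memories WHERE content LIKE ?", (f"%{query}%",)
    ).fetchall()
    return [r[0] for r in rows]

print(keyword_search("indentation"))  # substring present → 1 hit
print(keyword_search("syntax"))       # no literal match → []
```

This is exactly the gap semantic search closes: "syntax" never appears in the stored text, so `LIKE` misses it, while an embedding query for "programming language syntax" would still rank the memory highly.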
```bash
# Store a memory about Python
curl -X POST http://localhost:4444/memory/store \
  -H "Content-Type: application/json" \
  -d '{"content": "Python uses indentation for code blocks", "type": "fact"}'

# Semantic search finds it with a related query
curl -X POST http://localhost:4444/memory/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "programming language syntax"}'
# ✅ Returns the Python memory (semantic match)

# Keyword search requires an exact match
curl -X POST http://localhost:4444/memory/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "syntax", "search_type": "keyword"}'
# ❌ Won't find it unless "syntax" appears in the content
```
Typical agent flow:
```
/search → select result.url
   ↓
/scrape(url) → extract content
   ↓
/memory/store → save important facts
   ↓
/memory/retrieve → semantic recall
   ↓
LLM reasoning / synthesis
```
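The flow above can be sketched as a short loop over the four endpoints. Here `call_tool` is a hypothetical stub returning canned data; a real agent would replace it with HTTP POSTs (e.g. `requests.post` to `localhost:4444`):

```python
# Hypothetical agent loop over the four endpoints. call_tool is a stub
# standing in for a real HTTP client; the canned payloads mirror the
# example responses in this README.
def call_tool(endpoint: str, payload: dict) -> dict:
    canned = {
        "/search": {"results": [{"url": "https://example.com", "title": "..."}]},
        "/scrape": {"extracted_text": "William Anderson was born..."},
        "/memory/store": {"memory_id": "abc-123"},
        "/memory/retrieve": {"memories": [{"content": "..."}]},
    }
    return canned[endpoint]

# 1. Search, 2. scrape the top hit, 3. store the facts, 4. recall later.
url = call_tool("/search", {"query": "run llm on android phone"})["results"][0]["url"]
text = call_tool("/scrape", {"url": url})["extracted_text"]
memory_id = call_tool("/memory/store", {"content": text, "type": "fact"})["memory_id"]
recalled = call_tool("/memory/retrieve", {"query": "android llm"})["memories"]
print(memory_id)
```

Each step feeds the next, and everything stored in step 3 stays available across sessions via `/memory/retrieve`.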
This closely mirrors Anthropic-style browser MCP tools, with persistent memory added.
- Do not expose this server publicly without:
- URL allowlists
- Authentication
- Rate limiting
- Chrome debugging gives full browser access
- Vector embeddings are stored locally (no external API calls)
- Treat this as a trusted internal service
- Ensure the `Content-Type: application/json` header is set
- Ensure `-d` is passed to curl
- Verify Chrome is running on port `9222`
- Check firewall / localhost access
- The page may be JS-heavy
- Add `wait_for_selector()` for the article content
- First run downloads the `all-MiniLM-L6-v2` model (~80 MB)
- Subsequent runs are fast (the model is cached)
- The embedding model runs on CPU by default
- For large batches, consider using a GPU or smaller model
- Embeddings + semantic search ✅ Done
- Page reuse pool
- Content hashing + caching
- Streaming summaries
- Screenshot capture
- PDF extraction
- Unified MCP tool schema
- Hybrid search (combine semantic + keyword)
- Reranking with cross-encoders
MIT License
Inspired by:
- Browserless
- LangChain WebLoader
- Mozilla Readability
- Anthropic MCP browser tools
- ChromaDB
- sentence-transformers
If you're building an LLM agent, RAG pipeline, or research assistant, this server is meant to be extended—not locked down.
Happy hacking 🚀