A FastAPI-based MCP-compatible server that provides:
- Web search via LangSearch
- Live webpage scraping via an existing Chrome instance (remote debugging)
- Readable content extraction (Reader-mode quality)
- Persistent memory storage with SQLite
- RAG-powered semantic search using ChromaDB and embeddings
This server is designed to be used as a tool backend for LLM agents (MCP-style), RAG pipelines, or automated research workflows.
- 🔍 Web search using LangSearch API
- 🌐 Attach to a running Chrome browser via DevTools (CDP)
- 🔐 Reuse logged-in sessions, cookies, and profiles
- 📰 High-quality article extraction using Mozilla Readability
- 🧠 Persistent memory system with dual storage (SQLite + ChromaDB)
- 🔮 RAG-powered semantic search using sentence-transformers
- ⚡ Async FastAPI endpoints
- 🧩 Clean, deterministic JSON responses
- 🔄 Easy to extend with any summarization model (OpenAI, Ollama, local LLMs)
```
┌────────────┐
│   Client   │  (LLM / MCP Agent / curl)
└─────┬──────┘
      │
      ▼
┌────────────────────┐
│   FastAPI Server   │
├────────────────────┤
│ /search            │───▶ LangSearch API
│ /scrape            │───▶ Chrome (CDP 9222)
│ /memory/store      │───▶ SQLite + ChromaDB
│ /memory/retrieve   │───▶ Vector Search (semantic)
│                    │───▶ Keyword Search (SQL)
│ /memory/forget     │───▶ Delete from both DBs
└────────────────────┘
```
```
Memory Store Request
        ↓
┌─────────────────────┐
│ /memory/store       │
├─────────────────────┤
├─ SQLite             │ ← Metadata, tags, timestamps
├─ ChromaDB           │ ← Embeddings for semantic search
└─────────────────────┘
```

```
Memory Retrieve Request
        ↓
┌─────────────────────┐
│ /memory/retrieve    │
├─────────────────────┤
├─ Semantic Search    │ ← Vector similarity (default)
├─ Keyword Search     │ ← SQL LIKE fallback
└─────────────────────┘
```
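The dual-write path above can be sketched in a few lines. This is a toy illustration only: an in-memory dict and a bag-of-characters `embed()` stand in for ChromaDB and sentence-transformers, and all names here are illustrative, not the server's actual code.

```python
# Toy sketch of the dual-write memory store: SQLite holds metadata,
# a vector store holds embeddings. Illustrative names only.
import sqlite3
import uuid
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE memories (id TEXT PRIMARY KEY, content TEXT, "
    "type TEXT, created_at TEXT)"
)
vector_store: dict[str, list[float]] = {}  # stand-in for a ChromaDB collection

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; the real server uses all-MiniLM-L6-v2.
    return [text.lower().count(c) / max(len(text), 1) for c in "abcdefghij"]

def store_memory(content: str, mem_type: str = "fact") -> str:
    memory_id = str(uuid.uuid4())
    # Write 1: metadata row in SQLite.
    db.execute(
        "INSERT INTO memories VALUES (?, ?, ?, ?)",
        (memory_id, content, mem_type, datetime.now(timezone.utc).isoformat()),
    )
    # Write 2: embedding in the vector store, keyed by the same id.
    vector_store[memory_id] = embed(content)
    return memory_id

mid = store_memory("Python is a high-level programming language")
print(mid in vector_store)  # True: both stores share the same id
```

The shared `memory_id` key is what lets `/memory/forget` delete from both databases in one call.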
- Python 3.10+
- FastAPI – API framework
- Playwright – Chrome DevTools Protocol client
- Mozilla Readability – main content extraction
- BeautifulSoup – HTML cleaning
- Requests – LangSearch HTTP client
- Pydantic – request/response validation
- SQLite – metadata storage
- ChromaDB – vector database for embeddings
- sentence-transformers – local embedding generation (all-MiniLM-L6-v2)
```bash
git clone <your-repo-url>
cd mcp-server
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
playwright install chromium
```
The following dependencies are included:
- `fastapi`, `uvicorn` – web framework
- `playwright` – browser automation
- `chromadb` – vector database
- `sentence-transformers` – embedding generation
- `requests`, `beautifulsoup4`, `readability` – web scraping
Create a `.env` file in the project root:
```
LANGSEARCH_API_KEY=your_langsearch_api_key
```
This server does not launch Chrome itself. It attaches to an existing instance.
```bash
google-chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-profile
```
✅ Benefits:
- Reuses logins (Google, Medium, etc.)
- Works with authenticated pages
- Mirrors real user browsing
⚠️ Ensure no other Chrome instance is already using that profile directory.
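Before starting the server, you can sanity-check that the debugging port is actually reachable. A stdlib-only sketch, assuming the default port 9222 and Chrome's standard DevTools `/json/version` HTTP endpoint:

```python
# Quick check that an existing Chrome instance is listening on the CDP port.
import json
import urllib.request

def chrome_debug_available(port: int = 9222) -> bool:
    """True if a Chrome instance answers on the DevTools HTTP endpoint."""
    try:
        with urllib.request.urlopen(
            f"http://localhost:{port}/json/version", timeout=2
        ) as resp:
            print("Attached to:", json.load(resp).get("Browser"))
            return True
    except OSError:
        # Connection refused / timeout: no debuggable Chrome on this port.
        return False

print(chrome_debug_available())
```

If this prints `False`, start Chrome with the flags shown above before launching the server.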
```bash
uvicorn app:app --host 0.0.0.0 --port 4444 --reload
```
Open Swagger UI:
http://localhost:4444/docs
Endpoint
POST /search
Request Body
```json
{
  "query": "run llm on android phone",
  "max_results": 3
}
```
Response
```json
{
  "query": "run llm on android phone",
  "results": [
    {
      "title": "How I ran a local LLM on my Android phone",
      "url": "https://example.com",
      "snippet": "I experimented with running LLMs locally..."
    }
  ]
}
```
Endpoint
POST /scrape
Request Body
```json
{
  "url": "https://en.wikipedia.org/wiki/William_Anderson_(RAAF_officer)",
  "max_chars": 8000
}
```
Response
```json
{
  "url": "https://en.wikipedia.org/wiki/...",
  "title": "William Anderson (RAAF officer)",
  "extracted_text": "William Anderson was born..."
}
```
Endpoint
POST /memory/store
Request Body
```json
{
  "content": "Python is a high-level programming language known for its simplicity",
  "type": "fact",
  "tags": ["programming", "python"],
  "source": "manual",
  "confidence": 0.9
}
```
Response
```json
{
  "memory_id": "550e8400-e29b-41d4-a716-446655440000",
  "stored_at": "2026-01-13T10:30:00",
  "message": "Memory stored successfully"
}
```
Endpoint
POST /memory/retrieve
Request Body (Semantic Search - Default)
```json
{
  "query": "coding languages",
  "limit": 5,
  "search_type": "semantic"
}
```
This will find memories that are semantically similar to "coding languages" (e.g., memories about Python, JavaScript, or general programming concepts).
Request Body (Keyword Search - Fallback)
```json
{
  "query": "Python",
  "limit": 5,
  "search_type": "keyword"
}
```
This performs a traditional SQL LIKE search (exact keyword matching).
Response
```json
{
  "query": "coding languages",
  "count": 2,
  "memories": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "content": "Python is a high-level programming language...",
      "type": "fact",
      "tags": ["programming", "python"],
      "source": "manual",
      "confidence": 0.9,
      "created_at": "2026-01-13T10:30:00"
    }
  ]
}
```
Endpoint
POST /memory/forget
Request Body
```json
{
  "memory_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
Response
```json
{
  "memory_id": "550e8400-e29b-41d4-a716-446655440000",
  "deleted": true,
  "message": "Memory deleted"
}
```
Endpoint
GET /memory/stats
Response
```json
{
  "total_memories": 42,
  "by_type": {
    "fact": 20,
    "episodic": 15,
    "semantic": 7
  }
}
```
- ChromaDB: Persistent vector storage in the `./chroma_db/` directory
- Embedding Model: `all-MiniLM-L6-v2` (~80 MB, downloaded on first run)
- Embedding Dimension: 384
- Similarity Metric: Cosine similarity
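Cosine similarity itself is easy to state: the dot product of two vectors divided by the product of their lengths. A toy version on short vectors, standing in for the 384-dimensional MiniLM embeddings:

```python
# Toy cosine similarity (the metric used for semantic retrieval here),
# shown on 2-d vectors instead of real 384-d embeddings.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine([1.0, 0.0], [1.0, 0.0]), 3))  # same direction → 1.0
print(round(cosine([1.0, 0.0], [0.0, 1.0]), 3))  # orthogonal → 0.0
```

Memories are returned in descending order of this score, so conceptually related content ranks high even with no shared keywords.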
- SQLite: Stores metadata (tags, source, confidence, timestamps)
- ChromaDB: Stores embeddings for semantic search
- Semantic Search (default): Uses vector embeddings to find conceptually similar memories
- Keyword Search: Traditional SQL LIKE pattern matching
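The keyword fallback boils down to a `LIKE` substring query. A minimal sqlite3 sketch (the table layout is illustrative, not the server's actual schema):

```python
# Minimal sketch of the keyword fallback: plain SQL LIKE matching,
# with no notion of semantic similarity.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id TEXT, content TEXT)")
db.execute(
    "INSERT INTO memories VALUES (?, ?)",
    ("m1", "Python uses indentation for code blocks"),
)

def keyword_search(query: str) -> list[str]:
    # %query% matches the literal substring anywhere in the content.
    rows = db.execute(
        "SELECT content FROM memories WHERE content LIKE ?", (f"%{query}%",)
    ).fetchall()
    return [r[0] for r in rows]

print(keyword_search("indentation"))  # substring present → 1 hit
print(keyword_search("syntax"))       # no literal match → []
```

This is exactly the gap semantic search closes: "syntax" never appears in the stored text, so `LIKE` misses it, while an embedding query for "programming language syntax" would still rank the memory highly.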
```bash
# Store a memory about Python
curl -X POST http://localhost:4444/memory/store \
  -H "Content-Type: application/json" \
  -d '{"content": "Python uses indentation for code blocks", "type": "fact"}'

# Semantic search finds it with a related query
curl -X POST http://localhost:4444/memory/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "programming language syntax"}'
# ✅ Returns the Python memory (semantic match)

# Keyword search requires an exact match
curl -X POST http://localhost:4444/memory/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "syntax", "search_type": "keyword"}'
# ❌ Won't find it unless "syntax" appears in the content
```
Typical agent flow:
```
/search → select result.url
   ↓
/scrape(url) → extract content
   ↓
/memory/store → save important facts
   ↓
/memory/retrieve → semantic recall
   ↓
LLM reasoning / synthesis
```
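The flow above can be sketched as a short loop over the four endpoints. Here `call_tool` is a hypothetical stub returning canned data; a real agent would replace it with HTTP POSTs (e.g. `requests.post` to `localhost:4444`):

```python
# Hypothetical agent loop over the four endpoints. call_tool is a stub
# standing in for a real HTTP client; the canned payloads mirror the
# example responses in this README.
def call_tool(endpoint: str, payload: dict) -> dict:
    canned = {
        "/search": {"results": [{"url": "https://example.com", "title": "..."}]},
        "/scrape": {"extracted_text": "William Anderson was born..."},
        "/memory/store": {"memory_id": "abc-123"},
        "/memory/retrieve": {"memories": [{"content": "..."}]},
    }
    return canned[endpoint]

# 1. Search, 2. scrape the top hit, 3. store the facts, 4. recall later.
url = call_tool("/search", {"query": "run llm on android phone"})["results"][0]["url"]
text = call_tool("/scrape", {"url": url})["extracted_text"]
memory_id = call_tool("/memory/store", {"content": text, "type": "fact"})["memory_id"]
recalled = call_tool("/memory/retrieve", {"query": "android llm"})["memories"]
print(memory_id)
```

Each step feeds the next, and everything stored in step 3 stays available across sessions via `/memory/retrieve`.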
This closely mirrors Anthropic-style browser MCP tools, with persistent memory added.
- Do not expose this server publicly without:
- URL allowlists
- Authentication
- Rate limiting
- Chrome debugging gives full browser access
- Vector embeddings are stored locally (no external API calls)
- Treat this as a trusted internal service
- Ensure the `Content-Type: application/json` header is set
- Ensure `-d` is passed to curl
- Verify Chrome is running on port `9222`
- Check firewall / localhost access
- The page may be JS-heavy
- Add `wait_for_selector()` for the article content
- First run downloads the `all-MiniLM-L6-v2` model (~80 MB)
- Subsequent runs are fast (the model is cached)
- The embedding model runs on CPU by default
- For large batches, consider using a GPU or smaller model
- Embeddings + semantic search ✅ Done
- Page reuse pool
- Content hashing + caching
- Streaming summaries
- Screenshot capture
- PDF extraction
- Unified MCP tool schema
- Hybrid search (combine semantic + keyword)
- Reranking with cross-encoders
MIT License
Inspired by:
- Browserless
- LangChain WebLoader
- Mozilla Readability
- Anthropic MCP browser tools
- ChromaDB
- sentence-transformers
If you're building an LLM agent, RAG pipeline, or research assistant, this server is meant to be extended—not locked down.
Happy hacking 🚀