AI-Powered Personal Research Knowledge Graph
DeepTrail transforms your browsing activity into a semantic knowledge graph. Unlike traditional bookmarks or browser history, DeepTrail understands meaning: it connects pages by topic, summarizes content automatically, and lets you query your entire research history using natural language.
"What was I reading about transformer architectures last week?"
DeepTrail finds the answer from your browsing history and shows you exactly which pages it used.
- Auto-Capture: The Chrome extension silently tracks the pages you visit (with 30-minute deduplication)
- Semantic Embeddings: Every page is converted into a 768-dimension vector that captures its meaning, not just keywords
- Knowledge Graph: Pages with similar content are automatically connected. See your research as an interactive graph
- AI Summaries: Every captured page gets a 2-4 sentence summary plus keywords, generated by GPT-4o-mini
- RAG Search: Ask questions in plain English. DeepTrail finds relevant pages, builds context, and generates a grounded answer with source citations
- Side Panel UI: Live knowledge graph and search right inside your browser sidebar. No tab switching needed
Chrome Extension (Manifest V3)
│
│  POST /api/nodes (URL + Title + JWT)
▼
FastAPI Backend
│
├── Text Extractor (httpx + trafilatura)
├── Embedder (OpenAI text-embedding-3-small → 768-d vector)
├── Similarity Engine (pgvector cosine search, threshold ≥ 0.6)
├── Summarizer (GPT-4o-mini, async background task)
└── RAG Engine (embed query → top-K retrieval → LLM answer)
│
▼
PostgreSQL 16 + pgvector
│
├── users (id, email, password_hash)
├── nodes (url, title, summary, keywords, embedding VECTOR(768))
└── edges (source_node_id, target_node_id, similarity_score)
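The Similarity Engine step can be illustrated in plain Python. This is only a sketch of the idea: the real backend runs the cosine search inside pgvector, and the function names (`cosine_similarity`, `find_edges`) and the 3-d toy vectors below are assumptions made for illustration, standing in for 768-d embeddings.

```python
import math

SIMILARITY_THRESHOLD = 0.6  # same value as the documented default


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def find_edges(new_embedding, existing, threshold=SIMILARITY_THRESHOLD):
    """Return (node_id, score) pairs at or above the threshold, best first."""
    scored = [
        (node_id, cosine_similarity(new_embedding, embedding))
        for node_id, embedding in existing.items()
    ]
    matches = [(node_id, score) for node_id, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: three stored nodes and one newly captured page.
existing_nodes = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [1.0, 1.0, 0.0],
    "node-c": [0.0, 0.0, 1.0],
}
edges = find_edges([1.0, 0.0, 0.0], existing_nodes)
```

Here `node-a` and `node-b` clear the 0.6 threshold and would each get an edge to the new page, while `node-c` is orthogonal to it and stays unconnected.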
| Layer | Technology | Purpose |
|---|---|---|
| Extension | Chrome Manifest V3 | Browser event capture via Service Worker |
| Side Panel | React 18 + React Flow | Interactive graph visualization in sidebar |
| Backend | FastAPI (Python) | Async API with auto-generated OpenAPI docs |
| ORM | SQLAlchemy 2.0 | Async database operations with type safety |
| Vector Store | pgvector | VECTOR(768) columns + HNSW indexing |
| Text Extraction | trafilatura | HTML to clean article text |
| Embeddings | OpenAI text-embedding-3-small | 768-d semantic vectors |
| LLM | OpenAI GPT-4o-mini | Summarization + RAG answers |
| Database | PostgreSQL 16 | Relational data + vector persistence |
| Auth | JWT (HS256) + bcrypt | Stateless authentication |
| Deployment | Docker + Docker Compose | Containerized backend + database |
deeptrail/
│
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py              # FastAPI app, CORS, lifespan
│   │   ├── config.py            # Environment variables
│   │   ├── db.py                # Async SQLAlchemy engine + session
│   │   ├── dependencies.py      # JWT auth (get_current_user)
│   │   │
│   │   ├── models/
│   │   │   ├── __init__.py      # Barrel imports
│   │   │   ├── user.py          # User table
│   │   │   ├── node.py          # Node table + Vector(768)
│   │   │   └── edge.py          # Edge table + unique constraint
│   │   │
│   │   ├── schemas/
│   │   │   ├── __init__.py
│   │   │   └── schemas.py       # Pydantic request/response models
│   │   │
│   │   ├── routers/
│   │   │   ├── __init__.py
│   │   │   ├── auth.py          # POST /register, /login
│   │   │   ├── nodes.py         # POST/GET/DELETE /nodes
│   │   │   ├── graph.py         # GET /graph
│   │   │   └── query.py         # POST /query (RAG)
│   │   │
│   │   └── services/
│   │       ├── __init__.py
│   │       ├── extractor.py     # httpx + trafilatura
│   │       ├── embedder.py      # OpenAI embeddings + retry
│   │       ├── similarity.py    # pgvector cosine search
│   │       ├── summarizer.py    # GPT-4o-mini background task
│   │       └── rag.py           # Query → embed → top-K → answer
│   │
│   ├── Dockerfile
│   ├── docker-compose.yml
│   ├── requirements.txt
│   ├── .env
│   └── .env.example
│
├── extension/
│   ├── manifest.json            # Manifest V3 config + permissions
│   ├── background.js            # Service Worker: tab events + POST to backend
│   ├── content.js               # (Optional) DOM text extraction fallback
│   │
│   ├── sidepanel/
│   │   ├── index.html           # Side Panel entry point
│   │   ├── src/
│   │   │   ├── App.jsx          # Root component: routing + auth context
│   │   │   ├── index.jsx        # React mount point
│   │   │   │
│   │   │   ├── components/
│   │   │   │   ├── GraphView.jsx    # React Flow graph: nodes, edges, layout
│   │   │   │   ├── SearchPanel.jsx  # Semantic query input + results
│   │   │   │   ├── NodeCard.jsx     # Hover tooltip: summary + keywords
│   │   │   │   ├── NodeDetail.jsx   # Full modal: content + RAG links
│   │   │   │   ├── TabList.jsx      # Live grouped tab list
│   │   │   │   └── AuthForm.jsx     # Login / register form
│   │   │   │
│   │   │   ├── hooks/
│   │   │   │   ├── useGraph.js      # Fetch + poll graph data
│   │   │   │   ├── useSearch.js     # Debounced RAG query
│   │   │   │   └── useAuth.js       # JWT token management
│   │   │   │
│   │   │   ├── lib/
│   │   │   │   └── api.js           # Axios instance: JWT header injection
│   │   │   │
│   │   │   └── styles/
│   │   │       └── globals.css      # Tailwind base styles
│   │   │
│   │   ├── vite.config.js       # Vite bundler config for extension
│   │   ├── tailwind.config.js
│   │   ├── postcss.config.js
│   │   └── package.json
│   │
│   └── icons/
│       ├── icon-16.png
│       ├── icon-48.png
│       └── icon-128.png
│
├── README.md
└── .gitignore
- Docker and Docker Compose
- Gemini AI API key
- GROQ AI API key
- Python 3.12.10
git clone https://github.com/yourusername/deeptrail.git
cd deeptrail
cp .env.example .env
Edit .env with your keys:
DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/knowledgegraph
OPENAI_API_KEY=sk-your-key-here
GEMINI_API_KEY=your-key-here
JWT_SECRET=generate-a-random-string-here
DEBUG=True
JWT_EXPIRY_HOURS=2
SIMILARITY_THRESHOLD=0.6
docker-compose up --build
Go to http://localhost:8000/docs to see all endpoints in a Swagger UI where you can test them.
| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | /api/auth/register | Create a new account | No |
| POST | /api/auth/login | Log in and get a JWT token | No |
| POST | /api/nodes | Ingest a browsed page | JWT |
| GET | /api/nodes/:id | Get a single node with summary | JWT |
| DELETE | /api/nodes/:id | Delete a node and its edges | JWT |
| GET | /api/graph | Get full knowledge graph (nodes + edges) | JWT |
| POST | /api/query | Ask a question over your browsing history | JWT |
| GET | /health | Health check | No |
curl -X POST http://localhost:8000/api/nodes \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"url": "https://arxiv.org/abs/1706.03762",
"title": "Attention Is All You Need",
"timestamp": "2026-03-03T10:00:00Z"
}'
Response:
{
"node_id": "a1b2c3d4-...",
"edges_created": 3,
"status": "processing"
}
curl -X POST http://localhost:8000/api/query \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"question": "What have I been reading about attention mechanisms?"}'
Response:
{
"answer": "Based on your browsing history, you've been researching...",
"sources": [
{
"url": "https://arxiv.org/abs/1706.03762",
"title": "Attention Is All You Need",
"similarity_score": 0.92
}
]
}
1. You browse the web normally
2. Extension detects page load (with 30-min dedup)
3. Sends URL + title to backend
4. Backend fetches page → extracts clean text
5. Text → OpenAI → 768-d embedding vector
6. pgvector scans existing nodes for cosine similarity ≥ 0.6
7. Matching nodes get connected via edges in the graph
8. GPT-4o-mini summarizes the page in background
9. Side Panel shows your growing knowledge graph in real time
10. Ask any question → RAG engine finds relevant nodes → answers with citations
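The final step, retrieval followed by a grounded answer, can be sketched as follows. This is an illustration under stated assumptions rather than the project's `rag.py`: the `Page` class, the helper names, the prompt wording, and the 2-d toy embeddings are all hypothetical, and the LLM call itself is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    summary: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def retrieve(query_embedding: list[float], pages: list[Page], k: int = 3):
    """Top-K retrieval: score every page, keep the k most similar."""
    scored = [(cosine(query_embedding, page.embedding), page) for page in pages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits) -> str:
    """Number the retrieved pages so the model can cite them as [1], [2], ..."""
    sources = "\n".join(
        f"[{i}] {page.title} ({page.url}): {page.summary}"
        for i, (_, page) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered sources and cite them like [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def format_sources(hits) -> list[dict]:
    """Shape the citations like the sources array in the /api/query response."""
    return [
        {"url": page.url, "title": page.title, "similarity_score": round(score, 2)}
        for score, page in hits
    ]


# Toy embeddings (2-d instead of 768-d) chosen by hand for the example.
pages = [
    Page("https://arxiv.org/abs/1706.03762", "Attention Is All You Need",
         "Introduces the Transformer architecture.", [0.9, 0.1]),
    Page("https://example.com/sourdough", "Sourdough Basics",
         "A bread baking guide.", [0.0, 1.0]),
    Page("https://arxiv.org/abs/1810.04805", "BERT",
         "Pre-trains bidirectional Transformers.", [0.7, 0.3]),
]
hits = retrieve([1.0, 0.0], pages, k=2)
prompt = build_prompt("What have I been reading about attention mechanisms?", hits)
sources = format_sources(hits)
```

Passing only the top-K pages to the model keeps the prompt small and lets every claim in the answer be traced back to a numbered source.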
| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | (none) | PostgreSQL connection string |
| GEMINI_API_KEY | (none) | Your Gemini API key |
| OPENAI_API_KEY | (none) | Your OpenAI API key |
| JWT_SECRET | (none) | Secret for signing JWT tokens |
| JWT_EXPIRY_HOURS | 24 | Token expiration time in hours |
| SIMILARITY_THRESHOLD | 0.6 | Minimum cosine similarity to create an edge |
| DEBUG | True | Enables debug mode |
| ALLOW_ORIGINS | * | Allowed CORS origins |
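A settings loader for these variables might look like the sketch below. It is not the project's `config.py`: the `Settings` class, treating `DATABASE_URL` and `JWT_SECRET` as required, and parsing `ALLOW_ORIGINS` as a comma-separated list are all assumptions made for illustration. The defaults mirror the table above.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    jwt_secret: str
    openai_api_key: str = ""
    gemini_api_key: str = ""
    jwt_expiry_hours: int = 24
    similarity_threshold: float = 0.6
    debug: bool = True
    allow_origins: tuple[str, ...] = ("*",)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from a mapping, falling back to the documented defaults."""
    # Assumption: these two have no default, so fail fast when they are absent.
    missing = [name for name in ("DATABASE_URL", "JWT_SECRET") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))

    threshold = float(env.get("SIMILARITY_THRESHOLD", "0.6"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("SIMILARITY_THRESHOLD must be between 0 and 1")

    # Assumption: multiple origins are given as a comma-separated list.
    origins = tuple(
        origin.strip()
        for origin in env.get("ALLOW_ORIGINS", "*").split(",")
        if origin.strip()
    )
    return Settings(
        database_url=env["DATABASE_URL"],
        jwt_secret=env["JWT_SECRET"],
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        gemini_api_key=env.get("GEMINI_API_KEY", ""),
        jwt_expiry_hours=int(env.get("JWT_EXPIRY_HOURS", "24")),
        similarity_threshold=threshold,
        debug=env.get("DEBUG", "True").strip().lower() in {"1", "true", "yes"},
        allow_origins=origins,
    )


# Example using the values from the sample .env file.
settings = load_settings({
    "DATABASE_URL": "postgresql+asyncpg://postgres:postgres@db:5432/knowledgegraph",
    "JWT_SECRET": "generate-a-random-string-here",
    "JWT_EXPIRY_HOURS": "2",
})
```

Taking the environment as a parameter keeps the loader easy to test, since a plain dictionary can stand in for `os.environ`.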
Note: Set your backend endpoint in extension/sidepanel/config.js.
| Column | Type | Notes |
|---|---|---|
| id | UUID | Primary key |
| email | VARCHAR(255) | Unique |
| password_hash | TEXT | bcrypt hashed |
| created_at | TIMESTAMPTZ | Auto-set |
| Column | Type | Notes |
|---|---|---|
| id | UUID | Primary key |
| user_id | UUID | FK → users, cascade delete |
| url | TEXT | Page URL |
| title | TEXT | Page title |
| raw_text | TEXT | Extracted article content |
| summary | TEXT | AI-generated summary |
| keywords | TEXT[] | Keyword array |
| key_concepts | JSONB | Structured concepts |
| embedding | VECTOR(768) | pgvector, HNSW indexed |
| summary_status | VARCHAR(20) | pending / complete / failed |
| created_at | TIMESTAMPTZ | Auto-set |
| Column | Type | Notes |
|---|---|---|
| id | UUID | Primary key |
| source_node_id | UUID | FK → nodes, cascade delete |
| target_node_id | UUID | FK → nodes, cascade delete |
| similarity_score | FLOAT | Cosine similarity (0.6 to 1.0) |
| created_at | TIMESTAMPTZ | Auto-set |
- A PostgreSQL async database URL is the connection string that SQLAlchemy's async engine uses with the asyncpg driver in asynchronous Python frameworks such as FastAPI. Its format is postgresql+asyncpg://USER:PASSWORD@HOST:PORT/DATABASE
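To show how the parts of that URL map onto connection settings, here is a small standard-library sketch. The helper name `describe_database_url` is hypothetical; SQLAlchemy does its own parsing internally.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a SQLAlchemy-style URL into dialect, driver, and connection parts."""
    parts = urlsplit(url)
    # The scheme is "dialect+driver"; the driver part is optional.
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = describe_database_url(
    "postgresql+asyncpg://postgres:postgres@db:5432/knowledgegraph"
)
```

In the sample value, `db` is the host name, `5432` is the port, and `knowledgegraph` is the database name.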
- Backend API (FastAPI + pgvector + RAG)
- Chrome Extension with Side Panel
- Interactive graph visualization (React Flow)
- Cross-device sync
- Usage tiers and Stripe integration
- Team shared knowledge graphs
- Export to Obsidian / Notion
MIT
Built by Tenzin