
tenzin333/Deep-Trial


🧠 DeepTrail

AI-Powered Personal Research Knowledge Graph

DeepTrail transforms your browsing activity into a semantic knowledge graph. Unlike traditional bookmarks or browser history, DeepTrail understands meaning: it connects pages by topic, summarizes content automatically, and lets you query your entire research history using natural language.

"What was I reading about transformer architectures last week?"

DeepTrail finds the answer from your browsing history and shows you exactly which pages it used.


✨ Features

  • Auto-Capture: Chrome extension silently tracks pages you visit (with smart 30-min deduplication)
  • Semantic Embeddings: Every page is converted into a 768-dimension vector that captures its meaning, not just keywords
  • Knowledge Graph: Pages with similar content are automatically connected. See your research as an interactive graph
  • AI Summaries: Every captured page gets a 2-4 sentence summary + keywords, generated by GPT-4o-mini
  • RAG Search: Ask questions in plain English. DeepTrail finds relevant pages, builds context, and generates a grounded answer with source citations
  • Side Panel UI: Live knowledge graph and search right inside your browser sidebar. No tab switching needed
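The 30-minute deduplication mentioned under Auto-Capture can be pictured roughly as follows. This is a minimal sketch, not the extension's code: the real logic lives in the JavaScript service worker (background.js), and the `VisitDeduplicator` class and its method names are hypothetical. Only the 30-minute window comes from the feature list.

```python
from datetime import datetime, timedelta, timezone

DEDUP_WINDOW = timedelta(minutes=30)  # window length taken from the feature list


class VisitDeduplicator:
    """Hypothetical helper that remembers when each URL was last captured."""

    def __init__(self, window: timedelta = DEDUP_WINDOW) -> None:
        self.window = window
        self._last_captured: dict[str, datetime] = {}

    def should_capture(self, url: str, now: datetime) -> bool:
        """Return True unless the URL was already captured within the window."""
        last = self._last_captured.get(url)
        if last is not None and now - last < self.window:
            return False  # captured recently: treat as a duplicate visit
        self._last_captured[url] = now
        return True


dedup = VisitDeduplicator()
t0 = datetime(2026, 3, 3, 10, 0, tzinfo=timezone.utc)
url = "https://arxiv.org/abs/1706.03762"
print(dedup.should_capture(url, t0))                          # True
print(dedup.should_capture(url, t0 + timedelta(minutes=10)))  # False
print(dedup.should_capture(url, t0 + timedelta(minutes=45)))  # True
```

In this sketch a skipped visit does not reset the timer, so a page revisited every few minutes is still re-captured once 30 minutes have passed since the last successful capture. Whether the extension behaves the same way is an assumption.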

🏗️ Architecture

Chrome Extension (Manifest V3)
        │
        │  POST /api/nodes (URL + Title + JWT)
        ▼
FastAPI Backend
        │
        ├─→ Text Extractor (httpx + trafilatura)
        ├─→ Embedder (OpenAI text-embedding-3-small → 768-d vector)
        ├─→ Similarity Engine (pgvector cosine search, threshold ≥ 0.6)
        ├─→ Summarizer (GPT-4o-mini, async background task)
        └─→ RAG Engine (embed query → top-K retrieval → LLM answer)
        │
        ▼
PostgreSQL 16 + pgvector
        │
        ├── users (id, email, password_hash)
        ├── nodes (url, title, summary, keywords, embedding VECTOR(768))
        └── edges (source_node_id, target_node_id, similarity_score)
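To make the Similarity Engine step concrete, here is a small pure-Python sketch of thresholded cosine matching. It is an illustration only: in DeepTrail the search runs inside PostgreSQL via pgvector (whose `<=>` operator returns cosine distance, i.e. 1 minus similarity), and the function names and the toy 3-d vectors below are assumptions made for the example. Only the 0.6 threshold comes from the diagram.

```python
import math

SIMILARITY_THRESHOLD = 0.6  # default threshold stated in this README


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_edges(
    new_vec: list[float],
    existing: dict[str, list[float]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> list[tuple[str, float]]:
    """Return (node_id, score) pairs at or above the threshold, best first."""
    edges = []
    for node_id, vec in existing.items():
        score = cosine_similarity(new_vec, vec)
        if score >= threshold:
            edges.append((node_id, score))
    return sorted(edges, key=lambda edge: edge[1], reverse=True)


# Toy 3-d vectors stand in for the real 768-d embeddings.
existing_nodes = {"a": [1.0, 0.0, 0.0], "b": [1.0, 1.0, 0.0], "c": [0.0, 1.0, 0.0]}
matches = find_edges([1.0, 0.0, 0.0], existing_nodes)
print([(node_id, round(score, 3)) for node_id, score in matches])
# [('a', 1.0), ('b', 0.707)]
```

Node "c" is orthogonal to the new vector (similarity 0), so it falls below the threshold and gets no edge.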

🛠️ Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Extension | Chrome Manifest V3 | Browser event capture via Service Worker |
| Side Panel | React 18 + React Flow | Interactive graph visualization in sidebar |
| Backend | FastAPI (Python) | Async API with auto-generated OpenAPI docs |
| ORM | SQLAlchemy 2.0 | Async database operations with type safety |
| Vector Store | pgvector | VECTOR(768) columns + HNSW indexing |
| Text Extraction | trafilatura | HTML to clean article text |
| Embeddings | OpenAI text-embedding-3-small | 768-d semantic vectors |
| LLM | OpenAI GPT-4o-mini | Summarization + RAG answers |
| Database | PostgreSQL 16 | Relational data + vector persistence |
| Auth | JWT (HS256) + bcrypt | Stateless authentication |
| Deployment | Docker + Docker Compose | Containerized backend + database |

📁 Project Structure

deeptrail/
│
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py                  # FastAPI app, CORS, lifespan
│   │   ├── config.py                # Environment variables
│   │   ├── db.py                    # Async SQLAlchemy engine + session
│   │   ├── dependencies.py          # JWT auth (get_current_user)
│   │   │
│   │   ├── models/
│   │   │   ├── __init__.py          # Barrel imports
│   │   │   ├── user.py             # User table
│   │   │   ├── node.py             # Node table + Vector(768)
│   │   │   └── edge.py             # Edge table + unique constraint
│   │   │
│   │   ├── schemas/
│   │   │   ├── __init__.py
│   │   │   └── schemas.py          # Pydantic request/response models
│   │   │
│   │   ├── routers/
│   │   │   ├── __init__.py
│   │   │   ├── auth.py             # POST /register, /login
│   │   │   ├── nodes.py            # POST/GET/DELETE /nodes
│   │   │   ├── graph.py            # GET /graph
│   │   │   └── query.py            # POST /query (RAG)
│   │   │
│   │   └── services/
│   │       ├── __init__.py
│   │       ├── extractor.py        # httpx + trafilatura
│   │       ├── embedder.py         # OpenAI embeddings + retry
│   │       ├── similarity.py       # pgvector cosine search
│   │       ├── summarizer.py       # GPT-4o-mini background task
│   │       └── rag.py              # Query → embed → top-K → answer
│   │
│   ├── Dockerfile
│   ├── docker-compose.yml
│   ├── requirements.txt
│   ├── .env
│   └── .env.example
│
├── extension/
│   ├── manifest.json                # Manifest V3 config + permissions
│   ├── background.js                # Service Worker: tab events + POST to backend
│   ├── content.js                   # (Optional) DOM text extraction fallback
│   │
│   ├── sidepanel/
│   │   ├── index.html               # Side Panel entry point
│   │   ├── src/
│   │   │   ├── App.jsx              # Root component: routing + auth context
│   │   │   ├── index.jsx            # React mount point
│   │   │   │
│   │   │   ├── components/
│   │   │   │   ├── GraphView.jsx    # React Flow graph: nodes, edges, layout
│   │   │   │   ├── SearchPanel.jsx  # Semantic query input + results
│   │   │   │   ├── NodeCard.jsx     # Hover tooltip: summary + keywords
│   │   │   │   ├── NodeDetail.jsx   # Full modal: content + RAG links
│   │   │   │   ├── TabList.jsx      # Live grouped tab list
│   │   │   │   └── AuthForm.jsx     # Login / register form
│   │   │   │
│   │   │   ├── hooks/
│   │   │   │   ├── useGraph.js      # Fetch + poll graph data
│   │   │   │   ├── useSearch.js     # Debounced RAG query
│   │   │   │   └── useAuth.js       # JWT token management
│   │   │   │
│   │   │   ├── lib/
│   │   │   │   └── api.js           # Axios instance: JWT header injection
│   │   │   │
│   │   │   └── styles/
│   │   │       └── globals.css      # Tailwind base styles
│   │   │
│   │   ├── vite.config.js           # Vite bundler config for extension
│   │   ├── tailwind.config.js
│   │   ├── postcss.config.js
│   │   └── package.json
│   │
│   └── icons/
│       ├── icon-16.png
│       ├── icon-48.png
│       └── icon-128.png
│
├── README.md
└── .gitignore


🚀 Getting Started

Prerequisites

  • Docker and Docker Compose
  • Gemini AI API key
  • GROQ AI API key
  • Python 3.12.10

1. Clone the repo

git clone https://github.com/yourusername/deeptrail.git
cd deeptrail

2. Set up environment

cp .env.example .env

Edit .env with your keys:

DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/knowledgegraph
OPENAI_API_KEY=sk-your-key-here
GEMINI_API_KEY=your-gemini-key-here
JWT_SECRET=generate-a-random-string-here
DEBUG=True
JWT_EXPIRY_HOURS=2
SIMILARITY_THRESHOLD=0.6

3. Start the app

docker-compose up --build

4. Open the API docs

Go to http://localhost:8000/docs to see all endpoints in a Swagger UI where you can test them.


📡 API Endpoints

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | /api/auth/register | Create a new account | No |
| POST | /api/auth/login | Login and get JWT token | No |
| POST | /api/nodes | Ingest a browsed page | JWT |
| GET | /api/nodes/:id | Get a single node with summary | JWT |
| DELETE | /api/nodes/:id | Delete a node and its edges | JWT |
| GET | /api/graph | Get full knowledge graph (nodes + edges) | JWT |
| POST | /api/query | Ask a question over your browsing history | JWT |
| GET | /health | Health check | No |

Example: Ingest a page

curl -X POST http://localhost:8000/api/nodes \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://arxiv.org/abs/1706.03762",
    "title": "Attention Is All You Need",
    "timestamp": "2026-03-03T10:00:00Z"
  }'

Response:

{
  "node_id": "a1b2c3d4-...",
  "edges_created": 3,
  "status": "processing"
}

Example: Ask a question

curl -X POST http://localhost:8000/api/query \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question": "What have I been reading about attention mechanisms?"}'

Response:

{
  "answer": "Based on your browsing history, you've been researching...",
  "sources": [
    {
      "url": "https://arxiv.org/abs/1706.03762",
      "title": "Attention Is All You Need",
      "similarity_score": 0.92
    }
  ]
}
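A client will usually want to show the answer together with its sources. The helper below is a minimal sketch of one way to render the response shown above as plain text. The function name `format_answer` is hypothetical, and it assumes only the `answer` and `sources` fields from the example.

```python
def format_answer(response: dict) -> str:
    """Render a /api/query response as the answer plus numbered citations."""
    lines = [response["answer"], "", "Sources:"]
    for index, source in enumerate(response.get("sources", []), start=1):
        lines.append(
            f"[{index}] {source['title']} "
            f"(similarity {source['similarity_score']:.2f}) {source['url']}"
        )
    return "\n".join(lines)


example = {
    "answer": "Based on your browsing history, you've been researching...",
    "sources": [
        {
            "url": "https://arxiv.org/abs/1706.03762",
            "title": "Attention Is All You Need",
            "similarity_score": 0.92,
        }
    ],
}
print(format_answer(example))
```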

🔄 How It Works

1. You browse the web normally
         ↓
2. Extension detects page load (with 30-min dedup)
         ↓
3. Sends URL + title to backend
         ↓
4. Backend fetches page → extracts clean text
         ↓
5. Text → OpenAI → 768-d embedding vector
         ↓
6. pgvector scans existing nodes for cosine similarity ≥ 0.6
         ↓
7. Matching nodes get connected via edges in the graph
         ↓
8. GPT-4o-mini summarizes the page in background
         ↓
9. Side Panel shows your growing knowledge graph in real-time
         ↓
10. Ask any question → RAG engine finds relevant nodes → answers with citations
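Step 10 can be sketched as top-K retrieval followed by prompt assembly. The code below is a simplified illustration under stated assumptions: the node dictionaries, the value of K, and the prompt wording are all hypothetical, the query embedding is assumed to have been computed already, and the final LLM call is left out.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(query_vec: list[float], nodes: list[dict], k: int = 3) -> list[tuple[float, dict]]:
    """Return the k nodes most similar to the query as (score, node) pairs."""
    scored = [(cosine(query_vec, node["embedding"]), node) for node in nodes]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def build_prompt(question: str, hits: list[tuple[float, dict]]) -> str:
    """Number each retrieved page so the model can cite it in the answer."""
    context = "\n\n".join(
        f"[{i}] {node['title']} ({node['url']})\n{node['summary']}"
        for i, (_, node) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Toy 2-d embeddings stand in for the real 768-d vectors.
nodes = [
    {"title": "Attention paper", "url": "https://example.com/a", "summary": "Transformers.", "embedding": [1.0, 0.0]},
    {"title": "Cooking blog", "url": "https://example.com/b", "summary": "Pasta.", "embedding": [0.0, 1.0]},
    {"title": "BERT notes", "url": "https://example.com/c", "summary": "Encoders.", "embedding": [1.0, 1.0]},
]
hits = top_k([1.0, 0.0], nodes, k=2)
print([node["title"] for _, node in hits])  # ['Attention paper', 'BERT notes']
print(build_prompt("What did I read about attention?", hits))
```

Numbering the sources in the prompt is what lets the response map each claim back to a page, which is how the `sources` list in the query response can be produced.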

⚙️ Configuration

| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | (none) | PostgreSQL connection string |
| GEMINI_API_KEY | (none) | Your Gemini API key |
| OPENAI_API_KEY | (none) | Your OpenAI API key |
| JWT_SECRET | (none) | Secret for signing JWT tokens |
| JWT_EXPIRY_HOURS | 24 | Token expiration time in hours |
| SIMILARITY_THRESHOLD | 0.6 | Minimum cosine similarity to create an edge |
| DEBUG | True | Enables debug mode |
| ALLOW_ORIGINS | * | Allowed CORS origins |
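A loader for these variables might look like the sketch below. It is not the contents of config.py: the `Settings` class and `load_settings` function are hypothetical, treating the four variables without defaults as required is an assumption, and so is parsing ALLOW_ORIGINS as a comma-separated list.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: variables listed without a default are mandatory.
REQUIRED = ("DATABASE_URL", "OPENAI_API_KEY", "GEMINI_API_KEY", "JWT_SECRET")


@dataclass(frozen=True)
class Settings:
    database_url: str
    openai_api_key: str
    gemini_api_key: str
    jwt_secret: str
    jwt_expiry_hours: int = 24
    similarity_threshold: float = 0.6
    debug: bool = True
    allow_origins: tuple[str, ...] = ("*",)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, applying the defaults from the table."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    origins = tuple(
        origin.strip() for origin in env.get("ALLOW_ORIGINS", "*").split(",") if origin.strip()
    )
    return Settings(
        database_url=env["DATABASE_URL"],
        openai_api_key=env["OPENAI_API_KEY"],
        gemini_api_key=env["GEMINI_API_KEY"],
        jwt_secret=env["JWT_SECRET"],
        jwt_expiry_hours=int(env.get("JWT_EXPIRY_HOURS", "24")),
        similarity_threshold=float(env.get("SIMILARITY_THRESHOLD", "0.6")),
        debug=env.get("DEBUG", "True").strip().lower() in {"1", "true", "yes"},
        allow_origins=origins,
    )


sample_env = {
    "DATABASE_URL": "postgresql+asyncpg://postgres:postgres@db:5432/knowledgegraph",
    "OPENAI_API_KEY": "sk-your-key-here",
    "GEMINI_API_KEY": "your-gemini-key-here",
    "JWT_SECRET": "generate-a-random-string-here",
    "JWT_EXPIRY_HOURS": "2",
}
settings = load_settings(sample_env)
print(settings.jwt_expiry_hours, settings.similarity_threshold, settings.allow_origins)
# 2 0.6 ('*',)
```

Failing fast on missing secrets at startup tends to be easier to debug than a later error from the database driver or the API client.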

Note: set your backend endpoint within extension/sidepanel/config.js


📊 Database Schema

users

| Column | Type | Notes |
|---|---|---|
| id | UUID | Primary key |
| email | VARCHAR(255) | Unique |
| password_hash | TEXT | bcrypt hashed |
| created_at | TIMESTAMPTZ | Auto-set |

nodes

| Column | Type | Notes |
|---|---|---|
| id | UUID | Primary key |
| user_id | UUID | FK → users, cascade delete |
| url | TEXT | Page URL |
| title | TEXT | Page title |
| raw_text | TEXT | Extracted article content |
| summary | TEXT | AI-generated summary |
| keywords | TEXT[] | Keyword array |
| key_concepts | JSONB | Structured concepts |
| embedding | VECTOR(768) | pgvector, HNSW indexed |
| summary_status | VARCHAR(20) | pending / complete / failed |
| created_at | TIMESTAMPTZ | Auto-set |

edges

| Column | Type | Notes |
|---|---|---|
| id | UUID | Primary key |
| source_node_id | UUID | FK → nodes, cascade delete |
| target_node_id | UUID | FK → nodes, cascade delete |
| similarity_score | FLOAT | Cosine similarity (0.6 to 1.0) |
| created_at | TIMESTAMPTZ | Auto-set |
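The project structure mentions a unique constraint on the edge table. One common way to keep an undirected similarity graph free of duplicates is to store each pair in a canonical order, so that (A, B) and (B, A) map to the same row. The sketch below illustrates that idea in memory. Whether DeepTrail orders pairs this way is an assumption, and `canonical_pair` and `EdgeStore` are hypothetical names.

```python
import uuid


def canonical_pair(a: uuid.UUID, b: uuid.UUID) -> tuple[uuid.UUID, uuid.UUID]:
    """Order two node ids so the same pair always yields the same key."""
    if a == b:
        raise ValueError("a node cannot be linked to itself")
    return (a, b) if a < b else (b, a)  # UUID objects support ordering


class EdgeStore:
    """In-memory stand-in for a table with UNIQUE(source_node_id, target_node_id)."""

    def __init__(self) -> None:
        self._edges: dict[tuple[uuid.UUID, uuid.UUID], float] = {}

    def add(self, a: uuid.UUID, b: uuid.UUID, score: float) -> bool:
        """Insert the edge; return False if the pair is already stored."""
        key = canonical_pair(a, b)
        if key in self._edges:
            return False
        self._edges[key] = score
        return True

    def __len__(self) -> int:
        return len(self._edges)


node_a, node_b = uuid.UUID(int=1), uuid.UUID(int=2)
store = EdgeStore()
print(store.add(node_a, node_b, 0.82))  # True
print(store.add(node_b, node_a, 0.82))  # False, same pair in reverse
print(len(store))                       # 1
```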

Potential error case

  1. The backend uses an async SQLAlchemy engine, so DATABASE_URL must name an async driver such as asyncpg: postgresql+asyncpg://USER:PASSWORD@HOST:PORT/DATABASE. A plain postgresql:// URL selects a synchronous driver, which the async engine rejects at startup.
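A small startup check can catch this before the engine is created. The sketch below rewrites a plain PostgreSQL URL to the asyncpg form and rejects anything else. The function name `ensure_async_url` is hypothetical, and silently rewriting the scheme (rather than raising) is a design choice made for the example.

```python
from urllib.parse import urlsplit

ASYNC_SCHEME = "postgresql+asyncpg"


def ensure_async_url(url: str) -> str:
    """Return the URL with the asyncpg driver scheme, or raise ValueError."""
    scheme = urlsplit(url).scheme
    if scheme == ASYNC_SCHEME:
        return url
    if scheme in ("postgresql", "postgres"):
        # Swap only the scheme; credentials, host, port, and database stay as-is.
        return ASYNC_SCHEME + url[len(scheme):]
    raise ValueError(f"unsupported database URL scheme: {scheme!r}")


print(ensure_async_url("postgresql://postgres:postgres@db:5432/knowledgegraph"))
# postgresql+asyncpg://postgres:postgres@db:5432/knowledgegraph
```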

🎯 Roadmap

  • Backend API (FastAPI + pgvector + RAG)
  • Chrome Extension with Side Panel
  • Interactive graph visualization (React Flow)
  • Cross-device sync
  • Usage tiers and Stripe integration
  • Team shared knowledge graphs
  • Export to Obsidian / Notion

📄 License

MIT


Built by Tenzin
