wlowenfeld/second-brain

Second Brain

A personal knowledge base with built-in usage tracking that connects to Claude.ai via MCP.

Second Brain ingests your emails, calendar events, documents, messages, notes, and browser history into a unified vector database. It exposes an MCP (Model Context Protocol) server so Claude can search your personal data during conversations. Built-in query logging and an auto-curator track what you search for, how results perform, and which sources are pulling their weight.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Data Sources                             │
│  Gmail · Calendar · OneDrive · iMessage · Notes · Chrome · PDF  │
└──────────────────────────┬──────────────────────────────────────┘
                           │ Ingestion (scheduled)
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Ingestion Pipeline                          │
│  Chunking → Embedding (all-MiniLM-L6-v2) → ChromaDB (cosine)   │
│  Auto-tagging · Deduplication · Checkpoint tracking             │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                      ChromaDB Vector DB                         │
│  15 collections · 384-dim embeddings · persistent on disk       │
│  emails · documents · calendar · messages · notes · manual · …  │
└──────────────┬───────────────────────────────┬──────────────────┘
               │                               │
               ▼                               ▼
┌──────────────────────────┐   ┌──────────────────────────────────┐
│     MCP Server (:8420)   │   │   Observability & Quality Signals │
│  FastAPI + JSON-RPC      │   │                                  │
│  OAuth 2.0 for Claude.ai │   │  Feedback DB (SQLite)            │
│  8 tools:                │   │    ↳ logs every query + results  │
│   · query_second_brain   │   │    ↳ user ratings (thumbs up/down)│
│   · get_recent_entries   │   │                                  │
│   · get_entries_about_   │   │  Auto-Curator (daily)            │
│     person               │   │    ↳ source quality rankings     │
│   · get_project_status   │   │    ↳ stale chunk detection       │
│   · add_entry            │   │    ↳ query pattern analysis      │
│   · get_brain_status     │   │    ↳ health reports              │
│   · rate_response        │   │                                  │
│   · brain_insights       │   │  brain_insights tool             │
│                          │   │    ↳ recommendations for user    │
│  Cloudflare Tunnel       │   │    ↳ "what should I add?"         │
│    ↳ remote access       │   │    ↳ "what's working well?"      │
└──────────────────────────┘   └──────────────────────────────────┘
               │
               ▼
┌──────────────────────────┐
│       Claude.ai          │
│  MCP Connector           │
│  "Search my brain for…"  │
│  "What did Sarah say…"   │
│  "Rate that as helpful"  │
└──────────────────────────┘
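The ingestion stage in the diagram (chunking, deduplication, storage) can be sketched with the standard library alone. This is a minimal illustration, not the project's actual `chunker.py`: the window sizes, the `content_id` helper, and the plain `dict` standing in for ChromaDB are all assumptions, and the embedding step (all-MiniLM-L6-v2) is left out entirely.

```python
import hashlib


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into overlapping character windows (sizes are illustrative)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def content_id(chunk: str) -> str:
    """Hash of the whitespace- and case-normalized chunk, used as a dedup key."""
    normalized = " ".join(chunk.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def ingest(documents: list, store: dict) -> int:
    """Add unseen chunks to `store` (a dict standing in for the vector DB)."""
    added = 0
    for doc in documents:
        for chunk in chunk_text(doc):
            key = content_id(chunk)
            if key not in store:  # skip chunks that were already ingested
                store[key] = chunk
                added += 1
    return added
```

Keying chunks by a content hash means re-running ingestion over the same source is idempotent, which is one plausible way to get the deduplication the diagram mentions.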

What Makes This Different

Built-in observability. Most personal knowledge bases are black boxes — you add data, you search it, and you have no idea what's working. Second Brain tracks usage and surfaces quality signals:

  1. Every query is logged — what you searched for, what came back, relevance scores, response time
  2. You rate results — thumbs up/down via the rate_response MCP tool, right from your Claude conversation
  3. The auto-curator analyzes query patterns — which sources appear in useful results? Which chunks haven't matched a query in 30+ days? What topics do you search most?
  4. Daily health reports — the curator generates actionable recommendations: source quality rankings, stale data flags, coverage gaps
  5. Freshness-weighted scoring — newer entries get a relevance boost, query expansion fills gaps, and deduplication keeps the index clean

Right now it surfaces insights and recommendations. Automated re-chunking and re-weighting based on feedback is on the roadmap.
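As a rough sketch of the logging and rating loop described above, the following uses SQLite from the standard library. The table layout, column names, and function names are assumptions made for illustration and are not taken from `feedback.py`; for simplicity it records one source per query.

```python
import sqlite3
import time


def open_feedback_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a feedback database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS queries (
               id INTEGER PRIMARY KEY,
               query TEXT, source TEXT, score REAL,
               latency_ms REAL, ts REAL, rating INTEGER)"""
    )
    return conn


def log_query(conn, query, source, score, latency_ms, ts=None) -> int:
    """Record one query and return its row id so it can be rated later."""
    cur = conn.execute(
        "INSERT INTO queries (query, source, score, latency_ms, ts) "
        "VALUES (?, ?, ?, ?, ?)",
        (query, source, score, latency_ms, time.time() if ts is None else ts),
    )
    conn.commit()
    return cur.lastrowid


def rate(conn, query_id: int, helpful: bool) -> None:
    """Store a thumbs up (+1) or thumbs down (-1) for a logged query."""
    conn.execute(
        "UPDATE queries SET rating = ? WHERE id = ?",
        (1 if helpful else -1, query_id),
    )
    conn.commit()


def source_rankings(conn) -> list:
    """Rank sources by average rating, best first; unrated queries are ignored."""
    return conn.execute(
        "SELECT source, AVG(rating), COUNT(*) FROM queries "
        "WHERE rating IS NOT NULL GROUP BY source "
        "ORDER BY AVG(rating) DESC, source"
    ).fetchall()
```

A daily curator job could call something like `source_rankings` to produce the source quality section of a health report.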

Supported Data Sources

| Source | Type | Status |
| --- | --- | --- |
| Gmail | Cloud (OAuth) | Stable |
| Google Calendar | Cloud (OAuth) | Stable |
| OneDrive | Cloud (OAuth) | Stable |
| PDF / Word / Excel / PPT | Local files | Stable |
| Claude conversations | Export | Stable |
| Manual entries | CLI / MCP | Stable |
| iMessage | Local (macOS) | Stable |
| Apple Notes | Local (macOS) | Stable |
| Chrome history | Local | Stable |
| Plaid (financial) | API | Beta |

Prerequisites

  • macOS (tested on Sequoia 26.x, Apple Silicon recommended)
    • Linux support is possible but untested — the macOS-specific sources (iMessage, Apple Notes, launchd) would need alternatives
  • Python 3.11+
  • Homebrew (macOS package manager)
  • Cloudflare account (free tier, for remote access via Claude.ai)
  • ~2GB disk space (embedding model + ChromaDB)
  • FileVault enabled (strongly recommended — your data includes emails and messages)

Quick Start

# 1. Clone the repo
git clone https://github.com/yourusername/SecondBrain.git
cd SecondBrain

# 2. Edit config and create your .env
nano config.yaml  # Already included; edit values
cp .env.example .env
nano .env  # Add your API keys/secrets

# 3. Run setup (creates venv, installs deps, downloads model, starts services)
bash setup.sh

# 4. Test it
source ~/.zshrc
brain add --title "First entry" --text "Hello, Second Brain"
brain search "first entry"

Enable Gmail & Calendar (Phase 2)

  1. Create a Google Cloud project and enable Gmail + Calendar APIs
  2. Download OAuth credentials to ~/SecondBrain/data/oauth/gmail_credentials.json
  3. Run the OAuth flows:
    # Gmail (opens browser for consent)
    python3 -c "from ingestion.gmail import _get_credentials; _get_credentials()"

    # Calendar
    python3 -c "from ingestion.calendar_sync import _get_credentials; _get_credentials()"
  4. Enable in config.yaml: set gmail.enabled: true and calendar.enabled: true
  5. Restart the scheduler: launchctl unload ~/Library/LaunchAgents/com.secondbrain.scheduler.plist && launchctl load ~/Library/LaunchAgents/com.secondbrain.scheduler.plist

Enable OneDrive (Phase 3)

  1. Register an app at Azure Portal → App registrations
  2. Set "Personal Microsoft accounts only", redirect URI: http://localhost:8424/callback
  3. Enable "Allow public client flows" under Authentication
  4. Add your client_id to config.yaml under sources.onedrive.client_id
  5. Run the OAuth flow (script provided in scripts/)

Connect to Claude.ai

  1. Set up a Cloudflare Tunnel to expose port 8420:
    brew install cloudflared
    cloudflared tunnel --url http://127.0.0.1:8420
  2. In Claude.ai → Settings → MCP Servers → Add:
    • URL: https://your-tunnel-url/mcp
    • The server handles OAuth discovery automatically

MCP Tools

| Tool | Description |
| --- | --- |
| query_second_brain | Semantic search across all sources with date/tag/source filters |
| get_recent_entries | Entries from the last N hours |
| get_entries_about_person | Everything mentioning a specific person |
| get_project_status | Project timeline in chronological order |
| add_entry | Save a note, decision, or idea from the conversation |
| get_brain_status | System health, ingestion status, alerts |
| rate_response | Rate a query result as useful/not useful |
| brain_insights | Self-improvement analytics and recommendations |
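MCP clients invoke tools through JSON-RPC 2.0 requests with the method `tools/call`. The sketch below shows one way a server could route such a request to a tool handler. The `tool` decorator, the `handle` function, the stub handler body, and the `hours` argument name are hypothetical and are not taken from `tools.py`; the choice of error codes is also an assumption.

```python
from typing import Callable

# Registry of tool name -> handler; the real server may organize this differently.
TOOLS: dict = {}


def tool(name: str) -> Callable:
    """Decorator that registers a handler under an MCP tool name."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return register


@tool("get_recent_entries")
def get_recent_entries(hours: int = 24) -> str:
    # Stub: a real handler would query the vector DB here.
    return f"(stub) entries from the last {hours} hours"


def handle(request: dict) -> dict:
    """Dispatch a JSON-RPC 2.0 `tools/call` request to the matching handler."""
    base = {"jsonrpc": "2.0", "id": request.get("id")}
    if request.get("method") != "tools/call":
        return {**base, "error": {"code": -32601, "message": "Method not found"}}
    params = request.get("params") or {}
    handler = TOOLS.get(params.get("name"))
    if handler is None:
        return {**base, "error": {"code": -32602, "message": "Unknown tool"}}
    text = handler(**(params.get("arguments") or {}))
    # MCP tool results carry a list of content items; plain text is the simplest.
    return {**base, "result": {"content": [{"type": "text", "text": text}]}}
```

A request such as `{"jsonrpc": "2.0", "id": 7, "method": "tools/call", "params": {"name": "get_recent_entries", "arguments": {"hours": 6}}}` would be routed to the stub above.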

Project Structure

SecondBrain/
├── brain/              # Core: DB, search, chunking, embeddings, self-improvement
│   ├── db.py           # ChromaDB wrapper (15 collections)
│   ├── search.py       # Semantic search with freshness weighting
│   ├── chunker.py      # Text → chunks with auto-tagging
│   ├── embeddings.py   # all-MiniLM-L6-v2 embedding function
│   ├── feedback.py     # Query logging + ratings (SQLite)
│   ├── curator.py      # Auto-curator: analysis, reports, stale detection
│   ├── self_improve.py # Freshness scoring, query expansion, dedup
│   ├── alerting.py     # Proactive health alerts (email/log)
│   ├── maintenance.py  # Log rotation, health snapshots
│   └── secure_fs.py    # Atomic writes with restricted permissions
├── ingestion/          # Data source connectors
│   ├── scheduler.py    # APScheduler orchestrator (parallel I/O)
│   ├── gmail.py        # Gmail via Google API (OAuth)
│   ├── calendar_sync.py# Google Calendar (OAuth)
│   ├── onedrive.py     # OneDrive via Microsoft Graph (OAuth)
│   ├── imessage.py     # iMessage (local SQLite)
│   ├── apple_notes.py  # Apple Notes (local SQLite)
│   ├── chrome.py       # Chrome history (local SQLite)
│   ├── documents.py    # PDF, Word, Excel, PPT (inbox folder)
│   ├── claude_export.py# Claude conversation exports
│   ├── plaid_finance.py# Plaid transactions, balances, holdings
│   └── manual.py       # CLI / MCP manual entries
├── mcp_server/         # MCP + REST API server
│   ├── server.py       # FastAPI: MCP JSON-RPC, OAuth 2.0, REST
│   ├── tools.py        # 8 tool implementations + schemas
│   └── auth.py         # API key validation
├── web/                # Status dashboard (Flask)
├── scripts/            # CLI, backup, utilities
├── tunnel/             # Cloudflare Tunnel setup
├── config.yaml         # All configuration
├── setup.sh            # One-command setup
└── requirements.txt    # Python dependencies
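To make "semantic search with freshness weighting" concrete, here is one possible scoring rule: multiply the similarity score by a boost that decays exponentially with the entry's age. The formula, the 30-day half-life, and the 0.2 boost are assumptions chosen for illustration; the actual logic in `search.py` and `self_improve.py` may differ.

```python
def freshness_weighted(similarity: float, age_days: float,
                       half_life_days: float = 30.0, boost: float = 0.2) -> float:
    """Scale a similarity score by a recency boost that halves every half-life."""
    decay = 0.5 ** (max(age_days, 0.0) / half_life_days)
    return similarity * (1.0 + boost * decay)


def rank(hits: list) -> list:
    """Sort (entry_id, similarity, age_days) tuples by freshness-weighted score."""
    return sorted(hits, key=lambda h: freshness_weighted(h[1], h[2]), reverse=True)
```

With these parameters a brand-new entry gets at most a 20% lift, so a recent, slightly less similar chunk can outrank a year-old one without recency overwhelming relevance.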

FAQ

Q: How is this different from just connecting Gmail/Google Calendar to Claude?

Native connectors give Claude live access to one service at a time. Second Brain is fundamentally different in three ways:

  1. Unified cross-source search — It combines Gmail, Calendar, OneDrive, and more into a single semantic index. Ask "what's happening with Project X" and it searches emails, calendar invites, and documents simultaneously. Native connectors can't cross-reference across sources.
  2. Persistent memory with semantic understanding — Native connectors are stateless keyword searches. Second Brain pre-processes your data into vector embeddings, so it understands meaning, not just keywords. Ask "what financial decisions have I made recently" and it pulls budget spreadsheets, investment emails, and financial advisor meetings — by meaning, across every source.
  3. Observability and feedback — Query patterns are tracked, source quality is ranked, and stale data is flagged automatically. The auto-curator surfaces recommendations so you know what's working and what isn't. Native connectors give you zero visibility into result quality.

Q: What data sources are supported?

Gmail, Google Calendar, OneDrive, local documents (PDF, Word, Excel, PowerPoint), Claude conversation exports, manual entries, iMessage, Apple Notes, and Chrome history are stable; financial data via Plaid is in beta. See Supported Data Sources for the full list. Adding a new source means writing one Python module that follows the existing pattern.
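The exact connector interface is not documented here, so the following is only a guess at what a source module with checkpoint tracking might look like. `ListSource`, `fetch_since`, `run_ingestion`, and the JSON checkpoint file are all hypothetical names and choices; a real connector would read from an API or local database instead of an in-memory list.

```python
import json
import tempfile
from pathlib import Path


class ListSource:
    """Hypothetical connector: returns (item_id, text) pairs newer than a checkpoint."""
    name = "demo"

    def __init__(self, items: list):
        self.items = items  # list of (int id, str text)

    def fetch_since(self, checkpoint):
        return [(i, t) for i, t in self.items if checkpoint is None or i > checkpoint]


def run_ingestion(source, checkpoint_file: Path, sink: list) -> int:
    """Ingest only items newer than the stored checkpoint, then advance it."""
    state = json.loads(checkpoint_file.read_text()) if checkpoint_file.exists() else {}
    last = state.get(source.name)
    new_items = source.fetch_since(last)
    for item_id, text in new_items:
        sink.append(text)  # stand-in for chunk + embed + store
        last = item_id if last is None else max(last, item_id)
    state[source.name] = last
    checkpoint_file.write_text(json.dumps(state))
    return len(new_items)


def demo() -> list:
    """Run ingestion three times; only unseen items are added each time."""
    sink = []
    with tempfile.TemporaryDirectory() as d:
        cp = Path(d) / "checkpoints.json"
        src = ListSource([(1, "a"), (2, "b")])
        counts = [run_ingestion(src, cp, sink), run_ingestion(src, cp, sink)]
        src.items.append((3, "c"))
        counts.append(run_ingestion(src, cp, sink))
    return [counts, sink]
```

Persisting the checkpoint per source name lets a scheduler re-run every connector on a timer without re-ingesting old items.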

Q: Where does my data live?

Everything stays on your machine. Second Brain is entirely self-hosted: your data is stored in a local ChromaDB vector database and SQLite. Nothing leaves your machine except the results returned through the Cloudflare tunnel to your own Claude session. No third-party analytics, no telemetry, no cloud storage.

Q: Do I need a Mac Mini?

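No. Any always-on machine with Python 3.11+, about 2GB of disk space, and a network connection should work: a Linux server, a Raspberry Pi, an old laptop, or a cloud VM. The Mac Mini is just what we used. Keep in mind that Linux is untested and the macOS-only sources (iMessage, Apple Notes) are unavailable off macOS.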

Q: Can I use this with ChatGPT or other LLMs?

The MCP server exposes a standard HTTP API. While the current setup is optimized for Claude's MCP connector protocol, the underlying REST endpoints can be adapted for any LLM that supports tool use or function calling.

Q: Is this secure?

The MCP server requires API key authentication for all data-access operations. OAuth tokens are stored with 0600 permissions. The Cloudflare tunnel provides HTTPS encryption. No credentials are stored in the codebase. That said — this is a personal project, not enterprise software. Review the security model before deploying with sensitive data.
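For readers reviewing the security model, this is a minimal sketch of API key validation using a constant-time comparison. The `Authorization: Bearer` header format and both function names are assumptions; the project's `auth.py` may read the key from a different header or compare it differently.

```python
import hmac
from typing import Optional


def key_from_headers(headers: dict) -> Optional[str]:
    """Extract a bearer token from an Authorization header (assumed format)."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    return token if scheme.lower() == "bearer" and token else None


def is_valid_api_key(presented: Optional[str], expected: str) -> bool:
    """Compare keys in constant time so timing does not leak how many characters matched."""
    if presented is None:
        return False
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Using `hmac.compare_digest` instead of `==` avoids leaking, through response timing, how much of a guessed key is correct.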

Roadmap

  • Phase 1: Core (manual, documents, Claude export)
  • Phase 2: Cloud sources (Gmail, Calendar)
  • Phase 3: More sources (OneDrive)
  • Phase 4: MCP server + Claude.ai integration
  • Phase 5: Observability (query logging, feedback, auto-curator)
  • Phase 6: Local macOS sources (iMessage, Apple Notes, Chrome)
  • Phase 7: Financial data (Plaid integration)
  • Phase 8: Automated quality actions (re-chunking, re-weighting based on feedback)
  • Phase 9: Multi-device sync and mobile access

Security Notes

  • All data is stored locally on your machine — nothing leaves unless you set up a tunnel
  • FileVault (full-disk encryption) is strongly recommended
  • OAuth tokens are stored with 0600 permissions (owner-only)
  • Atomic file writes prevent token corruption during concurrent access
  • API key authentication on all data-access endpoints
  • Rate limiting (60 req/min) on the MCP server
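The combination of atomic writes and owner-only permissions can be sketched as follows on a POSIX system. This is an illustration of the general technique (write to a temporary file in the same directory, then rename over the target), not the contents of `secure_fs.py`; the function names are hypothetical.

```python
import os
import stat
import tempfile


def atomic_write_private(path: str, data: bytes) -> None:
    """Write data to path atomically, leaving the file readable by its owner only."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with mode 0600 in the same directory as the
    # target, so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.chmod(tmp, 0o600)
        os.replace(tmp, path)  # atomic on POSIX: readers see old or new, never partial
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def demo():
    """Overwrite a token file twice and report its final content and mode."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "token.json")
        atomic_write_private(target, b"old")
        atomic_write_private(target, b"new")
        with open(target, "rb") as f:
            content = f.read()
        return content, stat.S_IMODE(os.stat(target).st_mode)
```

Because the rename replaces the target in a single step, a scheduler job and the MCP server reading the same token file at the same moment would never observe a half-written file.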

License

MIT — see LICENSE.
