Second Brain

A personal knowledge base with built-in usage tracking that connects to Claude.ai via MCP.

Second Brain ingests your emails, calendar events, documents, messages, notes, and browser history into a unified vector database. It exposes an MCP (Model Context Protocol) server so Claude can search your personal data during conversations. Built-in query logging and an auto-curator track what you search for, how results perform, and which sources are pulling their weight.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Data Sources                             │
│  Gmail · Calendar · OneDrive · iMessage · Notes · Chrome · PDF  │
└──────────────────────────┬──────────────────────────────────────┘
                           │ Ingestion (scheduled)
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Ingestion Pipeline                          │
│  Chunking → Embedding (all-MiniLM-L6-v2) → ChromaDB (cosine)    │
│  Auto-tagging · Deduplication · Checkpoint tracking             │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                      ChromaDB Vector DB                         │
│  15 collections · 384-dim embeddings · persistent on disk       │
│  emails · documents · calendar · messages · notes · manual · …  │
└──────────────┬───────────────────────────────┬──────────────────┘
               │                               │
               ▼                               ▼
┌──────────────────────────┐   ┌──────────────────────────────────┐
│     MCP Server (:8420)   │   │  Observability & Quality Signals │
│  FastAPI + JSON-RPC      │   │                                  │
│  OAuth 2.0 for Claude.ai │   │  Feedback DB (SQLite)            │
│  8 tools:                │   │    ↳ logs every query + results  │
│   · query_second_brain   │   │    ↳ ratings (thumbs up/down)    │
│   · get_recent_entries   │   │                                  │
│   · get_entries_about_   │   │  Auto-Curator (daily)            │
│     person               │   │    ↳ source quality rankings     │
│   · get_project_status   │   │    ↳ stale chunk detection       │
│   · add_entry            │   │    ↳ query pattern analysis      │
│   · get_brain_status     │   │    ↳ health reports              │
│   · rate_response        │   │                                  │
│   · brain_insights       │   │  brain_insights tool             │
│                          │   │    ↳ recommendations for user    │
│  Cloudflare Tunnel       │   │    ↳ "what should I add?"        │
│    ↳ remote access       │   │    ↳ "what's working well?"      │
└──────────────────────────┘   └──────────────────────────────────┘
               │
               ▼
┌──────────────────────────┐
│       Claude.ai          │
│  MCP Connector           │
│  "Search my brain for…"  │
│  "What did Sarah say…"   │
│  "Rate that as helpful"  │
└──────────────────────────┘
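
As a rough illustration of the chunking and deduplication stages in the diagram above, here is a minimal Python sketch. The word-based chunk size, the overlap, and the SHA-256 content ID are assumptions chosen for the example; the actual logic lives in brain/chunker.py and brain/self_improve.py and may work differently.

```python
import hashlib


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of `size` words, each overlapping the previous by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def content_id(chunk: str) -> str:
    """Derive a stable ID from case- and whitespace-normalized content (hypothetical scheme)."""
    normalized = " ".join(chunk.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def dedupe(chunks: list[str]) -> dict[str, str]:
    """Map content ID to chunk, keeping the first occurrence of each distinct chunk."""
    unique: dict[str, str] = {}
    for chunk in chunks:
        unique.setdefault(content_id(chunk), chunk)
    return unique
```

Using a content-derived ID means re-ingesting the same email or note maps to the same key, so a store keyed on that ID would not accumulate duplicates.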

What Makes This Different

Built-in observability. Most personal knowledge bases are black boxes — you add data, you search it, and you have no idea what's working. Second Brain tracks usage and surfaces quality signals:

Every query is logged — what you searched for, what came back, relevance scores, response time
You rate results — thumbs up/down via the rate_response MCP tool, right from your Claude conversation
The auto-curator analyzes query patterns — which sources appear in useful results? Which chunks haven't matched a query in 30+ days? What topics do you search most?
Daily health reports — the curator generates actionable recommendations: source quality rankings, stale data flags, coverage gaps
Freshness-weighted scoring — newer entries get a relevance boost, query expansion fills gaps, and deduplication keeps the index clean
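
The logging and stale-chunk checks described above can be pictured with a small SQLite sketch. The table name, columns, and helper functions below are hypothetical and are not taken from brain/feedback.py or brain/curator.py; only the 30-day staleness window and the thumbs up/down ratings come from the description.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: one row per (query, returned chunk) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS query_log (
    id        INTEGER PRIMARY KEY,
    query     TEXT NOT NULL,
    chunk_id  TEXT NOT NULL,
    source    TEXT NOT NULL,
    score     REAL NOT NULL,
    rating    INTEGER,          -- 1 = thumbs up, -1 = thumbs down, NULL = unrated
    logged_at TEXT NOT NULL     -- ISO 8601 timestamp
);
"""


def log_result(conn, query, chunk_id, source, score, when, rating=None):
    """Record one returned chunk for a query."""
    conn.execute(
        "INSERT INTO query_log (query, chunk_id, source, score, rating, logged_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (query, chunk_id, source, score, rating, when.isoformat(timespec="seconds")),
    )


def stale_chunks(conn, all_chunk_ids, now, max_age_days=30):
    """Return chunk IDs that have not appeared in any result within the window."""
    cutoff = (now - timedelta(days=max_age_days)).isoformat(timespec="seconds")
    rows = conn.execute(
        "SELECT DISTINCT chunk_id FROM query_log WHERE logged_at >= ?", (cutoff,)
    )
    recent = {row[0] for row in rows}
    return sorted(set(all_chunk_ids) - recent)


def source_ranking(conn):
    """Rank sources by net rating (thumbs up minus thumbs down)."""
    rows = conn.execute(
        "SELECT source, COALESCE(SUM(rating), 0) AS net FROM query_log "
        "GROUP BY source ORDER BY net DESC, source"
    )
    return rows.fetchall()
```

To try it, open a connection with `sqlite3.connect(":memory:")`, run `conn.executescript(SCHEMA)`, and call the helpers with a fixed `now` so the results are reproducible.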

Right now it surfaces insights and recommendations. Automated re-chunking and re-weighting based on feedback is on the roadmap.

Supported Data Sources

Source	Type	Status
Gmail	Cloud (OAuth)	Stable
Google Calendar	Cloud (OAuth)	Stable
OneDrive	Cloud (OAuth)	Stable
PDF / Word / Excel / PPT	Local files	Stable
Claude conversations	Export	Stable
Manual entries	CLI / MCP	Stable
iMessage	Local (macOS)	Stable
Apple Notes	Local (macOS)	Stable
Chrome history	Local	Stable
Plaid (financial)	API	Beta

Prerequisites

macOS (tested on Sequoia 26.x, Apple Silicon recommended)
- Linux support is possible but untested — the macOS-specific sources (iMessage, Apple Notes, launchd) would need alternatives
Python 3.11+
Homebrew (macOS package manager)
Cloudflare account (free tier, for remote access via Claude.ai)
~2GB disk space (embedding model + ChromaDB)
FileVault enabled (strongly recommended — your data includes emails and messages)

Quick Start

# 1. Clone the repo
git clone https://github.com/yourusername/SecondBrain.git
cd SecondBrain

# 2. Edit the config and create your .env
nano config.yaml  # Already included in the repo; edit values
cp .env.example .env
nano .env  # Add your API keys/secrets

# 3. Run setup (creates venv, installs deps, downloads model, starts services)
bash setup.sh

# 4. Test it
source ~/.zshrc
brain add --title "First entry" --text "Hello, Second Brain"
brain search "first entry"

Enable Gmail & Calendar (Phase 2)

Create a Google Cloud project and enable Gmail + Calendar APIs
Download OAuth credentials to ~/SecondBrain/data/oauth/gmail_credentials.json

Run the OAuth flows:

# Gmail (opens browser for consent)
python3 -c "
from ingestion.gmail import _get_credentials
_get_credentials()
"

# Calendar
python3 -c "
from ingestion.calendar_sync import _get_credentials
_get_credentials()
"

Enable in config.yaml: set gmail.enabled: true and calendar.enabled: true
Restart the scheduler:

launchctl unload ~/Library/LaunchAgents/com.secondbrain.scheduler.plist
launchctl load ~/Library/LaunchAgents/com.secondbrain.scheduler.plist

Enable OneDrive (Phase 3)

Register an app at Azure Portal → App registrations
Set "Personal Microsoft accounts only", redirect URI: http://localhost:8424/callback
Enable "Allow public client flows" under Authentication
Add your client_id to config.yaml under sources.onedrive.client_id
Run the OAuth flow (script provided in scripts/)

Connect to Claude.ai

Set up a Cloudflare Tunnel to expose port 8420:

brew install cloudflared
cloudflared tunnel --url http://127.0.0.1:8420

In Claude.ai → Settings → MCP Servers → Add:
- URL: https://your-tunnel-url/mcp
- The server handles OAuth discovery automatically

MCP Tools

Tool	Description
`query_second_brain`	Semantic search across all sources with date/tag/source filters
`get_recent_entries`	Entries from the last N hours
`get_entries_about_person`	Everything mentioning a specific person
`get_project_status`	Project timeline in chronological order
`add_entry`	Save a note, decision, or idea from the conversation
`get_brain_status`	System health, ingestion status, alerts
`rate_response`	Rate a query result as useful/not useful
`brain_insights`	Self-improvement analytics and recommendations
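
When Claude invokes one of these tools, MCP carries the call as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds such a message and does not contact the server. The `query` argument name is an assumption, since the real input schemas are defined in mcp_server/tools.py.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a per-request ID


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# The "query" key is hypothetical; check the tool's schema for the real name.
request = tools_call("query_second_brain", {"query": "Project X status"})
print(request)
```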

Project Structure

SecondBrain/
├── brain/              # Core: DB, search, chunking, embeddings, self-improvement
│   ├── db.py           # ChromaDB wrapper (15 collections)
│   ├── search.py       # Semantic search with freshness weighting
│   ├── chunker.py      # Text → chunks with auto-tagging
│   ├── embeddings.py   # all-MiniLM-L6-v2 embedding function
│   ├── feedback.py     # Query logging + ratings (SQLite)
│   ├── curator.py      # Auto-curator: analysis, reports, stale detection
│   ├── self_improve.py # Freshness scoring, query expansion, dedup
│   ├── alerting.py     # Proactive health alerts (email/log)
│   ├── maintenance.py  # Log rotation, health snapshots
│   └── secure_fs.py    # Atomic writes with restricted permissions
├── ingestion/          # Data source connectors
│   ├── scheduler.py    # APScheduler orchestrator (parallel I/O)
│   ├── gmail.py        # Gmail via Google API (OAuth)
│   ├── calendar_sync.py# Google Calendar (OAuth)
│   ├── onedrive.py     # OneDrive via Microsoft Graph (OAuth)
│   ├── imessage.py     # iMessage (local SQLite)
│   ├── apple_notes.py  # Apple Notes (local SQLite)
│   ├── chrome.py       # Chrome history (local SQLite)
│   ├── documents.py    # PDF, Word, Excel, PPT (inbox folder)
│   ├── claude_export.py# Claude conversation exports
│   ├── plaid_finance.py# Plaid transactions, balances, holdings
│   └── manual.py       # CLI / MCP manual entries
├── mcp_server/         # MCP + REST API server
│   ├── server.py       # FastAPI: MCP JSON-RPC, OAuth 2.0, REST
│   ├── tools.py        # 8 tool implementations + schemas
│   └── auth.py         # API key validation
├── web/                # Status dashboard (Flask)
├── scripts/            # CLI, backup, utilities
├── tunnel/             # Cloudflare Tunnel setup
├── config.yaml         # All configuration
├── setup.sh            # One-command setup
└── requirements.txt    # Python dependencies
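
To make the freshness weighting noted for search.py and self_improve.py more concrete, here is one possible formulation: a similarity score multiplied by a boost that decays exponentially with age. The half-life and maximum boost are illustrative values, and the formula itself is an assumption rather than the project's actual scoring.

```python
from datetime import datetime, timedelta


def freshness_weighted(similarity: float, created_at: datetime, now: datetime,
                       half_life_days: float = 30.0, max_boost: float = 0.2) -> float:
    """Boost a similarity score by up to `max_boost`, halving the boost every half-life."""
    age_days = max((now - created_at).total_seconds() / 86400.0, 0.0)
    boost = max_boost * 0.5 ** (age_days / half_life_days)
    return similarity * (1.0 + boost)


now = datetime(2025, 6, 30)
hits = [
    ("old budget email", 0.80, now - timedelta(days=365)),
    ("last week's budget notes", 0.75, now - timedelta(days=7)),
]
# With these values the newer note overtakes the slightly more similar old email.
ranked = sorted(hits, key=lambda h: freshness_weighted(h[1], h[2], now), reverse=True)
```

A multiplicative boost keeps irrelevant recent entries from outranking strong older matches, because a low similarity stays low even with the full boost applied.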

FAQ

Q: How is this different from just connecting Gmail/Google Calendar to Claude?

Native connectors give Claude live access to one service at a time. Second Brain is fundamentally different in three ways:

Unified cross-source search — It combines Gmail, Calendar, OneDrive, and more into a single semantic index. Ask "what's happening with Project X" and it searches emails, calendar invites, and documents simultaneously. Native connectors can't cross-reference across sources.
Persistent memory with semantic understanding — Native connectors are stateless keyword searches. Second Brain pre-processes your data into vector embeddings, so it understands meaning, not just keywords. Ask "what financial decisions have I made recently" and it pulls budget spreadsheets, investment emails, and financial advisor meetings — by meaning, across every source.
Observability and feedback — Query patterns are tracked, source quality is ranked, and stale data is flagged automatically. The auto-curator surfaces recommendations so you know what's working and what isn't. Native connectors give you zero visibility into result quality.
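
The first point, unified cross-source search, amounts to querying each collection and merging the hits into one ranking. A minimal sketch of that merge step follows, assuming each source returns (distance, text) pairs where a lower cosine distance means a closer match; the function name and data shapes are hypothetical.

```python
import heapq


def merge_results(per_source: dict[str, list[tuple[float, str]]],
                  k: int = 5) -> list[tuple[float, str, str]]:
    """Merge (distance, text) hits from every source and keep the k closest overall."""
    combined = (
        (distance, source, text)
        for source, hits in per_source.items()
        for distance, text in hits
    )
    return heapq.nsmallest(k, combined)


hits = {
    "emails": [(0.20, "Re: Project X kickoff"), (0.60, "Lunch on Friday?")],
    "calendar": [(0.10, "Project X review meeting")],
    "documents": [(0.35, "Project X budget.xlsx")],
}
top = merge_results(hits, k=3)
```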

Q: What data sources are supported?

Currently stable: Gmail, Google Calendar, OneDrive, local documents (PDF, Word, Excel, PowerPoint), Claude conversation exports, manual entries, iMessage, Apple Notes, and Chrome history. Financial data via Plaid is in beta. See Supported Data Sources for the full table. Adding a new source means writing one Python module that follows the existing pattern.
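
For a feel of what a new source module might involve, here is a purely illustrative sketch of a connector that ingests only records newer than a saved checkpoint. The repository's actual connector interface is not shown in this README, so `Entry`, `ingest_new`, and the JSON checkpoint file are all hypothetical names and choices.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable


@dataclass
class Entry:
    source: str
    title: str
    text: str
    created_at: datetime


def load_checkpoint(path: Path) -> datetime:
    """Return the timestamp of the newest record already ingested (epoch if none)."""
    if not path.exists():
        return datetime.fromtimestamp(0, tz=timezone.utc)
    return datetime.fromisoformat(json.loads(path.read_text())["last_seen"])


def save_checkpoint(path: Path, last_seen: datetime) -> None:
    path.write_text(json.dumps({"last_seen": last_seen.isoformat()}))


def ingest_new(records: Iterable[dict], checkpoint_path: Path) -> list[Entry]:
    """Convert records newer than the checkpoint into entries, then advance the checkpoint."""
    since = load_checkpoint(checkpoint_path)
    entries = [
        Entry("example", r["title"], r["text"], r["created_at"])
        for r in records
        if r["created_at"] > since
    ]
    if entries:
        save_checkpoint(checkpoint_path, max(e.created_at for e in entries))
    return entries
```

Running `ingest_new` twice over the same records returns entries only the first time, which is the behavior checkpoint tracking is meant to provide.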

Q: Where does my data live?

Everything stays on your machine. Second Brain runs entirely self-hosted — your data is stored in a local ChromaDB vector database and SQLite. Nothing is sent to external servers except through the Cloudflare tunnel to YOUR Claude session. No third-party analytics, no telemetry, no cloud storage.

Q: Do I need a Mac Mini?

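No. Any always-on machine should work in principle: a Linux server, a Raspberry Pi, an old laptop, a cloud VM. The Mac Mini is just what we used. Keep in mind that only macOS has been tested, and the macOS-specific pieces (iMessage, Apple Notes, launchd) need alternatives elsewhere. You need Python 3.11+, about 2GB of disk space (embedding model + ChromaDB), and a network connection.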

Q: Can I use this with ChatGPT or other LLMs?

The MCP server exposes a standard HTTP API. While the current setup is optimized for Claude's MCP connector protocol, the underlying REST endpoints can be adapted for any LLM that supports tool use or function calling.

Q: Is this secure?

The MCP server requires API key authentication for all data-access operations. OAuth tokens are stored with 0600 permissions. The Cloudflare tunnel provides HTTPS encryption. No credentials are stored in the codebase. That said — this is a personal project, not enterprise software. Review the security model before deploying with sensitive data.
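
As an illustration of the 0600 token storage mentioned above, the following sketch writes a file atomically with owner-only permissions using only the standard library. It is a generic pattern and not the contents of brain/secure_fs.py; the function name `write_secret` is hypothetical.

```python
import os
import tempfile


def write_secret(path: str, data: str) -> None:
    """Write `data` to `path` atomically, readable and writable by the owner only."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the temp file accessible only by the creating user.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # make sure bytes hit the disk before the rename
        os.chmod(tmp_path, 0o600)
        # os.replace swaps the file in one step, so readers never see a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Writing to a temporary file in the same directory matters: the final rename is only atomic when source and destination are on the same filesystem.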

Roadmap

Phase 1: Core (manual, documents, Claude export)
Phase 2: Cloud sources (Gmail, Calendar)
Phase 3: More sources (OneDrive)
Phase 4: MCP server + Claude.ai integration
Phase 5: Observability (query logging, feedback, auto-curator)
Phase 6: Local macOS sources (iMessage, Apple Notes, Chrome)
Phase 7: Financial data (Plaid integration)
Phase 8: Automated quality actions (re-chunking, re-weighting based on feedback)
Phase 9: Multi-device sync and mobile access

Security Notes

All data is stored locally on your machine — nothing leaves unless you set up a tunnel
FileVault (full-disk encryption) is strongly recommended
OAuth tokens are stored with 0600 permissions (owner-only)
Atomic file writes prevent token corruption during concurrent access
API key authentication on all data-access endpoints
Rate limiting (60 req/min) on the MCP server
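
A limit of 60 requests per minute can be enforced in several ways. The sketch below uses a per-client sliding window; this is an assumed approach for illustration, and the server's actual limiter may use a different algorithm or library.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: dict[str, deque] = {}

    def allow(self, client: str) -> bool:
        now = self.clock()
        hits = self._hits.setdefault(client, deque())
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(client_key)` and respond with HTTP 429 when it returns False, where `client_key` might be the API key or the remote address.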

License

MIT — see LICENSE.
