The agent-native knowledge OS — turn messy documents (PDFs, scans, DOCX) into a clean, verifiable, Markdown knowledge base that LLMs and AI agents read, search, and curate.
Self-hosted · any LLM · any embedder · structure-first, RAG-optional · no vector lock-in
Why Tome? Most "chat with your docs" stacks shred files into anonymous vector chunks and hope similarity search returns the right one — losing structure, dropping tables and numbers, impossible to correct. Tome keeps the document's real structure, proves it didn't silently lose content (a faithfulness gate), and exposes the base the way an agent actually navigates knowledge: folder → document → section + a living Atlas — over REST · MCP · a React Library UI. Vectors are an optional enhancement, never the core.
extract → structure (LLM) → verify faithfulness → sections + retrieval chunks
→ hybrid search (BM25 + vectors + reranker) → hierarchical Atlas
→ REST + MCP + Library UI
Highlights: 📄 pluggable extraction (top-10) · ✅ faithfulness guarantee · 🌍 auto language detection (right OCR languages per doc) · 🧭 hierarchical Atlas · 🔎 hybrid search (BM25 + pgvector + knowledge graph → RRF + reranker) · ✍️ full editing & versioning · 🧠 agent memory (Markdown-native, auto-capture) · 🤖 read and write over MCP · 🔒 secure-by-default · 🏠 fully-local mode.
Product overview, goals & strategy: PRODUCT.md.
Most "chat with your docs" stacks shred a file into anonymous vector chunks and hope similarity search returns the right fragment. That loses structure, drops tables and numbers during OCR, can't be corrected, and feeds agents context-free snippets they tend to hallucinate around.
Tome takes a different path. It runs each document through a transparent pipeline that preserves the document's real structure (folders → documents → sections), proves the result didn't silently lose content (a faithfulness gate), and exposes the base the way an agent actually navigates knowledge — read the map, open a folder, list a document's sections, fetch the exact section. It's not a one-shot index: documents stay editable, versioned, and re-importable with conflict resolution, so corrections survive.
- Agents read structured documents, not opaque chunks. Over MCP an agent calls
get_atlas → list_folders → list_documents → list_sections → get_section, retrieving whole coherent sections with headings and breadcrumbs. Answers are traceable to a real section, not stitched from 200-token fragments → far less hallucination. - A faithfulness guarantee you can trust. OCR and LLM structuring silently drop tables, numbers, and whole sections. Tome verifies the assembled Markdown against the raw extract — content coverage, number reconciliation, cleanliness — escalates on failure, and stores a per-document faithfulness score. You know the KB matches the source.
- No vector lock-in — runs on plain Postgres. Tree (
ltree), full-text BM25 (tsvector), and optional semantic ANN (pgvector) live in one database; hybrid search fuses them with RRF + an optional reranker. Bring any LLM (OpenAI, Azure, Anthropic, local Ollama/vLLM) and any embedder. Graceful degradation: no pgvector → pure BM25; no LLM key → raw text. One dependency, your models, your infra. - Handles real-world documents. Pluggable extraction with a top-10 roster —
Tika, Docling, Marker, Azure Document Intelligence, AWS Textract, Google Document AI,
Mistral OCR, Unstructured, LlamaParse, vision-LLM — with smart
primary → fallbackrouting for scanned or poor-quality pages, plus large-PDF splitting. - A living knowledge base, not a frozen index. Humans and agents edit sections
(optimistic
revlocking, full revision history), reorganize folders, and re-import updated sources with per-section 3-way conflict resolution (keep manual edits vs. take the new import). Corrections persist across re-ingests. - Built-in agent memory (Markdown-native). Tome doubles as an agent's long-term
memory:
remember / recall / observe / consolidate / forgetover REST and MCP. Memories are plain Markdown (no proprietary store), tiered working → episodic → semantic → procedural with LLM consolidation, scoped per agent (shared or private), secret-redacted on write, reinforced on recall, and fade via decay/GC. A drop-in auto-capture hook gives any agent memory with zero code. - Three first-class interfaces. A REST API, 30 MCP tools (read + write + memory,
so agents can grow the base and remember across sessions — including
ingest_markdownfor ready Markdown andingest_filefor binary files-with-processing, both into a folder tree), and a polished React Library UI with a real folder tree (create / rename / move / drag-to-file), a TOC-based document reader, a navigable Atlas map, a tabbed Admin, search, and a Memory browser. - A map for agents (the Atlas). A generated, always-current overview of the whole base (folder tree, document counts, summaries) that an agent reads first to orient itself — which markedly improves multi-step retrieval.
- Secure and self-hosted by default. Identity with users / roles (admin · editor · viewer) / sessions, scope-based RBAC, short-lived signed asset URLs (no token in the URL), non-root containers, and internal services kept off the network. Your data never leaves your perimeter.
┌─────────── ingest ───────────┐
file → extract → structure (LLM) → verify (faithfulness gate) → vision
│ │ │ pass/escalate │ (figures
top-10 headings, ▼ │ described
routing sections name + auto-folder │ & classified)
│
split → index (BM25 + chunks + embeddings) → Atlas
│
┌──────────────────────┼───────────────────────┐
REST API MCP tools Library UI
(apps, CI) (Claude/Cursor/agents) (humans)
Everything persists in PostgreSQL: folder tree, documents, sections, revisions, retrieval chunks, Atlas, jobs, and a transactional outbox for object-store/webhook consistency.
Tome isn't only a document KB — it's also a persistent memory an agent can grow and reuse, stored as ordinary Markdown (never a proprietary object model). Memory lives in its own namespace, so it never pollutes the document tree or Atlas, yet it's searched by the same hybrid (BM25 + optional vectors) machinery.
- Tiers (automatic consolidation).
working(raw observations) →episodic(per-session summary) →semantic(durable facts) →procedural(how-tos).consolidatedistils a session's observations into an episodic summary and promotes durable facts — via the configured LLM, or a deterministic raw roll-up offline. - Hygiene built in. Secrets (API keys, tokens, PEM blocks,
<private>…</private>) are redacted before storage; recall reinforces importance; old low-value memories decay and are evicted (tome memory-gc); contradictions are resolved by supersession (write with the samemkey). - Per-agent scoping.
sharedmemories are workspace-wide;agentmemories are private to the writingagent_id(X-Agent-Idheader /agent_idparam). Default is set byMEMORY_SCOPE(shared|isolated). - Surfaces. REST
POST /v1/memory,GET /v1/memory/recall?q=,/observe,/consolidate,GET/DELETE /v1/memory/{id}; MCP toolsremember · recall · list_memory · observe · consolidate · forget; a Memory tab in the Library UI. - Zero-code auto-capture. Drop in the Claude Code hook to
observeon tool use andconsolidateat the end of a turn.
# remember a fact (redacted, Markdown), then recall it later
curl -XPOST localhost:8080/v1/memory -H 'Content-Type: application/json' \
-H 'X-Agent-Id: my-agent' -d '{"content":"## Pref\n\nUser prefers metric units.","mkey":"user.units"}'
curl 'localhost:8080/v1/memory/recall?q=units' -H 'X-Agent-Id: my-agent'| Traditional vector RAG | Tome | |
|---|---|---|
| Retrieval unit | anonymous text chunks | whole sections with headings & breadcrumbs |
| Source fidelity | no guarantee (silent OCR/parse loss) | faithfulness gate + stored score |
| Editing | re-embed everything | section edits, versioning, conflict resolution |
| Agent access | similarity search only | navigable hierarchy + search over MCP |
| Infrastructure | app + separate vector DB (+ more) | a single Postgres |
| Lock-in | embedder + vector store | pluggable; BM25 works with no vectors at all |
| Access model | usually bolted on later | secure-by-default identity + RBAC |
- Technical documentation & manuals (Tome's origin: industrial-equipment manuals, full of scanned tables and figures) made queryable by an AI assistant.
- Internal knowledge bases for support, ops, or engineering agents.
- Agent long-term memory — first-class, Markdown-native memory (tiers, decay, redaction, per-agent scoping) an agent reads and curates via MCP/REST, with drop-in auto-capture.
- Regulated / air-gapped environments that need everything self-hosted, with no data leaving the network and no third-party vector service.
- Pluggable extraction (
tome/extract/): top-10 — Tika, Docling, Marker, Azure DI, AWS Textract, Google DocAI, Mistral OCR, Unstructured, LlamaParse, vision-LLM (+ passthrough for md/txt/html). Routing: primary → fallback for scanned/poor pages. Docling is the recommended path for complex docs (faithful GFM tables + reading order). Each adapter is labeled verified vs. experimental — seeGET /v1/extractors; install only the extras you use. - Pluggable LLM (
tome/llm/): OpenAI / Azure OpenAI / Anthropic / xAI / Ollama / vLLM. Separate models for structuring / vision / naming / atlas. - Pluggable embedder (
tome/embed/): OpenAI-compatible + local BGE/e5. - Pipeline (
tome/pipeline/): extract → structure → verify (faithfulness) → vision (+ image classification) → name → split (+ section normalization) → index (tsvector + retrieval chunks + embeddings) → atlas. - PostgreSQL (
tome/db.py,tome/store.py): folder tree (ltree), documents, versions, sections (hierarchy), retrieval chunks (pgvector), Atlas, jobs, outbox. Atomic document writes. Hybrid search (BM25 ∪ ANN → RRF → reranker). - Seamless editing: section edits with
rev(optimistic locking → 409), revisions,manually_editedflag, document versions, per-section re-import conflict resolution. - Identity & access (secure-by-default): users + passwords (pbkdf2-sha256), opaque session tokens, roles → scope RBAC, first-run bootstrap, master key + service API keys; assets via short-lived signed URLs; non-root containers.
- REST API (
api/, FastAPI) + MCP (mcp_server/) + Library UI (webui/, React + Vite). - Agent memory (
tome/memory.py): Markdown-native, tiered (working/episodic/ semantic/procedural), per-agent scoping, secret redaction, decay/GC, supersession — over REST + MCP, with a drop-in auto-capture hook (examples/hooks/). - Knowledge graph (
tome/graph.py): entities + co-occurrence relations derived deterministically from Markdown (no graph DB), fused into hybrid search as a third signal; browse it in the UI or over MCP (list_entities/get_entity). - Multilingual OCR (
tome/lang.py): AI language pre-analysis detects each document's real language(s) and re-scans with the correct OCR engine languages — no more garbled mixed-language scans. - CLI (
tome/cli.py):tome init-db | ingest | status | eval | gc | dedup | reindex | memory-gc | graph-rebuild | demo-seed | export-all.
cp .env.example .env # set LLM keys + TOME_SECRET (see Configuration)
docker compose up -d --build
# Library UI: http://localhost:3000
# REST + Swagger: http://localhost:8080/docs
# MCP / OpenAPI: http://localhost:8765/docsStack: gateway, worker×2, mcp, postgres (pgvector), tika, minio, webui. Only gateway (8080), mcp (8765), and webui (3000) are published; postgres, tika, and minio stay on the internal Compose network.
Tome requires authentication unless TOME_OPEN=true. On first launch the Library UI
shows a "Create the first administrator" screen. Or via API:
curl -X POST localhost:8080/v1/auth/bootstrap \
-H 'Content-Type: application/json' \
-d '{"email":"admin@example.com","password":"change-me-8+"}'You can also seed the admin from env on first start with TOME_ADMIN_EMAIL /
TOME_ADMIN_PASSWORD. For a personal localhost-only instance, set TOME_OPEN=true
to disable auth entirely.
python -m venv .venv && . .venv/Scripts/activate # Windows: .venv\Scripts\activate
pip install -e .
cp .env.example .env # set POSTGRES_DSN + an LLM key
tome init-db
uvicorn api.main:app --reload --port 8080 # gateway + UI + in-process worker
# MCP separately: python -m mcp_server.server # stdio (Claude Desktop / Cursor)LLM_PROVIDER=ollama
OPENAI_BASE_URL=http://localhost:11434/v1
EMBED_PROVIDER=local
EMBED_MODEL=BAAI/bge-m3
EXTRACT_PRIMARY=tika
TOME_OPEN=true # personal localhost mode, no sign-inData never leaves your perimeter; a knowledge base for a personal LLM agent over MCP.
docker compose -f docker-compose.yml -f docker-compose.local.yml up -d --buildThis overlay needs no cloud keys: Tika extraction + a local embedder + (optional)
Ollama for structuring. It defaults to the hash embedder (deterministic,
zero-download lexical semantics). The overlay bakes the fastembed extra into the
image (via the TOME_EXTRAS=fastembed build arg), so switching to real semantic
embeddings needs no manual rebuild — just set EMBED_PROVIDER=fastembed
(fastembed downloads a small ONNX model on first use). For structuring, run Ollama and
ollama pull a model (otherwise structuring falls back to raw text). Hybrid search
works either way (BM25 + vectors → RRF).
Everything is set via .env (see .env.example): providers (LLM / embed / extract),
limits/thresholds (faithfulness, section/chunk sizes, concurrency), and access.
In Azure the model name is the deployment name (not gpt-4o). If you have a
single deployment, use its name for all four LLM_*_MODEL values:
LLM_PROVIDER=azure_openai
LLM_STRUCTURE_MODEL=<deployment-name>
LLM_VISION_MODEL=<deployment-name> # deployment must be multimodal
LLM_NAMING_MODEL=<deployment-name>
LLM_ATLAS_MODEL=<deployment-name>
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com
AZURE_OPENAI_KEY=<key>
AZURE_OPENAI_API_VERSION=2024-12-01-preview
# embeddings via Azure (if you have a text-embedding-* deployment):
EMBED_PROVIDER=azure_openai
EMBED_MODEL=<embeddings-deployment-name>
# no embeddings deployment? → EMBED_ENABLED=false (search falls back to BM25)Reasoning models (gpt-5.x / o-series) are handled automatically: the adapter sends
max_completion_tokens and omits temperature.
Document metadata, sections, and search indexes live in PostgreSQL. Binary
blobs (original uploads, extracted figures, version snapshots) go to an object
store under keys like sources/<id>/<file> and pending/<id>/<hash>.md:
- Default (local FS): written under
_store/(gitignored). In Docker this is the named volume/app/_store— it lives in Docker's volume area, not in your repo or working tree. SetSTORAGE_DIR=/abs/pathto relocate it (e.g./var/lib/tome). - Production: set
S3_USE=true(MinIO/S3) so nothing is written to local disk.
Tome never writes data files into the source tree. (If you ran an early local build
and see stray sources//pending/ folders in the project root, they are pre-_store
artifacts — safe to delete; they are gitignored.)
{
"mcpServers": {
"tome": {
"command": "python", "args": ["-m", "mcp_server.server"],
"env": { "POSTGRES_DSN": "postgresql://...", "OPENAI_API_KEY": "..." }
}
}
}Recommended agent flow: get_atlas → list_folders → list_documents → list_sections → get_section (with search when the target isn't known yet).
pip install pytest
# unit (no DB):
python -m pytest tests/test_pipeline.py tests/test_units2.py -q
# integration (needs Postgres; creates/drops the tome_test schema):
TOME_TEST_DSN=postgresql://... python -m pytest tests/test_integration.py -qUnit suites cover clean/split/chunk/faithfulness, the extractor registry, conflict diff, export, GC, dedup, rate-limit, Atlas, password hashing, and signed-URL verification (incl. expiry/tamper). Integration runs against real Postgres.
tome/
├── tome/ # core (config, db, store, schema.sql, signing, cli, worker)
│ ├── llm/ # pluggable LLM providers
│ ├── extract/ # pluggable extractors (+ pdfutil)
│ ├── embed/ # pluggable embedders
│ ├── prompts/ # system prompts (files)
│ └── pipeline/ # pipeline stages + orchestrator run.py
├── api/ # FastAPI gateway (REST + auth)
├── mcp_server/ # MCP (read + write)
├── webui/ # React + Vite Library UI (nginx, proxies /v1 → gateway)
├── tests/ # unit + integration tests
├── Dockerfile, docker-compose.yml, .env.example
└── PRODUCT.md # product overview & strategy
This is a working MVP, verified end-to-end (ingest → faithfulness → search → edit → export; identity + RBAC; Docker Compose stack; unit + integration tests green). Contributions and issues welcome.
MIT © 2026 zsembek.