RawToWise

LLM Knowledge Compiler — drop raw documents, get a structured markdown wiki.

Install · Quick Start · How It Works · Contributing


raw/ (papers, articles, URLs)
  → rtw compile → wiki/ (structured .md with backlinks)
                    → rtw query → answers accumulate in wiki
                    → rtw lint  → detect contradictions, fill gaps

Inspired by Andrej Karpathy's LLM knowledge base workflow: it turns his "hacky collection of scripts" into a real tool.

Why RawToWise?

| Problem | RawToWise |
| --- | --- |
| RAG requires vector DB infra | No vector DB — LLM navigates via index + backlinks |
| Chat answers disappear | Exploration = accumulation — every query enriches the wiki |
| PKM requires manual organizing | Drop and forget — put files in raw/, LLM handles the rest |
| Vendor lock-in (NotebookLM, etc.) | Plain markdown — works in Obsidian, VSCode, or any editor |

Install

curl -fsSL https://raw.githubusercontent.com/vericontext/rawtowise/main/install.sh | bash
Other install methods
# Via pipx
pipx install git+https://github.com/vericontext/rawtowise.git

# Via uv
uv tool install git+https://github.com/vericontext/rawtowise.git

# From source
git clone https://github.com/vericontext/rawtowise.git && cd rawtowise && pip install -e .

Requires an Anthropic API key. The rtw init command will prompt you to set it up.

Quick Start

# 1. Initialize a project
rtw init --name "AI Research"

# 2. Ingest sources
rtw ingest https://example.com/article
rtw ingest "https://en.wikipedia.org/wiki/Transformer_(deep_learning)"
rtw ingest paper.pdf
rtw ingest ./my-articles/

# 3. Compile into a wiki
rtw compile

# 4. Ask questions (answers stream in real-time)
rtw query "What are the key debates in this field?"

# 5. Health check
rtw lint

How It Works

Ingest — Fetch URLs (via Jina Reader), copy local files, and clean web boilerplate. Sources are stored in raw/.

Compile — LLM extracts key concepts from all sources, generates interlinked wiki articles with [[backlinks]] and [source: filename] citations, and builds an index. Articles are generated in parallel for speed.
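
Because the [[backlink]] and [source: filename] markup is plain text, compiled articles are easy to post-process. A minimal sketch (illustrative only, not RawToWise's actual implementation) of pulling both out of a generated article:

```python
import re

# Hypothetical helpers: match [[backlinks]] and [source: ...] citations
# in the markup format described above.
BACKLINK = re.compile(r"\[\[([^\]]+)\]\]")
CITATION = re.compile(r"\[source:\s*([^\]]+)\]")

def extract_links(article: str):
    """Return (backlink targets, cited source filenames) for one article."""
    return BACKLINK.findall(article), CITATION.findall(article)

links, sources = extract_links(
    "The [[Transformer]] relies on [[attention]]. [source: paper.pdf]"
)
# links  -> ['Transformer', 'attention']
# sources -> ['paper.pdf']
```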

Query — LLM reads the wiki index, finds relevant articles, and synthesizes an answer. Answers stream to the terminal and are saved to output/ for future reference.

Lint — LLM audits the wiki for contradictions, coverage gaps, stale information, and suggests new questions to explore.

Commands

| Command | Description |
| --- | --- |
| `rtw init` | Initialize a new project (creates dirs + config, prompts for API key) |
| `rtw ingest <source>` | Ingest URL, file, or directory into `raw/` |
| `rtw compile` | Compile sources into wiki (incremental by default) |
| `rtw compile --full` | Full recompile from scratch |
| `rtw compile --dry-run` | Estimate token usage and cost |
| `rtw query "question"` | Ask the wiki (streamed output) |
| `rtw query "..." --format table` | Output as markdown table |
| `rtw query "..." --deep` | Deep research mode (longer output) |
| `rtw lint` | Run wiki health check |
| `rtw stats` | Show wiki statistics |

Project Structure

my-research/
├── rtw.yaml              # Configuration
├── .env                  # API key (auto-created by rtw init, gitignored)
├── raw/                  # Raw sources — you add files here
│   ├── articles/         #   Web articles (auto-sorted)
│   └── papers/           #   PDFs (auto-sorted)
├── wiki/                 # LLM-generated wiki — don't edit manually
│   ├── _index.md         #   Master index
│   ├── _sources.md       #   Source catalog
│   └── concepts/         #   Concept articles with [[backlinks]]
├── output/               # Query results
│   └── queries/          #   Saved answers
└── .rtw/                 # Internal state (compile state, debug logs)
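
The skeleton above takes only a few lines to reproduce. An illustrative sketch of what `rtw init` sets up on disk, assuming nothing beyond the directory names shown in the tree (this is not the tool's actual code):

```python
from pathlib import Path

def init_layout(root: str) -> None:
    """Create the RawToWise project skeleton shown above."""
    base = Path(root)
    for d in ("raw/articles", "raw/papers", "wiki/concepts",
              "output/queries", ".rtw"):
        (base / d).mkdir(parents=True, exist_ok=True)
    # Config contents are filled in interactively by `rtw init`.
    (base / "rtw.yaml").touch()
```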

Configuration

rtw.yaml (auto-generated by rtw init):

version: 1
name: "My Research"

llm:
  compile: claude-sonnet-4-6      # Fast model for compilation
  query: claude-sonnet-4-6        # Query answering
  lint: claude-haiku-4-5-20251001 # Economical model for health checks

compile:
  strategy: incremental
  max_concepts: 200
  language: en                    # Wiki language
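
Keys omitted from `rtw.yaml` presumably fall back to defaults. A hypothetical sketch of overlaying a partial config on the defaults above (the `merge` helper and `DEFAULTS` dict are assumptions for illustration, with values taken from the example config):

```python
DEFAULTS = {
    "version": 1,
    "llm": {"compile": "claude-sonnet-4-6", "query": "claude-sonnet-4-6"},
    "compile": {"strategy": "incremental", "max_concepts": 200, "language": "en"},
}

def merge(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay user-supplied keys on the defaults."""
    out = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# A user config that only changes the wiki language keeps every other default.
cfg = merge(DEFAULTS, {"compile": {"language": "de"}})
```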

Viewing the Wiki

The compiled wiki is plain markdown with [[wiki-links]]. Best viewed with:

  • Obsidian — open wiki/ as a vault. Graph view shows concept connections.
  • VSCode + Foam — [[backlink]] support with graph visualization.
  • Any markdown viewer — files are standard .md, readable anywhere.

Cost

RawToWise uses the Anthropic API. You pay only for what you use.

| Operation | Estimated cost (USD) |
| --- | --- |
| Ingest 1 article | ~$0.02 |
| Compile 5 sources | ~$1-2 |
| Single query | ~$0.05-0.15 |
| Lint | ~$0.50 |

Use rtw compile --dry-run to estimate before compiling.
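
A dry-run estimate boils down to token counts times per-token prices. A back-of-envelope sketch (the per-million-token prices below are placeholders for illustration, not Anthropic's actual rates; check their pricing page for your model):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_mtok: float = 3.00,
                  out_price_per_mtok: float = 15.00) -> float:
    """Rough API cost in USD for one compile, given token counts."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1_000_000

# Compiling 5 sources of ~20k input tokens each, ~10k output tokens per source:
cost = estimate_cost(5 * 20_000, 5 * 10_000)
# -> 1.05, in the same ballpark as the "~$1-2" estimate above
```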

Roadmap

See open issues labeled roadmap for planned features, including:

  • PDF ingestion
  • YouTube transcript support
  • True incremental compile
  • Multi-LLM support (OpenAI, Ollama)
  • Obsidian plugin
  • MCP server for AI agents

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Uninstall

curl -fsSL https://raw.githubusercontent.com/vericontext/rawtowise/main/uninstall.sh | bash

License

MIT
