vectorlessflow · zTgx · Apr 5, 2026
diff --git a/Cargo.toml b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "vectorless"
-version = "0.1.13"
+version = "0.1.14"
 edition = "2024"
 authors = ["zTgx <beautifularea@gmail.com>"]
 description = "Hierarchical, reasoning-native document intelligence engine"

diff --git a/docs/README.md b/docs/README.md
@@ -1,35 +1,51 @@
 # Vectorless Documentation
 
-## Brand Assets
+Welcome to the Vectorless documentation.
 
-Logos and icons for use in README, website, and presentations.
+## What is Vectorless?
 
-- [assets/brand/](assets/brand/) — Logo variants (light, dark, horizontal, icon)
+Vectorless is a **reasoning-native document intelligence engine** that uses LLM-powered tree navigation instead of vector embeddings. It preserves each document's hierarchical structure and navigates that tree to locate relevant content.
 
-## Design Documents
+## Key Features
 
-System architecture and core mechanism documentation.
+- **Dual Pipeline Architecture** - Separate Index and Retrieval pipelines
+- **Pilot System** - LLM-guided navigation with layered fallback
+- **Multi-Strategy Retrieval** - Keyword, LLM, and Structure-aware strategies
+- **Zero Infrastructure** - No vector database, no embeddings
+- **Multi-Format Support** - Markdown, PDF, DOCX, HTML
 
-| Document | Description |
-|----------|-------------|
-| [architecture.svg](design/architecture.svg) | System architecture diagram |
-| [recovery.md](design/recovery.md) | Graceful degradation and error recovery strategy |
+## Getting Started
 
-## Development Guides
+- [Quick Start Guide](guides/quick-start.md) - Get up and running in 5 minutes
 
-Guides for using and contributing to Vectorless.
+## Guides
 
 | Guide | Description |
 |-------|-------------|
-| [deployment.md](guides/deployment.md) | Production deployment checklist |
+| [Quick Start](guides/quick-start.md) | Get up and running quickly |
+| [Dual Pipeline](guides/dual-pipeline.md) | Understand Index + Retrieval pipelines |
+| [Pilot System](guides/pilot-system.md) | LLM-guided navigation |
+| [Multi-Strategy Retrieval](guides/multi-strategy.md) | Keyword, LLM, Structure strategies |
+
+## Design Documents
+
+System architecture and core mechanism documentation.
+
+| Document | Description |
+|----------|-------------|
+| [pilot.md](design/pilot.md) | Pilot system design |
+| [content-aggregation.md](design/content-aggregation.md) | Content aggregation design |
+| [client-module.md](design/client-module.md) | Client API design |
+| [v3.md](design/v3.md) | Version 3 architecture |
 
 ## RFCs (Feature Proposals)
 
 Detailed design documents for new features.
 
 | RFC | Title | Status |
 |-----|-------|--------|
-| [0001](rfcs/0001-docx-parser.md) | DOCX Parser | Proposed |
+| [0001](rfcs/0001-docx-parser.md) | DOCX Parser | Implemented |
+| [0002](rfcs/0002-html-parser.md) | HTML Parser | Implemented |
 
 ### RFC Process
 

diff --git a/docs/guides/README.md b/docs/guides/README.md
@@ -1 +1,3 @@
-# Guide
+# Vectorless Guides
+
+Practical guides for using Vectorless effectively.
diff --git a/docs/guides/dual-pipeline.md b/docs/guides/dual-pipeline.md
@@ -0,0 +1,152 @@
+# Understanding the Dual Pipeline
+
+Vectorless uses a **dual pipeline architecture** that separates document processing from retrieval: the Index Pipeline turns a document into a tree stored in the workspace, and the Retrieval Pipeline answers queries by navigating that stored tree.
+
+## Architecture Overview
+
+```text
+┌──────────────────────────────────────────────────────────────────────┐
+│                       Vectorless Architecture                        │
+├──────────────────────────────────────────────────────────────────────┤
+│                                                                      │
+│  ┌────────────────────────────┐    ┌──────────────────────────────┐  │
+│  │       INDEX PIPELINE       │    │      RETRIEVAL PIPELINE      │  │
+│  │                            │    │                              │  │
+│  │  Parse → Build → Enrich    │    │  Analyze → Plan → Search ←┐  │  │
+│  │    → Enhance → Optimize    │    │                    ↓      │  │  │
+│  │    → Persist               │    │  Evaluate (Sufficiency) ──┘  │  │
+│  │                            │    │  NeedMoreData: search again  │  │
+│  │                            │    │                              │  │
+│  └────────────────────────────┘    └──────────────────────────────┘  │
+│                │                                  ▲                  │
+│                └─────────── Workspace ────────────┘                  │
+│                                                                      │
+└──────────────────────────────────────────────────────────────────────┘
+```
+
+## Index Pipeline
+
+The Index Pipeline processes documents and builds a searchable tree structure.
+
+### Stages
+
+| Stage | Purpose |
+|-------|---------|
+| **Parse** | Extract content from file (MD, PDF, DOCX, HTML) |
+| **Build** | Construct hierarchical document tree |
+| **Enrich** | Add metadata, TOC, references |
+| **Enhance** | Generate summaries (optional) |
+| **Optimize** | Prune, compress, optimize tree |
+| **Persist** | Save to workspace storage |
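+
+As a rough illustration of the Parse and Build stages, the sketch below turns Markdown heading levels into tree depth. It is a minimal, hypothetical example in plain Rust (std only), not the vectorless implementation: `Node` and `build_tree` are made-up names, and it ignores setext headings and `#` lines inside code fences.
+
+```rust
+// Hypothetical sketch, not the vectorless API: nest Markdown sections by
+// heading level, roughly what a Parse + Build stage produces.
+
+#[derive(Debug)]
+struct Node {
+    title: String,
+    level: usize,        // 0 for the document root, 1 for "#", 2 for "##", ...
+    content: String,     // body text directly under this heading
+    children: Vec<Node>,
+}
+
+impl Node {
+    fn new(title: &str, level: usize) -> Self {
+        Node { title: title.to_string(), level, content: String::new(), children: Vec::new() }
+    }
+}
+
+/// Builds a section tree from ATX headings ("#" through "######").
+fn build_tree(markdown: &str) -> Node {
+    // Stack of open sections; index 0 is always the document root.
+    let mut stack = vec![Node::new("root", 0)];
+    for line in markdown.lines() {
+        let hashes = line.chars().take_while(|&c| c == '#').count();
+        if (1..=6).contains(&hashes) && line[hashes..].starts_with(' ') {
+            // A new heading closes every open section at the same or deeper level.
+            while stack.last().unwrap().level >= hashes {
+                let done = stack.pop().unwrap();
+                stack.last_mut().unwrap().children.push(done);
+            }
+            stack.push(Node::new(line[hashes..].trim(), hashes));
+        } else if !line.trim().is_empty() {
+            // Body text belongs to the innermost open section.
+            let current = stack.last_mut().unwrap();
+            if !current.content.is_empty() {
+                current.content.push('\n');
+            }
+            current.content.push_str(line.trim());
+        }
+    }
+    // Fold the sections still open at end of input into their parents.
+    while stack.len() > 1 {
+        let done = stack.pop().unwrap();
+        stack.last_mut().unwrap().children.push(done);
+    }
+    stack.pop().unwrap()
+}
+
+fn main() {
+    let doc = "# Manual\nIntro.\n## Install\nRun cargo.\n## Configure\n### Auth\nSet a key.\n";
+    let tree = build_tree(doc);
+    let manual = &tree.children[0];
+    println!("{} has {} sections", manual.title, manual.children.len()); // Manual has 2 sections
+}
+```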
+
+### Example
+
+```rust
+// Calling `index` runs the full Index Pipeline on the document
+let doc_id = engine.index(IndexContext::from_path("./manual.md")).await?;
+
+// With summary generation
+let doc_id = engine.index(
+    IndexContext::from_path("./manual.md")
+        .with_options(IndexOptions::new().with_summaries())
+).await?;
+```
+
+## Retrieval Pipeline
+
+The Retrieval Pipeline processes queries and retrieves relevant content.
+
+### Stages
+
+| Stage | Purpose |
+|-------|---------|
+| **Analyze** | Analyze query complexity, extract keywords |
+| **Plan** | Select retrieval strategy and algorithm |
+| **Search** | Navigate tree to find candidates |
+| **Evaluate** | Check sufficiency, aggregate content |
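+
+The Search stage's "navigate tree to find candidates" can be pictured as a beam search: descend one level at a time and keep only the best-scoring branches. The sketch below assumes that reading; `beam_search`, the `score` closure (standing in for a Keyword, LLM, or Structure strategy), and the beam width are all hypothetical.
+
+```rust
+// Hypothetical sketch: beam search over a document tree. `score` stands in
+// for whichever strategy (keyword, LLM via Pilot, structure) rates a node.
+
+struct Node {
+    title: String,
+    children: Vec<Node>,
+}
+
+fn node(title: &str, children: Vec<Node>) -> Node {
+    Node { title: title.to_string(), children }
+}
+
+/// Descends one level per round, keeping the `beam_width` best children,
+/// and returns the titles of the leaves it reaches.
+fn beam_search<'a>(root: &'a Node, beam_width: usize, score: impl Fn(&Node) -> f64) -> Vec<&'a str> {
+    let mut frontier: Vec<&'a Node> = vec![root];
+    let mut leaves = Vec::new();
+    while !frontier.is_empty() {
+        let mut next: Vec<&'a Node> = Vec::new();
+        for n in frontier {
+            if n.children.is_empty() {
+                leaves.push(n.title.as_str()); // a leaf is a candidate
+            } else {
+                next.extend(n.children.iter()); // expand this branch
+            }
+        }
+        // Keep only the highest-scoring nodes for the next round.
+        next.sort_by(|a, b| score(*b).total_cmp(&score(*a)));
+        next.truncate(beam_width);
+        frontier = next;
+    }
+    leaves
+}
+
+fn main() {
+    let tree = node("Manual", vec![
+        node("Install", vec![node("Linux", vec![]), node("macOS", vec![])]),
+        node("Security", vec![node("Authentication", vec![]), node("TLS", vec![])]),
+    ]);
+    // Toy relevance scores for the query "How do I configure authentication?"
+    let hits = beam_search(&tree, 1, |n| match n.title.as_str() {
+        "Security" => 0.9,
+        "Authentication" => 0.8,
+        _ => 0.1,
+    });
+    println!("{hits:?}"); // ["Authentication"]
+}
+```
+
+A wider beam trades more scoring calls for a lower chance of pruning the right branch early.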
+
+### The Evaluate Stage
+
+The Evaluate stage decides whether the retrieved content is sufficient to answer the query:
+
+```text
+                    ┌─────────────┐
+                    │   Search    │
+                    └──────┬──────┘
+                           │
+                           ▼
+                    ┌─────────────┐
+                    │  Evaluate   │
+                    └──────┬──────┘
+                           │
+           ┌───────────────┼───────────────┐
+           │               │               │
+           ▼               ▼               ▼
+      Sufficient   PartialSufficient  Insufficient
+           │               │               │
+           ▼               ▼               ▼
+        Return        More Search     Expand Beam
+                     (1 iteration)  (2 iterations)
+```
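+
+One way to read the diagram above as code is a loop with separate retry budgets: one extra search round for `PartialSufficient` and up to two beam expansions for `Insufficient`. This is a hypothetical sketch; treating the "(1 iteration)" and "(2 iterations)" notes as maximum retries, doubling the beam, and the names `evaluate_loop` and `Action` are assumptions.
+
+```rust
+// Hypothetical sketch of the Evaluate loop. The budgets (1 extra search,
+// 2 beam expansions) are one reading of the diagram's iteration notes.
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+enum Sufficiency {
+    Sufficient,
+    PartialSufficient,
+    Insufficient,
+}
+
+#[derive(Debug, PartialEq)]
+enum Action {
+    MoreSearch,
+    ExpandBeam,
+    Return,
+}
+
+/// Calls `search_and_evaluate(beam_width)` until the result is sufficient
+/// or the relevant retry budget is spent, recording each decision.
+fn evaluate_loop(mut search_and_evaluate: impl FnMut(usize) -> Sufficiency) -> Vec<Action> {
+    let (mut searches_left, mut expansions_left) = (1, 2);
+    let mut beam_width = 3; // hypothetical starting beam
+    let mut actions = Vec::new();
+    loop {
+        match search_and_evaluate(beam_width) {
+            Sufficiency::PartialSufficient if searches_left > 0 => {
+                searches_left -= 1;
+                actions.push(Action::MoreSearch);
+            }
+            Sufficiency::Insufficient if expansions_left > 0 => {
+                expansions_left -= 1;
+                beam_width *= 2; // widen the beam before the next round
+                actions.push(Action::ExpandBeam);
+            }
+            // Sufficient, or out of budget: return what has been found.
+            _ => {
+                actions.push(Action::Return);
+                return actions;
+            }
+        }
+    }
+}
+
+fn main() {
+    let verdicts = [
+        Sufficiency::PartialSufficient,
+        Sufficiency::Insufficient,
+        Sufficiency::Sufficient,
+    ];
+    let mut round = 0;
+    let actions = evaluate_loop(|beam| {
+        println!("round {round}: beam width {beam}");
+        let verdict = verdicts[round];
+        round += 1;
+        verdict
+    });
+    println!("{actions:?}"); // [MoreSearch, ExpandBeam, Return]
+}
+```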
+
+### Retrieval Strategies
+
+Three retrieval strategies are built in:
+
+1. **Keyword** - fast, exact matching
+2. **LLM** - semantic understanding via the Pilot
+3. **Structure** - hierarchy-aware navigation
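+
+To make the Keyword strategy concrete, here is a minimal sketch that reduces a query to keywords (roughly what the Analyze stage's keyword extraction might produce) and scores a node by the fraction of keywords its text contains. The stop-word list, `keywords`, and `keyword_score` are hypothetical; the real strategy may tokenize and weight terms differently.
+
+```rust
+// Hypothetical sketch of keyword scoring, not vectorless's actual strategy:
+// the query becomes a set of lowercase keywords, and a node scores the
+// fraction of those keywords that appear in its text.
+
+use std::collections::HashSet;
+
+const STOP_WORDS: &[&str] = &["a", "an", "the", "how", "do", "i", "is", "what", "to", "of"];
+
+/// Splits on non-alphanumeric characters, lowercases, and drops stop words.
+fn keywords(query: &str) -> HashSet<String> {
+    query
+        .split(|c: char| !c.is_alphanumeric())
+        .map(str::to_lowercase)
+        .filter(|w| !w.is_empty() && !STOP_WORDS.contains(&w.as_str()))
+        .collect()
+}
+
+/// Fraction of query keywords found in `text` (0.0 when there are none).
+fn keyword_score(keywords: &HashSet<String>, text: &str) -> f64 {
+    if keywords.is_empty() {
+        return 0.0;
+    }
+    let words: HashSet<String> = text
+        .split(|c: char| !c.is_alphanumeric())
+        .map(str::to_lowercase)
+        .collect();
+    let hits = keywords.iter().filter(|k| words.contains(*k)).count();
+    hits as f64 / keywords.len() as f64
+}
+
+fn main() {
+    let kw = keywords("How do I configure authentication?");
+    for title in ["Authentication setup", "Configure authentication tokens", "Installation"] {
+        println!("{title}: {:.2}", keyword_score(&kw, title));
+    }
+}
+```
+
+Exact word matching is what makes this strategy fast; it also means "auth" will not match "authentication", which is the gap the LLM strategy covers.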
+
+## The Pilot System
+
+Pilot is the "brain" of the Retrieval Pipeline:
+
+- **Query Analysis**: Understands what the user is asking
+- **Context Building**: Creates navigation context from TOC
+- **Decision Making**: Decides which branches to explore
+- **Fallback**: An algorithmic strategy takes over when the LLM fails
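+
+The fallback behavior can be sketched as an ordered list of navigators, tried until one returns a decision. In this hypothetical example `Navigator`, `MockLlm` (a canned reply standing in for a real LLM client), `KeywordNavigator`, and `choose_with_fallback` are illustrative names, not the Pilot's actual types.
+
+```rust
+// Hypothetical sketch of layered fallback: try an LLM-backed navigator first
+// and fall back to a keyword heuristic when it errors. These names are
+// illustrative, not the Pilot's real types.
+
+trait Navigator {
+    /// Returns the index of the branch to explore next.
+    fn choose(&self, query: &str, branches: &[&str]) -> Result<usize, String>;
+}
+
+/// Stands in for an LLM client; replays a canned reply or error.
+struct MockLlm {
+    reply: Result<usize, String>,
+}
+
+impl Navigator for MockLlm {
+    fn choose(&self, _query: &str, _branches: &[&str]) -> Result<usize, String> {
+        self.reply.clone()
+    }
+}
+
+/// Deterministic fallback: pick the branch sharing the most words with the query.
+struct KeywordNavigator;
+
+impl Navigator for KeywordNavigator {
+    fn choose(&self, query: &str, branches: &[&str]) -> Result<usize, String> {
+        let q = query.to_lowercase();
+        let score = |b: &str| b.to_lowercase().split_whitespace().filter(|w| q.contains(*w)).count();
+        (0..branches.len())
+            .max_by_key(|&i| score(branches[i]))
+            .ok_or_else(|| "no branches to choose from".to_string())
+    }
+}
+
+/// Tries each layer in order; returns (layer index, chosen branch) from the first success.
+fn choose_with_fallback(layers: &[&dyn Navigator], query: &str, branches: &[&str]) -> Option<(usize, usize)> {
+    layers
+        .iter()
+        .enumerate()
+        .find_map(|(layer, nav)| nav.choose(query, branches).ok().map(|choice| (layer, choice)))
+}
+
+fn main() {
+    let branches = ["Installation", "Security and authentication"];
+    let llm = MockLlm { reply: Err("request timed out".to_string()) };
+    let keyword = KeywordNavigator;
+    let layers: [&dyn Navigator; 2] = [&llm, &keyword];
+    if let Some((layer, choice)) = choose_with_fallback(&layers, "how does authentication work", &branches) {
+        println!("layer {layer} chose {:?}", branches[choice]); // layer 1 chose "Security and authentication"
+    }
+}
+```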
+
+See [The Pilot System](./pilot-system.md) for details.
+
+## Data Flow
+
+```text
+Document ──► Index Pipeline ──► Workspace
+                                       │
+Query ──► Retrieval Pipeline ──────────┘
+                    │
+                    ▼
+              RetrievalResult
+              ├── content
+              ├── node_ids
+              ├── confidence
+              └── trace
+```
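+
+For orientation, a hypothetical Rust shape for that result is sketched below. The field names come from the diagram, plus `doc_id` from the session example; the field types and the `rank` helper (best-first ordering by `confidence`) are assumptions.
+
+```rust
+// Hypothetical shape of the retrieval result: field names follow the diagram
+// (plus `doc_id`); every type here is an assumption.
+
+#[derive(Debug, Clone)]
+struct RetrievalResult {
+    doc_id: String,       // which indexed document answered
+    content: String,      // aggregated text handed back to the caller
+    node_ids: Vec<usize>, // tree nodes the content came from
+    confidence: f64,      // assumed to lie in 0.0..=1.0
+    trace: Vec<String>,   // navigation steps, useful for debugging
+}
+
+/// Orders results best-first, e.g. when merging answers from several documents.
+fn rank(mut results: Vec<RetrievalResult>) -> Vec<RetrievalResult> {
+    results.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));
+    results
+}
+
+fn main() {
+    let make = |doc: &str, confidence: f64| RetrievalResult {
+        doc_id: doc.to_string(),
+        content: format!("answer from {doc}"),
+        node_ids: vec![1, 4],
+        confidence,
+        trace: vec!["root".to_string(), "Security".to_string()],
+    };
+    for r in rank(vec![make("doc1", 0.42), make("doc2", 0.87)]) {
+        println!("{} ({:.2}) {} nodes={:?} trace={:?}", r.doc_id, r.confidence, r.content, r.node_ids, r.trace);
+    }
+}
+```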
+
+## Session-Based Operations
+
+For multi-document operations, use sessions:
+
+```rust
+// Create a session
+let session = engine.session().await;
+
+// Index multiple documents
+session.index(IndexContext::from_path("./doc1.md")).await?;
+session.index(IndexContext::from_path("./doc2.md")).await?;
+
+// Query across all documents
+let results = session.query_all("What is the architecture?").await?;
+
+for result in results {
+    println!("From {}: {}", result.doc_id, result.content);
+}
+```
+
+## See Also
+
+- [Multi-Strategy Retrieval](./multi-strategy.md)
+- [Content Aggregation](../design/content-aggregation.md)
+- [Sufficiency Checking](./sufficiency.md)
diff --git a/docs/guides/quick-start.md b/docs/guides/quick-start.md
@@ -0,0 +1,89 @@
+# Quick Start Guide
+
+Get up and running with Vectorless in 5 minutes.
+
+## Prerequisites
+
+- Rust 1.85+ installed (the crate uses the 2024 edition)
+- An OpenAI API key (or compatible LLM endpoint)
+
+## Installation
+
+Add to your `Cargo.toml`:
+
+```toml
+[dependencies]
+vectorless = "0.1"
+tokio = { version = "1", features = ["full"] }
+```
+
+## Basic Usage
+
+```rust
+use vectorless::{Engine, IndexContext};
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    // 1. Create an engine with OpenAI
+    let engine = Engine::builder()
+        .with_workspace("./workspace")
+        .with_openai(std::env::var("OPENAI_API_KEY")?)
+        .build()
+        .await?;
+
+    // 2. Index a document
+    let doc_id = engine.index(IndexContext::from_path("./manual.md")).await?;
+    println!("Indexed: {}", doc_id);
+
+    // 3. Query the document
+    let result = engine.query(&doc_id, "How do I configure authentication?").await?;
+    println!("Answer: {}", result.content);
+
+    Ok(())
+}
+```
+
+## Index from Different Sources
+
+```rust
+// From file path
+let id1 = engine.index(IndexContext::from_path("./doc.pdf")).await?;
+
+// From string content
+let html = "<html><body><h1>Title</h1><p>Content</p></body></html>";
+let id2 = engine.index(
+    IndexContext::from_content(html, vectorless::parser::DocumentFormat::Html)
+        .with_name("webpage")
+).await?;
+
+// From bytes (e.g., a file read into memory or an HTTP response body)
+let pdf_bytes = std::fs::read("./document.pdf")?;
+let id3 = engine.index(
+    IndexContext::from_bytes(pdf_bytes, vectorless::parser::DocumentFormat::Pdf)
+).await?;
+```
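+
+`from_path` takes no format argument, so the format is presumably inferred from the file name. A minimal sketch of extension-based detection follows; the local `DocumentFormat` enum stands in for `vectorless::parser::DocumentFormat` (only its `Html` and `Pdf` variants appear above), and the `Markdown`/`Docx` variant names and the `detect_format` helper are assumptions.
+
+```rust
+// Hypothetical sketch: infer a document format from the file extension.
+// `DocumentFormat` is a local stand-in, not vectorless's real enum.
+
+use std::path::Path;
+
+#[derive(Debug, PartialEq)]
+enum DocumentFormat {
+    Markdown,
+    Pdf,
+    Docx,
+    Html,
+}
+
+/// Returns None for a missing or unknown extension, in which case the caller
+/// would have to name the format explicitly (as `from_content` does).
+fn detect_format(path: &Path) -> Option<DocumentFormat> {
+    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
+    match ext.as_str() {
+        "md" | "markdown" => Some(DocumentFormat::Markdown),
+        "pdf" => Some(DocumentFormat::Pdf),
+        "docx" => Some(DocumentFormat::Docx),
+        "html" | "htm" => Some(DocumentFormat::Html),
+        _ => None,
+    }
+}
+
+fn main() {
+    for p in ["./doc.pdf", "./manual.md", "./page.HTML", "./notes.txt"] {
+        println!("{p}: {:?}", detect_format(Path::new(p)));
+    }
+}
+```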
+
+## Index Modes
+
+```rust
+use vectorless::IndexMode;
+
+// Default: Skip if already indexed
+engine.index(IndexContext::from_path("./doc.md")).await?;
+
+// Force: Always re-index
+engine.index(
+    IndexContext::from_path("./doc.md").with_mode(IndexMode::Force)
+).await?;
+
+// Incremental: Only re-index if changed
+engine.index(
+    IndexContext::from_path("./doc.md").with_mode(IndexMode::Incremental)
+).await?;
+```
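+
+The three modes boil down to a per-document decision. The sketch below is hypothetical: it fingerprints the source bytes and stores the hash per path, so `Incremental` re-indexes only when the hash changes. The `Default` variant name, `should_index`, and the in-memory map are assumptions; vectorless may detect changes differently.
+
+```rust
+// Hypothetical sketch of the three index modes, not vectorless's code.
+// DefaultHasher keeps the example short, but its output is not guaranteed
+// stable across Rust releases; a persisted index would want a stable hash.
+
+use std::collections::hash_map::DefaultHasher;
+use std::collections::HashMap;
+use std::hash::{Hash, Hasher};
+
+#[derive(Clone, Copy)]
+enum IndexMode {
+    Default,
+    Force,
+    Incremental,
+}
+
+fn fingerprint(bytes: &[u8]) -> u64 {
+    let mut h = DefaultHasher::new();
+    bytes.hash(&mut h);
+    h.finish()
+}
+
+/// Returns true when the document should be (re)indexed, recording its fingerprint.
+fn should_index(seen: &mut HashMap<String, u64>, path: &str, bytes: &[u8], mode: IndexMode) -> bool {
+    let new_fp = fingerprint(bytes);
+    let decision = match (mode, seen.get(path)) {
+        (IndexMode::Force, _) => true,                         // always re-index
+        (_, None) => true,                                     // never indexed before
+        (IndexMode::Default, Some(_)) => false,                // skip if already indexed
+        (IndexMode::Incremental, Some(&old)) => old != new_fp, // only if changed
+    };
+    if decision {
+        seen.insert(path.to_string(), new_fp);
+    }
+    decision
+}
+
+fn main() {
+    let mut seen = HashMap::new();
+    let v1 = b"# Doc\nfirst draft";
+    let v2 = b"# Doc\nsecond draft";
+    println!("{}", should_index(&mut seen, "./doc.md", v1, IndexMode::Default));     // true: first time
+    println!("{}", should_index(&mut seen, "./doc.md", v2, IndexMode::Default));     // false: already indexed
+    println!("{}", should_index(&mut seen, "./doc.md", v2, IndexMode::Incremental)); // true: content changed
+    println!("{}", should_index(&mut seen, "./doc.md", v2, IndexMode::Incremental)); // false: unchanged
+    println!("{}", should_index(&mut seen, "./doc.md", v2, IndexMode::Force));       // true: forced
+}
+```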
+
+## Next Steps
+
+- [Understanding the Dual Pipeline](./dual-pipeline.md) - Learn how Vectorless works
+- [Indexing Documents](./indexing.md) - Deep dive into document indexing
+- [Querying Documents](./querying.md) - Advanced query techniques