# NexusRAG Quick Start Guide

This notebook demonstrates the core NexusRAG workflow:
1. **Ingest** documents into the knowledge base
2. **Query** the system with natural language questions
3. **Inspect** sources and confidence scores
4. **Manage** documents (list, delete, clear)

In [None]:
import sys
sys.path.insert(0, '..')  # make the parent directory importable so the nexusrag package resolves

from nexusrag.pipeline import NexusRAG
from nexusrag.config import get_settings

## 1. Initialize the Pipeline

NexusRAG uses lazy loading: components are initialized only when first accessed.
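
As a rough illustration of the lazy-loading pattern, here is a minimal sketch using `functools.cached_property` from the standard library. The `LazyPipeline` class and its `embedder` attribute are hypothetical stand-ins, not NexusRAG's actual internals, which may work differently.

```python
from functools import cached_property


class LazyPipeline:
    """Hypothetical stand-in for a pipeline whose components are built lazily."""

    def __init__(self):
        # Counts how many times the component has been built.
        self.load_count = 0

    @cached_property
    def embedder(self):
        # Runs only on first access; the result is then cached on the instance.
        self.load_count += 1
        return {"name": "stub-embedder"}


pipeline = LazyPipeline()
built_at_construction = pipeline.load_count  # nothing is built yet
first = pipeline.embedder                    # first access builds the component
second = pipeline.embedder                   # later accesses reuse the cached object
```

With this pattern, constructing the object stays cheap, and the cost of loading a model is paid only by code paths that actually use it.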

In [None]:
settings = get_settings()
rag = NexusRAG(settings)

print(f"LLM model: {settings.llm.model}")
print(f"Embedding model: {settings.embedding.model}")
print(f"Storage path: {settings.storage.lancedb_path}")

## 2. Ingest Documents

Ingest PDF, DOCX, TXT, or MD files. The pipeline automatically:
- Parses the document and extracts text
- Chunks content into semantic segments
- Generates embeddings for retrieval
- Stores chunks in the vector database
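
To make the chunking step above more concrete, here is a simplified sketch that packs whole paragraphs into chunks under a word budget. The `chunk_text` helper and its `max_words` parameter are hypothetical and shown only for illustration; NexusRAG's own chunker may use a different strategy.

```python
def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Pack whole paragraphs into chunks, targeting max_words words per chunk.

    A single paragraph longer than max_words is kept whole rather than split.
    """
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue  # skip empty paragraphs
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + len(words) > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "a b c\n\nd e\n\nf g h i"
sample_chunks = chunk_text(sample, max_words=5)
print(sample_chunks)  # ['a b c d e', 'f g h i']
```

Splitting on paragraph boundaries keeps related sentences together, which tends to give the embedding step more coherent inputs than fixed-width slicing.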

In [None]:
# Ingest a single document
result = rag.ingest("../data/sample_paper.pdf")

print(f"Document ID: {result.document_id}")
print(f"Chunks created: {result.chunk_count}")
print(f"Word count: {result.word_count}")
print(f"Success: {result.success}")

## 3. Query the Knowledge Base

NexusRAG uses hybrid retrieval, combining dense (embedding-based) and sparse (term-based) search.
A self-correction step then validates retrieval quality before an answer is generated.
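
This guide does not state how the dense and sparse result lists are merged. One common technique for hybrid retrieval is reciprocal rank fusion (RRF), sketched below as an illustration only. The function name, the chunk IDs, and the constant `k = 60` are assumptions for this example, not NexusRAG's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk IDs; each input list is ordered best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the chunk's fused score.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense = ["c1", "c2", "c3"]   # hypothetical dense (embedding) ranking
sparse = ["c2", "c4", "c1"]  # hypothetical sparse (term-based) ranking
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)  # ['c2', 'c1', 'c4', 'c3']
```

RRF works on ranks rather than raw scores, so it avoids having to normalize dense similarity values against sparse relevance scores that live on different scales.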

In [None]:
response = rag.query("What are the main findings of the study?")

print(f"Answer: {response.answer}\n")
print(f"Confidence: {response.confidence:.2f}")
print(f"Processing time: {response.processing_time_ms:.0f}ms")
print(f"\nSources ({len(response.sources)}):")
for i, source in enumerate(response.sources, 1):
    print(f"  [{i}] {source.document_name} (p.{source.page_number}, score: {source.score:.3f})")
    print(f"      {source.content[:100]}...")

## 4. Manage Documents

In [None]:
# List all documents
docs = rag.list_documents()
for doc in docs:
    print(f"  {doc['id'][:8]}... | {doc.get('filename', 'Unknown')} | {doc.get('word_count', 0)} words")

# System stats
stats = rag.get_stats()
print(f"\nTotal documents: {stats.total_documents}")
print(f"Total chunks: {stats.total_chunks}")
print(f"LLM available: {stats.llm_available}")

In [None]:
# Delete a specific document
# rag.delete_document(result.document_id)

# Or clear everything
# rag.clear_all()