# Retrieval-Augmented Generation (RAG)

**RAG** is a technique that enhances Large Language Models (LLMs) by retrieving relevant information from external knowledge sources before generating a response. Grounding the prompt in retrieved text mitigates three key LLM limitations: hallucinations, outdated knowledge, and missing domain-specific information.

## Table of Contents
1. [RAG Architecture](#1-rag-architecture)
2. [Document Loading](#2-document-loading)
3. [Chunking Strategies](#3-chunking-strategies)
4. [Embedding Models](#4-embedding-models)
5. [Vector Stores](#5-vector-stores)
6. [Retrieval Methods](#6-retrieval-methods)
7. [Reranking](#7-reranking)
8. [Evaluation Metrics](#8-evaluation-metrics)

---
## 1. RAG Architecture

RAG consists of two main phases: **Indexing** (offline) and **Retrieval & Generation** (online).

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           RAG PIPELINE OVERVIEW                              │
└─────────────────────────────────────────────────────────────────────────────┘

╔═══════════════════════════════════════════════════════════════════════════════╗
║                        INDEXING PHASE (Offline)                                ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║                                                                               ║
║   ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────────────┐    ║
║   │ Documents│────▶│  Loader  │────▶│ Chunking │────▶│ Embedding Model  │    ║
║   │ (PDF,    │     │          │     │          │     │                  │    ║
║   │  TXT,    │     │ Extract  │     │ Split    │     │ Text → Vectors   │    ║
║   │  HTML)   │     │ Text     │     │ Text     │     │ [0.1, 0.3, ...]  │    ║
║   └──────────┘     └──────────┘     └──────────┘     └────────┬─────────┘    ║
║                                                                │              ║
║                                                                ▼              ║
║                                                       ┌──────────────────┐    ║
║                                                       │   Vector Store   │    ║
║                                                       │ (FAISS, Chroma,  │    ║
║                                                       │  Pinecone, etc.) │    ║
║                                                       └──────────────────┘    ║
╚═══════════════════════════════════════════════════════════════════════════════╝

╔═══════════════════════════════════════════════════════════════════════════════╗
║                    RETRIEVAL & GENERATION PHASE (Online)                       ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║                                                                               ║
║   ┌──────────┐     ┌──────────┐     ┌──────────────────┐                     ║
║   │  User    │────▶│ Embedding│────▶│   Vector Store   │                     ║
║   │  Query   │     │  Model   │     │                  │                     ║
║   └──────────┘     └──────────┘     │  Similarity      │                     ║
║                                      │  Search          │                     ║
║                                      └────────┬─────────┘                     ║
║                                               │                               ║
║                                               ▼                               ║
║   ┌──────────────────────────────────────────────────────────────────────┐   ║
║   │                    Retrieved Documents (Top-K)                        │   ║
║   │  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐                      │   ║
║   │  │ Doc 1  │  │ Doc 2  │  │ Doc 3  │  │ Doc K  │                      │   ║
║   │  │Score:  │  │Score:  │  │Score:  │  │Score:  │                      │   ║
║   │  │ 0.95   │  │ 0.89   │  │ 0.85   │  │ 0.72   │                      │   ║
║   │  └────────┘  └────────┘  └────────┘  └────────┘                      │   ║
║   └──────────────────────────────────────────────────────────────────────┘   ║
║                                               │                               ║
║                                               ▼                               ║
║                                      ┌──────────────────┐                     ║
║                                      │    Reranker      │ (Optional)          ║
║                                      │ Cross-Encoder    │                     ║
║                                      └────────┬─────────┘                     ║
║                                               │                               ║
║                                               ▼                               ║
║   ┌──────────────────────────────────────────────────────────────────────┐   ║
║   │                         PROMPT TEMPLATE                               │   ║
║   │  ┌────────────────────────────────────────────────────────────────┐  │   ║
║   │  │ Context: {retrieved_documents}                                  │  │   ║
║   │  │                                                                  │  │   ║
║   │  │ Question: {user_query}                                          │  │   ║
║   │  │                                                                  │  │   ║
║   │  │ Answer based on the context above:                              │  │   ║
║   │  └────────────────────────────────────────────────────────────────┘  │   ║
║   └──────────────────────────────────────────────────────────────────────┘   ║
║                                               │                               ║
║                                               ▼                               ║
║                                      ┌──────────────────┐                     ║
║                                      │       LLM        │                     ║
║                                      │  (GPT-4, Claude, │                     ║
║                                      │   Llama, etc.)   │                     ║
║                                      └────────┬─────────┘                     ║
║                                               │                               ║
║                                               ▼                               ║
║                                      ┌──────────────────┐                     ║
║                                      │    Response      │                     ║
║                                      │ (Grounded in     │                     ║
║                                      │  retrieved docs) │                     ║
║                                      └──────────────────┘                     ║
╚═══════════════════════════════════════════════════════════════════════════════╝
```
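
The two phases can be sketched end to end in plain Python. This is a toy illustration, not a production pipeline: the bag-of-words `embed` function stands in for a real embedding model, the list returned by `build_index` stands in for a vector store, and all function names here are hypothetical. The final prompt would be sent to an LLM, which is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# --- Indexing phase (offline): embed every chunk once and store the result ---
def build_index(chunks):
    return [(chunk, embed(chunk)) for chunk in chunks]

# --- Retrieval & generation phase (online) ---
def retrieve(index, query, k=2):
    """Score every stored chunk against the query and keep the top k."""
    q = embed(query)
    scored = [(cosine(q, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(query, retrieved):
    """Fill the prompt template with the retrieved chunks."""
    context = "\n".join(f"- {chunk}" for _, chunk in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer based on the context above:"

chunks = [
    "RAG combines retrieval systems with generative models.",
    "Transformers use self-attention mechanisms.",
    "Vector stores index embeddings for similarity search.",
]
index = build_index(chunks)
query = "How does RAG use retrieval?"
top = retrieve(index, query, k=2)
prompt = build_prompt(query, top)
print(prompt)
```

Indexing runs once per corpus update, while `retrieve` and `build_prompt` run on every query, which is why the two phases are labeled offline and online.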

### Why RAG?

| Challenge | Without RAG | With RAG |
|-----------|-------------|----------|
| **Hallucinations** | LLM may generate plausible but incorrect facts | Responses grounded in retrieved documents |
| **Outdated Knowledge** | Training data has a cutoff date | Access to current/updated documents |
| **Domain-Specific** | Generic knowledge only | Access to proprietary/specialized data |
| **Transparency** | Black-box responses | Can cite sources and provide provenance |
| **Cost** | Fine-tuning expensive for updates | Update knowledge base without retraining |

In [None]:
# Install required packages
# !pip install langchain langchain-openai langchain-community langchain-experimental chromadb faiss-cpu
# !pip install llama-index llama-index-embeddings-openai llama-index-embeddings-huggingface llama-index-vector-stores-chroma
# !pip install sentence-transformers unstructured pypdf tiktoken beautifulsoup4
# !pip install ragas datasets

In [None]:
import os
from typing import List, Dict, Any

# Set up API keys (use environment variables in production)
# os.environ["OPENAI_API_KEY"] = "your-api-key"

---
## 2. Document Loading

The first step in RAG is loading documents from various sources. Both LangChain and LlamaIndex provide extensive document loaders.

```
┌─────────────────────────────────────────────────────────────────────┐
│                      DOCUMENT LOADERS                                │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
│  │    PDF      │  │    HTML     │  │  Markdown   │  │    CSV      │ │
│  │  (.pdf)     │  │  (.html)    │  │   (.md)     │  │   (.csv)    │ │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘ │
│         │                │                │                │        │
│         ▼                ▼                ▼                ▼        │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │                     Document Loader                              ││
│  │            (Extracts text + metadata)                           ││
│  └─────────────────────────────────────────────────────────────────┘│
│         │                                                           │
│         ▼                                                           │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │  Document(page_content="...", metadata={"source": "...", ...}) ││
│  └─────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────┘
```
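
To make the loader contract concrete, here is a minimal sketch of a loader that turns text files into objects with content plus metadata. `SimpleDocument` and `load_text_files` are hypothetical names used only for illustration; they mimic the shape shown in the diagram and are not part of LangChain or LlamaIndex.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List

@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a framework Document object."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)

def load_text_files(directory: str, pattern: str = "*.txt") -> List[SimpleDocument]:
    """Recursively read matching files and attach source metadata to each."""
    docs = []
    for path in sorted(Path(directory).rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        docs.append(SimpleDocument(
            page_content=text,
            metadata={"source": str(path), "n_chars": len(text)},
        ))
    return docs

# Demo on a throwaway directory so the example runs without real data
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("first file", encoding="utf-8")
    (Path(tmp) / "notes").mkdir()
    (Path(tmp) / "notes" / "b.txt").write_text("second file", encoding="utf-8")
    docs = load_text_files(tmp)

for doc in docs:
    print(doc.metadata["n_chars"], repr(doc.page_content))
```

Keeping `source` in the metadata from the start is what later allows filtering and citation of retrieved chunks.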

In [None]:
# ============================================================================
# LangChain Document Loaders
# ============================================================================

from langchain_community.document_loaders import (
    PyPDFLoader,
    TextLoader,
    DirectoryLoader,
    UnstructuredHTMLLoader,
    CSVLoader,
    WebBaseLoader,
)

# Load a PDF file
def load_pdf(file_path: str):
    """Load a PDF file and return documents."""
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    print(f"Loaded {len(documents)} pages from PDF")
    return documents

# Load text files from a directory
def load_directory(dir_path: str, glob_pattern: str = "**/*.txt"):
    """Load all text files from a directory recursively."""
    loader = DirectoryLoader(
        dir_path,
        glob=glob_pattern,
        loader_cls=TextLoader,
        show_progress=True
    )
    documents = loader.load()
    print(f"Loaded {len(documents)} documents from directory")
    return documents

# Load from web URLs
def load_web_pages(urls: List[str]):
    """Load content from web pages."""
    loader = WebBaseLoader(urls)
    documents = loader.load()
    print(f"Loaded {len(documents)} web pages")
    return documents

# Example usage (commented out - requires actual files)
# pdf_docs = load_pdf("./data/document.pdf")
# web_docs = load_web_pages(["https://example.com/page1", "https://example.com/page2"])

In [None]:
# ============================================================================
# LlamaIndex Document Loaders (SimpleDirectoryReader)
# ============================================================================

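from typing import List, Optional

from llama_index.core import SimpleDirectoryReader, Document

def load_documents_llamaindex(
    input_dir: Optional[str] = None,
    input_files: Optional[List[str]] = None,
):
    """
    Load documents using LlamaIndex's SimpleDirectoryReader.
    The reader supports many formats (.pdf, .docx, .pptx, .jpg, .png, .mp3, .mp4, etc.);
    this function restricts loading to the extensions listed in required_exts.
    Pass either input_dir or input_files.
    """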
    reader = SimpleDirectoryReader(
        input_dir=input_dir,
        input_files=input_files,
        recursive=True,
        required_exts=[".pdf", ".txt", ".md", ".docx"],  # Optional filter
    )
    documents = reader.load_data()
    print(f"Loaded {len(documents)} documents with LlamaIndex")
    return documents

# Create documents from text (useful for testing)
def create_sample_documents():
    """Create sample documents for demonstration."""
    texts = [
        "Machine learning is a subset of artificial intelligence that enables systems to learn from data.",
        "Deep learning uses neural networks with multiple layers to model complex patterns in data.",
        "Natural language processing (NLP) focuses on the interaction between computers and human language.",
        "Transformers are a type of neural network architecture that uses self-attention mechanisms.",
        "RAG combines retrieval systems with generative models to produce more accurate responses.",
    ]
    
    documents = [Document(text=text, metadata={"source": f"doc_{i}"}) for i, text in enumerate(texts)]
    return documents

# Create sample documents for later use
sample_docs = create_sample_documents()
print(f"Created {len(sample_docs)} sample documents")

---
## 3. Chunking Strategies

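Documents must be split into smaller chunks before embedding, because each chunk is retrieved and passed to the LLM as a unit. Chunks that are too large dilute the embedding with unrelated content and consume context window; chunks that are too small lose the surrounding context needed to answer a question.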

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         CHUNKING STRATEGIES                                  │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│ 1. FIXED-SIZE CHUNKING                                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Original:  [████████████████████████████████████████████████████████████]  │
│                                                                              │
│  Chunks:    [████████] [████████] [████████] [████████] [████████]          │
│              chunk_1    chunk_2    chunk_3    chunk_4    chunk_5            │
│                                                                              │
│  ⚠️ May split mid-sentence or mid-concept                                   │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│ 2. RECURSIVE CHARACTER SPLITTING                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Separators (in order): ["\n\n", "\n", " ", ""]                             │
│                                                                              │
│  Original:                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │ Paragraph 1: Lorem ipsum dolor sit amet...                          │    │
│  │                                                                     │    │
│  │ Paragraph 2: Consectetur adipiscing elit...                         │    │
│  │                                                                     │    │
│  │ Paragraph 3: Sed do eiusmod tempor...                               │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                          │                                                   │
│                          ▼                                                   │
│  ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐       │
│  │ Chunk 1            │ │ Chunk 2            │ │ Chunk 3            │       │
│  │ (Paragraph 1)      │ │ (Paragraph 2)      │ │ (Paragraph 3)      │       │
│  └────────────────────┘ └────────────────────┘ └────────────────────┘       │
│                                                                              │
│  ✅ Respects document structure                                             │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│ 3. SEMANTIC CHUNKING                                                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Sentences:  S1   S2   S3   S4   S5   S6   S7   S8   S9   S10               │
│              │    │    │    │    │    │    │    │    │    │                 │
│              ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼                 │
│  Embeddings: E1   E2   E3   E4   E5   E6   E7   E8   E9   E10               │
│                                                                              │
│  Similarity: ───0.9──0.85─┼─0.3─┼─0.88──0.92──0.87─┼─0.25─┼─0.9──          │
│                           │     │                  │      │                 │
│                        break  break             break  break                │
│                                                                              │
│  Chunks:    [S1, S2, S3] [S4] [S5, S6, S7, S8] [S9] [S10]                   │
│               Topic A    Topic B    Topic C    Topic D  Topic E             │
│                                                                              │
│  ✅ Maintains semantic coherence                                            │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│ 4. OVERLAP (Applied to any strategy)                                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Without overlap:   [████████] [████████] [████████]                        │
│                                                                              │
│  With overlap:      [████████████]                                          │
│                          [████████████]                                      │
│                              [████████████]                                  │
│                     └──overlap──┘                                            │
│                                                                              │
│  ✅ Preserves context at chunk boundaries                                   │
└─────────────────────────────────────────────────────────────────────────────┘
```
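
Fixed-size chunking with overlap (strategies 1 and 4 in the diagram) is simple enough to write by hand. The sketch below works on characters and uses a hypothetical helper name, `chunk_fixed`; a token-based variant would slice a token list the same way.

```python
def chunk_fixed(text, chunk_size, overlap=0):
    """Slice text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters, so each new window
    starts chunk_size - overlap characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

plain = chunk_fixed("abcdefghij", chunk_size=4)
overlapped = chunk_fixed("abcdefghij", chunk_size=4, overlap=1)
print(plain)       # ['abcd', 'efgh', 'ij']
print(overlapped)  # ['abcd', 'defg', 'ghij']
```

With overlap, the boundary characters `d` and `g` appear in two chunks each, which is the context preservation the diagram describes.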

In [None]:
# ============================================================================
# LangChain Text Splitters
# ============================================================================

from langchain.text_splitter import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
    TokenTextSplitter,
)

# Sample document for demonstration
sample_text = """
Machine Learning Fundamentals

Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It focuses on developing algorithms that can access data and use it to learn for themselves.

Types of Machine Learning

Supervised Learning: The algorithm learns from labeled training data and makes predictions. Examples include classification and regression tasks.

Unsupervised Learning: The algorithm finds patterns in unlabeled data. Common techniques include clustering and dimensionality reduction.

Reinforcement Learning: The algorithm learns by interacting with an environment and receiving rewards or penalties for its actions.

Deep Learning

Deep learning is a subset of machine learning that uses neural networks with multiple layers (deep neural networks) to model complex patterns. It has achieved remarkable success in image recognition, natural language processing, and game playing.
"""

print(f"Original text length: {len(sample_text)} characters")

In [None]:
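# 1. Character Text Splitter (splits on a single separator, then merges pieces up to chunk_size)
# Note: a piece longer than chunk_size is kept whole, so some chunks here can exceed 200 characters.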
char_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=200,
    chunk_overlap=50,
    length_function=len,
)

char_chunks = char_splitter.split_text(sample_text)
print(f"Character splitter produced {len(char_chunks)} chunks")
print(f"First chunk ({len(char_chunks[0])} chars):\n{char_chunks[0][:200]}...")

In [None]:
# 2. Recursive Character Text Splitter (Recommended for most use cases)
recursive_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
    length_function=len,
    separators=["\n\n", "\n", " ", ""],  # Try to split on paragraphs first
)

recursive_chunks = recursive_splitter.split_text(sample_text)
print(f"Recursive splitter produced {len(recursive_chunks)} chunks\n")

for i, chunk in enumerate(recursive_chunks):
    print(f"Chunk {i+1} ({len(chunk)} chars):")
    print(f"{chunk[:100]}..." if len(chunk) > 100 else chunk)
    print("-" * 50)
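
The idea behind recursive splitting can be sketched in a few lines: try the coarsest separator first, merge adjacent pieces while they fit, and fall back to a finer separator only for pieces that are still too long. This is a simplified illustration under that assumption, not LangChain's implementation; the hypothetical `recursive_split` below omits overlap and other details.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Separators are tried in order; an empty separator means a hard cut.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed positions
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging while the chunk still fits
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks

demo = recursive_split("aaa bbb\n\nccc ddd eee fff", chunk_size=8)
print(demo)  # ['aaa bbb', 'ccc ddd', 'eee fff']
```

The first paragraph fits and stays whole; the second is too long, so only that paragraph is split further on spaces.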

In [None]:
# 3. Token-based Splitter (Useful for LLM context windows)
token_splitter = TokenTextSplitter(
    chunk_size=100,  # 100 tokens
    chunk_overlap=20,  # 20 token overlap
)

token_chunks = token_splitter.split_text(sample_text)
print(f"Token splitter produced {len(token_chunks)} chunks")

In [None]:
# ============================================================================
# Semantic Chunking (LangChain Experimental)
# ============================================================================

from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# Note: Requires OpenAI API key
def semantic_chunking(text: str):
    """
    Split text based on semantic similarity between sentences.
    Groups semantically similar sentences together.
    """
    embeddings = OpenAIEmbeddings()
    
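    # Breakpoint types (each sets a threshold on the embedding distance
    # between adjacent sentences; a split happens where the distance exceeds it):
    # - "percentile": threshold is the given percentile of all distances
    # - "standard_deviation": threshold is a number of standard deviations above the mean
    # - "interquartile": threshold is derived from the interquartile range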
    
    semantic_splitter = SemanticChunker(
        embeddings,
        breakpoint_threshold_type="percentile",
        breakpoint_threshold_amount=95,  # Lower = more chunks
    )
    
    chunks = semantic_splitter.split_text(text)
    return chunks

# Example usage (requires API key)
# semantic_chunks = semantic_chunking(sample_text)
# print(f"Semantic chunking produced {len(semantic_chunks)} chunks")
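
The mechanism can be illustrated without an API key. The sketch below is a deliberately simplified version: it uses a toy bag-of-words embedding and a fixed similarity threshold, whereas `SemanticChunker` uses a real embedding model and statistical breakpoints. The names `toy_embed` and `semantic_chunks` are hypothetical.

```python
import math
import re
from collections import Counter

def toy_embed(sentence):
    """Toy embedding: word counts (a real system would call an embedding model)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Start a new chunk wherever adjacent-sentence similarity drops below threshold."""
    if not sentences:
        return []
    vectors = [toy_embed(s) for s in sentences]
    groups = [[sentences[0]]]
    for prev_vec, cur_vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, cur_vec) < threshold:
            groups.append([sentence])    # topic shift: open a new chunk
        else:
            groups[-1].append(sentence)  # same topic: extend the current chunk
    return [" ".join(group) for group in groups]

sentences = [
    "Neural networks learn patterns from data.",
    "Deep neural networks stack many layers.",
    "Pasta should be cooked in salted water.",
    "Salted water makes pasta taste better.",
]
toy_chunks = semantic_chunks(sentences)
for chunk in toy_chunks:
    print(chunk)
```

The two neural-network sentences share vocabulary and stay together, while the jump to cooking shares none and triggers a break.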

In [None]:
# ============================================================================
# LlamaIndex Chunking (Node Parsers)
# ============================================================================

from llama_index.core.node_parser import (
    SentenceSplitter,
    SemanticSplitterNodeParser,
    HierarchicalNodeParser,
)
from llama_index.core import Document as LlamaDocument

# Create a LlamaIndex document
llama_doc = LlamaDocument(text=sample_text)

# 1. Sentence Splitter (Similar to RecursiveCharacterTextSplitter, but prefers
#    sentence boundaries; chunk_size and chunk_overlap are measured in tokens)
sentence_splitter = SentenceSplitter(
    chunk_size=256,
    chunk_overlap=32,
)

nodes = sentence_splitter.get_nodes_from_documents([llama_doc])
print(f"SentenceSplitter produced {len(nodes)} nodes\n")

for i, node in enumerate(nodes[:3]):  # Show first 3
    print(f"Node {i+1}: {node.text[:100]}...")
    print("-" * 50)

In [None]:
# 2. Hierarchical Node Parser (Creates parent-child relationships)
# Useful for "small-to-big" retrieval strategies

hierarchical_parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[512, 256, 128],  # Parent → Child → Grandchild
)

hierarchical_nodes = hierarchical_parser.get_nodes_from_documents([llama_doc])
print(f"Hierarchical parser produced {len(hierarchical_nodes)} nodes")

# Show node hierarchy (top-level nodes have no parent)
for node in hierarchical_nodes[:5]:
    parent_id = f"{node.parent_node.node_id[:8]}..." if node.parent_node else "None"
    print(f"Node: {node.node_id[:8]}... | Parent: {parent_id} | Size: {len(node.text)} chars")

### Chunking Best Practices

| Factor | Recommendation |
|--------|----------------|
| **Chunk Size** | 256-512 tokens for most use cases; smaller (128-256) for precise retrieval |
| **Overlap** | 10-20% of chunk size to preserve context at boundaries |
| **Strategy** | Start with Recursive/Sentence splitting; use Semantic for topic-heavy docs |
| **Metadata** | Preserve source, page number, section headers for filtering |
| **Evaluation** | Test retrieval quality with different settings |
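
Overlap is not free: every overlapping token is embedded and stored twice. A rough way to see the cost is to estimate how many chunks a document produces for a given chunk size and overlap. The helper below is a hypothetical sketch that assumes fixed-size sliding windows; structure-aware splitters will produce somewhat different counts.

```python
def estimate_chunk_count(n_tokens, chunk_size, overlap=0):
    """Number of sliding windows needed to cover n_tokens.

    Each window advances by (chunk_size - overlap) tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if n_tokens <= 0:
        return 0
    if n_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    # Integer ceiling of (n_tokens - overlap) / step
    return -(-(n_tokens - overlap) // step)

no_overlap = estimate_chunk_count(10_000, chunk_size=512, overlap=0)
with_overlap = estimate_chunk_count(10_000, chunk_size=512, overlap=64)
print(no_overlap, with_overlap)  # 20 23
```

For a 10,000-token document, a 64-token overlap (12.5% of 512) raises the count from 20 to 23 chunks, or about 15% more embeddings to compute and store.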

---
## 4. Embedding Models

Embedding models convert text into dense vector representations that capture semantic meaning.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         EMBEDDING PROCESS                                    │
└─────────────────────────────────────────────────────────────────────────────┘

  Text Input                    Embedding Model                  Vector Output
  ───────────                   ───────────────                  ─────────────
                                                                              
  "Machine learning           ┌─────────────────┐              [0.023, -0.156,
   is a subset of      ────▶  │ Transformer     │  ────▶        0.892, 0.034,
   artificial                 │ Encoder         │               -0.445, 0.721,
   intelligence"              │ (BERT, etc.)    │               ..., 0.089]
                              └─────────────────┘               (d dimensions)

┌─────────────────────────────────────────────────────────────────────────────┐
│                     SIMILARITY IN VECTOR SPACE                               │
└─────────────────────────────────────────────────────────────────────────────┘

                    ▲ Dimension 2
                    │
                    │     • "deep learning"
                    │    ╱
                    │   ╱  (high similarity)
                    │  ╱
                    │ • "machine learning"
                    │
                    │
                    │                    • "cooking recipes"
                    │                   (low similarity)
                    │
                    └──────────────────────────────────────▶ Dimension 1

  Similarity Measures:
  • Cosine Similarity: cos(θ) = (A · B) / (||A|| × ||B||)
  • Euclidean Distance: √(Σ(aᵢ - bᵢ)²)
  • Dot Product: A · B = Σ(aᵢ × bᵢ)
```

In [None]:
# ============================================================================
# Embedding Models Comparison
# ============================================================================

from dataclasses import dataclass
from typing import Optional

@dataclass
class EmbeddingModelInfo:
    name: str
    provider: str
    dimensions: int
    max_tokens: int
    use_case: str
    cost: str

embedding_models = [
    EmbeddingModelInfo("text-embedding-3-large", "OpenAI", 3072, 8191, "High accuracy, production", "$$"),
    EmbeddingModelInfo("text-embedding-3-small", "OpenAI", 1536, 8191, "Cost-effective, good quality", "$"),
    EmbeddingModelInfo("text-embedding-ada-002", "OpenAI", 1536, 8191, "Legacy, widely used", "$"),
    EmbeddingModelInfo("all-MiniLM-L6-v2", "Sentence-Transformers", 384, 256, "Fast, lightweight, local", "Free"),
    EmbeddingModelInfo("all-mpnet-base-v2", "Sentence-Transformers", 768, 384, "High quality, local", "Free"),
    EmbeddingModelInfo("bge-large-en-v1.5", "BAAI", 1024, 512, "Strong open-source retrieval", "Free"),
    EmbeddingModelInfo("e5-large-v2", "Microsoft", 1024, 512, "Excellent retrieval", "Free"),
    EmbeddingModelInfo("voyage-large-2", "Voyage AI", 1536, 16000, "Long context, high quality", "$$"),
    EmbeddingModelInfo("embed-english-v3.0", "Cohere", 1024, 512, "English-only; embed-multilingual-v3.0 covers other languages", "$$"),
]

print("Popular Embedding Models:\n")
print(f"{'Model':<25} {'Provider':<20} {'Dims':<8} {'Max Tokens':<12} {'Cost'}")
print("=" * 80)
for model in embedding_models:
    print(f"{model.name:<25} {model.provider:<20} {model.dimensions:<8} {model.max_tokens:<12} {model.cost}")
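
The dimension count matters beyond quality, because it sets the raw size of the index. As a back-of-the-envelope sketch, assuming vectors are stored as uncompressed 32-bit floats and ignoring index structures and metadata (the helper name `index_size_bytes` is hypothetical):

```python
def index_size_bytes(n_vectors, dimensions, bytes_per_value=4):
    """Raw storage for n_vectors dense vectors (float32 by default)."""
    return n_vectors * dimensions * bytes_per_value

sizes = {}
for name, dims in [
    ("all-MiniLM-L6-v2", 384),
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
]:
    sizes[name] = index_size_bytes(1_000_000, dims)
    gib = sizes[name] / 2**30
    print(f"{name:<25} {dims:>5} dims  ~{gib:.2f} GiB per 1M vectors")
```

Going from 384 to 3072 dimensions multiplies raw storage by eight, which is one reason smaller models or reduced dimensions are attractive for large corpora.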

In [None]:
# ============================================================================
# LangChain Embeddings
# ============================================================================

from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings

# OpenAI Embeddings (requires API key)
def get_openai_embeddings():
    embeddings = OpenAIEmbeddings(
        model="text-embedding-3-small",
        # dimensions=512,  # Can reduce dimensions for efficiency
    )
    return embeddings

# Local HuggingFace Embeddings (free, no API key needed)
def get_local_embeddings():
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={'device': 'cpu'},  # or 'cuda' for GPU
        encode_kwargs={'normalize_embeddings': True},  # For cosine similarity
    )
    return embeddings

# Example: Generate embeddings
# embeddings_model = get_local_embeddings()
# query_embedding = embeddings_model.embed_query("What is machine learning?")
# doc_embeddings = embeddings_model.embed_documents(["Doc 1 text", "Doc 2 text"])
# print(f"Query embedding dimension: {len(query_embedding)}")

In [None]:
# ============================================================================
# LlamaIndex Embeddings
# ============================================================================

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# OpenAI Embedding
def get_llamaindex_openai_embedding():
    embed_model = OpenAIEmbedding(
        model="text-embedding-3-small",
        # dimensions=512,
    )
    return embed_model

# Local HuggingFace Embedding
def get_llamaindex_local_embedding():
    embed_model = HuggingFaceEmbedding(
        model_name="BAAI/bge-small-en-v1.5",
    )
    return embed_model

# Example usage
# embed_model = get_llamaindex_local_embedding()
# embedding = embed_model.get_text_embedding("What is deep learning?")
# print(f"Embedding dimension: {len(embedding)}")

In [None]:
# ============================================================================
# Compute Similarity Between Embeddings
# ============================================================================

import numpy as np
from typing import List

def cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    """Compute cosine similarity between two vectors (undefined if either is all zeros)."""
    v1, v2 = np.array(vec1), np.array(vec2)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def euclidean_distance(vec1: List[float], vec2: List[float]) -> float:
    """Compute Euclidean distance between two vectors."""
    v1, v2 = np.array(vec1), np.array(vec2)
    return float(np.linalg.norm(v1 - v2))

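# Demo with random vectors (replace with actual embeddings).
# Independent random high-dimensional vectors are nearly orthogonal,
# so expect a cosine similarity close to 0 here.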
vec_a = np.random.randn(384).tolist()
vec_b = np.random.randn(384).tolist()

print(f"Cosine Similarity: {cosine_similarity(vec_a, vec_b):.4f}")
print(f"Euclidean Distance: {euclidean_distance(vec_a, vec_b):.4f}")
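
When embeddings are normalized to unit length, the three similarity measures carry the same information: the dot product equals the cosine, and the squared Euclidean distance equals 2 - 2·cos(θ). The small pure-Python check below illustrates this on a hand-picked pair of vectors; the helper names are chosen for this example only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = normalize(a), normalize(b)

cos_ab = cosine(a, b)             # 24 / 25
dot_unit = dot(ua, ub)            # same value as the cosine
sq_dist = squared_euclidean(ua, ub)

print(f"cosine(a, b)          = {cos_ab:.4f}")
print(f"dot(unit a, unit b)   = {dot_unit:.4f}")
print(f"||ua - ub||^2         = {sq_dist:.4f}")
print(f"2 - 2 * cosine        = {2 - 2 * cos_ab:.4f}")
```

This is why ranking by dot product, cosine, or Euclidean distance returns the same order once vectors are normalized, and why normalizing at embedding time lets a store use the cheaper dot product.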

---
## 5. Vector Stores

Vector stores are specialized databases optimized for storing and querying high-dimensional vectors.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         VECTOR STORE ARCHITECTURE                            │
└─────────────────────────────────────────────────────────────────────────────┘

  Documents                    Vector Store                     Query
  ─────────                    ────────────                     ─────

  ┌─────────┐                 ┌─────────────────────────┐      ┌─────────┐
  │ Doc 1   │──▶ Embed ──▶   │  ┌─────────────────────┐│      │ Query   │
  └─────────┘                 │  │ Vector Index        ││      └────┬────┘
  ┌─────────┐                 │  │ ┌───┐ ┌───┐ ┌───┐  ││           │
  │ Doc 2   │──▶ Embed ──▶   │  │ │ V1│ │ V2│ │ V3│  ││           ▼
  └─────────┘                 │  │ └───┘ └───┘ └───┘  ││       Embed
  ┌─────────┐                 │  │ ┌───┐ ┌───┐ ┌───┐  ││           │
  │ Doc 3   │──▶ Embed ──▶   │  │ │ V4│ │ V5│ │ V6│  ││           ▼
  └─────────┘                 │  │ └───┘ └───┘ └───┘  ││      ┌─────────┐
       ⋮                      │  └─────────────────────┘│      │ Query   │
  ┌─────────┐                 │                          │      │ Vector  │
  │ Doc N   │──▶ Embed ──▶   │  ┌─────────────────────┐│      └────┬────┘
  └─────────┘                 │  │ Metadata Storage    ││           │
                              │  │ • source: file.pdf  ││           ▼
                              │  │ • page: 5           ││   ┌─────────────┐
                              │  │ • date: 2024-01-15  ││   │ ANN Search  │
                              │  └─────────────────────┘│   └──────┬──────┘
                              └─────────────────────────┘          │
                                                                   ▼
                                                          ┌─────────────┐
                                                          │ Top-K Docs  │
                                                          │ + Scores    │
                                                          └─────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│                    APPROXIMATE NEAREST NEIGHBOR (ANN)                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Algorithm            Description                      Trade-off             │
│  ─────────            ───────────                      ─────────             │
│  HNSW                 Hierarchical graph navigation    Memory ↑, Speed ↑    │
│  IVF                  Inverted file with clusters      Balanced             │
│  PQ                   Product quantization             Memory ↓, Accuracy ↓ │
│  Flat                 Brute force (exact)              Slow, Accurate       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```
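
The "Flat" row in the table above is the baseline that the other index types approximate: score the query against every stored vector and keep the best k. Here is a minimal standard-library sketch, assuming cosine similarity as the metric. `flat_search`, `cosine`, and the toy three-dimensional vectors are made up for illustration; a real store would delegate this to an optimized library.

```python
import heapq
import math
from typing import Dict, List, Tuple

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def flat_search(index: Dict[str, List[float]], query: List[float], k: int) -> List[Tuple[str, float]]:
    """Exact top-k search: one similarity computation per stored vector, O(N)."""
    scored = ((doc_id, cosine(vec, query)) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Toy index: doc_a and doc_b point in nearly the same direction as the query
toy_index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
    "doc_d": [0.0, 0.0, 1.0],
}
top_hits = flat_search(toy_index, query=[1.0, 0.05, 0.0], k=2)
print(top_hits)
```

Because every vector is visited, the result is exact but the cost grows linearly with the collection size. HNSW, IVF, and PQ trade some of that accuracy for lower latency or memory.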

### Vector Store Comparison

| Vector Store | Type | Hosting | Best For | Key Features |
|--------------|------|---------|----------|-------------|
| **FAISS** | Library | Local | Development, high performance | Fast, CPU/GPU support |
| **ChromaDB** | Database | Local/Cloud | Prototyping, simplicity | Easy setup, metadata filtering |
| **Pinecone** | Managed | Cloud | Production, scale | Fully managed, hybrid search |
| **Weaviate** | Database | Self-hosted/Cloud | Semantic search | GraphQL, modules |
| **Qdrant** | Database | Self-hosted/Cloud | Filtering, precision | Rust-based, fast |
| **Milvus** | Database | Self-hosted/Cloud | Large-scale | Distributed, GPU support |
| **pgvector** | Extension | Self-hosted | Postgres users | SQL integration |

In [None]:
# ============================================================================
# LangChain Vector Stores
# ============================================================================

from langchain_community.vectorstores import FAISS, Chroma
from langchain_core.documents import Document

# Create sample documents
documents = [
    Document(page_content="Machine learning enables computers to learn from data.", metadata={"source": "ml_intro", "page": 1}),
    Document(page_content="Deep learning uses neural networks with many layers.", metadata={"source": "dl_intro", "page": 1}),
    Document(page_content="Natural language processing helps computers understand text.", metadata={"source": "nlp_intro", "page": 1}),
    Document(page_content="Computer vision allows machines to interpret images.", metadata={"source": "cv_intro", "page": 1}),
    Document(page_content="Reinforcement learning trains agents through rewards.", metadata={"source": "rl_intro", "page": 1}),
]

print(f"Created {len(documents)} sample documents")

In [None]:
# FAISS Vector Store (requires: pip install faiss-cpu)
# Fast, in-memory, great for development

from langchain_community.embeddings import HuggingFaceEmbeddings

def create_faiss_store(documents: List[Document]):
    """Create a FAISS vector store from documents."""
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    
    # Create vector store
    vectorstore = FAISS.from_documents(documents, embeddings)
    
    # Save to disk (optional)
    # vectorstore.save_local("./faiss_index")
    
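    # Load from disk. Recent LangChain releases require opting in to pickle
    # deserialization explicitly; only do this for index files you trust.
    # vectorstore = FAISS.load_local(
    #     "./faiss_index", embeddings, allow_dangerous_deserialization=True
    # )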
    
    return vectorstore

# Example usage (uncomment to run)
# faiss_store = create_faiss_store(documents)
# print("FAISS store created")

In [None]:
# ChromaDB Vector Store (requires: pip install chromadb)
# Easy to use, supports persistence, metadata filtering

from typing import Optional

def create_chroma_store(documents: List[Document], persist_directory: Optional[str] = None):
    """Create a ChromaDB vector store from documents."""
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    
    # persist_directory=None keeps the collection in memory;
    # a path makes Chroma persist the collection to that directory.
    vectorstore = Chroma.from_documents(
        documents,
        embeddings,
        persist_directory=persist_directory,
        collection_name="my_collection",
    )
    
    return vectorstore

# Example usage (uncomment to run)
# chroma_store = create_chroma_store(documents, "./chroma_db")
# print("ChromaDB store created")

In [None]:
# ============================================================================
# LlamaIndex Vector Stores
# ============================================================================

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core import Document as LlamaDocument
# Requires: pip install llama-index-vector-stores-chroma chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

def create_llamaindex_index(documents: List[LlamaDocument]):
    """Create a LlamaIndex vector store index."""
    # Simple in-memory index
    index = VectorStoreIndex.from_documents(documents)
    return index

def create_llamaindex_chroma_index(documents: List[LlamaDocument], persist_dir: str):
    """Create a LlamaIndex index with ChromaDB backend."""
    # Initialize ChromaDB client
    chroma_client = chromadb.PersistentClient(path=persist_dir)
    chroma_collection = chroma_client.get_or_create_collection("llama_collection")
    
    # Create vector store
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    
    # Create index
    index = VectorStoreIndex.from_documents(
        documents,
        storage_context=storage_context,
    )
    return index

# Example usage (uncomment to run)
# llama_docs = [LlamaDocument(text=doc.page_content) for doc in documents]
# index = create_llamaindex_index(llama_docs)
# print("LlamaIndex created")

---
## 6. Retrieval Methods

Different retrieval strategies offer trade-offs between relevance, diversity, and efficiency.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         RETRIEVAL METHODS                                    │
└─────────────────────────────────────────────────────────────────────────────┘

╔═══════════════════════════════════════════════════════════════════════════════╗
║ 1. SIMILARITY SEARCH                                                           ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║                                                                               ║
║   Query Vector ──────────────────────────────┐                               ║
║                                              │                               ║
║                                              ▼                               ║
║   Document Vectors:   D1 ─────── Score: 0.95  ◀── Most Similar              ║
║                       D2 ─────── Score: 0.89                                 ║
║                       D3 ─────── Score: 0.85                                 ║
║                       D4 ─────── Score: 0.72                                 ║
║                       D5 ─────── Score: 0.65                                 ║
║                                                                               ║
║   Return: Top-K by similarity score                                          ║
║   ⚠️ Issue: May return redundant/similar documents                          ║
╚═══════════════════════════════════════════════════════════════════════════════╝

╔═══════════════════════════════════════════════════════════════════════════════╗
║ 2. MAXIMUM MARGINAL RELEVANCE (MMR)                                           ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║                                                                               ║
║   MMR = argmax [ λ · Sim(dᵢ, q) - (1-λ) · max Sim(dᵢ, dⱼ) ]                 ║
║           dᵢ∈R\S                        dⱼ∈S                                 ║
║                                                                               ║
║   λ = 1.0: Pure similarity (no diversity)                                    ║
║   λ = 0.0: Pure diversity (ignore relevance)                                 ║
║   λ = 0.5: Balanced (recommended starting point)                             ║
║                                                                               ║
║                    High Relevance                                             ║
║                         ▲                                                     ║
║                         │    • D1 (selected)                                  ║
║                         │                                                     ║
║                         │  • D2 (similar to D1, skipped)                      ║
║                         │                                                     ║
║                         │           • D3 (selected - diverse)                 ║
║                         │                                                     ║
║                         │       • D4 (selected - diverse)                     ║
║                         │                                                     ║
║   Low Relevance ────────┴──────────────────────────▶ High Diversity          ║
║                                                                               ║
║   ✅ Balances relevance and diversity                                        ║
╚═══════════════════════════════════════════════════════════════════════════════╝

╔═══════════════════════════════════════════════════════════════════════════════╗
║ 3. HYBRID SEARCH (Sparse + Dense)                                             ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║                                                                               ║
║   Query: "python machine learning tutorial"                                  ║
║                    │                                                          ║
║          ┌─────────┴─────────┐                                               ║
║          ▼                   ▼                                               ║
║   ┌─────────────┐     ┌─────────────┐                                        ║
║   │ BM25/TF-IDF │     │   Dense     │                                        ║
║   │  (Sparse)   │     │ Embeddings  │                                        ║
║   │             │     │             │                                        ║
║   │ Keyword     │     │ Semantic    │                                        ║
║   │ Matching    │     │ Similarity  │                                        ║
║   └──────┬──────┘     └──────┬──────┘                                        ║
║          │                   │                                               ║
║          └─────────┬─────────┘                                               ║
║                    ▼                                                          ║
║             ┌─────────────┐                                                   ║
║             │   Fusion    │  (RRF, weighted sum, etc.)                       ║
║             └──────┬──────┘                                                   ║
║                    ▼                                                          ║
║             Combined Results                                                  ║
║                                                                               ║
║   ✅ Best of both: exact keywords + semantic understanding                   ║
╚═══════════════════════════════════════════════════════════════════════════════╝
```
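
The MMR formula in the diagram above can be read as a greedy loop: repeatedly pick the candidate with the best trade-off between similarity to the query and similarity to what has already been selected. The following is a from-scratch sketch of that loop, assuming cosine similarity for both terms. `mmr_select` and the toy two-dimensional vectors are hypothetical, and library implementations may differ in details such as candidate prefetching.

```python
import math
from typing import List

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def mmr_select(query_vec: List[float], doc_vecs: List[List[float]],
               k: int, lambda_mult: float = 0.5) -> List[int]:
    """Greedy MMR: return the indices of up to k documents, in selection order."""
    relevance = [cosine(vec, query_vec) for vec in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: List[int] = []

    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy = highest similarity to any already-selected document
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

query = [1.0, 0.0]
docs = [
    [1.0, 0.10],   # index 0: most relevant
    [1.0, 0.12],   # index 1: near-duplicate of index 0
    [0.6, -0.80],  # index 2: less relevant, but points in a different direction
]
balanced = mmr_select(query, docs, k=2, lambda_mult=0.5)
pure_similarity = mmr_select(query, docs, k=2, lambda_mult=1.0)
print("lambda=0.5:", balanced)
print("lambda=1.0:", pure_similarity)
```

With `lambda_mult=1.0` the redundancy term vanishes and the near-duplicate is returned. With `lambda_mult=0.5` the near-duplicate is penalized and the more distinct document takes its place.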

In [None]:
# ============================================================================
# LangChain Retrieval Methods
# ============================================================================

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document

# Create a sample vector store for demonstration
sample_docs = [
    Document(page_content="Python is a high-level programming language known for its simplicity.", metadata={"topic": "python"}),
    Document(page_content="Python is widely used in machine learning and data science.", metadata={"topic": "python"}),
    Document(page_content="Machine learning algorithms learn patterns from data.", metadata={"topic": "ml"}),
    Document(page_content="Deep learning is a subset of machine learning using neural networks.", metadata={"topic": "dl"}),
    Document(page_content="Natural language processing enables computers to understand human language.", metadata={"topic": "nlp"}),
    Document(page_content="Computer vision helps machines interpret visual information from the world.", metadata={"topic": "cv"}),
    Document(page_content="TensorFlow and PyTorch are popular deep learning frameworks.", metadata={"topic": "dl"}),
    Document(page_content="Scikit-learn provides simple machine learning tools in Python.", metadata={"topic": "ml"}),
]

# Note: Uncomment to create actual vector store
# embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# vectorstore = FAISS.from_documents(sample_docs, embeddings)

In [None]:
# 1. Similarity Search
def similarity_search(vectorstore, query: str, k: int = 4):
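    """Retrieve the k nearest documents under the store's configured metric.

    Note: LangChain's FAISS wrapper defaults to L2 (Euclidean) distance,
    not cosine similarity.
    """
    results = vectorstore.similarity_search(query, k=k)
    return results

# 2. Similarity Search with Scores
def similarity_search_with_scores(vectorstore, query: str, k: int = 4):
    """Retrieve documents together with their raw scores."""
    results = vectorstore.similarity_search_with_score(query, k=k)
    # Returns a list of (document, score) tuples. The meaning of the score
    # depends on the store: with FAISS's default L2 metric it is a distance,
    # so lower values mean closer matches.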
    return results

# 3. Maximum Marginal Relevance (MMR)
def mmr_search(vectorstore, query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5):
    """
    Retrieve documents using MMR for diversity.
    
    Args:
        k: Number of documents to return
        fetch_k: Number of documents to initially retrieve
        lambda_mult: 0 = max diversity, 1 = max relevance
    """
    results = vectorstore.max_marginal_relevance_search(
        query,
        k=k,
        fetch_k=fetch_k,
        lambda_mult=lambda_mult,
    )
    return results

# Example usage (requires actual vectorstore)
# query = "What is machine learning in Python?"
# similar_docs = similarity_search(vectorstore, query)
# mmr_docs = mmr_search(vectorstore, query, lambda_mult=0.5)
# print(f"Similarity search returned {len(similar_docs)} docs")
# print(f"MMR search returned {len(mmr_docs)} docs")

In [None]:
# 4. Metadata Filtering
from typing import Any, Dict

def filtered_search(vectorstore, query: str, filter_dict: Dict[str, Any], k: int = 4):
    """Retrieve documents with metadata filtering."""
    results = vectorstore.similarity_search(
        query,
        k=k,
        filter=filter_dict,  # e.g., {"topic": "ml"}
    )
    return results

# Example: Filter by topic
# filtered_docs = filtered_search(vectorstore, "neural networks", {"topic": "dl"})

In [None]:
# ============================================================================
# Hybrid Search (BM25 + Dense)
# ============================================================================

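# BM25Retriever lives in langchain_community and requires: pip install rank_bm25
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever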

def create_hybrid_retriever(documents: List[Document], vectorstore):
    """
    Create a hybrid retriever combining BM25 (sparse) and vector (dense) search.
    """
    # BM25 Retriever (keyword-based)
    bm25_retriever = BM25Retriever.from_documents(documents)
    bm25_retriever.k = 4
    
    # Vector Retriever (semantic)
    vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
    
    # Ensemble Retriever (combines both)
    ensemble_retriever = EnsembleRetriever(
        retrievers=[bm25_retriever, vector_retriever],
        weights=[0.5, 0.5],  # Equal weights; adjust based on use case
    )
    
    return ensemble_retriever

# Example usage
# hybrid_retriever = create_hybrid_retriever(sample_docs, vectorstore)
# results = hybrid_retriever.invoke("Python machine learning")
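
The fusion step of hybrid search can be implemented in several ways. One common choice is Reciprocal Rank Fusion (RRF), which scores each document by summing 1 / (c + rank) over the input rankings. The sketch below is a standalone illustration and is not meant to describe the internals of `EnsembleRetriever`. The function name, the document IDs, and the constant `c = 60` (a commonly used default) are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: List[List[str]], c: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs; each appearance contributes 1 / (c + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

bm25_ranking = ["doc_2", "doc_1", "doc_5"]   # hypothetical sparse (keyword) results
dense_ranking = ["doc_1", "doc_3", "doc_2"]  # hypothetical dense (embedding) results

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

RRF uses only rank positions, so it avoids having to calibrate BM25 scores against embedding similarities, which live on different scales.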

In [None]:
# ============================================================================
# LlamaIndex Retrieval
# ============================================================================

from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.postprocessor import SimilarityPostprocessor

def llamaindex_retrieval_example(index: VectorStoreIndex, query: str):
    """
    Demonstrate LlamaIndex retrieval options.
    """
    # Create retriever with similarity_top_k
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=5,
    )
    
    # Retrieve nodes
    nodes = retriever.retrieve(query)
    
    # Apply post-processing (filter by similarity threshold)
    processor = SimilarityPostprocessor(similarity_cutoff=0.7)
    filtered_nodes = processor.postprocess_nodes(nodes)
    
    return filtered_nodes

# Example usage
# nodes = llamaindex_retrieval_example(index, "What is deep learning?")

---
## 7. Reranking

Reranking improves retrieval quality by using a more powerful model to re-score initially retrieved documents.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           RERANKING PIPELINE                                 │
└─────────────────────────────────────────────────────────────────────────────┘

  Query: "How does attention mechanism work in transformers?"
         │
         ▼
  ┌─────────────────────────────────────────────────────────────────────────┐
  │                    STAGE 1: Initial Retrieval                           │
  │                    (Bi-Encoder / Embedding Model)                       │
  │                                                                         │
  │   Query ──▶ [Embed] ──▶ Vector Search ──▶ Top-K (e.g., 20 docs)        │
  │                                                                         │
  │   ✓ Fast (single embedding per query)                                  │
  │   ✗ May miss nuanced relevance                                         │
  └─────────────────────────────────────────────────────────────────────────┘
         │
         ▼ Top-20 documents
  ┌─────────────────────────────────────────────────────────────────────────┐
  │                    STAGE 2: Reranking                                    │
  │                    (Cross-Encoder Model)                                │
  │                                                                         │
  │   For each document:                                                    │
  │   ┌─────────────────────────────────────────────────────────────────┐  │
  │   │  [CLS] Query [SEP] Document [SEP]  ──▶  Relevance Score         │  │
  │   └─────────────────────────────────────────────────────────────────┘  │
  │                                                                         │
  │   Doc 1:  0.95  ──▶ Rank 1                                             │
  │   Doc 7:  0.91  ──▶ Rank 2 (moved up!)                                 │
  │   Doc 3:  0.87  ──▶ Rank 3                                             │
  │   Doc 2:  0.82  ──▶ Rank 4 (moved down)                                │
  │   ...                                                                   │
  │                                                                         │
  │   ✓ More accurate relevance scoring                                    │
  │   ✗ Slower (processes query+doc pairs)                                 │
  └─────────────────────────────────────────────────────────────────────────┘
         │
         ▼ Top-K (e.g., 5 docs)
  ┌─────────────────────────────────────────────────────────────────────────┐
  │                    Final Retrieved Documents                            │
  │                    (Higher quality, fewer docs)                         │
  └─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│              BI-ENCODER vs CROSS-ENCODER                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  BI-ENCODER (Retrieval)              CROSS-ENCODER (Reranking)              │
│  ─────────────────────               ────────────────────────               │
│                                                                              │
│  Query ──▶ [Encoder] ──▶ Eq          Query + Doc ──▶ [Encoder] ──▶ Score   │
│  Doc   ──▶ [Encoder] ──▶ Ed                                                 │
│                                                                              │
│  Similarity = cos(Eq, Ed)            Direct relevance prediction            │
│                                                                              │
│  ✓ Pre-compute doc embeddings        ✗ Cannot pre-compute                  │
│  ✓ Very fast at query time           ✗ Slower (O(n) per query)             │
│  ✗ Less accurate                     ✓ More accurate                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

In [None]:
# ============================================================================
# Reranking with Cross-Encoders
# ============================================================================

from sentence_transformers import CrossEncoder
from typing import List, Tuple

class Reranker:
    """Rerank documents using a cross-encoder model."""
    
    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        """
        Initialize the reranker.
        
        Popular models:
        - cross-encoder/ms-marco-MiniLM-L-6-v2 (fast, good quality)
        - cross-encoder/ms-marco-MiniLM-L-12-v2 (better quality)
        - BAAI/bge-reranker-base (excellent quality)
        - BAAI/bge-reranker-large (best quality, slower)
        """
        self.model = CrossEncoder(model_name)
    
    def rerank(self, query: str, documents: List[str], top_k: int = 5) -> List[Tuple[str, float]]:
        """
        Rerank documents based on relevance to the query.
        
        Returns:
            List of (document, score) tuples, sorted by relevance.
        """
        # Create query-document pairs
        pairs = [[query, doc] for doc in documents]
        
        # Get relevance scores
        scores = self.model.predict(pairs)
        
        # Sort by score (descending)
        doc_scores = list(zip(documents, scores))
        doc_scores.sort(key=lambda x: x[1], reverse=True)
        
        return doc_scores[:top_k]

# Example usage (uncomment to run - requires sentence-transformers)
# reranker = Reranker()
# query = "What is the attention mechanism?"
# docs = [
#     "Transformers use self-attention to process sequences.",
#     "Python is a programming language.",
#     "Attention allows models to focus on relevant parts of the input.",
#     "Machine learning is a field of AI.",
# ]
# reranked = reranker.rerank(query, docs, top_k=2)
# for doc, score in reranked:
#     print(f"Score: {score:.4f} | {doc}")

In [None]:
# ============================================================================
# LangChain Reranking with Cohere
# ============================================================================

from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

def create_cohere_reranker(base_retriever, top_n: int = 3):
    """
    Create a retriever with Cohere reranking.
    Requires: COHERE_API_KEY environment variable.
    """
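    # Recent langchain-cohere releases require the rerank model to be named
    # explicitly; substitute whichever Cohere rerank model you have access to.
    compressor = CohereRerank(model="rerank-english-v3.0", top_n=top_n)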
    
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor,
        base_retriever=base_retriever,
    )
    
    return compression_retriever

# Example usage
# base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})
# reranking_retriever = create_cohere_reranker(base_retriever, top_n=5)
# results = reranking_retriever.invoke("What is deep learning?")

In [None]:
# ============================================================================
# LlamaIndex Reranking
# ============================================================================

from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core import QueryBundle

def llamaindex_rerank_example(nodes, query: str):
    """
    Rerank nodes using a cross-encoder in LlamaIndex.
    """
    reranker = SentenceTransformerRerank(
        model="cross-encoder/ms-marco-MiniLM-L-6-v2",
        top_n=3,
    )
    
    query_bundle = QueryBundle(query_str=query)
    reranked_nodes = reranker.postprocess_nodes(nodes, query_bundle)
    
    return reranked_nodes

# Example: Using a reranker in a query engine
# from llama_index.core import VectorStoreIndex
# reranker = SentenceTransformerRerank(
#     model="cross-encoder/ms-marco-MiniLM-L-6-v2",
#     top_n=3,
# )
# index = VectorStoreIndex.from_documents(llama_docs)  # LlamaIndex documents
# query_engine = index.as_query_engine(
#     similarity_top_k=10,
#     node_postprocessors=[reranker],
# )
# response = query_engine.query("What is deep learning?")

---
## 8. Evaluation Metrics

Evaluating RAG systems requires metrics for both retrieval quality and generation quality.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                      RAG EVALUATION FRAMEWORK                                │
└─────────────────────────────────────────────────────────────────────────────┘

╔═══════════════════════════════════════════════════════════════════════════════╗
║                         RETRIEVAL METRICS                                      ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║                                                                               ║
║  1. Precision@K = (Relevant docs in top-K) / K                               ║
║     └── Of the K retrieved docs, how many are relevant?                      ║
║                                                                               ║
║  2. Recall@K = (Relevant docs in top-K) / (Total relevant docs)              ║
║     └── Of all relevant docs, how many did we retrieve?                      ║
║                                                                               ║
║  3. Mean Reciprocal Rank (MRR) = (1/|Q|) Σ (1/rank_i)                        ║
║     └── Where does the first relevant doc appear?                            ║
║                                                                               ║
║  4. Normalized DCG (nDCG) = DCG / IDCG                                       ║
║     └── Considers position and graded relevance                              ║
║                                                                               ║
║  5. Hit Rate = (Queries with ≥1 relevant doc in top-K) / |Q|                 ║
║     └── Simple binary: did we find anything relevant?                        ║
║                                                                               ║
╚═══════════════════════════════════════════════════════════════════════════════╝

╔═══════════════════════════════════════════════════════════════════════════════╗
║                       GENERATION METRICS (RAGAS)                               ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║                                                                               ║
║  ┌─────────────────────────────────────────────────────────────────────────┐ ║
║  │                         CONTEXT METRICS                                  │ ║
║  ├─────────────────────────────────────────────────────────────────────────┤ ║
║  │                                                                         │ ║
║  │  Context Precision: Are retrieved chunks relevant to the question?     │ ║
║  │  ┌─────────────────────────────────────────────────────────────────┐   │ ║
║  │  │ Question: "What is the capital of France?"                      │   │ ║
║  │  │ Retrieved: ["Paris is the capital of France", "France wines"]   │   │ ║
║  │  │ Score: 0.5 (1 relevant / 2 total)                               │   │ ║
║  │  └─────────────────────────────────────────────────────────────────┘   │ ║
║  │                                                                         │ ║
║  │  Context Recall: Does retrieved context cover the ground truth?        │ ║
║  │  ┌─────────────────────────────────────────────────────────────────┐   │ ║
║  │  │ Ground Truth: "Paris is the capital and largest city of France" │   │ ║
║  │  │ Retrieved: Contains "Paris is the capital of France"            │   │ ║
║  │  │ Score: High (covers main fact)                                   │   │ ║
║  │  └─────────────────────────────────────────────────────────────────┘   │ ║
║  └─────────────────────────────────────────────────────────────────────────┘ ║
║                                                                               ║
║  ┌─────────────────────────────────────────────────────────────────────────┐ ║
║  │                        ANSWER METRICS                                    │ ║
║  ├─────────────────────────────────────────────────────────────────────────┤ ║
║  │                                                                         │ ║
║  │  Faithfulness: Is the answer grounded in the retrieved context?        │ ║
║  │  ┌─────────────────────────────────────────────────────────────────┐   │ ║
║  │  │ Context: "Paris is the capital of France"                       │   │ ║
║  │  │ Answer: "The capital of France is Paris"                        │   │ ║
║  │  │ Score: 1.0 (fully grounded)                                      │   │ ║
║  │  └─────────────────────────────────────────────────────────────────┘   │ ║
║  │                                                                         │ ║
║  │  Answer Relevancy: Does the answer address the question?               │ ║
║  │  ┌─────────────────────────────────────────────────────────────────┐   │ ║
║  │  │ Question: "What is the capital of France?"                      │   │ ║
║  │  │ Answer: "Paris is a beautiful city with the Eiffel Tower"       │   │ ║
║  │  │ Score: 0.6 (mentions Paris but doesn't directly answer)         │   │ ║
║  │  └─────────────────────────────────────────────────────────────────┘   │ ║
║  │                                                                         │ ║
║  │  Answer Correctness: Is the answer factually correct?                  │ ║
║  │  └── Compares answer against ground truth                              │ ║
║  │                                                                         │ ║
║  └─────────────────────────────────────────────────────────────────────────┘ ║
╚═══════════════════════════════════════════════════════════════════════════════╝
```

In [None]:
# ============================================================================
# Retrieval Metrics Implementation
# ============================================================================

import numpy as np
from typing import List, Set

def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """
    Calculate Precision@K.
    
    Args:
        retrieved: List of retrieved document IDs
        relevant: Set of relevant document IDs
        k: Number of top results to consider
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    retrieved_k = retrieved[:k]
    relevant_in_k = sum(1 for doc in retrieved_k if doc in relevant)
    return relevant_in_k / k

def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """
    Calculate Recall@K.
    """
    if not relevant:
        return 0.0
    retrieved_k = retrieved[:k]
    relevant_in_k = sum(1 for doc in retrieved_k if doc in relevant)
    return relevant_in_k / len(relevant)

def mean_reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """
    Calculate the reciprocal rank for a single query.
    Returns 1 / rank of the first relevant document, or 0.0 if none is retrieved.
    Average this value over a set of queries to get Mean Reciprocal Rank (MRR).
    """
    for i, doc in enumerate(retrieved, 1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def ndcg_at_k(retrieved: List[str], relevance_scores: dict, k: int) -> float:
    """
    Calculate Normalized Discounted Cumulative Gain (nDCG@K).
    
    Args:
        retrieved: List of retrieved document IDs
        relevance_scores: Dict mapping doc_id to relevance score (0-3 scale)
        k: Number of top results to consider
    """
    def dcg(scores: List[float]) -> float:
        return sum(score / np.log2(i + 2) for i, score in enumerate(scores))
    
    # DCG for retrieved documents
    retrieved_scores = [relevance_scores.get(doc, 0) for doc in retrieved[:k]]
    dcg_score = dcg(retrieved_scores)
    
    # Ideal DCG (best possible ranking)
    ideal_scores = sorted(relevance_scores.values(), reverse=True)[:k]
    idcg_score = dcg(ideal_scores)
    
    if idcg_score == 0:
        return 0.0
    return dcg_score / idcg_score

# Example
retrieved_docs = ["doc1", "doc3", "doc2", "doc5", "doc4"]
relevant_docs = {"doc1", "doc2", "doc4"}

print(f"Precision@3: {precision_at_k(retrieved_docs, relevant_docs, 3):.3f}")
print(f"Recall@3: {recall_at_k(retrieved_docs, relevant_docs, 3):.3f}")
print(f"MRR: {mean_reciprocal_rank(retrieved_docs, relevant_docs):.3f}")
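
The helper above scores one query at a time, and the example does not exercise nDCG. As a minimal sketch, the cell below averages reciprocal rank over a small query set and compares nDCG@3 for two orderings of the same documents. The query IDs, document IDs, and relevance grades are made up for illustration, and the functions are re-declared so the block runs on its own.

```python
import math
from typing import Dict, List, Set

def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """Reciprocal of the 1-based rank of the first relevant document (0.0 if none)."""
    for rank, doc in enumerate(retrieved, 1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved: List[str], gains: Dict[str, float], k: int) -> float:
    """nDCG@K using linear gain: each grade is divided by log2(rank + 1)."""
    def dcg(scores: List[float]) -> float:
        return sum(s / math.log2(i + 2) for i, s in enumerate(scores))
    actual = dcg([gains.get(doc, 0.0) for doc in retrieved[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

# Hypothetical query set: (ranked results, relevant IDs) per query.
runs = {
    "q1": (["doc1", "doc3", "doc2"], {"doc1", "doc2"}),  # first hit at rank 1
    "q2": (["doc7", "doc4", "doc9"], {"doc4"}),          # first hit at rank 2
    "q3": (["doc5", "doc6", "doc8"], {"doc0"}),          # no hit
}
mrr = sum(reciprocal_rank(r, rel) for r, rel in runs.values()) / len(runs)
print(f"MRR over {len(runs)} queries: {mrr:.3f}")

# Hypothetical graded judgments on a 0-3 scale.
graded = {"doc1": 3, "doc2": 2, "doc4": 1}
perfect = ndcg_at_k(["doc1", "doc2", "doc4"], graded, 3)
reversed_order = ndcg_at_k(["doc4", "doc2", "doc1"], graded, 3)
print(f"nDCG@3 ideal order: {perfect:.3f}, reversed order: {reversed_order:.3f}")
```

The reversed ranking retrieves the same three documents, so Precision@3 and Recall@3 would not change; only a rank-aware metric such as nDCG shows the difference.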

In [None]:
# ============================================================================
# RAGAS Evaluation Framework
# ============================================================================

# RAGAS provides comprehensive RAG evaluation metrics
# Install: pip install ragas

class RAGASMetrics:
    """Reference summary of the core RAGAS metrics and the inputs each one needs."""
    
    @staticmethod
    def get_metrics_info():
        metrics = {
            "context_precision": {
                "description": "Measures whether the relevant retrieved chunks are ranked near the top",
                "range": "0-1 (higher is better)",
                "needs": "question, contexts, ground_truth",
            },
            "context_recall": {
                "description": "Measures if retrieved context covers the ground truth",
                "range": "0-1 (higher is better)",
                "needs": "contexts, ground_truth",
            },
            "faithfulness": {
                "description": "Measures if answer is grounded in retrieved context",
                "range": "0-1 (higher is better)",
                "needs": "question, contexts, answer",
            },
            "answer_relevancy": {
                "description": "Measures if answer addresses the question",
                "range": "0-1 (higher is better)",
                "needs": "question, answer",
            },
            "answer_correctness": {
                "description": "Measures factual correctness against ground truth",
                "range": "0-1 (higher is better)",
                "needs": "answer, ground_truth",
            },
        }
        return metrics

# Display metrics info
for name, info in RAGASMetrics.get_metrics_info().items():
    print(f"\n{name}:")
    print(f"  Description: {info['description']}")
    print(f"  Range: {info['range']}")
    print(f"  Requires: {info['needs']}")

In [None]:
# ============================================================================
# RAGAS Evaluation Example
# ============================================================================

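# Note: This requires the `ragas` and `datasets` packages plus an LLM API key
# (RAGAS calls OpenAI models by default). The imports and column names below
# follow the ragas 0.1.x API; later releases may differ, so check the docs for
# your installed version.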

def run_ragas_evaluation():
    """
    Example of running RAGAS evaluation on a RAG system.
    """
    from ragas import evaluate
    from ragas.metrics import (
        context_precision,
        context_recall,
        faithfulness,
        answer_relevancy,
        answer_correctness,
    )
    from datasets import Dataset
    
    # Prepare evaluation data
    eval_data = {
        "question": [
            "What is machine learning?",
            "How does deep learning work?",
        ],
        "answer": [
            "Machine learning is a subset of AI that enables systems to learn from data.",
            "Deep learning uses neural networks with multiple layers to learn patterns.",
        ],
        "contexts": [
            ["Machine learning is a field of AI where algorithms learn from data."],
            ["Deep learning uses deep neural networks with many hidden layers."],
        ],
        "ground_truth": [
            "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience.",
            "Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers.",
        ],
    }
    
    dataset = Dataset.from_dict(eval_data)
    
    # Run evaluation
    result = evaluate(
        dataset,
        metrics=[
            context_precision,
            context_recall,
            faithfulness,
            answer_relevancy,
            answer_correctness,
        ],
    )
    
    return result

# Example usage (uncomment to run - requires API key)
# results = run_ragas_evaluation()
# print(results)
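
Evaluation data is usually collected one query at a time, while the dictionary above is column-oriented. A minimal sketch of the conversion follows; `records_to_columns` and `REQUIRED_KEYS` are hypothetical helper names, not part of RAGAS, and the block uses only the standard library.

```python
from typing import Dict, List

# Column names taken from the eval_data example above.
REQUIRED_KEYS = ("question", "answer", "contexts", "ground_truth")

def records_to_columns(records: List[dict]) -> Dict[str, list]:
    """Convert per-query records into a column-oriented dict of equal-length lists."""
    columns: Dict[str, list] = {key: [] for key in REQUIRED_KEYS}
    for i, record in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"record {i} is missing {missing}")
        if not isinstance(record["contexts"], list):
            raise TypeError(f"record {i}: 'contexts' must be a list of strings")
        for key in REQUIRED_KEYS:
            columns[key].append(record[key])
    return columns

# Hypothetical records, e.g. logged from a RAG system during a test run.
records = [
    {
        "question": "What is machine learning?",
        "answer": "Machine learning lets systems learn from data.",
        "contexts": ["Machine learning is a field of AI where algorithms learn from data."],
        "ground_truth": "Machine learning is a subset of AI that learns from experience.",
    },
    {
        "question": "How does deep learning work?",
        "answer": "Deep learning uses neural networks with multiple layers.",
        "contexts": ["Deep learning uses deep neural networks with many hidden layers."],
        "ground_truth": "Deep learning uses multi-layer artificial neural networks.",
    },
]
eval_columns = records_to_columns(records)
print({key: len(values) for key, values in eval_columns.items()})
```

Validating the records up front surfaces a missing field or a bare-string `contexts` value before any LLM calls are made.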

In [None]:
# ============================================================================
# LlamaIndex Evaluation
# ============================================================================

from llama_index.core.evaluation import (
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    CorrectnessEvaluator,
)

def llamaindex_evaluation_example(query_engine, llm):
    """
    Example of LlamaIndex built-in evaluators.
    """
    # Initialize evaluators
    faithfulness_evaluator = FaithfulnessEvaluator(llm=llm)
    relevancy_evaluator = RelevancyEvaluator(llm=llm)
    
    # Query the system
    query = "What is deep learning?"
    response = query_engine.query(query)
    
    # Evaluate faithfulness (is response grounded in retrieved context?)
    faithfulness_result = faithfulness_evaluator.evaluate_response(
        query=query,
        response=response,
    )
    
    # Evaluate relevancy (does response answer the question?)
    relevancy_result = relevancy_evaluator.evaluate_response(
        query=query,
        response=response,
    )
    
    return {
        "faithfulness": faithfulness_result.passing,
        "relevancy": relevancy_result.passing,
    }

# Example usage (requires actual query_engine and llm)
# results = llamaindex_evaluation_example(query_engine, llm)
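
A single pass/fail pair says little about overall quality, so results are typically aggregated over a test set. The sketch below assumes one dictionary per query in the shape returned by the function above; `pass_rates` is a hypothetical helper and the sample outcomes are invented.

```python
from typing import Dict, List

def pass_rates(per_query: List[Dict[str, bool]]) -> Dict[str, float]:
    """Fraction of queries that passed each evaluator.

    Assumes every result dict contains the same evaluator names.
    """
    if not per_query:
        return {}
    totals: Dict[str, int] = {}
    for result in per_query:
        for name, passed in result.items():
            totals[name] = totals.get(name, 0) + int(passed)
    return {name: count / len(per_query) for name, count in totals.items()}

# Hypothetical outcomes for four test queries.
sample_results = [
    {"faithfulness": True, "relevancy": True},
    {"faithfulness": True, "relevancy": False},
    {"faithfulness": False, "relevancy": True},
    {"faithfulness": True, "relevancy": True},
]
rates = pass_rates(sample_results)
print(rates)
```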

---
## Complete RAG Pipeline Example

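Putting it all together: the two cells below combine loading, chunking, embedding, retrieval, and generation into end-to-end pipelines, first with LangChain and then with LlamaIndex.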

In [None]:
# ============================================================================
# Complete RAG Pipeline with LangChain
# ============================================================================

from typing import List

from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def create_rag_chain(documents: List[Document]):
    """
    Create a complete RAG chain.
    """
    # 1. Split documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,
        chunk_overlap=50,
    )
    chunks = text_splitter.split_documents(documents)
    
    # 2. Create embeddings and vector store
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorstore = FAISS.from_documents(chunks, embeddings)
    
    # 3. Create retriever
    retriever = vectorstore.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 5, "fetch_k": 20},
    )
    
    # 4. Create prompt template
    template = """Answer the question based only on the following context. 
If you cannot answer the question based on the context, say "I don't have enough information to answer this question."

Context:
{context}

Question: {question}

Answer:"""
    
    prompt = ChatPromptTemplate.from_template(template)
    
    # 5. Create LLM
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    
    # 6. Create RAG chain
    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)
    
    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    
    return rag_chain

# Example usage (requires API key)
# chain = create_rag_chain(documents)
# response = chain.invoke("What is machine learning?")
# print(response)
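
The first two stages of the chain (retrieve, then fill the prompt) can be checked without an API key. The sketch below mirrors that data flow in plain Python; `Doc`, `fake_retriever`, and `build_prompt` are hypothetical stand-ins, not LangChain classes, and the template is a shortened copy of the one above.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str

TEMPLATE = """Answer the question based only on the following context.

Context:
{context}

Question: {question}

Answer:"""

def format_docs(docs: List[Doc]) -> str:
    """Join chunk texts with blank lines, as in the chain above."""
    return "\n\n".join(doc.page_content for doc in docs)

def build_prompt(question: str, retriever: Callable[[str], List[Doc]]) -> str:
    """Retrieve chunks for the question, then fill the prompt template."""
    context = format_docs(retriever(question))
    return TEMPLATE.format(context=context, question=question)

def fake_retriever(query: str) -> List[Doc]:
    # Hypothetical retriever: ignores the query and returns fixed chunks.
    return [
        Doc("Machine learning lets systems learn from data."),
        Doc("Deep learning stacks many neural network layers."),
    ]

prompt_text = build_prompt("What is machine learning?", fake_retriever)
print(prompt_text)
```

Printing the filled prompt is a cheap way to confirm that the retrieved chunks, rather than an empty context, actually reach the model.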

In [None]:
# ============================================================================
# Complete RAG Pipeline with LlamaIndex
# ============================================================================

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

def create_llamaindex_rag(documents):
    """
    Create a complete RAG system with LlamaIndex.
    """
    # 1. Configure settings
    Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
    Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
    Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
    
    # 2. Create index
    index = VectorStoreIndex.from_documents(documents)
    
    # 3. Create query engine (plain top-5 similarity search; MMR is not enabled here)
    query_engine = index.as_query_engine(
        similarity_top_k=5,
        response_mode="compact",  # Options: refine, compact, tree_summarize
    )
    
    return query_engine

# Example usage (requires API key)
# from llama_index.core import Document
# docs = [Document(text="Machine learning is...")]
# engine = create_llamaindex_rag(docs)
# response = engine.query("What is machine learning?")
# print(response)

---
## Summary

### Key Takeaways

| Component | Best Practice |
|-----------|---------------|
| **Document Loading** | Use appropriate loaders for file types; preserve metadata |
| **Chunking** | Start with recursive splitter; 256-512 tokens; 10-20% overlap |
| **Embeddings** | Match embedding model to use case; consider cost vs. quality |
| **Vector Store** | FAISS/Chroma for dev; Pinecone/Qdrant for production |
| **Retrieval** | Use MMR for diversity; hybrid search for keyword + semantic |
| **Reranking** | Add a cross-encoder to rerank the top candidates; usually worth the added latency |
| **Evaluation** | Use RAGAS metrics; test on representative queries |
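
To make the chunking guideline concrete, the sketch below works out how many chunks a document yields for a given chunk size and overlap ratio. It assumes fixed-size token windows; real splitters break on separators, so actual counts will differ somewhat. `chunk_plan` is a hypothetical helper.

```python
import math
from typing import Tuple

def chunk_plan(total_tokens: int, chunk_size: int, overlap_ratio: float) -> Tuple[int, int, int]:
    """Return (overlap_tokens, stride, chunk_count) for fixed-size sliding windows."""
    overlap = round(chunk_size * overlap_ratio)
    stride = chunk_size - overlap  # tokens advanced per chunk
    if stride <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    if total_tokens <= 0:
        return overlap, stride, 0
    if total_tokens <= chunk_size:
        return overlap, stride, 1
    # One initial chunk, plus enough strides to cover the remaining tokens.
    count = math.ceil((total_tokens - chunk_size) / stride) + 1
    return overlap, stride, count

for size, ratio in [(512, 0.10), (256, 0.20)]:
    overlap, stride, count = chunk_plan(10_000, size, ratio)
    print(f"size={size}, overlap={overlap}, stride={stride}, chunks={count}")
```

Smaller chunks with more overlap increase the number of vectors to embed and store, which is the cost side of the trade-off behind the 256-512 token starting range.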

### Advanced Topics (Not Covered)

- **Query Transformation**: Query rewriting, HyDE, multi-query
- **Advanced Retrieval**: Parent-child, sentence-window, auto-merging
- **Agentic RAG**: Tool-using agents for complex queries
- **Knowledge Graphs**: Combining structured + unstructured retrieval
- **Fine-tuning**: Embedding and reranker fine-tuning for domains

---
## References

1. Lewis, P. et al. (2020). [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401)
2. [LangChain Documentation](https://python.langchain.com/docs/)
3. [LlamaIndex Documentation](https://docs.llamaindex.ai/)
4. [RAGAS: Evaluation Framework for RAG](https://docs.ragas.io/)
5. [Sentence Transformers](https://www.sbert.net/)
6. Gao, Y. et al. (2023). [Retrieval-Augmented Generation for Large Language Models: A Survey](https://arxiv.org/abs/2312.10997)