# Lab 2: Retrieval-Augmented Generation (RAG)

In this lab, you'll build a **RAG system** that answers questions from your own documents, running entirely on your machine at no cost.

## What is RAG?
RAG pairs a large language model (LLM) with an external knowledge base. Instead of relying only on what the model learned during training, RAG:
1. **Retrieves** relevant documents from a database
2. **Augments** the prompt with this context
3. **Generates** an answer based on the retrieved information
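
The three steps above can be sketched without any external services. This is only a toy illustration: `retrieve` ranks documents by shared words (a stand-in for the embedding search built later in the lab), and `generate` is a hypothetical placeholder for a real LLM call. The function names and sample documents are assumptions made for this sketch.

```python
from typing import List


def retrieve(question: str, docs: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def augment(question: str, context_docs: List[str]) -> str:
    """Build a prompt that places the retrieved text in front of the question."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def generate(prompt: str) -> str:
    """Hypothetical placeholder: a real pipeline would send the prompt to an LLM here."""
    return f"[an LLM would answer here, given a {len(prompt)}-character prompt]"


docs = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "Mount Everest is the tallest mountain on Earth.",
]
question = "What is the capital of France?"

top_docs = retrieve(question, docs, k=1)   # 1. Retrieve
prompt = augment(question, top_docs)       # 2. Augment
print(prompt)
print(generate(prompt))                    # 3. Generate
```

Word overlap is a crude relevance signal; the rest of the lab replaces it with embeddings so that paraphrases can match too.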

## What You'll Learn
- Creating embeddings with local models
- Storing vectors in ChromaDB
- Semantic search and retrieval
- Building a complete RAG pipeline

## Prerequisites
```bash
pip install chromadb ollama numpy
ollama pull llama3.2
ollama pull nomic-embed-text
```

## 1. Setup

In [None]:
!pip install chromadb ollama -q

In [None]:
import chromadb
import ollama
from typing import List

print("ChromaDB version:", chromadb.__version__)

## 2. Understanding Embeddings

Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts have similar vectors.
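
To build intuition before calling a real model, here is a toy "embedding" that simply counts words from a tiny, made-up vocabulary. It is an assumption-laden sketch: real embedding models produce dense vectors that capture meaning beyond shared words, while this version only captures word overlap. `toy_embed`, `cosine`, and the vocabulary are hypothetical names used for illustration.

```python
import math
from collections import Counter
from typing import List


def toy_embed(text: str, vocab: List[str]) -> List[float]:
    """Map text to a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vocab = ["cat", "dog", "pet", "python", "code"]
v_cat = toy_embed("my cat is a pet", vocab)
v_dog = toy_embed("my dog is a pet", vocab)
v_code = toy_embed("python code", vocab)

print(f"cat vs dog:  {cosine(v_cat, v_dog):.2f}")   # share the word 'pet'
print(f"cat vs code: {cosine(v_cat, v_code):.2f}")  # share no vocabulary words
```

The two pet sentences score higher than the unrelated pair, which is the behavior a learned embedding model generalizes to synonyms and paraphrases.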

In [None]:
# Create embeddings using Ollama
def get_embedding(text: str) -> List[float]:
    """Get embedding vector for text using Ollama."""
    response = ollama.embeddings(
        model='nomic-embed-text',
        prompt=text
    )
    return response['embedding']

# Test it
test_embedding = get_embedding("Hello, world!")
print(f"Embedding dimension: {len(test_embedding)}")
print(f"First 5 values: {test_embedding[:5]}")

In [None]:
# Demonstrate semantic similarity
import numpy as np

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compare similar and different sentences
sentences = [
    "The cat sat on the mat",
    "A feline rested on the rug",  # Similar meaning
    "Python is a programming language"  # Different topic
]

embeddings = [get_embedding(s) for s in sentences]

print("Similarity scores:")
print(f"'{sentences[0]}' vs '{sentences[1]}': {cosine_similarity(embeddings[0], embeddings[1]):.4f}")
print(f"'{sentences[0]}' vs '{sentences[2]}': {cosine_similarity(embeddings[0], embeddings[2]):.4f}")

## 3. Setting Up ChromaDB

In [None]:
# Create a ChromaDB client (in-memory for this demo)
client = chromadb.Client()

# For persistent storage, use:
# client = chromadb.PersistentClient(path="./chroma_db")

In [None]:
# Create a custom embedding function for ChromaDB
class OllamaEmbeddingFunction:
    def __init__(self, model_name: str = "nomic-embed-text"):
        self.model_name = model_name
    
    def __call__(self, input: List[str]) -> List[List[float]]:
        embeddings = []
        for text in input:
            response = ollama.embeddings(
                model=self.model_name,
                prompt=text
            )
            embeddings.append(response['embedding'])
        return embeddings

# Create embedding function
embed_fn = OllamaEmbeddingFunction()

In [None]:
# Create a collection (get_or_create avoids an error if this cell is re-run)
collection = client.get_or_create_collection(
    name="knowledge_base",
    embedding_function=embed_fn
)

print(f"Created collection: {collection.name}")

## 4. Adding Documents to the Knowledge Base

In [None]:
# Sample knowledge base about a fictional company
documents = [
    "TechCorp was founded in 2020 by Jane Smith and John Doe in San Francisco.",
    "TechCorp specializes in AI-powered productivity tools for remote teams.",
    "The company's flagship product is TeamFlow, a project management platform.",
    "TeamFlow uses machine learning to predict project timelines and identify bottlenecks.",
    "TechCorp has 150 employees across offices in San Francisco, New York, and London.",
    "The company raised $50 million in Series B funding in 2023.",
    "TechCorp's annual revenue exceeded $20 million in 2023.",
    "Jane Smith serves as CEO while John Doe is the CTO of TechCorp.",
    "TeamFlow integrates with popular tools like Slack, GitHub, and Jira.",
    "TechCorp offers a free tier for teams up to 10 members."
]

# Add documents to collection
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))]
)

print(f"Added {len(documents)} documents to the collection")
print(f"Collection count: {collection.count()}")

## 5. Semantic Search

In [None]:
# Query the collection
def search(query: str, n_results: int = 3):
    """Search the knowledge base for relevant documents."""
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )
    return results

# Test search
query = "Who founded the company?"
results = search(query)

print(f"Query: {query}\n")
print("Retrieved documents (smaller distance = closer match):")
for i, doc in enumerate(results['documents'][0]):
    distance = results['distances'][0][i]
    print(f"  {i+1}. (distance: {distance:.4f}) {doc}")
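
Query results come back as nested lists, with one inner list per query. A small helper can flatten them and drop weak matches. The sketch below is hypothetical: `top_hits` is not part of ChromaDB, the `fake_results` dictionary only mimics the shape of a query result with made-up distances, and the `0.6` cutoff is an arbitrary value you would need to tune for your embedding model.

```python
from typing import Any, Dict, List, Tuple


def top_hits(
    results: Dict[str, Any],
    max_distance: float = float("inf"),
) -> List[Tuple[str, str, float]]:
    """Flatten the first query's results into (id, document, distance) tuples,
    keeping only hits whose distance is at most max_distance."""
    hits = zip(results["ids"][0], results["documents"][0], results["distances"][0])
    return [(doc_id, doc, dist) for doc_id, doc, dist in hits if dist <= max_distance]


# Made-up values that only mimic the shape of a query result
fake_results = {
    "ids": [["doc_0", "doc_7", "doc_4"]],
    "documents": [[
        "TechCorp was founded in 2020 by Jane Smith and John Doe in San Francisco.",
        "Jane Smith serves as CEO while John Doe is the CTO of TechCorp.",
        "TechCorp has 150 employees across offices in San Francisco, New York, and London.",
    ]],
    "distances": [[0.42, 0.55, 0.91]],
}

for doc_id, doc, dist in top_hits(fake_results, max_distance=0.6):
    print(f"{doc_id} ({dist:.2f}): {doc}")
```

With real data you would pass the output of `search(query)` instead of `fake_results`.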

In [None]:
# Try different queries
test_queries = [
    "How much funding did they raise?",
    "What does the product do?",
    "Where are the offices located?"
]

for query in test_queries:
    results = search(query, n_results=2)
    print(f"\nQuery: {query}")
    for doc in results['documents'][0]:
        print(f"  → {doc}")

## 6. Building the RAG Pipeline

In [None]:
def rag_query(question: str, n_context: int = 3) -> tuple[str, List[str]]:
    """
    Answer a question using RAG:
    1. Retrieve relevant documents
    2. Build context from documents
    3. Generate answer using LLM

    Returns the answer text and the list of retrieved source documents.
    """
    # Step 1: Retrieve relevant documents
    results = collection.query(
        query_texts=[question],
        n_results=n_context
    )
    
    # Step 2: Build context
    context = "\n".join(results['documents'][0])
    
    # Step 3: Generate answer
    prompt = f"""Answer the question based only on the following context. If the answer is not in the context, say "I don't have information about that."

Context:
{context}

Question: {question}

Answer:"""
    
    response = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': prompt}]
    )
    
    return response['message']['content'], results['documents'][0]

# Test the RAG pipeline
question = "Who are the founders of TechCorp and what are their roles?"
answer, sources = rag_query(question)

print(f"Question: {question}\n")
print(f"Answer: {answer}\n")
print("Sources used:")
for source in sources:
    print(f"  - {source}")

In [None]:
# Test with various questions
questions = [
    "What is TeamFlow?",
    "How many employees does TechCorp have?",
    "What tools does TeamFlow integrate with?",
    "What is TechCorp's revenue?",
    "What is the weather like in San Francisco?"  # Not in knowledge base
]

for q in questions:
    answer, _ = rag_query(q)
    print(f"Q: {q}")
    print(f"A: {answer}\n")
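
One possible refinement is to move prompt construction into its own function and number the sources, so the model can cite them and the prompt can be inspected without calling the LLM. This is a sketch under assumptions: `build_prompt` is a hypothetical helper, and the citation instruction is an addition that the model may or may not follow reliably.

```python
from typing import List


def build_prompt(question: str, context_docs: List[str]) -> str:
    """Build a RAG prompt with numbered sources, e.g. '[1] first document'."""
    numbered = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(context_docs, start=1)
    )
    return (
        "Answer the question based only on the following context. "
        "Cite the source numbers you used, e.g. [1]. "
        'If the answer is not in the context, say "I don\'t have information about that."\n\n'
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


example_prompt = build_prompt(
    "Who is the CEO?",
    [
        "Jane Smith serves as CEO while John Doe is the CTO of TechCorp.",
        "TechCorp was founded in 2020 by Jane Smith and John Doe in San Francisco.",
    ],
)
print(example_prompt)
```

Keeping this step separate also makes it easy to print the exact prompt when an answer looks wrong.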

## 7. Advanced: Document Chunking

In [None]:
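def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """
    Split text into overlapping chunks.
    Useful for long documents.
    """
    # overlap must be smaller than chunk_size, or the window never advances
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and less than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # last chunk reached; avoid a trailing chunk that is pure overlap
        start = end - overlap
    return chunks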

# Example with a longer document
long_document = """
TechCorp's Journey: A Success Story

TechCorp began as a small startup in a garage in San Francisco. Jane Smith, a former Google engineer, 
and John Doe, a Stanford PhD graduate, shared a vision of making remote work more efficient. They 
bootstrapped the company for the first year, building their MVP while working part-time jobs.

The breakthrough came in early 2021 when they launched TeamFlow beta. Within six months, they had 
10,000 active users and caught the attention of venture capitalists. Their Series A round of $15 
million allowed them to expand the team from 5 to 50 employees.

By 2023, TeamFlow had become the go-to project management tool for tech startups. The platform's 
AI-powered features, including smart scheduling and resource allocation, set it apart from competitors. 
The Series B funding of $50 million valued the company at $200 million.

Looking ahead, TechCorp plans to expand into enterprise markets and launch new products focused on 
AI-assisted coding and documentation. Jane Smith envisions a future where AI handles routine tasks, 
freeing humans to focus on creative and strategic work.
"""

chunks = chunk_text(long_document, chunk_size=300, overlap=30)
print(f"Split into {len(chunks)} chunks:\n")
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1} ({len(chunk)} chars):")
    print(chunk[:100] + "...\n")
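
Character-based chunks can cut words in half, which may hurt retrieval quality. A common alternative is to chunk by words instead. The sketch below is one way to do that; `chunk_words` and its default sizes are assumptions for illustration, and sentence- or paragraph-aware splitting may suit real documents better.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into chunks of at most chunk_size words,
    where consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and less than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final words are already covered
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
demo = chunk_words(sample, chunk_size=5, overlap=2)
for i, chunk in enumerate(demo, start=1):
    print(f"Chunk {i}: {chunk}")
```

Note that splitting on whitespace discards the original line breaks, which is usually acceptable for embedding but worth knowing if you display chunks to users.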

## 8. Adding Metadata

In [None]:
# Create a new collection with metadata (get_or_create avoids an error on re-run)
docs_collection = client.get_or_create_collection(
    name="docs_with_metadata",
    embedding_function=embed_fn
)

# Add documents with metadata
docs_collection.add(
    documents=[
        "TechCorp Q1 2023 revenue was $4.5 million",
        "TechCorp Q2 2023 revenue was $5.2 million",
        "TechCorp Q3 2023 revenue was $5.8 million",
        "TechCorp Q4 2023 revenue was $6.1 million",
        "Product roadmap includes AI assistant feature in Q1 2024",
        "Product roadmap includes mobile app in Q2 2024",
    ],
    ids=["q1_rev", "q2_rev", "q3_rev", "q4_rev", "roadmap_1", "roadmap_2"],
    metadatas=[
        {"type": "financial", "quarter": "Q1", "year": 2023},
        {"type": "financial", "quarter": "Q2", "year": 2023},
        {"type": "financial", "quarter": "Q3", "year": 2023},
        {"type": "financial", "quarter": "Q4", "year": 2023},
        {"type": "roadmap", "quarter": "Q1", "year": 2024},
        {"type": "roadmap", "quarter": "Q2", "year": 2024},
    ]
)

print("Added documents with metadata")

In [None]:
# Query with metadata filter
results = docs_collection.query(
    query_texts=["What was the revenue?"],
    n_results=5,
    where={"type": "financial"}  # Only search financial documents
)

print("Financial documents only:")
for doc, meta in zip(results['documents'][0], results['metadatas'][0]):
    print(f"  [{meta['quarter']} {meta['year']}] {doc}")

In [None]:
# Query roadmap documents
results = docs_collection.query(
    query_texts=["What features are planned?"],
    n_results=5,
    where={"type": "roadmap"}
)

print("Roadmap documents only:")
for doc, meta in zip(results['documents'][0], results['metadatas'][0]):
    print(f"  [{meta['quarter']} {meta['year']}] {doc}")
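
Filters can also combine several conditions. ChromaDB's `where` syntax supports an `$and` operator for this, and to my knowledge it expects a single top-level key rather than a dictionary with several fields; check this against your installed version. The helper below is a hypothetical convenience (`build_where` is not part of ChromaDB) that builds such a filter from keyword arguments.

```python
from typing import Any, Dict, Optional


def build_where(**conditions: Any) -> Optional[Dict[str, Any]]:
    """Turn keyword conditions into a metadata filter:
    no conditions -> None, one -> {field: value}, several -> {"$and": [...]}."""
    clauses = [{field: value} for field, value in conditions.items()]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


print(build_where(type="financial"))
print(build_where(type="financial", quarter="Q4"))

# Intended usage with the collection from this section (not run here):
# results = docs_collection.query(
#     query_texts=["What was the revenue?"],
#     n_results=1,
#     where=build_where(type="financial", quarter="Q4"),
# )
```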

## 9. Persistent Storage

In [None]:
# For production use, persist the database to disk
# Uncomment to use persistent storage:

# persistent_client = chromadb.PersistentClient(path="./chroma_db")
# 
# persistent_collection = persistent_client.get_or_create_collection(
#     name="my_knowledge_base",
#     embedding_function=embed_fn
# )
# 
# # Add documents (they will persist across restarts)
# persistent_collection.add(
#     documents=["Your documents here"],
#     ids=["doc_1"]
# )

print("See comments above for persistent storage example")

## Summary

In this lab, you learned how to:
- Create embeddings using local models (Ollama)
- Store and search vectors with ChromaDB
- Understand semantic similarity
- Build a complete RAG pipeline
- Chunk long documents
- Use metadata for filtered searches
- Persist your vector database

**Key takeaways:**
- RAG allows LLMs to answer questions about your specific data
- Embeddings capture semantic meaning, enabling similarity search
- ChromaDB provides an easy-to-use vector database
- Everything runs locally - no API costs!

**Next Lab:** Model Customization with QLoRA