# Day 11: RAG (Retrieval-Augmented Generation)

LLMs are powerful, but they have a problem:

They don't know **your** data.

**RAG** solves this: retrieve relevant context first, then generate a response.

## The RAG Pipeline

```
User Query → Embed → Search Documents → Retrieve Top-K → Send to LLM → Response
```

The LLM generates answers **grounded** in your documents.

## Setup

```python
from google import genai
from google.genai import types
import os
from dotenv import load_dotenv
import numpy as np

# Load GEMINI_API_KEY from the project's .env file
load_dotenv(dotenv_path='../.env')
API_KEY = os.environ["GEMINI_API_KEY"]
client = genai.Client(api_key=API_KEY)

def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
```
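
To make the formula concrete, here is a dependency-free restatement of cosine similarity using only the standard library. The function name `cosine` and the 2-D vectors are illustrative assumptions, not real embeddings. The sketch shows that the score depends on direction, not on vector length.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length sequences (pure-Python sketch)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors chosen for illustration only
same_direction = cosine([1.0, 2.0], [2.0, 4.0])   # parallel, different lengths
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])       # no shared direction
opposite = cosine([1.0, 2.0], [-1.0, -2.0])       # pointing opposite ways

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

Parallel vectors score close to 1, perpendicular ones 0, and opposite ones close to -1, which is why the score works as a ranking signal for retrieval.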

## Knowledge Base

Our "private data": information the LLM doesn't know.

```python
knowledge_base = [
    "Our company's refund policy allows returns within 30 days of purchase.",
    "Premium members get free shipping on all orders over $25.",
    "Customer support is available Monday to Friday, 9 AM to 6 PM EST.",
    "Gift cards can be purchased in denominations of $25, $50, and $100.",
    "Orders are typically delivered within 3-5 business days."
]

print("📚 Knowledge Base:")
for i, doc in enumerate(knowledge_base):
    print(f"  {i+1}. {doc}")
```

```
📚 Knowledge Base:
  1. Our company's refund policy allows returns within 30 days of purchase.
  2. Premium members get free shipping on all orders over $25.
  3. Customer support is available Monday to Friday, 9 AM to 6 PM EST.
  4. Gift cards can be purchased in denominations of $25, $50, and $100.
  5. Orders are typically delivered within 3-5 business days.
```


## Index the Knowledge Base

```python
# Generate embeddings for all documents
kb_embeddings = []
for doc in knowledge_base:
    response = client.models.embed_content(
        model="gemini-embedding-001",
        contents=doc
    )
    kb_embeddings.append(response.embeddings[0].values)

print(f"✅ Indexed {len(kb_embeddings)} documents")
```

```
✅ Indexed 5 documents
```
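
The indexing loop calls the embedding API once per document every time the notebook runs. A minimal sketch of one way to avoid that is to save the vectors to disk and reload them later. The helper names `save_index` and `load_index`, the file name `kb_index.json`, and the 3-dimensional demo vectors are all assumptions made for illustration.

```python
import json
import os
import tempfile

def save_index(path, documents, embeddings):
    """Write documents and their embedding vectors to a JSON file."""
    if len(documents) != len(embeddings):
        raise ValueError("documents and embeddings must have the same length")
    payload = {
        "documents": list(documents),
        "embeddings": [[float(x) for x in vec] for vec in embeddings],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)

def load_index(path):
    """Read back the (documents, embeddings) pair written by save_index."""
    with open(path, "r", encoding="utf-8") as f:
        payload = json.load(f)
    return payload["documents"], payload["embeddings"]

# Demo with made-up vectors; real embeddings are much longer
demo_docs = ["doc A", "doc B"]
demo_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
index_path = os.path.join(tempfile.mkdtemp(), "kb_index.json")

save_index(index_path, demo_docs, demo_embeddings)
loaded_docs, loaded_embeddings = load_index(index_path)
print(loaded_docs, loaded_embeddings)
```

In the notebook, the same helpers could be called with `knowledge_base` and `kb_embeddings`. JSON is used here for readability; a binary format may suit larger indexes better.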


## Retrieval Function

```python
def retrieve(query, top_k=2):
    """Find the most relevant documents for a query."""
    # Embed the query with the same model used for the documents
    query_response = client.models.embed_content(
        model="gemini-embedding-001",
        contents=query
    )
    query_embedding = query_response.embeddings[0].values

    # Score every document against the query
    scores = []
    for i, doc_emb in enumerate(kb_embeddings):
        similarity = cosine_similarity(query_embedding, doc_emb)
        scores.append((similarity, i))

    # Highest similarity first
    scores.sort(reverse=True)

    return [knowledge_base[idx] for _, idx in scores[:top_k]]
```

## RAG Function

This is the core: **Retrieve** context, then **Generate** a response.

```python
def rag(query):
    """Retrieval-Augmented Generation."""

    # Step 1: Retrieve relevant documents
    relevant_docs = retrieve(query, top_k=2)

    # Step 2: Build the context
    context = "\n".join(f"- {doc}" for doc in relevant_docs)

    # Step 3: Create the prompt with context
    prompt = f"""Answer the user's question based ONLY on the following context.
If the answer is not in the context, say "I don't have that information."

Context:
{context}

Question: {query}

Answer:"""

    # Step 4: Generate response
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents=prompt
    )

    return {
        "answer": response.text,
        "sources": relevant_docs
    }
```
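
One possible variation on the prompt is to number the retrieved documents so the model can point to the ones it used. The sketch below only builds the prompt string, so it runs without an API key. The function name `build_cited_prompt` and the citation instruction are assumptions, and a model is not guaranteed to follow that instruction.

```python
def build_cited_prompt(query, docs):
    """Format retrieved documents as numbered sources the model can cite."""
    context = "\n".join(f"[{n}] {doc}" for n, doc in enumerate(docs, start=1))
    return (
        "Answer the user's question based ONLY on the following context.\n"
        "Cite the sources you used by number, for example [1].\n"
        'If the answer is not in the context, say "I don\'t have that information."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )

example_prompt = build_cited_prompt(
    "How long does shipping take?",
    [
        "Orders are typically delivered within 3-5 business days.",
        "Premium members get free shipping on all orders over $25.",
    ],
)
print(example_prompt)
```

Keeping prompt construction in its own function also makes it easy to inspect exactly what is sent to the model.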

## Test: Question About Refunds

```python
query = "Can I return a product I bought 2 weeks ago?"

print(f"❓ Question: {query}\n")
result = rag(query)

print(f"💬 Answer: {result['answer']}")
print(f"\n📄 Sources:")
for doc in result['sources']:
    print(f"  • {doc}")
```

```
❓ Question: Can I return a product I bought 2 weeks ago?

💬 Answer: Yes, you can return a product you bought 2 weeks ago.

📄 Sources:
  • Our company's refund policy allows returns within 30 days of purchase.
  • Orders are typically delivered within 3-5 business days.
```


## Test: Question About Shipping

```python
query = "How long does shipping take?"

print(f"❓ Question: {query}\n")
result = rag(query)

print(f"💬 Answer: {result['answer']}")
print(f"\n📄 Sources:")
for doc in result['sources']:
    print(f"  • {doc}")
```

```
❓ Question: How long does shipping take?

💬 Answer: Orders are typically delivered within 3-5 business days.

📄 Sources:
  • Orders are typically delivered within 3-5 business days.
  • Premium members get free shipping on all orders over $25.
```


## Test: Question Not in Knowledge Base

```python
query = "What is the capital of France?"

print(f"❓ Question: {query}\n")
result = rag(query)

print(f"💬 Answer: {result['answer']}")
print(f"\n📄 Sources (retrieved but not relevant):")
for doc in result['sources']:
    print(f"  • {doc}")
```

```
❓ Question: What is the capital of France?

💬 Answer: I don't have that information.

📄 Sources (retrieved but not relevant):
  • Customer support is available Monday to Friday, 9 AM to 6 PM EST.
  • Orders are typically delivered within 3-5 business days.
```
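
This test shows that `retrieve` always returns `top_k` documents, even when none of them is relevant. One common mitigation is a minimum similarity cutoff. The sketch below assumes retrieval is changed to return `(score, document)` pairs. The helper `filter_by_score`, the scores, and the `MIN_SCORE` value are all made up for illustration, since real scores depend on the embedding model.

```python
def filter_by_score(scored_docs, min_score):
    """Keep (score, doc) pairs with score >= min_score, best first."""
    kept = [(score, doc) for score, doc in scored_docs if score >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)

# Made-up scores for an off-topic query; not measured values
scored = [
    (0.41, "Customer support is available Monday to Friday, 9 AM to 6 PM EST."),
    (0.39, "Orders are typically delivered within 3-5 business days."),
]
MIN_SCORE = 0.6  # hypothetical cutoff; tune it on your own queries

relevant = filter_by_score(scored, MIN_SCORE)
if not relevant:
    print("No sufficiently similar documents; skip the LLM call or use a fallback answer.")
```

With an empty result, the pipeline can answer "I don't have that information." directly and avoid showing unrelated sources.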


## Why RAG Works

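1. **Grounded Responses**: Answers draw on the retrieved documents, which reduces (but does not eliminate) hallucinations
2. **Up-to-date**: Update the knowledge base and re-embed what changed; no model retraining needed
3. **Transparent**: You can show the sources used
4. **Private**: Your documents live in your own store, not in the model's weights; with a hosted API, as in this notebook, queries and retrieved text are still sent to the provider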
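
The "Up-to-date" point can be sketched as an index that re-embeds only the document that changed. Everything here is an assumption for illustration: the `SimpleIndex` class, the `fake_embed` stand-in (vowel counts, not a real embedding), and the 60-day policy text. In the notebook, the embedding function would wrap `client.models.embed_content` instead.

```python
def fake_embed(text):
    """Stand-in for an embedding call: NOT a real embedding, just vowel counts."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "aeiou"]

class SimpleIndex:
    """Keeps documents and their vectors aligned by position."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.documents = []
        self.embeddings = []

    def add(self, doc):
        """Embed and append one new document."""
        self.documents.append(doc)
        self.embeddings.append(self.embed_fn(doc))

    def replace(self, position, doc):
        """Swap in new text at a position and re-embed only that entry."""
        self.documents[position] = doc
        self.embeddings[position] = self.embed_fn(doc)

index = SimpleIndex(fake_embed)
index.add("Our company's refund policy allows returns within 30 days of purchase.")
index.add("Orders are typically delivered within 3-5 business days.")

# Hypothetical policy change: only the changed document is re-embedded
index.replace(0, "Our company's refund policy allows returns within 60 days of purchase.")
print(len(index.documents), index.documents[0])
```

The next query sees the new text immediately, with no change to the model itself.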

---

## Key Takeaways

1. **RAG = Retrieve + Generate**
2. Embed and index your documents **once**, then re-embed only what changes
3. For each query: retrieve relevant context, then prompt the LLM
4. Always provide sources for transparency

---

**Next:** Day 12, Chunking: Handling large documents