# Production RAG

**From Notebook to Production API**

Welcome! This lesson teaches you how to transform RAG code from a notebook into a production-ready API.

## Lesson Flow:

### Part 1: Learn RAG in a Notebook
- Build a complete RAG system step-by-step
- Understand retrieval + generation
- See the limitations of notebook code

### Part 2: Refactor to Clean Python
- Organize code into modular classes
- Separate concerns (loading, retrieval, generation)
- Make it reusable

### Part 3: Build an API
- Create FastAPI endpoints
- Test the API
- Deploy-ready structure

Let's begin!

## Part 1: Learn RAG in a Notebook

**👉 Open and work through: `messy_rag_notebook.ipynb`**

This notebook teaches you:
- How to load and chunk documents
- How to create embeddings
- How to store in a vector database
- How to retrieve relevant chunks
- **How to generate answers with an LLM**

Complete that notebook first, then come back here!

## Part 2: Understanding the Clean Code Structure

After completing the messy notebook, you saw that the code works but has problems:
- Everything is hardcoded
- The code can't be reused outside the notebook
- There is no API for other apps to call
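
To make the "hardcoded" problem concrete, here is a minimal sketch of one way to pull settings out of the code and into a single object. The `RAGConfig` class, its field names, the `RAG_*` environment variable names, and the `chunk_size` and `top_k` defaults are all hypothetical and are not part of `rag_core.py`; only the two model names come from the output shown in this lesson.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RAGConfig:
    """Hypothetical settings object; names and defaults are illustrative."""

    documents_dir: str = "documents"
    embedding_model: str = "text-embedding-3-small"
    llm_model: str = "gpt-3.5-turbo"
    chunk_size: int = 500  # assumed default, not taken from the lesson code
    top_k: int = 3         # assumed default, not taken from the lesson code

    @classmethod
    def from_env(cls, env=None):
        """Override defaults with RAG_* variables when they are present."""
        env = os.environ if env is None else env
        defaults = cls()
        return cls(
            documents_dir=env.get("RAG_DOCUMENTS_DIR", defaults.documents_dir),
            embedding_model=env.get("RAG_EMBEDDING_MODEL", defaults.embedding_model),
            llm_model=env.get("RAG_LLM_MODEL", defaults.llm_model),
            chunk_size=int(env.get("RAG_CHUNK_SIZE", defaults.chunk_size)),
            top_k=int(env.get("RAG_TOP_K", defaults.top_k)),
        )


# Pass a plain dict here so the example does not depend on the real environment.
config = RAGConfig.from_env({"RAG_CHUNK_SIZE": "800"})
print(config.chunk_size, config.llm_model)
```

With something like this in place, changing a model or chunk size becomes a configuration change rather than a code edit.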

Let's see how we refactored it into **clean, modular Python**!

### The New Structure: 3 Simple Classes

We split the RAG system into 3 focused classes:

**1. DocumentProcessor**
```python
# Interface outline only; the implementation lives in rag_core.py
class DocumentProcessor:
    def load_documents(self, folder) -> List[Document]: ...
    def chunk_documents(self, documents) -> List[Document]: ...
```
- Handles document loading
- Handles text chunking
- One job: prepare documents
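
As a rough illustration of what "text chunking" means, the sketch below splits a string into fixed-size pieces that overlap slightly, so a sentence cut at a boundary still appears whole in one of the chunks. The function `chunk_text` and its parameters are hypothetical; the actual `chunk_documents` method may use a different splitter and different sizes.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_text(sample, chunk_size=12, overlap=4)
print([len(piece) for piece in pieces])
```

Larger chunks keep more context together, while smaller chunks make retrieval more precise. That trade-off is why chunk size is worth experimenting with.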

**2. VectorStoreManager**
```python
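# Interface outline only; the implementation lives in rag_core.py
class VectorStoreManager:
    def create_vectorstore(self, chunks): ...
    def load_vectorstore(self): ...
    def search(self, query, top_k) -> Dict: ...
    def get_retriever(self): ...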
```
- Creates embeddings
- Manages ChromaDB
- Searches for relevant chunks
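
To show the idea behind "searches for relevant chunks", here is a toy similarity search that runs without any external service. It is a sketch under strong assumptions: `embed` is a bag-of-words stand-in for a real embedding model, and `search` here is a standalone function, not the `VectorStoreManager.search` method, whose return shape may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, chunks: List[str], top_k: int = 2) -> List[Dict]:
    """Score every chunk against the query and keep the top_k best."""
    query_vec = embed(query)
    scored = [
        {"text": chunk, "score": cosine_similarity(query_vec, embed(chunk))}
        for chunk in chunks
    ]
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]


chunks = [
    "python has lists dicts and tuples",
    "rest apis use http methods",
    "machine learning models learn from data",
]
results = search("how do machine learning models learn", chunks, top_k=2)
print(results[0]["text"])
```

A real vector store does the same ranking with learned embeddings and an index that avoids comparing the query against every chunk.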

**3. RAGGenerator**
```python
# Interface outline only; the implementation lives in rag_core.py
class RAGGenerator:
    def query(self, question) -> Dict: ...
```
- Combines retrieval + LLM
- Generates answers
- Uses the other two classes
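
The "retrieval + LLM" combination can be sketched as: fetch chunks, place them in a prompt, and send that prompt to the model. Everything below is hypothetical (the function names, the prompt wording, and the stand-in retriever and model), and the real `RAGGenerator` may build its chain differently. The returned dictionary uses the `question` and `answer` keys that appear in this lesson's examples.

```python
from typing import Callable, Dict, List


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Place the retrieved chunks ahead of the question in a single prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer_question(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> Dict[str, str]:
    """Retrieve context, build the prompt, then ask the model."""
    chunks = retrieve(question)
    prompt = build_prompt(question, chunks)
    return {"question": question, "answer": generate(prompt)}


# Stand-ins so the sketch runs without a vector store or an API key.
seen_prompts: List[str] = []


def fake_retrieve(question: str) -> List[str]:
    return ["Machine learning lets computers learn patterns from data."]


def fake_generate(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "stubbed answer"


result = answer_question("What is machine learning?", fake_retrieve, fake_generate)
print(result["question"])
print(result["answer"])
```

Because the retriever and the model are passed in, each piece can be replaced with a fake in a unit test, which is one practical payoff of separating the concerns.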

**Why is this better?**
- Each class has ONE job
- Easy to understand
- Easy to test
- Easy to reuse

### Let's Use the Clean Code!

See how much simpler it is to use:

In [2]:
from rag_core import DocumentProcessor, VectorStoreManager, RAGGenerator

# Step 1: Process documents
doc_processor = DocumentProcessor()
documents = doc_processor.load_documents("documents")
chunks = doc_processor.chunk_documents(documents)

✅ Loaded 3 documents
✅ Created 10 chunks


In [None]:
# Step 2: Create vector store (needed only the first time)
vectorstore_manager = VectorStoreManager()
vectorstore_manager.create_vectorstore(chunks)

In [3]:
# Step 2 (later runs): Load the existing vector store instead of rebuilding it
vectorstore_manager = VectorStoreManager()
vectorstore_manager.load_vectorstore()

Loading embedding model: text-embedding-3-small...
✅ Embedding model loaded!
Loading existing vector store...
✅ Vector store loaded!


In [4]:
# Step 3: Create RAG generator
rag_generator = RAGGenerator(vectorstore_manager)

✅ OpenAI LLM initialized (gpt-3.5-turbo)!
✅ RAG chain built!


In [5]:
# Step 4: Ask questions!
result = rag_generator.query("What is machine learning?")
print(f"Q: {result['question']}")
print(f"A: {result['answer']}")

Q: What is machine learning?
A: Machine learning is a subset of artificial intelligence that enables computers to learn from data without being explicitly programmed. It focuses on developing algorithms that can identify patterns and make decisions based on input data.


### Even Simpler: Use the Helper Function

In [6]:
from rag_core import build_rag_system

# Build everything in one call
doc_processor, vectorstore_manager, rag_generator = build_rag_system("documents")

# Now you can use any component
result = rag_generator.query("How do REST APIs work?")
print(result['answer'])


BUILDING RAG SYSTEM

✅ Loaded 3 documents
✅ Created 10 chunks
Loading embedding model: text-embedding-3-small...
✅ Embedding model loaded!
Creating vector store with embeddings...
✅ Vector store created with 10 chunks!
✅ OpenAI LLM initialized (gpt-3.5-turbo)!
✅ RAG chain built!

RAG SYSTEM READY!

REST APIs work by using standard HTTP methods such as GET, POST, PUT, and DELETE to interact with resources on a server. These methods allow clients to retrieve data, send new data, update existing data, and remove data from the server. Additionally, RESTful APIs communicate results through HTTP status codes such as 200 for success, 201 for created, 400 for bad request, 401 for unauthorized, 404 for not found, and 500 for server error.


## Part 3: Building the API

Now let's make it accessible as an API!

### Step 1: Build the Index

Run this in your terminal:
```bash
python build_index.py
```

This creates the vector database that the API will use.

### Step 2: Start the API

Run this in your terminal:
```bash
python api.py
```

Or:
```bash
uvicorn api:app --reload
```

The API will start at: http://localhost:8000

### Step 3: Test the API

**Option 1: Use the built-in docs**

Open in your browser:
- http://localhost:8000/docs (Swagger UI)

**Option 2: Use Python**

In [7]:
import requests

# Query endpoint
response = requests.post(
    "http://localhost:8000/query",
    json={"question": "What is machine learning?"}
)

result = response.json()
print(f"Q: {result['question']}")
print(f"A: {result['answer']}")

Q: What is machine learning?
A: Machine learning is a field of artificial intelligence where algorithms are used to learn from data and make predictions or decisions without being explicitly programmed. It involves training algorithms to recognize patterns and make decisions based on that data.


In [None]:
# Search endpoint (retrieval only)
response = requests.post(
    "http://localhost:8000/search",
    json={"question": "What are Python data types?", "top_k": 2}
)

result = response.json()
print(f"Found {len(result['results'])} results:")
for i, doc in enumerate(result['results'], 1):
    print(f"{i}. {doc['text'][:100]}...")

In [None]:
# Stats endpoint
response = requests.get("http://localhost:8000/stats")
print(response.json())
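
The calls above assume the server is up and every request succeeds. As a hedged sketch of a slightly more defensive client, the helper below uses only the standard library, sets a timeout, and lets HTTP errors surface as exceptions. The names `build_json_request` and `post_json` are hypothetical, and the base URL is the local address used in this lesson.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"


def build_json_request(path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Create a POST request with a JSON body for the given API path."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: Dict[str, Any], timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError when the server cannot be reached.
    """
    request = build_json_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not need a running server.
request = build_json_request("/query", {"question": "What is machine learning?"})
print(request.get_method(), request.full_url)
```

With `requests`, the equivalent precautions are passing `timeout=` to `requests.post` and calling `response.raise_for_status()` before reading the JSON.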

## Summary: What You Built

Congratulations! You've transformed a messy notebook into a production-ready system:

### Part 1: Notebook RAG
- ✅ Learned RAG fundamentals
- ✅ Built a working system
- ✅ Understood the limitations

### Part 2: Clean Python Code
- ✅ 3 modular classes
- ✅ Reusable components
- ✅ Clear separation of concerns

### Part 3: Production API
- ✅ FastAPI endpoints
- ✅ Easy to test
- ✅ Ready for deployment

## File Structure

```
Simple_RAG_API_Workshop/
├── messy_rag_notebook.ipynb      # Learn RAG here
├── rag_core.py                   # 3 clean classes
├── build_index.py                # Build the index
├── api.py                        # FastAPI app
├── requirements.txt              # Dependencies
├── .env                          # API keys
└── documents/                    # Your documents
```

## Next Steps

Want to improve your RAG system?

1. **Try different chunk sizes** in `DocumentProcessor`
2. **Use different embedding models** in `VectorStoreManager`
3. **Experiment with prompts** in `RAGGenerator`
4. **Add more documents** to your knowledge base
5. **Add authentication** to the API
6. **Deploy** to a cloud platform
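
For the authentication step, one simple starting point is a shared API key sent in a request header. The sketch below shows only the comparison logic, framework-free; the function name `is_authorized` and the `x-api-key` header name are assumptions, and a real deployment would load the expected key from the environment (for example from `.env`) rather than from source code.

```python
import hmac
from typing import Mapping


def is_authorized(
    headers: Mapping[str, str],
    expected_key: str,
    header_name: str = "x-api-key",  # assumed header name
) -> bool:
    """Return True only when the request carries the expected API key."""
    # Header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(header_name.lower())
    if supplied is None or not expected_key:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))


print(is_authorized({"X-API-Key": "secret-123"}, "secret-123"))
print(is_authorized({"X-API-Key": "wrong"}, "secret-123"))
```

A check like this could run before each endpoint handler, rejecting requests with a 401 status when it returns `False`.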

You now understand how to build production RAG systems! 🚀