# Module 7: Introduction to Orchestration Frameworks

## Overview

In Modules 1-6, you built RAG systems from scratch to understand every component deeply. Now we'll explore **orchestration frameworks** that simplify RAG development.

**Learning Objectives:**
1. Understand why orchestration frameworks exist
2. Compare LangChain and LlamaIndex
3. Learn when to use frameworks vs custom code
4. Explore core framework components


## 1. Why Use Orchestration Frameworks?

### The Problem with Custom Implementations

When you built RAG from scratch (Module 6), you wrote code for:
- Document loading (PDF, TXT, etc.)
- Text chunking strategies
- Embedding generation
- Vector store integration
- Retrieval logic
- Prompt construction
- LLM API calls

**This is valuable for learning, but time-consuming for production.**

### What Frameworks Provide

1. **Abstraction**: Pre-built components for common tasks
2. **Integrations**: Out-of-the-box support for 100+ tools (LLMs, vector stores, loaders)
3. **Best Practices**: Battle-tested implementations
4. **Community**: Shared knowledge, examples, and support
5. **Speed**: Prototype RAG systems in minutes instead of hours

## 2. Framework Comparison: LangChain vs LlamaIndex

### LangChain

**Philosophy:** Modular components for building LLM-powered applications

**Strengths:**
- Rich ecosystem (agents, tools, memory)
- Highly flexible chain and workflow composition
- Excellent for multi-step reasoning and tool use
- Strong community and integrations

**Use Cases:**
- Agent-driven applications
- Complex multi-step LLM workflows
- Chatbots with state and memory
- Systems requiring external tool/API integration

**Core Concepts:**
- **Chains**: Structured sequences of LLM operations
- **Retrievers**: Fetch relevant documents
- **Document Loaders**: Load data from sources
- **Text Splitters**: Chunk documents
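
The idea behind a chain can be sketched in plain Python, without LangChain itself. This is a minimal illustration, not LangChain's implementation: `make_chain`, `retrieve`, `build_prompt`, `fake_llm`, and the `STOPWORDS` set are hypothetical names introduced here, and the "LLM" only echoes its context.

```python
from functools import reduce
from typing import Callable

# Each step takes the running state and returns an updated copy of it.
Step = Callable[[dict], dict]

# Hypothetical stopword list so common words do not dominate the toy retriever.
STOPWORDS = {"what", "is", "a", "the", "of"}


def make_chain(*steps: Step) -> Step:
    """Compose steps left to right: each step receives the previous step's output."""
    def run(state: dict) -> dict:
        return reduce(lambda acc, step: step(acc), steps, state)
    return run


def retrieve(state: dict) -> dict:
    # Toy retriever: pick the document sharing the most non-stopword terms with the question.
    terms = set(state["question"].lower().rstrip("?").split()) - STOPWORDS
    best = max(state["docs"], key=lambda d: len(terms & set(d.lower().split())))
    return {**state, "context": best}


def build_prompt(state: dict) -> dict:
    prompt = f"Context: {state['context']}\n\nQuestion: {state['question']}"
    return {**state, "prompt": prompt}


def fake_llm(state: dict) -> dict:
    # Hypothetical stand-in for a real LLM call; it only echoes the context.
    return {**state, "answer": f"Based on the context: {state['context']}"}


rag_chain = make_chain(retrieve, build_prompt, fake_llm)

result = rag_chain({
    "question": "What is RAG?",
    "docs": [
        "Python is a high-level programming language.",
        "RAG combines retrieval and generation.",
    ],
})
print(result["answer"])
```

Because every step shares the same input and output shape, any step can be replaced (for example, a different retriever) without touching the others. Framework chains offer this same property.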

---

### LlamaIndex

**Philosophy:** Data framework for connecting LLMs to your data

**Strengths:**
- Advanced retrieval and indexing options
- Clean, intuitive API design
- Excellent handling of both structured and unstructured data
- Powerful query engines and RAG optimizations

**Use Cases:**
- Document question-answering
- Semantic and hybrid search
- Data-centric applications
- Quick RAG prototypes

**Core Concepts:**
- **Indices**: Organized data structures (VectorStoreIndex, TreeIndex)
- **Query Engines**: Execute queries across indices
- **Readers**: Load data from sources
- **Node Parsers**: Chunk and structure documents
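
To make "chunk and structure documents" concrete, here is a minimal sketch of fixed-size chunking with overlap, written in plain Python. It is not LlamaIndex's node parser: the `Node` dataclass, `split_with_overlap`, and the character-based sizes are assumptions chosen for illustration (real parsers usually work on tokens or sentences).

```python
from dataclasses import dataclass


@dataclass
class Node:
    text: str
    start: int  # character offset of this chunk in the source text


def split_with_overlap(text: str, chunk_size: int = 40, overlap: int = 10) -> list[Node]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    nodes = []
    for start in range(0, len(text), step):
        nodes.append(Node(text=text[start:start + chunk_size], start=start))
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return nodes


sample = "abcdefghij" * 10  # 100 characters
nodes = split_with_overlap(sample, chunk_size=40, overlap=10)
print([(n.start, len(n.text)) for n in nodes])
# [(0, 40), (30, 40), (60, 40)]
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing some text twice.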

| Feature | LangChain | LlamaIndex |
|---------|-----------|------------|
| **Primary Focus** | LLM orchestration (agents, tools, workflows) | Data indexing, retrieval, and RAG optimization |
| **Design Philosophy** | Modular, flexible, component-driven | Data-centric, simple, retrieval-first |
| **Best For** | Agents, tool-using systems, complex pipelines | Question-answering, semantic search, pure RAG |
| **Learning Curve** | Medium–High | Low–Medium |
| **Community Size** | Very large | Fast-growing |
| **GitHub Stars** | 90k+ | 30k+ |
| **Documentation** | Extensive but sometimes scattered | Clean, focused, very practical |
| **API Complexity** | Verbose and highly customizable | Concise and easy to prototype |
| **Integrations** | 100+ tools, models, services | Fewer integrations but deep RAG features |
| **Version Stability** | Frequent updates & breaking changes | Generally more stable across versions |


## 3. Quick Comparison Example

Let's implement the same RAG task three ways (custom code, LangChain, and LlamaIndex) and compare the results.

In [None]:
# Setup
import os
from dotenv import load_dotenv

load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")

if not openai_api_key:
    print("⚠️ Warning: OPENAI_API_KEY not found. Set it in .env file.")
else:
    print("✅ API key loaded successfully")

In [None]:
# Sample documents for all examples
documents = [
    "Python is a high-level programming language known for readability and simplicity.",
    "Machine learning is a subset of AI that enables systems to learn from data.",
    "RAG combines retrieval and generation to provide accurate, grounded responses."
]

### Approach 1: Custom Implementation (From Module 6)

In [None]:
# Custom RAG (simplified from Module 6)
from sentence_transformers import SentenceTransformer
import numpy as np

# 1. Generate embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
doc_embeddings = model.encode(documents)

In [None]:
# 2. Query and retrieve
query = "What is RAG?"
query_embedding = model.encode([query])[0]

In [None]:
# 3. Compute similarity
similarities = np.dot(doc_embeddings, query_embedding)
top_idx = np.argmax(similarities)
retrieved_doc = documents[top_idx]

In [None]:
retrieved_doc

In [None]:
# 4. Generate response
from openai import OpenAI

client = OpenAI(api_key=openai_api_key)
prompt = f"""Context: {retrieved_doc}

Question: {query}

Answer based on the context:"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0  # deterministic output, consistent with the other examples
)

In [None]:
response

In [None]:
print("Custom RAG Answer:")
print(response.choices[0].message.content)

In [None]:
# Custom RAG (simplified from Module 6): all of the steps above combined in one cell
from sentence_transformers import SentenceTransformer
import numpy as np
from openai import OpenAI

# 1. Generate embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
doc_embeddings = model.encode(documents)

# 2. Query and retrieve
query = "What is RAG?"
query_embedding = model.encode([query])[0]

# 3. Compute similarity
similarities = np.dot(doc_embeddings, query_embedding)
top_idx = np.argmax(similarities)
retrieved_doc = documents[top_idx]

# 4. Generate response
client = OpenAI(api_key=openai_api_key)
prompt = f"""Context: {retrieved_doc}

Question: {query}

Answer based on the context:"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0
)

print("Custom RAG Answer:")
print(response.choices[0].message.content)

**Lines of code:** ~20 lines

**Pros:**
- Full control over every step
- No framework dependency (only the embedding and LLM client libraries)
- Easy to debug

**Cons:**
- Manual handling of embeddings, similarity, prompts
- Need to write boilerplate for each component
- Scaling requires more custom code
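
As one example of the extra code that scaling requires, the custom version above returns a single document and relies on a raw dot product, which matches cosine similarity only when the embeddings are unit length. A minimal dependency-free sketch of top-k cosine retrieval follows. The helper names `cosine_similarity` and `top_k` and the toy 2-D vectors are assumptions for illustration; real embeddings would come from the embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable sort: ties keep index order
    return scored[:k]


# Toy 2-D "embeddings": the third vector points in the same direction as the query.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
hits = top_k([1.0, 1.0], doc_vecs, k=2)
print(hits)
```

Normalizing inside the similarity function makes the ranking independent of vector length, so the longer third vector wins because of its direction, not its magnitude.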

### Approach 2: LangChain Implementation with FAISS

In [None]:
!pip install langchain langchain-core langchain-community langchain-openai langchain-text-splitters
!pip install faiss-cpu chromadb python-dotenv

In [None]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Convert documents
lc_docs = [Document(page_content=doc) for doc in documents]

# Vector store
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
vectorstore = FAISS.from_documents(lc_docs, embeddings)
retriever = vectorstore.as_retriever()

# LLM
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    openai_api_key=openai_api_key
)

# Prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert assistant. Use ONLY the retrieved context."),
    ("human", "{question}\n\nContext:\n{context}")
])

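# Join retrieved Document objects into plain text for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build RAG pipeline
rag_chain = (
    RunnableParallel(context=retriever | format_docs, question=RunnablePassthrough())
    | prompt
    | llm
)

# Query
response = rag_chain.invoke("What is RAG?")
print(response.content)  # the chain returns an AIMessage; .content holds the text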


### Approach 2 (Variant): LangChain Implementation with Chroma DB

In [None]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Convert documents
lc_docs = [Document(page_content=doc) for doc in documents]

# Vector store (Chroma)
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# persist_directory allows saving DB locally; optional
vectorstore = Chroma.from_documents(
    lc_docs,
    embeddings,
    collection_name="my_rag_collection",
    persist_directory="./chroma_db"  # optional
)

retriever = vectorstore.as_retriever()

# LLM
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    openai_api_key=openai_api_key
)

# Prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert assistant. Use ONLY the retrieved context."),
    ("human", "{question}\n\nContext:\n{context}")
])

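# Join retrieved Document objects into plain text for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build RAG pipeline
rag_chain = (
    RunnableParallel(context=retriever | format_docs, question=RunnablePassthrough())
    | prompt
    | llm
)

# Query
response = rag_chain.invoke("What is RAG?")
print(response)  # full AIMessage object, including metadata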


In [None]:
response.content

**Lines of code:** ~10 statements, excluding imports (more physical lines once multi-line calls are counted)

**Pros:**
- Abstracted embedding and retrieval
- Built-in chain composition
- Easy to swap components (different LLMs, vector stores)

**Cons:**
- Less visibility into internal steps
- Framework dependency
- Learning curve for LangChain concepts
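
The "easy to swap components" advantage comes from components sharing a common interface. The following plain-Python sketch shows the idea with `typing.Protocol`. It is not LangChain's actual retriever interface: `Retriever`, `KeywordRetriever`, `FirstNRetriever`, and `build_prompt` are hypothetical names used only for illustration.

```python
from typing import Protocol


class Retriever(Protocol):
    """Anything with a retrieve(query) method can be plugged into the pipeline."""
    def retrieve(self, query: str) -> list[str]: ...


class KeywordRetriever:
    """Returns every document that shares at least one word with the query."""
    def __init__(self, docs: list[str]):
        self.docs = docs

    def retrieve(self, query: str) -> list[str]:
        terms = set(query.lower().split())
        return [d for d in self.docs if terms & set(d.lower().split())]


class FirstNRetriever:
    """Baseline that ignores the query and returns the first n documents."""
    def __init__(self, docs: list[str], n: int = 1):
        self.docs = docs
        self.n = n

    def retrieve(self, query: str) -> list[str]:
        return self.docs[: self.n]


def build_prompt(query: str, retriever: Retriever) -> str:
    # The pipeline depends only on the interface, not on a concrete retriever class.
    context = "\n".join(retriever.retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "Python is a high-level programming language known for readability and simplicity.",
    "Machine learning is a subset of AI that enables systems to learn from data.",
    "RAG combines retrieval and generation to provide accurate, grounded responses.",
]

print(build_prompt("retrieval generation", KeywordRetriever(docs)))
print(build_prompt("retrieval generation", FirstNRetriever(docs, n=1)))
```

Swapping the retriever changes one argument and nothing else in the pipeline. The same mechanism is what lets a framework exchange one vector store or LLM for another.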

### Approach 3: LlamaIndex Implementation

In [None]:
!pip install llama-index

In [None]:
from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI as LlamaOpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure LlamaIndex
Settings.llm = LlamaOpenAI(model="gpt-3.5-turbo", temperature=0, api_key=openai_api_key)
Settings.embed_model = OpenAIEmbedding(api_key=openai_api_key)

# Create documents and index
llama_docs = [Document(text=doc) for doc in documents]
index = VectorStoreIndex.from_documents(llama_docs)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is RAG?")

print("LlamaIndex Answer:")
print(response.response)

**Lines of code:** ~7 lines

**Pros:**
- Simplest API (most concise)
- Optimized for data indexing
- Intuitive for RAG-specific tasks

**Cons:**
- Less flexible for non-RAG workflows
- Smaller ecosystem compared to LangChain
- Framework dependency

## 4. When to Use What?

### Use Custom Code When:
- Learning RAG fundamentals (you already did this!)
- Simple, specific use case with unique requirements
- You need full control over every component
- Minimizing dependencies is critical
- Performance optimization requires custom logic

### Use LangChain When:
- Building complex multi-step workflows
- Creating agents that use multiple tools
- Need conversational memory
- Integrating many external services
- Prototyping chatbots or assistants

### Use LlamaIndex When:
- Primary goal is question-answering over documents
- Want the simplest API for RAG
- Need advanced indexing strategies (tree, graph)
- Working with structured + unstructured data
- Quick prototypes for data retrieval

### Hybrid Approach:
You can mix custom code with frameworks:
- Use LangChain for retrieval, custom code for generation
- Use LlamaIndex for indexing, custom logic for post-processing
- Build custom components within framework pipelines
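
A hybrid setup might look like the following sketch, where a framework supplies the retriever and custom code post-processes its results. Everything here is an assumption for illustration: `stub_retriever` stands in for a framework retriever that returns `(text, score)` pairs, and `postprocess` and `hybrid_retrieve` are hypothetical helpers, not part of LangChain or LlamaIndex.

```python
from typing import Callable

Hit = tuple[str, float]  # (retrieved text, similarity score)


def postprocess(hits: list[Hit], min_score: float = 0.5, max_hits: int = 3) -> list[str]:
    """Custom logic: drop low-scoring hits, remove near-duplicate text, cap the count."""
    seen = set()
    kept = []
    for text, score in sorted(hits, key=lambda h: h[1], reverse=True):
        key = text.strip().lower()  # crude duplicate check: case and outer whitespace only
        if score < min_score or key in seen:
            continue
        seen.add(key)
        kept.append(text)
        if len(kept) == max_hits:
            break
    return kept


def hybrid_retrieve(query: str, framework_retriever: Callable[[str], list[Hit]], **options) -> list[str]:
    # The framework handles retrieval; the custom code decides what reaches the prompt.
    return postprocess(framework_retriever(query), **options)


def stub_retriever(query: str) -> list[Hit]:
    # Hypothetical stand-in for a framework retriever call.
    return [
        ("RAG combines retrieval and generation.", 0.91),
        ("rag combines retrieval and generation. ", 0.88),  # duplicate apart from case/space
        ("Python is a high-level programming language.", 0.20),  # below the threshold
    ]


context = hybrid_retrieve("What is RAG?", stub_retriever, min_score=0.5)
print(context)
```

Keeping the post-processing as an ordinary function means it can be unit-tested without the framework, which offsets some of the debugging cost that a framework dependency adds.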

## 5. Trade-offs Summary

| Aspect | Custom Code | LangChain | LlamaIndex |
|--------|-------------|-----------|------------|
| **Learning Curve** | Low (Python basics) | Medium-High | Low-Medium |
| **Development Speed** | Slow | Fast | Very Fast |
| **Flexibility** | Highest | High | Medium |
| **Code Length** | Longest | Medium | Shortest |
| **Dependencies** | Minimal | Many | Moderate |
| **Community Support** | N/A | Large | Growing |
| **Best For** | Learning, custom needs | Complex workflows | RAG-focused apps |
| **Maintenance** | You own it | Follow updates | Follow updates |

## 6. Key Takeaways

1. **Frameworks accelerate development** but add dependencies
2. **LangChain** is broader (agents, chains, tools) with a larger ecosystem
3. **LlamaIndex** is specialized for data retrieval with a simpler API
4. **Custom code** is still valuable for learning and specific requirements
5. **You can mix approaches** - use frameworks where helpful, custom code where needed

In the next modules:
- **Module 9:** Deep dive into LangChain RAG
- **Module 10:** Deep dive into LlamaIndex RAG
- **Module 11:** Production-ready RAG systems

# 🎯 Practice Exercises

## Exercise 1: Personal Project Planning

### Task 1
Think of a RAG project you'd like to build (or one from your Module 6/7 work).

**Answer these questions:**

1. **Project Description:**
   ___________

2. **Would you use a framework or custom code? Why?**
   ___________

3. **If using a framework, which one and why?**
   ___________

4. **What components would you keep custom (if any)?**
   ___________

5. **What are the main risks of your choice?**
   ___________