# Modern RAG Quickstart 2025: Build Your First Advanced RAG Application

Welcome to hands-on RAG development! This notebook will guide you through building a modern RAG application using the latest LangChain features and best practices.

## 🎯 What You'll Build

A modern RAG application with:
- **LangGraph orchestration** for complex workflows
- **Advanced streaming** with intermediate step visibility
- **Semantic chunking** for better context preservation
- **Source attribution** with structured outputs
- **Query analysis** for improved retrieval
- **Production-ready features** like error handling and monitoring

## 📋 Prerequisites

- Basic Python knowledge
- Understanding of RAG concepts (see `rag-concepts-2025.ipynb`)
- OpenAI API key (or other LLM provider)

## 🛠️ Modern Setup (2025)

The LangChain ecosystem has evolved with a more modular architecture:

### 📦 Updated Package Structure

**2024 Approach (Your Original Notebooks):**
```bash
pip install langchain langchain-community langchain-openai chromadb
```

**2025 Modern Approach:**
```bash
# Core LangChain interfaces and base implementations
pip install langchain-core

# Community integrations (vector stores, document loaders)
pip install langchain-community

# LLM providers (choose what you need)
pip install langchain-openai  # or langchain-anthropic, langchain-groq, etc.

# Advanced orchestration for complex workflows
pip install langgraph

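# Vector stores and utilities
pip install chromadb beautifulsoup4 python-dotenv

# Needed by the code below: text splitters, langchain.retrievers (EnsembleRetriever),
# and the BM25 keyword retriever. Newer major versions may relocate some of these imports.
pip install langchain langchain-text-splitters rank_bm25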

# Optional: Enhanced text processing
pip install langchain-experimental  # for semantic chunking
```

In [None]:
# Install required packages
%pip install --upgrade --quiet langchain langchain-core langchain-community langchain-openai langchain-text-splitters langgraph chromadb beautifulsoup4 python-dotenv langchain-experimental rank_bm25
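
Because the stack is split across several packages, it can help to confirm what actually got installed before running the rest of the notebook. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the `installed_versions` helper are illustrative names, not part of LangChain.

```python
from importlib import metadata

# Distribution names as pip knows them; adjust to match your environment.
REQUIRED = ["langchain-core", "langchain-community", "langchain-openai", "langgraph", "chromadb"]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(REQUIRED).items():
    print(f"{name:22} {version or 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` points at a package to add to the `%pip install` cell.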

### 🔑 Environment Setup

Create a `.env` file with your API keys:

```env
# Required
OPENAI_API_KEY=your_openai_api_key_here

# Optional: LangSmith for observability (recommended for production)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langsmith_api_key_here
LANGCHAIN_PROJECT=rag-quickstart-2025
```

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Verify OpenAI API key is loaded
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("Please set OPENAI_API_KEY in your .env file")

print("✅ Environment setup complete!")
print(f"🔍 LangSmith tracing: {'Enabled' if os.getenv('LANGCHAIN_TRACING_V2', '').lower() == 'true' else 'Disabled'}")

## 📚 Phase 1: Modern Document Ingestion

Let's start by loading and processing documents using modern techniques.

### 🌐 Enhanced Document Loading

In [None]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Load two long-form blog posts on LLM research
# The first covers LLM-powered agents; the second is a more recent post on human data quality
urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2024-02-05-human-data-quality/"  # More recent content
]

# Enhanced BS4 parsing with better content extraction
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content", "post-meta"))

loader = WebBaseLoader(
    web_paths=urls,
    bs_kwargs={"parse_only": bs4_strainer},
    header_template={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }
)

try:
    docs = loader.load()
    print(f"✅ Loaded {len(docs)} documents")
    print(f"📄 Total characters: {sum(len(doc.page_content) for doc in docs):,}")
    
    # Show document metadata
    for i, doc in enumerate(docs):
        print(f"\n📋 Document {i+1}:")
        print(f"   Source: {doc.metadata.get('source', 'Unknown')}")
        print(f"   Length: {len(doc.page_content):,} characters")
        print(f"   Preview: {doc.page_content[:100]}...")
        
except Exception as e:
    print(f"❌ Error loading documents: {e}")
    print("💡 Tip: Check your internet connection or try different URLs")
    raise  # Later cells depend on `docs`, so stop here instead of continuing without it

### 🧠 Modern Text Splitting: Semantic Chunking

**2024 Approach**: Simple character-based chunking  
**2025 Approach**: Semantic chunking that preserves meaning

In [None]:
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize embeddings for semantic chunking
embeddings = OpenAIEmbeddings()

print("🔄 Comparing chunking strategies...\n")

# 1. Traditional character-based chunking (2024 approach)
print("📝 Traditional Character-based Chunking:")
char_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    add_start_index=True,
    separators=["\n\n", "\n", ".", " ", ""]
)

char_chunks = char_splitter.split_documents(docs)
print(f"   Chunks created: {len(char_chunks)}")
print(f"   Average chunk size: {sum(len(chunk.page_content) for chunk in char_chunks) // len(char_chunks)} characters")

# 2. Modern semantic chunking (2025 approach)
print("\n🧠 Modern Semantic Chunking:")
try:
    semantic_splitter = SemanticChunker(
        embeddings=embeddings,
        breakpoint_threshold_type="percentile",  # or "standard_deviation", "interquartile"
        breakpoint_threshold_amount=95  # Top 5% of semantic differences become breakpoints
    )
    
    # Use first document for semantic chunking demo (to avoid API costs)
    semantic_chunks = semantic_splitter.split_documents([docs[0]])
    print(f"   Chunks created: {len(semantic_chunks)}")
    print(f"   Average chunk size: {sum(len(chunk.page_content) for chunk in semantic_chunks) // len(semantic_chunks)} characters")
    
    # For production, we'll use character-based for cost efficiency
    # In real applications, choose based on your accuracy vs cost requirements
    
except Exception as e:
    print(f"   ⚠️  Semantic chunking failed: {e}")
    print("   📝 Falling back to character-based chunking")
    semantic_chunks = char_chunks[:10]  # Use subset for demo

# For the rest of this tutorial, we'll use character-based chunks (more cost-effective)
all_chunks = char_chunks

print(f"\n✅ Using {len(all_chunks)} chunks for vector storage")

# Show a few example chunks with metadata
print("\n📄 Example chunks:")
for i, chunk in enumerate(all_chunks[:3]):
    print(f"\n   Chunk {i+1}:")
    print(f"   Source: {chunk.metadata.get('source', 'Unknown')}")
    print(f"   Start Index: {chunk.metadata.get('start_index', 'N/A')}")
    print(f"   Length: {len(chunk.page_content)} chars")
    print(f"   Content: {chunk.page_content[:150]}...")
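
To make the `percentile` breakpoint setting more concrete, here is a minimal sketch of the underlying idea: measure the distance between neighbouring sentences and split wherever that distance exceeds the chosen percentile. It is not the `SemanticChunker` implementation, which has additional steps (for example, combining neighbouring sentences before embedding). The `embed` function is a toy bag-of-words stand-in for a real embedding model, and every name below is an assumption made for illustration.

```python
import math
from collections import Counter

def embed(sentence):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(sentence.lower().replace(".", "").split())

def cosine_distance(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)

def percentile(values, pct):
    """Linearly interpolated percentile, with pct in [0, 100]."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

def semantic_chunks(sentences, pct=95):
    """Split where the distance between neighbouring sentences exceeds the percentile."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    vectors = [embed(s) for s in sentences]
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:  # large semantic jump: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "Agents plan tasks.",
    "Agents decompose tasks into steps.",
    "Agents use memory for tasks.",
    "Bananas are yellow fruit.",
]
for chunk in semantic_chunks(sentences):
    print("-", chunk)
```

Lowering `pct` makes the splitter more eager to break, which yields more and smaller chunks.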

### 🗃️ Enhanced Vector Storage

Modern vector storage with better configuration and metadata handling:

In [None]:
from langchain_community.vectorstores import Chroma
import tempfile
import shutil

# Create a temporary directory for the vector store
persist_directory = tempfile.mkdtemp(prefix="rag_modern_")

print(f"🗄️  Creating vector store in: {persist_directory}")

try:
    # Create vector store with persistence and better configuration
    vectorstore = Chroma.from_documents(
        documents=all_chunks,
        embedding=embeddings,
        persist_directory=persist_directory,
        collection_metadata={"hnsw:space": "cosine"}  # Cosine distance instead of Chroma's default (L2)
    )
    
    print(f"✅ Vector store created with {len(all_chunks)} documents")
    
    # Test the vector store
    # Returned documents already carry their metadata; no extra flag is needed
    test_results = vectorstore.similarity_search(
        "What is task decomposition?",
        k=3
    )
    
    print(f"\n🔍 Test search returned {len(test_results)} results")
    print(f"📄 First result preview: {test_results[0].page_content[:100]}...")
    
except Exception as e:
    print(f"❌ Error creating vector store: {e}")
    raise
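
The `hnsw:space` setting controls how distance between vectors is measured. The sketch below, written in plain Python with made-up three-dimensional vectors, illustrates why cosine distance is often preferred for text embeddings: it compares direction only, whereas L2 distance also reacts to vector length.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative vectors, not real embeddings
query = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]   # same direction as the query, twice the length
different = [2.0, 0.0, 1.0]

for name, vec in [("same_direction", same_direction), ("different", different)]:
    cos_dist = 1.0 - cosine_similarity(query, vec)
    print(f"{name:15} cosine distance={cos_dist:.3f}  L2 distance={l2_distance(query, vec):.3f}")
```

Under cosine distance the first vector is essentially identical to the query, while under L2 it looks almost as far away as the unrelated one.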

## 🔍 Phase 2: Advanced Retrieval with Query Analysis

Modern RAG systems don't just take user queries as-is. They analyze and optimize them first.

### 🧭 Query Analysis and Rewriting

**2024**: Direct user query → Search  
**2025**: User query → Analyze → Rewrite → Search

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is a deprecated shim
from typing import List

# Initialize the language model
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Define structured output for query analysis
class QueryAnalysis(BaseModel):
    """Analysis and rewriting of user queries for better retrieval."""
    
    original_query: str = Field(description="The original user query")
    intent: str = Field(description="The intent behind the query (e.g., 'definition', 'comparison', 'process')")
    key_concepts: List[str] = Field(description="Key concepts and terms in the query")
    rewritten_queries: List[str] = Field(description="2-3 rewritten versions optimized for retrieval")
    metadata_filters: dict = Field(description="Any metadata filters that should be applied")

# Create query analysis prompt
query_analysis_prompt = ChatPromptTemplate.from_template(
    """
You are an expert at analyzing user queries for a RAG system about AI agents and LLM research.

Given a user query, analyze it and provide:
1. The query intent (definition, comparison, process, example, etc.)
2. Key concepts and technical terms
3. 2-3 rewritten versions that would be better for semantic search
4. Any metadata filters that might help (like source preferences)

User Query: {query}

Provide your analysis in the specified JSON format.
"""
)

# Create the query analyzer
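# method="function_calling" is set explicitly so the call also works with models
# (such as gpt-3.5-turbo) that lack native JSON-schema structured output
query_analyzer = query_analysis_prompt | llm.with_structured_output(
    QueryAnalysis, method="function_calling"
)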

# Test the query analyzer
def analyze_query(user_query: str) -> QueryAnalysis:
    """Analyze a user query for better retrieval."""
    try:
        analysis = query_analyzer.invoke({"query": user_query})
        return analysis
    except Exception as e:
        print(f"⚠️ Query analysis failed: {e}")
        # Fallback to simple analysis
        return QueryAnalysis(
            original_query=user_query,
            intent="general",
            key_concepts=[user_query],
            rewritten_queries=[user_query],
            metadata_filters={}
        )

# Test with different types of queries
test_queries = [
    "What is task decomposition?",
    "How do autonomous agents work compared to traditional chatbots?",
    "Can you give me examples of reasoning techniques in AI?"
]

print("🧭 Query Analysis Results:\n")

for query in test_queries:
    print(f"❓ Query: '{query}'")
    analysis = analyze_query(query)
    print(f"   🎯 Intent: {analysis.intent}")
    print(f"   🔑 Key Concepts: {', '.join(analysis.key_concepts)}")
    print(f"   ✍️  Rewritten Queries:")
    for i, rewritten in enumerate(analysis.rewritten_queries, 1):
        print(f"      {i}. {rewritten}")
    print()
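
While iterating on the retrieval code, it can be convenient to have a deterministic stand-in for the LLM-based analyzer so that cells run without API calls. The sketch below is a rule-based approximation with the same fields as `QueryAnalysis`; `StubAnalysis`, `stub_analyze_query`, the stopword list, and the intent cues are all hypothetical and far cruder than what the model produces.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

STOPWORDS = {"what", "is", "the", "a", "an", "of", "how", "do", "does", "can", "you", "me",
             "give", "to", "in", "and", "are", "for", "with", "why", "it"}

# Hypothetical keyword cues, checked in order; extend for your own domain.
INTENT_CUES = {
    "comparison": ("compared", "differ", " vs ", "versus"),
    "example": ("example",),
    "process": ("how do", "how does", "how to"),
    "definition": ("what is", "what are", "define"),
}

@dataclass
class StubAnalysis:
    original_query: str
    intent: str
    key_concepts: List[str]
    rewritten_queries: List[str]
    metadata_filters: Dict[str, str] = field(default_factory=dict)

def stub_analyze_query(user_query: str) -> StubAnalysis:
    """Keyword-based stand-in for the LLM analyzer (no API calls)."""
    lowered = f" {user_query.lower()} "
    intent = next(
        (name for name, cues in INTENT_CUES.items() if any(cue in lowered for cue in cues)),
        "general",
    )
    concepts = [w for w in re.findall(r"[a-z0-9]+", lowered) if w not in STOPWORDS]
    rewritten = [user_query]
    if concepts:
        rewritten.append(" ".join(concepts))  # keyword-only variant of the query
    return StubAnalysis(user_query, intent, concepts, rewritten)

analysis_stub = stub_analyze_query("What is task decomposition?")
print(analysis_stub.intent, analysis_stub.key_concepts, analysis_stub.rewritten_queries)
```

Because the stub exposes `rewritten_queries`, it can be passed to code that only reads that attribute, which makes offline testing of the retrieval step possible.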

### 🎯 Advanced Retrieval Strategies

Modern retrievers with multiple strategies and result fusion:

In [None]:
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from typing import List, Dict, Any

class AdvancedRetriever:
    """Advanced retrieval system with multiple strategies."""
    
    def __init__(self, vectorstore, documents):
        self.vectorstore = vectorstore
        
        # Create multiple retrievers
        self.vector_retriever = vectorstore.as_retriever(
            search_type="similarity_score_threshold",
            search_kwargs={
                "k": 6,
                "score_threshold": 0.7
            }
        )
        
        # BM25 retriever for keyword-based search
        try:
            self.bm25_retriever = BM25Retriever.from_documents(
                documents, k=4
            )
            
            # Ensemble retriever combining semantic and keyword search
            self.ensemble_retriever = EnsembleRetriever(
                retrievers=[self.vector_retriever, self.bm25_retriever],
                weights=[0.7, 0.3]  # Favor semantic search
            )
            self.use_ensemble = True
            print("✅ Advanced ensemble retriever created")
            
        except Exception as e:
            print(f"⚠️ BM25 retriever failed: {e}")
            print("📝 Using vector retriever only")
            self.use_ensemble = False
    
    def retrieve(self, analysis: QueryAnalysis) -> List[Any]:
        """Retrieve documents using analyzed query."""
        all_docs = []
        
        # Use ensemble retriever if available
        retriever = self.ensemble_retriever if self.use_ensemble else self.vector_retriever
        
        # Try each rewritten query
        for query in analysis.rewritten_queries:
            try:
                docs = retriever.invoke(query)
                all_docs.extend(docs)
            except Exception as e:
                print(f"⚠️ Retrieval failed for query '{query}': {e}")
        
        # Remove duplicates while preserving order
        seen = set()
        unique_docs = []
        for doc in all_docs:
            doc_id = (doc.page_content[:100], doc.metadata.get('source', ''))
            if doc_id not in seen:
                seen.add(doc_id)
                unique_docs.append(doc)
        
        # Limit results
        return unique_docs[:8]

# Create advanced retriever
advanced_retriever = AdvancedRetriever(vectorstore, all_chunks)

# Test retrieval
test_query = "What is task decomposition?"
analysis = analyze_query(test_query)
retrieved_docs = advanced_retriever.retrieve(analysis)

print(f"\n🔍 Retrieved {len(retrieved_docs)} documents for: '{test_query}'")
print("\n📄 Retrieved documents:")
for i, doc in enumerate(retrieved_docs[:3]):
    print(f"\n   Document {i+1}:")
    print(f"   Source: {doc.metadata.get('source', 'Unknown')}")
    print(f"   Preview: {doc.page_content[:200]}...")
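
The `weights=[0.7, 0.3]` argument is easier to reason about with the fusion step written out. `EnsembleRetriever` is documented as combining results with weighted Reciprocal Rank Fusion; the sketch below shows that idea on plain lists of document IDs. It is an illustration rather than the library's code, and the function name, the chunk IDs, and the constant `c=60` (the value conventionally used for RRF) are assumptions.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion."""
    if len(ranked_lists) != len(weights):
        raise ValueError("Need exactly one weight per ranked list")
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks near the top
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; ties broken by ID so the order is deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

vector_hits = ["chunk_a", "chunk_b", "chunk_c"]   # hypothetical semantic results
bm25_hits = ["chunk_c", "chunk_a", "chunk_d"]     # hypothetical keyword results
fused = weighted_rrf([vector_hits, bm25_hits], weights=[0.7, 0.3])
print(fused)
```

Documents found by both retrievers rise to the top, and the weights decide which retriever wins when they disagree.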

## 🤖 Phase 3: Modern Generation with LangGraph

**2024**: Simple LCEL chains  
**2025**: LangGraph orchestrated workflows with state management

### 🌊 LangGraph Workflow Setup

In [None]:
from langgraph.graph import StateGraph, END
from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is a deprecated shim
import json

# Define the state for our RAG workflow
class RAGState(TypedDict):
    """State for the RAG workflow."""
    user_query: str
    query_analysis: QueryAnalysis
    retrieved_docs: List[Any]
    generated_response: str
    sources: List[Dict[str, Any]]
    messages: List[BaseMessage]
    error: str

# Define structured output for final response
class RAGResponse(BaseModel):
    """Structured response from RAG system."""
    answer: str = Field(description="The comprehensive answer to the user's question")
    sources: List[Dict[str, str]] = Field(description="List of sources with titles and URLs")
    confidence: str = Field(description="Confidence level: high, medium, or low")
    follow_up_questions: List[str] = Field(description="Suggested follow-up questions")

print("🌊 Setting up LangGraph workflow...")

### 🔧 Workflow Node Functions

In [None]:
# Node 1: Query Analysis
def analyze_query_node(state: RAGState) -> RAGState:
    """Analyze the user query."""
    try:
        analysis = analyze_query(state["user_query"])
        state["query_analysis"] = analysis
        state["messages"].append(HumanMessage(content=state["user_query"]))
    except Exception as e:
        state["error"] = f"Query analysis failed: {e}"
    
    return state

# Node 2: Document Retrieval
def retrieve_documents_node(state: RAGState) -> RAGState:
    """Retrieve relevant documents."""
    if state.get("error"):
        return state
    
    try:
        docs = advanced_retriever.retrieve(state["query_analysis"])
        state["retrieved_docs"] = docs
        
        # Extract source information
        sources = []
        for doc in docs:
            source_info = {
                "title": doc.metadata.get("title", "Unknown Document"),
                "url": doc.metadata.get("source", ""),
                "preview": doc.page_content[:150] + "..."
            }
            sources.append(source_info)
        
        state["sources"] = sources
        
    except Exception as e:
        state["error"] = f"Document retrieval failed: {e}"
    
    return state

# Node 3: Response Generation
def generate_response_node(state: RAGState) -> RAGState:
    """Generate the final response."""
    if state.get("error"):
        return state
    
    try:
        # Create generation prompt with sources
        generation_prompt = ChatPromptTemplate.from_template(
            """
You are an expert AI assistant specializing in AI agents and LLM research.

User Query: {query}
Query Intent: {intent}
Key Concepts: {concepts}

Retrieved Context:
{context}

Instructions:
1. Provide a comprehensive, accurate answer based on the retrieved context
2. If the context doesn't fully address the query, acknowledge this
3. Include specific examples when relevant
4. Suggest 2-3 follow-up questions
5. Rate your confidence level (high/medium/low) based on context quality

Format your response as structured JSON matching the RAGResponse schema.
"""
        )
        
        # Prepare context
        context = "\n\n".join([
            f"Source: {doc.metadata.get('source', 'Unknown')}\n{doc.page_content}"
            for doc in state["retrieved_docs"]
        ])
        
        # Generate response
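        # method="function_calling" keeps this working with models that lack
        # native JSON-schema structured output (such as gpt-3.5-turbo)
        generation_chain = generation_prompt | llm.with_structured_output(
            RAGResponse, method="function_calling"
        )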
        
        response = generation_chain.invoke({
            "query": state["user_query"],
            "intent": state["query_analysis"].intent,
            "concepts": ", ".join(state["query_analysis"].key_concepts),
            "context": context
        })
        
        state["generated_response"] = response.answer
        
        # Keep the sources collected during retrieval: they come from document metadata,
        # whereas response.sources is model-generated and may not match real documents
        
        # Add AI message to conversation
        ai_message = AIMessage(content=json.dumps({
            "answer": response.answer,
            "confidence": response.confidence,
            "follow_up_questions": response.follow_up_questions
        }, indent=2))
        
        state["messages"].append(ai_message)
        
    except Exception as e:
        state["error"] = f"Response generation failed: {e}"
        # Fallback response
        fallback_answer = f"I apologize, but I encountered an error while processing your question: '{state['user_query']}'. Please try rephrasing your question."
        state["generated_response"] = fallback_answer
        state["messages"].append(AIMessage(content=fallback_answer))
    
    return state

print("✅ Workflow nodes defined")
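
The nodes above mutate the state and return all of it. LangGraph nodes can also return only the keys they change, which are then merged into the shared state. The sketch below imitates that pattern in plain Python so the data flow is visible without the library; `run_linear_graph` and the three toy nodes are hypothetical stand-ins, not LangGraph code.

```python
from typing import Callable, Dict, Iterator, List, Tuple

def run_linear_graph(
    nodes: List[Tuple[str, Callable[[dict], dict]]], state: dict
) -> Iterator[Dict[str, dict]]:
    """Run nodes in order, merging each returned update into the shared state."""
    for name, node in nodes:
        update = node(state)
        state = {**state, **update}  # merge without mutating the caller's dict
        yield {name: update}         # one event per node, keyed by node name

# Toy nodes that return only the keys they change
def analyze(state: dict) -> dict:
    return {"rewritten": state["user_query"].lower().rstrip("?")}

def retrieve(state: dict) -> dict:
    corpus = ["task decomposition splits a goal into steps", "agents use memory"]
    query_words = set(state["rewritten"].split())
    hits = [doc for doc in corpus if query_words & set(doc.split())]
    return {"retrieved_docs": hits} if hits else {"error": "no documents matched"}

def generate(state: dict) -> dict:
    if state.get("error"):  # an earlier node failed: short-circuit with a fallback
        return {"generated_response": f"Sorry, {state['error']}."}
    return {"generated_response": f"Based on {len(state['retrieved_docs'])} document(s): ..."}

pipeline = [("analyze", analyze), ("retrieve", retrieve), ("generate", generate)]
for step in run_linear_graph(pipeline, {"user_query": "What is task decomposition?"}):
    print(step)
```

Returning partial updates keeps each node's responsibility explicit and avoids accidental edits to keys owned by other nodes.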

### 🔄 Build the LangGraph Workflow

In [None]:
# Create the workflow graph
workflow = StateGraph(RAGState)

# Add nodes
workflow.add_node("analyze_query", analyze_query_node)
workflow.add_node("retrieve_docs", retrieve_documents_node)
workflow.add_node("generate_response", generate_response_node)

# Define the flow
workflow.set_entry_point("analyze_query")
workflow.add_edge("analyze_query", "retrieve_docs")
workflow.add_edge("retrieve_docs", "generate_response")
workflow.add_edge("generate_response", END)

# Compile the workflow
rag_app = workflow.compile()

print("🏗️ LangGraph RAG workflow compiled successfully!")

# Visualize the workflow (by default, rendering calls the Mermaid web service, so it needs network access)
try:
    from IPython.display import Image, display
    display(Image(rag_app.get_graph().draw_mermaid_png()))
except Exception:
    print("📊 Workflow visualization not available (Mermaid rendering needs network access)")
    print("🔄 Workflow: analyze_query → retrieve_docs → generate_response")

## 🚀 Phase 4: Advanced Streaming and Interaction

### 🌊 Modern Streaming with Intermediate Steps

In [None]:
import asyncio
from typing import AsyncIterator
import time

async def stream_rag_response(user_query: str) -> AsyncIterator[dict]:
    """Stream RAG response with intermediate steps."""
    
    # Initialize state
    initial_state = {
        "user_query": user_query,
        "messages": [],
        "sources": [],
        "error": ""
    }
    
    # Stream through the workflow
    async for event in rag_app.astream(initial_state):
        yield event

# Synchronous version for Jupyter
def run_rag_with_streaming(user_query: str):
    """Run RAG with streaming output."""
    print(f"🤔 Processing query: '{user_query}'\n")
    
    # Initialize state
    initial_state = {
        "user_query": user_query,
        "messages": [],
        "sources": [],
        "error": ""
    }
    
    # Stream through workflow
    for step_output in rag_app.stream(initial_state):
        for node_name, node_state in step_output.items():
            if node_name == "analyze_query":
                if "query_analysis" in node_state:
                    analysis = node_state["query_analysis"]
                    print(f"🧭 Query Analysis Complete:")
                    print(f"   Intent: {analysis.intent}")
                    print(f"   Key Concepts: {', '.join(analysis.key_concepts)}")
                    print(f"   Rewritten: {analysis.rewritten_queries[0]}\n")
                    
            elif node_name == "retrieve_docs":
                if "retrieved_docs" in node_state:
                    docs = node_state["retrieved_docs"]
                    print(f"🔍 Document Retrieval Complete:")
                    print(f"   Found {len(docs)} relevant documents")
                    print(f"   Sources: {set(doc.metadata.get('source', 'Unknown') for doc in docs)}\n")
                    
            elif node_name == "generate_response":
                if "generated_response" in node_state:
                    print(f"✅ Response Generation Complete:\n")
                    
                    # Display final response
                    print("🤖 **Answer:**")
                    print(node_state["generated_response"])
                    
                    # Display sources
                    if node_state["sources"]:
                        print("\n📚 **Sources:**")
                        for i, source in enumerate(node_state["sources"][:3], 1):
                            print(f"   {i}. {source.get('url', 'Unknown source')}")
                    
                    return node_state
    
    return None

print("🌊 Streaming RAG system ready!")
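
The async generator `stream_rag_response` is defined above but never consumed. The sketch below shows the consumption pattern with `async for`, using a fake stream so that it runs without the workflow or an API key; `fake_astream` and `collect_steps` are hypothetical names introduced for this example.

```python
import asyncio
from typing import AsyncIterator, Dict, List

async def fake_astream(user_query: str) -> AsyncIterator[Dict[str, dict]]:
    """Stand-in for an async workflow stream: yields one update per node."""
    for node_name in ("analyze_query", "retrieve_docs", "generate_response"):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield {node_name: {"user_query": user_query}}

async def collect_steps(user_query: str) -> List[str]:
    """Consume the stream and record the order in which nodes finished."""
    finished = []
    async for event in fake_astream(user_query):
        for node_name in event:
            print(f"finished: {node_name}")
            finished.append(node_name)
    return finished

# From a plain script:
#     asyncio.run(collect_steps("What is task decomposition?"))
# In Jupyter, which already runs an event loop, use instead:
#     await collect_steps("What is task decomposition?")
```

Swapping `fake_astream(user_query)` for `stream_rag_response(user_query)` should give the same loop over real workflow events.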

### 🎮 Interactive RAG Demo

Let's test our modern RAG system with various types of questions:

In [None]:
# Test queries showcasing different capabilities
test_queries = [
    "What is task decomposition and why is it important for AI agents?",
    "How do autonomous agents differ from traditional chatbots?",
    "Can you explain the Chain of Thought reasoning technique with examples?",
    "What are the main challenges in building reliable AI agents?"
]

# Interactive demo
print("🎮 Modern RAG System Demo\n")
print("Choose a test query or enter your own:\n")

for i, query in enumerate(test_queries, 1):
    print(f"{i}. {query}")

print("\n5. Enter custom query")
print("="*60)

In [None]:
# Run a test query (you can change this to any of the test queries or your own)
selected_query = test_queries[0]  # Change index 0-3 for different test queries

# Or uncomment the line below to use a custom query
# selected_query = "Your custom question here"

print(f"Running query: '{selected_query}'")
print("=" * 80)

# Run the RAG system with streaming
start_time = time.time()
result = run_rag_with_streaming(selected_query)
end_time = time.time()

if result:
    print(f"\n⏱️ Total processing time: {end_time - start_time:.2f} seconds")
    
    # Show error if any
    if result.get("error"):
        print(f"\n⚠️ Warning: {result['error']}")
else:
    print("❌ Query processing failed")

## 📊 Performance Comparison: 2024 vs 2025

Let's compare the old simple approach with our new advanced system:

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# 2024 Simple RAG Chain (from your original notebook)
def create_simple_rag_chain():
    """Create a simple RAG chain like in 2024."""
    
    # Simple retriever
    simple_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
    
    # Simple prompt
    simple_prompt = ChatPromptTemplate.from_template(
        """
You are an assistant for question-answering tasks. Use the following pieces of 
retrieved context to answer the question. If you don't know the answer, just say 
that you don't know. Use three sentences maximum and keep the answer concise.

Question: {question}
Context: {context}

Answer:
"""
    )
    
    # Format documents function
    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)
    
    # Simple chain
    rag_chain = (
        {"context": simple_retriever | format_docs, "question": RunnablePassthrough()}
        | simple_prompt
        | llm
        | StrOutputParser()
    )
    
    return rag_chain

# Create simple chain
simple_rag = create_simple_rag_chain()

# Performance comparison
test_query = "What is task decomposition?"

print("⚡ Performance Comparison: 2024 vs 2025 RAG\n")
print("📊 Testing query: 'What is task decomposition?'\n")

# Test 2024 approach
print("🕐 2024 Simple RAG:")
start_time = time.time()
simple_result = simple_rag.invoke(test_query)
simple_time = time.time() - start_time

print(f"   Time: {simple_time:.2f} seconds")
print(f"   Answer: {simple_result[:200]}...\n")

# Test 2025 approach
print("🚀 2025 Advanced RAG:")
start_time = time.time()
# Re-run the full workflow for this query (the timing includes printing the intermediate steps)
advanced_result = run_rag_with_streaming(test_query)
advanced_time = time.time() - start_time

print(f"\n   Time: {advanced_time:.2f} seconds")
print(f"   Features: Query analysis, advanced retrieval, source attribution, structured output")

# Summary
print("\n" + "="*60)
print("📈 **Improvements in 2025 RAG:**")
print("   ✅ Query analysis and optimization")
print("   ✅ Multi-strategy retrieval (semantic + keyword)")
print("   ✅ Structured outputs with confidence scoring")
print("   ✅ Proper source attribution")
print("   ✅ LangGraph workflow management")
print("   ✅ Advanced streaming capabilities")
print("   ✅ Better error handling")
print("   ✅ Follow-up question suggestions")
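
A single timed call is sensitive to network jitter, so a comparison like the one above is more trustworthy when each approach is run several times. Here is a minimal sketch of such a helper using `time.perf_counter`; `benchmark` is a hypothetical name, and the workload in the example is a stand-in for the real chain calls.

```python
import statistics
import time
from typing import Callable, Dict

def benchmark(fn: Callable[[], object], repeats: int = 3) -> Dict[str, float]:
    """Call fn `repeats` times and summarise wall-clock latency in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        timings.append(time.perf_counter() - start)
    return {"min": min(timings), "median": statistics.median(timings), "max": max(timings)}

# Stand-in workload; in this notebook you might pass
# lambda: simple_rag.invoke(test_query) instead.
stats = benchmark(lambda: sum(range(100_000)), repeats=5)
print({name: round(value, 5) for name, value in stats.items()})
```

Keep in mind that every repeat of an LLM-backed chain costs API calls, so a small `repeats` value is usually enough.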

## 🛡️ Production Considerations

Modern RAG systems need to be production-ready. Here are key considerations:

In [None]:
# Production utilities
import logging
from functools import wraps
import hashlib
import time  # needed for the result timestamps below

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("RAG_System")

class ProductionRAGSystem:
    """Production-ready RAG system with monitoring and caching."""
    
    def __init__(self, workflow, max_retries=3):
        self.workflow = workflow
        self.max_retries = max_retries
        self.cache = {}  # Simple in-memory cache (use Redis in production)
        
    def _get_cache_key(self, query: str) -> str:
        """Generate cache key for query."""
        return hashlib.md5(query.encode()).hexdigest()
    
    def query(self, user_query: str, use_cache: bool = True) -> dict:
        """Process query with production features."""
        
        # Check cache first
        if use_cache:
            cache_key = self._get_cache_key(user_query)
            if cache_key in self.cache:
                logger.info(f"Cache hit for query: {user_query[:50]}...")
                return self.cache[cache_key]
        
        # Process with retries
        for attempt in range(self.max_retries):
            try:
                logger.info(f"Processing query (attempt {attempt + 1}): {user_query[:50]}...")
                
                # Initialize state
                initial_state = {
                    "user_query": user_query,
                    "messages": [],
                    "sources": [],
                    "error": ""
                }
                
                # Run workflow
                final_state = None
                for step_output in self.workflow.stream(initial_state):
                    for node_name, node_state in step_output.items():
                        if node_name == "generate_response":
                            final_state = node_state
                
                if final_state and not final_state.get("error"):
                    result = {
                        "answer": final_state.get("generated_response", ""),
                        "sources": final_state.get("sources", []),
                        "query": user_query,
                        "timestamp": time.time()
                    }
                    
                    # Cache result
                    if use_cache:
                        self.cache[cache_key] = result
                    
                    logger.info("Query processed successfully")
                    return result
                else:
                    error_msg = final_state.get("error", "Unknown error") if final_state else "Workflow failed"
                    logger.warning(f"Attempt {attempt + 1} failed: {error_msg}")
                    
            except Exception as e:
                logger.error(f"Attempt {attempt + 1} failed with exception: {e}")
                if attempt == self.max_retries - 1:
                    raise
        
        # Fallback response
        return {
            "answer": "I apologize, but I'm unable to process your query at the moment. Please try again later.",
            "sources": [],
            "query": user_query,
            "timestamp": time.time(),
            "error": "Max retries exceeded"
        }
    
    def get_cache_stats(self) -> dict:
        """Get cache statistics."""
        return {
            "cache_size": len(self.cache),
            "cache_keys": list(self.cache.keys())
        }
    
    def clear_cache(self):
        """Clear the cache."""
        self.cache.clear()
        logger.info("Cache cleared")

# Create production system
prod_rag = ProductionRAGSystem(rag_app)

print("🏭 Production RAG system initialized")
print("\n🔧 Production Features:")
print("   ✅ Comprehensive logging")
print("   ✅ Automatic retry logic")
print("   ✅ Response caching")
print("   ✅ Error handling and fallbacks")
print("   ✅ Timestamped results")
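
The retry loop in `ProductionRAGSystem.query` retries immediately. When failures come from rate limits or brief outages, waiting between attempts usually works better. The sketch below shows one common approach, exponential backoff with jitter; `retry_with_backoff` and its parameters are assumptions for illustration, and the `sleep` argument is injectable so the example runs instantly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries
    raise ValueError("max_retries must be at least 1")

# Hypothetical flaky operation that succeeds on the third call
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

delays = []  # record the waits instead of actually sleeping
print(retry_with_backoff(flaky, sleep=delays.append), delays)
```

In a real deployment you would likely narrow the `except` clause to the provider's transient error types rather than catching everything.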

### 🧪 Production System Test

In [None]:
# Test production system
print("🧪 Testing Production RAG System\n")

# Test query
test_query = "What are the main components of an autonomous agent?"

# First call (no cache)
print("📞 First call (cache miss):")
start_time = time.time()
result1 = prod_rag.query(test_query)
time1 = time.time() - start_time
print(f"   Time: {time1:.2f} seconds")
print(f"   Answer: {result1['answer'][:100]}...\n")

# Second call (cached)
print("⚡ Second call (cache hit):")
start_time = time.time()
result2 = prod_rag.query(test_query)
time2 = time.time() - start_time
print(f"   Time: {time2:.2f} seconds")
print(f"   Speedup: {time1 / max(time2, 1e-9):.1f}x faster\n")  # guard against a 0.00s cache hit

# Cache stats
cache_stats = prod_rag.get_cache_stats()
print(f"📊 Cache Stats: {cache_stats['cache_size']} items cached")

print("\n✅ Production system test complete!")
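
The in-memory cache above keys on the exact query string and grows without bound. The sketch below shows two possible refinements: normalizing the query before hashing, so trivially different spellings share an entry, and evicting the least recently used entry once a size limit is reached. `normalized_cache_key` and `BoundedCache` are hypothetical names, not part of the class defined earlier.

```python
import hashlib
from collections import OrderedDict
from typing import Optional

def normalized_cache_key(query: str) -> str:
    """Hash a case- and whitespace-normalized form of the query."""
    canonical = " ".join(query.lower().split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class BoundedCache:
    """Tiny LRU cache: evicts the least recently used entry when full."""

    def __init__(self, max_items: int = 128):
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self.max_items = max_items
        self._items: "OrderedDict[str, dict]" = OrderedDict()

    def get(self, query: str) -> Optional[dict]:
        key = normalized_cache_key(query)
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, query: str, result: dict) -> None:
        key = normalized_cache_key(query)
        self._items[key] = result
        self._items.move_to_end(key)
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._items)

cache = BoundedCache(max_items=2)
cache.put("What is task decomposition?", {"answer": "..."})
print(cache.get("  what is TASK decomposition? ") is not None)
```

Normalization trades a little precision for a higher hit rate; if casing matters for your queries, drop the `lower()` call.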

## 🧹 Cleanup

In [None]:
# Cleanup resources
try:
    # Clean up vector store directory
    shutil.rmtree(persist_directory)
    print(f"🧹 Cleaned up temporary directory: {persist_directory}")
except Exception as e:
    print(f"⚠️ Cleanup warning: {e}")

# Clear production cache
prod_rag.clear_cache()

print("✅ Cleanup complete!")

## 🎓 Key Takeaways: Modern RAG 2025

Congratulations! You've built an end-to-end RAG system with query analysis, hybrid retrieval, and a LangGraph workflow. Here's what you've learned:

### 🆚 2024 vs 2025 Comparison

| Feature | 2024 Simple RAG | 2025 Modern RAG |
|---------|----------------|------------------|
| **Architecture** | Linear LCEL chains | LangGraph workflows |
| **Query Processing** | Direct search | Analysis + rewriting |
| **Retrieval** | Basic similarity | Multi-strategy ensemble |
| **Chunking** | Character-based | Semantic chunking (demonstrated; this pipeline indexes character-based chunks to limit cost) |
| **Streaming** | Simple output | Intermediate steps |
| **Sources** | Basic citations | Structured attribution |
| **Error Handling** | Minimal | Comprehensive |
| **Production Features** | None | Caching, retries, monitoring |

### 🚀 Modern RAG Benefits

1. **Better Accuracy**: Query analysis and multi-strategy retrieval
2. **Enhanced UX**: Streaming with progress indicators
3. **Production Ready**: Error handling, caching, monitoring
4. **Flexible Architecture**: LangGraph enables complex workflows
5. **Source Transparency**: Structured citations and confidence scores

### 🔮 Next Steps

Ready to explore cutting-edge techniques? Check out:
- **`rag-advanced-features.ipynb`** - Multi-modal RAG, adaptive systems, and enterprise features

### 💡 Production Tips

1. **Monitor Performance**: Use LangSmith for observability
2. **Optimize Costs**: Balance accuracy vs API costs
3. **Scale Gradually**: Start simple, add complexity as needed
4. **Test Extensively**: Use diverse queries and edge cases
5. **Cache Intelligently**: Cache expensive operations

### 🌟 You're Now Ready For

- Building production RAG applications
- Integrating with enterprise systems
- Handling complex multi-step reasoning
- Implementing advanced retrieval strategies

**Happy Building! 🛠️**