# Conversational RAG 2025: Building Chat-Enabled RAG Systems

Learn how to build RAG systems that maintain conversation context and provide natural chat experiences.

## 🎯 What You'll Learn

- **Chat History Management** - Maintaining context across multiple turns
- **Memory Systems** - Different approaches to conversation memory
- **Context Compression** - Handling long conversations efficiently
- **Follow-up Questions** - Generating relevant follow-up suggestions
- **Session Management** - User-specific conversation handling
- **Advanced Streaming** - Real-time conversational responses

## 📋 Prerequisites

- Completed `rag-quickstart-modern.ipynb`
- Understanding of basic RAG concepts
- OpenAI API key or other LLM provider

## 🔧 Setup

In [None]:
# Install additional packages for conversational RAG
%pip install --upgrade --quiet langchain-core langchain-community langchain-openai langgraph chromadb python-dotenv

In [None]:
import os
from dotenv import load_dotenv
import tempfile
import time
from typing import List, Dict, Any, Optional

# Core LangChain imports
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is deprecated in langchain-core 0.3+

# LangGraph for conversation orchestration
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing_extensions import TypedDict

import bs4
import json
import uuid

# Load environment
load_dotenv()

if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("Please set OPENAI_API_KEY in your .env file")

print("✅ Setup complete!")

## 📚 Quick Knowledge Base Setup

Let's create a knowledge base for our conversational RAG system:

In [None]:
# Load documents
urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2024-02-05-human-data-quality/"
]

loader = WebBaseLoader(
    web_paths=urls,
    bs_kwargs={"parse_only": bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))}
)

docs = loader.load()
print(f"📄 Loaded {len(docs)} documents")

# Split documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    add_start_index=True
)
splits = text_splitter.split_documents(docs)
print(f"📝 Created {len(splits)} chunks")

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=tempfile.mkdtemp(prefix="conv_rag_")
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 6})
print("🗄️ Vector store ready")
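The splitter settings above (`chunk_size=1000`, `chunk_overlap=200`) mean consecutive chunks share up to 200 characters, so a short passage cut at a chunk boundary is likely to appear whole in one of the neighbours. The sketch below shows only the windowing arithmetic, using a simplified fixed-size splitter (`fixed_window_chunks` is a hypothetical helper). It is not `RecursiveCharacterTextSplitter`'s algorithm, which first tries to split on paragraph, line, and word separators.

```python
from typing import List, Tuple


def fixed_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[Tuple[int, str]]:
    """Split text into windows of `chunk_size` characters that share `chunk_overlap` characters.

    Returns (start_index, chunk) pairs, similar to the start_index recorded by add_start_index=True.
    Simplified: ignores the separators RecursiveCharacterTextSplitter prefers to split on.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # each window starts this many characters after the previous one
    # A window starting at or after len(text) - overlap would contain only already-covered text
    last_start = max(len(text) - chunk_overlap, 1)
    return [(start, text[start:start + chunk_size]) for start in range(0, last_start, step)]


text = "".join(str(i % 10) for i in range(2600))
chunks = fixed_window_chunks(text)
```

For a 2,600-character text this yields windows starting at 0, 800, and 1600, each sharing its first 200 characters with the end of the previous one.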

## 💬 Understanding Conversational RAG

### 🆚 Single-Turn vs Multi-Turn RAG

**Single-Turn RAG (Basic):**
```
User: "What is task decomposition?"
RAG: [Searches] → [Generates response]
```

**Multi-Turn RAG (Conversational):**
```
User: "What is task decomposition?"
RAG: [Searches + Responds] → "Task decomposition breaks complex tasks..."

User: "Can you give me some examples?"
RAG: [Remembers context] → [Searches for examples] → [Responds with examples]

User: "How does this compare to chain-of-thought?"
RAG: [Understands "this" = task decomposition] → [Searches] → [Compares both concepts]
```

### 🧠 Memory Strategies

**1. Full History** - Keep all messages (simple but expensive)  
**2. Sliding Window** - Keep last N messages  
**3. Summary Memory** - Summarize older conversations  
**4. Entity Memory** - Track important entities across conversation  
**5. Compressed Memory** - Compress context while preserving key information
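Strategies 2 and 3 differ mainly in what happens to messages that fall out of the recent window (this notebook itself uses a sliding window, slicing `messages[-6:]` and `[-8:]`). Below is a framework-agnostic sketch, assuming a simplified `Message` dataclass in place of LangChain's message classes: `sliding_window` drops old messages, while the hypothetical `SummaryBufferMemory` folds them into a running summary and keeps recent turns verbatim. The `summarize` callable is an assumption; in a real system it would be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    """Simplified stand-in for LangChain's HumanMessage/AIMessage (hypothetical)."""
    role: str  # "user", "assistant", or "system"
    content: str


def sliding_window(history: List[Message], max_messages: int = 6) -> List[Message]:
    """Strategy 2: keep only the last `max_messages` messages."""
    # Guard needed because history[-0:] would return the whole list
    return history[-max_messages:] if max_messages > 0 else []


@dataclass
class SummaryBufferMemory:
    """Strategies 2 + 3 combined: recent turns verbatim, older turns folded into a summary.

    `summarize(previous_summary, evicted_messages) -> new_summary` is a hypothetical
    callable; in practice it would be an LLM call with a summarization prompt.
    """
    summarize: Callable[[str, List[Message]], str]
    max_recent: int = 6
    summary: str = ""
    recent: List[Message] = field(default_factory=list)

    def add(self, message: Message) -> None:
        self.recent.append(message)
        overflow = len(self.recent) - self.max_recent
        if overflow > 0:
            # Move the oldest messages out of the window and into the running summary
            evicted, self.recent = self.recent[:overflow], self.recent[overflow:]
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> List[Message]:
        """Messages to send to the LLM: the summary (if any) followed by the recent window."""
        if not self.summary:
            return list(self.recent)
        header = Message("system", f"Summary of the earlier conversation: {self.summary}")
        return [header] + self.recent
```

Swapping `summarize` for an entity-extraction or compression call gives rough versions of strategies 4 and 5 while keeping the same eviction loop.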

## 🏗️ Building Conversational RAG Architecture

### 📋 Conversational State Definition

In [None]:
# Conversational RAG State
class ConversationalRAGState(TypedDict):
    """State for conversational RAG workflow."""
    messages: List[BaseMessage]  # Full conversation history
    user_query: str  # Current user question
    reformulated_query: str  # Context-aware query for retrieval
    retrieved_docs: List[Any]  # Retrieved documents
    response: str  # Generated response
    sources: List[Dict[str, Any]]  # Source information
    follow_ups: List[str]  # Suggested follow-up questions
    session_id: str  # Session identifier
    error: Optional[str]  # Error message if any

# Response structure
class ConversationalResponse(BaseModel):
    """Structured response for conversational RAG."""
    answer: str = Field(description="The response to the user's question")
    confidence: str = Field(description="Confidence level: high, medium, low")
    sources: List[Dict[str, str]] = Field(description="Sources used for the answer")
    follow_up_questions: List[str] = Field(description="Suggested follow-up questions")
    context_used: bool = Field(description="Whether conversation history was used")

print("📋 Conversational state defined")
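Every field in `ConversationalRAGState` is a plain annotation, so LangGraph treats each one as "last write wins"; a field annotated with a reducer (for example `Annotated[List[BaseMessage], operator.add]`, or LangGraph's `add_messages`) is combined with the previous value instead. That is why the nodes below append to `state["messages"]` themselves and return the whole state. The plain-Python sketch imitates that merge rule; `apply_update` is a hypothetical helper and a simplification, not LangGraph's actual implementation.

```python
import operator
from typing import Any, Callable, Dict

Reducer = Callable[[Any, Any], Any]


def apply_update(state: Dict[str, Any], update: Dict[str, Any],
                 reducers: Dict[str, Reducer]) -> Dict[str, Any]:
    """Merge one node's output into the state (simplified model of a LangGraph step).

    Keys with a reducer combine old and new values; all other keys are overwritten.
    """
    merged = dict(state)  # shallow copy: the caller's state dict is left untouched
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["What is task decomposition?"], "response": "old"}
update = {"messages": ["Task decomposition breaks ..."], "response": "new"}

# No reducer (how ConversationalRAGState is declared): the last write wins
plain = apply_update(state, update, reducers={})
# With an operator.add reducer on "messages": the lists are concatenated
appended = apply_update(state, update, reducers={"messages": operator.add})
```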

### 🔄 Query Reformulation for Context

In [None]:
# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Query reformulation for conversational context
reformulation_prompt = ChatPromptTemplate.from_messages([
    ("system", """
You are an expert at reformulating user questions in conversational contexts.

Given a conversation history and a new user question, reformulate the question to:
1. Be self-contained (include necessary context from conversation)
2. Resolve pronouns and references ("it", "this", "that", etc.)
3. Be optimized for semantic search in a knowledge base

If the question is already self-contained, return it as-is.

Examples:
History: ["What is task decomposition?", "Task decomposition breaks complex tasks into smaller parts..."]
Question: "Can you give me examples?"
Reformulated: "Can you give me examples of task decomposition techniques?"
"""),
    MessagesPlaceholder(variable_name="conversation_history"),
    ("human", "New question: {question}\n\nReformulated question:")
])

reformulation_chain = reformulation_prompt | llm

def reformulate_query(messages: List[BaseMessage], current_query: str) -> str:
    """Rewrite the current question as a standalone query using recent conversation context."""
    # First turn: there is nothing to resolve references against, so skip the LLM call
    if not messages:
        return current_query
    
    try:
        # Only use recent history (last 6 messages) to limit prompt tokens;
        # slicing also works when fewer than 6 messages exist
        recent_history = messages[-6:]
        
        result = reformulation_chain.invoke({
            "conversation_history": recent_history,
            "question": current_query
        })
        
        reformulated = result.content.strip()
        
        # Fall back to the original if the rewrite is empty or suspiciously long.
        # Short follow-ups ("Why?") legitimately grow several-fold, hence the absolute floor.
        if not reformulated or len(reformulated) > max(3 * len(current_query), 300):
            return current_query
        
        return reformulated
        
    except Exception as e:
        print(f"⚠️ Query reformulation failed: {e}")
        return current_query

# Test reformulation
test_history = [
    HumanMessage(content="What is task decomposition?"),
    AIMessage(content="Task decomposition is a technique that breaks down complex tasks into smaller, manageable steps...")
]

test_query = "Can you give me some examples?"
reformulated = reformulate_query(test_history, test_query)

print("🧪 Query Reformulation Test:")
print(f"   Original: '{test_query}'")
print(f"   Reformulated: '{reformulated}'")
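Long assistant answers are the main token cost in the reformulation prompt, yet the reformulator only needs to know what was discussed in order to resolve "it" or "this". A minimal sketch, assuming history is kept as `(role, content)` tuples: `compact_history` (hypothetical) keeps the last few turns and truncates long `"ai"` messages. LangChain's `MessagesPlaceholder` generally accepts `(role, content)` tuples as well as message objects, but verify this against your installed version before wiring it in.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (role, content), e.g. ("human", "...") or ("ai", "...")


def compact_history(history: List[Turn], max_messages: int = 6, max_chars: int = 300) -> List[Turn]:
    """Keep the last `max_messages` turns and truncate long assistant answers."""
    recent = history[-max_messages:] if max_messages > 0 else []
    compacted = []
    for role, content in recent:
        if role == "ai" and len(content) > max_chars:
            # The gist is enough for pronoun resolution; drop the rest of the answer
            content = content[:max_chars].rstrip() + " ..."
        compacted.append((role, content))
    return compacted


history = [
    ("human", "What is task decomposition?"),
    ("ai", "Task decomposition breaks a complex task into smaller steps. " * 20),
]
compact = compact_history(history, max_chars=60)
```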

### 🤖 Conversational Response Generation

In [None]:
# Conversational response generation
conversational_prompt = ChatPromptTemplate.from_messages([
    ("system", """
You are an expert AI assistant specializing in AI agents and LLM research.
You're having a natural conversation with a user about these topics.

Guidelines:
1. Use the retrieved context to provide accurate, detailed answers
2. Reference the conversation history when relevant
3. Be conversational and engaging, not robotic
4. If the context doesn't fully address the question, acknowledge this
5. Suggest relevant follow-up questions to continue the conversation
6. Use examples and analogies to make complex concepts clear

Retrieved Context:
{context}

Answer the user's question in a conversational manner and provide follow-up suggestions.
"""),
    MessagesPlaceholder(variable_name="conversation_history"),
    ("human", "{question}")
])

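# Create conversational chain with structured output.
# method="function_calling" is set explicitly because recent langchain-openai releases
# default to OpenAI's json_schema mode, which gpt-3.5-turbo does not support.
conversational_chain = conversational_prompt | llm.with_structured_output(
    ConversationalResponse, method="function_calling"
)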

def generate_conversational_response(
    question: str,
    context_docs: List[Any],
    conversation_history: List[BaseMessage]
) -> ConversationalResponse:
    """Generate a conversational response with context."""
    
    # Format context
    context = "\n\n".join([
        f"Source: {doc.metadata.get('source', 'Unknown')}\n{doc.page_content}"
        for doc in context_docs
    ])
    
    try:
        # Use recent conversation history (last 8 messages); slicing also handles shorter lists
        recent_history = conversation_history[-8:]
        
        response = conversational_chain.invoke({
            "question": question,
            "context": context,
            "conversation_history": recent_history
        })
        
        return response
        
    except Exception as e:
        print(f"⚠️ Response generation failed: {e}")
        # Fallback response
        return ConversationalResponse(
            answer="I apologize, but I encountered an error generating a response. Could you please rephrase your question?",
            confidence="low",
            sources=[],
            follow_up_questions=["Could you rephrase your question?", "Is there a specific aspect you'd like to know about?"],
            context_used=len(conversation_history) > 0
        )

print("🤖 Conversational response generator ready")
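`generate_conversational_response` labels each chunk with its URL, but with `k=6` and only two source posts, several chunks usually share a URL. The sketch below numbers each chunk so a prompt could ask for `[n]`-style citations, and returns a de-duplicated source list. `format_numbered_context` and `Doc` are hypothetical; `Doc` has the same `page_content`/`metadata` attributes as LangChain's `Document`, so the function should also accept real retriever output.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Doc:
    """Stand-in with the same two attributes as LangChain's Document (hypothetical)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def format_numbered_context(docs: List[Doc]) -> Tuple[str, List[str]]:
    """Number each chunk for citation and return (context_text, unique_source_urls)."""
    blocks, sources = [], []
    for i, doc in enumerate(docs, 1):
        source = doc.metadata.get("source", "Unknown")
        blocks.append(f"[{i}] Source: {source}\n{doc.page_content}")
        if source not in sources:  # keep first-seen order, drop repeats
            sources.append(source)
    return "\n\n".join(blocks), sources


docs = [Doc("A", {"source": "u1"}), Doc("B", {"source": "u1"}), Doc("C")]
context, sources = format_numbered_context(docs)
```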

## 🌊 LangGraph Conversational Workflow

In [None]:
# Workflow nodes
def reformulate_query_node(state: ConversationalRAGState) -> ConversationalRAGState:
    """Reformulate query based on conversation context."""
    try:
        reformulated = reformulate_query(state["messages"], state["user_query"])
        state["reformulated_query"] = reformulated
        
        print(f"🔄 Query reformulated: '{state['user_query']}' → '{reformulated}'")
        
    except Exception as e:
        state["error"] = f"Query reformulation failed: {e}"
        state["reformulated_query"] = state["user_query"]
    
    return state

def retrieve_documents_node(state: ConversationalRAGState) -> ConversationalRAGState:
    """Retrieve documents based on reformulated query."""
    if state.get("error"):
        return state
    
    try:
        docs = retriever.invoke(state["reformulated_query"])
        state["retrieved_docs"] = docs
        
        # Extract sources
        sources = []
        for doc in docs:
            sources.append({
                "title": "AI Research Document",
                "url": doc.metadata.get("source", ""),
                "snippet": doc.page_content[:150] + "..."
            })
        
        state["sources"] = sources
        print(f"🔍 Retrieved {len(docs)} documents")
        
    except Exception as e:
        state["error"] = f"Document retrieval failed: {e}"
        state["retrieved_docs"] = []
        state["sources"] = []
    
    return state

def generate_response_node(state: ConversationalRAGState) -> ConversationalRAGState:
    """Generate conversational response."""
    if state.get("error"):
        return state
    
    try:
        # Generate response
        response = generate_conversational_response(
            state["user_query"],
            state["retrieved_docs"],
            state["messages"]
        )
        
        state["response"] = response.answer
        state["follow_ups"] = response.follow_up_questions
        
        # Add messages to conversation history
        state["messages"].append(HumanMessage(content=state["user_query"]))
        state["messages"].append(AIMessage(content=response.answer))
        
        print("✅ Generated conversational response")
        
    except Exception as e:
        state["error"] = f"Response generation failed: {e}"
        fallback_response = "I apologize, but I'm having trouble processing your question right now."
        state["response"] = fallback_response
        state["follow_ups"] = ["Could you try rephrasing your question?"]
        
        # Still add to history
        state["messages"].append(HumanMessage(content=state["user_query"]))
        state["messages"].append(AIMessage(content=fallback_response))
    
    return state

# Build the conversational workflow
conv_workflow = StateGraph(ConversationalRAGState)

# Add nodes
conv_workflow.add_node("reformulate", reformulate_query_node)
conv_workflow.add_node("retrieve", retrieve_documents_node)
conv_workflow.add_node("generate", generate_response_node)

# Define flow
conv_workflow.set_entry_point("reformulate")
conv_workflow.add_edge("reformulate", "retrieve")
conv_workflow.add_edge("retrieve", "generate")
conv_workflow.add_edge("generate", END)

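# Checkpointer: stores a state snapshot per thread_id after each run.
# The session class below also passes the full message history in on every call,
# so that history (not the checkpointer) is what carries context between turns.
memory = MemorySaver()
conv_app = conv_workflow.compile(checkpointer=memory)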

print("🌊 Conversational RAG workflow compiled!")
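The compiled graph is a straight line: reformulate, then retrieve, then generate, with `retrieve` and `generate` returning early once `state["error"]` is set. The plain-Python sketch below mimics that control flow without LangGraph so the error path is easy to see; `run_linear_pipeline` and the toy nodes are hypothetical. When a node fails, later nodes never write `response`, so callers should read optional keys with `.get()`.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def run_linear_pipeline(state: State, nodes: List[Node]) -> State:
    """Run nodes in order, stopping as soon as one of them records an error."""
    for node in nodes:
        if state.get("error"):
            break  # mirrors the `if state.get("error"): return state` guards in the nodes
        state = node(state)
    return state


# Hypothetical toy nodes standing in for reformulate / retrieve / generate
def reformulate(state: State) -> State:
    return {**state, "reformulated_query": state["user_query"] + " (standalone)"}


def failing_retrieve(state: State) -> State:
    return {**state, "error": "Document retrieval failed: index unavailable"}


def generate(state: State) -> State:
    return {**state, "response": f"Answer to: {state['reformulated_query']}"}


ok = run_linear_pipeline({"user_query": "examples?", "error": None}, [reformulate, generate])
failed = run_linear_pipeline({"user_query": "examples?", "error": None},
                             [reformulate, failing_retrieve, generate])
```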

## 💬 Conversational RAG Session Manager

In [None]:
class ConversationalRAGSession:
    """Manages a conversational RAG session with memory and context."""
    
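    def __init__(self, workflow, session_id: Optional[str] = None):
        self.workflow = workflow
        self.session_id = session_id or str(uuid.uuid4())
        self.conversation_history = []
        
    def ask(self, question: str, stream: bool = False) -> Dict[str, Any]:
        """Ask a question in the conversational context.

        stream=True runs the graph with LangGraph's "values" stream mode, which yields
        the state after each node (node-level progress, not token streaming).
        """
        
        # thread_id keys the checkpointer's saved state to this session
        config = {"configurable": {"thread_id": self.session_id}}
        
        # Reset per-turn fields so values checkpointed from the previous turn can't leak in
        initial_state = {
            "messages": self.conversation_history.copy(),
            "user_query": question,
            "reformulated_query": "",
            "retrieved_docs": [],
            "response": "",
            "session_id": self.session_id,
            "sources": [],
            "follow_ups": [],
            "error": None
        }
        
        print(f"\n🤔 User: {question}")
        
        try:
            if stream:
                result = None
                for snapshot in self.workflow.stream(initial_state, config, stream_mode="values"):
                    result = snapshot  # the last snapshot is the final state
            else:
                result = self.workflow.invoke(initial_state, config)
            
            # Update conversation history
            self.conversation_history = result["messages"]
            
            # .get(): later nodes skip writing these keys once an error is recorded
            return {
                "response": result.get("response", ""),
                "sources": result.get("sources", []),
                "follow_ups": result.get("follow_ups", []),
                "session_id": self.session_id,
                "error": result.get("error")
            }
        except Exception as e:
            return {"error": f"Session processing failed: {e}"}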
    
    def get_history(self) -> List[Dict[str, str]]:
        """Get conversation history in a readable format."""
        history = []
        for msg in self.conversation_history:
            if isinstance(msg, HumanMessage):
                history.append({"role": "user", "content": msg.content})
            elif isinstance(msg, AIMessage):
                history.append({"role": "assistant", "content": msg.content})
        return history
    
    def clear_history(self):
        """Clear conversation history."""
        self.conversation_history = []
        print("🧹 Conversation history cleared")
    
    def get_session_info(self) -> Dict[str, Any]:
        """Get session information."""
        return {
            "session_id": self.session_id,
            "message_count": len(self.conversation_history),
            "conversation_turns": len(self.conversation_history) // 2
        }

print("💬 Conversational RAG Session Manager ready")

## 🎮 Interactive Conversational Demo

In [None]:
# Create a conversational session
chat_session = ConversationalRAGSession(conv_app)

print(f"🎮 Conversational RAG Demo Started")
print(f"📝 Session ID: {chat_session.session_id}")
print("=" * 60)

In [None]:
# Demo conversation 1
result1 = chat_session.ask("What is task decomposition?")

if not result1.get("error"):  # the "error" key is always present; check its value
    print(f"\n🤖 Assistant: {result1['response']}")
    
    if result1["follow_ups"]:
        print("\n💡 Suggested follow-up questions:")
        for i, follow_up in enumerate(result1["follow_ups"], 1):
            print(f"   {i}. {follow_up}")
else:
    print(f"❌ Error: {result1['error']}")

print("\n" + "="*60)

In [None]:
# Demo conversation 2 - Follow-up question
result2 = chat_session.ask("Can you give me some examples?")

if not result2.get("error"):  # the "error" key is always present; check its value
    print(f"\n🤖 Assistant: {result2['response']}")
    
    if result2["follow_ups"]:
        print("\n💡 Suggested follow-up questions:")
        for i, follow_up in enumerate(result2["follow_ups"], 1):
            print(f"   {i}. {follow_up}")
else:
    print(f"❌ Error: {result2['error']}")

print("\n" + "="*60)

### 📊 Conversation Analysis

In [None]:
# Analyze the conversation
session_info = chat_session.get_session_info()
history = chat_session.get_history()

print("📊 Conversation Analysis:")
print(f"   Session ID: {session_info['session_id']}")
print(f"   Total Messages: {session_info['message_count']}")
print(f"   Conversation Turns: {session_info['conversation_turns']}")

print("\n🔍 Full Conversation History:")
for i, exchange in enumerate(history, 1):
    role = "👤 User" if exchange["role"] == "user" else "🤖 Assistant"
    content = exchange["content"]
    if len(content) > 100:
        content = content[:100] + "..."
    print(f"   {i}. {role}: {content}")

## 🎓 Key Takeaways: Conversational RAG

### ✅ What You've Built

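1. **Query Reformulation** - Rewriting follow-up questions into standalone queries before retrieval
2. **Memory Management** - A sliding window over recent history (last 6 messages for reformulation, last 8 for generation)
3. **Session Management** - Per-session history and LangGraph `thread_id`s, one `ConversationalRAGSession` per user
4. **Error Handling** - Fallbacks for reformulation and response generation, with errors surfaced to the caller instead of crashing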

### 🧠 Memory Strategy Guide

| Strategy | Best For | Pros | Cons |
|----------|----------|------|------|
| **Sliding Window** | Short conversations, cost-conscious | Fast, predictable cost | Loses early context |
| **Summary Memory** | Long conversations, context preservation | Retains key information | More complex, LLM calls for summaries |
| **Full History** | Critical applications, short sessions | Complete context | Expensive for long conversations |

### 🚀 Production Recommendations

1. **Start Simple** - Use sliding window for most applications
2. **Monitor Usage** - Track conversation length and costs
3. **Session Cleanup** - Implement automatic session expiration
4. **Error Handling** - Graceful degradation when memory operations fail
5. **User Experience** - Provide follow-up suggestions to guide conversations

### 🔮 Next Steps

- **Multi-modal RAG** - Handle images, documents, and rich media
- **Production Deployment** - Scalability, monitoring, and enterprise features

You now have a working conversational RAG system to build production features on! 🎉