# Basic RAG with LangGraph

![Simple RAG](images/simple_rag.png)

In this notebook, we're going to walk through setting up a simple RAG workflow in LangGraph. 

Along the way, we'll show how LangSmith and LangGraph Studio improve the developer experience for AI applications, and how LangSmith lets you make changes to a production application with confidence.

## Part One: Datastore Setup

Let's start by loading our environment variables from our .env file. 

In [None]:
from dotenv import load_dotenv
load_dotenv(dotenv_path=".env", override=True)
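Since later cells depend on credentials loaded here, it can help to fail fast when one is missing. Here's a minimal sketch, assuming `OPENAI_API_KEY` is the variable your setup needs. The helper name `missing_env_vars` is hypothetical and not part of any library.

```python
import os

def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]

# Assumption: OPENAI_API_KEY is the only key this notebook strictly needs.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Run this right after `load_dotenv` and add any other keys your own .env file is expected to provide.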

We're going to have our agent research LangGraph documentation, so let's index the documentation and create a vector store. You can adjust the URLs to research anything you want.

In [None]:
### Build Index
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Set embeddings
embd = OpenAIEmbeddings()
# Docs to index
urls = [
    "https://langchain-ai.github.io/langgraph/",
    "https://langchain-ai.github.io/langgraph/tutorials/customer-support/customer-support/",
    "https://langchain-ai.github.io/langgraph/tutorials/chatbots/information-gather-prompting/",
    "https://langchain-ai.github.io/langgraph/tutorials/code_assistant/langgraph_code_assistant/",
    "https://langchain-ai.github.io/langgraph/tutorials/multi_agent/multi-agent-collaboration/",
    "https://langchain-ai.github.io/langgraph/tutorials/multi_agent/agent_supervisor/",
    "https://langchain-ai.github.io/langgraph/tutorials/multi_agent/hierarchical_agent_teams/",
    "https://langchain-ai.github.io/langgraph/tutorials/plan-and-execute/plan-and-execute/",
    "https://langchain-ai.github.io/langgraph/tutorials/rewoo/rewoo/",
    "https://langchain-ai.github.io/langgraph/tutorials/llm-compiler/LLMCompiler/",
    "https://langchain-ai.github.io/langgraph/concepts/high_level/",
    "https://langchain-ai.github.io/langgraph/concepts/low_level/",
    "https://langchain-ai.github.io/langgraph/concepts/agentic_concepts/",
    "https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/",
    "https://langchain-ai.github.io/langgraph/concepts/multi_agent/",
    "https://langchain-ai.github.io/langgraph/concepts/persistence/",
    "https://langchain-ai.github.io/langgraph/concepts/streaming/",
    "https://langchain-ai.github.io/langgraph/concepts/faq/"
]
# Load
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]
# Split
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=200, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)
# Add to vectorstore
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=embd,
)
# Default similarity search; `lambda_mult` only applies to MMR search and is passed via `search_kwargs`
retriever = vectorstore.as_retriever()
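To build some intuition for what the retriever does, here's a toy sketch of similarity search in plain Python. It swaps the OpenAI embeddings for a bag-of-words vector and Chroma for a sorted list, so it only illustrates the idea of ranking chunks by cosine similarity. The helper names (`bow`, `cosine`, `top_k`) and the sample chunks are made up for illustration.

```python
import math
from collections import Counter

def bow(text):
    """Toy 'embedding': lowercase bag-of-words counts (NOT OpenAI embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]

# Made-up sample chunks standing in for `doc_splits`
chunks = [
    "LangGraph supports persistence through checkpointers",
    "Streaming lets you surface tokens as they are generated",
    "Multi agent systems coordinate several graphs",
]
print(top_k("how does persistence work", chunks, k=1))
```

A real embedding model captures meaning rather than exact word overlap, but the ranking step follows the same shape.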

## Part Two: Prompt Setup

Let's design a prompt for RAG that we'll use throughout the notebook.

In [1]:
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

rag_prompt = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

Question: {question} 

Context: {context} 

Answer:"""
print("Prompt Template: ", rag_prompt)

llm = ChatOpenAI(model_name="gpt-4o", temperature=0)

Prompt Template:  You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

Question: {question} 

Context: {context} 

Answer:


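Because `rag_prompt` is a plain Python string, we fill it in later with `str.format`. Here's a small sketch of how that works, using a shortened, hypothetical template and made-up documents rather than the full prompt above.

```python
# Shortened stand-in for the notebook's prompt (hypothetical)
template = "Question: {question}\nContext: {context}\nAnswer:"

# Made-up retrieved chunks; the notebook joins `page_content` values the same way
docs = ["First retrieved chunk.", "Second retrieved chunk."]
context = "\n\n".join(docs)

filled = template.format(question="What is LangGraph?", context=context)
print(filled)

# Literal braces inside the template itself must be doubled
escaped = "{{json}} {question}".format(question="q")
print(escaped)
```

Braces inside the retrieved text are safe, since `str.format` only parses the template and not the values substituted into it.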

## Part Three: Define the Graph

Let's define the State for our Graph. We'll track the user's question, our application's generation, and the list of relevant documents.

In [4]:
from langchain_core.documents import Document
from typing import List
from typing_extensions import TypedDict

class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        documents: list of documents
    """
    question: str
    generation: str
    documents: List[Document]
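It helps to know how state changes as the graph runs: each node returns a dict of updates, and for keys without a custom reducer the new value overwrites the old one. The sketch below mimics that overwrite behavior in plain Python. `ToyState` and `apply_update` are hypothetical stand-ins, not LangGraph's implementation.

```python
from typing import List, TypedDict

class ToyState(TypedDict, total=False):
    question: str
    generation: str
    documents: List[str]

def apply_update(state: ToyState, update: dict) -> ToyState:
    """Return a new state where keys in `update` overwrite those in `state`."""
    merged = dict(state)
    merged.update(update)
    return merged

state = {"question": "What is a checkpointer?"}
state = apply_update(state, {"documents": ["doc A", "doc B"]})  # like a retrieve node
state = apply_update(state, {"generation": "A short answer."})  # like a generate node
print(sorted(state))  # ['documents', 'generation', 'question']
```

Keys a node doesn't return are left untouched, which is why `question` survives both updates.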

Great, now we're just going to set up two nodes:
1. retrieve_documents: Retrieves documents from our vector store
2. generate_response: Generates an answer from our documents

In [2]:
from langchain_core.messages import HumanMessage

def retrieve_documents(state: GraphState):
    """
    Retrieve documents
    Args:
        state (dict): The current graph state
    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE DOCUMENTS---")
    question = state["question"]
    # Retrieval
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}

def generate_response(state: GraphState):
    """
    Generate response
    Args:
        state (dict): The current graph state
    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE RESPONSE---")
    question = state["question"]
    documents = state["documents"]
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)
    
    # RAG generation
    rag_prompt_formatted = rag_prompt.format(context=formatted_docs, question=question)
    generation = llm.invoke([HumanMessage(content=rag_prompt_formatted)])
    return {"documents": documents, "question": question, "generation": generation}

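Nodes are ordinary functions, so we can exercise them in isolation without a vector store or an API key. Here's a sketch under a few assumptions: `FakeRetriever` and `FakeLLM` are hypothetical stubs that only imitate the `.invoke(...)` call shape, documents are plain strings, and the step functions take their dependencies as arguments instead of reading the notebook's globals.

```python
class FakeRetriever:
    """Hypothetical stub: returns canned texts for any question."""
    def __init__(self, texts):
        self.texts = texts

    def invoke(self, question):
        return list(self.texts)

class FakeLLM:
    """Hypothetical stub: records each prompt and returns a fixed answer."""
    def __init__(self):
        self.prompts = []

    def invoke(self, prompt):
        self.prompts.append(prompt)
        return "stub answer"

def retrieve_step(state, retriever):
    # Return only the key this step changes
    return {"documents": retriever.invoke(state["question"])}

def generate_step(state, llm, template):
    context = "\n\n".join(state["documents"])
    prompt = template.format(context=context, question=state["question"])
    return {"generation": llm.invoke(prompt)}

retrieved = retrieve_step({"question": "Does it stream?"}, FakeRetriever(["Streaming is supported."]))
llm_stub = FakeLLM()
generated = generate_step(
    {"question": "Does it stream?", "documents": retrieved["documents"]},
    llm_stub,
    "Q: {question}\nC: {context}\nA:",
)
print(generated["generation"])
```

Recording prompts in the stub lets us check that the retrieved context actually reached the model call.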

Now that we've defined our vector store, State, and Nodes, let's put it all together and construct our RAG graph!

In [None]:
from langgraph.graph import StateGraph, START, END
from IPython.display import Image, display

rag_workflow = StateGraph(GraphState)
rag_workflow.add_node("retrieve_documents", retrieve_documents)
rag_workflow.add_node("generate_response", generate_response)
rag_workflow.add_edge(START, "retrieve_documents")
rag_workflow.add_edge("retrieve_documents", "generate_response")
rag_workflow.add_edge("generate_response", END)

rag_app = rag_workflow.compile()
display(Image(rag_app.get_graph().draw_mermaid_png()))
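For a linear graph like this one, the edges simply fix the order in which the nodes run. The sketch below is a plain-Python imitation of that flow and not LangGraph's actual runtime. `run_linear` is a hypothetical helper, `START` and `END` are local stand-in strings, and the lambda nodes are stubs.

```python
START, END = "__start__", "__end__"  # local stand-ins for the imported constants

def run_linear(nodes, edges, state):
    """Follow single-successor edges from START to END, merging each node's update."""
    successor = dict(edges)
    state = dict(state)
    current = successor[START]
    while current != END:
        state.update(nodes[current](state))
        current = successor[current]
    return state

# Stub nodes with the same names as the notebook's nodes
nodes = {
    "retrieve_documents": lambda s: {"documents": ["doc about " + s["question"]]},
    "generate_response": lambda s: {"generation": f"answer from {len(s['documents'])} doc(s)"},
}
edges = [
    (START, "retrieve_documents"),
    ("retrieve_documents", "generate_response"),
    ("generate_response", END),
]
result = run_linear(nodes, edges, {"question": "streaming"})
print(result["generation"])  # answer from 1 doc(s)
```

The real compiled graph adds much more, such as branching, streaming, and checkpointing, but the order of execution for this workflow is the same.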

## Part Four: Testing Our App

Let's test it out and see how it works!

In [None]:
question = "Does LangGraph work with OSS LLMs?"
rag_app.invoke({"question": question})

That worked great!

In [3]:
question = "Does LangGraph work with Anthropic models?"
rag_app.invoke({"question": question})

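`rag_app.invoke` returns the final state as a dict. Because `generate_response` stores whatever `llm.invoke` returns, `generation` is a chat message object, and its text lives in `.content`. Here's a small sketch for pulling out the answer text. `answer_text` and `FakeMessage` are hypothetical names used only for illustration.

```python
def answer_text(result):
    """Return the generation as plain text, whether it is a str or a message-like object."""
    generation = result["generation"]
    return getattr(generation, "content", generation)

class FakeMessage:
    """Hypothetical stand-in for a chat message with a `.content` attribute."""
    def __init__(self, content):
        self.content = content

result = {"question": "q", "documents": [], "generation": FakeMessage("Yes.")}
print(answer_text(result))  # Yes.
```

The same helper also works if you later change the node to store a plain string.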

Let's test out how we do on a random question!