
We would like to create a Question-Answering system for my Medium articles on GenAI, using an OpenAI LLM on a local system.

For this, we shall create an adaptive RAG application using the OpenAI API and feed the articles into it. Once the RAG pipeline is ready, we shall send questions through it and compare the responses with the ground-truth answers.





![title](adaptive-rag.png)

So the steps are as follows:

1. [Retrieve Node] Ingest Tuhin's articles in ChromaDB
2. [Decision - Route Query] Define the query analysis logic: the router that decides when to get documents from ChromaDB and when to get them from the web
3. [Grader Node] Define Retrieval Grader
4. [Generate Node] Write Generation logic
5. [Decision] - Hallucination and Answer Grader
6. [Rewrite Question Node]
7. [Web Search Node]
8. [Decision - Generate or Rewrite question]
9. Construct the Agentic Graph

# 1. [Retrieve Node] Ingest Tuhin's articles in ChromaDB

The following code blocks do the following:
- Loads a set of blog articles from the web.
- Splits them into manageable text chunks.
- Converts each chunk into an embedding vector.
- Stores them in a vector DB (Chroma).
- Sets up a retriever to query these chunks later for RAG-based applications.

## 1.1. Imports

- WebBaseLoader: Used to fetch content from web pages.
- RecursiveCharacterTextSplitter: Splits large texts into smaller chunks while keeping logical structure.
- Chroma: A vector database to store and retrieve documents using embeddings.
- OpenAIEmbeddings: Generates vector representations of text using OpenAI’s embedding models.

In [1]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

USER_AGENT environment variable not set, consider setting it to identify your requests.


## 1.2. Set Embeddings

Sets up OpenAI embeddings (like text-embedding-ada-002) to convert text chunks into vectors for similarity search.

In [2]:
embd = OpenAIEmbeddings()

## 1.3. Specify Web URLs to Index

These are links to my Medium blog posts: technical articles on RAPTOR, RAG, LLM frameworks, and the transformer architecture.

In [3]:
urls = [
    "https://medium.com/@tuhinsharma121/raptor-a-smarter-way-to-retrieve-and-use-information-in-ai-fd3cb68a6f2f",
    "https://medium.com/@tuhinsharma121/building-a-biomedical-question-answering-system-using-rag-and-openai-llm-b9c3502fd287",
    "https://medium.com/@tuhinsharma121/llamaindex-vs-langchain-vs-haystack-vs-llama-stack-a-comparative-analysis-6d03aaa1bc36",
    "https://medium.com/@tuhinsharma121/riding-multi-headed-dragons-decoding-the-power-of-multi-head-attention-in-transformers-7c9d18dc2b68",
    "https://medium.com/@tuhinsharma121/decoding-the-magic-of-transformers-a-deep-dive-into-input-embeddings-and-positional-encoding-a9a282f4e055"
]

## 1.4. Load Web Content

- For each URL, it uses WebBaseLoader to download and parse the content into Document objects.
- docs becomes a list of lists (one per URL), so it’s flattened into docs_list.

In [4]:
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

## 1.5. Split Text into Chunks

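- Splits each document into chunks of at most 500 tokens (no overlap). Because the splitter is built with from_tiktoken_encoder, chunk_size is measured in tiktoken tokens, not characters.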
- This helps avoid context limits in LLMs and makes retrieval more granular.

In [5]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)
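
To make the token-based sizing concrete, here is a minimal sketch of a fixed-window splitter. It is a simplification, not the recursive separator-based algorithm that RecursiveCharacterTextSplitter uses: whitespace-separated words stand in for tiktoken tokens, and `split_by_tokens` is a hypothetical helper name.

```python
def split_by_tokens(text, chunk_size, chunk_overlap=0):
    """Split text into chunks of at most `chunk_size` tokens.

    Assumption: whitespace-separated words stand in for tokens. The real
    splitter counts tiktoken tokens, so actual boundaries will differ.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


sample = "one two three four five six seven"
chunks = split_by_tokens(sample, chunk_size=3)
print(chunks)
```

With chunk_overlap=0, as in the cell above, consecutive chunks share no tokens; a positive overlap would repeat the tail of one chunk at the head of the next.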

## 1.6. Create Vector Store

- Converts each chunk into a vector using OpenAI embeddings.
- Stores those vectors in a Chroma DB collection named "tuhin-blogs".

In [6]:
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="tuhin-blogs",
    embedding=embd,
)

## 1.7. Create Retriever

Creates a retriever object that can now be used to search for relevant chunks using similarity-based lookup during question answering.

In [7]:
retriever = vectorstore.as_retriever()
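
The idea behind similarity-based lookup can be sketched with a toy index. This is only an illustration under stated assumptions: the three-dimensional vectors are made up, cosine similarity is assumed as the metric (the metric Chroma actually uses depends on how the collection is configured), and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=4):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up embeddings, purely for illustration.
toy_index = [
    ("RAPTOR builds a tree of summaries", [0.9, 0.1, 0.0]),
    ("Multi-head attention in transformers", [0.1, 0.9, 0.1]),
    ("Positional encoding basics", [0.0, 0.8, 0.3]),
]
toy_query = [1.0, 0.0, 0.0]
print(top_k(toy_query, toy_index, k=2))
```

In the real pipeline the query vector comes from the same embedding model that indexed the chunks, which is what makes the comparison meaningful.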

## 1.8. Create Retriever Function

The retrieve(state) function reads the question from the state dictionary, fetches matching documents through the retriever, and returns both.

In [8]:
def retrieve(state):
    """
    Retrieve documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE---")
    question = state["question"]

    # Retrieval
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}


state = {"question": "What is RAPTOR?"}
retrieve(state)

---RETRIEVE---


{'documents': [Document(metadata={'description': 'Large language models (LLMs) like ChatGPT and GPT-4 are incredibly powerful, but they struggle to keep up with new information and understand long, complex documents. Traditional retrieval methods…', 'language': 'en', 'source': 'https://medium.com/@tuhinsharma121/raptor-a-smarter-way-to-retrieve-and-use-information-in-ai-fd3cb68a6f2f', 'title': 'RAPTOR: A Smarter Way to Retrieve and Use Information in AI | by Tuhin Sharma | Mar, 2025 | Medium'}, page_content='RAPTOR: A Smarter Way to Retrieve and Use Information in AI | by Tuhin Sharma | Mar, 2025 | MediumOpen in appSign upSign inWriteSign upSign inHomeFollowingLibraryYour listsSaved listsHighlightsReading historyStoriesStatsRAPTOR: A Smarter Way to Retrieve and Use Information in AITuhin Sharma·Follow8 min read·Mar 10, 2025--ListenShareGenerated on MidjourneyIntroductionLarge language models (LLMs) like ChatGPT and GPT-4 are incredibly powerful, but they struggle to keep up with new in

# 2. [Decision - Route Query] Define the Query analysis logic

This code is an example of LLM-powered decision routing using structured output and LangChain. It:
- Defines two routes: vectorstore and web_search
- Instructs the LLM to choose the right one based on topic
- Leverages OpenAI’s function-calling to enforce a fixed schema
- Demonstrates how to use prompts to implement logic without writing if-else code

## 2.1. Imports Required Modules

Brings in typing support (Literal), data modeling (BaseModel), and LangChain components for prompt engineering and LLM interaction.

In [9]:
from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

## 2.2. Define Output Schema with Pydantic

The data model `RouteQuery` returns `datasource='web_search'` or `datasource='vectorstore'` based on the input question. Note the `system_prompt` includes the logic of the routing. 

In [10]:
class RouteQuery(BaseModel):
    """Route a user query to the most relevant datasource."""

    datasource: Literal["vectorstore", "web_search"] = Field(
        ...,
        description="Given a user question choose to route it to web search or a vectorstore.",
    )

## 2.3. Set Up the LLM with Structured Output

Uses OpenAI's gpt-3.5-turbo-0125 model with temperature=0 to keep the output as repeatable as possible.  
Configures the LLM to return output matching the RouteQuery schema, using OpenAI's function-calling API to constrain the response.

In [11]:
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_router = llm.with_structured_output(RouteQuery, method="function_calling")
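
The value of the schema is that any reply outside the two allowed literals is rejected rather than silently routed. Below is a stdlib-only sketch of that check. Pydantic performs the equivalent validation for RouteQuery; `Datasource` and `parse_route` are hypothetical names used only for illustration.

```python
from typing import Literal, get_args

# Same two values as the RouteQuery.datasource field.
Datasource = Literal["vectorstore", "web_search"]


def parse_route(raw: dict) -> str:
    """Validate a raw model reply against the allowed datasource values."""
    allowed = get_args(Datasource)
    value = raw.get("datasource")
    if value not in allowed:
        raise ValueError(f"datasource must be one of {allowed}, got {value!r}")
    return value


print(parse_route({"datasource": "web_search"}))

try:
    parse_route({"datasource": "database"})
except ValueError as err:
    print("rejected:", err)
```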

## 2.4. Define a System Prompt for Routing Logic

These are the instructions for the LLM: use vectorstore for the listed technical topics and web_search for anything else.

In [12]:
system_prompt = """You are an expert at routing a user question to a vectorstore or web search.
The vectorstore contains documents related to RAPTOR, llamastack,llamaindex, langchain,haystack and transformer architecture.
Use the vectorstore for questions on these topics. Otherwise, use web-search."""

## 2.5. Create a Chat Prompt Template

Defines a two-message prompt: a system message carrying the routing instructions and a human message carrying the question.

In [13]:
route_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{question}"),
    ]
)

## 2.6. Create a Router Chain

Chains the prompt and the structured output LLM together into a single callable object.

In [14]:
question_router = route_prompt | structured_llm_router

## 2.7. Test the Router

In [15]:
print(question_router.invoke({"question": "Who will win IPL 2025?"}))
print(question_router.invoke({"question": "What are the various components of transformer architecture?"}))

datasource='web_search'
datasource='vectorstore'


- First question: “Who will win IPL 2025?”  
    - Not related to technical topics → output: {"datasource": "web_search"}  
- Second question: “What are the various components of transformer architecture?”  
    - Matches the vectorstore topic list → output: {"datasource": "vectorstore"}  

## 2.8. Create Router Node

In [16]:
def route_question(state):
    """
    Route question to web search or RAG.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("---ROUTE QUESTION---")
    question = state["question"]
    source = question_router.invoke({"question": question})
    if source.datasource == "web_search":
        print("---ROUTE QUESTION TO WEB SEARCH---")
        return "web_search"
    elif source.datasource == "vectorstore":
        print("---ROUTE QUESTION TO RAG---")
        return "vectorstore"
        
state = {"question": "What is RAPTOR?"}
route_question(state)

---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---


'vectorstore'

# 3. [Grader Node] Define Retrieval Grader

## 3.1. Define Output Schema

This is a Pydantic data model used to constrain and structure the LLM’s output.  
It expects a single field binary_score with the value 'yes' or 'no'.

In [17]:
class GradeDocuments(BaseModel):
    """Binary score for relevance check on retrieved documents."""

    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )

## 3.2. Set Up the LLM with Function Calling

Initializes the OpenAI chat model (gpt-3.5-turbo-0125) with temperature=0 for deterministic outputs.  
Wraps it using with_structured_output, enabling function calling to return results in the format defined by GradeDocuments.

In [18]:
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeDocuments, method="function_calling")

## 3.3. Create a Grading Prompt

This system message instructs the LLM to act as a binary grader:
- Mark a document 'yes' if it’s loosely relevant to the user question.
- Mark 'no' if it’s not relevant.
- Emphasis is on filtering out erroneous retrievals rather than perfect matching.

In [19]:
system = """You are a grader assessing relevance of a retrieved document to a user question. \n 
    If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
    It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
    Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question."""

Then, the prompt template is created using:

In [20]:
grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
    ]
)

This combines:
- the system message, and
- a user message that plugs in a document and a question.

## 3.4. Create the Grader Chain

Chains the grading prompt and the structured-output LLM into a single callable grader.

In [21]:
retrieval_grader = grade_prompt | structured_llm_grader
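
The grader's contract is simple: one question and one document go in, and 'yes' or 'no' comes out. The sketch below swaps the LLM for a crude keyword-overlap check to show that contract and the filtering it enables. The keyword rule, the stopword list, and the helper names are all assumptions for illustration; the LLM grader also judges semantic relatedness, which this stand-in cannot do.

```python
import re


def keyword_grade(question: str, document: str) -> str:
    """Return 'yes' if the document shares any content word with the question."""
    stopwords = {"what", "is", "the", "a", "an", "of", "in", "to", "and", "are"}
    q_words = set(re.findall(r"[a-z0-9]+", question.lower())) - stopwords
    d_words = set(re.findall(r"[a-z0-9]+", document.lower()))
    return "yes" if q_words & d_words else "no"


def filter_relevant(question, documents):
    """Keep only the documents graded 'yes' (a lenient relevance filter)."""
    return [d for d in documents if keyword_grade(question, d) == "yes"]


docs_demo = [
    "RAPTOR clusters chunks and summarizes them recursively.",
    "IPL 2025 schedule and team squads.",
]
print(filter_relevant("What is RAPTOR?", docs_demo))
```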

## 3.5. Run the Grader

A user question is defined: "agent memory".  
A document is retrieved using retriever.invoke(question).  
The second document (docs[1]) is selected and its content is extracted.  
The grader pipeline is invoked with the question + document.  
It prints whether the document is relevant (‘yes’) or not (‘no’).  

In [22]:
question = "agent memory"
docs = retriever.invoke(question)
doc_txt = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))

binary_score='no'


## 3.6. Create Grader Node

In [23]:
def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with only filtered relevant documents
    """

    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]

    # Score each doc
    filtered_docs = []
    for d in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": d.page_content}
        )
        grade = score.binary_score
        if grade == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            continue
    return {"documents": filtered_docs, "question": question}

from langchain_core.documents.base import Document
state = {"question": "What is RAPTOR?", 'documents':[Document(metadata={'description': 'Large language models (LLMs) like ChatGPT and GPT-4 are incredibly powerful, but they struggle to keep up with new information and understand long, complex documents. Traditional retrieval methods…', 'language': 'en', 'source': 'https://medium.com/@tuhinsharma121/raptor-a-smarter-way-to-retrieve-and-use-information-in-ai-fd3cb68a6f2f', 'title': 'RAPTOR: A Smarter Way to Retrieve and Use Information in AI | by Tuhin Sharma | Mar, 2025 | Medium'}, page_content='1 created summaries/clusters: 1Generating embeddings for level 2.Performing clustering for level 2.Generating summaries for level 2 with 1 clusters.Level 2 created summaries/clusters: 1Step 8: Query RAPTOR for AnswersThis retrieves the most relevant answer from the structured data. This step is heavily influenced by how we want to retrieve the documents from the vector store. There are basically two options.Credit : Parth Sarthi et alCollapsed tree collapses the tree into a single layer and retrieves nodes until a threshold number of tokens is reached, based on cosine similarity to the query vector. The nodes on which cosine similarity search is performed are highlighted in both illustrations.nodes = raptor_pack.run(    "What baselines is InvestLM compared against?", mode="collapsed")print(len(nodes))print(nodes[0].text)2Baselines. We compare InvestLM with three state-of-the-art commercial models, GPT-3.5, GPT-4and Claude-2. OpenAI’s GPT-3.5 and GPT-4 arelarge language models tuned with reinforcementlearning from human feedback (RLHF) (Ouyanget al., 2022). Anthropic’s Claude-2 is a large lan-guage model that can take up to 100K tokens in theuser’s prompt. 3 Responses from all baselines aresampled throughout August 2023.We manually write 30 test questions that arerelated to financial markets and investment. Foreach question, we generate a single response fromInvestLM and the three commercial models. 
Wethen ask the financial experts to compare InvestLMresponses to each of the baselines and label whichresponse is better or whether neither response issignificantly better than the other.In addition to the expert evaluation, we also con-duct a GPT-4 evaluation, following the same pro-tocol used in (Zhou et al., 2023). Specifically, wesend GPT-4 with exactly the same instructions anddata annotations, and ask GPT-4 which response isbetter or whether neither response is significantlybetter than the other. The expert evaluation inter-face and GPT-4 evaluation prompt are presented inAppendix B.The expert evaluation and GPT-4 evaluationresults are presented in Figure 1 and Figure')]}
grade_documents(state)

---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---


{'documents': [Document(metadata={'description': 'Large language models (LLMs) like ChatGPT and GPT-4 are incredibly powerful, but they struggle to keep up with new information and understand long, complex documents. Traditional retrieval methods…', 'language': 'en', 'source': 'https://medium.com/@tuhinsharma121/raptor-a-smarter-way-to-retrieve-and-use-information-in-ai-fd3cb68a6f2f', 'title': 'RAPTOR: A Smarter Way to Retrieve and Use Information in AI | by Tuhin Sharma | Mar, 2025 | Medium'}, page_content='1 created summaries/clusters: 1Generating embeddings for level 2.Performing clustering for level 2.Generating summaries for level 2 with 1 clusters.Level 2 created summaries/clusters: 1Step 8: Query RAPTOR for AnswersThis retrieves the most relevant answer from the structured data. This step is heavily influenced by how we want to retrieve the documents from the vector store. There are basically two options.Credit : Parth Sarthi et alCollapsed tree collapses the tree into a single 

# 4. [Generate Node] Write Generation logic

## 4.1. Import Modules

- hub.pull: Fetches a pre-defined prompt template from LangChain Hub.
- StrOutputParser: Used to convert the LLM response into a plain string.

In [24]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser

## 4.2. Define Prompt

Pulls a community-defined RAG prompt template from LangChain Hub; it takes context and question as input keys.

In [25]:
prompt = hub.pull("rlm/rag-prompt")

## 4.3. LLM

Sets up an OpenAI chat model with deterministic output (temperature=0).

In [26]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

## 4.4. Create RAG Chain

Composes a chain where:
 - The prompt is filled with context and question.
 - The LLM generates an answer.
 - The result is parsed to a clean string.


In [27]:
rag_chain = prompt | llm | StrOutputParser()
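
The `|` operator reads left to right: each step's output becomes the next step's input. The toy class below imitates that behavior in plain Python. It is a stand-in to show the idea, not LangChain's implementation, and `Step`, `fill_prompt`, `fake_llm`, and `to_string` are hypothetical names.

```python
class Step:
    """A callable wrapper that supports `a | b` composition."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # (self | other) runs self first, then feeds its output to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


# Hypothetical stand-ins for the prompt, the LLM, and the output parser.
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda text: {"content": f"(answer based on {len(text)} chars)"})
to_string = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_llm | to_string
print(toy_chain.invoke({"context": "RAPTOR builds a summary tree.",
                        "question": "What is RAPTOR?"}))
```

The real chain has the same shape: a dictionary goes in, the prompt turns it into messages, the model returns a message object, and the parser extracts its text.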

## 4.5. Test Run

Runs the chain with a test question, "agent memory", and the docs list retrieved for that same question in Section 3.5. None of the indexed articles cover agent memory, so the model is expected to decline to answer.

In [28]:
question = "agent memory"
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)

I don't know.


## 4.6. Create Generate Node

This function is a reusable wrapper for the RAG process. It:  
Takes a state dictionary with:  
 - "question": the query string.  
 - "documents": the list of documents.  

Passes them to the rag_chain.  
Returns an updated state that includes:  
 - Original question  
 - Original documents  
 - Generated answer under the "generation" key.  

In [29]:
def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]

    # RAG generation
    generation = rag_chain.invoke({"context": documents, "question": question})
    return {"documents": documents, "question": question, "generation": generation}

# 5. [Decision] - Hallucination and Answer Grader

Each generated answer is graded on two criteria:

1. **Hallucination Detection**: Is the answer grounded in the provided documents?
2. **Answer Relevance**: Does the answer address the user's question?

We'll use OpenAI's `gpt-3.5-turbo-0125` model and leverage **structured output** via Pydantic-based schemas to grade the responses.


## 5.1. Define a Binary Scoring Schema for Hallucinations


We use a Pydantic `BaseModel` to define the output format of our hallucination grader.

- If the generation is **grounded in the facts**, we return `"yes"`.
- If not, we return `"no"`.

This schema helps the LLM return consistent, structured outputs.

In [30]:
class GradeHallucinations(BaseModel):
    """Binary score for hallucination present in generation answer."""

    binary_score: str = Field(
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )

## 5.2. Setup the LLM Grader with Structured Output

We instantiate a ChatOpenAI model and configure it to return structured responses conforming to the GradeHallucinations schema.

In [31]:
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeHallucinations)



## 5.3. Create a Prompt Template for Hallucination Grading

This prompt asks the LLM to decide if the generated answer is supported by the set of retrieved documents.

In [32]:
system = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n 
     Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""
hallucination_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Set of facts: \n\n {documents} \n\n LLM generation: {generation}"),
    ]
)

## 5.4. Build the Hallucination Grader Chain

We pipe the hallucination prompt to the structured LLM grader.

In [33]:
hallucination_grader = hallucination_prompt | structured_llm_grader

## 5.5. Define a Binary Scoring Schema for Answer Quality

We now evaluate whether the generated answer addresses the user’s question.

In [34]:
class GradeAnswer(BaseModel):
    """Binary score to assess answer addresses question."""

    binary_score: str = Field(
        description="Answer addresses the question, 'yes' or 'no'"
    )

## 5.6. Create a Prompt and Grader for Answer Relevance

Just like with hallucinations, we set up a structured LLM grader and prompt for determining if the answer is useful.

In [35]:
# Bind the grader to the GradeAnswer schema (not the hallucination schema from 5.2)
structured_llm_grader = llm.with_structured_output(GradeAnswer)

system = """You are a grader assessing whether an answer addresses / resolves a question \n 
     Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question."""
answer_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "User question: \n\n {question} \n\n LLM generation: {generation}"),
    ]
)
answer_grader = answer_prompt | structured_llm_grader

## 5.7. Grade Answer for Both Groundedness and Relevance

The following function performs dual grading:
 - First, it checks whether the answer is grounded in the documents (i.e., not hallucinated).
 - If yes, it then checks if the answer is useful and addresses the original question.

Returns one of:
 - "useful": if answer is grounded and addresses the question.
 - "not useful": if grounded but irrelevant.
 - "not supported": if hallucinated.

In [36]:
def grade_generation_v_documents_and_question(state):
    """
    Determines whether the generation is grounded in the document and answers question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for next node to call
    """

    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    score = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    grade = score.binary_score

    # Check hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score.binary_score
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"
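
Stripped of the LLM calls, the function above is a small decision table over two binary grades. The sketch below writes that table out explicitly; `decide` is a hypothetical helper that takes the two grades as plain strings.

```python
def decide(grounded: str, addresses_question: str) -> str:
    """Map the two binary grades to the label used to pick the next edge."""
    if grounded != "yes":
        # Hallucinated: the answer-relevance grade is never consulted.
        return "not supported"
    if addresses_question == "yes":
        return "useful"
    return "not useful"


for g in ("yes", "no"):
    for a in ("yes", "no"):
        print(f"grounded={g:<3} addresses={a:<3} -> {decide(g, a)}")
```

Note the ordering: groundedness is checked first, so an ungrounded answer is labeled "not supported" regardless of whether it addresses the question.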

# 6. [Rewrite Question Node] 

## 6.1 Load the LLM

We initialize a Chat-based Large Language Model (gpt-3.5-turbo-0125) with zero temperature to ensure deterministic, repeatable outputs.

In [37]:
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

## 6.2. Define the Prompt Template

- We define a system message guiding the LLM to act as a question re-writer.
- The human message template takes in a {question} and instructs the model to improve it.
- ChatPromptTemplate.from_messages is used to structure the multi-turn chat prompt for the model.

In [38]:
system = """You are a question re-writer that converts an input question to a better version that is optimized \n 
     for vectorstore retrieval. Look at the input and try to reason about the underlying semantic intent / meaning."""
re_write_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        (
            "human",
            "Here is the initial question: \n\n {question} \n Formulate an improved question.",
        ),
    ]
)

## 6.3. Compose the Rewriting Chain

- This creates a LangChain pipeline: Prompt → LLM → Output Parser
- The StrOutputParser() ensures we extract clean text from the LLM response.
- We use invoke() to trigger the rewriting logic with the original question.

In [39]:
question_rewriter = re_write_prompt | llm | StrOutputParser()
question_rewriter.invoke({"question": question})

"What is the role of memory in an agent's functioning?"

## 6.4. Define a Transform Function

This function is useful in retrieval-augmented generation (RAG) pipelines:
- Input: A dictionary state containing the original question and documents.
- Action: Rewrites the question using the question_rewriter chain.
- Output: Returns a new state with the same documents but an improved question.


In [40]:
def transform_query(state):
    """
    Transform the query to produce a better question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates question key with a re-phrased question
    """

    print("---TRANSFORM QUERY---")
    question = state["question"]
    documents = state["documents"]

    # Re-write question
    better_question = question_rewriter.invoke({"question": question})
    return {"documents": documents, "question": better_question}

# 7. [Web Search Node]

## 7.1. Web Search with LangChain using TavilySearchResults

We'll demonstrate how to use the `TavilySearchResults` tool from LangChain to perform real-time web searches. This is useful when building retrieval-augmented generation (RAG) applications that need fresh or dynamic content from the internet. The tool needs a Tavily API key, which it reads from the `TAVILY_API_KEY` environment variable.



In [41]:
from langchain_community.tools.tavily_search import TavilySearchResults

web_search_tool = TavilySearchResults(max_results=3)  # limit the search to three results

## 7.2. Define the Search Function

We'll define a `web_search` function that accepts a question and fetches top search results, which can later be used in downstream tasks like question answering.

In [42]:
def web_search(state):
    """
    Web search based on the re-phrased question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with appended web results
    """

    print("---WEB SEARCH---")
    question = state["question"]

    # Web search
    docs = web_search_tool.invoke({"query": question})
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)

    # Wrap in a list so "documents" has the same shape as the retriever's output
    return {"documents": [web_results], "question": question}
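
The function collapses several search hits into one document by joining their content fields. Here is a stdlib-only sketch of that step. `ToyDocument` is a hypothetical stand-in for LangChain's Document, and the search hits are made-up data; only the "content" key is taken from the code above, the "url" key is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def results_to_document(results):
    """Join the 'content' field of each search hit into one document."""
    combined = "\n".join(hit["content"] for hit in results)
    return ToyDocument(page_content=combined)


# Made-up search hits, purely for illustration.
fake_results = [
    {"url": "https://example.com/a", "content": "First snippet."},
    {"url": "https://example.com/b", "content": "Second snippet."},
]
doc = results_to_document(fake_results)
print(doc.page_content)
```

Joining everything into one document keeps the generation prompt simple, at the cost of losing which snippet came from which URL.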

# 8. [Decision - Generate or Rewrite question]

Determines whether to generate an answer, or re-generate a question.

In [43]:
def decide_to_generate(state):
    """
    Determines whether to generate an answer, or re-generate a question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Binary decision for next node to call
    """

    print("---ASSESS GRADED DOCUMENTS---")
    filtered_documents = state["documents"]

    if not filtered_documents:
        # Every retrieved document was filtered out by the relevance check,
        # so rewrite the question and retrieve again
        print(
            "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---"
        )
        return "transform_query"
    else:
        # We have relevant documents, so generate answer
        print("---DECISION: GENERATE---")
        return "generate"

# 9. Construct the Agentic Graph

## 9.1. Imports and Type Definition

1. StateGraph is a class that lets you build a flow between different “states” or nodes.
2. START and END are markers for the start and end of the workflow.
3. GraphState is a TypedDict that defines the data passed between nodes:
 - question: input query
 - generation: LLM-generated answer
 - documents: list of retrieved or searched documents

In [44]:
from langgraph.graph import END, StateGraph, START

from typing import List

from typing_extensions import TypedDict


class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        documents: list of documents
    """

    question: str
    generation: str
    documents: List[str]
    

## 9.2. Create a StateGraph Instance

This initializes the graph with the GraphState schema — all nodes will share and update this state.

In [45]:
workflow = StateGraph(GraphState)
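
Each node returns a dictionary with the keys it wants to set, and those values are written into the shared state. The sketch below imitates that with a plain dictionary merge. It reflects our understanding of the default behavior for a TypedDict state with no custom reducers, so treat it as an assumption; `ToyState` and `apply_update` are hypothetical names.

```python
from typing import List, TypedDict


class ToyState(TypedDict, total=False):
    question: str
    generation: str
    documents: List[str]


def apply_update(state: ToyState, update: dict) -> ToyState:
    """Overwrite the keys a node returned and keep every other key as-is."""
    merged = dict(state)
    merged.update(update)
    return merged


state0: ToyState = {"question": "What is RAPTOR?"}
state1 = apply_update(state0, {"documents": ["chunk A", "chunk B"]})
state2 = apply_update(state1, {"generation": "placeholder answer"})
print(sorted(state2))
```

This is why a node such as transform_query can change the question while the documents it passes along stay untouched.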

## 9.3. Add Nodes

Each node wraps one of the functions defined in the earlier sections, and each operates on the shared GraphState:
- web_search: Searches the web for relevant content.
- retrieve: Pulls documents from the Chroma vector store.
- grade_documents: Filters the retrieved documents for relevance to the question.
- generate: Uses the LLM to generate an answer.
- transform_query: Rewrites the question for better retrieval.

In [46]:
workflow.add_node("web_search", web_search)  # web search
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate
workflow.add_node("transform_query", transform_query)  # transform_query

<langgraph.graph.state.StateGraph at 0x10e7b5a10>

## 9.4. Add Conditional Edges (Decision-Making)

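route_question (Section 2.8) decides where the question goes first: to "web_search" or to "retrieve". After grading, decide_to_generate (Section 8) either generates an answer or rewrites the question, and grade_generation_v_documents_and_question (Section 5.7) ends the run, retries generation, or rewrites the question.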

In [47]:
workflow.add_conditional_edges(
    START,
    route_question,
    {
        "web_search": "web_search",
        "vectorstore": "retrieve",
    },
)
workflow.add_edge("web_search", "generate")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "transform_query": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "retrieve")
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",
        "useful": END,
        "not useful": "transform_query",
    },
)

<langgraph.graph.state.StateGraph at 0x10e7b5a10>

## 9.5. Compile the Graph

Final step to compile the graph into a runnable application (app) that you can now call with inputs.

In [48]:
app = workflow.compile()
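
To see how the edges interact, it helps to walk them with scripted outcomes instead of real LLM calls. The plain-Python simulation below follows the same edges as the graph above. `run_flow` and its parameters are hypothetical, and the `max_steps` guard is our own addition, because the "not supported" and "not useful" edges can loop.

```python
def run_flow(route, docs_relevant, grades, max_steps=10):
    """Walk the edges with scripted outcomes and return the visited node names.

    route: "vectorstore" or "web_search" (the router's decision)
    docs_relevant: booleans, one per grade_documents visit
    grades: "useful" / "not useful" / "not supported", one per generate visit
    """
    docs_relevant, grades = iter(docs_relevant), iter(grades)
    node = "retrieve" if route == "vectorstore" else "web_search"
    visited = []
    while node != "END" and len(visited) < max_steps:
        visited.append(node)
        if node == "web_search":
            node = "generate"
        elif node == "retrieve":
            node = "grade_documents"
        elif node == "grade_documents":
            node = "generate" if next(docs_relevant) else "transform_query"
        elif node == "transform_query":
            node = "retrieve"
        elif node == "generate":
            node = {
                "useful": "END",
                "not useful": "transform_query",
                "not supported": "generate",
            }[next(grades)]
    return visited


path = run_flow("vectorstore", [True, True], ["not useful", "useful"])
print(" -> ".join(path))
```

A first answer graded "not useful" sends the flow through transform_query and back to retrieve before a second generation attempt.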

# Example - The answer is not present in vectordb

In [49]:
from pprint import pprint

# Run
inputs = {
    "question": "what is the share price of IBM?"
}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint(f"Node '{key}':")
        # pprint(value, indent=2, width=80, depth=None)
    pprint("\n---\n")

# Final generation
pprint(value["generation"])

---ROUTE QUESTION---
---ROUTE QUESTION TO WEB SEARCH---
---WEB SEARCH---
"Node 'web_search':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
'The share price of IBM is $251.33.'


# Example - The answer is present in vectordb

In [50]:
# Run
inputs = {"question": "what is the full form of RAPTOR?"}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint(f"Node '{key}':")
        # pprint(value, indent=2, width=80, depth=None)
    pprint("\n---\n")

# Final generation
pprint(value["generation"])

---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---
---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---TRANSFORM QUERY---
"Node 'transform_query':"
'\n---\n'
---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS G