### Adaptive RAG
- Adaptive RAG is an enhanced Retrieval-Augmented Generation (RAG) technique where the system dynamically adjusts how it retrieves, selects, and uses knowledge based on the query, model confidence, and feedback signals.
- It adapts its retrieval + generation strategy in real time to improve accuracy, reduce hallucination, and optimize performance.

#### Simple Definition
Adaptive RAG = RAG that adjusts retrieval depth, number of documents, retrieval methods, reasoning style, and even tool use based on the complexity of the user query or model uncertainty.

#### Why Adaptive RAG?
Traditional RAG always retrieves a fixed number of documents (e.g., top-3).  
But:
- Some questions need more documents
- Some need none
- Some need web search
- Some need reasoning first, retrieval later
- Some need reranking or cross-encoder verification

Adaptive RAG automatically chooses what to do.

#### Key Features of Adaptive RAG
1. Dynamic Retrieval Depth
The system decides:
    - how many documents to retrieve,
    - whether to retrieve at all,
    - whether to re-retrieve.
Example:
    - If model uncertainty is high → increase retrieval (e.g., from 3 to 10 docs)
    - If uncertainty is low → skip retrieval (faster, cheaper)
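
A minimal sketch of this rule, assuming an uncertainty score in [0, 1] is already available (for example, 1 minus a confidence estimate). The function name `choose_top_k` and the thresholds 0.2 and 0.6 are illustrative assumptions. The defaults of 3 and 10 documents match the example above, and a return value of 0 means "skip retrieval".

```python
def choose_top_k(uncertainty: float, low: float = 0.2, high: float = 0.6,
                 default_k: int = 3, max_k: int = 10) -> int:
    """Map an uncertainty score in [0, 1] to a retrieval depth (0 = skip retrieval)."""
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must be in [0, 1]")
    if uncertainty < low:
        return 0          # confident enough: answer without retrieval
    if uncertainty > high:
        return max_k      # very unsure: retrieve more documents
    return default_k      # in between: normal top-k retrieval


for u in (0.1, 0.4, 0.9):
    print(u, choose_top_k(u))  # prints "0.1 0", "0.4 3", "0.9 10"
```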

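2. Query Classification → Choose RAG strategy
The query is first classified (for example: small talk, a simple factual lookup, a multi-hop question, or a question about recent events), and the class decides the strategy: no retrieval, single-pass retrieval, iterative retrieval, or web search.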
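
One way to connect classification to strategy is a lookup table. In the sketch below, the category names, the keyword rules in `classify`, and the per-category settings are all hypothetical. In the notebook further down, an LLM router (`question_router`) plays the role of `classify`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    retrieve: bool   # whether to retrieve at all
    source: str      # "none", "vectorstore", or "web_search"
    top_k: int       # how many documents to fetch
    iterative: bool  # whether to allow extra retrieval rounds


# Hypothetical category -> strategy table; categories and settings are illustrative.
STRATEGIES = {
    "chitchat": Strategy(retrieve=False, source="none", top_k=0, iterative=False),
    "factual": Strategy(retrieve=True, source="vectorstore", top_k=3, iterative=False),
    "multi_hop": Strategy(retrieve=True, source="vectorstore", top_k=10, iterative=True),
    "recent_events": Strategy(retrieve=True, source="web_search", top_k=5, iterative=False),
}


def classify(query: str) -> str:
    """Toy keyword classifier standing in for an LLM router."""
    q = query.lower().strip()
    if q in {"hi", "hello", "thanks"}:
        return "chitchat"
    if any(word in q for word in ("latest", "today", "news", "current")):
        return "recent_events"
    if any(word in q for word in ("compare", "difference between", "why")):
        return "multi_hop"
    return "factual"


def choose_strategy(query: str) -> Strategy:
    return STRATEGIES[classify(query)]


print(choose_strategy("What is the latest LangGraph release?"))
# Strategy(retrieve=True, source='web_search', top_k=5, iterative=False)
```

Keyword rules are brittle. They are shown only because they make the mapping easy to read and test.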

3. Adaptive Filtering & Reranking
Uses:
    - cross-encoder reranking
    - vector + keyword hybrid search
    - confidence-based filtering
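
Confidence-based filtering can be sketched as a threshold over reranker scores. The sketch assumes each document already has a relevance score in [0, 1] from some reranker (for example, a cross-encoder). The name `filter_by_confidence`, the 0.5 threshold, and the `min_keep` fallback are assumptions for illustration.

```python
from typing import List, Tuple


def filter_by_confidence(scored_docs: List[Tuple[str, float]],
                         threshold: float = 0.5,
                         min_keep: int = 1) -> List[str]:
    """Rank (document, score) pairs by score and drop those below `threshold`.

    If fewer than `min_keep` documents pass, keep the top `min_keep` anyway so
    the generator does not receive an empty context.
    """
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    kept = [doc for doc, score in ranked if score >= threshold]
    if len(kept) < min_keep:
        kept = [doc for doc, _ in ranked[:min_keep]]
    return kept


print(filter_by_confidence([("a", 0.2), ("b", 0.9), ("c", 0.6)]))  # ['b', 'c']
```

Returning an empty list instead of falling back is also a reasonable design when the caller can switch to another source, such as web search.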

4. Multi-turn Retrieval (Iterative RAG)
If the answer is incomplete:
    - the model signals “I need more info”
    - the system retrieves new documents
    - the LLM tries again
This is similar to Corrective RAG (CRAG) but broader.
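
The loop above can be sketched with two pluggable functions. `retrieve(query, k)` and `answer(query, docs)` are hypothetical stand-ins for a retriever and an LLM call, and `answer` returning `None` plays the role of "I need more info". Doubling `k` between rounds is one simple policy (an assumption). Rewriting the query between rounds is another.

```python
from typing import Callable, List, Optional, Tuple

Retriever = Callable[[str, int], List[str]]
Answerer = Callable[[str, List[str]], Optional[str]]


def iterative_rag(query: str, retrieve: Retriever, answer: Answerer,
                  start_k: int = 3, max_rounds: int = 3) -> Tuple[Optional[str], int]:
    """Retrieve, try to answer, and widen retrieval while the answer is incomplete.

    Returns (answer or None, number of rounds used).
    """
    k = start_k
    for round_no in range(1, max_rounds + 1):
        docs = retrieve(query, k)
        result = answer(query, docs)
        if result is not None:
            return result, round_no
        k *= 2  # retrieve more documents on the next round
    return None, max_rounds


# Toy components for demonstration only.
CORPUS = ["StateGraph builds graphs.", "Reducers merge state updates.",
          "Nodes are Python functions.", "Edges connect nodes.",
          "compile() turns a graph into a runnable."]


def toy_retrieve(query: str, k: int) -> List[str]:
    return CORPUS[:k]  # pretend the corpus is already sorted by relevance


def toy_answer(query: str, docs: List[str]) -> Optional[str]:
    hits = [d for d in docs if "compile" in d]
    return hits[0] if hits else None  # None means "I need more info"


print(iterative_rag("How do I compile a graph?", toy_retrieve, toy_answer))
# ('compile() turns a graph into a runnable.', 2)
```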

5. Model Uncertainty Feedback Loop
The model (or a separate grader) scores its own answer:
    - If “confidence < threshold” → retrieve more documents
    - If “confidence ≥ threshold” → answer immediately
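
The section does not specify where the confidence number comes from. One common heuristic, shown here as a sketch, is the geometric-mean probability of the generated tokens, computed from per-token log-probabilities. Some chat APIs, including OpenAI's, can return these on request. This value is an uncalibrated proxy, not a true probability of correctness. The 0.7 threshold mirrors Step 2 below and is otherwise arbitrary.

```python
import math
from typing import Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability: exp(mean of the token log-probs)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def next_action(token_logprobs: Sequence[float], threshold: float = 0.7) -> str:
    """Answer immediately if confident enough, otherwise retrieve more documents."""
    if sequence_confidence(token_logprobs) >= threshold:
        return "answer"
    return "retrieve_more"


print(round(sequence_confidence([-0.1, -0.2, -0.3]), 3))  # 0.819
print(next_action([-0.1, -0.2, -0.3]))                    # answer
print(next_action([-1.0, -0.5]))                          # retrieve_more
```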

6. Cost-Aware Adjustments
Adaptive RAG reduces cost by:
    - skipping retrieval when unnecessary
    - using smaller, cheaper models for auxiliary steps such as routing and grading
    - escalating to larger LLM only for hard queries
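
Escalation can be sketched as a two-model cascade: ask the cheap model first, and call the larger one only when the cheap model's confidence falls below a threshold. The models are represented as plain functions returning `(answer, confidence)`. The threshold and the relative costs 1 and 10 are made-up units for illustration, not real prices.

```python
from typing import Callable, Tuple

# A "model" here is any function returning (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade(query: str, small: Model, large: Model, threshold: float = 0.7,
            small_cost: float = 1.0, large_cost: float = 10.0) -> Tuple[str, str, float]:
    """Return (answer, model_used, total_cost), escalating only on low confidence."""
    answer, confidence = small(query)
    if confidence >= threshold:
        return answer, "small", small_cost
    answer, _ = large(query)
    return answer, "large", small_cost + large_cost  # both models were paid for


# Toy models for demonstration only.
def small_model(q: str) -> Tuple[str, float]:
    return "short answer", (0.9 if len(q.split()) <= 5 else 0.3)


def large_model(q: str) -> Tuple[str, float]:
    return "detailed answer", 0.95


print(cascade("What is LangGraph?", small_model, large_model))
# ('short answer', 'small', 1.0)
print(cascade("How do checkpointers interact with interrupts in subgraphs?", small_model, large_model))
# ('detailed answer', 'large', 11.0)
```

The cascade only saves money when most queries stop at the small model. If nearly every query escalates, it costs more than calling the large model directly.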

#### How to Implement Adaptive RAG (Pseudocode)
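Step 1: Classify Query  
```python
# is_simple_query / is_expert_query are placeholder classifiers
# (keyword heuristics, a small model, or an LLM router).
if is_simple_query(query):
    top_k = 2
elif is_expert_query(query):
    top_k = 10
else:
    top_k = 5
```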
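
To make Step 1 concrete, here is one possible pair of heuristic classifiers. The word-count cutoffs (6 and 15) and the `EXPERT_TERMS` vocabulary are assumptions chosen for this LangGraph corpus, and `top_k_for_query` is a hypothetical wrapper around the branch above. An LLM-based classifier would usually be more robust.

```python
import re

# Hypothetical "expert" vocabulary for a LangGraph documentation corpus.
EXPERT_TERMS = {"reducer", "checkpointer", "subgraph", "interrupt", "schema", "durable"}


def _words(query: str) -> list:
    """Lower-cased alphanumeric tokens of the query."""
    return re.findall(r"[a-z0-9_]+", query.lower())


def is_simple_query(query: str) -> bool:
    """Short (at most 6 words) and free of expert vocabulary."""
    words = _words(query)
    return len(words) <= 6 and not EXPERT_TERMS.intersection(words)


def is_expert_query(query: str) -> bool:
    """Long (more than 15 words) or mentions any expert term."""
    words = _words(query)
    return len(words) > 15 or bool(EXPERT_TERMS.intersection(words))


def top_k_for_query(query: str) -> int:
    """Same branching as Step 1."""
    if is_simple_query(query):
        return 2
    elif is_expert_query(query):
        return 10
    return 5


print(top_k_for_query("What is LangGraph?"))                                # 2
print(top_k_for_query("How does a reducer interact with a checkpointer?"))  # 10
print(top_k_for_query("How do I add a conditional edge between two nodes in my graph?"))  # 5
```

Plural forms such as "reducers" are not matched by this exact-token check. A stemmer or an LLM classifier would handle them better.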

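Step 2: Confidence-Based Retry  
```python
# llm_with_confidence, retrieve_more_docs, and llm are placeholders for your
# own chain calls; 0.7 is an example threshold.
answer, confidence = llm_with_confidence(context, query)
if confidence < 0.7:
    context = retrieve_more_docs(query, top_k=10)
    answer = llm(context, query)
```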

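Step 3: Hybrid Retrieval  
```python
# Combine dense (vector) and sparse (BM25 keyword) results; merge() is a
# placeholder for a fusion step such as reciprocal rank fusion.
results = merge(
    vector_store.search(query),
    bm25.search(query)
)
```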
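
The `merge` step above is left abstract. Reciprocal Rank Fusion (RRF) is one widely used choice because it needs only ranks, not comparable scores, from the two retrievers. This is a minimal sketch that assumes each search returns an ordered list of document IDs. `reciprocal_rank_fusion` is a hypothetical helper name, and k = 60 is the constant most often used with RRF.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(*rankings: List[str], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion (RRF).

    A document's fused score is the sum of 1 / (k + rank) over every list it
    appears in, with ranks starting at 1; a list that lacks the document adds nothing.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["d1", "d2", "d3"]  # e.g. IDs from vector_store.search(query)
bm25_hits = ["d3", "d1", "d4"]    # e.g. IDs from bm25.search(query)
print(reciprocal_rank_fusion(vector_hits, bm25_hits))  # ['d1', 'd3', 'd2', 'd4']
```

Here `d1` wins because it ranks high in both lists (1/61 + 1/62), narrowly ahead of `d3` (1/63 + 1/61).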

#### Summary
Adaptive RAG = RAG that adapts.
It dynamically modifies retrieval strategies, answer generation, and reasoning pathways based on query type, model uncertainty, and context needs.
When the routing and thresholds are tuned well, this can lead to:
- fewer hallucinations
- more accurate answers
- lower cost
- lower latency on easy queries


```python
# Imports
from typing import Annotated, List, Sequence
import operator
from typing_extensions import TypedDict, Literal
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage, BaseMessage
from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.checkpoint.memory import MemorySaver
from IPython.display import Image, display
import os
from dotenv import load_dotenv

# Loads OPENAI_API_KEY and TAVILY_API_KEY from a local .env file into os.environ.
# Copying them again with os.environ[...] = os.getenv(...) is redundant and
# raises a TypeError when a key is missing.
load_dotenv()

# `model` selects the OpenAI model; `name` would only set the runnable's display name.
llm = ChatOpenAI(model="gpt-5-nano")
llm.invoke("What is machine learning?")
type(llm)
```

```
langchain_openai.chat_models.base.ChatOpenAI
```

```python
# RAG imports
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
```

```
USER_AGENT environment variable not set, consider setting it to identify your requests.
```


```python
# Load the two LangGraph documentation pages that back the vectorstore
urls = ["https://docs.langchain.com/oss/python/langgraph/graph-api",
        "https://docs.langchain.com/oss/python/langgraph/functional-api"]

docs = WebBaseLoader(urls).load()
docs
```

[Document(metadata={'source': 'https://docs.langchain.com/oss/python/langgraph/graph-api', 'title': 'Graph API overview - Docs by LangChain', 'language': 'en'}, page_content='Graph API overview - Docs by LangChainSkip to main contentDocs by LangChain home pageLangChain + LangGraphSearch...⌘KGitHubTry LangSmithTry LangSmithSearch...NavigationGraph APIGraph API overviewLangChainLangGraphDeep AgentsIntegrationsLearnReferenceContributePythonOverviewLangGraph v1.0Release notesMigration guideGet startedInstallQuickstartLocal serverThinking in LangGraphWorkflows + agentsCapabilitiesPersistenceDurable executionStreamingInterruptsTime travelMemorySubgraphsProductionApplication structureStudioTestDeployAgent Chat UIObservabilityLangGraph APIsGraph APIGraph APIUse the graph APIFunctional APIRuntimeOn this pageGraphsStateGraphCompiling your graphStateSchemaMultiple schemasReducersDefault ReducerOverwriteWorking with Messages in Graph StateWhy use messages?Using Messages in your GraphSerializationM

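```python
# Split pages into ~1000-character chunks with 100 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
doc_splits = text_splitter.split_documents(documents=docs)

# Embed the chunks and index them in an in-memory FAISS vectorstore
vectorstore = FAISS.from_documents(documents=doc_splits, embedding=OpenAIEmbeddings())

retriever = vectorstore.as_retriever()
```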

```python
# Data model - query analysis
class RouteQuery(BaseModel):
    """Route the user query to the most relevant datasource."""
    datasource: Literal["vectorstore", "web_search"] = Field(
        description="Given a user query, choose to route it to web search or vectorstore")

structured_llm_router = llm.with_structured_output(RouteQuery)

system_prompt = """You are an expert at routing a user question to a vectorstore or web search.
The vectorstore contains documents about the Graph API and Functional API of LangGraph.
Use the vectorstore for questions on these topics. Otherwise, use web search."""
route_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{question}")
    ]
)

question_router = route_prompt | structured_llm_router

print(question_router.invoke({"question": "Who won the cricket world cup in 2023?"}))
```

```
datasource='web_search'
```


```python
print(question_router.invoke({"question": "How to create a stategraph?"}))
```

```
datasource='vectorstore'
```


```python
# Data model - Retrieval grader
class GradeDocuments(BaseModel):
    """Binary score for relevance of a retrieved document."""
    binary_score: Literal["yes", "no"] = Field(description="Documents are relevant to the question, 'yes' or 'no'")

structured_llm_grader = llm.with_structured_output(GradeDocuments)

system_prompt = """You are a grader assessing the relevance of a retrieved document to a user question. \n
    If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
    It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
    Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question."""

grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}")
    ]
)

retrieval_grader = grade_prompt | structured_llm_grader

# Example invocations: grade the second-ranked chunk (docs[1]) for two queries.
# Note: this reuses the name `docs` for the retrieved chunks.
question = "agent memory"
docs = retriever.invoke(question)
doc_text = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_text}))

question = "StateGraph"
docs = retriever.invoke(question)
doc_text = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_text}))
```

```
binary_score='no'
binary_score='yes'
```


```python
# Generate: answer from the retrieved context (here, the single chunk graded above)
prompt = PromptTemplate.from_template(template="""
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
""")

generate_chain = prompt | llm | StrOutputParser()
generation = generate_chain.invoke({"question": question, "context": doc_text})
```

```python
# Data model - Hallucination grader
class GradeHallucinations(BaseModel):
    """Binary score for whether the generated answer is grounded in the documents."""
    binary_score: str = Field(description="Answer is grounded in the facts, 'yes' or 'no'")

structured_llm_hall_grader = llm.with_structured_output(GradeHallucinations)

system_prompt = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts.
Give a binary score 'yes' or 'no'. 'yes' means that the answer is grounded in / supported by the set of facts."""

hall_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "Set of facts: \n\n {documents} \n\n LLM generation: {generation}")
    ]
)

hall_chain = hall_prompt | structured_llm_hall_grader

# Pass the retrieved chunks as plain text rather than a list of Document objects
facts = "\n\n".join(d.page_content for d in docs)
hall_chain.invoke({"documents": facts, "generation": generation})
```

```
GradeHallucinations(binary_score='yes')
```

```python
# Data model - Answer grader
class GradeAnswer(BaseModel):
    """Binary score to assess whether the answer addresses the question."""
    binary_score: str = Field(description="Answer addresses the question, 'yes' or 'no'")

structured_llm_ans_grader = llm.with_structured_output(GradeAnswer)

system_prompt = """You are a grader assessing whether an answer addresses / resolves the question.
Give a binary score 'yes' or 'no'. 'yes' means the answer addresses / resolves the question."""

ans_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "User question: \n\n {question} \n\n LLM generation: {generation}")
    ]
)

ans_chain = ans_prompt | structured_llm_ans_grader

ans_chain.invoke({"question": question, "generation": generation})
```

```
GradeAnswer(binary_score='yes')
```
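
After generation, the two graders are typically combined into a single routing decision. A minimal sketch of that decision as a plain function follows. The label strings are names chosen here, and in the notebook the inputs would be the `binary_score` fields returned by `hall_chain` and `ans_chain`.

```python
def grade_generation(grounded: str, answers_question: str) -> str:
    """Map the two yes/no grader outputs to the next step.

    grounded:          hallucination grader's binary_score ('yes' = supported by the documents)
    answers_question:  answer grader's binary_score ('yes' = resolves the question)
    """
    if grounded.strip().lower() != "yes":
        return "not supported"  # regenerate from the same documents
    if answers_question.strip().lower() == "yes":
        return "useful"         # done: return the answer
    return "not useful"         # grounded but off-target: rewrite the question and retrieve again


# In the notebook this would be fed by the chains above, e.g.:
# grade_generation(hall_chain.invoke({...}).binary_score, ans_chain.invoke({...}).binary_score)
print(grade_generation("yes", "yes"))  # useful
print(grade_generation("no", "yes"))   # not supported
print(grade_generation("yes", "no"))   # not useful
```

Grounding is checked first because an ungrounded answer should not be accepted even if it appears to address the question.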

```python
# Question Rewrite

# Prompt
system_prompt = """
You are a question re-writer that converts an input question into a better version optimized for web search, \n
rephrasing it as a correct and complete sentence. Look at the input and try to reason about the underlying semantic intent/meaning."""

re_write_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "Here is the question: {question} \n Rewrite this question to make it complete and semantically correct, and improve it.")
    ],
)
rag_rewrite_chain = re_write_prompt | llm | StrOutputParser()
response = rag_rewrite_chain.invoke({"question": question})
print(response)
```

```
What is a StateGraph and how is it used in software development?
```


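```python
# Web search tool - Tavily
# `max_results` caps how many search results are returned. Newer langchain_community
# releases warn that this class is deprecated in favor of the langchain-tavily package.
from langchain_community.tools.tavily_search import TavilySearchResults
web_search_tool = TavilySearchResults(max_results=3)
web_search_tool.invoke({"query": "Who is Sachin tendulkar?"})
```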


[{'title': 'Sachin Tendulkar | Biography, Stats, Records, Age, Centuries ...',
  'url': 'https://www.britannica.com/biography/Sachin-Tendulkar',
  'content': 'Sachin Tendulkar (born April 24, 1973, Bombay [Mumbai], India) is a former Indian professional cricket player, considered by many to be one of the greatest batters of all time. He is the leading run scorer in both Test cricket and one-day internationals (ODIs) and the first cricketer to score 100 centuries (100 runs in a single innings) in international cricket. Often compared to Australian great Don Bradman, Tendulkar became known for his confident stroke play off both the front and back foot. [...] Sachin Tendulkar was India’s top run scorer in the 2011 World Cup-winning campaign. He considers winning the World Cup the crowning achievement of his career.\n\n### What were some of Sachin Tendulkar’s achievements in the Indian Premier League (IPL)?\n\nIn the IPL Sachin Tendulkar played for the Mumbai Indians, captaining the team f

```python
# Graph state
class GraphState(TypedDict):
    """
    Represents the state of the graph.
    Attributes:
        question: question
        generation: generation
        documents: list of documents
    """
    question: str
    generation: str
    documents: List[str]
```

```python
# Define the nodes as we saw in corrective RAG and create a workflow
# Invoke the workflow
```
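
The cell above is left as a placeholder. As a hedged sketch of the control flow it describes, the function below walks the same steps in plain Python (no LangGraph): route, retrieve, grade documents, rewrite the question when nothing relevant survives, fall back to web search, generate, then check grounding and usefulness. Every component is injected as a function. In this notebook, each would wrap an existing chain, for example `lambda q: question_router.invoke({"question": q}).datasource`. The node names and the `max_steps` cap are assumptions. In LangGraph, each branch would become a node (`add_node`) and each `node = ...` decision a conditional edge (`add_conditional_edges`).

```python
from typing import Callable, Dict, List, Tuple

END = "END"


def run_adaptive_rag(question: str, components: Dict[str, Callable],
                     max_steps: int = 12) -> Tuple[dict, List[str]]:
    """Run the adaptive RAG loop; return (final state, list of visited nodes)."""
    c = components
    state = {"question": question, "documents": [], "generation": ""}
    path: List[str] = []
    # Route once up front: vectorstore questions go to retrieval, others to web search.
    node = "web_search" if c["route"](question) == "web_search" else "retrieve"
    for _ in range(max_steps):
        path.append(node)
        if node == "retrieve":
            state["documents"] = c["retrieve"](state["question"])
            node = "grade_documents"
        elif node == "grade_documents":
            # Keep only the documents the grader marks as relevant.
            state["documents"] = [d for d in state["documents"]
                                  if c["grade_doc"](state["question"], d) == "yes"]
            node = "generate" if state["documents"] else "transform_query"
        elif node == "transform_query":
            state["question"] = c["rewrite"](state["question"])
            node = "retrieve"
        elif node == "web_search":
            state["documents"] = c["web_search"](state["question"])
            node = "generate"
        elif node == "generate":
            state["generation"] = c["generate"](state["question"], state["documents"])
            if c["grounded"](state["documents"], state["generation"]) != "yes":
                node = "generate"         # not supported: regenerate
            elif c["answers"](state["question"], state["generation"]) == "yes":
                node = END                # useful: stop
            else:
                node = "transform_query"  # not useful: rewrite and retrieve again
        else:
            raise ValueError(f"unknown node: {node}")
        if node == END:
            return state, path
    raise RuntimeError(f"no accepted answer within {max_steps} steps: {path}")


# Toy components for demonstration only.
corpus = ["A StateGraph is built from nodes and edges.", "Tavily is a search API."]
components = {
    "route": lambda q: "vectorstore" if "graph" in q.lower() else "web_search",
    "retrieve": lambda q: list(corpus),
    "grade_doc": lambda q, d: "yes" if "graph" in d.lower() else "no",
    "web_search": lambda q: [f"web result for: {q}"],
    "generate": lambda q, docs: docs[0],
    "grounded": lambda docs, gen: "yes" if gen in docs else "no",
    "answers": lambda q, gen: "yes",
    "rewrite": lambda q: q + " (rewritten)",
}

state, path = run_adaptive_rag("What is a StateGraph?", components)
print(path)                 # ['retrieve', 'grade_documents', 'generate']
print(state["generation"])  # A StateGraph is built from nodes and edges.
```

The `max_steps` cap stands in for a recursion limit. Without it, a grader that never says "yes" would loop forever.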