# Agentic RAG using LlamaIndex

## What is LlamaIndex?
- It was initially developed for Retrieval-Augmented Generation (RAG) use cases. It has since evolved and is now used to build AI agents as well.
- Complete toolkit for context-augmented LLM applications.
- Main Components of LlamaIndex:
    - Data connectors (ingest and format existing data)
    - Data Indexes (structure and store data to be consumed by LLM)
    - Engines (Chat engine and Query engine)
    - Agents (LLM-driven components that use tools, ranging from simple helper functions to API integrations)
    - Observability / Evaluations
    - Workflows (event-driven or graph-based system)
- What makes it special?
    - Easy document parsing using LlamaParse
    - Many ready-to-use components
    - Simple and clear workflow system
    - LlamaHub (third-party, ready-to-use tools)


## Why move away from SmolAgent?
- SmolAgent is a minimalistic library for creating coding and tool-calling agents.
- It is great for simple agents but becomes complicated for complex or multi-task use cases.
- A single agent needs a long context, consumes many tokens, and is prone to hallucination during complex reasoning tasks.
- A multi-agent setup overcomes these issues, but the library lacks flexibility, offers few out-of-the-box tools, and does not scale well.

## What is RAG?
- It is also known as grounded generation.
- RAG is a technique that is extremely useful for building chatbots.
- It passes only the relevant information to the LLM when answering the user's query, which leads to better, faster, cheaper, and more relevant answers.
- Generic RAG flow
	- User asks a query -> the retriever searches the vector database and returns relevant information -> the LLM generates an answer based on the retrieved information -> the answer is sent to the user.
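
The generic flow above can be sketched in a few lines of plain Python. This is a toy illustration, not how LlamaIndex implements it: word overlap stands in for vector search, and the names `CHUNKS`, `tokenize`, `retrieve`, and `build_prompt` are hypothetical.

```python
import re

# Toy corpus standing in for the chunks stored in a vector database.
CHUNKS = [
    "LlamaIndex provides data connectors, indexes, and query engines.",
    "RAG retrieves relevant chunks and passes them to the LLM as context.",
    "Agents can call tools such as web search or a query engine.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank chunks by word overlap with the query (stand-in for vector search)."""
    query_words = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(query_words & tokenize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved context and the question into one grounded prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


query = "What does RAG pass to the LLM?"
context = retrieve(query, CHUNKS)
prompt = build_prompt(query, context)  # this prompt would be sent to the LLM
print(prompt)
```

In a real system the final step would send `prompt` to the LLM; here the sketch stops at the prompt so it runs offline.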

## How is it different from Agentic RAG?
- Sometimes one pass is not enough to answer the user's query, and multiple passes are needed (as in the ReAct pattern).
- Traditional RAG has no access to external tools, which can limit its ability to gather enough information for a complex decision.
- So an agentic RAG system is an agent (with memory, reasoning, planning, and external tools) that also has a query engine as one of its tools.
- Agentic RAG flow
	- User asks a query -> the LLM checks whether it has enough information to answer -> if not, it looks at the tools it has access to and tries to get the information -> the LLM reviews the retrieved information and decides whether it is sufficient -> if not, it modifies the query and tries again until it has the information -> it then sends the answer to the user.
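
The retry loop in the agentic flow above might look roughly like the sketch below. Everything here is a stand-in: `report_tool`, `web_tool`, and `rewrite_query` are hypothetical stubs, and the "enough information" check is simplified to "the tool returned something". A real agent would let the LLM make both decisions.

```python
from typing import Callable


def report_tool(query: str) -> str:
    """Hypothetical document tool: returns text only for an exact known topic."""
    facts = {"genai adoption": "Adoption figures are discussed in the report."}
    return facts.get(query.lower(), "")


def web_tool(query: str) -> str:
    """Hypothetical web tool: only knows about governance."""
    return "Web result about AI governance." if "governance" in query.lower() else ""


def rewrite_query(query: str) -> str:
    """Stand-in for an LLM rewriting the query; here it only strips filler words."""
    filler = {"please", "tell", "me", "about"}
    return " ".join(w for w in query.split() if w.lower() not in filler)


def agentic_rag(query: str, tools: list[Callable[[str], str]], max_passes: int = 3) -> str:
    """Try every tool; if nothing useful comes back, rewrite the query and retry."""
    for _ in range(max_passes):
        for tool in tools:
            evidence = tool(query)
            if evidence:  # "enough information" check, simplified to non-empty
                return evidence
        query = rewrite_query(query)
    return "Not enough information found."


answer = agentic_rag("Please tell me about GenAI adoption", [report_tool, web_tool])
print(answer)
```

The first pass finds nothing, the query is rewritten to "GenAI adoption", and the second pass succeeds. The `max_passes` cap keeps the loop from running forever when no tool can help.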

## Why create a RAG-based agent?
- Reduction in hallucination
- Better memory management
- Up-to-date knowledge base for the LLM

## LLM-as-a-judge with LangFuse for any LLM application
- A large LLM reviews the responses generated by the agents and evaluates their quality.
- LangFuse supports the evaluation metrics of the Ragas library out of the box.
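
As a rough illustration of the LLM-as-a-judge idea, the sketch below builds a grading prompt and parses a score from the judge's reply. It does not use the LangFuse or Ragas APIs: `JUDGE_TEMPLATE`, `judge_answer`, and `fake_judge` are hypothetical names, and `fake_judge` replaces the real LLM call so the example runs offline.

```python
import re
from typing import Callable

JUDGE_TEMPLATE = (
    "You are a strict evaluator.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer: {answer}\n"
    "Rate how well the answer is supported by the context from 1 to 5.\n"
    "Reply in the form 'Score: <number>'."
)


def judge_answer(
    question: str, context: str, answer: str, judge_llm: Callable[[str], str]
) -> int:
    """Ask the judge model for a 1-5 groundedness score and parse its reply."""
    prompt = JUDGE_TEMPLATE.format(question=question, context=context, answer=answer)
    reply = judge_llm(prompt)
    match = re.search(r"Score:\s*([1-5])", reply)
    if match is None:
        raise ValueError(f"Could not parse a score from: {reply!r}")
    return int(match.group(1))


def fake_judge(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; always returns the same score."""
    return "Score: 4"


score = judge_answer(
    question="Who published the report?",
    context="The report was published by McKinsey.",
    answer="McKinsey",
    judge_llm=fake_judge,
)
print(score)
```

Forcing a fixed reply format and failing loudly on an unparseable reply keeps bad judge outputs from silently turning into scores.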

### Important Metrics
- Hallucinations
- Trustworthiness
- Relevance
- Correctness and completeness
- Efficiency (token/time)
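
The efficiency metric (token/time) can be approximated even without a tracing backend. The sketch below wraps any callable and records latency plus a crude token estimate. `timed_call` and `fake_llm` are hypothetical names, and counting whitespace-separated words is only a rough proxy for real tokenizer counts.

```python
import time
from typing import Callable


def timed_call(fn: Callable[[str], str], prompt: str) -> dict:
    """Run fn(prompt) and return its output with latency and rough token counts."""
    start = time.perf_counter()
    output = fn(prompt)
    elapsed = time.perf_counter() - start
    return {
        "output": output,
        "seconds": elapsed,
        # Word counts are an approximation; a real tokenizer would differ.
        "approx_prompt_tokens": len(prompt.split()),
        "approx_output_tokens": len(output.split()),
    }


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    return "a short answer"


metrics = timed_call(fake_llm, "what is agentic RAG")
print(metrics)
```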


## Important libraries for simple and Agentic RAG from LlamaIndex
- llama-index
- llama-index-vector-stores-chroma
- llama-index-embeddings-openai
- llama-index-llms-openai

## Part 0: Setup foundation

### 0.1 Setup environment and required paths

In [None]:
# setup your environment variables
import os

from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PHOENIX_API_KEY = os.getenv("PHOENIX_API_KEY")
BRAVE_SEARCH_API_KEY = os.getenv("BRAVE_SEARCH_API_KEY")
print("Environment variables set")
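
The cell above prints a success message even if a key is missing from the `.env` file. A small check like the following sketch can catch that early. `REQUIRED_KEYS` and `missing_keys` are hypothetical names; the key names themselves are the ones used in this notebook.

```python
import os
from typing import Mapping

REQUIRED_KEYS = ["OPENAI_API_KEY", "PHOENIX_API_KEY", "BRAVE_SEARCH_API_KEY"]


def missing_keys(required: list[str], env: Mapping[str, str]) -> list[str]:
    """Return the names whose values are absent or empty in the given mapping."""
    return [name for name in required if not env.get(name)]


missing = missing_keys(REQUIRED_KEYS, os.environ)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set")
```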

In [None]:
# setup path
from pathlib import Path

try:  # inside a script
	BASE_DIR = Path(__file__).resolve().parent.parent
except NameError:  # inside a notebook
	BASE_DIR = Path.cwd().parent
pdf_path = BASE_DIR / "data" / "the-state-of-ai.pdf"

### 0.2 Document preparation

In [None]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

In [None]:
# load documents
reader = SimpleDirectoryReader(input_files=[pdf_path])
documents = reader.load_data()
print(f"Number of documents loaded: {len(documents)}")

In [None]:
# split document into chunks
splitter = SentenceSplitter(chunk_size=200, chunk_overlap=0)
nodes = splitter.get_nodes_from_documents(documents)
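
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified word-based splitter. It is only an illustration: `SentenceSplitter` counts tokens and tries to respect sentence boundaries, which this hypothetical `chunk_words` function does not attempt.

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of chunk_size words, repeating chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


text = "one two three four five six seven"
no_overlap = chunk_words(text, chunk_size=3, chunk_overlap=0)
with_overlap = chunk_words(text, chunk_size=3, chunk_overlap=1)
print(no_overlap)
print(with_overlap)
```

With no overlap, as in the cell above, a sentence that straddles a chunk boundary is cut in two. A small overlap repeats the boundary words in both chunks at the cost of storing more text.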

### 0.3 Arize Phoenix setup

In [None]:
from phoenix.otel import register

tracer_provider = register(
	project_name="default",
	endpoint="https://app.phoenix.arize.com/s/tejas-er/v1/traces",
	auto_instrument=True,
)

In [None]:
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)

## Part 1: Simple RAG System

In [None]:
# loading the necessary libraries
from llama_index.core import VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

In [None]:
# setup LLM and Embedding model
Settings.llm = OpenAI(
	model="gpt-4.1-mini", api_key=OPENAI_API_KEY, temperature=0, verbose=False
)
Settings.embed_model = OpenAIEmbedding(
	model="text-embedding-3-small", api_key=OPENAI_API_KEY
)

In [None]:
# create vector index
vector_index = VectorStoreIndex(nodes)

In [None]:
# create a query engine
query_engine = vector_index.as_query_engine()

### 1.1 Inspecting the vector stores

In [None]:
# set up vector store to access it directly
vector_store = vector_index.vector_store

In [None]:
# get embedding dictionary and node dictionary
embedding_dict = vector_store.data.embedding_dict
node_dict = vector_store.data.text_id_to_ref_doc_id

In [None]:
print(f"Number of embeddings: {len(embedding_dict)}")
print(f"Number of node references: {len(node_dict)}")
print(f"Embedding dimension: {len(list(embedding_dict.values())[0])}")
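
The embeddings inspected above are what the query engine compares against the query embedding. The sketch below shows the idea with toy two-dimensional vectors, assuming cosine similarity as the scoring function (which, as far as I know, is the default for the in-memory store). `cosine_similarity`, `top_k`, and `toy_embeddings` are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float], embeddings: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Return the k node ids whose embeddings are most similar to the query."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings keyed by node id, mirroring the shape of embedding_dict.
toy_embeddings = {
    "node-a": [1.0, 0.0],
    "node-b": [0.0, 1.0],
    "node-c": [1.0, 1.0],
}
results = top_k([1.0, 0.1], toy_embeddings, k=2)
for node_id, score in results:
    print(node_id, round(score, 3))
```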

### 1.2 Asking the RAG system a question

In [None]:
# query vector store
response = query_engine.query("Who is Lareina Yee?")

In [None]:
response.response

In [None]:
print(len(response.source_nodes))

### 1.3 Checking if the response makes sense

In [None]:
# print out relevant source nodes
print("Relevant source nodes:")
print("-" * 50)
for idx, node in enumerate(response.source_nodes):
	print(f"Node {idx + 1}")
	print(f"Score: {node.score}")
	print(f"Text: {node.text}")
	print(f"Metadata: {node.metadata}")
	print("*" * 50)

## Part 2: Agentic RAG

### 2.1 Setup vector and summary index

In [None]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

### 2.2 Create vector query engine and summary query engine

In [None]:
# summary query engine
summary_query_engine = summary_index.as_query_engine(
	response_mode="tree_summarize", use_async=True
)

In [None]:
# vector query engine
from llama_index.core.response_synthesizers import ResponseMode

vector_query_engine = vector_index.as_query_engine(
	# similarity_top_k controls how many nodes the retriever returns
	response_mode=ResponseMode.COMPACT, use_async=True, similarity_top_k=3
)

### 2.3 Convert the vector and summary query engines into tools

In [None]:
from llama_index.core.tools import QueryEngineTool

# explicit names keep the two tools distinguishable in logs and traces
summary_tool = QueryEngineTool.from_defaults(
	query_engine=summary_query_engine,
	name="SummaryTool",
	description="Useful when you need to answer questions related to the summary of the document.",
)
vector_tool = QueryEngineTool.from_defaults(
	query_engine=vector_query_engine,
	name="VectorTool",
	description="Useful when you need to answer specific questions from the document.",
)

### 2.4 Create a router query engine to manage both query engines

In [None]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
	selector=LLMSingleSelector.from_defaults(),
	query_engine_tools=[summary_tool, vector_tool],
	verbose=False,
)
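
Conceptually, a single-selector router picks exactly one tool based on the tool descriptions and forwards the query to it. The sketch below imitates that with keyword overlap in place of the LLM call. `ToyTool`, `keyword_selector`, and `route` are hypothetical names and are not part of LlamaIndex.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTool:
    name: str
    description: str
    run: Callable[[str], str]


def keyword_selector(query: str, tools: list[ToyTool]) -> ToyTool:
    """Stand-in for the LLM selector: pick the description sharing most words."""
    query_words = set(query.lower().split())
    return max(
        tools,
        key=lambda t: len(query_words & set(t.description.lower().split())),
    )


def route(query: str, tools: list[ToyTool]) -> tuple[str, str]:
    """Select one tool for the query and return its name and answer."""
    tool = keyword_selector(query, tools)
    return tool.name, tool.run(query)


toy_tools = [
    ToyTool("summary", "summary overview of the whole document", lambda q: "summary answer"),
    ToyTool("vector", "specific facts people numbers", lambda q: "specific answer"),
]

print(route("give me a summary of the document", toy_tools))
print(route("specific numbers on adoption", toy_tools))
```

Because the real selector also decides from descriptions alone, clear and distinct tool descriptions matter more than the tool names.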

### 2.5 Testing if routing works

In [None]:
response = query_engine.query("Who is Lareina Yee?")

In [None]:
print(response.response)

In [None]:
response = query_engine.query("What is the summary of the document?")

In [None]:
print(response.response)

### 2.6 Convert the query engine into a tool

In [None]:
# wrap the router query engine in a tool so that an agent can call it
query_engine_tool = QueryEngineTool.from_defaults(
	query_engine=query_engine,
	name="state_of_ai_rag_tool",
	description="Answers questions about the McKinsey 2025 State of AI report",
)

### 2.7 Define the system prompt for the final agent


In [None]:
system_prompt = """
You are an expert AI assistant trained on the McKinsey report **'The State of AI – March 2025'**.
You answer user questions only using the information contained in this report.
### Available Tools
You have access to two tools:
1. **SummaryTool**
   - Use this when the query asks for:
     - High-level insights or executive summaries
     - Trends, survey findings, or general understanding
     - Broad takeaways or conceptual analysis
   - Example triggers:
     - "Summarize key findings on GenAI adoption"
     - "What are the main themes of the report?"
2. **VectorTool**
   - Use this when the query asks for:
     - Specific data, statistics, or exhibit-based evidence
     - Organizational practices, metrics, or concrete examples
   - Example triggers:
     - "What percentage of companies track AI KPIs?"
     - "Who is responsible for AI governance in large firms?"
### Tool Selection Logic
Before answering:
1. Identify whether the query requires **broad synthesis** or **specific factual retrieval**.
2. Select and call the correct tool accordingly:
   - For high-level or conceptual queries → `use_tool("SummaryTool")`
   - For factual, data-based, or detailed queries → `use_tool("VectorTool")`
3. Use **only one tool per query** unless explicitly required otherwise.
### Response Construction Rules
- **Grounding:** Only use information from the McKinsey report.
  If the question falls outside the report, respond:
  _"The report does not contain that information."_
- **Precision:** When citing data, mention specific numbers or exhibit insights when available.
- **Clarity:** Use concise, structured paragraphs or bullet points.
- **Insight:** Always explain the meaning or implication of findings, not just raw facts.
### Output Style
- Confident, factual, and grounded in the report.
- Well-organized and readable.
- Do **not** speculate or generate content outside the report context.
### Examples
**use_tool("SummaryTool")**
- “How are companies restructuring to adopt GenAI?”
- “What does the report say about workforce reskilling?”
**use_tool("VectorTool")**
- “What percentage of companies have a GenAI roadmap?”
- “Which departments lead AI strategy execution?”
""".strip()

In [None]:
from llama_index.core.agent.workflow import AgentWorkflow

query_engine_agent = AgentWorkflow.from_tools_or_functions(
	tools_or_functions=[query_engine_tool],
	system_prompt=system_prompt,
	llm=Settings.llm,
)

In [None]:
question = "Who is Lareina Yee according to the document? Where is she mentioned in the document and in what context?"
response = await query_engine_agent.run(question)

In [None]:
print(response)
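
The top-level `await` used above works in a notebook because an event loop is already running. In a plain Python script the loop has to be started explicitly, for example with `asyncio.run`. The sketch below shows the pattern with a hypothetical `fake_agent_run` coroutine standing in for `query_engine_agent.run`.

```python
import asyncio


async def fake_agent_run(question: str) -> str:
    """Hypothetical stand-in for the agent's run method; it only echoes the question."""
    await asyncio.sleep(0)  # yield control once, as a real network call would
    return f"Answer to: {question}"


async def main() -> str:
    return await fake_agent_run("Who is Lareina Yee?")


# In a script, start the event loop explicitly instead of using top-level await.
result = asyncio.run(main())
print(result)
```

Calling `asyncio.run` inside a notebook cell would typically fail because a loop is already running there, so this form is meant for scripts only.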

## Part 3: Augment the agent with LlamaHub tools

In [None]:
# importing LlamaHub tools
from llama_index.tools.brave_search import BraveSearchToolSpec
from llama_index.tools.arxiv import ArxivToolSpec
from llama_index.tools.wikipedia import WikipediaToolSpec

In [None]:
# setting up tools
arxiv_tool = ArxivToolSpec()
arxiv_tools = arxiv_tool.to_tool_list()
brave_search_tool = BraveSearchToolSpec(api_key=BRAVE_SEARCH_API_KEY)
brave_search_tools = brave_search_tool.to_tool_list()

In [None]:
wikipedia_tool = WikipediaToolSpec()
wikipedia_tools = wikipedia_tool.to_tool_list()

In [None]:
enhanced_tools = [query_engine_tool]
enhanced_tools.extend(brave_search_tools)
enhanced_tools.extend(arxiv_tools)
enhanced_tools.extend(wikipedia_tools)

### 3.1 Enhanced agent

In [None]:
new_system_prompt = """
You are an AI research assistant with access to:
1. The State of AI report 2025 by McKinsey
2. Web search using Brave Search
3. arXiv research paper search
4. Wikipedia search
Use these tools to provide comprehensive, well-researched, and accurate answers to the user's questions. When discussing AI trends, combine insights from the McKinsey report with recent research and web findings.
""".strip()

In [None]:
enhanced_agent = AgentWorkflow.from_tools_or_functions(
	tools_or_functions=enhanced_tools, llm=Settings.llm, system_prompt=new_system_prompt
)

In [None]:
# Let us test the enhanced agent
question_1 = """According to the McKinsey report, what are the main organizational changes companies are making for AI agents? Please search for recent research papers on AI governance and organizational transformation to provide additional information.""".strip()
print("Question 1. Organizational changes and governance")
print("=" * 50)
response_1 = await enhanced_agent.run(question_1)
print(response_1)
print("*" * 50 + "\n")

In [None]:
question_2 = """What does the McKinsey report say about workflow redesign for AI implementation? Search arXiv for papers on business process automation with AI and find current web articles about workflow transformations.""".strip()
print("Question 2. Workflow redesign and Implementation")
print("=" * 50)
response_2 = await enhanced_agent.run(question_2)
print(response_2)
print("*" * 50 + "\n")

In [None]:
question_3 = """Based on the McKinsey report, what are the key risks organizations are addressing with gen AI? Can you search the web for recent academic research on AI risk mitigation and compare it with the report's findings?
""".strip()
print("Question 3. Risk management")
print("=" * 50)
response_3 = await enhanced_agent.run(question_3)
print(response_3)
print("*" * 50 + "\n")

In [None]:
question_4 = """
Who is Lareina Yee in the McKinsey report and what are her views on AI's workforce impact?
After finding the information about her from the document, please:
1. search the web using brave search for recent articles, interviews or news about Lareina Yee and her work on AI
2. search arxiv for papers about her work on AI and find current web articles about her work on AI, workforce transformation and AI risk mitigation
3. provide a comprehensive profile combining the information from all the above sources about her expertise and contributions to AI.
""".strip()
print("Question 4. Detailed Information about Lareina Yee")
print("=" * 50)
response_4 = await enhanced_agent.run(question_4)
print(response_4)
print("*" * 50 + "\n")