# Agentic RAG using LlamaIndex

## What is LlamaIndex?
- It was initially developed for Retrieval-Augmented Generation (RAG) use cases. It has since evolved and is now used to build AI agents as well.
- Complete toolkit for context-augmented LLM applications.
- Main Components of LlamaIndex:
    - Data connectors (ingest and format existing data)
    - Data indexes (structure and store data to be consumed by an LLM)
    - Engines (chat engine and query engine)
    - Agents (use tools ranging from simple helper functions to API integrations)
    - Observability / Evaluations
    - Workflows (event-driven or graph-based system)
- What makes it special?
    - Easy document parsing using LlamaParse
    - Many ready-to-use components
    - Simple and clear workflow system
    - LlamaHub (ready-to-use third-party tools and integrations)


## Why move away from SmolAgent?
- SmolAgent is a minimalistic library for creating coding and tool-calling agents.
- It is great for simple agents but becomes complicated for complex or multi-task use cases.
- A single agent needs a long context, consumes many tokens, and is prone to hallucination during complex reasoning tasks.
- A multi-agent setup overcomes these issues, but the library lacks flexibility, offers few out-of-the-box tools, and does not scale well.

## What is RAG?
- Retrieval-Augmented Generation (RAG) is also known as grounded generation.
- It is a technique that is especially useful for building chatbots.
- It passes only the relevant information to the LLM when answering the user's query, which leads to better, faster, cheaper, and more relevant responses.
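
The retrieve-then-generate idea can be sketched in plain Python. This is a toy illustration only: word overlap stands in for embedding similarity, and the `tokenize`, `retrieve`, and `build_prompt` helpers and the sample chunks are hypothetical names made up for this example, not part of LlamaIndex.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list, top_k: int = 2) -> list:
    """Return the top_k chunks that share the most words with the query.

    Assumption: word overlap is a crude stand-in for embedding similarity.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_tokens & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, context_chunks: list) -> str:
    """Assemble a prompt that contains only the retrieved context."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Made-up sample data for illustration
chunks = [
    "LlamaIndex provides data connectors and indexes.",
    "Chroma is a vector database.",
    "A query engine answers questions over indexed data.",
]
question = "What does a query engine do?"
selected = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, selected)
print(prompt)
```

Only the selected chunk reaches the prompt, which is where the token and cost savings come from.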

## How is Agentic RAG different?
- Has access to real-time data via tools
- Can access multiple sources of data
- Can make independent decisions based on the data
- Supports deeper reasoning, tool integration, and more
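
The difference can be sketched as a routing step placed in front of several data sources. In this minimal illustration a keyword rule stands in for the decision that a real agent would delegate to the LLM; the tool functions, the `TOOLS` registry, and the keyword list are all hypothetical.

```python
from typing import Callable, Dict


def search_documents(query: str) -> str:
    """Stand-in for a retriever over an indexed document collection."""
    return f"[documents] passages matching: {query}"


def fetch_live_data(query: str) -> str:
    """Stand-in for a tool that calls a real-time API."""
    return f"[live] latest data for: {query}"


# Hypothetical tool registry: tool name -> callable
TOOLS: Dict[str, Callable[[str], str]] = {
    "documents": search_documents,
    "live": fetch_live_data,
}


def choose_tool(query: str) -> str:
    """Pick a tool name. A real agent would let the LLM make this choice."""
    live_keywords = ("today", "latest", "current", "now")
    words = query.lower().split()
    return "live" if any(keyword in words for keyword in live_keywords) else "documents"


def run_agent(query: str) -> str:
    """Route the query to the chosen tool and return its output."""
    return TOOLS[choose_tool(query)](query)


print(run_agent("What is the latest adoption rate?"))
print(run_agent("Summarize the key findings of the report"))
```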

## Why create a RAG-based agent?
- Reduced hallucination
- Better memory management
- Up-to-date knowledge base for the LLM

## LLM-as-a-judge with LangFuse
- A large LLM reviews the responses generated by the agents and evaluates their quality.
- LangFuse supports the Ragas library's evaluation metrics out of the box.
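
The judging loop itself can be sketched without any particular library. The example below is an assumption-laden illustration, not the LangFuse or Ragas API: `JUDGE_TEMPLATE`, `parse_score`, and `judge` are hypothetical helpers, and `fake_judge` replaces the call to a real judge model with a canned reply.

```python
import re
from typing import Callable, Optional

# Hypothetical grading prompt; a real rubric would be more detailed
JUDGE_TEMPLATE = (
    "You are grading an answer for correctness.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with 'Score: <1-5>' followed by a one-sentence reason."
)


def parse_score(judge_reply: str) -> Optional[int]:
    """Extract a 1-5 score from the judge's reply, or None if absent."""
    match = re.search(r"Score:\s*([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None


def judge(question: str, answer: str, call_llm: Callable[[str], str]) -> Optional[int]:
    """Format the grading prompt, call the judge model, and parse its score."""
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
    return parse_score(call_llm(prompt))


def fake_judge(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same canned reply."""
    return "Score: 4. The answer is mostly correct but omits detail."


score = judge("What is RAG?", "Retrieval-augmented generation.", fake_judge)
print(score)
```

Parsing the score defensively matters in practice, because a judge model does not always follow the requested output format.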

### Important Metrics
- Hallucinations
- Trustworthiness
- Relevance
- Correctness and completeness
- Efficiency (token usage and response time)
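
The efficiency metric can be approximated by wrapping each model call with a timer. This is a rough sketch: `measure` and `echo_model` are hypothetical names, and whitespace-separated words are only a crude proxy for tokens, since a real tokenizer would give different counts.

```python
import time
from typing import Callable, Dict


def measure(call: Callable[[str], str], prompt: str) -> Dict[str, object]:
    """Time a single model call and record rough size estimates."""
    start = time.perf_counter()
    reply = call(prompt)
    elapsed = time.perf_counter() - start
    return {
        "reply": reply,
        "latency_s": elapsed,
        # Assumption: word counts approximate token counts
        "approx_prompt_tokens": len(prompt.split()),
        "approx_reply_tokens": len(reply.split()),
    }


def echo_model(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return "stub answer to: " + prompt


stats = measure(echo_model, "Who is Lareina Yee?")
print(stats["latency_s"], stats["approx_prompt_tokens"], stats["approx_reply_tokens"])
```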


## Part 0: Libraries used
- llama-index
- llama-index-vector-stores-chroma
- llama-index-embeddings-openai
- llama-index-llms-openai

In [None]:
# setup your environment variables
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [None]:
# setup path
from pathlib import Path

try:  # inside a script
	BASE_DIR = Path(__file__).resolve().parent.parent
except NameError:  # inside a notebook
	BASE_DIR = Path.cwd().parent
pdf_path = BASE_DIR / "data" / "the-state-of-ai.pdf"

## Part 1: Simple RAG System

In [None]:
# loading the necessary libraries
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

In [None]:
# load documents
reader = SimpleDirectoryReader(input_files=[pdf_path])
documents = reader.load_data()
print(f"Number of documents loaded: {len(documents)}")

In [None]:
# split documents into chunks (nodes); chunk_size is measured in tokens
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
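
To build intuition for how chunk size and overlap interact, here is a deliberately simplified sketch. It is not how `SentenceSplitter` works internally: that splitter counts tokens and prefers sentence boundaries, whereas the hypothetical `chunk_words` helper below slides a fixed window over whitespace-separated words.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list:
    """Split text into windows of chunk_size words sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten"
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a boundary is less likely to lose its context.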

In [None]:
# setup LLM and Embedding model
Settings.llm = OpenAI(model="gpt-3.5-turbo", api_key=OPENAI_API_KEY)
Settings.embed_model = OpenAIEmbedding(
	model="text-embedding-ada-002", api_key=OPENAI_API_KEY
)

In [None]:
# create vector index (embeds every node with Settings.embed_model, which calls the OpenAI API)
vector_index = VectorStoreIndex(nodes)

In [None]:
# create a query engine
query_engine = vector_index.as_query_engine()

### 1.1 Inspecting the vector store

In [None]:
# get the underlying vector store (the default in-memory SimpleVectorStore) to inspect it directly
vector_store = vector_index.vector_store

In [None]:
# embedding_dict maps node ID -> embedding vector
# text_id_to_ref_doc_id maps node ID -> ID of the source document
embedding_dict = vector_store.data.embedding_dict
node_dict = vector_store.data.text_id_to_ref_doc_id

In [None]:
print(f"Number of embeddings: {len(embedding_dict)}")
print(f"Number of node references: {len(node_dict)}")
print(f"Embedding dimension: {len(list(embedding_dict.values())[0])}")
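
A dictionary of node IDs to vectors, like `embedding_dict`, is all a retriever needs to rank nodes against a query embedding. The sketch below illustrates that idea with cosine similarity in plain Python. It is a conceptual example rather than LlamaIndex's implementation, and the three-dimensional `toy_embeddings` are made up; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k(query_embedding: list, embeddings: dict, k: int = 2) -> list:
    """Return the k (node_id, score) pairs most similar to the query."""
    scored = [
        (node_id, cosine_similarity(query_embedding, vector))
        for node_id, vector in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up toy vectors standing in for real embeddings
toy_embeddings = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [0.0, 1.0, 0.0],
    "node-c": [0.9, 0.1, 0.0],
}
results = top_k([1.0, 0.0, 0.0], toy_embeddings, k=2)
for node_id, score in results:
    print(node_id, round(score, 4))
```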

### 1.2 Asking the RAG system a question

In [None]:
# ask a question through the query engine (retrieves nodes, then calls the LLM)
response = query_engine.query("Who is Lareina Yee?")

In [None]:
response.response

In [None]:
print(len(response.source_nodes))

### 1.3 Checking if the response makes sense

In [None]:
# print out relevant source nodes
print("Relevant source nodes:")
print("-" * 50)
for idx, node in enumerate(response.source_nodes):
	print(f"Node {idx + 1}")
	print(f"Score: {node.score}")
	print(f"Text: {node.text}")
	print(f"Metadata: {node.metadata}")
	print("*" * 50)
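
One way to act on these scores is to drop weakly matching nodes before trusting the answer. The sketch below is an illustration under assumptions: `RetrievedNode` is a hypothetical stand-in that exposes the same `text`, `score`, and `metadata` fields the loop above prints, the sample values are invented, and the 0.75 threshold is arbitrary and would need tuning for a real embedding model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RetrievedNode:
    """Hypothetical stand-in for a retrieved source node."""
    text: str
    score: Optional[float]
    metadata: dict = field(default_factory=dict)


def filter_by_score(nodes: List[RetrievedNode], min_score: float) -> List[RetrievedNode]:
    """Keep only nodes whose similarity score meets the threshold."""
    return [n for n in nodes if n.score is not None and n.score >= min_score]


# Invented sample data for illustration
nodes = [
    RetrievedNode("Relevant passage", 0.82, {"page_label": "3"}),
    RetrievedNode("Loosely related passage", 0.61, {"page_label": "17"}),
    RetrievedNode("Passage without a score", None),
]
kept = filter_by_score(nodes, min_score=0.75)
for node in kept:
    print(node.score, node.text)
```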