<a href="https://colab.research.google.com/github/theship/Building-and-Evaluating-Advanced-RAG/blob/main/L1_Advanced_RAG_Pipeline_with_notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Building and Evaluating Advanced RAG

Runs through a basic RAG pipeline, then covers two techniques for improving context: (1) sentence window retrieval (pass a window of surrounding sentences, rather than just the most relevant sentence, to capture broader context) and (2) auto-merging retrieval (if a majority of a parent node's children are retrieved, use the parent as context instead).

https://learn.deeplearning.ai/building-evaluating-advanced-rag/lesson/2/advanced-rag-pipeline


In [None]:
!pip install -U  "pydantic<3,>2" fastapi kaleido python-multipart uvicorn cohere tiktoken openai "llama-index" trulens_eval

In [None]:
import pydantic

print(pydantic.__version__)


2.5.3


In [None]:
!pip install -qU torch sentence-transformers

In [None]:
!pip install pypdf



In [None]:
from getpass import getpass

import os
import openai

openai.api_key = api_key = getpass('Enter your API key:')

os.environ['OPENAI_API_KEY'] = openai.api_key


Enter your API key:··········


## Basic RAG pipeline

In [None]:
from llama_index import SimpleDirectoryReader


documents = SimpleDirectoryReader(
    input_files=["/content/data/NOAANavigationRulesStandardSize.pdf"]
).load_data()

In [None]:
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[3])


<class 'list'> 

33 

<class 'llama_index.schema.Document'>
Doc ID: c6f6abc3-8d15-44ad-8aed-e9392b1ab184
Text: iv INTERNATIONAL & U.S. INLAND NAVIGATION RULESInternational
Rules The International Rules in this book were formalized in the
Convention on the International Regulations for Preventing Collisions
at Sea, 1972, and became  effective on July 15, 1977. The Rules
(commonly called 72 COLREGS) are part of the Convention, and vessels
flying the flags ...


In [None]:
from llama_index import Document

document = Document(text="\n\n".join([doc.text for doc in documents]))

In [None]:
from llama_index import VectorStoreIndex
from llama_index import ServiceContext
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
service_context = ServiceContext.from_defaults(
    llm=llm, embed_model="local:BAAI/bge-small-en-v1.5"
)
index = VectorStoreIndex.from_documents([document],
                                        service_context=service_context)

In [None]:
query_engine = index.as_query_engine()


In [None]:
response = query_engine.query(
    "What observation leads to determining that a 'Head-on situation' exists? And what are the visible indicators, according to Rule 14(b)?"
)
print(str(response))

The observation that leads to determining that a 'Head-on situation' exists is when the compass bearing of an approaching vessel does not appreciably change. According to Rule 14(b), the visible indicators of a 'Head-on situation' are that both vessels will be seen to be on a steady bearing and the relative positions of the two vessels do not appreciably change.


## Evaluation setup using TruLens

In [None]:
eval_questions = []
with open('/content/data/eval_questions.txt', 'r') as file:
    for line in file:
        # Strip surrounding whitespace, including the trailing newline
        item = line.strip()
        print(item)
        eval_questions.append(item)

What observation leads to determining that a 'Head-on situation' exists?
What are the visible indicators of a 'Head-on situation', according to Rule 14(b)?
What is a Masthead light?
A sailing vessel underway shall keep out of the way of which vessels?
If a vessel detects by radar alone the presence of another vessel, how should she take action to avoid collision?
What lights should a sailing vessel underway exhibit?
What lights should a sailing vessel not under command or restricted in their ability to maneuver exhibit?
What are whistle signals used for?
What signals, used or exhibited either together or separately, indicate distress and need of assistance?
If the Range of visibility (luminous range) of light in nautical miles D  is 4, what is the Minimum luminous intensity of light in candelas for K=0.8 I?


In [None]:
from trulens_eval import Tru
tru = Tru()

tru.reset_database()

In [None]:
from utils import get_prebuilt_trulens_recorder



✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input response will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .


In [None]:
tru_recorder = get_prebuilt_trulens_recorder(query_engine,
                                             app_id="Direct Query Engine")

In [None]:
with tru_recorder as recording:
    for question in eval_questions:
        response = query_engine.query(question)

In [None]:
records, feedback = tru.get_records_and_feedback(app_ids=[])

In [None]:
records.head()

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,Answer Relevance,Groundedness,Context Relevance,Answer Relevance_calls,Groundedness_calls,Context Relevance_calls,latency,total_tokens,total_cost
0,Direct Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_702f4493b2789b34b3a7cc011217ee5c,"""What observation leads to determining that a ...","""The observation that leads to determining tha...",-,"{""record_id"": ""record_hash_702f4493b2789b34b3a...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-01-11T17:31:53.434139"", ""...",2024-01-11T17:31:55.836036,1.0,1.0,0.95,[{'args': {'prompt': 'What observation leads t...,"[{'args': {'source': '(iii) A vessel, the pass...",[{'args': {'prompt': 'What observation leads t...,2,2141,0.003227
1,Direct Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_b86c607d341401b646bd0f3df5653337,"""What are the visible indicators of a 'Head-on...","""The visible indicators of a 'Head-on situatio...",-,"{""record_id"": ""record_hash_b86c607d341401b646b...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-01-11T17:31:56.054163"", ""...",2024-01-11T17:31:58.492824,1.0,0.0,0.45,[{'args': {'prompt': 'What are the visible ind...,[{'args': {'source': '‹‹ (p) A high intensity ...,[{'args': {'prompt': 'What are the visible ind...,2,2056,0.003099
2,Direct Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_8f034a96bd441ec7cbc62665b1331aec,"""What is a Masthead light?""","""A masthead light is a type of navigation ligh...",-,"{""record_id"": ""record_hash_8f034a96bd441ec7cbc...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-01-11T17:31:58.706431"", ""...",2024-01-11T17:32:02.739973,1.0,0.666667,0.1,[{'args': {'prompt': 'What is a Masthead light...,[{'args': {'source': 'Inland (b) High-speed cr...,[{'args': {'prompt': 'What is a Masthead light...,4,2190,0.003322
3,Direct Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_b722b2abf745d69c7e131fe6c24dde12,"""A sailing vessel underway shall keep out of t...","""A sailing vessel underway shall keep out of t...",-,"{""record_id"": ""record_hash_b722b2abf745d69c7e1...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-01-11T17:32:03.060009"", ""...",2024-01-11T17:32:05.350631,1.0,1.0,1.0,[{'args': {'prompt': 'A sailing vessel underwa...,[{'args': {'source': 'Rule 17—Action by Stand-...,[{'args': {'prompt': 'A sailing vessel underwa...,2,2121,0.003197
4,Direct Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_9d4ecd902ddab5b5cb274576dec4c320,"""If a vessel detects by radar alone the presen...","""The vessel should determine if a close-quarte...",-,"{""record_id"": ""record_hash_9d4ecd902ddab5b5cb2...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-01-11T17:32:05.550165"", ""...",2024-01-11T17:32:10.379514,1.0,1.0,0.95,[{'args': {'prompt': 'If a vessel detects by r...,"[{'args': {'source': '(iii) A vessel, the pass...",[{'args': {'prompt': 'If a vessel detects by r...,4,2207,0.003355
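
The per-record scores are easier to read once they are averaged per metric. The sketch below does that by hand for the five rows shown by `records.head()`; it assumes a plain mean is the aggregate of interest, and `head_scores` and `summary` are illustrative names rather than part of TruLens.

```python
from statistics import mean

# Feedback scores copied from the five rows printed by records.head().
head_scores = {
    "Answer Relevance": [1.0, 1.0, 1.0, 1.0, 1.0],
    "Groundedness": [1.0, 0.0, 0.666667, 1.0, 1.0],
    "Context Relevance": [0.95, 0.45, 0.1, 1.0, 0.95],
}

# Average each metric over the displayed records (assumption: plain mean).
summary = {metric: round(mean(values), 3) for metric, values in head_scores.items()}

for metric, value in summary.items():
    print(f"{metric}: {value}")
```

On these five rows, answer relevance is uniformly high while context relevance and groundedness vary, which suggests retrieval is the weaker stage of the basic pipeline.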


In [None]:
# launches on http://localhost:8501/
tru.run_dashboard()

Starting dashboard ...
npx: installed 22 in 3.809s

Go to this url and submit the ip given here. your url is: https://common-maps-buy.loca.lt

  Submit this IP Address: 34.125.32.187



<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

## Advanced RAG pipeline
### 1. Sentence Window retrieval

Retrieve a single sentence, then let the LLM see the sentences around it to get more semantic context.

#### From BARD

##### Q: As an LLM RAG technique, what is "Sentence Window retrieval"?

##### A: Here's an explanation of Sentence Window retrieval as an LLM RAG technique:

###### **Key Idea**:

Provide more comprehensive context to LLMs for better understanding and generation of responses.
Retrieve a window of surrounding sentences, rather than just the most relevant sentence, to capture broader context.

###### **How it Works**:

1. Indexing:

Documents are split into individual sentences.
Each sentence is stored as a separate "node" in a retrieval system (like Elasticsearch or Haystack).
Key step: A "window" of surrounding sentences (e.g., 5 sentences before and after) is also stored as metadata for each sentence.

2. Retrieval:

When a query is received, the retrieval system identifies the most relevant sentences based on the query.
Instead of just returning those sentences, it also fetches their associated "window" sentences from the metadata.
The LLM receives both the relevant sentences and their surrounding context.

###### **Benefits**:

Improved Context: The LLM has a wider view of the topic, leading to more accurate and informative responses.
Better Coherence: The generated text flows more naturally, as the LLM has a better grasp of the overall flow of ideas.
Reduced Contradictions: By considering surrounding context, the LLM can avoid generating statements that contradict information in nearby sentences.
Enhanced Factual Accuracy: The LLM can more effectively verify claims and facts against surrounding context, leading to more reliable responses.

###### **Example**:

Query: "What are the key concepts for safety finetuning?"
Retrieved Sentence: "Safety finetuning is a process that aims to improve the safety of large language models."
Window Sentences: (Additional sentences providing details about safety finetuning methods, goals, and challenges)
LLM Response: (Incorporates both the retrieved sentence and window sentences to provide a comprehensive and informative explanation of safety finetuning)
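
The indexing and retrieval steps above can be sketched in plain Python. This is a toy illustration, not the LlamaIndex implementation: the function names are hypothetical, the sample text is made up, and word overlap stands in for embedding similarity.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_nodes(text, window_size=1):
    """Create one node per sentence and keep its surrounding window as metadata."""
    sentences = split_sentences(text)
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append({"text": sentence, "window": " ".join(sentences[start:end])})
    return nodes

def tokens(text):
    """Lowercased word set used for the toy similarity score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(nodes, query, top_k=1):
    """Rank single sentences by word overlap (a stand-in for embedding
    similarity), then return each hit's window instead of the sentence."""
    q = tokens(query)
    ranked = sorted(nodes, key=lambda n: len(q & tokens(n["text"])), reverse=True)
    return [n["window"] for n in ranked[:top_k]]

# Made-up toy text, only to show the mechanics.
text = (
    "The harbor opens at dawn. "
    "Ferries sound one horn blast before leaving. "
    "The blast warns small boats nearby. "
    "Fishing starts later in the day."
)
nodes = build_nodes(text, window_size=1)
context = retrieve(nodes, "Why do ferries sound a horn?")
print(context[0])
```

Matching happens on the single sentence, but the text handed to the LLM is the window, so retrieval stays granular while synthesis sees the neighboring sentences.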

#### From tutorial

"improve [the above] metrics with more advanced retrieval techniques like sentence window retrieval as well as auto-merging retrieval. The first advanced technique we'll talk about is sentence window retrieval. This works by embedding and retrieving single sentences, so more granular chunks. But after retrieval, the sentences are replaced with a larger window of sentences around the original retrieved sentence. The intuition is that this allows for the LLM to have more context for the information retrieved in order to better answer queries while still retrieving on more granular pieces of information. So ideally improving both retrieval as well as synthesis performance."

In [None]:
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)

In [None]:
from utils import build_sentence_window_index

sentence_index = build_sentence_window_index(
    document,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="sentence_index"
)

In [None]:
# moved up top: `!pip install -qU torch sentence-transformers`

In [None]:
from utils import get_sentence_window_query_engine

# Get query engine from the sentence window index
sentence_window_engine = get_sentence_window_query_engine(sentence_index)


In [None]:
window_response = sentence_window_engine.query(
    "What are whistle signals used for?"
)
print(str(window_response))



In [None]:
tru.reset_database()

tru_recorder_sentence_window = get_prebuilt_trulens_recorder(
    sentence_window_engine,
    app_id = "Sentence Window Query Engine"
)

In [None]:
for question in eval_questions:
    with tru_recorder_sentence_window as recording:
        response = sentence_window_engine.query(question)
        print(question)
        print(str(response))

What observation leads to determining that a 'Head-on situation' exists?
Observing the other vessel ahead or nearly ahead, and by night being able to see the masthead lights of the other in a line or nearly in a line and/or both sidelights, and by day observing the corresponding aspect of the other vessel leads to determining that a 'Head-on situation' exists.
What are the visible indicators of a 'Head-on situation', according to Rule 14(b)?
The visible indicators of a 'Head-on situation', according to Rule 14(b), are when a vessel sees the other ahead or nearly ahead and by night she could see the masthead lights of the other in a line or nearly in a line and/or both sidelights, and by day she observes the corresponding aspect of the other vessel.
What is a Masthead light?
A masthead light is a white light that is placed over the fore and aft centerline of a vessel. It shows an unbroken light over an arc of the horizon of 225° and is fixed in such a way that it shows the light from ri

In [None]:
tru.get_leaderboard(app_ids=[])

| app_id | Groundedness | Context Relevance | Answer Relevance | latency | total_cost |
|---|---|---|---|---|---|
| Sentence Window Query Engine | 0.975 | 0.77 | 0.97 | 9.6 | 0.001388 |


In [None]:
# launches on http://localhost:8501/
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.
Dashboard already running at path:   Submit this IP Address: 34.125.32.187



<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

"Latency is more or less the
same, and the total cost is lower. Since the groundedness
and context relevance are higher, but the total cost is lower, we
can intuit that the sentence window retriever is actually
giving us more relevant context and more
efficiently as well."

### 2. Auto-merging retrieval


Index the document as a hierarchy; if a majority of a parent's child nodes are retrieved, roll them up into the parent for context.

#### From BARD

##### Q: As an LLM RAG technique, what is "Auto-merging retrieval"?

##### A: Here's an explanation of Auto-merging retrieval (AMR) as an LLM RAG technique:

###### **Key Idea**:

Combine information from multiple sources or segments of text to create more comprehensive and contextually relevant responses.
Achieve this by automatically merging smaller text blocks into larger parent chunks based on their relevance to a query.

###### **How it Works**:

1. Hierarchical Indexing:

Documents are organized into a tree-like structure, with parent nodes representing larger sections and child nodes representing smaller blocks of text.
Each node is indexed individually, allowing for retrieval at different levels of granularity.

2. Retrieval and Merging:

When a query is received, the retrieval system identifies relevant child nodes.
It then checks for overlap or high similarity among the relevant child nodes.
If there's significant overlap or similarity, those child nodes are automatically merged into their parent node, creating a larger, more comprehensive text chunk.
The merged parent node is then provided as context to the LLM.

###### **Benefits**:

Improved Context: By merging related segments, AMR provides a broader and more unified context for the LLM, leading to better understanding and more informative responses.
Reduced Fragmentation: Avoids presenting the LLM with disjointed pieces of information, which can improve coherence and reduce contradictions in generated responses.
Better Coverage: Increases the likelihood of capturing all relevant information, even when it's scattered across different sections of a document or multiple documents.
Enhanced Topic Understanding: The LLM gains a better grasp of the overall topic structure and relationships between different concepts.

###### **Example**:

Query: "What are the key benefits and challenges of implementing a multi-cloud strategy?"
Retrieved Child Nodes: (Individual sentences about benefits and challenges from different sections of a document)
Auto-Merged Parent Node: (A larger section combining those sentences into a cohesive discussion of multi-cloud benefits and challenges)
LLM Response: (Incorporates the merged parent node to provide a comprehensive and informative response that addresses both benefits and challenges)




#### From tutorial:


"Here we construct a hierarchy of larger parent nodes with smaller child nodes that
reference the parent node. So for instance we might
have a parent node of chunk size 512 tokens, and underneath there are four
child nodes of chunk size 128 tokens that link to
this parent node. The auto-merging retriever
works by merging retrieved nodes into larger parent nodes, which means
that during retrieval, if a parent actually has a
majority of its children nodes retrieved, then we'll replace the
children nodes with the parent node. So this allows us
to hierarchically merge our retrieved nodes. The combination
of all the child nodes is the same text as
the parent node. Similarly to the sentence window
retriever, in the next few lessons, we'll do a bit
more of a deep dive on how it
works. Here, we'll show you how to set it up
with our helper functions. Here, we've built the auto merging index,
again, using GPT 3.5 Turbo for the LLM, as
well as the BGE model for the embedding model. We
get the query engine from the
auto-merging retriever. "
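
The merging rule described in the quote can be sketched in plain Python. This is a toy illustration under stated assumptions, not the LlamaIndex retriever: the function names and the 0.5 threshold are hypothetical, and chunks are measured in words (8 per parent, 2 per child) rather than tokens.

```python
from collections import defaultdict

def build_hierarchy(words, parent_size=8, child_size=2):
    """Split a word list into parent chunks, each covered exactly by its children."""
    parents, children = {}, []
    for parent_id, start in enumerate(range(0, len(words), parent_size)):
        parent_words = words[start:start + parent_size]
        parents[parent_id] = " ".join(parent_words)
        for c in range(0, len(parent_words), child_size):
            children.append(
                {"parent": parent_id, "text": " ".join(parent_words[c:c + child_size])}
            )
    return parents, children

def auto_merge(retrieved, parents, children, threshold=0.5):
    """Swap retrieved children for their parent when more than `threshold`
    of that parent's children were retrieved; otherwise keep the children."""
    total = defaultdict(int)
    for child in children:
        total[child["parent"]] += 1
    hits = defaultdict(list)
    for child in retrieved:
        hits[child["parent"]].append(child)
    context = []
    for parent_id, group in hits.items():
        if len(group) / total[parent_id] > threshold:
            context.append(parents[parent_id])  # majority retrieved: use parent
        else:
            context.extend(child["text"] for child in group)
    return context

words = [f"w{i}" for i in range(16)]
parents, children = build_hierarchy(words)
# Pretend the retriever returned three children of parent 0 and one of parent 1.
retrieved = [children[0], children[1], children[2], children[5]]
context = auto_merge(retrieved, parents, children)
print(context)
```

Three of parent 0's four children were retrieved, so they are replaced by the full parent chunk; parent 1 had only one hit, so that child is passed through unchanged.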

In [None]:
from utils import build_automerging_index

automerging_index = build_automerging_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index"
)

In [None]:
from utils import get_automerging_query_engine

automerging_query_engine = get_automerging_query_engine(
    automerging_index,
)

In [None]:
auto_merging_response = automerging_query_engine.query(
    "What is a Masthead light?"
)
print(str(auto_merging_response))

> Merging 6 nodes into parent node.
> Parent node id: 36732ee3-fc63-42c5-99da-aad498e4abd1.
> Parent node text: (b) The vertical separation of the masthead lights of power-driven vessels shall be such that in ...

> Merging 1 nodes into parent node.
> Parent node id: ef6c409e-4d2c-4062-9bef-4e9c3bbae5c6.
> Parent node text: [13‡19]. High Speed Craft
(a) The masthead light of high-speed craft may be placed at a height re...

A masthead light is a white light that is placed over the fore and aft centerline of a vessel. It shows an unbroken light over an arc of the horizon and is fixed in such a way that it shows the light from right ahead to a certain angle.


In [None]:
tru.reset_database()

tru_recorder_automerging = get_prebuilt_trulens_recorder(automerging_query_engine,
                                                         app_id="Automerging Query Engine")

In [None]:
for question in eval_questions:
    with tru_recorder_automerging as recording:
        response = automerging_query_engine.query(question)
        print(question)
        print(response)

What observation leads to determining that a 'Head-on situation' exists?
Observing the masthead lights of the other vessel in a line or nearly in a line, and/or observing both sidelights, by night, leads to determining that a 'Head-on situation' exists.
What are the visible indicators of a 'Head-on situation', according to Rule 14(b)?
The visible indicators of a 'Head-on situation', according to Rule 14(b), are when two power-driven vessels are meeting on reciprocal or nearly reciprocal courses so as to involve risk of collision.
> Merging 6 nodes into parent node.
> Parent node id: 36732ee3-fc63-42c5-99da-aad498e4abd1.
> Parent node text: (b) The vertical separation of the masthead lights of power-driven vessels shall be such that in ...

> Merging 1 nodes into parent node.
> Parent node id: ef6c409e-4d2c-4062-9bef-4e9c3bbae5c6.
> Parent node text: [13‡19]. High Speed Craft
(a) The masthead light of high-speed craft may be placed at a height re...

What is a Masthead light?
A masthead

In [None]:
tru.get_leaderboard(app_ids=[])

| app_id | Groundedness | Context Relevance | Answer Relevance | latency | total_cost |
|---|---|---|---|---|---|
| Automerging Query Engine | 1.0 | 0.75 | 1.0 | 11.2 | 0.000823 |


In [None]:
# launches on http://localhost:8501/
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.
Dashboard already running at path:   Submit this IP Address: 34.125.32.187



<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>