<a href="https://colab.research.google.com/github/vin00d/AIE5/blob/main/13_Advanced_Retrieval/Advanced_Retrieval_with_LangChain_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally.

In [1]:
!pip install -qU langchain langchain-openai langchain-cohere rank_bm25


We're also going to be leveraging [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") VectorDB in "memory" mode (so we can leverage it locally in our colab environment).

In [2]:
!pip install -qU qdrant-client


We'll also provide our OpenAI key, as well as our Cohere API key.

In [3]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

Enter your OpenAI API Key:··········


In [4]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

Cohere API Key:··········


## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [5]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2025-03-03 05:57:07--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-03-03 05:57:08 (25.2 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-03-03 05:57:08--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-03-03 05:57:08 (22.3 MB/s) - ‘john_wick_2.csv’

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [6]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [7]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 2, 28, 5, 57, 13, 226007)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [8]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply look at each review as a document, and use cosine-similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [9]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
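
To make the "cosine-similarity, top `k`" idea concrete, here is a minimal pure-Python sketch of what a dense retriever does conceptually. The helper names (`cosine_similarity`, `top_k`) and the 2-D toy vectors are assumptions made for illustration; they are not Qdrant's implementation and not real embedding output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-D "embeddings" (hypothetical values, not real model output).
toy_doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
toy_query_vec = [0.9, 0.1]
top_two = top_k(toy_query_vec, toy_doc_vecs, k=2)
```

A real vector store avoids scoring every document on every query by using an index, but the ranking criterion is the same.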

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [10]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-3.5-turbo` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [11]:
from langchain_openai import ChatOpenAI

# Name the model explicitly so the code matches the text above
chat_model = ChatOpenAI(model="gpt-3.5-turbo")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [12]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign keeps every key from the previous step and re-assigns "context"
    # to its own value, so both "context" and "question" pass through unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object, which is
    #              piped into the LLM; the model's message is stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
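
If the LCEL pipe syntax is unfamiliar, the data flow of the chain can be sketched in plain Python. This is an illustration only: `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical stand-ins invented for this sketch, and no LangChain code is executed.

```python
def fake_retriever(question):
    # Hypothetical stand-in for naive_retriever: returns a fixed "document".
    return [f"doc about: {question}"]

def fake_llm(prompt_text):
    # Hypothetical stand-in for `rag_prompt | chat_model`.
    return f"answer based on {len(prompt_text)} prompt characters"

def run_chain(inputs):
    # Step 1: build {"context", "question"} from the incoming question.
    step1 = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: like RunnablePassthrough.assign, keep every key and
    # re-assign "context" to its own value.
    step2 = {**step1, "context": step1["context"]}
    # Step 3: format a prompt, call the model, and return the context too.
    prompt_text = f"Query:\n{step2['question']}\n\nContext:\n{step2['context']}"
    return {"response": fake_llm(prompt_text), "context": step2["context"]}

toy_result = run_chain({"question": "Did people like John Wick?"})
```

Returning `"context"` alongside `"response"` is what lets us inspect, and later evaluate, exactly which documents the retriever supplied.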

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [13]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, it seems that people generally liked John Wick. The reviews praise the action sequences, Keanu Reeves' performance, the pacing, and the overall entertainment value of the film. Some describe it as a remarkable and surprising film that stands out in the action genre. So, yes, it can be inferred that people liked John Wick."

In [14]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review: '/review/rw4854296/?ref_=tt_urv'."

In [15]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"John Wick is a story about an ex-hitman who comes out of retirement to seek vengeance against the gangsters that killed his dog and took everything from him. The movie is filled with action, shootouts, and breathtaking fights as John Wick unleashes destruction against those who are after him. It's a tale of revenge and survival in a world filled with professional killers."

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [16]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)
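
For intuition about what a BM25 score measures, here is a minimal sketch of one common BM25 variant over whitespace tokens. The function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, and the non-negative IDF formula are assumptions for this sketch; the `rank_bm25` package used by `BM25Retriever` may differ in its details.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in corpus against query (one BM25 variant)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

toy_corpus = [
    "john wick avenges his dog",
    "a slow romantic drama",
    "the dog steals every scene",
]
toy_scores = bm25_scores("wick dog", toy_corpus)
```

Because the score depends only on shared words, a document that never uses the query's words scores zero no matter how related it is in meaning. That is the main contrast with the embedding-based retriever.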

We'll construct the same chain - only changing the retriever.

In [17]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [18]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Overall, opinions about John Wick seem to vary. Some people really enjoyed it and found it to be a great action film with stylish stunts and a relatable hero, while others found it to be lacking in plot and substance, feeling it was too violent and uninteresting. So, it's safe to say that not everyone liked John Wick."

In [19]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"I'm sorry, there are no reviews with a rating of 10 in the provided context."

In [20]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, the main character, portrayed by Keanu Reeves, is a retired hitman seeking vengeance for the death of his beloved dog, which was a final gift from his deceased wife. Throughout the movie, John's old connections from his assassin days come back to haunt him, leading to intense action sequences and a thrilling story of revenge."

It's not clear that this is better or worse - but failing to find any review with a rating of 10 isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [21]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
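
The "retrieve many, then keep the best few" pattern can be sketched without any API calls. In this sketch `rerank`, `word_overlap`, and `top_n` are hypothetical names, and the word-overlap scorer is a toy stand-in for a learned reranking model such as Cohere's.

```python
def word_overlap(query, doc):
    """Toy relevance score: number of distinct words shared with the query."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def rerank(query, candidates, score_fn, top_n=3):
    """Re-score first-stage candidates and keep only the top_n best."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    # Sorting is stable, so ties keep their first-stage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

# Pretend these came back from a first-stage retriever with a large k.
toy_candidates = [
    "keanu reeves is great",
    "the plot is thin",
    "great action and great stunts",
    "keanu reeves action is great",
]
toy_compressed = rerank("keanu action great", toy_candidates, word_overlap, top_n=2)
```

The first stage is cheap and broad. The second stage is more expensive per document, which is why it only runs on the small candidate set.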

Let's create our chain again, and see how this does!

In [22]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick based on the positive reviews from viewers.'

In [24]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. Here is the URL to that review: \'/review/rw4854296/?ref_=tt_urv\' from the movie "John Wick 3".'

In [25]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, after resolving issues with the Russian mafia, John Wick is forced back into action by a mobster who blows up his house and asks him to kill his sister in Rome. Wick carries out the assignment but then becomes the target of a seven-million dollar contract on his life, leading to a promise to kill the mobster who betrayed him.'

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [26]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
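
The three steps listed above can be sketched in plain Python. Here `multi_query_retrieve`, `generate_queries`, and `retrieve` are hypothetical names, and the toy lookup table stands in for both the LLM that rewrites the query and the underlying retriever.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for every generated query and return the unique documents."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            # Keep only the first occurrence of each document.
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Toy data: each rewritten query maps to the documents it would retrieve.
toy_index = {
    "did people like john wick?": ["review A", "review B"],
    "is john wick well reviewed?": ["review B", "review C"],
}
toy_docs = multi_query_retrieve(
    "Did people generally like John Wick?",
    generate_queries=lambda question: list(toy_index),  # stand-in for the LLM
    retrieve=lambda query: toy_index[query],            # stand-in for the retriever
)
```

Note that the context can grow to several times `k` documents, which affects both prompt length and cost.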

In [27]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [28]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick based on the reviews provided.'

In [29]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?\n\nYes, there is a review with a rating of 10. Here is the URL to that review:\n1. Review: A Masterpiece & Brilliant Sequel\nURL: '/review/rw4854296/?ref_=tt_urv'"

In [30]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick: Chapter 2, the story continues with John Wick being forced back into action after an Italian baddie calls in a favor. Wick must fulfill the task to settle his debt, leading to a series of intense and action-packed events, including a mission to assassinate Santino D'Antonio's sister in Rome. As Wick completes his assignment, Santino puts a seven-million dollar contract on him, resulting in a chase by professional killers from all over, culminating in Wick's promise to take down Santino."

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [31]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [32]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)



Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [33]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [34]:
parent_document_retriever.add_documents(parent_docs, ids=None)
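
As a rough picture of "search small, return big", here is a pure-Python sketch. The names `build_children` and `retrieve_parents` are hypothetical, chunks are cut by word count rather than characters, and word overlap stands in for embedding similarity; none of this is the `ParentDocumentRetriever` implementation.

```python
def build_children(parents, chunk_words):
    """Split each parent into word chunks, remembering the parent id."""
    children = []
    for parent_id, text in parents.items():
        words = text.split()
        for start in range(0, len(words), chunk_words):
            chunk = " ".join(words[start:start + chunk_words])
            children.append((chunk, parent_id))
    return children

def retrieve_parents(query, parents, children, k=2):
    """Rank child chunks, then return their de-duplicated parents."""
    query_words = set(query.lower().split())
    ranked = sorted(
        children,
        key=lambda child: len(query_words & set(child[0].lower().split())),
        reverse=True,
    )
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]

toy_parents = {
    "review-1": "john wick avenges his dog with relentless stylish action",
    "review-2": "a quiet drama about a baker where nothing explodes",
}
toy_children = build_children(toy_parents, chunk_words=3)
toy_hits = retrieve_parents("stylish dog", toy_parents, toy_children, k=2)
```

Two different child chunks match here, but both point at the same parent, so a single full review is returned.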

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [35]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Overall, opinions about "John Wick" seem to be divided. Some people like the series, while others have strong negative opinions about it.'

In [37]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The review for the movie "John Wick 3" has a rating of 10. Here is the URL to that review: /review/rw4854296/?ref_=tt_urv'

In [38]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, the main character, played by Keanu Reeves, is a retired assassin who is forced back into action when someone kills his dog and steals his car. He embarks on a violent journey of revenge and redemption, ultimately killing a large number of people in the process.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [39]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
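
To see how rank fusion combines several ranked lists, here is a minimal sketch of weighted Reciprocal Rank Fusion. The function name `weighted_rrf` is hypothetical, and the constant `c=60` follows the common convention from the RRF paper; treat it as an assumption about what `EnsembleRetriever` uses internally.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists: each document earns weight / (c + rank) per list."""
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += weight / (c + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Two toy retrievers that disagree about the ordering.
toy_lists = [["A", "B", "C"], ["B", "D", "A"]]
toy_fused = weighted_rrf(toy_lists, weights=[0.5, 0.5])
```

Only ranks matter, not raw scores, so retrievers with incompatible scoring scales (for example BM25 and cosine similarity) can be combined directly.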

In [40]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, it appears that the majority of people enjoyed John Wick. Multiple reviews have praised the film for its action sequences, stylish presentation, and engaging storyline. The positive feedback suggests that people generally liked John Wick.'

In [42]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is one review with a rating of 10 for "John Wick 3". Here is the URL to that review:\n\n[Review by ymyuseda](/review/rw4854296/?ref_=tt_urv)'

In [43]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, an ex-hitman comes out of retirement to seek vengeance on the gangsters who killed his dog and took everything from him. The plot involves a deadly game of retaliation and survival as John Wick faces off against multiple adversaries in a world filled with assassins and criminals.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [44]:
!pip install -qU langchain_experimental


We'll use the `percentile` thresholding method for this example. It calculates the distances between consecutive sentences, and then splits the text wherever a distance exceeds a given percentile of all distances.

In [45]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)
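
The percentile method can be sketched in a few lines of plain Python. This is a simplified illustration: `semantic_chunks`, `percentile`, and `cosine_distance` are hypothetical helpers, the 2-D vectors are made-up stand-ins for sentence embeddings, and `pct=95` is an assumed value for this sketch.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms

def percentile(values, pct):
    """Percentile with linear interpolation between neighbouring values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac

def semantic_chunks(sentences, vectors, pct=95):
    """Start a new chunk wherever the jump between neighbours is unusually large."""
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

toy_sentences = ["The action is great.", "The stunts are stylish.",
                 "The plot is thin.", "The story is simple."]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_chunks = semantic_chunks(toy_sentences, toy_vectors)
```

The threshold is relative to the document's own distances, so the same percentile yields different absolute cut-offs for different texts.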

Now we can split our documents.

In [46]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [47]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [48]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [49]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [50]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, it seems that most people enjoyed John Wick and found it to be a thrilling and well-executed action film.'

In [51]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10 for the movie "John Wick 3" by the author ymyuseda. The URL to that review is \'/review/rw4854296/?ref_=tt_urv\'.'

In [52]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, the main character, John Wick, seeks revenge on the people who killed his dog and took something he loved from him. The film showcases awesome action, stylish stunts, kinetic chaos, and a relatable hero on a mission of vengeance.'

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
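
If you want a rough local latency number in addition to LangSmith, a small timing helper is enough. This is a sketch under assumptions: `average_latency` is a hypothetical helper, and `str.upper` is only a placeholder for a real call such as a chain's `invoke`.

```python
import time

def average_latency(fn, questions):
    """Average wall-clock seconds per call of fn over the given questions."""
    start = time.perf_counter()
    for question in questions:
        fn(question)
    return (time.perf_counter() - start) / len(questions)

# Placeholder callable; swap in something like
# `lambda q: some_chain.invoke({"question": q})` in the notebook.
toy_latency = average_latency(str.upper, ["a", "b", "c"])
```

Wall-clock timings include network variance, so it is worth averaging over the whole golden dataset and not just a single question.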

## RAGAS Evaluation

In [53]:
!pip install -qU ragas==0.2.10 unstructured==0.16.12


In [54]:
os.environ["RAGAS_APP_TOKEN"] = getpass.getpass("Please enter your Ragas API key!")

Please enter your Ragas API key!··········


### Golden Test Data Set

In [55]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [56]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(documents, testset_size=10)


In [57]:
dataset.to_pandas()

Unnamed: 0,user_input,reference_contexts,reference,synthesizer_name
0,How does the film 'John Wick' compare to 'Take...,[: 0\nReview: The best way I can describe John...,The film 'John Wick' can be described as simil...,single_hop_specifc_query_synthesizer
1,What is the current reception and impact of th...,[: 2\nReview: With the fourth installment scor...,The fourth installment of the John Wick film f...,single_hop_specifc_query_synthesizer
2,How does Chad Stahelski's expertise as a stunt...,[: 3\nReview: John wick has a very simple reve...,Chad Stahelski's expertise as a stunt speciali...,single_hop_specifc_query_synthesizer
3,Wht makes John Wick such a captivating action ...,[: 4\nReview: Though he no longer has a taste ...,"John Wick is captivating due to its stylized, ...",single_hop_specifc_query_synthesizer
4,What role does the Russian mob play in the fil...,[: 5\nReview: Ultra-violent first entry with l...,"In the film John Wick, an arrogant Russian mob...",single_hop_specifc_query_synthesizer
5,How does 'John Wick: Chapter 4' improve upon '...,"[<1-hop>\n\n: 11\nReview: The overrated ""John ...",'John Wick: Chapter 4' improves upon 'John Wic...,multi_hop_specific_query_synthesizer
6,How has Keanu Reeves' portrayal of John Wick e...,[<1-hop>\n\n: 20\nReview: John Wick is somethi...,Keanu Reeves' portrayal of John Wick has evolv...,multi_hop_specific_query_synthesizer
7,What are the key elements that make John Wick ...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",John Wick 3 is praised for its high-quality ac...,multi_hop_specific_query_synthesizer
8,What are the contrasting elements in the revie...,[<1-hop>\n\n: 8\nReview: About 6 months ago I ...,The review of 'John Wick: Chapter 3 - Parabell...,multi_hop_specific_query_synthesizer
9,How has the John Wick film franchise evolved f...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",The John Wick film franchise began with its fi...,multi_hop_specific_query_synthesizer


In [58]:
dataset.upload()

Testset uploaded! View at https://app.ragas.io/dashboard/alignment/testset/f71ee1e1-9385-436b-ba7f-023cd867cabb


'https://app.ragas.io/dashboard/alignment/testset/f71ee1e1-9385-436b-ba7f-023cd867cabb'

#### RAGAS Setup

In [59]:
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))

In [73]:
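def create_retriever_test_set(retriever, dataset):
  # `retriever` is a full RAG chain: it takes {"question": ...} and returns a dict
  # holding the "response" message and the retrieved "context" documents.
  # Rows are updated in place, so the `dataset` passed in is itself modified.
  for test_row in dataset:
    response = retriever.invoke({"question" : test_row.eval_sample.user_input})
    test_row.eval_sample.response = response["response"].content
    test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]
  return dataset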
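
Because `create_retriever_test_set` writes into the rows of the shared `dataset`, every later call overwrites the responses stored by the previous chain. The sketch below shows one way to keep each chain's test set separate by filling a deep copy instead. The `EvalSample`, `Row`, and `fake_chain` names are hypothetical stand-ins so the example runs without any API keys; whether `copy.deepcopy` behaves well on a real ragas testset is an assumption to verify.

```python
import copy
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class EvalSample:  # hypothetical stand-in for a ragas evaluation sample
    user_input: str
    response: str = ""
    retrieved_contexts: list = field(default_factory=list)


@dataclass
class Row:  # hypothetical stand-in for one testset row
    eval_sample: EvalSample


def fake_chain(tag):
    """Return a callable that mimics chain.invoke for illustration only."""
    def invoke(inputs):
        return {
            "response": SimpleNamespace(content=f"{tag} answer to: {inputs['question']}"),
            "context": [SimpleNamespace(page_content=f"{tag} context")],
        }
    return invoke


def fill_test_set(invoke, dataset):
    """Same row-filling logic as the notebook function, applied to a deep copy."""
    filled = copy.deepcopy(dataset)
    for row in filled:
        out = invoke({"question": row.eval_sample.user_input})
        row.eval_sample.response = out["response"].content
        row.eval_sample.retrieved_contexts = [c.page_content for c in out["context"]]
    return filled


golden = [Row(EvalSample("Who directed John Wick?"))]
naive_set = fill_test_set(fake_chain("naive"), golden)
bm25_set = fill_test_set(fake_chain("bm25"), golden)
```

With this variant the golden rows stay untouched, and `naive_set` and `bm25_set` can be inspected side by side after both chains have run.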

In [74]:
from ragas import EvaluationDataset

def create_ragas_eval_dataset(dataset):
  evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())
  return evaluation_dataset

In [75]:
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, ResponseRelevancy, ContextEntityRecall, NoiseSensitivity
from ragas import evaluate, RunConfig

# Allow each evaluation job up to 360 seconds before it is treated as timed out.
custom_run_config = RunConfig(timeout=360)

def perform_ragas_eval(evaluation_dataset, run_config=custom_run_config):
  result = evaluate(
      dataset=evaluation_dataset,
      metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
      llm=evaluator_llm,
      run_config=run_config  # honor the argument instead of always using the module-level default
  )
  return result

#### Baseline - Naive RAG Chain

In [76]:
test_set = create_retriever_test_set(naive_retrieval_chain, dataset)

In [77]:
test_set.to_pandas()

Unnamed: 0,user_input,retrieved_contexts,reference_contexts,response,reference,synthesizer_name
0,How does the film 'John Wick' compare to 'Take...,[: 0\nReview: The best way I can describe John...,[: 0\nReview: The best way I can describe John...,"In terms of the lead character, 'John Wick' co...",The film 'John Wick' can be described as simil...,single_hop_specifc_query_synthesizer
1,What is the current reception and impact of th...,[: 18\nReview: Ever since the original John Wi...,[: 2\nReview: With the fourth installment scor...,"Based on the reviews provided, the current rec...",The fourth installment of the John Wick film f...,single_hop_specifc_query_synthesizer
2,How does Chad Stahelski's expertise as a stunt...,[: 3\nReview: John wick has a very simple reve...,[: 3\nReview: John wick has a very simple reve...,Chad Stahelski's expertise as a stunt speciali...,Chad Stahelski's expertise as a stunt speciali...,single_hop_specifc_query_synthesizer
3,Wht makes John Wick such a captivating action ...,"[: 9\nReview: At first glance, John Wick sound...",[: 4\nReview: Though he no longer has a taste ...,John Wick is such a captivating action film du...,"John Wick is captivating due to its stylized, ...",single_hop_specifc_query_synthesizer
4,What role does the Russian mob play in the fil...,"[: 18\nReview: When the story begins, John (Ke...",[: 5\nReview: Ultra-violent first entry with l...,"In the film John Wick, the Russian mob plays a...","In the film John Wick, an arrogant Russian mob...",single_hop_specifc_query_synthesizer
5,How does 'John Wick: Chapter 4' improve upon '...,[: 19\nReview: John Wick: Chapter 4 picks up w...,"[<1-hop>\n\n: 11\nReview: The overrated ""John ...","In 'John Wick: Chapter 4', the film improves u...",'John Wick: Chapter 4' improves upon 'John Wic...,multi_hop_specific_query_synthesizer
6,How has Keanu Reeves' portrayal of John Wick e...,[: 8\nReview: In this 2nd installment of John ...,[<1-hop>\n\n: 20\nReview: John Wick is somethi...,I don't have specific information about how Ke...,Keanu Reeves' portrayal of John Wick has evolv...,multi_hop_specific_query_synthesizer
7,What are the key elements that make John Wick ...,[: 14\nReview: By now you know what to expect ...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",Key elements that make John Wick 3 stand out i...,John Wick 3 is praised for its high-quality ac...,multi_hop_specific_query_synthesizer
8,What are the contrasting elements in the revie...,[: 0\nReview: It is 5 years since the first Jo...,[<1-hop>\n\n: 8\nReview: About 6 months ago I ...,The contrasting elements in the reviews of 'Jo...,The review of 'John Wick: Chapter 3 - Parabell...,multi_hop_specific_query_synthesizer
9,How has the John Wick film franchise evolved f...,[: 0\nReview: It is 5 years since the first Jo...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",The John Wick film franchise has evolved from ...,The John Wick film franchise began with its fi...,multi_hop_specific_query_synthesizer


In [78]:
evaluation_dataset = create_ragas_eval_dataset(test_set)

In [79]:
result = perform_ragas_eval(evaluation_dataset)
result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

ERROR:ragas.executor:Exception raised in Job[23]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[41]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[47]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[53]: TimeoutError()


{'context_recall': 0.9150, 'faithfulness': 0.9606, 'factual_correctness': 0.3780, 'answer_relevancy': 0.8825, 'context_entity_recall': 0.5783, 'noise_sensitivity_relevant': 0.3729}
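
The `TimeoutError` lines above mean that some of the 60 jobs produced no score. The sketch below illustrates how an aggregate behaves if failed jobs are recorded as NaN and skipped when averaging. That handling is an assumption about ragas internals worth checking against your installed version, and the per-sample values are made up for illustration.

```python
import math


def nan_mean(scores):
    """Average the scores, ignoring NaN entries (assumed to mark failed jobs)."""
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else float("nan")


# Made-up per-sample scores; NaN stands in for a job that timed out.
per_sample = [0.9, float("nan"), 0.7, 1.0]

mean_score = nan_mean(per_sample)
completed = sum(not math.isnan(s) for s in per_sample)
print(f"mean over {completed} of {len(per_sample)} samples: {mean_score:.4f}")
```

Under that assumption, a metric with several timed-out jobs is averaged over fewer samples, so small differences between retrievers on a 10-question test set should be read with caution.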

#### BM25 RAG Chain

In [80]:
test_set = create_retriever_test_set(bm25_retrieval_chain, dataset)

In [81]:
test_set.to_pandas()

Unnamed: 0,user_input,retrieved_contexts,reference_contexts,response,reference,synthesizer_name
0,How does the film 'John Wick' compare to 'Take...,[: 0\nReview: The best way I can describe John...,[: 0\nReview: The best way I can describe John...,In terms of the lead character comparison betw...,The film 'John Wick' can be described as simil...,single_hop_specifc_query_synthesizer
1,What is the current reception and impact of th...,[: 6\nReview: In this fourth installment of 87...,[: 2\nReview: With the fourth installment scor...,The reception and impact of the John Wick film...,The fourth installment of the John Wick film f...,single_hop_specifc_query_synthesizer
2,How does Chad Stahelski's expertise as a stunt...,[: 3\nReview: John wick has a very simple reve...,[: 3\nReview: John wick has a very simple reve...,Chad Stahelski's expertise as a stunt speciali...,Chad Stahelski's expertise as a stunt speciali...,single_hop_specifc_query_synthesizer
3,Wht makes John Wick such a captivating action ...,[: 21\nReview: John Wick is an action film wit...,[: 4\nReview: Though he no longer has a taste ...,John Wick is such a captivating action film be...,"John Wick is captivating due to its stylized, ...",single_hop_specifc_query_synthesizer
4,What role does the Russian mob play in the fil...,"[: 9\nReview: At first glance, John Wick sound...",[: 5\nReview: Ultra-violent first entry with l...,"In the film John Wick, the Russian mob plays a...","In the film John Wick, an arrogant Russian mob...",single_hop_specifc_query_synthesizer
5,How does 'John Wick: Chapter 4' improve upon '...,[: 20\nReview: In a world where movie sequels ...,"[<1-hop>\n\n: 11\nReview: The overrated ""John ...","Based on the provided context, 'John Wick: Cha...",'John Wick: Chapter 4' improves upon 'John Wic...,multi_hop_specific_query_synthesizer
6,How has Keanu Reeves' portrayal of John Wick e...,[: 19\nReview: John Wick: Chapter 4 picks up w...,[<1-hop>\n\n: 20\nReview: John Wick is somethi...,I don't have the specific details on how Keanu...,Keanu Reeves' portrayal of John Wick has evolv...,multi_hop_specific_query_synthesizer
7,What are the key elements that make John Wick ...,"[: 3\nReview: Well, I committed to watching th...","[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...","I'm sorry, I don't have specific information a...",John Wick 3 is praised for its high-quality ac...,multi_hop_specific_query_synthesizer
8,What are the contrasting elements in the revie...,[: 8\nReview: About 6 months ago I saw a pictu...,[<1-hop>\n\n: 8\nReview: About 6 months ago I ...,The contrasting elements in the reviews of 'Jo...,The review of 'John Wick: Chapter 3 - Parabell...,multi_hop_specific_query_synthesizer
9,How has the John Wick film franchise evolved f...,[: 8\nReview: In this 2nd installment of John ...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",The John Wick film franchise has evolved from ...,The John Wick film franchise began with its fi...,multi_hop_specific_query_synthesizer


In [82]:
evaluation_dataset = create_ragas_eval_dataset(test_set)

In [83]:
result = perform_ragas_eval(evaluation_dataset)
result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

{'context_recall': 0.6557, 'faithfulness': 0.7896, 'factual_correctness': 0.4340, 'answer_relevancy': 0.6862, 'context_entity_recall': 0.5783, 'noise_sensitivity_relevant': 0.2035}

#### Contextual Compression - Reranking RAG Chain

In [89]:
test_set = create_retriever_test_set(contextual_compression_retrieval_chain, dataset)

In [90]:
test_set.to_pandas()

Unnamed: 0,user_input,retrieved_contexts,reference_contexts,response,reference,synthesizer_name
0,How does the film 'John Wick' compare to 'Take...,[: 11\nReview: JOHN WICK is a rare example of ...,[: 0\nReview: The best way I can describe John...,"Based on the context provided, it seems that '...",The film 'John Wick' can be described as simil...,single_hop_specifc_query_synthesizer
1,What is the current reception and impact of th...,[: 20\nReview: In a world where movie sequels ...,[: 2\nReview: With the fourth installment scor...,"Based on the reviews provided, the current rec...",The fourth installment of the John Wick film f...,single_hop_specifc_query_synthesizer
2,How does Chad Stahelski's expertise as a stunt...,[: 3\nReview: John wick has a very simple reve...,[: 3\nReview: John wick has a very simple reve...,Chad Stahelski's expertise as a stunt speciali...,Chad Stahelski's expertise as a stunt speciali...,single_hop_specifc_query_synthesizer
3,Wht makes John Wick such a captivating action ...,[: 3\nReview: John wick has a very simple reve...,[: 4\nReview: Though he no longer has a taste ...,John Wick is such a captivating action film be...,"John Wick is captivating due to its stylized, ...",single_hop_specifc_query_synthesizer
4,What role does the Russian mob play in the fil...,[: 20\nReview: After resolving his issues with...,[: 5\nReview: Ultra-violent first entry with l...,The Russian mob plays a significant role in th...,"In the film John Wick, an arrogant Russian mob...",single_hop_specifc_query_synthesizer
5,How does 'John Wick: Chapter 4' improve upon '...,[: 19\nReview: John Wick: Chapter 4 picks up w...,"[<1-hop>\n\n: 11\nReview: The overrated ""John ...","According to the review, ""John Wick: Chapter 4...",'John Wick: Chapter 4' improves upon 'John Wic...,multi_hop_specific_query_synthesizer
6,How has Keanu Reeves' portrayal of John Wick e...,[: 19\nReview: The inevitable third chapter of...,[<1-hop>\n\n: 20\nReview: John Wick is somethi...,I don't know the answer to that question.,Keanu Reeves' portrayal of John Wick has evolv...,multi_hop_specific_query_synthesizer
7,What are the key elements that make John Wick ...,[: 19\nReview: The inevitable third chapter of...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",The key elements that make John Wick 3 stand o...,John Wick 3 is praised for its high-quality ac...,multi_hop_specific_query_synthesizer
8,What are the contrasting elements in the revie...,[: 24\nReview: John Wick: Chapter 3 - Parabell...,[<1-hop>\n\n: 8\nReview: About 6 months ago I ...,The contrasting elements in the reviews of 'Jo...,The review of 'John Wick: Chapter 3 - Parabell...,multi_hop_specific_query_synthesizer
9,How has the John Wick film franchise evolved f...,[: 19\nReview: The inevitable third chapter of...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",The John Wick film franchise has evolved from ...,The John Wick film franchise began with its fi...,multi_hop_specific_query_synthesizer


In [91]:
evaluation_dataset = create_ragas_eval_dataset(test_set)

In [92]:
reranking_eval_result = perform_ragas_eval(evaluation_dataset)
reranking_eval_result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

{'context_recall': 0.6707, 'faithfulness': 0.7931, 'factual_correctness': 0.3250, 'answer_relevancy': 0.7875, 'context_entity_recall': 0.6083, 'noise_sensitivity_relevant': 0.4482}

#### Multi-Query Retriever

In [93]:
test_set = create_retriever_test_set(multi_query_retrieval_chain, dataset)

In [94]:
test_set.to_pandas()

Unnamed: 0,user_input,retrieved_contexts,reference_contexts,response,reference,synthesizer_name
0,How does the film 'John Wick' compare to 'Take...,[: 0\nReview: The best way I can describe John...,[: 0\nReview: The best way I can describe John...,"In terms of the lead character, Keanu Reeves i...",The film 'John Wick' can be described as simil...,single_hop_specifc_query_synthesizer
1,What is the current reception and impact of th...,[: 18\nReview: Ever since the original John Wi...,[: 2\nReview: With the fourth installment scor...,"Based on the reviews provided, the current rec...",The fourth installment of the John Wick film f...,single_hop_specifc_query_synthesizer
2,How does Chad Stahelski's expertise as a stunt...,[: 3\nReview: John wick has a very simple reve...,[: 3\nReview: John wick has a very simple reve...,Chad Stahelski's expertise as a stunt speciali...,Chad Stahelski's expertise as a stunt speciali...,single_hop_specifc_query_synthesizer
3,Wht makes John Wick such a captivating action ...,"[: 9\nReview: At first glance, John Wick sound...",[: 4\nReview: Though he no longer has a taste ...,John Wick is such a captivating action film du...,"John Wick is captivating due to its stylized, ...",single_hop_specifc_query_synthesizer
4,What role does the Russian mob play in the fil...,[: 20\nReview: After resolving his issues with...,[: 5\nReview: Ultra-violent first entry with l...,The Russian mob plays a significant role in th...,"In the film John Wick, an arrogant Russian mob...",single_hop_specifc_query_synthesizer
5,How does 'John Wick: Chapter 4' improve upon '...,[: 19\nReview: John Wick: Chapter 4 picks up w...,"[<1-hop>\n\n: 11\nReview: The overrated ""John ...","In 'John Wick: Chapter 4', the film improves u...",'John Wick: Chapter 4' improves upon 'John Wic...,multi_hop_specific_query_synthesizer
6,How has Keanu Reeves' portrayal of John Wick e...,[: 8\nReview: In this 2nd installment of John ...,[<1-hop>\n\n: 20\nReview: John Wick is somethi...,"I'm sorry, I don't have specific information o...",Keanu Reeves' portrayal of John Wick has evolv...,multi_hop_specific_query_synthesizer
7,What are the key elements that make John Wick ...,[: 0\nReview: It is 5 years since the first Jo...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...","According to reviews, the key elements that ma...",John Wick 3 is praised for its high-quality ac...,multi_hop_specific_query_synthesizer
8,What are the contrasting elements in the revie...,[: 0\nReview: It is 5 years since the first Jo...,[<1-hop>\n\n: 8\nReview: About 6 months ago I ...,The contrasting elements in the reviews of 'Jo...,The review of 'John Wick: Chapter 3 - Parabell...,multi_hop_specific_query_synthesizer
9,How has the John Wick film franchise evolved f...,[: 0\nReview: It is 5 years since the first Jo...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",The John Wick film franchise has evolved from ...,The John Wick film franchise began with its fi...,multi_hop_specific_query_synthesizer


In [95]:
evaluation_dataset = create_ragas_eval_dataset(test_set)

In [96]:
multiq_eval_result = perform_ragas_eval(evaluation_dataset)
multiq_eval_result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

ERROR:ragas.executor:Exception raised in Job[17]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[23]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[29]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[41]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[47]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[53]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[59]: TimeoutError()


{'context_recall': 0.9007, 'faithfulness': 0.8590, 'factual_correctness': 0.3740, 'answer_relevancy': 0.8847, 'context_entity_recall': 0.6033, 'noise_sensitivity_relevant': 0.3583}

#### Parent Document Retriever

In [97]:
test_set = create_retriever_test_set(parent_document_retrieval_chain, dataset)

In [98]:
test_set.to_pandas()

Unnamed: 0,user_input,retrieved_contexts,reference_contexts,response,reference,synthesizer_name
0,How does the film 'John Wick' compare to 'Take...,[: 11\nReview: JOHN WICK is a rare example of ...,[: 0\nReview: The best way I can describe John...,In terms of comparing the lead characters in '...,The film 'John Wick' can be described as simil...,single_hop_specifc_query_synthesizer
1,What is the current reception and impact of th...,[: 17\nReview: Stuntman turned writer/director...,[: 2\nReview: With the fourth installment scor...,The current reception and impact of the John W...,The fourth installment of the John Wick film f...,single_hop_specifc_query_synthesizer
2,How does Chad Stahelski's expertise as a stunt...,[: 18\nReview: Ever since the original John Wi...,[: 3\nReview: John wick has a very simple reve...,Chad Stahelski's expertise as a stunt speciali...,Chad Stahelski's expertise as a stunt speciali...,single_hop_specifc_query_synthesizer
3,Wht makes John Wick such a captivating action ...,[: 8\nReview: It's hard to find anything bad t...,[: 4\nReview: Though he no longer has a taste ...,John Wick is such a captivating action film du...,"John Wick is captivating due to its stylized, ...",single_hop_specifc_query_synthesizer
4,What role does the Russian mob play in the fil...,[: 20\nReview: After resolving his issues with...,[: 5\nReview: Ultra-violent first entry with l...,"In the film ""John Wick,"" the Russian mob plays...","In the film John Wick, an arrogant Russian mob...",single_hop_specifc_query_synthesizer
5,How does 'John Wick: Chapter 4' improve upon '...,[: 19\nReview: John Wick: Chapter 4 picks up w...,"[<1-hop>\n\n: 11\nReview: The overrated ""John ...","Based on the review provided, 'John Wick: Chap...",'John Wick: Chapter 4' improves upon 'John Wic...,multi_hop_specific_query_synthesizer
6,How has Keanu Reeves' portrayal of John Wick e...,[: 18\nReview: Ever since the original John Wi...,[<1-hop>\n\n: 20\nReview: John Wick is somethi...,"As a helpful assistant, I do not have specific...",Keanu Reeves' portrayal of John Wick has evolv...,multi_hop_specific_query_synthesizer
7,What are the key elements that make John Wick ...,[: 2\nReview: The first three John Wick films ...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",The key elements that make John Wick 3 stand o...,John Wick 3 is praised for its high-quality ac...,multi_hop_specific_query_synthesizer
8,What are the contrasting elements in the revie...,[: 19\nReview: John Wick: Chapter 4 picks up w...,[<1-hop>\n\n: 8\nReview: About 6 months ago I ...,The contrasting elements in the reviews of 'Jo...,The review of 'John Wick: Chapter 3 - Parabell...,multi_hop_specific_query_synthesizer
9,How has the John Wick film franchise evolved f...,[: 2\nReview: The first three John Wick films ...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",The John Wick film franchise has evolved from ...,The John Wick film franchise began with its fi...,multi_hop_specific_query_synthesizer


In [99]:
evaluation_dataset = create_ragas_eval_dataset(test_set)

In [100]:
parentdoc_eval_result = perform_ragas_eval(evaluation_dataset)
parentdoc_eval_result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

{'context_recall': 0.4700, 'faithfulness': 0.8063, 'factual_correctness': 0.5020, 'answer_relevancy': 0.8856, 'context_entity_recall': 0.5350, 'noise_sensitivity_relevant': 0.2828}

#### Ensemble Retriever

In [101]:
test_set = create_retriever_test_set(ensemble_retrieval_chain, dataset)

In [102]:
test_set.to_pandas()

Unnamed: 0,user_input,retrieved_contexts,reference_contexts,response,reference,synthesizer_name
0,How does the film 'John Wick' compare to 'Take...,[: 0\nReview: The best way I can describe John...,[: 0\nReview: The best way I can describe John...,"In terms of the lead character, John Wick play...",The film 'John Wick' can be described as simil...,single_hop_specifc_query_synthesizer
1,What is the current reception and impact of th...,[: 20\nReview: In a world where movie sequels ...,[: 2\nReview: With the fourth installment scor...,The current reception and impact of the John W...,The fourth installment of the John Wick film f...,single_hop_specifc_query_synthesizer
2,How does Chad Stahelski's expertise as a stunt...,[: 3\nReview: John wick has a very simple reve...,[: 3\nReview: John wick has a very simple reve...,Chad Stahelski's expertise as a stunt speciali...,Chad Stahelski's expertise as a stunt speciali...,single_hop_specifc_query_synthesizer
3,Wht makes John Wick such a captivating action ...,"[: 9\nReview: At first glance, John Wick sound...",[: 4\nReview: Though he no longer has a taste ...,John Wick is such a captivating action film du...,"John Wick is captivating due to its stylized, ...",single_hop_specifc_query_synthesizer
4,What role does the Russian mob play in the fil...,"[: 18\nReview: When the story begins, John (Ke...",[: 5\nReview: Ultra-violent first entry with l...,"In the film John Wick, the Russian mob plays a...","In the film John Wick, an arrogant Russian mob...",single_hop_specifc_query_synthesizer
5,How does 'John Wick: Chapter 4' improve upon '...,[: 19\nReview: John Wick: Chapter 4 picks up w...,"[<1-hop>\n\n: 11\nReview: The overrated ""John ...",John Wick: Chapter 4 improves upon 'John Wick:...,'John Wick: Chapter 4' improves upon 'John Wic...,multi_hop_specific_query_synthesizer
6,How has Keanu Reeves' portrayal of John Wick e...,[: 8\nReview: In this 2nd installment of John ...,[<1-hop>\n\n: 20\nReview: John Wick is somethi...,Keanu Reeves' portrayal of John Wick has evolv...,Keanu Reeves' portrayal of John Wick has evolv...,multi_hop_specific_query_synthesizer
7,What are the key elements that make John Wick ...,[: 14\nReview: By now you know what to expect ...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...","According to the reviews, the key elements tha...",John Wick 3 is praised for its high-quality ac...,multi_hop_specific_query_synthesizer
8,What are the contrasting elements in the revie...,[: 24\nReview: John Wick: Chapter 3 - Parabell...,[<1-hop>\n\n: 8\nReview: About 6 months ago I ...,In the reviews of 'John Wick: Chapter 3 - Para...,The review of 'John Wick: Chapter 3 - Parabell...,multi_hop_specific_query_synthesizer
9,How has the John Wick film franchise evolved f...,[: 20\nReview: In a world where movie sequels ...,"[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...",The John Wick film franchise has evolved from ...,The John Wick film franchise began with its fi...,multi_hop_specific_query_synthesizer


In [103]:
evaluation_dataset = create_ragas_eval_dataset(test_set)

In [104]:
ensemble_eval_result = perform_ragas_eval(evaluation_dataset)
ensemble_eval_result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

ERROR:ragas.executor:Exception raised in Job[5]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[11]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[17]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[23]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[41]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[47]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[53]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[59]: TimeoutError()


{'context_recall': 0.9150, 'faithfulness': 0.8827, 'factual_correctness': 0.4820, 'answer_relevancy': 0.9668, 'context_entity_recall': 0.6172, 'noise_sensitivity_relevant': 0.5000}

### Comparison of RAGAS Evals

In [105]:
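# Scores copied by hand from the evaluation outputs above, so this comparison
# does not depend on re-running every evaluation.
naive_eval_result = {'context_recall': 0.9150, 'faithfulness': 0.9606, 'factual_correctness': 0.3780, 'answer_relevancy': 0.8825, 'context_entity_recall': 0.5783, 'noise_sensitivity_relevant': 0.3729}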
bm25_eval_result = {'context_recall': 0.6557, 'faithfulness': 0.7896, 'factual_correctness': 0.4340, 'answer_relevancy': 0.6862, 'context_entity_recall': 0.5783, 'noise_sensitivity_relevant': 0.2035}
reranking_eval_result = {'context_recall': 0.6707, 'faithfulness': 0.7931, 'factual_correctness': 0.3250, 'answer_relevancy': 0.7875, 'context_entity_recall': 0.6083, 'noise_sensitivity_relevant': 0.4482}
multiq_eval_result = {'context_recall': 0.9007, 'faithfulness': 0.8590, 'factual_correctness': 0.3740, 'answer_relevancy': 0.8847, 'context_entity_recall': 0.6033, 'noise_sensitivity_relevant': 0.3583}
parentdoc_eval_result = {'context_recall': 0.4700, 'faithfulness': 0.8063, 'factual_correctness': 0.5020, 'answer_relevancy': 0.8856, 'context_entity_recall': 0.5350, 'noise_sensitivity_relevant': 0.2828}
ensemble_eval_result = {'context_recall': 0.9150, 'faithfulness': 0.8827, 'factual_correctness': 0.4820, 'answer_relevancy': 0.9668, 'context_entity_recall': 0.6172, 'noise_sensitivity_relevant': 0.5000}

In [106]:
import pandas as pd

# Create a list of dictionaries
eval_results = [naive_eval_result, bm25_eval_result, reranking_eval_result, multiq_eval_result, parentdoc_eval_result, ensemble_eval_result]

# Create the DataFrame
df = pd.DataFrame(eval_results, index=['Naive RAG Chain', 'BM25 RAG Chain', 'Reranking RAG Chain', 'Multi-Query Retriever', 'Parent Document Retriever', 'Ensemble Retriever'])

# Display the DataFrame
df


| Retriever | context_recall | faithfulness | factual_correctness | answer_relevancy | context_entity_recall | noise_sensitivity_relevant |
|---|---|---|---|---|---|---|
| Naive RAG Chain | 0.9150 | 0.9606 | 0.3780 | 0.8825 | 0.5783 | 0.3729 |
| BM25 RAG Chain | 0.6557 | 0.7896 | 0.4340 | 0.6862 | 0.5783 | 0.2035 |
| Reranking RAG Chain | 0.6707 | 0.7931 | 0.3250 | 0.7875 | 0.6083 | 0.4482 |
| Multi-Query Retriever | 0.9007 | 0.8590 | 0.3740 | 0.8847 | 0.6033 | 0.3583 |
| Parent Document Retriever | 0.4700 | 0.8063 | 0.5020 | 0.8856 | 0.5350 | 0.2828 |
| Ensemble Retriever | 0.9150 | 0.8827 | 0.4820 | 0.9668 | 0.6172 | 0.5000 |
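
To read the comparison table per metric, a small sketch like the one below can list which retriever leads each column. It uses plain dictionaries instead of pandas so it stands alone, and it treats `noise_sensitivity_relevant` as a lower-is-better metric, which is how the Ragas documentation describes noise sensitivity. The `best_per_metric` helper is a hypothetical name introduced here.

```python
# Scores copied from the comparison table above.
results = {
    "Naive RAG Chain": {"context_recall": 0.9150, "faithfulness": 0.9606, "factual_correctness": 0.3780,
                        "answer_relevancy": 0.8825, "context_entity_recall": 0.5783, "noise_sensitivity_relevant": 0.3729},
    "BM25 RAG Chain": {"context_recall": 0.6557, "faithfulness": 0.7896, "factual_correctness": 0.4340,
                       "answer_relevancy": 0.6862, "context_entity_recall": 0.5783, "noise_sensitivity_relevant": 0.2035},
    "Reranking RAG Chain": {"context_recall": 0.6707, "faithfulness": 0.7931, "factual_correctness": 0.3250,
                            "answer_relevancy": 0.7875, "context_entity_recall": 0.6083, "noise_sensitivity_relevant": 0.4482},
    "Multi-Query Retriever": {"context_recall": 0.9007, "faithfulness": 0.8590, "factual_correctness": 0.3740,
                              "answer_relevancy": 0.8847, "context_entity_recall": 0.6033, "noise_sensitivity_relevant": 0.3583},
    "Parent Document Retriever": {"context_recall": 0.4700, "faithfulness": 0.8063, "factual_correctness": 0.5020,
                                  "answer_relevancy": 0.8856, "context_entity_recall": 0.5350, "noise_sensitivity_relevant": 0.2828},
    "Ensemble Retriever": {"context_recall": 0.9150, "faithfulness": 0.8827, "factual_correctness": 0.4820,
                           "answer_relevancy": 0.9668, "context_entity_recall": 0.6172, "noise_sensitivity_relevant": 0.5000},
}

# Metrics where a smaller score is the better one.
LOWER_IS_BETTER = {"noise_sensitivity_relevant"}


def best_per_metric(results):
    """Map each metric to the list of retrievers holding the best score (ties kept)."""
    metrics = next(iter(results.values())).keys()
    winners = {}
    for metric in metrics:
        pick = min if metric in LOWER_IS_BETTER else max
        target = pick(scores[metric] for scores in results.values())
        winners[metric] = [name for name, scores in results.items() if scores[metric] == target]
    return winners


winners = best_per_metric(results)
for metric, names in winners.items():
    print(f"{metric}: {', '.join(names)}")
```

Keeping ties as a list matters here because the Naive chain and the Ensemble Retriever share the top `context_recall` score.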


## LangSmith Evaluation

In [107]:
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangChain API Key:")

LangChain API Key:··········


In [112]:
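from langsmith import Client
# Note: this `evaluate` replaces the `evaluate` imported from ragas earlier in the notebook.
# Re-import the ragas version before calling `perform_ragas_eval` again.
from langsmith.evaluation import LangChainStringEvaluator, evaluate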

In [108]:
client = Client()

dataset_name = "John Wick!"

langsmith_dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="John Wick!"
)

In [110]:
eval_llm = ChatOpenAI(model="gpt-4o")

1. Iterate through the RAGAS dataset.
2. Convert each row to the LangSmith format: `question` as the input, `answer` as the output, and `context` as metadata.
3. Add each converted row to the LangSmith dataset.

In [109]:
# iterrows() yields (index, row) pairs; the index is not needed here.
for _, data_row in dataset.to_pandas().iterrows():
  client.create_example(
      inputs={
          "question": data_row["user_input"]
      },
      outputs={
          "answer": data_row["reference"]
      },
      metadata={
          "context": data_row["reference_contexts"]
      },
      dataset_id=langsmith_dataset.id
  )
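
The format change in step 2 can be isolated in a small pure function, which makes the mapping easy to check without calling LangSmith. This is a minimal sketch: `to_langsmith_example` is a hypothetical helper, and the sample row is made up to mirror the columns of the RAGAS testset.

```python
def to_langsmith_example(row):
    """Map one RAGAS testset row onto the inputs/outputs/metadata layout used above."""
    return {
        "inputs": {"question": row["user_input"]},
        "outputs": {"answer": row["reference"]},
        "metadata": {"context": list(row["reference_contexts"])},
    }


# Made-up row with the same column names as the RAGAS testset.
sample_row = {
    "user_input": "What role does the Russian mob play in John Wick?",
    "reference": "A mobster's son steals John's car and kills his dog.",
    "reference_contexts": ["Review: Ultra-violent first entry ..."],
    "synthesizer_name": "single_hop_specifc_query_synthesizer",
}

payload = to_langsmith_example(sample_row)
print(payload["inputs"], payload["outputs"])
```

The resulting dictionary carries the same keyword arguments the loop passes to `client.create_example`, so it could be unpacked into that call together with `dataset_id`.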

In [120]:
qa_evaluator = LangChainStringEvaluator("qa", config={"llm" : eval_llm},
                                        prepare_data=lambda run, example: {
                                            "prediction": run.outputs["response"],
                                            "reference": example.outputs["answer"],
                                            "input": example.inputs["question"],
                                        }
)
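
Earlier in the notebook the chain's `"response"` value is read through `.content`, which suggests it is a chat message object and not a plain string. The sketch below shows a `prepare_data`-style function that passes text to the evaluator in either case. The `as_text` and `prepare_qa_data` names and the `SimpleNamespace` stand-ins are hypothetical, and whether the evaluator also accepts a message object directly may depend on the installed versions.

```python
from types import SimpleNamespace


def as_text(value):
    """Return value.content when the value is message-like, else the value itself."""
    return getattr(value, "content", value)


def prepare_qa_data(run, example):
    """Build the prediction/reference/input mapping expected by a QA string evaluator."""
    return {
        "prediction": as_text(run.outputs["response"]),
        "reference": example.outputs["answer"],
        "input": example.inputs["question"],
    }


# Stand-ins for a LangSmith run and example, for illustration only.
run = SimpleNamespace(outputs={"response": SimpleNamespace(content="Chad Stahelski directed it.")})
example = SimpleNamespace(
    inputs={"question": "Who directed John Wick?"},
    outputs={"answer": "Chad Stahelski"},
)

prepared = prepare_qa_data(run, example)
print(prepared)
```

A function like this could be passed as `prepare_data` in place of the lambda if the evaluator reports type errors on the prediction.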

In [121]:
def langsmith_evaluate(chain, dataset_name):
  evaluate(
    chain.invoke,
    data=dataset_name,
    evaluators=[qa_evaluator],
    metadata={"revision_id": "default_chain_init"},
  )

In [122]:
langsmith_evaluate(naive_retrieval_chain, dataset_name)

View the evaluation results for experiment: 'spotless-bite-82' at:
https://smith.langchain.com/o/161c3e3f-f804-470c-8335-7e78e8516961/datasets/4b2a4ab7-1155-4765-aabe-53a45e4777cc/compare?selectedSessions=44aa324a-67db-497b-991a-f4098409da9e





In [123]:
langsmith_evaluate(bm25_retrieval_chain, dataset_name)

View the evaluation results for experiment: 'aching-prose-82' at:
https://smith.langchain.com/o/161c3e3f-f804-470c-8335-7e78e8516961/datasets/4b2a4ab7-1155-4765-aabe-53a45e4777cc/compare?selectedSessions=51da17d7-ba23-4a53-8c0b-2f670f70308d





In [124]:
langsmith_evaluate(contextual_compression_retrieval_chain, dataset_name)
langsmith_evaluate(multi_query_retrieval_chain, dataset_name)

View the evaluation results for experiment: 'roasted-vacation-86' at:
https://smith.langchain.com/o/161c3e3f-f804-470c-8335-7e78e8516961/datasets/4b2a4ab7-1155-4765-aabe-53a45e4777cc/compare?selectedSessions=9ebcccd7-20d4-4c40-b9e7-2948665298ab





View the evaluation results for experiment: 'brief-coffee-36' at:
https://smith.langchain.com/o/161c3e3f-f804-470c-8335-7e78e8516961/datasets/4b2a4ab7-1155-4765-aabe-53a45e4777cc/compare?selectedSessions=586e2769-59f4-4d6c-99b1-ab656fc05bec





In [125]:
langsmith_evaluate(parent_document_retrieval_chain, dataset_name)
langsmith_evaluate(ensemble_retrieval_chain, dataset_name)

View the evaluation results for experiment: 'slight-side-27' at:
https://smith.langchain.com/o/161c3e3f-f804-470c-8335-7e78e8516961/datasets/4b2a4ab7-1155-4765-aabe-53a45e4777cc/compare?selectedSessions=66d44621-7da7-468e-b3eb-aba898553f48





View the evaluation results for experiment: 'sparkling-desire-85' at:
https://smith.langchain.com/o/161c3e3f-f804-470c-8335-7e78e8516961/datasets/4b2a4ab7-1155-4765-aabe-53a45e4777cc/compare?selectedSessions=ed24e298-f11e-4095-a539-150089085d84





## Eval Analysis

#### RAGAS Eval Results

In [126]:
# Create a list of dictionaries
eval_results = [naive_eval_result, bm25_eval_result, reranking_eval_result, multiq_eval_result, parentdoc_eval_result, ensemble_eval_result]

# Create the DataFrame
df = pd.DataFrame(eval_results, index=['Naive RAG Chain', 'BM25 RAG Chain', 'Reranking RAG Chain', 'Multi-Query Retriever', 'Parent Document Retriever', 'Ensemble Retriever'])

# Display the DataFrame
df

| Retriever | context_recall | faithfulness | factual_correctness | answer_relevancy | context_entity_recall | noise_sensitivity_relevant |
|---|---|---|---|---|---|---|
| Naive RAG Chain | 0.9150 | 0.9606 | 0.3780 | 0.8825 | 0.5783 | 0.3729 |
| BM25 RAG Chain | 0.6557 | 0.7896 | 0.4340 | 0.6862 | 0.5783 | 0.2035 |
| Reranking RAG Chain | 0.6707 | 0.7931 | 0.3250 | 0.7875 | 0.6083 | 0.4482 |
| Multi-Query Retriever | 0.9007 | 0.8590 | 0.3740 | 0.8847 | 0.6033 | 0.3583 |
| Parent Document Retriever | 0.4700 | 0.8063 | 0.5020 | 0.8856 | 0.5350 | 0.2828 |
| Ensemble Retriever | 0.9150 | 0.8827 | 0.4820 | 0.9668 | 0.6172 | 0.5000 |


#### LangSmith Eval Results

**Retriever-Chain to LangSmith Experiment Number Mapping**
- Naive - #3
- BM25 - #4
- Rerank - #5
- Multi-Query - #6
- Parent-doc - #7
- Ensemble - #8

![LangSmith Eval](./LangSmith_Eval.png)

#### Analysis of Results

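- The Ensemble Retriever shows the best RAGAS results overall: it has the highest `answer_relevancy` (0.9668) and `context_entity_recall` (0.6172) and ties the Naive chain for the best `context_recall` (0.9150). It is also the most expensive option with the highest latency (LangSmith results), and its `noise_sensitivity_relevant` score (0.5000) is the highest of the six, where lower is better.
- BM25 shows the worst RAGAS results overall, with the lowest `faithfulness` (0.7896) and `answer_relevancy` (0.6862), but it has the lowest latency (LangSmith results).
- The Multi-Query Retriever is almost as expensive and slow as the Ensemble Retriever (LangSmith results), yet it gives slightly worse RAGAS results.
- The Parent Document Retriever stands out as the most cost-effective, low-latency option (LangSmith results), with mixed RAGAS results: the highest `factual_correctness` (0.5020) but the lowest `context_recall` (0.4700).
- Nothing beats the Naive Retriever on `faithfulness` (0.9606), and it does well on the other metrics too (RAGAS results), while still being cheap and relatively fast (LangSmith results).
- The Rerank Retriever did not lead on any metric and had the lowest `factual_correctness` (0.3250).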
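
One rough way to check the "best overall" and "worst overall" readings is to average the five higher-is-better RAGAS metrics for each retriever. This is a sketch under stated assumptions: the unweighted mean is a simplification chosen here, not part of the original evaluation, and `noise_sensitivity_relevant` is left out because lower is better for that metric.

```python
# Scores copied from the RAGAS comparison table, in this order:
# context_recall, faithfulness, factual_correctness, answer_relevancy, context_entity_recall
higher_is_better = {
    "Naive RAG Chain": [0.9150, 0.9606, 0.3780, 0.8825, 0.5783],
    "BM25 RAG Chain": [0.6557, 0.7896, 0.4340, 0.6862, 0.5783],
    "Reranking RAG Chain": [0.6707, 0.7931, 0.3250, 0.7875, 0.6083],
    "Multi-Query Retriever": [0.9007, 0.8590, 0.3740, 0.8847, 0.6033],
    "Parent Document Retriever": [0.4700, 0.8063, 0.5020, 0.8856, 0.5350],
    "Ensemble Retriever": [0.9150, 0.8827, 0.4820, 0.9668, 0.6172],
}

# Unweighted mean per retriever (an illustrative choice; weights could differ per use case).
mean_scores = {name: sum(vals) / len(vals) for name, vals in higher_is_better.items()}

# Retrievers ordered from highest to lowest mean score.
ranking = sorted(mean_scores, key=mean_scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {mean_scores[name]:.4f}")
```

On these numbers the ordering puts the Ensemble Retriever first and BM25 last, in line with the bullets above. Cost and latency from LangSmith are not part of this score.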
