# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [6]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [7]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
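
To make "cosine similarity, top `k`" concrete, here is a minimal pure-Python sketch of the idea. The 2-D vectors and the helper names `cosine_similarity` and `top_k` are made up for illustration; real `text-embedding-3-small` vectors have 1536 dimensions, and the vector store does this search for us.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy 2-D "embeddings" (hypothetical values, only for illustration)
toy_doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
toy_query_vec = [0.9, 0.1]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
```

Here `nearest` holds the positions of the two vectors pointing in roughly the same direction as the query, which is all `k` controls in the retriever above.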

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [8]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [9]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [10]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned by RunnablePassthrough.assign to the value of the "context" key from the previous step;
    #              all other keys (here, "question") pass through unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
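
If the LCEL pipes feel opaque, the data flow can be mimicked with plain Python. This is only a sketch: `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical stand-ins, not LangChain objects, and the prompt is a shortened version of the template above.

```python
def fake_retriever(question):
    """Hypothetical stand-in for a retriever: returns a list of document strings."""
    return [f"doc about: {question}"]

def fake_llm(prompt_text):
    """Hypothetical stand-in for the chat model."""
    return f"answer based on {len(prompt_text)} prompt characters"

def run_chain(inputs):
    # Step 1: build {"context", "question"} from the incoming {"question": ...}
    state = {"context": fake_retriever(inputs["question"]), "question": inputs["question"]}
    # Step 2: the assign step re-sets "context" to its own value, so the state is unchanged
    state = {**state, "context": state["context"]}
    # Step 3: format the prompt, call the model, and pass the context through alongside it
    prompt_text = f"Query:\n{state['question']}\n\nContext:\n{state['context']}"
    return {"response": fake_llm(prompt_text), "context": state["context"]}

result = run_chain({"question": "What is the most common issue with loans?"})
```

The output is a dict with two keys, `"response"` and `"context"`, which is why we index into `["response"]` when invoking the chains in this notebook.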

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [11]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans appear to involve errors and mismanagement by loan servicers, including:\n\n- Errors in loan balances and account information\n- Misapplication of payments, often favoring interest over principal\n- Incorrect or inconsistent reporting of loan status to credit bureaus\n- Unauthorized or unnotified transfers of loans between servicers\n- Problems with repayment plans and difficulty applying payments correctly\n- Issues related to loan fee charges, interest increases, and confusing loan terms\n- Disputes over loan ownership and handling, including privacy violations and improper disclosures\n\nOverall, a prevalent theme is that borrowers frequently face incorrect information, unhelpful or untransparent communication, and mishandling of their loans by servicers and agencies.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided complaints, some complaints did not get handled in a timely manner. For example:\n\n- One complaint (row 441) received on 03/28/25 from MOHELA was marked as "Timely response?": No, indicating it was not handled promptly.\n- Another complaint (row 67) received on 04/14/25 from EdFinancial Services was marked as "Timely response?": Yes, so this one was handled timely.\n- A different complaint (row 816) received on 04/05/25 from Maximus Federal Services was handled timely.\n- A complaint (row 474) from Nelnet received on 04/27/25 did not specify a delay, but the narrative indicates ongoing issues.\n\nOverall, at least one complaint was not handled in a timely manner, and some complaints experienced delays in resolution or response.'

In [13]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors including mismanagement and lack of clear communication from loan servicers, inability to afford increased payments, accumulation of interest during forbearance or deferment, and lack of awareness of repayment terms or changes in loan status. Many borrowers were also misled or insufficiently informed about how interest accrues, especially during forbearance or deferment periods, which often extended their repayment period and increased the total amount owed. Additionally, some borrowers faced administrative issues such as unnotified loan transfers, incorrect or inconsistent information about their loan balances, and difficulties in applying payments correctly. These challenges collectively contributed to their inability to repay their loans effectively.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model), which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [14]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
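
For intuition about what happens under the hood, here is a small pure-Python sketch of Okapi BM25 scoring. It is not the code the retriever runs: the whitespace tokenizer, the `k1=1.5` and `b=0.75` values, and the particular IDF variant are assumptions chosen to keep the example short.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with an Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_counts:
                continue
            # Rare terms get a larger inverse-document-frequency weight
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_counts[term]
            # Longer-than-average documents are penalised slightly
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Toy corpus (hypothetical text, only for illustration)
toy_docs = [
    "nelnet misapplied my loan payment",
    "mohela never answered my calls about my loan",
    "my loan balance is wrong",
]
toy_scores = bm25_scores("mohela loan", toy_docs)
best_index = max(range(len(toy_docs)), key=lambda i: toy_scores[i])
```

Every toy document contains "loan", so that term barely separates them; the rare term "mohela" is what pushes the second document to the top.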

We'll construct the same chain - only changing the retriever.

In [15]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [16]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"Based on the provided context, the most common issue with student loan complaints appears to involve problems with dealing with lenders or servicers, such as disputes over fees, difficulties with repayment processes (e.g., applying extra payments to interest rather than principal), receiving inaccurate or bad information about the loan, and issues related to loan terms and balances. \n\nIf I were to summarize, a frequent theme is the frustration and mistrust stemming from poor communication, perceived unfair practices, or mistakes made by loan servicers.\n\nHowever, I do not have data on all types of loans or wider industry statistics in this particular context. Therefore, the specific most common issue isn't explicitly stated, but handling issues with service providers, such as disputes over fees and repayment procedures, appears to be a recurring problem.\n\nIf you need a precise industry-wide statistic, I would recommend consulting a broader report or a survey on loan issues."

In [17]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, it appears that several complaints were handled in a timely manner, with responses marked as "Yes" for timely response, and the company\'s reply for those complaints was "Closed with explanation." However, there is also a significant complaint where the consumer reports ongoing issues and delays, such as multiple failed attempts to get the issue corrected and long durations of unanswered calls, indicating that not all complaints may have been handled promptly or effectively. \n\nIn summary, while many complaints received timely responses, there are indications that some complaints, especially ongoing or complex issues, may not have been handled in a fully timely manner.'

In [18]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including problems with their payment plans, lack of communication from the loan servicers, issues with payment processing (such as payments being reversed or not received), and difficulties in getting proper assistance or information about their loans. Some also experienced being misled or deceived by their loan servicers, who steered them into wrong types of forbearances or failed to notify them of important changes, leading to missed payments and negative impacts on their credit scores.'

It's not clear whether this is better or worse. If only we had a way to test this! (Spoiler: we do, and the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

##### ✅ Answer:

"What complaints were filed against MOHELA in March 2025?"

BM25 would be better than embeddings for this query because it excels at exact keyword matching. The query contains specific terms like "MOHELA" (a company name) and "March 2025" (a specific time period) that need precise matches.
BM25 uses a bag-of-words representation and gives higher weight to rare but important terms like company names and dates. It would rank documents containing "MOHELA" and "2025" as exact keywords highly, provided those terms appear in the indexed text. Note that our `BM25Retriever` only indexes `page_content` (the complaint narrative), so values that live only in metadata, such as `Company` or `Date received`, would not be matched unless they were added to the text.
Embeddings, on the other hand, work by semantic similarity and might retrieve complaints about other loan servicers or different time periods that are conceptually similar but don't contain the exact terms you're looking for. For queries requiring specific entities, dates, or other precise identifiers, BM25's keyword-based approach provides better precision than embeddings' semantic approach.

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [19]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
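
Conceptually, the compressor takes the candidates from the base retriever and keeps only the best-scoring few. Here is a toy sketch of that step, where `toy_overlap_score` is a hypothetical word-overlap stand-in for the Cohere Rerank model and `top_n` is a parameter of this sketch only.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Keep the top_n candidates according to a (query, document) relevance score."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def toy_overlap_score(query, doc):
    """Hypothetical stand-in for a reranking model: fraction of query words found in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from a first-stage retriever (hypothetical text)
toy_candidates = [
    "my payment was late because of autopay",
    "the servicer never sent a timely response",
    "interest kept growing during forbearance",
    "no timely response to my complaint",
    "my loan was transferred without notice",
]
compressed = rerank("timely response complaint", toy_candidates, toy_overlap_score, top_n=2)
```

A real reranker scores each (query, document) pair with a model rather than word overlap, but the shape of the operation is the same: many candidates in, a shorter and better-ordered list out.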

Let's create our chain again, and see how this does!

In [20]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, such as errors in loan balances, misapplied payments, wrongful denials of payment plans, misinformation, and mishandling of loan data. Many complaints involve inaccuracies in loan information, lack of communication, and improper handling of loan data, which can lead to confusion and financial discrepancies for borrowers.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, at least one complaint was handled in a timely manner. Specifically, the complaint regarding the issue with paying your mortgage payments (Complaint ID: 12973003) was responded to with a "Closed with explanation" status, and the response was marked as "Yes" for being timely.\n\nHowever, there are other complaints, such as the one regarding the long-term payment plan issue (Complaint ID: 12975634), which remains unresolved despite being open for over a year, indicating that some complaints did not get handled in a timely manner.\n\nIn summary, yes, some complaints were handled promptly, but others, like the long-term payment plan case, have not been resolved in a timely manner.'

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a lack of clear communication, inadequate information, and complex or unmanageable repayment options. Many borrowers were unaware that they would need to repay their loans, as they were not properly informed by financial aid officers about the requirements. Additionally, issues such as receiving incorrect or confusing account information, difficulties in setting up or maintaining payment plans, and the continuous accumulation of interest—especially when loans were put into forbearance or deferment—further complicated repayment. The problem was exacerbated by loan servicers, like NelNet, failing to notify borrowers of important changes or overdue payments, resulting in borrowers being unable to keep track of their loan status or make timely payments. Financial hardships and the complex nature of interest accumulation also played a significant role in making repayment unfeasible for many.'

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
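
The three steps above can be sketched in plain Python. Everything here is a stand-in: `toy_generate_queries` plays the role of the LLM rewriting the question, and `toy_retrieve` is a hypothetical lookup table instead of a vector search.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for every generated query and return the unique union of results."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

def toy_generate_queries(question):
    """Hypothetical stand-in for the LLM call that rewrites the question."""
    return [
        "problems borrowers report with loans",
        "frequent complaints about loan servicers",
    ]

# Hypothetical retrieval results for each rewritten query
TOY_INDEX = {
    "problems borrowers report with loans": ["doc_a", "doc_b"],
    "frequent complaints about loan servicers": ["doc_b", "doc_c"],
}

def toy_retrieve(query):
    """Hypothetical stand-in for a retriever."""
    return TOY_INDEX.get(query, [])

merged = multi_query_retrieve(
    "What is the most common issue with loans?", toy_generate_queries, toy_retrieve
)
```

Note that `doc_b` is returned by both rewritten queries but appears only once in `merged`; the extra queries add `doc_c`, which a single query would have missed.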

So, how hard is it to set up? Not bad! Let's see it down below!



In [24]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [25]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [26]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the information provided, the most common issues with loans highlighted in these complaints are:\n\n- Problems with loan servicing such as mishandling, errors in balances, misapplied payments, and wrongful denials of payment plans.\n- Difficulties with repayment options, especially issues with forbearance, income-driven repayment plans, and loan forgiveness or discharge.\n- Lack of communication or notices about account changes, payments, or account status.\n- Unauthorized or improper transfer of loan management between servicers without proper notice.\n- Problems with interest accrual, capitalization, or unfounded increase in loan balances.\n- Challenges in correcting errors, obtaining transparent account information, and dealing with customer service.\n- Mishandling of personal data or privacy violations relating to student loan information.\n\nWhile multiple issues are prevalent, the overarching commonality appears to be **mismanagement or mishandling by loan servicers, es

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints and their responses, yes, there are multiple instances where complaints were not handled in a timely manner. For example:\n\n- The complaint regarding "Dealing with your lender or servicer" (Complaint ID: 12709087) was marked as \'No\' for timely response.\n- The complaint about "Incorrect information on your report" (Complaint ID: 12744910) was also marked as \'Yes\' for timely response, indicating some delays but possibly within acceptable timeframes.\n- Several cases state that responses were \'Closed with explanation\' and responses were received after or within the expected timeframes, but the complaints often highlighted delays in investigation or correction (sometimes spanning over 30 days or more).\n\nFurthermore, multiple complaints mention difficulties in getting proper responses or resolution after significant delays—some over months or years—indicating that certain issues were indeed not handled promptly.\n\nIn conclusion, yes, multiple com

In [28]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors highlighted in the complaints:\n\n1. **Lack of Clear or Accurate Information:** Borrowers often reported not being properly informed about how interest accrues, the impact of forbearance, or available repayment options such as income-driven plans or loan forgiveness. Several complaints mention being pushed into forbearance without understanding that interest would continue to grow, making repayment more difficult over time.\n\n2. **Systemic Servicing Issues and Poor Communication:** Many borrowers experienced mishandling by loan servicers, such as unnotified transfers, incorrect account statuses, or delayed notices, which led to unintentional delinquency. For example, some complaints describe loans being reported as delinquent without proper notice or confirmation, impacting credit scores.\n\n3. **Interest Accumulation and Loan Management Practices:** The continued accumulation of interest during forbearan

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

##### ✅ Answer:

Generating multiple reformulations of a user query can improve recall by capturing different ways to express the same intent. When you have just one query, you might miss relevant documents that use different terminology or phrasing. For example, if someone asks "What are common loan problems?", the system might only find documents with words like "loan" and "problems". But by creating multiple reformulations like "What issues do borrowers face?", "What complaints exist about lending?", and "What are typical loan difficulties?", the system can find documents that use synonyms, related terms, or different phrasings.

This approach helps because documents in the corpus might describe the same concepts using different vocabulary. Some might say "borrower complaints" while others say "loan issues" or "servicing problems". By searching with multiple query variations, you're more likely to find all the relevant documents, even if they don't use the exact same words as the original query. Think of it like asking the same question in different ways to make sure you don't miss anything important.

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
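
Here is a toy sketch of "search small, return big" using plain dictionaries. The sentence splitter and the word-overlap `toy_score` are hypothetical stand-ins for the text splitter and vector similarity search, and none of the function names come from LangChain.

```python
def split_into_sentences(text):
    """Hypothetical stand-in for a text splitter: one child chunk per sentence."""
    return text.split(". ")

def build_index(parent_docs):
    """Return (docstore, child_chunks): parents by id, plus (chunk_text, parent_id) pairs."""
    docstore = {}
    child_chunks = []
    for parent_id, text in enumerate(parent_docs):
        docstore[parent_id] = text
        for chunk in split_into_sentences(text):
            child_chunks.append((chunk, parent_id))
    return docstore, child_chunks

def toy_score(query, chunk):
    """Hypothetical stand-in for vector similarity: count of shared words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve_parents(query, docstore, child_chunks, k=2):
    """Search the small chunks, then return the de-duplicated parents they belong to."""
    ranked = sorted(child_chunks, key=lambda pair: toy_score(query, pair[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[pid] for pid in parent_ids]

# Toy parent documents (hypothetical text)
toy_parents = [
    "My servicer transferred my loan without notice. Then autopay stopped working.",
    "Interest was capitalized after forbearance ended. Nobody explained why.",
]
docstore, child_chunks = build_index(toy_parents)
results = retrieve_parents("forbearance interest explained", docstore, child_chunks, k=2)
```

Both of the top-scoring child chunks belong to the second parent, so that full document comes back once, including the sentence that would have been lost if we had returned only the best-matching chunk.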

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [29]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [30]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [31]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [32]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [33]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [34]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to the servicing and accuracy of loan information. Specifically, issues such as incorrect information on credit reports, errors in loan balances, misapplied payments, wrongful denials of payment plans, and misconduct by loan servicers are prevalent. Many complaints involve discrepancies in loan balances, interest rates, and credit reporting, indicating systemic issues in how loans are managed and reported.\n\nTherefore, the most common issue with loans, as evidenced by the complaints, is problems related to loan servicing and reporting errors.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, there are complaints indicating they did not get handled in a timely manner. Specifically, complaints related to complaints about delayed responses or unresolved issues include:\n\n- Complaint ID 12709087 (MOHELA): The consumer waited multiple times for responses, with instances where they did not hear back even after 15 days. The response explicitly mentions "Timely response?": "No."\n- Complaint ID 12935889 (MOHELA): Similar issues with delayed responses and extended wait times of 4 hours or more.\n- Complaint ID 13205525 (Nelnet, Inc.): The consumer sent a dispute over 30 days ago and has not yet received a response.\n\nTherefore, it is correct to say that some complaints did not get handled in a timely manner.'

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a variety of issues, including financial hardship, mismanagement, lack of adequate information, and external circumstances. For example, some borrowers experienced severe financial difficulties after graduation, making it difficult to make consistent payments. Others were misled about the value and manageability of their education and loans, or were not properly informed about when payments should begin. Additionally, factors such as institution closures, inability to find employment, and administrative errors by loan servicers also contributed to non-repayment.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
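
To see how rank fusion and weights interact, here is a small sketch of a weighted Reciprocal Rank Fusion. The constant `c=60` is the value used in the linked paper; applying a per-retriever weight to each term is an assumption of this sketch, and `weighted_rrf` is a hypothetical helper, not a LangChain function.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists: each document earns weight / (c + rank) per list."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Toy ranked results from two retrievers (hypothetical document ids)
toy_bm25_ranking = ["doc_a", "doc_b", "doc_c"]
toy_dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([toy_bm25_ranking, toy_dense_ranking], weights=[0.5, 0.5])
```

Documents that appear near the top of several lists, like `doc_b` here, rise above documents that only one retriever found, and only ranks are used, so the raw scores of different retrievers never need to be comparable.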

In [37]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [38]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [39]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"Based on the provided data, the most common issues with student loans are related to dealing with lenders or servicers, including mismanagement, errors in loan balances, bad information about loans, and trouble with how payments are being handled. Many complaints involve incorrect or inconsistent loan balances, improper handling of deferments and forbearances, issues with interest calculation, and lack of clear communication or documentation from loan servicers. Therefore, the most common issue appears to be mismanagement and issues arising from how loans are handled by the servicers, leading to errors, confusion, and negative impacts on borrowers' credit reports and financial stability."

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided complaints, several complaints indicate that complaints were not handled in a timely manner. For example:\n\n- Complaint with Complaint ID 12935889 (MOHELA, NJ) received on 04/11/25 was marked "No" for timely response, meaning it was not handled promptly.\n- Complaint with Complaint ID 12668396 (MOHELA, NJ) received on 03/26/25 was marked "No" for timely response.\n- Multiple complaints from EdFinancial Services show "Yes" for timely response, indicating those were handled on time.\n- Complaint with Complaint ID 13062402 (Nelnet, MI) received on 04/18/25 was handled timely.\n- Complaint with Complaint ID 12823876 (EdFinancial, CA) received on 04/04/25 was marked "Yes" for timely response.\n- Others, such as Complaint ID 12739706 (MOHELA, NJ) and ID 12744910, also indicate responses were timely.\n\nHowever, at least one complaint (ID 12935889) explicitly states "No" under the "Timely response?" field, confirming that some complaints did not get handled prompt

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including:\n\n1. Lack of proper notifications and communication from loan servicers regarding payment due dates, account status, or transfer of loans, leading to unawareness of repayment obligations.\n2. Financial hardships and economic difficulties making it impossible to afford payments without relying on options like forbearance or deferment, which often lead to accumulating interest.\n3. Mismanagement and misleading information from loan servicers, including incorrect account information, errors in balances, or improper reporting to credit bureaus.\n4. Lack of awareness of available repayment options such as income-driven plans or loan forgiveness programs, resulting in entering into unaffordable repayment structures.\n5. Aggressive loan servicing practices, including coercive consolidation, forbearance steering, and failure to disclose legal rights and options.\n6. Administrative issues like improper transfer of loans, fa

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
   - `percentile`
   - `standard_deviation`
   - `interquartile`
   - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distance between each pair of adjacent sentences, and then split the text wherever that distance exceeds a given percentile of all distances.
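The percentile rule can be sketched in plain Python. This is a simplified illustration, not the `SemanticChunker` source: the helper names, the toy 2-D "embeddings", and the 95th-percentile cutoff are assumptions made for the example, and a real run would use vectors from an embedding model.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def percentile_chunks(sentences, vectors, pct=95):
    """Split wherever the distance to the next sentence exceeds the percentile."""
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # Large semantic jump: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy data: three sentences about one topic, then two about another.
sentences = ["A1.", "A2.", "A3.", "B1.", "B2."]
vectors = [(1, 0), (0.99, 0.1), (0.98, 0.15), (0.1, 1), (0.05, 1)]

chunks = percentile_chunks(sentences, vectors)
print(chunks)
```

Because the threshold is a percentile of the observed distances, it is relative to the document: only the largest jumps become breakpoints, however large or small they are in absolute terms.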

In [42]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [43]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [44]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [45]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [46]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [47]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, particularly student loans based on the complaints provided, appears to be related to poor servicing and mismanagement. Common problems include difficulties with repayment plans (such as incorrect payment amounts or failure to process IDR recertifications), issues with loan reporting (such as inaccurate or outdated information, default statuses, or improper reporting to credit bureaus), lack of communication from servicers, and mishandling of account information. Many complaints also involve alleged delays, errors, or intentional stalling by loan servicers, which can cause borrower confusion, financial strain, and damage to credit scores.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that all the complaints listed received responses that were marked as "Closed with explanation" and have a "Timely response?" status of "Yes." This suggests that, according to the data, these complaints were handled in a timely manner.\n\nHowever, the context does not provide details on whether all complaints, especially those not listed, were handled promptly. Since all listed complaints show timely responses, I do not have evidence from this data indicating any complaints that were not handled in a timely manner.\n\nTherefore, based on the available information, I would answer:  \nNo, there is no indication that any complaints in this dataset did not get handled in a timely manner.'

In [49]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including difficulties with loan servicing and communication issues, problems with loan documentation and verification, illegal reporting and collection practices, and mishandling or breach of personal information. Some borrowers experienced delays and errors in processing payments, mismanagement by the loan servicers, or disputes over the legitimacy and legality of their debt due to changes in government policies or data breaches. Additionally, some faced obstacles when trying to resolve issues related to loan forgiveness, re-amortization, or disputes about default status, which contributed to their inability to repay or resolve their loans promptly.'

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer:

Because the sentences all have similar semantic content, the distances between adjacent sentences would be uniformly small, so the algorithm wouldn't detect significant differences to split on. Semantic chunking would likely fail to find meaningful breakpoints, or it would split on noise and create very small chunks.

To adjust the algorithm for this scenario:
- Use a more sensitive percentile or standard deviation setting so the algorithm can detect smaller semantic differences between repetitive sentences.
- Set a minimum number of sentences per chunk to prevent creating overly small fragments, even when semantic differences are minimal.
- For highly repetitive content like FAQs, traditional character-based or sentence-based chunking might work better than semantic chunking since the content doesn't have natural semantic breaks.
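As one illustration of the minimum-size adjustment, here is a small post-processing sketch that folds undersized chunks into a neighbour. The function name `merge_small_chunks`, the `min_sentences` parameter, and the toy FAQ chunks are hypothetical, and this is not part of `SemanticChunker` itself.

```python
def merge_small_chunks(chunks, min_sentences=3):
    """Merge chunks (lists of sentences) until each has at least min_sentences."""
    merged = []
    for chunk in chunks:
        if merged and len(merged[-1]) < min_sentences:
            # The previous chunk is still too small: extend it with this one.
            merged[-1] = merged[-1] + list(chunk)
        else:
            merged.append(list(chunk))
    # A too-small trailing chunk is folded back into its predecessor.
    if len(merged) > 1 and len(merged[-1]) < min_sentences:
        tail = merged.pop()
        merged[-1].extend(tail)
    return merged


# Hypothetical output of a chunker that over-split a repetitive FAQ.
faq_chunks = [["q1"], ["q2", "q3"], ["q4", "q5", "q6"], ["q7"]]
merged_chunks = merge_small_chunks(faq_chunks)
print(merged_chunks)
```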

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
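If you want a quick local latency number alongside LangSmith, a rough timing helper like the one below may be enough. It is only a sketch: `time_retriever` and `fake_retrieve` are hypothetical names, and the stand-in retriever simply sleeps. In the notebook you could pass something like `ensemble_retriever.invoke` instead.

```python
import time
from statistics import mean


def time_retriever(retrieve, questions, repeats=3):
    """Return the mean wall-clock seconds per call of `retrieve`."""
    timings = []
    for _ in range(repeats):
        for question in questions:
            start = time.perf_counter()
            retrieve(question)
            timings.append(time.perf_counter() - start)
    return mean(timings)


def fake_retrieve(question):
    """Hypothetical stand-in for a real retriever call."""
    time.sleep(0.01)
    return [f"doc for {question}"]


avg_seconds = time_retriever(fake_retrieve, ["question 1", "question 2"])
print(f"mean latency: {avg_seconds:.4f}s")
```

Wall-clock timing like this includes network and API variance, so compare retrievers over the same questions and several repeats.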

##### ✅ Answer:

### PLEASE CHECK [Activity1_Retriever_Evaluation.ipynb](./Activity1_Retriever_Evaluation.ipynb)