# Assignment 2: Implementing Retriever Functions in a RAG System

---

In this assignment, you will enhance your RAG system by implementing several retrieval functions: BM25, semantic search, and a fusion of the two. By the end, you will be able to run your RAG system with and without each retrieval function and observe how each one affects the quality of the generated answers.

In this assignment, you will:

* Use a library to implement BM25 search
* Implement semantic search using vector embeddings
* Implement the Reciprocal Rank Fusion algorithm to combine BM25 and semantic search
* Analyze how different retrieval methods impact the responses generated by the LLM



# Table of Contents
- [ 1 - Importing the libraries](#1)
- [ 2 - Loading the Dataset](#2)
- [ 3 - Retrieve Functions](#3)
  - [ 3.1 Query news by index](#3-1)
  - [ 3.2 BM25 Retrieve](#3-2)
    - [ Exercise 1](#ex01)
  - [ 3.3 Semantic Search](#3-3)
  - [ 3.4 Embeddings](#3-4)
    - [ Exercise 2](#ex02)
  - [ 3.5 RRF Retrieve](#3-5)
    - [ Exercise 3](#ex03)
- [ 4 - Completing the RAG System](#4)
  - [ 4.1 Creating the final prompt](#4-1)
  - [ 4.2 Experimenting with the RAG system](#4-2)
  - [ 4.3 Ask yourself](#4-3)


---
<h4 style="color:black; font-weight:bold;">USING THE TABLE OF CONTENTS</h4>

JupyterLab provides an easy way for you to navigate through your assignment. It's located under the Table of Contents tab, found in the left panel, as shown in the picture below.

![TOC Location](images/toc.png)

---

<h4 style="color:green; font-weight:bold;">TIPS FOR SUCCESSFUL GRADING OF YOUR ASSIGNMENT:</h4>

- All cells are frozen except those where you submit your solutions or where it is explicitly stated that you can interact with them.

- You can add new cells to experiment, but the grader omits them, so don't rely on newly created cells to host your solution code; use the provided cells instead.

- Avoid using global variables unless you absolutely have to. The grader tests your code in an isolated environment without running all cells from the top. As a result, global variables may be unavailable when scoring your submission. Global variables that are meant to be used will be defined in UPPERCASE.

- To submit your notebook for grading, first save it by clicking the 💾 icon on the top left of the page and then click on the <span style="background-color: blue; color: white; padding: 3px 5px; font-size: 16px; border-radius: 5px;">Submit assignment</span> button on the top right of the page.
---

<a id='1'></a>
## 1 - Importing the libraries
---

Let's get started by importing the libraries needed for this assignment.

In [1]:
import joblib
import numpy as np
import bm25s
import os
from sentence_transformers import SentenceTransformer

In [2]:
from utils import (
    read_dataframe,
    pprint, 
    generate_with_single_input, 
    cosine_similarity,
    display_widget
)
import unittests

<a id='2'></a>
## 2 - Loading the Dataset
---
You will be working with the same Kaggle [BBC News dataset](https://www.kaggle.com/datasets/gpreda/bbc-news) as in Module 1. However, now you will focus on the retrieval part, implementing three different retrieval algorithms and experimenting with them.

In [3]:
NEWS_DATA = read_dataframe("news_data_dedup.csv")

Let's check the data structure.

In [4]:
pprint(NEWS_DATA[5])

{'guid': '18ba9f2676859f393a271d15692a9c6e',
 'title': 'WATCH: Would you pay a tourist fee to enter Venice?',
 'description': 'From Thursday visitors making a trip to the famous city at '
                'peak times will be charged a trial entrance fee.',
 'venue': 'BBC',
 'url': 'https://www.bbc.co.uk/news/world-europe-68898441',
 'published_at': '2024-04-25',
 'updated_at': '2024-04-26'}


<a id='3'></a>
## 3 - Retrieve Functions
---
In this assignment, you will focus on the retrieval step, so the other functions in the RAG system you saw previously are provided.

In RAG systems, as you saw in the lectures, the retrieve function is key to finding relevant information in a large set of documents. This step selects the documents best suited to answering a specific query.

**Retrieve Functions in RAG:**  
As you saw in the lectures, several retrieval algorithms are used in RAG. In this assignment, you will work with semantic search and BM25, and then combine them with Reciprocal Rank Fusion.

**Semantic Search vs. BM25 Retrieve:**

1. **Semantic Search:**  
   This method converts the query and the documents into vectors that capture their meaning. Instead of just matching keywords, it compares those vectors, so it can find relevant documents even when they use different words than the query.

2. **BM25 Retrieve:**  
   BM25 is a traditional yet effective algorithm that scores documents based on how well they match a query. It looks at factors like how often a term appears in a document, how unique the term is, and the document's length. This helps in efficiently finding documents that are most relevant to the query.

In short, semantic search focuses on understanding the meaning of queries, while BM25 provides a reliable way to rank and retrieve documents, making both useful in RAG systems.
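
To make the three BM25 factors (term frequency, term rarity, and document length) concrete, here is a minimal pure-Python sketch of a textbook BM25 formula. It is an illustration only, not how `bm25s` works internally: the function name `bm25_scores`, the toy corpus, the whitespace tokenizer, the IDF variant, and the values `k1=1.5` and `b=0.75` are all assumptions made for this example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_counts[term]
            if tf == 0:
                continue
            # Rare terms get a larger weight (one common IDF variant)
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates, and longer documents are penalized through b
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores

toy_docs = [
    "gdp growth slows in the first quarter",
    "taylor swift tour breaks revenue record",
    "inflation report follows gdp data",
]
toy_scores = bm25_scores("gdp report", toy_docs)
print(toy_scores)
```

In this toy corpus, the third document should score highest because it contains both query terms, while the second shares no term with the query and scores zero.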

In this assignment, you will focus on this part:

<div align="center">
  <img src="images/retriever_overview.png" alt="RAG Overview" width="60%">
</div>

<a id='3-1'></a>
### 3.1 Query news by index

This function was used previously as a helper.

In [5]:
def query_news(indices):
    """
    Retrieves elements from NEWS_DATA based on specified indices.

    Parameters:
    indices (list of int): A list containing the indices of the desired elements in NEWS_DATA.

    Returns:
    list: A list of elements from NEWS_DATA corresponding to the indices provided in indices.
    """
     
    output = [NEWS_DATA[index] for index in indices]

    return output

<a id='3-2'></a>
### 3.2 BM25 Retrieve



### Example of BM25 retrieve

Let's look at an example of BM25 retrieval using the [bm25s](https://bm25s.github.io/) library.

In [6]:
# The corpus used will be the title appended with the description
corpus = [x['title'] + " " + x['description'] for x in NEWS_DATA]

# Instantiate the retriever by passing the corpus data
BM25_RETRIEVER = bm25s.BM25(corpus=corpus)

# Tokenize the chunks
tokenized_data = bm25s.tokenize(corpus)

# Index the tokenized chunks within the retriever
BM25_RETRIEVER.index(tokenized_data)

# Tokenize the same query used in the previous exercise
sample_query = "What are the recent news about GDP?"
tokenized_sample_query = bm25s.tokenize(sample_query)

# Get the retrieved results and their respective scores
results, scores = BM25_RETRIEVER.retrieve(tokenized_sample_query, k=3)

print(f"Results for query: {sample_query}\n")
for doc in results[0]:
  print(f"Document retrieved {corpus.index(doc)} : {doc}\n")


Results for query: What are the recent news about GDP?

Document retrieved 752 : GDP and the Dow Are Up. But What About American Well-Being? The standard ways of measuring economic growth don’t capture what life is like for real people. A new metric offers a better alternative, especially for seeing disparities across the country.

Document retrieved 673 : What the GDP Report Says About Inflation: A Hot First Quarter Thursday’s gross domestic product report suggests that a widely watched inflation reading due Friday could be worse than expected.




<a id='ex01'></a>
### Exercise 1

In this exercise, you will implement a BM25 retrieval function. This function will take two parameters:

* `query`: the search term or phrase you're interested in.
* `top_k`: the number of top relevant results you want to retrieve.

Your task is to use the BM25 algorithm to find the most relevant documents from a corpus based on the given query. You may refer back to the code above to help complete this exercise.

<details>
<summary style="color:green;">Hint 1</summary>

Start by tokenizing the query. You will need to call the <code>tokenize</code> function to split the query into manageable parts. Use the <code>bm25s.tokenize</code> function with the appropriate parameter.

</details>

<details>
<summary style="color:green;">Hint 2</summary>

Make sure the corpus is indexed. This can be done by preparing the retriever with the document data before performing retrieval.
Use the <code>.index</code> method of <code>BM25\_RETRIEVER</code>.

</details>

<details>
<summary style="color:green;">Hint 3</summary>

Use the BM25 retriever to calculate scores and retrieve documents. You’ll want to retrieve the top <code>k</code> documents.
Remember that <code>BM25\_RETRIEVER</code> has a method called <code>.retrieve</code>.

</details>


In [7]:
# Use these globally defined BM25 retriever objects

corpus = [x['title'] + " " + x['description'] for x in NEWS_DATA]
BM25_RETRIEVER = bm25s.BM25(corpus=corpus)
TOKENIZED_DATA = bm25s.tokenize(corpus)
BM25_RETRIEVER.index(TOKENIZED_DATA)

In [20]:
# GRADED CELL

def bm25_retrieve(query: str, top_k: int = 5):
    """
    Retrieves the top k relevant documents for a given query using the BM25 algorithm.

    This function tokenizes the input query and uses a pre-indexed BM25 retriever to
    search through a collection of documents. It returns the indices of the top k documents
    that are most relevant to the query.

    Args:
        query (str): The search query for which documents need to be retrieved.
        top_k (int): The number of top relevant documents to retrieve. Default is 5.

    Returns:
        List[int]: A list of indices corresponding to the top k relevant documents
        within the corpus.
    """
    ### START CODE HERE ###

    # Tokenize the query using the 'tokenize' function from the 'bm25s' module
    tokenized_query = bm25s.tokenize(query)

    # Index the tokenized chunks with the retriever
    BM25_RETRIEVER.index(TOKENIZED_DATA)
    
    # Use the 'BM25_RETRIEVER' to retrieve documents and their scores based on the tokenized query
    # Retrieve the top 'k' documents
    results, scores = BM25_RETRIEVER.retrieve(tokenized_query, k=top_k)

    # Extract the first element from 'results' to get the list of retrieved documents
    results = results[0]

    # Convert the retrieved documents into their corresponding indices in the corpus
    top_k_indices = [corpus.index(doc) for doc in results[:top_k]]

    ### END CODE HERE ###
    
    return top_k_indices
    

In [21]:
# Output is a list of indices
bm25_retrieve("What are the recent news about GDP?")

[752, 673, 289, 626, 43]

**Expected output**
```
[752, 673, 289, 626, 43]
```

In [22]:
# Test your function!
unittests.test_bm25_retrieve(bm25_retrieve)

All tests passed!


<a id='3-3'></a>
### 3.3 Semantic Search

Semantic search enhances traditional search by focusing on the meaning behind queries, rather than just matching keywords. The idea is to convert the sentences into vectors that preserve semantic relations and then use metrics to compare them.

<a id='3-4'></a>
### 3.4 Embeddings

A key component of semantic search is the use of embeddings, which are vector representations of text. These embeddings capture semantic meaning, allowing us to compare text based on context. One common way to measure the similarity between these vectors is through cosine similarity, which calculates how close two vectors are in high-dimensional space. This approach helps in finding content that is contextually similar to the user's query, leading to more accurate and meaningful search results.
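
The `cosine_similarity` helper used in this assignment comes from `utils`, and its implementation is not shown here. As a rough sketch of the underlying idea, the snippet below computes cosine similarity between one vector and each row of a matrix with NumPy. The function name `cosine_sim_sketch` and the tiny two-dimensional vectors are made up for illustration.

```python
import numpy as np

def cosine_sim_sketch(vector, matrix):
    """Cosine similarity between one vector and each row of `matrix`."""
    matrix = np.atleast_2d(matrix)  # accept a single vector or a stack of vectors
    dot_products = matrix @ vector
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(vector)
    return dot_products / norms

query_vec = np.array([1.0, 0.0])
doc_vecs = np.array([
    [2.0, 0.0],  # same direction as the query
    [0.0, 3.0],  # orthogonal to the query
    [1.0, 1.0],  # 45 degrees away from the query
])
toy_similarities = cosine_sim_sketch(query_vec, doc_vecs)
print(toy_similarities)
```

The three similarities come out as roughly 1, 0, and 0.707. Only the direction of each vector matters, not its length, which is why the first document scores 1 even though it is twice as long as the query vector.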

We've pre-embedded the corpus for you, so you will just load it.

In [23]:
# Load the pre-computed embeddings with joblib
EMBEDDINGS = joblib.load("embeddings.joblib")

You will use the sentence_transformers library to load an embedding model.

In [24]:
model_name = os.path.join(os.environ['MODEL_PATH'],"BAAI/bge-base-en-v1.5" )
model = SentenceTransformer(model_name)

In [25]:
# Example usage
query = "RAG is awesome"
# Truncate the result to keep the output short; don't truncate it in the exercise.
model.encode(query)[:40]

array([ 0.00886302, -0.04775146, -0.00156089,  0.01309993, -0.00206938,
       -0.06157268,  0.01384688,  0.00101498, -0.04903949, -0.04762559,
       -0.03628184,  0.00478035, -0.03492182,  0.05323148,  0.02193964,
        0.03645132,  0.04029363, -0.00453639,  0.01883798, -0.03367384,
        0.02516192, -0.04843621, -0.04047944,  0.02590903,  0.02175229,
        0.03160364,  0.03937921, -0.03640463, -0.03113303, -0.01247228,
        0.03661649, -0.00458202, -0.00100169, -0.03188789,  0.02957137,
        0.01986158, -0.00737474,  0.02370178, -0.02151621, -0.07361361],
      dtype=float32)

### Example of cosine similarity and embedding

Let's see an example of using cosine similarity. The function is the same one used in the ungraded lab, which you can revisit for details.

In [26]:
query1 = "What are the primary colors"
query2 = "Yellow, red and blue"
query3 = "Cats are friendly animals"

query1_embed = model.encode(query1)
query2_embed = model.encode(query2)
query3_embed = model.encode(query3)

print(f"Similarity between '{query1}' and '{query2}' = {cosine_similarity(query1_embed, query2_embed)[0]}")
print(f"Similarity between '{query1}' and '{query3}' = {cosine_similarity(query1_embed, query3_embed)[0]}")

Similarity between 'What are the primary colors' and 'Yellow, red and blue' = 0.7377141714096069
Similarity between 'What are the primary colors' and 'Cats are friendly animals' = 0.4508620798587799


**ATTENTION!**: The output of `cosine_similarity` is always a list with the similarities between the vector (first input) and the vector/array of vectors (second input)!

#### Example with the full embedding

Let's look at an example that uses the full embedding matrix.

In [27]:
query = "Taylor Swift"
query_embed = model.encode(query)
# cosine_similarity returns one score per row of EMBEDDINGS, that is, one score per document
similarity_scores = cosine_similarity(query_embed, EMBEDDINGS)
# Sort in decreasing order (argsort of the negated scores) and return the indices
similarity_indices = np.argsort(-similarity_scores)
# Top 2 indices
top_2_indices = similarity_indices[:2]
print(top_2_indices)

[350 176]
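
Negating the scores before calling `np.argsort` is one way to get a descending order. The short sketch below, using made-up scores, shows that reversing an ascending sort gives the same top results when there are no ties.

```python
import numpy as np

toy_scores = np.array([0.10, 0.80, 0.35, 0.60])  # made-up similarity scores

by_negation = np.argsort(-toy_scores)[:2]       # sort the negated scores in ascending order
by_reversal = np.argsort(toy_scores)[::-1][:2]  # sort in ascending order, then reverse

print(by_negation, by_reversal)
```

Both approaches select indices 1 and 3 here. When scores are tied, the two approaches may order the tied items differently.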


In [28]:
# Retrieving the data
query_news(top_2_indices)

[{'guid': '927257674585bb6ef669cf2c2f409fa7',
  'title': '‘The working class can’t afford it’: the shocking truth about the money bands make on tour',
  'description': 'As Taylor Swift tops $1bn in tour revenue, musicians playing smaller venues are facing pitiful fees and frequent losses. Should the state step in to save our live music scene?When you see a band playing to thousands of fans in a sun-drenched festival field, signing a record deal with a major label or playing endlessly from the airwaves, it’s easy to conjure an image of success that comes with some serious cash to boot – particularly when Taylor Swift has broken $1bn in revenue for her current Eras tour. But looks can be deceiving. “I don’t blame the public for seeing a band playing to 2,000 people and thinking they’re minted,” says artist manager Dan Potts. “But the reality is quite different.”Post-Covid there has been significant focus on grassroots music venues as they struggle to stay open. There’s been less focus on

<a id='ex02'></a>
### Exercise 2

Now it's time to build the `semantic_search_retrieve` function! You will use embeddings to represent the query and then apply the `cosine_similarity` function to compute how similar the query is to each document in the embedding matrix. The goal is to retrieve the indices of the top_k most similar documents by ordering the similarity scores in descending order.

In this exercise, you will explore how embeddings and cosine similarity can be used in a semantic search to effectively find contextually relevant documents.

<details>
<summary style="color:green;">Hint 1</summary>

Start by encoding the query into an embedding using the pre-trained model. Remember that the call is <code>model.encode(query)</code>.

</details>

<details>
<summary style="color:green;">Hint 2</summary>

Calculate the cosine similarity between the query embedding and all document embeddings. This will provide a set of similarity scores. You might write:  
<code>similarity_scores = cosine_similarity(...)</code>  
Make sure to pass the query embedding and document embeddings correctly and remember that the output of the function is a list with all the scores.

</details> 

<details>
<summary style="color:green;">Hint 3</summary>

Sort the similarity scores to find the order of most relevant documents. You will need to use:  
<code>similarity_indices = np.argsort(...)</code>. Keep in mind that by default it sorts in ascending order, and you need descending order! Check how it was done in the previous example if you are not sure how to proceed. There is more than one way to do it.
Then slice to get the top-k indices and convert them to integers.

</details>

In [33]:
# GRADED CELL 

def semantic_search_retrieve(query, top_k=5):
    """
    Retrieves the top k relevant documents for a given query using semantic search and cosine similarity.

    This function generates an embedding for the input query and compares it against pre-computed document
    embeddings using cosine similarity. The indices of the top k most similar documents are returned.

    Args:
        query (str): The search query for which relevant documents need to be retrieved.
        top_k (int): The number of top relevant documents to retrieve. Default value is 5.

    Returns:
        List[int]: A list of indices corresponding to the top k most relevant documents in the corpus.
    """
    ### START CODE HERE ###
    # Generate the embedding for the query using the pre-trained model
    query_embedding = model.encode(query)
    
    # Calculate the cosine similarity scores between the query embedding and the pre-computed document embeddings
    similarity_scores = cosine_similarity(query_embedding, EMBEDDINGS)
    
    # Sort the similarity scores in descending order and get the indices
    similarity_indices = np.argsort(-similarity_scores)

    # Select the indices of the top k documents as a numpy array
    top_k_indices_array = similarity_indices[:top_k]

    ### END CODE HERE ###
    
    # Cast them to int 
    top_k_indices = [int(x) for x in top_k_indices_array]
    
    return top_k_indices

In [34]:
# Let's see an example
semantic_search_retrieve("What are the recent news about GDP?")

[743, 673, 626, 752, 326]

**Expected output**
```
[743, 673, 626, 752, 326]
```

In [35]:
unittests.test_semantic_search_retrieve(semantic_search_retrieve, EMBEDDINGS)

All tests passed!


<a id='3-5'></a>
### 3.5 RRF Retrieve

Reciprocal Rank Fusion (RRF) is an information retrieval technique used to combine results from multiple ranking systems. It aims to enhance the overall retrieval performance by integrating different ranking algorithms. RRF assigns a score to each document based on its rank in different result lists, allowing it to leverage the strengths of several retrieval approaches.

#### Formula

The RRF formula for computing the score of a document $d$ is:

$$ 
\text{Score}(d) = \sum_{r=1}^{n} \frac{1}{k + \text{rank}_r(d)} 
$$

where:
- $n$ is the number of ranking systems,
- $\text{rank}_r(d)$ is the rank of document $d$ in the $r$-th result list,
- $k$ is a constant that dampens the influence of individual rank positions; a common choice is 60, which is the default value of `K` in this assignment.

The resulting RRF score is higher for documents that appear with high rankings across multiple systems, helping to combine different retrieval methodologies effectively.
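
As a worked example of the formula, the sketch below applies it by hand to the two rankings obtained earlier in this notebook for the GDP query, with $k = 60$. The variable names are chosen for this illustration only.

```python
# Rankings taken from the GDP query results shown earlier in this notebook
semantic_ranking = [743, 673, 626, 752, 326]
bm25_ranking = [752, 673, 289, 626, 43]
k = 60

rrf_example_scores = {}
for ranking in (semantic_ranking, bm25_ranking):
    for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
        rrf_example_scores[doc_id] = rrf_example_scores.get(doc_id, 0.0) + 1 / (k + rank)

# Print the documents from highest to lowest RRF score
ordered = sorted(rrf_example_scores.items(), key=lambda pair: pair[1], reverse=True)
for doc_id, score in ordered:
    print(f"{doc_id}: {score:.6f}")
```

Document 673 is second in both lists, so its score is $\frac{1}{62} + \frac{1}{62} \approx 0.0323$. That narrowly beats document 752, which is first in one list and fourth in the other: $\frac{1}{61} + \frac{1}{64} \approx 0.0320$. Documents that appear in only one list, such as 743, get a single term and end up well below those found by both retrievers.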

<a id='ex03'></a>
### Exercise 3

In this exercise, you will implement the `reciprocal_rank_fusion` function. This function will take four parameters:
- `list1` and `list2`, which are lists of indices representing the top-ranked documents from two different retrieval systems.
- `top_k`, which is the number of top relevant indices you wish to retrieve after fusion. 
- `K`, a constant used in the Reciprocal Rank Fusion (RRF) formula to scale the influence of rank position.

Your task is to use the RRF algorithm to merge rankings from the two lists and output the indices of the top-k documents as determined by the combined RRF scores. This exercise will help you understand how RRF works to aggregate results from multiple retrieval systems, improving the overall search performance.

To complete this exercise, you will need an understanding of how to iterate over lists, calculate Reciprocal Rank scores, and effectively combine ranked results.

<details>
<summary style="color:green;">Hint 1</summary>

Begin by creating an empty dictionary to store RRF scores, mapping each document index to a cumulative score. Initialize it with:  
<code>rrf_scores = {}</code>

</details>

<details>
<summary style="color:green;">Hint 2</summary>

Iterate through each list and calculate scores. For each item, if it’s not already in the dictionary, add it with an initial score. Update the score by considering the rank and constant K. Look at:  
<code>if item not in rrf_scores:</code>  
<code>rrf_scores[item] = ...</code>  

</details>

<details>
<summary style="color:green;">Hint 3</summary>

After computing scores, sort the indices by their RRF scores in descending order to get the most relevant ones. You need to select the top results with:  
<code>sorted_items = sorted(rrf_scores, key=rrf_scores.get, reverse=True)</code>  
Then limit this to the top-k results by slicing:  
<code>top_k_indices = [... for ... in sorted_items[:top_k]]</code>

</details>

In [41]:
# GRADED CELL 
def reciprocal_rank_fusion(list1, list2, top_k=5, K=60):
    """
    Fuse rankings from two IR systems using Reciprocal Rank Fusion.

    Args:
        list1 (list[int]): A list of indices of the top-k documents that match the query.
        list2 (list[int]): Another list of indices of the top-k documents that match the query.
        top_k (int): The number of top documents to return after fusion. Defaults to 5.
        K (int): A constant used in the RRF formula. Defaults to 60.

    Returns:
        list[int]: A list of indices of the top-k documents sorted by their RRF scores.
    """

    ### START CODE HERE ###

    # Create a dictionary to store the RRF scores for each document index
    rrf_scores = {}

    # Iterate over each document list
    for lst in [list1, list2]:
        # Calculate the RRF score for each document index.
        # start=1 makes the first element rank 1 rather than 0, following the usual ranking convention.
        for rank, item in enumerate(lst, start=1):
            # If the item is not in the dictionary, initialize its score to 0
            if item not in rrf_scores:
                rrf_scores[item] = 0
            # Update the RRF score for each document index using the formula 1 / (rank + K)
            rrf_scores[item] += (1/(rank+K))

    # Sort the document indices based on their RRF scores in descending order
    sorted_items = sorted(rrf_scores.items(), key=lambda x: x[1], reverse = True)

    # Slice the list to get the top-k document indices
    top_k_indices = [int(x[0]) for x in sorted_items[:top_k]]

    ### END CODE HERE ###

    return top_k_indices

In [42]:
list1 = semantic_search_retrieve('What are the recent news about GDP?')
list2 = bm25_retrieve('What are the recent news about GDP?')
rrf_list = reciprocal_rank_fusion(list1, list2)
print(f"Semantic Search List: {list1}")
print(f"BM25 List: {list2}")
print(f"RRF List: {rrf_list}")

Semantic Search List: [743, 673, 626, 752, 326]
BM25 List: [752, 673, 289, 626, 43]
RRF List: [673, 752, 626, 743, 289]


**Expected output (order may vary)**
```
Semantic Search List: [743, 673, 626, 752, 326]
BM25 List: [752, 673, 289, 626, 43]
RRF List: [673, 752, 626, 743, 289]
```

In [43]:
unittests.test_reciprocal_rank_fusion(reciprocal_rank_fusion)

All tests passed!


<a id='4'></a>
## 4 - Completing the RAG System

<a id='4-1'></a>
### 4.1 Creating the final prompt

Now you will proceed as in the previous assignment. These functions are the same ones you wrote there, adjusted to fit this assignment.

In [44]:
def generate_final_prompt(query, top_k, retrieve_function = None, use_rag=True):
    """
    Generates an augmented prompt for a Retrieval-Augmented Generation (RAG) system by retrieving the top_k most 
    relevant documents based on a given query.

    Parameters:
    query (str): The search query for which the relevant documents are to be retrieved.
    top_k (int): The number of top relevant documents to retrieve.
    retrieve_function (callable): The function used to retrieve relevant documents. If 'reciprocal_rank_fusion', 
                                  it will combine results from different retrieval functions.
    use_rag (bool): A flag to determine whether to incorporate retrieved data into the prompt (default is True).

    Returns:
    str: A prompt that includes the top_k relevant documents formatted for use in a RAG system.
    """

    # Define the prompt as the initial query
    prompt = query
    
    # If not using rag, return the prompt
    if not use_rag:
        return prompt


    # Determine which retrieve function to use based on its name.
    if retrieve_function.__name__ == 'reciprocal_rank_fusion':
        # Retrieve top documents using two different methods.
        list1 = semantic_search_retrieve(query, top_k)
        list2 = bm25_retrieve(query, top_k)
        # Combine the results using reciprocal rank fusion.
        top_k_indices = retrieve_function(list1, list2, top_k)
    else:
        # Use the provided retrieval function.
        top_k_indices = retrieve_function(query=query, top_k=top_k)
    
    
    # Retrieve documents from the dataset using the indices.
    relevant_documents = query_news(top_k_indices)
    
    formatted_documents = []

    # Iterate over each retrieved document.
    for document in relevant_documents:
        # Format each document into a structured string.
        formatted_document = (
            f"Title: {document['title']}, Description: {document['description']}, "
            f"Published at: {document['published_at']}\nURL: {document['url']}"
        )
        # Append the formatted string to the list of formatted documents.
        formatted_documents.append(formatted_document)

    retrieve_data_formatted = "\n".join(formatted_documents)
    
    prompt = (
        f"Answer the user query below. There will be provided additional information for you to compose your answer. "
        f"The relevant information provided is from 2024 and it should be added as your overall knowledge to answer the query, "
        f"you should not rely only on this information to answer the query, but add it to your overall knowledge.\n"
        f"Query: {query}\n"
        f"2024 News: {retrieve_data_formatted}"
    )

    
    return prompt
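
To see what a single retrieved document looks like once it is formatted for the prompt, the sketch below applies the same f-string template to the record displayed earlier in this notebook (`NEWS_DATA[5]`). The record is copied in by hand so that the snippet runs on its own; the names `sample_document` and `formatted_sample` are used only for this illustration.

```python
# Record shown earlier in the notebook (NEWS_DATA[5]), copied here so the snippet is self-contained
sample_document = {
    "title": "WATCH: Would you pay a tourist fee to enter Venice?",
    "description": (
        "From Thursday visitors making a trip to the famous city at "
        "peak times will be charged a trial entrance fee."
    ),
    "url": "https://www.bbc.co.uk/news/world-europe-68898441",
    "published_at": "2024-04-25",
}

# Same template as in generate_final_prompt
formatted_sample = (
    f"Title: {sample_document['title']}, Description: {sample_document['description']}, "
    f"Published at: {sample_document['published_at']}\nURL: {sample_document['url']}"
)
print(formatted_sample)
```

Each document becomes two lines: one with the title, description, and publication date, and one with the URL. This is what lets the LLM cite sources when a query asks for them.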

In [None]:
def llm_call(query, retrieve_function = None, top_k = 5,use_rag = True):

    # Build the final prompt (augmented with retrieved documents when use_rag is True)
    prompt = generate_final_prompt(query, top_k = top_k, retrieve_function = retrieve_function, use_rag = use_rag)

    generated_response = generate_with_single_input(prompt)

    generated_message = generated_response['content']
    
    return generated_message

In [None]:
query = "Recent news in technology. Provide sources."
print(llm_call(query, retrieve_function = semantic_search_retrieve))

<a id='4-2'></a>
### 4.2 Experimenting with the RAG system

Now it is time to test our RAG system! Run the code to generate a widget that will output 4 responses for each query using the following methods:

- RAG with Semantic Search
- RAG with BM25
- RAG with Reciprocal Rank Fusion
- Without RAG

You may use one of these questions to test, but feel free to ask your own!

* What were the most important events of the past year?
* How is global warming progressing in 2024?
* Tell me about the most recent advances in AI.
* Give me the most important facts from the past year.

In [None]:
display_widget(llm_call, semantic_search_retrieve, bm25_retrieve, reciprocal_rank_fusion)

<a id='4-3'></a>
### 4.3 Ask yourself

In your opinion, which setup gave better results? Is there a type of query where one method outperforms the other?

Congratulations! You finished your second assignment. Keep it up!