# Advanced RAG Exercise

This notebook is designed as an exercise to build a complete Retrieval-Augmented Generation (RAG) system. In this exercise, you will integrate three main components into a single pipeline:

1. **Retrieval Module** – Retrieve relevant documents based on a query.
2. **Transformation Module** – Transform the retrieved queries.
3. **Generation Module and Evaluation** – Use the transformed data to generate responses and evaluate the overall system performance.

In [159]:
import tqdm
import glob
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.text_splitter import SentenceTransformersTokenTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings  # For generating embeddings for text chunks
import faiss
import pickle
from dotenv import load_dotenv
import os
from groq import Groq
from sentence_transformers import SentenceTransformer
import random
from sentence_transformers import CrossEncoder
import numpy as np


## 1. Building the RAG Pipeline

Load the data and store it in a string.

In [160]:
### load the pdf from the path
glob_path = "data/*.pdf"
text = ""
for pdf_path in tqdm.tqdm(glob.glob(glob_path)):
    with open(pdf_path, "rb") as file:
        reader = PdfReader(file)
         # Extract text from all pages in the PDF
        text += " ".join(page.extract_text() for page in reader.pages if page.extract_text())

100%|██████████| 2/2 [00:01<00:00,  1.23it/s]


Split the data into chunks.

In [161]:
# Create a splitter: 2000 characters per chunk with an overlap of 200 characters
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)

# Split the full text into chunks
chunks = splitter.split_text(text)

In [162]:
print(f"Total chunks: {len(chunks)}")
print("Preview of the first chunk:", chunks[0][:200])

Total chunks: 130
Preview of the first chunk: Hyper tension in adul ts: 
diagnosis and manag emen t 
NICE guideline 
Published: 28 August 2019 
Last updat ed: 21 No vember 2023 
www .nice.or g.uk/guidance/ng136 
© NICE 202 4. All right s reserved


## Choose an embedding model
Use the SentenceTransfomer wrapper as we have done so far.
Models are found here: https://www.sbert.net/docs/sentence_transformer/pretrained_models.html
or on HuggingFace.

Embed the chunks.

In [163]:

from sentence_transformers import SentenceTransformer

# Load the embedding model
model_name = "paraphrase-multilingual-MiniLM-L12-v2"
model = SentenceTransformer(model_name)

# Embed the original chunks (130 chunks)
chunk_embeddings = model.encode(chunks, convert_to_numpy=True)

## 3. Build Index and save index

In [164]:
import faiss
import os
import pickle

# Build the FAISS index
d = chunk_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(chunk_embeddings)
print("Number of embeddings in FAISS index:", index.ntotal)  # Should print 130

# Save the index and chunks
os.makedirs("faiss", exist_ok=True)
faiss.write_index(index, "faiss/faiss_index.index")
with open("faiss/chunks_mapping.pkl", "wb") as f:
    pickle.dump(chunks, f)

Number of embeddings in FAISS index: 130


## Load Key for language Models

In [165]:
load_dotenv()
# Access the API key using the variable name defined in the .env file
google_api_key = os.getenv("GOOGLE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")
groq_api_key = os.getenv("GROQ_API_KEY")

## 4. Build a retriever function

arguments: query, k, index, chunks, embedding model

return: retrieved texts, distances

In [166]:
def retrieve(query, k, index, chunks, model):
    """
    Retrieve the top k similar text chunks and their distances for a given query.
    """
    query_embedding = model.encode([query], convert_to_numpy=True)
    distances, indices = index.search(query_embedding, k)
    retrieved_texts = [chunks[i] for i in indices[0]]
    return retrieved_texts, distances[0]

## 5. Build an answer function
Build an answer function that takes a query, k, an index and the chunks.

return: answer

In [167]:
def answer_query(query, k, index, chunks):
    """
    Answer a query using retrieved chunks and an LLM.
    """
    # Retrieve relevant chunks
    retrieved_texts, distances = retrieve(query, k, index, chunks, model)
    
    # Combine the retrieved texts into a single context
    context = "\n".join(retrieved_texts)
    
    # Build the prompt for the language model
    prompt = (
        f"Answer the following question based on the provided context:\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        f"Answer:"
    )
    
    # Use OpenAI to generate the answer
    from openai import OpenAI
    client = OpenAI(api_key=openai_api_key)
    messages = [{"role": "system", "content": prompt}]
    
    try:
        llm_response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages
        )
        answer = llm_response.choices[0].message.content.strip()
    except Exception as e:
        print(f"Error generating answer: {e}")
        answer = "Error: Unable to generate an answer."
    
    return answer

#### Test your RAG

In [168]:

query = "What is the most important factor in diagnosing asthma?"
answer = answer_query(query, 5, index, chunks)
print("LLM Answer:", answer)

LLM Answer: The most important factor in diagnosing asthma is the demonstration of variable airflow obstruction, which can be assessed through various diagnostic tests such as measuring blood eosinophil counts, fractional exhaled nitric oxide (FeNO) levels, bronchodilator reversibility, or peak expiratory flow (PEF) variability. If asthma symptoms are suggestive, a combination of these tests is utilized to confirm the diagnosis.


## 6. Create a Rewriter

Take a query and an api key for the model and rewrite the query. 

Rewriting a query: A Language Model is prompted to rewrite a query to better suit a task.

Other Transfomrations are implemented in a similar fashion, this is just an example!

In [169]:
from groq import Groq

def rewrite_query(query, groq_api_key):
    """
    Rewrite a query using Grok to make it more suitable for retrieval.
    
    Parameters:
    - query (str): The original query.
    - groq_api_key (str): The Groq API key.
    
    Returns:
    - rewritten_query (str): The rewritten query.
    """
    client = Groq(api_key=groq_api_key)
    
    prompt = (
        f"Rewrite the following query to be clearer, more precise, and better suited for retrieving relevant information from a medical guideline document:\n\n"
        f"Original Query: {query}\n\n"
        f"Rewritten Query:"
    )
    
    messages = [{"role": "user", "content": prompt}]
    
    try:
        llm_response = client.chat.completions.create(
            model="mixtral-8x7b-32768",
            messages=messages,
            max_tokens=150,
            temperature=0.5
        )
        rewritten_query = llm_response.choices[0].message.content.strip()
    except Exception as e:
        print(f"Error rewriting query with Grok: {e}")
        rewritten_query = query  # Fallback to original query
    
    return rewritten_query

## 7. Implement the rewriter into your answer function

In [170]:
def answer_query_with_rewriting(query, k, index, chunks, groq_api_key):
    """
    Answer a query using a rewritten query (via Grok), retrieved chunks, and an LLM (OpenAI).
    
    Parameters:
    - query (str): The original query.
    - k (int): Number of chunks to retrieve.
    - index: The FAISS index.
    - chunks (list): The list of text chunks (130 chunks).
    - groq_api_key (str): The Groq API key for query rewriting.
    
    Returns:
    - answer (str): The generated answer.
    """
    # Rewrite the query using Grok
    rewritten_query = rewrite_query(query, groq_api_key)
    print(f"Original Query: {query}")
    print(f"Rewritten Query: {rewritten_query}")
    
    # Retrieve relevant chunks using the rewritten query
    retrieved_texts, distances = retrieve(rewritten_query, k, index, chunks, model)
    
    # Combine the retrieved texts into a single context
    context = "\n".join(retrieved_texts)
    
    # Build the prompt for the language model
    prompt = (
        f"Answer the following question based on the provided context:\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        f"Answer:"
    )
    
    # Use OpenAI to generate the answer
    from openai import OpenAI
    client = OpenAI(api_key=openai_api_key)
    messages = [{"role": "system", "content": prompt}]
    
    try:
        llm_response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages
        )
        answer = llm_response.choices[0].message.content.strip()
    except Exception as e:
        print(f"Error generating answer: {e}")
        answer = "Error: Unable to generate an answer."
    
    return answer

#### Test it

In [171]:
query = "What is the most important factor in diagnosing asthma?"
answer = answer_query_with_rewriting(query, 5, index, chunks, groq_api_key)
print("LLM Answer:", answer)

Error rewriting query with Grok: Error code: 400 - {'error': {'message': 'The model `mixtral-8x7b-32768` has been decommissioned and is no longer supported. Please refer to https://console.groq.com/docs/deprecations for a recommendation on which model to use instead.', 'type': 'invalid_request_error', 'code': 'model_decommissioned'}}
Original Query: What is the most important factor in diagnosing asthma?
Rewritten Query: What is the most important factor in diagnosing asthma?
LLM Answer: The most important factor in diagnosing asthma is the objective testing of lung function, which includes measuring blood eosinophil counts, fractional exhaled nitric oxide (FeNO) levels, bronchodilator reversibility (BDR) through spirometry, and peak expiratory flow (PEF) variability. If asthma is suspected but not confirmed through these tests, further evaluations such as bronchial challenge tests may be conducted. Accurate diagnosis is crucial to ensuring appropriate management and treatment of the c

## 8 .Evaluation

Select random chunks from all your chunks, and generate a question to each of these chunks

In [172]:
import time
import httpx  # Ensure you're catching the correct timeout exception
from openai import OpenAI
def generate_questions_for_random_chunks(chunks, num_chunks=20, max_retries=3):
    """
    Randomly selects a specified number of text chunks from the provided list,
    then generates a question for each selected chunk using the Groq LLM.

    Parameters:
    - chunks (list): List of text chunks.
    - groq_api_key (str): Your Groq API key.
    - num_chunks (int): Number of chunks to select randomly (default is 20).

    Returns:
    - questions (list of tuples): Each tuple contains (chunk, generated_question).
    """
    # Randomly select the desired number of chunks.
    selected_chunks = random.sample(chunks, num_chunks)
    
    # Initialize the Groq client once
    client = OpenAI(api_key=openai_api_key)
    
    questions = []
    for chunk in tqdm.tqdm(selected_chunks):
        # Build a prompt that asks the LLM to generate a question based on the chunk.
        prompt = (
            "Based on the following text, generate an insightful question that covers its key content:\n\n"
            "Text:\n" + chunk + "\n\n"
            "Question:"
        )
        
        messages = [
            {"role": "system", "content": prompt}
        ]
        
        generated_question = None
        attempt = 0
        
        # Try calling the API with simple retry logic.
        while attempt < max_retries:
            try:
                llm_response = client.chat.completions.create(
                     model="gpt-4o-mini",
                    messages=messages
                )
                generated_question = llm_response.choices[0].message.content.strip()
                break  # Exit the loop if successful.
            except httpx.ReadTimeout:
                attempt += 1
                print(f"Timeout occurred for chunk. Retrying attempt {attempt}/{max_retries}...")
                time.sleep(2)  # Wait a bit before retrying.
        
        # If all attempts fail, use an error message as the generated question.
        if generated_question is None:
            generated_question = "Error: Failed to generate question after several retries."
        
        questions.append((chunk, generated_question))
    
    return questions

#### Test it

In [173]:
questions = generate_questions_for_random_chunks(chunks, num_chunks=5, max_retries=2)
for idx, (chunk, question) in enumerate(questions, start=1):
    print(f"Chunk {idx}:\n{chunk[:100]}...\nGenerated Question: {question}\n")

100%|██████████| 5/5 [00:05<00:00,  1.00s/it]

Chunk 1:
regular ICS plus SABA as needed. The committ ee therefore concluded t hat combination 
inhalers used...
Generated Question: What are the implications of the new recommendations for using combination inhalers as preferred treatment for newly diagnosed asthma in adults, especially in relation to costs and potential reduction in severe exacerbations compared to current treatment options?

Chunk 2:
(NG2 45)
© NICE 202 4. All right s reserved. Subject t o Notice of right s (https://www .nice.or g.u...
Generated Question: What evidence influenced the committee's recommendation regarding the management of asthma in children, particularly in relation to dose adjustment of inhaled corticosteroids and the importance of personalized action plans?

Chunk 3:
diagnosis of ast hma who ar e stable on t heir curr ent t herap y do not ha ve to swit ch 
treatment...
Generated Question: What changes in asthma treatment pathways are being recommended for individuals aged 12 and over, and how do th




## 9.Test the questions with your built retriever

In [174]:
def answer_generated_questions(question_tuples, k, index, texts, groq_api_key):
    """
    For each (chunk, generated_question) tuple in the provided list, use the prebuilt
    retrieval function to generate an answer for the generated question. The function
    returns a list of dictionaries containing the original chunk, the generated question,
    and the answer.
    
    Parameters:
    - question_tuples (list of tuples): Each tuple is (chunk, generated_question)
    - k (int): Number of retrieved documents to use for answering.
    - index: The FAISS index.
    - texts (list): The tokenized text chunks mapping.
    - groq_api_key (str): Your Groq API key.
    
    Returns:
    - results (list of dict): Each dict contains 'chunk', 'question', and 'answer'.
    """
    results = []
    for chunk, question in question_tuples:
        # Use your retrieval-based answer function. Here we assume the function signature is:
        # answer_query(query, k, index, texts, groq_api_key)
        answer = answer_query(question, k, index, texts) #query, k, index,texts
        results.append({
            "chunk": chunk,
            "question": question,
            "answer": answer
        })
    return results

#### Check the results

In [175]:
results = answer_generated_questions(questions, 5, index, chunks, groq_api_key)

for item in results:
    print("Chunk Preview:", item['chunk'][:100])
    print("Generated Question:", item['question'])
    print("Answer:", item['answer'])
    print("-----------------------------")

Chunk Preview: regular ICS plus SABA as needed. The committ ee therefore concluded t hat combination 
inhalers used
Generated Question: What are the implications of the new recommendations for using combination inhalers as preferred treatment for newly diagnosed asthma in adults, especially in relation to costs and potential reduction in severe exacerbations compared to current treatment options?
Answer: The new recommendations for using combination inhalers as the preferred treatment for newly diagnosed asthma in adults have several important implications:

1. **Reduction in Severe Exacerbations**: The evidence indicates that using combination inhalers (which include an inhaled corticosteroid (ICS) and a long-acting beta 2 agonist (LABA)) as needed leads to a significant reduction in severe asthma exacerbations compared to other treatment options, such as the use of short-acting beta agonists (SABA) alone or regular low-dose ICS plus SABA. This improvement in clinical outcomes emphasi

## Evaluate the answers

In [178]:
import pandas as pd
def evaluate_answers_binary(results, groq_api_key, max_retries=3):
    """
    Evaluates each answer in the results list using an LLM.
    For each result (a dictionary containing 'chunk', 'question', and 'answer'),
    it sends an evaluation prompt to the Groq LLM which outputs 1 if the answer is on point,
    and 0 if it is missing the point.
    
    Parameters:
    - results (list of dict): Each dict must contain keys 'chunk', 'question', and 'answer'.
    - groq_api_key (str): Your Groq API key.
    - max_retries (int): Maximum number of retries if the API call times out.
    
    Returns:
    - df (pandas.DataFrame): A dataframe containing the original chunk, question, answer, and evaluation score.
    """
    evaluations = []
    client = OpenAI(api_key=openai_api_key)
    
    for item in tqdm.tqdm(results, desc="Evaluating Answers"):
        # Build the evaluation prompt.
        prompt = (
            "Evaluate the following answer to the given question. "
            "If the answer is accurate and complete, reply with 1. "
            "If the answer is inaccurate, incomplete, or otherwise not acceptable, reply with 0. "
            "Do not include any extra text.\n\n"
            "Question: " + item['question'] + "\n\n"
            "Answer: " + item['answer'] + "\n\n"
            "Context (original chunk): " + item['chunk'] + "\n\n"
            "Evaluation (1 for good, 0 for bad):"
        )
        
        messages = [{"role": "system", "content": prompt}]
        
        generated_eval = None
        attempt = 0
        
        # Retry logic in case of timeouts or errors.
        while attempt < max_retries:
            try:
                llm_response = client.chat.completions.create(
                    messages=messages,
                    model="gpt-4o-mini"
                )
                generated_eval = llm_response.choices[0].message.content.strip()
                break  # Exit the retry loop if successful.
            except httpx.ReadTimeout:
                attempt += 1
                print(f"Timeout occurred during evaluation. Retrying attempt {attempt}/{max_retries}...")
                time.sleep(2)
            except Exception as e:
                attempt += 1
                print(f"Error during evaluation: {e}. Retrying attempt {attempt}/{max_retries}...")
                time.sleep(2)
        
        # If no valid evaluation was produced, default to 0.
        if generated_eval is None:
            generated_eval = "0"
        
        # Convert the response to an integer (1 or 0).
        try:
            score = int(generated_eval)
            if score not in [0, 1]:
                score = 0
        except:
            score = 0
        
        evaluations.append(score)
    
    # Add the evaluation score to each result.
    for i, item in enumerate(results):
        item['evaluation'] = evaluations[i]
    
    # Create a dataframe for manual review.
    df = pd.DataFrame(results)
    return df

### Display them

In [179]:
df_evaluations = evaluate_answers_binary(results, openai_api_key)
display(df_evaluations)

Evaluating Answers:   0%|          | 0/5 [00:00<?, ?it/s]

Evaluating Answers: 100%|██████████| 5/5 [00:03<00:00,  1.60it/s]


Unnamed: 0,chunk,question,answer,evaluation
0,regular ICS plus SABA as needed. The committ e...,What are the implications of the new recommend...,The new recommendations for using combination ...,1
1,(NG2 45)\n© NICE 202 4. All right s reserved. ...,What evidence influenced the committee's recom...,The committee's recommendation regarding the m...,1
2,diagnosis of ast hma who ar e stable on t heir...,What changes in asthma treatment pathways are ...,The recommended changes in asthma treatment pa...,1
3,put NICE guidance int o practice . Asthma: dia...,What are the key updates and recommendations f...,The key updates and recommendations from NICE ...,1
4,blood pr essur e to lower tar gets in people w...,What are the key updates in the NICE guideline...,The key updates in the NICE guidelines for man...,0
