# Advanced RAG Exercise

This notebook is designed as an exercise to build a complete Retrieval-Augmented Generation (RAG) system. In this exercise, you will integrate three main components into a single pipeline:

1. **Retrieval Module** – Retrieve relevant documents based on a query.
2. **Transformation Module** – Transform the retrieved queries.
3. **Generation Module and Evaluation** – Use the transformed data to generate responses and evaluate the overall system performance.

In [1]:
import tqdm
import glob
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.text_splitter import SentenceTransformersTokenTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings  # For generating embeddings for text chunks
import faiss
import pickle
from dotenv import load_dotenv
import os
from groq import Groq
from sentence_transformers import SentenceTransformer
import random
from sentence_transformers import CrossEncoder
import numpy as np

  from .autonotebook import tqdm as notebook_tqdm



## 1. Building the RAG Pipeline

Load the data and store it in a string.

In [2]:
### load the pdf from the path
glob_path = "data/*.pdf"
text = ""
for pdf_path in tqdm.tqdm(glob.glob(glob_path)):
    with open(pdf_path, "rb") as file:
        reader = PdfReader(file)
         # Extract text from all pages in the PDF
        text += " ".join(page.extract_text() for page in reader.pages if page.extract_text())

text[:50]

100%|██████████| 1/1 [00:00<00:00,  1.12it/s]


'   Einsatz von KI-Agenten zur Automatisierung einf'

Split the data into chunks.

In [3]:
splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=200)
# Split the extracted text into manageable chunks
chunks = splitter.split_text(text)

# Display the resulting chunks
chunks

['Einsatz von KI-Agenten zur Automatisierung einfacher Sachbearbeitungsaufgaben in Unternehmen   Disposition   Eingereicht bei der ZHAW School of Management and Law (SML) Philipp Stalder   Wissenschaftliche Methoden (2025-FS)   Abgegeben am 13.04.2025 von Sandro Uhler      I Inhaltsverzeichnis INHALTSVERZEICHNIS .............................................................................................................. I LITERATURVERZEICHNIS ....................................................................................................... II ABBILDUNGSVERZEICHNIS .................................................................................................. III 1 ABSTRACT UND KEYWORDS ........................................................................................... 1 2 PROBLEMSTELLUNG UND RELEV ANZ DES THEMAS .............................................. 2 3 STAND DER FORSCHUNG ........................................................................................

In [4]:
print(f"Total chunks: {len(chunks)}")
print("Preview of the first chunk:", chunks[0][:200])

Total chunks: 3
Preview of the first chunk: Einsatz von KI-Agenten zur Automatisierung einfacher Sachbearbeitungsaufgaben in Unternehmen   Disposition   Eingereicht bei der ZHAW School of Management and Law (SML) Philipp Stalder   Wissenschaftl


## Choose an embedding model
Use the SentenceTransfomer wrapper as we have done so far.
Models are found here: https://www.sbert.net/docs/sentence_transformer/pretrained_models.html
or on HuggingFace.

Embed the chunks.

In [27]:
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = FAISS.from_embeddings(
    text_embeddings=list(zip(chunks, chunk_embeddings)),
    embedding=embedding_model)

## 3. Build Index and save index

In [28]:
d = chunk_embeddings.shape[1]
print(d)

384


In [29]:
index = faiss.IndexFlatL2(d)
index.add(chunk_embeddings)
print("Number of embeddings in FAISS index:", index.ntotal)

Number of embeddings in FAISS index: 3


In [30]:
import os

os.makedirs("faiss", exist_ok=True)  # Erstellt den Ordner nur, wenn er nicht existiert


faiss.write_index(index, "faiss/faiss_index.index")
with open("faiss/chunks_mapping.pkl", "wb") as f:
    pickle.dump(chunks, f)

In [31]:
vector_store.save_local("faiss_langchain")

## Load Key for language Models

In [38]:
load_dotenv()
# Access the API key using the variable name defined in the .env file
google_api_key = os.getenv("GOOGLE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")
groq_api_key = os.getenv("GROQ_API_KEY")


## 4. Build a retriever function

arguments: query, k, index, chunks, embedding model

return: retrieved texts, distances

In [34]:
import numpy as np
from langchain_community.vectorstores import FAISS

def retrieve_documents(query, k, index, chunks, embedding_model):
    """
    Retrieves documents using FAISS index
    
    Args:
        query: Search string
        k: Number of results to return
        index: FAISS index
        chunks: List of original text chunks
        embedding_model: Embedding model
    
    Returns:
        retrieved_texts: List of top-k document texts
        distances: List of corresponding L2 distances
    """
    # 1. Embed the query (handle both model types)
    if hasattr(embedding_model, 'encode'):  # SentenceTransformer
        query_embedding = embedding_model.encode([query])
    else:  # HuggingFaceEmbeddings
        query_embedding = np.array([embedding_model.embed_query(query)], dtype='float32')
    
    # Ensure correct shape (1 x embedding_dim)
    if len(query_embedding.shape) == 1:
        query_embedding = query_embedding.reshape(1, -1)
    
    # 2. FAISS search (requires numpy array)
    distances, indices = index.search(query_embedding, k)
    
    # 3. Get results
    retrieved_texts = [chunks[i] for i in indices[0]]
    return retrieved_texts, distances[0].tolist()

# Usage example
retrieved_texts, distances = retrieve_documents(
    query="What is the main topic?",
    k=4,
    index=index,          # Your FAISS index
    chunks=chunks,        # Your text chunks
    embedding_model=embeddings  # Your embedding model
)

## 5. Build an answer function
Build an answer function that takes a query, k, an index and the chunks.

return: answer

In [57]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def answer_query(query, k, index, chunks):
    """
    Generates an answer to a query using retrieved documents and a Hugging Face model.

    Args:
        query (str): The input query.
        k (int): Number of documents to retrieve.
        index: FAISS index for document retrieval.
        chunks (list): List of text chunks.

    Returns:
        str: The generated answer.
    """
    # Retrieve documents using the retrieve function
    retrieved_texts, _, _ = retrieve(query, k=k)
    
    if not retrieved_texts:
        return "I don't know the answer."
    
    # Combine retrieved texts into context
    context = "\n\n".join(retrieved_texts)
    
    # Define prompt
    prompt = f"""
    Answer the user's question based on the below context. If the context is not relevant to the question, say "I don't know the answer."
    Question: {query}
    Context: {context}
    Answer:
    """
    
    # Load Hugging Face model and tokenizer
    model_name = "google/flan-t5-base"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    
    # Tokenize and generate answer
    inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True)
    outputs = model.generate(inputs["input_ids"], max_length=200, num_beams=5)
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    return answer.strip()

#### Test your RAG

In [59]:

query = "Was können KI Agenten in Unternehmen bezwecken?"
answer = answer_query(query, 5, index, chunks)
print("LLM Answer:", answer)

LLM Answer: I don't know the answer.


## 6. Create a Rewriter

Take a query and an api key for the model and rewrite the query. 

Rewriting a query: A Language Model is prompted to rewrite a query to better suit a task.

Other Transfomrations are implemented in a similar fashion, this is just an example!

In [60]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Globale Initialisierung des Modells und Tokenizers
model_name = "google/flan-t5-base"
global_tokenizer = AutoTokenizer.from_pretrained(model_name)
global_model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def rewrite_query(query):
    """
    Rewrites a query to make it clearer or more suitable for a task using a Hugging Face model.

    Args:
        query (str): The input query to be rewritten.

    Returns:
        str: The rewritten query.
    """
    try:
        # Define the prompt for rewriting the query
        prompt = f"""
        Rewrite the following query to make it clearer, more precise, or better suited for retrieving relevant information:
        Original query: {query}
        Rewritten query:
        """
        
        # Tokenize the prompt
        inputs = global_tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True)
        
        # Generate the rewritten query
        outputs = global_model.generate(
            inputs["input_ids"],
            max_length=100,
            num_beams=5,
            early_stopping=True
        )
        
        # Decode the output
        rewritten_query = global_tokenizer.decode(outputs[0], skip_special_tokens=True)
        
        return rewritten_query.strip()
    
    except Exception as e:
        return f"Error rewriting query: {str(e)}"

## 7. Implement the rewriter into your answer function

In [63]:
from groq import Groq
import os

def rewrite_query(query, groq_api_key):
    """
    Rewrites a query to make it clearer or more suitable for a task using the Groq API.

    Args:
        query (str): The input query to be rewritten.
        groq_api_key (str): The Groq API key.

    Returns:
        str: The rewritten query.
    """
    try:
        # Initialize Groq client
        client = Groq(api_key=groq_api_key)
        
        # Define the prompt for rewriting the query
        prompt = f"""
        Rewrite the following query to make it clearer, more precise, or better suited for retrieving relevant information:
        Original query: {query}
        Rewritten query:
        """
        
        # Call Groq API
        response = client.chat.completions.create(
            model="llama3-8b-8192",  # Groq-Modell, kann z. B. auch "mixtral-8x7b-32768" sein
            messages=[
                {"role": "system", "content": "You are a helpful assistant that rewrites queries to improve clarity and precision."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.5,
            max_tokens=100
        )
        
        # Extract the rewritten query
        rewritten_query = response.choices[0].message.content.strip()
        return rewritten_query
    
    except Exception as e:
        print(f"Error rewriting query: {str(e)}")
        return query  # Fallback: Originale Query zurückgeben

def answer_query_with_rewriting(query, k, index, chunks, groq_api_key):
    """
    Generates an answer to a query using a rewritten query, retrieved documents, and the Groq API.

    Args:
        query (str): The input query.
        k (int): Number of documents to retrieve.
        index: FAISS index for document retrieval.
        chunks (list): List of text chunks.
        groq_api_key (str): The Groq API key.

    Returns:
        str: The generated answer.
    """
    try:
        # Schritt 1: Query umformulieren
        rewritten_query = rewrite_query(query, groq_api_key)
        print(f"Original Query: {query}")
        print(f"Rewritten Query: {rewritten_query}")
        
        # Schritt 2: Dokumente mit der umformulierten Query abrufen
        retrieved_texts, _, _ = retrieve(query=rewritten_query, k=k)
        
        if not retrieved_texts:
            return "I don't know the answer."
        
        # Schritt 3: Kontext aus abgerufenen Texten erstellen
        context = "\n\n".join(retrieved_texts)
        
        # Schritt 4: Prompt für die Antwortgenerierung
        prompt = f"""
        Answer the user's question based on the below context. If the context is not relevant to the question, say "I don't know the answer."
        Question: {rewritten_query}
        Context: {context}
        Answer:
        """
        
        # Schritt 5: Groq-Client initialisieren
        client = Groq(api_key=groq_api_key)
        
        # Schritt 6: Antwort generieren
        response = client.chat.completions.create(
            model="llama3-8b-8192",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.5,
            max_tokens=200
        )
        
        # Antwort extrahieren
        answer = response.choices[0].message.content.strip()
        return answer
    
    except Exception as e:
        return f"Error generating answer: {str(e)}"

#### Test it

In [64]:
query = "What is the most important factor in diagnosing asthma?"
answer = answer_query_with_rewriting(query, 5, index, chunks, groq_api_key)
print("LLM Answer:", answer)

Original Query: What is the most important factor in diagnosing asthma?
Rewritten Query: Here's a rewritten query that's clearer, more precise, and better suited for retrieving relevant information:

"Which clinical or laboratory criteria are most consistently correlated with accurate asthma diagnosis, and what is the relative importance of each factor in distinguishing asthma from other respiratory conditions?"

This rewritten query:

1. Clarifies the scope: Instead of asking a broad, open-ended question, the rewritten query focuses on a specific aspect of asthma diagnosis.
2. Specifies the goal: It clearly states the intention to identify the most
LLM Answer: I don't know the answer. The provided context is about the implementation of KI-agents (Artificial Intelligence agents) in companies to automate administrative tasks, and it seems to be a research paper or a thesis. The query is about the clinical or laboratory criteria that are most consistently correlated with accurate asthma 

## 8 .Evaluation

Select random chunks from all your chunks, and generate a question to each of these chunks

In [67]:
import random
import time
import httpx
from groq import Groq
from tqdm import tqdm

def generate_questions_for_random_chunks(chunks, groq_api_key, num_chunks=20, max_retries=3):
    """
    Randomly selects a specified number of text chunks from the provided list,
    then generates a question for each selected chunk using the Groq LLM.

    Parameters:
    - chunks (list): List of text chunks.
    - groq_api_key (str): Your Groq API key.
    - num_chunks (int): Number of chunks to select randomly (default is 20).
    - max_retries (int): Maximum number of retries for API calls (default is 3).

    Returns:
    - questions (list of tuples): Each tuple contains (chunk, generated_question).
    """
    # Check if chunks list is empty or num_chunks is invalid
    if not chunks:
        print("Error: The chunks list is empty.")
        return []
    if num_chunks <= 0:
        print("Error: num_chunks must be positive.")
        return []
    
    # Adjust num_chunks if it exceeds the number of available chunks
    num_chunks = min(num_chunks, len(chunks))
    
    # Randomly select the desired number of chunks
    selected_chunks = random.sample(chunks, num_chunks)
    
    # Initialize the Groq client
    client = Groq(api_key=groq_api_key)
    
    questions = []
    for chunk in tqdm(selected_chunks):
        # Build a prompt that asks the LLM to generate a question based on the chunk
        prompt = (
            "Based on the following text, generate an insightful question that covers its key content:\n\n"
            "Text:\n" + chunk + "\n\n"
            "Question:"
        )
        
        messages = [
            {"role": "system", "content": "You are a helpful assistant that generates insightful questions based on text."},
            {"role": "user", "content": prompt}
        ]
        
        generated_question = None
        attempt = 0
        
        # Try calling the API with retry logic
        while attempt < max_retries:
            try:
                llm_response = client.chat.completions.create(
                    model="llama3-8b-8192",
                    messages=messages,
                    temperature=0.5,
                    max_tokens=100
                )
                generated_question = llm_response.choices[0].message.content.strip()
                break  # Exit the loop if successful
            except httpx.ReadTimeout:
                attempt += 1
                print(f"Timeout occurred for chunk. Retrying attempt {attempt}/{max_retries}...")
                time.sleep(2)  # Wait before retrying
            except Exception as e:
                attempt += 1
                print(f"Error occurred: {str(e)}. Retrying attempt {attempt}/{max_retries}...")
                time.sleep(2)
        
        # If all attempts fail, use an error message
        if generated_question is None:
            generated_question = "Error: Failed to generate question after several retries."
        
        questions.append((chunk, generated_question))
    
    return questions

#### Test it

In [68]:
questions = generate_questions_for_random_chunks(chunks, num_chunks=5, max_retries=2)
for idx, (chunk, question) in enumerate(questions, start=1):
    print(f"Chunk {idx}:\n{chunk[:100]}...\nGenerated Question: {question}\n")

TypeError: generate_questions_for_random_chunks() missing 1 required positional argument: 'groq_api_key'

## 9.Test the questions with your built retriever

In [69]:
def answer_generated_questions(question_tuples, k, index, texts, groq_api_key):
    """
    For each (chunk, generated_question) tuple in the provided list, use the prebuilt
    retrieval function to generate an answer for the generated question. The function
    returns a list of dictionaries containing the original chunk, the generated question,
    and the answer.
    
    Parameters:
    - question_tuples (list of tuples): Each tuple is (chunk, generated_question)
    - k (int): Number of retrieved documents to use for answering.
    - index: The FAISS index.
    - texts (list): The tokenized text chunks mapping.
    - groq_api_key (str): Your Groq API key.
    
    Returns:
    - results (list of dict): Each dict contains 'chunk', 'question', and 'answer'.
    """
    results = []
    for chunk, question in question_tuples:
        # Use your retrieval-based answer function. Here we assume the function signature is:
        # answer_query(query, k, index, texts, groq_api_key)
        answer = answer_query(question, k, index, texts) #query, k, index,texts
        results.append({
            "chunk": chunk,
            "question": question,
            "answer": answer
        })
    return results

#### Check the results

In [70]:
results = answer_generated_questions(questions, 5, index, chunks, groq_api_key)

for item in results:
    print("Chunk Preview:", item['chunk'][:100])
    print("Generated Question:", item['question'])
    print("Answer:", item['answer'])
    print("-----------------------------")

NameError: name 'questions' is not defined

## Evaluate the answers

In [71]:
import pandas as pd
def evaluate_answers_binary(results, groq_api_key, max_retries=3):
    """
    Evaluates each answer in the results list using an LLM.
    For each result (a dictionary containing 'chunk', 'question', and 'answer'),
    it sends an evaluation prompt to the Groq LLM which outputs 1 if the answer is on point,
    and 0 if it is missing the point.
    
    Parameters:
    - results (list of dict): Each dict must contain keys 'chunk', 'question', and 'answer'.
    - groq_api_key (str): Your Groq API key.
    - max_retries (int): Maximum number of retries if the API call times out.
    
    Returns:
    - df (pandas.DataFrame): A dataframe containing the original chunk, question, answer, and evaluation score.
    """
    evaluations = []
    client = OpenAI(api_key=openai_api_key)
    
    for item in tqdm.tqdm(results, desc="Evaluating Answers"):
        # Build the evaluation prompt.
        prompt = (
            "Evaluate the following answer to the given question. "
            "If the answer is accurate and complete, reply with 1. "
            "If the answer is inaccurate, incomplete, or otherwise not acceptable, reply with 0. "
            "Do not include any extra text.\n\n"
            "Question: " + item['question'] + "\n\n"
            "Answer: " + item['answer'] + "\n\n"
            "Context (original chunk): " + item['chunk'] + "\n\n"
            "Evaluation (1 for good, 0 for bad):"
        )
        
        messages = [{"role": "system", "content": prompt}]
        
        generated_eval = None
        attempt = 0
        
        # Retry logic in case of timeouts or errors.
        while attempt < max_retries:
            try:
                llm_response = client.chat.completions.create(
                    messages=messages,
                    model="4o-mini"
                )
                generated_eval = llm_response.choices[0].message.content.strip()
                break  # Exit the retry loop if successful.
            except httpx.ReadTimeout:
                attempt += 1
                print(f"Timeout occurred during evaluation. Retrying attempt {attempt}/{max_retries}...")
                time.sleep(2)
            except Exception as e:
                attempt += 1
                print(f"Error during evaluation: {e}. Retrying attempt {attempt}/{max_retries}...")
                time.sleep(2)
        
        # If no valid evaluation was produced, default to 0.
        if generated_eval is None:
            generated_eval = "0"
        
        # Convert the response to an integer (1 or 0).
        try:
            score = int(generated_eval)
            if score not in [0, 1]:
                score = 0
        except:
            score = 0
        
        evaluations.append(score)
    
    # Add the evaluation score to each result.
    for i, item in enumerate(results):
        item['evaluation'] = evaluations[i]
    
    # Create a dataframe for manual review.
    df = pd.DataFrame(results)
    return df

### Display them

In [72]:
df_evaluations = evaluate_answers_binary(results, openai_api_key)
display(df_evaluations)

NameError: name 'results' is not defined