<a href="https://colab.research.google.com/github/violetxs16/TIM-175/blob/main/Copy_of_TIM175_Week_8_PreLab_RAG_Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Imports
Please make sure to run this first to set up the environment. This can take a few minutes.

In [None]:
%%capture
#You do not need to make any changes to this cell, only run the cell by using the button on the left
# (%%capture must stay on the very first line of the cell; it hides the install logs)

!pip install pymupdf
!pip install -q -U google-generativeai
!pip install langchain-google-genai
!pip install ragas

import os
import numpy as np
from numpy.linalg import norm
from sentence_transformers import SentenceTransformer
from langchain_text_splitters import RecursiveCharacterTextSplitter, CharacterTextSplitter, TokenTextSplitter
import google.generativeai as genai

# Local embedding model used to build the retrieval database
model = SentenceTransformer("avsolatorio/GIST-large-Embedding-v0")


In [None]:
#You do not need to make any changes to this cell, only run the cell by using the button on the left

def cos(A, B, normalize=True):
  """
  Returns the similarity between vector A and vector/matrix B.
  With normalize=True this is cosine similarity; otherwise it is the dot product.
  When B is a matrix (one row per chunk), each row is normalized separately.
  """
  if normalize:
      return (A @ B.T) / (norm(A) * norm(B, axis=-1))
  return A @ B.T

def read_file(file):
  """
  Read in a .pdf or .txt file and return its text
  """
  if file.endswith(".txt"):
    # utf-8-sig also strips a leading byte-order mark if the file has one
    with open(file, encoding="utf-8-sig") as f:
      return f.read()
  elif file.endswith(".pdf"):
    import pymupdf
    text = ""
    with pymupdf.open(file) as doc:
      for page in doc:
        text += page.get_text()
    return text

  else:
    print("Please use a pdf or txt file for this task")
    return None


def generate(model, GOOGLE_API_KEY, query):
    """
    Sends a query to the API.
    Takes in a model name to choose which API to call.
    Returns the API response text (str).
    """
    if model.startswith("gemini"):
        genai.configure(api_key=GOOGLE_API_KEY)
        llm = genai.GenerativeModel(model)
        response = llm.generate_content(query)
        return response.text

    raise ValueError(f"Error, model {model} not found")

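When `B` is the 2-D `embeddings` matrix, cosine similarity needs the norm of each row. `norm(B)` without an `axis` argument returns a single Frobenius norm for the whole matrix, so dividing by it only rescales the dot products, and long vectors can outrank better-aligned short ones. The standalone NumPy sketch below (made-up toy vectors; `cosine_to_rows` is a hypothetical helper) compares the two. If the embedding model already returns unit-length vectors, both rankings coincide.

```python
import numpy as np
from numpy.linalg import norm


def cosine_to_rows(query, matrix):
    """Cosine similarity between a 1-D query vector and every row of a 2-D matrix."""
    return (matrix @ query) / (norm(matrix, axis=1) * norm(query))


# Toy "database" of two 2-D embeddings (made-up numbers for illustration):
# row 0 is long but 45 degrees off, row 1 is short but points the same way as the query
matrix = np.array([[10.0, 0.0],
                   [1.0, 1.0]])
query = np.array([1.0, 1.0])

true_cos = cosine_to_rows(query, matrix)
# Dividing by the norm of the whole matrix keeps the raw dot-product ranking
frobenius_scaled = (matrix @ query) / (norm(matrix) * norm(query))

print("best by cosine:", int(np.argmax(true_cos)))
print("best by whole-matrix norm:", int(np.argmax(frobenius_scaled)))
```

Per-row normalization picks row 1 (same direction as the query), while the whole-matrix version picks row 0 simply because it is longer.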
In [None]:
config = {
    "model": "gemini-2.0-flash",  # or other model IDs
    "temperature": 0.4,
    "max_tokens": None,
    "top_p": 0.8,
}

### The goal of this activity is to get practice using the RAGAS library to assess RAG outputs. To do this we will first need some RAG output to assess!

For now, we will set up a simple database with a document of your choosing. Since RAG is not the main focus this week, we won't cover each step in depth. If you need a refresher on how anything works, please see last week's PreLab.

### 1. Upload a document of your choosing. This can be the same document you used for the PreLab last week, or a completely new one! Try to keep it a reasonable length.

*To upload files, click the folder icon on the left-hand side of the screen and select "Upload to session storage". Files uploaded this way only last as long as the session is running, so if you come back to this task later, you will need to re-upload them.*

In [None]:
# Upload the file through the Colab file picker
from google.colab import files
uploaded = files.upload()

# Get the filename of the first uploaded file directly from the uploaded dict
file_name = list(uploaded.keys())[0]  # e.g. '011_ Brook Ewoldsen_Fashion Institute Of Design (FIDM) .txt'

document = read_file(file_name)

Saving 011_ Brook Ewoldsen_Fashion Institute Of Design (FIDM) .txt to 011_ Brook Ewoldsen_Fashion Institute Of Design (FIDM)  (2).txt


In [None]:
#You do not need to make any changes to this cell, only run the cell by using the button on the left

print(document) # Make sure everything looks right

﻿Interviewee: Brook Ewoldsen
Industry Sectors: Fashion and Interior Design
Source: https://soundcloud.com/what-to-be/fashion-institute-of-design-fidm-guest-speaker-brook-ewoldsen?utm_source=clipboard&utm_medium=text&utm_campaign=social_sharinghttps://soundcloud.com/what-to-be/fashion-institute-of-design-fidm-guest-speaker-brook-ewoldsen?utm_source=clipboard&utm_medium=text&utm_campaign=social_sharing








# INTRO
Interviewer  0:04  
You're listening to k squared Santa Cruz at 90.7 FM. My name is Interviewer and I am an intern at Your futures our business, a Santa Cruz County nonprofit which helps students ages 10 to 18. Explore Careers and start their career journey. We provide career expos, panels, guest speakers and more. Today I'm here with my other host Interviewer.




Interviewer  0:26  
Hello everyone. Our show want to be highlights the career journeys of inspirational people in Santa Cruz County. If you've ever thought Hmm, how did they get that job? Or what does that job re

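If a `.txt` file was saved with a UTF-8 byte-order mark (BOM), reading it with plain `utf-8` keeps the mark as an invisible `\ufeff` character at the start of the text (the transcript printed above appears to start with one), and it then travels into the first chunk. Reading with `encoding="utf-8-sig"` removes it. The standalone check below writes a temporary file with a BOM (the filename and text are placeholders) and reads it back both ways.

```python
import os
import tempfile

# Write a small text file that starts with a UTF-8 byte-order mark (BOM),
# as some editors do when saving .txt files
path = os.path.join(tempfile.mkdtemp(), "transcript.txt")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("Interviewee: Brook Ewoldsen\n")

with open(path, encoding="utf-8") as f:
    plain = f.read()        # the BOM survives as the character "\ufeff"
with open(path, encoding="utf-8-sig") as f:
    stripped = f.read()     # the BOM is removed on read

print(repr(plain[:12]))
print(repr(stripped[:12]))
```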
### 2. Chunk your document and create a local database

In [None]:
#You do not need to make any changes to this cell, only run the cell by using the button on the left

def get_text_splitter(splitter_type="character", chunk_size=1000, chunk_overlap=200):
    """
    Args:
        splitter_type (str): The type of text splitter to use ('character', 'recursive', 'token').
        chunk_size (int): The maximum size of each text chunk.
        chunk_overlap (int): The overlap between text chunks.
    """
    if splitter_type == "character":
        return CharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    elif splitter_type == "recursive":
        return RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    elif splitter_type == "token":
        return TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    else:
        raise ValueError("Invalid splitter type. Choose from 'character', 'recursive', or 'token'.")


def chunk_data(doc, splitter, chunk_size=1000, overlap=50):
    """Split a document string into a list of LangChain Document chunks."""
    splitter = get_text_splitter(splitter, chunk_size=chunk_size, chunk_overlap=overlap)
    # create_documents splits the text itself, so the raw string is passed in once
    return splitter.create_documents([doc])

documents = [i.page_content for i in chunk_data(document, "recursive", chunk_size=1000, overlap=50)]

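To see what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sliding-window chunker in plain Python. `sliding_window_chunks` is a hypothetical helper, not how LangChain's splitters work internally (they also try to break on paragraphs, lines, and spaces): each window is `chunk_size` characters long and starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

With `chunk_size=1000, overlap=50` as used above, consecutive chunks share roughly 50 characters, which helps keep a sentence that straddles a chunk boundary retrievable from either side.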
In [None]:
#You do not need to make any changes to this cell, only run the cell by using the button on the left

# Ensure chunks look correct
print(f"Split into {len(documents)} chunks")
print("Chunk 1: \n", documents[0])
print(type(documents))

Split into 47 chunks
Chunk 1: 
 ﻿Interviewee: Brook Ewoldsen
Industry Sectors: Fashion and Interior Design
Source: https://soundcloud.com/what-to-be/fashion-institute-of-design-fidm-guest-speaker-brook-ewoldsen?utm_source=clipboard&utm_medium=text&utm_campaign=social_sharinghttps://soundcloud.com/what-to-be/fashion-institute-of-design-fidm-guest-speaker-brook-ewoldsen?utm_source=clipboard&utm_medium=text&utm_campaign=social_sharing








# INTRO
Interviewer  0:04  
You're listening to k squared Santa Cruz at 90.7 FM. My name is Interviewer and I am an intern at Your futures our business, a Santa Cruz County nonprofit which helps students ages 10 to 18. Explore Careers and start their career journey. We provide career expos, panels, guest speakers and more. Today I'm here with my other host Interviewer.
<class 'list'>


In [None]:
#You do not need to make any changes to this cell, only run the cell by using the button on the left

#Create local embeddings database
#This may take some time if you have many chunks
embeddings = model.encode(documents)  # encodes all chunks in batches; returns a NumPy array
print(embeddings.shape) # (num chunks, embed dim)

(47, 1024)


### 3. Simple RAG setup

This function implements the most basic version of RAG: it embeds the query, retrieves the `k` most similar chunks, and asks Gemini to answer using them. You are welcome to modify the prompt, but it is not required. The function returns both the retrieved context and the LLM output for use in evaluation.

In [None]:
#You do not need to make any changes to this cell, only run the cell by using the button on the left

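import os
import torch

def RAG(query, database, embeddings, k=1):
  """
  Retrieves the k chunks of `database` most similar to `query` and asks the
  LLM to answer the query grounded in them.
  Returns (list of retrieved chunks, LLM response text).
  """
  query_vector = model.encode(query)

  # Rank every chunk by similarity to the query and keep the top k
  scores = torch.from_numpy(cos(query_vector, embeddings))
  context = [database[int(i)] for i in torch.topk(scores, k).indices]

  # Separate chunks with blank lines so they don't run together in the prompt
  joined_context = "\n\n".join(context)
  prompt = f'''
  You will be given a user query and context. Use the context given to
  you to ground your response to answer the user query.

  USER QUERY: {query}
  CONTEXT: {joined_context}
  '''
  # Change config["model"] to e.g. "gemini-2.0-flash-lite" for a lighter model
  generation = generate(model=config["model"], GOOGLE_API_KEY=os.environ["GOOGLE_API_KEY"], query=prompt)

  return context, generation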

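The retrieval step inside `RAG` boils down to "score every chunk, keep the k best, paste them into the prompt". The sketch below reproduces that logic with NumPy only (`np.argsort` instead of `torch.topk`) on made-up scores, so it runs without the embedding model or an API key. `top_k_indices` and `build_prompt` are hypothetical helper names.

```python
import numpy as np


def top_k_indices(scores, k):
    """Indices of the k largest scores, best first."""
    return np.argsort(-scores)[:k]


def build_prompt(query, chunks):
    """Assemble a grounded prompt; chunks are separated by blank lines."""
    context = "\n\n".join(chunks)
    return (
        "You will be given a user query and context. Use the context given to\n"
        "you to ground your response to answer the user query.\n\n"
        f"USER QUERY: {query}\n"
        f"CONTEXT: {context}\n"
    )


# Toy example: similarity scores for four chunks (made-up numbers)
database = ["chunk about color", "chunk about school", "chunk about pay", "chunk about fabric"]
scores = np.array([0.82, 0.10, 0.35, 0.77])

best = top_k_indices(scores, k=2)
prompt = build_prompt("How does color affect mood?", [database[i] for i in best])
print(best)  # [0 3]
print(prompt)
```

Raising `k` adds more chunks to the prompt, which can help recall-style metrics while adding noise that precision-style metrics penalize.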
## TASK: Using RAGAS to evaluate outputs

Before we can evaluate any outputs, RAGAS requires a small amount of setup. First, make sure your Gemini API key is connected (the cell below reads it from the Colab secret named `GOOGLE_API_KEY`):

In [None]:
#You do not need to make any changes to this cell, only run the cell by using the button on the left

from google.colab import userdata
import os

os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')

For many metrics, RAGAS uses a second LLM to evaluate outputs, much like the LLM-as-a-judge approach we've discussed previously! For this example, we'll use Gemini, but RAGAS supports various models.

In [None]:
#You do not need to make any changes to this cell, only run the cell by using the button on the left

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings


# Initialize the evaluator LLM with Google AI Studio, using the settings in config
evaluator_llm = LangchainLLMWrapper(ChatGoogleGenerativeAI(
    model=config["model"],
    temperature=config["temperature"],
    max_tokens=config["max_tokens"],
    top_p=config["top_p"],
))

# Embedding model for metrics that compare embeddings (e.g. Response Relevancy)
evaluator_embeddings = LangchainEmbeddingsWrapper(GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",  # Google's text embedding model
    task_type="retrieval_document"  # Optional: specify the task type
))

### We are now ready to evaluate some outputs!

Your job is two-fold:


1. We have implemented Context Precision, Context Recall, Context Entity Recall, Noise Sensitivity, and Faithfulness. You may try implementing more metrics, but don't hesitate to ask for help if you need it! Starter code for each metric is in the RAGAS documentation [HERE](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/).

2. Test out 5 unique queries for your database. Run 3 metrics on each one and log the results in the Google Doc. Below is some code to call the simple RAG function we defined earlier. Play around with different values of k to see how they affect the different metrics.

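Since you need several metrics for each of five queries, it can help to collect the scores in one place before copying them into the Google Doc. A minimal sketch, assuming each metric object exposes the async `single_turn_ascore(sample)` method used in the metric cells below; `score_all` and `ConstantScorer` are hypothetical names, and the dummy scorer only exists so the example runs without an API key.

```python
import asyncio


async def score_all(samples, scorers):
    """Run every scorer on every sample and return one dict of scores per sample.

    Assumes each scorer has an async `single_turn_ascore(sample)` method,
    like the RAGAS metric objects used in this notebook.
    """
    rows = []
    for sample in samples:
        row = {}
        for name, scorer in scorers.items():
            row[name] = await scorer.single_turn_ascore(sample)
        rows.append(row)
    return rows


class ConstantScorer:
    """Hypothetical stand-in for a RAGAS metric so this sketch runs offline."""

    def __init__(self, value):
        self.value = value

    async def single_turn_ascore(self, sample):
        return self.value


results = asyncio.run(score_all(["q1", "q2"], {"faithfulness": ConstantScorer(1.0)}))
print(results)  # [{'faithfulness': 1.0}, {'faithfulness': 1.0}]
```

In Colab an event loop is already running, so call it with `await` instead of `asyncio.run`, for example `rows = await score_all([sample], {"faithfulness": Faithfulness(llm=evaluator_llm)})`.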





In [None]:
# Insert your query and your reference answer here, then run the cells below to check each metric
# Repeat for 4 more unique queries for your database
query = "How does interior design impact well-being, according to Brook?"
reference="Brook explains that interior design can significantly affect well-being through the psychological effects of color and environment. For example, pink is soothing, blue is calming, red can induce anxiety, orange can increase hunger, and yellow brings cheerfulness. She emphasizes that design can create healing environments, and that colors and their specific shades can influence emotions and mental states."
context, generation = RAG(query, documents, embeddings, k=3)
print(context)
print(generation)

["Like even interior design, you know, creating everything from healing environments, you know, you can use color, like a moth pink is one of the most soothing, yeah, blue is calming, red, you know, can actually induce anxiety. Orange is will make you hungry. Yeah. So, you know, every color has its association, you know, black can absolutely be sophisticated. But if you surround yourself in too much black, it can, you know, start to pull you into the darkness a little bit more. So, yellow, so you can counter that with a little yellow, which will make you feel cheerful. So, you know, a green, you know, is growing. So there's and then it gets even down to the specific shade. Yeah. So it's, and there are people that just do that for a living, you know, they they only work with color.", "Brooke  22:54  \nWell, I probably at today, now originally, it definitely would have been fashion, because I used to look at people and I would design clothes for them in my head and just like a random str

In [None]:
#IMPLEMENTATION OF Context Precision
from ragas import SingleTurnSample
from ragas.metrics import LLMContextPrecisionWithoutReference

context_precision = LLMContextPrecisionWithoutReference(llm=evaluator_llm)

sample = SingleTurnSample(
    user_input=query,
    response=generation,
    retrieved_contexts=context,
)
await context_precision.single_turn_ascore(sample)

0.9999999999
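Context Precision rewards putting the useful chunks first. The RAGAS documentation defines Context Precision@K as the sum of Precision@k over the positions k that hold a relevant chunk, divided by the number of relevant chunks in the top K. The plain-Python sketch below is a hypothetical re-implementation for intuition, with hand-written relevance verdicts standing in for the judge LLM. It assumes a tiny epsilon (`1e-10` here) in the denominator, which would also explain why a perfect score can print as `0.9999999999` rather than `1.0`.

```python
def context_precision(verdicts, eps=1e-10):
    """Average precision over the retrieved chunks, in retrieval order.

    verdicts[i] is 1 if the i-th retrieved chunk was judged useful, else 0.
    eps (an assumed value) avoids division by zero when nothing is relevant.
    """
    numerator = 0.0
    for i, v in enumerate(verdicts):
        precision_at_i = sum(verdicts[: i + 1]) / (i + 1)  # Precision@(i+1)
        numerator += precision_at_i * v                    # only relevant positions count
    return numerator / (sum(verdicts) + eps)


print(round(context_precision([1, 0, 0]), 4))  # 1.0
print(round(context_precision([0, 1, 1]), 4))  # 0.5833
print(round(context_precision([0, 0, 0]), 4))  # 0.0
```

The same two relevant chunks score lower when they are ranked below an irrelevant one, so this metric is sensitive to retrieval order, not just to what was retrieved.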

In [None]:
#IMPLEMENTATION OF Context Recall
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import LLMContextRecall

sample = SingleTurnSample(
    user_input=query,
    response=generation,
    reference=reference,
    retrieved_contexts=context,
)

context_recall = LLMContextRecall(llm=evaluator_llm)
await context_recall.single_turn_ascore(sample)

1.0

In [None]:
#IMPLEMENTATION OF Context Entity Recall
from ragas import SingleTurnSample
from ragas.metrics import ContextEntityRecall

# Entities from the ground-truth reference are looked up in the retrieved contexts
sample = SingleTurnSample(
    reference=reference,
    retrieved_contexts=context,
)

scorer = ContextEntityRecall(llm=evaluator_llm)

await scorer.single_turn_ascore(sample)

0.14285714265306124
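Context Entity Recall, as described in the RAGAS docs, is the number of entities that appear in both the `reference` and the `retrieved_contexts`, divided by the number of entities in the `reference`. The sketch below (hypothetical helper and placeholder entity names; RAGAS uses the evaluator LLM to extract the entities) computes that ratio directly. The score printed above is very close to 1/7, which would be consistent with one of seven extracted entities being found in the context.

```python
def context_entity_recall(reference_entities, context_entities):
    """Fraction of the reference's entities that also appear in the retrieved context."""
    reference = set(reference_entities)
    if not reference:
        return 0.0
    return len(reference & set(context_entities)) / len(reference)


# Placeholder entity names; in RAGAS the evaluator LLM extracts real entities from the texts
reference_entities = ["E1", "E2", "E3", "E4", "E5", "E6", "E7"]
context_entities = ["E1", "X1", "X2"]
print(context_entity_recall(reference_entities, context_entities))  # 0.14285714285714285
```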

In [None]:
#IMPLEMENTATION OF Noise Sensitivity
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import NoiseSensitivity

sample = SingleTurnSample(
    user_input=query,
    response=generation,
    reference=reference,
    retrieved_contexts=context
)

scorer = NoiseSensitivity(llm=evaluator_llm)
await scorer.single_turn_ascore(sample)

np.float64(0.7272727272727273)

In [None]:
#IMPLEMENTATION OF Faithfulness
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import Faithfulness

sample = SingleTurnSample(
        user_input=query,
        response=generation,
        retrieved_contexts=context
    )
scorer = Faithfulness(llm=evaluator_llm)
await scorer.single_turn_ascore(sample)

1.0
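Faithfulness and Noise Sensitivity both reduce to a ratio over claims extracted from the response. Per the RAGAS docs, Faithfulness is the number of response claims supported by the retrieved context divided by the total number of claims (higher is better), while Noise Sensitivity is the number of incorrect claims divided by the total number of claims (range 0 to 1, lower is better). The sketch below uses hypothetical verdict lists in place of the evaluator LLM's judgments; note that the Noise Sensitivity score above, `0.7272...`, equals 8/11.

```python
def claim_fraction(verdicts):
    """Share of claims whose verdict is True (0.0 if there are no claims)."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


# Hypothetical judge verdicts for the claims extracted from one response
supported_by_context = [True, True, True, True]  # Faithfulness: higher is better
incorrect_claims = [True] * 8 + [False] * 3      # Noise Sensitivity: lower is better

faithfulness = claim_fraction(supported_by_context)
noise_sensitivity = claim_fraction(incorrect_claims)
print(faithfulness, noise_sensitivity)  # 1.0 0.7272727272727273
```

Because the two metrics point in opposite directions, log which way is "good" next to each score in your results table.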

In [None]:
#IMPLEMENTATION OF Response Relevancy
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import ResponseRelevancy

sample = SingleTurnSample(
        user_input=query,
        response=generation,
        retrieved_contexts=context
    )
# This metric also needs an embedding model, so pass the evaluator embeddings
scorer = ResponseRelevancy(llm=evaluator_llm, embeddings=evaluator_embeddings)
await scorer.single_turn_ascore(sample)

ImportError: cannot import name 'SentenceTransformerEmbeddings' from 'ragas.embeddings' (/usr/local/lib/python3.11/dist-packages/ragas/embeddings/__init__.py)
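Unlike the metrics above, Response Relevancy needs an embedding model as well as an LLM: RAGAS asks the evaluator LLM to write a few questions that the response would answer, embeds them, and averages their cosine similarity to the original `user_input`. That is why the metric is normally constructed as `ResponseRelevancy(llm=evaluator_llm, embeddings=evaluator_embeddings)`. The sketch below reproduces only the scoring step on made-up 3-D vectors; `response_relevancy` is a hypothetical helper, and the vectors stand in for the ones `evaluator_embeddings` would produce.

```python
import numpy as np


def response_relevancy(question_embedding, generated_question_embeddings):
    """Mean cosine similarity between the user question and questions generated from the answer."""
    q = np.asarray(question_embedding, dtype=float)
    g = np.asarray(generated_question_embeddings, dtype=float)
    sims = (g @ q) / (np.linalg.norm(g, axis=1) * np.linalg.norm(q))
    return float(sims.mean())


# Made-up embeddings: one generated question identical to the original,
# one partly related, one off-topic
question = [1.0, 0.0, 0.0]
generated = [[1.0, 0.0, 0.0],
             [1.0, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
print(round(response_relevancy(question, generated), 3))  # 0.569
```

An answer that drifts off-topic produces generated questions that point away from the original one, which pulls the average down.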