# AI Engineering Cohort#4 Midterm Notebook Playground

## Install Packages

### NOTE - May need to pin langchain_core version

In [None]:
# NOTE!!!
# May need to pin version: langchain_core==0.2.38
!pip install -U -q langchain langchain-openai langchain_core==0.2.38 langchain-community langchainhub langchain-qdrant langchain_huggingface langchain-text-splitters

In [None]:
!pip install -qU openai

In [None]:
!pip install -qU ragas

In [None]:
!pip install -qU qdrant-client pymupdf pandas

In [None]:
!pip install -qU faiss-cpu unstructured==0.15.7 python-pptx==1.0.2 nltk==3.9.1

#### Note - pin the version of pyarrow

In [None]:
# !pip uninstall -y pyarrow
!pip install -qU sentence_transformers datasets pyarrow==14.0.1

## Imports and API Keys

In [1]:
import os
import openai
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key here: ")

In [None]:
from operator import itemgetter
import pandas as pd
from typing import List

from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.documents import Document

from langchain_community.document_loaders import PyMuPDFLoader

from datasets import Dataset

from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, answer_correctness, context_recall, context_precision
from ragas.testset.evolutions import simple, reasoning, multi_context

from myutils.rag_pipeline_utils import SimpleTextSplitter, SemanticTextSplitter, VectorStore, AdvancedRetriever
from myutils.ragas_pipeline import RagasPipeline

In [3]:
from sentence_transformers import SentenceTransformer

from torch.utils.data import DataLoader
from torch.utils.data import Dataset as TorchDataset  # aliased so it does not shadow `datasets.Dataset` imported above
from sentence_transformers import InputExample

In [4]:
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

In [5]:
from sentence_transformers.evaluation import InformationRetrievalEvaluator

In [6]:
from langchain_huggingface import HuggingFaceEmbeddings

In [7]:
import pandas as pd

from langchain_community.vectorstores import FAISS
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_core.documents import Document

In [8]:
import nest_asyncio

nest_asyncio.apply()

## STEP 1 - Load the Documents

#### Make a local copy of the two PDFs needed for this exercise

In [None]:
# !wget https://www.whitehouse.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf -O ./data/docs_for_rag/Blueprint-for-an-AI-Bill-of-Rights.pdf

In [None]:

# !wget https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf -O ./data/docs_for_rag/NIST.AI.600-1.pdf

#### Load PDFs into LangChain Documents

In [9]:
pdf_file_paths = [
    './data/docs_for_rag/Blueprint-for-an-AI-Bill-of-Rights.pdf',
    './data/docs_for_rag/NIST.AI.600-1.pdf'
]

In [None]:

from myutils.rag_pipeline_utils import load_all_pdfs

documents = load_all_pdfs(pdf_file_paths)

#### Quick Overview of Documents

a.  2022: Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People

This is really two documents in one. The first sets up five principles and practices. The second, labeled a technical companion, expands on each principle and on how to operationalize it: each principle is restated, followed by an explanation of why it is important, what should be expected of automated systems in following it, and examples of how it can move into practice.


b.  2024: National Institute of Standards and Technology (NIST) Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST AI 600-1)

The first part describes the risks, as well as the characteristics of trustworthy AI that help mitigate them. The second part, in tabular form, describes mitigation actions for each risk; each risk is identified in the table by a serial number based on the first part of the document rather than by its actual name.

#### Chunking Strategy

Chunking strategies should account for the semantics of each document, as well as for the strong connections between its first and second parts. This applies to both documents in this assignment.

I will examine two alternatives:

(a) BASELINE: use the Swiss-army-knife chunking approach: RecursiveCharacterTextSplitter

(b) ADVANCED: Semantic Chunking



WHY I CHOSE THESE TWO CHUNKING STRATEGIES
1. RecursiveCharacterTextSplitter: if chunk_size and chunk_overlap are set to reasonable values, this approach is surprisingly effective across a range of document content. It is cost-effective and relatively easy to tune if needed, and it is well suited to answering queries that are SIMPLE as well as those that require MULTI-CONTEXT.


2. Semantic chunking has great appeal because it groups contiguous, semantically similar content into a single chunk. As a result, chunk sizes may be rather uneven. Advantage: it avoids artificially splitting very similar content across multiple chunks, which would make the retriever work harder and/or cause it to miss relevant context. Downside: it is less cost-effective, because every sentence must first be embedded with an embedding model during chunking (OpenAI `text-embedding-3-large` in this notebook). It is likely to perform well for MULTI-CONTEXT queries and potentially for queries that require REASONING.
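The core idea behind semantic chunking can be sketched in a few lines: embed each sentence, measure the cosine distance between neighbouring sentences, and start a new chunk wherever that distance is unusually large. This is a minimal, stdlib-only sketch under stated assumptions: the toy 2-D vectors stand in for real sentence embeddings, and the breakpoint rule (mean distance plus 1.5 × the interquartile range) is my approximation of the `"interquartile"` threshold passed to `SemanticTextSplitter` below, not a copy of LangChain's `SemanticChunker` code.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def semantic_chunks(sentences: list[str], embeddings: list[list[float]],
                    iqr_factor: float = 1.5) -> list[str]:
    """Group consecutive sentences; split where the jump to the next sentence is an outlier."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    # distance between each sentence and the one that follows it
    distances = [cosine_distance(embeddings[i], embeddings[i + 1])
                 for i in range(len(sentences) - 1)]
    q1, q3 = percentile(distances, 25), percentile(distances, 75)
    threshold = sum(distances) / len(distances) + iqr_factor * (q3 - q1)
    chunks, current = [], [sentences[0]]
    for i, dist in enumerate(distances):
        if dist > threshold:  # topic shift: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks

# toy example: four sentences near [1, 0] (topic A), then four near [0, 1] (topic B)
sentences = ["A1.", "A2.", "A3.", "A4.", "B1.", "B2.", "B3.", "B4."]
embeddings = [[1, 0], [1, 0.05], [1, 0.02], [1, 0.04],
              [0, 1], [0.05, 1], [0.02, 1], [0.04, 1]]
print(semantic_chunks(sentences, embeddings))
```

Because the threshold is derived from the distribution of distances within each document, chunk boundaries adapt to the text rather than to a fixed size, which is exactly why the resulting chunk sizes can be uneven.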

#### Formulate and Load My Test Questions

In [11]:
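def load_test_questions(filename):
    """
    Loads a text file with one question per line

    Input
        name of file which contains a set of questions to test the RAG pipeline

    Output
        List of questions (blank lines are skipped)
    """
    with open(filename, encoding='utf-8') as f:
        # strip whitespace and drop empty lines (e.g. a trailing newline at end of file)
        all_q_list = [line.strip() for line in f if line.strip()]
    return all_q_list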

In [None]:
my_test_questions = load_test_questions(filename='./data/rag_questions_and_answers/my_test_questions.txt')
my_test_questions

## STEP 2 - Quick End-to-end Prototype RAG

#### Set Up RAG Template and RAG Prompt
> NOTE that the RAG template and RAG Prompt below will be used throughout this exercise

In [13]:
rag_template = """
Use the provided context to answer the following question.
If you can't answer the question based on the context, say you don't know.

Question:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(template=rag_template)
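`ChatPromptTemplate.from_template` treats `{question}` and `{context}` as f-string-style placeholders. The sketch below uses plain `str.format` to show roughly what the chat model receives once retrieved chunks are joined into the context. The `build_rag_prompt` helper and the blank-line separator between chunks are hypothetical, since the chain construction inside `get_vibe_check_on_list_of_questions` lives in my module and is not shown here.

```python
# same wording as rag_template above, copied so this cell is self-contained
prompt_template_text = """
Use the provided context to answer the following question.
If you can't answer the question based on the context, say you don't know.

Question:
{question}

Context:
{context}
"""

def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Hypothetical helper: join retrieved chunk texts and fill the template."""
    context = "\n\n".join(retrieved_chunks)
    return prompt_template_text.format(question=question, context=context)

print(build_rag_prompt("What is algorithmic discrimination?",
                       ["First retrieved chunk.", "Second retrieved chunk."]))
```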

#### Set Up OpenAI Embeddings and Chat Model For Use in Prototype and for Comparison Throughout This Exercise

In [14]:
openai_embeddings_small = OpenAIEmbeddings(model='text-embedding-3-small')
openai_embeddings_small_dimension = 1536

openai_embeddings_small_context_window = 8191

openai_chat_gpt4omini = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

In [15]:
# Use the large embeddings in Semantic Chunking below!!!
openai_embeddings_large = OpenAIEmbeddings(model='text-embedding-3-large')
openai_embeddings_large_dimension = 3072

openai_embeddings_large_context_window = 8191

# Set up the more performant chat model in case I decide to use it later...
openai_chat_gpt4o = ChatOpenAI(model_name="gpt-4o", temperature=0)

#### Load Snowflake-arctic-embed-m Model (Will be Finetuned Later in The Exercise)

In [41]:
from sentence_transformers import SentenceTransformer

model_id = "Snowflake/snowflake-arctic-embed-m"
model = SentenceTransformer(model_id)

In [42]:
arctic_original_embeddings = HuggingFaceEmbeddings(model_name="Snowflake/snowflake-arctic-embed-m")
arctic_original_embeddings_dimension = 768
arctic_original_context_window_in_tokens = 512

In [44]:
long_model_id = "Snowflake/snowflake-arctic-embed-m-long"
# load the long-context model; it ships custom modeling code, so trust_remote_code=True is required
long_model = SentenceTransformer(long_model_id, trust_remote_code=True)

In [None]:
arctic_long_original_embeddings = HuggingFaceEmbeddings(model_name=long_model_id,
                                                        model_kwargs={"trust_remote_code": True})
arctic_long_original_embeddings_dimension = 768
arctic_long_original_context_window_in_tokens = 2048

#### Chunk Documents Using Recursive Character Text Splitting

In [18]:
chunk_size = 1000
chunk_overlap = 300

# instantiate baseline text splitter -
# NOTE!!! The `SimpleTextSplitter` below is my wrapper around Langchain RecursiveCharacterTextSplitter!!!!
# (see module for the code if needed)
baseline_text_splitter = \
    SimpleTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap, documents=documents)

# split text for baseline case
baseline_text_splits = baseline_text_splitter.split_text()

In [None]:
len(baseline_text_splits)
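The count above can be sanity-checked with simple arithmetic. Ignoring the separator-aware logic of `RecursiveCharacterTextSplitter` (which prefers to break on paragraphs, then lines, then words), each new chunk advances by roughly `chunk_size - chunk_overlap = 1000 - 300 = 700` characters, so a text of N characters yields about ceil((N - 300) / 700) chunks. The fixed-window splitter below is a deliberate simplification to illustrate that stride, not how the real splitter chooses boundaries; real counts will be somewhat higher because separator-aligned chunks are usually shorter than `chunk_size`.

```python
import math

def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character windows that overlap by chunk_overlap characters.

    A simplification: RecursiveCharacterTextSplitter also respects separators
    such as paragraph breaks, so its chunks are usually shorter than chunk_size.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    stride = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
        start += stride
    return chunks

def expected_chunk_count(n_chars: int, chunk_size: int, chunk_overlap: int) -> int:
    """Closed-form chunk count for the sliding window above."""
    if n_chars <= chunk_size:
        return 1 if n_chars > 0 else 0
    return math.ceil((n_chars - chunk_overlap) / (chunk_size - chunk_overlap))

text = "x" * 2400
chunks = sliding_window_chunks(text, chunk_size=1000, chunk_overlap=300)
print(len(chunks), expected_chunk_count(len(text), 1000, 300))
```

The same arithmetic explains the later re-chunking with `chunk_size=600` and `chunk_overlap=200`: the stride drops to 400 characters, so the number of chunks grows noticeably.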

#### Chunk Documents Using Semantic Chunking - NOTE Using OpenAI Embeddings Large

In [None]:
# instantiate semantic text splitter
#  NOTE!!!! SemanticTextSplitter is my wrapper around Langchain SemanticChunker
#  see my module for code if needed
# NOTE!!! I use openai large embeddings model to get the best possible representation of the semantics of sentences
# and to ensure high-quality semantic chunking
sem_text_splitter = \
    SemanticTextSplitter(llm_embeddings=openai_embeddings_large, threshold_type="interquartile", documents=documents)

# split text for semantic-chunking case
sem_text_splits = sem_text_splitter.split_text()
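Because semantic chunks can be rather uneven in size, it is worth checking how many of them exceed the embedding model's context window: `snowflake-arctic-embed-m` accepts 512 tokens (`arctic_original_context_window_in_tokens`), and sentence-transformers truncates longer inputs, so the tail of an overlong chunk would not influence its embedding. The sketch below relies on the rough rule of thumb of about 4 characters per English token (an assumption; the model's own tokenizer gives exact counts) and assumes each split is a LangChain `Document` with a `page_content` attribute. By the same heuristic, a 1000-character baseline chunk is only about 250 tokens.

```python
def find_overlong_chunks(chunk_texts: list[str], context_window_tokens: int,
                         chars_per_token: float = 4.0) -> list[int]:
    """Return indices of chunks whose *estimated* token count exceeds the window.

    chars_per_token=4.0 is a rough English-text heuristic, not a real tokenizer.
    """
    return [i for i, text in enumerate(chunk_texts)
            if len(text) / chars_per_token > context_window_tokens]

# Usage in this notebook (assumes each split is a LangChain Document):
# sem_texts = [doc.page_content for doc in sem_text_splits]
# overlong = find_overlong_chunks(sem_texts, arctic_original_context_window_in_tokens)
# print(f"{len(overlong)} of {len(sem_texts)} semantic chunks may be truncated")
```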

#### Vibe Check on My Test Questions - Read This First!!!

NOTE:  Four RAG Pipelines are run below!!!  These are:

1.  `Demo_Baseline_OpenAI`: This uses baseline chunking (`RecursiveCharacterTextSplitter`) and OpenAI embeddings as a Demo.

2.  `Demo_Semantic_OpenAI`: uses semantic chunking (`SemanticChunker`) and OpenAI embeddings as a Demo.

3.  `Baseline_Arctic_Original`: uses baseline chunking and `Snowflake/snowflake-arctic-embed-m` model embeddings.

4.  `Semantic_Arctic_Original`: uses semantic chunking and `Snowflake/snowflake-arctic-embed-m` model embeddings.

NOTE!!!
Later in this notebook, I will finetune the `Snowflake/snowflake-arctic-embed-m` model embeddings and will then compare the finetuned embeddings from this model against runs 3. and 4. above.


In [21]:
from myutils.rag_pipeline_utils import get_vibe_check_on_list_of_questions

In [None]:
baseline_openai_retrieval_chain, baseline_openai_q_and_a = \
    get_vibe_check_on_list_of_questions(collection_name="Demo_Baseline_OpenAI",
                                        embeddings=openai_embeddings_small,  # <- openai embeddings
                                        embed_dim=openai_embeddings_small_dimension,
                                        prompt=rag_prompt,
                                        llm=openai_chat_gpt4omini,
                                        text_splits=baseline_text_splits, # <- baseline chunking
                                        list_of_questions=my_test_questions)

In [None]:
sem_openai_retrieval_chain, sem_openai_q_and_a = \
    get_vibe_check_on_list_of_questions(collection_name="Demo_Semantic_OpenAI",
                                        embeddings=openai_embeddings_small, # <- openai embeddings
                                        embed_dim=openai_embeddings_small_dimension,
                                        prompt=rag_prompt,
                                        llm=openai_chat_gpt4omini,
                                        text_splits=sem_text_splits, # <- semantic chunking
                                        list_of_questions=my_test_questions)

In [None]:
baseline_arctic_original_retrieval_chain, baseline_arctic_original_q_and_a = \
    get_vibe_check_on_list_of_questions(collection_name="Baseline_Arctic_Original",
                                        embeddings=arctic_original_embeddings, # <- arctic original embeddings
                                        embed_dim=arctic_original_embeddings_dimension,
                                        prompt=rag_prompt,
                                        llm=openai_chat_gpt4omini,
                                        text_splits=baseline_text_splits, # <- baseline chunking
                                        list_of_questions=my_test_questions)

In [None]:
baseline_arctic_original_retrieval_chain.invoke({'question': 'What rights do I have to ensure protection against algorithmic discrimination?'})

In [None]:
sem_arctic_original_retrieval_chain, sem_arctic_original_q_and_a = \
    get_vibe_check_on_list_of_questions(collection_name="Semantic_Arctic_Original",
                                        embeddings=arctic_original_embeddings, # <- arctic original embeddings
                                        embed_dim=arctic_original_embeddings_dimension,
                                        prompt=rag_prompt,
                                        llm=openai_chat_gpt4omini,
                                        text_splits=sem_text_splits, # <- semantic chunking
                                        list_of_questions=my_test_questions)

In [None]:
sem_arctic_original_retrieval_chain.invoke({'question': 'What rights do I have to ensure protection against algorithmic discrimination?'})

#### Save Test Questions and Answers in File

In [26]:
import pandas as pd
from pathlib import Path

def save_df_to_csv(q_a_data, csvfilename):
    qa_df = pd.DataFrame(q_a_data, 
                         columns=['questions', 'answers'])
    
    filepath = Path(csvfilename)
    filepath.parent.mkdir(parents=True, exist_ok=True)
    qa_df.to_csv(filepath, index=False)
    return


save_df_to_csv(baseline_openai_q_and_a, 
               csvfilename='./data/rag_questions_and_answers/baseline_openai_test_q_and_a.csv')

save_df_to_csv(sem_openai_q_and_a, 
               csvfilename='./data/rag_questions_and_answers/sem_openai_test_q_and_a.csv')

save_df_to_csv(baseline_arctic_original_q_and_a, 
               csvfilename='./data/rag_questions_and_answers/baseline_arctic_original_test_q_and_a.csv')

save_df_to_csv(sem_arctic_original_q_and_a, 
               csvfilename='./data/rag_questions_and_answers/sem_arctic_original_test_q_and_a.csv')

## STEP 3 - Synthetically Generate Test Questions Using the RAGAS Pipeline

#### Set Up RAGAS Pipeline Parameters

In [27]:
# LLM models used in RAGAS pipeline
ragas_generator_llm_model = 'gpt-3.5-turbo'
ragas_critic_llm_model = 'gpt-4o-mini'

# embeddings used for RAGAS pipeline
ragas_openai_embeddings_model = 'text-embedding-3-small'

# text splitter params
ragas_chunk_size = 1500
ragas_chunk_overlap = 500

# number of qa pairs needed - reduce if running into rate limit issues
ragas_number_of_qa_pairs = 20

# initialize distributions - desired distribution of question types
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

# name of file to persist RAGAS Q&A on disk
ragas_testset_filename = "./data/rag_questions_and_answers/ragas_questions_and_answers.csv"
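With `ragas_number_of_qa_pairs = 20` and the weights above, the target mix works out to 20 × 0.5 = 10 simple, 20 × 0.4 = 8 multi-context, and 20 × 0.1 = 2 reasoning questions. The helper below computes such a split with largest-remainder rounding so the counts always sum to the requested total. It is an illustrative sketch: string keys stand in for the RAGAS evolution objects, and RAGAS's own generator may not hit these counts exactly.

```python
import math

def allocate_question_counts(total: int, distribution: dict[str, float]) -> dict[str, int]:
    """Split `total` across question types in proportion to `distribution`.

    Uses largest-remainder rounding so the counts always add up to `total`.
    """
    if not math.isclose(sum(distribution.values()), 1.0):
        raise ValueError("distribution weights must sum to 1")
    raw = {k: total * w for k, w in distribution.items()}
    counts = {k: math.floor(v + 1e-9) for k, v in raw.items()}  # epsilon guards float error
    leftover = total - sum(counts.values())
    # hand any remaining questions to the types with the largest fractional parts
    by_remainder = sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts

print(allocate_question_counts(20, {"simple": 0.5, "multi_context": 0.4, "reasoning": 0.1}))
```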

In [28]:
# FLAG TO INDICATE IF RAGAS TESTSET SHOULD BE GENERATED IN THIS RUN
# IF it is run, note the cost and time estimate below!!!
generate_ragas_testset_now = False

In [29]:
# set up list of RAGAS metrics used below
ragas_metrics = [
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall
]
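Of these four metrics, `context_precision` and `context_recall` score the retrieval step, while `faithfulness` and `answer_relevancy` score the generated answer. As a rough illustration of what the retriever is rewarded for, the RAGAS documentation describes context precision as a rank-aware average of precision@k taken at the positions of relevant chunks. The sketch below computes that quantity from binary relevance labels; in RAGAS the labels come from an LLM judge, whereas here they are supplied by hand, so treat it as an approximation of the metric's definition rather than RAGAS's implementation.

```python
def context_precision(relevance: list[int]) -> float:
    """Rank-aware precision for one question.

    relevance[k] is 1 if the chunk retrieved at rank k+1 is relevant, else 0.
    Returns the mean of precision@k taken only at ranks holding a relevant chunk.
    """
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    score, hits = 0.0, 0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            score += hits / k  # precision@k at this relevant position
    return score / total_relevant

print(context_precision([1, 0, 1]))  # (1/1 + 2/3) / 2, about 0.833
print(context_precision([0, 0, 1]))  # (1/3) / 1, about 0.333
```

The same relevant chunk scores lower when it is ranked further down, which is why this metric is sensitive to the embedding model's ranking quality and not just to whether the right chunk was retrieved at all.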

#### Instantiate RAGAS Pipeline, Run Pipeline, Generate Test Questions


In [30]:
# NOTE - this cell will incur significant cost due to SDG's use of OpenAI models
# Time taken on my local machine: ~ 15 mins

ragas_pipeline = RagasPipeline(
        generator_llm_model=ragas_generator_llm_model,
        critic_llm_model=ragas_critic_llm_model,
        embedding_model=ragas_openai_embeddings_model,
        number_of_qa_pairs=ragas_number_of_qa_pairs,
        chunk_size=ragas_chunk_size,
        chunk_overlap=ragas_chunk_overlap,
        documents=documents,
        distributions=distributions
)

In [31]:

if generate_ragas_testset_now:
    ragas_testset_df = ragas_pipeline.generate_testset()
    ragas_testset_df.to_csv(ragas_testset_filename)

#### Load RAGAS Q&A from disk

In [32]:
ragas_test_df = pd.read_csv(ragas_testset_filename)
ragas_test_questions = ragas_test_df["question"].values.tolist()
ragas_test_groundtruths = ragas_test_df["ground_truth"].values.tolist()

#### Evaluate RAG Pipeline Using RAGAS Generated Synthetic Questions

In [None]:
baseline_openai_results, baseline_openai_results_df = \
    ragas_pipeline.ragas_eval_of_rag_pipeline(baseline_openai_retrieval_chain, # <- baseline chunking + openai embeddings
                                              ragas_test_questions, 
                                              ragas_test_groundtruths, 
                                              ragas_metrics)

In [None]:
sem_openai_results, sem_openai_results_df = \
    ragas_pipeline.ragas_eval_of_rag_pipeline(sem_openai_retrieval_chain, # <- semantic chunking + openai embeddings
                                              ragas_test_questions, 
                                              ragas_test_groundtruths, 
                                              ragas_metrics)

In [None]:
baseline_arctic_original_results, baseline_arctic_original_results_df = \
    ragas_pipeline.ragas_eval_of_rag_pipeline(baseline_arctic_original_retrieval_chain, # <- baseline chunking + arctic orig embeddings
                                              ragas_test_questions, 
                                              ragas_test_groundtruths, 
                                              ragas_metrics)

In [None]:
sem_arctic_original_results, sem_arctic_original_results_df = \
    ragas_pipeline.ragas_eval_of_rag_pipeline(sem_arctic_original_retrieval_chain, # <- semantic chunking + arctic orig embeddings
                                              ragas_test_questions, 
                                              ragas_test_groundtruths, 
                                              ragas_metrics)

#### Compare The Results

In [None]:
df_baseline_openai = pd.DataFrame(list(baseline_openai_results.items()), columns=['Metric', 'BaselineChunkOpenAI'])
df_sem_openai = pd.DataFrame(list(sem_openai_results.items()), columns=['Metric', 'SemanticChunkOpenAI'])
df_merged_openai = pd.merge(df_baseline_openai, df_sem_openai, on='Metric')

df_baseline_arctic_original = pd.DataFrame(list(baseline_arctic_original_results.items()), columns=['Metric', 'BaselineChunkArcticOrig'])
df_sem_arctic_original = pd.DataFrame(list(sem_arctic_original_results.items()), columns=['Metric', 'SemanticChunkArcticOrig'])
df_merged_arctic_original = pd.merge(df_baseline_arctic_original, df_sem_arctic_original, on='Metric')

df_all_merged = pd.merge(df_merged_openai, df_merged_arctic_original, on='Metric')

df_all_merged

## Analysis of Results - Needs Updating!!!!

1.  The results with `Semantic Chunking` seem to be dramatically improved in `context_recall` (a `RETRIEVAL`-focused metric) and in `answer_relevancy` (which scores how relevant the generated answer is to the question).

2.  Even in measures like `faithfulness` that primarily assess the generation part of the pipeline, the results seem quite improved.

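3.  Given the other results, I would have expected `answer_correctness` to be higher (note that `answer_correctness` must be added to the `ragas_metrics` list above to be reported). It would be useful to dig into the differences between its factual-similarity and semantic-similarity components.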

## STEP 4 - Fine-tuning Embeddings for RAG

In [21]:
from myutils.finetuning import PrepareDataForFinetuning, FineTuneModelAndEvaluateRetriever

In [22]:
pdft = PrepareDataForFinetuning(all_splits=baseline_text_splits,
                                train_val_test_fraction=[0.80, 0.10, 0.10],
                                train_val_test_split_type='random',
                                random_seed=69,
                                qa_chat_model_name='gpt-4o-mini',
                                n_questions=3,
                                batch_size=64)
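`PrepareDataForFinetuning` lives in my `myutils.finetuning` module and its internals are not shown here. As a minimal sketch of what a `'random'` 80/10/10 split with a fixed seed amounts to: shuffle the chunks once with a seeded RNG and slice. The function name and the shuffle-then-slice approach are assumptions for illustration. With `n_questions=3`, each chunk presumably yields up to three synthetic questions, so the training set would hold roughly three times as many (question, chunk) pairs as training chunks.

```python
import random

def random_train_val_test_split(items: list, fractions=(0.80, 0.10, 0.10), seed: int = 69):
    """Shuffle once with a seeded RNG, then slice into train/val/test.

    Hypothetical helper; it only mirrors the idea behind train_val_test_fraction.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global random state
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to test
    return train, val, test

train, val, test = random_train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))
```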

In [None]:
pdft.run_all_prep_data()

In [None]:
evr = FineTuneModelAndEvaluateRetriever(train_data=pdft.train_dataset,
                                        val_data=pdft.val_dataset,
                                        test_data=pdft.test_dataset,
                                        batch_size=64,
                                        base_model_id='Snowflake/snowflake-arctic-embed-m',
                                        matryoshka_dimensions=[768, 512, 256, 128, 64],
                                        number_of_training_epochs=5,
                                        finetuned_model_output_path='finetuned_arctic',
                                        evaluation_steps=50)
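`matryoshka_dimensions=[768, 512, 256, 128, 64]` asks training (via `MatryoshkaLoss`, imported at the top) to make the leading dimensions of each embedding useful on their own, so a 768-d vector can later be cut down to, say, its first 256 values to save storage and search time. The stdlib sketch below shows only that inference-time step (truncate, then re-normalise before computing cosine similarity); the toy vectors are made up, and how much retrieval quality survives truncation for this model is something to measure, not assume. Recent sentence-transformers versions also expose a `truncate_dim` argument on `SentenceTransformer` for the same purpose.

```python
import math

def truncate_and_normalize(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` values and rescale to unit L2 norm."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # for unit-length vectors, cosine similarity is just the dot product
    return sum(x * y for x, y in zip(a, b))

query = [0.6, 0.8, 0.1, -0.2]
doc = [0.5, 0.7, -0.3, 0.4]
for dim in (4, 2):
    q, d = truncate_and_normalize(query, dim), truncate_and_normalize(doc, dim)
    print(dim, round(cosine_similarity(q, d), 3))
```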

In [None]:
evr.run_steps_to_finetune_model()
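`run_steps_to_finetune_model` is a wrapper whose code is not shown; given the `MatryoshkaLoss` and `MultipleNegativesRankingLoss` imports at the top, it presumably wraps the latter in the former. The key idea of `MultipleNegativesRankingLoss` is that, within a batch of (question, chunk) pairs, every other chunk acts as a negative for a given question, which is one reason a larger `batch_size` (64 here) tends to help. Below is a plain-Python sketch of the loss for one batch, assuming cosine similarity scaled by 20 (the sentence-transformers default) and cross-entropy with the matching chunk as the target; it is not the library's implementation.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnrl_loss(anchors: list[list[float]], positives: list[list[float]],
              scale: float = 20.0) -> float:
    """In-batch negatives ranking loss: positives[i] is the match for anchors[i];
    every positives[j] with j != i serves as a negative for anchors[i]."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [scale * cosine(anchors[i], positives[j]) for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / n

questions = [[1.0, 0.0], [0.0, 1.0]]
print(mnrl_loss(questions, [[0.9, 0.1], [0.1, 0.9]]))  # matched pairs: low loss
print(mnrl_loss(questions, [[0.1, 0.9], [0.9, 0.1]]))  # swapped pairs: high loss
```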

In [None]:
arctic_finetuned_model = SentenceTransformer('finetuned_arctic')

In [None]:
arctic_finetuned_model.push_to_hub("vincha77/finetuned_arctic")

In [1]:
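# pull the fine-tuned model from the Hugging Face Hub (import repeated in case the kernel was restarted)
from sentence_transformers import SentenceTransformer

model_id = "vincha77/finetuned_arctic"
arctic_finetuned_model = SentenceTransformer(model_id)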

In [None]:
arctic_finetuned_embeddings = HuggingFaceEmbeddings(model_name="vincha77/finetuned_arctic")
arctic_finetuned_embeddings_dimension = 768
arctic_finetuned_context_window_in_tokens = 512

In [None]:
te3_results = evr.evaluate_embeddings_model(openai_embeddings_small, top_k_for_retrieval=5)

te3_results_df = pd.DataFrame(te3_results)

te3_hit_rate = te3_results_df["is_hit"].mean()
te3_hit_rate

In [None]:
arctic_embed_m_results = evr.evaluate_embeddings_model(arctic_original_embeddings, top_k_for_retrieval=5)

arctic_embed_m_results_df = pd.DataFrame(arctic_embed_m_results)

arctic_embed_m_hit_rate = arctic_embed_m_results_df["is_hit"].mean()
arctic_embed_m_hit_rate

In [None]:
finetuned_results = evr.evaluate_embeddings_model(arctic_finetuned_embeddings, top_k_for_retrieval=5)

finetuned_results_df = pd.DataFrame(finetuned_results)

finetuned_hit_rate = finetuned_results_df["is_hit"].mean()
finetuned_hit_rate
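The `is_hit` column comes from `evaluate_embeddings_model`, whose code is in my module. Assuming it follows the usual hit-rate definition (a question counts as a hit when the chunk it was generated from appears among the `top_k_for_retrieval` retrieved chunks), the hit rate is simply the fraction of questions that are hits, which is what `.mean()` on a boolean column computes. A stdlib sketch with made-up chunk IDs:

```python
def is_hit(retrieved_ids: list[str], source_id: str, top_k: int = 5) -> bool:
    """True if the chunk the question was generated from is in the top_k results."""
    return source_id in retrieved_ids[:top_k]

def hit_rate(results: list[tuple[list[str], str]], top_k: int = 5) -> float:
    """Fraction of (retrieved_ids, source_id) pairs that are hits."""
    if not results:
        return 0.0
    return sum(is_hit(ids, src, top_k) for ids, src in results) / len(results)

# hypothetical chunk IDs, for illustration only
example = [
    (["c7", "c2", "c9", "c1", "c4"], "c2"),  # hit at rank 2
    (["c3", "c8", "c5", "c6", "c0"], "c1"),  # miss
    (["c1", "c2", "c3", "c4", "c5"], "c5"),  # hit at rank 5
    (["c9", "c8", "c7", "c6", "c5"], "c4"),  # miss
]
print(hit_rate(example))  # 0.5
```

Hit rate ignores where in the top k the source chunk appears, so it complements rank-aware metrics such as RAGAS `context_precision`.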

## Vibe Checking the RAG Pipeline

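Now that the embedding model has been fine-tuned, I re-run the vibe check on my test questions, this time with smaller baseline chunks, and compare the original and fine-tuned `snowflake-arctic-embed-m` embeddings.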

#### Chunk Documents Using Recursive Character Text Splitting

In [None]:
new_chunk_size = 600
new_chunk_overlap = 200

# instantiate baseline text splitter -
# NOTE!!! The `SimpleTextSplitter` below is my wrapper around Langchain RecursiveCharacterTextSplitter!!!!
# (see module for the code if needed)
new_baseline_text_splitter = \
    SimpleTextSplitter(chunk_size=new_chunk_size, chunk_overlap=new_chunk_overlap, documents=documents)

# split text for baseline case
new_baseline_text_splits = new_baseline_text_splitter.split_text()

In [None]:
len(new_baseline_text_splits)

#### Chunk Documents Using Semantic Chunking - NOTE Using OpenAI Embeddings Large

In [None]:
# instantiate semantic text splitter
#  NOTE!!!! SemanticTextSplitter is my wrapper around Langchain SemanticChunker
#  see my module for code if needed
# NOTE!!! I use openai large embeddings model to get the best possible representation of the semantics of sentences
# and to ensure high-quality semantic chunking
new_sem_text_splitter = \
    SemanticTextSplitter(llm_embeddings=openai_embeddings_large, threshold_type="interquartile", documents=documents)

# split text for semantic-chunking case
new_sem_text_splits = new_sem_text_splitter.split_text()

#### Vibe Check on My Test Questions with Fine-tuned Embeddings - Read This First!!!

NOTE:  Four RAG Pipelines are run below, all using the NEW baseline chunks or the NEW semantic chunks!!!  These are:

1.  `Baseline_Arctic_Original`: uses baseline chunking and the original `Snowflake/snowflake-arctic-embed-m` model embeddings.

2.  `Baseline_Arctic_Finetuned`: uses baseline chunking and the fine-tuned `vincha77/finetuned_arctic` model embeddings.

3.  `Semantic_Arctic_Original`: uses semantic chunking and the original `Snowflake/snowflake-arctic-embed-m` model embeddings.

4.  `Semantic_Arctic_Finetuned`: uses semantic chunking and the fine-tuned `vincha77/finetuned_arctic` model embeddings.



In [None]:
new_baseline_arctic_original_retrieval_chain, new_baseline_arctic_original_q_and_a = \
    get_vibe_check_on_list_of_questions(collection_name="Baseline_Arctic_Original",
                                        embeddings=arctic_original_embeddings, # <- arctic original embeddings
                                        embed_dim=arctic_original_embeddings_dimension,
                                        prompt=rag_prompt,
                                        llm=openai_chat_gpt4omini,
                                        text_splits=new_baseline_text_splits, # <- NEW baseline chunking
                                        list_of_questions=my_test_questions)

In [None]:
new_baseline_arctic_finetuned_retrieval_chain, new_baseline_arctic_finetuned_q_and_a = \
    get_vibe_check_on_list_of_questions(collection_name="Baseline_Arctic_Finetuned",
                                        embeddings=arctic_finetuned_embeddings, # <- arctic finetuned embeddings
                                        embed_dim=arctic_finetuned_embeddings_dimension,
                                        prompt=rag_prompt,
                                        llm=openai_chat_gpt4omini,
                                        text_splits=new_baseline_text_splits, # <- NEW baseline chunking
                                        list_of_questions=my_test_questions)

In [None]:
new_sem_arctic_original_retrieval_chain, new_sem_arctic_original_q_and_a = \
    get_vibe_check_on_list_of_questions(collection_name="Semantic_Arctic_Original",
                                        embeddings=arctic_original_embeddings, # <- arctic original embeddings
                                        embed_dim=arctic_original_embeddings_dimension,
                                        prompt=rag_prompt,
                                        llm=openai_chat_gpt4omini,
                                        text_splits=new_sem_text_splits, # <- NEW semantic chunking
                                        list_of_questions=my_test_questions)

In [None]:
new_sem_arctic_finetuned_retrieval_chain, new_sem_arctic_finetuned_q_and_a = \
    get_vibe_check_on_list_of_questions(collection_name="Semantic_Arctic_Finetuned",
                                        embeddings=arctic_finetuned_embeddings, # <- arctic finetuned embeddings
                                        embed_dim=arctic_finetuned_embeddings_dimension,
                                        prompt=rag_prompt,
                                        llm=openai_chat_gpt4omini,
                                        text_splits=new_sem_text_splits, # <- NEW semantic chunking
                                        list_of_questions=my_test_questions)

#### Evaluate RAG Pipeline Using RAGAS Generated Synthetic Questions

In [None]:
baseline_arctic_original_results, baseline_arctic_original_results_df = \
    ragas_pipeline.ragas_eval_of_rag_pipeline(new_baseline_arctic_original_retrieval_chain, # <- baseline chunking + arctic orig embeddings
                                              ragas_test_questions, 
                                              ragas_test_groundtruths, 
                                              ragas_metrics)

In [None]:
baseline_arctic_finetuned_results, baseline_arctic_finetuned_results_df = \
    ragas_pipeline.ragas_eval_of_rag_pipeline(new_baseline_arctic_finetuned_retrieval_chain, # <- baseline chunking + arctic finetuned embeddings
                                              ragas_test_questions, 
                                              ragas_test_groundtruths, 
                                              ragas_metrics)

In [None]:
sem_arctic_original_results, sem_arctic_original_results_df = \
    ragas_pipeline.ragas_eval_of_rag_pipeline(new_sem_arctic_original_retrieval_chain, # <- semantic chunking + arctic orig embeddings
                                              ragas_test_questions, 
                                              ragas_test_groundtruths, 
                                              ragas_metrics)

In [None]:
sem_arctic_finetuned_results, sem_arctic_finetuned_results_df = \
    ragas_pipeline.ragas_eval_of_rag_pipeline(new_sem_arctic_finetuned_retrieval_chain, # <- semantic chunking + arctic finetuned embeddings
                                              ragas_test_questions, 
                                              ragas_test_groundtruths, 
                                              ragas_metrics)

#### Compare The Results

In [None]:
df_baseline_arctic_original = pd.DataFrame(list(baseline_arctic_original_results.items()), columns=['Metric', 'BaselineChunkArcticOrig'])
df_baseline_arctic_finetuned = pd.DataFrame(list(baseline_arctic_finetuned_results.items()), columns=['Metric', 'BaselineChunkArcticFinetuned'])
df_merged_arctic_baseline = pd.merge(df_baseline_arctic_original, df_baseline_arctic_finetuned, on='Metric')

df_sem_arctic_original = pd.DataFrame(list(sem_arctic_original_results.items()), columns=['Metric', 'SemanticChunkArcticOrig'])
df_sem_arctic_finetuned = pd.DataFrame(list(sem_arctic_finetuned_results.items()), columns=['Metric', 'SemanticChunkArcticFinetuned'])
df_merged_arctic_sem = pd.merge(df_sem_arctic_original, df_sem_arctic_finetuned, on='Metric')

df_all_merged = pd.merge(df_merged_arctic_baseline, df_merged_arctic_sem, on='Metric')

df_all_merged
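To read the merged table in terms of what fine-tuning changed, it helps to compute finetuned minus original for each metric. A small sketch, assuming the RAGAS result objects behave like `{metric_name: score}` mappings (which is how `.items()` is used above); the scores in the example are placeholders, not results from this notebook.

```python
def metric_deltas(original: dict[str, float], finetuned: dict[str, float]) -> dict[str, float]:
    """Per-metric change (finetuned - original) for metrics present in both runs."""
    return {m: finetuned[m] - original[m] for m in original if m in finetuned}

# placeholder scores for illustration only
orig = {"faithfulness": 0.80, "context_recall": 0.70}
fine = {"faithfulness": 0.82, "context_recall": 0.75}
print({m: round(d, 3) for m, d in metric_deltas(orig, fine).items()})

# usage in this notebook (assumes the results are dict-like):
# metric_deltas(dict(baseline_arctic_original_results), dict(baseline_arctic_finetuned_results))
# metric_deltas(dict(sem_arctic_original_results), dict(sem_arctic_finetuned_results))
```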