# RAG

Components of RAG (retrieval-augmented generation)
1. Data Ingestion
2. Retrieval
3. Response Generation

## Load Environment Variables

In [13]:
from dotenv import load_dotenv
load_dotenv()

True
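`load_dotenv()` returning `True` only says that a `.env` file was found and parsed, not that the keys needed later are present. A minimal sketch of a sanity check follows. The variable names `GOOGLE_API_KEY` and `GROQ_API_KEY` are assumptions based on what the Google and Groq integrations conventionally read, and `missing_keys` is a hypothetical helper, not part of `python-dotenv`.

```python
import os

# Assumed variable names; adjust them to match your own .env file.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY")


def missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv()` surfaces a missing key before the embedding or chat model fails with a less obvious authentication error.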

## Data Ingestion

### Load data

In [1]:
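# The loaders and splitters now live in their own packages; the old
# `langchain.document_loaders` / `langchain.text_splitter` paths are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter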

In [7]:
# Raw string so that backslashes in the Windows path are not read as escape sequences
FILE_PATH = r"D:\STUDY\LLMOPS_KN\document_portal\data\sample.pdf"

In [14]:
loader = PyPDFLoader(FILE_PATH)
documents = loader.load()
print(len(documents))
print(documents[:3])

77
[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'D:\\STUDY\\LLMOPS_KN\\document_portal\\data\\sample.pdf', 'total_pages': 77, 'page': 0, 'page_label': '1'}, page_content='Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗ Louis Martin† Kevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev\

### Chunking

In [15]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # maximum chunk length, as measured by length_function
    chunk_overlap=200,    # characters shared between consecutive chunks
    length_function=len,  # measure chunk length in characters
)
docs = text_splitter.split_documents(documents)
print(len(docs))
print(docs[:3])

342
[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'D:\\STUDY\\LLMOPS_KN\\document_portal\\data\\sample.pdf', 'total_pages': 77, 'page': 0, 'page_label': '1'}, page_content='Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗ Louis Martin† Kevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev

In [18]:
docs[0]

Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'D:\\STUDY\\LLMOPS_KN\\document_portal\\data\\sample.pdf', 'total_pages': 77, 'page': 0, 'page_label': '1'}, page_content='Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗ Louis Martin† Kevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev\nPun

In [19]:
docs[0].page_content

'Llama 2: Open Foundation and Fine-Tuned Chat Models\nHugo Touvron∗ Louis Martin† Kevin Stone†\nPeter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra\nPrajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen\nGuillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller\nCynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou\nHakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev\nPunit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich\nYinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra\nIgor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi\nAlan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic'
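To make the interplay of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch using a fixed-size sliding window over characters. This is not how `RecursiveCharacterTextSplitter` works internally (it splits on separators such as paragraphs and lines first, which is why the first chunk above is shorter than 1000 characters); `sliding_window_chunks` is a hypothetical helper that only illustrates the overlap idea.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.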

### VectorDB Indexing

In [None]:
# load embedding model
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embedding_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [None]:
# index chunks in vector DB (embeds every chunk, then builds an in-memory FAISS index)
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(docs, embedding_model)

In [23]:
vectorstore

<langchain_community.vectorstores.faiss.FAISS at 0x18a5212c6a0>

## Retrieval

In [25]:
vectorstore.similarity_search("What is the llama2 llm model?")

[Document(id='418c9e83-1540-47a0-ba7f-438865fd2628', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'D:\\STUDY\\LLMOPS_KN\\document_portal\\data\\sample.pdf', 'total_pages': 77, 'page': 76, 'page_label': '77'}, page_content='specific applications of the model. Please see the Responsible Use Guide available available at\nhttps://ai.meta.com/llama/responsible-user-guide\nTable 52: Model card forLlama 2.\n77'),
 Document(id='f08d9b8c-1934-4934-a352-a0d1a5da6ffb', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, 

In [26]:
vectorstore.similarity_search("llama2 finetuning benchmark experiments?")

[Document(id='f08d9b8c-1934-4934-a352-a0d1a5da6ffb', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'D:\\STUDY\\LLMOPS_KN\\document_portal\\data\\sample.pdf', 'total_pages': 77, 'page': 4, 'page_label': '5'}, page_content='final learning rate down to 10% of the peak learning rate. We use a weight decay of0.1 and gradient clipping\nof 1.0. Figure 5 (a) shows the training loss forLlama 2with these hyperparameters.\n5'),
 Document(id='47e7e87d-5965-44cb-9955-c6fab6bbbd54', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This i

In [28]:
vectorstore.similarity_search("llama2 finetuning benchmark experiments?", k=2)

[Document(id='f08d9b8c-1934-4934-a352-a0d1a5da6ffb', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'D:\\STUDY\\LLMOPS_KN\\document_portal\\data\\sample.pdf', 'total_pages': 77, 'page': 4, 'page_label': '5'}, page_content='final learning rate down to 10% of the peak learning rate. We use a weight decay of0.1 and gradient clipping\nof 1.0. Figure 5 (a) shows the training loss forLlama 2with these hyperparameters.\n5'),
 Document(id='47e7e87d-5965-44cb-9955-c6fab6bbbd54', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This i
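Conceptually, `similarity_search` embeds the query and returns the `k` stored chunks whose vectors are closest to it (four when `k` is omitted, two in the last call above). The sketch below shows that idea in plain Python on made-up three-dimensional vectors. It assumes Euclidean (L2) distance, which I believe is the default for the LangChain FAISS wrapper, but treat that as an assumption; `l2_distance`, `top_k`, and `toy_index` are hypothetical names.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k=4):
    """Return the k (text, distance) pairs closest to query_vec."""
    scored = [(text, l2_distance(query_vec, vec)) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Made-up "embeddings"; real embedding vectors have hundreds of dimensions.
toy_index = [
    ("pretraining data", [0.9, 0.1, 0.0]),
    ("fine-tuning setup", [0.1, 0.9, 0.1]),
    ("safety evaluation", [0.0, 0.2, 0.9]),
]
results = top_k([0.2, 0.8, 0.0], toy_index, k=2)
print([text for text, _ in results])
# ['fine-tuning setup', 'pretraining data']
```

A real index avoids scoring every vector on every query when the collection is large, but a brute-force scan like this gives the same ranking for small collections.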

## Response Generation

In [46]:
# k must go through search_kwargs; a bare k=10 is not forwarded to the search
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
retriever.invoke("llama2 finetuning benchmark experiments.")

[Document(id='47e7e87d-5965-44cb-9955-c6fab6bbbd54', metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-07-20T00:30:36+00:00', 'author': '', 'keywords': '', 'moddate': '2023-07-20T00:30:36+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'D:\\STUDY\\LLMOPS_KN\\document_portal\\data\\sample.pdf', 'total_pages': 77, 'page': 73, 'page_label': '74'}, page_content='Llama 1\n7B 0.27 0.26 0.34 0.54 0.36 0.39 0.26 0.28 0.33 0.45 0.33 0.17 0.24 0.31 0.44 0.57 0.39 0.3513B 0.24 0.24 0.31 0.52 0.37 0.37 0.23 0.28 0.31 0.50 0.27 0.10 0.24 0.27 0.41 0.55 0.34 0.2533B 0.23 0.26 0.34 0.50 0.36 0.35 0.24 0.33 0.34 0.49 0.31 0.12 0.23 0.30 0.41 0.60 0.28 0.2765B 0.25 0.26 0.34 0.46 0.36 0.40 0.25 0.32 0.32 0.48 0.31 0.11 0.25 0.30 0.43 0.60 0.39 0.34\nLlama 2\n7B 0.28 0.25 0.29 0.50 0.36 0.37 0.21 0.34 0.32 0.50 0.28 0.19 0.26 0.32 0

### Create Prompt

In [30]:
from langchain_core.prompts import PromptTemplate

prompt_template = """
Answer the question based on the context provided below.
If the context does not contain the answer, say "I don't know".

Context: {context}

Question: {question}

Answer:
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template
)

In [31]:
prompt

PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='\nAnswer the question based on the context provided below.\nIf the context does not contain the answer, say "I don\'t know".\n\nContext: {context}\n\nQuestion: {question}\n\nAnswer:\n')

### Create Chain

In [34]:
from langchain_groq import ChatGroq
model = ChatGroq(model="qwen/qwen3-32b", temperature=0.0, max_tokens=1000)

In [37]:
from langchain_core.output_parsers import StrOutputParser
output_parser = StrOutputParser()

In [40]:
from langchain_core.runnables import RunnablePassthrough

In [39]:
def format_docs(docs):
    return "\n\n".join([doc.page_content for doc in docs])

In [None]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt 
    | model 
    | output_parser
)
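The chain above reads left to right: the question is sent to the retriever and passed through unchanged, the retrieved chunks are joined into one context string, the prompt is filled in, the model is called, and the parser extracts the text. Here is a rough plain-Python sketch of that data flow with stubbed components. `fake_retriever`, `fake_model`, and `parse_output` are hypothetical stand-ins, not LangChain APIs, and the template is a shortened version of the one defined earlier.

```python
TEMPLATE = "Context: {context}\n\nQuestion: {question}\n\nAnswer:"


def format_docs(texts):
    """Join retrieved chunk texts into a single context string."""
    return "\n\n".join(texts)


def fake_retriever(question):
    """Stub: a real retriever would return the k most similar chunks."""
    return ["chunk one", "chunk two"]


def fake_model(prompt_text):
    """Stub: a real chat model would return a message object."""
    return {"content": f"stub reply to a {len(prompt_text)}-character prompt"}


def parse_output(message):
    """Stub for the string output parser: keep only the text."""
    return message["content"]


def build_prompt(question):
    context = format_docs(fake_retriever(question))  # retriever | format_docs
    return TEMPLATE.format(context=context, question=question)  # prompt


def run_chain(question):
    return parse_output(fake_model(build_prompt(question)))  # model | parser


print(build_prompt("What is Llama 2?"))
print(run_chain("What is Llama 2?"))
```

Printing the output of `build_prompt` is a handy way to see exactly what text the model receives before blaming the model for a poor answer.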

In [None]:
rag_chain.invoke("Tell me about llama2 finetuning benchmark experiments")

'<think>\nOkay, let\'s see. The user is asking about the Llama 2 fine-tuning benchmark experiments. First, I need to check the context provided to find relevant information.\n\nLooking at the context, there\'s a section labeled "Fine-tuned" under both Llama 1 and Llama 2. For Llama 2, there are numbers listed for different model sizes (7B, 13B, 34B, 70B) with various metrics. The metrics seem to be percentages, possibly accuracy or similar performance indicators on different benchmarks. The numbers are grouped in sets, maybe corresponding to different tasks or datasets. \n\nThe user specifically mentioned "benchmark experiments," so I should focus on the "Fine-tuned" section under Llama 2. The context shows that after fine-tuning, the Llama 2 models have performance metrics across different sizes. For example, the 7B model has 77.4, 78.8, 48.3, etc. However, without column headers, it\'s a bit unclear what each number represents. But since the context mentions "Performance on standard 

In [49]:
rag_chain.invoke("Tell me about Granural reward model accuracy per preference rating")

'<think>\nOkay, let\'s see. The user is asking about the Granular reward model accuracy per preference rating. First, I need to check the context provided to see if there\'s any information related to this.\n\nLooking through the context, there\'s mention of a 7-point Likert scale used for human evaluations, and that the reward models are well calibrated with human preference annotations as shown in Figure 29. There\'s also talk about a pairwise ranking loss and some figures showing density distributions with different margins. However, the term "Granural reward model" isn\'t mentioned anywhere. The closest is "reward models" in general. The context discusses calibration and robustness but doesn\'t break down accuracy by each preference rating level. The figures mentioned (like Figure 27 and 29) might have more details, but the provided text doesn\'t include specific accuracy metrics per rating. Since there\'s no explicit data on accuracy per preference rating, the answer should be tha

In [50]:
rag_chain.invoke("Tell me about scaling trends for the reward model")

'<think>\nOkay, let\'s see. The user is asking about scaling trends for the reward model. I need to check the context provided to find any information related to scaling.\n\nLooking through the context, there\'s mention of a 7-point Likert scale used for human annotations and how the reward models are calibrated with these preferences. There\'s also a figure (Figure 29) showing the calibration. Then, there\'s a part about Goodhart’s Law and adding a more general reward. \n\nIn section A.3.6, they talk about testing the reward model\'s robustness with a test set for helpfulness and safety. They used triple reviews and again mention Figure 29. There\'s a figure (Figure 27) discussing reward model score distribution shifts with margin terms in ranking loss, showing a binary split pattern with larger margins. \n\nThe context also compares their reward models against baselines, including GPT-4, noting that their models outperform others. However, there\'s no explicit mention of scaling tren
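All three answers start with a `<think>` block containing the model's reasoning before the actual answer. If only the final answer is wanted, one option is to strip that block after parsing. The sketch below assumes the reasoning is closed with a matching `</think>` tag, which the truncated outputs above do not show; `strip_reasoning` is a hypothetical helper.

```python
import re

# Non-greedy match so that each <think>...</think> span is removed separately.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(text):
    """Remove <think>...</think> spans and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nCheck the context first.\n</think>\n\nI don't know."
print(strip_reasoning(raw))
# I don't know.
```

Because it takes a string and returns a string, a function like this could be appended to the chain after `output_parser`. Note that with `max_tokens=1000` a long reasoning block may consume the budget before the answer starts, in which case there is no closing tag to match and the text is returned unchanged.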

# END