[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/witchapong/build-ai-based-applications/blob/main/llm/3_RAG.ipynb)

# RAG
In this notebook, we'll build a RAG chatbot: a ChatGPT-style model that answers questions by referring to user-provided documents.

The content of this notebook is adapted mainly from the [KBTG M.A.D.Bootcamp](https://kbtgkampus.tech/).

In [1]:
# Run this cell to install the required libraries if you have not already done so
!pip -q install numpy pandas langchain langchain-community langchain-openai langchain-chroma \
langgraph langchainhub python-dotenv bs4 pymupdf unstructured lark nltk


# What is Retrieval Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response.

![RAG diagram](https://blogs.nvidia.com/wp-content/uploads/2023/11/NVIDIA-RAG-diagram-scaled.jpg)

source: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/
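
To make the retrieve-then-generate idea concrete before we bring in LangChain, here is a minimal sketch in plain Python. It is only an illustration: the word-overlap scorer is a crude stand-in for a real embedding model, the three passages are made-up examples, and the names `overlap_score`, `retrieve`, and `build_prompt` are hypothetical helpers, not part of any library.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag of words; a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def overlap_score(query: str, passage: str) -> int:
    """Count query word occurrences that also appear in the passage."""
    q, p = tokenize(query), tokenize(passage)
    return sum(min(q[word], p[word]) for word in q)


def retrieve(query: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages with the highest overlap score."""
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Place the retrieved passages in front of the question."""
    context = "\n\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Made-up knowledge base, for illustration only
knowledge_base = [
    "The Poisson PMF has expected value alpha.",
    "A Bernoulli trial has two outcomes, success and failure.",
    "The weak law of large numbers concerns sample means.",
]

question = "What is the expected value of the Poisson PMF?"
top = retrieve(question, knowledge_base, k=1)
prompt_text = build_prompt(question, top)
print(prompt_text)
```

In a real pipeline the final prompt would be sent to an LLM; the rest of this notebook replaces each of these toy pieces with a LangChain component.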

# 1. Document loading

For the full documentation on document loaders, see:

https://python.langchain.com/v0.2/docs/integrations/document_loaders/

In [2]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader, PyMuPDFLoader

## 1.1 Loading PDF file

In [3]:
# mount Google Drive
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [4]:
loader = PyMuPDFLoader("/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf")
pdf_docs = loader.load()

In [5]:
len(pdf_docs)

537

In [6]:
pdf_docs[300]

Document(metadata={'producer': 'Acrobat Distiller 5.0.5 (Windows)', 'creator': 'dvips(k) 5.90a Copyright 2002 Radical Eye Software', 'creationdate': '2004-08-16T15:20:18+00:00', 'source': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf', 'file_path': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf', 'total_pages': 537, 'format': 'PDF 1.3', 'title': 'probww.dvi', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2006-12-16T13:50:39-08:00', 'trapped': '', 'modDate': "D:20061216135039-08'00'", 'creationDate': 'D:20040816152018Z', 'page': 300}, page_content='7.3\nPOINT ESTIMATES OF MODEL PARAMETERS\n283\nTheorem 7.7 is better known as the weak law of large numbers, which we restate here in\ntwo equivalent forms.\nTheorem 7.8\nWeak Law of Large Numbers\nIf X has ﬁnite variance, then for any constant c > 0,\n(a)\nlim\nn→∞P[|Mn(X) −µX| ≥c] = 0,\n(b)\nlim\nn→∞P[|Mn(X) −µX| < c] = 1.\nTheorem 7.8(a) is just the mathematical 

In [7]:
pdf_docs[300].metadata

{'producer': 'Acrobat Distiller 5.0.5 (Windows)',
 'creator': 'dvips(k) 5.90a Copyright 2002 Radical Eye Software',
 'creationdate': '2004-08-16T15:20:18+00:00',
 'source': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf',
 'file_path': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf',
 'total_pages': 537,
 'format': 'PDF 1.3',
 'title': 'probww.dvi',
 'author': '',
 'subject': '',
 'keywords': '',
 'moddate': '2006-12-16T13:50:39-08:00',
 'trapped': '',
 'modDate': "D:20061216135039-08'00'",
 'creationDate': 'D:20040816152018Z',
 'page': 300}

# 2. Document Splitting

Documents are split into chunks, optionally with some overlap between consecutive chunks so that text cut at a chunk boundary still appears in full in at least one chunk.

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter
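
The effect of chunk size and overlap can be sketched with a fixed-width sliding window. This is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, whereas the hypothetical `split_with_overlap` below cuts at exact character offsets.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is the same role `chunk_overlap` plays in the splitter configured later in this notebook.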

# 3. Embed documents and store in VectorDB

In [10]:
import os
os.environ['OPENAI_API_KEY'] = "YOUR-SECRET-API-KEY"

In [11]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from tqdm import tqdm

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=300,
)

splits = splitter.split_documents(pdf_docs)

persist_directory = 'docs/chroma/'

In [12]:
# If this is the first run, we have to create a vector store
# and store embeddings. Otherwise, we can load the persisted
# vector store to save time.

vectordb = Chroma.from_documents(
    splits,
    embedding=OpenAIEmbeddings(),
    persist_directory=persist_directory
)

In [13]:
# # This cell is for loading the persisted vector store that
# # has already been created in the previous run.

# vectordb = Chroma(
#     persist_directory=persist_directory,
#     embedding_function=OpenAIEmbeddings()
# )

In [14]:
print(vectordb._collection.count())

1181
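
The choice between the two cells above (build the store on the first run, reload it afterwards) can be folded into one helper. The sketch below is a generic pattern, not LangChain API: `load_or_build` is a hypothetical name, and the `build` and `load` callables are placeholders. In this notebook they would wrap `Chroma.from_documents(...)` and `Chroma(persist_directory=..., embedding_function=...)` respectively.

```python
import os
import tempfile


def load_or_build(persist_directory: str, build, load):
    """Call load() if the directory already holds data, otherwise call build()."""
    if os.path.isdir(persist_directory) and os.listdir(persist_directory):
        return load()
    return build()


# Demonstration with a temporary directory and placeholder callables
with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build(tmp, build=lambda: "built", load=lambda: "loaded")
    # Simulate a persisted store by creating a file in the directory
    open(os.path.join(tmp, "marker"), "w").close()
    second = load_or_build(tmp, build=lambda: "built", load=lambda: "loaded")

print(first, second)  # built loaded
```

Reloading avoids paying for the embedding calls again, which is the reason the original cells keep both variants.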


# 4. Retrieve relevant documents from VectorDB

In [15]:
query = "How is Bernoulli trial and Poisson distribution related?"

## 4.1 Similarity search

In [16]:
docs_ss = vectordb.similarity_search_with_score(query, k=2)

In [17]:
docs_ss

[(Document(metadata={'author': '', 'creationDate': 'D:20040816152018Z', 'creationdate': '2004-08-16T15:20:18+00:00', 'creator': 'dvips(k) 5.90a Copyright 2002 Radical Eye Software', 'file_path': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf', 'format': 'PDF 1.3', 'keywords': '', 'modDate': "D:20061216135039-08'00'", 'moddate': '2006-12-16T13:50:39-08:00', 'page': 86, 'producer': 'Acrobat Distiller 5.0.5 (Windows)', 'source': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf', 'subject': '', 'title': 'probww.dvi', 'total_pages': 537, 'trapped': ''}, page_content='PMF. In the binomial model, n, the number of Bernoulli trials grows without limit but the\nexpected number of trials np remains constant at α, the expected value of the Poisson PMF.\nIn the theorem, we let α = λT and divide the T-second interval into n time slots each\nwith duration T/n. In each slot, we assume that there is either one arrival, with probability\np 
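
Each result above is a `(Document, score)` pair. With Chroma the score is a distance, so a lower value means a closer match. The sketch below shows the underlying idea as a brute-force nearest-neighbour search. The two-dimensional vectors and labels are invented for illustration (real embeddings have hundreds or thousands of dimensions), and `toy_similarity_search` is a hypothetical name.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_similarity_search(query_vec, store, k=2):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, l2_distance(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Invented 2-D "embeddings", for illustration only
toy_store = [
    ("binomial to Poisson limit", [0.9, 0.1]),
    ("weak law of large numbers", [0.1, 0.9]),
    ("Poisson PMF", [0.8, 0.3]),
]

results = toy_similarity_search([1.0, 0.0], toy_store, k=2)
for text, distance in results:
    print(f"{distance:.3f}  {text}")
```

A vector database does the same ranking, but uses an index so it does not have to compare the query against every stored vector.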

## 4.2 Maximal marginal relevance (MMR) search

In [18]:
docs_mmr = vectordb.max_marginal_relevance_search(query, k=2, fetch_k=5)

In [19]:
docs_mmr

[Document(metadata={'author': '', 'creationDate': 'D:20040816152018Z', 'creationdate': '2004-08-16T15:20:18+00:00', 'creator': 'dvips(k) 5.90a Copyright 2002 Radical Eye Software', 'file_path': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf', 'format': 'PDF 1.3', 'keywords': '', 'modDate': "D:20061216135039-08'00'", 'moddate': '2006-12-16T13:50:39-08:00', 'page': 86, 'producer': 'Acrobat Distiller 5.0.5 (Windows)', 'source': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf', 'subject': '', 'title': 'probww.dvi', 'total_pages': 537, 'trapped': ''}, page_content='PMF. In the binomial model, n, the number of Bernoulli trials grows without limit but the\nexpected number of trials np remains constant at α, the expected value of the Poisson PMF.\nIn the theorem, we let α = λT and divide the T-second interval into n time slots each\nwith duration T/n. In each slot, we assume that there is either one arrival, with probability\np =
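
MMR first fetches `fetch_k` candidates by similarity, then picks `k` of them one at a time, trading relevance to the query against redundancy with what has already been picked. The sketch below implements that greedy selection on invented 2-D vectors. The function name `mmr_select` is hypothetical, and the weighting `lambda_mult=0.5` is assumed here to match LangChain's default.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr_select(query_vec, candidates, k=2, lambda_mult=0.5):
    """Greedily pick k candidate indices that are relevant but not redundant."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine(query_vec, candidates[i])
            # Similarity to the closest already-selected item (0 if none yet)
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere
candidates = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
query_vec = [1.0, 0.1]

picked = mmr_select(query_vec, candidates, k=2)
by_similarity = sorted(
    range(len(candidates)),
    key=lambda i: cosine(query_vec, candidates[i]),
    reverse=True,
)[:2]
print("MMR:", picked, "plain similarity:", by_similarity)
```

Plain similarity returns the two near-duplicates, while MMR keeps the best match and replaces the duplicate with the more distinct candidate.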

# 5. Combine retrieved documents with LLM prompt

In [20]:
# from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.runnables import RunnableLambda, RunnablePassthrough, RunnableParallel

llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

retriever = vectordb.as_retriever(search_type='mmr', search_kwargs={'k': 4})

In [21]:
from langchain_core.prompts import ChatPromptTemplate

system_prompt = """You are an assistant for answering question about probability and random process for Electrical Engineering students.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, say that you don't know.
Use three sentences maximum, keep the answer concise, and use Mathematic equation for explanation if needed.

{context}
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for answering question about probability and random process for Electrical Engineering students.\nUse the following pieces of retrieved context to answer the question.\nIf you don't know the answer, say that you don't know.\nUse three sentences maximum, keep the answer concise, and use Mathematic equation for explanation if needed.\n\n{context}\n"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

In [22]:
# LangChain has a built-in helper that builds the question-answering chain for us.
# It formats the retrieved documents into the {context} variable, then pipes
# the result through the prompt and the LLM
# (roughly: format_docs | prompt | llm | output parser).

question_answer_chain = create_stuff_documents_chain(llm, prompt)
question_answer_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for answering question about probability and random process for Electrical Engineering students.\nUse the following pieces of retrieved context to answer the question.\nIf you don't know the answer, say that you don't know.\nUse three sentences maximum, keep the answer concise, and use Mathematic equation for explanation if needed.\n\n{context}\n"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatOpenAI(client=<openai.re
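
The `format_docs` step visible in the chain above is what "stuffing" means: the text of every retrieved document is concatenated and substituted for `{context}` in the prompt. Here is a rough sketch of that step with stand-in objects. The template is shortened, `stuff_prompt` is a hypothetical helper, and the blank-line separator is assumed to match the library default.

```python
from types import SimpleNamespace

# Shortened stand-in for the system and human messages defined earlier
TEMPLATE = "Use the following context to answer.\n\n{context}\n\nQuestion: {input}"


def format_docs(docs, separator: str = "\n\n") -> str:
    """Join the text of all documents into one context string."""
    return separator.join(doc.page_content for doc in docs)


def stuff_prompt(docs, question: str) -> str:
    """Fill the template with the joined documents and the user question."""
    return TEMPLATE.format(context=format_docs(docs), input=question)


# Stand-ins for LangChain Document objects (only page_content is needed here)
docs = [
    SimpleNamespace(page_content="Chunk A"),
    SimpleNamespace(page_content="Chunk B"),
]

stuffed = stuff_prompt(docs, "What is in the chunks?")
print(stuffed)
```

Because every retrieved chunk is pasted in whole, `k` and `chunk_size` together determine how much of the model's context window the prompt consumes.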

In [24]:
# To simplify this further, LangChain has another built-in helper that wires
# the retriever and question_answer_chain together into a single chain

rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [25]:
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x79e2061474d0>, search_type='mmr', search_kwargs={'k': 4}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for answering question about probability and random process for Electrical Engineering student

In [26]:
# Now we can simply invoke rag_chain with the query directly

rag_chain.invoke({'input': 'What are expectation and variance of Poisson distribution?'})

{'input': 'What are expectation and variance of Poisson distribution?',
 'context': [Document(metadata={'author': '', 'creationDate': 'D:20040816152018Z', 'creationdate': '2004-08-16T15:20:18+00:00', 'creator': 'dvips(k) 5.90a Copyright 2002 Radical Eye Software', 'file_path': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf', 'format': 'PDF 1.3', 'keywords': '', 'modDate': "D:20061216135039-08'00'", 'moddate': '2006-12-16T13:50:39-08:00', 'page': 257, 'producer': 'Acrobat Distiller 5.0.5 (Windows)', 'source': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf', 'subject': '', 'title': 'probww.dvi', 'total_pages': 537, 'trapped': ''}, page_content='ﬁnish the race in less than 25 minutes?\n(b) What is the probability that the last boat will\ncross the ﬁnish line in more than 50 minutes?\n(c) Given this model, what is the probability that a\nboat will ﬁnish before it starts (negative ﬁnishing\ntime)?\n5.5.5\n♦♦\nIn a weekly lott
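
The returned dict holds the original `input`, the retrieved `context` documents, and the generated `answer`, so the raw output is long. A small helper can reduce it to the answer and the source pages. This is a sketch: `summarize_result` is a hypothetical name, and `fake_result` is a hand-made stand-in for a real chain result with placeholder values.

```python
from types import SimpleNamespace


def summarize_result(result: dict) -> dict:
    """Return the answer plus the sorted, de-duplicated source page numbers."""
    pages = {
        doc.metadata["page"]
        for doc in result.get("context", [])
        if "page" in doc.metadata
    }
    return {"answer": result.get("answer"), "pages": sorted(pages)}


# Stand-in for the dict returned by rag_chain.invoke(...); values are placeholders
fake_result = {
    "input": "What are expectation and variance of Poisson distribution?",
    "context": [
        SimpleNamespace(metadata={"page": 257}, page_content="..."),
        SimpleNamespace(metadata={"page": 86}, page_content="..."),
        SimpleNamespace(metadata={"page": 257}, page_content="..."),
    ],
    "answer": "<model answer would appear here>",
}

summary = summarize_result(fake_result)
print(summary)
```

With the real chain, the same helper could be applied directly to the return value of `rag_chain.invoke(...)`, since `Document` objects expose `metadata` in the same way.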

# 6. Add memory for storing previous messages

![RAG memory](https://python.langchain.com/v0.1/assets/images/conversational_retrieval_chain-5c7a96abe29e582bc575a0a0d63f86b0.png)

In [27]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# Create a history_aware_retriever that uses the LLM to rewrite the
# user's question as a standalone query, then retrieves relevant
# documents with the standard retriever
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
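
The control flow inside a history-aware retriever can be sketched as follows: with an empty chat history the question goes straight to the retriever, otherwise it is rewritten first. This is a plain-Python approximation, not the library code. `fake_reformulate` is a stub standing in for the LLM call (it only substitutes one pronoun), and all names here are hypothetical.

```python
def history_aware_retrieve(inputs: dict, reformulate, retrieve):
    """Rewrite the question only when there is chat history, then retrieve."""
    if not inputs.get("chat_history"):
        return retrieve(inputs["input"])
    standalone = reformulate(inputs["chat_history"], inputs["input"])
    return retrieve(standalone)


seen_queries = []  # records what actually reaches the retriever


def fake_retrieve(query: str) -> list[str]:
    """Stub retriever that just logs the query it receives."""
    seen_queries.append(query)
    return [f"doc for: {query}"]


def fake_reformulate(history, question: str) -> str:
    """Stub for the LLM rewrite; a real model would use the history."""
    return question.replace("it", "the Poisson process")


history_aware_retrieve(
    {"input": "What is Poisson process?", "chat_history": []},
    fake_reformulate,
    fake_retrieve,
)
history_aware_retrieve(
    {
        "input": "When should I use it?",
        "chat_history": [("human", "What is Poisson process?")],
    },
    fake_reformulate,
    fake_retrieve,
)
print(seen_queries)
```

The second query reaches the retriever as a self-contained question, which matters because the vector store has no notion of what "it" refers to.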

In [28]:
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# Create a standard QA chain using system_prompt defined above
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

# Combine history_aware_retriever with QA chain
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

In [29]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory


# To store chat history, create a dictionary where keys are session_id
# and values are chat history
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

# Combine rag_chain with chat history
conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
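
Conceptually, the wrapper loads the history for the given session, passes it to the chain as `chat_history`, and afterwards appends the new question and answer. The sketch below mimics that behaviour with plain lists and a stub chain. `with_history`, `echo_chain`, and the tuple-based message format are illustrative assumptions, not LangChain's implementation.

```python
def with_history(chain, get_history):
    """Wrap a chain so each call sees, then extends, its session's history."""

    def invoke(inputs: dict, session_id: str) -> dict:
        history = get_history(session_id)
        # Pass a copy so the chain sees the history as it was before this turn
        result = chain({**inputs, "chat_history": list(history)})
        history.append(("human", inputs["input"]))
        history.append(("ai", result["answer"]))
        return result

    return invoke


sessions: dict[str, list] = {}


def get_history(session_id: str) -> list:
    """Return the message list for a session, creating it on first use."""
    return sessions.setdefault(session_id, [])


def echo_chain(inputs: dict) -> dict:
    """Stub chain that reports how many earlier messages it was given."""
    return {"answer": f"seen {len(inputs['chat_history'])} earlier messages"}


chat = with_history(echo_chain, get_history)
first = chat({"input": "What is Poisson process?"}, session_id="1")
second = chat({"input": "When should I use it?"}, session_id="1")
other = chat({"input": "Hello"}, session_id="2")
print(first["answer"], "|", second["answer"], "|", other["answer"])
```

Histories are keyed by `session_id`, so a second session starts empty even though the first one already holds two messages.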

In [30]:
session_id = '1'

In [31]:
conversational_rag_chain.invoke(
    {"input": "What is Poisson process?"},
    config={
        "configurable": {"session_id": session_id}
    },
)

{'input': 'What is Poisson process?',
 'chat_history': [],
 'context': [Document(metadata={'author': '', 'creationDate': 'D:20040816152018Z', 'creationdate': '2004-08-16T15:20:18+00:00', 'creator': 'dvips(k) 5.90a Copyright 2002 Radical Eye Software', 'file_path': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf', 'format': 'PDF 1.3', 'keywords': '', 'modDate': "D:20061216135039-08'00'", 'moddate': '2006-12-16T13:50:39-08:00', 'page': 379, 'producer': 'Acrobat Distiller 5.0.5 (Windows)', 'source': '/content/drive/MyDrive/build-ai-based-applications/probability_textbook_ee.pdf', 'subject': '', 'title': 'probww.dvi', 'total_pages': 537, 'trapped': ''}, page_content='\x08′.\n10.5\nThe Poisson Process\nA counting process N(t) starts at time 0 and counts the occurrences of events. These events\nare generally called arrivals because counting processes are most often used to model the\narrivals of customers at a service facility. However, since counting processe

In [32]:
conversational_rag_chain.invoke(
    {"input": "When should I use it?"},
    config={
        "configurable": {"session_id": session_id}
    },
)

{'input': 'When should I use it?',
 'chat_history': [HumanMessage(content='What is Poisson process?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='A Poisson process is a counting process that models the occurrences of events over time, often used to represent arrivals of customers at a service facility. It is a stochastic process where the number of arrivals in a given time interval is integer-valued and nondecreasing with time. The probability of a certain number of arrivals in a Poisson process can be calculated using the Poisson distribution formula: P[N(t) = k] = (λt)^k * e^(-λt) / k!, where λ is the rate parameter and t is the time interval.', additional_kwargs={}, response_metadata={})],
 'context': [Document(metadata={'author': '', 'creationDate': 'D:20040816152018Z', 'creationdate': '2004-08-16T15:20:18+00:00', 'creator': 'dvips(k) 5.90a Copyright 2002 Radical Eye Software', 'file_path': '/content/drive/MyDrive/build-ai-based-applications/probability_textbo