# Intro to RAG: Custom chatbots (oversimplified)


**Retrieval-augmented generation (RAG)** is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.

In other words, instead of relying only on the LLM's internal knowledge to answer every question, we provide a knowledge base from which we can retrieve the relevant text needed to answer it.

**Why do we need an over-engineered system when we can simply talk to the LLM itself?**

An LLM has no knowledge of your private data. For example, if you want to build a chatbot for your e-commerce company, you can give the LLM the information about your products, prices, and so on, so that it can generate factual answers for your customers.

Some smart reader must be thinking: I will just add all the information about my products to the query and ask the LLM directly. But there are a few problems with that. First, the longer the query, the higher the computational cost you pay. Second, there is a limit on how long a query an LLM can accept, called the context window. Third, even if all your product information fits in the query, LLMs tend to lose track of information that appears early in a long context. This is often called recency bias.

<img src="https://miro.medium.com/v2/resize:fit:720/format:webp/0*E2Flv8LJglvGq4pi.jpeg">

As shown in the figure, we first retrieve the documents relevant to the given query from our database, then ask the LLM to be an obedient assistant and answer the question using only the information in the retrieved documents. That's it.


Let's dive into the code. Before we start building the chatbot, we need to install some Python packages.


In [None]:
# ! pip install langchain_community tiktoken langchainhub chromadb langchain langchain-google-genai

The Gemini API offers a free tier. Get your Gemini API key from https://ai.google.dev/gemini-api/docs/api-key


In [1]:
import os

# Never hard-code a real API key in a shared notebook; paste your own key here
# or load it from a secret store.
os.environ['GOOGLE_API_KEY'] = "<your api key>"

## Building a database

First, let us download a document to serve as the knowledge base for our chatbot. For this example, we will use the Wikipedia page of my favorite philosopher, **Friedrich Nietzsche**.


In [3]:
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_google_genai import ChatGoogleGenerativeAI


# Load documents
loader = WebBaseLoader(
    web_paths=("https://en.wikipedia.org/wiki/Friedrich_Nietzsche",),
)

docs = loader.load()
docs

[Document(page_content='\n\n\n\nFriedrich Nietzsche - Wikipedia\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nJump to content\n\n\n\n\n\n\n\nMain menu\n\n\n\n\n\nMain menu\nmove to sidebar\nhide\n\n\n\n\t\tNavigation\n\t\n\n\nMain pageContentsCurrent eventsRandom articleAbout WikipediaContact usDonate\n\n\n\n\n\n\t\tContribute\n\t\n\n\nHelpLearn to editCommunity portalRecent changesUpload file\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSearch\n\n\n\n\n\n\n\n\n\n\n\nSearch\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCreate account\n\nLog in\n\n\n\n\n\n\n\n\nPersonal tools\n\n\n\n\n\n Create account Log in\n\n\n\n\n\n\t\tPages for logged out editors learn more\n\n\n\nContributionsTalk\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nContents\nmove to sidebar\nhide\n\n\n\n\n(Top)\n\n\n\n\n\n1Life\n\n\n\nToggle Life subsection\n\n\n\n\n\n1.1Youth (1844–1868)\n\n\n\n\n\n\n\n1.2Professor at Basel (1869–1879)\n\n\n\n\n\n\n\n1.3Independent philosop

The whole Wikipedia page is far too long to hand to the LLM for every question, so we split it into small chunks of text. Each chunk becomes a small document that can be retrieved according to its relevance, which lets us pass only the necessary details to the LLM for a specific query.


In [8]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
print(len(splits))

284
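To get a feel for what `chunk_size` and `chunk_overlap` mean, here is a minimal sketch of a naive fixed-window splitter in plain Python. It is only an illustration: `chunk_text` is a hypothetical helper, not part of LangChain, and `RecursiveCharacterTextSplitter` is smarter than this because it tries to split on natural separators such as paragraphs and lines first. The toy sizes (10 and 3) are chosen just to keep the output readable.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one. That overlap is what keeps a sentence that straddles a chunk boundary from being cut off from its context.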


## Creating a vector db

Now we convert all the small chunks into vector embeddings using Gemini's embedding model and store them in the database. An embedding vector is simply a one-dimensional array of numbers that represents the semantic meaning of a text. Creating the vectors lets us search for the chunks that are relevant to a given question: we convert the question into an embedding too, and compare it with the embedding of each chunk in our database using a similarity measure such as cosine similarity. Chunks with a high similarity score are semantically close to the question, i.e. they are likely to contain the answer.
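The similarity search described above can be sketched in a few lines of plain Python. Everything here is made up for illustration: the three-dimensional vectors, the chunk labels, and the `top_k` helper are assumptions, and real embeddings have hundreds of dimensions. A real vector store also uses an approximate index rather than comparing against every chunk one by one.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (invented numbers, not produced by any model).
chunk_vectors = {
    "Nietzsche on Christian morality": [0.9, 0.1, 0.0],
    "Nietzsche's years in Basel": [0.1, 0.9, 0.2],
    "Nietzsche's musical compositions": [0.0, 0.2, 0.9],
}
question_vector = [0.8, 0.2, 0.1]


def top_k(question_vec: list[float], vectors: dict, k: int = 2) -> list[str]:
    """Return the labels of the k chunks most similar to the question."""
    scored = [(cosine_similarity(question_vec, vec), name) for name, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


ranked = top_k(question_vector, chunk_vectors)
print(ranked)
```

The question vector points in almost the same direction as the first chunk's vector, so that chunk is ranked first.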


In [9]:
gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")


vectorstore = Chroma.from_documents(
                     documents=splits,
                     embedding=gemini_embeddings,    
                     )
retriever = vectorstore.as_retriever()


Let me prove my point by showing the following example.


In [12]:
question = "What did Nietzsche believe about the christian doctrine?"
# invoke() replaces get_relevant_documents(), which is deprecated in recent LangChain releases
retriever.invoke(question)


[Document(page_content='Nietzsche believed that Christian moral doctrine was originally constructed to counteract nihilism. It provides people with traditional beliefs about the moral values of good and evil, belief in God (whose existence one might appeal to in justifying the evil in the world), and a framework with which one might claim to have objective knowledge. In constructing a world where objective knowledge is supposed to be possible, Christianity is an antidote to a primal form of nihilism—the despair of meaninglessness. As Heidegger put the problem, "If God as the supra sensory ground and goal of all reality is dead if the supra sensory world of the ideas has suffered the loss of its obligatory and above it its vitalising and upbuilding power, then nothing more remains to which man can cling and by which he can orient himself."[168]', metadata={'language': 'en', 'source': 'https://en.wikipedia.org/wiki/Friedrich_Nietzsche', 'title': 'Friedrich Nietzsche - Wikipedia'}),
 Docu

The first chunk retrieved from the database we created earlier is clearly relevant to the question:
<br>
_Nietzsche believed that Christian moral doctrine was originally constructed to counteract nihilism..._


## Prompt Engineering

The prompt we use to answer the user's query using the context retrieved from our database looks like this:
<br>
_input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"_


In [13]:
prompt = hub.pull("rlm/rag-prompt")
prompt

ChatPromptTemplate(input_variables=['context', 'question'], metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))])
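Under the hood, filling this prompt is essentially string substitution. The sketch below reproduces the template text shown above and fills it with plain `str.format`, with no LangChain involved. The `context` string is a shortened stand-in for the retrieved chunks, and the variable names are my own.

```python
# Template text copied from the rlm/rag-prompt output above.
TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise."
    "\nQuestion: {question} \nContext: {context} \nAnswer:"
)

question = "What did Nietzsche believe about the christian doctrine?"
# Shortened stand-in for the text of the retrieved chunks.
context = "Nietzsche believed that Christian moral doctrine was originally constructed to counteract nihilism."

filled_prompt = TEMPLATE.format(question=question, context=context)
print(filled_prompt)
```

The string that comes out is what the LLM actually sees: the instructions, the user's question, and the retrieved context, all in one message.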

The code below builds the chain in LangChain. We first take the user's question, retrieve the context from the database using the retriever, and format the retrieved documents (the 4 most similar chunks are retrieved here) into a single text. We then create a prompt from the context and the question, and pass that prompt to the LLM to get the response. To parse the response into a plain string, LangChain provides `StrOutputParser`. Finally, we can show the response to our user.


In [11]:
llm = ChatGoogleGenerativeAI(model="gemini-pro",
                 temperature=0.7, top_p=0.85)

# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Question
rag_chain.invoke(question)

'According to Nietzsche, the Christian moral doctrine was a response to nihilism. It provided people with traditional beliefs about morality, belief in God, and a framework for objective knowledge. This doctrine was meant to counteract the despair of meaninglessness.'

The LLM says:
<br>
_According to Nietzsche, the Christian moral doctrine was a response to nihilism. It provided people with traditional beliefs about morality, belief in God, and a framework for objective knowledge. This doctrine was meant to counteract the despair of meaninglessness._
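If the `|` pipeline syntax looks like magic, it may help to see the same data flow unrolled into ordinary function calls. This is only a rough sketch: `fake_retriever` and `fake_llm` are hypothetical stubs that stand in for the vector store and Gemini, and the prompt is shortened. Nothing here calls LangChain or any API.

```python
# Shortened, hypothetical prompt used only for this sketch.
PROMPT = "Question: {question}\nContext: {context}\nAnswer:"


def fake_retriever(question: str) -> list[str]:
    """Stub for the vector search: always returns the same two chunks."""
    return [
        "Christian moral doctrine was constructed to counteract nihilism.",
        "It offers a framework for claiming objective knowledge.",
    ]


def format_docs(chunks: list[str]) -> str:
    """Join the retrieved chunks into one block of context text."""
    return "\n\n".join(chunks)


def fake_llm(prompt_text: str) -> str:
    """Stub for the model call: reports the prompt length instead of answering."""
    return f"(stub reply to a prompt of {len(prompt_text)} characters)"


def build_prompt(question: str) -> str:
    # Mirrors the dict step of the chain: the question goes to the retriever
    # for "context" and is passed through unchanged for "question".
    inputs = {
        "context": format_docs(fake_retriever(question)),
        "question": question,
    }
    return PROMPT.format(**inputs)


def rag_answer(question: str) -> str:
    # prompt -> llm -> plain string, like the last three steps of the chain.
    return fake_llm(build_prompt(question))


prompt_text = build_prompt("What did Nietzsche believe about the christian doctrine?")
print(prompt_text)
print(rag_answer("What did Nietzsche believe about the christian doctrine?"))
```

Swapping the stubs for a real retriever and a real model gives you back the chain from the notebook. The pipeline syntax is mostly a compact way of writing these nested calls.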

**Try it out** https://www.kaggle.com/code/ashokneupane/intro-to-rag-custom-chatbots-oversimplified
<br>
**Follow me on LinkedIn:** https://www.linkedin.com/in/ashok-neupane-156959232/
