# 🚀 **Building a Simple RAG System**  

Retrieval-Augmented Generation (**RAG**) enhances **LLMs** by retrieving relevant data before generating responses. Follow these steps to implement a basic **RAG pipeline**:  

---

## 🔹 **1. Gather Documents**  
📌 Collect structured/unstructured data such as:  
- PDFs 📄  
- Excel Sheets 📊  
- Word Documents 📝  
- Web Articles 🌐  

---

## 🔹 **2. Load the Documents**  
Use libraries like:  
```python
from langchain_community.document_loaders import PyPDFLoader, CSVLoader, UnstructuredWordDocumentLoader
```
- PDFs → `PyPDFLoader()`  
- CSV (e.g., Excel sheets exported as CSV) → `CSVLoader()`  
- Word → `UnstructuredWordDocumentLoader()`  

---

## 🔹 **3. Split the Text**  
**Chunk large documents** for efficient retrieval:  
- **Option 1:** Split by **pages**  
- **Option 2:** Split by **logical flows (sections/paragraphs)**  

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
```
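
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking in plain Python. It is a simplification for illustration, not the algorithm `RecursiveCharacterTextSplitter` uses, and `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
```

Each chunk repeats the last three characters of the previous one, which is how overlap keeps a sentence that straddles a boundary retrievable from either side.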

---

## 🔹 **4. Chunk the Split Data**  
Breaking down text into **manageable chunks** improves retrieval accuracy.  

✅ **Chunking Example:**  
```
Chunk 1: "Introduction to RAG..."
Chunk 2: "RAG uses embeddings to enhance LLMs..."
```

---

## 🔹 **5. Create Embeddings of the Chunked Data**  
Convert text chunks into **vector embeddings** using models like:  
- `text-embedding-ada-002` (OpenAI)  
- `all-MiniLM-L6-v2` (Sentence Transformers)  

```python
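from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
# embed_documents expects a list of strings, so pass each chunk's text
vector_data = embeddings.embed_documents([chunk.page_content for chunk in chunks])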
```
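
Embeddings help retrieval because related texts map to nearby vectors, which are usually compared with cosine similarity. The sketch below uses word counts as a toy stand-in for a real embedding model; `toy_embed`, `cosine_similarity`, and the vocabulary are assumptions made for illustration, not part of LangChain.

```python
import math
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vocabulary = ["rag", "embeddings", "llms", "cooking", "pasta"]
query_vec = toy_embed("rag uses embeddings", vocabulary)
related_vec = toy_embed("embeddings enhance llms in rag", vocabulary)
unrelated_vec = toy_embed("cooking pasta", vocabulary)

related_score = cosine_similarity(query_vec, related_vec)
unrelated_score = cosine_similarity(query_vec, unrelated_vec)
print(related_score, unrelated_score)
```

The related text scores higher than the unrelated one. A real embedding model produces the same kind of ranking, but it captures meaning rather than exact word matches.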

---

## 🔹 **6. Instantiate the Vector Store**  
Choose a **Vector Database**:  
✅ **Local**: FAISS, ChromaDB  
✅ **Cloud**: Pinecone, Weaviate, MongoDBAtlasVectorSearch  

Example using **FAISS**:  
```python
from langchain.vectorstores import FAISS

vectorstore = FAISS.from_documents(chunks, embeddings)
```

---

## 🔹 **7. Add Chunk Embeddings to the Vector Store**  
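`FAISS.from_documents` in step 6 already embeds and stores `chunks`, so this step is only needed when adding more chunks later; calling `add_documents` again with the same chunks would store duplicates:  

```python
vectorstore.add_documents(new_chunks)  # new_chunks: additional chunks to index (hypothetical name)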
```

Now, **queries** can retrieve the **most relevant** document chunks!
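
Conceptually, a vector store keeps vectors next to their text and returns the entries closest to a query vector. The following is a toy in-memory sketch of that idea, assuming precomputed two-dimensional vectors; `ToyVectorStore` is a hypothetical class, and FAISS or ChromaDB use far more efficient index structures.

```python
import heapq
import math


class ToyVectorStore:
    """Keeps (vector, text) pairs and ranks them by cosine similarity."""

    def __init__(self) -> None:
        self._entries: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._entries.append((vector, text))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norms if norms else 0.0

    def similarity_search(self, query_vector: list[float], k: int = 3) -> list[str]:
        """Return the texts of the k entries most similar to query_vector, best first."""
        best = heapq.nlargest(
            k, self._entries, key=lambda entry: self._cosine(query_vector, entry[0])
        )
        return [text for _, text in best]


store = ToyVectorStore()
store.add([1.0, 0.0], "Chunk about RAG")
store.add([0.0, 1.0], "Chunk about cooking")
store.add([0.9, 0.1], "Chunk about embeddings")

top_chunks = store.similarity_search([1.0, 0.0], k=2)
print(top_chunks)
```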

---

## 🔹 **8. Initialize the Chat Model**  
Use **local** or **cloud-based** LLMs:  
✅ **Local:** Ollama, LM Studio  
✅ **Cloud:** OpenAI, AzureAI, AWS  

```python
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-4")
```

---

## 🔹 **9. Set Instructions for the Model**  
Define **chat templates** or **string prompts** to guide the model.  

✅ Example Prompt:  
```
Use the retrieved context to answer the user's query concisely.
```

```python
from langchain.prompts import PromptTemplate

template = PromptTemplate(
    input_variables=["context", "question"],
    template="Answer the question based on context:\n\n{context}\n\nQuestion: {question}"
)
```

---

## 🔹 **10. Create the Retrieval Chain**  
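**Chains** sequence multiple **LLM calls, tools, and preprocessing steps**.  
Newer LangChain code builds chains with **LCEL (LangChain Expression Language)**; the prebuilt `RetrievalQA` chain shown here is the older, now-deprecated shortcut.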

```python
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever())  # Legacy prebuilt chain (now deprecated)
```

---

## ✅ **Final Architecture**  
```
1️⃣ User Query  ➝  2️⃣ Query Embedding  ➝  3️⃣ Vector Search  
  ➝  4️⃣ Retrieve Top-K Chunks  ➝  5️⃣ LLM Generation  ➝  6️⃣ Final Answer  
```

🔹 Now, your **RAG system** is ready to **retrieve & generate** answers efficiently! 🚀  
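
The flow above can be sketched as one function that wires the stages together. Everything here is a stand-in so the sketch runs without a vector store or an LLM: `answer_query`, `keyword_retrieve`, `echo_generate`, and the tiny corpus are hypothetical, and a real system would back `retrieve` with a vector store retriever and `generate` with a chat model.

```python
from typing import Callable

PROMPT = "Answer the question based on context:\n\n{context}\n\nQuestion: {question}"


def answer_query(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve the top-k chunks, build the prompt, and pass it to the model."""
    chunks = retrieve(question, k)
    prompt = PROMPT.format(context="\n\n".join(chunks), question=question)
    return generate(prompt)


corpus = [
    "RAG retrieves relevant chunks.",
    "Embeddings map text to vectors.",
    "Pasta needs salt.",
]


def keyword_retrieve(question: str, k: int) -> list[str]:
    """Stand-in retriever: rank chunks by how many words they share with the question."""
    words = set(question.lower().split())

    def overlap(chunk: str) -> int:
        return len(words & set(chunk.lower().rstrip(".").split()))

    return sorted(corpus, key=overlap, reverse=True)[:k]


def echo_generate(prompt: str) -> str:
    """Stand-in model: reports the prompt size instead of calling an LLM."""
    return f"(model output for a prompt of {len(prompt)} characters)"


print(answer_query("what does rag do", keyword_retrieve, echo_generate, k=1))
```

Passing the retriever and the model in as callables keeps the orchestration independent of any one vector database or LLM provider.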




### STEP-1
Gather documents
We will use plain-text and PDF documents.

In [None]:
# The complete list of LangChain document loaders is at https://python.langchain.com/docs/integrations/document_loaders/
%pip install -qU langchain_community pypdf

In [None]:
# !pip install langchain  (install LangChain)

from langchain_community.document_loaders import TextLoader  # Simple text loader

from langchain_community.document_loaders import PyPDFLoader  # One of many PDF loaders

In [None]:
!pip install chardet



### STEP-2
Load relevant documents

In [None]:
import pprint
loader = PyPDFLoader(
    "book1.pdf",
    mode="page",  # Use mode="single" to load the whole document as a single unit
)
docs = loader.load()
print(len(docs))
pprint.pp(docs[0].metadata)


# loader = TextLoader('odyssey.txt', autodetect_encoding=True)  # Use autodetect_encoding when the file's encoding is unknown
# docs = loader.load()
# print(len(docs))
# pprint.pp(docs[0].metadata)

137
{'producer': 'calibre (0.7.50) [http://calibre-ebook.com]',
 'creator': 'calibre (0.7.50) [http://calibre-ebook.com]',
 'creationdate': '2022-09-20T04:19:35+00:00',
 'author': 'Franklin W. Dixon',
 'keywords': 'Hardy Boys (Fictitious characters), Detective and mystery '
             'stories, Brothers, Teenage boy detectives, Mystery & Detective, '
             'Juvenile Fiction, Mysteries & Detective Stories, General, '
             "Children's stories; American, Mystery fiction, Fiction, "
             'Detective and mystery stories; American, Mystery and detective '
             'stories',
 'moddate': '2022-09-20T04:19:36+00:00',
 'title': 'The Tower Treasure',
 'source': 'book1.pdf',
 'total_pages': 137,
 'page': 0,
 'page_label': '1'}


### STEP-3 & 4
Splitting the text into chunks

There are various ways to split text in LangChain:

1. Character splitter: splits on a single separator and measures chunk size in characters
2. Sentence-transformers token splitter: splits by token count using a sentence-transformers tokenizer
3. Token splitter: splits by token count
4. Recursive splitter: tries paragraph, line, and word boundaries in turn (most used)
5. You can also write your own splitter
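
To illustrate the recursive strategy, here is a simplified sketch in plain Python: it splits on the coarsest separator first and falls back to finer ones only for pieces that are still too long. `recursive_split` and its default separators are assumptions for illustration; LangChain's `RecursiveCharacterTextSplitter` follows the same idea but also handles overlap and other details.

```python
def recursive_split(
    text: str,
    chunk_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),
) -> list[str]:
    """Split text into chunks of at most chunk_size characters, preferring coarse boundaries."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to hard cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    head, rest = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(head):
        candidate = piece if not current else current + head + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


demo_text = "First paragraph.\n\nSecond paragraph is a bit longer."
demo_split = recursive_split(demo_text, chunk_size=30)
print(demo_split)
```

The first paragraph fits and stays whole, while the second is too long and gets split at a word boundary instead of mid-word.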

In [None]:
# import various splitters
from langchain.text_splitter import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
    SentenceTransformersTokenTextSplitter,
    TextSplitter,
    TokenTextSplitter,
)

In [None]:
# # Useful for consistent chunk sizes regardless of content structure.
# print("\n--- Using Character-based Splitting ---")
# char_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
# char_docs = char_splitter.split_documents(docs)

# # Splits text by token count using a sentence-transformers tokenizer.
# # Useful when chunks must fit the token window of a sentence-transformers embedding model.
# print("\n--- Using Sentence-Transformers Token Splitting ---")
# sent_splitter = SentenceTransformersTokenTextSplitter(tokens_per_chunk=256)
# sent_docs = sent_splitter.split_documents(docs)

# # Splits text into chunks based on tokens (words or subwords), using tokenizers like GPT-2.
# # Useful for transformer models with strict token limits.
# print("\n--- Using Token-based Splitting ---")
# token_splitter = TokenTextSplitter(chunk_overlap=0, chunk_size=512)
# token_docs = token_splitter.split_documents(docs)

# Attempts to split text at natural boundaries (sentences, paragraphs) within character limit.
# Balances between maintaining coherence and adhering to character limits.
print("\n--- Using Recursive Character-based Splitting ---")
rec_char_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512, chunk_overlap=0)
rec_char_docs = rec_char_splitter.split_documents(docs)

# Allows creating custom splitting logic based on specific requirements.
# Useful for documents with unique structure that standard splitters can't handle.
# print("\n--- Using Custom Splitting ---")


# class CustomTextSplitter(TextSplitter):
#     def split_text(self, text):
#         # Custom logic for splitting text
#         return text.split("\n\n")  # Example: split by paragraphs


# custom_splitter = CustomTextSplitter()
# custom_docs = custom_splitter.split_documents(docs)



--- Using Recursive Character-based Splitting ---


### STEP-5

Create embeddings of the chunks

In [None]:
%pip install --upgrade --quiet  langchain langchain-huggingface sentence_transformers

In [None]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

In [None]:
# from langchain_ollama import OllamaEmbeddings

# embeddings = OllamaEmbeddings(
#     model="nomic-embed-text",                    # nomic-embed-text is a large context length text encoder that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small performance on short and long context tasks.
# )

# import getpass
# import os

# if not os.environ.get("OPENAI_API_KEY"):
#   os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

# from langchain_openai import OpenAIEmbeddings

# embeddings = OpenAIEmbeddings(model="text-embedding-3-large")


from langchain_huggingface.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

### STEP-6

Instantiate the vectorDB of your choice

In [None]:
from langchain_chroma import Chroma

# vector_store = Chroma(embedding_function=embeddings)

### STEP-7

Add chunk embeddings to VectorDB

In [None]:
import os


current_dir = os.getcwd()

db_dir = os.path.join(current_dir, "db3")

# Create the store only if it does not exist yet, so repeated runs do not add the same data again
def create_vector_store(docs, store_name):
    persistent_directory = os.path.join(db_dir, store_name)  # ChromaDB storage location
    if not os.path.exists(persistent_directory):
        print(f"\n--- Creating vector store {store_name} ---")
        # Do not assign the result to store_name, which must stay a string for the log message below
        Chroma.from_documents(
            docs, embeddings, persist_directory=persistent_directory
        )
        print(f"--- Finished creating vector store {store_name} ---")
    else:
        print(
            f"Vector store {store_name} already exists. No need to initialize.")

# Index the chunks from STEP-3 & 4 rather than the unsplit pages
create_vector_store(rec_char_docs, 'vector_store')


--- Creating vector store vector_store ---
--- Finished creating vector store vector_store ---


In [None]:
def query_vector_store(store_name, query, embedding_function):
    persistent_directory = os.path.join(db_dir, store_name)
    if os.path.exists(persistent_directory):
        print(f"\n--- Querying the Vector Store {store_name} ---")
        db = Chroma(
            persist_directory=persistent_directory,
            embedding_function=embedding_function,
        )
        retriever = db.as_retriever(
            search_type="similarity",
            search_kwargs={"k": 3},
        )
        relevant_docs = retriever.invoke(query)
        # Display the relevant results with metadata
        print(f"\n--- Relevant Documents for {store_name} ---")
        for i, doc in enumerate(relevant_docs, 1):
            print(f"Document {i}:\n{doc.page_content}\n")
            if doc.metadata:
                print(f"Source: {doc.metadata.get('source', 'Unknown')}\n")
        return relevant_docs
    else:
        print(f"Vector store {store_name} does not exist.")

query = "Who is frank hardy ?"
query_vector_store("vector_store", query, embeddings)


--- Querying the Vector Store vector_store ---

--- Relevant Documents for vector_store ---
Document 1:
Of these men, the Hardys took the reports on the ones who were thin and of
medium height.
Next came a check by telephone on the whereabouts of these people. All
could be accounted for as working some distance from Bayport at the time
of the thefts, with one exception.
“I’ll bet he’s our man!” Frank exclaimed. “But where is he now?”

Source: book1.pdf

Document 2:
Hardy. Hi, chums!” he said cheerily. “Sorry to be late. My dad had a lot of
phoning to do before he left. I was afraid if I’d tried to walk here, I
wouldn’t have arrived until tomorrow.”
At this point Mr. Hardy spoke up. “As I said before, I think you boys should
work in twos. There are only three of you to take care of half the territory.”
The detective suddenly grinned boyishly. “How about me teaming up with
one of you?”
Frank and Joe looked at their dad in delight. “You mean it?” Frank cried
out. “I’ll choose you as my p

[Document(id='416a5ab7-c443-4cbd-8e4c-71e78fc73e2d', metadata={'author': 'Franklin W. Dixon', 'creationdate': '2022-09-20T04:19:35+00:00', 'creator': 'calibre (0.7.50) [http://calibre-ebook.com]', 'keywords': "Hardy Boys (Fictitious characters), Detective and mystery stories, Brothers, Teenage boy detectives, Mystery & Detective, Juvenile Fiction, Mysteries & Detective Stories, General, Children's stories; American, Mystery fiction, Fiction, Detective and mystery stories; American, Mystery and detective stories", 'moddate': '2022-09-20T04:19:36+00:00', 'page': 71, 'page_label': '72', 'producer': 'calibre (0.7.50) [http://calibre-ebook.com]', 'source': 'book1.pdf', 'title': 'The Tower Treasure', 'total_pages': 137}, page_content='Of these men, the Hardys took the reports on the ones who were thin and of\nmedium height.\nNext came a check by telephone on the whereabouts of these people. All\ncould be accounted for as working some distance from Bayport at the time\nof the thefts, with one

### Verification step

You can check that the data was added correctly by querying the DB manually.

In [None]:
query = "Who is frank hardy ?"
# Retrieve relevant documents based on the query

relevant_docs = query_vector_store("vector_store", query, embeddings)
# print(relevant_docs)
# Display the relevant results with metadata
print("\n--- Relevant Documents ---")
for i, doc in enumerate(relevant_docs, 1):
    # print(f"Document {i}:\n{doc.page_content}\n")
    if doc.metadata:
        print(f"Source: {doc.metadata.get('source', 'Unknown')}\n")

### STEP-8

Initialize the chat model of your choice: OpenAI, Ollama, or Hugging Face.

In [None]:
from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, SystemMessage

# messages = [
#     SystemMessage(content="Solve the following math problems"),
#     HumanMessage(content="What is 81 divided by 9?"),
# ]

# model = init_chat_model("gpt-4o-mini", model_provider="openai")

# model.invoke("Hello, world!")
# # Create a ChatOpenAI model
# model = init_chat_model("gpt-4o", model_provider="openai")

# # Invoke the model with messages
# result = model.invoke(messages)
# print(f"Answer from OpenAI: {result.content}")


# # ---- Anthropic Chat Model Example ----

# # Create an Anthropic model
# # Anthropic models: https://docs.anthropic.com/en/docs/models-overview
# model = init_chat_model("claude-3-opus-20240229", model_provider="anthropic")

# result = model.invoke(messages)
# print(f"Answer from Anthropic: {result.content}")


# # ---- Google Chat Model Example ----

# # https://console.cloud.google.com/gen-app-builder/engines
# # https://ai.google.dev/gemini-api/docs/models/gemini
# model = init_chat_model("gemini-1.5-flash", model_provider="google_genai")

# result = model.invoke(messages)
# print(f"Answer from Google: {result.content}")
%pip install --upgrade --quiet  langchain-huggingface text-generation transformers google-search-results numexpr langchainhub sentencepiece jinja2 bitsandbytes accelerate
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
)


from langchain_ollama import ChatOllama


llm = ChatOllama(
    model="llama3.2",
    temperature=0,
    # other params...
)

from langchain_core.messages import AIMessage

messages = [
    (
        "system",
        "You are a helpful assistant that writes professional code. Write good quality code.",
    ),
    ("human", "Write Python code to print the Fibonacci series up to the 6th term."),
]
ai_msg = llm.invoke(messages)
ai_msg



### STEP-9

Set instructions for model
1. Set prompt for the model
2. Combine the retrieval output and the user query (Chaining)

In [None]:
# from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessage
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama
query = "Who is frank hardy?"
combined_input = (
    "Here are some documents that might help answer the question: "
    + query
    + "\n\nRelevant Documents:\n"
    + "\n\n".join([doc.page_content for doc in relevant_docs])
    + "\n\nPlease provide an answer based only on the provided documents. If the answer is not found in the documents, respond with 'I'm not sure'."
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content=combined_input),
]


llm = ChatOllama(
    model="llama3.2",
    temperature=0,
    # other params...
)


# Invoke the model with the combined input
result = llm.invoke(messages)

print(result.content)

Based on the provided documents, I can tell you that Frank Hardy is one of the main characters mentioned. He is a young man who works with his brother Joe and their father, Fenton Hardy, to solve mysteries and crimes.


### STEP-10

We invoke the chain with the retrieved documents as context and the user query as the question.

In [None]:
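from langchain_core.prompts import ChatPromptTemplate
# format_messages belongs to the prompt template, not the chain; a chain is invoked with a dict of inputs.
# The prompt text below is an assumed example.
chain = ChatPromptTemplate.from_template("Context:\n{context}\n\nQuestion: {user_input}") | llm
context = "\n\n".join(doc.page_content for doc in relevant_docs)
print(chain.invoke({"context": context, "user_input": "What is the name of the book?"}).content)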

### Using community-made end-to-end chains combined with retrievers
Optionally, we can use a prebuilt chain such as RetrievalQA to avoid wiring these steps together by hand.

In [None]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate


persistent_directory = os.path.join(db_dir, 'vector_store')


if os.path.exists(persistent_directory):
    print(f"\n--- Querying the Vector Store  ---")
    db = Chroma(
        persist_directory=persistent_directory,
        embedding_function=embeddings,
    )
    retriever = db.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 3},
    )

prompt_template = """
    You are a helpful AI assistant that answers questions based on the provided PDF document.
    Use only the context provided to answer the question. If you don't know the answer or
    can't find it in the context, say so.

    Context: {context}

    Question: {question}

    Answer: Let me help you with that based on the PDF content."""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"],
)

# Create the QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,  # the retriever is part of the chain, so there is no need to query it manually first
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)


--- Querying the Vector Store  ---


In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate




system_prompt = (
    "Use the given context to answer the question. "
    "If you don't know the answer, say you don't know. "
    "Use three sentence maximum and keep the answer concise. "
    "Context: {context}"
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, prompt)
chain = create_retrieval_chain(retriever, question_answer_chain)

result = chain.invoke({"input": query})

print(result['answer'])

Frank Hardy is one of the main characters in the story. He is the son of Fenton Hardy, a famous detective, and is also a young detective himself who works with his brother Joe to solve cases.
