# Raw RAG 01: The Basics

You are probably already familiar with RAG. In case you are not, here is a brief explanation.

**Retrieval-Augmented Generation (RAG)** is an approach in natural language processing that combines large language models with external knowledge retrieval. It addresses some limitations of standalone language models by letting them draw on up-to-date, domain-specific information from a knowledge base.

## Key Components of RAG:

* *Retriever*: Searches and selects relevant information from a knowledge base.
* *Generator*: A language model that produces responses based on the retrieved information and the input query.
* *Knowledge Base*: A collection of documents or data that serves as an external source of information.

## How RAG Works:

1. When given a query, the retriever searches the knowledge base for relevant information.
2. The retrieved information is then passed to the generator along with the original query.
3. The generator uses this context to produce a more informed and accurate response.
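
The flow described above can be sketched without any external services. The snippet below is a toy illustration only, not the implementation used later in this notebook: word overlap stands in for embedding similarity, and `build_prompt` merely assembles the text that would be handed to the generator. All function names here are hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, knowledge_base: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved context with the original query for the generator."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"

knowledge_base = [
    "The lottery took less than two hours in this village.",
    "Bobby Martin stuffed his pockets full of stones.",
]
query = "How long did the lottery take?"
top_docs = retrieve(query, knowledge_base, k=1)
prompt = build_prompt(query, top_docs)
print(prompt)
```

In a real system the retriever compares embeddings rather than shared words, and the prompt is sent to a language model instead of being printed.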

## Benefits of RAG:

* *Up-to-date Information*: Can access the latest information, unlike static language models.
* *Reduced Hallucination*: By grounding responses in retrieved facts, RAG reduces the likelihood of generating false or irrelevant information.
* *Transparency*: The retrieved information can be presented alongside the generated response, providing insight into the model's "reasoning."
* *Flexibility*: The knowledge base can be easily updated or customized for specific domains or applications.

RAG represents a significant step forward in creating more reliable, informative, and context-aware systems.

In this notebook, we will implement a RAG system using a "raw" approach. Instead of relying on high-level frameworks like LangChain or LlamaIndex, we will build the system using only basic Python code, the OpenAI client library (libraries from other commercial LLM providers would work similarly), and a local vector database. This approach offers several benefits:

* Greater understanding: By implementing each component ourselves, we gain deeper insights into how RAG systems work "under the hood".
* Flexibility: We can easily modify and optimize each step for our specific needs.
* Minimal dependencies: This implementation will have fewer external dependencies, reducing potential conflicts and security risks.
* Educational value: This approach is excellent for learning the fundamentals of RAG systems.

**Note**: You will need an OpenAI API key for the embedding and generation processes. Ensure you have this set up before proceeding.

We will walk through the following steps to create our RAG system:

* Data Ingestion: Load and preprocess the input documents from our knowledge base.
* Embedding Generation: Convert the preprocessed text into dense vector representations (embeddings) using OpenAI's embedding model.
* Vector Database Storage: Store the generated embeddings efficiently in a local vector database for quick retrieval.
* Semantic Retrieval: Given a user query, find the most relevant documents by comparing the query embedding to those in our database.
* Augmented Generation: Combine the retrieved relevant context with the original query to generate an informed and accurate response using OpenAI's language model.

Each step will be explained in detail, with Python code examples and explanations to help you understand the process thoroughly. By the end of this notebook, you'll have a functional RAG system built from scratch and a solid foundation for more advanced implementations.

Let's begin by setting up our environment and importing the necessary libraries.

In [16]:
# Install only the libraries this notebook needs (openai, faiss-cpu, numpy, python-dotenv) to keep dependencies light.

%pip install openai faiss-cpu numpy python-dotenv


In [2]:
# Load the environment variables from the .env file

from dotenv import load_dotenv
import os

dotenv_path = ".env"
load_dotenv(dotenv_path=dotenv_path)
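
By default, the `OpenAI()` client used later reads the key from the `OPENAI_API_KEY` environment variable, so a quick check right after `load_dotenv` can surface a missing key early. The helper below is a minimal sketch; `require_api_key` is a hypothetical name, and the demonstration uses an in-memory mapping rather than the real environment.

```python
import os

def require_api_key(env=os.environ, name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored under `name`, or raise a clear error if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return key

# Demonstration with a plain dict standing in for os.environ.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```

Calling `require_api_key()` with no arguments checks the actual process environment.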

In [3]:
# Please see the utils.py file for the implementation of the text_splitter function.
# Additional information on utils.py can be read at utils_readme.md

from utils import TextProcessor

text_process = TextProcessor()

In [4]:
# Load the text from a file. Original text: "The Lottery" by Shirley Jackson https://www.newyorker.com/magazine/1948/06/26/the-lottery

file_path = 'docs/the_lottery_text.txt'

with open(file_path, 'r', encoding="utf-8") as file:
    text = file.read()

print(text[:690])

“The Lottery [abridged]” (1948)--- By Shirley Jackson

The people of the village began to gather in the square, between the post office and the bank, around ten o'clock; in some towns there were so many people that the lottery took two days and had to be started on June 2nd.  But in this village, where there were only about three hundred people, the whole lottery took less than two hours, so it could begin at ten o'clock in the morning and still be through in time to allow the villagers to get home for noon dinner.
The children assembled first, of course. Bobby Martin had already stuffed his pockets full of stones, and the other boys soon followed his example, selecting the smooth

In [5]:
# Pre-process the text into chunks for the embedding step

paragraphs = text_process.text_splitter(text, char_limit=500)

print("Total number of paragraphs:", len(paragraphs))

print("\nFirst 5 paragraphs:")
for paragraph in paragraphs[:5]:
  print(paragraph)
  print("\n---\n")

Total number of paragraphs: 23

First 5 paragraphs:
“The Lottery [abridged]” (1948)--- By Shirley Jackson The people of the village began to gather in the square, between the post office and the bank, around ten o'clock; in some towns there were so many people that the lottery took two days and had to be started on June 2nd.

---

But in this village, where there were only about three hundred people, the whole lottery took less than two hours, so it could begin at ten o'clock in the morning and still be through in time to allow the villagers to get home for noon dinner. The children assembled first, of course.

---

Bobby Martin had already stuffed his pockets full of stones, and the other boys soon followed his example, selecting the smoothest and roundest stones; Bobby and Harry Jones and Dickie Delacroix-- the villagers pronounced this name "Dellacroy"--eventually made a great pile of stones in one corner of the square and guarded it against the raids of the other boys. The lottery 
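
`utils.py` is not reproduced in this notebook, so the following is only a guess at the kind of logic a character-limited splitter might use: split the text into sentences, then greedily pack whole sentences into chunks of at most `char_limit` characters. The function name and its exact behavior are assumptions, not the actual `TextProcessor.text_splitter`.

```python
import re

def split_into_chunks(text: str, char_limit: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `char_limit` characters.

    A single sentence longer than `char_limit` is kept as its own oversized chunk.
    """
    # Collapse all whitespace (including newlines) into single spaces.
    normalized = " ".join(text.split())
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", normalized)

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= char_limit or not current:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

sample = "One two three. Four five six! Seven eight nine? Ten."
for chunk in split_into_chunks(sample, char_limit=30):
    print(repr(chunk))
```

Keeping sentences whole avoids cutting a thought in half, at the cost of chunks that vary in length.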

In [6]:
from openai import OpenAI
import numpy as np
import faiss 

client = OpenAI()

In [7]:
def get_embedding(text: str, model: str = "text-embedding-3-small") -> list[float]:
    """
    Get the embedding for a given text using OpenAI's API.

    Parameters:
    text (str): The input text to embed.
    model (str): The name of the embedding model to use. Default is "text-embedding-3-small", a low-cost OpenAI embedding model.

    Returns:
    list[float]: The embedding vector for the input text.
    """
    # Replace newlines with spaces so the text is embedded as one continuous passage
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

In [8]:
# Create an embedding for each paragraph (one API call per paragraph)
embeddings = [get_embedding(text) for text in paragraphs]
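
The list comprehension above makes one API call per paragraph. The embeddings endpoint also accepts a list of inputs, so the calls could be batched. The sketch below assumes a client shaped like `OpenAI()` (an `embeddings.create(input=..., model=...)` method whose response items carry `index` and `embedding`); `get_embeddings_batched`, `batch_size`, and the stand-in `FakeEmbeddings` class are hypothetical and exist only so the example runs without network access.

```python
from types import SimpleNamespace

def get_embeddings_batched(client, texts, model="text-embedding-3-small", batch_size=64):
    """Embed `texts` in batches, returning one vector per input, in input order."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = [t.replace("\n", " ") for t in texts[start:start + batch_size]]
        response = client.embeddings.create(input=batch, model=model)
        # Sort by the `index` field so vectors line up with their inputs.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors

# Stand-in client (hypothetical): "embeds" each text as its length.
class FakeEmbeddings:
    def create(self, input, model):
        data = [
            SimpleNamespace(index=i, embedding=[float(len(t))])
            for i, t in enumerate(input)
        ]
        return SimpleNamespace(data=data)

fake_client = SimpleNamespace(embeddings=FakeEmbeddings())
demo_vectors = get_embeddings_batched(fake_client, ["ab", "cde", "f"], batch_size=2)
print(demo_vectors)
```

With the real client, `get_embeddings_batched(client, paragraphs)` would replace the per-paragraph loop; for 23 paragraphs the difference is small, but it grows with the size of the knowledge base.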

In [9]:
# Initialize FAISS index
dimension = len(embeddings[0])  # Dimension of the embedding
index = faiss.IndexFlatL2(dimension)

In [10]:
# Add embeddings to the FAISS index
index.add(np.array(embeddings).astype("float32"))
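
`IndexFlatL2` is an exhaustive index: it compares the query against every stored vector and reports squared Euclidean distances. The pure-Python sketch below mirrors that behavior on a tiny example to show what the index computes; it is an illustration of the idea, not how FAISS is implemented, and `flat_l2_search` is a hypothetical name.

```python
def flat_l2_search(vectors, query, k):
    """Return (distances, indices) of the k stored vectors closest to `query`.

    Distances are squared Euclidean distances, sorted in ascending order.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

stored = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
toy_distances, toy_indices = flat_l2_search(stored, query=[1.0, 1.0], k=2)
print(toy_distances, toy_indices)
```

An exhaustive scan is exact and perfectly adequate for a few dozen paragraphs; approximate indexes become worthwhile only with much larger collections.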

In [11]:
# Perform a search
query = "When was it written?"
query_embedding = get_embedding(query)

k = 2  # Number of nearest neighbors to retrieve
distances, indices = index.search(np.array([query_embedding]).astype("float32"), k)

In [12]:
# Retrieve and print results
print(f"Query: {query}")
print("Nearest neighbors:")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. {paragraphs[idx]} (Distance: {distances[0][i]})")

# Combine only the retrieved paragraphs into a single context
context = " ".join(paragraphs[idx] for idx in indices[0])

Query: When was it written?
Nearest neighbors:
1. “The Lottery [abridged]” (1948)--- By Shirley Jackson The people of the village began to gather in the square, between the post office and the bank, around ten o'clock; in some towns there were so many people that the lottery took two days and had to be started on June 2nd. (Distance: 1.5569045543670654)
2. Summers said, and Bill Hutchinson reached into the box and felt around, bringing his hand out at last with the slip of paper in it. The crowd was quiet. A girl whispered, "I hope it's not Nancy," and the sound of the whisper reached the edges of the crowd. "It's not the way it used to be. " Old Man Warner said clearly. "People ain't the way they used to be. " "All right," Mr. Summers said. "Open the papers. Harry, you open little Dave's. " Mr. (Distance: 1.602304458618164)
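
The distances printed above are easier to interpret as cosine similarities. Assuming the embeddings are unit-length (OpenAI documents its embeddings as normalized to length 1) and the index reports squared L2 distances, the identity ‖a − b‖² = ‖a‖² + ‖b‖² − 2a·b reduces to 2 − 2·cos(a, b), so cos(a, b) = 1 − d/2. A minimal sketch of that conversion, with `squared_l2_to_cosine` as an illustrative name:

```python
def squared_l2_to_cosine(distance: float) -> float:
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - distance / 2.0

# The two distances reported for the query "When was it written?"
for d in (1.5569045543670654, 1.602304458618164):
    print(round(squared_l2_to_cosine(d), 4))
```

Under those assumptions the two neighbors have similarities of roughly 0.22 and 0.20, which suggests that neither chunk is a strong match for such a short, vague query and that the ranking between them is narrow.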


In [13]:
# Send the retrieved paragraphs along with the query to the OpenAI API to generate a complete answer

full_query = f"""Use the below context to answer the subsequent question. If the answer cannot be found, write "I don't know."

Article:
\"\"\"
{context}
\"\"\"

Question: {query}"""

response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You answer questions for the user.",
        },
        {"role": "user", "content": full_query},
    ],
    model="gpt-4-turbo",
    temperature=0,
)

print(response.choices[0].message.content)

The article was written in 1948.
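
The prompt above inlines the context as a single block. One possible variation, sketched below, numbers each retrieved chunk so an answer can be traced back to its source, and stops adding chunks once a character budget is reached. `build_rag_prompt` and `max_chars` are illustrative names, not part of this notebook.

```python
def build_rag_prompt(query: str, chunks: list[str], max_chars: int = 2000) -> str:
    """Assemble a prompt from numbered chunks, stopping at the first chunk
    that would push the context past `max_chars` characters."""
    selected = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        entry = f"[{number}] {chunk}"
        if used + len(entry) > max_chars:
            break
        selected.append(entry)
        used += len(entry)
    context = "\n".join(selected)
    return (
        "Use the below context to answer the subsequent question. "
        'If the answer cannot be found, write "I don\'t know."\n\n'
        f"Context:\n{context}\n\nQuestion: {query}"
    )

demo_prompt = build_rag_prompt(
    "When was it written?",
    ["First chunk.", "Second chunk."],
    max_chars=20,
)
print(demo_prompt)
```

With `max_chars=20` only the first chunk fits, so the second is left out. In practice the budget would be derived from the model's context window rather than a fixed character count.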


# Conclusion: You've Built Your Own RAG System!

Congratulations! You've successfully implemented a Retrieval-Augmented Generation (RAG) system from scratch using pure Python and minimal external libraries. Let's recap what you've accomplished:

1. **Built a complete RAG pipeline**: From data ingestion to augmented generation, you've created each component of a functional RAG system.

2. **Gained deep insights**: By implementing each step yourself, you've developed a thorough understanding of how RAG systems work under the hood.

3. **Developed practical skills**: You've worked with embeddings, vector databases, and language models, all of which are valuable skills for building RAG applications.

4. **Created a flexible foundation**: Your "raw" implementation can be easily modified and extended for various applications.

## Key Takeaways:

- RAG systems enhance language models with external knowledge, improving accuracy and reducing hallucinations.
- Efficient embedding and retrieval are crucial for RAG performance.
- Proper context integration is key to generating relevant and accurate responses.

## Next Steps:

Now that you have a working RAG system, consider these ideas to further your learning:

1. **Experiment with different embedding models** to see how they affect retrieval accuracy.
2. **Try various vector databases** and compare their performance.
3. **Implement additional preprocessing steps** to improve the quality of your knowledge base.
4. **Explore advanced retrieval techniques** like hybrid search or re-ranking.
5. **Add a simple UI** to interact with your RAG system more easily.
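
As one example of the hybrid search idea in the list above, reciprocal rank fusion (RRF) merges several ranked lists, for instance one from keyword search and one from vector search, by summing `1 / (k + rank)` for each document across the lists. The sketch below is illustrative: `k = 60` is a commonly used default, and the chunk IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Merge ranked lists of document IDs into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

keyword_ranking = ["p3", "p1", "p7"]  # hypothetical keyword-search results
vector_ranking = ["p1", "p5", "p3"]   # hypothetical vector-search results
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists (`p1` and `p3` here) rise to the top, without any need to put keyword scores and vector distances on a common scale.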

Remember, the fields of LLMs and RAG are constantly evolving. Keep exploring, experimenting, and building to stay current!

---

**Note**: While we could have used direct HTTP requests to process OpenAI API calls for an even more "raw" approach, our focus was on understanding RAG implementation without additional frameworks. The key takeaway is that you can create a functional RAG system using basic Python libraries and concepts, giving you full control and understanding of the process.