# RAG (Retrieval-Augmented Generation)

RAG (Retrieval-Augmented Generation) is an AI architecture that combines two steps:

Retrieval → A retriever component (e.g., a vector store such as FAISS, Pinecone, Milvus, Weaviate, or Chroma) fetches the documents most relevant to a query by comparing their embeddings with the query's embedding.

Generation → A large language model (LLM) receives both the query and the retrieved context and produces a coherent answer grounded in that context.
```text
User query → Encode as embedding → Retrieve top-k docs → LLM input = [query + docs] → LLM generates answer
```
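The pipeline above can be sketched end to end in plain Python. This is a toy illustration, not how the notebook implements it: the "embedding" is a bag-of-words count vector, retrieval is a linear scan ranked by cosine similarity, and `generate` is a hypothetical stub standing in for a real LLM call such as the Groq chat model used later.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": lowercase word counts (a real system uses a neural embedding model)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    # Score every document against the query embedding and keep the k best
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, context_docs):
    # LLM input = [query + docs]
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt):
    # Hypothetical stub: a real pipeline would send `prompt` to an LLM here
    return f"<answer generated from a {len(prompt)}-character prompt>"

docs = [
    "Dhoni was born in Ranchi in 1981.",
    "Tesla is headquartered in Austin, Texas.",
    "The Great Red Spot is a storm on Jupiter.",
]
query = "Where is Tesla headquartered?"
top_docs = retrieve(query, docs, k=1)
prompt = build_prompt(query, top_docs)
print(top_docs)
print(generate(prompt))
```

Swapping `embed` for a real embedding model and `generate` for an LLM call keeps the same shape: retrieval only decides which text gets pasted into the prompt.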

![RAG Architecture](https://media.licdn.com/dms/image/v2/D5622AQFmFigdQfc-MQ/feedshare-shrink_2048_1536/B56ZisfydZG4Ao-/0/1755240673959?e=1758758400&v=beta&t=A60hsG_lPD0PvSXPif3mwgps35J6cUwkFkzn1VCw2Ws)



## Vector DB

![Vector DB](https://cdn.prod.website-files.com/6064b31ff49a2d31e0493af1/6894989eb89eba6244f42a13_66d6f57b458248b848888df1_AD_4nXc03SwDTX0vl9Y3L59hBb96cNc56cTMdEBNll2T57ReWxzg7r9JOeDlGxCUfswjP5r2XRAVT7ZF4ri7ZF6PuF92bMGh7Swg50tK-2NqZyWThdj0BskQ46PrOnb0Q7jGT1WV06jBTxvzE1HhtzqkkhdDQ4-t.png)


A vector database is a database designed to store high-dimensional vector embeddings (numeric representations of data such as text, images, or audio) and to search them efficiently, returning the stored vectors most similar to a query vector.
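To make the idea concrete, here is a brute-force, in-memory store that keeps unit-normalized vectors and returns the top-k matches by cosine similarity. It is a sketch under stated assumptions: `TinyVectorStore` is a hypothetical class, the 3-dimensional vectors are made up by hand rather than produced by an embedding model, and production vector databases typically use approximate nearest-neighbour indexes (e.g., HNSW) instead of scanning every vector.

```python
import math

class TinyVectorStore:
    """Hypothetical brute-force vector store: exact search over every stored vector."""

    def __init__(self):
        self.ids, self.vectors, self.payloads = [], [], []

    @staticmethod
    def _normalize(v):
        # Scale to unit length so a plain dot product equals cosine similarity
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n else list(v)

    def add(self, id_, vector, payload):
        self.ids.append(id_)
        self.vectors.append(self._normalize(vector))
        self.payloads.append(payload)

    def search(self, query, k=2):
        # Score all vectors against the query, then return the k highest
        q = self._normalize(query)
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        return [(self.ids[i], round(scores[i], 3), self.payloads[i]) for i in order]

# Made-up toy vectors, roughly "cricket", "cars", "space" directions
store = TinyVectorStore()
store.add("cricket", [0.9, 0.1, 0.0], "MS Dhoni biography")
store.add("cars", [0.1, 0.9, 0.2], "Tesla overview")
store.add("space", [0.0, 0.2, 0.9], "Jupiter fact")
results = store.search([0.8, 0.2, 0.1], k=2)
print(results)
```

Normalizing at insert time is a common design choice because it turns every similarity query into cheap dot products.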

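```python
# Install dependencies once; %pip installs into the kernel running this notebook
%pip install -r requirements.txt
```

```python
import os
from dotenv import load_dotenv, find_dotenv

# Load environment variables from the nearest .env file
_ = load_dotenv(find_dotenv())

# Read the Groq API key (raises KeyError if it is not set)
groq_api_key = os.environ["GROQ_API_KEY"]
```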
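`os.environ["GROQ_API_KEY"]` fails with a bare `KeyError` when the key is missing. A small helper that fails with a clearer message is sketched below; `require_env` is a hypothetical name, not part of `python-dotenv`, and `DEMO_API_KEY` is a placeholder used only for the demonstration.

```python
import os

def require_env(name):
    # Return the variable's value, or fail with an actionable message if it is missing or empty
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or export it in your shell")
    return value

# Demonstration with a placeholder variable (not a real key)
os.environ["DEMO_API_KEY"] = "example-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook, `groq_api_key = require_env("GROQ_API_KEY")` would take the place of the direct lookup.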


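```python
from langchain_groq import ChatGroq

# ChatGroq reads GROQ_API_KEY from the environment
llamaChatModel = ChatGroq(
    model="llama3-70b-8192",  # "llama3-8b-8192" is a cheaper option; Groq retires model IDs
                              # over time, so check its current model list if this one fails
    temperature=0.2,          # low temperature for more focused, repeatable answers
)

# Call the model
response = llamaChatModel.invoke("Tell me a fun fact about space.")
print(response.content)
```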

Here's one:

**There is a giant storm on Jupiter that has been raging for at least 187 years!**

The Great Red Spot, as it's called, is a persistent anticyclonic storm on Jupiter, which means that it's a high-pressure region with clockwise rotation. It's so large that three Earths could fit inside it.

The storm was first observed in 1831, and it's been continuously monitored since then. Despite its incredible longevity, the Great Red Spot is actually shrinking, and its color has changed from a deep red to more of a pale pink over the years.

Isn't that just mind-blowing?


## Text Loader

```python
from langchain_community.document_loaders import TextLoader

# Load a plain-text file as a single Document
loader = TextLoader("ms_dhoni.txt")
loaded_data = loader.load()
```

```python
loaded_data
```

```
[Document(metadata={'source': 'ms_dhoni.txt'}, page_content="Mahendra Singh Dhoni, often known simply as MS Dhoni, is one of the greatest cricketers in the history of the game. \nHe was born on 7th July 1981 in Ranchi, Jharkhand, India. Dhoni is celebrated not only for his exceptional cricketing skills \nbut also for his calm demeanor, sharp decision-making abilities, and inspirational leadership on and off the field.\n\nDhoni made his debut for the Indian cricket team in December 2004 against Bangladesh. He initially caught attention with his \naggressive batting style, unorthodox wicket-keeping, and ability to finish matches under pressure. Over the years, he transformed \ninto a complete cricketer and one of the finest captains cricket has ever seen.\n\nKnown as 'Captain Cool,' Dhoni led the Indian team to several historic victories. Under his leadership, India won the inaugural \nT20 World Cup in 2007, the 2011 ICC Cricket World Cup, and the 2013 ICC Champions Trophy, making him th
```
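As the output shows, `TextLoader` returns a list holding one `Document` whose `page_content` is the whole file and whose `metadata` records the source path. A rough plain-Python equivalent is sketched below; `SimpleDocument` and `load_text` are hypothetical stand-ins rather than LangChain classes, and UTF-8 encoding is assumed.

```python
import os
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class SimpleDocument:
    # Hypothetical stand-in for LangChain's Document: text plus a metadata dict
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_text(path, encoding="utf-8"):
    # Read the whole file into ONE document and record where it came from
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]

# Usage with a throwaway file (the notebook itself loads ms_dhoni.txt)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    Path(sample_path).write_text("First line.\nSecond line.", encoding="utf-8")
    docs = load_text(sample_path)

print(len(docs), docs[0].metadata)
```

LangChain's `TextLoader` also accepts an `encoding` argument, which is worth setting explicitly when a file is not in the platform's default encoding.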

## CSV Loader


```python
from langchain_community.document_loaders import CSVLoader

# Load a CSV file: each row becomes one Document
loader = CSVLoader("student_performance.csv")
loaded_data = loader.load()
```

```python
loaded_data
```

```
[Document(metadata={'source': 'student_performance.csv', 'row': 0}, page_content='StudentID: 1\nName: Aarav\nMaths: 85\nScience: 80\nEnglish: 78\nHistory: 72\nTotal: 315'),
 Document(metadata={'source': 'student_performance.csv', 'row': 1}, page_content='StudentID: 2\nName: Ishita\nMaths: 78\nScience: 85\nEnglish: 82\nHistory: 76\nTotal: 321'),
 Document(metadata={'source': 'student_performance.csv', 'row': 2}, page_content='StudentID: 3\nName: Rohan\nMaths: 92\nScience: 88\nEnglish: 89\nHistory: 85\nTotal: 354'),
 Document(metadata={'source': 'student_performance.csv', 'row': 3}, page_content='StudentID: 4\nName: Meera\nMaths: 65\nScience: 60\nEnglish: 70\nHistory: 68\nTotal: 263'),
 Document(metadata={'source': 'student_performance.csv', 'row': 4}, page_content='StudentID: 5\nName: Kiran\nMaths: 74\nScience: 72\nEnglish: 75\nHistory: 70\nTotal: 291'),
 Document(metadata={'source': 'student_performance.csv', 'row': 5}, page_content='StudentID: 6\nName: Dev\nMaths: 88\nScience: 91\nEng
```
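Each CSV row becomes its own `Document`, with the columns rendered as `column: value` lines and the row index stored in `metadata`. The sketch below mimics that output format with the standard `csv` module; `rows_to_documents` is a hypothetical helper, the header row is assumed from the column names in the output above, and the two sample rows are copied from that output.

```python
import csv
import io

def rows_to_documents(csv_text, source):
    # One (page_content, metadata) pair per data row, mirroring the CSVLoader output above
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for i, row in enumerate(reader):
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        docs.append((content, {"source": source, "row": i}))
    return docs

# Assumed header row plus two rows taken from the output above
sample = (
    "StudentID,Name,Maths,Science,English,History,Total\n"
    "1,Aarav,85,80,78,72,315\n"
    "2,Ishita,78,85,82,76,321\n"
)
docs = rows_to_documents(sample, "student_performance.csv")
print(docs[0][0])
print(docs[1][1])
```

Keeping one row per document means a retriever can return exactly the matching student's record instead of the whole table.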

## PDF Loader


```python
from langchain_community.document_loaders import PyPDFLoader  # requires the pypdf package

# Load the PDF (one Document per page), then split pages into smaller chunks
loader = PyPDFLoader("Gen_AI_Roadmap_2025_V2.pdf")
loaded_data = loader.load_and_split()
```

```python
loaded_data
```

```
[Document(metadata={'source': 'Gen_AI_Roadmap_2025_V2.pdf', 'page': 0}, page_content='codebasics.io  \n \n1 \nGen AI Roadmap for Beginners  \nFollowing is the roadmap  to learning  Gen AI Skills  for a total beginner. It includes FREE learning \nresources for technical skills (or tool skills) and soft (or core) skills                         \nThis roadmap will help you become a Gen  AI Engineer (a.k.a Gen AI Developer)  \nFind Your Suitability : Before you start your learning journey, it is important you find out if AI \nengineering  career really suits your natural abilities and interests . Take this test to know your \nsuitability : https://codebasics.io/survey/find -your-match -ds \nProceed further if the results show that this career role matches  you. \nTotal Duration: 6 Months  (4 hours  of study every day, 6 days a week )'),
 Document(metadata={'source': 'Gen_AI_Roadmap_2025_V2.pdf', 'page': 1}, page_content='codebasics.io  \n \n2 \n \nWeek 0: Computer Science Fundamentals 💻 \n
```
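`load_and_split()` loads one `Document` per page and then splits long pages into smaller chunks; when no splitter is passed, LangChain falls back to a `RecursiveCharacterTextSplitter`. The core idea of fixed-size chunks with overlap can be sketched in plain Python as below. This is a simplified character-window splitter, not LangChain's recursive algorithm, and `split_with_overlap` with its small `chunk_size`/`chunk_overlap` values is hypothetical, chosen only to keep the output readable.

```python
def split_with_overlap(text, chunk_size=20, chunk_overlap=5):
    # Slide a fixed-size character window; consecutive chunks share `chunk_overlap` characters
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks

page_text = "Week 0: Computer Science Fundamentals"
chunks = split_with_overlap(page_text, chunk_size=20, chunk_overlap=5)
for c in chunks:
    print(repr(c))
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighbouring chunks, which helps retrieval when the relevant text is cut in two.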

## Wikipedia Loader

```python
from langchain_community.document_loaders import WikipediaLoader  # requires the wikipedia package

# Fetch the top Wikipedia article for "Tesla" and keep only its text (a str, not a Document)
loader = WikipediaLoader(query="Tesla", load_max_docs=1)
loaded_data = loader.load()[0].page_content
```

```python
loaded_data
```

```
'Tesla, Inc. ( TEZ-lə or   TESS-lə) is an American multinational automotive and clean energy company. Headquartered in Austin, Texas, it designs, manufactures and sells battery electric vehicles (BEVs), stationary battery energy storage devices from home to grid-scale, solar panels and solar shingles, and related products and services.\nTesla was incorporated in July 2003 by Martin Eberhard and Marc Tarpenning as Tesla Motors. Its name is a tribute to inventor and electrical engineer Nikola Tesla. In February 2004, Elon Musk led Tesla\'s first funding round and became the company\'s chairman; in 2008, he was named chief executive officer. In 2008, the company began production of its first car model, the Roadster sports car, followed by the Model S sedan in 2012, the Model X SUV in 2015, the Model 3 sedan in 2017, the Model Y crossover in 2020, the Tesla Semi truck in 2022 and the Cybertruck pickup truck in 2023.\nTesla is one of the world\'s most valuable companies in terms of market c
```

```python
from langchain_core.prompts import ChatPromptTemplate

# A single human message with two placeholders
chat_template = ChatPromptTemplate.from_messages(
    [
        ("human", "Answer this {question}, here is some extra {context}"),
    ]
)

# Fill the placeholders; the Wikipedia text is passed in as context
messages = chat_template.format_messages(
    question="Tell me about tesla",
    context=loaded_data,
)
```
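`format_messages` substitutes the keyword arguments into the `{question}` and `{context}` placeholders and returns a list of chat messages (here a single human message). With the default f-string template format, the substitution step behaves much like Python's `str.format`, as sketched below. The template with labelled sections is a hypothetical, more explicit alternative to the one above, `build_messages` is a hypothetical helper, and the context is a short excerpt rather than the full Wikipedia text.

```python
# Hypothetical RAG-style template with clearly labelled sections
TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)

def build_messages(question, context):
    # Mirrors a one-message chat prompt as a list of (role, content) pairs
    return [("human", TEMPLATE.format(question=question, context=context))]

messages_preview = build_messages(
    question="Tell me about tesla",
    context="Tesla, Inc. is an American multinational automotive and clean energy company.",
)
print(messages_preview[0][1])
```

Because `str.format` substitutes each value once, braces that happen to appear inside the retrieved text are not treated as new placeholders.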

```python
llamaChatModel.invoke(messages).content
```

```
'Here\'s an overview of Tesla, Inc.:\n\n**Company Overview**\n\nTesla, Inc. is an American multinational automotive and clean energy company headquartered in Austin, Texas. The company designs, manufactures, and sells battery electric vehicles (BEVs), stationary battery energy storage devices, solar panels, and solar shingles, as well as related products and services.\n\n**History**\n\nTesla was founded in July 2003 by Martin Eberhard and Marc Tarpenning as Tesla Motors. The company\'s name is a tribute to inventor and electrical engineer Nikola Tesla. In February 2004, Elon Musk led Tesla\'s first funding round and became the company\'s chairman. In 2008, Musk was named CEO.\n\n**Vehicle Models**\n\nTesla has produced several vehicle models, including:\n\n1. Roadster (2008) - a sports car\n2. Model S (2012) - a sedan\n3. Model X (2015) - an SUV\n4. Model 3 (2017) - a sedan\n5. Model Y (2020) - a crossover\n6. Tesla Semi (2022) - a truck\n7. Cybertruck (2023) - a pickup truck\n\n**Mark
```