# Building Your First Complete RAG System with LangChain

Welcome! In this notebook, we'll build a **complete Retrieval-Augmented Generation (RAG)** system using **LangChain** and **OpenAI**.

## What is RAG?

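RAG retrieves the passages most relevant to a question from your own documents and passes them to a language model as context, so the answer is grounded in that material rather than only in what the model learned during training.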

## The Complete RAG Pipeline:

1. **Load** documents 
2. **Chunk** them into smaller pieces
3. **Embed** chunks into vectors
4. **Store** the vectors in a vector database
5. **Retrieve** the chunks most relevant to a query
6. **Generate** answers using an LLM with retrieved context
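
Before bringing in any libraries, the six steps above can be sketched in plain Python. This is a toy illustration only: the `embed` function below is a word set rather than a real embedding, `generate` just builds the prompt an LLM would receive, and every name in it (`chunk`, `embed`, `retrieve`, `generate`) is hypothetical rather than part of LangChain.

```python
import re

def chunk(text, size=50):
    """Step 2: split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Step 3 (toy version): represent text as its set of lowercase words."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, store, k=1):
    """Step 5: rank stored chunks by how many words they share with the query."""
    query_words = embed(query)
    ranked = sorted(store, key=lambda item: len(query_words & item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def generate(question, context):
    """Step 6 (stand-in): return the prompt an LLM would be asked to complete."""
    return f"Context:\n{context}\n\nQuestion: {question}"

# Step 1: "load" two tiny documents.
documents = [
    "Cats purr when they are content.",
    "Rust is a systems programming language.",
]

# Step 4: the "vector store" is a list of (chunk, embedding) pairs.
store = [(piece, embed(piece)) for doc in documents for piece in chunk(doc)]

question = "Which language is Rust?"
top_chunks = retrieve(question, store)
print(generate(question, top_chunks[0]))
```

The rest of the notebook replaces each toy function with a real component: a text splitter, an embedding model, a vector database, and a chat model.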

Let's build it step by step!

## Step 1: Install Required Packages

In [None]:
!pip install langchain langchain-community langchain-openai chromadb sentence-transformers tiktoken openai python-dotenv -q

## Step 2: Set Up OpenAI API Key

You'll need an OpenAI API key. Get one at: https://platform.openai.com/api-keys

**Create a `.env` file** in this directory with:
```
OPENAI_API_KEY=your-key-here
```

In [1]:
import os
from dotenv import load_dotenv

# Load API key from .env file
# Create a .env file in this directory with: OPENAI_API_KEY=your-key-here
load_dotenv()
api_key = os.environ.get("OPENAI_API_KEY")

if api_key:
    print("✅ API key loaded!")
else:
    print("⚠️  Warning: OPENAI_API_KEY not found in .env file")
    print("Create a .env file with: OPENAI_API_KEY=your-key-here")

✅ API key loaded!
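
The cell above only prints a warning when the key is missing, so later cells would fail with a less obvious error. A stricter variant is sketched below; `require_env` is a hypothetical helper name, not something provided by `python-dotenv`.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

Calling `api_key = require_env("OPENAI_API_KEY")` after `load_dotenv()` stops the notebook at the first cell where the problem can be fixed.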


## Step 3: Import LangChain Components

In [10]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

print("✅ All libraries imported successfully!")

✅ All libraries imported successfully!


## Step 4: Load Documents

In [6]:
loader = DirectoryLoader(
    'documents/',
    glob="**/*.txt",
    loader_cls=TextLoader
)

documents = loader.load()
print(f"✅ Loaded {len(documents)} documents")

✅ Loaded 3 documents


## Step 5: Chunk Documents

In [7]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

chunks = text_splitter.split_documents(documents)
print(f"✅ Created {len(chunks)} chunks")

✅ Created 10 chunks
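
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter. It is a sketch of the idea only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its chunks are often shorter than `chunk_size` and do not fall on fixed offsets. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

pieces = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each piece repeats the last three characters of the previous one (`hij`, `opq`, `vwx`). The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk.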


## Step 6: Create Embeddings

We'll use OpenAI's `text-embedding-3-small` embedding model:

In [None]:
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key=api_key
)

# Test embedding
test_embedding = embeddings.embed_query("What is RAG?")
print("✅ Embedding model loaded!")
print(f"Embedding dimension: {len(test_embedding)}")
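
An embedding is just a list of floats, and retrieval works by comparing those lists. Cosine similarity is one common way to compare them. The sketch below uses made-up three-dimensional vectors, and it is an assumption that your vector store ranks by cosine similarity, since stores can be configured with other distance measures.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors for illustration; real embeddings have many more dimensions.
query_vec = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]   # same direction as the query
doc_far = [0.0, 3.0, 0.0]     # orthogonal to the query

print(cosine_similarity(query_vec, doc_close))
print(cosine_similarity(query_vec, doc_far))
```

Vectors that point in the same direction score close to 1 regardless of their length, while unrelated (orthogonal) vectors score 0.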

## Step 7: Create Vector Store

In [12]:
print("Creating vector store...")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
print(f"✅ Vector store created with {len(chunks)} chunks!")

Creating vector store...
✅ Vector store created with 10 chunks!


## Step 7b: OR Load Existing Vector Store (Skip Steps 4, 5, and 7 if using this)

**If you've already created the vector store**, you can load it directly instead of recreating it. Step 6 must still run, because loading the store needs the `embeddings` object.

This is useful when you:
- Want to re-run the notebook without rebuilding the index
- Already have the embeddings saved
- Just want to query the existing data

In [None]:
# Load existing vector store (uncomment to use)
# vectorstore = Chroma(
#     persist_directory="./chroma_db",
#     embedding_function=embeddings
# )
# print("✅ Loaded existing vector store!")

# To use this:
# 1. Skip Steps 4, 5, and 7 above (comment them out or just don't run them)
# 2. Still run Step 6, because `embeddings` is used here
# 3. Uncomment the code above and run this cell instead

## Step 7c: Smart Approach - Load if Exists, Create if Not

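**Best practice**: Check whether the vector store already exists before building it. Re-running Step 7 against the same `persist_directory` typically adds the same chunks a second time, which leads to duplicate retrieval results: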

In [None]:
# Smart approach - load if exists, create if not (uncomment to use)
# import os
# 
# if os.path.exists("./chroma_db"):
#     print("Loading existing vector store...")
#     vectorstore = Chroma(
#         persist_directory="./chroma_db",
#         embedding_function=embeddings
#     )
#     print("✅ Loaded existing vector store!")
# else:
#     print("Creating new vector store...")
#     vectorstore = Chroma.from_documents(
#         documents=chunks,
#         embedding=embeddings,
#         persist_directory="./chroma_db"
#     )
#     print(f"✅ Vector store created with {len(chunks)} chunks!")
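
One caveat with `os.path.exists("./chroma_db")`: it is also true for an empty directory left behind by an interrupted run. A slightly stricter check is sketched below. `has_persisted_store` is a hypothetical helper, and it only verifies that the directory is non-empty, not that its contents are a valid Chroma index.

```python
import os

def has_persisted_store(path: str) -> bool:
    """True if path is an existing directory with at least one entry in it."""
    return os.path.isdir(path) and len(os.listdir(path)) > 0
```

You could then write `if has_persisted_store("./chroma_db"):` in place of the `os.path.exists` check above.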

## Step 8: Set Up OpenAI LLM

We'll use GPT-3.5-turbo for fast, cost-effective answers:

In [None]:
# Initialize OpenAI LLM
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0,  # 0 = most deterministic; higher values add randomness
    openai_api_key=api_key
)

print("✅ OpenAI LLM initialized!")
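
To build intuition for `temperature`, the sketch below applies the textbook temperature-scaled softmax to three made-up token scores. It is an assumption that the hosted model behaves exactly like this; the function `softmax_with_temperature` and the `logits` values are illustrative only.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Limit case: all probability goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures concentrate probability on the top-scoring token, which is why `temperature=0` suits question answering over documents, where you want the same answer each time.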

## Step 9: Test Retrieval Only

Before we do full generation, let's see which chunks the retriever returns:

In [None]:
# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Test retrieval
query = "What is machine learning?"
retrieved_docs = retriever.invoke(query)

print(f"Query: {query}\n")
print(f"Retrieved {len(retrieved_docs)} documents:\n")

for i, doc in enumerate(retrieved_docs, 1):
    print(f"{i}. {doc.page_content[:200]}...\n")

Query: What is machine learning?

Retrieved 3 documents:

1. There are three main types of machine learning:

1. Supervised Learning: The algorithm learns from labeled training data. Common applications include classification and regression tasks. Examples incl...

2. Machine Learning Fundamentals

Machine learning is a subset of artificial intelligence that enables computers to learn from data without being explicitly programmed. It focuses on developing algorithm...

3. 3. Reinforcement Learning: The algorithm learns through trial and error by receiving rewards or penalties. This is commonly used in robotics, game playing, and autonomous vehicles.

Key concepts in ma...



In [18]:
for i, doc in enumerate(retrieved_docs, 1):
    print(f"{i}. {doc.page_content}")
    print("="*20)

1. There are three main types of machine learning:

1. Supervised Learning: The algorithm learns from labeled training data. Common applications include classification and regression tasks. Examples include email spam detection and house price prediction.

2. Unsupervised Learning: The algorithm finds patterns in unlabeled data. Clustering and dimensionality reduction are common techniques. Applications include customer segmentation and anomaly detection.
2. Machine Learning Fundamentals

Machine learning is a subset of artificial intelligence that enables computers to learn from data without being explicitly programmed. It focuses on developing algorithms that can identify patterns and make decisions based on input data.

There are three main types of machine learning:
3. 3. Reinforcement Learning: The algorithm learns through trial and error by receiving rewards or penalties. This is commonly used in robotics, game playing, and autonomous vehicles.

Key concepts in machine learning i

## Step 10: Create the RAG Prompt Template

This prompt tells the LLM how to use the retrieved context:

In [19]:
# Create prompt template
template = """You are a helpful assistant answering questions based on the provided context.

Context:
{context}

Question: {question}

Answer the question based on the context above. If you cannot answer based on the context, say so.

Answer:"""

prompt = ChatPromptTemplate.from_template(template)
print("✅ Prompt template created!")

✅ Prompt template created!
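
To preview the text the model will receive, you can fill the same template with plain `str.format`. This is only an approximation of what `ChatPromptTemplate` does, since it also wraps the result in a chat message, and the context string below is a shortened example rather than real retriever output.

```python
template = """You are a helpful assistant answering questions based on the provided context.

Context:
{context}

Question: {question}

Answer the question based on the context above. If you cannot answer based on the context, say so.

Answer:"""

# Example values; in the notebook these come from the retriever and the user.
filled = template.format(
    context="Machine learning is a subset of artificial intelligence.",
    question="What is machine learning?",
)
print(filled)
```

Printing the filled prompt is a quick way to debug surprising answers: if the relevant passage is not in the context, the model cannot use it.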


## Step 11: Build the RAG Chain

This combines retrieval + generation into one pipeline:

In [20]:
# Helper function to format documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build the RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print("✅ RAG chain built!")

✅ RAG chain built!
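
The `|` syntax can look opaque at first. The plain-Python sketch below mirrors the same data flow with ordinary functions. Everything in it is a stand-in: `fake_retrieve`, `fake_prompt`, and `fake_llm` are hypothetical placeholders, and chunks are plain strings here instead of `Document` objects with `page_content`.

```python
def format_docs(docs):
    """Join chunk texts with blank lines (chunks are plain strings in this sketch)."""
    return "\n\n".join(docs)

def make_rag_chain(retrieve, build_prompt, call_llm):
    """Compose the three stages into a single question -> answer function."""
    def run(question):
        context = format_docs(retrieve(question))            # retriever | format_docs
        prompt_text = build_prompt(context=context, question=question)  # | prompt
        return call_llm(prompt_text)                          # | llm | StrOutputParser()
    return run

# Stand-ins so the sketch runs without an API key.
def fake_retrieve(question):
    return ["Chunk A about the topic.", "Chunk B with more detail."]

def fake_prompt(context, question):
    return f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"

def fake_llm(prompt_text):
    return "ECHO: " + prompt_text

chain = make_rag_chain(fake_retrieve, fake_prompt, fake_llm)
print(chain("What is the topic?"))
```

Note that the question is used twice: once to drive retrieval and once, unchanged, inside the prompt. That is the role `RunnablePassthrough()` plays in the real chain.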


## Step 12: Test the Complete RAG System!

Now let's ask questions and get AI-generated answers based on our documents:

In [21]:
# Ask a question
question = "What is machine learning?"

print(f"Question: {question}\n")
print("Generating answer...\n")

answer = rag_chain.invoke(question)

print(f"{'='*60}")
print("ANSWER:")
print(f"{'='*60}")
print(answer)
print(f"{'='*60}")

Question: What is machine learning?

Generating answer...

ANSWER:
Machine learning is a subset of artificial intelligence that enables computers to learn from data without being explicitly programmed. It focuses on developing algorithms that can identify patterns and make decisions based on input data.


## Step 13: Try More Questions

In [22]:
# Question 2
question2 = "How do REST APIs work?"
answer2 = rag_chain.invoke(question2)

print(f"Q: {question2}\n")
print(f"A: {answer2}\n")

Q: How do REST APIs work?

A: REST APIs work by using standard HTTP methods (GET, POST, PUT, DELETE) to interact with resources on a server. Clients make requests to specific endpoints, providing headers, body data (for POST/PUT requests), and parameters as needed. The server processes these requests and responds with the appropriate HTTP status code to indicate the outcome of the operation. Data exchange typically occurs in JSON format, allowing for easy communication between different software applications.



In [25]:
# Question 4 - Test with something NOT in documents
question4 = "What is quantum computing?"
answer4 = rag_chain.invoke(question4)

print(f"Q: {question4}\n")
print(f"A: {answer4}\n")
print("="*60)
print("Expected: The AI should say something like:")
print("'I cannot answer this question based on the provided context.'")
print("or")
print("'The context does not contain information about quantum computing.'")

Q: What is quantum computing?

A: Quantum computing is not directly related to the context provided on machine learning fundamentals.

Expected: The AI should say something like:
'I cannot answer this question based on the provided context.'
or
'The context does not contain information about quantum computing.'
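
The model's wording here differs from both expected phrasings, so checking for one exact sentence would miss it. If you want to flag such answers automatically, a rough keyword heuristic is sketched below. `looks_like_refusal` and the marker list are hypothetical and deliberately simple, so expect both false positives and misses.

```python
# Phrases that often signal "the context does not cover this" (illustrative list).
REFUSAL_MARKERS = (
    "cannot answer",
    "can't answer",
    "does not contain",
    "doesn't contain",
    "not directly related",
    "no information",
)

def looks_like_refusal(answer: str) -> bool:
    """Heuristic: does the answer contain any known refusal phrase?"""
    lowered = answer.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

print(looks_like_refusal(
    "Quantum computing is not directly related to the context provided "
    "on machine learning fundamentals."
))
```

A more reliable approach is to ask the model for one fixed refusal sentence in the prompt and compare against that, but the heuristic is enough for spot checks in a notebook.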


## Step 14: Understanding the Complete System

Congratulations! You've built a **complete RAG system** that:

✅ Loads and chunks documents  
✅ Generates embeddings  
✅ Stores them in a vector database  
✅ Retrieves relevant context  
✅ **Generates AI answers using GPT-3.5**  

## Why This Is Powerful:

- LLM answers based on YOUR data
- Works with private/domain-specific documents
- No need to fine-tune the LLM
- Combines semantic search + generation
- Uses powerful GPT models

## But It Still Has Limitations:

- ❌ Everything is hardcoded in the notebook
- ❌ It can't be reused as a module
- ❌ There is no API for applications to call
- ❌ It is not production-ready

**Next step:** Refactor this into clean Python modules and create an API!
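
As a possible first move in that refactor, the values hardcoded across the cells can be collected into a single configuration object. The sketch below uses the defaults from this notebook. The class name `RAGConfig` and the `RAG_*` environment variable names are assumptions made for illustration.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class RAGConfig:
    """Settings that are currently scattered across the notebook cells."""
    documents_dir: str = "documents/"
    chunk_size: int = 500
    chunk_overlap: int = 50
    embedding_model: str = "text-embedding-3-small"
    chat_model: str = "gpt-3.5-turbo"
    persist_directory: str = "./chroma_db"
    top_k: int = 3

    @classmethod
    def from_env(cls, env=None):
        """Override selected defaults from environment variables (hypothetical names)."""
        env = os.environ if env is None else env
        defaults = cls()
        return cls(
            chunk_size=int(env.get("RAG_CHUNK_SIZE", defaults.chunk_size)),
            chunk_overlap=int(env.get("RAG_CHUNK_OVERLAP", defaults.chunk_overlap)),
            top_k=int(env.get("RAG_TOP_K", defaults.top_k)),
        )

# Passing a dict keeps the example independent of the real environment.
config = RAGConfig.from_env({"RAG_TOP_K": "5"})
print(config)
```

Each step of the notebook could then read from `config` (for example `chunk_size=config.chunk_size`), which makes the pipeline easier to move into a module and to test with different settings.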