# 1. **Building RAG with Cloud-Based OpenAI Models**

## **What's Covered?**
1. Introduction to Cloud RAG with OpenAI
2. Setting Up the Environment
3. Loading and Processing Documents
4. Creating Vector Store with Hugging Face Embeddings (Local)
5. Setting Up OpenAI Cloud LLM
6. Building the RAG Chain with LCEL
7. Querying the System


## **2. Setting Up the Environment**

### **Required Libraries**
We'll install the necessary packages for our RAG system:
- `langchain` and `langchain-community`: Core LangChain functionality
- `langchain-chroma`: ChromaDB integration
- `langchain-huggingface`: Hugging Face embeddings integration (local)
- `langchain-openai`: OpenAI integration for cloud LLM
- `sentence-transformers`: Required for embedding models
- `chromadb`: Vector database

In [1]:
# Install required packages
# Note: Run this cell once. It may take several minutes to complete.

# ! pip install langchain langchain-community langchain-chroma langchain-huggingface langchain-openai
# ! pip install sentence-transformers chromadb


## **3. Loading and Processing Documents**

### **Step 3.1: Load Documents from Local Folder**
We'll use `DirectoryLoader` and `TextLoader` to load all `.txt` files from the `data/` folder.

In [2]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import TextLoader

# Load all .txt files from the data folder
loader = DirectoryLoader(
    'data/', 
    glob="*.txt", 
    show_progress=True, 
    loader_cls=TextLoader,
    loader_kwargs={'encoding': 'utf-8'}  # Ensure proper encoding
)

# Load documents
documents = loader.load()

print(f"✓ Loaded {len(documents)} documents")
print(f"✓ First document preview: {documents[0].page_content[:200]}...")


✓ Loaded 3 documents
✓ First document preview: Alzheimer's disease (AD) is a neurodegenerative disease and is the most common form of dementia, accounting for around 60–70% of cases. 
The most common early symptom is difficulty in remembering rece...





### **Step 3.2: Split Documents into Chunks**
We use `RecursiveCharacterTextSplitter` to break documents into smaller chunks for better retrieval.

**Key Parameters:**
- `chunk_size`: Maximum characters per chunk (500)
- `chunk_overlap`: Overlap between chunks to maintain context (50)
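
To make the two parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on the separators it is given, which this sketch ignores. `sliding_chunks` is a hypothetical helper, not a LangChain function.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Synthetic 1200-character input, used only for illustration.
sample = "".join(str(i % 10) for i in range(1200))
pieces = sliding_chunks(sample, chunk_size=500, chunk_overlap=50)
print([len(p) for p in pieces])
print(pieces[0][-50:] == pieces[1][:50])  # neighbouring chunks share 50 characters
```

The shared tail and head is what lets a sentence that straddles a chunk boundary still appear whole in at least one chunk.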

In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)

# Split documents into chunks
chunks = text_splitter.split_documents(documents)

print(f"✓ Created {len(chunks)} chunks from {len(documents)} documents")
print(f"\n--- Sample Chunk ---")
print(f"Content: {chunks[0].page_content[:300]}...")
print(f"Metadata: {chunks[0].metadata}")

✓ Created 180 chunks from 3 documents

--- Sample Chunk ---
Content: Alzheimer's disease (AD) is a neurodegenerative disease and is the most common form of dementia, accounting for around 60–70% of cases. 
The most common early symptom is difficulty in remembering recent events. 
As the disease advances, symptoms can include problems with language, disorientation (in...
Metadata: {'source': 'data\\alzheimers_1.txt'}


## **4. Creating Vector Store with Hugging Face Embeddings**

### **Step 4.1: Initialize Hugging Face Embeddings**
We'll use `BAAI/bge-base-en-v1.5`, an open-source embedding model that runs locally.

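**Note**: We keep embeddings local because:
- Cost-effective: no per-request API charges after the one-time model download
- Suitable for batch processing without API rate limits
- Privacy-friendly: document text is embedded on your own machine
- OpenAI embeddings are billed per token, which adds up for large datasets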

In [4]:
from langchain_huggingface import HuggingFaceEmbeddings

# Initialize embedding model
# Note: First run will download the model (~400MB)
print("Loading embedding model... (this may take a minute on first run)")

embedding_model = HuggingFaceEmbeddings(
    model_name="BAAI/bge-base-en-v1.5",
    model_kwargs={'device': 'cpu'},  # Use 'cuda' if you have a GPU
    encode_kwargs={'normalize_embeddings': True}  # Normalize for cosine similarity
)

print("✓ Embedding model loaded successfully!")

# Test the embedding model
test_embedding = embedding_model.embed_query("What is Alzheimer's disease?")
print(f"✓ Embedding dimension: {len(test_embedding)}")

Loading embedding model... (this may take a minute on first run)


BertModel LOAD REPORT from: BAAI/bge-base-en-v1.5
Key                     | Status     |
------------------------+------------+
embeddings.position_ids | UNEXPECTED |

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.


✓ Embedding model loaded successfully!
✓ Embedding dimension: 768
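
The `normalize_embeddings=True` setting matters because, for unit-length vectors, the plain dot product equals cosine similarity. The following sketch checks that identity on two small made-up vectors using only the standard library; the helper names are hypothetical and no embedding model is involved.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up vectors; real embeddings from this model have 768 dimensions.
a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

print(cosine(a, b))         # cosine similarity of the raw vectors
print(dot(a_unit, b_unit))  # dot product of the normalized vectors
```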


### **Step 4.2: Create Persistent ChromaDB Vector Store**
We'll create a ChromaDB vector store that persists to disk.

**Benefits of Persistence:**
- No need to re-embed documents on restart
- Faster startup times
- Efficient storage

In [5]:
from langchain_chroma import Chroma

# Initialize ChromaDB with persistence
print("Creating vector store...")

db = Chroma(
    collection_name="alzheimers_knowledge_base",
    embedding_function=embedding_model,
    persist_directory="./chroma_vectorstore"
)

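# Add documents to the vector store
# Note: This will take time on first run.
# Caution: every run of this cell appends the chunks again, so re-running it
# against the same persist_directory stores duplicate embeddings.
db.add_documents(documents=chunks)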

print(f"✓ Vector store created with {len(db.get()['ids'])} embeddings")
print(f"✓ Data persisted to: ./chroma_vectorstore")

Creating vector store...
✓ Vector store created with 900 embeddings
✓ Data persisted to: ./chroma_vectorstore
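
The recorded count of 900 is exactly five times the 180 chunks produced in Step 3.2, which is consistent with this cell having been run several times against the same `persist_directory`. `add_documents` generates fresh IDs on each call unless you supply them. One way to make re-runs safer, sketched here under assumptions, is to derive a stable ID from each chunk's source and text and pass the list via `ids=`. `chunk_id` is a hypothetical helper, and how an already-existing ID is handled depends on the installed `langchain-chroma` version, so verify that before relying on it.

```python
import hashlib


def chunk_id(source, content):
    """Return a stable ID derived from a chunk's source path and text."""
    digest = hashlib.sha256(f"{source}\n{content}".encode("utf-8"))
    return digest.hexdigest()[:32]


# Stand-ins for (chunk.metadata["source"], chunk.page_content) pairs.
example_chunks = [
    ("data/alzheimers_1.txt", "Alzheimer's disease (AD) is a neurodegenerative disease"),
    ("data/alzheimers_1.txt", "The most common early symptom is difficulty in remembering"),
]
ids = [chunk_id(src, text) for src, text in example_chunks]
print(ids)

# With the notebook's objects this might look like (assumption, not run here):
# ids = [chunk_id(c.metadata["source"], c.page_content) for c in chunks]
# db.add_documents(documents=chunks, ids=ids)
```

Two chunks with identical text from the same source would get the same ID; including the chunk's position in the hash input avoids that collision.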


### **Step 4.3: Verify Vector Store (Optional)**
Let's verify that our vector store is working correctly.

In [6]:
# Check vector store contents
total_docs = len(db.get()["ids"])
print(f"Total documents in vector store: {total_docs}")

# Perform a test similarity search
test_query = "What are the symptoms of Alzheimer's?"
test_results = db.similarity_search(test_query, k=2)

print(f"\nTest search for: '{test_query}'")
print(f"Found {len(test_results)} relevant chunks:")
for i, doc in enumerate(test_results, 1):
    print(f"\n--- Result {i} ---")
    print(doc.page_content[:200] + "...")

Total documents in vector store: 900

Test search for: 'What are the symptoms of Alzheimer's?'
Found 2 relevant chunks:

--- Result 1 ---
Alzheimer's disease (AD) is a neurodegenerative disease and is the most common form of dementia, accounting for around 60–70% of cases. 
The most common early symptom is difficulty in remembering rece...

--- Result 2 ---
Alzheimer's disease (AD) is a neurodegenerative disease and is the most common form of dementia, accounting for around 60–70% of cases. 
The most common early symptom is difficulty in remembering rece...


## **5. Setting Up OpenAI Cloud LLM**

### **Understanding Cloud LLM Benefits**
- **GPT-4/GPT-3.5**: Strong instruction following
- **Reliability**: Generally consistent, high-quality responses
- **Fewer Hallucinations**: More likely to admit "I don't know" when the prompt asks for it, though this is not guaranteed
- **API-Based**: No local GPU or large memory requirements

### **Prerequisites**
1. OpenAI API Key (from https://platform.openai.com/api-keys)
2. Save your key in a text file: `openai_api_key.txt` (keep this file out of version control)
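
As an alternative to reading only the text file, the key can be taken from an environment variable first. The sketch below assumes the variable is named `OPENAI_API_KEY` (the name the OpenAI client looks for by default) and falls back to the file; `load_api_key` is a hypothetical helper and not part of any library.

```python
import os
from pathlib import Path


def load_api_key(env_var="OPENAI_API_KEY", key_file="openai_api_key.txt"):
    """Return the key from the environment if set, otherwise from the text file."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    raise FileNotFoundError(f"Set {env_var} or create {key_file}")
```

The returned string could then be passed as `api_key=` when constructing the chat model.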

In [7]:
from langchain_openai import ChatOpenAI

# Read OpenAI API key from text file
print("Loading OpenAI API key...")

try:
    with open("openai_api_key.txt", "r") as f:
        OPENAI_API_KEY = f.read().strip()
    print(" API key loaded successfully")
except FileNotFoundError:
    print(" Error: openai_api_key.txt not found!")
    print("Please create a file named 'openai_api_key.txt' with your OpenAI API key")
    raise

# Initialize OpenAI Chat Model
llm = ChatOpenAI(
    api_key=OPENAI_API_KEY,
    model="gpt-3.5-turbo",  # Use "gpt-4" for better quality, "gpt-3.5-turbo" for speed
    temperature=0.1,  # Low temperature for factual responses
    max_tokens=512
)

print(" OpenAI GPT-3.5-Turbo model initialized successfully!")
print("\nModel Configuration:")
print(f"  - Model: gpt-3.5-turbo")
print(f"  - Temperature: 0.1 (factual)")
print(f"  - Max Tokens: 512")

Loading OpenAI API key...
 API key loaded successfully
 OpenAI GPT-3.5-Turbo model initialized successfully!

Model Configuration:
  - Model: gpt-3.5-turbo
  - Temperature: 0.1 (factual)
  - Max Tokens: 512


In [8]:
# Test the LLM with a simple query
print("Testing OpenAI LLM...")
print("-" * 60)

test_prompt = "Answer in one sentence: What is 2+2?"
test_response = llm.invoke(test_prompt)

print(f"Test Question: What is 2+2?")
print(f"Response: {test_response.content}")
print()

print("-" * 60)
print("✓ OpenAI LLM is working! Ready for RAG chain.")

Testing OpenAI LLM...
------------------------------------------------------------


RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
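
The 429 above carries the code `insufficient_quota`, which its own message describes as a plan and billing problem rather than a temporary rate limit. That is why the remaining cells in this notebook have no recorded output. A minimal sketch of how calling code might tell the two cases apart follows; `is_retryable` is a hypothetical helper, and `rate_limit_exceeded` is used here as an assumed example of a transient code.

```python
def is_retryable(status_code, error_code):
    """Decide whether waiting and retrying could plausibly help."""
    if status_code == 429 and error_code == "insufficient_quota":
        return False  # quota/billing problem: retrying will not change the outcome
    return status_code == 429 or status_code >= 500


print(is_retryable(429, "insufficient_quota"))   # the error recorded above
print(is_retryable(429, "rate_limit_exceeded"))  # assumed transient rate limit
```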

## **6. Building the RAG Chain with LCEL**

### **Step 6.1: Load Existing Vector Store**
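If you've already created the vector store, you can load it directly instead of re-running Step 4.2. The `embedding_model` from Step 4.1 must still be defined, because queries are embedded with the same model.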

In [None]:
from langchain_chroma import Chroma

# Load existing vector store
db = Chroma(
    collection_name="alzheimers_knowledge_base",
    embedding_function=embedding_model,
    persist_directory="./chroma_vectorstore"
)

print(f"✓ Loaded vector store with {len(db.get()['ids'])} documents")

### **Step 6.2: Create Retriever with MMR Search**
We'll use Maximal Marginal Relevance (MMR) for diverse retrieval.

**MMR Benefits:**
- Reduces redundancy in retrieved chunks
- Increases diversity of information
- Better coverage of the topic
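
The idea behind MMR can be sketched in a few lines of plain Python. This is an illustrative greedy version under assumptions (cosine similarity, made-up 2-D vectors, a hypothetical `mmr_select` helper), not LangChain's or Chroma's actual implementation. Each round picks the candidate with the best trade-off between similarity to the query and similarity to what has already been selected.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, candidate_vecs, k=4, lambda_mult=0.5):
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vec, candidate_vecs[i])
            # Highest similarity to anything already chosen (0 if nothing chosen yet).
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Made-up vectors: candidates 0 and 1 are exact duplicates, candidate 2 differs.
query = [1.0, 1.0]
candidates = [[1.0, 0.9], [1.0, 0.9], [0.0, 1.0]]
print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # balanced
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # relevance only
```

With the balanced setting the duplicate is skipped in favour of the less similar but new candidate; with `lambda_mult=1.0` the duplicate is returned because only relevance counts.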

In [None]:
# Create retriever with MMR search
retriever = db.as_retriever(
    search_type="mmr",  # Maximal Marginal Relevance
    search_kwargs={
        "k": 4,  # Return top 4 chunks
        "fetch_k": 10,  # Fetch 10 candidates before MMR reranking
        "lambda_mult": 0.5  # Relevance vs. diversity: 1 = relevance only, 0 = maximum diversity (0.5 = balanced)
    }
)

print("✓ Retriever created with MMR search")

# Test retriever
test_docs = retriever.invoke("What causes Alzheimer's disease?")
print(f"✓ Retrieved {len(test_docs)} documents for test query")

### **Step 6.3: Create Prompt Template**
We'll design a prompt that instructs the LLM to answer based only on context.

**OpenAI chat models generally follow instructions like these well, so this prompt is a reasonable starting point.**

In [None]:
from langchain_core.prompts import ChatPromptTemplate

PROMPT_TEMPLATE = """You are an AI assistant specialized in answering questions about Alzheimer's disease.

Context Information:
{context}

Question: {question}

Instructions:
- Answer the question based ONLY on the context provided above
- If the answer is not in the context, respond with "I don't know based on the provided information"
- Be concise and accurate
- Do not make up information
- Do not mention "according to the context" in your answer

Answer:"""

prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)

print("✓ Prompt template created")

### **Step 6.4: Create Helper Function**
Format retrieved documents into a single context string.

In [None]:
def format_docs(docs):
    """
    Format retrieved documents into a single string.
    
    Args:
        docs: List of retrieved documents
    
    Returns:
        str: Formatted context string
    """
    return "\n\n".join(doc.page_content for doc in docs)

print("✓ Helper function defined")
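
To see what the helper produces and how its output fills the prompt placeholders, here is a small offline sketch. `FakeDoc` is a hypothetical stand-in for a LangChain document (only `page_content` is used), and the template is a shortened version of the one in Step 6.3, filled with plain `str.format` rather than `ChatPromptTemplate`.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def format_docs(docs):
    """Join chunk texts with blank lines, as in the helper above."""
    return "\n\n".join(doc.page_content for doc in docs)


# Shortened template for illustration only.
TEMPLATE = "Context Information:\n{context}\n\nQuestion: {question}\n\nAnswer:"

docs = [FakeDoc("Chunk one."), FakeDoc("Chunk two.")]
prompt_text = TEMPLATE.format(context=format_docs(docs), question="What is AD?")
print(prompt_text)
```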

### **Step 6.5: Initialize Output Parser**
Use `StrOutputParser` to parse the LLM output to a string.

In [None]:
from langchain_core.output_parsers import StrOutputParser

# Initialize output parser
output_parser = StrOutputParser()

print("✓ Output parser initialized")

### **Step 6.6: Build the RAG Chain using LCEL**
Now we'll assemble all components into a single RAG chain using LangChain Expression Language.

**Chain Structure:**
1. **Input**: User question
2. **Retrieval**: Get relevant chunks (context)
3. **Format**: Combine context and question into prompt
4. **Generate**: OpenAI LLM generates answer
5. **Parse**: Extract string output
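
The five stages above can be mimicked with a toy pipe operator to show how data flows through the chain. This is a conceptual sketch only: `Step` and `parallel` are hypothetical stand-ins, the retriever and LLM are fakes, and LangChain's real runnables do considerably more (batching, streaming, async).

```python
class Step:
    """Tiny stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b means: run a, then feed its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


# Fake components standing in for the retriever, prompt, LLM, and parser.
retrieve = Step(lambda question: ["chunk A", "chunk B"])
join_chunks = Step(lambda chunks: "\n\n".join(chunks))
passthrough = Step(lambda question: question)
build_prompt = Step(lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: {"content": f"(answer based on {len(prompt)} prompt characters)"})
parse = Step(lambda message: message["content"])

chain = (
    parallel(context=retrieve | join_chunks, question=passthrough)
    | build_prompt
    | fake_llm
    | parse
)
print(chain.invoke("What is Alzheimer's disease?"))
```

The `parallel` step plays the role of the dictionary at the start of an LCEL chain: the same question is sent to both branches, and the resulting dict supplies the prompt's `context` and `question` values.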

In [None]:
from langchain_core.runnables import RunnablePassthrough

# Build the RAG chain using LCEL
rag_chain = (
    {
        "context": retriever | format_docs,  # Retrieve and format documents
        "question": RunnablePassthrough()     # Pass through the question as-is
    }
    | prompt_template    # Format into prompt
    | llm                # Generate response with OpenAI
    | output_parser      # Parse to string
)

print("✓ RAG chain assembled successfully!")
print("\nChain components:")
print("  1. Retriever (MMR) → formats context")
print("  2. RunnablePassthrough → passes question")
print("  3. Prompt Template → combines context + question")
print("  4. OpenAI LLM → generates answer")
print("  5. Output Parser → extracts string")

## **7. Querying the System**

### **Test the RAG Pipeline**
Let's test our RAG system with various questions about Alzheimer's disease.

In [None]:
# Example Query 1: Basic information
query_1 = "What is Alzheimer's disease?"

print(f"Question: {query_1}")
print("\nProcessing...")
response_1 = rag_chain.invoke(query_1)
print(f"\nAnswer: {response_1}")

In [None]:
# Example Query 2: Symptoms
query_2 = "What are the early symptoms of Alzheimer's?"

print(f"Question: {query_2}")
print("\nProcessing...")
response_2 = rag_chain.invoke(query_2)
print(f"\nAnswer: {response_2}")

In [None]:
# Example Query 3: Causes
query_3 = "What causes Alzheimer's disease?"

print(f"Question: {query_3}")
print("\nProcessing...")
response_3 = rag_chain.invoke(query_3)
print(f"\nAnswer: {response_3}")

In [None]:
# Example Query 4: Diagnosis
query_4 = "How is Alzheimer's disease diagnosed?"

print(f"Question: {query_4}")
print("\nProcessing...")
response_4 = rag_chain.invoke(query_4)
print(f"\nAnswer: {response_4}")

In [None]:
# Example Query 5: Prevention
query_5 = "Can Alzheimer's disease be prevented?"

print(f"Question: {query_5}")
print("\nProcessing...")
response_5 = rag_chain.invoke(query_5)
print(f"\nAnswer: {response_5}")

In [None]:
# Example Query 6: Out-of-context question (should say "I don't know")
query_6 = "What is the treatment for diabetes?"

print(f"Question: {query_6}")
print("\nProcessing...")
response_6 = rag_chain.invoke(query_6)
print(f"\nAnswer: {response_6}")
print("\n✓ OpenAI should correctly say 'I don't know' for non-Alzheimer's questions")

# RAG System Execution Summary

## Steps in Order of Execution

### Step 1: Install Dependencies
Install required packages (langchain, langchain-openai, etc.)

### Step 2: Load Documents
Load `.txt` files from `data/` folder using DirectoryLoader and TextLoader

### Step 3: Split Documents into Chunks
Use RecursiveCharacterTextSplitter (chunk_size=500, chunk_overlap=50)

### Step 4: Create Embeddings
Initialize HuggingFaceEmbeddings with BAAI/bge-base-en-v1.5 (~400MB download)

### Step 5: Create Vector Store
Initialize Chroma with persistence and add document chunks

### Step 6: Setup API Key
Create `openai_api_key.txt` file with your OpenAI API key

### Step 7: Load OpenAI LLM
Initialize ChatOpenAI with gpt-3.5-turbo model (cloud-based)

### Step 8: Create Retriever
Create retriever from Chroma with MMR search (k=4)

### Step 9: Create Prompt Template
Define template with context and question placeholders

### Step 10: Define Helper Function
Create format_docs() to join retrieved documents

### Step 11: Initialize Output Parser
Create StrOutputParser instance

### Step 12: Build RAG Chain
Assemble chain using LCEL: retriever -> format_docs -> prompt -> llm -> parser

### Step 13: Query the System
Use rag_chain.invoke(question) to get answers

---

## Component Flow
User Question -> Retriever -> format_docs -> Prompt -> OpenAI LLM -> Parser -> Answer

