
# Session 3 Hands‑On — Build a Mini RAG Chatbot (Local Flan‑T5, No Streamlit)

In this session, we will learn how to **retrieve relevant passages and generate answers** from them using a local language model (Flan‑T5 Small).  
We’ll use the sample **college guide text** as our dataset and build a simple RAG (Retrieval‑Augmented Generation) chatbot that works directly inside this notebook.

---

## Learning Objectives
By the end of this section, you will be able to:
- Build, save, and reuse **embeddings** and a **FAISS index**
- Understand how **retrieval** works in a RAG pipeline
- Use a **local model (Flan‑T5)** for text generation
- Combine both into a **chat‑like notebook interface**



## 🔹 What Is RAG?

**Retrieval‑Augmented Generation (RAG)** combines two parts:
1. **Retriever** — finds the most relevant passages (chunks) for a given query.
2. **Generator** — uses these retrieved texts to form a coherent answer.

Think of it as an **open‑book exam**:
- The retriever is the student flipping through notes 📚.
- The generator is the student writing the answer ✍️.

Grounding answers in retrieved text makes the model more accurate and less prone to hallucination.
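
The two-part split can be sketched without any models. The toy below is an illustration only: `toy_retrieve` ranks passages by the number of words they share with the question, and `toy_generate` simply returns the best passage. They stand in for the embedding search and the Flan‑T5 generation used later in this notebook. Both function names and the sample passages are hypothetical.

```python
import re

def _words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_retrieve(query, passages, k=1):
    """Retriever stand-in: rank passages by word overlap with the query."""
    q = _words(query)
    ranked = sorted(passages, key=lambda p: len(q & _words(p)), reverse=True)
    return ranked[:k]

def toy_generate(query, contexts):
    """Generator stand-in: answer with the top passage, or admit ignorance."""
    return contexts[0] if contexts else "I don't know."

passages = [
    "Library: open Mon-Sat, 8:00 AM to 8:00 PM.",
    "Minimum attendance: 75% per course.",
]
question = "What is the minimum attendance?"
answer = toy_generate(question, toy_retrieve(question, passages))
print(answer)  # Minimum attendance: 75% per course.
```

A real retriever compares embeddings rather than raw words, and a real generator rewrites the context into an answer, but the control flow is the same.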



## 0) Setup (Run once)
This installs the dependencies for embeddings, retrieval, and generation. The install line also pulls in `streamlit` and `pdfminer.six`, which this notebook does not use.

In [1]:
# Force Transformers to use PyTorch backend only
import os
os.environ['TRANSFORMERS_NO_TF'] = '1'
os.environ['USE_TF'] = '0'
print("Set TRANSFORMERS_NO_TF and USE_TF")


Set TRANSFORMERS_NO_TF and USE_TF


In [2]:

# Install runtime deps (CPU-friendly)
!pip -q install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
!pip -q install sentence-transformers faiss-cpu transformers accelerate streamlit pdfminer.six





In [2]:

import os, json, textwrap, time, pathlib
import numpy as np
from typing import List, Dict

from sentence_transformers import SentenceTransformer
import faiss

from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
# from openai import OpenAI  # optional: only for the model-swap challenge (needs `pip install openai`)

print("Imports ready")




Imports ready



## 2) Chunk the document
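RAG works best when documents are split into small, overlapping chunks, so that text cut at a chunk boundary still appears with its surrounding context in the neighboring chunk.
We'll use a simple fixed‑size, word‑based chunker (you can swap in a smarter one later).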


In [None]:

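def chunk_text(text: str, chunk_size: int = 400, overlap: int = 60) -> List[Dict]:
    """Split text into chunks of `chunk_size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        # Otherwise `start` would never advance and the loop would not terminate
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    idx = 0
    while start < len(words):
        end = min(len(words), start + chunk_size)
        chunk = " ".join(words[start:end])
        chunks.append({"id": idx, "text": chunk})
        if end == len(words):
            break
        start = end - overlap  # step back so consecutive chunks overlap
        idx += 1
    return chunks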

with open("data/college_guide.txt","r",encoding="utf-8") as f:
    raw_text = f.read()

# TODO (exercise): chunk the text with chunk_size=120 and overlap=20
chunks = chunk_text(raw_text, chunk_size=120, overlap=20)
len(chunks), chunks[0]


(2,
 {'id': 0,
  'text': 'SRM College Guide (Sample) Admissions: - Application opens in January; last date is March 31. - Entrance exams are conducted in April; results by May 15. Library: - Open Mon–Sat, 8:00 AM – 8:00 PM; Sun, 10:00 AM – 4:00 PM. - Late fees: ₹2 per day for overdue books. - Digital library: Use your college email to access e‑journals. Attendance & Exams: - Minimum attendance: 75% per course. - Internal assessments: 2 tests + 1 assignment per semester. - End‑semester exams: 60% of total grade. - Make‑up exams allowed only with medical certificate. Labs & Wi‑Fi: - Labs open 9:00 AM – 6:00 PM on weekdays. - Wi‑Fi: Connect to "SRM‑Campus"; login with student ID. Clubs & Events: -'})
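
To see how the overlap shapes the chunk boundaries, the sketch below reproduces only the stepping logic of `chunk_text` and returns word offsets instead of text. The helper name `chunk_spans` and the word counts are hypothetical and chosen for illustration.

```python
def chunk_spans(n_words, chunk_size, overlap):
    """Return (start, end) word offsets using the same stepping as chunk_text."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    spans = []
    start = 0
    while start < n_words:
        end = min(n_words, start + chunk_size)
        spans.append((start, end))
        if end == n_words:
            break
        start = end - overlap  # the next chunk re-reads the last `overlap` words
    return spans

print(chunk_spans(10, chunk_size=4, overlap=1))      # [(0, 4), (3, 7), (6, 10)]
print(chunk_spans(200, chunk_size=120, overlap=20))  # [(0, 120), (100, 200)]
```

With `chunk_size=120` and `overlap=20`, any document of 121 to 220 words yields exactly two chunks, which is consistent with the two chunks reported above.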


## 3) Build Embeddings + FAISS Index
We'll use `all-MiniLM-L6-v2`, a small, fast sentence‑embedding model that produces 384‑dimensional vectors.


In [None]:

EMBED_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
# TODO (exercise): load the embedding model
embedder = SentenceTransformer(EMBED_MODEL_NAME)

texts = [c["text"] for c in chunks]
emb = embedder.encode(texts, convert_to_numpy=True, normalize_embeddings=True)

dim = emb.shape[1]
index = faiss.IndexFlatIP(dim)  # cosine sim if vectors normalized
index.add(emb)

# Persist embeddings and chunks so they can be reloaded without re-encoding
np.save("data/embeddings.npy", emb)
with open("data/chunks.json","w",encoding="utf-8") as f:
    json.dump(chunks, f, ensure_ascii=False, indent=2)

print(f"Indexed {len(chunks)} chunks with dim={dim}")


Indexed 2 chunks with dim=384
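
Saving the chunks and embeddings is what makes them reusable in a later session. The sketch below shows only the JSON round trip for the chunk list, using a temporary directory and made-up chunk texts so that it does not touch `data/`. In the notebook itself, the embeddings could presumably be reloaded with `np.load("data/embeddings.npy")` and added to a fresh `faiss.IndexFlatIP`; that part is not shown here.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the `chunks` list built above
chunks_demo = [
    {"id": 0, "text": "Library: open Mon-Sat."},
    {"id": 1, "text": "Late fees: ₹2 per day."},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "chunks.json"
    # ensure_ascii=False keeps characters such as ₹ readable in the file
    path.write_text(json.dumps(chunks_demo, ensure_ascii=False, indent=2), encoding="utf-8")
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print(reloaded == chunks_demo)  # True
```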


## 4) Retrieval function
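Retrieval finds the most relevant text chunks for a given question using **cosine similarity** between embeddings. Because both the chunk and query vectors are normalized to unit length, the inner product computed by `IndexFlatIP` equals their cosine similarity.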
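
As a worked example of that scoring, the pure-Python sketch below normalizes a few small vectors, takes inner products, and picks the best match. The three-dimensional vectors are made up for illustration; real embeddings here have 384 dimensions, and FAISS performs the same arithmetic far more efficiently.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity computed directly from the definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 2.0, 2.0]
chunk_vecs = [[2.0, 4.0, 4.0], [2.0, -1.0, 0.0]]  # same direction, orthogonal

# Inner product of unit vectors, which is what a normalized IndexFlatIP search computes
q_unit = l2_normalize(query)
scores = [dot(q_unit, l2_normalize(c)) for c in chunk_vecs]
best = max(range(len(scores)), key=scores.__getitem__)

print([round(s, 6) for s in scores], "best chunk:", best)
```

The first vector points in the same direction as the query (similarity close to 1), and the second is orthogonal to it (similarity close to 0), so chunk 0 wins.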

In [None]:
def retrieve(query: str, k: int = 4):
    # Embed the query the same way as the chunks (unit-length vectors)
    q = embedder.encode([query], convert_to_numpy=True, normalize_embeddings=True)
    scores, idxs = index.search(q, k)
    results = []
    for score, idx_ in zip(scores[0], idxs[0]):
        if idx_ < 0:
            # FAISS pads with -1 when k exceeds the number of indexed vectors
            continue
        payload = chunks[int(idx_)]
        results.append({"score": float(score), "id": payload["id"], "text": payload["text"]})
    return results

retrieve("What are library timings?")


[{'score': 0.20035353302955627,
  'id': 0,
  'text': 'SRM College Guide (Sample) Admissions: - Application opens in January; last date is March 31. - Entrance exams are conducted in April; results by May 15. Library: - Open Mon–Sat, 8:00 AM – 8:00 PM; Sun, 10:00 AM – 4:00 PM. - Late fees: ₹2 per day for overdue books. - Digital library: Use your college email to access e‑journals. Attendance & Exams: - Minimum attendance: 75% per course. - Internal assessments: 2 tests + 1 assignment per semester. - End‑semester exams: 60% of total grade. - Make‑up exams allowed only with medical certificate. Labs & Wi‑Fi: - Labs open 9:00 AM – 6:00 PM on weekdays. - Wi‑Fi: Connect to "SRM‑Campus"; login with student ID. Clubs & Events: -'},
 {'score': 0.15765821933746338,
  'id': 1,
  'text': '9:00 AM – 6:00 PM on weekdays. - Wi‑Fi: Connect to "SRM‑Campus"; login with student ID. Clubs & Events: - Tech clubs: AI/ML Club (Fri 5 PM), Robotics Club (Wed 4 PM). - Cultural: Music Club (Tue 5 PM), Drama Soc


## 5) Load the Local Text‑Generation Model

We’ll use **Flan‑T5 Small**, a lightweight open model that runs locally on CPU. The weights are downloaded on first use and cached, so later runs work offline.


In [None]:

from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

MODEL = "google/flan-t5-small"


# TODO (exercise): load the tokenizer and the model from the MODEL checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

gen = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)

print("Local model loaded successfully")


Device set to use cpu


Local model loaded successfully



## 6) Compose the RAG Prompt

The **prompt** combines:
- The user’s **question**
- The **retrieved context** from the FAISS index

Instructing the model to answer from the supplied context keeps its output tied to the facts in our dataset.


In [None]:
# TODO (exercise): write the instruction that tells the model how to use the context
PROMPT_TEMPLATE = """Your instruction goes here...

Question: {question}

Context:
{context}
"""
print("Prompt template ready")


Prompt template ready
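
For reference, here is one possible way to fill in such a template. The instruction wording, the name `EXAMPLE_TEMPLATE`, and the helper `build_prompt` are assumptions for illustration; they are not necessarily the prompt that produced the outputs shown in this notebook.

```python
EXAMPLE_TEMPLATE = """Answer the question using only the context below.
If the answer is not in the context, reply "I don't know."

Question: {question}

Context:
{context}
"""

def build_prompt(question, passages):
    """Join retrieved passages with blank lines and fill the template."""
    context = "\n\n".join(passages)
    return EXAMPLE_TEMPLATE.format(question=question, context=context)

prompt = build_prompt(
    "What is the minimum attendance requirement?",
    ["Minimum attendance: 75% per course."],
)
print(prompt)
```

Note that `str.format` treats `{` and `}` as placeholders, so any literal braces in the instruction text would need to be doubled (`{{` and `}}`).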



## 7) Combine Everything: The RAG Answer Function

This function retrieves context, builds a prompt, and generates an answer using Flan‑T5.


In [8]:

def rag_answer(query, k=1):
    # Retrieve top‑k chunks (list of dicts: {'score','id','text'})
    retrieved = retrieve(query, k)
    ctx = "\n\n".join([item["text"] for item in retrieved])

    # Build prompt
    prompt = PROMPT_TEMPLATE.format(question=query, context=ctx)

    # Generate response
    raw = gen(prompt)[0]["generated_text"].strip()
    # Keep only the first line; fall back to "" if the model returns nothing
    one_line = raw.splitlines()[0] if raw else ""

    return one_line, retrieved

# Quick test
rag_answer("What is the minimum attendance requirement?")[0]


'75% per course.'
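
When `k` grows, the joined context can become longer than the model comfortably handles. The Flan‑T5 tokenizer is typically configured for inputs of roughly 512 tokens, so a long prompt may be truncated. The sketch below caps the context by word count, which is only a rough proxy for tokens. The helper name `build_context` and the budget value are hypothetical.

```python
def build_context(retrieved, max_words=300):
    """Concatenate chunk texts in rank order until the word budget is spent."""
    parts, used = [], 0
    for item in retrieved:
        remaining = max_words - used
        if remaining <= 0:
            break
        words = item["text"].split()
        parts.append(" ".join(words[:remaining]))  # trim the last chunk if needed
        used += min(len(words), remaining)
    return "\n\n".join(parts)

demo = [
    {"score": 0.9, "id": 0, "text": "one two three four"},
    {"score": 0.5, "id": 1, "text": "five six seven"},
]
print(repr(build_context(demo, max_words=6)))  # 'one two three four\n\nfive six'
```

Because the highest-scoring chunks come first, trimming removes the least relevant text.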


## 8) Interactive Chat Session

Now let’s turn it into a simple chat loop inside the notebook!
Type questions and see how your model responds using the college dataset.


In [10]:

print("Mini RAG Chat (type 'exit' to quit)\n")

while True:
    q = input("Ask a question: ").strip()
    if not q:
        continue  # ignore empty input
    if q.lower() in ["exit", "quit"]:
        print("👋 Exiting chat.")
        break

    answer, ctx = rag_answer(q)

    print("\n🤖 Answer:", answer)
    print("\n📚 Contexts used:")
    for item in ctx:  # list of dicts {'score','id','text'}
        print(f"  • (score {item['score']:.3f}) {item['text'][:90]}...")
    print("-" * 80)


Mini RAG Chat (type 'exit' to quit)


🤖 Answer: 75% per course.

📚 Contexts used:
  • (score 0.352) SRM College Guide (Sample) Admissions: - Application opens in January; last date is March ...
--------------------------------------------------------------------------------

🤖 Answer: January; last date is March 31. - Entrance exams are conducted in April; results by May 15. Library: - Open Mon–Sat, 8:00 AM – 8:00 PM; Sun, 10:00 AM – 4:00 PM. - Late fees: 2 per day for overdue books. - Digital library: Use your college email to access ejournals. Attendance & Exams: - Minimum attendance: 75% per course. - Internal assessments: 2 tests + 1 assignment per semester. - Endsemester exams: 60% of total grade. - Makeup exams allowed only with medical certificate. Labs & WiFi: - Labs open 9:00 AM – 6:00 PM on weekdays. - WiFi: Connect to "SRMCampus"; login with student ID. Clubs & Events:

📚 Contexts used:
  • (score 0.339) SRM College Guide (Sample) Admissions: - Application opens in January; 

## Mini Challenges

1) **Add content:** Open `data/college_guide.txt` and add 3–5 new bullet points (e.g., canteen timings, sports complex rules). Re‑run *steps 2–3* to rebuild chunks and index.  
2) **Tune retrieval:** Change `k` in `rag_answer` (try 2, 4, 6). When do answers improve? Keep in mind that a `k` larger than the number of indexed chunks adds no new context.  
3) **Guardrails:** In `PROMPT_TEMPLATE`, add a rule: *“If you are not sure, ask a follow‑up question.”* Does behavior change?  
4) **Model swap:** Replace `flan‑t5-small` with an OpenAI model or another more powerful model accessed through an API.  
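
Another guardrail worth experimenting with is to skip generation when retrieval looks weak. The sketch below is a hypothetical wrapper: `answer_or_abstain`, the fallback message, and the `min_score` threshold are all assumptions, and the threshold would need tuning against the scores your own index returns.

```python
def answer_or_abstain(retrieved, generate, min_score=0.25):
    """Call `generate` only when the best retrieval score clears the threshold."""
    if not retrieved or max(item["score"] for item in retrieved) < min_score:
        return "I could not find that in the guide."
    return generate(retrieved)

def fake_generate(items):
    """Stand-in for the Flan-T5 call: echo the top chunk."""
    return items[0]["text"]

strong = [{"score": 0.35, "id": 0, "text": "Minimum attendance: 75% per course."}]
weak = [{"score": 0.05, "id": 1, "text": "Clubs & Events: ..."}]

print(answer_or_abstain(strong, fake_generate))  # Minimum attendance: 75% per course.
print(answer_or_abstain(weak, fake_generate))    # I could not find that in the guide.
```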


## Recap

You have now built a fully functional **Retrieval‑Augmented Generation (RAG)** chatbot using your own dataset.

**Pipeline recap:**
```
Dataset (college_guide.txt)
     ↓
Chunks → Embeddings → FAISS Index
     ↓
Retriever → Prompt Composer → Generator
     ↓
Interactive Q&A Assistant
```

**Key takeaways:**
- RAG helps keep LLMs factual by grounding answers in real data.
- Retrieval quality determines context quality, which in turn determines answer quality.
- The same retrieve‑then‑generate pattern underlies many production chatbots and assistants.
