## Task: Build a Campus FAQ Chatbot using RAG

### Objective:
Learn how Retrieval-Augmented Generation (RAG) works by building a small chatbot that answers questions about your college using vector embeddings and a mini vector database.

#### Step 0: Setup

1. Install required packages:

In [1]:
%pip install streamlit sentence-transformers faiss-cpu numpy

Note: you may need to restart the kernel to use updated packages.


#### Step 1: Prepare the Data

Task: Create a small FAQ dataset with at least 5 Q&A pairs.
Example:

Q: When does the library open?
A: The library opens at 8 AM and closes at 8 PM.

In [2]:
faq_text = """
Q: When does the library open?
A: The library opens at 8 AM and closes at 8 PM.

Q: How can I access my class schedule?
A: You can view your class schedule through the student portal using your login credentials.

Q: Where do I go if I lose my student ID?
A: You should report to the Student Services Office to request a replacement student ID card.

Q: What is the process for borrowing textbooks?
A: Textbooks can be borrowed from the library by scanning your student ID at the circulation desk. The standard loan period is two weeks.

Q: How can I join a school club or organization?
A: You can sign up during the student fair at the start of the semester or visit the Student Activities Office for more information.

Q: What should I do if I miss an exam?
A: You should immediately contact your course instructor and provide valid documentation to request a makeup exam.
"""

Checkpoint:

Students should have a list of questions and answers ready.

#### Step 2: Split Text into Chunks

Task: Split your FAQ into non-empty lines so that each question and each answer becomes its own chunk.

In [3]:
lines = [line.strip() for line in faq_text.split("\n") if line.strip()]


Checkpoint:

Ensure each question and each answer is a separate element in a Python list, with every question immediately followed by its answer.
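
The checkpoint can also be verified programmatically. The following is a minimal sketch, assuming the `Q:`/`A:` prefixes used in the sample data. `pair_up` and the shortened `sample_lines` list are hypothetical names introduced here for illustration and are not part of the original exercise.

```python
def pair_up(lines):
    """Group an alternating [Q, A, Q, A, ...] list into (question, answer) tuples."""
    if len(lines) % 2 != 0:
        raise ValueError("Expected an even number of lines (one answer per question).")
    pairs = []
    for i in range(0, len(lines), 2):
        question, answer = lines[i], lines[i + 1]
        # Each pair must carry the expected prefixes, otherwise the data is misaligned
        if not question.startswith("Q:") or not answer.startswith("A:"):
            raise ValueError(f"Lines {i} and {i + 1} are not a Q/A pair.")
        pairs.append((question, answer))
    return pairs


# Shortened stand-in for the `lines` list built in the cell above
sample_lines = [
    "Q: When does the library open?",
    "A: The library opens at 8 AM and closes at 8 PM.",
    "Q: Where do I go if I lose my student ID?",
    "A: You should report to the Student Services Office to request a replacement student ID card.",
]

pairs = pair_up(sample_lines)
print(len(pairs), "Q&A pairs found")
```

If the function raises a `ValueError`, a question or answer is probably missing or spans more than one line in the source text.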

#### Step 3: Create Embeddings

Task: Convert each line to a vector using SentenceTransformer.

In [4]:
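from sentence_transformers import SentenceTransformer
# Load a small pretrained sentence-embedding model (downloaded on first use)
model = SentenceTransformer('all-MiniLM-L6-v2')
# Encode every line into a fixed-length vector; the result is a 2-D NumPy array
embeddings = model.encode(lines)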

#### Step 4: Build the FAISS Index

Task: Store all embeddings in a FAISS vector database.

In [5]:
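import faiss
import numpy as np

dimension = embeddings.shape[1]  # length of each embedding vector
index = faiss.IndexFlatL2(dimension)  # exact (brute-force) search using L2 distance
index.add(np.array(embeddings))  # store all FAQ line embeddings in the index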
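
To see what this index does, note that `IndexFlatL2` performs an exact, brute-force search: it compares the query against every stored vector and ranks them by squared Euclidean (L2) distance. The sketch below mimics that behaviour in plain Python on hypothetical 3-dimensional toy vectors. `flat_l2_search` and `toy_vectors` are illustrative names only, and real embeddings from `all-MiniLM-L6-v2` have 384 dimensions.

```python
def flat_l2_search(vectors, query, k=1):
    """Return (distances, indices) of the k stored vectors closest to `query`.

    Distances are squared L2 distances, which is also what IndexFlatL2 reports.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Hypothetical toy "embeddings" (not produced by any real model)
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

distances, indices = flat_l2_search(toy_vectors, [0.0, 0.5, 1.0], k=2)
print(distances, indices)
```

Unlike this sketch, `index.search` takes a batch of queries and returns 2-D arrays, which is why the notebook reads the best match as `I[0][0]`.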

#### Step 5: Query the Database
Task: Take a user question, convert it to a vector, and find the most relevant FAQ line.

In [8]:
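user_question = "What is the address of the school"
q_emb = model.encode([user_question])
# D holds the squared L2 distances, I holds the indices of the nearest lines
D, I = index.search(np.array(q_emb), k=1)
hit = int(I[0][0])
# Lines alternate Q, A, Q, A, ...: an even index is a question, so its answer is the next line;
# an odd index means the match is already an answer line
a_index = hit + 1 if hit % 2 == 0 else hit
answer = lines[a_index] if a_index < len(lines) else "Answer not found."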

print("Answer:", answer)

Answer: A: You should report to the Student Services Office to request a replacement student ID card.


#### Step 6: Make it Interactive with Streamlit
Task: Use Streamlit to create a simple chatbot UI.

In [7]:
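import streamlit as st

# `model`, `index`, and `lines` come from Steps 2-4
st.title("Campus FAQ Chatbot")
user_question = st.text_input("Ask your question:")
if user_question:
    q_emb = model.encode([user_question])
    D, I = index.search(np.array(q_emb), k=1)
    hit = int(I[0][0])
    # Even index = question line, so show the answer line that follows it
    a_index = hit + 1 if hit % 2 == 0 else hit
    st.write("Answer:", lines[a_index])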

Note: Executing this cell inside the notebook does not start the app and only produces the warnings shown below. To view the chatbot, save the code from Steps 1 to 6 in a Python script and launch it from a terminal with `streamlit run` followed by the script's path.


2025-10-01 13:29:39.765 
  command:

    streamlit run /opt/anaconda3/envs/nlp-env/lib/python3.12/site-packages/ipykernel_launcher.py [ARGUMENTS]
2025-10-01 13:29:39.767 Session state does not function when running a script without `streamlit run`


#### Step 7: Reflection

Questions for students:

How does the chatbot “understand” the question?
- The chatbot encodes both the FAQ lines and the user’s question as vector embeddings with SentenceTransformer, then uses the FAISS index to retrieve the FAQ line whose embedding is closest to the question's.

What happens if the user asks something not in the FAQ?
- If the question is not in the FAQ, the chatbot still returns the nearest match, which is likely to be wrong or irrelevant, as the Step 5 query about the school's address shows.

How could you improve this system to handle more questions or longer documents?
- The system could be improved by adding more FAQs and by setting a similarity threshold, so that when the best match is too weak it replies "Sorry, I don't have the answer" instead of returning a wrong answer.
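
The similarity-threshold idea can be sketched as follows. This is a minimal illustration, assuming the squared L2 distance returned by `index.search` is used as the score (smaller means closer). `answer_or_fallback` and the cutoff `max_distance=1.0` are hypothetical, and the cutoff would need tuning on real questions.

```python
FALLBACK = "Sorry, I don't have the answer."


def answer_or_fallback(distance, answer, max_distance=1.0):
    """Return `answer` only if the best match is close enough to the question.

    `distance` is the squared L2 distance of the nearest FAQ line (smaller = closer).
    `max_distance` is an assumed cutoff, not a recommended value.
    """
    if distance > max_distance:
        return FALLBACK
    return answer


# Made-up distances for illustration; real values would come from D[0][0]
close_match = answer_or_fallback(
    0.35, "A: The library opens at 8 AM and closes at 8 PM."
)
far_match = answer_or_fallback(
    1.60,
    "A: You should report to the Student Services Office to request a replacement student ID card.",
)
print(close_match)
print(far_match)
```

In the notebook, the function would be called with `D[0][0]` and the answer line looked up in Step 5.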