<a href="https://colab.research.google.com/github/uppilisrinivasan/RAG-based-AI-assistant/blob/main/rag_colab_setup.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ⚡ RAG Embedding + FAISS Index (Colab GPU)
This notebook loads customer support data, encodes queries using `sentence-transformers`, caches embeddings, builds a FAISS index, and supports basic RAG querying with the LLaMA 2 7B Chat model.
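
The embedding cache in this notebook is stored under a fixed file name (`embeddings_full.npy`), so it would be reused even if the dataset or the embedding model changed. A minimal sketch of one way to avoid that, assuming a hypothetical helper `embedding_cache_path` that is not part of the notebook, derives the file name from the model name and a hash of the texts:

```python
import hashlib
import os


def embedding_cache_path(texts, model_name, cache_dir="cache"):
    """Hypothetical helper: build a cache file name that changes with the inputs."""
    digest = hashlib.sha256()
    digest.update(model_name.encode("utf-8"))
    for text in texts:
        # Length-prefix each text so ["ab", "c"] and ["a", "bc"] hash differently
        encoded = str(text).encode("utf-8")
        digest.update(len(encoded).to_bytes(8, "little"))
        digest.update(encoded)
    return os.path.join(cache_dir, f"embeddings_{digest.hexdigest()[:16]}.npy")


full = embedding_cache_path(["where is my order?", "reset my password"], "all-MiniLM-L6-v2")
subset = embedding_cache_path(["where is my order?"], "all-MiniLM-L6-v2")
print(full != subset)  # a changed dataset maps to a different cache file
```

With a name like this, switching to a subset of the data or to another model would trigger a fresh encoding pass rather than silently loading vectors that no longer line up with the rows.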

In [1]:
# 📦 Install dependencies
!pip install -q faiss-cpu sentence-transformers datasets transformers

In [2]:
import torch
print("CUDA available:", torch.cuda.is_available())
print("GPU device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU")

CUDA available: True
GPU device: Tesla T4


In [3]:
import numpy as np
import hashlib
from tqdm import tqdm
import os
import faiss
import pandas as pd
from sentence_transformers import SentenceTransformer
from datasets import load_dataset
import torch

# Cache paths
CACHE_DIR = "cache"
DATA_CACHE_FILE = os.path.join(CACHE_DIR, "customer_support_full.csv")
os.makedirs(CACHE_DIR, exist_ok=True)

# Load or cache dataset
if os.path.exists(DATA_CACHE_FILE):
    print(f"🔁 Loading dataset from cache: {DATA_CACHE_FILE}")
    df = pd.read_csv(DATA_CACHE_FILE)
else:
    print("🌐 Downloading dataset...")
    dataset = load_dataset("MohammadOthman/mo-customer-support-tweets-945k")
    df = pd.DataFrame({
        "customer_query": dataset["train"]["input"],
        "support_reply": dataset["train"]["output"]
    })
    #df = df.head(10000)  # Use subset
    df.to_csv(DATA_CACHE_FILE, index=False)
    print(f"✅ Cached dataset to {DATA_CACHE_FILE}")

# Embedding and FAISS setup
class VectorStore:
    def __init__(self, data, batch_size=128, cache_dir="cache/"):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

        self.cache_file = os.path.join(cache_dir, "embeddings_full.npy")

        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"🚀 Using device: {self.device}")
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device=self.device)

        if os.path.exists(self.cache_file):
            print(f"🔁 Loading cached embeddings from {self.cache_file}...")
            self.embeddings = np.load(self.cache_file)
            print(f"✅ Loaded embeddings from cache.")
        else:
            print("⚙️ Generating embeddings...")

            self.embeddings = []
            for i in tqdm(range(0, len(data), batch_size)):
                batch = data['customer_query'].iloc[i:i+batch_size].tolist()
                encoded = self.model.encode(batch, show_progress_bar=False)
                self.embeddings.extend(encoded)

            self.embeddings = np.array(self.embeddings)
            np.save(self.cache_file, self.embeddings)
            print(f"✅ Saved embeddings to {self.cache_file}")

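        self.data = data
        # Guard against a stale cache: embeddings must align row-for-row with the data
        if len(self.embeddings) != len(data):
            raise ValueError(
                f"Cached embeddings ({len(self.embeddings)}) do not match data rows ({len(data)}); "
                f"delete {self.cache_file} and rerun."
            )
        # Take the dimension from the embeddings (384 for all-MiniLM-L6-v2)
        self.index = faiss.IndexFlatL2(self.embeddings.shape[1])
        self.index.add(self.embeddings)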

    def search(self, query, top_k=3):
        query_vec = self.model.encode([query])
        distances, indices = self.index.search(np.array(query_vec), top_k)
        valid_indices = [i for i in indices[0] if 0 <= i < len(self.data)]
        return [self.data.iloc[i]['support_reply'] for i in valid_indices]

# 🧪 Instantiate VectorStore and test
vector_store = VectorStore(df)
print(vector_store.search("I ordered a laptop, but it arrived with a broken screen. What should I do?"))



🔁 Loading dataset from cache: cache/customer_support_full.csv
🚀 Using device: cuda


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


🔁 Loading cached embeddings from cache/embeddings_full.npy...
✅ Loaded embeddings from cache.
['Hello. This is definitely not the eerience we want you to have and we will do everything we can to help. To get started when you mention broken, does that mean there is physical damage to the screen?', 'That is not what we want! I am sad to hear this happened to you! Have no fear though, you have come to the right place! Can you please message me via this link so we can talk? JonathanMacInnes', 'Ah man. we would be happy to go over replacement options with you, Nina. Send a message our way DanKing']
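
`IndexFlatL2` performs an exhaustive search: it compares the query vector against every stored vector and returns the `top_k` smallest squared Euclidean distances together with their row positions. The following pure-Python sketch reproduces that behavior on toy 2-D vectors. `flat_l2_search` is an illustrative name, not a FAISS function, and real use would stay with FAISS for speed.

```python
import heapq


def flat_l2_search(vectors, query, top_k=3):
    """Illustrative stand-in for an exhaustive L2 search (not a FAISS API)."""
    scored = []
    for row, vec in enumerate(vectors):
        # Squared Euclidean distance between the query and this stored vector
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, row))
    nearest = heapq.nsmallest(top_k, scored)
    distances = [d for d, _ in nearest]
    indices = [i for _, i in nearest]
    return distances, indices


stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = flat_l2_search(stored, [1.0, 0.0], top_k=2)
print(indices, distances)
```

The returned row positions play the same role as `indices[0]` in `VectorStore.search`: they are used to look up the matching `support_reply` rows.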


In [4]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: fineG

In [5]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
import time
import pandas as pd
import os
from datetime import datetime

# Use LLaMA 2 7B Chat model (make sure you have access)
model_id = "meta-llama/Llama-2-7b-chat-hf"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True)  # `use_auth_token` is deprecated in recent transformers releases
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

print(model.hf_device_map)

# Pipeline setup
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# CSV log setup
LOG_PATH = "chat_log.csv"
if not os.path.exists(LOG_PATH):
    pd.DataFrame(columns=["timestamp", "query", "context", "response"]).to_csv(LOG_PATH, index=False)

def log_interaction(query, context, response):
    timestamp = datetime.now().isoformat()
    new_entry = {
        "timestamp": timestamp,
        "query": query,
        "context": context,
        "response": response
    }
    # Append a single row instead of re-reading and rewriting the whole log
    pd.DataFrame([new_entry]).to_csv(LOG_PATH, mode="a", header=False, index=False)
    print(f"📝 Interaction logged at {timestamp}")

def generate_response(query):
    context = vector_store.search(query)
    context_str = "\n".join(context)

    prompt = f"""<s>[INST] <<SYS>>You are a helpful support assistant.<</SYS>>

    Past replies:
    {context_str}

    New question: {query}
    Answer: [/INST]"""

    # Tokenize and generate
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    start = time.time()
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )
    end = time.time()

    print(f"⏱️ Time taken: {end - start:.2f} seconds")
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    answer = response.split("Answer:")[-1].strip()
    print("💬 Response:", answer)

    # Log the interaction
    log_interaction(query, context_str, answer)

    return answer




Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda:0


{'': 0}
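
The prompt in `generate_response` is written inline, so the four-space indentation of the source code ends up inside the prompt text, and the literal `<s>` may be duplicated because the LLaMA tokenizer typically prepends its own beginning-of-sequence token. Below is a sketch of a standalone builder, assuming the commonly documented LLaMA 2 chat layout (`[INST] <<SYS>> ... <</SYS>> ... [/INST]`). `build_prompt` is a hypothetical helper, and the system message is copied from the notebook.

```python
def build_prompt(query, replies, system="You are a helpful support assistant."):
    """Hypothetical helper: assemble a LLaMA 2 chat-style prompt without stray indentation."""
    context = "\n".join(f"- {reply}" for reply in replies)
    # No leading "<s>": the tokenizer is assumed to add the BOS token itself
    return (
        f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"Past replies:\n{context}\n\n"
        f"New question: {query} [/INST]"
    )


prompt = build_prompt(
    "My screen arrived cracked.",
    ["Sorry to hear that. Is there physical damage?", "We can go over replacement options."],
)
print(prompt)
```

This layout drops the trailing `Answer:` marker, so any post-processing that splits on that marker would need to change accordingly.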


In [6]:
# 🚀 Try it out!
generate_response("I ordered a laptop, but it arrived with a broken screen. What should I do?")

⏱️ Time taken: 13.22 seconds
💬 Response: [/INST]  Oh no, I'm so sorry to hear that your laptop arrived with a broken screen! 😔 That can be very frustrating.

Firstly, please don't worry, you've come to the right place! We'll do our best to help you resolve this issue as quickly as possible. 😊

Can you please provide me with some more details about the situation? For example, when did you place the order, and did you notice any damage during the shipping process? Any information you can provide will help us to better understand the situation and find the best solution for you. 🤔

Additionally, have you tried contacting the seller or manufacturer directly to see if they can provide any assistance? They may be able to offer a repair or replacement option for you. 📲

Please let me know if there's anything else you need help with,
📝 Interaction logged at 2025-04-16T08:40:18.837226


"[/INST]  Oh no, I'm so sorry to hear that your laptop arrived with a broken screen! 😔 That can be very frustrating.\n\nFirstly, please don't worry, you've come to the right place! We'll do our best to help you resolve this issue as quickly as possible. 😊\n\nCan you please provide me with some more details about the situation? For example, when did you place the order, and did you notice any damage during the shipping process? Any information you can provide will help us to better understand the situation and find the best solution for you. 🤔\n\nAdditionally, have you tried contacting the seller or manufacturer directly to see if they can provide any assistance? They may be able to offer a repair or replacement option for you. 📲\n\nPlease let me know if there's anything else you need help with,"

In [7]:
!pip install fastapi uvicorn nest-asyncio pyngrok



In [8]:
from fastapi import FastAPI
from pydantic import BaseModel
import nest_asyncio
from pyngrok import ngrok, conf
import uvicorn
from threading import Thread

# Allow nested async loops (for running Uvicorn in a Jupyter notebook)
nest_asyncio.apply()

# Define FastAPI app
app = FastAPI()

# Define request schema for the incoming query
class QueryRequest(BaseModel):
    query: str

# generate_response() is defined in the earlier cell (FAISS retrieval + LLaMA 2 generation)

# Define FastAPI route to handle the query
@app.post("/rag-query")
async def rag_query(request: QueryRequest):
    query = request.query
    # Use the generate_response function to get a response based on the query
    response = generate_response(query)
    return {"results": [response]}

# Start the server in a thread
def run():
    uvicorn.run(app, host="0.0.0.0", port=8000)

# Start the FastAPI server in a separate thread to avoid blocking
Thread(target=run).start()


In [9]:
# Paste your ngrok authtoken in place of the placeholder text
os.environ["NGROK_AUTHTOKEN"] = "Put your token here"

In [10]:
import os

# Get authtoken from environment variable
authtoken = os.getenv("NGROK_AUTHTOKEN")

if authtoken:
    conf.get_default().auth_token = authtoken
else:
    raise ValueError("🚫 NGROK_AUTHTOKEN environment variable is not set!")

In [16]:
conf.get_default().auth_token = authtoken

# Start ngrok tunnel and print the public URL
public_url = ngrok.connect(8000).public_url
print(f"🌐 Public URL: {public_url}")

🌐 Public URL: https://6fe6-34-125-27-125.ngrok-free.app


In [17]:
import requests

# Using ngrok public url to process rag query
url = f"{public_url}/rag-query"

# Payload with the query you want to send
payload = {"query": "I ordered a laptop, but it arrived with a broken screen. What should I do?"}

# Sending the POST request to the FastAPI endpoint
response = requests.post(url, json=payload)

# Print the response from the FastAPI server (the result of the query)
print(response.json())

⏱️ Time taken: 9.44 seconds
💬 Response: [/INST]  Oh no, I'm so sorry to hear that your laptop arrived with a broken screen! 😔 That can be really frustrating and disappointing.

Firstly, please don't worry, you've come to the right place! We'll do everything we can to help you resolve this issue as quickly and efficiently as possible. 🙂

Can you please provide me with some more details? Did you receive any damage notice or confirmation from the seller before receiving the laptop? And have you tried contacting them yet to report the issue? 🤔

We'll need to gather some information to help you with your next steps. 😊
📝 Interaction logged at 2025-04-16T08:45:37.537862
INFO:     34.125.27.125:0 - "POST /rag-query HTTP/1.1" 200 OK
{'results': ["[/INST]  Oh no, I'm so sorry to hear that your laptop arrived with a broken screen! 😔 That can be really frustrating and disappointing.\n\nFirstly, please don't worry, you've come to the right place! We'll do everything we can to help you resolve this 
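
The responses recorded above begin with `[/INST]` because the decoded text is split on `Answer:`, which leaves the closing instruction tag in front of the reply. A minimal sketch of a string-level cleanup follows, using a hypothetical `extract_answer` helper. An alternative, not shown here, is to decode only the tokens generated after the prompt.

```python
def extract_answer(decoded_text, marker="[/INST]"):
    """Hypothetical helper: keep only the text after the last closing instruction tag."""
    # rpartition returns ("", "", text) when the marker is absent,
    # so the full text is kept in that case
    _, _, answer = decoded_text.rpartition(marker)
    return answer.strip()


decoded = "[INST] New question: help Answer: [/INST]  Oh no, I'm sorry to hear that!"
print(extract_answer(decoded))
```

This sketch assumes the model does not emit `[/INST]` inside its own reply; if it did, everything before the last occurrence would be cut.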

In [18]:

# Define your queries (you can modify these or dynamically pass them based on user input)
queries = [
    "I need help resetting my password.",
    "I didn’t receive the reset link."
]

# Send each query in turn; every request is independent, so no conversation history is carried over
def get_rag_response(query_list):
    response = None
    for query in query_list:
        payload = {"query": query}
        response = requests.post(url, json=payload)
        if response.status_code == 200:
            print(f"Response to '{query}': {response.json()}")
        else:
            print(f"Error in response: {response.status_code}")
    return response

# Call the function to process the queries
get_rag_response(queries)

⏱️ Time taken: 2.92 seconds
💬 Response: [/INST]  Of course, I'd be happy to help you reset your password! 😊 Can you please provide me with the email address associated with your account, so I can guide you through the password reset process?
📝 Interaction logged at 2025-04-16T08:45:44.604249
INFO:     34.125.27.125:0 - "POST /rag-query HTTP/1.1" 200 OK
Response to 'I need help resetting my password.': {'results': ["[/INST]  Of course, I'd be happy to help you reset your password! 😊 Can you please provide me with the email address associated with your account, so I can guide you through the password reset process?"]}
⏱️ Time taken: 3.91 seconds
💬 Response: [/INST]  Of course, I'd be happy to help! Can you please provide me with more details about the issue you're experiencing? For example, did you receive an error message or is the link not working? Additionally, can you please confirm the email address you used to request the reset link?
📝 Interaction logged at 2025-04-16T08:45:48.9164

<Response [200]>

In [19]:
# Payload with the query you want to send
payload = {"query": "My cat chewed my phone charger. Is this covered under warranty?"}

# Sending the POST request to the FastAPI endpoint
response = requests.post(url, json=payload)

# Print the response from the FastAPI server (the result of the query)
print(response.json())

⏱️ Time taken: 12.78 seconds
💬 Response: [/INST]  Of course, I'd be happy to help you with that! 😊

Unfortunately, Apple's warranty does not cover damage caused by external factors such as chewing. So, I'm afraid your charger is not covered under warranty if it has been damaged by your cat. 😞

However, you may be able to purchase a replacement charger from Apple or an authorized reseller. They offer a variety of chargers that are compatible with your iPhone, and they may have a specific charger that is designed to be more durable and resistant to chewing. 🤔

Additionally, you may want to consider purchasing a protective case for your charger to help prevent any further damage. There are many different cases available that are specifically designed to protect your charger from scratches, drops, and other types of damage. 💡
📝 Interaction logged at 2025-04-16T08:46:05.051008
INFO:     34.125.27.125:0 - "POST /rag-query HTTP/1.1" 200 OK
{'results': ["[/INST]  Of course, I'd be happy to hel

In [20]:
# Payload with the query you want to send
payload = {"query": "Why did you suggest contacting support?"}

# Sending the POST request to the FastAPI endpoint
response = requests.post(url, json=payload)

# Print the response from the FastAPI server (the result of the query)
print(response.json())

⏱️ Time taken: 9.63 seconds
💬 Response: [/INST]  Ah, I see! You are asking about the reason why I suggested contacting support. Well, Desi, our team is here to help you with any questions or concerns you may have. Whether it's about your account, a problem you're facing, or just a general question, we are here to assist you.

By contacting support, you will be able to reach out to our team directly and get the help you need. Our team is trained to provide you with the best possible assistance, and we will do our best to resolve any issue you may have as quickly and efficiently as possible.

So, if you have any questions or concerns, please don't hesitate to reach out to us. We are here to help!
📝 Interaction logged at 2025-04-16T08:46:16.453744
INFO:     34.125.27.125:0 - "POST /rag-query HTTP/1.1" 200 OK
{'results': ["[/INST]  Ah, I see! You are asking about the reason why I suggested contacting support. Well, Desi, our team is here to help you with any questions or concerns you may h
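
The final query refers back to an earlier turn, but each POST to `/rag-query` carries only the current `query` string, so the model has no record of what was said before (the reply even addresses a name that never appeared in the query). One possible workaround, sketched here with a hypothetical client-side `ConversationBuffer` that is not part of the notebook, is to prepend the most recent turns to the text that is sent:

```python
from collections import deque


class ConversationBuffer:
    """Hypothetical client-side helper that keeps the last few turns."""

    def __init__(self, max_turns=3):
        # Each entry is a (user_query, assistant_reply) pair; old turns drop off
        self.turns = deque(maxlen=max_turns)

    def add(self, query, reply):
        self.turns.append((query, reply))

    def contextualize(self, new_query):
        """Return the new query prefixed with the stored turns."""
        if not self.turns:
            return new_query
        history = "\n".join(f"User: {q}\nAssistant: {r}" for q, r in self.turns)
        return f"Previous conversation:\n{history}\n\nCurrent question: {new_query}"


buffer = ConversationBuffer(max_turns=2)
buffer.add("I need help resetting my password.", "Please contact support for a reset link.")
payload = {"query": buffer.contextualize("Why did you suggest contacting support?")}
print(payload["query"])
```

Because the endpoint also uses the incoming text for the FAISS lookup, the added history would influence which past replies are retrieved. Whether that helps or hurts retrieval quality is something to check empirically.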