# Smart Chatbot: A Hybrid Chatbot with Knowledge Retrieval

## Cell 1: Install Required Libraries
Installs Hugging Face Transformers and Accelerate for running the LLM, Gradio for the web UI, Sentence-Transformers for embeddings, and FAISS for the vector search used in retrieval-augmented generation (RAG).

In [1]:
!pip install -q transformers accelerate gradio sentence-transformers faiss-cpu



## Cell 2: Import Modules
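Import the libraries used in the rest of the notebook: Transformers for the language model, Sentence-Transformers for embeddings, FAISS for vector search, Gradio for the UI, plus PyTorch and NumPy.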

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from sentence_transformers import SentenceTransformer
import faiss
import gradio as gr
import torch
import numpy as np

## Cell 3: Load Language Model and Tokenizer
Load the TinyLlama 1.1B chat model and its tokenizer, then wrap them in a text-generation pipeline. The pipeline turns each user prompt into a reply.

In [3]:
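model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Half precision saves GPU memory; fall back to float32 when no GPU is available.
dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=dtype)
chat_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)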




Device set to use cuda:0


## Cell 4: Set Up RAG - Load Embedding Model & Vector Store

Load the embedding model and create a FAISS index to store and search document embeddings. A separate list keeps the original texts, because the index stores only the vectors.

In [4]:
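embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
# all-MiniLM-L6-v2 produces 384-dimensional vectors; read the size from the model instead of hard-coding it.
embedding_dim = embedding_model.get_sentence_embedding_dimension()
index = faiss.IndexFlatL2(embedding_dim)  # exact (brute-force) L2 search
doc_texts = []  # original texts, in the same order as the vectors in the index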


## Cell 5: Function to Upload and Process Documents
This function takes a text document, embeds it as a single vector using the SentenceTransformer, and stores that vector in FAISS. The original text is kept in `doc_texts` at the same position.

In [5]:
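def add_document(text):
    """Embed a pasted document and add it to the FAISS index."""
    text = text.strip()
    if not text:
        return ""  # ignore empty submissions
    # encode() on a one-item list returns a 2-D array with one row, the shape FAISS expects
    embedding = embedding_model.encode([text])
    index.add(embedding)
    doc_texts.append(text)  # append after indexing so list positions match index ids
    return ""  # an empty string clears the input box in the UI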
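
Because `add_document` stores one vector per pasted text, a long document is represented by a single embedding, and the all-MiniLM-L6-v2 model card notes that input beyond 256 word pieces is truncated by default. One common workaround is to split the text into overlapping chunks and index each chunk separately. The sketch below illustrates that idea; `chunk_text`, `max_words`, and `overlap` are hypothetical names that do not appear in the notebook, and the default sizes are arbitrary assumptions rather than tuned values.

```python
def chunk_text(text, max_words=150, overlap=30):
    """Split text into overlapping word windows (hypothetical helper)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Each chunk could then be indexed on its own, for example:
# for chunk in chunk_text(long_document):
#     add_document(chunk)
```

With chunking, a retrieved hit is a passage rather than the whole document, which also keeps the prompt passed to the language model short.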

`clear_documents` resets both the document list and the FAISS index.

In [6]:
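def clear_documents():
    """Remove all stored documents and their vectors."""
    doc_texts.clear()
    index.reset()  # drops every stored vector but keeps the index dimension
    return ""  # an empty string clears the input box in the UI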
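
To make the role of the index concrete, here is a small NumPy sketch of exact nearest-neighbor search by squared L2 distance, which is conceptually what a flat L2 index computes. `TinyFlatL2` is a hypothetical stand-in written for illustration only. It borrows the method names `add`, `search`, and `reset` so it reads like the cells above, but it is not FAISS code and makes no claim about FAISS internals.

```python
import numpy as np


class TinyFlatL2:
    """Brute-force squared-L2 search over stored vectors (illustration only)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)

    @property
    def ntotal(self):
        return len(self.vectors)

    def add(self, x):
        x = np.asarray(x, dtype=np.float32).reshape(-1, self.dim)
        self.vectors = np.vstack([self.vectors, x])

    def reset(self):
        self.vectors = np.empty((0, self.dim), dtype=np.float32)

    def search(self, queries, k=1):
        q = np.asarray(queries, dtype=np.float32).reshape(-1, self.dim)
        # Pairwise squared distances, shape (n_queries, n_stored)
        diff = q[:, None, :] - self.vectors[None, :, :]
        dist = (diff ** 2).sum(axis=2)
        # Column indices of the k closest stored vectors for each query
        order = np.argsort(dist, axis=1)[:, :k]
        return np.take_along_axis(dist, order, axis=1), order


toy = TinyFlatL2(dim=3)
toy.add([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
D, I = toy.search([[0.9, 0.1, 0.0]], k=1)
print(D, I)  # the query is closest to the first stored vector
```

The returned pair mirrors the `D, I` convention used in the chat function: `D` holds squared distances and `I` holds positions, which is why `doc_texts` must stay in the same order as the stored vectors.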

## Cell 6: RAG-based Chat Function
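Main function to handle user input. If any documents have been added, it retrieves the single closest one (`k=1`) for the user's message.
The document is included in the prompt as context only when its similarity score exceeds 0.6; otherwise the model answers from the message alone.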

In [7]:
def chat_with_model(message, history):
    context = None
    if len(doc_texts) > 0 and index.ntotal > 0:
        query_embedding = embedding_model.encode([message])
        D, I = index.search(query_embedding, k=1)

        # FAISS returns the squared L2 distance; map it to a score in (0, 1].
        # This is a heuristic, not cosine similarity: 1.0 means identical vectors.
        distance = D[0][0]
        similarity = 1 / (1 + distance)

        # Use the retrieved document only if it is close enough to the query
        if similarity > 0.6:
            context = doc_texts[I[0][0]]

    if context is not None:
        prompt = f"<|user|>\nContext: {context}\nQuestion: {message}\n<|assistant|>\n"
    else:
        prompt = f"<|user|>\n{message}\n<|assistant|>\n"

    # Generate; the output contains the prompt too, so keep only the text after the last assistant tag
    response = chat_pipeline(prompt, max_new_tokens=200, do_sample=True, temperature=0.7)[0]["generated_text"]
    reply = response.split("<|assistant|>")[-1].strip()

    history.append((f"You: {message}", f"Bot: {reply}"))
    return history, history
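
The 0.6 threshold is easier to interpret when translated back into a distance. The sketch below does that, and also relates the distance to cosine similarity under one assumption: that the embeddings are unit-length, which the all-MiniLM-L6-v2 model card indicates but which is worth verifying for any other embedding model. For unit vectors, $\|a - b\|^2 = 2 - 2\cos(a, b)$. The helper names are hypothetical and exist only for this illustration.

```python
import numpy as np


def score_from_distance(sq_dist):
    """The notebook's heuristic: map squared L2 distance to a score in (0, 1]."""
    return 1.0 / (1.0 + sq_dist)


def max_distance_for(threshold):
    """Squared distance at which the score equals the threshold."""
    return 1.0 / threshold - 1.0


def cosine_from_distance(sq_dist):
    """Valid only for unit-length vectors: ||a - b||^2 = 2 - 2 * cos(a, b)."""
    return 1.0 - sq_dist / 2.0


# Check the identity on two random unit vectors (assumed dimension: 384).
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 384))
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)
sq_dist = float(np.sum((a - b) ** 2))
cos = float(a @ b)

d_max = max_distance_for(0.6)          # about 0.667
cos_min = cosine_from_distance(d_max)  # about 0.667, unit vectors only
```

Under that assumption, `similarity > 0.6` admits a document only when its squared distance is below roughly 0.67, which corresponds to a cosine similarity above roughly 0.67.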

## Cell 7: Gradio UI for Chatbot
Use Gradio Blocks to create a simple web interface where users can paste documents into a text box and chat with the bot.

In [8]:
# UI with Gradio
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 🤖 Smart Chatbot \nAsk questions or upload documents!")
    chatbot = gr.Chatbot(label="Chat History")
    msg = gr.Textbox(label="Your Message", placeholder="Ask me anything!", lines=1)
    upload = gr.Textbox(label="Paste document here (optional)", lines=5)
    state = gr.State([])

    upload.submit(add_document, upload, upload)  # index the pasted text, then clear the box
    clear_btn = gr.Button("🧹 Clear All Documents")
    clear_btn.click(fn=clear_documents, outputs=upload)
    msg.submit(chat_with_model, [msg, state], [chatbot, state])

demo.launch(share=True)



Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://2f883bf20a9a5bd60b.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


