
In [29]:
# Install dependencies
!pip install faiss-cpu transformers langchain sentence-transformers gradio

# Imports
import gradio as gr
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document
from langchain.llms import HuggingFacePipeline
from transformers import pipeline

# Load data from uploaded file
with open("sample_data.txt", "r", encoding="utf-8") as f:
    text_data = f.read()

# Convert to LangChain documents
documents = [Document(page_content=chunk) for chunk in text_data.split("\n") if chunk.strip()]

# Split documents for chunking
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
docs = text_splitter.split_documents(documents)

# Create embeddings and vector store
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(docs, embeddings)

# Load HuggingFace LLM
hf_pipeline = pipeline(
    "text-generation",
    model="distilgpt2",  # fast and small model
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7
)
llm = HuggingFacePipeline(pipeline=hf_pipeline)

# Query function
def ask_chatbot(query):
    try:
        results = vector_store.similarity_search(query, k=3)
        context = "\n".join([doc.page_content for doc in results])
        prompt = f"Context:\n{context}\n\nUser: {query}\n\nAnswer:"
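        # HuggingFacePipeline returns the generated text as a plain string,
        # so indexing it like a raw transformers pipeline result raises a TypeError.
        output = llm.invoke(prompt)
        # Remove the echoed prompt in case this LangChain version includes it.
        return output.replace(prompt, "").strip()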
    except Exception as e:
        return f"⚠️ Error: {str(e)}"

# Launch Gradio interface
gr.Interface(
    fn=ask_chatbot,
    inputs="text",
    outputs="text",
    title="LangChain Chatbot with FAISS"
).launch()


Device set to use cpu


It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://dc5e90a65f4cd802c5.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
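
The query function above joins the retrieved chunks into a context block, wraps them in a prompt, and removes the echoed prompt from the model output. Below is a minimal sketch of just that string handling, with no model involved. `build_prompt` and `strip_prompt` are hypothetical helper names, not part of LangChain or the notebook, and the "model output" is simulated.

```python
def build_prompt(chunks, query):
    """Assemble the same prompt layout that ask_chatbot uses."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nUser: {query}\n\nAnswer:"


def strip_prompt(generated, prompt):
    """Drop the echoed prompt only when it is a prefix of the model output."""
    if generated.startswith(prompt):
        generated = generated[len(prompt):]
    return generated.strip()


prompt = build_prompt(
    ["FAISS is a vector index.", "Gradio builds web UIs."],
    "What is FAISS?",
)

# Simulated output (made up for illustration): the prompt echoed back plus a completion.
fake_output = prompt + " A library for similarity search."
answer = strip_prompt(fake_output, prompt)
print(answer)
```

Checking for a prefix instead of calling `str.replace` avoids deleting text that happens to repeat the prompt later in the completion, and it leaves the output untouched when the model wrapper has already removed the prompt.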

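`vector_store.similarity_search(query, k=3)` returns the three stored chunks whose embeddings lie closest to the query embedding. The sketch below shows that idea with plain Python lists. It assumes squared Euclidean distance as the metric (the real index may be configured differently), and the three-dimensional vectors and the `top_k` helper are made up for illustration; `all-MiniLM-L6-v2` produces 384-dimensional embeddings, and FAISS searches them far more efficiently than this loop.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, stored, k=3):
    """Return the texts of the k stored (text, vector) pairs nearest to query_vec."""
    ranked = sorted(stored, key=lambda item: squared_l2(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "embeddings" standing in for real sentence-transformer vectors.
stored = [
    ("chunk about cats", [1.0, 0.0, 0.0]),
    ("chunk about dogs", [0.9, 0.1, 0.0]),
    ("chunk about tax law", [0.0, 0.0, 1.0]),
    ("chunk about birds", [0.7, 0.3, 0.0]),
]

nearest = top_k([1.0, 0.05, 0.0], stored, k=2)
print(nearest)
```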

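The splitter settings `chunk_size=100` and `chunk_overlap=20` cap each chunk at 100 characters and let consecutive chunks share up to 20 characters. Below is a simplified fixed-window sketch of that idea. `sliding_chunks` is a hypothetical helper, not the library's algorithm: `RecursiveCharacterTextSplitter` prefers to break on separators such as newlines and spaces instead of at fixed offsets.

```python
def sliding_chunks(text, chunk_size=100, chunk_overlap=20):
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# 250 characters of repeating digits as stand-in text.
sample = "".join(str(i % 10) for i in range(250))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
```

Because the notebook first turns each non-empty line into its own document, lines shorter than 100 characters pass through the splitter unchanged; only longer lines are cut into overlapping pieces.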
