<a href="https://colab.research.google.com/github/tarakantaacharya/NLPinternal/blob/main/chatbot_file.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [11]:
! pip install chromadb



In [26]:
! pip install streamlit



In [12]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [13]:
from transformers import pipeline

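# Load a sequence-to-sequence model for answer generation. Any encoder-decoder checkpoint
# can be swapped in; decoder-only LLMs such as Mistral or Llama 2 need the
# "text-generation" pipeline instead of "text2text-generation".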
generator = pipeline("text2text-generation", model="google/flan-t5-large")

Device set to use cpu


In [14]:
def refine_response(user_query, chroma_results):
    """
    Uses an LLM to filter and refine ChromaDB results into a structured response.
    """
    # Extract top k RAG texts
    top_results = [doc for doc in chroma_results["documents"][0] if doc]

    # If no relevant data found, return a default response
    if not top_results:
        return "I'm sorry, but I couldn't find relevant information."

    # Step 1: Classify query type (Single Answer or Multiple Answers)
    specific_query_keywords = ["where", "location", "fee", "tuition", "established", "affiliated", "co-ed"]
    is_specific_query = any(word in user_query.lower() for word in specific_query_keywords)

    # Step 2: Construct a structured LLM prompt
    # Join the passages into plain text so the prompt does not contain a Python list repr
    context = "\n".join(top_results)
    if is_specific_query:
        prompt = (
            "Based on the given data, answer in a full sentence:\n"
            f"Question: {user_query}\n"
            f"Data: {context}\n"
            "Answer in proper English, giving a meaningful response."
        )
        max_answers = 1  # Single, structured response
    else:
        prompt = (
            "List the best possible answers based on this query:\n"
            f"Question: {user_query}\n"
            f"Data: {context}\n"
            "Provide a well-structured answer with multiple options."
        )
        max_answers = 3  # Multiple structured responses

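    # Step 3: Generate the refined answer using the LLM.
    # Beam search is required when asking for several sequences: with the default
    # greedy decoding, transformers raises an error if num_return_sequences > 1.
    response = generator(
        prompt,
        max_length=150,
        num_beams=max_answers,
        num_return_sequences=max_answers,
    )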

    # Step 4: Format the response properly
    refined_responses = [ans["generated_text"].strip() for ans in response]

    return " ".join(refined_responses) if is_specific_query else "\n".join(refined_responses)
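
The keyword check in Step 1 uses substring matching, so a query that contains "coffee" also matches the keyword "fee". The following is a minimal sketch of a whole-word variant, assuming the same keyword list; the standalone function `is_specific_query` and the constant names are illustrative and not part of the notebook.

```python
import re

# Same keywords as in refine_response (copied here so the sketch is self-contained)
SPECIFIC_QUERY_KEYWORDS = ["where", "location", "fee", "tuition", "established", "affiliated", "co-ed"]

# (?<!\w) and (?!\w) reject matches that are embedded in a longer word
_KEYWORD_PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(k) for k in SPECIFIC_QUERY_KEYWORDS) + r")(?!\w)",
    re.IGNORECASE,
)


def is_specific_query(user_query: str) -> bool:
    """Return True if the query contains one of the keywords as a whole word."""
    return _KEYWORD_PATTERN.search(user_query) is not None
```

One trade-off of this approach: inflected forms such as "fees" no longer match unless they are added to the list explicitly.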

In [15]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")  # Lightweight model for embeddings

In [16]:
import chromadb
chroma_client = chromadb.PersistentClient(path="/content/drive/MyDrive/college_rag_db")
collection = chroma_client.get_or_create_collection(name="college_info")
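
The notebook opens an existing collection but does not show how it was populated. The following is a rough sketch of one way to index passages, assuming the embeddings come from the same `SentenceTransformer` model that is later used for queries. The helper `index_passages` and the `doc-<n>` id scheme are hypothetical; the actual data and ids behind `college_info` are not shown in the source.

```python
def index_passages(collection, embedder, passages, id_prefix="doc"):
    """Embed passages and add them to a Chroma collection with generated ids.

    collection: object with a Chroma-style add(ids=..., documents=..., embeddings=...)
    embedder:   object with a SentenceTransformer-style encode(list_of_str) method
    """
    ids = [f"{id_prefix}-{i}" for i in range(len(passages))]
    # encode() on a list returns one vector per passage; tolist() makes it JSON-friendly
    embeddings = embedder.encode(passages).tolist()
    collection.add(ids=ids, documents=passages, embeddings=embeddings)
    return ids
```

With the objects defined in this notebook, a call would look like `index_passages(collection, model, passages)`, where `passages` is a list of strings.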

In [23]:
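def search_college_info(query, top_k=3):
    """Retrieve the top_k closest passages from ChromaDB and print an LLM-refined answer."""
    # The query must be embedded with the model that produced the stored embeddings
    query_embedding = model.encode(query).tolist()
    results = collection.query(query_embeddings=[query_embedding], n_results=top_k)

    # Refine the retrieved passages into an answer using the LLM
    refined_answer = refine_response(query, results)
    print("Response:", refined_answer)
    return refined_answer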
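
`collection.query` returns a dict of nested lists, with one inner list per query embedding, and the `documents` and `distances` entries are aligned by position. The sketch below drops weak matches before they reach `refine_response`. The helper `filter_by_distance` and the cutoff of 1.0 are assumptions for illustration; a suitable threshold depends on the distance metric configured for the collection.

```python
def filter_by_distance(chroma_results, max_distance=1.0):
    """Keep only documents whose distance to the query is at most max_distance.

    The returned dict has the same nested-list shape as the input, so it can be
    passed on wherever the raw query result was expected.
    """
    documents = chroma_results["documents"][0]
    distances = chroma_results["distances"][0]
    kept = [(doc, dist) for doc, dist in zip(documents, distances) if dist <= max_distance]
    return {
        "documents": [[doc for doc, _ in kept]],
        "distances": [[dist for _, dist in kept]],
    }


# Hand-written stand-in for a query result (illustrative values, not real output)
sample = {
    "documents": [["passage A", "passage B", "passage C"]],
    "distances": [[0.42, 0.95, 1.60]],
}
filtered = filter_by_distance(sample, max_distance=1.0)
```

If every passage is filtered out, `refine_response` falls back to its "couldn't find relevant information" message, because its list of top results is empty.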

In [22]:
while True:
    text = input("Enter your query (or type 'Bye' or 'Exit' to quit): ")
    # Compare case-insensitively and ignore surrounding whitespace
    if text.strip().lower() in {"bye", "exit"}:
        break
    search_college_info(text)

Enter your query (or type 'Bye' or 'Exit' to quit): Where is R V R AND J C COLLEGE OF ENGINEERING located?
Response: GUNTUR, Andhra Pradesh.
Enter your query (or type 'Bye' or 'Exit' to quit): Bye
