# RAG Master's Thesis Chatbot

i200762

Muhammad Umar Waseem

## Imports

In [39]:
import os
import getpass
import pickle

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.document_loaders import UnstructuredExcelLoader
from langchain.vectorstores.faiss import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

## Model Setup

In [40]:
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Provide your Google API Key")
else:
    print("Google API Key already set from env")

Google API Key already set from env


In [41]:
llm = ChatGoogleGenerativeAI(model="gemini-pro", google_api_key=os.environ["GOOGLE_API_KEY"])
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

## Data Preprocessing

In [48]:
loader = UnstructuredExcelLoader("dataset.xlsx")
docs = loader.load_and_split()

In [49]:
print(len(docs))

264
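
The 264 documents above are chunks, not spreadsheet rows: `load_and_split()` passes the loaded text through a text splitter. To the best of my knowledge the default is a recursive character splitter with a chunk size of 4000 characters and an overlap of 200, but that default is an assumption here, not something this notebook verifies. The sketch below is a deliberately simplified stand-in (a fixed-size sliding window, not the recursive algorithm) that only illustrates how chunk size and overlap interact. The function name `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size character windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# Toy input (hypothetical): 30 characters, small sizes so the overlap is visible.
sample = "abcdefghij" * 3
chunks = split_with_overlap(sample, chunk_size=12, chunk_overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk begins with the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.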


In [66]:
print(docs[0].page_content)

S. #
Thesis Title and Access Link (to PDF Report)
Link to the MS Thesis Report
Thesis Abstract


1
21I-2191 Laraib Afzal\n\n\n\nProsodic alignment for Automatic dubbing\n\n
https://drive.google.com/file/d/11nobbbLcTsNzHQHrs5mscGYP1bQ487wM/view?usp=drive_link
Automatic dubbing is the process of replacing the audio track of a video with a different language. In automatic dubbing, prosodic alignment is used to match the suprasegmental features like timbre, prosody, duration, pauses and intonation of the original speech with synthesed speech, in order to produce a natural-sounding dubbed video. This is done by analyzing and mapping these features of the original and translated speech. Existing research on automatic dubbing lack to addresses these features in source video which impact the overall naturalness and fluency of Synthesized speech. To solve this we proposed end-to-end architecture, following modular approach, to generate high quality dubbed video. In this research, we mainly focu

## Vector Database

In [51]:
vectordb = FAISS.from_documents(documents=docs, embedding=embeddings)

In [52]:
# Persist the index so the documents do not have to be re-embedded on every run.
with open("vectorstore.pkl", "wb") as f:
    pickle.dump(vectordb, f)

## LangChain Prompting Utilities

In [53]:
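# Reload the saved index. Only unpickle files you created yourself: pickle can run arbitrary code on load.
with open("vectorstore.pkl", "rb") as f:
    my_vector_database = pickle.load(f)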

retriever = my_vector_database.as_retriever(search_kwargs={"k": 5})
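
The `search_kwargs={"k": 5}` setting asks the retriever for the five chunks whose embeddings lie closest to the query embedding. A minimal brute-force sketch of that idea follows. It assumes Euclidean (L2) distance, which I believe is the default for LangChain's FAISS wrapper but which this notebook does not confirm. The helper `top_k` and the two-dimensional toy vectors are hypothetical; real embeddings have hundreds of dimensions and FAISS uses an optimized index, not a Python loop.

```python
import math


def top_k(query_vec, doc_vecs, k=5):
    """Return the indices of the k vectors closest to query_vec, nearest first."""
    scored = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort()  # smallest distance first; ties fall back to the lower index
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" (hypothetical values) for four chunks and one query.
toy_vectors = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.5, 0.5]]
toy_query = [1.0, 0.0]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```

Raising `k` gives the model more context to draw on but also makes the prompt longer and admits less relevant chunks.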

In [54]:
template = """
You are a helpful AI assistant.
Answer based on the context provided. 
context: {context}
input: {input}
answer:
"""

prompt = PromptTemplate.from_template(template)
print(prompt)
print("\nInput Variables: ", prompt.input_variables)
print("\nPrompt Template: ", prompt.template)

input_variables=['context', 'input'] template='\nYou are a helpful AI assistant.\nAnswer based on the context provided. \ncontext: {context}\ninput: {input}\nanswer:\n'

Input Variables:  ['context', 'input']

Prompt Template:  
You are a helpful AI assistant.
Answer based on the context provided. 
context: {context}
input: {input}
answer:



In [55]:
combine_docs_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(retriever, combine_docs_chain)
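
Conceptually, the "stuff" strategy concatenates the retrieved chunks into a single string and substitutes it for `{context}` in the prompt, while the user's question fills `{input}`. The sketch below imitates that step in plain Python so the final prompt text can be inspected. It is not LangChain's implementation: the helpers `stuff_documents` and `build_prompt`, the blank-line separator, and the toy documents are assumptions made for illustration.

```python
TEMPLATE = """
You are a helpful AI assistant.
Answer based on the context provided.
context: {context}
input: {input}
answer:
"""


def stuff_documents(doc_texts, separator="\n\n"):
    """Join all retrieved chunks into one context string."""
    return separator.join(doc_texts)


def build_prompt(question, doc_texts):
    """Fill the template with the stuffed context and the user's question."""
    return TEMPLATE.format(context=stuff_documents(doc_texts), input=question)


# Toy retrieved chunks (hypothetical) standing in for the retriever's output.
toy_docs = ["Thesis A: emotion detection.", "Thesis B: automatic dubbing."]
final_prompt = build_prompt("Which thesis covers dubbing?", toy_docs)
print(final_prompt)
```

Because every retrieved chunk is pasted into one prompt, this approach only works while `k` chunks together fit within the model's context window.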

## Get Contextual Answer

In [72]:
query = "Who did a thesis based on emotion detection and what was the title of the thesis, and who was the supervisor?"
response = retrieval_chain.invoke({ "input": query })

print("\n\nUser Query: ", query)
print("Model Response: ", response["answer"])



User Query:  Who did a thesis based on emotion detection and what was the title of the thesis, and who was the supervisor?
Model Response:  **Muhammad Farrukh Bashir** did a thesis based on emotion detection and the title of the thesis was "Emotion detection from Urdu text". His supervisor was **Dr. Waseem Shahzad**.


In [73]:
# context docs passed to the model

context_docs = response["context"]
for doc in context_docs:
    print("-"*50)
    print(doc.page_content)
    print("\n\n")

--------------------------------------------------
Deep Neural Network Based Detection of Wmotion in NL \n\nAdil Majeed (MS-CS)\nSupervisor: Dr. Hassan Mujtaba
https://drive.google.com/file/d/1p_akfJpKNPD5Hn8mcyubSQELWpphvRKb/view?usp=sharing
Emotion detection is playing a very important role in our life. People express their emotions in different ways i.e face expression, gestures, speech, and text. This research focuses 00 detecting emotions from the Roman Urdu text. Previously, A lot of work has been done on different languages for emotion detection but there is limited work done in Roman Urdu. Therefore, there is a need to explore Roman Urdu as it is the most widely used language on social media platforms for communication. One major issue for the Roman Urdu is the absence of benchmark corpora for emotion detection from text because language assets are essential for different natural language process-ing (NIL') tasks. There are many useful applications of the emotional analysis of 