<a href="https://colab.research.google.com/github/waghmareps12/RANDOM_COLLAB_LLM_NOTEBOOKS/blob/main/Implementing_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers torch accelerate gradio langchain chromadb sentence_transformers bitsandbytes



In [None]:
from langchain.document_loaders import TextLoader
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

**Load Document**

In [None]:
loader = TextLoader('/content/Sentiment Analysis of Social Media Text_ Leveraging Fine-Tuned Language Models to Unveil a Wider Spectrum of Emotions.txt')
documents = loader.load()
documents

[Document(page_content="\ufeffSentiment Analysis of Social Media Text: Leveraging Fine-Tuned Language Models to Unveil a Wider Spectrum of Emotions\nEddy Ejembi\neddyejembi2018@gmail.com\n\n\nAbstract\nThe emergence of social media has provided a platform for individuals to express a wide range of sentiments and emotions. Sentiment analysis, the task of determining the emotional tone behind text data, has gained prominence for its relevance in various domains. This research project aims to address a notable gap in existing sentiment analysis systems by focusing on sentiments that are often overlooked, including depression, suicidal thoughts, feelings of threat, fear, and other emotionally charged states. The project utilizes fine-tuned language models to achieve more accurate and comprehensive sentiment analysis in the context of social media. Through data collection, preprocessing, fine-tuning, and evaluation, the research contributes to the improvement of sentiment analysis for a bro

**Split Document**

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
texts

[Document(page_content='\ufeffSentiment Analysis of Social Media Text: Leveraging Fine-Tuned Language Models to Unveil a Wider Spectrum of Emotions\nEddy Ejembi\neddyejembi2018@gmail.com', metadata={'source': '/content/Sentiment Analysis of Social Media Text_ Leveraging Fine-Tuned Language Models to Unveil a Wider Spectrum of Emotions.txt'}),
 Document(page_content='Abstract\nThe emergence of social media has provided a platform for individuals to express a wide range of sentiments and emotions. Sentiment analysis, the task of determining the emotional tone behind text data, has gained prominence for its relevance in various domains. This research project aims to address a notable gap in existing sentiment analysis systems by focusing on sentiments that are often overlooked, including depression, suicidal thoughts, feelings of threat, fear, and other emotionally charged states. The project utilizes fine-tuned language models to achieve more accurate and comprehensive sentiment analysis
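To make the two splitter parameters concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line boundaries, which is why the chunks above differ in length. The helper `split_with_overlap` is a hypothetical name used only for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into character windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij"
no_overlap = split_with_overlap(sample, chunk_size=4, chunk_overlap=0)
with_overlap = split_with_overlap(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With `chunk_overlap=0`, as in this notebook, neighbouring chunks share no text, so a sentence cut at a chunk boundary is split between two chunks. A small overlap is a common way to soften that, at the cost of storing some text twice.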

**Initialize Vector Datastore**

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings

In [None]:
embedding = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
vectordb = Chroma.from_documents(texts, embedding)
vectordb

<langchain_community.vectorstores.chroma.Chroma at 0x78d75eedce20>

In [None]:
# Perform a simple similarity search on the document
query = "What did the user say about lilbert ?"
results = vectordb.similarity_search(query, k=3)
for result in results:
    print(f"Document: {result.page_content}")


Document: Materials and Methods
This research project adopts a fine-tuning approach using a state-of-the-art language model, GPT-3.5-turbo, as the basis for sentiment analysis. Fine-tuning is a process that allows a pre-trained language model to adapt to a specific task or domain, making it particularly well-suited for addressing the research aims. GPT-3.5 turbo is a Large Language Chat Model rooted in the GPT 3.5 architecture. This model powers the ChatGPT platform and is distinguished by its substantial capacity, featuring an impressive 175 billion parameters. It has been meticulously developed using a vast corpus of real-world textual data.
The adoption of GPT-3.5-turbo aligns seamlessly with the pursuit of advancing sentiment analysis on social media, equipping us with a versatile and potent tool that holds immense promise for understanding and categorizing a broad spectrum of emotions and expressions within the digital realm.
Data Collection
Document: GPT-3.5 Turbo represents a si
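The idea behind `similarity_search` can be sketched with toy vectors: embed the query, score it against every stored chunk, and keep the `k` best. The sketch below uses cosine similarity and a brute-force scan purely as an illustration. The metric and index that Chroma actually uses are configurable and are not shown here, and the three-dimensional vectors and chunk names are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the names of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical 3-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
toy_index = {
    "methods": [0.9, 0.1, 0.0],
    "references": [0.0, 0.2, 0.9],
    "introduction": [0.6, 0.6, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

ranked = top_k(toy_query, toy_index, k=2)
print(ranked)  # ['methods', 'introduction']
```

Note that a nearest-neighbour search always returns `k` chunks, even when nothing in the document is relevant to the query.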

**Add Large Language Model**

In [None]:
# Log in to Hugging Face
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.c

In [None]:
!huggingface-cli whoami

eddyejembi


In [None]:
import transformers
from transformers import AutoTokenizer
import torch
from torch import cuda, bfloat16

**Set quantization configuration to reduce GPU memory usage**

In [None]:
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)
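A rough back-of-the-envelope calculation shows why 4-bit loading matters. The sketch assumes a model with about 7 billion parameters and counts only the weights: it ignores activations, the KV cache, and the small overhead of the quantization constants, so real usage will be higher. `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 7e9  # assumption: roughly 7 billion weights

fp16_gb = weight_memory_gb(params, 16)  # half precision: 2 bytes per weight
nf4_gb = weight_memory_gb(params, 4)    # 4-bit quantization: half a byte per weight

print(f"fp16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the weights shrink from about 14 GB to about 3.5 GB.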

In [None]:
model_id = "meta-llama/Llama-2-7b-chat-hf"

model_config = transformers.AutoConfig.from_pretrained(
    model_id,
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
)

# The token saved by `huggingface-cli login` is picked up automatically
tokenizer = AutoTokenizer.from_pretrained(model_id)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



In [None]:
# Model pipeline
# Note: the name `pipeline` is reused for the pipeline object below,
# so the function is called through the `transformers` module.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

In [None]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=pipeline)

In [None]:
llm(prompt="What is Sentiment analysis")


'What is Sentiment analysis?\n Hinweis: This article is part of a larger series on Natural Language Processing (NLP) techniques. You can find the complete series here.\nSentiment analysis is a type of text analysis that focuses on identifying and categorizing the emotions or attitudes expressed in a piece of text. It is a popular application of Natural Language Processing (NLP) techniques, which can be used to analyze large amounts of text data to extract insights and meaning.\nIn this article, we will provide an overview of sentiment analysis, including its definition, types, and applications. We will also discuss some of the challenges and limitations of sentiment analysis, as well as some of the tools and techniques used to perform it.\nDefinition of Sentiment Analysis:\nSentiment analysis is the process of identifying and categorizing the emotions or attitudes expressed in a piece of text. It involves using NLP techniques to analyze the language used in a text and determine its sen

**Initialize Chain**

In [None]:
from langchain.chains import RetrievalQA

In [None]:
retriever = vectordb.as_retriever()

retrieve = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)
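The `"stuff"` chain type simply places every retrieved chunk into a single prompt. The sketch below approximates that step in plain Python. The instruction sentence is the one visible in the chain output in this notebook. The `Question:` / `Helpful Answer:` suffix and the blank-line separator between chunks are assumptions about the default template, and `build_stuff_prompt` is a hypothetical helper rather than a LangChain function.

```python
# Assumed approximation of the default prompt used by the "stuff" chain.
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(chunks, question):
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return STUFF_TEMPLATE.format(context=context, question=question)


example_prompt = build_stuff_prompt(
    ["Chunk one text.", "Chunk two text."],
    "How does the author describe sentiment analysis?",
)
print(example_prompt)
```

Because every chunk is concatenated, the prompt length grows with the number and size of retrieved chunks, so it has to fit inside the model's context window.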

**Perform Retrieval Augmented Generation (RAG)**

In [None]:
def rag(qa, query):
    print(f"Query: {query}\n")
    result = qa.run(query)
    print("\nResult: ", result)

In [None]:
prompt = "How does the author describe Sentiment analysis?"
rag(retrieve, prompt)

Query: How does the author describe Sentiment analysis?



> Entering new RetrievalQA chain...

> Finished chain.

Result:  Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

________________
Introduction
Sentiment analysis, also known as opinion mining, has established itself as a valuable tool for understanding the emotional content present in text data. This field of research and application has been widely adopted in various domains, such as marketing, customer service, politics, and public opinion analysis. Existing sentiment analysis systems primarily focus on categorizing text into common categories: positive, negative, and neutral (Wankhade et al., 2022; Sutar et al., 2016; Kakde & Losarwar, 2018; Gah et al., 2017). These systems are instrumental in understanding the general sentiment of social media content, which often pertains to subjects like product reviews, political discourse, and general public sentiment.

Sentiment analysis has proven its impo
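The result above begins with the prompt itself, because the text-generation pipeline returns the prompt together with the completion. One way to keep only the model's answer is to cut everything up to the final answer marker. This sketch assumes the prompt ends with the marker `Helpful Answer:`, which is not visible in the truncated output above. `extract_answer` is a hypothetical helper.

```python
def extract_answer(generated: str, marker: str = "Helpful Answer:") -> str:
    """Return the text after the last occurrence of marker, or the whole text if it is absent."""
    _, found, tail = generated.rpartition(marker)
    return tail.strip() if found else generated.strip()


raw = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "Some retrieved context.\n\n"
    "Question: What is sentiment analysis?\n"
    "Helpful Answer: It is the task of determining the emotional tone behind text."
)
answer = extract_answer(raw)
print(answer)  # It is the task of determining the emotional tone behind text.
```

An alternative that avoids post-processing is to pass `return_full_text=False` when building the `transformers` text-generation pipeline, optionally with `max_new_tokens` to bound the answer length.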

In [None]:
prompt2 = "What improvement is the author trying to achieve in Sentiment analysis?"

# rag() prints its own output and returns None, so it is not wrapped in print()
rag(retrieve, prompt2)

Query: What improvement is the author trying to achieve in Sentiment analysis?



> Entering new RetrievalQA chain...

> Finished chain.

Result:  Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Wankhade, M., Rao, A. C. S., & Kulkarni, C. (2022, February 7). A survey on sentiment analysis methods, applications, and challenges. Artificial Intelligence, Rev(55), 5731–5780. doi.org/10.1007/s10462-022-10144-1

________________
Introduction
Sentiment analysis, also known as opinion mining, has established itself as a valuable tool for understanding the emotional content present in text data. This field of research and application has been widely adopted in various domains, such as marketing, customer service, politics, and public opinion analysis. Existing sentiment analysis systems primarily focus on categorizing text into common categories: positive, negativ

In [None]:
prompt3 = "How does the proposed method in this paper address the drawbacks of existing sentiment analysis models?"
rag(retrieve, prompt3)

Query: How does the proposed method in this paper address the drawbacks of existing sentiment analysis models?



> Entering new RetrievalQA chain...

> Finished chain.

Result:  Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

________________
Introduction
Sentiment analysis, also known as opinion mining, has established itself as a valuable tool for understanding the emotional content present in text data. This field of research and application has been widely adopted in various domains, such as marketing, customer service, politics, and public opinion analysis. Existing sentiment analysis systems primarily focus on categorizing text into common categories: positive, negative, and neutral (Wankhade et al., 2022; Sutar et al., 2016; Kakde & Losarwar, 2018; Gah et al., 2017). These systems are instrumental in understanding the general sentiment of social me

In [None]:
prompt4 = "What does the author say about BERT, and ALBERT, and how does he plan to approach the problem?"
rag(retrieve, prompt4)

Query: What does the author say about BERT, and ALBERT, and how does he plan to approach the problem?



> Entering new RetrievalQA chain...

> Finished chain.

Result:  Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Choudhury, M. D., Gamon, M., Counts, S., & Horvitz, E. (2021, August). Predicting Depression via Social Media. Proceedings of the International AAAI Conference on Web and Social Media, 7(1), 128-137. https://doi.org/10.1609/icwsm.v7i1.14432
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018, October 11). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics. 10.18653/v1/N19-1423
Ekman, P. (1992). An argument for basic emotions. 6(3-4), 169–200. 10.1080/02699939208411068
Gah, S., Gyamfi, N. K., & Katsriku, F. (2017, April). Sentiment Analy