In [10]:

import kagglehub
nelgiriyewithana_emotions_path = kagglehub.dataset_download('nelgiriyewithana/emotions')

print('Data source import complete.')


Data source import complete.


---

</div>

### Table of Contents
1. [`Model Overview`](#1)
2. [`Pulling Model`](#2)
3. [`Emotions Data Set Classification`](#3)
4. [`Conclusion`](#4)

[`Math & Physics Fun with Gus`](https://www.youtube.com/MathPhysicsFunWithGus)

---

<a name='1'>
    
    
# <p style="background-color: red; padding: 5px; border-radius: 2px; border: solid 4px orange; text-align: left; font-family: 'Computer Modern'; font-size: 40px; color: white; margin-top: 0; margin-bottom: 0;">1 | Model Overview </p>
    
The presented model is a multi-label classifier built on the RoBERTa-base architecture and trained on the `go_emotions` dataset, which has 28 emotion labels. Its purpose is to classify text into one or more emotion categories. For each input, the model outputs one probability per label, and a common practice is to predict every label whose probability is at least 0.5.
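
As a minimal sketch of the 0.5 threshold described above, the snippet below applies it to the top-3 scores reported in Section 2.1 of this notebook. The helper name `labels_above_threshold` is hypothetical, and the input mimics the list-of-dicts shape (`label`, `score`) that the pipeline returns; it is an illustration, not the model author's code.

```python
# Scores copied from the top-3 table in Section 2.1 of this notebook.
example_outputs = {
    "I LOVE math!": [
        {"label": "love", "score": 0.950637},
        {"label": "admiration", "score": 0.048898},
        {"label": "approval", "score": 0.017202},
    ],
    "I am not having a great day": [
        {"label": "disappointment", "score": 0.466695},
        {"label": "sadness", "score": 0.398495},
        {"label": "annoyance", "score": 0.068066},
    ],
    "Leave me alone, please": [
        {"label": "neutral", "score": 0.688471},
        {"label": "sadness", "score": 0.141373},
        {"label": "annoyance", "score": 0.069881},
    ],
}


def labels_above_threshold(label_scores, threshold=0.5):
    """Return every label whose score is at least `threshold` (hypothetical helper)."""
    return [item["label"] for item in label_scores if item["score"] >= threshold]


# Multi-label output: a sentence may receive zero, one, or several labels.
predicted = {
    sentence: labels_above_threshold(scores)
    for sentence, scores in example_outputs.items()
}

for sentence, labels in predicted.items():
    print(f"{sentence!r} -> {labels}")
```

With a fixed 0.5 cut-off, the second sentence receives no label at all, because its two strongest scores (0.47 and 0.40) both fall just below the threshold.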

## Training and Architecture
Trained using Hugging Face Transformers, the model employs `AutoModelForSequenceClassification` with `problem_type="multi_label_classification"`. Training spans 3 epochs, utilizing a learning rate of 2e-5 and weight decay of 0.01.
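
To illustrate what `problem_type="multi_label_classification"` implies, here is a small pure-Python sketch. To the best of my understanding, Transformers uses a sigmoid-based binary cross-entropy loss for this problem type, so each label is scored independently instead of competing through a softmax. The function names and toy logits below are assumptions made for illustration only.

```python
import math


def sigmoid(x):
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_bce(logits, targets):
    """Mean binary cross-entropy over labels, each label treated independently."""
    losses = []
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)


# Toy example with three labels (values are made up for illustration).
logits = [2.0, -1.0, 0.0]
targets = [1, 0, 0]

probs = [sigmoid(z) for z in logits]
loss = multi_label_bce(logits, targets)

print("per-label probabilities:", [round(p, 3) for p in probs])
print("sum of probabilities:", round(sum(probs), 3))  # not constrained to 1
print("loss:", round(loss, 4))
```

Because the probabilities are independent, they do not need to sum to 1, which is why several emotions can be predicted for the same text.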

## Inference
Inference is facilitated through the Hugging Face Transformers pipeline for text classification, providing a straightforward method for predicting labels and their probabilities.

## Evaluation Metrics and Optimization
The referenced model card reports accuracy, precision, recall, and F1 score, both overall and per label. It also explores optimizing the threshold used to binarize the model outputs, trading off precision against recall to improve the F1 score.
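
The threshold optimization mentioned above can be sketched as a simple sweep for a single label: try a few candidate thresholds and keep the one with the highest F1. The data below is a made-up toy example, and `f1_score` and `best_threshold` are hypothetical helpers, not the procedure used by the model author.

```python
def f1_score(y_true, y_pred):
    """F1 for one binary label; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, candidates):
    """Return the (threshold, F1) pair with the highest F1 among the candidates."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        preds = [p >= t for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy ground truth and predicted probabilities for one label (illustrative only).
toy_true = [1, 1, 0, 0, 1]
toy_probs = [0.9, 0.4, 0.3, 0.1, 0.6]

best_t, best_f1 = best_threshold(toy_true, toy_probs, candidates=[0.25, 0.5, 0.75])
print(f"best threshold: {best_t}, F1: {best_f1:.3f}")
```

In this toy case, lowering the threshold from 0.5 to 0.25 recovers a missed positive at the cost of one false positive, which raises F1. In practice, the sweep would be run per label on held-out data.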

## Dataset Challenges
The `go_emotions` dataset presents challenges, with varying label performance attributed to imbalance, limited examples, and potential labeling errors. The author suggests data cleaning as a potential avenue for substantial model improvement.

In summary, the model demonstrates RoBERTa's effectiveness for multi-label emotion classification, offering insights into training, evaluation, and opportunities for dataset refinement.

[`Reference Page: SamLowe/roberta-base-go_emotions`](https://huggingface.co/SamLowe/roberta-base-go_emotions)

In [11]:
!pip install transformers
!pip install ipywidgets



In [12]:
# Data Imports
import pandas as pd

# Model Imports
import transformers

<a name=2>

  
# <p style="background-color: red; padding: 5px; border-radius: 2px; border: solid 4px orange; text-align: left; font-family: 'Computer Modern'; font-size: 40px; color: white; margin-top: 0; margin-bottom: 0;"> 2 | Pulling Model and Getting Test Sentences </p>

In [13]:
# Pulling model
classifier = transformers.pipeline(
    task="text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=None
)

# Sentences to test
sentences = [
    "I LOVE math!",
    "I am not having a great day",
    "Leave me alone, please"
]

# Using model to get text classes
model_outputs = classifier(sentences)

Device set to use cpu


## <p style="background-color: red; padding: 5px; border-radius: 2px; border: solid 4px orange; text-align: left; font-family: 'Computer Modern'; font-size: 40px; color: white; margin-top: 0; margin-bottom: 0;">  2.1 | Building DataFrame </p>

In [14]:
# Dictionaries for each sentence
data = []
for sentence, label_list in zip(sentences, model_outputs):
    for label_dict in label_list:
        data.append({'sentence': sentence, 'label': label_dict['label'], 'score': label_dict['score']})

# Forming dataframe
df = pd.DataFrame(data)
df.set_index(['sentence', 'label'], inplace=True)
display(df)

sentence,label,score
I LOVE math!,love,0.950637
I LOVE math!,admiration,0.048898
I LOVE math!,approval,0.017202
I LOVE math!,joy,0.016136
I LOVE math!,neutral,0.008876
...,...,...
"Leave me alone, please",surprise,0.000640
"Leave me alone, please",relief,0.000527
"Leave me alone, please",excitement,0.000503
"Leave me alone, please",gratitude,0.000482


In [15]:
# Sorting DataFrame by sentence (ascending) and score (descending)
df_sorted = df.sort_values(['sentence', 'score'], ascending=[True, False])

# Group by sentence and taking top 3 rows for each group
df_top3 = pd.DataFrame(df_sorted.groupby('sentence').head(3))
display(df_top3.style.background_gradient())

sentence,label,score
I LOVE math!,love,0.950637
I LOVE math!,admiration,0.048898
I LOVE math!,approval,0.017202
I am not having a great day,disappointment,0.466695
I am not having a great day,sadness,0.398495
I am not having a great day,annoyance,0.068066
"Leave me alone, please",neutral,0.688471
"Leave me alone, please",sadness,0.141373
"Leave me alone, please",annoyance,0.069881


<a name=3>

# <p style="background-color: red; padding: 5px; border-radius: 2px; border: solid 4px orange; text-align: left; font-family: 'Computer Modern'; font-size: 40px; color: white; margin-top: 0; margin-bottom: 0;">  3 | Emotions Data Set Classification </p>
    

The Emotions dataset uses six categories: sadness (0), joy (1), love (2), anger (3), fear (4), and surprise (5). We will change the model pipeline slightly and set `top_k=1` so that it returns only the most probable category for each text.

## <p style="background-color: red; padding: 5px; border-radius: 2px; border: solid 4px orange; text-align: left; font-family: 'Computer Modern'; font-size: 40px; color: white; margin-top: 0; margin-bottom: 0;">  3.1 | Reading Emotions Dataset </p>

In [16]:
# Getting data
df = pd.read_csv("/content/text.csv")[['text', 'label']]

# Mapping for categories
category_mapping = {
    'sadness': 0,
    'joy': 1,
    'love': 2,
    'anger': 3,
    'fear': 4,
    'surprise': 5
}

# Applying mapping to 'label' column
df['label'] = df['label'].map({n: cat for cat, n in category_mapping.items()})

# Displaying the updated DataFrame
display(df.head())

,text,label
0,i just feel really helpless and heavy hearted,fear
1,ive enjoyed being able to slouch about relax a...,sadness
2,i gave up my internship with the dmrg and am f...,fear
3,i dont know i feel so lost,sadness
4,i am a kindergarten teacher and i am thoroughl...,fear


## <p style="background-color: red; padding: 5px; border-radius: 2px; border: solid 4px orange; text-align: left; font-family: 'Computer Modern'; font-size: 40px; color: white; margin-top: 0; margin-bottom: 0;"> 3.2 | Running Model on Subset of Data </p>

In [17]:
# Pulling model and setting top_k=1
classifier = transformers.pipeline(
    task="text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=1
)

Device set to use cpu


In [18]:
%%time
# Subset size
n = 100

# Using model to get text classes
model_outputs = classifier(df['text'].tolist()[0:n])

CPU times: user 17.4 s, sys: 277 ms, total: 17.7 s
Wall time: 28.4 s


In [19]:
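def format_model_data(df, model_outputs):
    # Keep only the rows that were classified (one model output per row)
    df_pred = df.iloc[:len(model_outputs)].copy()
    # With top_k=1, each output is a one-element list holding the top label
    df_pred['pred_category'] = [values[0]['label'] for values in model_outputs]
    df_pred['pred_score'] = [values[0]['score'] for values in model_outputs]
    return df_pred

df_pred = format_model_data(df, model_outputs)
display(df_pred)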

,text,label,pred_category,pred_score
0,i just feel really helpless and heavy hearted,fear,sadness,0.867159
1,ive enjoyed being able to slouch about relax a...,sadness,joy,0.703794
2,i gave up my internship with the dmrg and am f...,fear,sadness,0.816547
3,i dont know i feel so lost,sadness,sadness,0.643219
4,i am a kindergarten teacher and i am thoroughl...,fear,nervousness,0.566698
...,...,...,...,...
95,i feel like i havent had a moment to breathe s...,joy,joy,0.619754
96,i am feeling helpless because he is who the re...,sadness,sadness,0.710805
97,i had not been in a convertible in many many y...,joy,admiration,0.695662
98,i feel like a whiner because my pain is really...,sadness,sadness,0.435532
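
As a rough check on how often the dataset label and the model prediction agree, the sketch below counts matches over the nine rows displayed above only (not the full 100-row subset). It also lists predictions that fall outside the dataset's six categories, since the model can return any of the 28 `go_emotions` labels. The variable names are illustrative.

```python
# (label, pred_category) pairs copied from the rows displayed above.
displayed_rows = [
    ("fear", "sadness"),      # index 0
    ("sadness", "joy"),       # index 1
    ("fear", "sadness"),      # index 2
    ("sadness", "sadness"),   # index 3
    ("fear", "nervousness"),  # index 4
    ("joy", "joy"),           # index 95
    ("sadness", "sadness"),   # index 96
    ("joy", "admiration"),    # index 97
    ("sadness", "sadness"),   # index 98
]

# The six categories used by the Emotions dataset.
dataset_categories = {"sadness", "joy", "love", "anger", "fear", "surprise"}

# Count exact matches between the dataset label and the model's top label.
matches = sum(1 for label, pred in displayed_rows if label == pred)
agreement = matches / len(displayed_rows)

# Predictions the dataset has no category for.
outside = sorted({pred for _, pred in displayed_rows if pred not in dataset_categories})

print(f"Agreement on displayed rows: {matches}/{len(displayed_rows)} = {agreement:.2f}")
print(f"Predictions outside the six categories: {outside}")
```

On the full DataFrame, the same agreement rate could be computed with `(df_pred['label'] == df_pred['pred_category']).mean()`.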


## <p style="background-color: red; padding: 5px; border-radius: 2px; border: solid 4px orange; text-align: left; font-family: 'Computer Modern'; font-size: 40px; color: white; margin-top: 0; margin-bottom: 0;"> 3.3 | Model Output </p>

In [20]:
for index, row in df_pred.iterrows():
    print(f"Index: {index}", end=' | ')

    print(f"Label: {row['label']}", end=' | ')
    print(f"Roberta: {row['pred_category']}")
    print(row['text'])
    print("\n")

    if index == 3:
        break

Index: 0 | Label: fear | Roberta: sadness
i just feel really helpless and heavy hearted


Index: 1 | Label: sadness | Roberta: joy
ive enjoyed being able to slouch about relax and unwind and frankly needed it after those last few weeks around the end of uni and the expo i have lately started to find myself feeling a bit listless which is never really a good thing


Index: 2 | Label: fear | Roberta: sadness
i gave up my internship with the dmrg and am feeling distraught


Index: 3 | Label: sadness | Roberta: sadness
i dont know i feel so lost




<a name=4>

# <p style="background-color: red; padding: 5px; border-radius: 2px; border: solid 4px orange; text-align: left; font-family: 'Computer Modern'; font-size: 40px; color: white; margin-top: 0; margin-bottom: 0;"> 4 | Conclusion </p>

Comparing the dataset labels with the Roberta model's predictions for the four rows printed in Section 3.3, the model's output often appears to describe the text better than the provided class label. This is a qualitative reading of a handful of examples, not a measured accuracy result.

- **Index 0:** The label is "fear", while the Roberta model predicts "sadness", which fits "helpless and heavy hearted" more closely.

- **Index 1:** The label is "sadness", while the Roberta model predicts "joy". The model picks up the positive opening ("ive enjoyed being able to slouch about relax and unwind"), although the text ends with "feeling a bit listless", so both readings have some support.

- **Index 2:** The label is "fear", while the Roberta model predicts "sadness", which fits "feeling distraught" more closely.

- **Index 3:** The label and the prediction agree on "sadness".


In summary, on these examples the Roberta model gives predictions that are at least as plausible as the class labels, and in some cases more so. Because the model chooses among 28 `go_emotions` labels, it can also return categories such as "nervousness" or "admiration" that are not among the dataset's six.