<a href="https://colab.research.google.com/github/zarafatima-y/PoetryEmotionalAnalysis/blob/main/EmotionalInferenceWithDeepSeekR1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
# NUCLEAR OPTION - Reset everything
!pip uninstall transformers unsloth unsloth_zoo -y
!pip install --upgrade transformers
!pip install accelerate bitsandbytes

In [2]:
from huggingface_hub import login
from google.colab import userdata

hf_token = userdata.get('HuggingFaceToken')
login(token=hf_token)

In [3]:
import wandb

wb_token = userdata.get("WabKey")

wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Poetry Dataset',
    job_type="training",
    anonymous="allow"
)

wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: zarafatima-ay (zarafatima-ay-york-university) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


In [4]:
# Trying the original DeepSeek model, not Unsloth's version
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    load_in_4bit=True,
    token=hf_token
)

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

config.json:   0%|          | 0.00/826 [00:00<?, ?B/s]

`torch_dtype` is deprecated! Use `dtype` instead!
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-000002.safetensors:   0%|          | 0.00/8.67G [00:00<?, ?B/s]

model-00002-of-000002.safetensors:   0%|          | 0.00/7.39G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]
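
The two deprecation warnings in the output above point to a newer loading interface. A minimal sketch of the same 4-bit load using a `BitsAndBytesConfig`, assuming a recent transformers release that accepts `dtype` (older releases expect `torch_dtype`). The helper name `load_4bit_model` is hypothetical and not part of the original notebook:

```python
def load_4bit_model(model_name, token=None):
    """Load a tokenizer and a 4-bit causal LM (hypothetical helper)."""
    # Imports are deferred so the function can be defined without the GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Replaces the deprecated load_in_4bit=True keyword on from_pretrained.
    quant_config = BitsAndBytesConfig(load_in_4bit=True)

    tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        dtype=torch.float16,  # assumption: recent transformers; older versions use torch_dtype
        device_map="auto",
        quantization_config=quant_config,
        token=token,
    )
    return tokenizer, model
```

It would be called as `tokenizer, model = load_4bit_model("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", token=hf_token)`.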

DeepSeek-R1-Distill-Llama-8B is a smaller, more efficient model distilled from the much larger DeepSeek-R1. It uses Llama-3.1-8B as its base and was fine-tuned on outputs generated by DeepSeek-R1 to improve reasoning, math, and coding performance at a fraction of the computational cost. The trade-off is that the distilled model gives up some of the larger model's raw capability in exchange for efficiency and easier deployment.

In [6]:
from google.colab import files
import pandas as pd

uploaded = files.upload()

data = pd.read_csv("isear_clean_drop.csv")
data = data.rename(columns={'SIT': 'SITUATION', 'Field1': 'EMOTION'})
data.head()

Saving isear_clean_drop.csv to isear_clean_drop.csv


Unnamed: 0,ID,EMOTION,SITUATION
0,11001,joy,"During the period of falling in love, each tim..."
1,11001,fear,When I was involved in a traffic accident.
2,11001,anger,When I was driving home after several days of...
3,11001,sadness,When I lost the person who meant the most to me.
4,11001,disgust,The time I knocked a deer down - the sight of ...


In [28]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully and provide your reasoning inside <think> tags.

### Instruction:
You are an emotion analysis expert with advanced knowledge in emotional interpretation and emotional nuance.
Analyze the emotional undertones of the following sentence and determine the dominant emotion using ONLY this list:
Anger, Disgust, Fear, Guilt, Joy, Sadness, Shame.

IMPORTANT:
After the reasoning, output EXACTLY on a new line:
FINAL_LABEL: <one word from the list above>
Do NOT output anything else on that line.

### Text:
{}

### Response:
<think>{}</think>

FINAL_LABEL:"""

Hard-Coding a Situation Text to Test the Prompt

In [32]:
text = "At a gathering I found myself involuntarily sitting next to two  people who expressed opinions that I considered very low and  discriminating."
prompt = prompt_style.format(text, "")

inputs = tokenizer([prompt], return_tensors="pt", padding=True, truncation=True, max_length=1024)
inputs = {k: v.to(next(model.parameters()).device) for k, v in inputs.items()}

gen = model.generate(
    input_ids = inputs["input_ids"],
    attention_mask = inputs["attention_mask"],
    max_new_tokens = 1200,
    use_cache = True,
)

raw = tokenizer.batch_decode(gen, skip_special_tokens=True)[0]
print("RAW OUTPUT:\n", raw, "\n")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


RAW OUTPUT:
 Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully and provide your reasoning inside <think> tags.

### Instruction:
You are an emotion analysis expert with advanced knowledge in emotional interpretation and emotional nuance.
Analyze the emotional undertones of the following sentence and determine the dominant emotion using ONLY this list:
Anger, Disgust, Fear, Guilt, Joy, Sadness, Shame.

IMPORTANT:
After the reasoning, output EXACTLY on a new line:
FINAL_LABEL: <one word from the list above>
Do NOT output anything else on that line.

### Text:
At a gathering I found myself involuntarily sitting next to two  people who expressed opinions that I considered very low and  discriminating.

### Response:
<think></think>

FINAL_LABEL: Sadness 
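
The raw output above echoes the whole prompt, and its `<think></think>` block is empty. A minimal sketch of why, using a shortened stand-in for `prompt_style` (the template text and the names `MINI_TEMPLATE` and `build_prompt` are illustrative assumptions): the second placeholder is filled with an empty string, so the prompt already closes the reasoning block and ends at `FINAL_LABEL:`, leaving only the label for the model to generate.

```python
# Shortened stand-in for the notebook's prompt_style (illustrative only).
MINI_TEMPLATE = """### Text:
{}

### Response:
<think>{}</think>

FINAL_LABEL:"""


def build_prompt(text, reasoning=""):
    """Fill the text slot and the (normally empty) reasoning slot."""
    return MINI_TEMPLATE.format(text, reasoning)


prompt = build_prompt("When I was involved in a traffic accident.")
print(prompt.endswith("FINAL_LABEL:"))  # True: generation starts right after the marker
print("<think></think>" in prompt)      # True: the reasoning block is already closed
```

If visible reasoning were wanted, one option would be to end the prompt at an opening `<think>` tag instead, at the cost of longer generations.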



In [33]:
import re
from tqdm import tqdm

EMOTIONS = ["Anger", "Disgust", "Fear", "Guilt", "Joy", "Sadness", "Shame"]

def extract_final_label(raw_text):
    """
    Extract the emotion ONLY from the model’s FINAL_LABEL line.
    This avoids false matches from the prompt.
    """
    match = re.search(r"FINAL_LABEL:\s*(.*)$", raw_text, flags=re.I)
    if not match:
        return None

    words = match.group(1).split()
    if not words:  # nothing followed FINAL_LABEL:, so there is no label to read
        return None
    candidate = words[0]  # first token after FINAL_LABEL:

    # Normalize + validate against the allowed list
    for e in EMOTIONS:
        if candidate.lower().strip(".,;:") == e.lower():
            return e
    # Not in the list: return the raw word so off-list outputs stay visible
    return candidate.capitalize()


preds = []

print("Running inference on FIRST 20 rows...\n") #As test

for i, row in tqdm(data.iloc[:20].iterrows(), total=20):
    text = row["SITUATION"]

    prompt = prompt_style.format(text, "")

    inputs = tokenizer(
        [prompt],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=1024
    )
    inputs = {k: v.to(next(model.parameters()).device) for k, v in inputs.items()}

    gen = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=128,
        do_sample=False,  # greedy decoding; temperature has no effect here
        use_cache=True,
    )

    raw = tokenizer.batch_decode(gen, skip_special_tokens=True)[0]

    pred = extract_final_label(raw)
    preds.append(pred)

data.loc[:19, "predicted_emotion"] = preds

print("\n=== FIRST 20 PREDICTIONS ===")
print(data.loc[:19, ["SITUATION", "EMOTION", "predicted_emotion"]])

Running inference on FIRST 20 rows...





=== FIRST 20 PREDICTIONS ===
                                            SITUATION  EMOTION  \
0   During the period of falling in love, each tim...      joy   
1          When I was involved in a traffic accident.     fear   
2   When I was driving home after  several days of...    anger   
3   When I lost the person who meant the most to me.   sadness   
4   The time I knocked a deer down - the sight of ...  disgust   
5                     When I did not speak the truth.    shame   
6   When I caused problems for somebody because he...    guilt   
7   When I got a letter offering me the Summer job...      joy   
8   When I was going home alone one night in Paris...     fear   
9   When I was talking to HIM at a party for the f...    anger   
10  When my friends did not ask me to go to a New ...  sadness   
11  When I saw all the very drunk kids (13-14 year...  disgust   
12  When I could not remember what to say about a ...    shame   
13  When my uncle and my neighbour came home u
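
Because the decoded output includes the prompt, which itself contains a `FINAL_LABEL:` line, label parsing depends on which occurrence the regular expression matches. A possible stricter variant, sketched under the assumption that the model's answer is the last `FINAL_LABEL:` line in the text (the name `extract_last_label` is hypothetical), returns `None` for anything outside the seven allowed emotions:

```python
import re

EMOTIONS = ["Anger", "Disgust", "Fear", "Guilt", "Joy", "Sadness", "Shame"]
# Match every "FINAL_LABEL: ..." line; [^\S\n]* skips spaces but not newlines.
LABEL_RE = re.compile(r"^FINAL_LABEL:[^\S\n]*(.*)$", flags=re.I | re.M)


def extract_last_label(raw_text):
    """Return the emotion on the last FINAL_LABEL line, or None if invalid."""
    matches = LABEL_RE.findall(raw_text)
    if not matches:
        return None
    words = matches[-1].split()
    if not words:  # "FINAL_LABEL:" with nothing after it
        return None
    candidate = words[0].strip(".,;:").lower()
    for emotion in EMOTIONS:
        if candidate == emotion.lower():
            return emotion
    return None  # reject labels outside the allowed list


sample = "FINAL_LABEL: <one word from the list above>\n...\nFINAL_LABEL: fear.\nExtra text"
print(extract_last_label(sample))  # Fear
```

Returning `None` for off-list words is a design choice: it keeps invalid outputs out of the label column, whereas the notebook's version keeps them as capitalized strings.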




Output of the First 20 Cases as a Structured Dataset

In [36]:
data = data[["ID", "SITUATION", "EMOTION", "predicted_emotion"]]
data.head(10)


Unnamed: 0,ID,SITUATION,EMOTION,predicted_emotion
0,11001,"During the period of falling in love, each tim...",joy,Joy
1,11001,When I was involved in a traffic accident.,fear,Sadness
2,11001,When I was driving home after several days of...,anger,Anger
3,11001,When I lost the person who meant the most to me.,sadness,Sadness
4,11001,The time I knocked a deer down - the sight of ...,disgust,Sadness
5,11001,When I did not speak the truth.,shame,Sadness
6,11001,When I caused problems for somebody because he...,guilt,Sadness
7,11002,When I got a letter offering me the Summer job...,joy,Joy
8,11002,When I was going home alone one night in Paris...,fear,Fear
9,11002,When I was talking to HIM at a party for the f...,anger,Sadness


Now to Test the Model's Inference on All 5724 Cases

In [38]:
import transformers
transformers.logging.set_verbosity_error()
import re
from tqdm import tqdm

EMOTIONS = ["Anger", "Disgust", "Fear", "Guilt", "Joy", "Sadness", "Shame"]

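def extract_final_label(raw_text):
    """
    Extract the emotion ONLY from the FINAL_LABEL line.
    This avoids matching words in the prompt; without it, matches are inaccurate and repetitive.
    """
    match = re.search(r"FINAL_LABEL:\s*(.*)$", raw_text, flags=re.I)
    if not match:
        return None

    words = match.group(1).split()
    if not words:  # nothing followed FINAL_LABEL:, so there is no label to read
        return None
    candidate = words[0]  # first token after FINAL_LABEL:

    # Normalize + validate against the allowed list
    for e in EMOTIONS:
        if candidate.lower().strip(".,;:") == e.lower():
            return e

    # Not in the list: return the raw word so off-list outputs stay visible
    return candidate.capitalize()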


preds = []

print(f"Running inference on ALL {len(data)} rows...\n")

for i, row in tqdm(data.iterrows(), total=len(data)):
    text = row["SITUATION"]

    prompt = prompt_style.format(text, "")

    inputs = tokenizer(
        [prompt],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=1024
    )
    inputs = {k: v.to(next(model.parameters()).device) for k, v in inputs.items()}

    gen = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=128,
        do_sample=False,  # greedy decoding; temperature has no effect here
        use_cache=True,
    )

    raw = tokenizer.batch_decode(gen, skip_special_tokens=True)[0]

    pred = extract_final_label(raw)
    preds.append(pred)

data["predicted_emotion"] = preds

# Reorder columns (ID, SITUATION, EMOTION, predicted_emotion) for readability.
data = data[["ID", "SITUATION", "EMOTION", "predicted_emotion"]]
data.to_csv("emotion_predictions_full.csv", index=False)

print("\n✅ DONE! Saved file: emotion_predictions_full.csv")

Running inference on ALL 5724 rows...



100%|██████████| 5724/5724 [2:10:49<00:00,  1.37s/it]


✅ DONE! Saved file: emotion_predictions_full.csv
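
The full run above made one `generate` call per prompt and took a little over two hours. One way to cut that down would be to batch prompts. The sketch below is an outline under stated assumptions rather than the notebook's method: `chunked` and `generate_batch` are hypothetical helpers, `tokenizer` and `model` are the objects loaded earlier, and left padding is used so that the generated tokens line up at the end of every row.

```python
def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def generate_batch(prompts, tokenizer, model, max_new_tokens=128):
    """Generate completions for a list of prompts; returns only the new text."""
    tokenizer.padding_side = "left"  # pad on the left for decoder-only generation
    if tokenizer.pad_token is None:  # fall back to EOS if no pad token is defined
        tokenizer.pad_token = tokenizer.eos_token
    inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                       truncation=True, max_length=1024).to(model.device)
    gen = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         do_sample=False, pad_token_id=tokenizer.pad_token_id)
    new_tokens = gen[:, inputs["input_ids"].shape[1]:]  # drop the echoed prompt tokens
    return tokenizer.batch_decode(new_tokens, skip_special_tokens=True)


# Usage sketch (batch size 8 is an arbitrary choice, not a tuned value):
# for batch in chunked(prompts, 8):
#     outputs = generate_batch(batch, tokenizer, model)
print(list(chunked(["a", "b", "c", "d", "e"], 2)))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Because only the newly generated text is returned, the label would be the first word of each output rather than following a `FINAL_LABEL:` marker, so the parsing step would need to change accordingly.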





In [43]:
data["predicted_emotion"] = data["predicted_emotion"].str.lower()
data.head(50)
data.to_csv("emotion_predictions_full.csv", index=False)
from google.colab import files
files.download("emotion_predictions_full.csv") # Just downloaded to desktop


Testing Model Inference

In [45]:
data["predicted_emotion"] = data["predicted_emotion"].fillna("unknown")

In [49]:
TARGET_EMOTIONS = ["anger", "disgust", "fear", "guilt", "joy", "sadness", "shame"]
clean = data[data["EMOTION"].isin(TARGET_EMOTIONS)].copy()
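# Missing predictions were already replaced with "unknown" in the previous cell,
# so this filter keeps every row and unparseable outputs count as errors.
clean = clean[clean["predicted_emotion"].notna()]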
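
Since `extract_final_label` returns any unrecognized first word as-is and missing values become `"unknown"`, some predictions may fall outside the seven target labels. A small sketch for tallying them, assuming the predictions are available as a plain list of lowercase strings (the function name and the sample values are illustrative, not taken from the run):

```python
from collections import Counter

TARGET_EMOTIONS = ["anger", "disgust", "fear", "guilt", "joy", "sadness", "shame"]


def off_list_counts(predictions, allowed=TARGET_EMOTIONS):
    """Count predictions that are not one of the allowed labels."""
    allowed_set = set(allowed)
    return Counter(p for p in predictions if p not in allowed_set)


# Illustrative values only; in the notebook this would be
# clean["predicted_emotion"].tolist().
example = ["joy", "unknown", "sadness", "frustration", "unknown"]
print(off_list_counts(example))  # Counter({'unknown': 2, 'frustration': 1})
```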

In [50]:
from sklearn.metrics import classification_report, accuracy_score

y_true = clean["EMOTION"]
y_pred = clean["predicted_emotion"]

print("Accuracy:", accuracy_score(y_true, y_pred))

print(classification_report(
    y_true,
    y_pred,
    labels=TARGET_EMOTIONS,
    zero_division=0
))

Accuracy: 0.5370370370370371
              precision    recall  f1-score   support

       anger       0.58      0.32      0.41       809
     disgust       0.74      0.45      0.56       807
        fear       0.79      0.70      0.74       818
       guilt       0.86      0.25      0.39       820
         joy       0.80      0.88      0.84       817
     sadness       0.29      0.96      0.45       825
       shame       0.98      0.19      0.32       828

   micro avg       0.54      0.54      0.54      5724
   macro avg       0.72      0.54      0.53      5724
weighted avg       0.72      0.54      0.53      5724
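
The sadness row in the report above pairs very high recall (0.96) with low precision (0.29), the pattern expected when a model over-predicts one label. A minimal sketch of how those per-class figures are defined, written in plain Python rather than with scikit-learn; the toy labels are made up for illustration:

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall, and F1 from two equal-length label lists."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)  # true positives + false positives
    actual = sum(t == label for t in y_true)     # true positives + false negatives
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: every row is predicted as "sadness".
y_true = ["sadness", "fear", "shame", "guilt"]
y_pred = ["sadness", "sadness", "sadness", "sadness"]
print(precision_recall_f1(y_true, y_pred, "sadness"))  # (0.25, 1.0, 0.4)

# Consistency check against the report: F1 from the rounded sadness precision and recall.
print(round(2 * 0.29 * 0.96 / (0.29 + 0.96), 2))  # 0.45
```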



This notebook evaluates the zero-shot emotion-classification performance of DeepSeek-R1-Distill-Llama-8B on the ISEAR dataset. It formats each situation with a custom prompt, runs it through the model, extracts the one-word emotion that follows FINAL_LABEL:, and stores it in a new column (predicted_emotion). The full dataframe is then saved and scored with accuracy and per-class precision, recall, and F1. On all 5724 rows the model reaches an accuracy of about 0.54 and a macro F1 of 0.53; it strongly over-predicts sadness (recall 0.96, precision 0.29) and misses most shame (recall 0.19) and guilt (recall 0.25) cases. No fine-tuning is performed here: the notebook only tests how well the base model infers emotions from text.