# Introduction

Sarcasm is a sophisticated linguistic phenomenon that can seriously confuse existing sentiment classification systems.     
Sarcasm detection, the task of predicting whether a given text contains sarcasm, has therefore received considerable research attention.     

Many methods have recently been proposed for sarcasm detection. They fall broadly into two categories.     
The first is text-only methods, which concentrate on the utterance itself, for example by exploiting incongruous expressions to detect sarcastic text.     
The second relies on extra information, exploiting external knowledge such as user history and commonsense knowledge to assist detection.

We propose an unsupervised sarcasm detection method.     

First, we leverage external sentiment knowledge to mask prominent tokens. The masked texts are then fed into a pre-trained generation model, which follows the remaining logical structure of the sentence to generate new texts.     
These regenerated ("reborn") texts are likely to lose the sarcasm or to read more sensibly than the originals.     

Second, we compute similarity scores between each generated sentence and the original one, and extract features from these scores to decide whether the sentence is sarcastic.     
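To make the second step more concrete, here is a minimal sketch of turning two similarity scores into features and a decision. It assumes the scores sim(x, A) and sim(x, B) are already computed. The feature set (`SimilarityFeatures`), the rule in `predict_sarcastic`, and the threshold of 0.9 are hypothetical choices for illustration, not the features or decision rule used by the method.

```python
from dataclasses import dataclass


@dataclass
class SimilarityFeatures:
    """Hypothetical features derived from the two similarity scores."""
    sim_pos: float  # similarity between x and A (positive words masked)
    sim_neg: float  # similarity between x and B (negative words masked)
    min_sim: float  # lower of the two, i.e. the larger of the two changes
    gap: float      # how differently the two regenerations drift from x


def extract_features(sim_pos: float, sim_neg: float) -> SimilarityFeatures:
    return SimilarityFeatures(
        sim_pos=sim_pos,
        sim_neg=sim_neg,
        min_sim=min(sim_pos, sim_neg),
        gap=abs(sim_pos - sim_neg),
    )


def predict_sarcastic(features: SimilarityFeatures, threshold: float = 0.9) -> bool:
    # Hypothetical rule: flag the sentence as sarcastic when at least one
    # regeneration moves far away from the original (low similarity).
    return features.min_sim < threshold


features = extract_features(sim_pos=0.82, sim_neg=0.97)  # made-up scores
print(features, predict_sarcastic(features))
```

A learned classifier over such features would be a natural alternative to a fixed threshold. The hand-set rule only keeps the example self-contained.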

Finally, we construct several unsupervised baselines and conduct experiments on the IAC-V2 dataset.

# Imports and Reading Data

In [None]:
!pip install senticnet

Collecting senticnet
  Downloading senticnet-1.6-py3-none-any.whl.metadata (2.6 kB)
Downloading senticnet-1.6-py3-none-any.whl (51.9 MB)
Installing collected packages: senticnet
Successfully installed senticnet-1.6


In [None]:
import numpy as np
import pandas as pd

from senticnet.senticnet import SenticNet

import nltk
from nltk.corpus import stopwords
nltk.download('punkt')
nltk.download('stopwords')
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

from transformers import AutoTokenizer, AutoModel
from transformers import BartTokenizer, BartForConditionalGeneration
import torch

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import accuracy_score, precision_score, f1_score
from sklearn.metrics import confusion_matrix

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
df = pd.read_csv("/content/drive/My Drive/AlifResearch/iSarcasm/train.csv")
df1 = pd.read_csv("/content/drive/My Drive/AlifResearch/iSarcasm/test.csv")

In [None]:
df

Unnamed: 0,id,text,label,emoji,tag
0,1,why do small shouldered tiny guys wear huge t ...,0,,
1,2,"good morning , please go and vote ! <repeated>...",0,🙅,"<url>,</hashtag>,<hashtag>,<number>,<repeated>"
2,3,is it even christmas if there isn ’ t a fight ...,1,,
3,4,helping mum with her maths work for the course...,0,,
4,5,<hashtag> dear customer </hashtag> i am sorry ...,0,,"<hashtag>,</hashtag>"
...,...,...,...,...,...
3111,3112,fcking hate being let down . don ’ t get my ho...,0,,
3112,3113,last day in my twenties 😫,0,😫,
3113,3114,who ’ s dick do i have to suck for some dominos,0,,
3114,3115,<user> yet if you threw cold water it would st...,0,,<user>


In [None]:
df1

Unnamed: 0,id,text,label,emoji,tag
0,3464,saw poppin fresh in the macy ' s parade . my d...,1,,
1,3465,i knew as soon as i heard doing ford was cutti...,0,,"<url>,<percent>"
2,3466,great advice from well established individuals...,0,,<user>
3,3467,"eating apple sauce , chicken thighs , broccoli...",0,,"<hashtag>,</hashtag>"
4,3468,<user> ur not a real smiler if ur not expectin...,1,,<user>
...,...,...,...,...,...
882,4346,imagine that it ' s going to cost me <number> ...,0,,<number>
883,4347,people really out here tryna argue you do not ...,0,,<url>
884,4348,"<user> and their relentless running game , on ...",0,,"<number>,<user>"
885,4349,why is it that whether i get out of bed at <nu...,0,,"<number>,<allcaps>,<repeated>,</allcaps>"


In [None]:
# Concatenate vertically
df = pd.concat([df, df1], ignore_index=True)
df

Unnamed: 0,id,text,label,emoji,tag
0,1,why do small shouldered tiny guys wear huge t ...,0,,
1,2,"good morning , please go and vote ! <repeated>...",0,🙅,"<url>,</hashtag>,<hashtag>,<number>,<repeated>"
2,3,is it even christmas if there isn ’ t a fight ...,1,,
3,4,helping mum with her maths work for the course...,0,,
4,5,<hashtag> dear customer </hashtag> i am sorry ...,0,,"<hashtag>,</hashtag>"
...,...,...,...,...,...
3998,4346,imagine that it ' s going to cost me <number> ...,0,,<number>
3999,4347,people really out here tryna argue you do not ...,0,,<url>
4000,4348,"<user> and their relentless running game , on ...",0,,"<number>,<user>"
4001,4349,why is it that whether i get out of bed at <nu...,0,,"<number>,<allcaps>,<repeated>,</allcaps>"


In [None]:
df = df.drop(columns=['id', 'emoji', 'tag'])
df

Unnamed: 0,text,label
0,why do small shouldered tiny guys wear huge t ...,0
1,"good morning , please go and vote ! <repeated>...",0
2,is it even christmas if there isn ’ t a fight ...,1
3,helping mum with her maths work for the course...,0
4,<hashtag> dear customer </hashtag> i am sorry ...,0
...,...,...
3998,imagine that it ' s going to cost me <number> ...,0
3999,people really out here tryna argue you do not ...,0
4000,"<user> and their relentless running game , on ...",0
4001,why is it that whether i get out of bed at <nu...,0


In [None]:
df['class'] = df['label'].map({0: 'notsarc', 1: 'sarc'})
df

Unnamed: 0,text,label,class
0,why do small shouldered tiny guys wear huge t ...,0,notsarc
1,"good morning , please go and vote ! <repeated>...",0,notsarc
2,is it even christmas if there isn ’ t a fight ...,1,sarc
3,helping mum with her maths work for the course...,0,notsarc
4,<hashtag> dear customer </hashtag> i am sorry ...,0,notsarc
...,...,...,...
3998,imagine that it ' s going to cost me <number> ...,0,notsarc
3999,people really out here tryna argue you do not ...,0,notsarc
4000,"<user> and their relentless running game , on ...",0,notsarc
4001,why is it that whether i get out of bed at <nu...,0,notsarc


In [None]:
# Drop the numeric 'label' column; the string 'class' column now holds the labels.
# The text column is already named 'text', so no renaming is needed.
df = df.drop(columns=['label'])

In [None]:
df

Unnamed: 0,text,class
0,why do small shouldered tiny guys wear huge t ...,notsarc
1,"good morning , please go and vote ! <repeated>...",notsarc
2,is it even christmas if there isn ’ t a fight ...,sarc
3,helping mum with her maths work for the course...,notsarc
4,<hashtag> dear customer </hashtag> i am sorry ...,notsarc
...,...,...
3998,imagine that it ' s going to cost me <number> ...,notsarc
3999,people really out here tryna argue you do not ...,notsarc
4000,"<user> and their relentless running game , on ...",notsarc
4001,why is it that whether i get out of bed at <nu...,notsarc


In [None]:
import re
# Function to remove text inside <>
def remove_brackets(text):
    return re.sub(r'<.*?>', '', text).strip()

# Apply the function to the 'text' column
df['text'] = df['text'].apply(remove_brackets)
df

Unnamed: 0,text,class
0,why do small shouldered tiny guys wear huge t ...,notsarc
1,"good morning , please go and vote ! it only t...",notsarc
2,is it even christmas if there isn ’ t a fight ...,sarc
3,helping mum with her maths work for the course...,notsarc
4,dear customer i am sorry that the mobile phon...,notsarc
...,...,...
3998,imagine that it ' s going to cost me pound to...,notsarc
3999,people really out here tryna argue you do not ...,notsarc
4000,"and their relentless running game , on the bri...",notsarc
4001,why is it that whether i get out of bed at or...,notsarc


In [None]:
!pip install emoji

Collecting emoji
  Downloading emoji-2.12.1-py3-none-any.whl.metadata (5.4 kB)
Downloading emoji-2.12.1-py3-none-any.whl (431 kB)
Installing collected packages: emoji
Successfully installed emoji-2.12.1


In [None]:
import emoji
# Function to remove emojis
def remove_emojis(text):
    return emoji.replace_emoji(text, replace='')

# Apply the function to the 'text' column
df['text'] = df['text'].apply(remove_emojis)
df

Unnamed: 0,text,class
0,why do small shouldered tiny guys wear huge t ...,notsarc
1,"good morning , please go and vote ! it only t...",notsarc
2,is it even christmas if there isn ’ t a fight ...,sarc
3,helping mum with her maths work for the course...,notsarc
4,dear customer i am sorry that the mobile phon...,notsarc
...,...,...
3998,imagine that it ' s going to cost me pound to...,notsarc
3999,people really out here tryna argue you do not ...,notsarc
4000,"and their relentless running game , on the bri...",notsarc
4001,why is it that whether i get out of bed at or...,notsarc


# Understanding Data

In [None]:
df.dtypes

Unnamed: 0,0
text,object
class,object


In [None]:
df.columns

Index(['text', 'class'], dtype='object')

In [None]:
text_data_original = list(df['text'])
text_data = [x.lower() for x in text_data_original]
print(*text_data, sep = "\n")

why do small shouldered tiny guys wear huge t shirts ?
good morning , please go and vote !  it only takes  minutes and a low turnout will hand victory to the brexit party   e uelections 2019
is it even christmas if there isn ’ t a fight with neighbours and a broken wrist ?
helping mum with her maths work for the course she ’ s taking and i ’ m slowly realising i am not great at maths
dear customer  i am sorry that the mobile phone reseller in the mall fucked you over . we all are not a bunch of sheisters . i hope your other life issues gets better and that i earned your future business .
anyone fancy writing my lit review for me ? can not . be . arsed .
so the  episode about ladonna was one of the most poignant and sad investigations of abuse of power and discrimination in institutions . v hard to listen to but so important .
middle aged women are bitchier than most people i know
baby tobias has arrived ! i ’ ll be taking a spot of leave but back into the swing of things for spring / s

In [None]:
label_data = list(df['class'])
print(*label_data, sep = "\n")

notsarc
notsarc
sarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
sarc
sarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
sarc
notsarc
notsarc
notsarc
notsarc
sarc
sarc
sarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
sarc
notsarc
sarc
notsarc
notsarc
notsarc
notsarc
sarc
sarc
notsarc
sarc
notsarc
notsarc
notsarc
notsarc
notsarc
sarc
notsarc
notsarc
notsarc
sarc
notsarc
notsarc
notsarc
notsarc
sarc
notsarc
notsarc
notsarc
sarc
notsarc
notsarc
notsarc
notsarc
notsarc
sarc
notsarc
sarc
notsarc
notsarc
sarc
sarc
notsarc
notsarc
notsarc
notsarc
notsarc
sarc
notsarc
notsarc
notsarc
sarc
notsarc
notsarc
notsarc
notsarc
notsarc
sarc
sarc
notsarc
notsarc
notsarc
notsarc
sarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
notsarc
not

# Overview

The proposed framework contains three main components:     

1) Sentence masking and generation.     
This step first identifies the main affective components of each sentence and masks them so that the masking has a substantial impact on the original sentence. It then generates new texts from the masked sentences;     

2) Sentence representation.     
This step computes dense vector representations of the sentences;     

3) Sarcastic utterance detection.     
This step leverages the similarity scores between the original and regenerated sentences to detect whether an utterance is sarcastic.
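The three components can be wired together as in the toy sketch below. It is only an illustration that relies on heavy simplifying assumptions. A tiny hand-written polarity lexicon stands in for SenticNet. A trivial "generator" that replaces every `<mask>` with one fixed word stands in for BART. Bag-of-words count vectors stand in for BERT embeddings. All names (`TOY_LEXICON`, `toy_generate`, `bow_vector`, `similarity_scores`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for SenticNet polarity labels.
TOY_LEXICON = {"love": "positive", "great": "positive", "waiting": "negative", "delay": "negative"}


def mask_by_polarity(tokens, polarity):
    """Component 1 (masking): replace words of one polarity with <mask>."""
    return ["<mask>" if TOY_LEXICON.get(t) == polarity else t for t in tokens]


def toy_generate(masked_tokens, filler="something"):
    """Component 1 (generation): stand-in for BART that fills every <mask> with one fixed word."""
    return [filler if t == "<mask>" else t for t in masked_tokens]


def bow_vector(tokens):
    """Component 2: stand-in sentence representation built from word counts."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def similarity_scores(sentence):
    """Component 3 input: similarity of x to A (positive words masked) and B (negative words masked)."""
    x = sentence.lower().split()
    a = toy_generate(mask_by_polarity(x, "positive"))
    b = toy_generate(mask_by_polarity(x, "negative"))
    hx = bow_vector(x)
    return cosine(hx, bow_vector(a)), cosine(hx, bow_vector(b))


sim_a, sim_b = similarity_scores("I love waiting for a delay")
print(round(sim_a, 3), round(sim_b, 3))
```

In this toy sentence, masking the two negative words alters the sentence more than masking the single positive word. As a result, sim(x, B) (about 0.58) is lower than sim(x, A) (about 0.83). Score differences of this kind are what the detection step works with.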

# Sentence Masking and Generation
## 1)
"First, we use the sentiment common knowledge retrieved from SenticNet to recognize affective words in the sentence $x$,     
and split those words into two sets according to their sentiment polarities:    
$PW = \{pw_1, pw_2, \dots, pw_h\}$ and    
$NW = \{nw_1, nw_2, \dots, nw_k\}$,     
$h + k \le n$."
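The split can be illustrated with a small, self-contained sketch. A hypothetical dictionary, `TOY_POLARITY`, stands in for SenticNet's polarity labels; the notebook below queries `SenticNet.polarity_label` instead. Its entries copy a few labels that appear in the notebook output further down. Words missing from the lexicon are treated as neutral and land in neither set, which is why $h + k \le n$ always holds.

```python
# Hypothetical stand-in for SenticNet: word -> polarity label.
TOY_POLARITY = {"christmas": "positive", "fight": "positive", "broken": "negative"}


def split_affective_words(tokens, lexicon):
    """Split a tokenized sentence into positive (PW) and negative (NW) word sets.

    Words missing from the lexicon count as neutral and go into neither set.
    """
    pw = {t for t in tokens if lexicon.get(t) == "positive"}
    nw = {t for t in tokens if lexicon.get(t) == "negative"}
    return pw, nw


tokens = "is it even christmas if there isn ’ t a fight with neighbours and a broken wrist ?".split()
PW, NW = split_affective_words(tokens, TOY_POLARITY)
h, k, n = len(PW), len(NW), len(tokens)
assert h + k <= n  # the two sets never cover more words than the sentence has
print(PW, NW)
```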

In [None]:
def tokenize_sentence(sentence):
    tokens = word_tokenize(sentence)

    lemmatizer = WordNetLemmatizer()

    clean_tokens = []
    for tok in tokens:
        clean_tok = lemmatizer.lemmatize(tok).lower().strip()
        clean_tokens.append(clean_tok)

    return clean_tokens

In [None]:
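# Load the SenticNet lexicon once instead of once per word lookup
sn = SenticNet()

def get_sentiment_polarity_from_senticnet(word):
    """Return SenticNet's polarity label ("positive"/"negative") for a word, or "neutral" if it is not in the lexicon."""
    try:
        return sn.polarity_label(word.lower())
    except KeyError:
        # Words (and punctuation tokens) missing from SenticNet are treated as neutral
        return "neutral"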

In [None]:
def analyze_sentiment(sentences):
    positive_words = []
    negative_words = []

    for sentence in sentences:
        words = tokenize_sentence(sentence)

        PW = set()
        NW = set()

        for word in words:
            sentiment_polarity = get_sentiment_polarity_from_senticnet(word)
            if sentiment_polarity == "positive":
                PW.add(word.lower())
            elif sentiment_polarity == "negative":
                NW.add(word.lower())

        positive_words.append(PW)
        negative_words.append(NW)

    return positive_words, negative_words

In [None]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...


True

In [None]:
positive_words, negative_words = analyze_sentiment(text_data)

for i, sentence in enumerate(text_data):
    print(f"Sentence: {sentence}")
    print(f"Positive Words: {positive_words[i]}")
    print(f"Negative Words: {negative_words[i]}")
    print("- - - - - - - - - -")

Streaming output truncated to the last 5000 lines.
Sentence: the one thing i wish i ’ d brought travelling with me is multi vitamins ! budget backpacking is not great for getting a balanced diet .
Positive Words: {'multi', 'backpacking', 'travelling', 'brought', 'great'}
Negative Words: {'vitamin'}
- - - - - - - - - -
Sentence: win a cyberpower  gaming pc plus  fifa   on pc .  ,  ,
Positive Words: {'plus', 'fifa', 'gaming', 'win'}
Negative Words: set()
- - - - - - - - - -
Sentence: i cannot wait for halloween        
Positive Words: set()
Negative Words: {'wait'}
- - - - - - - - - -
Sentence: thought i ' d be a sheep . it ' s yanny for me
Positive Words: set()
Negative Words: set()
- - - - - - - - - -
Sentence: bored of all these manager rumors .  viera , arteta and gerrard ? are these the guys to replace rafa or mo diame ?  nufc
Positive Words: {'rumor'}
Negative Words: {'bored'}
- - - - - - - - - -
Sentence: note to self .  the cold air is not good with asthma , bronchi

In [None]:
df["PW"] = positive_words
df["NW"] = negative_words
df

Unnamed: 0,text,class,PW,NW
0,why do small shouldered tiny guys wear huge t ...,notsarc,"{tiny, huge, shirt, wear}",{}
1,"good morning , please go and vote ! it only t...",notsarc,"{turnout, good, party, victory}",{low}
2,is it even christmas if there isn ’ t a fight ...,sarc,"{christmas, wrist, fight}",{broken}
3,helping mum with her maths work for the course...,notsarc,"{great, slowly, math, work}",{mum}
4,dear customer i am sorry that the mobile phon...,notsarc,"{hope, better}","{sorry, fucked}"
...,...,...,...,...
3998,imagine that it ' s going to cost me pound to...,notsarc,"{imagine, travel, pound}",{cost}
3999,people really out here tryna argue you do not ...,notsarc,{},{argue}
4000,"and their relentless running game , on the bri...",notsarc,"{army, running}","{relentless, dangerous}"
4001,why is it that whether i get out of bed at or...,notsarc,{},{}


In [None]:
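def mask_sentence(sentence, mask_words, max_mask_count=5):
    """Replace up to `max_mask_count` words found in `mask_words` with BART's <mask> token."""
    masked_sentence = []

    for word in sentence.split():
        # Note: `word` is the raw surface form, while `mask_words` holds lemmas
        # produced by tokenize_sentence(), so inflected forms (e.g. "vitamins"
        # vs. the lemma "vitamin") are not masked.
        if word in mask_words and max_mask_count > 0:
            masked_sentence.append("<mask>")
            max_mask_count -= 1
        else:
            masked_sentence.append(word)

    return " ".join(masked_sentence)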

In [None]:
def construct_masked_sentences(sentences, union_PW_SW1, union_NW_SW2):
    masked_pos_sentences = []
    masked_neg_sentences = []

    for i, sentence in enumerate(sentences):

        masked_pos_sentence = mask_sentence(sentence, union_PW_SW1[i])
        masked_pos_sentences.append(masked_pos_sentence)

        masked_neg_sentence = mask_sentence(sentence, union_NW_SW2[i])
        masked_neg_sentences.append(masked_neg_sentence)

    return masked_pos_sentences, masked_neg_sentences

In [None]:
masked_pos_sentences, masked_neg_sentences = construct_masked_sentences(text_data, positive_words, negative_words)

for i, sentence in enumerate(text_data):
    print(f"Original Sentence: {sentence}")
    print(f"Masked Positive Sentence: {masked_pos_sentences[i]}")
    print(f"Masked Negative Sentence: {masked_neg_sentences[i]}")
    print("- - - - - - - - - -")

Streaming output truncated to the last 5000 lines.
Original Sentence: the one thing i wish i ’ d brought travelling with me is multi vitamins ! budget backpacking is not great for getting a balanced diet .
Masked Positive Sentence: the one thing i wish i ’ d <mask> <mask> with me is <mask> vitamins ! budget <mask> is not <mask> for getting a balanced diet .
Masked Negative Sentence: the one thing i wish i ’ d brought travelling with me is multi vitamins ! budget backpacking is not great for getting a balanced diet .
- - - - - - - - - -
Original Sentence: win a cyberpower  gaming pc plus  fifa   on pc .  ,  ,
Masked Positive Sentence: <mask> a cyberpower <mask> pc <mask> <mask> on pc . , ,
Masked Negative Sentence: win a cyberpower gaming pc plus fifa on pc . , ,
- - - - - - - - - -
Original Sentence: i cannot wait for halloween        
Masked Positive Sentence: i cannot wait for halloween
Masked Negative Sentence: i cannot <mask> for halloween
- - - - - - - - - -
Original

In [None]:
dfnew = pd.DataFrame({"text": text_data_original, "maskedPosSentence": masked_pos_sentences, "maskedNegSentence": masked_neg_sentences})
dfnew

Unnamed: 0,text,maskedPosSentence,maskedNegSentence
0,why do small shouldered tiny guys wear huge t ...,why do small shouldered <mask> guys <mask> <ma...,why do small shouldered tiny guys wear huge t ...
1,"good morning , please go and vote ! it only t...","<mask> morning , please go and vote ! it only ...","good morning , please go and vote ! it only ta..."
2,is it even christmas if there isn ’ t a fight ...,is it even <mask> if there isn ’ t a <mask> wi...,is it even christmas if there isn ’ t a fight ...
3,helping mum with her maths work for the course...,helping mum with her maths <mask> for the cour...,helping <mask> with her maths work for the cou...
4,dear customer i am sorry that the mobile phon...,dear customer i am sorry that the mobile phone...,dear customer i am <mask> that the mobile phon...
...,...,...,...
3998,imagine that it ' s going to cost me pound to...,<mask> that it ' s going to cost me <mask> to ...,imagine that it ' s going to <mask> me pound t...
3999,people really out here tryna argue you do not ...,people really out here tryna argue you do not ...,people really out here tryna <mask> you do not...
4000,"and their relentless running game , on the bri...","and their relentless <mask> game , on the brin...","and their <mask> running game , on the brink o..."
4001,why is it that whether i get out of bed at or...,why is it that whether i get out of bed at or ...,why is it that whether i get out of bed at or ...


## 4)
"These two masked sentences are fed into the pre-trained generation model to fulfill the generation procedure.     

$$A = \{a_1, \dots, x_2, \dots, x_{n-1}, \dots, a_o\} = \mathrm{BART}([m]_1, x_2, \dots, x_{n-1}, [m]_n) \tag{1}$$

Thus, we will obtain two reborn sentences     
$A = \{a_1, a_2, \dots, a_o\}$ and     
$B = \{b_1, b_2, \dots, b_p\}$."
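Equation (1) implies two things. The unmasked tokens $x_2, \dots, x_{n-1}$ are carried over into the reborn sentence. Each masked slot may be filled with any number of tokens, so $o$ (and $p$) need not equal $n$. The minimal sketch below checks which original tokens survive in a reborn sentence. It uses Python's standard `difflib.SequenceMatcher`, which is an illustrative choice and not part of the method. The helper name `preserved_tokens` is hypothetical. The example pair is sentence 1 and its reborn positive version from the notebook output.

```python
from difflib import SequenceMatcher


def preserved_tokens(original: str, reborn: str) -> list:
    """Return the tokens of `original` that reappear, in order, in `reborn`."""
    x = original.split()
    a = reborn.split()
    matcher = SequenceMatcher(a=x, b=a, autojunk=False)
    kept = []
    for block in matcher.get_matching_blocks():
        kept.extend(x[block.a:block.a + block.size])
    return kept


original = "why do small shouldered tiny guys wear huge t shirts ?"  # x, n = 11 tokens
reborn = "why do small shouldered guys wear t shirts?"              # A, o = 8 tokens
print(preserved_tokens(original, reborn))
```

Whitespace tokenization treats `shirts?` and `shirts` as different tokens. A more careful comparison would normalize punctuation first.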

In [None]:
%pip install transformers



In [None]:
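def generate_reborn_sentences(masked_sentences):
    """Fill the <mask> tokens of each sentence with BART and return the regenerated ("reborn") sentences."""
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

    reborn_sentences = []
    for i, masked_sentence in enumerate(masked_sentences, start=1):
        inputs = tokenizer(masked_sentence, return_tensors="pt")
        # Without an explicit limit, generate() uses a short default max_length
        # (20 tokens unless the model config overrides it), which cuts off longer
        # sentences. Allow the input length plus a little room for filled-in spans.
        generated_encoded = model.generate(
            **inputs, max_new_tokens=inputs["input_ids"].shape[1] + 10
        )
        reborn_sentence = tokenizer.batch_decode(generated_encoded, skip_special_tokens=True)[0]
        reborn_sentences.append(reborn_sentence)
        if i % 100 == 0:
            print(f'Processed {i} sentences')

    return reborn_sentences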

In [None]:
from google.colab import userdata
import os

# Read the Hugging Face token from Colab's secret store rather than
# printing it or hard-coding it in the notebook.
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')

In [None]:
reborn_pos_sentences = generate_reborn_sentences(masked_pos_sentences)

reborn_neg_sentences = generate_reborn_sentences(masked_neg_sentences)

Processed 100 sentences
Processed 200 sentences
Processed 300 sentences
Processed 400 sentences
Processed 500 sentences
Processed 600 sentences
Processed 700 sentences
Processed 800 sentences
Processed 900 sentences
Processed 1000 sentences
Processed 1100 sentences
Processed 1200 sentences
Processed 1300 sentences
Processed 1400 sentences
Processed 1500 sentences
Processed 1600 sentences
Processed 1700 sentences
Processed 1800 sentences
Processed 1900 sentences
Processed 2000 sentences
Processed 2100 sentences
Processed 2200 sentences
Processed 2300 sentences
Processed 2400 sentences
Processed 2500 sentences
Processed 2600 sentences
Processed 2700 sentences
Processed 2800 sentences
Processed 2900 sentences
Processed 3000 sentences
Processed 3100 sentences
Processed 3200 sentences
Processed 3300 sentences
Processed 3400 sentences
Processed 3500 sentences
Processed 3600 sentences
Processed 3700 sentences
Processed 3800 sentences
Processed 3900 sentences
Processed 4000 sentences
Processed

In [None]:
print("Reborn Sentences for Masked Positive Sentences:")
for i, reborn_sentence in enumerate(reborn_pos_sentences):
    print(f"Reborn Sentence {i + 1}: {reborn_sentence}")

Reborn Sentences for Masked Positive Sentences:
Reborn Sentence 1: why do small shouldered guys wear t shirts?
Reborn Sentence 2: Good morning, please go and vote! it only takes minutes and a low turnout will
Reborn Sentence 3: is it even possible if there isn ’ t a problem with neighbours and a broken
Reborn Sentence 4: helping mum with her maths homework for the course she ’ s taking and i
Reborn Sentence 5: dear customer i am sorry that the mobile phone reseller in the mall fucked you
Reborn Sentence 6: anyone interested in writing my lit book for me? can not. be. ar
Reborn Sentence 7: so the episode about ladonna was one of the most shocking and sad investigations of abuse
Reborn Sentence 8: middle aged women are bitchier than most people i know
Reborn Sentence 9: The time for tobias has arrived! i ’ ll be taking a spot of
Reborn Sentence 10: do not know how to make atmos'so limited. what do they have to
Reborn Sentence 11: sad shit. way too much shit.
Reborn Sentence 12: for anybo

In [None]:
print("\nReborn Sentences for Masked Negative Sentences:")
for i, reborn_sentence in enumerate(reborn_neg_sentences):
    print(f"Reborn Sentence {i + 1}: {reborn_sentence}")


Reborn Sentences for Masked Negative Sentences:
Reborn Sentence 1: why do small shouldered tiny guys wear huge t shirts?
Reborn Sentence 2: good morning, please go and vote! it only takes minutes and a lot of turnout
Reborn Sentence 3: is it even christmas if there isn ’ t a fight with neighbours and a
Reborn Sentence 4: helping her with her maths work for the course she ’ s taking and i
Reborn Sentence 5: dear customer i am sorry to hear that the mobile phone reseller in the mall
Reborn Sentence 6: anyone fancy writing my lit review for me? can not. be...
Reborn Sentence 7: so the episode about ladonna was one of the most poignant and powerful investigations of the
Reborn Sentence 8: middle aged women are bitchier than most people i know
Reborn Sentence 9: baby tobias has arrived! i ’ ll be taking a spot of leave but
Reborn Sentence 10: do not understand decision to make atmos'so limited. what do they have to
Reborn Sentence 11: Young. way too young.
Reborn Sentence 12: for anybody w

In [None]:
dfnew["rebornPosSentence"] = reborn_pos_sentences
dfnew["rebornNegSentence"] = reborn_neg_sentences
dfnew

Unnamed: 0,text,maskedPosSentence,maskedNegSentence,rebornPosSentence,rebornNegSentence
0,why do small shouldered tiny guys wear huge t ...,why do small shouldered <mask> guys <mask> <ma...,why do small shouldered tiny guys wear huge t ...,why do small shouldered guys wear t shirts?,why do small shouldered tiny guys wear huge t ...
1,"good morning , please go and vote ! it only t...","<mask> morning , please go and vote ! it only ...","good morning , please go and vote ! it only ta...","Good morning, please go and vote! it only take...","good morning, please go and vote! it only take..."
2,is it even christmas if there isn ’ t a fight ...,is it even <mask> if there isn ’ t a <mask> wi...,is it even christmas if there isn ’ t a fight ...,is it even possible if there isn ’ t a problem...,is it even christmas if there isn ’ t a fight ...
3,helping mum with her maths work for the course...,helping mum with her maths <mask> for the cour...,helping <mask> with her maths work for the cou...,helping mum with her maths homework for the co...,helping her with her maths work for the course...
4,dear customer i am sorry that the mobile phon...,dear customer i am sorry that the mobile phone...,dear customer i am <mask> that the mobile phon...,dear customer i am sorry that the mobile phone...,dear customer i am sorry to hear that the mobi...
...,...,...,...,...,...
3998,imagine that it ' s going to cost me pound to...,<mask> that it ' s going to cost me <mask> to ...,imagine that it ' s going to <mask> me pound t...,"""I know that it's going to cost me money to ge...",imagine that it's going to cost me pound to tr...
3999,people really out here tryna argue you do not ...,people really out here tryna argue you do not ...,people really out here tryna <mask> you do not...,people really out here tryna argue you do not ...,people really out here tryna tell me you do no...
4000,"and their relentless running game , on the bri...","and their relentless <mask> game , on the brin...","and their <mask> running game , on the brink o...","and their relentless ground game, on the brink...","and their formidable running game, on the brin..."
4001,why is it that whether i get out of bed at or...,why is it that whether i get out of bed at or ...,why is it that whether i get out of bed at or ...,"why is it that whether i get out of bed at or,...","why is it that whether i get out of bed at or,..."


# Sentence Representation
"We embed the original sentence $x$ and its corresponding reborn texts $A$ and $B$     
into $d$-dimensional embeddings $\mathbf{H}_t \in \mathbb{R}^d$     
via pre-trained BERT-base:     
$\mathbf{H}_x, \mathbf{H}_A, \mathbf{H}_B = \mathrm{BERT}(x), \mathrm{BERT}(A), \mathrm{BERT}(B)$."
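The notebook below uses the `princeton-nlp/sup-simcse-bert-base-uncased` checkpoint as the BERT-base encoder. It obtains one vector per sentence by averaging the token vectors of the last hidden layer (`last_hidden_state.mean(dim=1)`). Because it encodes one sentence per call, no padding is involved. If sentences were batched, padded positions would have to be excluded from the average. Here is a minimal pure-Python sketch of attention-mask-aware mean pooling. The helper name `masked_mean_pool` and the 2-dimensional toy vectors are hypothetical.

```python
from typing import List


def masked_mean_pool(hidden: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average token vectors, ignoring positions where attention_mask == 0 (padding)."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total] if count else total


# Three real tokens followed by one padding position with large values.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]
print(masked_mean_pool(hidden, mask))
```

A plain mean over all four positions would be dominated by the padding vector. In PyTorch the same idea is commonly written as `(h * m.unsqueeze(-1)).sum(1) / m.sum(1, keepdim=True)`, with `h` the last hidden state and `m` the attention mask.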

In [None]:
def embed_sentences(sentences):
    """Embed sentences with a SimCSE BERT-base encoder, mean-pooling the last hidden layer."""
    tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/sup-simcse-bert-base-uncased")
    model = AutoModel.from_pretrained("princeton-nlp/sup-simcse-bert-base-uncased")

    embeddings = []
    for i, sentence in enumerate(sentences, start=1):
        inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            # Average the token vectors of the last layer -> shape (1, hidden_size)
            outputs = model(**inputs).last_hidden_state.mean(dim=1)
        embeddings.append(outputs)
        if i % 100 == 0:
            print(f'Processed {i} sentences')

    # Stacked result has shape (num_sentences, 1, hidden_size)
    return torch.stack(embeddings)

In [None]:
x_embeddings = embed_sentences(text_data)

A_embeddings = embed_sentences(reborn_pos_sentences)

B_embeddings = embed_sentences(reborn_neg_sentences)

Processed 100 sentences
Processed 200 sentences
Processed 300 sentences
Processed 400 sentences
Processed 500 sentences
Processed 600 sentences
Processed 700 sentences
Processed 800 sentences
Processed 900 sentences
Processed 1000 sentences
Processed 1100 sentences
Processed 1200 sentences
Processed 1300 sentences
Processed 1400 sentences
Processed 1500 sentences
Processed 1600 sentences
Processed 1700 sentences
Processed 1800 sentences
Processed 1900 sentences
Processed 2000 sentences
Processed 2100 sentences
Processed 2200 sentences
Processed 2300 sentences
Processed 2400 sentences
Processed 2500 sentences
Processed 2600 sentences
Processed 2700 sentences
Processed 2800 sentences
Processed 2900 sentences
Processed 3000 sentences
Processed 3100 sentences
Processed 3200 sentences
Processed 3300 sentences
Processed 3400 sentences
Processed 3500 sentences
Processed 3600 sentences
Processed 3700 sentences
Processed 3800 sentences
Processed 3900 sentences
Processed 4000 sentences
Processed

In [None]:
for i, sentence in enumerate(text_data):
    print(f"Embedding for Original Lowercase Sentence {i + 1} ({sentence}):")
    print(x_embeddings[i])
    print("- - - - - - - - - -")

Streaming output truncated to the last 5000 lines.
          3.4725e-01, -2.4786e-01, -2.7578e-01, -2.6902e-01, -6.9727e-02,
         -1.0435e-01,  2.9424e-01,  2.7911e-01, -6.7370e-02,  1.4964e-02,
         -8.9923e-02, -1.1077e-01, -6.1527e-02, -3.2078e-02,  1.0491e-01,
         -2.3540e-01, -6.5371e-02,  1.7637e-02,  7.8999e-02,  2.2984e-01,
         -2.9960e-01, -2.3310e-01,  1.8909e-01,  2.2909e-01, -6.2361e-01,
         -2.5710e-01,  4.4839e-02,  9.2926e-02, -1.2252e-01,  7.8538e-02,
         -6.5198e-02,  3.0073e-02, -1.7900e-01]])
- - - - - - - - - -
Embedding for Original Lowercase Sentence 3972 (spent the morning nursing a very hungover daughter , after she came home at some ungodly hour .  oh how times have changed ):
tensor([[ 5.3628e-02,  5.9410e-02,  5.4324e-01, -5.0224e-01,  2.3215e-02,
         -1.9734e-01,  2.9660e-01,  6.2760e-01,  2.0810e-01,  2.5437e-01,
          3.5356e-01, -2.3374e-01, -2.4814e-01,  6.0689e-01, -3.9613e-01,
          2.0445e-01,  5.

In [None]:
for i, sentence in enumerate(reborn_pos_sentences):
    print(f"Embedding for Reborn Positive Sentence {i + 1} ({sentence}):")
    print(A_embeddings[i])
    print("- - - - - - - - - -")

Streaming output truncated to the last 5000 lines.
         -1.9210e-01,  1.9792e-01, -1.5465e-01,  8.8301e-02, -1.2642e-01,
          1.4596e-01, -1.4046e-01,  3.6745e-01, -2.6395e-01,  4.4506e-01,
         -1.6915e-01, -4.4751e-01,  5.2419e-02, -2.5035e-01,  2.2111e-01,
          3.5708e-03, -7.0938e-02,  2.8115e-02, -9.8571e-02,  1.7957e-01,
          1.2341e-01,  5.6465e-02, -3.4853e-01, -1.4499e-01, -4.3736e-01,
          2.0762e-01,  3.9424e-02,  2.0453e-01,  2.4881e-01, -6.2714e-01,
          3.4922e-01, -2.1000e-01,  1.4135e-01,  2.3626e-01,  1.9931e-01,
         -4.9623e-01, -4.3056e-03, -2.9703e-01,  7.8587e-02, -2.2732e-01,
         -6.3072e-01, -6.3592e-02,  1.6644e-01,  1.5034e-01, -2.8242e-01,
         -2.8161e-01, -3.2655e-01,  6.1098e-01, -2.7273e-01, -2.0216e-01,
         -6.5935e-02, -3.0178e-01,  2.0147e-01,  4.6679e-02, -4.8368e-02,
         -1.3231e-01, -3.7464e-01, -6.7826e-01, -3.3808e-01,  7.1021e-02,
         -4.1296e-01, -1.0412e-01, -2.8828e-02,

In [None]:
for i, sentence in enumerate(reborn_neg_sentences):
    print(f"Embedding for Reborn Negative Sentence {i + 1} ({sentence}):")
    print(B_embeddings[i])
    print("- - - - - - - - - -")

          4.6132e-01, -5.3686e-01, -1.4633e-01, -1.2558e-01, -4.3290e-01,
          2.2941e-03,  2.6573e-01,  3.2299e-01,  2.9147e-01,  1.8500e-01,
         -3.2248e-01,  1.9400e-01, -1.0467e-02,  2.3563e-01, -4.1244e-02,
         -1.4074e-01, -2.3866e-01,  1.2857e-01,  2.4971e-02,  2.3071e-01,
         -5.0418e-01, -2.1720e-01, -1.4548e-01,  1.6598e-01, -7.6404e-01,
         -2.7170e-01,  2.0507e-02,  2.0171e-02,  2.0353e-01, -1.7305e-01,
          1.3793e-01, -9.8937e-03, -2.1236e-01]])
- - - - - - - - - -
Embedding for Reborn Negative Sentence 3972 (spent the morning nursing a very sick daughter, after she came home at some point):
tensor([[-3.6182e-01, -1.0838e-01,  2.2165e-01, -2.8312e-01, -2.2319e-01,
          2.6743e-01,  5.3863e-01,  2.1642e-01,  1.8862e-01,  6.3483e-02,
          3.2252e-01, -1.8719e-01,  6.4171e-02,  6.1693e-01, -1.3156e-01,
          1.0946e-01,  3.2220e-01,  7.6147e-02, -3.6017e-01, -2.5184e-

In [None]:
dfnew["xEmbedding"] = x_embeddings.tolist()
dfnew["AEmbedding"] = A_embeddings.tolist()
dfnew["BEmbedding"] = B_embeddings.tolist()
dfnew

Unnamed: 0,text,maskedPosSentence,maskedNegSentence,rebornPosSentence,rebornNegSentence,xEmbedding,AEmbedding,BEmbedding
0,why do small shouldered tiny guys wear huge t ...,why do small shouldered <mask> guys <mask> <ma...,why do small shouldered tiny guys wear huge t ...,why do small shouldered guys wear t shirts?,why do small shouldered tiny guys wear huge t ...,"[[0.7081335783004761, 0.1210532933473587, -0.5...","[[0.8444538712501526, 0.08848767727613449, -0....","[[0.7081335783004761, 0.1210532933473587, -0.5..."
1,"good morning , please go and vote ! it only t...","<mask> morning , please go and vote ! it only ...","good morning , please go and vote ! it only ta...","Good morning, please go and vote! it only take...","good morning, please go and vote! it only take...","[[-0.14586037397384644, -0.6124593019485474, 1...","[[0.3118084967136383, -0.19156460464000702, 0....","[[0.27934470772743225, -0.1009899377822876, 0...."
2,is it even christmas if there isn ’ t a fight ...,is it even <mask> if there isn ’ t a <mask> wi...,is it even christmas if there isn ’ t a fight ...,is it even possible if there isn ’ t a problem...,is it even christmas if there isn ’ t a fight ...,"[[0.08108999580144882, -0.23094923794269562, 0...","[[0.24495679140090942, -0.26381880044937134, 0...","[[0.07284236699342728, -0.3454526364803314, 0...."
3,helping mum with her maths work for the course...,helping mum with her maths <mask> for the cour...,helping <mask> with her maths work for the cou...,helping mum with her maths homework for the co...,helping her with her maths work for the course...,"[[0.20817291736602783, 0.24460144340991974, 0....","[[0.3260971009731293, 0.2396547496318817, 0.37...","[[0.19207440316677094, 0.06466709077358246, 0...."
4,dear customer i am sorry that the mobile phon...,dear customer i am sorry that the mobile phone...,dear customer i am <mask> that the mobile phon...,dear customer i am sorry that the mobile phone...,dear customer i am sorry to hear that the mobi...,"[[0.4123021364212036, 0.09793900698423386, 0.5...","[[0.40137946605682373, 0.4336777329444885, 0.4...","[[0.3087065815925598, 0.4243756830692291, 0.50..."
...,...,...,...,...,...,...,...,...
3998,imagine that it ' s going to cost me pound to...,<mask> that it ' s going to cost me <mask> to ...,imagine that it ' s going to <mask> me pound t...,"""I know that it's going to cost me money to ge...",imagine that it's going to cost me pound to tr...,"[[0.37059521675109863, 0.1753413826227188, 0.4...","[[0.29857173562049866, -0.04625463858246803, 0...","[[0.3948019742965698, 0.1345805823802948, 0.38..."
3999,people really out here tryna argue you do not ...,people really out here tryna argue you do not ...,people really out here tryna <mask> you do not...,people really out here tryna argue you do not ...,people really out here tryna tell me you do no...,"[[0.6794640421867371, 0.38444259762763977, -0....","[[0.6794640421867371, 0.38444259762763977, -0....","[[0.8590003848075867, 0.3202894628047943, -0.3..."
4000,"and their relentless running game , on the bri...","and their relentless <mask> game , on the brin...","and their <mask> running game , on the brink o...","and their relentless ground game, on the brink...","and their formidable running game, on the brin...","[[-0.20612402260303497, 0.10531388223171234, -...","[[-0.3634330928325653, -0.004419433884322643, ...","[[-0.5948479771614075, -0.06920871883630753, 0..."
4001,why is it that whether i get out of bed at or...,why is it that whether i get out of bed at or ...,why is it that whether i get out of bed at or ...,"why is it that whether i get out of bed at or,...","why is it that whether i get out of bed at or,...","[[0.6105952858924866, 0.4410528838634491, 0.49...","[[0.5307811498641968, 0.00047351655666716397, ...","[[0.5307811498641968, 0.00047351655666716397, ..."


# Sarcastic Utterances Detection
## 1)
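"We utilize cosine similarity to measure the similarity between the representation of the original sentence $H_x$ and those of the generated texts $H_A$ / $H_B$.

Then we use the following equation to calculate a difference score for each sentence:

$$\text{diff} = \big(\operatorname{sim}(H_x, H_A) < \text{threshold}\big) \;\|\; \big(\operatorname{sim}(H_x, H_B) < \text{threshold}\big)$$

where $\|$ denotes the logical OR operator."

In this notebook, $H_x$, $H_A$ and $H_B$ are `x_embeddings`, `A_embeddings` (reborn positive sentences) and `B_embeddings` (reborn negative sentences).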
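A minimal sketch of the rule above for a single sentence, assuming each representation is a plain 1-D NumPy vector (the notebook's own embeddings are `(1, d)` rows). The helper names `cosine_sim` and `is_different` are hypothetical, and the toy vectors are made up for illustration only.

```python
import numpy as np

def cosine_sim(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors: u.v / (||u|| * ||v||)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def is_different(h_x, h_a, h_b, threshold: float) -> bool:
    """diff = sim(H_x, H_A) < threshold || sim(H_x, H_B) < threshold."""
    return cosine_sim(h_x, h_a) < threshold or cosine_sim(h_x, h_b) < threshold

# Toy 3-d vectors (illustrative only, not real sentence embeddings)
h_x = np.array([1.0, 0.0, 0.0])
h_a = np.array([1.0, 0.1, 0.0])   # nearly parallel to h_x: similarity ~0.995
h_b = np.array([0.0, 1.0, 0.0])   # orthogonal to h_x: similarity 0.0

print(is_different(h_x, h_a, h_a, threshold=0.755))  # False: both rewrites stay close
print(is_different(h_x, h_a, h_b, threshold=0.755))  # True: H_B drifts below the threshold
```

Because the two comparisons are joined by OR, a single rewrite that drifts away from the original is enough to flag the sentence.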

In [None]:
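from sklearn.metrics.pairwise import cosine_similarity

def calculate_difference_scores(x_embeddings, A_embeddings, B_embeddings, threshold):
    """Flag each sentence whose original embedding is far from either reborn one.

    Each embedding is a (1, d) row, so cosine_similarity returns a (1, 1) array
    and each appended diff is a (1, 1) boolean array.
    """
    diff_scores = []
    for i, (x_emb, A_emb, B_emb) in enumerate(zip(x_embeddings, A_embeddings, B_embeddings), start=1):
        sim_Hx_HA = cosine_similarity(x_emb, A_emb)  # sim(H_x, H_A)
        sim_Hx_HB = cosine_similarity(x_emb, B_emb)  # sim(H_x, H_B)

        # diff = sim(H_x, H_A) < threshold || sim(H_x, H_B) < threshold
        diff = (sim_Hx_HA < threshold) or (sim_Hx_HB < threshold)
        diff_scores.append(diff)
        if i % 100 == 0:
            print(f'Processed {i} embeddings')

    return diff_scores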

In [None]:
threshold = 0.755

diff_scores = calculate_difference_scores(x_embeddings, A_embeddings, B_embeddings, threshold)
diff_scores

Processed 100 embeddings
Processed 200 embeddings
Processed 300 embeddings
Processed 400 embeddings
Processed 500 embeddings
Processed 600 embeddings
Processed 700 embeddings
Processed 800 embeddings
Processed 900 embeddings
Processed 1000 embeddings
Processed 1100 embeddings
Processed 1200 embeddings
Processed 1300 embeddings
Processed 1400 embeddings
Processed 1500 embeddings
Processed 1600 embeddings
Processed 1700 embeddings
Processed 1800 embeddings
Processed 1900 embeddings
Processed 2000 embeddings
Processed 2100 embeddings
Processed 2200 embeddings
Processed 2300 embeddings
Processed 2400 embeddings
Processed 2500 embeddings
Processed 2600 embeddings
Processed 2700 embeddings
Processed 2800 embeddings
Processed 2900 embeddings
Processed 3000 embeddings
Processed 3100 embeddings
Processed 3200 embeddings
Processed 3300 embeddings
Processed 3400 embeddings
Processed 3500 embeddings
Processed 3600 embeddings
Processed 3700 embeddings
Processed 3800 embeddings
Processed 3900 embeddings

[array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[ True]]),
 array([[ True]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[ True]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[ True]]),
 array([[ True]]),
 array([[ True]]),
 array([[False]]),
 array([[ True]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[ True]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[False]]),
 array([[Fal

## 2)
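"Since sarcastic utterances are affected more than normal texts by the masking and generation procedure, the difference score of a sarcastic text should be greater than that of a non-sarcastic one.

If we have a threshold value that separates sarcastic texts from normal texts, we can obtain the prediction $y$ by:

$$y = \mathbb{I}(\text{diff})$$"

Here $\mathbb{I}(\cdot)$ is the indicator function, which returns 1 (sarcastic) when diff is true and 0 otherwise. Since the threshold is already applied inside diff, this step only converts each boolean to an integer label.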
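A vectorized sketch of the same decision, assuming the original and reborn embeddings are stacked into `(N, d)` NumPy arrays. It computes row-wise cosine similarities for all sentences at once and applies the indicator $\mathbb{I}(\cdot)$ by casting the boolean mask to 0/1. The function names `rowwise_cosine` and `predict_sarcasm` and the toy arrays are hypothetical.

```python
import numpy as np

def rowwise_cosine(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Cosine similarity between matching rows of two (N, d) arrays, shape (N,)."""
    num = np.sum(X * Y, axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1)
    return num / den

def predict_sarcasm(H_x, H_A, H_B, threshold: float) -> np.ndarray:
    """y = I(diff), with diff = sim(H_x, H_A) < t  OR  sim(H_x, H_B) < t."""
    diff = (rowwise_cosine(H_x, H_A) < threshold) | (rowwise_cosine(H_x, H_B) < threshold)
    return diff.astype(int)  # indicator: True -> 1 (sarc), False -> 0 (notsarc)

# Three toy sentences in 2-d (illustrative values only)
H_x = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
H_A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
H_B = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]])

print(predict_sarcasm(H_x, H_A, H_B, threshold=0.755))  # [0 1 1]
```

Because diff is a strict `<` test on similarity, raising `threshold` can only add positive predictions, never remove them: at 0.7 the third toy sentence (similarity about 0.707 to $H_B$) is no longer flagged.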

In [None]:
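# y = I(diff): turn each (1, 1) boolean array into a 0/1 integer.
# .item() extracts the scalar explicitly; calling int() directly on an array
# with ndim > 0 is deprecated in recent NumPy releases.
predicted_labels = [int(np.asarray(diff).item()) for diff in diff_scores]
print(predicted_labels)
print(sum(predicted_labels))  # number of sentences predicted as sarcastic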

[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 



In [None]:
labels = ["sarc" if diff else "notsarc" for diff in diff_scores]
print(labels)

['notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'sarc', 'sarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'sarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'sarc', 'sarc', 'sarc', 'notsarc', 'sarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'sarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'sarc', 'notsarc', 'notsarc', 'sarc', 'notsarc', 'notsarc', 'notsarc', 'sarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'sarc', 'sarc', 'sarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'sarc', 'notsarc', 'notsarc', 'sarc', 'notsarc', 'notsarc', 'notsarc', 'sarc', 'sarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'notsarc', 'sarc', 'notsarc', 

In [None]:
dffinal = pd.DataFrame({"text": text_data, "class": label_data, "prediction": labels})
dffinal

| | text | class | prediction |
|---|---|---|---|
| 0 | why do small shouldered tiny guys wear huge t ... | notsarc | notsarc |
| 1 | good morning , please go and vote ! it only t... | notsarc | notsarc |
| 2 | is it even christmas if there isn ’ t a fight ... | sarc | notsarc |
| 3 | helping mum with her maths work for the course... | notsarc | notsarc |
| 4 | dear customer i am sorry that the mobile phon... | notsarc | notsarc |
| ... | ... | ... | ... |
| 3998 | imagine that it ' s going to cost me pound to... | notsarc | notsarc |
| 3999 | people really out here tryna argue you do not ... | notsarc | notsarc |
| 4000 | and their relentless running game , on the bri... | notsarc | notsarc |
| 4001 | why is it that whether i get out of bed at or... | notsarc | notsarc |


# Main Experiment Results

In [None]:
from sklearn.metrics import accuracy_score, precision_score, f1_score

# Ground-truth labels: 1 = "sarc", 0 = "notsarc"
true_labels = [1 if label == "sarc" else 0 for label in df["class"]]
print(true_labels)
print(predicted_labels)

accuracy = accuracy_score(true_labels, predicted_labels)
precision = precision_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("F1 Score:", f1)

[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

In [None]:
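from sklearn.metrics import confusion_matrix

# Rows = true labels, columns = predictions, both ordered [0, 1]
conf_matrix = confusion_matrix(true_labels, predicted_labels)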

print("Confusion Matrix:")
print(conf_matrix)

Confusion Matrix:
[[2774  523]
 [ 586  120]]
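scikit-learn's `confusion_matrix` puts true labels on rows and predictions on columns, so for labels 0/1 the matrix above reads `[[TN, FP], [FN, TP]]`. The sketch below recomputes accuracy, precision, recall and F1 directly from those four counts as a cross-check on the `sklearn.metrics` scores; the helper `metrics_from_confusion` is hypothetical, and the counts are copied from the printed matrix.

```python
import numpy as np

def metrics_from_confusion(cm: np.ndarray) -> dict:
    """Binary metrics from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / cm.sum(),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Counts copied from the confusion matrix printed above
conf_matrix = np.array([[2774, 523],
                        [586, 120]])
for name, value in metrics_from_confusion(conf_matrix).items():
    print(f"{name}: {value:.4f}")
```

With these counts the sketch gives accuracy ≈ 0.723, precision ≈ 0.187, recall ≈ 0.170 and F1 ≈ 0.178. For reference, sarcastic texts make up 706 of the 4003 labels (≈ 0.176).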
