# Gen-Z Slang Toxicity vs. Common English Slang Toxicity

In this notebook, I use the Perspective API through Google Cloud to test the Perspective model for bias. Specifically, I test the hypothesis that the model is less likely to identify toxicity correctly when a phrase uses Gen-Z slang than when a similar phrase uses general, widely recognized English.

### Initializing API and Toxicity value scraper

In this cell, I initialize the API client with my API key. I then define a getToxicity function, which makes an API call to the model with the phrase I want to score and labels the returned score, to make testing easier.

In [20]:
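import os

from googleapiclient import discovery

# Read the key from the environment rather than hard-coding it in the notebook.
# PERSPECTIVE_API_KEY is an assumed variable name; set it before running this cell.
API_KEY = os.environ['PERSPECTIVE_API_KEY']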

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

    
def getToxicity(phrase):
    """Score a phrase with Perspective and return '[phrase] score label'."""
    analyze_request = {
        'comment': {'text': phrase},
        'requestedAttributes': {'TOXICITY': {}},
    }
    jsonResponse = client.comments().analyze(body=analyze_request).execute()

    # summaryScore.value is the overall toxicity score for the comment (0 to 1).
    val = float(jsonResponse['attributeScores']['TOXICITY']['summaryScore']['value'])

    # Label the score with a 0.5 cutoff; a score of exactly 0.5 is inconclusive.
    if val < 0.5:
        toxicResult = "Non-Toxic"
    elif val > 0.5:
        toxicResult = "Toxic"
    else:
        toxicResult = "Inconclusive"

    return f"[{phrase}] {val} {toxicResult}"
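
The labeling step inside getToxicity can be separated from the API call so that it can be checked offline. The sketch below assumes the same 0.5 cutoff; the name `label_score` and its `threshold` parameter are hypothetical and not part of the notebook.

```python
def label_score(score, threshold=0.5):
    """Map a toxicity score in [0, 1] to a label using a single cutoff.

    label_score and threshold are illustrative names (assumptions);
    the notebook hard-codes the 0.5 cutoff inside getToxicity.
    """
    if score < threshold:
        return "Non-Toxic"
    if score > threshold:
        return "Toxic"
    # A score exactly at the cutoff is left undecided, as in getToxicity.
    return "Inconclusive"


# Scores copied from the notebook output for two closely matched phrases.
print(label_score(0.4977744))   # 'Homies a legit clown.'
print(label_score(0.51980776))  # 'Hes a bona fide buffoon.'
```

Both scores sit within about 0.02 of the cutoff, so the two phrases receive different labels even though the model scores them almost identically.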

### Gathering predicted toxicity data

In this cell, I send a set of test phrases, all written by hand, to the Perspective model for scoring. Each pair contains a Gen-Z slang phrase followed by a common-English phrase with a similar meaning.

In [25]:
print(getToxicity('Thats a big yikes from me.')) # Gen-Z Toxic Slang
print(getToxicity('Thats a stupid idea.')) # Common Toxic English
print("\n")

print(getToxicity('This song is so ass.')) # Gen-Z Toxic Slang
print(getToxicity('I detest this melody.')) # Common Toxic English
print("\n")

print(getToxicity('This grub is straight-up so horrible.'))
print(getToxicity('This food is outright repulsive.'))
print("\n")

print(getToxicity('Homies a legit clown.'))
print(getToxicity('Hes a bona fide buffoon.'))
print("\n")

print(getToxicity('Theyre not just late, theyre last century late.'))
print(getToxicity('Their chronic lateness is a flagrant defiance of time.'))
print("\n")

print(getToxicity('That movie was so bad omfg.'))
print(getToxicity('That cinematic disaster is a masterclass in tedium.'))
print("\n")

print(getToxicity('Bros got an ego the size of a planet, no cap ong fr.'))
print(getToxicity('Their egomania is a colossal spectacle of self-obsession.'))
print("\n")

print(getToxicity('Bro is so stupid hes like the punching bag of the group'))
print(getToxicity('Hes the grandmaster of fools, without a doubt.'))
print("\n")

print(getToxicity('Bro is allergic to being a good person'))
print(getToxicity('He always refrains from being a good person.'))
print("\n")

print(getToxicity('Yo mama so fat'))
print(getToxicity('Their mother is quite obese.'))
print("\n")

[Thats a big yikes from me.] 0.05651112 Non-Toxic
[Thats a stupid idea.] 0.6667219 Toxic


[This song is so ass.] 0.85333383 Toxic
[I detest this melody.] 0.33504707 Non-Toxic


[This grub is straight-up so horrible.] 0.4475325 Non-Toxic
[This food is outright repulsive.] 0.31963667 Non-Toxic


[Homies a legit clown.] 0.4977744 Non-Toxic
[Hes a bona fide buffoon.] 0.51980776 Toxic


[Theyre not just late, theyre last century late.] 0.02779579 Non-Toxic
[Their chronic lateness is a flagrant defiance of time.] 0.14429577 Non-Toxic


[That movie was so bad omfg.] 0.50503236 Toxic
[That cinematic disaster is a masterclass in tedium.] 0.103060275 Non-Toxic


[Bros got an ego the size of a planet, no cap ong fr.] 0.2524482 Non-Toxic
[Their egomania is a colossal spectacle of self-obsession.] 0.2854835 Non-Toxic


[Bro is so stupid hes like the punching bag of the group] 0.8403191 Toxic
[Hes the grandmaster of fools, without a doubt.] 0.6588125 Toxic


[Bro is allergic to being a good person]

In [26]:
print(getToxicity('Thats totally lit!')) # Gen-Z Positive Slang
print(getToxicity('What a wonderful idea!')) # Common Positive English
print("\n")

print(getToxicity('This bop slaps!')) # Gen-Z Positive Slang
print(getToxicity('This tune is delightful.')) # Common Positive English
print("\n")

print(getToxicity('Fam, youre looking snatched!')) # Gen-Z Positive Slang
print(getToxicity('Youre looking exceptionally sharp today.')) # Common Positive English
print("\n")

print(getToxicity('Your vibe is so dope.')) 
print(getToxicity('Your presence is very pleasant.')) 
print("\n")

print(getToxicity('This is the tea, sis!')) 
print(getToxicity('This is the absolute truth, my friend!')) 
print("\n")

print(getToxicity('Squad goals asf')) 
print(getToxicity('A truly admirable team effort.')) 
print("\n")

print(getToxicity('Your fit is on point!')) 
print(getToxicity('Your outfit is quite stylish.')) 
print("\n")

print(getToxicity('Keep it 100, always.')) 
print(getToxicity('Always stay true to yourself.')) 
print("\n")

print(getToxicity('We stan a queen!')) 
print(getToxicity('She is an exemplary leader.')) 
print("\n")

print(getToxicity('This content is fire!')) 
print(getToxicity('This material is outstanding.')) 
print("\n")

[Thats totally lit!] 0.042657252 Non-Toxic
[What a wonderful idea!] 0.011246625 Non-Toxic


[This bop slaps!] 0.20973456 Non-Toxic
[This tune is delightful.] 0.027324399 Non-Toxic


[Fam, youre looking snatched!] 0.2922276 Non-Toxic
[Youre looking exceptionally sharp today.] 0.046368107 Non-Toxic


[Your vibe is so dope.] 0.4394682 Non-Toxic
[Your presence is very pleasant.] 0.049336795 Non-Toxic


[This is the tea, sis!] 0.046120718 Non-Toxic
[This is the absolute truth, my friend!] 0.019980038 Non-Toxic


[Squad goals asf] 0.18364124 Non-Toxic
[A truly admirable team effort.] 0.007351146 Non-Toxic


[Your fit is on point!] 0.017466826 Non-Toxic
[Your outfit is quite stylish.] 0.054779384 Non-Toxic


[Keep it 100, always.] 0.015644746 Non-Toxic
[Always stay true to yourself.] 0.025910228 Non-Toxic


[We stan a queen!] 0.01017851 Non-Toxic
[She is an exemplary leader.] 0.014011159 Non-Toxic


[This content is fire!] 0.19219314 Non-Toxic
[This material is outstanding.] 0.02543884 Non-Toxic
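
The next step reads a file named predictedvsactual.csv with 'Actual Toxicity' and 'Predicted Toxicity' columns, but the notebook does not show how that file was built. The following is a minimal sketch of one way to assemble such rows, assuming each phrase is recorded with its score and a hand-assigned actual label. The helper names, the 'Phrase' column, and the row order are assumptions; the scores and intended labels come from the cells above.

```python
import csv
import io

FIELDNAMES = ["Phrase", "Actual Toxicity", "Predicted Toxicity"]


def predicted_label(score):
    """Apply the notebook's 0.5 cutoff to a score."""
    if score < 0.5:
        return "Non-Toxic"
    if score > 0.5:
        return "Toxic"
    return "Inconclusive"


def build_rows(scored):
    """Turn (phrase, score, actual_label) tuples into CSV-ready dicts."""
    return [
        {
            "Phrase": phrase,
            "Actual Toxicity": actual,
            "Predicted Toxicity": predicted_label(score),
        }
        for phrase, score, actual in scored
    ]


def to_csv_text(rows):
    """Serialize the rows to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Four phrases from the cells above, with their scores and intended labels.
sample = [
    ("Thats a stupid idea.", 0.6667219, "Toxic"),
    ("What a wonderful idea!", 0.011246625, "Non-Toxic"),
    ("Thats a big yikes from me.", 0.05651112, "Toxic"),
    ("Thats totally lit!", 0.042657252, "Non-Toxic"),
]
rows = build_rows(sample)
print(to_csv_text(rows))
```

Writing the file from the recorded scores rather than by hand keeps the predicted labels consistent with the cutoff used in getToxicity.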

### Formatting and Transforming Data

In this cell, I load the labeled results and split them into four series: actual and predicted toxicity for the common-English phrases (rows 0-19) and for the Gen-Z phrases (rows 20-39). I then map the Toxic and Non-Toxic labels to 1 and 0 respectively, so that the accuracy function in the next step can compare them.

In [23]:
import pandas as pd
predictedVsActualData = pd.read_csv("predictedvsactual.csv")

PvAData_AGen = predictedVsActualData['Actual Toxicity'].iloc[20:40].reset_index(drop=True)
PvAData_ACom = predictedVsActualData['Actual Toxicity'].iloc[0:20].reset_index(drop=True)
PvAData_PGen = predictedVsActualData['Predicted Toxicity'].iloc[20:40].reset_index(drop=True)
PvAData_PCom = predictedVsActualData['Predicted Toxicity'].iloc[0:20].reset_index(drop=True)

PvAData_AGen = PvAData_AGen.map({'Toxic': 1, 'Non-Toxic': 0})
PvAData_ACom = PvAData_ACom.map({'Toxic': 1, 'Non-Toxic': 0})
PvAData_PGen = PvAData_PGen.map({'Toxic': 1, 'Non-Toxic': 0})
PvAData_PCom = PvAData_PCom.map({'Toxic': 1, 'Non-Toxic': 0})
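
One caveat with this mapping: pandas `Series.map` with a dictionary returns NaN for any value that is not a key, so a label such as "Inconclusive" or a misspelled entry in the CSV would become NaN and then match neither 1 nor 0 in the accuracy step. The sketch below shows the same encoding in plain Python with an explicit check; `encode_labels` is a hypothetical helper, not something the notebook defines.

```python
LABEL_TO_INT = {"Toxic": 1, "Non-Toxic": 0}


def encode_labels(labels):
    """Encode Toxic/Non-Toxic labels as 1/0, rejecting anything else.

    encode_labels is an illustrative name (assumption); the notebook
    uses Series.map with the same dictionary and does no validation.
    """
    unknown = sorted({label for label in labels if label not in LABEL_TO_INT})
    if unknown:
        raise ValueError(f"Unrecognized toxicity labels: {unknown}")
    return [LABEL_TO_INT[label] for label in labels]


print(encode_labels(["Toxic", "Non-Toxic", "Non-Toxic"]))

try:
    encode_labels(["Toxic", "Inconclusive"])
except ValueError as error:
    print(error)
```

Failing loudly on an unexpected label makes it harder for a row to drop out of the accuracy counts unnoticed.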

### Analyzing Accuracy and Finalizing Results

In this cell, I use a class-wise accuracy function to measure how often the Perspective model assigns the correct toxicity label: the share of toxic phrases labeled Toxic and the share of non-toxic phrases labeled Non-Toxic. I then use print statements to organize the outputs clearly.

In [24]:
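def class_wise_acc(y_actual, y_predicted):
    """Return (TP_rate, TN_rate) for labels encoded as 1 (toxic) and 0 (non-toxic)."""
    total_p = 0  # number of actually toxic phrases
    total_n = 0  # number of actually non-toxic phrases
    TP = 0       # toxic phrases predicted as toxic
    TN = 0       # non-toxic phrases predicted as non-toxic
    for i in range(len(y_predicted)):
        if y_actual[i] == 1:
            total_p += 1
            if y_actual[i] == y_predicted[i]:
                TP += 1
        elif y_actual[i] == 0:
            total_n += 1
            if y_actual[i] == y_predicted[i]:
                TN += 1

    # Guard against division by zero when a class has no examples.
    TP_rate = TP / total_p if total_p else 0
    TN_rate = TN / total_n if total_n else 0

    return (TP_rate, TN_rate)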

accuracyToxic_Com, accuracyNonT_Com = class_wise_acc(PvAData_ACom, PvAData_PCom)
accuracyToxic_Gen, accuracyNonT_Gen = class_wise_acc(PvAData_AGen, PvAData_PGen)

print("Percent of toxic English phrases correctly identified: " + str(accuracyToxic_Com * 100))
print("Percent of toxic Gen-Z phrases correctly identified: " + str(accuracyToxic_Gen * 100))
print("Percent of non-toxic English phrases correctly identified: " + str(accuracyNonT_Com * 100))
print("Percent of non-toxic Gen-Z phrases correctly identified: " + str(accuracyNonT_Gen * 100))

Percent of toxic English phrases correctly identified: 30.0
Percent of toxic Gen-Z phrases correctly identified: 40.0
Percent of non-toxic English phrases correctly identified: 100.0
Percent of non-toxic Gen-Z phrases correctly identified: 100.0


These numbers mean that 30% of the toxic common-English phrases were correctly identified as toxic, and 100% of the positive common-English phrases were correctly identified as non-toxic. For Gen-Z slang, 40% of the toxic phrases were correctly identified as toxic, and 100% of the positive phrases were correctly identified as non-toxic. In this small sample, the results do not support the hypothesis: the model flagged toxic Gen-Z phrases slightly more often than their common-English counterparts.
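
The two rates reported for each group are the true positive rate (recall on toxic phrases) and the true negative rate (specificity on non-toxic phrases). The sketch below reports the underlying counts alongside the rates, which makes the sample size visible. It assumes ten phrases per category, as in the cells above. The CSV itself is not shown, so the label lists here are hypothetical and only arranged to reproduce the 30% and 100% figures for common English. `class_wise_counts` is an illustrative name.

```python
def class_wise_counts(y_actual, y_predicted):
    """Return ((TP, total_positive), (TN, total_negative)) for 1/0 labels."""
    pairs = list(zip(y_actual, y_predicted))
    tp = sum(1 for actual, pred in pairs if actual == 1 and pred == 1)
    tn = sum(1 for actual, pred in pairs if actual == 0 and pred == 0)
    total_p = sum(1 for actual, _ in pairs if actual == 1)
    total_n = sum(1 for actual, _ in pairs if actual == 0)
    return (tp, total_p), (tn, total_n)


# Hypothetical labels: 10 toxic then 10 non-toxic phrases, with 3 of the
# toxic phrases flagged and no non-toxic phrase flagged.
actual = [1] * 10 + [0] * 10
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [0] * 10

(tp, total_p), (tn, total_n) = class_wise_counts(actual, predicted)
print(f"Toxic recall: {tp}/{total_p} = {tp / total_p:.0%}")
print(f"Non-toxic specificity: {tn}/{total_n} = {tn / total_n:.0%}")
```

With ten phrases per category, the gap between 30% and 40% corresponds to a single phrase, so the comparison is best read as suggestive rather than conclusive.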