<a href="https://colab.research.google.com/github/ykim71/google_toxicity/blob/main/google_toxicity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Load File (can be done locally)

In [None]:
"""
Run this cell to grant Colab permission to access your Google account.
Once the drive is mounted, Colab can read and write any file in your Google Drive.
"""
from google.colab import drive

drive.mount("/content/drive")

Mounted at /content/drive


In [None]:
"""
This cell changes Colab's working directory. I created a 'toxicity' folder in my Google Drive and set it as the working directory,
so data can be loaded from and saved to that folder.
"""
%cd drive/'MyDrive'/toxicity/

/content/drive/MyDrive/toxicity


## Upload a file from your local machine

In [None]:
"""
Or, run this code to upload your file directly to Colab. 
"""

from google.colab import files

uploaded = files.upload()


## Load your file into Colab

In [None]:
"""
Load your file into Colab using the following code. Replace 'sample_code_review.csv' with your file name.
I named my data 'sample_text'; you can replace it with another name.
"""

import pandas as pd

sample_text = pd.read_csv('sample_code_review.csv')


# Perspective API toxicity



> Language Attributes: https://developers.perspectiveapi.com/s/about-the-api-attributes-and-languages



> API Request: https://developers.perspectiveapi.com/s/docs-get-started (note: a UT Google account may not work; a personal Google account is recommended for the request)



In [None]:
"""
load packages/libraries
"""
from googleapiclient import discovery
from googleapiclient.errors import HttpError


In [None]:
"""
Add your API key here.
"""
API_KEY='YOUR-API-KEY'
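
A key pasted directly into a shared notebook is easy to leak. As a minimal sketch of an alternative, the key can be read from an environment variable instead. The variable name `PERSPECTIVE_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not part of the Perspective API or of this notebook.

```python
import os


def load_api_key(env_var="PERSPECTIVE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice; use whatever name you set.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this cell."
        )
    return key


# Usage (uncomment once the variable is set):
# API_KEY = load_api_key()
```

If the variable is missing or empty, the helper raises immediately rather than letting the first API request fail later.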


In [None]:
"""
Run this cell to score text on four measures: Toxicity, Likely to Reject, Insult, and Identity Attack.
See the comments below for other attributes and their detailed descriptions.

"""
# variable descriptions: https://github.com/conversationai/perspectiveapi
# you can replace toxicity attributes here:
analyze_request = {
   'comment': { 'text': 'xx'}, # placeholder text; it is overwritten for each comment in the loop
   'requestedAttributes': {'TOXICITY@6': {}, # see the actual variable name from the Perspective API page
                           'LIKELY_TO_REJECT@2': {}, 
                           'INSULT': {}, 
                           'IDENTITY_ATTACK': {} 
                           },
   'doNotStore': True, # for other settings, https://developers.perspectiveapi.com/s/about-the-api-methods
}
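
The request above is a single dict whose text is overwritten for every comment. A minimal sketch of an alternative is to build a fresh request per comment, which also makes the attribute list easy to swap. The helper name `build_analyze_request` is hypothetical; the keys and attribute names are the ones used in the cell above.

```python
DEFAULT_ATTRIBUTES = ("TOXICITY@6", "LIKELY_TO_REJECT@2", "INSULT", "IDENTITY_ATTACK")


def build_analyze_request(text, attributes=DEFAULT_ATTRIBUTES):
    """Return a new request body for one comment (hypothetical helper)."""
    return {
        "comment": {"text": text},
        # each requested attribute maps to an empty options dict, as in the cell above
        "requestedAttributes": {name: {} for name in attributes},
        "doNotStore": True,
    }


request = build_analyze_request("example comment", attributes=("INSULT",))
print(request["requestedAttributes"])
```

Because each call returns a new dict, nothing is shared between requests.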



In [None]:
"""
Take 3 random samples to check that the data loaded successfully; 'text' is the column that you want to analyze.
"""

sample_text.sample(3)

Unnamed: 0,uid_sample,text
14,1483,Feel good piece. Father And Son Graduate Moreh...
0,921,A little bit of American truth? Mike Pence: Th...
18,1498,"Left Action Fight back, and tell Boehner & M..."


In [None]:
"""
Assign the name of the column that contains the text you want to analyze in the code below.
"""
import csv
import codecs
import json
import time
import pandas as pd

# build the API client; the requested attributes are set in analyze_request above, and more can be added there

service = discovery.build('commentanalyzer', 'v1alpha1', developerKey=API_KEY)

start = time.time()

comments_toxicity_list = []
comments_reject_list = []
comments_insult_list = []
comments_identity_list = []

"""
HERE is the code you need to change. My data is named 'sample_text' and the column name is 'text'.
You can change the data name and the text column name here.
For example, if your data is named 'df' and the text column is named 'comment',
the first line of the following code should be:

for i in df.comment.values.tolist(): 

Once you change this code, run this code block.
"""

for i in sample_text.text.values.tolist(): 
  analyze_request['comment']['text'] = i
  
  try:
    response = service.comments().analyze(body=analyze_request).execute()
    # the client already returns a parsed dict, so no JSON round-trip is needed;
    # a separate name also avoids overwriting the loop variable i
    scores = response['attributeScores']
    
    comments_toxicity = scores['TOXICITY@6']['summaryScore']['value']
    comments_reject = scores['LIKELY_TO_REJECT@2']['summaryScore']['value']
    comments_insult = scores['INSULT']['summaryScore']['value']
    comments_identity = scores['IDENTITY_ATTACK']['summaryScore']['value']
        
  except HttpError:
    comments_toxicity = "error"
    comments_reject = "error"
    comments_insult = "error"
    comments_identity = "error"
            
  comments_toxicity_list.append(comments_toxicity)
  comments_reject_list.append(comments_reject)
  comments_insult_list.append(comments_insult)
  comments_identity_list.append(comments_identity)
        
sample_text = sample_text.join(pd.DataFrame({'toxicity': comments_toxicity_list, 
                                             'reject': comments_reject_list, 
                                             'insult': comments_insult_list, 
                                             'attack': comments_identity_list}))

end=time.time()
print("complete time: ", round(end -start, 2))

complete time:  0.88
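
The loop above records "error" as soon as a request fails. The Perspective API applies a request quota per project (to my knowledge about 1 query per second by default, so check your own quota), which means larger files may fail for rate-limit reasons alone. Below is a minimal sketch of retrying with exponential backoff. The helper `call_with_retry` and its parameters are hypothetical, and the demonstration uses a fake request function rather than the real API.

```python
import time


def call_with_retry(send, max_attempts=4, base_delay=1.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call send() and retry on failure, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return send()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle the error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake request that fails twice, then succeeds.
calls = {"count": 0}
waits = []


def fake_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate-limit error")
    return {"attributeScores": {}}


result = call_with_retry(fake_request, sleep=waits.append)
print(calls["count"], waits)
```

In the loop above, the request line could be wrapped as `call_with_retry(lambda: service.comments().analyze(body=analyze_request).execute(), retry_on=(HttpError,))`, with the existing `except HttpError` block still catching requests that fail on every attempt.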


In [None]:
"""
Take 3 random samples to check that the scores were computed successfully.
"""
sample_text.sample(3)

Unnamed: 0,uid_sample,text,toxicity,reject,insult,attack
0,921,A little bit of American truth? Mike Pence: Th...,0.277589,0.836571,0.118428,0.213282
16,1486,"During an interview with CNBC on Wednesday, Tr...",0.041074,0.331779,0.126426,0.021339
20,1819,Gonna leave this one right here AFROPUNK 🙈...,0.153451,0.973788,0.262546,0.448278
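
Because failed requests are stored as the string "error", a score column can mix floats and strings, and numeric summaries will then fail. Here is a minimal sketch of cleaning such values before summarizing them. The helper names `to_float_or_none` and `mean_score` are hypothetical, and the input list is made-up illustration data, not output from this notebook.

```python
def to_float_or_none(value):
    """Convert a score to float; return None for "error" or missing values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def mean_score(values):
    """Average the numeric scores, ignoring anything that is not a number."""
    numbers = [x for x in map(to_float_or_none, values) if x is not None]
    return sum(numbers) / len(numbers) if numbers else None


# Made-up values for illustration only.
example_scores = [0.25, "error", 0.75]
print(mean_score(example_scores))
```

On the data frame, the same conversion could be applied per column, for example `sample_text['toxicity'].map(to_float_or_none)`.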


In [None]:
"""
Save the data to your Google Drive OR the Colab environment. 
You can also find the saved file under the folder icon on the left side and download it from there.
"""

sample_text.to_csv('toxicity.csv')