**Hypothesis:** Perspective API will be less accurate in detecting hate comments that do not contain swear words, and therefore will have increased accuracy in detecting toxic comments with swear words.

**Categories**: In order to test my hypothesis, I will create three different categories for the comments and see how accurately Perspective API recognizes them as toxic. My three categories will be:


1.   Toxic comments that do **not** contain swear words
2.   Toxic comments that **do** contain swear words
3.   Non-toxic comments that are neutral





Next, I will import my CSV file into a dataframe.

In [82]:
import numpy as np
import pandas as pd

test_df = pd.read_csv("Data Bias Comments.csv")
test_df.head()

Unnamed: 0,Comments,Category,Toxicity Value,Toxic,y
0,"""I am amazed at how stupid you are it is impre...",1.0,0.85,yes,yes
1,"""I think you should consider going back to ele...",1.0,0.63,yes,yes
2,"""It shows how ignorant and uneducated you are.""",1.0,0.66,yes,yes
3,"""Quite frankly no one cares about what you hav...",1.0,0.4,yes,no
4,"""Clearly critical thinking is not your strong...",1.0,0.17,yes,no


In [83]:
print(test_df.columns)

Index(['Comments', 'Category ', 'Toxicity Value', 'Toxic', 'y'], dtype='object')



*   The "Category" column contains the numbers 1-3, which correspond to the categories described earlier.
*   The "Toxicity Value" column is the score generated by Perspective API, ranging from 0.0 to 1.0. I treat any value below 0.5 as non-toxic and any value above 0.5 as toxic.
*   The "Toxic" column records whether or not the comment was actually toxic.
*   The "y" column (model accuracy) records whether or not Perspective API correctly identified the comment as toxic.
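The 0.5 cutoff described above can be sketched as a small helper. This is only an illustration, not the code used to fill in the spreadsheet: the names `THRESHOLD` and `is_flagged` are hypothetical, the scores are the five rounded values from the preview above, and treating a score of exactly 0.5 as toxic is an assumption, since the write-up does not say which side that boundary falls on.

```python
THRESHOLD = 0.5  # cutoff used in this write-up

def is_flagged(score, threshold=THRESHOLD):
    """Return True when a toxicity score counts as toxic."""
    # Assumption: a score of exactly 0.5 is treated as toxic.
    return score >= threshold

# Rounded toxicity values of the first five rows shown by test_df.head()
preview_scores = [0.85, 0.63, 0.66, 0.40, 0.17]
flags = [is_flagged(score) for score in preview_scores]
print(flags)
```

The last two comments fall below the cutoff, which matches the "no" entries in the "y" column for those rows.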


Now, I will test each of the comments with Perspective API and record the toxicity values. Every test uses the same code; only the input text differs. Below are some examples of what the output looks like when run.
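Since only the input text changes between runs, the repeated steps could be wrapped in a few helpers so that the client is built once and reused. The sketch below is one possible arrangement, not the code that produced the results in this notebook: the function names `build_request`, `summary_score`, and `score_comments` are hypothetical, and `client` is assumed to be the object returned by `discovery.build(...)` exactly as in the cells that follow.

```python
def build_request(text):
    """Build the same request body that each cell in this notebook sends."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def summary_score(response):
    """Pull the overall TOXICITY score out of a response dictionary."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

def score_comments(client, comments):
    """Score each comment with an already-built client; returns {text: score}."""
    scores = {}
    for text in comments:
        # Same call chain as in the notebook cells, with the body built per comment
        response = client.comments().analyze(body=build_request(text)).execute()
        scores[text] = summary_score(response)
    return scores
```

With a client built as in the cells that follow, `score_comments(client, ["first comment", "second comment"])` would return one score per comment instead of printing the full JSON each time.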

1. Toxic comments that do not contain swear words

In [28]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'I am amazed at how stupid you are it is impressive' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 50,
          "score": {
            "value": 0.85333383,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.85333383,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [26]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'I think you should consider going back to elementary school since your brain seems empty' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 88,
          "score": {
            "value": 0.6342749,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.6342749,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [23]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'It shows how ignorant and uneducated you are' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 44,
          "score": {
            "value": 0.65996873,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.65996873,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [24]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'Quite frankly no one cares about what you have to say' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 53,
          "score": {
            "value": 0.3974717,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.3974717,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [25]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'Clearly critical thinking is not your strong suit' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 49,
          "score": {
            "value": 0.17285149,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.17285149,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [30]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'Do you even have a brain or do you have nothing going on up there' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 65,
          "score": {
            "value": 0.60311776,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.60311776,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [38]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'why is it that everytime you speak you cause annoyance to everyone around you' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 77,
          "score": {
            "value": 0.24442963,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.24442963,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


2. Toxic comments that **do** contain swear words

In [13]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'Your arguments are so fucking stupid, its painful to hear' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 57,
          "score": {
            "value": 0.9391453,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.9391453,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [14]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'I cannot take you seriously with that level of bullshit in your arguments' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 73,
          "score": {
            "value": 0.85173553,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.85173553,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [15]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'Your ideas are ass and you need to pull your shit together' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 58,
          "score": {
            "value": 0.95048624,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.95048624,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [16]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'Your stances are genuine horseshit and you need to pick up a book' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 65,
          "score": {
            "value": 0.8299589,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.8299589,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [17]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'I am so tired of your relentless bullshitting without any substance' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 67,
          "score": {
            "value": 0.83334327,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.83334327,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [31]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'Your brain is fucking empty evidently' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 37,
          "score": {
            "value": 0.93383175,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.93383175,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [44]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'why are you so fucking annoying' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 31,
          "score": {
            "value": 0.90451443,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.90451443,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


3. Non-toxic comments that are **neutral**

In [18]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'I appreciate the effort you put into explaining your viewpoint' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 62,
          "score": {
            "value": 0.016712861,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.016712861,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [19]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'It seems that we have differing perspectives on this matter' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 59,
          "score": {
            "value": 0.018095128,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.018095128,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [20]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'You have an interesting viewpoint on this matter that I do not necessarily agree with so lets discuss this further' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 114,
          "score": {
            "value": 0.023199737,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.023199737,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [21]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'I do not necessarily agree, but I respect your opinion regardless' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 65,
          "score": {
            "value": 0.021785567,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.021785567,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


In [22]:
from googleapiclient import discovery
import json

API_KEY = 'AIzaSyCcbfGYMQUkeiiUg9Thl_qf8piubtbc8U8'

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'Thanks for sharing your thoughts; I had not considered that angle before' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 72,
          "score": {
            "value": 0.011623607,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.011623607,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


I have run the API on every comment and collected the resulting toxicity values in a spreadsheet with the columns "Comments", "Category", "Toxicity Value", "Toxic", and "y". The comments and their corresponding toxicity values, rounded to two decimal places, are below:

1.   Toxic comments that do **not** contain swear words

*   "I am amazed at how stupid you are it is impressive." **0.85**
*   "I think you should consider going back to elementary school since your brain seems empty." **0.63**
*   "It shows how ignorant and uneducated you are." **0.66**
*   "Quite frankly no one cares about what you have to say." **0.40**
*   "Clearly critical thinking is not your strong suit." **0.17**
*   "Do you even have a brain or do you have nothing going on up there" **0.60**
*   "I think you should go to sleep and preferably never wake up." **0.38**
*   "Have you ever been told that your face is a little funny looking" **0.25**
*   "I am not surprised that you do not understand such simple concept" **0.18**
*   "The way you speak makes me wonder if you ever even received proper education" **0.45**
*   "You should not have been born" **0.57**
*   "You were probably dropped as a kid" **0.42**
*   "Why is it that everytime you speak it causes annoyance to everyone around you" **0.24**


2.   Toxic comments that **do** contain swear words

*   "Your arguments are so fucking stupid, it's painful to hear." **0.94**
*   "I can't take you seriously with that level of bullshit in your arguments." **0.85**
*   "Your ideas are ass and you need to pull your shit together." **0.95**
*   "Your stances are genuine horseshit and you need to pick up a book." **0.83**
*   "I'm so tired of your relentless bullshitting without any substance." **0.83**
*   "Your brain is fucking empty evidently" **0.93**
*   "I think you should go to sleep and never fucking wake up" **0.83**
*   "Have you ever been told that your face looks fucking hilarious" **0.73**
*   "I am not surprised that your dumbass does not understand such simple concepts" **0.82**
*   "The way you speak makes me wonder if you ever went to school for a damn day in your life" **0.77**
*   "Your birth was a fucking mistake" **0.92**
*   "Your mom probably dropped your ass as a kid" **0.90**
*   "Why are you so fucking annoying" **0.90**

3.   Non-toxic comments that are **neutral**


*   "I appreciate the effort you put into explaining your viewpoint." **0.02**
*   "It seems that we have differing perspectives on this matter." **0.02**
*   "You have an interesting viewpoint on this matter that I do not necessarily agree with so lets discuss this further." **0.02**
*   "I don't necessarily agree, but I respect your opinion regardless." **0.02**
*   "Thanks for sharing your thoughts; I hadn't considered that angle before." **0.01**

**Results at a Glance:**

*   Only 5 out of 13 toxic comments that did **not** contain swear words were correctly identified as toxic
*   13 out of 13 toxic comments that did include swear words were correctly identified to be toxic
*   5 out of 5 neutral comments were found to be neutral
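These counts can be reproduced directly from the rounded scores listed above. The sketch below is a cross-check rather than part of the original workflow: the helper name `count_flagged` and the dictionary layout are hypothetical, and a score of exactly 0.5 is assumed to count as toxic (none of the listed scores sits on that boundary, so the assumption does not affect the result).

```python
# Rounded toxicity values copied from the three lists above
scores_by_category = {
    "toxic, no swear words": [0.85, 0.63, 0.66, 0.40, 0.17, 0.60, 0.38,
                              0.25, 0.18, 0.45, 0.57, 0.42, 0.24],
    "toxic, with swear words": [0.94, 0.85, 0.95, 0.83, 0.83, 0.93, 0.83,
                                0.73, 0.82, 0.77, 0.92, 0.90, 0.90],
    "neutral": [0.02, 0.02, 0.02, 0.02, 0.01],
}

def count_flagged(scores, threshold=0.5):
    """Count how many scores reach the toxicity threshold."""
    return sum(1 for score in scores if score >= threshold)

flagged_counts = {
    name: (count_flagged(scores), len(scores))
    for name, scores in scores_by_category.items()
}

for name, (flagged, total) in flagged_counts.items():
    print(f"{name}: {flagged} of {total} flagged as toxic")
```

Run as written, this reports 5 of 13, 13 of 13, and 0 of 5 flagged. A neutral comment is handled correctly when it is not flagged, so 0 of 5 flagged corresponds to 5 of 5 correct.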



Insights:

1.   Toxic comments that did **not** contain swear words were identified less reliably than their swear-word counterparts: only 5 of 13 were correctly identified, versus 13 of 13.
2.   This difference leads me to think that Perspective API is biased toward flagging comments that contain swear words, which also makes it better at identifying toxic comments that contain them. This may become problematic, because a creative commenter can leave a toxic comment and has a better chance of avoiding a flag by wording it without swear words or by censoring them creatively.
3.   5 out of 5 neutral comments were not marked as toxic, which suggests that Perspective API makes few errors on neutral comments.
4.   I believe the reason for this bias is that swear words are commonly used to express negative sentiment, so the model automatically associates them with negativity.
5.   My previously stated hypothesis was

*   "Perspective API will be less accurate in detecting hate comments that do not contain swear words, and therefore will have increased accuracy in detecting toxic comments with swear words."

I believe that the results of my testing support this hypothesis, as the API was indeed less accurate at detecting toxicity when no swear words were present.
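One way to gauge whether a gap of 5/13 versus 13/13 could plausibly arise by chance is a one-sided Fisher's exact test. This check was not part of the original analysis. The sketch below computes it with the standard library only, the function name `fisher_one_sided` is hypothetical, and the test assumes the comments are independent samples, which hand-written, paired comments only approximately satisfy.

```python
from math import comb

def fisher_one_sided(detected_a, missed_a, detected_b, missed_b):
    """Probability that group A has this few detections or fewer,
    given fixed group sizes and a fixed total number of detections."""
    size_a = detected_a + missed_a
    size_b = detected_b + missed_b
    total = size_a + size_b
    total_detected = detected_a + detected_b
    # Smallest number of detections group A can have under these totals
    lowest = max(0, total_detected - size_b)
    p_value = 0.0
    for x in range(lowest, detected_a + 1):
        # Hypergeometric probability of x detections landing in group A
        p_value += (comb(size_a, x) * comb(size_b, total_detected - x)
                    / comb(total, total_detected))
    return p_value

# Group A: toxic comments without swear words (5 detected, 8 missed)
# Group B: toxic comments with swear words (13 detected, 0 missed)
p = fisher_one_sided(5, 8, 13, 0)
print(f"one-sided p-value = {p:.6f}")
```

A small p-value here would be consistent with the hypothesis, but with only 26 toxic comments written by one person it should be read as a rough indication rather than a firm conclusion.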






In conclusion, I believe that the model needs, and would benefit greatly from, more training on toxic comments that do not contain swear words. I learned a lot throughout this assignment, especially about interacting with models and using APIs. It was interesting to test an API like this with comments that I wrote myself for that purpose, and I found it a valuable experience. It also raised questions for me about how these models are trained and, more specifically, how their training data is chosen. Is it possible to suggest a dataset for the API to train on? What standards do developers follow when creating training sets? This project also led me to think about other biases that may be present in Perspective API, such as contextual bias. Handling context and detecting sarcasm sound like difficult tasks for a machine learning model, and I am interested in how these biases are currently being mitigated. Overall, I found this project to be a good learning and exploratory experience.

Now, I will check the model's accuracy score. I will first need to do a **feature transformation** to make the data interpretable: I will convert the yes/no values of the "Toxic" and "y" columns to 1/0.

In [109]:
from sklearn.metrics import accuracy_score

# Map each label to 1 only when it is exactly 'Yes'; any other value becomes 0
y_actual = [1 if toxic == 'Yes' else 0 for toxic in test_df['Toxic']]

y_predicted = [1 if y == 'Yes' else 0 for y in test_df['y']]

In [110]:
# accuracy_score expects the true labels first, then the predicted labels
accuracy = accuracy_score(y_actual, y_predicted)

print(f"Accuracy of the classifier = {accuracy}")

Accuracy of the classifier = 1.0


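Insight: An accuracy of 1.0 means that the two label lists agree on every row, which I find very suspicious. It may be due to an issue in the dataset, specifically with the "y" column. One likely cause is visible in the preview of the dataframe: the "Toxic" and "y" columns store lowercase "yes"/"no", while the conversion compares against "Yes". If every label is lowercase, both lists contain only zeros and the score is trivially 1.0, regardless of what Perspective API returned.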
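A conversion that ignores letter case and stray spaces avoids that kind of silent mismatch. The sketch below is a possible approach rather than the code used in this notebook: the helper name `label_to_int` is hypothetical, the two lists hold only the first five rows shown in the dataframe preview, and "y" is treated as the model's yes/no verdict, as in the cell above.

```python
def label_to_int(label):
    """Map 'yes'/'no' labels to 1/0, ignoring case and surrounding spaces."""
    if not isinstance(label, str):
        return None  # e.g. a blank cell that pandas read as NaN
    normalized = label.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    return None  # unexpected value; leave it out rather than guess

# First five rows of the "Toxic" and "y" columns from the preview
toxic_labels = ["yes", "yes", "yes", "yes", "yes"]
y_labels = ["yes", "yes", "yes", "no", "no"]

pairs = [(label_to_int(t), label_to_int(y)) for t, y in zip(toxic_labels, y_labels)]
# Keep only rows where both labels could be interpreted
pairs = [(t, y) for t, y in pairs if t is not None and y is not None]

accuracy = sum(1 for t, y in pairs if t == y) / len(pairs)
print(f"Accuracy on the preview rows = {accuracy}")
```

On these five rows the labels agree three times out of five, so the score is no longer a trivial 1.0. Returning `None` for unrecognized values makes bad rows visible instead of quietly counting them as 0.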

In [106]:
category_column = test_df["Category "]

swear_indices = []
noswear_indices = []

# Category 2 holds the comments with swear words; every other row goes to the no-swear group
for i in range(len(category_column)):
    if category_column[i] == 2:
        swear_indices.append(i)
    else:
        noswear_indices.append(i)

y_actual_noswear = [y_actual[i] for i in noswear_indices]
y_predicted_noswear = [y_predicted[i] for i in noswear_indices]

y_actual_swear = [y_actual[i] for i in swear_indices]
y_predicted_swear = [y_predicted[i] for i in swear_indices]

print(len(noswear_indices))
print(len(swear_indices))

21
13
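The two counts add up to 34 rows, while the lists earlier contain 31 comments (13 + 13 + 5). The `else` branch collects every row that is not category 2, so the no-swear group also contains the neutral comments and may be picking up extra rows such as blank ones. A possible alternative is to group rows by their exact category and skip rows with no category. In the sketch below all names are hypothetical, the first six rows are built from the first and fourth category 1 comments, the first two category 2 comments, and the first two neutral comments listed earlier (using the 0.5 cutoff), and the final blank row is made up purely to illustrate the skipping.

```python
import math
from collections import defaultdict

# Illustrative rows; the last one has a missing category (hypothetical)
sample_categories = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, float("nan")]
sample_actual = [1, 1, 1, 1, 0, 0, 0]
sample_predicted = [1, 0, 1, 1, 0, 0, 0]

# Group row indices by exact category, skipping rows without one
indices_by_category = defaultdict(list)
for i, category in enumerate(sample_categories):
    if isinstance(category, float) and math.isnan(category):
        continue
    indices_by_category[int(category)].append(i)

# Share of rows in each category where the prediction matches the label
accuracy_by_category = {}
for category, indices in indices_by_category.items():
    correct = sum(1 for i in indices if sample_actual[i] == sample_predicted[i])
    accuracy_by_category[category] = correct / len(indices)

print(accuracy_by_category)
```

Grouping this way keeps the neutral comments out of the no-swear group, so each of the three categories gets its own accuracy figure.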
