# Artificial Intelligence for Complex Problems

## How to build a profanity detector: Two approaches

The best way to see how AI works in practice is to try it. In this class, we're going to look at a classic use case for machine learning: detecting profanity in written text. First, we'll try to do this *without* using machine learning. Then we'll try it using machine learning and see if there's a difference.

In [None]:
!git clone https://github.com/texturejc/AI_for_complex_problems


In [None]:

#Code to import relevant Python libraries
import pandas as pd #data science library
import plotly.express as px #visualisation library
import plotly.graph_objects as go
import scipy #scientific computing library
from scipy.spatial import distance
import random

#Linguistic data that we'll need
data = pd.read_csv('/content/AI_for_complex_problems/VAD_rescale.csv', index_col = 'word')

## Approach 1: Creating a hypothesis concerning the nature of profanity and using that to predict what words are likely to count as profane

Exercise: everyone take five minutes to come up with the linguistic features of profanity.

In [None]:
data = data[['V.Mean.Sum', 'A.Mean.Sum', 'D.Mean.Sum']]
data.columns = ['valence', 'arousal', 'dominance']

In [None]:
emotions = ['sadness', 'anger', 'happiness', 'depression', 'disgust', 'fear', 'surprise', 'hysteria'] #add to list

In [None]:
emo_words = []

for i in emotions:
    if i in data.index:
        emo_words.append(i)

emo_df = data.loc[emo_words]

In [None]:
emo_df

In [None]:

fig = px.scatter_3d(emo_df, x='valence', y='arousal', z='dominance',
               hover_data = [emo_df.index])

fig.update_traces(marker=dict(size = 8, line=dict(width=2,
                                        color='DarkSlateGrey')),
                  selector=dict(mode='markers'))

fig.show()

In [None]:
from IPython.display import IFrame
IFrame(src='https://texturejc.github.io/qual1a/all_vad_words.html', width=1050, height=1050)

In [None]:
bad_words = ['fuck', 'shit', 'asshole', 'dickhead', 'moron']

sample = ['confident', 'christmas', 'kitten', 'prize', 'apple']

test_words = bad_words + sample

test_df = data.loc[test_words]

In [None]:
fig = px.scatter_3d(test_df, x='valence', y='arousal', z='dominance',
               hover_data = [test_df.index])

fig.update_traces(marker=dict(size = 9, line=dict(width=2,
                                        color='DarkSlateGrey')),
                  selector=dict(mode='markers'))

for i in bad_words:
    for j in bad_words:
        fig.add_trace(
            go.Scatter3d(
            x=[test_df.loc[i]['valence'], test_df.loc[j]['valence']],
            y=[test_df.loc[i]['arousal'], test_df.loc[j]['arousal']],
            z=[test_df.loc[i]['dominance'], test_df.loc[j]['dominance']],
            mode='lines',
            line=dict(color='red', width=2),
            )
            )


for i in sample:
    for j in sample:
        fig.add_trace(
            go.Scatter3d(
            x=[test_df.loc[i]['valence'], test_df.loc[j]['valence']],
            y=[test_df.loc[i]['arousal'], test_df.loc[j]['arousal']],
            z=[test_df.loc[i]['dominance'], test_df.loc[j]['dominance']],
            mode='lines',
            line=dict(color='blue', width=2)
            )
            )

for i in bad_words:
    for j in sample:
        fig.add_trace(
            go.Scatter3d(
            x=[test_df.loc[i]['valence'], test_df.loc[j]['valence']],
            y=[test_df.loc[i]['arousal'], test_df.loc[j]['arousal']],
            z=[test_df.loc[i]['dominance'], test_df.loc[j]['dominance']],
            mode='lines',
            line=dict(color='green', width=2)
            )
            )
fig.update_traces(showlegend=False)

fig.show()

In [None]:
dist_1 = distance.euclidean(data.loc['asshole'], data.loc['confident'])
dist_2 = distance.euclidean(data.loc['asshole'], data.loc['dickhead'])

print(dist_1, dist_2)
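
The two numbers printed above are Euclidean distances between three-dimensional valence/arousal/dominance vectors. As a minimal sketch of what `distance.euclidean` computes, the following uses two made-up vectors (`word_a` and `word_b` are illustrative and are not values from the dataset):

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical (valence, arousal, dominance) vectors; not taken from VAD_rescale.csv
word_a = np.array([0.2, 0.8, 0.4])
word_b = np.array([0.9, 0.5, 0.7])

# Euclidean distance: square root of the summed squared differences
manual = np.sqrt(np.sum((word_a - word_b) ** 2))
library = distance.euclidean(word_a, word_b)

print(manual, library)
```

The smaller the distance, the more alike two words are on the three emotional dimensions, which is the intuition this approach relies on.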

In [None]:
bad_word = 'fuck'
words = []
dist = []


for i in data.index:
    dist.append(distance.euclidean(data.loc[i], data.loc[bad_word]))
    words.append(i)

profane = pd.Series(dist, index = words)


In [None]:
profane = profane.sort_values()

In [None]:
prof_small = profane.head(n = 100)

In [None]:
fig = px.scatter(x = prof_small.index, y = prof_small)
fig.layout.yaxis.title = "distance from {}".format(bad_word)
fig.layout.xaxis.title = "word"
fig.show()
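
The loop that builds `profane` calls `distance.euclidean` once per word, which can be slow on a large vocabulary. One possible alternative, sketched here on a small made-up table (`toy` and its values are assumptions for illustration), computes all the distances in a single vectorised step:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the VAD table; values are invented for illustration
toy = pd.DataFrame(
    {"valence": [0.1, 0.9, 0.2, 0.8],
     "arousal": [0.9, 0.3, 0.8, 0.4],
     "dominance": [0.5, 0.7, 0.4, 0.6]},
    index=["curse", "kitten", "insult", "prize"],
)

target = "curse"

# Subtract the target row from every row, then take the row-wise Euclidean norm
diffs = toy.to_numpy() - toy.loc[target].to_numpy()
nearest = pd.Series(np.linalg.norm(diffs, axis=1), index=toy.index).sort_values()

print(nearest)
```

The target word always comes first with a distance of zero, so in practice you would probably skip the first entry when reading off its neighbours.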


### What are the problems of this approach?

## Approach 2: Training a machine learning algorithm to predict profanity

Exercise: everyone take five minutes to come up with the linguistic features of profanity.

In [None]:
import numpy as np
import gensim
import gensim.downloader as api
import seaborn as sns
sns.set()

from sklearn.metrics import precision_score, f1_score, recall_score, accuracy_score, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

## Logistic regression

Linear regression based machine learning methods are used to predict numerical outputs. That is, based on the parameters estimated from its training data, the algorithm takes a series of inputs and predicts the value of their output. However, whilst this method is very useful, there are many situations where we want to predict a *category* rather than a number. For example, we may wish to predict whether or not a test is passed based on hours studied, what species of animal appears in an image based on its features, or whether a customer is likely to make a purchase based on their browsing history. This is like regression modelling, except that the numbers in the data are used to predict a categorical rather than a numerical output.

Several machine learning algorithms exist for dealing with this situation. We're going to look at one of these, logistic regression, and explore how it can be used for predicting binary categories. This is when we have two possible outputs (pass or fail, cat or dog, sale or not-sale), with these outcomes represented as $0$ and $1$. It can also be adapted to deal with more than two categories, but we won't be looking at that case.

The logic of binary logistic regression works as follows. The model uses a function called the logistic function to map the outputs of a regression into the range $(0,1)$. This is useful, because we can then interpret the results of the regression model as probabilities: if they are greater than $0.5$, the model predicts success; if they are less, the model predicts failure. In more detail, the logistic function has the following form:

$$p(t) = \frac{1}{1+e^{-t}}$$

As you can see, the logistic function compresses its output into the $0$ to $1$ range:

In [None]:
t = np.arange(-10, 10, 0.1)

def logistic(x):
    y = 1/(1+np.exp(-x))
    return y

sns.lineplot(x = t, y = [logistic(i) for i in t])

## Logistic regression in detail

The key to understanding logistic regression comes with interpreting the parameter $t$. In logistic regression with a predictor variable $x$ and a predicted variable $y$, this corresponds to the equation for linear regression:

$$y = \beta_{0} + \beta_{1}x$$

where $\beta_{0}$ is the $y$ intercept and $\beta_{1}$ is the slope of the line of best fit.

That is:

$$p(x) = \frac{1}{1+e^{-(\beta_{0} + \beta_{1}x)}}$$

But how should we interpret this? To understand what's happening, we need to know about *log odds*. The log odds of an event is simply the logarithm of the probability of that event occurring divided by the probability of it not occurring. That is, the log odds of an event of probability $p$ is:

$$\ln{\frac{p}{1-p}}$$

Imagine that in a class of students, the probability of a student passing an exam after attending 90% of classes is 0.75. The log odds of a student passing is therefore:

$$\ln{\frac{0.75}{0.25}} = \ln{3} \approx 1.10$$

If this number is large and positive, there is a high probability of success; if it is large and negative, there is a low probability of success. Log odds of $0$ correspond to a probability of exactly $0.5$.
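
As a quick check of the arithmetic, this sketch converts a probability to log odds and back again. The helper names `log_odds` and `logistic` are chosen here for illustration:

```python
import numpy as np

def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return np.log(p / (1 - p))

def logistic(t):
    """Inverse of log_odds: maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-t))

p_pass = 0.75
t = log_odds(p_pass)

print(round(t, 2))            # log odds of passing
print(round(logistic(t), 2))  # recovers the original probability
```

The round trip shows that the logistic function and the log odds are inverses of one another, which is why one can be used to move between a linear scale and a probability scale.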

For every value taken by the predictor variable $x$ (the percentage of classes attended) the log odds of a student passing can be calculated. And because the log odds are a continuous number rather than a category, they can be modelled as a linear function of $x$, just as in a standard linear regression model:

$$y =\ln{\frac{p(1)}{p(0)}} = \beta_{0} + \beta_{1}x$$

where $1$ represents passing the exam and $0$ represents failing. By estimating the $\beta_{0}$ and $\beta_{1}$ parameters and inserting them into the logistic function, we can therefore 'squash' our regression outputs into the $(0,1)$ range, and interpret the results as probabilities. When the predicted probability is greater than $0.5$, the case is assigned to one category (1 = pass); when it is less than $0.5$, it is assigned to the other (0 = fail).
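
To make this concrete, here is a minimal sketch that fits a logistic regression to synthetic attendance data and then reproduces its probabilities by hand. The data, the 'true' parameters (-4.0 and 0.08), and the variable names are all assumptions made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: attendance percentages and pass/fail outcomes drawn
# from a known logistic relationship (invented for this example)
attendance = rng.uniform(0, 100, size=500)
true_p = 1 / (1 + np.exp(-(-4.0 + 0.08 * attendance)))
passed = rng.binomial(1, true_p)

# Note: scikit-learn applies L2 regularisation by default
clf = LogisticRegression(max_iter=1000).fit(attendance.reshape(-1, 1), passed)
beta_0, beta_1 = clf.intercept_[0], clf.coef_[0][0]

# Plug the estimated parameters into the logistic function by hand
x_new = np.array([30.0, 90.0])
p_manual = 1 / (1 + np.exp(-(beta_0 + beta_1 * x_new)))
p_sklearn = clf.predict_proba(x_new.reshape(-1, 1))[:, 1]

print(p_manual, p_sklearn)
print((p_manual > 0.5).astype(int), clf.predict(x_new.reshape(-1, 1)))
```

The hand-computed probabilities should match those from `predict_proba`, and thresholding them at $0.5$ should give the same labels as `predict`.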

## Evaluating model performance on categorical data

Four metrics are typically used to assess model performance when the predicted outcome is categorical: *precision*, *recall*, *accuracy*, and the *F1 score*. These all take values between $0$ and $1$, where $1$ is a perfect score.

1. Precision: The precision score of a model measures what fraction of the positive predictions made by the model are actually positive cases. It's calculated as follows, where $TP$ means 'True Positive' and $FP$ means 'False Positive':

$$Precision = \frac{TP}{TP+FP}$$

2. Recall: The recall score of a model measures what fraction of the actual positive cases the model correctly identifies. It's calculated as follows, where $FN$ means 'False Negative':

$$Recall = \frac{TP}{TP+FN}$$

3. Accuracy: The accuracy score of a model measures what fraction of correct predictions the model makes relative to the total number of predictions it makes. It's calculated as follows, where $TN$ means 'True Negative':

$$Accuracy = \frac{TN+TP}{TN+FP+TP+FN}$$

4. F1: The F1 score of a model is the harmonic mean of precision and recall. It's used because it gives a way of capturing both metrics in a single score. It's calculated as follows:

$$F1 = \frac{2TP}{2TP+FP+FN}$$

Which metric is chosen to evaluate the model depends on the purpose it's being used for. But in practice, the $F1$ score is often the most useful single measure, as it balances precision and recall and so gives a more rounded appreciation of model performance.
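
The four formulas can be checked against scikit-learn on a small example. This is a sketch using made-up labels and predictions (`y_true` and `y_pred` are invented for illustration):

```python
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score

# Made-up true labels and model predictions
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Count the four cells of the confusion matrix by hand
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tn + tp) / (tn + fp + tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

print(precision, recall, accuracy, f1)
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred),
      accuracy_score(y_true, y_pred), f1_score(y_true, y_pred))
```

Both lines of output should agree, which confirms that the library functions implement the formulas given above.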


## Logistic regression and NLP

There are lots of situations in NLP where logistic regression is useful. For instance, we might want to classify whether or not an email is spam based on the language used, or classify comments as being toxic or not. Here, we're going to build a profanity detector. That is, we will train a logistic regression classifier to evaluate whether or not a word is a profanity, with 'profanity' being understood to include terms of racial, sexual, religious, and other forms of abuse. We will do this in the following way:

1. Obtain a list of profanity words and non-profanity words and label them, where $1$ denotes a profanity and $0$ a non-profanity.
2. Get word embeddings for these words using a model pre-trained on Twitter data.
3. Train a logistic regression model on a fraction of this dataset.
4. Evaluate our model's performance on the retained test sample.
5. See how our model performs in the wild.

In [None]:
model = api.load("glove-twitter-200")

In [None]:
profanity_list = pd.read_csv("/content/AI_for_complex_problems/profanity_en.csv")



In [None]:
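# Positive class: profanity words that have an embedding in the GloVe model
profanity_df = pd.DataFrame()
profanity_df['word'] = [i for i in profanity_list['text'] if i in model.key_to_index]
profanity_df['category'] = 1

# Negative class: random words from the VAD lexicon, excluding known profanities
# so that profane words are not accidentally labelled as non-profane
profane_set = set(profanity_list['text'].str.lower())
candidates = [i for i in data.index if i not in profane_set]
non_profanity = random.sample(candidates, len(profanity_df))
notprofanity_df = pd.DataFrame()
notprofanity_df['word'] = [i for i in non_profanity if i in model.key_to_index]
notprofanity_df['category'] = 0

dataset = pd.concat([profanity_df, notprofanity_df], axis = 0).reset_index(drop = True)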



In [None]:
vectors = []

for i in dataset['word']:
    vectors.append(model[i])

vecs_df = pd.DataFrame(vectors)
dataset = pd.concat([dataset, vecs_df], axis = 1)
dataset = dataset.reset_index(drop = True)
dataset = dataset.drop('word', axis = 1)

In [None]:
dataset

In [None]:
X = dataset.drop(['category'], axis = 'columns')
y = dataset['category']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
clf = LogisticRegression(random_state=0).fit(X_train, y_train) #This fits the model to the training data
preds = clf.predict(X_test) # This outputs the model predictions on the withheld test data

In [None]:
print(classification_report(y_test, preds))

In [None]:
test_cases = ['snot', 'scum', 'fuck', 'turd', 'piss', 'commie', 'nazi', 'bloody', 'peasant', 'bigot', 'apple',\
              'zebra', 'cloud', 'christian', 'muslim', 'jew', 'slut', 'hound', 'merde', \
              'scheisse', 'rando', 'mierda', 'mutt', 'sexy', 'angel', 'devil']

profanity_words = [i.lower() for i in profanity_list['text']]

test_cases = [i for i in test_cases if i not in profanity_words]

In [None]:
in_data = []
tests = []
confidence = []

for i in test_cases:
    try:
        vec = model[i.lower()].reshape(1, -1)
    except KeyError:
        continue  # skip words with no embedding in the GloVe vocabulary
    tests.append(clf.predict(vec)[0])
    confidence.append(clf.predict_proba(vec)[0].max())
    in_data.append(i)

tests_w = []
for i in tests:
    if i == 0:
        tests_w.append('not profanity')
    elif i == 1:
        tests_w.append('profanity')
tests_df = pd.DataFrame()
tests_df['word'] = in_data
tests_df['prediction'] = tests_w
tests_df['confidence'] = confidence

In [None]:
tests_df
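
The `confidence` column above is the larger of the two class probabilities returned by `predict_proba`. As a rough sketch of how that relates to the predicted label, the following trains a classifier on synthetic two-dimensional vectors standing in for word embeddings; the data and the names `new_points`, `proba`, and `predictions` are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic two-dimensional "embeddings": class 0 centred at -1, class 1 at +1
X = np.vstack([rng.normal(-1, 1, size=(50, 2)), rng.normal(1, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)

new_points = np.array([[2.0, 2.0], [0.1, -0.1]])
proba = clf.predict_proba(new_points)             # one column per entry in clf.classes_
predictions = clf.classes_[proba.argmax(axis=1)]  # label with the highest probability
confidence = proba.max(axis=1)                    # that highest probability

print(predictions, confidence)
```

In the binary case, a confidence close to $0.5$ means the model is close to undecided, so predictions with low confidence deserve more scrutiny than the label alone suggests.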