<h1> Comment Toxicity Detection </h1>
<p> In this project I will create a deep neural network which can hopefully accurately detect whether a comment made online was "toxic" or not. There are 6 categorical values that a comment can have, being "toxic", "severe toxic", "obscene", "threat", "insult" and "identity hate". </p>

<h3> Importing libraries needed and the Dataset </h3>

In [2]:
import tensorflow as tf
import os
import pandas as pd
import numpy as np
df = pd.read_csv("./archive/train.csv")
df.head()

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,0000997932d777bf,Explanation\nWhy the edits made under my usern...,0,0,0,0,0,0
1,000103f0d9cfb60f,D'aww! He matches this background colour I'm s...,0,0,0,0,0,0
2,000113f07ec002fd,"Hey man, I'm really not trying to edit war. It...",0,0,0,0,0,0
3,0001b41b1c6bb37e,"""\nMore\nI can't make any real suggestions on ...",0,0,0,0,0,0
4,0001d958c54c6e35,"You, sir, are my hero. Any chance you remember...",0,0,0,0,0,0


<h3> Data Preprocessing </h3>

<p> I will start off by preprocessing the text data. This includes lower-casing, removing stop words, punctuation removal and then using TextVectorization, which converts words to integers to be used by the model. I will also do other steps like splitting training and testing data.</p>

In [3]:
from nltk.corpus import stopwords
from tensorflow.keras.layers import TextVectorization

stop = stopwords.words('english')

def remove_stop_words(s):
    return ' '.join(word for word in s.split() if word not in stop)

df['comment_text'] = df['comment_text'].apply(remove_stop_words)


X = df['comment_text']
y = df[df.columns[2:]].values

MAX_FEATURES = 200000 #number of words in the vocab

vectorizer = TextVectorization(max_tokens=MAX_FEATURES, output_sequence_length=1800, output_mode='int', standardize='lower_and_strip_punctuation')
vectorizer.adapt(X.values)
vectorized_text = vectorizer(X.values)
vectorized_text #each integer represents a word in the vocab

Metal device set to: Apple M1 Pro


2023-07-19 18:15:13.395363: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz


<tf.Tensor: shape=(159571, 1800), dtype=int64, numpy=
array([[  591,   140,    59, ...,     0,     0,     0],
       [    1,   145,  2465, ...,     0,     0,     0],
       [  358,   378,    19, ...,     0,     0,     0],
       ...,
       [32414,  7384,   314, ...,     0,     0,     0],
       [   27,   477,    13, ...,     0,     0,     0],
       [   27,     2,    66, ...,     0,     0,     0]])>

In [4]:
dataset = tf.data.Dataset.from_tensor_slices((vectorized_text, y))
dataset = dataset.cache() #improve performance
dataset = dataset.shuffle(160000) # prevent overfitting in case data is arranged in specific way
dataset = dataset.batch(16) # each batch has 16 data points
dataset = dataset.prefetch(8) # while model works on one batch, tensorflow can preload others so theres no bottleneck

In [5]:
dataset.as_numpy_iterator().next() #view the first batch that will be fed into training model

(array([[   6,  387,    2, ...,    0,    0,    0],
        [1686,  882,  505, ...,    0,    0,    0],
        [  19, 1608,   80, ...,    0,    0,    0],
        ...,
        [1584, 7438, 1381, ...,    0,    0,    0],
        [3321,    4, 3321, ...,    0,    0,    0],
        [1140,  356,   12, ...,    0,    0,    0]]),
 array([[0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]]))

In [6]:
training_data = dataset.take(int(len(dataset) * 0.7)) #take 70% of the data to use for training
validation_data = dataset.skip(int(len(dataset)*.7)).take(int(len(dataset)*.2))
testing_data = dataset.skip(int(len(dataset)*.9)).take(int(len(dataset)*.1))

<h3> Create and Train Neural Network </h3>

In [7]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Bidirectional, Dense, Embedding

In [8]:
model = Sequential()
#create embedding layer which is able to capture relationship between words. Words closer together have more similar meaning.
#might have been better if i had used Word2Vec but i wanted to try creating my own Embedding Layer
model.add(Embedding(MAX_FEATURES+1, 32))
#bidirectional lstm is important for nlp - eg phrases like "i don't hate you". need to remember earlier words and consider both directions. 
#lstm addresses issues from traditional rnn like exploading/vanishing gradient problem.
model.add(Bidirectional(LSTM(32, activation='tanh')))
#feature extraction
model.add(Dense(128, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
#since final data for 6 categories need to be between 0 and 1
model.add(Dense(6, activation='sigmoid')) 

In [9]:
model.compile(loss="BinaryCrossentropy", optimizer="Adam")

In [10]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 32)          6400032   
                                                                 
 bidirectional (Bidirectiona  (None, 64)               16640     
 l)                                                              
                                                                 
 dense (Dense)               (None, 128)               8320      
                                                                 
 dense_1 (Dense)             (None, 256)               33024     
                                                                 
 dense_2 (Dense)             (None, 128)               32896     
                                                                 
 dense_3 (Dense)             (None, 6)                 774       
                                                        

In [11]:
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    monitor='val_loss',  # which metric to monitor.
    min_delta=0,  # minimum change to qualify as an improvement.
    patience=1,  # number of epochs with no improvement to stop training.
    verbose=1,  # print messages.
    restore_best_weights=True  # restore the best weights from the epoch with the best monitored metric.
)


history = model.fit(training_data, epochs = 3, validation_data = validation_data, callbacks = early_stopping)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<h3> Try making some predictions </h3>

<p> Using the example below - the text is detected as toxic, obscene and  an insult. However, it's not considered a threat, severely tocic or identity hate </p>

In [12]:
input_text = vectorizer("You suck! Balls!")

In [13]:
model.predict(np.expand_dims(input_text, 0)) > 0.5



array([[ True, False,  True, False,  True, False]])

In [14]:
df.columns[2:] 

Index(['toxic', 'severe_toxic', 'obscene', 'threat', 'insult',
       'identity_hate'],
      dtype='object')

<h3> Evaluating the Model's Performance </h3>

In [15]:
from tensorflow.keras.metrics import Precision, Recall, CategoricalAccuracy

precision = Precision() #lower value shows fewer false positives
recall = Recall() #lower value shows fewer false negatives
accuracy = CategoricalAccuracy() # correct predictions / total predictions

In [25]:
%%capture

for batch in testing_data.as_numpy_iterator():
    X_test_batch, y_test_batch = batch
    
    predict = model.predict(X_test_batch)
    print(predict.shape)
    y_test_batch = y_test_batch.flatten() #true values
    predict = predict.flatten() #predicted values
    
    precision.update_state(y_test_batch, predict)
    recall.update_state(y_test_batch, predict)
    accuracy.update_state(y_test_batch, predict)

In [26]:
print(f'Precision: {precision.result().numpy()}, Recall:{recall.result().numpy()}, Accuracy:{accuracy.result().numpy()}')

Precision: 0.8509753942489624, Recall:0.7661963701248169, Accuracy:0.49949848651885986


<p> The precision is a higher than recall, which means the model is making a lot of false negatives. While trying the model out, I noticed that the model rarely predicts categories like "threat" since they do not occur frequently in the training dataset. A possible improvement would be to assign a higher weight to minority categories. </p>

<h3> Sharing Model using Gradio </h3>

In [27]:
!pip install gradio jinja2



In [28]:
import gradio as gr

In [29]:
model.save("toxicity.h5")
model = tf.keras.models.load_model('toxicity.h5')

In [30]:
def score_comment(comment):
    vectorized_comment = vectorizer([comment])
    results = model.predict(vectorized_comment)
    
    text = ''
    for idx, col in enumerate(df.columns[2:]):
        text += '{}: {}\n'.format(col, results[0][idx]>0.5)
    
    return text

In [32]:
interface = gr.Interface(fn=score_comment, 
                         inputs=gr.inputs.Textbox(lines=2, placeholder='Comment to score'),
                        outputs='text')

In [33]:
interface.launch()

Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.




In [34]:
for col in df.columns[2:]:
    print(df[col].value_counts())

toxic
0    144277
1     15294
Name: count, dtype: int64
severe_toxic
0    157976
1      1595
Name: count, dtype: int64
obscene
0    151122
1      8449
Name: count, dtype: int64
threat
0    159093
1       478
Name: count, dtype: int64
insult
0    151694
1      7877
Name: count, dtype: int64
identity_hate
0    158166
1      1405
Name: count, dtype: int64
