Several neural network architecture choices guide how I build this model.
My first choice of NNDL algorithm is the Long Short-Term Memory (LSTM) network. LSTM networks add a memory cell whose gates decide what to write, keep, and output, so they can carry information across long gaps in a sequence.
We have a large amount of text, and only some of the words in each comment are relevant to its labels, so an LSTM is a good fit. Because this is a multilabel problem (any combination of the six labels can apply to a comment), the last layer of the network must use a sigmoid activation and the loss function must be binary_crossentropy. I will try different optimizers and metrics to find the best model. Note that the metric only changes what is reported during training; it does not change how the weights are updated.
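The memory-cell idea can be made concrete with a single LSTM step. The sketch below is a minimal scalar version (hidden size 1) of the standard LSTM gate equations in plain Python. The weight values are hypothetical, picked only to show that a forget gate close to 1 and an input gate close to 0 let the cell hold its memory over many steps; Keras's `LSTM` layer computes a vectorized version of these equations with learned weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell (hidden size 1).

    w maps each gate name ('i', 'f', 'o', 'c') to hypothetical weights
    (W, U, b) applied to the input x and the previous hidden state h_prev.
    """
    def pre(gate):
        W, U, b = w[gate]
        return W * x + U * h_prev + b

    i = sigmoid(pre("i"))          # input gate: how much new content to write
    f = sigmoid(pre("f"))          # forget gate: how much old memory to keep
    o = sigmoid(pre("o"))          # output gate: how much memory to expose
    c_tilde = math.tanh(pre("c"))  # candidate memory content
    c = f * c_prev + i * c_tilde   # updated memory cell
    h = o * math.tanh(c)           # new hidden state
    return h, c

# Hypothetical weights: forget gate nearly open, input gate nearly closed
weights = {g: (0.0, 0.0, 0.0) for g in "ifoc"}
weights["f"] = (0.0, 0.0, 10.0)
weights["i"] = (0.0, 0.0, -10.0)

h, c = 0.0, 1.0  # start with something stored in the memory cell
for x in [0.5] * 50:
    h, c = lstm_step(x, h, c, weights)
print(round(c, 3))  # the stored value survives 50 steps almost unchanged
```

With these weights the cell value decays only by the forget-gate factor at each step, which is the mechanism that lets an LSTM bridge long gaps.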

In [57]:
#Libraries
import sys, os, re, csv, codecs, numpy as np, pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Input, LSTM, Embedding, Dropout, Activation
from keras.layers import Bidirectional, GlobalMaxPool1D
from keras.models import Model
from keras import initializers, regularizers, constraints, optimizers, layers
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

In [None]:
#Colab-specific code to set up the environment and read in the CSV files
#Mount my Google Drive
from google.colab import drive 
drive.mount("/content/drive", force_remount=True)
# See the list of files in this local folder 
!ls -l '/content/drive/My Drive/Colab Notebooks'
# Change the working directory to Colab Notebooks
import os
os.chdir('/content/drive/My Drive/Colab Notebooks')

# Ensure the files are there by listing
!ls -l

In [59]:
#Reading in the train.csv
train = pd.read_csv('/content/drive/My Drive/Colab Notebooks/train.csv')

#Reading in the test.csv
test = pd.read_csv('/content/drive/My Drive/Colab Notebooks/test.csv')


**Pre-Processing Data**

In [60]:
#Labels to be classified
list_classes = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
y = train[list_classes].values
list_sentences_train = train["comment_text"]
list_sentences_test = test["comment_text"]

In [61]:
#Tokenization - breaking sentences down into words, or in this case tokens
#Indexing - putting the words in a dictionary-like structure that gives each word an index, e.g. {"i": 1, "love": 2, "cats": 3, "and": 4, "dogs": 5}
#Index representation - representing each comment as the sequence of its word indices, which is what we feed into the LSTM, e.g. "I love cats and love dogs" -> [1, 2, 3, 4, 2, 5]

#Keep only the most frequent words: texts_to_sequences drops any word whose index is max_features or higher
max_features = 10000
tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(list(list_sentences_train))
list_tokenized_train = tokenizer.texts_to_sequences(list_sentences_train)
list_tokenized_test = tokenizer.texts_to_sequences(list_sentences_test)
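A rough plain-Python sketch of what `fit_on_texts` and `texts_to_sequences` do with `num_words`. It is a simplification, not Keras's code: the word-splitting regex is an assumption (Keras filters a fixed set of punctuation characters and splits on spaces), but the frequency ranking starting at index 1 and the rule that only words with an index below `num_words` are kept follow the Tokenizer's documented behavior. `fit_word_index` and `to_sequences` are hypothetical helpers.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z0-9']+")  # assumed, simplified word pattern

def fit_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    # most_common() sorts by count; ties keep first-seen order
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequences(texts, word_index, num_words):
    """Replace words by indices, dropping unknown words and any index >= num_words."""
    sequences = []
    for text in texts:
        words = WORD_RE.findall(text.lower())
        sequences.append([word_index[w] for w in words
                          if w in word_index and word_index[w] < num_words])
    return sequences

texts = ["I love cats and dogs", "I love love cats"]
index = fit_word_index(texts)
print(index)
print(to_sequences(texts, index, num_words=4))
```

With `num_words=4` only the three most frequent words ("love", "i", "cats") survive, which is why `max_features = 10000` keeps at most 9,999 distinct words.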

In [62]:
list_tokenized_train[:1]


[[688, 75, 1, 126, 130, 177, 29, 672, 4511, 1116, 86, 331, 51, 2278, 50,
  6864, 15, 60, 2756, 148, 7, 2937, 34, 117, 1221, 2825, 4, 45, 59, 244, 1,
  365, 31, 1, 38, 27, 143, 73, 3462, 89, 3085, 4583, 2273, 985]]

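Comments have different lengths, but the model needs inputs of one fixed length, so I will use padding: shorter sequences are filled with zeros up to the chosen length (maxlen), and longer ones are truncated to maxlen. By default, pad_sequences adds the zeros at the start of a sequence and also cuts overly long sequences from the start.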


In [63]:
#I have set the max length to be 200.
maxlen = 200
X_t = pad_sequences(list_tokenized_train, maxlen=maxlen)
X_te = pad_sequences(list_tokenized_test, maxlen=maxlen)
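A minimal sketch of the padding and truncation described above, mirroring the `pad_sequences` defaults (`padding='pre'`, `truncating='pre'`): zeros go at the front, and a sequence longer than `maxlen` keeps only its last `maxlen` tokens. `pad_pre` is a hypothetical helper, not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Pad/truncate each sequence to maxlen the way pad_sequences does by default."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                        # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)  # zeros at the front
    return padded

print(pad_pre([[1, 2], [3, 4, 5, 6, 7]], maxlen=4))
# [[0, 0, 1, 2], [4, 5, 6, 7]]
```

Passing `padding='post'` or `truncating='post'` to `pad_sequences` moves the zeros or the cut to the end of the sequence instead.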

In [64]:
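#Number of tokens in each training comment, useful for checking how many comments fit within maxlen
totalNumWords = [len(one_comment) for one_comment in list_tokenized_train]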
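`totalNumWords` is not used further in this notebook. One way to use it, sketched here with a hypothetical `coverage` helper and made-up lengths, is to check what share of comments fits within `maxlen` without truncation. In the notebook you would call `coverage(totalNumWords, maxlen)`, or plot `plt.hist(totalNumWords)`, to judge whether 200 is a reasonable cutoff.

```python
def coverage(lengths, maxlen):
    """Fraction of comments whose token count is at most maxlen (no truncation)."""
    return sum(1 for n in lengths if n <= maxlen) / len(lengths)

toy_lengths = [12, 45, 80, 150, 199, 200, 240, 600]  # hypothetical comment lengths
print(coverage(toy_lengths, 200))  # 6 of 8 comments fit -> 0.75
```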


In [65]:
#maxlen=200 as defined earlier
inp = Input(shape=(maxlen, ))


In [66]:
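embed_size = 128
#Layer 1: map each word index to a trainable 128-dimensional vector
x = Embedding(max_features, embed_size)(inp)
#Layer 2: LSTM with 60 units; return_sequences=True keeps the output at every time step, shape (batch, 200, 60)
x = LSTM(60, return_sequences=True,name='lstm_layer')(x)
#Layer 3: take the maximum of each of the 60 features over all time steps, shape (batch, 60)
x = GlobalMaxPool1D()(x)
#Layer 4: randomly drop 10% of the features during training to reduce overfitting
x = Dropout(0.1)(x)
#Layer 5: fully connected layer with 50 ReLU units
x = Dense(50, activation="relu")(x)
#Layer 6: another 10% dropout
x = Dropout(0.1)(x)
#Output layer: activation must be sigmoid, giving one independent probability per label
x = Dense(6, activation="sigmoid")(x)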
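As a sanity check on the architecture, the number of trainable parameters per layer can be worked out by hand. The helper functions are hypothetical; the LSTM formula assumes the standard Keras LSTM (four gates, each with an input kernel, a recurrent kernel, and a bias). `GlobalMaxPool1D` and `Dropout` have no weights. The total should match what `model.summary()` reports for this model.

```python
def embedding_params(input_dim, output_dim):
    return input_dim * output_dim  # one vector per vocabulary index

def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    return input_dim * units + units  # weights plus biases

counts = {
    "embedding": embedding_params(10000, 128),  # max_features x embed_size
    "lstm_layer": lstm_params(128, 60),
    "dense_50": dense_params(60, 50),           # GlobalMaxPool1D outputs 60 features
    "dense_6": dense_params(50, 6),
}
print(counts)
print(sum(counts.values()))
```

Almost all of the weights sit in the embedding table, so `max_features` and `embed_size` dominate the model's size.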



In [67]:
#Initial Model with adam as optimizer and accuracy as metric
model = Model(inputs=inp, outputs=x)
#Loss function is binary_crossentropy
model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
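Why sigmoid plus binary cross-entropy: each of the six outputs is squashed into (0, 1) on its own, so the outputs do not have to sum to 1 as they would with softmax, and the loss averages one yes/no cross-entropy per label. Below is a plain-Python sketch with hypothetical logits and labels; the clipping constant 1e-7 matches Keras's default epsilon, but the code is an illustration, not Keras's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over independent labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

logits = [2.0, -1.0, 1.5, -3.0, 0.5, -2.0]  # hypothetical last-layer inputs
probs = [sigmoid(z) for z in logits]         # each in (0, 1); they need not sum to 1
labels = [1, 0, 1, 0, 0, 0]                  # e.g. a comment that is toxic and obscene
print([round(p, 3) for p in probs])
print(round(binary_crossentropy(labels, probs), 4))
```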

In [68]:
#Number of training examples in a single batch
batch_size = 32
#Train for two epochs; validation_split=0.1 holds out the last 10% of the training data for validation
epochs = 2
model.fit(X_t,y, batch_size=batch_size, epochs=epochs, validation_split=0.1)

Epoch 1/2
Epoch 2/2


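A caveat about the accuracy metric on this kind of data: which accuracy Keras picks for the string `'accuracy'` depends on the Keras version and the output shape, but if it is computed per label (binary accuracy), a model that never predicts a toxic label can still score very high when most labels are 0. The sketch uses toy, hypothetical labels; `binary_accuracy` here is a hand-written helper that thresholds at 0.5 like `keras.metrics.binary_accuracy`.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Share of individual label predictions that are correct after thresholding."""
    hits = total = 0
    for row_t, row_p in zip(y_true, y_prob):
        for t, p in zip(row_t, row_p):
            hits += int((p > threshold) == bool(t))
            total += 1
    return hits / total

# Toy labels: 10 comments x 6 labels, only 3 positive labels overall
y_true = [[0] * 6 for _ in range(10)]
y_true[0][0] = y_true[0][2] = y_true[3][4] = 1

# A useless model that always predicts "not toxic"
y_zero = [[0.01] * 6 for _ in range(10)]
print(binary_accuracy(y_true, y_zero))  # 57 of 60 labels correct -> 0.95
```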

In [69]:
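#Model two has SGD as optimizer instead of adam
#clone_model rebuilds the same architecture with freshly initialized weights;
#Model(inputs=inp, outputs=x) would reuse the layers (and already trained weights) of model
from keras.models import clone_model
model2 = clone_model(model)
#Loss function is binary_crossentropy
model2.compile(loss='binary_crossentropy',
                  optimizer='SGD',
                  metrics=['accuracy'])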
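To show how the two optimizers differ, here are the textbook SGD and Adam update rules applied to a one-dimensional toy problem, minimizing (w - 3)^2. The defaults follow Keras (SGD learning rate 0.01 without momentum; Adam learning rate 0.001, beta_1 0.9, beta_2 0.999, epsilon 1e-7), though Keras's Adam applies epsilon in a slightly different place. `sgd_step` and `adam_step` are hypothetical helpers.

```python
import math

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: step proportional to the gradient."""
    return w - lr * grad

def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update with bias correction; state is (m, v, t)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), (m, v, t)

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_sgd = w_adam = 0.0
state = (0.0, 0.0, 0)
for _ in range(200):
    w_sgd = sgd_step(w_sgd, 2 * (w_sgd - 3))
    w_adam, state = adam_step(w_adam, 2 * (w_adam - 3), state)
print(round(w_sgd, 3), round(w_adam, 3))
```

Plain SGD's step scales with the gradient, so it moves fast when the gradient is large; Adam divides by a running estimate of the gradient's magnitude, so each step is roughly `lr` in size regardless of gradient scale. In this toy run SGD ends close to 3, while Adam, with its smaller default learning rate, has moved only about 0.2.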

In [70]:
batch_size = 32
epochs = 2
model2.fit(X_t,y, batch_size=batch_size, epochs=epochs, validation_split=0.1)

Epoch 1/2
Epoch 2/2



In [71]:
#Model three has SGD as optimizer and AUC as metric
#clone_model gives a fresh copy of the architecture with newly initialized weights
from keras.models import clone_model
model3 = clone_model(model)
#Loss function is binary_crossentropy
model3.compile(loss='binary_crossentropy',
                  optimizer='SGD',
                  metrics=['AUC'])
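ROC AUC measures how well the scores rank positive labels above negative ones, so unlike accuracy it cannot be inflated by always predicting the majority class. Below is a plain-Python sketch of the exact, rank-based definition. Keras's `AUC` metric instead approximates the curve with a fixed set of thresholds (200 by default), and with the default `multi_label=False` it flattens all six labels into a single list before computing one AUC, which is what the flat lists below stand for. The labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """Exact ROC AUC: chance that a random positive outscores a random negative
    (ties count as half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 0, 1, 0, 0, 0]                    # flattened toy labels
constant = [0.01] * 8                                # "always not toxic" model
ranked = [0.1, 0.2, 0.9, 0.05, 0.7, 0.3, 0.1, 0.2]   # positives scored highest
print(roc_auc(y_true, constant))  # 0.5: no ranking ability
print(roc_auc(y_true, ranked))    # 1.0: every positive above every negative
```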

In [72]:
batch_size = 32
epochs = 2
model3.fit(X_t,y, batch_size=batch_size, epochs=epochs, validation_split=0.1)

Epoch 1/2
Epoch 2/2



In [73]:
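#Model four has adam as optimizer and AUC as metric
#clone_model gives a fresh copy of the architecture with newly initialized weights
from keras.models import clone_model
model4 = clone_model(model)
#Loss function is binary_crossentropy
model4.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['AUC'])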


In [56]:
batch_size = 32
epochs = 2
model4.fit(X_t,y, batch_size=batch_size, epochs=epochs, validation_split=0.1)

Epoch 1/2
Epoch 2/2



**Model three, with SGD as the optimizer and AUC as the metric, is the best model.**



