Code Author: Tanzid Sultan

**Simple Natural Language Processing Models** 

`Bag of words`: This is a simple way of creating numerical representation of text data. Given a vocabulary, i.e. a set of words, we can use one-hot encoding to assign a vector to each word. The dimensions of the vector equals the size of the vocabulary. The vector corresponding to any given word has a single `1` and the rest of the elements are zeros. 

Then given a sentence, we can contruct a corresponding vector by adding up the vectors for each unique word in that sentence.

This is demonstrated in the example below. 

In [5]:
import numpy as np

# all words lower-case for convenience
vocabulary = ['loves', 'cute','chocolate', 'is', 'cat', 'my', 'the', 'eating', 'i', 'four']

# create onehot encoded word vectors and store them in a dictionary
onehots = {}
for i, word in enumerate(vocabulary):
    word_vector = np.zeros(shape=(len(vocabulary)))
    word_vector[i] = 1
    onehots[word] = word_vector


# test sentence
sentence = ['my', 'cat', 'loves', 'eating', 'chocolate']
sentence = set(sentence) # to remove duplicate words

# vector representation of the test sentence
sentence_vec = np.zeros(shape=(len(vocabulary)))
for word in sentence:
    sentence_vec += onehots[word]

print(f"Sentence vector: {sentence_vec}")

Sentence vector: [1. 0. 1. 0. 1. 1. 0. 1. 0. 0.]


Encoding `IMDB movie reviews`: We will now read in text from a file containing IMDB movie reviews and create onehot encoded vocabulary. We will also read in text from a file containing the movie rating corresponding to these reviews.

In [13]:
# open file and store every line of text in a list (each line is a separate movie review)
f = open('reviews.txt')
raw_reviews = f.readlines()
f.close()

f = open('labels.txt')
raw_labels = f.readlines()
f.close()

In [28]:
# create a list of unique words pulled from every review (map iterates over every review and uses the lambda function to extract all unique words from the review)

reviews_unique_words = list(map(lambda x: set(x.split(" ")), raw_reviews))

# now create a vocabulary from all unique words across all the reviews
vocab = set()
for review in reviews_unique_words:
    for word in review:
        if(len(word) > 0):
            vocab.add(word)
vocab = list(vocab)

# enumerate the words in the vocabulary and store the indices in a dictionary
word_index = {}
for i, word in enumerate(vocab):
    word_index[word] = i

# now store the indices of all unique words appearing in a review
input_dataset = []
for review in reviews_unique_words:
    review_indices = []
    for word in review:
        try:
            review_indices.append(word_index[word])
        except:
            ""    
    input_dataset.append(review_indices)

# now store all the movie rating in a list
target_dataset = []
for rating in raw_labels:
    if(rating  == 'positive\n'):
        target_dataset.append(1)
    else:
        target_dataset.append(0)

We will now build a simple 3 layer neural network and train it to predict the target label (i.e. positive or negative) for a given review text. We will use a sigmoid activation function in the hidden layer. Since the input is the one-hot encoded vector corresponding to review, it ois a vector of size equal to the vocabulary size and is filled with mostly 0s and a relatively small number of 1s. So, instead of doing a vector multiplication of this input vector with the weights matrix, we will simply sum up only the weight components corresponding to the non-zero components of the input vector, avoiding the unnecessary multiplications by 0s. Our input dataset is set up in a way that will facilitate this (since the inputs are just a list of all the position indices of 1s)  

In [49]:
import numpy as np

'''
    Input layer class: Input layer does not perform any operations
'''
class input_layer(object):
    '''
        class constructor
    '''
    def __init__(self) -> None:
        pass

    ''' 
        Input layer forward pass
    '''
    def forward(self, L_0):
        self.L_0 = L_0
        return self.L_0
    
''' 
    Hidden layer class: Hidden layer performs 2 operations. First it performs matrix multiplication
                        of inputs L_0 with weights W_0. Then it operates on this result with the Relu
                        function.
'''    
class hidden_layer(object):
    '''
        class constructor
    '''
    def __init__(self, W) -> None:
        self.W = W
        self.W_grad = np.zeros_like(W)

    ''' 
        Hidden layer forward propagation
    '''
    def forward(self, L): 
        self.L = L
        return self.forward_matrix_mult()

    def forward_matrix_mult(self):
        # instead of matrix multiplication, we just sum up the relevant weight components
        self.Z = np.sum(self.W(self.L), axis = 0)
           
        return self.forward_sigmoid()
    
    def forward_sigmoid(self):
        return sigmoid(self.Z)
   
    ''' 
        Hidden layer backpropagation of derivatives
    '''
    def backward(self, D):
        self.backward_sigmoid(D)

    def backward_sigmoid(self, D):
        # dE/dZ
        dE_dZ = D * sigmoid_deriv(self.Z) 
        self.backward_matrix_mult(dE_dZ)

    def backward_matrix_mult(self, D):
        # dE/dW0
        self.W_grad = np.dot((self.L).T, D)

    ''' 
        Gradient descent optimization of hidden layer weights
    '''
    def update_weights(self, alpha):
        self.W -= alpha * self.W_grad

       

''' 
    Ouput layer class: Performs two operations, first matrix multiplication of inputs L_1 with weights
                       W_1. This result is then operated on by squared error function.  
'''
class output_layer(object):
    
    ''' 
        class constructor
    '''

    def __init__(self, W) -> None:
        self.W = W
        self.W_grad = np.zeros_like(W)

    ''' 
        Output layer forward propagation
    '''
    def forward(self, L, Y, soft):
        self.L = L
        self.Y = Y
        return self.forward_matrix_mult(soft)

    def forward_matrix_mult(self, soft):
        self.P = np.dot(self.L, self.W) 
        if(soft):
            self.P = softmax(self.P)
        
        return self.P, self.forward_error()
 
    def forward_error(self):
        return np.sum((self.P - self.Y)**2) / self.P.shape[0]

    '''     
        Output layer backpropagation of derivatives
    '''
    def backward(self):
        return self.backward_error()

    def backward_error(self):
        # dE/dP
        dE_dP = 2*(self.P - self.Y) / self.P.shape[0]
        return self.backward_matrix_mult(dE_dP)

    def backward_matrix_mult(self, D):
        # dE/dW1
        self.W_grad = np.dot((self.L).T, D)
        # dE/dL1
        dE_dL = np.dot(D, (self.W).T)
        return dE_dL
    
    ''' 
        Gradient descent optimization of output layer weights
    '''
    def update_weights(self, alpha):
        self.W -= alpha * self.W_grad

'''
    A 3-layer neural network class
'''
class three_layer_net(object):
    ''' 
        class constructor: Takes in the following parameters- number of neurons in input layer (which is the number of feature attributes for each instance), number of hidden layers (has to be at least 1 and can be arbitrarily large), number of neurons in the output layer (which is the number of target attributes) and gradient descent step-size (alpha)
    '''
    def __init__(self, input_neurons, hidden_neurons, output_neurons) -> None:
        self.input_neurons  = input_neurons
        self.hidden_neurons = hidden_neurons
        self.output_neurons = output_neurons
        
        np.random.seed(1)
        # initialize weights W0 between input layer and hidden layer 
        W0 = 0.2*np.random.random(size=(input_neurons, hidden_neurons)) - 0.1
        # initialize weights W1 between hidden layer and output layer
        W1 = 0.2*np.random.random(size=(hidden_neurons, output_neurons)) - 0.1 

        # initialize layer objects
        self.layer_0 = input_layer()
        self.layer_1 = hidden_layer(W0)
        self.layer_2 = output_layer(W1)

    ''' 
        neural network forward pass
    '''
    def forward_net(self, L0, Y, soft):
        # input layer forward pass
        self.L0 = self.layer_0.forward(L0) 
        # hidden layer forward pass 
        self.L1 = self.layer_1.forward(self.L0) 
        # output layer forward pass
        self.L2, error = self.layer_2.forward(self.L1, Y, soft) 

        return self.L2, error

    ''' 
        neural network backward pass
    ''' 
    def backward_net(self):
       # output layer backpropagation
       D = self.layer_2.backward() 
       # hidden layer backpropagation
       self.layer_1.backward(D) 

    '''     
        weight optimization
    '''
    def optimize(self, alpha):
        # update output layer weights
        self.layer_2.update_weights(alpha)
        # update hidden layer weights
        self.layer_1.update_weights(alpha)

    '''     
        train the network
    ''' 
    def train(self, X_train, y_train, X_test, y_test, alpha, niters=1, soft=False):
        print(f"Softmax Enabled: {soft}")
        print(f"Alpha: {alpha}")
        print("Training in progress...")
        #training iterations
        for i in range(niters):
            total_error = 0.0
            train_correct_count = 0
            # train using batch of instances
            for j in range(len(X_train)):

                X = X_train[j]
                y = y_train[j]

                # forward propagation
                prediction, error = self.forward_net(X, y, soft)
                total_error += error
                
                train_correct_count += int(np.abs(prediction-y) < 0.5)
                
                #if(i == (niters-1)):
                #    print(f"Instance# {j+1}, Target: {y}, Prediction: {prediction}")

                # backpropagation
                self.backward_net()

                # weight optimization
                self.optimize(alpha)

            # predict using test instances
            test_correct_count = 0
            for j in range(len(X_test)):
                X = X_test[j]
                y = y_test[j]

                # forward propagation
                prediction, error = self.forward_net(X, y, soft=False)
                test_correct_count += int(np.abs(prediction-y) < 0.5)

            print(f"Iteration# {i+1}, Total error: {total_error}, Training accuracy: {train_correct_count/len(y_train)}, Testing accuracy: {test_correct_count/len(y_test)}")



def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-1.0 * x))

def sigmoid_deriv(x):
    return sigmoid(x) * (1.0 - sigmoid(x)) 

def softmax(x): 
    ex = np.exp(x)
    return ex/np.sum(ex, axis = 1, keepdims = True)  

In [50]:
# initialize a three layer network object
input_neurons = len(vocab)
hidden_neurons = 100
output_neurons = 1 
net = three_layer_net(input_neurons, hidden_neurons, output_neurons)

In [51]:
# preprocess the training and testing datasets
nreviews = len(input_dataset)
X_train = input_dataset[:nreviews-1000]
y_train = target_dataset[:nreviews-1000]
X_test = input_dataset[nreviews-1000:]
y_test = target_dataset[nreviews-1000:]

# train the network with the reviews dataset
net.train(X_train, y_train, X_test, y_test, alpha = 0.01, niters = 2, soft = True)

Softmax Enabled: True
Alpha: 0.01
Training in progress...


TypeError: 'numpy.ndarray' object is not callable