Code Author: Tanzid Sultan

**Simple Natural Language Processing Models** 

`Bag of words`: This is a simple way of creating numerical representation of text data. Given a vocabulary, i.e. a set of words, we can use one-hot encoding to assign a vector to each word. The dimensions of the vector equals the size of the vocabulary. The vector corresponding to any given word has a single `1` and the rest of the elements are zeros. 

Then given a sentence, we can contruct a corresponding vector by adding up the vectors for each unique word in that sentence.

This is demonstrated in the example below. 

In [9]:
import numpy as np

# all words lower-case for convenience
vocabulary = ['loves', 'cute','chocolate', 'is', 'cat', 'my', 'the', 'eating', 'i', 'four']

# create onehot encoded word vectors and store them in a dictionary
onehots = {}
for i, word in enumerate(vocabulary):
    word_vector = np.zeros(shape=(len(vocabulary)))
    word_vector[i] = 1
    onehots[word] = word_vector


# test sentence
sentence = ['my', 'cat', 'loves', 'eating', 'chocolate']
sentence = set(sentence) # to remove duplicate words

# vector representation of the test sentence
sentence_vec = np.zeros(shape=(len(vocabulary)))
for word in sentence:
    sentence_vec += onehots[word]

print(f"Sentence vector: {sentence_vec}")

Sentence vector: [1. 0. 1. 0. 1. 1. 0. 1. 0. 0.]


Encoding `IMDB movie reviews`: We will now read in text from a file containing IMDB movie reviews and create onehot encoded vocabulary. We will also read in text from a file containing the movie rating corresponding to these reviews.

In [10]:
# open file and store every line of text in a list (each line is a separate movie review)
f = open('reviews.txt')
raw_reviews = f.readlines()
f.close()

f = open('labels.txt')
raw_labels = f.readlines()
f.close()

In [11]:
# create a list of unique words pulled from every review (map iterates over every review and uses the lambda function to extract all unique words from the review)

reviews_unique_words = list(map(lambda x: set(x.split(" ")), raw_reviews))

# now create a vocabulary from all unique words across all the reviews
vocab = set()
for review in reviews_unique_words:
    for word in review:
        if(len(word) > 0):
            vocab.add(word)
vocab = list(vocab)

# enumerate the words in the vocabulary and store the indices in a dictionary
word_index = {}
for i, word in enumerate(vocab):
    word_index[word] = i

# now store the indices of all unique words appearing in a review
input_dataset = []
for review in reviews_unique_words:
    review_indices = []
    for word in review:
        try:
            review_indices.append(word_index[word])
        except:
            ""    
    input_dataset.append(review_indices)

# now store all the movie rating in a list
target_dataset = []
for rating in raw_labels:
    if(rating  == 'positive\n'):
        target_dataset.append(1)
    else:
        target_dataset.append(0)

We will now build a simple 3 layer neural network and train it to predict the target label (i.e. positive or negative) for a given review text. We will use a sigmoid activation function in the hidden layer. Since the input is the one-hot encoded vector corresponding to review, it ois a vector of size equal to the vocabulary size and is filled with mostly 0s and a relatively small number of 1s. So, instead of doing a vector multiplication of this input vector with the weights matrix, we will simply sum up only the weight components corresponding to the non-zero components of the input vector, avoiding the unnecessary multiplications by 0s. Our input dataset is set up in a way that will facilitate this (since the inputs are just a list of all the position indices of 1s)  

In [7]:
import numpy as np
import math
from collections import Counter

'''
    Input layer class: Input layer does not perform any operations
'''
class input_layer(object):
    '''
        class constructor
    '''
    def __init__(self) -> None:
        pass

    ''' 
        Input layer forward pass
    '''
    def forward(self, L_0):
        self.L_0 = L_0
        return self.L_0
    
''' 
    Hidden layer class: Hidden layer performs 2 operations. First it performs matrix multiplication
                        of inputs L_0 with weights W_0. Then it operates on this result with the Relu
                        function.
'''    
class hidden_layer(object):
    '''
        class constructor
    '''
    def __init__(self, W) -> None:
        self.W = W
        self.W_grad = np.zeros_like(W)

    ''' 
        Hidden layer forward propagation
    '''
    def forward(self, L): 
        self.L = L
        return self.forward_matrix_mult()

    def forward_matrix_mult(self):
        # instead of matrix multiplication, we just sum up the relevant weight components
        self.Z = np.sum(self.W[self.L], axis = 0)
        self.Z = (self.Z).reshape(1,(self.W).shape[1])   

        return self.forward_sigmoid()
    
    def forward_sigmoid(self):
        return sigmoid(self.Z)
   
    ''' 
        Hidden layer backpropagation of derivatives
    '''
    def backward(self, D):
        self.backward_sigmoid(D)

    def backward_sigmoid(self, D):
        # dE/dZ
        dE_dZ = D * sigmoid_deriv(self.Z) 
        self.backward_matrix_mult(dE_dZ)

    def backward_matrix_mult(self, D):
        # dE/dW0
        self.W_grad = D

    ''' 
        Gradient descent optimization of hidden layer weights
    '''
    def update_weights(self, alpha):
        # only need to update the weights in the relevant rows (i.e. the weights that contribute to the non-zero input components)
        self.W[self.L] -= alpha * self.W_grad

       

''' 
    Ouput layer class: Performs two operations, first matrix multiplication of inputs L_1 with weights
                       W_1. This result is then operated on by squared error function.  
'''
class output_layer(object):
    
    ''' 
        class constructor
    '''

    def __init__(self, W) -> None:
        self.W = W
        self.W_grad = np.zeros_like(W)

    ''' 
        Output layer forward propagation
    '''
    def forward(self, L, Y):
        self.L = L
        self.Y = Y
        return self.forward_matrix_mult()

    def forward_matrix_mult(self):
        self.P = np.dot(self.L, self.W) 
        # apply sigmoid function
        self.P = sigmoid(self.P)
        
        return self.P, self.forward_error()
 
    def forward_error(self):
        return np.sum((self.P - self.Y)**2) / self.P.shape[0]

    '''     
        Output layer backpropagation of derivatives
    '''
    def backward(self):
        return self.backward_error()

    def backward_error(self):
        # dE/dP
        dE_dP = 2*(self.P - self.Y) / self.P.shape[0]
        return self.backward_matrix_mult(dE_dP)

    def backward_matrix_mult(self, D):
        # dE/dW1
        self.W_grad = np.dot((self.L).T, D)
        # dE/dL1
        dE_dL = np.dot(D, (self.W).T)
        return dE_dL
    
    ''' 
        Gradient descent optimization of output layer weights
    '''
    def update_weights(self, alpha):
        self.W -= alpha * self.W_grad

'''
    A 3-layer neural network class
'''
class three_layer_net(object):
    ''' 
        class constructor: Takes in the following parameters- number of neurons in input layer (which is the number of feature attributes for each instance), number of hidden layers (has to be at least 1 and can be arbitrarily large), number of neurons in the output layer (which is the number of target attributes) and gradient descent step-size (alpha)
    '''
    def __init__(self, input_neurons, hidden_neurons, output_neurons) -> None:
        self.input_neurons  = input_neurons
        self.hidden_neurons = hidden_neurons
        self.output_neurons = output_neurons
        
        np.random.seed(1)
        # initialize weights W0 between input layer and hidden layer 
        W0 = 0.2*np.random.random(size=(input_neurons, hidden_neurons)) - 0.1
        # initialize weights W1 between hidden layer and output layer
        W1 = 0.2*np.random.random(size=(hidden_neurons, output_neurons)) - 0.1 

        # initialize layer objects
        self.layer_0 = input_layer()
        self.layer_1 = hidden_layer(W0)
        self.layer_2 = output_layer(W1)

    ''' 
        neural network forward pass
    '''
    def forward_net(self, L0, Y):
        # input layer forward pass
        self.L0 = self.layer_0.forward(L0) 
        # hidden layer forward pass 
        self.L1 = self.layer_1.forward(self.L0) 
        # output layer forward pass
        self.L2, error = self.layer_2.forward(self.L1, Y) 

        return self.L2, error

    ''' 
        neural network backward pass
    ''' 
    def backward_net(self):
       # output layer backpropagation
       D = self.layer_2.backward() 
       # hidden layer backpropagation
       self.layer_1.backward(D) 

    '''     
        weight optimization
    '''
    def optimize(self, alpha):
        # update output layer weights
        self.layer_2.update_weights(alpha)
        # update hidden layer weights
        self.layer_1.update_weights(alpha)

    '''     
        train the network
    ''' 
    def train(self, X_train, y_train, X_test, y_test, alpha, niters=1):
        print(f"Alpha: {alpha}")
        print("Training in progress...")
        #training iterations
        for i in range(niters):
            total_error = 0.0
            train_correct_count = 0
            # train using batch of instances
            for j in range(len(X_train)):

                X = X_train[j]
                y = y_train[j]

                # forward propagation
                prediction, error = self.forward_net(X, y)
                total_error += error
                
                train_correct_count += int(np.abs(prediction-y) < 0.5)
                
                #if(i == (niters-1)):
                #    print(f"Instance# {j+1}, Target: {y}, Prediction: {prediction}")

                # backpropagation
                self.backward_net()

                # weight optimization
                self.optimize(alpha)

            # predict using test instances
            test_correct_count = 0
            for j in range(len(X_test)):
                X = X_test[j]
                y = y_test[j]

                # forward propagation
                prediction, error = self.forward_net(X, y)
                test_correct_count += int(np.abs(prediction-y) < 0.5)

            print(f"Iteration# {i+1}, Total error: {total_error}, Training accuracy: {train_correct_count/len(y_train)}, Testing accuracy: {test_correct_count/len(y_test)}")

    # use trained neural network weights to finds 10 words which may have similar meaning to a target word
    def similar_words(self, target, word_index):
        
        # two words can be considered to have similar meaning if the set of weights connecting one word to all the hidden neurons
        # is similar to the set of weights connecting the other word to all the hidden neurons, The weights for any specific word
        # is just the row corresponding to that word in the W0 weights matrix. To measure similarity between two words, we will
        # compute the euclidean distance between the corresponding two rows of W0  
        if(target in word_index):
            target_row_index = word_index[target]  
            target_weights =  self.layer_1.W[target_row_index]
            similarities = Counter()

            # measure similarity with all words in the vocabulary, and find the 10 words with the higest similarities
            for word,index in word_index.items():
                word_weights = self.layer_1.W[index]
                diff = target_weights - word_weights 
                squared_dist = math.sqrt(np.sum(diff * diff)) 
                
                # define negative distance as similarity, means that most similar words will be the least negative (i.e. distance closest to zero) 
                # also makes it conveninent to extract these values from the Counter object using the most_common method
                similarity = -squared_dist
                similarities[word] = similarity
            
            most_similar_words = similarities.most_common(10)
            return most_similar_words

        else:
            print("ERROR! Target does not exist in vocabulary!")

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-1.0 * x))

def sigmoid_deriv(x):
    return sigmoid(x) * (1.0 - sigmoid(x)) 


In [43]:
# initialize a three layer network object
input_neurons = len(vocab)
hidden_neurons = 100
output_neurons = 1 
net = three_layer_net(input_neurons, hidden_neurons, output_neurons)

# preprocess the training and testing datasets
nreviews = len(input_dataset)
X_train = input_dataset[:nreviews-1000]
y_train = target_dataset[:nreviews-1000]
X_test = input_dataset[nreviews-1000:]
y_test = target_dataset[nreviews-1000:]

# train the network with the reviews dataset
net.train(X_train, y_train, X_test, y_test, alpha = 0.01, niters = 3)

Alpha: 0.01
Training in progress...
Iteration# 1, Total error: 2885.648401665013, Training accuracy: 0.8171666666666667, Testing accuracy: 0.855
Iteration# 2, Total error: 1678.234589679232, Training accuracy: 0.9072916666666667, Testing accuracy: 0.855
Iteration# 3, Total error: 1224.7013356524926, Training accuracy: 0.9335, Testing accuracy: 0.858


In [51]:
net.similar_words('excellent', word_index)

[('excellent', -0.0),
 ('noir', -0.7785588749677791),
 ('rare', -0.7875624448592276),
 ('perfect', -0.7965609387727814),
 ('superb', -0.805372592806488),
 ('amazing', -0.8313731968730242),
 ('wonderfully', -0.8327719264000264),
 ('subtle', -0.8368666697618214),
 ('today', -0.8398486723370631),
 ('incredible', -0.8493702723041733)]

In [52]:
net.similar_words('awful', word_index)


[('awful', -0.0),
 ('poorly', -0.7333796422359578),
 ('worst', -0.8426149699529178),
 ('boring', -0.8509156465208111),
 ('disappointing', -0.8707427207042696),
 ('lacks', -0.8851981068712745),
 ('fails', -0.8890817135905027),
 ('disappointment', -0.89706320449745),
 ('waste', -0.9069090786367358),
 ('mess', -0.9324828669005073)]

In [55]:
net.similar_words('awful', word_index)

net.similar_words('amazing', word_index)

[('amazing', -0.0),
 ('loved', -0.7128342465078742),
 ('noir', -0.7264585180265127),
 ('rare', -0.7411112604421914),
 ('brilliant', -0.7514086206069763),
 ('fascinating', -0.7607812748237435),
 ('perfect', -0.761149648676041),
 ('fantastic', -0.7688480503700348),
 ('touching', -0.770137246064581),
 ('delightful', -0.7714280593197891)]

**`Training a neural network to fill in the blanks in a phrase`**

Next, we will train a neural network to predict a missing word in a phrase (group of consecutive words selected from within a sentence), using the movie reviews data set. The training process involves taking a given phrase from a sentence, removing one of the words and using that word as the target label and optimizing the weights to make this target the most likely prediction. The training is done over every every phrase in a sentenece, over many sentences. In this case, the target label can be any word from the vocabulary, so the output size is very large (compared to the previous example problem of prediciting movie ratings where the output was binary). To make to computations more tractable, we can constrain the output labels to only a (randomly chosen) small subset of the vocabulary. This approximation still yields accurate results.

In [1]:
import sys, math, collections
import numpy as np
import random

In [3]:
f = open('reviews.txt')
raw_reviews = f.readlines()
f.close()

In [4]:
# create a list of unique words pulled from every review (map iterates over every review and uses the lambda function to extract all unique words from the review)
reviews_unique_words = list(map(lambda x: set(x.split(" ")), raw_reviews))

# now create a vocabulary from all unique words across all the reviews
vocab = set()
for review in reviews_unique_words:
    for word in review:
        if(len(word) > 0):
            vocab.add(word)
vocab = list(vocab)

# enumerate the words in the vocabulary and store the indices in a dictionary
word_index = {}
for i, word in enumerate(vocab):
    word_index[word] = i


# now create a list of all reviews/sentences 
reviews_sentences = list(map(lambda x: x.split(" "), raw_reviews))

# create a list of word indices for the sequence of words in each sentence 
input_dataset = []
concatenated = []
for sentence in reviews_sentences:
    sentence_indices = []
    for word in sentence:
        try:
            sentence_indices.append(word_index[word])
            concatenated.append(word_index[word])
        except:
            ""    
    input_dataset.append(sentence_indices)

# convert all the concatenated word indices into a numpy array
concatenated = np.array(concatenated)

# shuffle the odering of the sentences
random.shuffle(input_dataset)

Now we design a three layer neural network for this model

In [117]:
'''
    Input layer class: Input layer does not perform any operations
'''
class input_layer(object):
    '''
        class constructor
    '''
    def __init__(self) -> None:
        pass

    ''' 
        Input layer forward pass
    '''
    def forward(self, L_0):
        self.L_0 = L_0
        return self.L_0
    
''' 
    Hidden layer class: Hidden layer performs 2 operations. First it performs matrix multiplication
                        of inputs L_0 with weights W_0. Then it operates on this result with the Relu
                        function.
'''    
class hidden_layer(object):
    '''
        class constructor
    '''
    def __init__(self, W) -> None:
        self.W = W
        self.W_grad = np.zeros_like(W)

    ''' 
        Hidden layer forward propagation
    '''
    def forward(self, L): 
        self.L = L
        return self.forward_matrix_mult()

    def forward_matrix_mult(self):
        # due to the sparsity of the input vector (because it has mostly 0s except for a few 1s at the positions corresponding to the indices of the words in the input phrase), instead of matrix multiplication, we just sum up the rows in matrix W0 that correspond to the word indices of the words in the input phrase (like in the previous example)
        #self.Z = np.mean(self.W[self.L], axis = 0)
        self.Z = np.sum(self.W[self.L], axis = 0)

        self.Z = (self.Z).reshape(1,(self.W).shape[1])   
        return self.Z
    
    ''' 
        Hidden layer backpropagation of derivatives
    '''
    def backward(self, D):
        self.backward_matrix_mult(D)

    def backward_matrix_mult(self, D):
        # dE/dW0 = dot-product(L0.T, D) 
        # since the input vector L0 is just a row vector with 0s everywhere except for a few 1s at the positions corresponding to the indices of the words in the input phrase, the dot-product between L0.T and D is a matrix with most rows filled with zeros except for the rows corresponding to the indices for the iput phrase words, which contain a copy of D. So instead of storing this entire matrix, we can only store a single copy of the identical nonzero rows which is just D     
        self.W_grad = D

    ''' 
        Gradient descent optimization of hidden layer weights
    '''
    def update_weights(self, alpha):
        # only need to update the weights in the relevant rows (i.e. the weights that contribute to the non-zero input components)
        # all the relevant rows get updated with the same gradient row vector (i.e. D)
        self.W[self.L] -= alpha * self.W_grad


''' 
    Ouput layer class: Performs two operations, first matrix multiplication of inputs L_1 with weights
                       W_1. This result is then operated on by squared error function.  
'''
class output_layer(object):
    
    ''' 
        class constructor
    '''

    def __init__(self, W) -> None:
        self.W = W
        self.W_grad = np.zeros_like(W)

    ''' 
        Output layer forward propagation
    '''
    def forward(self, L, Y, target_label_indices):
        self.L = L
        self.Y = Y
        self.target_label_indices = target_label_indices
        return self.forward_matrix_mult()

    def forward_matrix_mult(self):
        # since we're only going to be predicting target labels from a small subset of the vocabulary, we only need to multiply with the weights in rows for those secific words. Since we store the transpose of the W1 matrix, we need to tarnsbose back to the original shape before matrix multiplication
        self.R = np.dot(self.L, self.W[self.target_label_indices].T) 
        # apply sigmoid function
        self.P = self.R.copy()
        self.P = sigmoid(self.P)
        
        return self.P, self.forward_error()
 
    def forward_error(self):
        return np.sum((self.P - self.Y)**2) / self.P.shape[0]

    '''     
        Output layer backpropagation of derivatives
    '''
    def backward(self):
        return self.backward_error()

    def backward_error(self):
        # dE/dP
        dE_dP = 2*(self.P - self.Y) / self.P.shape[0]
        return self.backward_sigmoid(dE_dP)

    def backward_sigmoid(self, D):
        # dE/dR
        dE_dR = D * sigmoid_deriv(self.R) 
        return self.backward_matrix_mult(dE_dR)


    def backward_matrix_mult(self, D):
        # dE/dW1 
        # also take the transpose of this gradient matrix since we're storing the transpose of the weights matrix
        #self.W_grad = np.dot((self.L).T, D)
        self.W_grad = (np.dot((self.L).T, D)).T
        # dE/dL1
        # since W is already transposed, we don't need to transpose it inside the dot product
        # also only need to include the specific rows for words in the output labels list
        #dE_dL = np.dot(D, (self.W).T)
        dE_dL = np.dot(D, self.W[self.target_label_indices])
        return dE_dL
    
    ''' 
        Gradient descent optimization of output layer weights
    '''
    def update_weights(self, alpha):
        # only need to update the rows for the words in the output labels list
        self.W[self.target_label_indices] -= alpha * self.W_grad

'''
    A 3-layer neural network class
'''
class three_layer_net(object):
    ''' 
        class constructor: Takes in the following parameters- number of neurons in input layer (which is the number of feature attributes for each instance), number of hidden layers (has to be at least 1 and can be arbitrarily large), number of neurons in the output layer (which is the number of target attributes) and gradient descent step-size (alpha)
    '''
    def __init__(self, input_neurons, hidden_neurons, output_neurons) -> None:
        self.input_neurons  = input_neurons
        self.hidden_neurons = hidden_neurons
        self.output_neurons = output_neurons
        
        np.random.seed(1)
        # initialize weights W0 between input layer and hidden layer 
        W0 = 0.2*np.random.random(size=(input_neurons, hidden_neurons)) - 0.1
       
        # initialize weights W1 between hidden layer and output layer 
        # Note: We initialize this as the transpose of W1, so the first dimension has size of output neurons instead of hidden neurons.
        # We do this so that when computing the prediction, we can extract out the specific rows that we want (coressponding to the target labels subset) and then transpose those rows back to the original shape before matrix multiplication with the layer inputs  
        W1 = 0.02*np.random.random(size=(output_neurons, hidden_neurons)) - 0.01 

        # initialize layer objects
        self.layer_0 = input_layer()
        self.layer_1 = hidden_layer(W0)
        self.layer_2 = output_layer(W1)

    ''' 
        neural network forward pass
    '''
    def forward_net(self, L0, Y, target_label_indices):
        # input layer forward pass
        self.L0 = self.layer_0.forward(L0) 
        # hidden layer forward pass 
        self.L1 = self.layer_1.forward(self.L0) 
        # output layer forward pass
        self.L2, error = self.layer_2.forward(self.L1, Y, target_label_indices) 

        return self.L2, error

    ''' 
        neural network backward pass
    ''' 
    def backward_net(self):
       # output layer backpropagation
       D = self.layer_2.backward() 
       # hidden layer backpropagation
       self.layer_1.backward(D) 

    '''     
        weight optimization
    '''
    def optimize(self, alpha):
        # update output layer weights
        self.layer_2.update_weights(alpha)
        # update hidden layer weights
        self.layer_1.update_weights(alpha)

    '''     
        train the network
    ''' 
    def train(self, sentences, concatenated, word_index, target_size, phrase_half_length, alpha, niters=1):
        print(f"Alpha: {alpha}")
        print("Training in progress...")
        #training iterations
        for iter in range(niters):
            total_error = 0.0
            train_correct_count = 0
            counter = 0
            percent_done = 0
            # iterate over sentences
            for i in range(len(sentences)):
                sentence  = sentences[i]
                if((int(i*100/len(sentences))%5) == 0 and (int(i*100/len(sentences)/5) > 0.0) and (int(i*100/len(sentences)) != percent_done)):
                    print(f"Iteration# {iter}, % completed: {int(i*100/len(sentences))}")
                    percent_done = int(i*100/len(sentences))
                # iterate over phrases in sentence (i.e. focus on middle word which is going to be the missing word in the phrase)
                for j in range(len(sentence)):
                    
                    # randomly pick a small subset of the words as our target labels (including the focus word itself)
                    target_label_indices = [sentence[j]] + (concatenated[np.random.randint(0, len(concatenated),size=target_size).tolist()]).tolist()    

                    # input is the list of words in the phrase with the missing word removed (missing word is in the middle word of our phrase)
                    lo = max(0, j-phrase_half_length)
                    hi = min(j+phrase_half_length, len(sentence))
                    phrase_words_indices = sentence[lo:j]  + sentence[j+1:hi] 

                    #print(f"j: {j}, lo: {lo}, hi: {hi-1}")
                    #print(f"Missing word: {sentence[j]}")
                    #print(f"Phrase words: {phrase_words_indices}")

                    # the target vector is just a vector of length equal to the number of target labels, with 1 at the zeroth position (corresponding to the missing/focus word) and zeros everywhere else
                    y = np.zeros(shape=(1,target_size+1))
                    y[0,0] = 1
                    X = phrase_words_indices

                    # forward propagation
                    prediction, error = self.forward_net(X, y, target_label_indices)
                    total_error += error
                    train_correct_count += int(np.argmax(prediction) == np.argmax(y))
                 
                    # backpropagation
                    self.backward_net()

                    # weight optimization
                    self.optimize(alpha)

                    counter += 1

            print(f"Iteration# {i+1}, Total error: {total_error}, Training accuracy: {train_correct_count/counter}")

    # use trained neural network weights to finds 10 words which may have similar meaning to a target word
    def similar_words(self, target, word_index):
        
        # two words can be considered to have similar meaning if the set of weights connecting one word to all the hidden neurons
        # is similar to the set of weights connecting the other word to all the hidden neurons, The weights for any specific word
        # is just the row corresponding to that word in the W0 weights matrix. To measure similarity between two words, we will
        # compute the euclidean distance between the corresponding two rows of W0  
        if(target in word_index):
            target_row_index = word_index[target]  
            target_weights =  self.layer_1.W[target_row_index]
            similarities = Counter()

            # measure similarity with all words in the vocabulary, and find the 10 words with the higest similarities
            for word,index in word_index.items():
                word_weights = self.layer_1.W[index]
                diff = target_weights - word_weights 
                squared_dist = math.sqrt(np.sum(diff * diff)) 
                
                # define negative distance as similarity, means that most similar words will be the least negative (i.e. distance closest to zero) 
                # also makes it conveninent to extract these values from the Counter object using the most_common method
                similarity = -squared_dist
                similarities[word] = similarity
            
            most_similar_words = similarities.most_common(10)
            return most_similar_words

        else:
            print("ERROR! Target does not exist in vocabulary!")

    # use trained neural network to fill in a missing word in a sentence
    def complete_sentence(self, sentence, missing_word, word_index):

        # remove missing word from sentence
        if missing_word in sentence:
            sentence.remove(missing_word)

        # convert words in sentence to word indices
        sentence_indices = []
        for word in sentence:
            sentence_indices.append(word_index[word])
            
        target_size = int(len(vocab))
        np.random.seed(1)
        #target_label_indices = np.random.randint(0, len(vocab),size=target_size).tolist()
        target_label_indices = list(range(len(vocab)))
        y = np.zeros(shape=(1,len(target_label_indices)))
        # predict the missing word
        prediction, error = self.forward_net(sentence_indices, y, target_label_indices)
        prediction = prediction[0]

        # get the top 5 predictions
        top_pos = np.argpartition(prediction, - 5)[-5:]
        target_label_indices = np.array(target_label_indices)
        top_pred_indices = target_label_indices[top_pos]
        #print(f"Top 5 predicted word indices: {top_pred_indices}")
        
        top_five_pred_words = []
        for index in top_pred_indices:
            top_five_pred_words.append(vocab[index])

        return top_five_pred_words


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-1.0 * x))

def sigmoid_deriv(x):
    return sigmoid(x) * (1.0 - sigmoid(x)) 


In [125]:
# initialize a three layer network object
input_neurons = len(vocab)
hidden_neurons = 50
output_neurons = len(vocab) 
net = three_layer_net(input_neurons, hidden_neurons, output_neurons)

# train the network with the reviews dataset
net.train(input_dataset, concatenated, word_index, 7, 4, alpha = 0.025, niters = 24)

Alpha: 0.025
Training in progress...
Iteration# 0, % completed: 5
Iteration# 0, % completed: 10
Iteration# 0, % completed: 15
Iteration# 0, % completed: 20
Iteration# 0, % completed: 25
Iteration# 0, % completed: 30
Iteration# 0, % completed: 35
Iteration# 0, % completed: 40
Iteration# 0, % completed: 45
Iteration# 0, % completed: 50
Iteration# 0, % completed: 55
Iteration# 0, % completed: 60
Iteration# 0, % completed: 65
Iteration# 0, % completed: 70
Iteration# 0, % completed: 75
Iteration# 0, % completed: 80
Iteration# 0, % completed: 85
Iteration# 0, % completed: 90
Iteration# 0, % completed: 95
Iteration# 25000, Total error: 4921206.442160997, Training accuracy: 0.40422915239938306
Iteration# 1, % completed: 5
Iteration# 1, % completed: 10
Iteration# 1, % completed: 15
Iteration# 1, % completed: 20
Iteration# 1, % completed: 25
Iteration# 1, % completed: 30
Iteration# 1, % completed: 35
Iteration# 1, % completed: 40
Iteration# 1, % completed: 45
Iteration# 1, % completed: 50
Iterat

In [219]:
# sentence completion test
def complete_sentence(sentence, missing_word, word_index):

    # remove missing word from sentence
    if missing_word in sentence:
        sentence.remove(missing_word)

    # convert words in sentence to word indices
    sentence_indices = []
    for word in sentence:
        sentence_indices.append(word_index[word])
        
    target_size = int(len(vocab))
    np.random.seed(1)
    #target_label_indices = np.random.randint(0, len(vocab),size=target_size).tolist()
    target_label_indices = list(range(len(vocab)))
    y = np.zeros(shape=(1,len(target_label_indices)))
    # predict the missing word
    prediction, error = net.forward_net(sentence_indices, y, target_label_indices)
    prediction = prediction[0]

    # get the top 10 predictions
    top_pos = np.argpartition(prediction, - 5)[-5:]
    target_label_indices = np.array(target_label_indices)
    top_pred_indices = target_label_indices[top_pos]
    #print(f"Top 5 predicted word indices: {top_pred_indices}")
    
    top_five_pred_words = []
    for index in top_pred_indices:
        top_five_pred_words.append(vocab[index])

    return top_five_pred_words


In [220]:
test_sentence = ['i', 'am' ,'wearing', 'a', 'red', 'shirt']
missing_word='red'
predictions = complete_sentence(test_sentence.copy(), missing_word, word_index)
test_sentence[test_sentence.index(missing_word)] = "______"
print('\nTest sentence:\n')
print('\t\t'+ ' '.join(test_sentence)+'\n')
print(f"Top 5 predictions for missing word: {predictions}")
print(f"\nActual missing word: {missing_word}")


Test sentence:

		i am wearing a ______ shirt

Top 5 predictions for missing word: ['yellow', 'red', 'white', 'huge', 'hot']

Actual missing word: red


In [216]:
test_sentence = ['the', 'dog' ,'was', 'barking', 'at', 'me']
missing_word='barking'
predictions = complete_sentence(test_sentence.copy(), missing_word)
test_sentence[test_sentence.index(missing_word)] = "______"
print('\nTest sentence:\n')
print('\t\t'+ ' '.join(test_sentence)+'\n')
print(f"Top 5 predictions for missing word: {predictions}")
print(f"\nActual missing word: {missing_word}")


Test sentence:

		the dog was ______ at me

Top 5 predictions for missing word: ['point', 'staring', 'struck', 'amazed', 'least']

Actual missing word: barking


In [217]:
net.similar_words('woman', word_index)

[('woman', -0.0),
 ('man', -1.3918852623910989),
 ('girl', -1.447632050478296),
 ('person', -1.4941693747963491),
 ('priest', -1.809010747263899),
 ('child', -1.8592442530664899),
 ('kid', -1.8920203574213839),
 ('lady', -1.8974354636928639),
 ('policeman', -1.9203765093346499),
 ('nun', -1.9397523988020404)]

In [122]:
net.similar_words('amazing', word_index)

[('amazing', -0.0),
 ('incredible', -1.0583268338046543),
 ('excellent', -1.198017297217095),
 ('awesome', -1.3299989299668475),
 ('outstanding', -1.3821164495577636),
 ('exceptional', -1.5083247701548774),
 ('extraordinary', -1.6097012692867936),
 ('astonishing', -1.7085799856837172),
 ('interesting', -1.7532179810511248),
 ('admirable', -1.7686110465717226)]

In [121]:
net.similar_words('beautiful', word_index)

[('beautiful', -0.0),
 ('lovely', -1.3854902070497324),
 ('gorgeous', -1.4999506403988139),
 ('vibrant', -1.5413974028249746),
 ('fantastic', -1.5595608665653098),
 ('stunning', -1.610193789412003),
 ('marvelous', -1.671694917057171),
 ('wonderful', -1.68151737172721),
 ('bright', -1.708524991453973),
 ('lush', -1.7213965425013187)]

In [101]:
net.similar_words('amazing', word_index)

[('amazing', -0.0),
 ('appalling', -4.792923468809057),
 ('incredible', -5.4006504928831065),
 ('amateurish', -5.730678993345326),
 ('awesome', -5.735518010316436),
 ('floudering', -5.802349038573225),
 ('idiotic', -5.846023789944601),
 ('dreadful', -5.897410165580461),
 ('radiohead', -5.9080414442329054),
 ('shotty', -5.911583823498508)]

In [134]:
net.similar_words('beautiful', word_index)

[('beautiful', -0.0),
 ('lovely', -1.340171670637983),
 ('wonderful', -1.4544853671590974),
 ('gorgeous', -1.469656208009047),
 ('fantastic', -1.493589105399052),
 ('stunning', -1.5379319783389194),
 ('vibrant', -1.6092332903789015),
 ('brilliant', -1.6456653276148194),
 ('marvelous', -1.6997167536323892),
 ('remarkable', -1.7106539688267413)]

In [133]:
net.similar_words('awful', word_index)

[('awful', -0.0),
 ('abysmal', -1.5808150077587555),
 ('appalling', -1.5943662703737012),
 ('awesome', -1.619191395636564),
 ('ok', -1.6742914967443405),
 ('dreadful', -1.6791055579462923),
 ('terrible', -1.6894983350141908),
 ('laughable', -1.727787534898245),
 ('okay', -1.7392721192672178),
 ('incredible', -1.7517186863862413)]

In [132]:
net.similar_words('terrible', word_index)

[('terrible', -0.0),
 ('horrible', -1.0003107948169352),
 ('dreadful', -1.2570685910972335),
 ('horrendous', -1.4245890490929722),
 ('horrid', -1.453505067277493),
 ('laughable', -1.458653268798284),
 ('fantastic', -1.509142601978743),
 ('pathetic', -1.50942261299218),
 ('brilliant', -1.520465343586054),
 ('ridiculous', -1.5651783661268759)]

In [218]:
net.similar_words('king', word_index)

[('king', -0.0),
 ('boogeyman', -2.078429776894064),
 ('hawking', -2.1883476971125586),
 ('bigwig', -2.2379397972456565),
 ('pinter', -2.2584340769642273),
 ('gospel', -2.25894116850655),
 ('blackmailer', -2.2635661618814256),
 ('occupant', -2.2779849405068626),
 ('aristocracy', -2.2791282687631047),
 ('boorish', -2.286695419905272)]

In [128]:
net.similar_words('person', word_index)

[('person', -0.0),
 ('woman', -1.4941693747963491),
 ('man', -1.5708201411400173),
 ('girl', -1.8768443914560258),
 ('kid', -2.018671045086497),
 ('guy', -2.0505786002915714),
 ('priest', -2.062187302557605),
 ('child', -2.09051224909826),
 ('soldier', -2.097332338682669),
 ('farmer', -2.155770412193974)]

In [130]:
net.similar_words('man', word_index)

[('man', -0.0),
 ('woman', -1.3918852623910989),
 ('person', -1.5708201411400173),
 ('girl', -1.6251092943780883),
 ('guy', -1.6782542629322135),
 ('soldier', -1.7136194710035026),
 ('boy', -1.7660792041766658),
 ('kid', -1.7823612535816316),
 ('lady', -1.8437585689328038),
 ('farmer', -1.8481688263349862)]