## Title :
Pavlos Recurrent Unit

## Description :
The goal of this exercise is to build the **Pavlos Recurrent Unit** discussed in class.

<img src="../fig/fig1.png" style="width: 500px;">

<img src="../fig/fig2.png" style="width: 500px;">

Alternative notation used in the exercise:

<img src="../fig/fig3.png" style="width: 500px;">

## Instructions:
- Read the IMDB dataset from the helper code given.
- Take a quick look at your training inputs and labels.
- Pad the sequences to a fixed length `max_words` so that all sequences have the same size.
- Fill in the helper code given to build the PRU cell.
- Using the tensorflow.keras Functional API, build, compile and fit the PRU RNN and evaluate it on the test set.
- For reference, also refit the model with a vanilla RNN and a GRU.
- Evaluate both reference models on the test set and compare their performance with that of the PRU.

## Pavlos Recurrent Unit   <img src="./favicon.ico" alt="Pavlos" style="width:40px" />

In this exercise, we will build the PRU as discussed in class to perform sentiment analysis in tensorflow.keras.
We will continue to use the custom dataset from the previous exercise.
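
For reference, the update rule can be read off the helper code further down, which omits bias terms. The symbols below mirror the weight names in that code's comments; this is a transcription of the code, so defer to the figures above if they differ.

$$
PP_t = \sigma\left(X_t W_{xpp} + H_{t-1} W_{hpp}\right)
$$

$$
H_t = \tanh\left(X_t W_{xh} + \left(PP_t \odot H_{t-1}\right) W_{hh}\right)
$$

Here $\sigma$ is the sigmoid function and $\odot$ is the elementwise product.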

In [137]:
# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.layers import RNN
from tensorflow.keras.models import Model,Sequential
from tensorflow.keras.layers import Input,Dense,Embedding
from tensorflow.keras.layers import SimpleRNN
from tensorflow.keras.preprocessing import sequence
import pickle
from tensorflow.keras.datasets import imdb

In [138]:
# We use the same dataset as the previous exercise 
# with open('imdb_mini.pkl','rb') as f:
    # X_train, y_train, X_test, y_test = pickle.load(f)

In [139]:
import pandas as pd
data = pd.read_csv("IMDB Dataset.csv")

In [140]:
# https://www.kaggle.com/code/rafaeltiedra/step-by-step-imdb-sentiment-analysis

import re

def process(x):
    # Raw strings (r'...') keep regex escapes such as \. and \s from being
    # interpreted as Python string escapes
    x = re.sub(r'[,\.!?:()"]', '', x)
    # remove common punctuation
    x = re.sub(r'<.*?>', ' ', x)
    # replace HTML tags such as <br /> with a space
    x = re.sub(r'http\S+', ' ', x)
    # \S+: matches one or more non-whitespace characters
    x = re.sub(r'[^a-zA-Z0-9]', ' ', x)
    # matches any character that is not (^ negation) an uppercase letter (A-Z), a lowercase letter (a-z), or a digit (0-9).
    x = re.sub(r'\s+', ' ', x)
    # \s: matches any whitespace character, such as spaces, tabs, and newline characters.
    return x.lower().strip()

data['review'] = data['review'].apply(process)
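
To see what the cleaning steps do, here is a small self-contained sketch that applies the same substitutions to a single string. The function name `clean_review` and the sample text are made up for illustration.

```python
import re

def clean_review(text):
    """Apply the same substitutions as the cell above to one string."""
    text = re.sub(r'[,\.!?:()"]', '', text)    # drop common punctuation
    text = re.sub(r'<.*?>', ' ', text)          # HTML tags -> space
    text = re.sub(r'http\S+', ' ', text)        # URLs -> space
    text = re.sub(r'[^a-zA-Z0-9]', ' ', text)   # other non-alphanumerics -> space
    text = re.sub(r'\s+', ' ', text)            # collapse runs of whitespace
    return text.lower().strip()

sample = 'Great movie!<br />Loved it, 10/10. See http://example.com'
cleaned = clean_review(sample)
print(cleaned)  # great movie loved it 10 10 see
```

Because punctuation is stripped first, the URL has already lost its `:` and `.` by the time the `http\S+` pattern runs, but it is still matched and removed as one token.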

In [141]:
import nltk
nltk.download('stopwords')
nltk.download('punkt')

[nltk_data] Downloading package stopwords to /home/ting/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /home/ting/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [142]:

sw_set = set(nltk.corpus.stopwords.words('english'))

def sw_remove(x):
    words = nltk.tokenize.word_tokenize(x)
    filtered_list = [word for word in words if word not in sw_set]
    return ' '.join(filtered_list)

data['review'] = data['review'].apply(lambda x: sw_remove(x))
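
The filtering idea can be tried without downloading any NLTK data. The sketch below is a simplification: it uses a tiny hypothetical stop-word set and splits on whitespace with `str.split` instead of `nltk.tokenize.word_tokenize`.

```python
# Hypothetical miniature stop-word set; NLTK's English list is much longer.
toy_stopwords = {'the', 'a', 'is', 'it', 'and', 'of'}

def remove_stopwords(text, stopwords):
    """Drop every whitespace-separated token that appears in `stopwords`."""
    kept = [word for word in text.split() if word not in stopwords]
    return ' '.join(kept)

filtered = remove_stopwords('the plot is thin and the acting is wooden', toy_stopwords)
print(filtered)  # plot thin acting wooden
```

Storing the stop words in a `set`, as the cell above does with `sw_set`, keeps each membership test at roughly constant time.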

In [143]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(data, test_size = 0.2, random_state=109)

In [144]:
train.head()

Unnamed: 0,review,sentiment
648,rented film interest american history especial...,positive
25489,dumb excuse thriller absolutely zero chemistry...,negative
23384,bit hope hour long film made footage old pover...,negative
28837,robert jannuciluca venantini venantino venanti...,positive
12168,gamer say like film fact right hate tried watc...,negative


In [145]:
X_train = train["review"]
y_train = 1 * (train["sentiment"] == "positive")
X_test = test["review"]
y_test = 1*(test["sentiment"] == "positive")

In [146]:
y_train

648      1
25489    0
23384    0
28837    1
12168    0
        ..
16368    1
16525    1
7925     0
19701    1
44294    1
Name: sentiment, Length: 40000, dtype: int64

In [147]:
# Import from tensorflow.keras for consistency with the other imports
from tensorflow.keras.preprocessing.text import Tokenizer

dict_size = 5000
tokenizer = Tokenizer(num_words=dict_size)
tokenizer.fit_on_texts(data['review'])

In [148]:
train_rev_tokens = tokenizer.texts_to_sequences(X_train)
test_rev_tokens = tokenizer.texts_to_sequences(X_test)
# Use `seq` as the loop variable to avoid confusion with the imported `sequence` module
seq_lengths = np.array([len(seq) for seq in train_rev_tokens])
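
The following pure-Python sketch illustrates the idea behind a frequency-ranked tokenizer with a vocabulary cap. It is a simplified stand-in, not the Keras implementation: the helper names are hypothetical, and ties are broken alphabetically here, which is an assumption made only to keep the example deterministic.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

def to_sequence(text, word_index, num_words):
    """Keep only words whose index is below num_words; drop the rest."""
    return [word_index[w] for w in text.split()
            if w in word_index and word_index[w] < num_words]

word_index = build_word_index(['good film good cast', 'bad film'])
seq = to_sequence('good film bad cast', word_index, num_words=3)
print(word_index)  # {'film': 1, 'good': 2, 'bad': 3, 'cast': 4}
print(seq)         # [2, 1]
```

With `num_words=5000`, as in the cell above, only indices 1 through 4999 survive, and rarer words are silently dropped from each review.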

In [149]:
# As in the previous exercise, we preprocess our review sequences
# We fix the vocabulary size to 5000 to match the dict_size
# used when fitting the tokenizer above
vocabulary_size = 5000
# Each review is padded or truncated to 500 tokens
max_words = 500
# We set the embedding size to 32
embedding_size = 32
# Pre-pad the sequences to max_words length
train_rev_tokens = sequence.pad_sequences(train_rev_tokens, maxlen=max_words,padding='pre')
test_rev_tokens = sequence.pad_sequences(test_rev_tokens, maxlen=max_words,padding='pre')
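
As a quick illustration of pre-padding, the sketch below pads or truncates a plain Python list. The helper `pad_pre` is hypothetical; it mimics `pad_sequences` with `padding='pre'` and its default `truncating='pre'`, which keeps the last `maxlen` tokens of a long review.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items if too long.

    Assumes maxlen is a positive integer.
    """
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

padded_short = pad_pre([7, 8, 9], maxlen=5)
padded_long = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(padded_short)  # [0, 0, 7, 8, 9]
print(padded_long)   # [3, 4, 5, 6, 7]
```

Pre-padding places the real tokens at the end of the sequence, so the recurrent unit processes them last, right before its final state is passed to the classifier.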

In [151]:
# We create the mapping between words and sequences
word2id = imdb.get_word_index()
# Retrieves a dict mapping words to their index in the IMDB dataset.

# We need to adjust the mapping by 3 because of tensorflow.keras preprocessing
# more here: https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset

word2id = {k:(v+3) for k,v in word2id.items()}
word2id["<PAD>"] = 0
# "<PAD>" is the padding token, assigned the numeric ID 0
word2id["<START>"] = 1
# "<START>" marks the beginning of a review
word2id["<UNK>"] = 2
# "<UNK>" represents unknown or out-of-vocabulary words
word2id["<UNUSED>"] = 3
# ID 3 is reserved and is not assigned to any word

# Reversing the key,value pair will give the id2word
id2word = {i: word for word, i in word2id.items()}
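
The offset-by-3 bookkeeping is easier to follow on a toy vocabulary. The three-word index below is hypothetical. Note that this mapping describes the encoding of the built-in Keras IMDB data; sequences produced by the custom `Tokenizer` fitted above follow `tokenizer.word_index` instead.

```python
# Hypothetical raw word index (1-based, most frequent word first).
toy_index = {'the': 1, 'movie': 2, 'great': 3}

# Shift every word by 3 to make room for the special tokens.
shifted = {word: idx + 3 for word, idx in toy_index.items()}
shifted.update({'<PAD>': 0, '<START>': 1, '<UNK>': 2, '<UNUSED>': 3})

# Reverse the mapping to go from IDs back to words.
id_to_word = {idx: word for word, idx in shifted.items()}

def decode(ids):
    """Turn a list of IDs back into text, using <UNK> for unknown IDs."""
    return ' '.join(id_to_word.get(i, '<UNK>') for i in ids)

decoded = decode([0, 0, 1, 4, 5, 6])
print(decoded)  # <PAD> <PAD> <START> the movie great
```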

In [152]:
word2id

{'fawn': 34704,
 'tsukino': 52009,
 'nunnery': 52010,
 'sonja': 16819,
 'vani': 63954,
 'woods': 1411,
 'spiders': 16118,
 'hanging': 2348,
 'woody': 2292,
 'trawling': 52011,
 "hold's": 52012,
 'comically': 11310,
 'localized': 40833,
 'disobeying': 30571,
 "'royale": 52013,
 "harpo's": 40834,
 'canet': 52014,
 'aileen': 19316,
 'acurately': 52015,
 "diplomat's": 52016,
 'rickman': 25245,
 'arranged': 6749,
 'rumbustious': 52017,
 'familiarness': 52018,
 "spider'": 52019,
 'hahahah': 68807,
 "wood'": 52020,
 'transvestism': 40836,
 "hangin'": 34705,
 'bringing': 2341,
 'seamier': 40837,
 'wooded': 34706,
 'bravora': 52021,
 'grueling': 16820,
 'wooden': 1639,
 'wednesday': 16821,
 "'prix": 52022,
 'altagracia': 34707,
 'circuitry': 52023,
 'crotch': 11588,
 'busybody': 57769,
 "tart'n'tangy": 52024,
 'burgade': 14132,
 'thrace': 52026,
 "tom's": 11041,
 'snuggles': 52028,
 'francesco': 29117,
 'complainers': 52030,
 'templarios': 52128,
 '272': 40838,
 '273': 52031,
 'zaniacs': 52133,

### ⏸ For the current problem, if the memory state size is 5, what will be the dimension of $W_{xh}$? 


#### A. (32,32)
#### B. (32,5)
#### D. (5,5)

In [153]:
### edTest(test_chow1) ###
# Submit an answer choice as a string below (eg. if you choose option A, put 'A')
answer1 = 'B'
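
A shape check with NumPy makes the reasoning concrete. At each time step the cell receives one embedded token per review, of shape `(batch, embedding_size)`, and must produce a state of shape `(batch, units)`. The batch size of 4 is arbitrary and the arrays are zero-filled, since only the shapes matter here.

```python
import numpy as np

batch_size, embedding_size, units = 4, 32, 5  # batch size chosen arbitrarily

x_t = np.zeros((batch_size, embedding_size))  # one time step of embedded input
h_prev = np.zeros((batch_size, units))        # previous memory state
W_xh = np.zeros((embedding_size, units))      # input-to-hidden weights
W_hh = np.zeros((units, units))               # hidden-to-hidden weights

input_term = x_t @ W_xh
recurrent_term = h_prev @ W_hh
print(input_term.shape, recurrent_term.shape)  # (4, 5) (4, 5)
```

Both terms come out as `(batch, units)`, which is what allows them to be summed inside the activation.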

In [162]:
# Complete the helper code below to build the Pavlos Recurrent Unit
# We do this by building a PRU cell unit
# which we can wrap around tf.keras.layers.RNN
# Read more here on layer subclassing https://keras.io/guides/making_new_layers_and_models_via_subclassing/

class PRUCell(tf.keras.layers.Layer):
    def __init__(self,units,**kwargs):
        # Initialize the parent class (tf.keras.layers.Layer) first,
        # passing any additional keyword arguments (**kwargs) to it,
        # so the layer is fully set up before we attach our own attributes
        super(PRUCell, self).__init__(**kwargs)
        self.units = units
        self.state_size = units
        self.activation = tf.math.tanh
        self.recurrent_activation = tf.math.sigmoid

    # In the build function we initialize the weights
    # which will be used for training
    def build(self, input_shape):
        
        # Initializing weights for candidate Ht
        ## W_{XH}
        self.kernel_h = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer='uniform',
                                      name='kernel')
        ## W_{HH}
        self.recurrent_kernel_h = self.add_weight(
            shape=(self.units, self.units), 
            initializer='uniform',
            name='recurrent_kernel')
    
        
        # Initializing weights for PP gate
        ## W_{XPP} 
        self.kernel_pp = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer='uniform',
                                      name='PP_kernel')
        ## W_{HPP}
        self.recurrent_kernel_pp = self.add_weight(
            shape=(self.units, self.units),
            initializer='uniform',
            name='PP_recurrent_kernel')

        self.built = True
        # self.built is a flag provided by tf.keras.layers.Layer that records
        # whether the layer's weights have already been created.
        # Setting it to True ensures build() is not run a second time,
        # which would re-create (and so reset) the weights.

        # Note that we do not include a bias term for ease of understanding
        
    def call(self, inputs, states):
        ## inputs: X_t, shape (batch, embedding_size)
        ## states: tuple holding H_{t-1}, shape (batch, units)
        ## self.XXXX contains the weights (see above)
        # print(states)
        # Debug print: shows the per-time-step input shape when the model is built
        print(inputs.shape)
        # The previous output H_{t-1} comes from the states tuple
        prev_output = states[0]
        
        # First we compute the PPgate
        # Note: inputs is a tensor, so the numpy-style inputs.reshape(...) is not available;
        # multiply inputs by the kernel directly instead
        PP_XW = K.dot(inputs, self.kernel_pp)
        # print(PP_XW.shape)
        # K.dot computes the matrix product between tensors
        PP_HV = K.dot(prev_output, self.recurrent_kernel_pp)
        # print(PP_HV.shape)
        PPgate = self.recurrent_activation(PP_XW + PP_HV)
        # both terms have shape (batch, units), so they can be added elementwise
        
        # Now we use the PPgate as per the equation for candidate Ht
        nn_XW = K.dot(inputs, self.kernel_h)
        # print(nn_XW.shape)
        dotted_output = PPgate*prev_output
        # print(dotted_output.shape)
        nn_HV = K.dot(dotted_output, self.recurrent_kernel_h)
        output = self.activation(nn_HV + nn_XW)
        # print(output.shape)
        return output, [output]    
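
To check the arithmetic of `call` outside TensorFlow, here is a NumPy sketch of the same bias-free computation, run for a few time steps. The weight values, the random inputs, and the helper names are illustrative assumptions; only the formula is taken from the cell above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pru_step(x_t, h_prev, W_xh, W_hh, W_xpp, W_hpp):
    """One PRU step without bias, mirroring PRUCell.call."""
    pp_gate = sigmoid(x_t @ W_xpp + h_prev @ W_hpp)         # gate in (0, 1)
    h_t = np.tanh(x_t @ W_xh + (pp_gate * h_prev) @ W_hh)   # new state in (-1, 1)
    return h_t

rng = np.random.default_rng(0)
batch, emb, units = 2, 32, 5
shapes = [(emb, units), (units, units), (emb, units), (units, units)]
weights = [rng.uniform(-0.05, 0.05, size=s) for s in shapes]  # illustrative values

h = np.zeros((batch, units))           # initial state
for _ in range(3):                     # three time steps of random "embeddings"
    x = rng.normal(size=(batch, emb))
    h = pru_step(x, h, *weights)
print(h.shape)  # (2, 5)
```

Wrapping the cell in `RNN(cell)` performs this loop over all `max_words` time steps and, by default, returns only the final state.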

In [163]:
# Now that we have our PRU RNN
# we will build a simple model similar to the previous exercise
# We will use the functional API to do this

hidden_state_units = 5 

# Specify the input dimensions HINT: It is max_words
inputs = Input(shape=(max_words,))
# The inputs will go in an embedding layer
embedding = Embedding(vocabulary_size,embedding_size, input_length=max_words)(inputs)
# Turns positive integers (indexes) into dense vectors of fixed size.
# embedding_size is the size of the dense vectors representing each word.
# input_length: Length of input sequences, when it is constant. 
# This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed). 

# The embeddings will be an input to the PRU layer
cell = PRUCell(hidden_state_units)
layer = RNN(cell)
hidden_output = layer(embedding)
# The output from the PRU block will go in a dense layer
output = Dense(1, activation='sigmoid')(hidden_output)
# Connecting the architecture using tf.keras.models.Model
pru_model = Model(inputs=inputs, outputs=output)

# Get the summary to see if your model is built correctly
print(pru_model.summary())

(None, 32)
(None, 32)
Model: "model_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_40 (InputLayer)       [(None, 500)]             0         
                                                                 
 embedding_39 (Embedding)    (None, 500, 32)           160000    
                                                                 
 rnn_39 (RNN)                (None, 5)                 370       
                                                                 
 dense_10 (Dense)            (None, 1)                 6         
                                                                 
Total params: 160376 (626.47 KB)
Trainable params: 160376 (626.47 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None
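
The parameter counts in the summary can be verified by hand. The sketch below assumes the bias-free PRU cell defined above, with four weight matrices, and a standard `Dense(1)` layer that has one bias.

```python
vocabulary_size, embedding_size, units = 5000, 32, 5

embedding_params = vocabulary_size * embedding_size            # one vector per word ID
pru_params = 2 * (embedding_size * units + units * units)      # (W_xh, W_hh) and (W_xpp, W_hpp)
dense_params = units * 1 + 1                                   # weights plus one bias
total_params = embedding_params + pru_params + dense_params

print(embedding_params, pru_params, dense_params, total_params)
# 160000 370 6 160376
```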


In [156]:
embedding.shape
# (None, 500, 32): a 3-D tensor. The first dimension is the batch size (None, since it can vary), the second is the sequence length (max_words = 500), and the third is the embedding size (32).

TensorShape([None, 500, 32])

In [157]:
### edTest(test_chow2) ###
# Submit an answer choice as a string below (eg. if you choose option A, put 'A')
answer2 = 'C'

In [158]:
# Compile the model using 'binary_crossentropy' loss 
# and 'adam' optimizer, additionally add 'accuracy' metric
pru_model.compile(optimizer= "Adam", metrics = ["accuracy"], loss = "binary_crossentropy")
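
For intuition about what the loss and metric measure, here is a plain-Python sketch of binary cross-entropy and thresholded accuracy on four made-up predictions. The helper functions and the clipping constant `eps` are illustrative choices, not the Keras internals.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]          # made-up sigmoid outputs
loss = binary_crossentropy(labels, probs)
acc = accuracy(labels, probs)
print(round(loss, 4), acc)  # 0.5403 0.5
```

The two confident, correct predictions contribute little to the loss, while the two predictions on the wrong side of 0.5 dominate it.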

In [159]:
# Train the model with appropriate batch size and number of epochs
batch_size = 256
num_epochs = 3
pru_model.fit(train_rev_tokens, y_train, validation_data = (test_rev_tokens, y_test), epochs= num_epochs, batch_size = batch_size)

Epoch 1/3


2023-08-09 22:05:10.216333: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 80000000 exceeds 10% of free system memory.


Epoch 2/3
Epoch 3/3


<keras.src.callbacks.History at 0x7fde4def00d0>

In [161]:
# Evaluate the model on the custom test set and report the accuracy
accuracy = pru_model.evaluate(test_rev_tokens, y_test)[1]
print(f'The accuracy for the PRU model is {100*accuracy:.2f}%')

The accuracy for the PRU model is 83.29%


### 🍲 Adding the bias to the PRU model

Go back and add a bias term to the PRUCell (one for the PPGate and the other for $H_t$)

Does your model performance improve under the same training conditions?
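
As a starting point, the NumPy sketch below shows one plausible way the two bias vectors could enter the computation. Adding each bias inside its activation is an assumption, as are the names `b_h` and `b_pp`; check the placement against the equations from class before porting it to `PRUCell`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pru_step_with_bias(x_t, h_prev, W_xh, W_hh, b_h, W_xpp, W_hpp, b_pp):
    # Assumed placement: each bias is added inside its activation.
    pp_gate = sigmoid(x_t @ W_xpp + h_prev @ W_hpp + b_pp)
    return np.tanh(x_t @ W_xh + (pp_gate * h_prev) @ W_hh + b_h)

emb, units = 32, 5
rng = np.random.default_rng(1)
W_xh, W_xpp = rng.normal(size=(2, emb, units)) * 0.1     # illustrative weights
W_hh, W_hpp = rng.normal(size=(2, units, units)) * 0.1
b_h, b_pp = np.zeros(units), np.zeros(units)             # one bias per unit

h_next = pru_step_with_bias(rng.normal(size=(3, emb)), np.zeros((3, units)),
                            W_xh, W_hh, b_h, W_xpp, W_hpp, b_pp)

params_without_bias = 2 * (emb * units + units * units)
params_with_bias = params_without_bias + 2 * units
print(h_next.shape, params_without_bias, params_with_bias)  # (3, 5) 370 380
```

If the biases are added this way, the RNN layer in the model summary should report 380 parameters instead of 370.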


In [None]:
### edTest(test_chow3) ###
# Type your answer within the quotes given
answer3 = '___'