<h1 align="center"> CSCI 544 - Applied Natural Language Processing</h1>

<h2 align="center">CSCI 544 - Assignment 3</h2> 

<h2>Name: Sri Manvith Vaddeboyina</h2>
<h2>USC ID: 1231409457</h2>

<p></p>

# 1. Data Generation

<b>Importing necessary libraries/packages</b>

In [1]:
import re
import sys 
import numpy as np
import pandas as pd
import contractions
import tensorflow as tf
from gensim import models
from bs4 import BeautifulSoup
import gensim.downloader as api
from sklearn.svm import LinearSVC
from gensim.models import Word2Vec
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Perceptron
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

import warnings
warnings.filterwarnings('ignore')

<b>Read Data</b>

Reading Amazon US Beauty Reviews (tsv) dataset and retaining only the following two columns: <br>
<b>1. review_body</b> <br>
<b>2. star_rating</b>

In [2]:
df = pd.read_csv('data.tsv', on_bad_lines = 'skip', sep='\t')

In [3]:
df = df[['review_body','star_rating']]

<b>Dropping every row in which any column contains an NA value</b>

In [4]:
df.dropna(inplace=True)

<b>Keep Reviews and Ratings</b>

<p>Create a three-class classification problem according to the ratings.</p>
<b>Ratings:</b><br>
<b>1 and 2 - class 1</b><br>
<b>3 - class 2</b><br>
<b>4 and 5 - class 3</b>

In [5]:
# Keep only the rows whose rating is one of the valid string values '1' to '5'
df = df[df['star_rating'].isin(['1', '2', '3', '4', '5'])]

<b>Verifying the datatype of each column and setting them correctly</b>

In [6]:
df['star_rating']=df['star_rating'].astype(int)
df['review_body']=df['review_body'].astype(str)

<b>Creating a 3-class classification on ratings</b>

In [7]:
def condition(x):
    if x==1 or x==2:
        return 1
    elif x==3:
        return 2
    elif x==4 or x==5:
        return 3
    
df['rating'] = df['star_rating'].apply(condition)

<b>Randomly selecting 20000 reviews from each of classes 1, 2 and 3 (60000 reviews in total).</b>

In [8]:
df=df.groupby('rating').sample(n=20000)

In [9]:
df.drop(['star_rating'],inplace=True,axis=1)

<p></p>

<b>Data Cleaning</b><br>
<b>The cleaning pipeline applies the following steps in order:</b><br>
<b>1. Converting text to lower case</b><br>
<b>2. Removing HTML tags and entities</b><br>
<b>3. Removing URLs</b><br>
<b>4. Expanding contractions</b><br>
<b>5. Replacing non-alphabetic characters with spaces</b><br>
<b>6. Removing extra spaces</b><br>

In [10]:
def lower_case(texts):
    return texts.lower()

In [11]:
def cleanhtml(texts):
    regex = re.compile('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')
    cleantext = re.sub(regex, '', texts)
    
    return cleantext

In [12]:
def remove_url(texts):
    # Raw string avoids an invalid escape sequence warning for \S
    regex = re.compile(r'http\S+')
    cleantext = re.sub(regex, '', texts)
    return cleantext

In [13]:
def non_alphabetical(texts):
    # Replace every non-letter character (digits, punctuation, underscores) with a space
    regex = re.compile('[^a-zA-Z]')
    cleantext = re.sub(regex, ' ', texts)
    
    return cleantext

In [14]:
def extra_spaces(texts):
    # Collapse runs of two or more whitespace characters into a single space
    regex = re.compile(r'\s{2,}')
    cleantext = re.sub(regex, ' ', texts)
    
    return cleantext.rstrip()

In [15]:
def contractionfunction(text):
    expanded_words = []
    for word in text.split():
        # Use contractions.fix to expand shortened words (e.g. "don't" -> "do not")
        expanded_words.append(contractions.fix(word))
        
    expanded_words = ' '.join(expanded_words)
    return expanded_words

In [16]:
def corpus_contractions(texts):
    # Expand contractions in every text of the corpus
    expanded_corpus = []
    for text in texts:
        expanded_corpus.append(contractionfunction(text))
    return expanded_corpus

In [17]:
review_body = df['review_body'].tolist()
labels = df['rating'].tolist()
clean_review_body = []

# Apply the cleaning steps to each review in a fixed order
for sen in review_body:
    sen = lower_case(sen)
    sen = cleanhtml(sen)
    sen = remove_url(sen)
    sen = contractionfunction(sen)
    sen = non_alphabetical(sen)
    sen = extra_spaces(sen)
    clean_review_body.append(sen)

<b>Storing the cleaned reviews in a new column</b>

In [18]:
df['clean_text'] = clean_review_body


# 2. Word Embedding 

# (a) word2vec-google-news-300 Word2Vec model

<b>Loading the pretrained "word2vec-google-news-300" Word2Vec model from gensim library.</b>

In [19]:
google_news_word2vec = api.load('word2vec-google-news-300')

<b>Checking the semantic similarity of example words</b>

In [20]:
google_news_word2vec.most_similar(positive=["king","woman"],negative=["man"])[0]

('queen', 0.7118193507194519)

In [21]:
print("Similarity for: [good, better] : ", google_news_word2vec.similarity(w1="good", w2="better"))
print("Similarity for: [neat, clean] : ", google_news_word2vec.similarity(w1="neat", w2="clean"))
print("Similarity for: [big, huge] : ", google_news_word2vec.similarity(w1="big", w2="huge"))

Similarity for: [good, better] :  0.6120729
Similarity for: [neat, clean] :  0.29077712
Similarity for: [big, huge] :  0.7809856


<p></p>

# (b) Train a Word2Vec model using your own dataset

<b>Training a Word2Vec model using amazon reviews dataset. Generating the tokens and training the word2vec model.</b>

In [22]:
def get_reviews_tokens(reviews):
    reviews_tokens = []
    for rev in reviews:
        reviews_tokens.append(rev.split(" "))
    return reviews_tokens
reviews_tokens = get_reviews_tokens(df['clean_text'])

<b>Training a Word2Vec model with an embedding size of 300, a window size of 13 and a minimum word count of 9</b>

In [23]:
word2vec = Word2Vec(sentences = reviews_tokens, vector_size = 300, window = 13, min_count = 9)

<b>Checking the semantic similarity of example words</b>

In [24]:
w1 = word2vec.wv["good"]
w2 = word2vec.wv["better"]
print("Similarity for: [good, better] : ", cosine_similarity(w1.reshape(1,-1),w2.reshape(1,-1))[0][0])

w3 = word2vec.wv["neat"]
w4 = word2vec.wv["clean"]
print("Similarity for: [neat, clean] : ", cosine_similarity(w3.reshape(1,-1),w4.reshape(1,-1))[0][0])

w5 = word2vec.wv["big"]
w6 = word2vec.wv["huge"]
print("Similarity for: [big, huge] : ", cosine_similarity(w5.reshape(1,-1),w6.reshape(1,-1))[0][0])

Similarity for: [good, better] :  0.35388795
Similarity for: [neat, clean] :  0.19055185
Similarity for: [big, huge] :  0.61514467
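
Both the gensim `similarity` call and scikit-learn's `cosine_similarity` used above return the cosine of the angle between two word vectors. The following is a minimal sketch of that computation with NumPy; the three toy vectors are made up for illustration and are not real embeddings.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional vectors (hypothetical, for illustration only)
a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])   # same direction as a
c = np.array([0.0, 0.0, 5.0])   # orthogonal to a

print("parallel vectors  :", cosine(a, b))
print("orthogonal vectors:", cosine(a, c))
```

Because the measure depends only on direction, scaling a vector does not change its similarity to other vectors.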


<p></p>

<b>What do you conclude from comparing vectors generated by yourself and the pretrained model? Which of the Word2Vec models seems to encode semantic similarities between words better?</b>

<b>Reasoning:</b><br>
The pretrained model encodes semantic similarity better than the model trained on the review data: it scores higher on all three word pairs (0.61 vs. 0.35 for good/better, 0.29 vs. 0.19 for neat/clean, and 0.78 vs. 0.62 for big/huge). The likely reason is:

The pretrained model was trained on a far larger and more diverse corpus, so it sees each word in many more contexts and captures a wider range of relationships between words. This also makes it more robust across contexts and domains. The custom model sees only the 60000 sampled reviews, which limits the contexts available for each word.


# 3. Simple models

# TF-IDF

<b>Splitting data into train and test splits 80:20 to be fed for TF-IDF</b>

In [25]:
X_train, X_test, y_train, y_test = train_test_split(df['clean_text'], df['rating'], test_size=0.20, random_state=42, stratify = df['rating'])

<b>TF-IDF (term frequency-inverse document frequency) is a statistical measure that evaluates how relevant a word is to a document in a collection of documents.</b>

In [26]:
vectorizer = TfidfVectorizer(ngram_range=(1,4))
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
y_train_tfidf = y_train
y_test_tfidf = y_test
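
To make the TF-IDF weighting concrete, here is a small hand-computed sketch on three made-up documents. It assumes the smoothed-IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalisation, which is my understanding of the `TfidfVectorizer` defaults; the documents and the helper name `tfidf` are illustrative only, and only unigrams are used.

```python
import math
from collections import Counter

# Toy corpus (hypothetical reviews, for illustration only)
docs = ["great product", "bad product", "great great value"]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term
df_counts = Counter(term for doc in tokenized for term in set(doc))

def tfidf(doc):
    """Return L2-normalised TF-IDF weights for one tokenized document."""
    tf = Counter(doc)
    weights = {
        t: tf[t] * (math.log((1 + n_docs) / (1 + df_counts[t])) + 1)
        for t in tf
    }
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}

vec = tfidf(tokenized[1])  # "bad product"
print(vec)
```

In the second document, "bad" receives a larger weight than "product" because "product" occurs in more documents and is therefore less informative.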

<h3>Perceptron</h3>

<b>A perceptron is a single-layer linear classifier that updates its weights only when it misclassifies a training example.</b>

In [27]:
perceptron_text_clf = Perceptron()
perceptron_text_clf.fit(X_train_tfidf, y_train_tfidf)
perceptron_predictions = perceptron_text_clf.predict(X_test_tfidf)
perceptron_accuracy_tfidf = accuracy_score(y_test_tfidf, perceptron_predictions)
print("Accuracy of Perceptron on TF-IDF data : ", perceptron_accuracy_tfidf)

Accuracy of Perceptron on TF-IDF data :  0.7114166666666667
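
The perceptron update rule can be sketched in a few lines of NumPy. This is a binary version on made-up, linearly separable points, written only to illustrate the rule; the three-class case in this notebook is handled internally by scikit-learn's `Perceptron`, and the function name `train_perceptron` is hypothetical.

```python
import numpy as np

def train_perceptron(X, y, epochs=10):
    """Binary perceptron; labels must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Update only when the example is misclassified (or on the boundary)
            if yi * (np.dot(w, xi) + b) <= 0:
                w += yi * xi
                b += yi
    return w, b

# Toy, linearly separable data (hypothetical)
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, b = train_perceptron(X, y)
pred = np.sign(X @ w + b)
print("weights:", w, "bias:", b, "predictions:", pred)
```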


<h3>SVM</h3>

<b>A linear support vector machine finds a hyperplane in the N-dimensional feature space (N is the number of features) that separates the classes with the largest possible margin.</b>

In [28]:
svc_text_clf = LinearSVC()
svc_text_clf.fit(X_train_tfidf, y_train_tfidf)
svc_predictions = svc_text_clf.predict(X_test_tfidf)
svc_accuarcy_tfidf = accuracy_score(y_test_tfidf, svc_predictions)
print("Accuracy of SVM on TF-IDF data : ", svc_accuarcy_tfidf)

Accuracy of SVM on TF-IDF data :  0.7431666666666666


<p></p>

# Word2Vec

In [29]:
# Compute a review embedding as the mean of the Word2Vec vectors of its in-vocabulary words
def get_review_embedding(data):
    words = data.split(" ")
    embeddings = np.array([google_news_word2vec[word] for word in words if word in google_news_word2vec])
    if embeddings.size == 0:
        return np.zeros(300)
    return np.mean(embeddings, axis=0)

<b>Creating the train and test data using the get_review_embedding() function. This data is fed to Perceptron and SVM models</b>

In [30]:
# Apply the get_review_embedding function to each review in X_train
train_review_vectors = [get_review_embedding(sentence) for sentence in X_train]

# Stack the resulting vectors into a 2D numpy array
X_train_w2v = np.stack(train_review_vectors, axis=0)

test_review_vectors = [get_review_embedding(sentence) for sentence in X_test]

# Stack the resulting vectors into a 2D numpy array
X_test_w2v = np.stack(test_review_vectors, axis=0)
y_train_w2v = y_train
y_test_w2v = y_test
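
The averaging step can be illustrated with a toy vocabulary. The sketch below uses a plain dictionary of made-up 3-dimensional vectors in place of the pretrained model; `toy_vectors` and `average_embedding` are hypothetical names. It also shows that averaging ignores word order.

```python
import numpy as np

# Toy vocabulary (hypothetical vectors, not real embeddings)
toy_vectors = {
    "good": np.array([1.0, 0.0, 1.0]),
    "product": np.array([0.0, 2.0, 1.0]),
}

def average_embedding(text, vectors, dim=3):
    """Mean of the vectors of in-vocabulary words; zeros if none are found."""
    found = [vectors[w] for w in text.split() if w in vectors]
    if not found:
        return np.zeros(dim)
    return np.mean(found, axis=0)

mean_vec = average_embedding("good product xyz", toy_vectors)  # "xyz" is skipped
empty_vec = average_embedding("xyz", toy_vectors)              # no known words
print(mean_vec, empty_vec)

# Word order does not affect the average
same = np.allclose(
    average_embedding("good product", toy_vectors),
    average_embedding("product good", toy_vectors),
)
print("order-invariant:", same)
```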

<h3>Perceptron</h3>

<b>The same Perceptron classifier, now trained on the averaged Word2Vec features.</b>

In [31]:
perceptron_text_clf = Perceptron()
perceptron_text_clf.fit(X_train_w2v, y_train_w2v)
perceptron_predictions = perceptron_text_clf.predict(X_test_w2v)
perceptron_accuracy_w2v = accuracy_score(y_test_w2v, perceptron_predictions)
print("Accuracy of Perceptron on Word2Vec data : ", perceptron_accuracy_w2v)

Accuracy of Perceptron on Word2Vec data :  0.5465


<h3>SVM</h3>

<b>The same linear SVM, now trained on the averaged Word2Vec features.</b>

In [32]:
svc_text_clf = LinearSVC()
svc_text_clf.fit(X_train_w2v, y_train_w2v)
svc_predictions = svc_text_clf.predict(X_test_w2v)
svc_accuracy_w2v = accuracy_score(y_test_w2v, svc_predictions)
print("Accuracy of SVM on Word2Vec data : ", svc_accuracy_w2v)

Accuracy of SVM on Word2Vec data :  0.6711666666666667


<p></p>

<b>What do you conclude from comparing performances for the models trained using the two different feature types (TF-IDF and your trained Word2Vec features)?</b><br><br>
<b>Reasoning:</b><br>
With both classifiers, TF-IDF features give higher accuracy than the averaged pretrained Google News Word2Vec features: 0.711 vs. 0.547 for the Perceptron and 0.743 vs. 0.671 for the SVM. A likely reason is that averaging word vectors over a whole review discards word order and dilutes strongly polar words, while TF-IDF with 1- to 4-grams keeps sentiment-bearing words and phrases as separate features.
Word2Vec is a powerful technique for capturing semantic and syntactic relationships between words, but for this task lexical evidence appears to play a greater role than general semantic similarity.



# 4. Feedforward Neural Networks

# (a) FNN

<b>A feedforward neural network is an artificial neural network in which the connections between nodes do not form a cycle.</b>

In [33]:
y_train_w2v = np.array(y_train_w2v)
y_test_w2v = np.array(y_test_w2v)

<b>Feedforward neural network with two hidden layers of 100 and 10 nodes, respectively. ReLU is used in the hidden layers and softmax in the output layer. 1 is subtracted from the y values so that the class labels are 0, 1, 2 instead of 1, 2, 3.</b>

In [34]:
def FNN(x_train, y_train, x_test, y_test, num_features, epochs, batch_size, learning_rate_val):
    model_fnn = tf.keras.Sequential(
                                    [   tf.keras.layers.InputLayer((num_features,)),
                                        tf.keras.layers.Dense(100,activation='relu'),
                                        tf.keras.layers.Dense(10,activation='relu'),
                                        tf.keras.layers.Dense(3,activation='softmax')
                                    ]
                                )

    model_fnn.compile(
                    optimizer = Adam(learning_rate=learning_rate_val),
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy']
                )

    print(model_fnn.summary())

    model_fnn.fit(x_train,y_train-1, batch_size = batch_size, epochs = epochs)

    result = model_fnn.evaluate(x_test,y_test-1)
    return result[1]

<b>Passing the num_features, epochs, batch size and learning rate parameters to the FNN function</b>

In [35]:
fnn_accuracy = FNN(X_train_w2v, y_train_w2v, X_test_w2v, y_test_w2v, 300, 50, 64,0.001)

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 100)               30100     
                                                                 
 dense_1 (Dense)             (None, 10)                1010      
                                                                 
 dense_2 (Dense)             (None, 3)                 33        
                                                                 
Total params: 31,143
Trainable params: 31,143
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/50




Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [36]:
print("Accuracy of FNN model: ",fnn_accuracy)

Accuracy of FNN model:  0.6727499961853027
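
The parameter counts in the model summary follow directly from the layer sizes: a dense layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. A short check of that arithmetic (the helper name `dense_params` is illustrative):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Input of 300 features, hidden layers of 100 and 10 nodes, 3 output classes
layer_sizes = [300, 100, 10, 3]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)

# The same network with a 3000-dimensional input (10 concatenated vectors)
layer_sizes_top10 = [3000, 100, 10, 3]
total_top10 = sum(
    dense_params(a, b) for a, b in zip(layer_sizes_top10, layer_sizes_top10[1:])
)
print(total_top10)
```

Almost all parameters sit in the first layer, so a tenfold larger input makes the model roughly ten times larger.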


<p></p>

# (b) Top 10 FNN

<b>Concatenating the Word2Vec vectors of the first 10 in-vocabulary words of each review, and zero-padding the result to 3000 dimensions when a review has fewer than 10 such words</b>

In [37]:
def get_review_top10(reviews):
    # Collect the vectors of the first 10 in-vocabulary words of the review
    vectors = []
    for word in reviews.split(" "):
        if word in google_news_word2vec:
            vectors.append(google_news_word2vec[word])
            if len(vectors) == 10:
                break

    # Concatenate into one 3000-dimensional vector, zero-padded on the right
    features = np.zeros(3000, dtype=np.float32)
    if vectors:
        flat = np.concatenate(vectors)
        features[:flat.size] = flat
    return features

In [38]:
# Apply the get_review_top10 function to each review in X_train
train_top10_review_vectors = X_train.apply(get_review_top10)

# Stack the resulting vectors into a 2D numpy array
X_train_top10_w2v = np.stack(train_top10_review_vectors, axis=0)

test_top10_review_vectors = X_test.apply(get_review_top10)

# Stack the resulting vectors into a 2D numpy array
X_test_top10_w2v = np.stack(test_top10_review_vectors, axis=0)

In [39]:
fnn_top10_accuracy = FNN(X_train_top10_w2v, y_train_w2v, X_test_top10_w2v, y_test_w2v, 3000, 50, 64, 0.001)

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 100)               300100    
                                                                 
 dense_4 (Dense)             (None, 10)                1010      
                                                                 
 dense_5 (Dense)             (None, 3)                 33        
                                                                 
Total params: 301,143
Trainable params: 301,143
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoc

In [40]:
print("Accuracy of Top 10 FNN model: ",fnn_top10_accuracy)

Accuracy of Top 10 FNN model:  0.5889166593551636


<p></p>

<b>What do you conclude by comparing accuracy values you obtain with
those obtained in the "Simple Models" section.</b><br>

<b> Reasoning:</b>

The TF-IDF simple models (0.711 for the Perceptron, 0.743 for the SVM) are more accurate than both feedforward networks. On the same averaged Word2Vec features, the FNN (0.673) is on par with the SVM (0.671) and well ahead of the Perceptron (0.547). A more complex or better-tuned architecture might give better accuracy. The FNN that uses only the first 10 word embeddings, concatenated to represent the entire review, performs worse (0.589) than the FNN on averaged embeddings and most of the simple models.

The likely reasons for this weaker performance: the first 10 words may not carry the sentiment of the whole review, and concatenating embeddings ties each feature to a fixed word position, which may not capture the contextual semantics of the review.


# 5. Recurrent Neural Networks

<b>Preparing data for RNN, GRU and LSTM code</b><br>
<b>Limiting the maximum review length to 20 by truncating longer reviews and padding shorter reviews with zero vectors.</b><br>
<b>Padding is done by appending a placeholder token ("random_word") that is not present in the word2vec model, so it maps to a zero vector, as does any other out-of-vocabulary word.</b>

In [41]:
X_train_processed = []
for sentence in X_train.tolist():
    # Split the sentence by space and take the first 20 words
    words = sentence.split(' ')[:20]

    # Pad the rest with random_word if the sentence is less than 20 words
    words += ['random_word'] * (20 - len(words))

    # Replace each word with its corresponding vector or the default vector
    vectorized_words = []
    for word in words:
        if word in google_news_word2vec:
            vectorized_words.append(google_news_word2vec[word])
        else:
            vectorized_words.append(np.zeros((300,)))

    # Append the processed sentence to the output array
    X_train_processed.append(vectorized_words)

# Convert the output array to a numpy array
X_train = np.array(X_train_processed)

In [42]:
# Process X_test in the same way as X_train
X_test_processed = []
for sentence in X_test.tolist():
    # Split the sentence by space and take the first 20 words
    words = sentence.split(' ')[:20]

    # Pad the rest with random_word if the sentence is less than 20 words
    words += ['random_word'] * (20 - len(words))

    # Replace each word with its corresponding vector or the default vector
    vectorized_words = []
    for word in words:
        if word in google_news_word2vec:
            vectorized_words.append(google_news_word2vec[word])
        else:
            vectorized_words.append(np.zeros((300,)))

    # Append the processed sentence to the output array
    X_test_processed.append(vectorized_words)

# Convert the output array to a numpy array
X_test = np.array(X_test_processed)
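
The two cells above repeat the same truncate-and-pad logic. As a sketch, that logic could be factored into one helper; the version below runs on a toy dictionary of made-up vectors so that it is self-contained. `to_sequence` and `toy_vectors` are hypothetical names, and the small `max_len` and `dim` are chosen only to keep the example readable.

```python
import numpy as np

def to_sequence(text, vectors, max_len=20, dim=300):
    """Return a (max_len, dim) array: one row per word, zeros for padding
    and for out-of-vocabulary words."""
    seq = np.zeros((max_len, dim), dtype=np.float32)
    for i, word in enumerate(text.split(" ")[:max_len]):
        if word in vectors:
            seq[i] = vectors[word]
    return seq

# Toy vocabulary (hypothetical vectors, not real embeddings)
toy_vectors = {"nice": np.ones(4), "cream": np.full(4, 2.0)}

reviews = ["nice cream", "nice unknown cream nice cream nice nice"]
batch = np.stack([to_sequence(r, toy_vectors, max_len=5, dim=4) for r in reviews])
print(batch.shape)
```

The first review is padded with three zero rows, and the second is truncated to its first five words, with the unknown word mapped to a zero row.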

# (a) Simple RNN

An RNN processes a sequence one step at a time: the hidden state computed at the previous step is fed back in, together with the current input, to produce the next hidden state.

<b>RNN code with input layer dimensions of (20,300) and 1 SimpleRNN layer with hidden state size of 20. "y values-1" is taken to have class labels as 0, 1, 2 instead of 1, 2, 3</b>

In [43]:
def RNN(x_train, y_train, x_test, y_test, epochs, batch_size, learning_rate_val):
    model_rnn = tf.keras.Sequential([ tf.keras.layers.InputLayer((20,300)),
                                    tf.keras.layers.SimpleRNN(20),
                                    tf.keras.layers.Dense(3,activation='softmax')])


    model_rnn.compile (
                        optimizer = Adam(learning_rate = learning_rate_val),
                        loss='sparse_categorical_crossentropy',
                        metrics=['accuracy']
                    )

    print(model_rnn.summary())
    
    model_rnn.fit(x_train,y_train-1, batch_size = batch_size, epochs = epochs)
    result = model_rnn.evaluate(x_test,y_test-1)
    return result[1]

In [44]:
rnn_accuracy = RNN(X_train, y_train, X_test, y_test, 50, 64, 0.001)

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn (SimpleRNN)      (None, 20)                6420      
                                                                 
 dense_6 (Dense)             (None, 3)                 63        
                                                                 
Total params: 6,483
Trainable params: 6,483
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38

In [45]:
print("Accuracy of RNN model:",rnn_accuracy)

Accuracy of RNN model: 0.6434999704360962


<p></p>

# (b) GRU

A gated recurrent unit (GRU) is a recurrent neural network cell that uses an update gate and a reset gate to control how much of the previous hidden state is kept at each step. This gating helps with the vanishing gradient problem, which is a common issue with simple recurrent neural networks.

<b>GRU code with input layer dimensions of (20,300) and 1 GRU layer with hidden state size of 20. "y values-1" is taken to have class labels as 0, 1, 2 instead of 1, 2, 3</b>

In [46]:
def GRU(x_train, y_train, x_test, y_test, epochs, batch_size, learning_rate_val):
    model_gru = tf.keras.Sequential([ tf.keras.layers.InputLayer((20,300)),
                                    tf.keras.layers.GRU(20),
                                    tf.keras.layers.Dense(3,activation='softmax')])


    model_gru.compile (
                        optimizer = Adam(learning_rate = learning_rate_val),
                        loss='sparse_categorical_crossentropy',
                        metrics=['accuracy']
                    )

    print(model_gru.summary())

    model_gru.fit(x_train,y_train-1, batch_size = batch_size, epochs = epochs)

    result = model_gru.evaluate(x_test,y_test-1)
    return result[1]

In [47]:
gru_accuracy = GRU(X_train, y_train, X_test, y_test, 50, 64, 0.001)

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 gru (GRU)                   (None, 20)                19320     
                                                                 
 dense_7 (Dense)             (None, 3)                 63        
                                                                 
Total params: 19,383
Trainable params: 19,383
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 

In [48]:
print("Accuracy of GRU model:",gru_accuracy)

Accuracy of GRU model: 0.6847500205039978


<p></p>

# (c) LSTM

LSTM stands for long short-term memory. It is a type of recurrent neural network (RNN) that is capable of learning long-term dependencies, especially in sequence prediction problems.

<b>LSTM code with input layer dimensions of (20,300) and 1 LSTM layer with hidden state size of 20. "y values-1" is taken to have class labels as 0, 1, 2 instead of 1, 2, 3</b>

In [49]:
def LSTM(x_train, y_train, x_test, y_test, epochs, batch_size, learning_rate_val):
    model_lstm = tf.keras.Sequential([ tf.keras.layers.InputLayer((20,300)),
                                    tf.keras.layers.LSTM(20),
                                    tf.keras.layers.Dense(3,activation='softmax')])


    model_lstm.compile (
                        optimizer = Adam(learning_rate = learning_rate_val),
                        loss='sparse_categorical_crossentropy',
                        metrics=['accuracy']
                    )

    print(model_lstm.summary())

    model_lstm.fit(x_train,y_train-1, batch_size = batch_size, epochs = epochs)

    result = model_lstm.evaluate(x_test,y_test-1)
    return result[1]

In [50]:
lstm_accuracy = LSTM(X_train, y_train, X_test, y_test, 50, 64, 0.001)

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 20)                25680     
                                                                 
 dense_8 (Dense)             (None, 3)                 63        
                                                                 
Total params: 25,743
Trainable params: 25,743
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 

In [51]:
print("Accuracy of LSTM model:",lstm_accuracy)

Accuracy of LSTM model: 0.6710000038146973


<p></p>

<h3>Accuracy values</h3>

In [52]:
print("Accuracy of Perceptron on TF-IDF data : ", perceptron_accuracy_tfidf)
print("Accuracy of SVM on TF-IDF data : ", svc_accuarcy_tfidf)
print("Accuracy of Perceptron on Word2Vec data : ", perceptron_accuracy_w2v)
print("Accuracy of SVM on Word2Vec data : ", svc_accuracy_w2v)
print("Accuracy of FNN model: ",fnn_accuracy)
print("Accuracy of Top 10 FNN model: ",fnn_top10_accuracy)
print("Accuracy of RNN model:",rnn_accuracy)
print("Accuracy of GRU model:",gru_accuracy)
print("Accuracy of LSTM model:",lstm_accuracy)

Accuracy of Perceptron on TF-IDF data :  0.7114166666666667
Accuracy of SVM on TF-IDF data :  0.7431666666666666
Accuracy of Perceptron on Word2Vec data :  0.5465
Accuracy of SVM on Word2Vec data :  0.6711666666666667
Accuracy of FNN model:  0.6727499961853027
Accuracy of Top 10 FNN model:  0.5889166593551636
Accuracy of RNN model: 0.6434999704360962
Accuracy of GRU model: 0.6847500205039978
Accuracy of LSTM model: 0.6710000038146973


<p></p>

<b>What do you conclude by comparing accuracy values you obtain by GRU, LSTM, and Simple RNN.</b>

<b>Reasoning:</b><br>
The accuracy values of SimpleRNN, GRU and LSTM are in this order: GRU (0.685) > LSTM (0.671) > SimpleRNN (0.643).
GRU and LSTM commonly outperform a simple RNN on tasks that require processing long-term dependencies or maintaining memory over a longer span. They have gating mechanisms that let them selectively forget or store information, while a simple RNN lacks these mechanisms and is prone to the vanishing gradient problem. The gap between GRU and LSTM is small here and may not be meaningful for a single training run on sequences of only 20 tokens.
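
The gating mechanisms also explain the parameter counts in the three model summaries: an LSTM holds four sets of recurrent weights and a GRU three, where a simple RNN holds one. The sketch below checks this arithmetic for an input dimension of 300 and 20 hidden units. The GRU formula assumes the Keras default `reset_after=True`, which uses two bias vectors per gate; the function names are illustrative.

```python
def simple_rnn_params(input_dim, units):
    """Input weights, recurrent weights and one bias vector."""
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    """Four weight sets: input, forget and output gates plus the cell candidate."""
    return 4 * simple_rnn_params(input_dim, units)

def gru_params(input_dim, units):
    """Three weight sets; assumes reset_after=True (two bias vectors each)."""
    return 3 * (units * (input_dim + units) + 2 * units)

rnn_count = simple_rnn_params(300, 20)
gru_count = gru_params(300, 20)
lstm_count = lstm_params(300, 20)
print(rnn_count, gru_count, lstm_count)
```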

<h3 align="center">Thank You</h3>