# Lab 2 - GRU and text classification

## Structure of weights of GRU
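The Gated Recurrent Unit (GRU) is a simplified variant of the LSTM that, like the LSTM, is designed to mitigate the vanishing gradient problem. It uses two gates: an update gate and a reset gate. In the equations below, $x_t$ is the input at time step $t$, $h_{t-1}$ is the previous hidden state, $\sigma$ is the logistic sigmoid, and $\odot$ denotes element-wise multiplication.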

### Update gate
$$
z_t = \sigma(W^{(z)}x_t + U^{(z)}h_{t-1})
$$

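$z_t$ is a value between $0$ and $1$. As the equation for the final current memory content shows, it controls how much of the previous hidden state $h_{t-1}$ is carried over to $h_t$; the remaining share, $1 - z_t$, is taken from the new memory content computed at time step $t$.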

### Reset gate
$$
r_t = \sigma(W^{(r)}x_t + U^{(r)}h_{t-1})
$$
$r_t$ is a value between $0$ and $1$, indicating how much information from the past, i.e. $h_{t-1}$, is used when computing the new memory content: values near $0$ forget the past, while values near $1$ keep it.

### Preliminary current memory content
$$
h'_t = \tanh(Wx_t + r_t \odot Uh_{t-1})
$$
The preliminary current memory is composed of new memory content, which involves the input $x_t$, as well as information from the past, filtered by the reset gate.

### Final current memory content
$$
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot h'_t
$$
The final current memory is based on the weighted combination of information from the past, i.e. $h_{t-1}$, and the preliminary current memory content, i.e. $h'_t$.
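
To make the four equations above concrete, here is a minimal NumPy sketch of a single GRU step. It follows the equations as written (no bias terms); the dictionary keys, the toy dimensions, and the random initialization are illustrative assumptions, not part of any library API.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step following the equations above (bias terms omitted)."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)         # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)         # reset gate
    h_cand = np.tanh(p["W"] @ x_t + r_t * (p["U"] @ h_prev))  # preliminary memory h'_t
    h_t = z_t * h_prev + (1.0 - z_t) * h_cand                 # final memory h_t
    return h_t, z_t, r_t

# Hypothetical toy sizes and randomly initialized weights, for illustration only.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
params = {k: rng.normal(scale=0.5, size=(hidden_dim, input_dim)) for k in ("W_z", "W_r", "W")}
params.update({k: rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)) for k in ("U_z", "U_r", "U")})

# Run the cell over a sequence of 5 random input vectors, starting from h_0 = 0.
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, z, r = gru_step(x_t, h, params)

print(h.shape)  # one hidden vector of size hidden_dim
```

Because $h_t$ is an element-wise weighted average of $h_{t-1}$ and a $\tanh$ output, every entry of the hidden state stays within $[-1, 1]$ when starting from a zero state.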

## Classification with GRU

### Setup

In [1]:
import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf

tfds.disable_progress_bar()


In [2]:
%matplotlib inline
import matplotlib.pyplot as plt

def plot_graphs(history, metric):
    plt.plot(history.history[metric])
    plt.plot(history.history['val_' + metric], '')
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend([metric, 'val_'+metric])
    plt.show()


### Prepare the data
For this example, we will use the Yelp polarity reviews dataset, in which each review is labeled as either negative or positive. You can find details about the dataset [here](https://www.tensorflow.org/datasets/catalog/yelp_polarity_reviews).

In [None]:
dataset, info = tfds.load("yelp_polarity_reviews", with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset["train"], dataset["test"]

print(train_dataset.element_spec)

We can examine the content of the dataset. Then, we can shuffle the data and create batches for training.

In [None]:
for example, label in train_dataset.take(1):
    print("example\t", example)
    print("label\t", label)
    
print(example.shape)

In [None]:
BUFFER_SIZE = 10000
BATCH_SIZE = 64

train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

In [None]:
for example, label in train_dataset.take(1):
  print('texts: ', example.numpy()[:3])
  print()
  print('labels: ', label.numpy()[:3])

TensorFlow Keras provides `TextVectorization`, a layer that standardizes, tokenizes, and encodes text into integer token indices for training. We will use its default settings in this example, apart from limiting the vocabulary size with `max_tokens`.

First, we will use the `adapt` method to set the vocabulary based on `train_dataset`. 

In [None]:
VOCAB_SIZE = 5000
text_vectorizer = tf.keras.layers.TextVectorization(max_tokens=VOCAB_SIZE)
text_vectorizer.adapt(train_dataset.map(lambda text, label: text))

Once that is done, we can encode texts into indices. Because we do not set a fixed `output_sequence_length`, the index tensors are zero-padded to the length of the longest sequence in the batch.

In [None]:
encoded_example = text_vectorizer(example)[:3].numpy()
print(example[:3])
print(encoded_example)
print(encoded_example.shape)
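
The encoding and padding behavior described above can be illustrated with a small pure-Python sketch. This is a simplified stand-in, not the `TextVectorization` implementation: the helper names `build_vocab` and `encode_batch` are hypothetical, punctuation is not stripped, and frequency ties are broken alphabetically as an assumption. It does follow the layer's convention of reserving index 0 for padding and index 1 for out-of-vocabulary tokens.

```python
def build_vocab(texts, max_tokens):
    """Keep the most frequent tokens; index 0 is padding, index 1 is out-of-vocabulary."""
    counts = {}
    for text in texts:
        for token in text.lower().split():
            counts[token] = counts.get(token, 0) + 1
    # Sort by descending frequency, then alphabetically (tie-breaking is an assumption).
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return ["", "[UNK]"] + ranked[: max_tokens - 2]

def encode_batch(texts, vocab):
    """Map tokens to indices and zero-pad every row to the longest row in the batch."""
    index = {token: i for i, token in enumerate(vocab)}
    rows = [[index.get(tok, 1) for tok in text.lower().split()] for text in texts]
    longest = max(len(row) for row in rows)
    return [row + [0] * (longest - len(row)) for row in rows]

toy_texts = ["good food good service", "bad food"]
toy_vocab = build_vocab(toy_texts, max_tokens=5)
toy_encoded = encode_batch(toy_texts, toy_vocab)

print(toy_vocab)    # ['', '[UNK]', 'food', 'good', 'bad']
print(toy_encoded)  # [[3, 2, 3, 1], [4, 2, 0, 0]]
```

With `max_tokens=5` only three real tokens fit in the vocabulary, so `service` is mapped to the out-of-vocabulary index 1, and the shorter review is padded with zeros.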

### Build the classification model
We will employ `tf.keras.Sequential` to combine a sequence of Keras layers to build the model. 

First, we use the `TextVectorization` layer to convert text to a sequence of token indices, as shown above. 

Then, we add an `Embedding` layer that is trained to produce a vector for each token. This follows the idea behind Word2Vec of reducing the dimension of word representations: instead of a one-hot vector whose length is the size of the vocabulary, each word gets a smaller but more informative dense vector. Setting `mask_zero=True` tells the following layers to ignore the zero padding. 

After that, we use a recurrent neural network (RNN), in this example a GRU, to process the input sequence by iterating through its elements. Since this is a text classification task and the whole sequence is available at once, we wrap the GRU in `Bidirectional` to process the input both forward and backward, which yields a better representation of the text. By default, the outputs of the two directions are concatenated. 

Lastly, once the RNN has reduced each text to a single vector, we use a stack of `tf.keras.layers.Dense` layers to transform that vector further and produce a single logit as the classification output.

In [None]:
model = tf.keras.Sequential([
    text_vectorizer, 
    tf.keras.layers.Embedding(
        input_dim=len(text_vectorizer.get_vocabulary()),
        output_dim=256,
        mask_zero=True
    ),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(256)),
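    # Nonlinear activations keep the stacked Dense layers from collapsing into one linear map.
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),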
    tf.keras.layers.Dense(1),
])
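
As a small illustration of the embedding step described above, the NumPy sketch below shows that looking up rows of an embedding matrix gives the same result as multiplying one-hot vectors by that matrix, without ever building the one-hot vectors. The sizes and the random matrix are toy assumptions; in the model the matrix is learned during training.

```python
import numpy as np

vocab_size, embed_dim = 6, 3  # toy sizes, far smaller than the model's
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embed_dim))  # one row per token index

token_ids = np.array([3, 2, 3, 1])  # an encoded toy sentence

one_hot = np.eye(vocab_size)[token_ids]   # shape (4, vocab_size)
via_matmul = one_hot @ embedding_matrix   # shape (4, embed_dim)
via_lookup = embedding_matrix[token_ids]  # same values, computed by row indexing

print(via_lookup.shape)
print(np.allclose(via_matmul, via_lookup))
```

The lookup form is why the layer scales to large vocabularies: the cost per token depends on the embedding dimension, not on the vocabulary size.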

Now we will check if the model produces expected output with a sample text, and we will compile the Keras model to configure the training process. 

In [None]:
sample_text = "Ate dinner here a week ago.  They advertised a free buffet for happy hour but turns out it is nothing more than chips and salsa.   They also have a fruit platter but who is going to eat that with the cantaloupe killing so many people. When we saw the \\""Buffet\\"" we decided to try dinner.  Did not like the food. Just a tiny step above fast food.  The side dish was sone kind of noodles but they were cut up in about 1 inch pieces.  The beans were mashed and I almost gagged on them.  The chile relleno was the best part of the meal but still not up to par with the Barrio Cafe,  I do not plan on going back here again.  Can\'t believe their quality of food since they are located in a high class neighborhood.\\nTheir adsvertising photos below are terrible.  Photgrapher used very bad lighting and it shows."
predictions = model.predict(np.array([sample_text]))
print(predictions)

In [None]:
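model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(1e-4),
    # The model outputs logits, so threshold at 0 instead of the default 0.5.
    metrics=[tf.keras.metrics.BinaryAccuracy(name="accuracy", threshold=0.0)],
)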
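
Since the model ends in a single logit and the loss is configured with `from_logits=True`, it can help to see how a logit relates to a probability and to the loss value. The pure-Python sketch below is illustrative only: the function names are hypothetical, and the stable formula is the one commonly documented for sigmoid cross-entropy with logits rather than a copy of the Keras implementation.

```python
import math

def logit_to_probability(logit):
    """Sigmoid: maps a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def bce_from_logit(label, logit):
    """Binary cross-entropy computed directly from a logit in a numerically stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def bce_from_probability(label, prob):
    """Textbook binary cross-entropy computed from a probability."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

toy_logit = 1.5
toy_prob = logit_to_probability(toy_logit)

# Both routes give the same loss for moderate logits.
print(bce_from_logit(1, toy_logit), bce_from_probability(1, toy_prob))

# A logit of 0 corresponds to a probability of 0.5, the usual decision boundary.
print(logit_to_probability(0.0))
```

This is also why a positive prediction from `model.predict` can be read as "more likely positive" and a negative one as "more likely negative".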

### Train the model

In [None]:
history = model.fit(train_dataset.take(1000), epochs=10, validation_data=test_dataset.take(100), validation_steps=1)

In [None]:
test_loss, test_acc = model.evaluate(test_dataset)

In [None]:
print('Test Loss:', test_loss)
print('Test Accuracy:', test_acc)

In [None]:
predictions = model.predict(np.array([sample_text]))
print(predictions)

In [None]:
plot_graphs(history, 'accuracy')
plot_graphs(history, 'loss')

## Hands on
Find a text dataset suitable for text classification that interests you in the [TensorFlow Datasets catalog](https://www.tensorflow.org/datasets/catalog/overview). Try training different RNN variants (e.g. LSTM and GRU) and compare the training results, such as the loss curves and relevant metrics (e.g. accuracy, F1, etc.).

I suggest doing this exercise in the IPython console, which is similar to Jupyter Notebook but runs in the terminal. This way you can also try training your models on one of the deepdish servers. 

### TODOs
1. Find and download a dataset you are interested in, and examine the dataset content.
2. Build a `TextVectorization` layer for your dataset.
3. Set up classification models based on a uni-directional LSTM, a uni-directional GRU, a bi-directional LSTM, and a bi-directional GRU. 
4. Run training and evaluation on these four models and collect results. 
5. Compare the performance in terms of relevant metrics and note the model with best performance. 
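
For the comparison in the last step, the sketch below shows one way to compute accuracy, precision, recall, and F1 from true labels and raw model logits in plain Python. The function name and the toy values are hypothetical; it assumes binary labels in {0, 1} and a model that outputs one logit per example, as in the example above.

```python
def binary_classification_metrics(labels, logits):
    """Accuracy, precision, recall and F1 for binary labels and raw logits."""
    # A logit above 0 corresponds to a predicted probability above 0.5.
    preds = [1 if logit > 0 else 0 for logit in logits]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels and logits, made up for illustration.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_logits = [2.1, -0.3, -1.2, 0.4, 0.9, -2.0]
toy_metrics = binary_classification_metrics(toy_labels, toy_logits)
print(toy_metrics)
```

Collecting the same dictionary for each of the four models makes it straightforward to compare them side by side.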