# Recurrent Neural Network
***
**Name**: Timothy Mason
***

## Goal
The goal of this assignment is to use TensorFlow to build some recurrent neural nets (RNNs) and to understand their limitations through experimentation.

## The Task
You will implement a recurrent neural net to learn the parity operator. The net will have a single input unit and a single output unit, and a fully-connected layer of H hidden units. The inputs and target outputs are binary. When an input sequence is presented, the output state at the end of the sequence should be a parity bit: output should be 1 if the input has an odd number of '1' values. For example, the sequence 1-0-0-1-0-1 should yield output 1 and the sequence 0-0-0-0-1-1 should yield output 0. Note that a target is given only at the end of each sequence. (Parity is easy to learn if there is a target at each step that indicates parity given the sequence so far.)
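As a quick check on the definition, here is a minimal sketch in plain Python that computes the parity of the two example sequences above. The helper names `parity` and `running_parity` are made up for this illustration and are not part of the assignment code. The second helper shows the per-step target that the note says would make the task easy.

```python
def parity(bits):
    """Return 1 if the sequence contains an odd number of 1s, else 0."""
    return sum(bits) % 2

def running_parity(bits):
    """Parity of every prefix: the per-step target the task does NOT provide."""
    out, p = [], 0
    for b in bits:
        p ^= b          # XOR folds each new bit into the parity so far
        out.append(p)
    return out

# The two example sequences from the task description.
seq_a = [1, 0, 0, 1, 0, 1]   # three 1s -> odd
seq_b = [0, 0, 0, 0, 1, 1]   # two 1s  -> even

print(parity(seq_a))          # 1
print(parity(seq_b))          # 0
print(running_parity(seq_a))  # [1, 1, 1, 0, 0, 1]
```

In this assignment the net only ever sees the last element of the running parity as its target.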

Parity is a hard problem for neural nets to learn because very similar inputs produce different outputs, and very dissimilar inputs can produce the same output.
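A small illustration of that point, using made-up sequences: flipping a single bit changes the parity, while flipping every bit of this particular sequence leaves it unchanged. The helper names are hypothetical.

```python
def parity(bits):
    """Return 1 if the sequence contains an odd number of 1s, else 0."""
    return sum(bits) % 2

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

base = [1, 0, 0, 1, 0, 1]
near = [1, 0, 0, 1, 0, 0]   # differs from base in one position
far = [0, 1, 1, 0, 1, 0]    # differs from base in every position

# Very similar inputs, different outputs.
print(hamming(base, near), parity(base), parity(near))  # 1 1 0
# Very dissimilar inputs, same output.
print(hamming(base, far), parity(base), parity(far))    # 6 1 1
```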

The aspects of the task we will manipulate are:  H, the number of hidden units, N, the length of the input strings, and the activation function for the hidden units, either tanh or LSTM-style neurons. The output neuron should have a logistic activation function.

Tip: get started early. Depending on your system, these nets can take several minutes to train. Exploring hyperparameters (such as the learning rate) will be critical for success.


### Some Help
Below is some helper code to:
- generate input strings and their parity
- provide a callback to trigger early stopping during training
- plot the results

In [1]:
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
import numpy as np
from matplotlib import pyplot

def gen_binary_sequences(num, length):
  '''
  Generate :num: sequences of length :length:.
  '''
  return(np.random.randint(0, 2, size=(num, length), dtype=np.int32))

def calc_parity(seqs):
  '''
  Calculate sequence parity (1 if odd number of 1s, 0 if even number of 1s)
  '''
  return(seqs.sum(axis=-1) % 2)


class create_accuracy_callback(tf.keras.callbacks.Callback):
    '''
    Callback that stops training once both training and validation accuracy reach 100%
    '''
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}  # avoid a mutable default argument
        if(logs.get('accuracy') == 1.0 and logs.get('val_accuracy') == 1.0):
            self.model.stop_training = True


def plot_accuracies(accuracies,Ns,Hs):
  '''
  Make a graph of mean % correct (and standard error) on the test set for the different values of H and N.
  Input: ndarray of |Hs|x|Ns|x(reps)
         actual values of Ns and Hs
  '''
  lenH,lenN,lenreps = accuracies.shape
  assert(lenH == len(Hs))
  assert(lenN == len(Ns))
  accuracies_mean = accuracies.mean(axis=2)
  accuracies_std = accuracies.std(axis=2)
  accuracies_stderr = accuracies_std/np.sqrt(lenreps)
  # plot
  fig = pyplot.figure()
  ax = fig.add_subplot(111)
  ax.axhline(0.5,linestyle="--",color="gray") # chance baseline
  centers = np.arange(lenN)
  for Hindex, H in enumerate(Hs):
    ax.bar(centers + 0.8/lenH*(Hindex-(lenH-1)/2),
           accuracies_mean[Hindex], 0.8/lenH, yerr=accuracies_stderr[Hindex],
           alpha=0.5, label=f"{H} hidden units")
  ax.set_xlabel("sequence length")
  pyplot.xticks(centers,Ns)
  ax.set_ylabel("accuracy")
  pyplot.legend(loc="lower left")

TensorFlow 2.x selected.


## Part 1
**Part 1.a**<br>
Fill in the code to create a net given H and N using the tanh activation function for the hidden units.  Keras has a number of RNN helper functions, although you can also write your own custom layers.

Remember that the net should only take one bit of input at a time from the input sequence, and output one logistic value (between 0 and 1) only after the input sequence is complete.
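To make the intended data flow concrete, here is a rough pure-Python sketch of the forward pass such a net performs: one bit enters per time step, the hidden state is updated with tanh, and a single logistic output is read only after the last step. The function name, the weight layout, and the random weights are all assumptions for illustration. This is not the Keras implementation, and the weights are untrained.

```python
import math
import random

def rnn_forward(bits, w_in, w_rec, b_h, w_out, b_out):
    """Forward pass of a tiny tanh RNN with one input unit and one output unit."""
    H = len(b_h)
    h = [0.0] * H                       # initial hidden state
    for x in bits:                      # one input bit per time step
        h = [math.tanh(w_in[j] * x
                       + sum(w_rec[j][k] * h[k] for k in range(H))
                       + b_h[j])
             for j in range(H)]
    z = sum(w_out[j] * h[j] for j in range(H)) + b_out
    return 1.0 / (1.0 + math.exp(-z))   # logistic output, read only at the end

random.seed(0)
H = 5                                   # number of hidden units
w_in = [random.uniform(-1, 1) for _ in range(H)]
w_rec = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(H)]
b_h = [0.0] * H
w_out = [random.uniform(-1, 1) for _ in range(H)]

y = rnn_forward([1, 0, 0, 1, 0, 1], w_in, w_rec, b_h, w_out, 0.0)
print(y)   # some value strictly between 0 and 1
```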

In [0]:
import tensorflow as tf
from tensorflow.keras import layers
from datetime import datetime

# %load_ext tensorboard
# # Clear any logs from previous runs
# !rm -rf ./logs/ 

def build_model(N,H):
    '''
    Builder for an RNN model. Model inputs are binary sequences of length :N:.

    At each sequence position, the input and prior state should be fully connected 
    to :H: hidden units with tanh activation.

    The output of the last state of the RNN should be fully connected to a single
    unit with logistic activation, to perform the final classification of the 
    sequence.
    '''

    # based on example Keras RNN code from https://www.tensorflow.org/guide/keras/rnn

    model = tf.keras.Sequential()
    model.add(layers.Embedding(input_dim=2, output_dim=H))  # input_dim is the vocabulary size; inputs are only 0 or 1
    model.add(layers.SimpleRNN(H))   # defaults are good (tanh activation, use a bias vector)
    model.add(layers.Dense(1, activation='sigmoid'))

    return model

**Part 1.b**<br>
Then fill in the code to train several such nets.  Each repetition should randomize the initial weights and generate a random training set of 10000 examples of length N as well as a random test set of 10000 examples of length N.  Save 10% of the training as validation, and use at least the provided check_accuracy callback as an early stopping condition.

Train nets for H ∈ {5, 25} and for N ∈ {5, 10, 15, 20}.  For each combination of H and N, run 10 replications of your simulation.  You will also need to try to find helpful learning rates; don't be surprised if your training is prone to cycles of stagnation for hundreds of epochs before quickly learning.
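For planning compute time, it can help to count the runs this paragraph implies. The sketch below enumerates the grid using the values of H and N stated above. The variable names are chosen for this illustration only.

```python
from itertools import product

hidden_sizes = [5, 25]          # H values from the text above
seq_lengths = [5, 10, 15, 20]   # N values from the text above
reps = 10                       # replications per (H, N) combination

# One entry per model that has to be built and trained.
runs = list(product(hidden_sizes, seq_lengths, range(reps)))
print(len(runs))   # 80
```

Every additional learning rate tried across the full grid adds another 80 training runs, so it may be worth tuning on a single (H, N) combination first.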

In [0]:
BATCH = 1000
MAX_EPOCHS = 2000

# logdir="logs/sequential_fit/" + datetime.now().strftime("%Y%m%d-%H%M%S")
# tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)

def train_model(model, x_train, y_train, x_test, y_test):
    '''
    Compiles and trains the :model:.
    '''
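    # start by compiling the model; accuracy should be a tracked metric.
    # The optimizer is named explicitly (RMSprop is the Keras default) so that
    # its learning rate is easy to change.
    model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

    # train model, stopping early once training and validation accuracy both reach 100%
    # (x_test/y_test act as validation data here; the caller draws fresh test data afterwards)
    check_accuracy = create_accuracy_callback()
    model.fit(x_train,
              y_train,
              batch_size=BATCH,
              epochs=MAX_EPOCHS,
              verbose=0,
              validation_data=(x_test, y_test),
              callbacks=[check_accuracy])
              # callbacks=[check_accuracy, tensorboard_callback])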

In [0]:
Hs = [5,25]
Ns = [5,10,20,40]
REPS = 10
SAMPLES = 10000

test_accs = np.zeros([len(Hs),len(Ns),REPS])
for Nindex, N in enumerate(Ns):
    for rep in range(REPS):
        # (data is reusable across changes to the number of hidden units)
        # generate a random training set of 10000 examples of length N
        x_train = gen_binary_sequences(SAMPLES, N)
        y_train = calc_parity(x_train)

        # generate a random test set of 10000 examples of length N
        x_test = gen_binary_sequences(SAMPLES, N)
        y_test = calc_parity(x_test)

        for Hindex, H in enumerate(Hs):
            ds = datetime.now().strftime("%d-%b %H:%M:%S")  # portable equivalent of %T
            print(f"{ds}: starting N={N}/{Ns}, H={H}/{Hs}, rep {rep+1}/{REPS}")
            # build model
            model = build_model(N,H)
            # train model
            train_model(model, x_train, y_train, x_test, y_test)
            # test model with newly generated test data (extracting test accuracy as test_acc)
            x_test = gen_binary_sequences(SAMPLES, N)
            y_test = calc_parity(x_test)
            # evaluate() returns [loss, accuracy]; the early-stopping callback is not needed here
            test_acc = model.evaluate(x=x_test,
                                      y=y_test,
                                      batch_size=BATCH,
                                      verbose=1)[1]

            test_accs[Hindex,Nindex,rep] = test_acc

02-Mar 02:35:17: starting N=5/[5, 10, 20, 40], H=5/[5, 25], rep 1/10
02-Mar 02:37:16: starting N=5/[5, 10, 20, 40], H=25/[5, 25], rep 1/10
02-Mar 02:40:53: starting N=5/[5, 10, 20, 40], H=5/[5, 25], rep 2/10
02-Mar 02:42:27: starting N=5/[5, 10, 20, 40], H=25/[5, 25], rep 2/10
02-Mar 02:46:00: starting N=5/[5, 10, 20, 40], H=5/[5, 25], rep 3/10
02-Mar 02:47:06: starting N=5/[5, 10, 20, 40], H=25/[5, 25], rep 3/10
02-Mar 02:50:30: starting N=5/[5, 10, 20, 40], H=5/[5, 25], rep 4/10
02-Mar 02:52:25: starting N=5/[5, 10, 20, 40], H=25/[5, 25], rep 4/10
02-Mar 02:55:51: starting N=5/[5, 10, 20, 40], H=5/[5, 25], rep 5/10
02-Mar 02:57:48: starting N=5/[5, 10, 20, 40], H=25/[5, 25], rep 5/10
02-Mar 03:01:13: starting N=5/[5, 10, 20, 40], H=5/[5, 25], rep 6/10
02-Mar 03:03:09: starting N=5/[5, 10, 20, 40], H=25/[5, 25], rep 6/10
02-Mar 03:06:35: starting N=5/[5, 10, 20, 40], H=5/[5, 25], rep 7/10
02-Mar 03:07:55: starting N=5/[5, 10, 20, 40], H=25/[5, 25], rep 7/10
02-Mar 03:11:29: starting N

**Part 1.c**<br>
Make the graph of mean % correct (and standard error) on the test set for the different values of H and N.
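As a reminder of what the error bars represent, here is a small worked example of the mean and standard error for one (H, N) cell. The accuracy values are invented for illustration and are not results from this notebook. The population standard deviation is used to match the default of `np.std` in `plot_accuracies`.

```python
import math
import statistics

# Hypothetical test accuracies from 10 replications of one (H, N) setting.
accs = [1.0, 1.0, 0.5, 1.0, 0.5, 1.0, 1.0, 0.5, 1.0, 1.0]

mean = statistics.mean(accs)
std = statistics.pstdev(accs)             # population std (ddof=0), like np.std
stderr = std / math.sqrt(len(accs))       # standard error of the mean

print(f"{mean:.3f} +/- {stderr:.3f}")     # 0.850 +/- 0.072
```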

In [0]:
plot_accuracies(test_accs,Ns,Hs)
# %tensorboard --logdir logs   # only works if %load_ext tensorboard and the TensorBoard callback above are enabled

## Part 2
Repeat the experiments of Part 1, but use LSTM neurons instead of tanh neurons in the recurrent layer.  Comment on your experiences.
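For intuition about why LSTM units might behave differently on long sequences, here is a sketch of one step of a textbook LSTM cell with a single hidden unit, written with plain scalars. The function and parameter names are hypothetical, the weights are hand-set rather than learned, and `layers.LSTM` in Keras differs in implementation details.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, p):
    """One LSTM step for a single hidden unit.

    p maps a gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w, u, b = p[name]
        return w * x + u * h + b
    i = sigmoid(pre("i"))        # input gate
    f = sigmoid(pre("f"))        # forget gate
    o = sigmoid(pre("o"))        # output gate
    g = math.tanh(pre("g"))      # candidate cell value
    c_new = f * c + i * g        # additive cell-state update
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# Hand-set parameters: forget gate nearly open, input gate nearly closed.
params = {"i": (0.0, 0.0, -10.0), "f": (0.0, 0.0, 10.0),
          "o": (0.0, 0.0, 10.0), "g": (0.0, 0.0, 0.0)}

h, c = 0.0, 1.0                  # start with a value stored in the cell
for x in [0, 1] * 20:            # 40 time steps
    h, c = lstm_step(x, h, c, params)
print(c)                         # still close to 1.0 after 40 steps
```

With these settings the cell state is carried forward almost unchanged, whereas a tanh unit squashes its state at every step.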

In [0]:
def build_model_lstm(N,H):
    '''
    Builder for an LSTM model. Model inputs are binary sequences of length :N:.

    '''

    # based on example Keras RNN code from https://www.tensorflow.org/guide/keras/rnn

    model = tf.keras.Sequential()
    model.add(layers.Embedding(input_dim=2, output_dim=H))  # input_dim is the vocabulary size; inputs are only 0 or 1
    model.add(layers.LSTM(H))
    model.add(layers.Dense(1, activation='sigmoid'))

    return model

In [0]:
test_accs_lstm = np.zeros([len(Hs),len(Ns),REPS])
for Nindex, N in enumerate(Ns):
    for rep in range(REPS):
        # (data is reusable across changes to the number of hidden units)
        # generate a random training set of 10000 examples of length N
        x_train = gen_binary_sequences(SAMPLES, N)
        y_train = calc_parity(x_train)

        # generate a random test set of 10000 examples of length N
        x_test = gen_binary_sequences(SAMPLES, N)
        y_test = calc_parity(x_test)

        for Hindex, H in enumerate(Hs):
            ds = datetime.now().strftime("%d-%b %H:%M:%S")  # portable equivalent of %T
            print(f"{ds}: starting N={N}/{Ns}, H={H}/{Hs}, rep {rep+1}/{REPS}")
            # build model
            model = build_model_lstm(N,H)
            # train model
            train_model(model, x_train, y_train, x_test, y_test)
            # test model with newly generated test data (extracting test accuracy as test_acc)
            x_test = gen_binary_sequences(SAMPLES, N)
            y_test = calc_parity(x_test)
            # evaluate() returns [loss, accuracy]; the early-stopping callback is not needed here
            test_acc = model.evaluate(x=x_test,
                                      y=y_test,
                                      batch_size=BATCH,
                                      verbose=1)[1]

            test_accs_lstm[Hindex,Nindex,rep] = test_acc

In [0]:
plot_accuracies(test_accs_lstm,Ns,Hs)
