# Introduction Tutorial

This notebook will walk through several exercises that use TensorFlow to implement a range of machine learning algorithms. In particular, it will briefly cover deep supervised learning before moving on to deep reinforcement learning algorithms.

To start, make sure you have all dependencies installed by running `pip install -r requirements.txt` from the directory containing this notebook.

In [1]:
# Import modules
import numpy as np
import tensorflow as tf

## Loading MNIST

We'll use a simple classification problem as an example to introduce how to create, train, and evaluate a model. The next cell loads the MNIST dataset, which contains 28x28 grayscale images of handwritten digits, and rescales the pixel values to the range [0, 1].

In [2]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train/255, x_test/255
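
The division by 255 in the cell above rescales the raw 8-bit pixel values to the range [0, 1] and, as a side effect, converts the arrays to floating point. Here is a minimal sketch on a tiny synthetic array (`fake_images` is a made-up stand-in, not part of MNIST):

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST pixels: uint8 values in [0, 255]
fake_images = np.array([[0, 128, 255]], dtype=np.uint8)

# True division promotes the uint8 array to float64 and rescales it to [0, 1]
scaled = fake_images / 255

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Keeping the inputs in a small, consistent range usually makes gradient-based training better behaved than feeding raw values up to 255.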

## Creating a Model

We'll walk through two methods of creating a model. The first is to use Keras' Sequential model, which stacks layers one after another. This is convenient for quickly creating and training a model for basic supervised learning and some reinforcement learning.

In [3]:
# Create sequential model
model = tf.keras.models.Sequential()

# Add a flatten layer to turn MNIST images into a vector
model.add(tf.keras.layers.Flatten(input_shape=(28,28)))

# Add some dense layers with ReLU activations
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dense(64, activation='relu'))

# Add a layer for the scores of each class. Note that we leave the activation linear.
model.add(tf.keras.layers.Dense(10))

# Add an output layer for the softmax predictions
model.add(tf.keras.layers.Softmax())

# Test this on a sample from the dataset
prediction = model(x_train[:1]).numpy()
print(prediction)


[[0.13596807 0.10466186 0.06278337 0.03360726 0.07694284 0.09004687
  0.10907193 0.16892087 0.1500976  0.06789929]]
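
The ten numbers printed above are the softmax of the final dense layer's scores, so they are positive and sum to 1. Since the model has not been trained yet, no class clearly dominates. Here is a minimal NumPy sketch of the softmax computation. The `scores` values are made up, and subtracting the maximum is a common numerical-stability trick that does not change the result:

```python
import numpy as np

def softmax(scores):
    # Subtract the row-wise max so np.exp never sees large positive values
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    # Normalize each row so it sums to 1
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical class scores for a single example with three classes
scores = np.array([[2.0, 1.0, 0.1]])
probs = softmax(scores)

print(probs)
print(probs.sum(axis=-1))
```

The largest score always maps to the largest probability, so taking the `argmax` of the scores or of the probabilities gives the same predicted class.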


### Compiling the model

Keras' Model class (from which Sequential inherits) provides a convenient `fit` function for training on data. First we must compile the model with an optimizer and a loss function. We'll use the Adam optimizer, a common go-to choice, and sparse categorical crossentropy, which works well for classification with integer class labels.

In [4]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
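
Sparse categorical crossentropy takes integer class labels (as in `y_train`) rather than one-hot vectors, and averages the negative log of the probability the model assigned to the correct class. Here is a minimal NumPy sketch with made-up probabilities. It illustrates the formula and is not Keras' actual implementation:

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    # Clip to avoid log(0); eps is an illustrative choice
    probs = np.clip(probs, eps, 1.0)
    # Pick the predicted probability of the true class for each example
    picked = probs[np.arange(len(labels)), labels]
    # Average the negative log-likelihood over the batch
    return -np.mean(np.log(picked))

# Hypothetical predictions for two examples and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])

loss = sparse_categorical_crossentropy(labels, probs)
print(loss)
```

The loss approaches 0 as the probability assigned to the correct class approaches 1, and grows without bound as that probability approaches 0.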

In [5]:
model.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1f117989ac0>

In [6]:
model.evaluate(x_test, y_test, verbose=2)

313/313 - 0s - loss: 0.0811 - accuracy: 0.9762 - 203ms/epoch - 647us/step


[0.08106172829866409, 0.9761999845504761]
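
The two values returned by `evaluate` are the loss and the accuracy, where accuracy is the fraction of test images whose highest-probability class matches the label. Here is a minimal sketch of that calculation on made-up predictions:

```python
import numpy as np

# Hypothetical predicted probabilities for four examples and two classes
probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4]])
labels = np.array([1, 0, 0, 0])

# The predicted class is the index of the largest probability in each row
predicted = np.argmax(probs, axis=1)

# Accuracy is the mean of the element-wise matches (3 of 4 here)
accuracy = np.mean(predicted == labels)
print(accuracy)  # 0.75
```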

### Custom Models

Next, let's look at the alternative way of creating and training a model: defining a custom class that inherits from Keras' Model class. This method is more involved than the previous one, but it allows for more customization. The ability to customize is often critical to ML research. Simply running a model on a new problem is usually not enough to constitute a novel contribution. Your contributions will likely come in the form of new model architectures, loss functions, optimization strategies, etc. All of these will be easier to implement if you understand how to build the model from scratch.

In [7]:
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()

        # Add a flatten layer to turn MNIST images into a vector
        self.flatten = tf.keras.layers.Flatten(input_shape=(28,28))

        # Add some dense layers with ReLU activations
        self.hidden_layer1 = tf.keras.layers.Dense(128, activation='relu')
        self.hidden_layer2 = tf.keras.layers.Dense(64, activation='relu')

        # Add a layer for the scores of each class. Note that we leave the activation linear.
        self.score = tf.keras.layers.Dense(10)

        # Add an output layer for the softmax predictions
        self.probability = tf.keras.layers.Softmax()

        # Create an optimizer and loss function for training
        self.optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
        self.loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

        # Parameters for minibatching
        self.batch_size = 64

        # Metrics
        self.loss_metric = tf.keras.metrics.Mean()
        self.accuracy_metric = tf.keras.metrics.Accuracy()
    
    def call(self, x):
        # Create a call function to pass input through each layer
        # IMPORTANT NOTE: Always implement this method as 'call', don't override the '__call__' method
        x = self.flatten(x)
        x = self.hidden_layer1(x)
        x = self.hidden_layer2(x)
        x = self.score(x)
        return self.probability(x)
    
    def train(self, x, y):
        N = y.shape[0]
        i = 0

        # Break up the data into minibatches
        while i < N:
            x_batch = x[i:min(i+self.batch_size, N)]
            y_batch = y[i:min(i+self.batch_size, N)]

            # Record the forward pass on a tape so gradients can be computed from it afterwards
            with tf.GradientTape() as tape:
                y_pred = self(x_batch)
                loss = self.loss_fn(y_batch, y_pred)
            
            # Get gradients and apply them to the model's parameters
            grads = tape.gradient(loss, self.trainable_weights)
            self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        
            self.loss_metric(loss)
            self.accuracy_metric(y_batch, np.argmax(y_pred.numpy(), axis=1))

            i += self.batch_size

        print("Average loss = {}, Accuracy = {}".format(self.loss_metric.result(), self.accuracy_metric.result()))
        self.loss_metric.reset_state()
        self.accuracy_metric.reset_state()
    
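    def evaluate(self, x, y):
        # Note: this overrides tf.keras.Model.evaluate with a simpler version
        # that runs the whole dataset through the model in a single pass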
        y_pred = self(x)
        
        self.loss_metric.reset_state()
        self.accuracy_metric.reset_state()
        
        loss = self.loss_fn(y, y_pred)
        
        self.loss_metric(loss)
        self.accuracy_metric(y, np.argmax(y_pred.numpy(), axis=1))
        
        print("loss = {}; accuracy = {}".format(self.loss_metric.result(), self.accuracy_metric.result()))
        
        return self.loss_metric.result(), self.accuracy_metric.result()
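
The `train` method above relies on `tf.GradientTape`, which records the operations executed inside the `with` block and can then differentiate a result with respect to the variables involved. Here is a minimal sketch on a one-variable function, separate from the model above, where the gradient is easy to check by hand:

```python
import tensorflow as tf

# A single trainable scalar, standing in for a model's weights
x = tf.Variable(3.0)

# Operations on x inside this block are recorded on the tape
with tf.GradientTape() as tape:
    y = x * x

# dy/dx = 2x, so the gradient at x = 3 is 6
grad = tape.gradient(y, x)
print(grad.numpy())  # 6.0
```

In `train`, the role of `y` is played by the minibatch loss and the role of `x` by `self.trainable_weights`, so `tape.gradient` returns one gradient tensor per weight tensor.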

In [8]:
my_model = MyModel()

In [9]:
for epoch in range(5):
    print("Epoch: ", epoch)
    my_model.train(x_train, y_train)

Epoch:  0
Average loss = 0.2980729639530182, Accuracy = 0.9144999980926514
Epoch:  1
Average loss = 0.12476835399866104, Accuracy = 0.9631666541099548
Epoch:  2
Average loss = 0.08305839449167252, Accuracy = 0.975516676902771
Epoch:  3
Average loss = 0.06055697798728943, Accuracy = 0.9821833372116089
Epoch:  4
Average loss = 0.04493153840303421, Accuracy = 0.9861166477203369


In [10]:
my_model.evaluate(x_test, y_test)

loss = 0.10411842912435532; accuracy = 0.9710000157356262


(<tf.Tensor: shape=(), dtype=float32, numpy=0.10411843>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.971>)

### Separating the Model and Trainer

I find that it's good practice to separate the code for the model from the code for the trainer. My preferred method of organization is to define a class for each model architecture and a corresponding function or class to train it. I'll give an example of this below while introducing a simple regression problem.

In [11]:
class RegressionModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        
        # Add some dense layers with ReLU activations
        self.hidden_layer1 = tf.keras.layers.Dense(128, activation='relu')
        self.hidden_layer2 = tf.keras.layers.Dense(64, activation='relu')

        # Add a layer for the output value
        self.value = tf.keras.layers.Dense(1)
    
    def call(self, x):
        x = self.hidden_layer1(x)
        x = self.hidden_layer2(x)
        return self.value(x)

class RegressionTrainer:
    def __init__(self, model):
        self.model = model
        
        # Create an optimizer and loss function for training
        # Note that we're using an MSE loss function, which is typical for regression
        self.optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
        self.loss_fn = tf.keras.losses.MeanSquaredError()

        # Parameters for minibatching
        self.batch_size = 64

        # Metrics
        self.loss_metric = tf.keras.metrics.Mean()
    
    def train(self, x, y):
        N = y.shape[0]
        i = 0

        # Break up the data into minibatches
        while i < N:
            x_batch = x[i:min(i+self.batch_size, N)]
            y_batch = y[i:min(i+self.batch_size, N)]

            # Record the forward pass on a tape so gradients can be computed from it afterwards
            with tf.GradientTape() as tape:
                y_pred = self.model(x_batch)
                loss = self.loss_fn(y_batch, y_pred)
            
            # Get gradients and apply them to the model's parameters
            grads = tape.gradient(loss, self.model.trainable_weights)
            self.optimizer.apply_gradients(zip(grads, self.model.trainable_weights))
        
            self.loss_metric(loss)

            i += self.batch_size

        print("Average loss = {}".format(self.loss_metric.result()))
        self.loss_metric.reset_state()
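
The `while` loop in `train` walks through the data in consecutive slices of `batch_size` rows, with a smaller final slice when the dataset size is not a multiple of the batch size. The same slicing can be sketched as a small generator. The helper name `iterate_minibatches` is hypothetical and not part of the notebook:

```python
import numpy as np

def iterate_minibatches(x, y, batch_size):
    """Yield consecutive (x_batch, y_batch) slices; the last one may be smaller."""
    for start in range(0, len(y), batch_size):
        # Slices that run past the end are clipped automatically,
        # so no explicit min(...) is needed here
        yield x[start:start + batch_size], y[start:start + batch_size]

# Ten made-up samples split into batches of four
x = np.arange(10).reshape(10, 1)
y = np.arange(10)

sizes = [len(y_batch) for _, y_batch in iterate_minibatches(x, y, batch_size=4)]
print(sizes)  # [4, 4, 2]
```

Neither this helper nor the loop in `train` shuffles the data between epochs. Shuffling is a common addition, and would be one more thing to experiment with.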
            

### Regression

Next, let's define a nonlinear function for the model to fit: a weighted sum of five Gaussian bumps with random centers `mu`, widths `sig`, and weights `a`.

In [12]:
np.random.seed(2)
mu = np.random.rand(5,2) * 2 - 1
sig = np.random.rand(5) * .5 + .25
a = np.random.rand(5) * 2 - 1

def f(x):
    return np.sum([a[k] * np.exp(-np.sum((x - mu[k])**2/sig[k]**2)) for k in range(5)])
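
As written, `f` evaluates one 2-D point at a time. Here is a hedged sketch of a batched variant, using the hypothetical name `f_batch`, that evaluates many points at once with broadcasting. The block repeats the definitions from the cell above so that it runs on its own, and uses a separate random generator for the test points so the global NumPy random state ends up the same as after the original cell:

```python
import numpy as np

# Same definitions as in the cell above
np.random.seed(2)
mu = np.random.rand(5, 2) * 2 - 1
sig = np.random.rand(5) * .5 + .25
a = np.random.rand(5) * 2 - 1

def f(x):
    return np.sum([a[k] * np.exp(-np.sum((x - mu[k])**2 / sig[k]**2)) for k in range(5)])

def f_batch(X):
    # X has shape (N, 2); diff has shape (N, 5, 2)
    diff = X[:, None, :] - mu[None, :, :]
    # Squared distance from every point to every center, shape (N, 5)
    sq_dist = np.sum(diff**2, axis=-1)
    # Weighted sum of the five Gaussian bumps, shape (N,)
    return np.exp(-sq_dist / sig**2) @ a

# Compare against the point-by-point version on a few random inputs
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(4, 2))
print(np.allclose(f_batch(X), [f(x) for x in X]))  # True
```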

We'll also go ahead and plot this function so we have an idea of what output we should expect.

In [23]:
import plotly.graph_objects as go

n = 41
x1 = np.linspace(-1, 1, n)
x2 = x1.copy()

y = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        y[i,j] = f(np.array([x1[i], x2[j]]))

fig = go.Figure(data=[go.Surface(z=y, x=x1, y=x2)])
fig.update_layout(title='f(x)', autosize=False,
                  width=500, height=500,
                  scene=dict(xaxis_title='x1', yaxis_title='x2', zaxis_title='y'),
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()

### Training the Regression Model

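Now we can train the model. The procedure mirrors the classification example: we sample 2,000 random inputs from the square [-1, 1) x [-1, 1), compute their targets with `f`, and run the trainer for 20 epochs.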

In [14]:
x_train = np.random.rand(2000, 2) * 2 - 1
y_train = np.array([f(x_train[i]) for i in range(x_train.shape[0])]).reshape(x_train.shape[0], 1)

reg_model = RegressionModel()
trainer = RegressionTrainer(reg_model)

In [16]:
for epoch in range(20):
    print("Epoch:", epoch+1)
    trainer.train(x_train, y_train)

Epoch: 1
Average loss = 0.09204930812120438
Epoch: 2
Average loss = 0.04397781938314438
Epoch: 3
Average loss = 0.019919872283935547
Epoch: 4
Average loss = 0.012504778802394867
Epoch: 5
Average loss = 0.010354815050959587
Epoch: 6
Average loss = 0.008975794538855553
Epoch: 7
Average loss = 0.007918802089989185
Epoch: 8
Average loss = 0.006937338039278984
Epoch: 9
Average loss = 0.005985785275697708
Epoch: 10
Average loss = 0.005088819190859795
Epoch: 11
Average loss = 0.004235622938722372
Epoch: 12
Average loss = 0.0034362527076154947
Epoch: 13
Average loss = 0.002716886345297098
Epoch: 14
Average loss = 0.0021094235125929117
Epoch: 15
Average loss = 0.0016297370893880725
Epoch: 16
Average loss = 0.0012719136429950595
Epoch: 17
Average loss = 0.001015661982819438
Epoch: 18
Average loss = 0.0008363397209905088
Epoch: 19
Average loss = 0.0007100030197761953
Epoch: 20
Average loss = 0.0006220313953235745


In [24]:
x_test = np.zeros((n**2, 2))
for i in range(n):
    for j in range(n):
        x_test[i*n + j] = x1[i], x2[j]

y_pred = reg_model(x_test).numpy().reshape(n,n)

fig = go.Figure(data=[go.Surface(z=y_pred, x=x1, y=x2)])
fig.update_layout(title='Model Output', autosize=False,
                  width=500, height=500,
                  scene=dict(xaxis_title='x1', yaxis_title='x2', zaxis_title='y'),
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()
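
The nested loop that fills `x_test` enumerates every pair `(x1[i], x2[j])` on the grid. The same array can be built with `np.meshgrid`, as in this minimal sketch that checks the two constructions against each other (`x_loop` and `x_grid` are names introduced here for illustration):

```python
import numpy as np

n = 41
x1 = np.linspace(-1, 1, n)
x2 = x1.copy()

# Loop version, as in the cell above
x_loop = np.zeros((n**2, 2))
for i in range(n):
    for j in range(n):
        x_loop[i*n + j] = x1[i], x2[j]

# indexing='ij' makes the first grid axis follow x1, matching the loop order
g1, g2 = np.meshgrid(x1, x2, indexing='ij')
x_grid = np.stack([g1.ravel(), g2.ravel()], axis=1)

print(np.array_equal(x_grid, x_loop))  # True
```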

## Conclusion

This covers the basics of supervised deep learning with TensorFlow. The remaining tutorials will primarily cover reinforcement learning. I recommend you try some supervised problems on your own to get a better understanding of how various hyperparameters affect the outcome. Some things to try changing: batch sizes, learning rates, optimizers, loss functions, network size, and model architecture.