![Toronto AI](https://i.imgur.com/diILtDP.png)

# Neural Networks

A trained neural net can be thought of as a learned mapping.

Here are some examples of mappings that a neural net could learn:

*   Mapping English to French
*   Mapping pictures to text descriptions
*   Mapping live sensor data from a reusable rocket to control commands that land it
*   Mapping random vectors into images of flowers

In essence, we use neural nets to map one distribution of data onto another.

Here's an example where I trained a neural net to map random vectors onto the space of flower photos, using a Generative Adversarial Network:

![](https://i.imgur.com/SaT9OEM.png)

# Tensors

* Tensors are multidimensional arrays.
* They are like boxes that hold our data: the inputs, the labels, and the weights of our model.
* Tensors are used extensively in TensorFlow to represent:
  * 0-D - scalars
  * 1-D - vectors, text
  * 2-D - matrices, tables of data
  * 3-D - batches of matrices, a cube of data, e.g. a color image (height x width x channels)
  * 4-D - batches of images, convolution kernels, a monochrome video
  * 5-D - color video
  * 6-D - 3D vector fields, batches of color video
  * 7-D - batches of 3D vector fields
  * 8-D - batches of layered 3D vector fields
  * 9-D - batches of layered 3D vector fields through time
  * keep going...

* The number of elements grows multiplicatively with every dimension you add, and GPU memory is expensive, so Tensors are most commonly 4-D or less.

* It helps to visualize a Tensor as a Rubik's Cube: each cell holds a piece of scalar data (like a weight, a piece of input data, or a label). For higher-dimensional Tensors, think of each cell as holding a Tensor instead of a scalar.
![](https://i.imgur.com/KyOQVX9.png)
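As a quick illustration, NumPy arrays follow the same idea as TensorFlow tensors: `ndim` is the number of dimensions and `size` the number of elements. The variable names below are just for illustration; the 4-D shape mirrors the `[BATCH_SIZE, DIM, DIM, 1]` image batches used later in this notebook.

```python
import numpy as np

# ndim is the tensor's rank; size shows how fast the element count grows.
scalar = np.array(3.0)                      # 0-D
vector = np.zeros(64)                       # 1-D
matrix = np.zeros((64, 64))                 # 2-D, e.g. one grayscale image
image_batch = np.zeros((64, 64, 64, 1))     # 4-D: batch, height, width, channels

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image_batch", image_batch)]:
    print(f"{name:12s} rank={t.ndim} shape={t.shape} elements={t.size}")
```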

```python
# This notebook uses the TensorFlow 1.x graph API (tf.layers, tf.Session, tf.variable_scope).
import tensorflow as tf
```


# Synthetic Dataset


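```python
%matplotlib inline

import math
import random

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation
from IPython.display import HTML

plt.rcParams['image.cmap'] = 'viridis'

# Seed both random number generators so the dataset is reproducible.
np.random.seed(20180118)
random.seed(20180118)

BATCH_SIZE = 64
DIM = 64                          # images are DIM x DIM pixels
TWO_PI = 2. * math.pi
MIN_CIRCLE_WIDTH = int(DIM / 8)   # smallest circle radius, in pixels
NUM_CIRCLE_DOTS = 500             # angular samples used to draw each ring
CIRCLE_WIDTH = 3                  # ring thickness, in pixels
DARKNESS = .5                     # amount subtracted from the noise along the ring


# We'll use this to create our data set.
def add_circles(data):
    """Draw one random circle into `data` and return its label [x, y, radius]."""
    radius = int((random.random() * (DIM / 2 - MIN_CIRCLE_WIDTH)) + MIN_CIRCLE_WIDTH)
    xpos = random.random() * DIM
    ypos = random.random() * DIM

    draw_circle(data, xpos, ypos, radius, DARKNESS, CIRCLE_WIDTH)

    # The labelled radius is the middle of the ring, which is CIRCLE_WIDTH pixels thick.
    return [xpos, ypos, radius + CIRCLE_WIDTH / 2]


def draw_circle(data, xpos, ypos, radius, darkness, width):
    """Darken (or, with a negative darkness, brighten) a ring of the given width.

    xpos indexes the first array axis (image rows), ypos the second (columns).
    A pixel can be hit by several of the NUM_CIRCLE_DOTS samples, so small rings end up darker.
    """
    for i in range(NUM_CIRCLE_DOTS):
        rad = TWO_PI * i / NUM_CIRCLE_DOTS
        for r in range(radius, radius + width):
            x = int(round(r * math.cos(rad) + xpos))
            y = int(round(-r * math.sin(rad) + ypos))
            if 0 <= x < DIM and 0 <= y < DIM:
                data[x, y] = data[x, y] - darkness


# Create random noise and draw a circle in it.
def create_dataset_row():
    data = np.random.random((DIM, DIM)).astype(np.float32)
    label = np.array(add_circles(data)).astype(np.float32)
    return (data, label)


def create_dataset(rows):
    """Create `rows` examples: an array of images and an array of [x, y, radius] labels."""
    labels = []
    samples = []
    for _ in range(rows):
        data, label = create_dataset_row()
        labels.append(label)
        samples.append(data)
    return (np.array(samples).astype(np.float32), np.array(labels).astype(np.float32))


# Create a small dataset and animate it so we can see what the samples look like.
data, labels = create_dataset(BATCH_SIZE)


def plot_dataset(data):
    fig, ax = plt.subplots()
    fig.set_size_inches(10, 10, True)
    # Create the image once and update its pixels each frame, with one color scale for all frames.
    image = ax.imshow(data[0], vmin=data.min(), vmax=data.max())

    def animate(i):
        image.set_data(data[i % len(data)])
        return (image,)

    anim = animation.FuncAnimation(fig, animate, frames=len(data), interval=700, blit=True)
    plt.close(fig)   # show only the animation, not a static copy of the figure
    return HTML(anim.to_jshtml())


plot_dataset(data)
```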

# Challenge

* We're searching for a signal in noisy image data.

* In this particular problem, we want to know the location (x, y) and the radius of the circle hidden in the image.

* It would be complex to write a rule-based program for this (using lots of `if` conditions), so a good choice here is to use a neural network that learns the mapping from image to circle parameters.

* Since we're using image data, we'll use a convolutional neural network.



![](https://i.imgur.com/o2qIsu4.png)

# Convolutional Layers


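* A convolutional layer learns a set of feature detectors, called filters. Each filter is applied at every position of the input, so it can detect its feature wherever that feature appears in the image.
* Because each filter's weights are shared across all positions, convolutional layers have *far* fewer parameters than a fully connected layer on the same input.
* This makes them particularly useful for image data.
* They are also useful in other areas of machine learning.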
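To put a number on "far fewer parameters", here is a back-of-the-envelope count for a layer like the first convolutional layer used later (7x7 kernels, 16 filters, one input channel, 64x64 image) versus a hypothetical fully connected layer that produces an output of the same size. The helper functions are illustrative, not TensorFlow APIs.

```python
def conv2d_params(kernel_size, in_channels, num_filters):
    # One k x k x in_channels weight window per filter, plus one bias per filter.
    return kernel_size * kernel_size * in_channels * num_filters + num_filters


def dense_params(num_inputs, num_outputs):
    # A full weight matrix plus one bias per output unit.
    return num_inputs * num_outputs + num_outputs


SIZE, IN_CH, N_FILTERS, K = 64, 1, 16, 7

conv = conv2d_params(K, IN_CH, N_FILTERS)
# A dense layer mapping the 64 x 64 x 1 input to a 64 x 64 x 16 output.
dense = dense_params(SIZE * SIZE * IN_CH, SIZE * SIZE * N_FILTERS)

print(f"conv2d: {conv:,} parameters")    # conv2d: 800 parameters
print(f"dense:  {dense:,} parameters")   # dense:  268,500,992 parameters
```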


> ### Side quest: Reversible Residual Network
> * ResNets use convolutional layers - they can be very deep and very effective, but they use a lot of memory.
> * A recent paper from the University of Toronto shows how to reduce the memory footprint, typically by at least an order of magnitude.
> * The Reversible Residual Network: Backpropagation Without Storing Activations
> * https://arxiv.org/abs/1707.04585 
> * *Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse*


### Filters (a.k.a. feature detectors)
* Each filter reads the input through a small window (7x7 pixels in the model below), and that window is slid across the entire image.
* At each position, the filter's weights are multiplied with the values under the window, across all input channels, and summed (plus a bias); the result is written to that filter's output channel.
* A convolutional layer with many filters therefore has one output channel per filter, holding that filter's results.
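The sliding-window description above can be sketched in NumPy for a single input channel and a single filter. This illustrative loop is far slower than `tf.layers.conv2d`, and `conv2d_single_channel` is a hypothetical helper; a real layer repeats this for every filter and sums over all input channels. Like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np


def conv2d_single_channel(image, kernel):
    """Slide `kernel` over `image` with 'same' zero padding and stride 1 (no bias)."""
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")
    out = np.zeros_like(image, dtype=float)
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            window = padded[row:row + kh, col:col + kw]   # what the filter "sees" here
            out[row, col] = np.sum(window * kernel)       # same weights at every position
    return out


# A 3x3 vertical-edge detector applied to an image whose right two columns are bright.
image = np.zeros((5, 5))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
edges = conv2d_single_channel(image, kernel)
print(edges)
```

The filter responds positively just left of the dark-to-bright edge and negatively at the right border, where the zero padding creates a bright-to-dark edge.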

### Activation functions
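* Stacking layers only adds useful 'depth' (i.e. the depth in deep learning) if we apply a non-linearity to each layer's output, called an activation function. Without one, a stack of linear layers is equivalent to a single linear layer.
* We'll use a Leaky ReLU: it passes positive values through unchanged and scales negative values by a small slope, so it is nonlinear but simple and cheap to compute.

![Leaky ReLU](https://i.imgur.com/KxYFRIL.png)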
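A small NumPy sketch of both points: a leaky ReLU (the 0.2 negative slope matches the default `alpha` of `tf.nn.leaky_relu`, but is otherwise an arbitrary choice), and a check that two linear layers with no activation between them are equivalent to one linear layer with the product weight matrix.

```python
import numpy as np


def leaky_relu(x, alpha=0.2):
    # Pass positive values through; scale negative values by a small slope.
    return np.where(x > 0, x, alpha * x)


print(leaky_relu(np.array([-2.0, -0.5, 0.0, 1.5])))

# Without a non-linearity, two stacked linear layers collapse into one.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x          # a single equivalent weight matrix
print(np.allclose(two_layers, one_layer))   # True
```

Inserting `leaky_relu` between `W1` and `W2` breaks this equivalence in general, which is what lets deeper stacks represent more than a single layer can.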

Further reading on activation functions:
* https://arxiv.org/abs/1505.00853 - Empirical Evaluation of Rectified Activations in Convolutional Network
* https://arxiv.org/abs/1709.06247 - Training Better CNNs Requires to Rethink ReLU

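```python
# Let's create a function that adds a convolutional layer to our graph.
# The default activation is a Leaky ReLU, as described above; pass activation=None for a linear layer.
def Convolution(layer, name, num_filters, size=3, activation=tf.nn.leaky_relu):

    # AUTO_REUSE returns the existing variables if a layer with this name was already created.
    with tf.variable_scope("detector", reuse=tf.AUTO_REUSE):

        layer = tf.layers.conv2d(
            name=name,
            inputs=layer,
            kernel_size=size,
            filters=num_filters,
            dilation_rate=1,
            padding='same',          # zero-pad so the output keeps the input's height and width
            activation=activation
        )
        return layer
```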


# Pooling

```python
# 2x2 pooling with stride 2 halves the height and width of each feature map.
def AvgPooling(layer):
    return tf.layers.average_pooling2d(layer, pool_size=(2, 2), strides=(2, 2))

def MaxPooling(layer):
    return tf.layers.max_pooling2d(layer, pool_size=(2, 2), strides=(2, 2))
```
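Pooling layers shrink each feature map by summarizing small blocks of it. The NumPy sketch below mimics what `pool_size=(2,2), strides=(2,2)` does on a single 2-D channel; `pool_2x2` is a hypothetical helper. (Note that `MaxPooling` is commented out in the model below, so the convolutional layers work at the full 64x64 resolution.)

```python
import numpy as np


def pool_2x2(image, reducer=np.max):
    # Group the image into non-overlapping 2x2 blocks (stride 2) and reduce each block.
    h, w = image.shape
    blocks = image[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return reducer(blocks, axis=(1, 3))


image = np.arange(16, dtype=float).reshape(4, 4)
print(pool_2x2(image, np.max).tolist())    # [[5.0, 7.0], [13.0, 15.0]]
print(pool_2x2(image, np.mean).tolist())   # [[2.5, 4.5], [10.5, 12.5]]
```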

# Input pipeline

* Here we create an input pipeline for our data with `tf.data`.
* The pipeline generates fresh synthetic examples on the fly from an endless Python generator, so each training batch contains new images.


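```python
import tensorflow as tf

tf.reset_default_graph()


# An endless generator of (image, label) pairs.
def row_generator():
    while True:
        yield create_dataset_row()


def create_input_pipeline():

    dataset = tf.data.Dataset.from_generator(
        row_generator,
        output_types=(tf.float32, tf.float32),
        # Declaring shapes lets TensorFlow check them: a DIM x DIM image and an [x, y, radius] label.
        output_shapes=(tf.TensorShape([DIM, DIM]), tf.TensorShape([3])),
    )
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.repeat()   # not strictly needed, since the generator never ends

    pipeline_iterator = dataset.make_initializable_iterator()

    return pipeline_iterator
```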
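Conceptually, `from_generator` followed by `batch` does something like the plain-Python sketch below: pull examples from an endless generator and stack them into batches. This is only an analogy (`tf.data` runs inside the TensorFlow runtime and can prefetch); `make_example`, `endless_examples`, and `batches` are hypothetical names, and the small 8x8 images just keep the output short.

```python
import itertools

import numpy as np


def make_example(dim=8):
    # Hypothetical stand-in for create_dataset_row(): a noise image and a 3-value label.
    return np.random.random((dim, dim)).astype(np.float32), np.zeros(3, np.float32)


def endless_examples():
    # An infinite generator, like row_generator() above.
    while True:
        yield make_example()


def batches(examples, batch_size):
    # Group consecutive examples into stacked arrays, like Dataset.batch().
    while True:
        chunk = list(itertools.islice(examples, batch_size))
        samples, labels = zip(*chunk)
        yield np.stack(samples), np.stack(labels)


stream = batches(endless_examples(), batch_size=4)
batch_samples, batch_labels = next(stream)
print(batch_samples.shape, batch_labels.shape)   # (4, 8, 8) (4, 3)
```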

# Creating our graph

## Subgraph: The convolutional layers

* We stack our convolutional layers so that each layer detects features in the output of the layer below it.
* Lower layers learn low-level features (such as edges), and higher layers combine them into higher-level features.


```python
def convolutional_layers(batch):

    # tf.layers.conv2d expects [batch, height, width, channels], so add a channel dimension of 1.
    layer = tf.reshape(batch, [BATCH_SIZE, DIM, DIM, 1])

    with tf.variable_scope("convolutional"):

        F = 16   # filters per layer
        KS = 7   # kernel size (7x7)

        # layer = MaxPooling(layer)   # uncomment to work at 32x32 instead of 64x64 pixels
        layer = Convolution(layer, "A1", num_filters=F, size=KS, activation=None)
        layer = Convolution(layer, "A2", num_filters=F, size=KS, activation=None)
        layer = Convolution(layer, "A3", num_filters=F, size=KS)
        layer = Convolution(layer, "A4", num_filters=F, size=KS)
        layer = Convolution(layer, "A5", num_filters=F, size=KS)
        layer = Convolution(layer, "A6", num_filters=F, size=KS, activation=None)
        layer = Convolution(layer, "A7", num_filters=F, size=KS, activation=None)
        layer = Convolution(layer, "A8", num_filters=F, size=KS)

        return layer
```
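Stacking is what lets later layers "see" more of the image. For stride-1 convolutions with no pooling, as in the stack above, each output pixel of the last layer depends on a square patch of the input whose width grows by `kernel_size - 1` per layer (padding does not change this). A hypothetical helper, not a TensorFlow function, makes the arithmetic explicit:

```python
def receptive_field(num_layers, kernel_size, dilation=1):
    # With stride 1, each k x k layer widens the receptive field by (k - 1) * dilation pixels.
    return 1 + num_layers * (kernel_size - 1) * dilation


print(receptive_field(num_layers=8, kernel_size=7))   # 49
```

So each output of `A8` depends on a 49x49 patch of the input image. That is large, but the biggest circles in this dataset (radius up to 31 pixels, plus the 3-pixel ring) are wider than that patch.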

## Subgraph: The fully connected dense layers

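* These layers convert the tensor output by the convolutional layers into a prediction.
* In our case, we want 3 outputs:
  * The X and Y coordinates of the center of the circle
  * The radius of the circle

* We flatten the feature maps into one vector per example, divide that vector into thirds, and attach one subnet of fully connected layers to each output (the Y coordinate uses two subnets whose outputs are averaged). Because the flattening is row-major, each third holds all the channels for roughly a third of the image rows (a horizontal band), not a third of the channels.
* Each subnet ends in 16 neurons, and we ask the network to make the sum of those neurons equal to the corresponding prediction (x, y, or radius).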
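A quick NumPy check of which image rows land in each third of the flattened vector, assuming the same 64x64x16 shape as the convolutional output (`tf.reshape`, like NumPy's default, uses row-major order, so the channel index changes fastest):

```python
import numpy as np

BATCH, H, W, C = 1, 64, 64, 16

# Fill every position with its row index so we can see where values end up after flattening.
rows = np.broadcast_to(np.arange(H).reshape(1, H, 1, 1), (BATCH, H, W, C))
flat = rows.reshape(BATCH, -1)               # same order as tf.reshape(layer, [BATCH_SIZE, -1])

a_third = flat.shape[1] // 3                 # 65536 // 3 == 21845
for i in range(3):
    part = flat[0, i * a_third:(i + 1) * a_third]
    print(f"slice {i}: image rows {part.min()} to {part.max()}")
```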


```python
def dense_layers(layer):

    # Make it a table, one row for each example in the batch.
    layer = tf.reshape(layer, [BATCH_SIZE, -1])

    x_guess = add_subnet(layer, 0)   # a subnet for predicting the X coordinate,
    y_guess = add_subnet(layer, 1)   # a subnet for the Y coordinate,
    r_guess = add_subnet(layer, 2)   # and a subnet for predicting the radius.

    # A second, independently initialized Y subnet on the same slice; the two Y guesses are averaged.
    y_guess2 = add_subnet(layer, 1)

    return x_guess, (y_guess + y_guess2) / 2., r_guess


def add_subnet(layer, i):
    FEATURE_AXIS = 1
    a_third = int(layer.shape[FEATURE_AXIS].value / 3)   # .value converts a TF 1.x Dimension to int

    # Take the i-th third of the flattened features for every example in the batch.
    layer = tf.slice(layer, [0, i * a_third], [BATCH_SIZE, a_third])

    neurons = 16

    # Four fully connected layers of 16 neurons each.
    for _ in range(4):
        layer = tf.layers.dense(layer, neurons, activation=tf.nn.leaky_relu)

    # Return the sum of the last layer as our prediction.
    return tf.reduce_sum(layer, axis=[1])
```



# Objective function (a.k.a. the loss function)

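* The objective function is the quantity the training process tries to minimize.
* The most important thing to remember is that the loss function needs to be differentiable, with its minimum value at your goal.
* Convex functions are easy to minimize, because any local minimum is also a global minimum. A neural network's loss, viewed as a function of its weights, is generally not convex.

### Common objectives
* Minimizing the squared difference between prediction and target (averaged over examples, this is the mean squared error). We use a squared-error loss for x, y, and radius.

* Minimizing the log loss (cross-entropy), which is useful in classification tasks where the model outputs probabilities.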
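Both objectives are short enough to write directly. A NumPy sketch with illustrative function names; the log loss shown is the binary form, and the small `eps` clip avoids taking `log(0)`:

```python
import numpy as np


def mean_squared_error(predicted, actual):
    # Average of the squared differences; suited to regression targets like x, y, radius.
    return np.mean((np.asarray(predicted) - np.asarray(actual)) ** 2)


def log_loss(probabilities, labels, eps=1e-12):
    # Binary cross-entropy: confident wrong probabilities are penalized heavily.
    p = np.clip(np.asarray(probabilities, dtype=float), eps, 1 - eps)
    y = np.asarray(labels, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


print(mean_squared_error([10.0, 20.0], [12.0, 20.0]))   # 2.0
print(log_loss([0.9], [1]))                             # about 0.105
print(log_loss([0.1], [1]))                             # about 2.303
```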


```python
def loss_function(x, y, r, labels):

    # Each label row is [x, y, radius].
    actual_x, actual_y, actual_r = labels[:, 0], labels[:, 1], labels[:, 2]

    # Squared error of each output, for every example in the batch.
    x_error = tf.square(x - actual_x)
    y_error = tf.square(y - actual_y)
    r_error = tf.square(r - actual_r)

    # Per-example loss: the sum of the three squared errors.
    loss = x_error + y_error + r_error

    # Log the root-mean-squared error of each output (in pixels) to TensorBoard.
    tf.summary.scalar("x_error",      tf.sqrt(tf.reduce_mean(x_error)))
    tf.summary.scalar("y_error",      tf.sqrt(tf.reduce_mean(y_error)))
    tf.summary.scalar("radius_error", tf.sqrt(tf.reduce_mean(r_error)))

    return loss
```
    

# Optimizer

There are many choices for optimizers.
For most applications, the Adam optimizer will give you good flexibility and fast training.

## Adam

* The Adam optimizer is a gradient descent optimization algorithm that adds two things:
  * First, it gives each weight of your model momentum: a decaying average of its recent gradients, which helps it keep moving downhill.
  * Second, it divides each weight's step by the square root of a decaying average of its squared gradients, so weights whose gradients are large or oscillating take smaller steps.

* Both effects use exponentially decaying averages. The optimizer's parameters are:
  * alpha - The learning rate. Typical values are 0.0003 to 0.000003.
  * beta1 - The decay rate of the momentum term. Typical values are 0.5 to 0.9.
  * beta2 - The decay rate of the squared-gradient (variance) term. Typical values are 0.9 to 0.999.
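To connect the description above to the actual update rule, here is a single Adam step written out in NumPy, following the update in the Adam paper listed under further reading; TensorFlow's `AdamOptimizer` computes the same thing up to minor details such as where epsilon enters. The toy problem (minimizing x**2) and the learning rate of 0.1 are arbitrary choices for illustration.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters `theta` at step t >= 1."""
    m = beta1 * m + (1 - beta1) * grad          # decaying average of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2     # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction: m and v start at zero
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Minimize f(x) = x**2, whose gradient is 2x, starting from x = 3.
x, m, v = np.array(3.0), 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t, alpha=0.1)
    if t == 1:
        print(f"after step 1: x = {float(x):.4f}")   # after step 1: x = 2.9000
print(f"after step 500: x = {float(x):.4f}")
```

The first step moves x by almost exactly alpha, regardless of the gradient's size, because the bias-corrected ratio `m_hat / sqrt(v_hat)` starts at +/-1.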



Further reading: 
* https://arxiv.org/abs/1412.6980 Adam: A Method for Stochastic Optimization
* http://ruder.io/optimizing-gradient-descent/ An overview of gradient descent optimization algorithms


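```python
ALPHA = 0.0001   # learning rate
BETA1 = 0.5      # decay rate of the momentum term
BETA2 = 0.9      # decay rate of the squared-gradient term


def add_optimizer(loss):

    opt = tf.train.AdamOptimizer(learning_rate=ALPHA, beta1=BETA1, beta2=BETA2)
    # `loss` is a per-example vector; minimize() differentiates the sum of its elements.
    train_op = opt.minimize(loss=loss)
    return train_op
```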
    

# Connecting the model pieces together


```python
def create_model(samples, labels):

    # Images -> convolutional feature maps -> dense subnets -> (x, y, radius) predictions.
    x, y, r = dense_layers(convolutional_layers(samples))

    loss = loss_function(x, y, r, labels)

    training_op = add_optimizer(loss)

    return {
        "training_op"   : training_op,
        "loss"          : tf.reduce_sum(loss),   # total loss over the batch, for display
        "samples"       : samples,
        "prediction"    : (x, y, r),
        "actual"        : labels
    }
```

# Preparing a Training run

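```python
tf.reset_default_graph()

LOG_DIR = "/home/titan/dev/ai/circles/"   # TensorBoard log directory; change this for your machine
UPDATE_TENSORBOARD_PERIOD = 1             # log every N steps

# Kept as a global so the most recent batch survives if training is interrupted.
latest_results = None


def run_session(session, run_name, num_iterations):
    global latest_results

    # Builds the whole graph on the CPU; remove this line to let TensorFlow use a GPU.
    with tf.device("/device:CPU:0"):

        # Write TensorBoard summaries (and the graph) for this run.
        fw = tf.summary.FileWriter(LOG_DIR + run_name, graph=session.graph, flush_secs=2)

        # Build the input pipeline and get tensors for the next batch.
        pipeline = create_input_pipeline()
        sample_batch, labels_batch = pipeline.get_next()

        # Build the model, loss and optimizer on top of the pipeline.
        model = create_model(sample_batch, labels_batch)

        # Merge the summaries once, instead of adding a new op to the graph on every step.
        summary_op = tf.summary.merge_all()

        # Initialize the model weights and the optimizer's internal variables.
        session.run(tf.global_variables_initializer())

        # Initialize the input pipeline.
        session.run(pipeline.initializer)

        # Training loop: each session.run pulls exactly one new batch from the pipeline.
        for step in range(num_iterations):

            latest_results = session.run({
                "loss"       : model["loss"],
                "train_op"   : model["training_op"],
                "samples"    : model["samples"],
                "prediction" : model["prediction"],
                "actual"     : model["actual"],
                "summary"    : summary_op
            })

            if step % UPDATE_TENSORBOARD_PERIOD == 0:
                print("Total loss: %d." % latest_results["loss"])
                fw.add_summary(latest_results["summary"], step)

        fw.close()
        return latest_results
```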

        


# Start Training!

```python
tf.reset_default_graph()
session = tf.Session()

with session:

    latest_results = run_session(session, "400steps", 400)

print("Complete!")
```

```
Total loss: 203288.
Total loss: 189624.
Total loss: 186396.
Total loss: 177538.
Total loss: 169010.
Total loss: 128073.
Total loss: 104120.
Total loss: 79681.
Total loss: 64017.
Total loss: 60834.
Total loss: 53217.
Total loss: 41532.
Total loss: 42893.
Total loss: 44863.
Total loss: 42378.
Total loss: 44333.
Total loss: 40954.
Total loss: 42787.
Total loss: 45832.
Total loss: 45099.
Total loss: 33943.
Total loss: 39191.
Total loss: 36064.
Total loss: 34531.
Total loss: 30548.
Total loss: 30132.
Total loss: 36199.
Total loss: 35298.
Total loss: 33676.
Total loss: 31389.
Total loss: 30452.
Total loss: 22600.
Total loss: 24012.
Total loss: 29157.
Total loss: 24635.
Total loss: 28920.
Total loss: 26025.
Total loss: 24754.
Total loss: 26943.
Total loss: 22782.
Total loss: 28364.
Total loss: 24164.
Total loss: 24089.
Total loss: 23089.
Total loss: 26176.
Total loss: 24322.
Total loss: 29254.
Total loss: 23143.
Total loss: 28662.
Total loss: 18422.
Total loss: 24220.
Total loss: 25306.
Total

KeyboardInterrupt:
```

# Visualizing the Results!

```python
sample_batch = np.asarray(latest_results["samples"])
prediction   = latest_results["prediction"]            # tuple of (x, y, radius) arrays
actual       = latest_results["actual"].transpose()    # shape [3, BATCH_SIZE]

# Extract and round to whole pixels for drawing.
xp, yp, rp = [np.rint(p).astype(np.int32) for p in prediction]
xa, ya, ra = [np.rint(a).astype(np.int32) for a in actual]

result_display = []

# Overlay the actual circle (darker) and the prediction (brighter) on each sample.
# The labels give the middle of the ring, so start drawing CIRCLE_WIDTH // 2 pixels inside it.
for i, sample in enumerate(sample_batch):
    draw_circle(sample, xa[i], ya[i], ra[i] - CIRCLE_WIDTH // 2,  DARKNESS, CIRCLE_WIDTH)
    draw_circle(sample, xp[i], yp[i], rp[i] - CIRCLE_WIDTH // 2, -DARKNESS, CIRCLE_WIDTH)
    result_display.append(sample)

plot_dataset(np.array(result_display))
```



```python
# Sanity check: pull a single batch from the input pipeline and inspect its shapes.
tf.reset_default_graph()

with tf.Session() as sess:
    pipeline_iterator = create_input_pipeline()
    next_batch = pipeline_iterator.get_next()

    sess.run(pipeline_iterator.initializer)
    samples, labels = sess.run(next_batch)
    print(samples.shape, labels.shape)   # (64, 64, 64) (64, 3)
```

# Thank you!

That was a lot of info!
Thank you for your time and hopefully this will help you on your path.

For your troubles, here's a flower:
![](https://i.imgur.com/i6wAoY9.png)
<p style="text-align:center">AI generated flower, orchestration by Dave MacDonald</p>

# Connect with the Toronto AI community

<img src="https://cdn.worldvectorlogo.com/logos/slack-1.svg" style="display:inline;width: 30px;padding-right:1em;"/> Come join us on our Slack Channel!  (Here is the <a href="https://join.slack.com/t/toronto-ai/shared_invite/enQtMjE5NTM5MzY3NTU0LTQ0ZDIyM2ZlZDYwMmRjY2I2NTEyMjZjYzJkNzljZTI1ZWRiMDkzYjUyZjRkMTc5ZDM0OGJmZjdmNzM5NDM5Zjk">Invite link</a> if you haven't signed up yet)

<img src="https://cdn.worldvectorlogo.com/logos/meetup.svg" style="display:inline;width: 30px;padding-right:1em;"/> Come to our next event - join our [Meetup Group](https://www.meetup.com/Toronto-AI/)

<img src="https://i.imgur.com/KGSBbUe.png" style="display:inline;width:30px;padding-right:1em;"/> NEO tips <span style='font-size:.75em'>AdLG9AyRtCMSeAy98rmkkos7uFU6i7fLgd</span>


Toronto AI website: [http://torontoai.org/](http://torontoai.org/)<br>
We're also on Facebook and Twitter
<br><br>
Thank you, see you soon!


## Additional Resources

* [Deep Learning Book](http://www.deeplearningbook.org/)