# Neural networks in GraphLab

An example of building a neural network in GraphLab for image recognition. 

In [2]:
import graphlab as gl

In [4]:
#Load training and testing data
data = gl.SFrame('https://static.turi.com/datasets/mnist/sframe/train')
test_data = gl.SFrame('https://static.turi.com/datasets/mnist/sframe/test')

In [5]:
#Peel off a validation set from the training data
training_data, validation_data = data.random_split(0.8)
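
The split above assigns roughly 80% of the rows to the training set and the rest to the validation set. As a rough illustration of the idea (not GraphLab's implementation), the sketch below does a per-row random split on a plain Python list. The helper name `random_split_rows` and its `seed` parameter are hypothetical and exist only for this example.

```python
import random


def random_split_rows(rows, fraction, seed=None):
    """Send each row to the first list with probability `fraction`.

    Hypothetical helper for illustration; the resulting sizes are only
    approximately fraction / (1 - fraction) of the input.
    """
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


rows = list(range(1000))  # stand-in for the rows of a data table
train_rows, validation_rows = random_split_rows(rows, 0.8, seed=0)
```

Because every row is assigned independently, the two parts never overlap and together contain every row exactly once.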

In [19]:
gl.canvas.set_target('ipynb')
training_data['image'].show()

In [21]:
# Make sure all of the images are the same size, since neural nets have a fixed input size.
training_data['image'] = gl.image_analysis.resize(training_data['image'], 28, 28, 1, decode=True)
validation_data['image'] = gl.image_analysis.resize(validation_data['image'], 28, 28, 1, decode=True)
test_data['image'] = gl.image_analysis.resize(test_data['image'], 28, 28, 1, decode=True)

1. Create an instance of the neural net for the given data

Specify the type of network to create in `network_type`. The default, `'auto'`, creates a ConvolutionNet for image input and a MultiLayerPerceptrons for regular numerical input.

In [35]:
#Create neural net
net = gl.deeplearning.create(training_data, network_type='auto', target='label')

In [23]:
#Initial layer setup
net.layers

layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 100
layer[4]: RectifiedLinearLayer
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 10
layer[7]: SoftmaxLayer

### Convolutional neural network (CNN)

Note that layer 0 is a convolutional layer, which makes this network a _convolutional network_. The name “convolutional neural network” indicates that the network uses convolution in place of general matrix multiplication in at least one of its layers. CNNs are most commonly applied to visual imagery, but they are also used in other areas such as natural language processing.
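
To make the convolution step concrete, here is a minimal pure-Python sketch of a strided, unpadded 2D convolution on a single-channel image, plus the usual output-size formula. It is an illustration only, not GraphLab's implementation: the helper names are hypothetical, and the floor-based size formula is an assumption (frameworks differ in how they round). The values 28, 3 and 2 come from the image size and the `kernel_size` and `stride` settings in the layer listing above.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size under the common floor convention (an assumption)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv2d(image, kernel, stride=1):
    """Unpadded 2D convolution (cross-correlation form) of one channel."""
    k = len(kernel)
    out_h = conv_output_size(len(image), k, stride)
    out_w = conv_output_size(len(image[0]), k, stride)
    return [
        [
            sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 5x5 toy image and a 3x3 averaging kernel.
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernel = [[1.0 / 9] * 3 for _ in range(3)]
feature_map = conv2d(image, kernel, stride=2)  # 2x2 feature map

# Sizes for a 28x28 input with kernel_size=3, stride=2, padding=0.
conv_size = conv_output_size(28, 3, 2)         # 13
pool_size = conv_output_size(conv_size, 3, 2)  # 6
```

Under these assumptions, each of the 10 channels would be 13x13 after the convolution and 6x6 after max pooling, before the flatten layer turns them into a single vector.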

### Learning rate

Training searches for weights that minimize the loss by updating them repeatedly. The amount of change applied to the model at each step of this search, the step size, is called the _learning rate_. It is a positive scalar (usually in (0, 1)) that controls how much the weights are updated in each training iteration.
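
The effect of the step size can be seen in a small sketch of plain gradient descent on a one-dimensional toy loss, f(w) = (w - 3)^2. The function, starting point and number of steps are assumptions chosen for illustration. Only the value 0.001 corresponds to the default `learning_rate` shown in `net.params`.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeatedly move w against the gradient, scaled by the learning rate."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)


w_small = gradient_descent(grad, 0.0, learning_rate=0.001, steps=100)
w_large = gradient_descent(grad, 0.0, learning_rate=0.1, steps=100)
```

After the same 100 steps, the run with the larger learning rate has essentially reached the minimum at 3, while the run with 0.001 has covered only a small part of the distance. On real networks a learning rate that is too large can also overshoot and diverge, which this convex toy problem does not show.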

### Momentum

An exponentially decaying weighted average of the prior updates to the weight can be included when the weights are updated. This change to stochastic gradient descent is called _momentum_ and adds inertia to the update procedure, causing many past updates in one direction to continue in that direction in the future.
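
A minimal sketch of this idea, using the classical momentum update on the toy loss f(w) = (w - 3)^2, is shown below. The exact update rule GraphLab applies is not stated here, so treat this form as an assumption. The values 0.001 and 0.9 match the defaults reported by `net.params`.

```python
def sgd_momentum(grad, w0, learning_rate, momentum, steps):
    """Gradient descent with a velocity term (classical momentum)."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # The velocity is a decaying sum of past updates plus the new step.
        velocity = momentum * velocity - learning_rate * grad(w)
        w = w + velocity
    return w


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)


w_plain = sgd_momentum(grad, 0.0, learning_rate=0.001, momentum=0.0, steps=100)
w_momentum = sgd_momentum(grad, 0.0, learning_rate=0.001, momentum=0.9, steps=100)
```

With the same learning rate and number of steps, the run with momentum ends much closer to the minimum, because consecutive updates in the same direction accumulate in the velocity.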

In [24]:
#neural net parameters
net.params

{'learning_rate': 0.001, 'momentum': 0.9}

In [25]:
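#If you need to change hyperparameters
#Note: in the listing above, layer[4] is the RectifiedLinearLayer; the layers that list num_hidden_units are the FullConnectionLayers (layer[3] and layer[6]).
net.layers[4].num_hidden_units = 10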

2. Build classifier on the neural network

In [26]:
# Train a NeuralNetClassifier using the specified network.
clsf = gl.neuralnet_classifier.create(training_data, target='label', network=net,
                                      validation_set=validation_data,
                                      metric=['accuracy', 'recall@2'],
                                      max_iterations=3)

Using network:

### network layers ###
layer[0]: ConvolutionLayer
  init_random = gaussian
  padding = 0
  stride = 2
  num_channels = 10
  num_groups = 1
  kernel_size = 3
layer[1]: MaxPoolingLayer
  padding = 0
  stride = 2
  kernel_size = 3
layer[2]: FlattenLayer
layer[3]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 100
layer[4]: RectifiedLinearLayer
  num_hidden_units = 10
layer[5]: DropoutLayer
  threshold = 0.5
layer[6]: FullConnectionLayer
  init_sigma = 0.01
  init_random = gaussian
  init_bias = 0
  num_hidden_units = 10
layer[7]: SoftmaxLayer
### end network layers ###

### network parameters ###
learning_rate = 0.001
metric = accuracy,recall@2
momentum = 0.9
### end network parameters ###



3. Classify the test data

In [29]:
y_pred = clsf.classify(test_data)

In [30]:
y_pred

row_id,class,probability
0,0,0.998546123505
1,0,0.999997854233
2,0,0.999761760235
3,0,0.999990224838
4,0,0.999931454659
5,0,0.998657941818
6,0,0.999450981617
7,0,0.999890685081
8,0,0.981650710106
9,0,0.999965548515


In [32]:
#Predict the top 2 classes for each row
y_pred_top2 = clsf.predict_topk(test_data, k=2)
y_pred_top2

row_id,class,probability
0,0,0.998546123505
0,6,0.00126831082162
1,0,0.999997854233
1,5,9.31468491672e-07
2,0,0.999761760235
2,6,0.000135054666316
3,0,0.999990224838
3,6,2.98468717119e-06
4,0,0.999931454659
4,2,3.03510823869e-05
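
The probabilities in these tables come from the final softmax layer, and the top-k output keeps the k most probable classes per row. The sketch below illustrates both steps in plain Python. The raw scores are made up for the example and the helper names are hypothetical; this is not how GraphLab computes its predictions internally.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, k):
    """Return the k (class, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical raw scores for the ten digit classes 0..9 of one image.
scores = [9.0, 0.5, 1.0, 0.2, 0.1, 0.3, 2.5, 0.0, 0.4, 0.6]
probabilities = softmax(scores)
best_two = top_k(probabilities, 2)  # classes 0 and 6, in that order
```

Taking `k=1` corresponds to the single `class` and `probability` reported per row by `classify`.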


4. Evaluate the classifier on the test data. The default metrics are accuracy and the confusion matrix.

In [34]:
metrics = clsf.evaluate(test_data)
metrics

{'accuracy': 0.9714999794960022, 'confusion_matrix': Columns:
 	target_label	int
 	predicted_label	int
 	count	int
 
 Rows: 70
 
 Data:
 +--------------+-----------------+-------+
 | target_label | predicted_label | count |
 +--------------+-----------------+-------+
 |      0       |        0        |  960  |
 |      2       |        0        |   2   |
 |      5       |        0        |   1   |
 |      6       |        0        |   2   |
 |      7       |        0        |   1   |
 |      8       |        0        |   1   |
 |      9       |        0        |   6   |
 |      1       |        1        |  1122 |
 |      4       |        1        |   1   |
 |      6       |        1        |   2   |
 +--------------+-----------------+-------+
 [70 rows x 3 columns]
 Note: Only the head of the SFrame is printed.
 You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.}
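
Both default metrics can be derived from pairs of true and predicted labels. The sketch below shows one way to compute them in plain Python. The labels are a tiny made-up example, not the MNIST results above, and `evaluate_labels` is a hypothetical helper rather than part of GraphLab.

```python
from collections import Counter


def evaluate_labels(targets, predictions):
    """Compute accuracy and (target, predicted) counts for two label lists."""
    counts = Counter(zip(targets, predictions))
    correct = sum(n for (t, p), n in counts.items() if t == p)
    return {
        "accuracy": correct / float(len(targets)),
        "confusion_matrix": counts,
    }


# Made-up labels for eight examples.
targets = [0, 0, 1, 1, 2, 2, 2, 9]
predictions = [0, 0, 1, 1, 2, 0, 2, 0]
result = evaluate_labels(targets, predictions)
```

Each key of `confusion_matrix` plays the role of one row in the table above: a `(target_label, predicted_label)` pair together with its count. Pairs with equal labels are correct predictions, so accuracy is their total count divided by the number of examples.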