# The Mathematical Engineering of Deep Learning

## Practical 5 (Julia version)
**For an R or Python version see the [course website](https://deeplearningmath.org/)**.

In this practical we work with a reduced version of the CIFAR10 dataset and train convolutional neural networks. We use a small subset of the dataset so that everything can be computed in sensible time within the tutorial.

The main focus is to see how to "assemble" different convolutional layers.

### CIFAR10
See [CIFAR10 website](https://www.cs.toronto.edu/~kriz/cifar.html)

In [1]:
using MLDatasets
CIFAR10Fulltrain_x, CIFAR10Fulltrain_y = CIFAR10.traindata()
CIFAR10Fulltest_x,  CIFAR10Fulltest_y  = CIFAR10.testdata()
@show size(CIFAR10Fulltrain_x)
classNames = CIFAR10.classnames()

size(CIFAR10Fulltrain_x) = (32, 32, 3, 50000)


10-element Array{String,1}:
 "airplane"
 "automobile"
 "bird"
 "cat"
 "deer"
 "dog"
 "frog"
 "horse"
 "ship"
 "truck"

We consider only 3 of the 10 classes.

In [2]:
#use only "airplane", "cat" and "truck" to make the problem "easier and quicker" for the practical
usedLabels = [0,3,9]
trainFilter = [y ∈ usedLabels for y in CIFAR10Fulltrain_y]  #use \in +[TAB] to make ∈ (element of set checking)
testFilter = [y ∈ usedLabels for y in CIFAR10Fulltest_y]
CIFAR10Filteredtrain_x, CIFAR10Filteredtrain_y = CIFAR10Fulltrain_x[:,:,:,trainFilter], CIFAR10Fulltrain_y[trainFilter]
CIFAR10Filteredtest_x,  CIFAR10Filteredtest_y  = CIFAR10Fulltest_x[:,:,:,testFilter],  CIFAR10Fulltest_y[testFilter];
@show length(CIFAR10Filteredtrain_y)
@show length(CIFAR10Filteredtest_y);

length(CIFAR10Filteredtrain_y) = 15000
length(CIFAR10Filteredtest_y) = 3000
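
The filtering above relies on logical (Boolean mask) indexing along the fourth array dimension. The following is a minimal sketch of the same idea on toy data; `toy_x`, `toy_y`, `keep`, and `mask` are hypothetical names used only for illustration and are not part of the practical.

```julia
# Toy stand-ins for the image array and label vector (not CIFAR10 data)
toy_x = rand(Float32, 4, 4, 3, 6)      # 6 "images" of size 4x4 with 3 channels
toy_y = [0, 3, 5, 9, 3, 1]             # one label per image

keep = [0, 3, 9]                       # labels to retain
mask = [y ∈ keep for y in toy_y]       # Bool vector, true where the label is retained

filtered_x = toy_x[:, :, :, mask]      # logical indexing along the 4th (sample) dimension
filtered_y = toy_y[mask]               # the matching labels, in the original order

@show size(filtered_x)
@show filtered_y
```

Because the same mask is applied to both arrays, image `i` of `filtered_x` still corresponds to label `i` of `filtered_y`.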


We keep the training and validation sets small, again due to time constraints. For testing we use the full (filtered) test set.

In [3]:
trainLength = 2000
validateLength = 1000;
trainRange = 1:trainLength
validateRange = (trainLength+1):(trainLength+validateLength);

In [4]:
#Creates the training, validation, and test sets 
using Flux: onehotbatch

x_train = CIFAR10Filteredtrain_x[:,:,:,trainRange]
x_validate = CIFAR10Filteredtrain_x[:,:,:,validateRange]
x_test = CIFAR10Filteredtest_x

y_train = onehotbatch(CIFAR10Filteredtrain_y[trainRange], usedLabels)
y_validate = onehotbatch(CIFAR10Filteredtrain_y[validateRange], usedLabels)
y_test = onehotbatch(CIFAR10Filteredtest_y, usedLabels);
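
To make the label encoding concrete, here is a hand-rolled sketch of what a one-hot batch looks like: one row per class in `usedLabels` order and one column per sample. The helper `toyOneHot` is hypothetical and only mimics the layout; it is not how `onehotbatch` is implemented.

```julia
# Hand-rolled one-hot encoding (illustration only)
function toyOneHot(ys, labels)
    m = falses(length(labels), length(ys))   # rows: classes, columns: samples
    for (j, y) in enumerate(ys)
        i = findfirst(==(y), labels)          # row index of this label
        i === nothing && error("label $y not in $labels")
        m[i, j] = true
    end
    return m
end

encoded = toyOneHot([3, 0, 9, 3], [0, 3, 9])
display(encoded)
```

With labels `[0, 3, 9]`, a sample with label 3 gets a `true` in row 2, which is why the model's final layer needs exactly 3 outputs.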

In [5]:
#Setup training mini-batches
batchSize = 50

x_train_batches = [x_train[:, :, :, r] for r in Iterators.partition(trainRange, batchSize)];
y_train_batches = [y_train[:,r] for r in Iterators.partition(trainRange,batchSize)];

numMiniBatches = length(x_train_batches)

40
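
As a small illustration of how `Iterators.partition` splits an index range into mini-batches, consider the sketch below. The names `toyRange`, `toyBatchSize`, `batches`, and `numBatches` are hypothetical.

```julia
toyRange = 1:10
toyBatchSize = 4

# Each element is a chunk of consecutive indices; the last chunk is shorter
# when the batch size does not divide the length of the range
batches = collect(Iterators.partition(toyRange, toyBatchSize))
@show batches

# With 2000 training samples and batches of 50 there are 40 mini-batches
numBatches = length(collect(Iterators.partition(1:2000, 50)))
@show numBatches
```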

### A convolutional model

We begin with a convolutional model that has two convolutional layers, some batch normalization, maxpooling, and dropout for the dense layers that follow.

In [7]:
using Flux

function buildModel1()
    Chain(  #Assuming 32x32x3 input layer (like CIFAR10)
      Conv((3, 3), 3 => 8, relu, pad=(1, 1), stride=(1, 1)),   #32x32x8 convolutional layer
      BatchNorm(8),
      x -> maxpool(x, (2, 2)),  #16x16x8
      Conv((3, 3), 8 => 4, relu, pad=(0, 0), stride=(1, 1)), #14x14x4  convolutional layer
      BatchNorm(4),
      x -> maxpool(x, (2, 2)), #7x7x4 ,
      flatten,  #196 neurons
      Dense(196, 80, relu),
      Dropout(0.5),
      Dense(80, 40, relu), #40 neurons
      Dropout(0.5),
      Dense(40, 3), #3 output neurons
      softmax)
end

model1 = buildModel1()

Chain(Conv((3, 3), 3=>8, relu), BatchNorm(8), #13, Conv((3, 3), 8=>4, relu), BatchNorm(4), #14, flatten, Dense(196, 80, relu), Dropout(0.5), Dense(80, 40, relu), Dropout(0.5), Dense(40, 3), softmax)

To "debug" the model, it is useful to pass a single sample image from the data through it.

In [9]:
sampleImage = x_train_batches[1][:,:,:,1:1]

32×32×3×1 Array{N0f8,4} with eltype FixedPointNumbers.Normed{UInt8,8}:
[:, :, 1, 1] =
 0.604  0.549  0.549  0.533  0.506  …  0.682  0.643  0.686  0.647  0.639
 0.494  0.569  0.545  0.537  0.553     0.553  0.553  0.612  0.612  0.62
 0.412  0.49   0.451  0.478  0.533     0.173  0.451  0.604  0.624  0.639
 0.4    0.486  0.576  0.518  0.729     0.169  0.443  0.576  0.514  0.569
 0.49   0.588  0.541  0.592  0.843     0.224  0.455  0.608  0.369  0.169
 0.608  0.596  0.518  0.71   0.792  …  0.204  0.447  0.631  0.4    0.075
 0.675  0.682  0.667  0.796  0.643     0.173  0.443  0.627  0.424  0.078
 0.706  0.698  0.698  0.816  0.588     0.188  0.455  0.655  0.502  0.29
 0.557  0.525  0.671  0.816  0.541     0.31   0.486  0.647  0.604  0.525
 0.435  0.431  0.753  0.796  0.467     0.678  0.584  0.596  0.612  0.467
 0.416  0.522  0.859  0.702  0.369  …  0.851  0.627  0.639  0.714  0.431
 0.427  0.639  0.918  0.663  0.424     0.651  0.557  0.643  0.702  0.388
 0.482  0.753  0.898  0.643  0.424     0

In [12]:
#The output of the (untrained) model on the sample image
model1(sampleImage)

3×1 Array{Float32,2}:
 0.4049066
 0.28563324
 0.30946016

In [13]:
length(model1) #The number of `layers`

13

In [14]:
#This function just debugs layer by layer.
function debugModel(model)
    for topLayer = 1:length(model)
        print("Output after $topLayer layers"); flush(stdout)
        outputSize = size(model[1:topLayer](sampleImage))
        display(outputSize)
    end
end
debugModel(model1)

Output after 1 layers

(32, 32, 8, 1)

Output after 2 layers

(32, 32, 8, 1)

Output after 3 layers

(16, 16, 8, 1)

Output after 4 layers

(14, 14, 4, 1)

Output after 5 layers

(14, 14, 4, 1)

Output after 6 layers

(7, 7, 4, 1)

Output after 7 layers

(196, 1)

Output after 8 layers

(80, 1)

Output after 9 layers

(80, 1)

Output after 10 layers

(40, 1)

Output after 11 layers

(40, 1)

Output after 12 layers

(3, 1)

Output after 13 layers

(3, 1)
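
The sizes printed above can also be predicted by hand. Below is a minimal sketch, assuming the standard output-size formula floor((n + 2p - k)/s) + 1 for a window of width k, padding p, and stride s, and assuming that `maxpool` with a (2, 2) window uses a stride equal to the window size. `outSize` is a hypothetical helper, not part of Flux.

```julia
# Output length along one spatial dimension for a convolution or pooling window
outSize(n, k; pad = 0, stride = 1) = fld(n + 2pad - k, stride) + 1

n = 32
n = outSize(n, 3, pad = 1)       # Conv((3, 3), pad=(1, 1)) keeps 32
n = outSize(n, 2, stride = 2)    # maxpool with a (2, 2) window gives 16
n = outSize(n, 3)                # Conv((3, 3), pad=(0, 0)) gives 14
n = outSize(n, 2, stride = 2)    # maxpool with a (2, 2) window gives 7
flatLength = n * n * 4           # 4 channels after the second convolution

@show flatLength
```

The result, 196, matches the input size of the first `Dense` layer. The same helper can be used to work out the dense layer size whenever a kernel size, padding, or pooling layer is changed.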

**Task 1**: Modify the model to have (5x5) convolutions in the first layer and no max pooling layer after the second convolutional layer. You'll need to update the number of neurons in the dense layers. Then use `debugModel()` to print the size evolution of your model. Call this model `model2`.

In [15]:
#SOLUTION:

model2 = Chain(  #Assuming 32x32x3 input layer (like CIFAR10)
  Conv((5, 5), 3 => 8, relu, pad=(1, 1), stride=(1, 1)),   #30x30x8 convolutional layer
  BatchNorm(8),
  x -> maxpool(x, (2, 2)),  #15x15x8
  Conv((3, 3), 8 => 4, relu, pad=(0, 0), stride=(1, 1)), #13x13x4  convolutional layer
  BatchNorm(4),
  flatten,  #676 neurons
  Dense(676, 80, relu),
  Dropout(0.5),
  Dense(80, 40, relu), #40 neurons
  Dropout(0.5),
  Dense(40, 3), #3 output neurons
  softmax)

Chain(Conv((5, 5), 3=>8, relu), BatchNorm(8), #17, Conv((3, 3), 8=>4, relu), BatchNorm(4), flatten, Dense(676, 80, relu), Dropout(0.5), Dense(80, 40, relu), Dropout(0.5), Dense(40, 3), softmax)

In [16]:
debugModel(model2)

Output after 1 layers

(30, 30, 8, 1)

Output after 2 layers

(30, 30, 8, 1)

Output after 3 layers

(15, 15, 8, 1)

Output after 4 layers

(13, 13, 4, 1)

Output after 5 layers

(13, 13, 4, 1)

Output after 6 layers

(676, 1)

Output after 7 layers

(80, 1)

Output after 8 layers

(80, 1)

Output after 9 layers

(40, 1)

Output after 10 layers

(40, 1)

Output after 11 layers

(3, 1)

Output after 12 layers

(3, 1)

### Training

We now train the model using basic functions of `Flux.jl`

In [17]:
#define the loss function
using Flux
using Flux: crossentropy
loss(x, y, model) = crossentropy(model(x), y)

loss (generic function with 1 method)
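
To see what this loss measures, here is a hand-written sketch of the mean cross-entropy over a batch, assuming (as we believe is the case for Flux's `crossentropy`) that the per-sample losses are averaged over the columns. The names `toyCrossEntropy`, `yhat_uniform`, and `y_toy` are hypothetical.

```julia
using Statistics: mean

# Mean over samples (columns) of -sum_k y_k * log(yhat_k)
toyCrossEntropy(yhat, y) = mean(-sum(y .* log.(yhat), dims = 1))

yhat_uniform = fill(1/3, 3, 4)            # a model that is undecided between 3 classes
y_toy = Bool[1 0 0 1; 0 1 0 0; 0 0 1 0]   # one-hot targets, one column per sample

@show toyCrossEntropy(yhat_uniform, y_toy)
@show log(3)
```

A classifier that assigns probability 1/3 to each of the 3 classes has loss log(3) ≈ 1.0986, so an untrained model is expected to start with a loss in that vicinity.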

In [18]:
loss(x_train_batches[1],y_train_batches[1],model1) #Loss on first batch

1.1553065f0

In [19]:
loss(x_train,y_train,model1) #Loss on all the training data

1.1206558f0

In [20]:
#define the accuracy function
using Flux: onecold
using Statistics
accuracy(x, y, model) = mean(onecold(model(x)) .== onecold(y))

accuracy (generic function with 1 method)
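
The accuracy compares the index of the largest predicted probability with the index of the `true` entry in each one-hot column. A minimal sketch on toy data follows; `toyOneCold`, `probs`, and `targets` are hypothetical names, and `toyOneCold` only mimics what `onecold` returns when no labels are passed.

```julia
using Statistics: mean

# Index of the largest entry in each column
toyOneCold(m) = [argmax(m[:, j]) for j in 1:size(m, 2)]

probs = [0.7 0.2 0.1 0.3;
         0.2 0.5 0.3 0.3;
         0.1 0.3 0.6 0.4]                  # predicted class probabilities, one column per sample
targets = Bool[1 0 0 1; 0 1 0 0; 0 0 1 0]  # one-hot true classes

toyAccuracy = mean(toyOneCold(probs) .== toyOneCold(targets))
@show toyAccuracy
```

Here three of the four columns are classified correctly; the last sample has true class 1 but the largest predicted probability is in row 3.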

In [22]:
accuracy(x_validate,y_validate,model1) #should be around 0.33 as model is garbage at this point

0.338

In [25]:
#example of computing the gradient (automatic differentiation) on the first minibatch
gs = gradient(()->loss(x_train_batches[1],y_train_batches[1],model1),params(model1))

Grads(...)

In [26]:
#Example of a single update step of the optimizer
using Flux: update!
η = 0.002
opt = ADAM(η)
update!(opt,params(model1),gs)
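
To give a feel for what a single optimizer step does, here is a sketch of the textbook ADAM update for one scalar parameter, applied to the toy objective (θ - 3)^2. This is an illustration under stated assumptions: `adamStep` and `minimizeToy` are hypothetical helpers, the hyperparameter defaults are chosen for the toy problem, and Flux's own implementation may differ in details such as where the small constant is added.

```julia
# One ADAM step for a scalar parameter θ with gradient g at step t
function adamStep(θ, g, m, v, t; η = 0.1, beta1 = 0.9, beta2 = 0.999, tiny = 1e-8)
    m = beta1 * m + (1 - beta1) * g        # running mean of gradients
    v = beta2 * v + (1 - beta2) * g^2      # running mean of squared gradients
    mhat = m / (1 - beta1^t)               # bias corrections for the zero initialisation
    vhat = v / (1 - beta2^t)
    θ = θ - η * mhat / (sqrt(vhat) + tiny) # scaled step against the gradient direction
    return θ, m, v
end

function minimizeToy(steps)
    θ, m, v = 0.0, 0.0, 0.0
    for t in 1:steps
        g = 2 * (θ - 3)                    # gradient of (θ - 3)^2
        θ, m, v = adamStep(θ, g, m, v, t)
    end
    return θ
end

θ_final = minimizeToy(500)
@show θ_final
```

In the practical, `update!(opt, params(model1), gs)` applies an update of this kind to every parameter array of the model at once, with the optimizer keeping the running averages internally.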

### A training loop

Now we use the above for a simple training loop. On each epoch we print:
* The epoch number
* The wall time
* The accuracy computed on the validation set
* The loss (computed on the training set)

In [29]:
using Dates
function trainModel(model; epochs = 20, η = 0.005)
    opt = ADAM(η)
    for ep in 1:epochs #Loop over epochs
        
        for bi in 1:numMiniBatches #Loop over minibatches
            gs = gradient(()->loss(x_train_batches[bi],y_train_batches[bi],model),params(model))
            update!(opt,params(model),gs)
        end
        
        acc = accuracy(x_validate,y_validate,model)
        ls = loss(x_train,y_train,model)
        time = Dates.format(now(), "HH:MM:SS")
        @show ep, time, acc, ls
    end
    
    return model
end

trainModel (generic function with 1 method)

This now trains a model of type "model 1":

In [30]:
trainedModel = trainModel(buildModel1())

(ep, time, acc, ls) = (1, "22:56:37", 0.645, 0.85031986f0)
(ep, time, acc, ls) = (2, "22:56:39", 0.727, 0.67430466f0)
(ep, time, acc, ls) = (3, "22:56:42", 0.759, 0.58003885f0)
(ep, time, acc, ls) = (4, "22:56:45", 0.764, 0.5340183f0)
(ep, time, acc, ls) = (5, "22:56:47", 0.788, 0.48535916f0)
(ep, time, acc, ls) = (6, "22:56:50", 0.776, 0.4810053f0)
(ep, time, acc, ls) = (7, "22:56:52", 0.768, 0.41606054f0)
(ep, time, acc, ls) = (8, "22:56:55", 0.745, 0.51387215f0)
(ep, time, acc, ls) = (9, "22:56:58", 0.795, 0.376185f0)
(ep, time, acc, ls) = (10, "22:57:00", 0.791, 0.3298902f0)
(ep, time, acc, ls) = (11, "22:57:03", 0.785, 0.35687378f0)
(ep, time, acc, ls) = (12, "22:57:06", 0.781, 0.34605777f0)
(ep, time, acc, ls) = (13, "22:57:08", 0.801, 0.26681858f0)
(ep, time, acc, ls) = (14, "22:57:11", 0.814, 0.25501287f0)
(ep, time, acc, ls) = (15, "22:57:13", 0.79, 0.25198582f0)
(ep, time, acc, ls) = (16, "22:57:16", 0.786, 0.26506403f0)
(ep, time, acc, ls) = (17, "22:57:19", 0.812, 0.2008826

Chain(Conv((3, 3), 3=>8, relu), BatchNorm(8), #13, Conv((3, 3), 8=>4, relu), BatchNorm(4), #14, flatten, Dense(196, 80, relu), Dropout(0.5), Dense(80, 40, relu), Dropout(0.5), Dense(40, 3), softmax)

### Testing

In [31]:
#This is the testing function
testModel(model) = accuracy(x_test,y_test,model)

testModel (generic function with 1 method)

In [32]:
testModel(trainedModel)

0.7953333333333333

**Task 2**: Attempt to create a different model architecture with two additional convolutional layers. Try to maximize the validation accuracy, and we'll see in class who gets the best test accuracy (you may check the test accuracy only once).

In [33]:
#Solution (example)

function buildModel3()
    Chain(  #Assuming 32x32x3 input layer (like CIFAR10)
      Conv((5, 5), 3 => 10, relu, pad=(1, 1), stride=(1, 1)),  
      BatchNorm(10),
      x -> maxpool(x, (2, 2)), 
      Conv((3, 3), 10 => 6, relu, pad=(1, 1), stride=(1, 1)), 
      BatchNorm(6),
      x -> maxpool(x, (2, 2)), 
      Conv((3, 3), 6 => 6, relu, pad=(1, 1), stride=(1, 1)),
      BatchNorm(6),
      Conv((3, 3), 6 => 6, relu, pad=(1, 1), stride=(1, 1)), 
      BatchNorm(6),        
      flatten, 
      Dense(294, 120, relu),
      Dropout(0.5),
      Dense(120, 60, relu),
      Dropout(0.5),
      Dense(60, 3), #3 output neurons
      softmax)
end

model1 = buildModel1()

Chain(Conv((3, 3), 3=>8, relu), BatchNorm(8), #13, Conv((3, 3), 8=>4, relu), BatchNorm(4), #14, flatten, Dense(196, 80, relu), Dropout(0.5), Dense(80, 40, relu), Dropout(0.5), Dense(40, 3), softmax)

In [34]:
debugModel(buildModel3())

Output after 1 layers

(30, 30, 10, 1)

Output after 2 layers

(30, 30, 10, 1)

Output after 3 layers

(15, 15, 10, 1)

Output after 4 layers

(15, 15, 6, 1)

Output after 5 layers

(15, 15, 6, 1)

Output after 6 layers

(7, 7, 6, 1)

Output after 7 layers

(7, 7, 6, 1)

Output after 8 layers

(7, 7, 6, 1)

Output after 9 layers

(7, 7, 6, 1)

Output after 10 layers

(7, 7, 6, 1)

Output after 11 layers

(294, 1)

Output after 12 layers

(120, 1)

Output after 13 layers

(120, 1)

Output after 14 layers

(60, 1)

Output after 15 layers

(60, 1)

Output after 16 layers

(3, 1)

Output after 17 layers

(3, 1)

In [35]:
#training
trainedModel = trainModel(buildModel3(), η = 0.001, epochs = 40)

(ep, time, acc, ls) = (1, "22:59:03", 0.633, 0.90386766f0)
(ep, time, acc, ls) = (2, "22:59:10", 0.705, 0.6717061f0)
(ep, time, acc, ls) = (3, "22:59:17", 0.724, 0.56504637f0)
(ep, time, acc, ls) = (4, "22:59:23", 0.707, 0.5984488f0)
(ep, time, acc, ls) = (5, "22:59:30", 0.744, 0.49371997f0)
(ep, time, acc, ls) = (6, "22:59:36", 0.761, 0.46326068f0)
(ep, time, acc, ls) = (7, "22:59:43", 0.761, 0.47581998f0)
(ep, time, acc, ls) = (8, "22:59:50", 0.782, 0.4326664f0)
(ep, time, acc, ls) = (9, "22:59:56", 0.771, 0.39080012f0)
(ep, time, acc, ls) = (10, "23:00:03", 0.785, 0.38403004f0)
(ep, time, acc, ls) = (11, "23:00:10", 0.792, 0.34761646f0)
(ep, time, acc, ls) = (12, "23:00:16", 0.796, 0.33840582f0)
(ep, time, acc, ls) = (13, "23:00:23", 0.799, 0.30736512f0)
(ep, time, acc, ls) = (14, "23:00:29", 0.805, 0.2966043f0)
(ep, time, acc, ls) = (15, "23:00:36", 0.795, 0.2967552f0)
(ep, time, acc, ls) = (16, "23:00:43", 0.803, 0.2861434f0)
(ep, time, acc, ls) = (17, "23:00:49", 0.806, 0.2682241

Chain(Conv((5, 5), 3=>10, relu), BatchNorm(10), #29, Conv((3, 3), 10=>6, relu), BatchNorm(6), #30, Conv((3, 3), 6=>6, relu), BatchNorm(6), Conv((3, 3), 6=>6, relu), BatchNorm(6), flatten, Dense(294, 120, relu), Dropout(0.5), Dense(120, 60, relu), Dropout(0.5), Dense(60, 3), softmax)

In [36]:
#testing
testModel(trainedModel)

0.774