# The Mathematical Engineering of Deep Learning

## Practical 5 (Julia version)
**For an R or Python version see the [course website](https://deeplearningmath.org/)**.

In this practical we deal with a reduced version of the CIFAR10 dataset and train convolutional neural networks.

### CIFAR10
See [CIFAR10 website](https://www.cs.toronto.edu/~kriz/cifar.html)

In [276]:
using MLDatasets
CIFAR10Fulltrain_x, CIFAR10Fulltrain_y = CIFAR10.traindata()
CIFAR10Fulltest_x,  CIFAR10Fulltest_y  = CIFAR10.testdata()
@show size(CIFAR10Fulltrain_x)
classNames = CIFAR10.classnames()

size(CIFAR10Fulltrain_x) = (32, 32, 3, 50000)


10-element Array{String,1}:
 "airplane"
 "automobile"
 "bird"
 "cat"
 "deer"
 "dog"
 "frog"
 "horse"
 "ship"
 "truck"

In [277]:
#use only "airplane", "cat" and "truck" to make the problem "easier and quicker" for the practical
usedLabels = [0,3,9]
trainFilter = [y ∈ usedLabels for y in CIFAR10Fulltrain_y]  #use \in +[TAB] to make ∈ (element of set checking)
testFilter = [y ∈ usedLabels for y in CIFAR10Fulltest_y]
CIFAR10Filteredtrain_x, CIFAR10Filteredtrain_y = CIFAR10Fulltrain_x[:,:,:,trainFilter], CIFAR10Fulltrain_y[trainFilter]
CIFAR10Filteredtest_x,  CIFAR10Filteredtest_y  = CIFAR10Fulltest_x[:,:,:,testFilter],  CIFAR10Fulltest_y[testFilter];
@show length(CIFAR10Filteredtrain_y)
@show length(CIFAR10Filteredtest_y);

length(CIFAR10Filteredtrain_y) = 15000
length(CIFAR10Filteredtest_y) = 3000
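The labels in `usedLabels` are CIFAR10's 0-based class codes, while Julia arrays are 1-based, so the class name of label `ℓ` is `classNames[ℓ + 1]`. The sketch below checks which classes were kept and repeats the membership-mask filtering on a small made-up label vector. It assumes the class order printed by `CIFAR10.classnames()` above and hard-codes it so the snippet runs without MLDatasets.

```julia
# Class names in the order printed by CIFAR10.classnames() above (copied by hand)
classNames = ["airplane", "automobile", "bird", "cat", "deer",
              "dog", "frog", "horse", "ship", "truck"]
usedLabels = [0, 3, 9]

# CIFAR10 labels start at 0 but Julia arrays start at 1, so shift by one
usedNames = classNames[usedLabels .+ 1]
println(usedNames)  # ["airplane", "cat", "truck"]

# Same membership mask as trainFilter, applied to a small made-up label vector
toyLabels = [3, 1, 0, 9, 5, 3]
toyFilter = [y ∈ usedLabels for y in toyLabels]
println(toyLabels[toyFilter])  # [3, 0, 9, 3]
```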


In [278]:
trainLength = 2000
validateLength = 1000;
trainRange = 1:trainLength
validateRange = (trainLength+1):(trainLength+validateLength);

In [280]:
using Flux: onehotbatch

x_train = CIFAR10Filteredtrain_x[:,:,:,trainRange]
x_validate = CIFAR10Filteredtrain_x[:,:,:,validateRange]
x_test = CIFAR10Filteredtest_x

y_train = onehotbatch(CIFAR10Filteredtrain_y[trainRange], usedLabels)
y_validate = onehotbatch(CIFAR10Filteredtrain_y[validateRange], usedLabels)
y_test = onehotbatch(CIFAR10Filteredtest_y, usedLabels);
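`onehotbatch(labels, usedLabels)` turns a vector of labels into a matrix with one column per sample and one row per entry of `usedLabels`, in that order. The helper below, `oneHotColumns`, is a hypothetical plain-Julia stand-in that builds the same pattern as an ordinary `Bool` matrix; Flux itself returns a specialised one-hot matrix type, which this sketch does not reproduce.

```julia
# Hypothetical helper mimicking the layout of onehotbatch(labels, classes):
# column j is the encoding of labels[j]; row i corresponds to classes[i].
function oneHotColumns(labels, classes)
    Y = zeros(Bool, length(classes), length(labels))
    for (j, y) in enumerate(labels)
        i = findfirst(==(y), classes)
        i === nothing && error("label $y is not in $classes")
        Y[i, j] = true
    end
    return Y
end

usedLabels = [0, 3, 9]                      # airplane, cat, truck
Y = oneHotColumns([9, 0, 3, 3], usedLabels)
display(Y)  # 3×4 matrix; rows correspond to airplane, cat, truck
```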

In [161]:
using Flux: onehotbatch
batchSize = 100

x_train_batches = [x_train[:, :, :, r] for r in Iterators.partition(trainRange, batchSize)];
y_train_batches = [y_train[:,r] for r in Iterators.partition(trainRange,batchSize)];

numMiniBatches = length(x_train_batches)

20
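The 20 minibatches come from `Iterators.partition`, which cuts `trainRange` into consecutive chunks of `batchSize` indices. The sketch below repeats that split on the index ranges alone (no image data) and checks that the training and validation index ranges do not overlap.

```julia
trainLength, validateLength, batchSize = 2000, 1000, 100
trainRange = 1:trainLength
validateRange = (trainLength+1):(trainLength+validateLength)

# Consecutive, non-overlapping chunks of batchSize indices
batchRanges = collect(Iterators.partition(trainRange, batchSize))
println(length(batchRanges))                             # 20
println((first(first(batchRanges)), last(last(batchRanges))))  # (1, 2000)

# Training and validation indices never overlap
println(isempty(intersect(trainRange, validateRange)))   # true
```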

### A convolutional model

In [162]:
using Flux

function buildModel1()
    Chain(  #Assuming 32x32x3 input layer (like CIFAR10)
      Conv((3, 3), 3 => 8, relu, pad=(1, 1), stride=(1, 1)),   #32x32x8 convolutional layer
      BatchNorm(8),
      x -> maxpool(x, (2, 2)),  #16x16x8
      Conv((3, 3), 8 => 4, relu, pad=(0, 0), stride=(1, 1)), #14x14x4  convolutional layer
      BatchNorm(4),
      x -> maxpool(x, (2, 2)), #7x7x4
      flatten,  #196 neurons
      Dense(196, 80, relu),
      Dropout(0.5),
      Dense(80, 40, relu), #40 neurons
      Dropout(0.5),
      Dense(40, 3), #3 output neurons
      softmax)
end

model1 = buildModel1()

Chain(Conv((3, 3), 3=>8, relu), BatchNorm(8), #181, Conv((3, 3), 8=>4, relu), BatchNorm(4), #182, flatten, Dense(196, 80, relu), Dropout(0.5), Dense(80, 40, relu), Dropout(0.5), Dense(40, 3), softmax)

In [163]:
sampleImage = x_train_batches[1][:,:,:,1:1]

32×32×3×1 Array{N0f8,4} with eltype FixedPointNumbers.Normed{UInt8,8}:
[:, :, 1, 1] =
 0.604  0.549  0.549  0.533  0.506  …  0.682  0.643  0.686  0.647  0.639
 0.494  0.569  0.545  0.537  0.553     0.553  0.553  0.612  0.612  0.62
 0.412  0.49   0.451  0.478  0.533     0.173  0.451  0.604  0.624  0.639
 0.4    0.486  0.576  0.518  0.729     0.169  0.443  0.576  0.514  0.569
 0.49   0.588  0.541  0.592  0.843     0.224  0.455  0.608  0.369  0.169
 0.608  0.596  0.518  0.71   0.792  …  0.204  0.447  0.631  0.4    0.075
 0.675  0.682  0.667  0.796  0.643     0.173  0.443  0.627  0.424  0.078
 0.706  0.698  0.698  0.816  0.588     0.188  0.455  0.655  0.502  0.29
 0.557  0.525  0.671  0.816  0.541     0.31   0.486  0.647  0.604  0.525
 0.435  0.431  0.753  0.796  0.467     0.678  0.584  0.596  0.612  0.467
 0.416  0.522  0.859  0.702  0.369  …  0.851  0.627  0.639  0.714  0.431
 0.427  0.639  0.918  0.663  0.424     0.651  0.557  0.643  0.702  0.388
 0.482  0.753  0.898  0.643  0.424     0
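Flux convolutional layers expect 4-dimensional input in width × height × channels × batch order, which is why the sample image is extracted with the range `1:1` rather than the scalar `1`. The sketch below uses an all-zero array as a stand-in for a minibatch to show the difference in shape.

```julia
# Stand-in for a minibatch of 100 RGB images of size 32×32 (all zeros)
batch = zeros(Float32, 32, 32, 3, 100)

single4d = batch[:, :, :, 1:1]  # range index keeps the batch dimension
single3d = batch[:, :, :, 1]    # scalar index drops it

println(size(single4d))  # (32, 32, 3, 1)
println(size(single3d))  # (32, 32, 3)
```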

In [164]:
model1(sampleImage)

3×1 Array{Float32,2}:
 0.16442631
 0.47440314
 0.3611705

In [165]:
length(model1) #The number of `layers`

13

In [166]:
function debugModel(model)
    for topLayer = 1:length(model)
        print("Output after $topLayer layers"); flush(stdout)
        outputSize = size(model[1:topLayer](sampleImage))
        display(outputSize)
    end
end
debugModel(model1)

Output after 1 layers

(32, 32, 8, 1)

Output after 2 layers

(32, 32, 8, 1)

Output after 3 layers

(16, 16, 8, 1)

Output after 4 layers

(14, 14, 4, 1)

Output after 5 layers

(14, 14, 4, 1)

Output after 6 layers

(7, 7, 4, 1)

Output after 7 layers

(196, 1)

Output after 8 layers

(80, 1)

Output after 9 layers

(80, 1)

Output after 10 layers

(40, 1)

Output after 11 layers

(40, 1)

Output after 12 layers

(3, 1)

Output after 13 layers

(3, 1)
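The sizes printed by `debugModel` follow from a simple rule: along each spatial dimension a convolution with kernel size k, padding p and stride s maps n to ⌊(n + 2p − k)/s⌋ + 1, and a 2×2 max pool with its default stride of 2 roughly halves n (rounding down). The helpers `convOut` and `poolOut` below are illustrative names, not Flux functions. The sketch traces `model1` by hand and recovers the 196 inputs of the first dense layer.

```julia
# Spatial output size of a convolution along one dimension:
# floor((n + 2*pad - k) / stride) + 1
convOut(n, k; pad = 0, stride = 1) = div(n + 2pad - k, stride) + 1

# Max pooling with window k and stride k (no padding): floor(n / k)
poolOut(n, k) = div(n - k, k) + 1

# Trace model1's spatial size for a 32×32 input
n = 32
n = convOut(n, 3; pad = 1)   # 32 (padding 1 keeps the size for a 3×3 kernel)
n = poolOut(n, 2)            # 16
n = convOut(n, 3)            # 14
n = poolOut(n, 2)            # 7
flatSize = n * n * 4         # 4 channels after the second convolution
println(flatSize)            # 196
```

The same two functions can be used to plan the layer sizes for the tasks below before building a model.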

**Task 1**: Modify the model to have (5x5) convolutions in the first layer with padding of 1 and without a max pooling layer after the second convolutional layer. You'll need to update the number of neurons in the dense layers. Then use `debugModel()` to print the size evolution of your model. Call this model `model2`.

In [167]:
#SOLUTION:

model2 = Chain(  #Assuming 32x32x3 input layer (like CIFAR10)
  Conv((5, 5), 3 => 8, relu, pad=(1, 1), stride=(1, 1)),   #30x30x8 convolutional layer
  BatchNorm(8),
  x -> maxpool(x, (2, 2)),  #15x15x8
  Conv((3, 3), 8 => 4, relu, pad=(0, 0), stride=(1, 1)), #13x13x4  convolutional layer
  BatchNorm(4),
  flatten,  #676 neurons
  Dense(676, 80, relu),
  Dropout(0.5),
  Dense(80, 40, relu), #40 neurons
  Dropout(0.5),
  Dense(40, 3), #3 output neurons
  softmax)

Chain(Conv((5, 5), 3=>8, relu), BatchNorm(8), #185, Conv((3, 3), 8=>4, relu), BatchNorm(4), flatten, Dense(676, 80, relu), Dropout(0.5), Dense(80, 40, relu), Dropout(0.5), Dense(40, 3), softmax)

In [168]:
debugModel(model2)

Output after 1 layers

(30, 30, 8, 1)

Output after 2 layers

(30, 30, 8, 1)

Output after 3 layers

(15, 15, 8, 1)

Output after 4 layers

(13, 13, 4, 1)

Output after 5 layers

(13, 13, 4, 1)

Output after 6 layers

(676, 1)

Output after 7 layers

(80, 1)

Output after 8 layers

(80, 1)

Output after 9 layers

(40, 1)

Output after 10 layers

(40, 1)

Output after 11 layers

(3, 1)

Output after 12 layers

(3, 1)
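Dropping the second max pool makes the flattened output 676 values instead of 196, and the first dense layer grows accordingly. The sketch below counts trainable parameters by hand for `model1` and `model2`, assuming a convolution has k·k·c_in·c_out weights plus c_out biases, a dense layer has n_in·n_out weights plus n_out biases, and `BatchNorm(c)` has a trainable scale and shift per channel (its running statistics are not counted).

```julia
# Trainable parameter counts (weights + biases) for the layer types used here
convParams(k, cin, cout) = k * k * cin * cout + cout
bnParams(c) = 2c                         # scale γ and shift β per channel
denseParams(nin, nout) = nin * nout + nout

model1Params = convParams(3, 3, 8) + bnParams(8) + convParams(3, 8, 4) + bnParams(4) +
               denseParams(196, 80) + denseParams(80, 40) + denseParams(40, 3)
model2Params = convParams(5, 3, 8) + bnParams(8) + convParams(3, 8, 4) + bnParams(4) +
               denseParams(676, 80) + denseParams(80, 40) + denseParams(40, 3)

println(model1Params)  # 19663
println(model2Params)  # 58447
```

In both models the first dense layer holds most of the parameters (15760 and 54160 respectively), so pooling before `flatten` is a cheap way to keep the model small.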

### Training

In [169]:
using Flux
using Flux: crossentropy
loss(x, y, model) = crossentropy(model(x), y)

loss (generic function with 1 method)

In [170]:
loss(x_train_batches[1],y_train_batches[1],model1) #Loss on first batch

1.1330861f0

In [171]:
loss(x_train,y_train,model1) #Loss on all the training data

1.1863401f0
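The untrained losses above (about 1.13 and 1.19) are close to log(3) ≈ 1.099, which is the cross-entropy of a model that assigns probability 1/3 to every class. The sketch below computes the mean cross-entropy by hand for that uniform case; it assumes, as we understand the Flux version used here, that `crossentropy` averages −Σ y log ŷ over the columns (samples). The helper names are illustrative only.

```julia
# Column-wise softmax: each column holds the scores for one sample
function softmaxCols(z)
    e = exp.(z .- maximum(z; dims = 1))  # subtract column max for numerical stability
    return e ./ sum(e; dims = 1)
end

# Mean cross-entropy over samples (columns), given probabilities ŷ and one-hot targets y
crossEntropyMean(ŷ, y) = -sum(y .* log.(ŷ)) / size(y, 2)

# Equal scores for all 3 classes give probability 1/3 each
ŷ = softmaxCols(zeros(3, 4))
y = [1 0 0 1;
     0 1 0 0;
     0 0 1 0]
println(crossEntropyMean(ŷ, y))  # ≈ 1.0986, i.e. log(3)
```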

In [172]:
using Flux: onecold
using Statistics
accuracy(x, y, model) = mean(onecold(model(x)) .== onecold(y))

accuracy (generic function with 1 method)

In [173]:
accuracy(x_validate,y_validate,model1) #should be around 0.33 (chance level for 3 classes) since the model is still untrained

0.359
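`onecold` decodes each column to the index of its largest entry, so `accuracy` is the fraction of columns where the predicted and true indices agree. The sketch below does the same with `argmax` on a small made-up prediction matrix; `decodeCols` and `accuracySketch` are hypothetical names, not Flux functions.

```julia
# onecold-style decoding: index of the largest entry in each column
decodeCols(m) = [argmax(m[:, j]) for j in 1:size(m, 2)]

# Fraction of samples whose predicted class matches the true class
accuracySketch(ŷ, y) = count(decodeCols(ŷ) .== decodeCols(y)) / size(y, 2)

ŷ = [0.7 0.2 0.1 0.3;
     0.2 0.5 0.1 0.3;
     0.1 0.3 0.8 0.4]
y = [1 0 0 1;
     0 1 0 0;
     0 0 1 0]
println(decodeCols(ŷ))         # [1, 2, 3, 3]
println(accuracySketch(ŷ, y))  # 0.75
```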

In [174]:
#example of computing the gradient (automatic differentiation) on the first minibatch
gs = gradient(()->loss(x_train_batches[1],y_train_batches[1],model1),params(model1))

Grads(...)
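Automatic differentiation returns exact derivatives of the loss, which can be checked against central finite differences. For the softmax plus cross-entropy pair at the end of these models, the gradient with respect to the pre-softmax scores z has the closed form softmax(z) − y. The sketch below verifies that formula numerically for a single made-up sample; it does not use Flux, and all names are illustrative.

```julia
# Softmax followed by cross-entropy for one sample with scores z and one-hot target y
function softmaxVec(z)
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end
lossFromScores(z, y) = -sum(y .* log.(softmaxVec(z)))

# Closed-form gradient with respect to the scores
analyticGrad(z, y) = softmaxVec(z) .- y

# Central finite differences, one coordinate at a time
function finiteDiffGrad(f, z; h = 1e-6)
    g = similar(z)
    for i in eachindex(z)
        zp = copy(z); zp[i] += h
        zm = copy(z); zm[i] -= h
        g[i] = (f(zp) - f(zm)) / (2h)
    end
    return g
end

z = [0.5, -1.2, 2.0]
y = [0.0, 1.0, 0.0]
gA = analyticGrad(z, y)
gF = finiteDiffGrad(v -> lossFromScores(v, y), z)
println(maximum(abs.(gA .- gF)))  # tiny, well below 1e-6
```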

In [175]:
#Example of a single update step of the optimizer
using Flux: update!
η = 0.002
opt = ADAM(η)
update!(opt,params(model1),gs)
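`update!` applies one optimizer step to every parameter using the gradients in `gs`. As a rough picture of what `ADAM(η)` does, the sketch below implements the standard Adam update for a plain vector, assuming β₁ = 0.9, β₂ = 0.999 and ϵ = 1e-8; Flux's internal implementation may differ in details, and `AdamState`/`adamStep!` are hypothetical names. On the first step the bias-corrected ratio is close to ±1, so each parameter moves by about η.

```julia
# Running statistics kept by Adam for a parameter vector (hypothetical helper)
mutable struct AdamState
    m::Vector{Float64}   # running mean of gradients
    v::Vector{Float64}   # running mean of squared gradients
    t::Int               # number of steps taken so far
end
AdamState(n::Int) = AdamState(zeros(n), zeros(n), 0)

# One Adam step, updating θ in place
function adamStep!(θ, g, s::AdamState; η = 0.002, β1 = 0.9, β2 = 0.999, ϵ = 1e-8)
    s.t += 1
    s.m .= β1 .* s.m .+ (1 - β1) .* g
    s.v .= β2 .* s.v .+ (1 - β2) .* g .^ 2
    m̂ = s.m ./ (1 - β1^s.t)      # bias correction
    v̂ = s.v ./ (1 - β2^s.t)
    θ .-= η .* m̂ ./ (sqrt.(v̂) .+ ϵ)
    return θ
end

θ = [1.0, -1.0]
s = AdamState(2)
adamStep!(θ, [0.5, -3.0], s)
println(θ)  # each entry moved by about η = 0.002 against the sign of its gradient
```

Because the step is scaled by the running gradient magnitude, the learning rate η sets the approximate size of each parameter change, which is one reason small values such as 0.002 are used here.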

### A training loop

In [289]:
using Dates
function trainModel(model; epochs = 20, η = 0.005)
    opt = ADAM(η)
    for ep in 1:epochs #Loop over epochs
        
        for bi in 1:numMiniBatches #Loop over minibatches
            gs = gradient(()->loss(x_train_batches[bi],y_train_batches[bi],model),params(model))
            update!(opt,params(model),gs)
        end
        
        acc = accuracy(x_validate,y_validate,model)
        ls = loss(x_train,y_train,model)
        time = Dates.format(now(), "HH:MM:SS")
        @show ep, time, acc, ls
    end
    
    return model
end


trainModel (generic function with 1 method)
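`trainModel` has the usual structure: an outer loop over epochs, an inner loop over minibatches with one gradient computation and one update each, and a progress report after every epoch. The sketch below mirrors that structure on a made-up noise-free regression problem (y = 2x + 1), with hand-written gradients in place of Flux's `gradient` and plain gradient descent in place of ADAM; `trainToy` is a hypothetical name.

```julia
# Noise-free toy data: y = 2x + 1 (made up for illustration)
xs = collect(range(-1.0, 1.0; length = 200))
ys = 2 .* xs .+ 1

function trainToy(xs, ys; epochs = 200, η = 0.1, batchSize = 20)
    w, b = 0.0, 0.0
    batches = collect(Iterators.partition(eachindex(xs), batchSize))
    for ep in 1:epochs                            # loop over epochs
        for r in batches                          # loop over minibatches
            x, y = xs[r], ys[r]
            err = (w .* x .+ b) .- y
            gw = 2 * sum(err .* x) / length(r)    # ∂(mean squared error)/∂w
            gb = 2 * sum(err) / length(r)         # ∂(mean squared error)/∂b
            w -= η * gw                           # plain gradient descent step
            b -= η * gb
        end
        if ep % 50 == 0                           # report progress occasionally
            mse = sum(((w .* xs .+ b) .- ys) .^ 2) / length(xs)
            println((ep, round(mse; sigdigits = 3)))
        end
    end
    return w, b
end

w, b = trainToy(xs, ys)
println((w, b))  # close to (2.0, 1.0)
```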

In [290]:
trainedModel = trainModel(buildModel1())

(ep, time, acc, ls) = (1, "22:23:14", 0.519, 0.95736796f0)
(ep, time, acc, ls) = (2, "22:23:17", 0.664, 0.73524684f0)
(ep, time, acc, ls) = (3, "22:23:19", 0.71, 0.6201586f0)
(ep, time, acc, ls) = (4, "22:23:22", 0.746, 0.5352092f0)
(ep, time, acc, ls) = (5, "22:23:24", 0.754, 0.53658116f0)
(ep, time, acc, ls) = (6, "22:23:27", 0.774, 0.4776568f0)
(ep, time, acc, ls) = (7, "22:23:29", 0.78, 0.4284192f0)
(ep, time, acc, ls) = (8, "22:23:32", 0.762, 0.4941161f0)
(ep, time, acc, ls) = (9, "22:23:34", 0.662, 0.7206606f0)
(ep, time, acc, ls) = (10, "22:23:37", 0.722, 0.5078989f0)
(ep, time, acc, ls) = (11, "22:23:39", 0.777, 0.43756983f0)
(ep, time, acc, ls) = (12, "22:23:42", 0.756, 0.5166903f0)
(ep, time, acc, ls) = (13, "22:23:44", 0.794, 0.38723493f0)
(ep, time, acc, ls) = (14, "22:23:46", 0.735, 0.46121362f0)
(ep, time, acc, ls) = (15, "22:23:49", 0.679, 0.7666417f0)
(ep, time, acc, ls) = (16, "22:23:51", 0.807, 0.30085725f0)
(ep, time, acc, ls) = (17, "22:23:54", 0.808, 0.29053882f0)


Chain(Conv((3, 3), 3=>8, relu), BatchNorm(8), #181, Conv((3, 3), 8=>4, relu), BatchNorm(4), #182, flatten, Dense(196, 80, relu), Dropout(0.5), Dense(80, 40, relu), Dropout(0.5), Dense(40, 3), softmax)

In [291]:
testModel(model) = accuracy(x_test,y_test,model)

testModel (generic function with 1 method)

In [292]:
testModel(trainedModel)

0.8086666666666666

**Task 2**: Attempt to create a different model architecture that uses two additional convolutional layers. Try to maximize the validation accuracy; we'll see in class who gets the best test accuracy (you may check the test accuracy only once).

In [308]:
#Solution (example)
function buildModel3()
    Chain(  #Assuming 32x32x3 input layer (like CIFAR10)
      Conv((5, 5), 3 => 10, relu, pad=(1, 1), stride=(1, 1)),  
      BatchNorm(10),
      x -> maxpool(x, (2, 2)), 
      Conv((3, 3), 10 => 6, relu, pad=(1, 1), stride=(1, 1)), 
      BatchNorm(6),
      x -> maxpool(x, (2, 2)), 
      Conv((3, 3), 6 => 6, relu, pad=(1, 1), stride=(1, 1)),
      BatchNorm(6),
      Conv((3, 3), 6 => 6, relu, pad=(1, 1), stride=(1, 1)), 
      BatchNorm(6),        
      flatten, 
      Dense(294, 120, relu),
      Dropout(0.5),
      Dense(120, 60, relu),
      Dropout(0.5),
      Dense(60, 3), #3 output neurons
      softmax)
end

buildModel3 (generic function with 1 method)

In [309]:
debugModel(buildModel3())

Output after 1 layers

(30, 30, 10, 1)

Output after 2 layers

(30, 30, 10, 1)

Output after 3 layers

(15, 15, 10, 1)

Output after 4 layers

(15, 15, 6, 1)

Output after 5 layers

(15, 15, 6, 1)

Output after 6 layers

(7, 7, 6, 1)

Output after 7 layers

(7, 7, 6, 1)

Output after 8 layers

(7, 7, 6, 1)

Output after 9 layers

(7, 7, 6, 1)

Output after 10 layers

(7, 7, 6, 1)

Output after 11 layers

(294, 1)

Output after 12 layers

(120, 1)

Output after 13 layers

(120, 1)

Output after 14 layers

(60, 1)

Output after 15 layers

(60, 1)

Output after 16 layers

(3, 1)

Output after 17 layers

(3, 1)

In [311]:
trainedModel = trainModel(buildModel3(),η = 0.001, epochs = 40)

(ep, time, acc, ls) = (1, "22:38:30", 0.499, 0.99232835f0)
(ep, time, acc, ls) = (2, "22:38:36", 0.608, 0.89747524f0)
(ep, time, acc, ls) = (3, "22:38:43", 0.642, 0.8235785f0)
(ep, time, acc, ls) = (4, "22:38:50", 0.655, 0.720899f0)
(ep, time, acc, ls) = (5, "22:38:56", 0.706, 0.6534424f0)
(ep, time, acc, ls) = (6, "22:39:03", 0.734, 0.5993063f0)
(ep, time, acc, ls) = (7, "22:39:09", 0.727, 0.5949997f0)
(ep, time, acc, ls) = (8, "22:39:16", 0.74, 0.56174f0)
(ep, time, acc, ls) = (9, "22:39:22", 0.75, 0.4936087f0)
(ep, time, acc, ls) = (10, "22:39:29", 0.767, 0.49218863f0)
(ep, time, acc, ls) = (11, "22:39:36", 0.753, 0.4795915f0)
(ep, time, acc, ls) = (12, "22:39:43", 0.773, 0.44230872f0)
(ep, time, acc, ls) = (13, "22:39:49", 0.767, 0.41371825f0)
(ep, time, acc, ls) = (14, "22:39:56", 0.77, 0.39146912f0)
(ep, time, acc, ls) = (15, "22:40:03", 0.778, 0.37203264f0)
(ep, time, acc, ls) = (16, "22:40:10", 0.795, 0.3630161f0)
(ep, time, acc, ls) = (17, "22:40:16", 0.778, 0.35663137f0)
(ep,

Chain(Conv((5, 5), 3=>10, relu), BatchNorm(10), #343, Conv((3, 3), 10=>6, relu), BatchNorm(6), #344, Conv((3, 3), 6=>6, relu), BatchNorm(6), Conv((3, 3), 6=>6, relu), BatchNorm(6), flatten, Dense(294, 120, relu), Dropout(0.5), Dense(120, 60, relu), Dropout(0.5), Dense(60, 3), softmax)

In [312]:
testModel(trainedModel)

0.7856666666666666