# Warm up

The code below showcases a convolutional network in Keras. It was designed to classify 100x100 RGB images into 10 classes.
This network... quite frankly, it sucks. Can you guess what the problem is? Is there just one problem?

In [1]:
import keras
import keras.layers as L
import keras.initializers as init
import tensorflow as tf

In [3]:
net = keras.models.Sequential()

net.add(L.InputLayer([100, 100, 3]))

net.add(L.Conv2D(filters=512, kernel_size=(3, 3), 
                 kernel_initializer=init.zeros()))
net.add(L.Activation('relu'))

net.add(L.Conv2D(filters=128, kernel_size=(3, 3), 
                 kernel_initializer=init.zeros()))
net.add(L.Activation('relu'))

net.add(L.Conv2D(filters=32, kernel_size=(3, 3), 
                 kernel_initializer=init.zeros()))
net.add(L.Activation('relu'))

net.add(L.MaxPool2D(pool_size=(6, 6)))

net.add(L.Conv2D(filters=8, kernel_size=(10, 10), 
                 kernel_initializer=init.RandomNormal(), padding='same'))
net.add(L.Activation('relu'))


net.add(L.Conv2D(filters=8, kernel_size=(10, 10), 
                 kernel_initializer=init.RandomNormal(), padding='same'))
net.add(L.Activation('relu'))

net.add(L.MaxPool2D(pool_size=(3, 3)))

net.add(L.Flatten()) # convert 3d tensor to a vector of features

net.add(L.Dense(units=512))
net.add(L.Activation('softmax'))

net.add(L.Dropout(rate=0.5))

net.add(L.Dense(units=512))
net.add(L.Activation('softmax'))

net.add(L.Dense(units=10))
net.add(L.Activation('sigmoid'))
net.add(L.Dropout(rate=0.5))

* [Conv2D](https://keras.io/layers/convolutional/#conv2d) - performs convolution:
    * filters: number of output channels;
    * kernel_size: an integer or tuple/list of 2 integers, specifying the width and height of the 2D convolution window;
    * padding: padding="same" adds zero padding to the input, so that the output has the same width and height, padding='valid' performs convolution only in locations where kernel and the input fully overlap;
    * activation: "relu", "tanh", etc.
    * input_shape: shape of input.
* [MaxPooling2D](https://keras.io/layers/pooling/#maxpooling2d) - performs 2D max pooling.
* [Flatten](https://keras.io/layers/core/#flatten) - flattens the input, does not affect the batch size.
* [Dense](https://keras.io/layers/core/#dense) - fully-connected layer.
* Activation - applies an activation function.
* [LeakyReLU](https://keras.io/layers/advanced-activations/#leakyrelu) - applies leaky relu activation.
* [Dropout](https://keras.io/layers/core/#dropout) - applies dropout.
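To see what `padding` and pooling do to the spatial size in the warm-up network, here is a small plain-Python sketch that traces one side of the feature map layer by layer. It assumes the Keras defaults the network relies on (stride 1 and `padding='valid'` for `Conv2D`, stride equal to `pool_size` for `MaxPool2D`); the helper names `conv2d_out` and `maxpool_out` are made up for this illustration and are not Keras functions.

```python
def conv2d_out(size, kernel, padding="valid", stride=1):
    """Output size along one spatial axis of a Conv2D layer."""
    if padding == "same":
        return -(-size // stride)          # ceil(size / stride)
    return (size - kernel) // stride + 1   # 'valid': kernel must fit entirely


def maxpool_out(size, pool):
    """Output size of MaxPool2D with stride == pool size and 'valid' padding."""
    return (size - pool) // pool + 1


# (layer kind, kernel or pool size, padding) in the order of the warm-up network
layers = [
    ("conv", 3, "valid"),
    ("conv", 3, "valid"),
    ("conv", 3, "valid"),
    ("pool", 6, None),
    ("conv", 10, "same"),
    ("conv", 10, "same"),
    ("pool", 3, None),
]

size = 100
trace = [size]
for kind, k, pad in layers:
    size = conv2d_out(size, k, pad) if kind == "conv" else maxpool_out(size, k)
    trace.append(size)

print(trace)                        # [100, 98, 96, 94, 15, 15, 15, 5]
print(conv2d_out(15, 10, "valid"))  # 6
```

The last line shows why `padding='same'` matters here: with `'valid'` padding a 10x10 kernel would shrink the 15x15 map to 6x6, and a second 10x10 kernel would no longer fit.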

## Book of grudges
* Zero init for weights causes a symmetry effect: all filters in a layer compute the same output and get the same gradient, so they never become different from each other.
* Too many filters for the first 3x3 convolution: 512 of them lead to an enormous output tensor, while there just aren't that many relevant 3x3 patterns (overkill).
* Usually the further you go, the more filters you need. This network does the opposite.
* Large filters (10x10) are generally a bad practice, and you definitely need more than 10 of them.
* The 10x10 convolutions only run thanks to padding='same': the pooled feature map is just 15x15, so with 'valid' padding the first one would shrink it to 6x6 and the second would technically be unable to perform the convolution.
* Softmax nonlinearity effectively makes only 1 or a few neurons from the entire layer "fire", rendering a 512-neuron layer almost useless. Softmax at the output layer is okay though.
* Dropout after the probability prediction is just lame. A few random classes get a probability of 0, so your probabilities no longer sum to 1, and crossentropy goes to infinity whenever the true class is one of them.
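The last two grudges can be checked with a few lines of plain Python. This is only a sketch under assumptions: the logits and the dropout mask are made up, and `apply_dropout` is a hypothetical helper that mimics inverted dropout (kept values are scaled by `1 / (1 - rate)`) with a fixed mask instead of a random one.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_dropout(values, keep_mask, rate):
    """Inverted dropout with an explicit mask: dropped entries become 0,
    kept entries are scaled by 1 / (1 - rate)."""
    scale = 1.0 / (1.0 - rate)
    return [v * scale if keep else 0.0 for v, keep in zip(values, keep_mask)]


def cross_entropy(probs, true_class):
    """-log(p[true_class]); infinite when that probability is exactly 0."""
    p = probs[true_class]
    return math.inf if p == 0.0 else -math.log(p)


probs = softmax([2.0, 1.0, 0.5, 0.1, -1.0])   # made-up logits for 5 classes
keep_mask = [True, False, True, False, True]  # one mask dropout could sample
dropped = apply_dropout(probs, keep_mask, rate=0.5)

print(round(sum(probs), 6))        # 1.0
print(sum(dropped))                # no longer sums to 1
print(cross_entropy(probs, 1))     # finite loss before dropout
print(cross_entropy(dropped, 1))   # inf, because class 1 was dropped

# Softmax as a hidden activation: the whole layer shares a total "budget" of 1.
hidden = softmax([0.0] * 512)
print(max(hidden))                 # 0.001953125, i.e. 1/512
```

With equal pre-activations every hidden unit outputs 1/512; as soon as one pre-activation dominates, it takes most of that budget and the rest of the layer goes quiet.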

In this exercise you have to train a new Convolutional Neural Network from scratch for the classification of images.

1. For this we will use the Keras library.
2. The aim is to achieve 99% accuracy (on the validation/test set) on the MNIST dataset: http://yann.lecun.com/exdb/mnist/.
3. We have provided a basic Keras implementation of a CNN.
4. You are allowed to do whatever you want (except copy pasting) with the network as long as it is explained in your report.
5. Feel free to change the architecture of the network as well as parameters (e.g. learning rate, kernel sizes, ...).
6. You can try to guess parameters manually if you want, just make sure that the network performs better than 99% on the validation set.
7. Sketch the final network architecture in your report.
8. Make sure you train the network on the GPU, otherwise it will be too slow.
9. Explain the plots: learning curve, accuracy wrt epoch.

In [1]:
import keras
import keras.layers as L
import keras.initializers as init
import tensorflow as tf
from keras.datasets import mnist

In [13]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))

# scale pixel values to [0, 1]; keep the same names so the cells below use the scaled data
X_train = X_train / 255.0
X_test = X_test / 255.0

model = tf.keras.models.Sequential()

model.add(L.Conv2D(8, (3, 3), activation='relu', kernel_initializer=init.RandomNormal(stddev=0.1), input_shape=(28, 28, 1)))
model.add(L.Conv2D(16, (5, 5), activation='relu', kernel_initializer=init.RandomNormal(stddev=0.1)))
model.add(L.BatchNormalization())
model.add(L.MaxPooling2D((2, 2)))
model.add(L.Dropout(0.20))

model.add(L.Flatten())
model.add(L.Dense(100, activation='elu', kernel_initializer=init.RandomNormal(stddev=0.1)))
model.add(L.BatchNormalization())
model.add(L.Dropout(0.20))

model.add(L.Dense(10, activation='softmax'))

model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 26, 26, 8)         80        
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 22, 22, 16)        3216      
_________________________________________________________________
batch_normalization_5 (Batch (None, 22, 22, 16)        64        
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 11, 11, 16)        0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 11, 11, 16)        0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 1936)              0         
_________________________________________________________________
dense_6 (Dense)              (None, 100)              
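The `Param #` column can be reproduced by hand, which is a handy sanity check when changing kernel sizes or filter counts. A minimal sketch follows; the helper names are made up for this illustration. The summary above is cut off before the dense layer's count, so the last value is simply what the formula gives, not a quoted output.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights of every filter plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels


def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units


print(conv2d_params(3, 3, 1, 8))    # 80    first Conv2D on the 1-channel input
print(conv2d_params(5, 5, 8, 16))   # 3216  second Conv2D on 8 channels
print(batchnorm_params(16))         # 64    BatchNormalization over 16 channels
print(11 * 11 * 16)                 # 1936  length of the flattened vector
print(dense_params(1936, 100))      # 193700 (from the formula)
```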

In [14]:
# the model already ends in a softmax, so the loss receives probabilities, not logits
loss_function = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

model.compile(optimizer = keras.optimizers.Adam(), loss=loss_function, metrics=['accuracy'])
training_history = model.fit(X_train, y_train, epochs=20, batch_size=500, validation_data=(X_test, y_test))

Epoch 1/20
...
Epoch 20/20


In [16]:
model.evaluate(X_test, y_test)



[0.026964688673615456, 0.991100013256073]

In [None]:
# voila - 99% accuracy in 20 epochs (in fact, 10 are enough) without hyperparameter tuning

# Insights

1. As a training exercise, I'm just adding all the layers that I know (conv, pooling, dropout, batch normalization, etc.).
2. After dozens of training runs with different parameters for these layers, I found that using one conv layer gives around 98.5% accuracy on the test data.
3. Adding one more conv layer with batch normalization in between immediately improves this to 99% in 10 epochs, even without tuning the optimizer hyperparameters.

This was a very interesting experience: because the dataset is small, I was able to play a lot with the layers and hyperparameters (e.g. the number of filters).

# Going bigger

* Use `tf.keras.datasets.cifar10.load_data()` to get the data.
* Split it 70/30 into train/val using `train_test_split`.
* Normalize the input like $x_{\text{norm}} = \frac{x}{255} - 0.5$.
* We need to convert class labels to one-hot encoded vectors. Use `keras.utils.to_categorical`.
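To make the steps above concrete, here is a plain-Python sketch of what each one computes, run on a tiny made-up batch instead of CIFAR-10. The helpers `normalize`, `one_hot` and `split_train_val` are hypothetical stand-ins written for this illustration; in the notebook itself, `train_test_split` (with `test_size=0.3`) and `keras.utils.to_categorical` do the splitting and one-hot encoding on whole arrays.

```python
import random

NUM_CLASSES = 10


def normalize(pixels):
    """Map raw intensities in [0, 255] to [-0.5, 0.5]: x / 255 - 0.5."""
    return [p / 255 - 0.5 for p in pixels]


def one_hot(label, num_classes=NUM_CLASSES):
    """Vector of zeros with a single 1.0 at the index of the class."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def split_train_val(samples, val_fraction=0.3, seed=0):
    """Shuffle a copy of the samples and cut off val_fraction for validation."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# made-up stand-in data: 10 "images" of 4 pixels each, one per class
images = [[0, 64, 128, 255] for _ in range(10)]
labels = list(range(10))

samples = [(normalize(img), one_hot(lbl)) for img, lbl in zip(images, labels)]
train, val = split_train_val(samples)

print(len(train), len(val))   # 7 3
print(samples[3][1])          # one-hot vector with the 1.0 at index 3
print(normalize([0, 255]))    # [-0.5, 0.5]
```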

In [15]:
# normalize inputs
x_train = ### YOUR CODE HERE
x_val = ### YOUR CODE HERE
x_test = ### YOUR CODE HERE

# convert class labels to one-hot encoded, should have shape (?, NUM_CLASSES)
y_train = ### YOUR CODE HERE
y_val = ### YOUR CODE HERE
y_test = ### YOUR CODE HERE

In [None]:
# as this repeats the next file, I'll put everything directly in the Assignment4_CNNs2.ipynb notebook