<a href="https://colab.research.google.com/github/valeriodipalo/Deep_Learning/blob/master/Notes/L4_Introduction_to_NN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 0. Preparing the environment

In [None]:
! pip install tensorflow

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import tensorflow.keras.backend as K 
import tensorflow.keras.models as models 
import tensorflow.keras.layers as layers 

#  1. Initiating a NN 

## 1.1. Instantiating a small convnet
Suppose we want to create a CNN with 3 convolutional layers with these characteristics:
1. 32 filters of size 3x3
2. 64 filters of size 3x3
3. 64 filters of size 3x3

### Initiate model 

In [None]:
# Initiate the model: a CNN is a sequence of convolutional layers
model = models.Sequential() 

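N.B. It is important not to run any of the following model.add(...) cells twice, because each run would add further layers. Re-running the previous cell, instead, resets the model to an empty one.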

### Add Layers

In [None]:
# Add layers: Conv2D is the one generally used for images
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
# input_shape: must be specified in the first layer;
#              it defines the shape of the images to analyze (height, width, channels)

# Inspect the model
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________


- Output shape: with stride 1 and no padding (the Keras defaults), each spatial dimension is given by
size of image (28) - size of filter (3) + 1 = 28 - 3 + 1 = 26
- Total params: obtained as
(kernel height * kernel width * input channels + 1 bias) * number of filters = (3 * 3 * 1 + 1) * 32 = 320
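
The two formulas can be checked with a few lines of plain Python. This is only a sketch: the helper names `conv2d_output_size` and `conv2d_param_count` are made up for illustration (they are not part of Keras), and the code assumes stride 1 and no padding, as in the layer above.

```python
def conv2d_output_size(input_size: int, kernel_size: int) -> int:
    """Spatial size after a convolution with stride 1 and no padding."""
    return input_size - kernel_size + 1


def conv2d_param_count(kernel_size: int, in_channels: int, filters: int) -> int:
    """Weights of every filter plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


print(conv2d_output_size(28, 3))     # 26
print(conv2d_param_count(3, 1, 32))  # 320
```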


### Max Pooling 

In [None]:
model.add(layers.MaxPooling2D((2,2)))
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
Total params: 320
Trainable params: 320
Non-trainable params: 0
_________________________________________________________________


MaxPooling reduces the shape of the output, but the number of parameters doesn't change: the pooling layer itself has no trainable parameters, so it does not learn anything.
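
To see why pooling has nothing to learn, here is a minimal NumPy sketch of 2x2 max pooling on a single feature map. The function `max_pool_2x2` is a hypothetical helper written for this note, not the Keras implementation; it only keeps the largest value of each 2x2 block.

```python
import numpy as np


def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 on a (height, width) array."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing odd row/column, then group the pixels into 2x2 blocks.
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    # Keep the maximum of each block: a fixed operation, no weights involved.
    return blocks.max(axis=(1, 3))


feature_map = np.arange(26 * 26).reshape(26, 26)  # made-up 26x26 feature map
pooled = max_pool_2x2(feature_map)
print(pooled.shape)  # (13, 13)
```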

### Adding other layers 

In [None]:
model.add(layers.Conv2D(64,(3,3),activation = 'relu')) # no input_shape needed: it is inferred from the previous layer
model.add(layers.MaxPooling2D((2,2)))
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
Total params: 18,816
Trainable params: 18,816
Non-trainable params: 0
_________________________________________________________________


We can see that the feature maps shrink at every step, because of:
- Convolution (28 -> 26, 13 -> 11)
- MaxPooling (26 -> 13, 11 -> 5)


In the second Conv2D layer, the model has to learn 18,496 parameters. As conv2d passes on 32 output maps, the number of trainable parameters in this layer is (32 * 3 * 3 + 1) * 64 = 18,496.



In [None]:
model.add(layers.Conv2D(64,(3,3),activation = 'relu')) 
# No maxpooling 
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 3, 64)          36928     
                                                                 
Total params: 55,744
Trainable params: 55,744
Non-traina

In the third Conv2D layer, the model has to learn 36,928 parameters. As conv2d_1 passes on 64 output maps, the number of trainable parameters in this layer is (64 * 3 * 3 + 1) * 64 = 36,928.

The total number of parameters that this simple model has to learn so far is 320 + 18,496 + 36,928 = 55,744.
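
As a sanity check, the shapes and parameter counts reported by the summaries can be reproduced by walking through the stack by hand. The list `layers_spec` below is just an illustrative description of the layers added so far (it is not a Keras object), and the arithmetic assumes 3x3 kernels, stride 1, no padding and 2x2 pooling.

```python
# (layer kind, number of filters) in the order the layers were added above.
layers_spec = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None), ("conv", 64)]

size, channels, total_params = 28, 1, 0
for kind, filters in layers_spec:
    if kind == "conv":
        # (3 * 3 * input channels + 1 bias) * number of filters
        params = (3 * 3 * channels + 1) * filters
        size, channels = size - 3 + 1, filters
    else:
        # Pooling halves the spatial size (rounding down) and learns nothing.
        params = 0
        size = size // 2
    total_params += params
    print(kind, (size, size, channels), params)

print(total_params)  # 55744
```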

N.B. If the kernel is bigger than the feature map it receives, we obtain the following error message:

```
ValueError: One of the dimensions in the output is <= 0 due to downsampling in conv2d_3.
Consider increasing the input size. Received input shape [None, 3, 3, 64] which would produce output shape with a zero or negative value in a dimension.
```



### Flattening
Flattening unrolls the 3 x 3 x 64 feature maps into a vector of 576 values, so that they can be fed to a dense layer, which is essential to produce results.

In [None]:
model.add(layers.Flatten()) 
model.add(layers.Dense(64,activation = 'relu')) # 64 units: a hyperparameter (here it happens to equal the number of filters in the previous layer)
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 3, 64)          36928     
                                                                 
 flatten (Flatten)           (None, 576)               0

## 1.2. Adding a classifier on top of the convnet

### Output Layer
We need a last layer whose number of units equals the number of classes that we want to predict (e.g. in this case we want to classify digits, so there will be 10).
Since this is a multiclass prediction problem, we use softmax (and not relu) as the activation function.
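
To make the role of softmax concrete, here is a small NumPy sketch, not the Keras implementation. The `scores` values are made up for illustration; the point is that softmax turns 10 raw scores into 10 probabilities that sum to 1, and the predicted digit is the most probable one.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()


# Made-up raw scores, one per digit class (0 to 9).
scores = np.array([1.0, 2.0, 0.5, 0.1, 0.0, 3.0, 0.2, 0.3, 0.4, 0.6])
probs = softmax(scores)
print(probs.sum())     # sums to 1 up to floating-point rounding
print(probs.argmax())  # 5
```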

In [None]:
model.add(layers.Dense(10,activation='softmax'))
model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 3, 64)          36928     
                                                                 
 flatten (Flatten)           (None, 576)               0

For harder problems we will need a more complex structure. In particular, we will see transfer learning, in which you do not need to train all the parameters, but just those of the last layers.

## 1.3. Compile the model

In this final step, we select three things:
1. An *optimizer*: the mechanism that the network will use to update itself and learn;
2. A *loss function*: how the network will measure its performance on the training set;
3. *Metrics* to monitor during training and testing. In this example, we just care about accuracy.

https://keras.io/api/optimizers/

https://keras.io/api/losses/

https://keras.io/api/metrics/

In [None]:
model.compile(optimizer='rmsprop',                  # assumed here to be a good default
              loss = 'categorical_crossentropy',   # depends on the specific problem (here: multiclass, one-hot labels)
              metrics = ['accuracy'])               # you choose which ones you want
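
To give an intuition for what the categorical cross-entropy loss measures, here is a rough NumPy sketch. It is not the Keras implementation: the clipping constant `eps` and the two example predictions are assumptions made for illustration. The loss is small when the probability assigned to the true class is high, and large otherwise.

```python
import numpy as np


def categorical_crossentropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean over the samples of -sum(y_true * log(y_pred))."""
    eps = 1e-7  # avoids log(0); the exact value is an assumption of this sketch
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


y_true = np.array([[0.0, 1.0, 0.0]])        # one-hot label: class 1
confident = np.array([[0.05, 0.90, 0.05]])  # made-up prediction, mostly right
unsure = np.array([[0.40, 0.30, 0.30]])     # made-up prediction, mostly wrong

print(categorical_crossentropy(y_true, confident))  # about 0.105
print(categorical_crossentropy(y_true, unsure))     # about 1.204
```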


# 2. Preprocessing Images
Today we are going to work with NumPy arrays.

MNIST: matrices representing images of handwritten digits

### Preprocessing data
N.B.1 Keras works with float32, so remember to convert the data

N.B.2 Normalize the data

**Vectorization**
- All inputs and targets in a neural network must be tensors of floating-point data
- In this case, we skip the data vectorization step as the data are already stored as NumPy arrays

**Value normalization**
- In our example, image data are encoded as integers in the 0-255 range
- We need to cast them to float32 and divide by 255 to end up with floating-point values in the 0-1 range


**Good practice**
- Data should take small values
- Data should be homogeneous (i.e., all features in the same range)
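
The cast-and-divide step can be tried on a tiny array before applying it to the real images. The 2x2 array `raw` below is made up for illustration and stands in for an image stored as 8-bit integers.

```python
import numpy as np

# Made-up 2x2 "image" with pixel intensities stored as unsigned 8-bit integers.
raw = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Cast to float32 first, then scale from the 0-255 range to the 0-1 range.
scaled = raw.astype("float32") / 255

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```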

In [None]:
from tensorflow.keras.datasets import mnist 
(train_images,train_labels),(test_images,test_labels) = mnist.load_data()

In [None]:
print(train_images.shape)

(60000, 28, 28)


60000: the number of images

28,28: the shape of each image

We know that they are grayscale because there is no channel dimension.

In [None]:
print(test_images.shape)

(10000, 28, 28)


In [None]:
print(train_images[0])

[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   3  18  18  18 126 136
  175  26 166 255 247 127   0   0   0   0]
 [  0   0   0   0   0   0   0   0  30  36  94 154 170 253 253 253 253 253
  225 172 253 242 195  64   0   0   0   0]
 [  0   0   0   0   0   0   0  49 238 253 253 253 253 253 253 253 253 251
   93  82  82  56  39   0   0   0   0   0]
 [  0   0   0   0   0   0   0  18 219 253 253 253 253 253 198 18

A representation of the first image of the train dataset. 

### Data

In [None]:
# Let's put the data in the right format
train_images = train_images.reshape(60000,28,28,1) # add the channel axis (1 = grayscale), which Conv2D requires
train_images = train_images.astype('float32')/255  # cast to float32 and scale to the 0-1 range

test_images = test_images.reshape(10000,28,28,1)   # same for the test set
test_images = test_images.astype('float32')/255

In [None]:
print(train_images[0])

[[[0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]]

 [[0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]]

 [[0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0. 

### Labels 
Currently, the labels are integers; to use them with the categorical_crossentropy loss selected above, we need to convert them to a one-hot encoding.

In [None]:
train_labels[0]

5

In [None]:
from tensorflow.keras.utils import to_categorical
train_labels = to_categorical(train_labels)
train_labels[0]

test_labels = to_categorical(test_labels)
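
To see what a one-hot encoding looks like, here is a minimal NumPy sketch of the same idea. The helper `one_hot` is hypothetical and written only for this note; in the notebook the conversion is done by `to_categorical`.

```python
import numpy as np


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


labels = np.array([5, 0, 4])  # example integer labels
print(one_hot(labels, 10)[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```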

### Fitting the model
In this section we fit the model on the preprocessed data.
- epochs: number of full passes over the training set
- batch_size: number of images that we pass through the network before each weight update

https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network
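
As a quick worked example of how these two settings interact, the number of weight updates can be computed by hand. The sketch assumes the values used in the next cell (5 epochs, batches of 64) and that the last, smaller batch of each epoch is also used.

```python
import math

num_images = 60000  # size of the MNIST training set
batch_size = 64
epochs = 5

# One weight update per batch; the last batch of an epoch may be smaller.
updates_per_epoch = math.ceil(num_images / batch_size)

print(updates_per_epoch)           # 938
print(updates_per_epoch * epochs)  # 4690
```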

In [None]:
model.fit(train_images,train_labels, epochs=5, batch_size = 64)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f3cbe63c150>

In [None]:
test_loss,test_acc = model.evaluate(test_images,test_labels)
print(test_loss)
print(test_acc)

0.033668503165245056
0.9897000193595886
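
The accuracy metric is simply the fraction of images whose most probable predicted class matches the true class. A minimal NumPy sketch, with a hypothetical helper `accuracy` and made-up labels and predictions for 4 samples and 3 classes:

```python
import numpy as np


def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of samples where the predicted class equals the true class."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))


# Made-up one-hot labels and predicted probabilities.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_pred = np.array([
    [0.8, 0.1, 0.1],  # correct
    [0.2, 0.7, 0.1],  # correct
    [0.3, 0.3, 0.4],  # correct
    [0.6, 0.3, 0.1],  # wrong: predicts class 0 instead of 1
])

print(accuracy(y_true, y_pred))  # 0.75
```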

