# 8.2 TensorFlow and Keras
Keras has a dedicated function for loading an image:
```
import numpy as np
from tensorflow.keras.preprocessing.image import load_img
# fullname is the path to an image file; the image is resized to 299x299
img = load_img(fullname, target_size=(299, 299))
x = np.array(img)
```

The image has 3 channels (RGB), so `x` is a NumPy array of shape (299, 299, 3) whose values are integers from 0 to 255.
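
To make the array layout concrete, here is a minimal NumPy sketch. It uses randomly generated pixels instead of a real photo (an assumption, so that it runs without an image file), and `fake_img` is a hypothetical name:

```python
import numpy as np

# Stand-in for np.array(load_img(...)): random pixels instead of a real photo.
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(299, 299, 3), dtype=np.uint8)

print(fake_img.shape)   # (height, width, channels)
print(fake_img.dtype)   # uint8: one byte per value, so 0..255

# Each channel is a 2D slice of the same array.
red, green, blue = fake_img[..., 0], fake_img[..., 1], fake_img[..., 2]
print(red.shape)
```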

# 8.3 Pre-trained CNN
There are many pretrained models for Keras: https://keras.io/api/applications/

To use one of them, *Xception*, run:
```
from tensorflow.keras.applications.xception import Xception
model = Xception(weights='imagenet', input_shape=(299, 299, 3))
X = np.array([x])
pred = model.predict(X)
```

The results don't make any sense, because the input needs *preprocessing* first:
```
from tensorflow.keras.applications.xception import preprocess_input
X = preprocess_input(X)
pred = model.predict(X)
```
The values are rescaled from [0, 255] to [-1, 1].
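
For Xception, the rescaling from [0, 255] to [-1, 1] amounts to dividing by 127.5 and subtracting 1. A minimal NumPy sketch of that arithmetic, where `scale_pixels` is a hypothetical helper and not the Keras function itself:

```python
import numpy as np

def scale_pixels(x):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return np.asarray(x, dtype=np.float32) / 127.5 - 1.0

pixels = np.array([0, 127.5, 255])
print(scale_pixels(pixels))  # -1 for black, 0 for mid-gray, 1 for white
```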

To make the predictions human-readable:
```
from tensorflow.keras.applications.xception import decode_predictions
decode_predictions(pred)
```
Here the model predicts "jersey" for our image. The image looks like a T-shirt, but remember that the model was pretrained on *ImageNet*, which has no "T-shirt" class; the closest class it has is "jersey".
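
Conceptually, decoding means sorting the predicted scores and attaching class names to the largest ones. A rough sketch of that idea, assuming a toy four-class label list and made-up scores (the real function works with the 1000 ImageNet classes and also returns class IDs); `top_k_labels` is a hypothetical helper:

```python
import numpy as np

def top_k_labels(scores, labels, k=3):
    """Return the k (label, score) pairs with the highest scores."""
    order = np.argsort(scores)[::-1][:k]   # indices sorted by descending score
    return [(labels[i], float(scores[i])) for i in order]

# Hypothetical labels and scores, for illustration only.
toy_labels = ['jersey', 'sweatshirt', 'apron', 'umbrella']
toy_scores = np.array([0.68, 0.20, 0.02, 0.10])

print(top_k_labels(toy_scores, toy_labels))
```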

# 8.4 Convolutional neural networks

A CNN consists of ***convolutional layers, a vector representation, and dense layers***.

### 1. Convolutional layers  
Each layer consists of ***filters***: small images (5x5, 3x3, etc.) that contain simple shapes:   
- layer 1 - the simplest shapes: $-$, $/$, $|$, c
- layer 2 - more complex shapes: x, o, T
- layer 3 - even more complex shapes: s, w, q, [, ]
- ...

In the ***first*** layer, each filter slides across the image and builds a ***feature map***: a 2D array of numbers. The larger a number, the more similar that part of the image is to the filter.  
The output of each layer is its set of feature maps, and their number equals the number of filters in the layer: ***one feature map per filter***.  
The ***second*** layer builds new feature maps from the previous layer's feature maps and its own filters, so it can detect more complex shapes.  
For example, if a region of the image is very similar to both \ and /, the first-layer feature maps for those filters have large values there, and the second layer will probably recognise a cross (X) in that place.
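
The sliding step can be sketched in plain NumPy. This is a simplified illustration, assuming a single-channel image, one hand-made 3x3 filter, stride 1, and no padding; `feature_map` is a hypothetical helper, and real layers learn their filters during training:

```python
import numpy as np

def feature_map(image, filt):
    """Slide filt over image and record the similarity at every position."""
    fh, fw = filt.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + fh, j:j + fw]
            out[i, j] = np.sum(patch * filt)   # large when patch resembles filt
    return out

# A filter that looks like a horizontal line.
horizontal = np.array([[0, 0, 0],
                       [1, 1, 1],
                       [0, 0, 0]])

# A 5x5 image with a horizontal line in the row with index 3.
image = np.zeros((5, 5))
image[3, :] = 1

fmap = feature_map(image, horizontal)
print(fmap)
```

The largest values appear in the bottom row of `fmap`, which is where the filter lines up with the line in the image.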

### 2. Vector representation
After all the convolutional layers (CLs), a ***1-dimensional vector of features*** is built. Different parts of it can respond to different kinds of features, for example sleeve recognition or straight-line recognition: it holds all the information about the image that the network was able to extract. Its length is the product of the dimensions of the array being flattened, e.g. 299x299x3 if the image kept its full size (that is, if there was no ***pooling***).

### 3. Dense layers
***Fully-connected layers.***
- *Binary classification*:  
  - $x = [x_1, x_2, x_3, \dots, x_n]$ is the feature vector.  
  - $\sum_i x_i w_i \Rightarrow$ sigmoid $\Rightarrow$ prediction.  

- *Multiclass classification*:  
  - A separate set of trained weights is used for each class.
  - $\sum_i x_i w^1_i, \sum_i x_i w^2_i, \sum_i x_i w^3_i, \dots, \sum_i x_i w^m_i \Rightarrow$ softmax $\Rightarrow$ an $m$-dimensional vector with the probability of each of the $m$ classes.
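
Both cases can be sketched in a few lines of NumPy. The weights here are random placeholders rather than trained values, and the sizes ($n = 4$ features, $m = 3$ classes) are chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # feature vector, n = 4

# Binary classification: one weight vector, one probability.
w = rng.normal(size=4)
p_binary = sigmoid(x @ w)

# Multiclass classification: one weight vector per class (m = 3).
W = rng.normal(size=(4, 3))
p_multi = softmax(x @ W)

print(p_binary, p_multi, p_multi.sum())
```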


### 4. Pooling (CLs)


# 8.5 Transfer learning

***Transfer learning*** (TL) means storing knowledge gained while solving one problem and applying it to a different but related problem. We take the pretrained convolutional layers and vector representation of Xception trained on ImageNet. We don't use its dense layers, because they are specific to ImageNet: they predict 1000 classes of different things, whereas we need 10 custom classes for the types of clothes we have.
### Loading data
```
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_gen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_ds = train_gen.flow_from_directory(
    './clothing-dataset-small/train',
    target_size=(150, 150),
    batch_size=32
)
train_ds.class_indices
X, y = next(train_ds)
y[:5]
```
Do the same for the validation dataset to get `val_ds`, but pass `shuffle=False`.
### Model building
We keep the CNN as our ***base model***. We don't train it; we "freeze" it and use it as is. Only the dense layers are trained.

- `include_top=False` means we don't include the dense layers trained on ImageNet (the "top"); the "bottom" is the CNN and the vector representation. Remember that we ***don't train*** this part:
```
base_model = Xception(
    weights='imagenet',
    include_top=False,
    input_shape=(150, 150, 3)
)
base_model.trainable = False   
```

- Creating a new "top":
```
from tensorflow import keras
inputs = keras.Input(shape=(150, 150, 3))
```

- Then we build the rest of our model:
```
base = base_model(inputs, training=False)
```

- The output of `base_model` has shape (32, 5, 5, 2048): a batch of 32 images, each with 2048 feature maps of size 5x5. Dense layers need a 2D array, so we apply ***pooling***, which takes the average of each 5x5 feature map:
```
vectors = keras.layers.GlobalAveragePooling2D()(base)
```
- In the code above and below, each layer is called like a function on the output of the previous step (the Keras functional style).
- Dense layer with the output of 10 classes:
```
outputs = keras.layers.Dense(10)(vectors)
model = keras.Model(inputs, outputs)
```
- The following predictions don't make any sense yet, because we have only built the model and haven't trained it:
```
preds = model.predict(X)
preds[0]
```
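
The pooling and dense steps above are ordinary array operations. A rough NumPy sketch, assuming random numbers in place of the real base model output and untrained placeholder weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the base model output: batch of 32, 5x5 maps, 2048 filters.
base_out = rng.normal(size=(32, 5, 5, 2048))

# Global average pooling: average each 5x5 feature map down to one number.
vectors = base_out.mean(axis=(1, 2))

# Dense layer with 10 outputs and no activation: a matrix product plus a bias.
W = rng.normal(scale=0.01, size=(2048, 10))
b = np.zeros(10)
logits = vectors @ W + b

print(vectors.shape, logits.shape)
```

Each image ends up as a vector of 2048 numbers, and the dense layer turns it into 10 raw scores, one per class.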

### Training model and optimizer
- Create the optimizer with a learning rate:
```
learning_rate = 0.01
optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
```
- The parameters of the dense layer form a weight matrix $W$. The optimizer changes the weights step by step, guided by the loss, to find the best $W$. `from_logits=True` tells the loss that the model does not apply softmax: its output contains raw scores (logits) instead of probabilities.
```
loss = keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
history = model.fit(train_ds, epochs=10, validation_data=val_ds)
history.history['accuracy']
```
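
What the loss computes from raw scores can be sketched in NumPy. This is an illustration of the formula, assuming one-hot labels and averaging over the batch; `categorical_crossentropy_from_logits` is a hypothetical helper, not the Keras implementation:

```python
import numpy as np

def categorical_crossentropy_from_logits(y_true, logits):
    """Mean cross-entropy for one-hot labels and raw (pre-softmax) scores."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(y_true * log_probs).sum(axis=1).mean())

y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# Logits that strongly favor the correct class vs. logits with no preference.
confident = np.array([[5.0, 0.0, 0.0],
                      [0.0, 5.0, 0.0]])
undecided = np.zeros((2, 3))

loss_confident = categorical_crossentropy_from_logits(y_true, confident)
loss_undecided = categorical_crossentropy_from_logits(y_true, undecided)
print(loss_confident, loss_undecided)
```

The loss is small when the correct class gets the highest score, and equals $\ln 3$ when all three classes get the same score.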


# 8.6 Adjusting the learning rate
\begin{array}{|c|c|c|c|c|} \hline
\text{learning rate} & \text{speed} & \text{performance on val data} & \text{analogy (reading books)} & \text{overfitting}\\ \hline
\text{high} & \text{fast} & \text{poor} & \text{many books/year, poor application} & \text{yes}\\
\text{medium} & \text{OK} & \text{good} & \text{moderate books/year, decent application} & \text{OK}\\
\text{low} & \text{very slow} & \text{very good} & \text{few books/year, great application} & \text{no}\\ \hline
\end{array}
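
As a toy illustration of the "speed" column only, the sketch below runs plain gradient descent (not Adam) on $f(w) = w^2$ with three step sizes. It is an assumption-laden simplification that says nothing about validation performance; `gradient_descent` is a hypothetical helper:

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    w = start
    for _ in range(steps):
        grad = 2 * w                 # derivative of w**2
        w = w - learning_rate * grad
    return w

for lr in [0.01, 0.1, 1.1]:
    print(lr, gradient_descent(lr))
```

With the smallest step the weight is still far from the minimum at 0 after 20 steps, with the middle one it gets close, and with the largest one every step overshoots and the weight moves away from the minimum.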

- Make a separate function for model building:
```
def make_model(learning_rate=0.01):
    base_model = Xception(
        ...
    )

    base_model.trainable = False

    # New top: pooling and a dense layer with 10 outputs
    inputs = keras.Input(shape=(150, 150, 3))
    base = base_model(inputs, training=False)
    vectors = keras.layers.GlobalAveragePooling2D()(base)
    outputs = keras.layers.Dense(10)(vectors)
    model = keras.Model(inputs, outputs)

    # Optimizer, loss, and compilation
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    loss = keras.losses.CategoricalCrossentropy(from_logits=True)
    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
    return model
```

- Adjusting learning rate:
```
scores = {}
for lr in [0.0001, 0.001, 0.01, 0.1]:
    print(lr)
    model = make_model(learning_rate=lr)
    history = model.fit(train_ds, epochs=10, validation_data=val_ds)
    scores[lr] = history.history
```
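
Once `scores` is filled, the runs can be compared by their best validation accuracy. A small sketch, assuming `scores` has the structure built above (setting mapped to a history dictionary); `best_setting` is a hypothetical helper and the numbers are made up for illustration, not real training results:

```python
def best_setting(scores, metric='val_accuracy'):
    """Return (setting, best_value) for the run whose metric peaked highest."""
    peaks = {setting: max(hist[metric]) for setting, hist in scores.items()}
    best = max(peaks, key=peaks.get)
    return best, peaks[best]

# Made-up numbers for illustration only, not real training results.
toy_scores = {
    0.001: {'val_accuracy': [0.70, 0.78, 0.81]},
    0.01: {'val_accuracy': [0.75, 0.79, 0.77]},
}
print(best_setting(toy_scores))
```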

# 8.7 Checkpointing
- We want to save the model after each training epoch, or more precisely, only when it is the best model so far.
- Save the weights of our old model:
```
model.save_weights('model_v1.h5', save_format='h5')
```
- Create a checkpoint callback and pass it to `fit`:
```
checkpoint = keras.callbacks.ModelCheckpoint(
    'xception_v1_{epoch:02d}_{val_accuracy:.3f}.h5',
    save_best_only=True,
    monitor='val_accuracy',
    mode='max'       # maximize accuracy
)
learning_rate = 0.001
model = make_model(learning_rate=learning_rate)
history = model.fit(
    train_ds,
    epochs=10,
    validation_data=val_ds,
    callbacks=[checkpoint]
)
```
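
The file name template and the "save only the best" rule can be imitated in plain Python. This is a sketch of the idea, not the callback's actual code, and the validation accuracies are made up for illustration:

```python
template = 'xception_v1_{epoch:02d}_{val_accuracy:.3f}.h5'

# Made-up validation accuracies per epoch, for illustration only.
toy_val_accuracy = [0.80, 0.84, 0.83, 0.86]

best = float('-inf')
saved_files = []
for epoch, val_accuracy in enumerate(toy_val_accuracy, start=1):
    if val_accuracy > best:          # mode='max': higher is better
        best = val_accuracy
        saved_files.append(template.format(epoch=epoch, val_accuracy=val_accuracy))

print(saved_files)
```

Epoch 3 produces no file because its accuracy is lower than the best value seen so far.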


# 8.8 Adding More Layers
Inner dense layers need ***activation functions***. For the output we have already seen softmax and sigmoid. A popular choice for inner layers is ***ReLU***:
\begin{equation*}
ReLU(x) = 
 \begin{cases}
   0 &\text{if } x < 0,\\
   x &\text{if } x \geq 0
 \end{cases}
\end{equation*}
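
The same definition as a one-line NumPy function (a minimal sketch; `relu` is a hypothetical helper name):

```python
import numpy as np

def relu(x):
    """Replace negative values with 0 and keep the rest unchanged."""
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(values))
```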

- Edit function `make_model`:
```
def make_model(learning_rate=0.01, size_inner=100):
    ...
    inner = keras.layers.Dense(size_inner, activation='relu')(vectors)
    outputs = keras.layers.Dense(10)(inner)
    model = keras.Model(inputs, outputs)
```
- Adjust the output size of inner layer:
```
learning_rate = 0.001
scores = {}
for size in [10, 100, 1000]:
    print(size)

    model = make_model(learning_rate=learning_rate, size_inner=size)
    history = model.fit(train_ds, epochs=10, validation_data=val_ds)
    scores[size] = history.history

```

# 8.9 Regularization and Dropout
- A neural network might ***learn false patterns***. For example, if it repeatedly sees a certain logo on T-shirts, it might learn that the logo defines a T-shirt, which is wrong because the same logo can also appear on a hoodie. The model needs to focus on the ***overall shape, not on details*** like logos.
- The main idea is to hide ("freeze") parts of the input from the network while it is learning.
- ***Dropout*** - randomly freezing parts of the input. At each training step, a random subset of the neurons in the inner layer is ignored and not updated.
- Let's modify the function `make_model`:
```
def make_model(learning_rate=0.01, size_inner=100, droprate=0.5):
    ...
```  
  ***Droprate*** - the fraction of neurons we freeze at each training step.
```
    inner = keras.layers.Dense(size_inner, activation='relu')(vectors)
    drop = keras.layers.Dropout(droprate)(inner)
    outputs = keras.layers.Dense(10)(drop)
```
- Adjust the droprate, increase the number of epochs:
```
learning_rate = 0.001
size = 100
scores = {}
for droprate in [0.0, 0.2, 0.5, 0.8]:
    print(droprate)
    model = make_model(
        learning_rate=learning_rate,
        size_inner=size,
        droprate=droprate
    )
    history = model.fit(train_ds, epochs=30, validation_data=val_ds)
    scores[droprate] = history.history
```
- Analyze the plots. Remember that there can be *overfitting*: accuracy on the training data is much higher than accuracy on the validation data.
```
import matplotlib.pyplot as plt
for droprate, hist in scores.items():
    plt.plot(hist['val_accuracy'], label=('val=%s' % droprate))
plt.ylim(0.78, 0.86)
plt.legend()
```
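
What a dropout layer does during training can be sketched in NumPy. Keras zeroes a random fraction of the values and rescales the remaining ones by `1 / (1 - droprate)`; at prediction time the layer does nothing. The sketch below imitates the training behavior only, with an array of ones standing in for the inner layer output, and `dropout` is a hypothetical helper:

```python
import numpy as np

def dropout(activations, droprate, rng):
    """Zero a random fraction of activations and rescale the survivors."""
    if droprate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= droprate   # True = neuron is kept
    return activations * keep / (1.0 - droprate)

rng = np.random.default_rng(0)
inner = np.ones((1, 10000))          # stand-in for the inner layer output
dropped = dropout(inner, droprate=0.5, rng=rng)

print((dropped == 0).mean())         # fraction of zeroed neurons, close to 0.5
print(dropped.mean())                # average stays close to 1 after rescaling
```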



# 8.10 Data Augmentation
***Data augmentation*** - generating more images from existing ones.
- Possible image transformations:
  - flip (horizontally, vertically)
  - rotation
  - shift (top, bottom, left, right)
  - shear
  - shrink (x, y)
  - zoom (in/out)
  - brightness/contrast
  - black patch
  - a combination of the above
- Choosing augmentations
  - Use your own judgement
  - Look at the dataset. What kind of variations are there?
    - Are the objects always centered? (Rotate, shift)
  - Tune it as a hyperparameter.  
    Train it for 10-20 epochs. Is it better?
      - Yes ⇒ use
      - No ⇒ don't use
      - Same ⇒ train for more epochs
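
Two of the transformations listed above, flips and shifts, are simple array operations. A small NumPy sketch on a tiny made-up "image" (in practice the image generator applies such transformations for you); `shift_right` is a hypothetical helper:

```python
import numpy as np

# Tiny 3x4 single-channel "image" so the effect is easy to read.
image = np.arange(12).reshape(3, 4)

flipped_h = np.fliplr(image)          # horizontal flip: mirror left-right
flipped_v = np.flipud(image)          # vertical flip: mirror top-bottom

def shift_right(img, pixels):
    """Shift the image to the right, filling the vacated columns with 0."""
    out = np.zeros_like(img)
    out[:, pixels:] = img[:, :img.shape[1] - pixels]
    return out

shifted = shift_right(image, 1)
print(flipped_h, flipped_v, shifted, sep='\n\n')
```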

***Examples***: https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-07-neural-nets/07-augmentations.ipynb


# 8.11 Training a Larger Model


# 8.12 Using the Model
https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/course-zoomcamp/08-deep-learning/notebook.ipynb