# Results of my various local tryouts

In this notebook I present results obtained by training different networks with several types of dataflow.

## TL;DR

Best performance | Test accuracy | Test F170 | Model | Image size | Class weights | Other details
--- | --- | --- | --- | --- | --- | ---
accuracy | 0.784 | 0.412 | DenseNet169 | 452 x 452 | None | fine-tuning, test-time augmentation
F170 | 0.776 | 0.467 | InceptionResNetV2 | 451 x 451 | custom formula | fine-tuning, test-time augmentation

These results were obtained with models trained for less than 7 h. Networks such as DenseNet161 could give better results if trained longer.

## More details


### Model architectures

I tried to fine-tune four networks available for Keras: [VGG16](https://keras.io/applications/#vgg16) and [Inception-ResNetV2](https://keras.io/applications/#inceptionresnetv2) from Keras applications, and [DenseNet161](https://github.com/farizrahman4u/keras-contrib/blob/master/keras_contrib/applications/densenet.py#L491) and [DenseNet169](https://github.com/farizrahman4u/keras-contrib/blob/master/keras_contrib/applications/densenet.py#L437) from keras-contrib.

The idea is simply to load network weights trained on ImageNet, freeze the bottom layers, and train only the topmost layers and the final fully-connected layers.
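A framework-agnostic sketch of the freezing step, assuming Keras-style layer objects that expose `name` and `trainable` attributes. The `Layer` stand-in class, the `freeze_below` helper, and the layer names in the usage lines are hypothetical; the helper freezes every layer before the first fine-tuned one and leaves that layer and everything above it trainable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical stand-in for a Keras layer: only `name` and `trainable` are used."""
    name: str
    trainable: bool = True


def freeze_below(layers: List[Layer], first_trainable: str) -> int:
    """Freeze all layers before `first_trainable`; keep it and every later layer trainable.

    Returns the number of frozen layers. Raises ValueError if the name is not found.
    """
    names = [layer.name for layer in layers]
    if first_trainable not in names:
        raise ValueError(f"layer {first_trainable!r} not found")
    cut = names.index(first_trainable)
    for i, layer in enumerate(layers):
        layer.trainable = i >= cut
    return cut


# Usage: a VGG16-like stack where only block 5 and the new head are fine-tuned
layers = [Layer(n) for n in ["block1_conv1", "block4_pool", "block5_conv1", "block5_pool", "fc1", "predictions"]]
n_frozen = freeze_below(layers, "block5_conv1")
print(n_frozen, [layer.name for layer in layers if layer.trainable])
```

With a real Keras model the same loop can run over `model.layers`; the model has to be compiled after changing the `trainable` flags for the change to take effect.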

#### VGG16-based model

The model based on the pretrained VGG16 architecture keeps all 5 conv-pooling blocks of VGG16 and adds 3 fully-connected layers, the first two followed by batch normalization:
```
input --> [block1] --> [block2] --> [block3] --> [block4] --> [block5] --> [FC|BN] -> [FC|BN] -> [FC]
```
Fine-tuning is applied to block 5; all other (bottom) blocks are frozen. There are about 27M parameters to train (image size 224x224).
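As a rough sanity check of the ~27M figure, the trainable weights of block 5 can be counted by hand: it holds three 3x3 convolutions with 512 input and 512 output channels. The sketch assumes, as the diagram suggests, that the head flattens the block 5 output (no pooling); with 224x224 inputs that output is 7x7x512. The 512-unit FC width is a hypothetical value, since the actual widths are not stated here, and batch-normalization parameters are ignored.

```python
def conv2d_params(kernel, in_channels, out_channels, bias=True):
    """Number of weights (plus biases) in a square 2-D convolution layer."""
    return kernel * kernel * in_channels * out_channels + (out_channels if bias else 0)


def dense_params(in_features, units, bias=True):
    """Number of weights (plus biases) in a fully-connected layer."""
    return in_features * units + (units if bias else 0)


# VGG16 block 5: three 3x3 convolutions with 512 input and 512 output channels
block5 = 3 * conv2d_params(3, 512, 512)

# VGG16 downsamples by 32, so 224x224 inputs give a 7x7x512 map after block 5
flattened = (224 // 32) ** 2 * 512

# Hypothetical width: a single 512-unit FC layer on the flattened features
first_fc = dense_params(flattened, 512)

print(f"block 5: {block5:,}  flattened: {flattened:,}  512-unit FC: {first_fc:,}")
```

Block 5 alone accounts for about 7.1M parameters, so roughly 20M of the ~27M must sit in the fully-connected head; a single 512-unit layer on the 25,088 flattened features already holds about 12.8M.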

#### Inception-ResNetV2-based model

The model based on the pretrained Inception-ResNetV2 architecture keeps all layers and adds global average pooling, dropout and a final fully-connected layer on top. The Inception-ResNetV2 architecture is composed of 6 blocks:
```
--> [Stem] --> [Inception-Resnet A] --> [Reduction A] --> 
               [Inception-Resnet B] --> [Reduction B] --> 
               [Inception-Resnet C] --> 
```

Fine-tuning is done on the "Inception-ResNet C" block and on the added final fully-connected layer. This amounts to around 22M trainable parameters (image size 451x451).
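The head added on top of the pretrained backbone (global average pooling, dropout, then a fully-connected softmax layer) can be sketched in NumPy to make the shapes explicit. This is a forward pass only, assuming channels-last feature maps, inverted dropout applied only at training time, and 403 output classes; the function name, the 0.5 dropout rate, the 8x8 spatial size, and the random weights are illustrative, not the trained model.

```python
import numpy as np


def classification_head(features, weights, bias, dropout_rate=0.5, training=False, rng=None):
    """Global average pooling -> dropout -> dense softmax.

    features: (batch, height, width, channels) backbone output (channels-last)
    weights:  (channels, n_classes); bias: (n_classes,)
    """
    pooled = features.mean(axis=(1, 2))  # global average pooling -> (batch, channels)
    if training and dropout_rate > 0:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(pooled.shape) >= dropout_rate  # keep each unit with prob 1 - rate
        pooled = pooled * keep / (1.0 - dropout_rate)  # inverted dropout preserves the expected activation
    logits = pooled @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_channels, n_classes = 1536, 403  # Keras InceptionResNetV2 outputs 1536 channels
feats = rng.standard_normal((2, 8, 8, n_channels))  # 8x8 spatial size chosen arbitrarily
W = rng.standard_normal((n_channels, n_classes)) * 0.01
b = np.zeros(n_classes)
probs = classification_head(feats, W, b)
print(probs.shape)
```

The dense layer of this head holds 1536 × 403 + 403 = 619,411 weights, a small fraction of the ~22M trainable parameters; most of them come from the fine-tuned "Inception-ResNet C" block.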

#### DenseNet (169, 161)-based models

DenseNet is a modern network architecture that showed state-of-the-art performance on several datasets. The following table gives the size and the ImageNet (single crop) error rates of the DenseNet models for which pretrained weights are provided:

Model type | ImageNet Top-1 error | ImageNet Top-5 error | Params (M)
---|---|---|---
DenseNet-121    |    25.02 %            |        7.71 %         |     8.0
DenseNet-169    |    23.80 %            |        6.85 %         |     14.3
DenseNet-201    |    22.58 %            |        6.34 %         |     20.2
DenseNet-161    |    22.20 %            |        -              |     28.9


Models based on pretrained DenseNet architectures keep all layers and add global average pooling, dropout and a final fully-connected layer on top. DenseNet architectures are composed of 4 dense blocks separated by transition blocks:
```
--> [Conv] --> [Dense1] --> [Transition1] --> ... --> [Dense4] --> [BN|Relu] -->
```
Fine-tuning is done on all layers from the 4th dense block onward and on the added final fully-connected layer. This amounts to around 6.5M (DenseNet169) and 10M (DenseNet161) trainable parameters (image size 452x452).






### Imbalanced data

As we saw in the tutorial notebook, classes are not equally represented: some classes have only 4 images. It is very difficult to train a network to predict such classes correctly. For more information on the data, see [this notebook](notebooks/data_visualization.ipynb).
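The imbalance can be quantified by counting images per class. A minimal NumPy sketch, assuming integer class labels in `0..n_classes-1` (for example `y_array[train_indices]`); the helper name, the threshold of 10 images for "rare", and the toy labels are hypothetical:

```python
import numpy as np


def class_count_summary(labels, n_classes):
    """Per-class image counts, largest/smallest ratio over non-empty classes, and rare classes."""
    counts = np.bincount(labels, minlength=n_classes)
    non_empty = counts[counts > 0]
    imbalance_ratio = non_empty.max() / non_empty.min()
    rare_classes = np.flatnonzero((counts > 0) & (counts < 10))  # fewer than 10 images
    return counts, imbalance_ratio, rare_classes


labels = np.array([0] * 40 + [1] * 12 + [2] * 4)  # toy labels; class 3 has no image
counts, ratio, rare = class_count_summary(labels, n_classes=4)
print(counts, ratio, rare)
```

For the figures quoted in this notebook (4875 images in the largest class, 4 in the smallest), the same ratio is about 1219.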

#### Class weights 

A basic solution is to pass class weights to Keras through the `class_weight` argument of `fit`/`fit_generator`. I tested the following options:
- No class weights
- class weights computed using `sklearn.utils.class_weight`
- class weights defined by a formula

I found that models train rather well without any class weights: test accuracy is essentially unchanged, although custom class weights give a higher F170 score. For example:
```
    - InceptionResNetV2 (451x451) + TTA + custom class weights
        - test_acc=0.776 | test_f170=0.467
    - InceptionResNetV2 (451x451) + TTA + no class weights
        - test_acc=0.777 | test_f170=0.427
```

##### No class weights


##### Class weights computed using `sklearn.utils.class_weight`

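This can be done as follows (recent scikit-learn versions require `classes` and `y` as keyword arguments, and Keras expects the weights as a `{class_index: weight}` dict):
```
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 'balanced' weights: n_samples / (n_classes * count of each class)
class_weights = compute_class_weight('balanced', classes=np.arange(403), y=y_array[train_indices])
class_weight_dict = dict(enumerate(class_weights))  # e.g. model.fit(..., class_weight=class_weight_dict)
```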
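For reference, scikit-learn documents the `'balanced'` heuristic as `n_samples / (n_classes * np.bincount(y))`. A NumPy sketch of that formula, with a hypothetical helper name; it raises an error if a class is absent, since scikit-learn's function also rejects classes that do not appear in `y`:

```python
import numpy as np


def balanced_class_weights(y, n_classes):
    """Reproduce scikit-learn's 'balanced' heuristic: n_samples / (n_classes * count_per_class)."""
    counts = np.bincount(y, minlength=n_classes)
    if np.any(counts == 0):
        raise ValueError("every class must appear at least once")
    return len(y) / (n_classes * counts)


y = np.array([0, 0, 0, 1])
weights = balanced_class_weights(y, n_classes=2)
print(weights)
```

Because these weights are inversely proportional to the class counts, a class with 4 images gets a weight about 1219 times larger than the class with 4875 images.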

##### Custom class weights

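I computed them as follows:
- count the training images of each class from the training data generator
- define the weights as
```
# class_counts[i]: number of training images of class i; max_count = class_counts.max()
class_weights[class_index] = np.log(1.0 + (max_count / class_counts[class_index])**2)
```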
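A vectorized NumPy sketch of the same formula, assuming `class_counts` holds the (non-zero) number of training images per class; the function name is hypothetical:

```python
import numpy as np


def log_class_weights(class_counts):
    """w_c = log(1 + (max_count / count_c)^2): the ratio is squared, then compressed by the log."""
    class_counts = np.asarray(class_counts, dtype=np.float64)
    max_count = class_counts.max()
    return np.log(1.0 + (max_count / class_counts) ** 2)


weights = log_class_weights([4875, 350, 4])
print(weights.round(2))
class_weight_dict = dict(enumerate(weights))  # format expected by Keras `class_weight`
```

The most frequent class always gets `log(2) ≈ 0.69`, while a 4-image class gets about 14.2, a ratio of roughly 20.5 instead of the ~1219 given by the `'balanced'` heuristic, so rare classes are up-weighted far less aggressively.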


#### Random over/under-sampling

One option is to oversample the training dataset using non-trivial data augmentations. I used [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn) for random over/under-sampling. However, basic per-class oversampling creates too much data: the most represented class has 4875 images, so randomly oversampling all classes produces about `4875 * 403 = 1964625` images. I tested this approach at the very beginning.
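Plain random oversampling can also be written without imbalanced-learn: every class is resampled with replacement up to the size of the largest class, which is why the dataset grows to `n_classes * max_count` images. A NumPy sketch operating on image indices; the function name and toy data are hypothetical:

```python
import numpy as np


def random_oversample_indices(indices, labels, rng=None):
    """Resample every class (with replacement) up to the size of the largest class.

    All original images of each class are kept; only the missing ones are drawn at random.
    """
    if rng is None:
        rng = np.random.default_rng()
    indices = np.asarray(indices)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    resampled = []
    for cls, count in zip(classes, counts):
        cls_indices = indices[labels == cls]
        extra = rng.choice(cls_indices, size=target - count, replace=True)
        resampled.append(np.concatenate([cls_indices, extra]))
    return np.concatenate(resampled)


idx = np.arange(10)
lab = np.array([0] * 7 + [1] * 2 + [2] * 1)
new_idx = random_oversample_indices(idx, lab, rng=np.random.default_rng(0))
print(len(new_idx))  # 3 classes * 7 images = 21
```

With 403 classes and 4875 images in the largest one, the same function would return 1,964,625 indices.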

Another approach is to undersample the most common classes down to a fixed number of images and then oversample the rare classes. This yields fewer training images than the previous approach. The difficulty here is to undersample the most common classes correctly. One solution is to cluster the images of a single class by color and pick a balanced number of representatives from each cluster. Unfortunately, this requires handling the data directly instead of through the provided `gen_builder`, which the organizers forbid. The code of this approach can be found [here](notebooks/balanced_dataflow.ipynb).

It is still possible to choose images randomly in the undersampling phase. Here is code using `imbalanced-learn` to create generators with under+oversampled, balanced data:
```
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler


def undersample_then_oversample(y_array, train_indices, undersampling_threshold=350,
                                n_classes=403, random_state=None):
    """Return balanced training indices (under+oversampling):
    - randomly undersample every class with more than `undersampling_threshold` images
      down to exactly `undersampling_threshold` images
    - randomly oversample all classes up to the size of the largest remaining class
    """
    train_indices = np.asarray(train_indices)
    train_labels = y_array[train_indices]

    class_counts = np.bincount(train_labels, minlength=n_classes)
    classes_to_undersample = np.where(class_counts > undersampling_threshold)[0]

    # imbalanced-learn resamples rows of a feature matrix, so the image indices are passed
    # as a single-column "feature" and recovered afterwards.
    # Classes missing from the dict are kept untouched by RandomUnderSampler.
    rus = RandomUnderSampler(
        sampling_strategy={int(c): undersampling_threshold for c in classes_to_undersample},
        random_state=random_state)
    indices_under, labels_under = rus.fit_resample(train_indices[:, None], train_labels)  # fit_resample replaces the old fit_sample

    ros = RandomOverSampler(random_state=random_state)
    indices_balanced, _ = ros.fit_resample(indices_under, labels_under)
    return indices_balanced.ravel()


# Usage inside the data-flow class:
# new_train_indices = undersample_then_oversample(self.y_array, train_indices)
# gen_train = self._get_generator(indices=new_train_indices, batch_size=batch_size)
# nb_train = len(new_train_indices)
```
However, my tests did not show any gain from this approach.

#### Multiple stage training approach

The idea is to split training into two phases:
- warm up the network on undersampled, balanced data (some rare classes)
- continue training on the whole training dataset
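The two-phase schedule can be sketched as a function that returns the training indices for each epoch: balanced, randomly undersampled indices during the warm-up epochs, then the whole training set. The function name, the `per_class` cap, and the epoch counts are hypothetical parameters, and the warm-up set is built here by capping every class at `per_class` randomly chosen images, which is an assumption rather than the exact procedure used in this notebook.

```python
import numpy as np


def two_stage_epoch_indices(train_indices, labels, per_class, warmup_epochs, total_epochs, rng=None):
    """Return one array of training indices per epoch.

    labels[i] is the class of train_indices[i].
    Epochs [0, warmup_epochs): each class is randomly undersampled to at most `per_class` images.
    Epochs [warmup_epochs, total_epochs): the whole training set.
    """
    if rng is None:
        rng = np.random.default_rng()
    train_indices = np.asarray(train_indices)
    labels = np.asarray(labels)
    schedule = []
    for epoch in range(total_epochs):
        if epoch < warmup_epochs:
            warmup = []
            for cls in np.unique(labels):
                cls_indices = train_indices[labels == cls]
                n = min(per_class, len(cls_indices))
                warmup.append(rng.choice(cls_indices, size=n, replace=False))
            schedule.append(np.concatenate(warmup))
        else:
            schedule.append(train_indices)
    return schedule


labels = np.array([0] * 50 + [1] * 5 + [2] * 20)
epochs = two_stage_epoch_indices(np.arange(75), labels, per_class=10, warmup_epochs=2,
                                 total_epochs=5, rng=np.random.default_rng(0))
print([len(e) for e in epochs])  # [25, 25, 75, 75, 75]
```

Drawing a fresh random subset at each warm-up epoch lets the network see more of the frequent classes while still keeping each warm-up epoch balanced.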

Such approach gives






### Hardware

- NVIDIA GeForce GTX 1080 Ti
- Intel Core i7 CPU
- 32 GB RAM
- SSD