# Train Image Classifiers

In this notebook, we train an image classifier that classifies fruit images, using MMClassification.

## Prepare a Dataset

We have already prepared a dataset.

Credit to Zihao: https://github.com/TommyZihao/MMClassification_Tutorials

```
!curl -O https://zihao-openmmlab.obs.myhuaweicloud.com/20220716-mmclassification/dataset/fruit30/fruit30_split.zip
!unzip -d data fruit30_split.zip
```

The dataset must be organized into one folder per class so that MMClassification can read it.
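To check the folder layout before training, a small script can count the files in each class folder. The sketch below assumes the archive unpacks to `data/fruit30_split/train` and `data/fruit30_split/val`, each holding one subfolder per class. The helper name `count_images_per_class` is our own and is not part of MMClassification.

```python
from pathlib import Path


def count_images_per_class(split_dir):
    """Return {class_name: number_of_files} for a folder-per-class split."""
    split_dir = Path(split_dir)
    counts = {}
    # Each immediate subfolder is treated as one class.
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


if __name__ == "__main__":
    # Assumed location of the unzipped dataset; adjust if yours differs.
    for split in ("train", "val"):
        root = Path("data/fruit30_split") / split
        if root.is_dir():
            counts = count_images_per_class(root)
            print(split, len(counts), "classes,", sum(counts.values()), "files")
```

If the layout is as expected, both splits should report 30 classes.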

## Prepare a Config and Checkpoint File

For speed, we use a lightweight neural network, MobileNetV2.

We use mim to download the config file and the checkpoint file.

```
!mim download mmcls --config mobilenet-v2_8xb32_in1k --dest .
!mv mobilenet-v2_8xb32_in1k.py mobilenet-v2_fruit.py
```

If you prefer to play with other models, navigate to the [MMClassification model zoo](https://mmclassification.readthedocs.io/en/latest/model_zoo.html).

Running the download commands prints output similar to the following:

processing mobilenet-v2_8xb32_in1k...
mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth exists in C:\Users\wangruohui\Desktop\sjtu-openmmlab-tutorial
Successfully dumped mobilenet-v2_8xb32_in1k.py to C:\Users\wangruohui\Desktop\sjtu-openmmlab-tutorial




## Modify the Config File

1. Remove some intermediate variables to keep the config clean: `dataset_type`, `img_norm_cfg`, `train_pipeline`, `test_pipeline`
1. Modify the model
    1. Number of classes: from 1000 to 30
    2. Pretrained weights: from `None` to the downloaded checkpoint file, since we fine-tune the model instead of training from scratch
1. Data: for each of train/val/test
    1. `type`: `ImageNet` -> `CustomDataset`
    2. `data_prefix`, the root path to the images: set to `"data/fruit30_split/train"` or `"data/fruit30_split/val"`
    3. `ann_file`: set to `None`, so that folder names are used as class names
1. Runner and optimizer
    1. Number of training epochs: `runner.max_epochs`
    1. Learning rate: `optimizer.lr`, usually divided by 8 following the linear scaling rule, because the original config (`8xb32`) assumes 8 GPUs with 32 images each, so training on a single GPU uses a total batch size 8 times smaller
1. Misc
    1. Decrease `log_config.interval` so that logs are printed more often when computation power is limited
    1. Increase `checkpoint_config.interval` to avoid saving too many checkpoints, which saves time and disk space
1. Further parameter tuning you may try
    1. Learning rate: decrease `optimizer.lr` for fine-tuning
    1. Configure a learning rate scheduler (`lr_config`) to decrease the learning rate when the loss saturates. Moreover, by setting `by_epoch=False`, we decrease the learning rate by iteration instead of by epoch.
    1. Monitor how the loss decreases and re-tune
    1. More lr schedulers are available in [mmcv](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py)
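The edits listed above might look like the following fragment of `mobilenet-v2_fruit.py`. This is a sketch, not the exact file: the data pipelines are omitted for brevity, so the fragment is not a complete config on its own. We assume the dumped config uses SGD with a base learning rate of 0.045 and that the checkpoint is loaded through `load_from`. The epoch count, intervals, and scheduler steps are illustrative values, so check each against your own dumped file.

```python
# Fragment of mobilenet-v2_fruit.py showing only the edited parts.
# Values marked "illustrative" are assumptions, not the tutorial's settings.

model = dict(
    type='ImageClassifier',
    backbone=dict(type='MobileNetV2', widen_factor=1.0),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=30,  # was 1000 for ImageNet
        in_channels=1280,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
        topk=(1, 5),
    ),
)

# Start from the downloaded ImageNet weights instead of training from scratch.
load_from = 'mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth'

# The `pipeline` entry of each split stays as dumped and is omitted here.
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=2,
    train=dict(
        type='CustomDataset',
        data_prefix='data/fruit30_split/train',
        ann_file=None,  # class names come from folder names
    ),
    val=dict(
        type='CustomDataset',
        data_prefix='data/fruit30_split/val',
        ann_file=None,
    ),
    test=dict(
        type='CustomDataset',
        data_prefix='data/fruit30_split/val',
        ann_file=None,
    ),
)

# Linear scaling: 8 GPUs x 32 images -> 1 GPU x 32 images, so lr / 8.
# 0.045 is the base lr we assume the dumped config contains.
optimizer = dict(type='SGD', lr=0.045 / 8, momentum=0.9, weight_decay=0.00004)

# Step decay counted in iterations rather than epochs (illustrative steps).
lr_config = dict(policy='step', step=[500, 1000], gamma=0.1, by_epoch=False)

runner = dict(type='EpochBasedRunner', max_epochs=5)  # illustrative
log_config = dict(interval=10, hooks=[dict(type='TextLoggerHook')])  # illustrative
checkpoint_config = dict(interval=5)  # illustrative
```

With these numbers, the scaled learning rate is 0.045 / 8 = 0.005625. Lowering it further is a reasonable next step if the fine-tuning loss oscillates.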

## Launch Training

From the command line:

```
mim train mmcls mobilenet-v2_fruit.py
```

## Understand Logs


The log is long but mainly contains the following parts:

1. Toolbox information
2. The dumped config file
3. Model Initialization Logs
    1. Check `mmcls - INFO - load checkpoint from local path: mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth`, which means pretrained weights are loaded correctly.
4. Information on hooks: we don't configure these explicitly in this tutorial, so they can be ignored
5. Training progress
    1. Training logs: including the current learning rate, training loss, time consumption, and memory usage
    2. Validation logs: accuracy on the validation set
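
Because the log is long, it can be easier to search it for the checkpoint-loading message than to scroll through it. The sketch below scans log text for the `load checkpoint from` message quoted above. The function name `find_checkpoint_load_lines` is our own, and the sample text only reproduces the message from this tutorial, not a full log.

```python
def find_checkpoint_load_lines(log_text):
    """Return the log lines that report a checkpoint being loaded."""
    marker = "load checkpoint from"
    return [line.strip() for line in log_text.splitlines() if marker in line]


# Sample text containing the message quoted in this tutorial.
sample_log = (
    "mmcls - INFO - load checkpoint from local path: "
    "mobilenet_v2_batch256_imagenet_20200708-3b2dc3af.pth\n"
)

matches = find_checkpoint_load_lines(sample_log)
if matches:
    print("Pretrained weights were loaded:")
    for line in matches:
        print("  " + line)
else:
    print("No checkpoint-loading message found; check the pretrained weights setting.")
```

To use it on a real run, read the training log file into a string and pass it to the function.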