# Training models on CIFAR-10/100 datasets, using ***torchdistill***

## 1. Make sure you have access to a GPU/TPU
Google Colab: *Runtime* -> *Change runtime type* -> *Hardware accelerator*: "GPU" or "TPU"

In [1]:
!nvidia-smi

Fri May 21 20:49:00 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   49C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
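
The same check can be sketched from Python. This is a minimal sketch, not part of the original notebook: `describe_accelerator` is a hypothetical helper name, and it only reports CUDA GPUs (TPU detection is out of scope here). It assumes PyTorch may or may not be importable and falls back to looking for `nvidia-smi` on the `PATH`.

```python
import shutil


def describe_accelerator() -> str:
    """Return a short description of the accelerator visible to this session."""
    try:
        import torch  # typically preinstalled on Colab; may be absent elsewhere
    except ImportError:
        # Without PyTorch, only check whether the NVIDIA driver tools are on PATH.
        if shutil.which("nvidia-smi"):
            return "nvidia-smi found, but PyTorch is not installed"
        return "PyTorch is not installed and no NVIDIA tools were found"

    if torch.cuda.is_available():
        # Device index 0 is the first visible CUDA device.
        return f"CUDA GPU: {torch.cuda.get_device_name(0)}"
    return "No CUDA GPU visible; training would run on CPU"


print(describe_accelerator())
```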

## 2. Install ***torchdistill***

In [2]:
!pip install torchdistill

Collecting torchdistill
  Downloading https://files.pythonhosted.org/packages/1d/1e/98c4591040d5ba7b849432e4bc6a575a8c87aa228fa043cbfb1ead9695be/torchdistill-0.2.0-py3-none-any.whl (78kB)
Collecting pyyaml>=5.4.1
  Downloading https://files.pythonhosted.org/packages/7a/a5/393c087efdc78091afa2af9f1378762f9821c9c1d7a22c5753fb5ac5f97a/PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636kB)
Ins

## 3. Clone ***torchdistill*** repository to use its example code and configuration files

In [3]:
!git clone https://github.com/yoshitomo-matsubara/torchdistill.git

Cloning into 'torchdistill'...
remote: Enumerating objects: 5065, done.
remote: Counting objects: 100% (847/847), done.
remote: Compressing objects: 100% (490/490), done.
remote: Total 5065 (delta 487), reused 600 (delta 295), pack-reused 4218
Receiving objects: 100% (5065/5065), 1.09 MiB | 13.29 MiB/s, done.
Resolving deltas: 100% (3101/3101), done.


## 4. Train models on CIFAR-10

Note that the hyperparameters of ResNet, WRN (Wide ResNet), and DenseNet-BC were chosen based on either train/val (splitting 50k samples into train:val = 45k:5k) or cross-validation, according to the original papers.  
For the final run (once the hyperparameters are finalized), the authors used all the training images (50k samples).  
- ResNet: https://github.com/facebookarchive/fb.resnet.torch
- WRN (Wide ResNet): https://github.com/szagoruyko/wide-residual-networks
- DenseNet-BC: https://github.com/liuzhuang13/DenseNet
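
To make the 45k:5k split concrete, here is a minimal sketch in plain Python. It is an illustration only and not how ***torchdistill*** implements the split: `split_indices` and the seed value are hypothetical, and the function only partitions sample indices rather than loading any images.

```python
import random


def split_indices(num_samples, num_val, seed=42):
    """Shuffle sample indices and split them into (train, val) index lists."""
    indices = list(range(num_samples))
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    val_indices = indices[:num_val]
    train_indices = indices[num_val:]
    return train_indices, val_indices


# CIFAR-10 has 50k training images; hold out 5k of them for validation.
train_idx, val_idx = split_indices(50_000, 5_000)
print(len(train_idx), len(val_idx))  # 45000 5000
```

For the final run, no samples are held out, so all 50k training indices would be used for training.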

The following examples demonstrate how to 1) tune hyperparameters and 2) perform the final run with ResNet-20 on the CIFAR-10 dataset.

### 4.1 Hyperparameter tuning based on train:val = 45k:5k
Let's start with a small model, ResNet-20, for this tutorial.  

Open `torchdistill/configs/sample/cifar10/ce/resnet20-hyperparameter_tuning.yaml` and update the hyperparameters as you wish, e.g., the number of epochs (*num_epochs*), the batch size (*batch_size* in the *train_data_loader* entry), and the learning rate (*lr* within the *optimizer* entry).  
By default, the hyperparameters in the example config are identical to those in the final run config.
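
When changing the batch size or the number of epochs, it can help to estimate how many optimizer steps a run will take. The sketch below is plain arithmetic under stated assumptions: `steps_per_epoch` is a hypothetical helper, the batch size of 128 is a placeholder rather than a value read from the config, and the last partial batch is assumed to be kept.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of mini-batches needed to iterate over the dataset once."""
    if drop_last:
        # The final, smaller batch is discarded.
        return num_samples // batch_size
    # The final, smaller batch is kept.
    return math.ceil(num_samples / batch_size)


# 45k training samples during tuning vs. 50k in the final run (placeholder batch size).
print(steps_per_epoch(45_000, 128))  # 352
print(steps_per_epoch(50_000, 128))  # 391
```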
  
The config refers to many modules from [PyTorch](https://pytorch.org/docs/stable/index.html) and [torchvision](https://pytorch.org/docs/stable/torchvision/), such as [`SGD`](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD), [`MultiStepLR`](https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.MultiStepLR), [`CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss), [`CIFAR10`](https://pytorch.org/docs/stable/torchvision/datasets.html#torchvision.datasets.CIFAR10), and [`RandomCrop`](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomCrop). You can update their parameters or replace them with other modules from those packages. For instance, `SGD` could be replaced with [`Adam`](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam), in which case you also need to change the parameters under `params` (at least delete the `momentum` entry, since `Adam` does not accept it). 
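
The `SGD` to `Adam` swap can be sketched on a plain dictionary. This is an illustration under assumptions: the `type` key, the parameter values, and the helper `switch_to_adam` are hypothetical and are not copied from the repository's YAML file; only the `params` and `momentum` names come from the description above.

```python
import copy

# Hypothetical optimizer entry, shaped like the description above
# (placeholder values, not taken from the actual config file).
sgd_entry = {
    "type": "SGD",
    "params": {"lr": 0.1, "momentum": 0.9, "weight_decay": 0.0001},
}


def switch_to_adam(optimizer_entry, lr=0.001):
    """Return a copy of the entry that selects Adam instead of SGD."""
    entry = copy.deepcopy(optimizer_entry)  # leave the original entry untouched
    entry["type"] = "Adam"
    # torch.optim.Adam has no `momentum` argument, so the entry must be removed.
    entry["params"].pop("momentum", None)
    # Adam is usually run with a smaller learning rate than SGD (placeholder value).
    entry["params"]["lr"] = lr
    return entry


adam_entry = switch_to_adam(sgd_entry)
print(adam_entry)
```

In the notebook itself, the equivalent change is made by editing the YAML file by hand before launching the training command.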

In [4]:
!python torchdistill/examples/image_classification.py --config torchdistill/configs/sample/cifar10/ce/resnet20-hyperparameter_tuning.yaml --log log/cifar10/ce/resnet20-hyperparameter_tuning.log

2021/05/21 20:49:41	INFO	torchdistill.common.main_util	Not using distributed mode
2021/05/21 20:49:41	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/cifar10/ce/resnet20-hyperparameter_tuning.yaml', device='cuda', dist_url='env://', log='log/cifar10/ce/resnet20-hyperparameter_tuning.log', seed=None, start_epoch=0, student_only=False, sync_bn=False, test_only=False, world_size=1)
2021/05/21 20:49:41	INFO	torchdistill.datasets.util	Loading dummy data
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./resource/dataset/cifar10/cifar-10-python.tar.gz
170499072it [00:03, 48999380.64it/s]                   
Extracting ./resource/dataset/cifar10/cifar-10-python.tar.gz to ./resource/dataset/cifar10
2021/05/21 20:49:49	INFO	torchdistill.datasets.util	Splitting `dummy` dataset (50000 samples in total)
2021/05/21 20:49:49	INFO	torchdistill.datasets.util	new dataset_id: `cifar10/train` (45000 samples)
2021/05/21 20:49:49	INFO	torchdistill.datasets.

### 4.2 Final run with hyperparameters determined by the above hyperparameter tuning
Once you have tuned the hyperparameters, you can update the values in **a config file whose name ends with "-final_run.yaml"**. Note that the only difference between the default example configs for hyperparameter tuning and for the final run is the *datasets* entry.  
Here, the example final run config reuses the hyperparameters described in the original paper and/or the official repository.
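
One way to confirm that two configs differ only where you expect is to compare their top-level entries once they are parsed into dictionaries. The sketch below uses toy dictionaries with placeholder values; `differing_keys` and every value shown are hypothetical and do not reproduce the real config files.

```python
_MISSING = object()  # sentinel so a missing key never equals a stored None


def differing_keys(config_a, config_b):
    """Return the sorted top-level keys whose values differ between two configs."""
    keys = set(config_a) | set(config_b)
    return sorted(
        key
        for key in keys
        if config_a.get(key, _MISSING) != config_b.get(key, _MISSING)
    )


# Toy stand-ins for the tuning and final-run configs (placeholder values).
tuning_config = {
    "datasets": {"train_samples": 45_000, "val_samples": 5_000},
    "models": {"name": "resnet20"},
    "train": {"num_epochs": 10},
}
final_run_config = {
    "datasets": {"train_samples": 50_000},
    "models": {"name": "resnet20"},
    "train": {"num_epochs": 10},
}

print(differing_keys(tuning_config, final_run_config))  # ['datasets']
```

If the result lists anything other than the *datasets* entry after you edit the final run config, a hyperparameter was changed in one file but not the other.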

In [5]:
!python torchdistill/examples/image_classification.py --config torchdistill/configs/sample/cifar10/ce/resnet20-final_run.yaml --log log/cifar10/ce/resnet20-final_run.log

2021/05/21 21:32:35	INFO	torchdistill.common.main_util	Not using distributed mode
2021/05/21 21:32:35	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/cifar10/ce/resnet20-final_run.yaml', device='cuda', dist_url='env://', log='log/cifar10/ce/resnet20-final_run.log', seed=None, start_epoch=0, student_only=False, sync_bn=False, test_only=False, world_size=1)
2021/05/21 21:32:35	INFO	torchdistill.datasets.util	Loading train data
Files already downloaded and verified
2021/05/21 21:32:36	INFO	torchdistill.datasets.util	dataset_id `cifar10/train`: 0.9385907649993896 sec
2021/05/21 21:32:36	INFO	torchdistill.datasets.util	Loading val data
Files already downloaded and verified
2021/05/21 21:32:37	INFO	torchdistill.datasets.util	dataset_id `cifar10/val`: 0.7459437847137451 sec
2021/05/21 21:32:37	INFO	torchdistill.datasets.util	Loading test data
Files already downloaded and verified
2021/05/21 21:32:38	INFO	torchdistill.datasets.util	dataset_id `cifar10/test`: 0.7554

## 5. More sample configurations, models, datasets...
For CIFAR-10/100 datasets, you can find more [sample configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/sample/) and [models](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/torchdistill/models/classification) in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository.  
If you would like to use larger datasets (e.g., **ImageNet** and **COCO**) and models in `torchvision` (or your own modules), refer to the [official configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/official) used in some published papers.  
Experiments with such large datasets and models will require your own machine because of the limited disk space and session time (12 hours for the free version and 24 hours for Colab Pro) on Google Colab.


# Colab examples for knowledge distillation
You can find Colab examples for knowledge distillation experiments in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository.