# Fine-tuning Transformer models and test prediction for GLUE tasks, using *torchdistill*

## 1. Make sure you have access to GPU/TPU
Google Colab: Runtime -> Change runtime type -> Hardware accelerator: "GPU" or "TPU"

In [None]:
!nvidia-smi

Thu May 20 22:12:05 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   60C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
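
The same check can be done from Python, which is handy if later cells should branch on whether a GPU is present. The following is a minimal sketch that uses only the standard library; the helper name `list_gpus` is hypothetical, and it assumes that `nvidia-smi` is on the `PATH` whenever an NVIDIA GPU is attached.

```python
import shutil
import subprocess


def list_gpus():
    """Return one "name, total memory" string per GPU, or [] if none is visible."""
    # nvidia-smi is missing on CPU-only and TPU runtimes
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    # A non-zero exit code usually means the driver cannot reach a GPU
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No NVIDIA GPU detected; check the runtime type.")
```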

## 2. Clone torchdistill repository to use its example code and configuration files

In [None]:
!git clone https://github.com/yoshitomo-matsubara/torchdistill

Cloning into 'torchdistill'...
remote: Enumerating objects: 4979, done.
remote: Counting objects: 100% (761/761), done.
remote: Compressing objects: 100% (450/450), done.
remote: Total 4979 (delta 436), reused 531 (delta 260), pack-reused 4218
Receiving objects: 100% (4979/4979), 1.06 MiB | 6.51 MiB/s, done.
Resolving deltas: 100% (3050/3050), done.


## 3. Install dependencies and *torchdistill*

In [None]:
!pip install -r torchdistill/examples/hf_transformers/requirements.txt
!pip install torchdistill

Collecting accelerate
  Downloading https://files.pythonhosted.org/packages/f7/fa/d173d923c953d930702066894abf128a7e5258c6f64cf088d2c5a83f46a3/accelerate-0.3.0-py3-none-any.whl (49kB)
     |████████████████████████████████| 51kB 4.3MB/s 
Collecting datasets>=1.1.3
  Downloading https://files.pythonhosted.org/packages/46/1a/b9f9b3bfef624686ae81c070f0a6bb635047b17cdb3698c7ad01281e6f9a/datasets-1.6.2-py3-none-any.whl (221kB)
     |████████████████████████████████| 225kB 12.5MB/s 
Collecting sentencepiece!=0.1.92
  Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)
     |████████████████████████████████| 1.2MB 44.3MB/s 
Collecting transformers>=4.6.1
  Downloading https://files.pythonhosted.org/packages/d5/43/cfe4ee779bbd6a678ac6a97c5a5cdeb03c35f9eaebbb9720b036680f9a2d/transformers-4.6.1-py3-none-any.whl (2.2MB)

## (Optional) Configure Accelerate for mixed-precision training (about 2x speedup)

If you are **NOT** using Google Colab Pro, fine-tuning a base-sized model on all 9 of the following tasks with a Tesla K80 will take longer than 12 hours, the maximum session lifetime for free Google Colab users.
With mixed-precision training, you can complete all 9 fine-tuning jobs within that limit.
[This table](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification#mixed-precision-training) gives you a good idea of how long it takes to fine-tune BERT-Base on a Titan RTX with and without mixed precision.

In [None]:
!accelerate config

In which compute environment are you running? ([0] This machine, [1] AWS (Amazon SageMaker)): 0
Which type of machine are you using? ([0] No distributed training, [1] multi-GPU, [2] TPU): 0
How many processes in total will you use? [1]: 1
Do you wish to use FP16 (mixed precision)? [yes/NO]: yes


## 4. Fine-tune Transformer models for GLUE tasks
The following examples demonstrate how to fine-tune pretrained BERT-Base (uncased) on each of the GLUE datasets.  
**Note**: Test splits for GLUE tasks in the `datasets` package are not labeled, so this example uses only the training and validation splits, following [Hugging Face's example](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification).
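
The nine commands in the subsections below all follow one pattern: only the task name and the loss directory in the config and log paths change (`mse` for STS-B, `ce` for every other task). As a rough sketch, the commands can be generated from that pattern; `TASK_TO_LOSS` and `build_command` are hypothetical names introduced here for illustration and are not part of torchdistill.

```python
# Loss directory used in the sample config path of each task, as it appears
# in the commands of this notebook: STS-B uses "mse", the others use "ce".
TASK_TO_LOSS = {
    "cola": "ce", "sst2": "ce", "mrpc": "ce", "stsb": "mse", "qqp": "ce",
    "mnli": "ce", "qnli": "ce", "rte": "ce", "wnli": "ce",
}


def build_command(task, model="bert_base_uncased"):
    """Return the argument list of the fine-tuning command for one GLUE task."""
    loss = TASK_TO_LOSS[task]
    return [
        "accelerate", "launch",
        "torchdistill/examples/hf_transformers/text_classification.py",
        "--config", f"torchdistill/configs/sample/glue/{task}/{loss}/{model}.yaml",
        "--task", task,
        "--log", f"log/glue/{task}/{loss}/{model}.txt",
        "--private_output", f"leaderboard/glue/standard/{model}/",
    ]


# Print the commands instead of running them, so each can be reviewed first
for task in TASK_TO_LOSS:
    print(" ".join(build_command(task)))
```

Passing one of these lists to `subprocess.run` would launch the corresponding job; running each task in its own cell, as done below, keeps the logs separate and makes it easier to resume after a session timeout.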

### 4.1 CoLA task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/cola/ce/bert_base_uncased.yaml \
  --task cola \
  --log log/glue/cola/ce/bert_base_uncased.txt \
  --private_output leaderboard/glue/standard/bert_base_uncased/

2021-05-20 18:17:08.944023: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/20 18:17:11	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/cola/ce/bert_base_uncased.yaml', log='log/glue/cola/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='cola', test_only=False, world_size=1)
2021/05/20 18:17:11	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/05/20 18:17:11	INFO	filelock	Lock 139654060689104 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock
https://huggingface.co/bert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /

### 4.2 SST-2 task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/sst2/ce/bert_base_uncased.yaml \
  --task sst2 \
  --log log/glue/sst2/ce/bert_base_uncased.txt \
  --private_output leaderboard/glue/standard/bert_base_uncased/

2021-05-20 18:19:08.498593: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/20 18:19:10	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/sst2/ce/bert_base_uncased.yaml', log='log/glue/sst2/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='sst2', test_only=False, world_size=1)
2021/05/20 18:19:10	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_drop

### 4.3 MRPC task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/mrpc/ce/bert_base_uncased.yaml \
  --task mrpc \
  --log log/glue/mrpc/ce/bert_base_uncased.txt \
  --private_output leaderboard/glue/standard/bert_base_uncased/

2021-05-20 18:33:06.898256: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/20 18:33:08	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/ce/bert_base_uncased.yaml', log='log/glue/mrpc/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)
2021/05/20 18:33:08	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_drop

### 4.4 STS-B task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/stsb/mse/bert_base_uncased.yaml \
  --task stsb \
  --log log/glue/stsb/mse/bert_base_uncased.txt \
  --private_output leaderboard/glue/standard/bert_base_uncased/

2021-05-20 18:35:31.299454: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/20 18:35:33	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/stsb/mse/bert_base_uncased.yaml', log='log/glue/stsb/mse/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='stsb', test_only=False, world_size=1)
2021/05/20 18:35:33	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dr

### 4.5 QQP task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/qqp/ce/bert_base_uncased.yaml \
  --task qqp \
  --log log/glue/qqp/ce/bert_base_uncased.txt \
  --private_output leaderboard/glue/standard/bert_base_uncased/

2021-05-20 18:37:33.940493: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/20 18:37:35	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qqp/ce/bert_base_uncased.yaml', log='log/glue/qqp/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='qqp', test_only=False, world_size=1)
2021/05/20 18:37:35	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout

### 4.6 MNLI task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/mnli/ce/bert_base_uncased.yaml \
  --task mnli \
  --log log/glue/mnli/ce/bert_base_uncased.txt \
  --private_output leaderboard/glue/standard/bert_base_uncased/

2021-05-20 22:16:41.333339: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/20 22:16:43	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/ce/bert_base_uncased.yaml', log='log/glue/mnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
2021/05/20 22:16:43	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/05/20 22:16:43	INFO	filelock	Lock 140636129258704 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock
https://huggingface.co/bert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /

### 4.7 QNLI task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/qnli/ce/bert_base_uncased.yaml \
  --task qnli \
  --log log/glue/qnli/ce/bert_base_uncased.txt \
  --private_output leaderboard/glue/standard/bert_base_uncased/

2021-05-20 21:27:21.185270: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/20 21:27:23	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/ce/bert_base_uncased.yaml', log='log/glue/qnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)
2021/05/20 21:27:23	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_drop

### 4.8 RTE task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/rte/ce/bert_base_uncased.yaml \
  --task rte \
  --log log/glue/rte/ce/bert_base_uncased.txt \
  --private_output leaderboard/glue/standard/bert_base_uncased/

2021-05-21 00:29:21.369691: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/21 00:29:23	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/rte/ce/bert_base_uncased.yaml', log='log/glue/rte/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='rte', test_only=False, world_size=1)
2021/05/21 00:29:23	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout

### 4.9 WNLI task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/wnli/ce/bert_base_uncased.yaml \
  --task wnli \
  --log log/glue/wnli/ce/bert_base_uncased.txt \
  --private_output leaderboard/glue/standard/bert_base_uncased/

2021-05-21 00:31:03.998542: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/21 00:31:05	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/wnli/ce/bert_base_uncased.yaml', log='log/glue/wnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='wnli', test_only=False, world_size=1)
2021/05/21 00:31:05	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_drop

## 5. Validate your prediction files for the GLUE leaderboard
To make sure your prediction files contain the right number of samples (lines), run `wc -l <your prediction dir path>/*` and check that the output matches the following.

```
   1105 AX.tsv
   1064 CoLA.tsv
   9848 MNLI-mm.tsv
   9797 MNLI-m.tsv
   1726 MRPC.tsv
   5464 QNLI.tsv
 390966 QQP.tsv
   3001 RTE.tsv
   1822 SST-2.tsv
   1380 STS-B.tsv
    147 WNLI.tsv
 426320 total
```

In [None]:
!wc -l leaderboard/glue/standard/bert_base_uncased/*

   1105 leaderboard/glue/standard/bert_base_uncased/AX.tsv
   1064 leaderboard/glue/standard/bert_base_uncased/CoLA.tsv
   9848 leaderboard/glue/standard/bert_base_uncased/MNLI-mm.tsv
   9797 leaderboard/glue/standard/bert_base_uncased/MNLI-m.tsv
   1726 leaderboard/glue/standard/bert_base_uncased/MRPC.tsv
   5464 leaderboard/glue/standard/bert_base_uncased/QNLI.tsv
 390966 leaderboard/glue/standard/bert_base_uncased/QQP.tsv
   3001 leaderboard/glue/standard/bert_base_uncased/RTE.tsv
   1822 leaderboard/glue/standard/bert_base_uncased/SST-2.tsv
   1380 leaderboard/glue/standard/bert_base_uncased/STS-B.tsv
    147 leaderboard/glue/standard/bert_base_uncased/WNLI.tsv
 426320 total
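
Comparing eleven numbers by eye is error-prone, so the check can also be automated. The sketch below hard-codes the expected line counts listed above and reports every file whose count differs or that is missing; `EXPECTED_LINES`, `count_lines`, and `check_predictions` are hypothetical names introduced for this example.

```python
from pathlib import Path

# Expected line counts, copied from the `wc -l` listing above
EXPECTED_LINES = {
    "AX.tsv": 1105, "CoLA.tsv": 1064, "MNLI-mm.tsv": 9848, "MNLI-m.tsv": 9797,
    "MRPC.tsv": 1726, "QNLI.tsv": 5464, "QQP.tsv": 390966, "RTE.tsv": 3001,
    "SST-2.tsv": 1822, "STS-B.tsv": 1380, "WNLI.tsv": 147,
}


def count_lines(path):
    """Count newline characters in a file, which is what `wc -l` reports."""
    with open(path, "rb") as f:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))


def check_predictions(pred_dir, expected=EXPECTED_LINES):
    """Return {file name: (expected, actual)} for each mismatch; actual is None if the file is missing."""
    problems = {}
    for name, want in expected.items():
        path = Path(pred_dir) / name
        got = count_lines(path) if path.is_file() else None
        if got != want:
            problems[name] = (want, got)
    return problems


problems = check_predictions("leaderboard/glue/standard/bert_base_uncased/")
print("All prediction files look complete." if not problems else problems)
```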


## 6. Zip the submission files and download to make a submission

In [None]:
!zip bert_base_uncased-submission.zip leaderboard/glue/standard/bert_base_uncased/*

  adding: leaderboard/glue/standard/bert_base_uncased/AX.tsv (deflated 82%)
  adding: leaderboard/glue/standard/bert_base_uncased/CoLA.tsv (deflated 64%)
  adding: leaderboard/glue/standard/bert_base_uncased/MNLI-mm.tsv (deflated 83%)
  adding: leaderboard/glue/standard/bert_base_uncased/MNLI-m.tsv (deflated 83%)
  adding: leaderboard/glue/standard/bert_base_uncased/MRPC.tsv (deflated 64%)
  adding: leaderboard/glue/standard/bert_base_uncased/QNLI.tsv (deflated 85%)
  adding: leaderboard/glue/standard/bert_base_uncased/QQP.tsv (deflated 73%)
  adding: leaderboard/glue/standard/bert_base_uncased/RTE.tsv (deflated 84%)
  adding: leaderboard/glue/standard/bert_base_uncased/SST-2.tsv (deflated 64%)
  adding: leaderboard/glue/standard/bert_base_uncased/STS-B.tsv (deflated 56%)
  adding: leaderboard/glue/standard/bert_base_uncased/WNLI.tsv (deflated 62%)
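
If the `zip` command is not available in your environment, the archive can be built with Python's standard `zipfile` module instead. This is only a sketch that mirrors the command above by storing each file under the path it was given; the function name `zip_predictions` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_predictions(pred_dir, zip_path):
    """Write every .tsv file in pred_dir to zip_path and return the file names added."""
    files = sorted(Path(pred_dir).glob("*.tsv"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Keep the path as given, like `zip` does for the command above
            zf.write(path, arcname=path.as_posix())
    return [path.name for path in files]
```

For example, `zip_predictions("leaderboard/glue/standard/bert_base_uncased", "bert_base_uncased-submission.zip")` should produce an archive with the same eleven entries as the cell above.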


Download the zip file from the "Files" menu.  
To submit the file to the GLUE system, refer to the GLUE website:
https://gluebenchmark.com/

## 7. More sample configurations, models, datasets...
You can find more [sample configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/sample/) in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository.  
If you would like to use larger datasets (e.g., **ImageNet** and **COCO**) and models in `torchvision` (or your own modules), refer to the [official configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/official) used in some published papers.  
Experiments with such large datasets and models will require your own machine because of the limited disk space and session time (12 hours for the free version and 24 hours for Colab Pro) on Google Colab.


# Colab examples for knowledge distillation
You can find Colab examples for knowledge distillation experiments in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository.