# Fine-tuning Transformer models for GLUE tasks, using *torchdistill*

## 1. Make sure you have access to GPU/TPU
Google Colab: Runtime -> Change runtime type -> Hardware accelerator: "GPU" or "TPU"

In [1]:
!nvidia-smi

Fri May  7 19:54:46 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
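The same check can also be done from Python. The function below is a minimal, best-effort sketch, not part of torchdistill. It assumes that Colab TPU runtimes expose the TPU address in the `COLAB_TPU_ADDR` environment variable. It queries `torch.cuda.is_available()` only if PyTorch is importable. The training logs in section 4 report `Device: cuda` for this T4 runtime.

```python
import importlib.util
import os


def detect_accelerator() -> str:
    """Best-effort guess of the accelerator available to this runtime."""
    # Assumption: Colab TPU runtimes advertise the TPU address in COLAB_TPU_ADDR.
    if os.environ.get("COLAB_TPU_ADDR"):
        return "tpu"
    # Query CUDA only if PyTorch is importable, so the check also runs without it.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


print(detect_accelerator())
```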

## 2. Clone the torchdistill repository to use its example code and configuration files

In [2]:
!git clone https://github.com/yoshitomo-matsubara/torchdistill

Cloning into 'torchdistill'...
remote: Enumerating objects: 4784, done.
remote: Counting objects: 100% (566/566), done.
remote: Compressing objects: 100% (352/352), done.
remote: Total 4784 (delta 325), reused 386 (delta 188), pack-reused 4218
Receiving objects: 100% (4784/4784), 1.02 MiB | 10.84 MiB/s, done.
Resolving deltas: 100% (2939/2939), done.
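Before installing and launching anything, you can optionally confirm that the clone contains every file the later cells read. This is a small sketch. The path list is copied from the commands in this notebook, and the repository layout may have changed since the notebook was written.

```python
from pathlib import Path

GLUE_TASKS = ["cola", "sst2", "mrpc", "stsb", "qqp", "mnli", "qnli", "rte", "wnli"]

# Files that the later cells of this notebook read from the cloned repository.
REQUIRED_PATHS = [
    "torchdistill/examples/hf_transformers/requirements.txt",
    "torchdistill/examples/hf_transformers/text_classification.py",
] + [f"torchdistill/configs/sample/glue/{task}/ce/bert_base_uncased.yaml" for task in GLUE_TASKS]


def missing_paths(root: str = ".") -> list:
    """Return the required paths that are not regular files under `root`."""
    base = Path(root)
    return [p for p in REQUIRED_PATHS if not (base / p).is_file()]


missing = missing_paths()
print("all files found" if not missing else f"missing: {missing}")
```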


## 3. Install dependencies and *torchdistill*

In [3]:
!pip install -r torchdistill/examples/hf_transformers/requirements.txt
!pip install torchdistill

Collecting accelerate
  Downloading https://files.pythonhosted.org/packages/f7/fa/d173d923c953d930702066894abf128a7e5258c6f64cf088d2c5a83f46a3/accelerate-0.3.0-py3-none-any.whl (49kB)
     |████████████████████████████████| 51kB 3.4MB/s 
Collecting datasets>=1.1.3
  Downloading https://files.pythonhosted.org/packages/46/1a/b9f9b3bfef624686ae81c070f0a6bb635047b17cdb3698c7ad01281e6f9a/datasets-1.6.2-py3-none-any.whl (221kB)
     |████████████████████████████████| 225kB 9.2MB/s 
Collecting sentencepiece!=0.1.92
  Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)
     |████████████████████████████████| 1.2MB 12.3MB/s 
Collecting transformers>=4.5.1
  Downloading https://files.pythonhosted.org/packages/d8/b2/57495b5309f09fa501866e225c84532d1fd89536ea62406b2181933fb418/transformers-4.5.1-py3-none-any.whl (2.1MB)
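After installation, you can check the installed versions against the specifiers that appear in the pip log above (`datasets>=1.1.3`, `sentencepiece!=0.1.92`, `transformers>=4.5.1`). This list is not the full `requirements.txt`. The sketch uses `importlib.metadata`, which needs Python 3.8+. Its simplified version parser compares only the first three numeric components. For full PEP 440 semantics, `packaging.version` would be the more robust choice.

```python
import re
from importlib import metadata  # Python 3.8+

# Version specifiers visible in the pip log above (not the full requirements.txt).
REQUIREMENTS = [
    ("datasets", ">=", "1.1.3"),
    ("sentencepiece", "!=", "0.1.92"),
    ("transformers", ">=", "4.5.1"),
]


def parse_version(text: str) -> tuple:
    """'4.5.1' -> (4, 5, 1); keeps the first three numeric parts, padded with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", text.split("+")[0])[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(installed: str, op: str, required: str) -> bool:
    """Compare two versions with one of the operators used above."""
    a, b = parse_version(installed), parse_version(required)
    return {">=": a >= b, "!=": a != b, "==": a == b}[op]


def check_requirements(reqs=REQUIREMENTS) -> dict:
    """Map each package to True/False, or None if it is not installed."""
    report = {}
    for name, op, required in reqs:
        try:
            report[name] = satisfies(metadata.version(name), op, required)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


print(check_requirements())
```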

## (Optional) Configure Accelerate for 2x speedup with mixed-precision training

If you are **NOT** using Google Colab Pro, fine-tuning a base-sized model on all 9 tasks below with a Tesla K80 will take more than 12 hours, the maximum session lifetime for free Google Colab users.
With mixed-precision training, you can complete all 9 fine-tuning jobs within that limit.
[This table](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification#mixed-precision-training) gives you a good idea of how long it takes to fine-tune BERT-Base on a Titan RTX with and without mixed precision.

In [4]:
!accelerate config

In which compute environment are you running? ([0] This machine, [1] AWS (Amazon SageMaker)): 0
Which type of machine are you using? ([0] No distributed training, [1] multi-GPU, [2] TPU): 0
How many processes in total will you use? [1]: 1
Do you wish to use FP16 (mixed precision)? [yes/NO]: yes
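Why does FP16 training need special care? As we understand it, Accelerate's FP16 option builds on PyTorch's native automatic mixed precision (`torch.cuda.amp`). That mechanism keeps FP32 weights and applies dynamic loss scaling. The stdlib-only sketch below does not touch PyTorch. It uses Python's `struct` half-precision format `'e'` to show two things: a tiny gradient value rounds to zero in FP16 unless it is first multiplied by a scale factor, and too large a scale overflows. The gradient and scale values are illustrative choices, not values used by Accelerate.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a float to IEEE 754 half precision and back (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


grad = 1e-8        # illustrative tiny gradient value
scale = 2.0 ** 16  # illustrative loss-scale factor

# Without scaling, the value is below FP16's smallest subnormal (about 6e-8) and rounds to zero.
print(to_fp16(grad))  # 0.0

# With scaling: multiply before the FP16 cast, divide afterwards in full precision.
recovered = to_fp16(grad * scale) / scale
print(recovered)

# Too large a value overflows FP16 (max finite value 65504); struct raises instead of returning inf.
try:
    to_fp16(70000.0)
except OverflowError:
    print("overflow: a dynamic scaler would skip this step and lower the scale")
```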


## 4. Fine-tuning Transformer models for GLUE tasks
The following examples demonstrate how to fine-tune pretrained BERT-Base (uncased) on each of the GLUE datasets.  
**Note**: The test splits of the GLUE tasks in the `datasets` package are not labeled, so this example uses only the training and validation splits, following [Hugging Face's example](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification).
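The sketch below shows which split to evaluate on for each task. To our knowledge, the `datasets` "glue" configs mark the missing test labels as `-1`, and MNLI has two validation splits (`validation_matched` and `validation_mismatched`) instead of a single `validation` split. The helper names are hypothetical and are not torchdistill functions. `NUM_LABELS` gives the number of model outputs per task, where STS-B is a regression task with one output.

```python
# Number of model outputs per GLUE task (STS-B is regression, hence 1).
NUM_LABELS = {"cola": 2, "sst2": 2, "mrpc": 2, "stsb": 1, "qqp": 2,
              "mnli": 3, "qnli": 2, "rte": 2, "wnli": 2}


def eval_split_names(task: str) -> list:
    """Validation split(s) to evaluate on instead of the unlabeled test split."""
    if task == "mnli":
        return ["validation_matched", "validation_mismatched"]
    return ["validation"]


def is_unlabeled(labels) -> bool:
    """True if every label is -1, the placeholder used for missing test labels."""
    return all(label == -1 for label in labels)


# Possible usage with the `datasets` package (requires network access):
# from datasets import load_dataset
# raw = load_dataset("glue", "mnli")
# for split in eval_split_names("mnli"):
#     print(split, raw[split].num_rows)
# print(is_unlabeled(raw["test_matched"]["label"]))
```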

### 4.1 CoLA task

In [5]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/cola/ce/bert_base_uncased.yaml \
  --task cola \
  --log log/glue/cola/ce/bert_base_uncased.txt

2021-05-07 19:55:19.941256: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/07 19:55:20	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/cola/ce/bert_base_uncased.yaml', log='log/glue/cola/ce/bert_base_uncased.txt', seed=None, student_only=False, task_name='cola', test_only=False, world_size=1)
2021/05/07 19:55:21	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/05/07 19:55:21	INFO	filelock	Lock 140175863705040 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock
https://huggingface.co/bert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmph8kh83h5
Downloading: 1
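The cells in 4.2 to 4.9 repeat this command and change only the task name. To run all nine tasks from one Python cell instead, you could build the same argument lists in a loop. This is a sketch: the paths follow the `ce/bert_base_uncased` pattern used in this notebook, and `dry_run=True` only prints the commands.

```python
import shlex
import subprocess

GLUE_TASKS = ["cola", "sst2", "mrpc", "stsb", "qqp", "mnli", "qnli", "rte", "wnli"]


def build_command(task: str) -> list:
    """Argument list equivalent to the `!accelerate launch ...` cell for one task."""
    return [
        "accelerate", "launch", "torchdistill/examples/hf_transformers/text_classification.py",
        "--config", f"torchdistill/configs/sample/glue/{task}/ce/bert_base_uncased.yaml",
        "--task", task,
        "--log", f"log/glue/{task}/ce/bert_base_uncased.txt",
    ]


def run_all(tasks=GLUE_TASKS, dry_run: bool = True) -> None:
    """Print each command; execute it too (stopping on failure) if dry_run is False."""
    for task in tasks:
        cmd = build_command(task)
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_all()
```

Running the jobs one after another in a single process also means a Colab disconnect stops the loop. Keeping one cell per task, as this notebook does, makes it easier to resume from the last finished task.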

### 4.2 SST-2 task

In [6]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/sst2/ce/bert_base_uncased.yaml \
  --task sst2 \
  --log log/glue/sst2/ce/bert_base_uncased.txt

2021-05-07 19:57:26.289365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/07 19:57:27	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/sst2/ce/bert_base_uncased.yaml', log='log/glue/sst2/ce/bert_base_uncased.txt', seed=None, student_only=False, task_name='sst2', test_only=False, world_size=1)
2021/05/07 19:57:27	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "sst2",
  "gradient_checkp

### 4.3 MRPC task

In [7]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/mrpc/ce/bert_base_uncased.yaml \
  --task mrpc \
  --log log/glue/mrpc/ce/bert_base_uncased.txt

2021-05-07 20:11:31.598741: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/07 20:11:32	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/ce/bert_base_uncased.yaml', log='log/glue/mrpc/ce/bert_base_uncased.txt', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)
2021/05/07 20:11:32	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "mrpc",
  "gradient_checkp

### 4.4 STS-B task

In [8]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/stsb/ce/bert_base_uncased.yaml \
  --task stsb \
  --log log/glue/stsb/ce/bert_base_uncased.txt

2021-05-07 20:14:00.153528: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/07 20:14:01	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/stsb/ce/bert_base_uncased.yaml', log='log/glue/stsb/ce/bert_base_uncased.txt', seed=None, student_only=False, task_name='stsb', test_only=False, world_size=1)
2021/05/07 20:14:01	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "stsb",
  "gradient_checkp

### 4.5 QQP task

In [9]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/qqp/ce/bert_base_uncased.yaml \
  --task qqp \
  --log log/glue/qqp/ce/bert_base_uncased.txt

2021-05-07 20:16:09.775156: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/07 20:16:10	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qqp/ce/bert_base_uncased.yaml', log='log/glue/qqp/ce/bert_base_uncased.txt', seed=None, student_only=False, task_name='qqp', test_only=False, world_size=1)
2021/05/07 20:16:10	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "qqp",
  "gradient_checkpoint

### 4.6 MNLI task

In [10]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/mnli/ce/bert_base_uncased.yaml \
  --task mnli \
  --log log/glue/mnli/ce/bert_base_uncased.txt

2021-05-07 22:03:18.224213: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/07 22:03:19	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/ce/bert_base_uncased.yaml', log='log/glue/mnli/ce/bert_base_uncased.txt', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
2021/05/07 22:03:19	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "mnli",
  "gradient_checkp

### 4.7 QNLI task

In [11]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/qnli/ce/bert_base_uncased.yaml \
  --task qnli \
  --log log/glue/qnli/ce/bert_base_uncased.txt

2021-05-08 00:17:49.801334: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/08 00:17:50	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/ce/bert_base_uncased.yaml', log='log/glue/qnli/ce/bert_base_uncased.txt', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)
2021/05/08 00:17:50	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "qnli",
  "gradient_checkp

### 4.8 RTE task

In [12]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/rte/ce/bert_base_uncased.yaml \
  --task rte \
  --log log/glue/rte/ce/bert_base_uncased.txt

2021-05-08 00:56:45.253820: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/08 00:56:46	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/rte/ce/bert_base_uncased.yaml', log='log/glue/rte/ce/bert_base_uncased.txt', seed=None, student_only=False, task_name='rte', test_only=False, world_size=1)
2021/05/08 00:56:46	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "rte",
  "gradient_checkpoint

### 4.9 WNLI task

In [13]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/wnli/ce/bert_base_uncased.yaml \
  --task wnli \
  --log log/glue/wnli/ce/bert_base_uncased.txt

2021-05-08 00:58:17.249931: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021/05/08 00:58:18	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/wnli/ce/bert_base_uncased.yaml', log='log/glue/wnli/ce/bert_base_uncased.txt', seed=None, student_only=False, task_name='wnli', test_only=False, world_size=1)
2021/05/08 00:58:18	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "wnli",
  "gradient_checkp
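The `Namespace(...)` log lines above are timestamped, so they give a rough idea of how long each job took on this runtime. This runtime is a Tesla T4 with FP16 enabled, not a K80. The sketch below computes start-to-start gaps, assuming the cells were run back to back. Each gap therefore also includes model and dataset downloads and evaluation. WNLI's own run time cannot be derived because no later timestamp exists.

```python
from datetime import datetime

# Start times copied from the "Namespace(...)" log line of each cell in section 4.
START_TIMES = [
    ("cola", "2021/05/07 19:55:20"),
    ("sst2", "2021/05/07 19:57:27"),
    ("mrpc", "2021/05/07 20:11:32"),
    ("stsb", "2021/05/07 20:14:01"),
    ("qqp", "2021/05/07 20:16:10"),
    ("mnli", "2021/05/07 22:03:19"),
    ("qnli", "2021/05/08 00:17:50"),
    ("rte", "2021/05/08 00:56:46"),
    ("wnli", "2021/05/08 00:58:18"),
]


def start_to_start_seconds(entries) -> dict:
    """Seconds between consecutive cell starts, keyed by the earlier task."""
    parsed = [(task, datetime.strptime(ts, "%Y/%m/%d %H:%M:%S")) for task, ts in entries]
    return {task: int((nxt - cur).total_seconds())
            for (task, cur), (_, nxt) in zip(parsed, parsed[1:])}


gaps = start_to_start_seconds(START_TIMES)
for task, seconds in gaps.items():
    print(f"{task:>5}: {seconds / 3600:.2f} h")
print(f"cola..rte total: {sum(gaps.values()) / 3600:.2f} h")
```

By this measure, the first eight tasks took about five hours in total. MNLI and QQP account for most of that time, which fits within a free Colab session.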

## 5. More sample configurations, models, datasets...
You can find more [sample configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/sample/) in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository.  
If you would like to use larger datasets (e.g., **ImageNet** and **COCO**) and models in `torchvision` (or your own modules), refer to the [official configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/official) used in some published papers.  
Experiments with such large datasets and models will require your own machine because of the limited disk space and session time on Google Colab (12 hours for the free version and 24 hours for Colab Pro).
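To check whether a machine has room for a large dataset before downloading it, you can query the free disk space from Python. This is a minimal sketch using `shutil.disk_usage`. The `needed` value is a placeholder, not the actual size of ImageNet or COCO, so set it to the size of the data you plan to store.

```python
import shutil


def free_gib(path: str = "/") -> float:
    """Free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def has_room_for(required_gib: float, path: str = "/") -> bool:
    """True if at least `required_gib` GiB are free at `path`."""
    return free_gib(path) >= required_gib


# `needed` is a placeholder: set it to the size of the dataset you plan to download.
needed = 200.0
print(f"{free_gib():.1f} GiB free; enough for {needed} GiB: {has_room_for(needed)}")
```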


# Colab examples for knowledge distillation
You can find Colab examples for knowledge distillation experiments in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository.