# Prerequisites

- Host OS: Ubuntu 20.04 LTS
- Docker image: `mltooling/ml-workspace-gpu` (`docker pull mltooling/ml-workspace-gpu`)
- Single NVIDIA GPU (RTX 3080)

# Check computing resources

In [25]:
#### The number of CPU cores
!grep -c processor /proc/cpuinfo

20


In [26]:
#### GPU information
!nvidia-smi

Fri Nov  4 08:35:28 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 68%   50C    P8    30W / 370W |   7297MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+---------------------------------------------------------------------------

In [1]:
import torch

if torch.cuda.is_available():
    device = torch.device("cuda:0")
    device_count = torch.cuda.device_count()
    print("device_count: {}".format(device_count))
    for device_num in range(device_count):
        print("device {} capability {}".format(
            device_num,
            torch.cuda.get_device_capability(device_num)))
        print("device {} name {}".format(
            device_num, 
            torch.cuda.get_device_name(device_num)))
else:
    device = torch.device("cpu")
    print("no cuda device")

device_count: 1
device 0 capability (8, 6)
device 0 name NVIDIA GeForce RTX 3080


# 0. Customize Training Strategy

In [2]:
num_cpus = 16
num_gpus = 1
seed = 1234
model_name = "xlm-roberta-base" # bert-base-multilingual-cased ; klue/roberta-base ; bert-base-cased, etc.
train_proportion = 0.7 # train set : eval set = 7 : 3

# The parameters below are only used by the Ray Tune hyperparameter search (section 4.5)
n_trials = 5
std = 0.1
patience = 5

# 1. Import packages

In [3]:
## Check that the installed package versions are compatible ##

# !pip install accelerate nvidia-ml-py3
# !pip install datasets==2.4.0
# !pip install huggingface_hub==0.9.1
# !pip install transformers==4.22.1 
# !pip install pyarrow==9.0.0
# !pip install -q ray

In [4]:
import transformers
import datasets
import huggingface_hub
import pyarrow

print(transformers.__version__)
print(datasets.__version__)
print(huggingface_hub.__version__)
print(pyarrow.__version__)

# 4.22.1
# 2.4.0
# 0.9.1
# 9.0.0

4.22.1
2.4.0
0.9.1
9.0.0


In [5]:
import os
import re
import math
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# TF32 can be enabled if you are running on Ampere (or newer) hardware
import torch
torch.backends.cuda.matmul.allow_tf32 = True

from datasets import load_dataset, load_metric, ClassLabel
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, precision_score, recall_score, f1_score

from functools import partial

import ray
from ray import tune
from ray.tune import CLIReporter
from ray.tune.examples.pbt_transformers.utils import (
    download_data,
    build_compute_metrics_fn,
)
from ray.tune.schedulers import PopulationBasedTraining
from transformers import (
    glue_tasks_num_labels,
    AdamW,
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    GlueDataset,
    GlueDataTrainingArguments,
    TrainingArguments,
)

# 2. Import Data

Two files are needed in your data directory (in this case, `data_splited/`): `{data_name}_train.csv` and `{data_name}_test.csv`.

In [6]:
data_name = "cardiovascular_sev_dataset" 

dataset = load_dataset('csv', data_files={'train': f'../data_splited/{data_name}_train.csv',
                                          'test': f'../data_splited/{data_name}_test.csv'})
dataset

Using custom data configuration default-eb5690d969448b40
Reusing dataset csv (/root/.cache/huggingface/datasets/csv/default-eb5690d969448b40/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a)


  0%|          | 0/2 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['id', 'past_history', 'treatment_effect', 'examination', 'label'],
        num_rows: 3756
    })
    test: Dataset({
        features: ['id', 'past_history', 'treatment_effect', 'examination', 'label'],
        num_rows: 940
    })
})

# 3. Data Preprocessing

In [7]:
#### Select the column you want to tokenize and label column.

# dataset = dataset.remove_columns(['id', 'treatment_effect', 'examination'])
# dataset = dataset.rename_column("past_history", "text")

# dataset = dataset.remove_columns(['id', 'examination', 'past_history'])
# dataset = dataset.rename_column("treatment_effect", "text")

dataset = dataset.remove_columns(['id', 'treatment_effect', 'past_history'])
dataset = dataset.rename_column("examination", "text")

print(dataset['train']['text'][0])

RCA c JR 5-4 



 Tubular ecc. 50% LN of mRCA



 Discrete ecc. 70% LN of dRCA 



 Total occlusion of PL br. (TIMI I)







LCA c JL 5-3.5 



 Minimal LN of dLCx



 Tubular ecc. 30% LN of mLAD



 Collaterals GII/III from septal & dLCx to PL br. 







CAD(1VD)



Successful PTCA c stent at dRCA ~ PL br. (Xience prime 2.75*38)










In [8]:
#### Remove NA rows

dataset = dataset.filter(lambda row: pd.notnull(row["text"]))

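#### Remove special characters

def remove_sp(example):
    # Keep only ASCII letters, digits, Korean characters, and spaces.
    # Note: "|" inside a character class is a literal, so pipe characters are kept too,
    # and newlines are deleted without inserting a space (adjacent words get fused).
    example["text"]=re.sub(r'[^a-z|A-Z|0-9|ㄱ-ㅎ|ㅏ-ㅣ|가-힣| ]+', '', str(example["text"]))
    return example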

dataset = dataset.map(remove_sp)

print(dataset)
print(dataset['train']['text'][0])

Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-eb5690d969448b40/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-c3453869f7dadab7.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-eb5690d969448b40/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-224dced3e1104bb1.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-eb5690d969448b40/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-d173a66ec363ae1a.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-eb5690d969448b40/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-d8779983b55cf3b9.arrow


DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 726
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 200
    })
})
RCA c JR 54  Tubular ecc 50 LN of mRCA Discrete ecc 70 LN of dRCA  Total occlusion of PL br TIMI ILCA c JL 535  Minimal LN of dLCx Tubular ecc 30 LN of mLAD Collaterals GIIIII from septal  dLCx to PL br CAD1VDSuccessful PTCA c stent at dRCA  PL br Xience prime 27538
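The cleaned text above shows two side effects of the pattern: line breaks are deleted outright (`TIMI I` and `LCA` become `TIMI ILCA`), and any `|` would survive because it is a literal inside a character class. The sketch below reproduces both effects on a made-up string and contrasts them with a hypothetical alternative, `clean_text`, that is not part of the notebook and would change the training data if adopted.

```python
import re

# Pattern copied from remove_sp: "|" sits inside the character class,
# so it is a literal character to keep, not an alternation operator.
ORIGINAL_PATTERN = r'[^a-z|A-Z|0-9|ㄱ-ㅎ|ㅏ-ㅣ|가-힣| ]+'

# Hypothetical alternative (an assumption, not the author's code): no "|" in the class.
STRICT_PATTERN = r'[^a-zA-Z0-9ㄱ-ㅎㅏ-ㅣ가-힣 ]+'


def clean_text(text: str) -> str:
    """Hypothetical helper: replace disallowed runs with a space, then collapse whitespace."""
    replaced = re.sub(STRICT_PATTERN, ' ', text)
    return ' '.join(replaced.split())


sample = "a|b\nc (TIMI I)"  # made-up input for illustration

original_result = re.sub(ORIGINAL_PATTERN, '', sample)
alternative_result = clean_text(sample)

print(repr(original_result))     # the pipe is kept, the newline is removed
print(repr(alternative_result))  # words stay separated by single spaces
```

The results reported later in this notebook were produced with the original pattern, so switching to the alternative would require rerunning everything downstream.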


In [9]:
#### Tokenizing 

tokenizer = AutoTokenizer.from_pretrained(model_name, truncation_side = 'left') # truncation_side = 'left' drops tokens from the start, so the last 512 tokens are kept

def tokenize_function(examples):
    tokenized_batch = tokenizer(examples["text"], padding="max_length", truncation=True) # padding : ['longest', 'max_length', 'do_not_pad']
    return tokenized_batch

tokenized_datasets = dataset.map(tokenize_function, batched=True)

tokenized_datasets

Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-eb5690d969448b40/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-7b8b8e0f96978cf2.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-eb5690d969448b40/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-05095af8f068802e.arrow


DatasetDict({
    train: Dataset({
        features: ['text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 726
    })
    test: Dataset({
        features: ['text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 200
    })
})
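To make the effect of `truncation_side` concrete, here is a minimal pure-Python sketch. The helper `truncate` is hypothetical and only mimics which end of an over-long sequence is dropped; the real tokenizer additionally accounts for special tokens, which this sketch ignores.

```python
def truncate(token_ids, max_length, truncation_side="right"):
    """Hypothetical helper: keep at most max_length (> 0) ids from one end."""
    if len(token_ids) <= max_length:
        return list(token_ids)
    if truncation_side == "left":
        # Drop ids from the start, keep the tail of the sequence
        return list(token_ids[-max_length:])
    # Default: drop ids from the end, keep the head of the sequence
    return list(token_ids[:max_length])


token_ids = list(range(10))  # stand-in for a tokenized report

left_kept = truncate(token_ids, 4, "left")
right_kept = truncate(token_ids, 4, "right")

print(left_kept)
print(right_kept)
```

Left truncation fits reports like the sample above, where the conclusion (diagnosis and procedure) sits at the end of the text.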

In [10]:
#### Train-Evaluation-Test Split 

train_dataset = tokenized_datasets["train"].shuffle(seed=seed).select(range(0,math.floor(len(tokenized_datasets["train"])*train_proportion)))
eval_dataset = tokenized_datasets["train"].shuffle(seed=seed).select(range(math.floor(len(tokenized_datasets["train"])*train_proportion), len(tokenized_datasets["train"])))
test_dataset = tokenized_datasets["test"]

Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/csv/default-eb5690d969448b40/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-dc78cc4bfd375f97.arrow
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/csv/default-eb5690d969448b40/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-dc78cc4bfd375f97.arrow
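The split sizes follow directly from `train_proportion` and the 726 rows left after filtering. The sketch below redoes that arithmetic and shows, with a plain Python shuffle, why slicing one seeded permutation yields disjoint train and evaluation sets. The permutation here is not the one `datasets` produces; only the slicing logic is the same.

```python
import math
import random

n_rows = 726            # rows in the filtered train split (from the output above)
train_proportion = 0.7  # same value as in section 0
seed = 1234

n_train = math.floor(n_rows * train_proportion)
n_eval = n_rows - n_train

# Shuffling with the same seed twice gives the same permutation,
# so the two slices below can never share an index.
indices = list(range(n_rows))
random.Random(seed).shuffle(indices)
train_idx = indices[:n_train]
eval_idx = indices[n_train:]

print(n_train, n_eval)
print(set(train_idx).isdisjoint(eval_idx))
```

The evaluation size of 218 matches the `Num examples = 218` lines in the training logs.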


In [11]:
#### Applying class weights

def class_weight(train_dataset) :
    
    train_labels = np.array(train_dataset["label"])
    class_weights = compute_class_weight(class_weight = 'balanced', classes = np.unique(train_labels), y = train_labels)
    
    weights = torch.tensor(class_weights, dtype = torch.float)
    
    return weights

weights = class_weight(train_dataset)
print(f"Class Weights: {weights}")

Class Weights: tensor([0.7017, 1.7397])
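The `'balanced'` option weights each class by `n_samples / (n_classes * class_count)`. The sketch below applies that formula in plain Python. The label counts (362 negatives, 146 positives) are an assumption: they are not printed in the notebook, but they are the counts that reproduce the weights shown above for a training split of 508 rows.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


# Assumed label distribution, inferred from the printed weights
labels = [0] * 362 + [1] * 146

weights_by_class = balanced_class_weights(labels)
print({c: round(w, 4) for c, w in weights_by_class.items()})
```

The minority class gets the larger weight, so misclassifying a positive example costs roughly 2.5 times as much as misclassifying a negative one.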


# 4. Set model configuration

In [12]:
#### Initialize Ray
ray.shutdown()
ray.init(log_to_driver=False, ignore_reinit_error=True, num_cpus=num_cpus, num_gpus=num_gpus, include_dashboard=False)

####  Load the model 
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=2
        )

#### Define metrics to use for evaluation
def compute_metrics(eval_pred):
    metric1 = load_metric("accuracy")
    metric2 = load_metric("f1")
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = metric1.compute(predictions=predictions, references=labels)["accuracy"]
    f1 = metric2.compute(predictions=predictions, references=labels)["f1"]
    return {"accuracy": accuracy, "f1": f1, "objective": accuracy+f1}

#### Effective batch size = 32 (8 per device x 4 gradient accumulation steps), evaluate every 50 steps
training_args = TrainingArguments(
    output_dir=".",
    do_train=True,
    do_eval=True,
    evaluation_strategy="steps",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=2e-5, # config
    weight_decay=0.1, # config
    adam_beta1=0.1, # config
    adam_beta2=0.1, # config
    adam_epsilon=1.5e-06, # config
    num_train_epochs=15, # config
    max_steps=-1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,  # config
    warmup_steps=0,
    logging_dir="./logs",
    save_strategy="steps",
    no_cuda=num_gpus <= 0, 
    seed=seed,  # config
    bf16=False, # Need torch>=1.10, Ampere GPU with cuda>=11.0
    fp16=True,
    tf32=True, 
    eval_steps = 50,
    load_best_model_at_end=True,
    greater_is_better=True,
    metric_for_best_model="objective", # f1 + acc
    report_to="none",
    skip_memory_metrics=True,
    gradient_checkpointing=True
    )

#### Customize trainer class to apply class weights
class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss
        weight = weights.to(device)
        loss_fct = torch.nn.CrossEntropyLoss(weight=weight)
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
    
trainer = CustomTrainer(
    model_init=model_init,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    )

#### Fix batch_size in each trial
tune_config = {
    "per_device_eval_batch_size": 8,
    "per_device_train_batch_size": 8,
    "max_steps": -1
}

model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels = 2,
                                                           output_attentions = False,
                                                           output_hidden_states = False)

2022-11-04 08:08:33,310	INFO worker.py:1518 -- Started a local Ray instance.
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--xlm-roberta-base/snapshots/f6d161e8f5f6f2ed433fb4023d6cb34146506b3f/config.json
Model config XLMRobertaConfig {
  "_name_or_path": "xlm-roberta-base",
  "architectures": [
    "XLMRobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "xlm-roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.22.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 250002
}

loading weights file pytorch_model.bin from

# 4.5. Hyperparameter Optimization with PBT (Optional)

- To train the model with fixed hyperparameters, skip this step.

In [13]:
#### PBT scheduler
scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="objective",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        "num_train_epochs": tune.randint(2, 20),
        "seed": tune.randint(1, 9999),
        "weight_decay": tune.uniform(0.0, 0.3),
        "learning_rate": tune.uniform(1e-5, 5e-5),
        "warmup_ratio": tune.uniform(0.0, 0.3),
        "adam_beta1": tune.loguniform(1e-2, 1),
        "adam_beta2": tune.loguniform(1e-3, 1),
        "adam_epsilon": tune.loguniform(1e-8, 1e-5),
    }, 
)

#### Define columns to report
reporter = CLIReporter(
    parameter_columns={
        "weight_decay": "w_decay",
        "learning_rate": "lr",
        "per_device_train_batch_size": "train_bs/gpu",
        "num_train_epochs": "num_epochs",
    },
    metric_columns=["eval_f1", "eval_accuracy", "eval_objective", "eval_loss", "epoch", "training_iteration"]
)

#### Early stopping
stopper = tune.stopper.ExperimentPlateauStopper(metric="objective", 
                                                std=std,
                                                top=n_trials,
                                                mode="max",
                                                patience=patience
                                                )

#### HPO
hpo_result = trainer.hyperparameter_search(
    hp_space = lambda _: tune_config,
    direction = "maximize",
    backend="ray",
    reuse_actors = True,
    n_trials=n_trials,
    resources_per_trial={"cpu": num_cpus, "gpu": num_gpus},
    scheduler=scheduler,
    keep_checkpoints_num=1,
    checkpoint_score_attr="training_iteration",
    stop=stopper,
    progress_reporter=reporter,
    local_dir="./test-results",
    name="tune_transformer_pbt",
    log_to_file=True,
)


from ray.air import session

def train(config):
    # ...
    session.report({"metric": metric}, checkpoint=checkpoint)

For more information please see https://docs.ray.io/en/master/ray-air/key-concepts.html#session



== Status ==
Current time: 2022-11-04 08:08:45 (running for 00:00:00.17)
Memory usage on this node: 10.7/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 16.0/16 CPUs, 1.0/1 GPUs, 0.0/16.7 GiB heap, 0.0/8.35 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------|
| _objective_e78fc_00000 | RUNNING  | 172.17.0.3:1125783 |  0.186633 | 2.75091e-05 |              8 |           17 |
| _objective_e78fc_00001 | PENDING  |                    |  0.287442 | 4.50373e-05 |              8 |           13 

2022-11-04 08:10:10,089	INFO pbt.py:552 -- [pbt]: no checkpoint for trial. Skip exploit for Trial _objective_e78fc_00001


Result for _objective_e78fc_00001:
  date: 2022-11-04_08-10-10
  done: false
  epoch: 3.12
  eval_accuracy: 0.6880733944954128
  eval_f1: 0.2765957446808511
  eval_loss: 0.6861276030540466
  eval_objective: 0.9646691391762638
  eval_runtime: 3.2191
  eval_samples_per_second: 67.721
  eval_steps_per_second: 8.698
  experiment_id: 1daf783a50e440818931ac523e6d54ae
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 1.9293382783525277
  pid: 1125783
  time_since_restore: 39.29557251930237
  time_this_iter_s: 39.29557251930237
  time_total_s: 39.29557251930237
  timestamp: 1667549410
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: e78fc_00001
  warmup_time: 0.0033075809478759766
  
== Status ==
Current time: 2022-11-04 08:10:15 (running for 00:01:29.45)
Memory usage on this node: 15.4/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 16.0/16 CPUs, 1.0/1 GPUs, 0.0/16.7 GiB heap, 0.0/8.35 GiB objects (0.0/1.

2022-11-04 08:12:10,534	INFO pbt.py:552 -- [pbt]: no checkpoint for trial. Skip exploit for Trial _objective_e78fc_00004


== Status ==
Current time: 2022-11-04 08:12:10 (running for 00:03:24.64)
Memory usage on this node: 15.4/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 16.0/16 CPUs, 1.0/1 GPUs, 0.0/16.7 GiB heap, 0.0/8.35 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------+-----------------+------------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_f1 |   eval_accuracy |   eval_objective |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------+-----------------+------------

2022-11-04 08:18:38,938	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_e78fc_00003 (score 3.028149752122628) -> _objective_e78fc_00004 (score 2.644963993291901)
2022-11-04 08:18:38,938	INFO pbt.py:636 -- [explore] perturbed config from {'num_train_epochs': 13, 'seed': 4, 'weight_decay': 0.03523301690773688, 'learning_rate': 2.5751294091754602e-05, 'warmup_ratio': 0.1358189427790103, 'adam_beta1': 0.11920532899903177, 'adam_beta2': 0.23543244225077642, 'adam_epsilon': 2.4975203109823145e-07} -> {'num_train_epochs': 7, 'seed': 4, 'weight_decay': 0.2440586785423921, 'learning_rate': 2.0601035273403682e-05, 'warmup_ratio': 0.10865515422320825, 'adam_beta1': 0.09536426319922542, 'adam_beta2': 0.18834595380062114, 'adam_epsilon': 1.9980162487858518e-07}


Result for _objective_e78fc_00004:
  date: 2022-11-04_08-18-38
  done: false
  episodes_total: 0
  epoch: 6.25
  eval_accuracy: 0.6880733944954128
  eval_f1: 0.6344086021505376
  eval_loss: 0.5661541223526001
  eval_objective: 1.3224819966459505
  eval_runtime: 3.7894
  eval_samples_per_second: 57.529
  eval_steps_per_second: 7.389
  experiment_id: 1daf783a50e440818931ac523e6d54ae
  hostname: 3481a8a2ae33
  iterations_since_restore: 2
  node_ip: 172.17.0.3
  objective: 2.644963993291901
  pid: 1125783
  time_since_restore: 77.77604556083679
  time_this_iter_s: 37.46433091163635
  time_total_s: 117.94415378570557
  timestamp: 1667549918
  timesteps_since_restore: 0
  timesteps_total: 0
  training_iteration: 2
  trial_id: e78fc_00004
  warmup_time: 0.0033075809478759766
  
== Status ==
Current time: 2022-11-04 08:18:44 (running for 00:09:58.28)
Memory usage on this node: 15.4/31.1 GiB
PopulationBasedTraining: 5 checkpoints, 1 perturbs
Resources requested: 16.0/16 CPUs, 1.0/1 GPUs, 0.0/16

2022-11-04 08:32:02,470	INFO tune.py:758 -- Total run time: 1396.83 seconds (1396.50 seconds for the tuning loop).
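The `[explore]` log line above gives a concrete view of what a PBT perturbation does. The sketch below takes the before and after values of the five continuous hyperparameters that changed smoothly and computes their ratios. All values are copied from the log; the interpretation (a multiplicative perturbation by 0.8, with `num_train_epochs` and `weight_decay` presumably resampled from their search spaces) is an inference from those numbers, not something the log states.

```python
import math

# Values copied from the "[explore] perturbed config" log line
before = {
    "learning_rate": 2.5751294091754602e-05,
    "warmup_ratio": 0.1358189427790103,
    "adam_beta1": 0.11920532899903177,
    "adam_beta2": 0.23543244225077642,
    "adam_epsilon": 2.4975203109823145e-07,
}
after = {
    "learning_rate": 2.0601035273403682e-05,
    "warmup_ratio": 0.10865515422320825,
    "adam_beta1": 0.09536426319922542,
    "adam_beta2": 0.18834595380062114,
    "adam_epsilon": 1.9980162487858518e-07,
}

# Ratio of the new value to the old value for each hyperparameter
ratios = {name: after[name] / before[name] for name in before}

for name, ratio in ratios.items():
    print(f"{name}: x{ratio:.3f}")

all_scaled_by_0_8 = all(math.isclose(r, 0.8, rel_tol=1e-9) for r in ratios.values())
print(all_scaled_by_0_8)
```

In the same log line, `num_train_epochs` goes from 13 to 7 and `weight_decay` from about 0.035 to about 0.244, which does not fit a fixed factor, while `seed` stays at 4.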


In [14]:
hpo_result

BestRun(run_id='e78fc_00001', objective=3.4580555738895122, hyperparameters={'per_device_eval_batch_size': 8, 'per_device_train_batch_size': 8, 'max_steps': -1, 'num_train_epochs': 13, 'seed': 1183, 'weight_decay': 0.28744180610511155, 'learning_rate': 4.503730538968379e-05, 'warmup_ratio': 0.10734518098736, 'adam_beta1': 0.10045932391231586, 'adam_beta2': 0.11230233998349251, 'adam_epsilon': 1.3743776400634128e-06})
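The reported `objective` is twice `eval_objective` in every trial result above (for example 1.929 against 0.965). This is consistent with the search objective being the sum of all evaluation metrics other than the loss, epoch, and speed statistics, which appears to be the default in `transformers` when no `compute_objective` is passed to `hyperparameter_search`. Because `compute_metrics` already returns `objective = accuracy + f1`, that value is counted a second time. The sketch below reimplements the summation as a hypothetical helper and checks it against the logged numbers.

```python
def summed_objective(metrics):
    """Hypothetical re-implementation: sum every metric except loss, epoch, and speed stats."""
    kept = {
        name: value
        for name, value in metrics.items()
        if name not in ("eval_loss", "epoch")
        and not name.endswith("_runtime")
        and not name.endswith("_per_second")
    }
    return sum(kept.values())


# Values copied from the first "Result for _objective_e78fc_00001" block
logged = {
    "eval_accuracy": 0.6880733944954128,
    "eval_f1": 0.2765957446808511,
    "eval_loss": 0.6861276030540466,
    "eval_objective": 0.9646691391762638,
    "eval_runtime": 3.2191,
    "eval_samples_per_second": 67.721,
    "eval_steps_per_second": 8.698,
    "epoch": 3.12,
}

objective = summed_objective(logged)
print(objective)
```

Read this way, the best run's objective of 3.458 corresponds to an accuracy plus F1 of about 1.729. The ranking of trials is unaffected, since every score is doubled in the same way.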

In [15]:
for n, v in hpo_result.hyperparameters.items():
    setattr(trainer.args, n, v)

In [16]:
trainer.args

TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.10045932391231586,
adam_beta2=0.11230233998349251,
adam_epsilon=1.3743776400634128e-06,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=50,
evaluation_strategy=steps,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=4,
gradient_checkpointing=True,
greater_is_better=True,
group_by_length=False,
half_precision_backend=cuda_amp,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_f

# 5. Train, evaluate the model

In [27]:
train_history = trainer.train()

loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--xlm-roberta-base/snapshots/f6d161e8f5f6f2ed433fb4023d6cb34146506b3f/config.json
Model config XLMRobertaConfig {
  "_name_or_path": "xlm-roberta-base",
  "architectures": [
    "XLMRobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "xlm-roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.22.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 250002
}

loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--xlm-roberta-base/snapshots/f6d

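| Step | Training Loss | Validation Loss | Accuracy | F1 | Objective |
|------|---------------|-----------------|----------|----|-----------|
| 50 | No log | 0.663442 | 0.701835 | 0.444444 | 1.146279 |
| 100 | No log | 0.591404 | 0.784404 | 0.605042 | 1.389446 |
| 150 | No log | 0.459009 | 0.830275 | 0.729927 | 1.560202 |
| 200 | No log | 0.432545 | 0.839450 | 0.761905 | 1.601354 |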
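The step numbers in this table can be reconstructed from the training settings. The sketch below assumes 508 training rows (726 filtered rows minus the 218 evaluation rows reported in the logs), a per-device batch size of 8, 4 gradient accumulation steps, and the 13 epochs selected by the search. It mirrors how the number of optimizer updates is usually derived; it does not call the `Trainer` itself.

```python
import math

n_train = 508          # derived: 726 filtered rows - 218 evaluation rows
per_device_batch = 8   # per_device_train_batch_size
grad_accum = 4         # gradient_accumulation_steps
num_epochs = 13        # num_train_epochs from the best run
eval_every = 50        # eval_steps

# Batches per epoch (the last partial batch is kept)
batches_per_epoch = math.ceil(n_train / per_device_batch)
# One optimizer update per `grad_accum` batches
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_updates = math.ceil(num_epochs * updates_per_epoch)

# Steps at which evaluation runs during training
eval_steps = list(range(eval_every, total_updates + 1, eval_every))
epoch_at_first_eval = eval_every / updates_per_epoch

print(batches_per_epoch, updates_per_epoch, total_updates)
print(eval_steps, epoch_at_first_eval)
```

With 16 updates per epoch, step 50 falls at epoch 3.125, which matches the `epoch: 3.12` entries in the tuning logs. The `No log` entries are expected if the logging interval is left at its default of 500 steps, which is longer than the whole run.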


The following columns in the evaluation set don't have a corresponding argument in `XLMRobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `XLMRobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 218
  Batch size = 8


In [29]:
eval_result = trainer.evaluate()
eval_result

The following columns in the evaluation set don't have a corresponding argument in `XLMRobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `XLMRobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 218
  Batch size = 8


{'eval_loss': 0.4464360475540161,
 'eval_accuracy': 0.8577981651376146,
 'eval_f1': 0.7801418439716311,
 'eval_objective': 1.6379400091092458,
 'eval_runtime': 2.9376,
 'eval_samples_per_second': 74.211,
 'eval_steps_per_second': 9.532,
 'epoch': 13.0}

# 6. Test results

In [19]:
pred = trainer.predict(test_dataset=test_dataset)
pred

The following columns in the test set don't have a corresponding argument in `XLMRobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `XLMRobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 200
  Batch size = 8


PredictionOutput(predictions=array([[ 1.656  , -1.657  ],
       [ 2.58   , -2.438  ],
       [-1.841  ,  1.636  ],
       [-1.617  ,  1.385  ],
       [-2.443  ,  2.299  ],
       [ 2.182  , -2.158  ],
       [ 2.613  , -2.436  ],
       [-0.6514 ,  0.515  ],
       [ 2.592  , -2.506  ],
       [-1.197  ,  0.9746 ],
       [ 2.678  , -2.459  ],
       [ 2.613  , -2.41   ],
       [ 2.467  , -2.367  ],
       [-0.9907 ,  0.762  ],
       [-2.63   ,  2.387  ],
       [-2.107  ,  2.088  ],
       [-2.182  ,  1.978  ],
       [ 2.592  , -2.506  ],
       [ 2.137  , -2.156  ],
       [ 2.664  , -2.443  ],
       [-2.121  ,  2.088  ],
       [ 2.6    , -2.475  ],
       [ 2.6    , -2.475  ],
       [-2.459  ,  2.191  ],
       [ 2.479  , -2.332  ],
       [-2.514  ,  2.273  ],
       [ 1.694  , -1.691  ],
       [ 2.627  , -2.436  ],
       [-0.01848, -0.2177 ],
       [-0.57   ,  0.3965 ],
       [-2.443  ,  2.227  ],
       [-2.285  ,  2.02   ],
       [ 1.109  , -1.199  ],
       [ 2.549

In [20]:
label_test = list(pred.label_ids)
pred_test = list(map(lambda x: x.index(max(x)), pred.predictions.tolist()))
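`pred.predictions` holds raw logits, one pair per example, and the predicted label is simply the index of the larger value. The sketch below repeats that argmax on three rows copied from the output above and also converts them to probabilities with a softmax, which is useful if a decision threshold other than 0.5 is needed. The `softmax` helper is illustrative and not part of the notebook.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Three rows copied from the PredictionOutput above
logits_rows = [
    [1.656, -1.657],
    [-1.841, 1.636],
    [-0.01848, -0.2177],
]

# Same rule as in the cell above: index of the largest logit
pred_labels = [row.index(max(row)) for row in logits_rows]
probs = [softmax(row) for row in logits_rows]

print(pred_labels)
for row in probs:
    print([round(p, 3) for p in row])
```

The third row is a borderline case: the two logits are close, so the probability of class 0 is only slightly above one half.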

In [21]:
print(confusion_matrix(label_test, pred_test))

[[117  22]
 [  7  54]]


In [22]:
accuracy = accuracy_score(label_test, pred_test)
f1 = f1_score(label_test, pred_test)
recall = recall_score(label_test, pred_test)
precision = precision_score(label_test, pred_test)

print(accuracy)
print(f1)
print(recall)
print(precision)

0.855
0.7883211678832117
0.8852459016393442
0.7105263157894737
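The four scores printed above (accuracy, F1, recall, precision, in that order) can be checked by hand from the confusion matrix. scikit-learn lays the matrix out with true labels as rows and predicted labels as columns, so for a binary problem it reads `[[TN, FP], [FN, TP]]`.

```python
# Confusion matrix from the cell above: rows are true labels, columns are predictions
tn, fp = 117, 22
fn, tp = 7, 54

total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy)
print(f1)
print(recall)
print(precision)
```

Recall is noticeably higher than precision, which is the expected direction when the positive class is up-weighted in the loss.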


# 7. Save the model

In [23]:
model_path = "sev_exam_1.0"
trainer.model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

Configuration saved in sev_exam_1.0/config.json
Model weights saved in sev_exam_1.0/pytorch_model.bin
tokenizer config file saved in sev_exam_1.0/tokenizer_config.json
Special tokens file saved in sev_exam_1.0/special_tokens_map.json


('sev_exam_1.0/tokenizer_config.json',
 'sev_exam_1.0/special_tokens_map.json',
 'sev_exam_1.0/sentencepiece.bpe.model',
 'sev_exam_1.0/added_tokens.json',
 'sev_exam_1.0/tokenizer.json')

In [24]:
# load model / pred

# load_model = AutoModelForSequenceClassification.from_pretrained("sev_exam_1.0/")
# load_tokenizer = AutoTokenizer.from_pretrained("sev_exam_1.0/")

# References

https://bo-10000.tistory.com/154  
https://huggingface.co/blog/ray-tune  
https://docs.ray.io/en/latest/tune/examples/pbt_transformers.html  
https://wood-b.github.io/post/a-novices-guide-to-hyperparameter-optimization-at-scale/#schedulers-vs-search-algorithms  
https://docs.ray.io/en/latest/tune/api_docs/search_space.html  
https://docs.ray.io/en/latest/tune/tutorials/tune-advanced-tutorial.html  
https://keras.io/examples/keras_recipes/sample_size_estimate/  
https://www.topbots.com/fine-tune-transformers-in-pytorch/  
https://docs.ray.io/en/latest/tune/api_docs/schedulers.html  
https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/  
https://docs.ray.io/en/latest/tune/faq.html  
https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#population-based-training-tune-schedulers-populationbasedtraining  
https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.hyperparameter_search  
https://docs.ray.io/en/latest/tune/api_docs/suggestion.html#optuna-tune-search-optuna-optunasearch  
https://kyunghyunlim.github.io/nlp/ml_ai/2021/09/22/hugging_face_5.html  

# Future Challenges
 - Add visualizations of how performance improves as the number of training steps increases and how it varies with the hyperparameter combination