# 0. GPU check

* 이 코드는 Nvidia GPU를 사용하는 컴퓨터에서, train / test 데이터가 분리되어있는 csv 파일을 사용하는 것을 전제로 작성됨

In [1]:
import torch

if torch.cuda.is_available():
    device_count = torch.cuda.device_count()
    print("device_count: {}".format(device_count))
    for device_num in range(device_count):
        print("device {} capability {}".format(
            device_num,
            torch.cuda.get_device_capability(device_num)))
        print("device {} name {}".format(
            device_num, 
            torch.cuda.get_device_name(device_num)))
else:
    print("no cuda device")

device_count: 1
device 0 capability (8, 6)
device 0 name NVIDIA GeForce RTX 3080


In [2]:
from pynvml import *

def print_gpu_utilization():
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    info = nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory occupied: {info.used//1024**2} MB.")

def print_summary(result):
    print(f"Time: {result.metrics['train_runtime']:.2f}")
    print(f"Samples/second: {result.metrics['train_samples_per_second']:.2f}")
    print_gpu_utilization()
    
print_gpu_utilization()

GPU memory occupied: 418 MB.


* 모델 훈련과정에서 GPU 메모리 용량 초과 시, 개발서버 콘솔에서 직접 `nvidia-smi` 명령어 실행 후 메모리를 점유하고 있는 process의 PID를 찾아 `sudo kill -9 {pid}` 로 프로세스 종료해주면 됨

# 1. Import packages

In [3]:
## Need to check if packages are compatible
# !pip install accelerate nvidia-ml-py3
# !pip install datasets==2.4.0
# !pip install huggingface_hub==0.9.1
# !pip install transformers==4.22.1 # bf16, tf32 등 사용하려면 4.2 이상 필요
# !pip install pyarrow==9.0.0

* huggingface_hub와 transformers 간 호환가능한 버전 확인 필요
* 만약 성능 테스트를 위해 datasets api를 사용할거라면 datasets 역시 호환 가능 버전 확인해야 함
* 세 가지 dependencies를 사용한다는 가정 하에, pyarrow 라이브러리도 필요.

In [4]:
## Install libraries for optimizing hyperparameters

# !pip install ray optuna
# !pip install sigopt
# !pip install wandb

In [5]:
import transformers
import datasets
import huggingface_hub
import pyarrow

print(transformers.__version__)
print(datasets.__version__)
print(huggingface_hub.__version__)
print(pyarrow.__version__)

# 4.22.1
# 2.4.0
# 0.9.1
# 9.0.0

4.22.1
2.4.0
0.9.1
9.0.0


In [6]:
import os
import re
import math
import numpy as np
import pandas as pd

# 'You can use tf32' if you are acessing Ampere hardware
import torch
torch.backends.cuda.matmul.allow_tf32 = True

from datasets import load_dataset, load_metric, ClassLabel
import ray
from ray import tune
from ray.tune import CLIReporter
from ray.tune.examples.pbt_transformers.utils import (
    download_data,
    build_compute_metrics_fn,
)
from ray.tune.schedulers import PopulationBasedTraining
from transformers import (
    glue_tasks_num_labels,
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    GlueDataset,
    GlueDataTrainingArguments,
    TrainingArguments,
)


# 2. Import Data

* xxx_train.csv, xxx_test.csv 파일은 아래 형식으로 전처리된 csv 파일이어야 함 (column name: `text`, `label`)


<table class="features-table">
  <tr>
    <th class="mdc-text-light-green-600", style="text-align:center">
    text
    </th>
    <th class="mdc-text-purple-600", style="text-align:center">
    label
    </th>
  </tr>
  <tr>
    <td class="mdc-bg-light-green-50" style="text-align:left">
      Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
    </td>
    <td class="mdc-bg-purple-50">
      0
    </td>
  </tr>
  <tr>
    <td class="mdc-bg-light-green-50" style="text-align:left">
      Ok lar... Joking wif u oni...
    </td>
    <td class="mdc-bg-purple-50">
      0
    </td>
  </tr>
  <tr>
    <td class="mdc-bg-light-green-50" style="text-align:left">
      Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)
    </td>
    <td class="mdc-bg-purple-50">
      1
    </td>
  </tr>
  <tr>
    <td class="mdc-bg-light-green-50" style="text-align:left">
      U dun say so early hor... U c already then say...
    </td>
    <td class="mdc-bg-purple-50">
      0
    </td>
  </tr>
  <tr>
    <td class="mdc-bg-light-green-50" style="text-align:left">
      Nah I don't think he goes to usf, he lives around here though
    </td>
    <td class="mdc-bg-purple-50">
      0
    </td>
  </tr>
</table>

In [7]:
data_name = "financial_news" ## covid_articles / financial_news / IMDB / naver_movie_review / spam

dataset = load_dataset('csv', data_files={'train': f'../data_splited/{data_name}_train.csv',
                                          'test': f'../data_splited/{data_name}_test.csv'})
dataset

Using custom data configuration default-b54327dcafa3f6de
Reusing dataset csv (/root/.cache/huggingface/datasets/csv/default-b54327dcafa3f6de/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a)


  0%|          | 0/2 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8602
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2151
    })
})

# 3. Data Preprocessing

* load_dataset 함수로 불러온 데이터를 수정할 때는 수정 내용을 담은 함수를 만들고, 이를 map 함수로 각 원소에 적용함 ([링크](https://huggingface.co/docs/datasets/v1.4.0/processing.html#processing-data-row-by-row)에서 확인)

In [8]:
## remove specal characters

def remove_sp(example):
    example["text"]=re.sub(r'[^a-z|A-Z|0-9|ㄱ-ㅎ|ㅏ-ㅣ|가-힣| ]+', '', str(example["text"]))
    return example

dataset = dataset.map(remove_sp)

Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-b54327dcafa3f6de/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-0a289c34d582783d.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-b54327dcafa3f6de/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-4e0c7b4a17783a4a.arrow


In [9]:
## label encoding

labels = list(set(dataset["train"]["label"]+dataset["test"]["label"]))
num_labels = len(labels)

def encoding_label(example):
    str_to_int = ClassLabel(num_classes=num_labels, names=labels)
    example["label"]=str_to_int.str2int(example["label"])
    return example

if type(labels[0]) == str:
    dataset = dataset.map(encoding_label)

Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-b54327dcafa3f6de/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-59064a61b9ca7b76.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-b54327dcafa3f6de/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-85fcd806e22e628a.arrow


# 4. Load PLM & Tokenizing

In [10]:
model_name = "bert-base-multilingual-cased"
# model_name = "klue/bert-base"
# model_name = "klue/roberta-base"
# model_name = "xlm-roberta-base"

In [11]:
# Download cache tokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)

In [12]:
def tokenize_function(examples):
    tokenized_batch = tokenizer(examples["text"], padding="max_length", truncation=True) # padding : ['longest', 'max_length', 'do_not_pad']
    return tokenized_batch

In [13]:
tokenized_datasets = dataset.map(tokenize_function, batched=True)

Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-b54327dcafa3f6de/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-22deb7b249cc6704.arrow


  0%|          | 0/3 [00:00<?, ?ba/s]

In [14]:
# train_dataset = tokenized_datasets["train"].shuffle(seed=1919).select(range(0,math.floor(len(tokenized_datasets["train"])*0.7)))
# eval_dataset = tokenized_datasets["train"].shuffle(seed=1919).select(range(math.floor(len(tokenized_datasets["train"])*0.7), len(tokenized_datasets["train"])))
# test_dataset = tokenized_datasets["test"]

In [15]:
train_dataset = tokenized_datasets["train"].shuffle(seed=1919).select(range(1000))
eval_dataset = tokenized_datasets["train"].shuffle(seed=1919).select(range(1000))
test_dataset = tokenized_datasets["test"]

Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/csv/default-b54327dcafa3f6de/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-53897b53326c36e8.arrow
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/csv/default-b54327dcafa3f6de/0.0.0/652c3096f041ee27b04d2232d41f10547a8fecda3e284a79a0ec4053c916ef7a/cache-53897b53326c36e8.arrow


In [16]:
train_dataset

Dataset({
    features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 1000
})

# 5. Modeling

In [17]:
task_data_dir = "test-model"
gpus_per_trial = 1
cpus_per_trial = 20
n_trials = 5
metric = load_metric("accuracy") # atasets.list_metrics() 

In [18]:
# Download model and features

config = AutoConfig.from_pretrained(
    model_name, 
    num_labels=num_labels
)

def model_init():
    return AutoModelForSequenceClassification.from_pretrained(
        model_name,
        config=config
        )

In [19]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions=np.argmax(logits, axis = -1)
    return metric.compute(predictions=predictions, references=labels)

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=1,              # total number of training epochs
    per_device_train_batch_size=1,   # batch size per device during training
    per_device_eval_batch_size=10,   # batch size for evaluation
    warmup_steps=1000,               # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=200,               # How often to print logs
    do_train=True,                   # Perform training
    do_eval=True,                    # Perform evaluation
    evaluation_strategy="epoch",     # evalute after each epoch
    gradient_accumulation_steps=64,  # total number of steps before back propagation
    fp16=True,                       # Use mixed precision
    fp16_opt_level="02",             # mixed precision mode
    run_name="ProBert-BFD-MS",       # experiment name
    seed=3                           # Seed for experiment reproducibility 3x3
)
```

In [20]:
training_args = TrainingArguments(
    output_dir=".",
    learning_rate=1e-5,  # config
    do_train=True,
    do_eval=True,
    no_cuda=gpus_per_trial <= 0,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    num_train_epochs=2,  # config
    max_steps=-1,
    per_device_train_batch_size=8,  # config
    per_device_eval_batch_size=8,  # config
    warmup_steps=0,
    weight_decay=0.1,  # config
    logging_dir="./logs",
    skip_memory_metrics=True,
    report_to="none",
    fp16=True,
    # bf16=True,
    # tf32=True,
    gradient_accumulation_steps=4,
    gradient_checkpointing= True
    )

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    )

loading weights file bert-base-multilingual-cased/pytorch_model.bin
Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of 

In [21]:
# 김수빈 선생님
# # You can modify config arguments to change hyper-parameters
# def set_config(modeltype) :
    
#     # config = {"seed" : 818,
#     #           "model_type" : modeltype,
#     #           "num_labels" : 2,
#     #           "bias_correction" : tune.grid_search([True, False]),
#     #           "batch_size" : 6,
#     #           "eps" : 1e-8,
#     #           "warmup" : tune.grid_search([0,0.1]),
#     #           "lr" : tune.grid_search([2e-5, 1e-6, 25e-6])}
    
#     config = {"seed" : 818,
#               "model_type" : modeltype,
#               "num_labels" : 2,
#               "bias_correction" : True,
#               "batch_size" : 8,
#               "eps" : 1e-8,
#               "warmup" : 0.1,
#               "beta" : (0.9, 0.999),
#               "lr" : 2e-5}

In [None]:
# Hyperparameter tuning with ray tune

tune_config = {
    "per_device_train_batch_size": 32, 
    "per_device_eval_batch_size": 32,
    "num_train_epochs": tune.choice([2, 3]),
    "max_steps": 10 # -1
}

# PopulationBasedTraining
# worker might copy the model parameters from a better performing worker or explore new hyperparameters by changing the current values randomly
# cf. ASHAScheduler
scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="eval_accuracy",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        "weight_decay": tune.uniform(0.0, 0.3), # tune.uniform(1, 10) == np.random.uniform(1, 10)
        "learning_rate": tune.uniform(1e-5, 5e-5),
        "per_device_train_batch_size": [32],
    },
)

reporter = CLIReporter(
    parameter_columns={
        "weight_decay": "w_decay",
        "learning_rate": "lr",
        "per_device_train_batch_size": "train_bs/gpu",
        "num_train_epochs": "num_epochs",
    },
    metric_columns=["eval_accuracy", "eval_loss", "epoch", "training_iteration"],
)

result = trainer.hyperparameter_search(
    hp_space = lambda _: tune_config,
    backend="ray",
    n_trials=n_trials,
    resources_per_trial={"cpu": cpus_per_trial, "gpu": gpus_per_trial},
    scheduler=scheduler,
    keep_checkpoints_num=1,
    checkpoint_score_attr="training_iteration",
    stop=None,
    progress_reporter=reporter,
    local_dir="./test-results",
    name="tune_transformer_pbt",
    log_to_file=True,
)

2022-10-07 07:28:29,986	INFO worker.py:1518 -- Started a local Ray instance.

from ray.air import session

def train(config):
    # ...
    session.report({"metric": metric}, checkpoint=checkpoint)

For more information please see https://docs.ray.io/en/master/ray-air/key-concepts.html#session

[2m[36m(pid=1812654)[0m 2022-10-07 07:28:38.146109: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:28:36 (running for 00:00:00.17)
Memory usage on this node: 8.0/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------|
| _objective_a80be_00000 | RUNNING  | 172.17.0.3:1812654 | 0.238963  | 1.73374e-05 |             32 |            2 |
| _objective_a80be_00001 | PENDING  |                    | 0.179598  | 1.62407e-05 |             32 |            3 

[2m[36m(_objective pid=1812654)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight']
[2m[36m(_objective pid=1812654)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1812654)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:28:49 (running for 00:00:12.45)
Memory usage on this node: 12.6/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------|
| _objective_a80be_00000 | RUNNING  | 172.17.0.3:1812654 | 0.238963  | 1.73374e-05 |             32 |            2 |
| _objective_a80be_00001 | PENDING  |                    | 0.179598  | 1.62407e-05 |             32 |            3

 30%|███       | 3/10 [00:06<00:16,  2.30s/it]
 40%|████      | 4/10 [00:09<00:13,  2.30s/it]


== Status ==
Current time: 2022-10-07 07:28:54 (running for 00:00:17.46)
Memory usage on this node: 12.6/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------|
| _objective_a80be_00000 | RUNNING  | 172.17.0.3:1812654 | 0.238963  | 1.73374e-05 |             32 |            2 |
| _objective_a80be_00001 | PENDING  |                    | 0.179598  | 1.62407e-05 |             32 |            3

 50%|█████     | 5/10 [00:11<00:11,  2.31s/it]
 60%|██████    | 6/10 [00:13<00:09,  2.31s/it]


== Status ==
Current time: 2022-10-07 07:28:59 (running for 00:00:22.46)
Memory usage on this node: 12.6/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------|
| _objective_a80be_00000 | RUNNING  | 172.17.0.3:1812654 | 0.238963  | 1.73374e-05 |             32 |            2 |
| _objective_a80be_00001 | PENDING  |                    | 0.179598  | 1.62407e-05 |             32 |            3

 70%|███████   | 7/10 [00:16<00:06,  2.31s/it]
 80%|████████  | 8/10 [00:18<00:04,  2.18s/it]
[2m[36m(_objective pid=1812654)[0m 
  0%|          | 0/32 [00:00<?, ?it/s][A
[2m[36m(_objective pid=1812654)[0m 
  6%|▋         | 2/32 [00:00<00:02, 13.12it/s][A
[2m[36m(_objective pid=1812654)[0m 
 12%|█▎        | 4/32 [00:00<00:03,  8.25it/s][A
[2m[36m(_objective pid=1812654)[0m 
 16%|█▌        | 5/32 [00:00<00:03,  7.66it/s][A
[2m[36m(_objective pid=1812654)[0m 
 19%|█▉        | 6/32 [00:00<00:03,  7.26it/s][A
[2m[36m(_objective pid=1812654)[0m 
 22%|██▏       | 7/32 [00:00<00:03,  7.00it/s][A
[2m[36m(_objective pid=1812654)[0m 
 25%|██▌       | 8/32 [00:01<00:03,  6.87it/s][A
[2m[36m(_objective pid=1812654)[0m 
 28%|██▊       | 9/32 [00:01<00:03,  6.77it/s][A


== Status ==
Current time: 2022-10-07 07:29:04 (running for 00:00:27.46)
Memory usage on this node: 12.6/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------|
| _objective_a80be_00000 | RUNNING  | 172.17.0.3:1812654 | 0.238963  | 1.73374e-05 |             32 |            2 |
| _objective_a80be_00001 | PENDING  |                    | 0.179598  | 1.62407e-05 |             32 |            3

[2m[36m(_objective pid=1812654)[0m 
 31%|███▏      | 10/32 [00:01<00:03,  6.69it/s][A
[2m[36m(_objective pid=1812654)[0m 
 34%|███▍      | 11/32 [00:01<00:03,  6.65it/s][A
[2m[36m(_objective pid=1812654)[0m 
 38%|███▊      | 12/32 [00:01<00:03,  6.62it/s][A
[2m[36m(_objective pid=1812654)[0m 
 41%|████      | 13/32 [00:01<00:02,  6.59it/s][A
[2m[36m(_objective pid=1812654)[0m 
 44%|████▍     | 14/32 [00:01<00:02,  6.57it/s][A
[2m[36m(_objective pid=1812654)[0m 
 47%|████▋     | 15/32 [00:02<00:02,  6.56it/s][A
[2m[36m(_objective pid=1812654)[0m 
 50%|█████     | 16/32 [00:02<00:02,  6.55it/s][A
[2m[36m(_objective pid=1812654)[0m 
 53%|█████▎    | 17/32 [00:02<00:02,  6.55it/s][A
[2m[36m(_objective pid=1812654)[0m 
 56%|█████▋    | 18/32 [00:02<00:02,  6.55it/s][A
[2m[36m(_objective pid=1812654)[0m 
 59%|█████▉    | 19/32 [00:02<00:01,  6.55it/s][A
[2m[36m(_objective pid=1812654)[0m 
 62%|██████▎   | 20/32 [00:02<00:01,  6.55it/s][A
[2m[36m(

[2m[36m(_objective pid=1812654)[0m {'eval_loss': 1.0783603191375732, 'eval_accuracy': 0.38, 'eval_runtime': 4.7968, 'eval_samples_per_second': 208.474, 'eval_steps_per_second': 6.671, 'epoch': 1.0}
== Status ==
Current time: 2022-10-07 07:29:09 (running for 00:00:32.46)
Memory usage on this node: 12.6/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------|
| _objective_a80be_00000 | RUN

 80%|████████  | 8/10 [00:27<00:06,  3.44s/it]
[2m[36m(pid=1812813)[0m 2022-10-07 07:29:13.892397: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:29:17 (running for 00:00:41.03)
Memory usage on this node: 14.2/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (1 PAUSED, 3 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 |

[2m[36m(_objective pid=1812813)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
[2m[36m(_objective pid=1812813)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1812813)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:29:22 (running for 00:00:46.05)
Memory usage on this node: 14.6/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (1 PAUSED, 3 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 |

 20%|██        | 2/10 [00:04<00:18,  2.32s/it]
 30%|███       | 3/10 [00:06<00:16,  2.31s/it]


== Status ==
Current time: 2022-10-07 07:29:27 (running for 00:00:51.05)
Memory usage on this node: 14.6/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (1 PAUSED, 3 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 |

 40%|████      | 4/10 [00:09<00:13,  2.31s/it]
 50%|█████     | 5/10 [00:11<00:11,  2.31s/it]


== Status ==
Current time: 2022-10-07 07:29:32 (running for 00:00:56.05)
Memory usage on this node: 14.6/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (1 PAUSED, 3 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 |

 60%|██████    | 6/10 [00:13<00:09,  2.32s/it]
 70%|███████   | 7/10 [00:16<00:06,  2.32s/it]


== Status ==
Current time: 2022-10-07 07:29:37 (running for 00:01:01.06)
Memory usage on this node: 14.6/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (1 PAUSED, 3 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 |

 80%|████████  | 8/10 [00:18<00:04,  2.18s/it]
[2m[36m(_objective pid=1812813)[0m 
  0%|          | 0/32 [00:00<?, ?it/s][A
[2m[36m(_objective pid=1812813)[0m 
  6%|▋         | 2/32 [00:00<00:02, 13.03it/s][A
[2m[36m(_objective pid=1812813)[0m 
 12%|█▎        | 4/32 [00:00<00:03,  8.20it/s][A
[2m[36m(_objective pid=1812813)[0m 
 16%|█▌        | 5/32 [00:00<00:03,  7.63it/s][A
[2m[36m(_objective pid=1812813)[0m 
 19%|█▉        | 6/32 [00:00<00:03,  7.26it/s][A
[2m[36m(_objective pid=1812813)[0m 
 22%|██▏       | 7/32 [00:00<00:03,  7.02it/s][A
[2m[36m(_objective pid=1812813)[0m 
 25%|██▌       | 8/32 [00:01<00:03,  6.86it/s][A
[2m[36m(_objective pid=1812813)[0m 
 28%|██▊       | 9/32 [00:01<00:03,  6.74it/s][A
[2m[36m(_objective pid=1812813)[0m 
 31%|███▏      | 10/32 [00:01<00:03,  6.67it/s][A
[2m[36m(_objective pid=1812813)[0m 
 34%|███▍      | 11/32 [00:01<00:03,  6.62it/s][A
[2m[36m(_objective pid=1812813)[0m 
 38%|███▊      | 12/32 [00:01<

== Status ==
Current time: 2022-10-07 07:29:42 (running for 00:01:06.07)
Memory usage on this node: 14.6/31.1 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (1 PAUSED, 3 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 |

[2m[36m(_objective pid=1812813)[0m 
 91%|█████████ | 29/32 [00:04<00:00,  6.48it/s][A
[2m[36m(_objective pid=1812813)[0m 
 94%|█████████▍| 30/32 [00:04<00:00,  6.49it/s][A
[2m[36m(_objective pid=1812813)[0m 
 97%|█████████▋| 31/32 [00:04<00:00,  6.48it/s][A
                                              
 80%|████████  | 8/10 [00:22<00:04,  2.18s/it] 
100%|██████████| 32/32 [00:04<00:00,  6.48it/s][A
                                               [A


[2m[36m(_objective pid=1812813)[0m {'eval_loss': 1.0746122598648071, 'eval_accuracy': 0.389, 'eval_runtime': 4.8193, 'eval_samples_per_second': 207.501, 'eval_steps_per_second': 6.64, 'epoch': 1.0}
Result for _objective_a80be_00001:
  date: 2022-10-07_07-29-44
  done: false
  epoch: 1.0
  eval_accuracy: 0.389
  eval_loss: 1.0746122598648071
  eval_runtime: 4.8193
  eval_samples_per_second: 207.501
  eval_steps_per_second: 6.64
  experiment_id: 3af4f1590ef54b5c98d3f1422095ddc2
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.389
  pid: 1812813
  should_checkpoint: true
  time_since_restore: 30.160179615020752
  time_this_iter_s: 30.160179615020752
  time_total_s: 30.160179615020752
  timestamp: 1665127784
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: a80be_00001
  warmup_time: 0.0017380714416503906
  
== Status ==
Current time: 2022-10-07 07:29:47 (running for 00:01:11.24)
Memory usage on this node: 20.5/31.1 GiB
Populatio

[2m[36m(pid=1813058)[0m 2022-10-07 07:30:01.221983: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:30:02 (running for 00:01:26.02)
Memory usage on this node: 15.6/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (2 PAUSED, 2 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 |

[2m[36m(_objective pid=1813058)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
[2m[36m(_objective pid=1813058)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1813058)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:30:12 (running for 00:01:36.11)
Memory usage on this node: 20.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (2 PAUSED, 2 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 |

[2m[36m(raylet)[0m Spilled 6785 MiB, 4 objects, write throughput 436 MiB/s. Set RAY_verbose_spill_logs=0 to disable this message.
[2m[36m(_objective pid=1813058)[0m   nn.utils.clip_grad_norm_(
 20%|██        | 2/10 [00:05<00:24,  3.05s/it]
 30%|███       | 3/10 [00:08<00:18,  2.71s/it]


== Status ==
Current time: 2022-10-07 07:30:17 (running for 00:01:41.12)
Memory usage on this node: 18.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (2 PAUSED, 2 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 |

 40%|████      | 4/10 [00:10<00:15,  2.55s/it]
 50%|█████     | 5/10 [00:12<00:12,  2.47s/it]


== Status ==
Current time: 2022-10-07 07:30:22 (running for 00:01:46.12)
Memory usage on this node: 18.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (2 PAUSED, 2 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 |

 60%|██████    | 6/10 [00:15<00:09,  2.42s/it]
 70%|███████   | 7/10 [00:17<00:07,  2.39s/it]


== Status ==
Current time: 2022-10-07 07:30:27 (running for 00:01:51.12)
Memory usage on this node: 18.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (2 PAUSED, 2 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 |

 80%|████████  | 8/10 [00:19<00:04,  2.26s/it]
[2m[36m(_objective pid=1813058)[0m 
  0%|          | 0/32 [00:00<?, ?it/s][A
[2m[36m(_objective pid=1813058)[0m 
  6%|▋         | 2/32 [00:00<00:02, 13.06it/s][A
[2m[36m(_objective pid=1813058)[0m 
 12%|█▎        | 4/32 [00:00<00:03,  8.21it/s][A
[2m[36m(_objective pid=1813058)[0m 
 16%|█▌        | 5/32 [00:00<00:03,  7.63it/s][A
[2m[36m(_objective pid=1813058)[0m 
 19%|█▉        | 6/32 [00:00<00:03,  7.27it/s][A
[2m[36m(_objective pid=1813058)[0m 
 22%|██▏       | 7/32 [00:00<00:03,  7.04it/s][A
[2m[36m(_objective pid=1813058)[0m 
 25%|██▌       | 8/32 [00:01<00:03,  6.87it/s][A
[2m[36m(_objective pid=1813058)[0m 
 28%|██▊       | 9/32 [00:01<00:03,  6.77it/s][A
[2m[36m(_objective pid=1813058)[0m 
 31%|███▏      | 10/32 [00:01<00:03,  6.70it/s][A
[2m[36m(_objective pid=1813058)[0m 
 34%|███▍      | 11/32 [00:01<00:03,  6.65it/s][A
[2m[36m(_objective pid=1813058)[0m 
 38%|███▊      | 12/32 [00:01<

== Status ==
Current time: 2022-10-07 07:30:32 (running for 00:01:56.12)
Memory usage on this node: 18.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (2 PAUSED, 2 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 |

[2m[36m(_objective pid=1813058)[0m 
 88%|████████▊ | 28/32 [00:04<00:00,  6.53it/s][A
[2m[36m(_objective pid=1813058)[0m 
 91%|█████████ | 29/32 [00:04<00:00,  6.53it/s][A
[2m[36m(_objective pid=1813058)[0m 
 94%|█████████▍| 30/32 [00:04<00:00,  6.53it/s][A
[2m[36m(_objective pid=1813058)[0m 
 97%|█████████▋| 31/32 [00:04<00:00,  6.53it/s][A
                                              
 80%|████████  | 8/10 [00:24<00:04,  2.26s/it] 
100%|██████████| 32/32 [00:04<00:00,  6.53it/s][A
                                               [A


[2m[36m(_objective pid=1813058)[0m {'eval_loss': 1.0850415229797363, 'eval_accuracy': 0.38, 'eval_runtime': 4.8142, 'eval_samples_per_second': 207.72, 'eval_steps_per_second': 6.647, 'epoch': 1.0}
Result for _objective_a80be_00002:
  date: 2022-10-07_07-30-35
  done: false
  epoch: 1.0
  eval_accuracy: 0.38
  eval_loss: 1.0850415229797363
  eval_runtime: 4.8142
  eval_samples_per_second: 207.72
  eval_steps_per_second: 6.647
  experiment_id: 62198dbcb6a34414904527cb83b8ed3a
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.38
  pid: 1813058
  should_checkpoint: true
  time_since_restore: 32.43216133117676
  time_this_iter_s: 32.43216133117676
  time_total_s: 32.43216133117676
  timestamp: 1665127835
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: a80be_00002
  warmup_time: 0.002826690673828125
  


 80%|████████  | 8/10 [00:28<00:07,  3.59s/it]
[2m[36m(raylet)[0m Spilled 8821 MiB, 5 objects, write throughput 520 MiB/s.
[2m[36m(pid=1813260)[0m 2022-10-07 07:30:41.353651: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:30:38 (running for 00:02:02.01)
Memory usage on this node: 14.0/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (3 PAUSED, 1 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 |

[2m[36m(_objective pid=1813260)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight']
[2m[36m(_objective pid=1813260)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1813260)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:30:52 (running for 00:02:15.90)
Memory usage on this node: 18.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (3 PAUSED, 1 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 |

 20%|██        | 2/10 [00:04<00:18,  2.31s/it]
 30%|███       | 3/10 [00:06<00:16,  2.31s/it]
 40%|████      | 4/10 [00:09<00:13,  2.31s/it]


== Status ==
Current time: 2022-10-07 07:30:57 (running for 00:02:21.08)
Memory usage on this node: 18.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (3 PAUSED, 1 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 |

 50%|█████     | 5/10 [00:11<00:11,  2.33s/it]
 60%|██████    | 6/10 [00:13<00:09,  2.33s/it]


== Status ==
Current time: 2022-10-07 07:31:02 (running for 00:02:26.09)
Memory usage on this node: 18.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (3 PAUSED, 1 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 |

 70%|███████   | 7/10 [00:16<00:06,  2.32s/it]
 80%|████████  | 8/10 [00:18<00:04,  2.19s/it]
[2m[36m(_objective pid=1813260)[0m 
  0%|          | 0/32 [00:00<?, ?it/s][A
[2m[36m(_objective pid=1813260)[0m 
  6%|▋         | 2/32 [00:00<00:02, 13.06it/s][A
[2m[36m(_objective pid=1813260)[0m 
 12%|█▎        | 4/32 [00:00<00:03,  8.22it/s][A
[2m[36m(_objective pid=1813260)[0m 
 16%|█▌        | 5/32 [00:00<00:03,  7.64it/s][A
[2m[36m(_objective pid=1813260)[0m 
 19%|█▉        | 6/32 [00:00<00:03,  7.28it/s][A
[2m[36m(_objective pid=1813260)[0m 
 22%|██▏       | 7/32 [00:00<00:03,  7.04it/s][A
[2m[36m(_objective pid=1813260)[0m 
 25%|██▌       | 8/32 [00:01<00:03,  6.88it/s][A
[2m[36m(_objective pid=1813260)[0m 
 28%|██▊       | 9/32 [00:01<00:03,  6.77it/s][A
[2m[36m(_objective pid=1813260)[0m 
 31%|███▏      | 10/32 [00:01<00:03,  6.70it/s][A


== Status ==
Current time: 2022-10-07 07:31:07 (running for 00:02:31.09)
Memory usage on this node: 18.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (3 PAUSED, 1 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 |

[2m[36m(_objective pid=1813260)[0m 
 34%|███▍      | 11/32 [00:01<00:03,  6.64it/s][A
[2m[36m(_objective pid=1813260)[0m 
 38%|███▊      | 12/32 [00:01<00:03,  6.61it/s][A
[2m[36m(_objective pid=1813260)[0m 
 41%|████      | 13/32 [00:01<00:02,  6.58it/s][A
[2m[36m(_objective pid=1813260)[0m 
 44%|████▍     | 14/32 [00:01<00:02,  6.57it/s][A
[2m[36m(_objective pid=1813260)[0m 
 47%|████▋     | 15/32 [00:02<00:02,  6.56it/s][A
[2m[36m(_objective pid=1813260)[0m 
 50%|█████     | 16/32 [00:02<00:02,  6.55it/s][A
[2m[36m(_objective pid=1813260)[0m 
 53%|█████▎    | 17/32 [00:02<00:02,  6.54it/s][A
[2m[36m(_objective pid=1813260)[0m 
 56%|█████▋    | 18/32 [00:02<00:02,  6.54it/s][A
[2m[36m(_objective pid=1813260)[0m 
 59%|█████▉    | 19/32 [00:02<00:01,  6.53it/s][A
[2m[36m(_objective pid=1813260)[0m 
 62%|██████▎   | 20/32 [00:02<00:01,  6.54it/s][A
[2m[36m(_objective pid=1813260)[0m 
 66%|██████▌   | 21/32 [00:03<00:01,  6.53it/s][A
[2m[36m(

[2m[36m(_objective pid=1813260)[0m {'eval_loss': 1.0755645036697388, 'eval_accuracy': 0.379, 'eval_runtime': 4.8054, 'eval_samples_per_second': 208.101, 'eval_steps_per_second': 6.659, 'epoch': 1.0}


2022-10-07 07:31:12,796	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00001 (score 0.389) -> _objective_a80be_00003 (score 0.379)
2022-10-07 07:31:12,796	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.17959754525911098, 'learning_rate': 1.624074561769746e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05}


== Status ==
Current time: 2022-10-07 07:31:12 (running for 00:02:36.09)
Memory usage on this node: 18.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 0 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (3 PAUSED, 1 PENDING, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 |

[2m[36m(pid=1813429)[0m 2022-10-07 07:31:14.904894: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:31:18 (running for 00:02:42.01)
Memory usage on this node: 18.3/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

[2m[36m(_objective pid=1813429)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias']
[2m[36m(_objective pid=1813429)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1813429)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:31:23 (running for 00:02:47.01)
Memory usage on this node: 18.6/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

 20%|██        | 2/10 [00:04<00:18,  2.33s/it]
 30%|███       | 3/10 [00:07<00:16,  2.39s/it]


== Status ==
Current time: 2022-10-07 07:31:28 (running for 00:02:52.12)
Memory usage on this node: 18.6/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

 40%|████      | 4/10 [00:09<00:14,  2.36s/it]
 50%|█████     | 5/10 [00:11<00:11,  2.34s/it]


== Status ==
Current time: 2022-10-07 07:31:33 (running for 00:02:57.14)
Memory usage on this node: 18.6/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

 60%|██████    | 6/10 [00:14<00:09,  2.33s/it]
 70%|███████   | 7/10 [00:16<00:06,  2.33s/it]


== Status ==
Current time: 2022-10-07 07:31:38 (running for 00:03:02.14)
Memory usage on this node: 18.6/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

 80%|████████  | 8/10 [00:18<00:04,  2.19s/it]
[2m[36m(_objective pid=1813429)[0m 
  0%|          | 0/32 [00:00<?, ?it/s][A
[2m[36m(_objective pid=1813429)[0m 
  6%|▋         | 2/32 [00:00<00:02, 13.01it/s][A
[2m[36m(_objective pid=1813429)[0m 
 12%|█▎        | 4/32 [00:00<00:03,  8.21it/s][A
[2m[36m(_objective pid=1813429)[0m 
 16%|█▌        | 5/32 [00:00<00:03,  7.62it/s][A
[2m[36m(_objective pid=1813429)[0m 
 19%|█▉        | 6/32 [00:00<00:03,  7.26it/s][A
[2m[36m(_objective pid=1813429)[0m 
 22%|██▏       | 7/32 [00:00<00:03,  7.02it/s][A
[2m[36m(_objective pid=1813429)[0m 
 25%|██▌       | 8/32 [00:01<00:03,  6.86it/s][A
[2m[36m(_objective pid=1813429)[0m 
 28%|██▊       | 9/32 [00:01<00:03,  6.76it/s][A
[2m[36m(_objective pid=1813429)[0m 
 31%|███▏      | 10/32 [00:01<00:03,  6.69it/s][A
[2m[36m(_objective pid=1813429)[0m 
 34%|███▍      | 11/32 [00:01<00:03,  6.64it/s][A
[2m[36m(_objective pid=1813429)[0m 
 38%|███▊      | 12/32 [00:01<

== Status ==
Current time: 2022-10-07 07:31:43 (running for 00:03:07.14)
Memory usage on this node: 18.6/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

[2m[36m(_objective pid=1813429)[0m 
                                              ][A
 80%|████████  | 8/10 [00:23<00:04,  2.19s/it] 
100%|██████████| 32/32 [00:04<00:00,  6.51it/s][A
                                               [A
2022-10-07 07:31:45,739	INFO pbt.py:552 -- [pbt]: no checkpoint for trial. Skip exploit for Trial _objective_a80be_00004


Result for _objective_a80be_00004:
  date: 2022-10-07_07-31-45
  done: false
  epoch: 1.0
  eval_accuracy: 0.379
  eval_loss: 1.0759917497634888
  eval_runtime: 4.8131
  eval_samples_per_second: 207.766
  eval_steps_per_second: 6.649
  experiment_id: fbd05b59af01444db39ea6347f5d7ba0
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.379
  pid: 1813429
  should_checkpoint: true
  time_since_restore: 29.88500714302063
  time_this_iter_s: 29.88500714302063
  time_total_s: 29.88500714302063
  timestamp: 1665127905
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: a80be_00004
  warmup_time: 0.001744985580444336
  


 80%|████████  | 8/10 [00:27<00:06,  3.44s/it]
[2m[36m(raylet)[0m Spilled 10856 MiB, 6 objects, write throughput 587 MiB/s.
[2m[36m(pid=1813641)[0m 2022-10-07 07:31:51.422365: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:31:53 (running for 00:03:17.02)
Memory usage on this node: 14.6/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING  |

[2m[36m(_objective pid=1813641)[0m 2022-10-07 07:31:55,629	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp1d75bd
[2m[36m(_objective pid=1813641)[0m 2022-10-07 07:31:55,630	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 30.082107543945312, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:31:58 (running for 00:03:22.12)
Memory usage on this node: 18.7/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING  |

[2m[36m(_objective pid=1813641)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias']
[2m[36m(_objective pid=1813641)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1813641)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:32:03 (running for 00:03:27.12)
Memory usage on this node: 18.8/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING  |

[2m[36m(_objective pid=1813641)[0m 
 90%|█████████ | 9/10 [00:02<00:00,  3.85it/s][A
[2m[36m(_objective pid=1813641)[0m 
100%|██████████| 10/10 [00:04<00:00,  1.85it/s][A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.10it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.19it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.57it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.23it/s]
 22%|██▏       | 7/32 [00:00<00:03,  6.98it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.83it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.72it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.66it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.59it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.57it/s]
 41%|████      | 13/32 [00:01<00:02,  6.53it/s]
 44%|████▍     | 14/32 [00:02<00:02,  6.54it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.51it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.52it/s]


== Status ==
Current time: 2022-10-07 07:32:08 (running for 00:03:32.12)
Memory usage on this node: 18.8/31.1 GiB
PopulationBasedTraining: 1 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING  |

 53%|█████▎    | 17/32 [00:02<00:02,  6.48it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.50it/s]
 59%|█████▉    | 19/32 [00:02<00:02,  6.49it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.51it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.52it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.52it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.52it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.52it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.51it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.52it/s]
 84%|████████▍ | 27/32 [00:04<00:00,  6.52it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.52it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.53it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.50it/s]
 97%|█████████▋| 31/32 [00:04<00:00,  6.51it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.51it/s]
                                               [A


[2m[36m(_objective pid=1813641)[0m {'eval_loss': 1.0765976905822754, 'eval_accuracy': 0.381, 'eval_runtime': 4.8189, 'eval_samples_per_second': 207.517, 'eval_steps_per_second': 6.641, 'epoch': 1.25}
Result for _objective_a80be_00000:
  date: 2022-10-07_07-32-12
  done: false
  epoch: 1.25
  eval_accuracy: 0.381
  eval_loss: 1.0765976905822754
  eval_runtime: 4.8189
  eval_samples_per_second: 207.517
  eval_steps_per_second: 6.641
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.381
  pid: 1813641
  should_checkpoint: true
  time_since_restore: 17.24718403816223
  time_this_iter_s: 17.24718403816223
  time_total_s: 47.329291582107544
  timestamp: 1665127932
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: a80be_00000
  warmup_time: 3.345555543899536
  
== Status ==
Current time: 2022-10-07 07:32:16 (running for 00:03:40.34)
Memory usage on this node: 22.5/31.1 GiB
PopulationB

100%|██████████| 10/10 [00:27<00:00,  2.76s/it]
[2m[36m(pid=1813875)[0m 2022-10-07 07:32:32.987869: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:32:34 (running for 00:03:58.17)
Memory usage on this node: 12.7/31.1 GiB
PopulationBasedTraining: 2 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 | RUNNING  |

[2m[36m(_objective pid=1813875)[0m 2022-10-07 07:32:36,988	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00001_1_num_train_epochs=3_2022-10-07_07-29-12/checkpoint_tmpf8c6ec
[2m[36m(_objective pid=1813875)[0m 2022-10-07 07:32:36,989	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 30.160179615020752, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:32:39 (running for 00:04:03.17)
Memory usage on this node: 16.8/31.1 GiB
PopulationBasedTraining: 2 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 | RUNNING  |

[2m[36m(_objective pid=1813875)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias']
[2m[36m(_objective pid=1813875)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1813875)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:32:44 (running for 00:04:08.17)
Memory usage on this node: 16.9/31.1 GiB
PopulationBasedTraining: 2 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 | RUNNING  |

[2m[36m(_objective pid=1813875)[0m 
 90%|█████████ | 9/10 [00:02<00:00,  3.79it/s][A
[2m[36m(_objective pid=1813875)[0m 
100%|██████████| 10/10 [00:04<00:00,  1.84it/s][A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.06it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.15it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.55it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.19it/s]
 22%|██▏       | 7/32 [00:00<00:03,  6.97it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.83it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.74it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.66it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.62it/s]


== Status ==
Current time: 2022-10-07 07:32:49 (running for 00:04:13.18)
Memory usage on this node: 16.9/31.1 GiB
PopulationBasedTraining: 2 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 | RUNNING  |

 38%|███▊      | 12/32 [00:01<00:03,  6.60it/s]
 41%|████      | 13/32 [00:01<00:02,  6.57it/s]
 44%|████▍     | 14/32 [00:02<00:02,  6.56it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.53it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.54it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.51it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.52it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.51it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.51it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.51it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.53it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.52it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.53it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.52it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.52it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.52it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.53it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.53it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.53it/s]
 97%|█████████▋| 31/32 [00:04<00:00,  6.54it/s]
                                        

[2m[36m(_objective pid=1813875)[0m {'eval_loss': 1.072451114654541, 'eval_accuracy': 0.394, 'eval_runtime': 4.8195, 'eval_samples_per_second': 207.489, 'eval_steps_per_second': 6.64, 'epoch': 1.25}
Result for _objective_a80be_00001:
  date: 2022-10-07_07-32-54
  done: false
  epoch: 1.25
  eval_accuracy: 0.394
  eval_loss: 1.072451114654541
  eval_runtime: 4.8195
  eval_samples_per_second: 207.489
  eval_steps_per_second: 6.64
  experiment_id: 3af4f1590ef54b5c98d3f1422095ddc2
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.394
  pid: 1813875
  should_checkpoint: true
  time_since_restore: 17.591538667678833
  time_this_iter_s: 17.591538667678833
  time_total_s: 47.751718282699585
  timestamp: 1665127974
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: a80be_00001
  warmup_time: 2.6674466133117676
  
== Status ==
Current time: 2022-10-07 07:32:57 (running for 00:04:20.61)
Memory usage on this node: 20.9/31.1 GiB
PopulationBa

[2m[36m(raylet)[0m Spilled 16963 MiB, 9 objects, write throughput 649 MiB/s.
100%|██████████| 10/10 [00:18<00:00,  1.82s/it]
[2m[36m(pid=1814043)[0m 2022-10-07 07:33:04.655922: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:33:06 (running for 00:04:30.03)
Memory usage on this node: 13.2/31.1 GiB
PopulationBasedTraining: 3 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 | RUNNING  |

[2m[36m(_objective pid=1814043)[0m 2022-10-07 07:33:08,394	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00002_2_num_train_epochs=2_2022-10-07_07-29-57/checkpoint_tmpd3bc69
[2m[36m(_objective pid=1814043)[0m 2022-10-07 07:33:08,395	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 32.43216133117676, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:33:11 (running for 00:04:35.04)
Memory usage on this node: 16.5/31.1 GiB
PopulationBasedTraining: 3 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 | RUNNING  |

[2m[36m(_objective pid=1814043)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight']
[2m[36m(_objective pid=1814043)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1814043)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:33:16 (running for 00:04:40.05)
Memory usage on this node: 16.8/31.1 GiB
PopulationBasedTraining: 3 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 | RUNNING  |

[2m[36m(_objective pid=1814043)[0m 
100%|██████████| 10/10 [00:04<00:00,  1.86it/s][A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.16it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.28it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.69it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.33it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.07it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.92it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.81it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.74it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.69it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.65it/s]
 41%|████      | 13/32 [00:01<00:02,  6.63it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.62it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.60it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.59it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.59it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.58it/s]


== Status ==
Current time: 2022-10-07 07:33:21 (running for 00:04:45.06)
Memory usage on this node: 16.8/31.1 GiB
PopulationBasedTraining: 3 checkpoints, 1 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 | RUNNING  |

 59%|█████▉    | 19/32 [00:02<00:01,  6.58it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.57it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.56it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.53it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.54it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.52it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.53it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.55it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.56it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.53it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.55it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.53it/s]
 97%|█████████▋| 31/32 [00:04<00:00,  6.53it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.53it/s]
                                               [A


[2m[36m(_objective pid=1814043)[0m {'eval_loss': 1.0828750133514404, 'eval_accuracy': 0.379, 'eval_runtime': 4.7864, 'eval_samples_per_second': 208.924, 'eval_steps_per_second': 6.686, 'epoch': 1.25}


2022-10-07 07:33:25,614	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00001 (score 0.394) -> _objective_a80be_00002 (score 0.379)
2022-10-07 07:33:25,627	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.17959754525911098, 'learning_rate': 1.624074561769746e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.21551705431093318, 'learning_rate': 4.754210836063001e-05}
100%|██████████| 10/10 [00:11<00:00,  1.14s/it]


Result for _objective_a80be_00002:
  date: 2022-10-07_07-33-25
  done: false
  epoch: 1.25
  eval_accuracy: 0.379
  eval_loss: 1.0828750133514404
  eval_runtime: 4.7864
  eval_samples_per_second: 208.924
  eval_steps_per_second: 6.686
  experiment_id: 62198dbcb6a34414904527cb83b8ed3a
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.379
  pid: 1814043
  should_checkpoint: true
  time_since_restore: 17.10784912109375
  time_this_iter_s: 17.10784912109375
  time_total_s: 49.54001045227051
  timestamp: 1665128005
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: a80be_00002
  warmup_time: 2.6671197414398193
  


[2m[36m(pid=1814199)[0m 2022-10-07 07:33:28.237142: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:33:31 (running for 00:04:55.03)
Memory usage on this node: 16.6/31.1 GiB
PopulationBasedTraining: 3 checkpoints, 2 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 | RUNNING  |

[2m[36m(_objective pid=1814199)[0m 2022-10-07 07:33:31,850	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00003_3_num_train_epochs=2_2022-10-07_07-30-38/checkpoint_tmp33d1f2
[2m[36m(_objective pid=1814199)[0m 2022-10-07 07:33:31,850	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 30.160179615020752, '_episodes_total': None}
[2m[36m(_objective pid=1814199)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight']
[2m[36m(_objective pid=1814

== Status ==
Current time: 2022-10-07 07:33:36 (running for 00:05:00.03)
Memory usage on this node: 17.2/31.1 GiB
PopulationBasedTraining: 3 checkpoints, 2 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 | RUNNING  |

Skipping the first batches: : 0it [00:00, ?it/s]
[2m[36m(_objective pid=1814199)[0m 
Skipping the first batches: : 0it [00:00, ?it/s]
[2m[36m(_objective pid=1814199)[0m 
 90%|█████████ | 9/10 [00:02<00:00,  3.85it/s][A


== Status ==
Current time: 2022-10-07 07:33:41 (running for 00:05:05.03)
Memory usage on this node: 16.8/31.1 GiB
PopulationBasedTraining: 3 checkpoints, 2 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 | RUNNING  |

[2m[36m(_objective pid=1814199)[0m 
100%|██████████| 10/10 [00:04<00:00,  1.85it/s][A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.14it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.28it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.69it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.32it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.07it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.92it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.81it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.73it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.68it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.64it/s]
 41%|████      | 13/32 [00:01<00:02,  6.62it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.58it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.57it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.55it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.56it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.52it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.54it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.53it/s]
 66%|██████▌  

== Status ==
Current time: 2022-10-07 07:33:46 (running for 00:05:10.04)
Memory usage on this node: 16.8/31.1 GiB
PopulationBasedTraining: 3 checkpoints, 2 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 | RUNNING  |

 88%|████████▊ | 28/32 [00:04<00:00,  6.55it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.55it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.55it/s]
 97%|█████████▋| 31/32 [00:04<00:00,  6.54it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.54it/s]
                                               [A


[2m[36m(_objective pid=1814199)[0m {'eval_loss': 1.072451114654541, 'eval_accuracy': 0.394, 'eval_runtime': 4.7833, 'eval_samples_per_second': 209.059, 'eval_steps_per_second': 6.69, 'epoch': 1.25}
Result for _objective_a80be_00003:
  date: 2022-10-07_07-33-49
  done: false
  epoch: 1.25
  eval_accuracy: 0.394
  eval_loss: 1.072451114654541
  eval_runtime: 4.7833
  eval_samples_per_second: 209.059
  eval_steps_per_second: 6.69
  experiment_id: 3af4f1590ef54b5c98d3f1422095ddc2
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.394
  pid: 1814199
  should_checkpoint: true
  time_since_restore: 17.15222477912903
  time_this_iter_s: 17.15222477912903
  time_total_s: 47.31240439414978
  timestamp: 1665128029
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: a80be_00003
  warmup_time: 2.8040571212768555
  
== Status ==
Current time: 2022-10-07 07:33:56 (running for 00:05:19.99)
Memory usage on this node: 24.4/31.1 GiB
PopulationBased

100%|██████████| 10/10 [00:23<00:00,  2.39s/it]


== Status ==
Current time: 2022-10-07 07:34:01 (running for 00:05:25.32)
Memory usage on this node: 16.0/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 2 perturbs
Resources requested: 0/20 CPUs, 0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (5 PAUSED)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | PAUSED   | 172.17.0.3:1813

[2m[36m(pid=1814376)[0m 2022-10-07 07:34:06.613999: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:34:07 (running for 00:05:31.03)
Memory usage on this node: 11.8/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 2 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

[2m[36m(_objective pid=1814376)[0m 2022-10-07 07:34:10,396	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00004_4_num_train_epochs=3_2022-10-07_07-31-13/checkpoint_tmp1ae931
[2m[36m(_objective pid=1814376)[0m 2022-10-07 07:34:10,399	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 1, '_timesteps_total': None, '_time_total': 29.88500714302063, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:34:12 (running for 00:05:36.04)
Memory usage on this node: 15.1/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 2 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

[2m[36m(_objective pid=1814376)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
[2m[36m(_objective pid=1814376)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1814376)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:34:17 (running for 00:05:41.04)
Memory usage on this node: 16.0/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 2 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

[2m[36m(_objective pid=1814376)[0m 
 90%|█████████ | 9/10 [00:02<00:00,  3.82it/s][A
[2m[36m(_objective pid=1814376)[0m 
100%|██████████| 10/10 [00:04<00:00,  1.84it/s][A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.00it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.22it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.63it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.27it/s]
 22%|██▏       | 7/32 [00:00<00:03,  6.91it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.68it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.53it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.44it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.37it/s]


== Status ==
Current time: 2022-10-07 07:34:22 (running for 00:05:46.04)
Memory usage on this node: 16.0/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 2 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

 38%|███▊      | 12/32 [00:01<00:03,  6.33it/s]
 41%|████      | 13/32 [00:01<00:02,  6.39it/s]
 44%|████▍     | 14/32 [00:02<00:02,  6.33it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.39it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.44it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.47it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.36it/s]
 59%|█████▉    | 19/32 [00:02<00:02,  6.42it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.44it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.36it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.40it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.35it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.31it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.28it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.36it/s]
 84%|████████▍ | 27/32 [00:04<00:00,  6.31it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.29it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.34it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.41it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.

[2m[36m(_objective pid=1814376)[0m {'eval_loss': 1.0730141401290894, 'eval_accuracy': 0.379, 'eval_runtime': 4.9134, 'eval_samples_per_second': 203.524, 'eval_steps_per_second': 6.513, 'epoch': 1.25}


2022-10-07 07:34:27,708	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00001 (score 0.394) -> _objective_a80be_00004 (score 0.379)
2022-10-07 07:34:27,708	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.17959754525911098, 'learning_rate': 1.624074561769746e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.00023362975230429848, 'learning_rate': 1.9488894741236953e-05}


== Status ==
Current time: 2022-10-07 07:34:27 (running for 00:05:51.04)
Memory usage on this node: 16.0/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 2 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |   w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-----------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING  |

100%|██████████| 10/10 [00:11<00:00,  1.15s/it]
[2m[36m(pid=1814524)[0m 2022-10-07 07:34:29.957872: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1814524)[0m 2022-10-07 07:34:33,617	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp0e8c2c
[2m[36m(_objective pid=1814524)[0m 2022-10-07 07:34:33,617	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 2, '_timesteps_total': None, '_time_total': 47.329291582107544, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:34:33 (running for 00:05:57.05)
Memory usage on this node: 13.0/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

[2m[36m(_objective pid=1814524)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias']
[2m[36m(_objective pid=1814524)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1814524)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:34:38 (running for 00:06:02.06)
Memory usage on this node: 16.2/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

Skipping the first batches:   0%|          | 0/8 [00:00<?, ?it/s]
[2m[36m(_objective pid=1814524)[0m 
Skipping the first batches:  12%|█▎        | 1/8 [00:00<00:01,  3.62it/s]
Skipping the first batches: 100%|██████████| 8/8 [00:00<00:00, 21.43it/s]
[2m[36m(_objective pid=1814524)[0m 
11it [00:02,  4.53it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.09it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.22it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.65it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.28it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.05it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.89it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.77it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.70it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.66it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.62it/s]


== Status ==
Current time: 2022-10-07 07:34:43 (running for 00:06:07.07)
Memory usage on this node: 16.0/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

 41%|████      | 13/32 [00:01<00:02,  6.60it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.58it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.57it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.56it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.55it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.55it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.55it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.55it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.54it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.55it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.54it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.54it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.54it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.54it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.53it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.53it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.51it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.51it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.50it/s]
                                        

[2m[36m(_objective pid=1814524)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.795, 'eval_samples_per_second': 208.552, 'eval_steps_per_second': 6.674, 'epoch': 1.38}
Result for _objective_a80be_00000:
  date: 2022-10-07_07-34-48
  done: false
  epoch: 1.38
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.795
  eval_samples_per_second: 208.552
  eval_steps_per_second: 6.674
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1814524
  should_checkpoint: true
  time_since_restore: 14.747128248214722
  time_this_iter_s: 14.747128248214722
  time_total_s: 62.076419830322266
  timestamp: 1665128088
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: a80be_00000
  warmup_time: 2.8180906772613525
  


11it [00:11,  1.06s/it]


== Status ==
Current time: 2022-10-07 07:34:51 (running for 00:06:15.05)
Memory usage on this node: 11.2/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 | RUNNING

[2m[36m(pid=1814671)[0m 2022-10-07 07:34:54.316863: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:34:56 (running for 00:06:20.06)
Memory usage on this node: 14.3/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 | RUNNING

[2m[36m(_objective pid=1814671)[0m 2022-10-07 07:34:57,852	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00001_1_num_train_epochs=3_2022-10-07_07-29-12/checkpoint_tmpf09b28
[2m[36m(_objective pid=1814671)[0m 2022-10-07 07:34:57,854	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 2, '_timesteps_total': None, '_time_total': 47.751718282699585, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:35:01 (running for 00:06:25.06)
Memory usage on this node: 15.5/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 | RUNNING

[2m[36m(_objective pid=1814671)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
[2m[36m(_objective pid=1814671)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1814671)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:35:06 (running for 00:06:30.07)
Memory usage on this node: 15.9/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00001 | RUNNING

[2m[36m(_objective pid=1814671)[0m 
11it [00:02,  4.52it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.11it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.25it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.67it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.30it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.05it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.90it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.79it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.72it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.67it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.63it/s]
 41%|████      | 13/32 [00:01<00:02,  6.61it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.59it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.57it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.57it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.54it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.54it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.55it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.55it/s]
 66%|██████▌   | 21/32 

[2m[36m(_objective pid=1814671)[0m {'eval_loss': 1.072451114654541, 'eval_accuracy': 0.394, 'eval_runtime': 4.7857, 'eval_samples_per_second': 208.956, 'eval_steps_per_second': 6.687, 'epoch': 1.38}
== Status ==
Current time: 2022-10-07 07:35:11 (running for 00:06:35.07)
Memory usage on this node: 16.4/31.1 GiB
PopulationBasedTraining: 4 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|--------------

11it [00:19,  1.77s/it]d=1814671)[0m 
[2m[36m(pid=1814858)[0m 2022-10-07 07:35:28.750923: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:35:29 (running for 00:06:53.18)
Memory usage on this node: 10.6/31.1 GiB
PopulationBasedTraining: 5 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 | RUNNING

[2m[36m(_objective pid=1814858)[0m 2022-10-07 07:35:32,548	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00002_2_num_train_epochs=2_2022-10-07_07-29-57/checkpoint_tmp4cf4f0
[2m[36m(_objective pid=1814858)[0m 2022-10-07 07:35:32,548	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 2, '_timesteps_total': None, '_time_total': 47.751718282699585, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:35:34 (running for 00:06:58.19)
Memory usage on this node: 13.8/31.1 GiB
PopulationBasedTraining: 5 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 | RUNNING

[2m[36m(_objective pid=1814858)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight']
[2m[36m(_objective pid=1814858)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1814858)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:35:39 (running for 00:07:03.19)
Memory usage on this node: 14.8/31.1 GiB
PopulationBasedTraining: 5 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 | RUNNING

[2m[36m(_objective pid=1814858)[0m 
11it [00:02,  4.49it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.07it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.24it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.66it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.29it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.07it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.90it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.77it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.71it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.67it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.64it/s]
 41%|████      | 13/32 [00:01<00:02,  6.63it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.62it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.61it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.60it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.59it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.48it/s]
 59%|█████▉    | 19/32 [00:02<00:02,  6.41it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.36it/s]


== Status ==
Current time: 2022-10-07 07:35:44 (running for 00:07:08.19)
Memory usage on this node: 14.8/31.1 GiB
PopulationBasedTraining: 5 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00002 | RUNNING

 66%|██████▌   | 21/32 [00:03<00:01,  6.43it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.48it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.39it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.45it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.36it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.33it/s]
 84%|████████▍ | 27/32 [00:04<00:00,  6.41it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.46it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.48it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.51it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.50it/s]
                                               


[2m[36m(_objective pid=1814858)[0m {'eval_loss': 1.072451114654541, 'eval_accuracy': 0.394, 'eval_runtime': 4.8253, 'eval_samples_per_second': 207.24, 'eval_steps_per_second': 6.632, 'epoch': 1.38}
Result for _objective_a80be_00002:
  date: 2022-10-07_07-35-48
  done: false
  epoch: 1.38
  eval_accuracy: 0.394
  eval_loss: 1.072451114654541
  eval_runtime: 4.8253
  eval_samples_per_second: 207.24
  eval_steps_per_second: 6.632
  experiment_id: 3af4f1590ef54b5c98d3f1422095ddc2
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.394
  pid: 1814858
  should_checkpoint: true
  time_since_restore: 15.903628826141357
  time_this_iter_s: 15.903628826141357
  time_total_s: 63.65534710884094
  timestamp: 1665128148
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: a80be_00002
  warmup_time: 2.6586031913757324
  
== Status ==
Current time: 2022-10-07 07:35:51 (running for 00:07:14.49)
Memory usage on this node: 18.8/31.1 GiB
PopulationBas

11it [00:18,  1.64s/it]d=1814858)[0m 
[2m[36m(raylet)[0m Spilled 33248 MiB, 17 objects, write throughput 803 MiB/s.
[2m[36m(pid=1815031)[0m 2022-10-07 07:36:01.262660: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:36:02 (running for 00:07:26.17)
Memory usage on this node: 10.1/31.1 GiB
PopulationBasedTraining: 6 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 | RUNNING

[2m[36m(_objective pid=1815031)[0m 2022-10-07 07:36:05,072	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00003_3_num_train_epochs=2_2022-10-07_07-30-38/checkpoint_tmp758737
[2m[36m(_objective pid=1815031)[0m 2022-10-07 07:36:05,072	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 2, '_timesteps_total': None, '_time_total': 47.31240439414978, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:36:07 (running for 00:07:31.18)
Memory usage on this node: 14.3/31.1 GiB
PopulationBasedTraining: 6 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 | RUNNING

[2m[36m(_objective pid=1815031)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight']
[2m[36m(_objective pid=1815031)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1815031)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:36:12 (running for 00:07:36.18)
Memory usage on this node: 14.3/31.1 GiB
PopulationBasedTraining: 6 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 | RUNNING

[2m[36m(_objective pid=1815031)[0m 
11it [00:02,  4.52it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.15it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.19it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.64it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.27it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.06it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.89it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.80it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.69it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.66it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.60it/s]
 41%|████      | 13/32 [00:01<00:02,  6.59it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.56it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.57it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.55it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.55it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.54it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.56it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.54it/s]
 66%|██████▌   | 21/32 

== Status ==
Current time: 2022-10-07 07:36:17 (running for 00:07:41.18)
Memory usage on this node: 14.3/31.1 GiB
PopulationBasedTraining: 6 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00003 | RUNNING

 81%|████████▏ | 26/32 [00:03<00:00,  6.56it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.56it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.56it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.57it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.56it/s]
 97%|█████████▋| 31/32 [00:04<00:00,  6.56it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.56it/s]
                                               


[2m[36m(_objective pid=1815031)[0m {'eval_loss': 1.072451114654541, 'eval_accuracy': 0.394, 'eval_runtime': 4.7893, 'eval_samples_per_second': 208.8, 'eval_steps_per_second': 6.682, 'epoch': 1.38}
Result for _objective_a80be_00003:
  date: 2022-10-07_07-36-20
  done: false
  epoch: 1.38
  eval_accuracy: 0.394
  eval_loss: 1.072451114654541
  eval_runtime: 4.7893
  eval_samples_per_second: 208.8
  eval_steps_per_second: 6.682
  experiment_id: 3af4f1590ef54b5c98d3f1422095ddc2
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.394
  pid: 1815031
  should_checkpoint: true
  time_since_restore: 15.451422452926636
  time_this_iter_s: 15.451422452926636
  time_total_s: 62.763826847076416
  timestamp: 1665128180
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: a80be_00003
  warmup_time: 2.7080066204071045
  
== Status ==
Current time: 2022-10-07 07:36:23 (running for 00:07:46.52)
Memory usage on this node: 18.3/31.1 GiB
PopulationBase

11it [00:15,  1.37s/it]d=1815031)[0m 
[2m[36m(pid=1815193)[0m 2022-10-07 07:36:30.574801: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:36:31 (running for 00:07:55.21)
Memory usage on this node: 10.1/31.1 GiB
PopulationBasedTraining: 7 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING

[2m[36m(_objective pid=1815193)[0m 2022-10-07 07:36:34,311	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00004_4_num_train_epochs=3_2022-10-07_07-31-13/checkpoint_tmp00be8d
[2m[36m(_objective pid=1815193)[0m 2022-10-07 07:36:34,312	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 2, '_timesteps_total': None, '_time_total': 47.751718282699585, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:36:36 (running for 00:08:00.21)
Memory usage on this node: 13.5/31.1 GiB
PopulationBasedTraining: 7 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING

[2m[36m(_objective pid=1815193)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias']
[2m[36m(_objective pid=1815193)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1815193)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:36:41 (running for 00:08:05.22)
Memory usage on this node: 14.3/31.1 GiB
PopulationBasedTraining: 7 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING

[2m[36m(_objective pid=1815193)[0m 
11it [00:02,  4.56it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.15it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.29it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.70it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.33it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.10it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.94it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.84it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.76it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.71it/s]
 38%|███▊      | 12/32 [00:01<00:02,  6.67it/s]
 41%|████      | 13/32 [00:01<00:02,  6.65it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.63it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.62it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.60it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.60it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.60it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.57it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.58it/s]
 66%|██████▌   | 21/32 

== Status ==
Current time: 2022-10-07 07:36:46 (running for 00:08:10.22)
Memory usage on this node: 14.3/31.1 GiB
PopulationBasedTraining: 7 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00004 | RUNNING

 78%|███████▊  | 25/32 [00:03<00:01,  6.53it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.54it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.54it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.56it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.53it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.55it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.53it/s]
                                               


[2m[36m(_objective pid=1815193)[0m {'eval_loss': 1.072451114654541, 'eval_accuracy': 0.394, 'eval_runtime': 4.7711, 'eval_samples_per_second': 209.593, 'eval_steps_per_second': 6.707, 'epoch': 1.38}
Result for _objective_a80be_00004:
  date: 2022-10-07_07-36-49
  done: false
  epoch: 1.38
  eval_accuracy: 0.394
  eval_loss: 1.072451114654541
  eval_runtime: 4.7711
  eval_samples_per_second: 209.593
  eval_steps_per_second: 6.707
  experiment_id: 3af4f1590ef54b5c98d3f1422095ddc2
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.394
  pid: 1815193
  should_checkpoint: true
  time_since_restore: 15.3907949924469
  time_this_iter_s: 15.3907949924469
  time_total_s: 63.142513275146484
  timestamp: 1665128209
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: a80be_00004
  warmup_time: 2.678349494934082
  
== Status ==
Current time: 2022-10-07 07:36:52 (running for 00:08:15.70)
Memory usage on this node: 18.3/31.1 GiB
PopulationBased

11it [00:15,  1.38s/it]d=1815193)[0m 
[2m[36m(pid=1815354)[0m 2022-10-07 07:37:00.260215: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


== Status ==
Current time: 2022-10-07 07:37:01 (running for 00:08:25.19)
Memory usage on this node: 10.1/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

[2m[36m(_objective pid=1815354)[0m 2022-10-07 07:37:04,166	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmpfc452c
[2m[36m(_objective pid=1815354)[0m 2022-10-07 07:37:04,167	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:37:06 (running for 00:08:30.20)
Memory usage on this node: 14.0/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

[2m[36m(_objective pid=1815354)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias']
[2m[36m(_objective pid=1815354)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1815354)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:37:11 (running for 00:08:35.20)
Memory usage on this node: 14.3/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

[2m[36m(_objective pid=1815354)[0m 
12it [00:02,  4.79it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 12.86it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.17it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.63it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.22it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.02it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.85it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.65it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.63it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.51it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.42it/s]
 41%|████      | 13/32 [00:01<00:02,  6.46it/s]
 44%|████▍     | 14/32 [00:02<00:02,  6.49it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.51it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.53it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.51it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.53it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.54it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.56it/s]
 66%|██████▌   | 21/32 

== Status ==
Current time: 2022-10-07 07:37:16 (running for 00:08:40.21)
Memory usage on this node: 14.3/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 3 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

 78%|███████▊  | 25/32 [00:03<00:01,  6.56it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.56it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.57it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.57it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.58it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.57it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.57it/s]
                                               


[2m[36m(_objective pid=1815354)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.8097, 'eval_samples_per_second': 207.912, 'eval_steps_per_second': 6.653, 'epoch': 1.5}


2022-10-07 07:37:19,694	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00004 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:37:19,694	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.00023362975230429848, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.00028035570276515817, 'learning_rate': 4.96884623716487e-05}
12it [00:09,  1.30it/s]d=1815354)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-37-19
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.8097
  eval_samples_per_second: 207.912
  eval_steps_per_second: 6.653
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1815354
  should_checkpoint: true
  time_since_restore: 15.40962839126587
  time_this_iter_s: 15.40962839126587
  time_total_s: 77.48604822158813
  timestamp: 1665128239
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.6649060249328613
  




== Status ==
Current time: 2022-10-07 07:37:24 (running for 00:08:47.44)
Memory usage on this node: 13.7/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 4 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNN

[2m[36m(pid=1815506)[0m 2022-10-07 07:37:24.722589: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1815506)[0m 2022-10-07 07:37:27,589	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp42b1e9
[2m[36m(_objective pid=1815506)[0m 2022-10-07 07:37:27,589	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:37:32 (running for 00:08:56.19)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 4 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNN

[2m[36m(_objective pid=1815506)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
[2m[36m(_objective pid=1815506)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1815506)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:37:37 (running for 00:09:01.20)
Memory usage on this node: 14.3/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 4 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNN

 19%|█▉        | 6/32 [00:00<00:03,  7.31it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.08it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.90it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.81it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.70it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.66it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.63it/s]
 41%|████      | 13/32 [00:01<00:02,  6.60it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.59it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.58it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.59it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.59it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.56it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.58it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.56it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.55it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.54it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.55it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.53it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.55it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.53it

[2m[36m(_objective pid=1815506)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7784, 'eval_samples_per_second': 209.274, 'eval_steps_per_second': 6.697, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:37:42 (running for 00:09:06.20)
Memory usage on this node: 14.7/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 4 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------

2022-10-07 07:37:44,342	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00004 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:37:44,343	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.00023362975230429848, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.0001869038018434388, 'learning_rate': 2.3386673689484342e-05}
12it [00:09,  1.21it/s]d=1815506)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-37-44
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7784
  eval_samples_per_second: 209.274
  eval_steps_per_second: 6.697
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1815506
  should_checkpoint: true
  time_since_restore: 16.56084632873535
  time_this_iter_s: 16.56084632873535
  time_total_s: 78.63726615905762
  timestamp: 1665128264
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.114354133605957
  


[2m[36m(pid=1815654)[0m 2022-10-07 07:37:48.267600: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1815654)[0m 2022-10-07 07:37:51,149	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp7d0117
[2m[36m(_objective pid=1815654)[0m 2022-10-07 07:37:51,149	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:37:51 (running for 00:09:14.76)
Memory usage on this node: 10.1/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 5 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNN

[2m[36m(_objective pid=1815654)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight']
[2m[36m(_objective pid=1815654)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1815654)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:38:01 (running for 00:09:24.77)
Memory usage on this node: 14.3/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 5 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNN

 34%|███▍      | 11/32 [00:01<00:03,  6.70it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.67it/s]
 41%|████      | 13/32 [00:01<00:02,  6.64it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.63it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.62it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.61it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.61it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.60it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.57it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.58it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.58it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.58it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.58it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.58it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.58it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.58it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.58it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.56it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.55it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.56it/s]
                                        

[2m[36m(_objective pid=1815654)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7681, 'eval_samples_per_second': 209.729, 'eval_steps_per_second': 6.711, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:38:06 (running for 00:09:29.77)
Memory usage on this node: 14.6/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 5 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------

2022-10-07 07:38:07,171	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00004 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:38:07,180	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.00023362975230429848, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.00028035570276515817, 'learning_rate': 3.4699260385108665e-05}
12it [00:09,  1.22it/s]d=1815654)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-38-07
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7681
  eval_samples_per_second: 209.729
  eval_steps_per_second: 6.711
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1815654
  should_checkpoint: true
  time_since_restore: 15.90007758140564
  time_this_iter_s: 15.90007758140564
  time_total_s: 77.9764974117279
  timestamp: 1665128287
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.030703544616699
  


[2m[36m(pid=1815800)[0m 2022-10-07 07:38:11.624232: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1815800)[0m 2022-10-07 07:38:14,442	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp216b75
[2m[36m(_objective pid=1815800)[0m 2022-10-07 07:38:14,442	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:38:14 (running for 00:09:37.93)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 6 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNN

[2m[36m(_objective pid=1815800)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']
[2m[36m(_objective pid=1815800)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1815800)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:38:24 (running for 00:09:47.95)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 6 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNN

 31%|███▏      | 10/32 [00:01<00:03,  6.70it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.63it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.61it/s]
 41%|████      | 13/32 [00:01<00:02,  6.59it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.58it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.57it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.57it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.56it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.56it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.56it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.57it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.56it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.55it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.55it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.53it/s]
                                               


[2m[36m(_objective pid=1815800)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.796, 'eval_samples_per_second': 208.507, 'eval_steps_per_second': 6.672, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:38:29 (running for 00:09:52.95)
Memory usage on this node: 15.1/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 6 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-------------

2022-10-07 07:38:30,371	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00004 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:38:30,372	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.00023362975230429848, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.0001869038018434388, 'learning_rate': 3.446612641953124e-05}
12it [00:09,  1.22it/s]d=1815800)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-38-30
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.796
  eval_samples_per_second: 208.507
  eval_steps_per_second: 6.672
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1815800
  should_checkpoint: true
  time_since_restore: 15.797339916229248
  time_this_iter_s: 15.797339916229248
  time_total_s: 77.87375974655151
  timestamp: 1665128310
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.0374763011932373
  


[2m[36m(pid=1815972)[0m 2022-10-07 07:38:34.326629: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1815972)[0m 2022-10-07 07:38:37,194	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp868548
[2m[36m(_objective pid=1815972)[0m 2022-10-07 07:38:37,194	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:38:37 (running for 00:10:00.80)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 7 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNN

[2m[36m(_objective pid=1815972)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
[2m[36m(_objective pid=1815972)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1815972)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:38:42 (running for 00:10:05.81)
Memory usage on this node: 15.1/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 7 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNN

Skipping the first batches:   0%|          | 0/12 [00:00<?, ?it/s]
[2m[36m(_objective pid=1815972)[0m 
Skipping the first batches:   8%|▊         | 1/12 [00:00<00:02,  3.69it/s]
Skipping the first batches:  83%|████████▎ | 10/12 [00:00<00:00, 32.50it/s]
Skipping the first batches: 100%|██████████| 12/12 [00:00<00:00, 28.78it/s]
[2m[36m(_objective pid=1815972)[0m 
12it [00:02,  4.85it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.17it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.30it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.70it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.34it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.05it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.91it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.78it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.72it/s]


== Status ==
Current time: 2022-10-07 07:38:47 (running for 00:10:10.81)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 7 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNN

 34%|███▍      | 11/32 [00:01<00:03,  6.65it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.63it/s]
 41%|████      | 13/32 [00:01<00:02,  6.58it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.58it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.56it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.55it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.56it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.57it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.54it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.56it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.53it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.54it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.53it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.54it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.52it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.54it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.52it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.54it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.53it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.54it/s]
                                        

[2m[36m(_objective pid=1815972)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7857, 'eval_samples_per_second': 208.957, 'eval_steps_per_second': 6.687, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:38:52 (running for 00:10:15.81)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 7 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------

2022-10-07 07:38:53,244	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:38:53,245	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.17241364344874652, 'learning_rate': 1.0282652208788698e-05}
12it [00:09,  1.21it/s]d=1815972)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-38-53
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7857
  eval_samples_per_second: 208.957
  eval_steps_per_second: 6.687
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1815972
  should_checkpoint: true
  time_since_restore: 15.930071115493774
  time_this_iter_s: 15.930071115493774
  time_total_s: 78.00649094581604
  timestamp: 1665128333
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.0430986881256104
  


[2m[36m(pid=1816126)[0m 2022-10-07 07:38:57.771758: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1816126)[0m 2022-10-07 07:39:00,622	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp10a742
[2m[36m(_objective pid=1816126)[0m 2022-10-07 07:39:00,622	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:39:00 (running for 00:10:24.11)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 8 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

[2m[36m(_objective pid=1816126)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight']
[2m[36m(_objective pid=1816126)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1816126)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:39:05 (running for 00:10:29.12)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 8 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

Skipping the first batches:   0%|          | 0/12 [00:00<?, ?it/s]
[2m[36m(_objective pid=1816126)[0m 
Skipping the first batches:   8%|▊         | 1/12 [00:00<00:03,  2.92it/s]
Skipping the first batches: 100%|██████████| 12/12 [00:00<00:00, 24.60it/s]
[2m[36m(_objective pid=1816126)[0m 
12it [00:02,  4.86it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.18it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.29it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.70it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.33it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.09it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.92it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.81it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.74it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.69it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.65it/s]


== Status ==
Current time: 2022-10-07 07:39:10 (running for 00:10:34.12)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 8 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

 41%|████      | 13/32 [00:01<00:02,  6.63it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.61it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.60it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.60it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.60it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.59it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.59it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.59it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.59it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.58it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.58it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.55it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.55it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.56it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.56it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.54it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.55it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.56it/s]
 97%|█████████▋| 31/32 [00:04<00:00,  6.57it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.

[2m[36m(_objective pid=1816126)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7703, 'eval_samples_per_second': 209.629, 'eval_steps_per_second': 6.708, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:39:15 (running for 00:10:39.12)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 8 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|--------------

2022-10-07 07:39:16,232	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:39:16,232	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.17241364344874652, 'learning_rate': 1.5591115792989563e-05}
12it [00:09,  1.21it/s]d=1816126)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-39-16
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7703
  eval_samples_per_second: 209.629
  eval_steps_per_second: 6.708
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1816126
  should_checkpoint: true
  time_since_restore: 15.475187540054321
  time_this_iter_s: 15.475187540054321
  time_total_s: 77.55160737037659
  timestamp: 1665128356
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.06597900390625
  


[2m[36m(pid=1816280)[0m 2022-10-07 07:39:20.359686: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1816280)[0m 2022-10-07 07:39:23,263	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp015cf3
[2m[36m(_objective pid=1816280)[0m 2022-10-07 07:39:23,263	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:39:23 (running for 00:10:46.87)
Memory usage on this node: 10.3/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 9 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

[2m[36m(_objective pid=1816280)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
[2m[36m(_objective pid=1816280)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1816280)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:39:28 (running for 00:10:51.87)
Memory usage on this node: 14.7/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 9 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

Skipping the first batches:   0%|          | 0/12 [00:00<?, ?it/s]
[2m[36m(_objective pid=1816280)[0m 
Skipping the first batches:   8%|▊         | 1/12 [00:00<00:03,  3.59it/s]
Skipping the first batches: 100%|██████████| 12/12 [00:00<00:00, 28.17it/s]
[2m[36m(_objective pid=1816280)[0m 
12it [00:02,  4.85it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.20it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.29it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.71it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.34it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.10it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.93it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.82it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.71it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.67it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.61it/s]
 41%|████      | 13/32 [00:01<00:02,  6.60it/s]


== Status ==
Current time: 2022-10-07 07:39:33 (running for 00:10:56.88)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 9 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNING

 44%|████▍     | 14/32 [00:01<00:02,  6.56it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.57it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.55it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.55it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.55it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.56it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.57it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.58it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.57it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.58it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.58it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.56it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.57it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.53it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.55it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.53it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.54it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.52it/s]
                                               


[2m[36m(_objective pid=1816280)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7798, 'eval_samples_per_second': 209.215, 'eval_steps_per_second': 6.695, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:39:38 (running for 00:11:01.88)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 9 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|--------------

2022-10-07 07:39:39,119	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:39:39,120	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.17241364344874652, 'learning_rate': 2.3386673689484342e-05}
12it [00:10,  1.19it/s]d=1816280)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-39-39
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7798
  eval_samples_per_second: 209.215
  eval_steps_per_second: 6.695
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1816280
  should_checkpoint: true
  time_since_restore: 15.734589338302612
  time_this_iter_s: 15.734589338302612
  time_total_s: 77.81100916862488
  timestamp: 1665128379
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.045137643814087
  


[2m[36m(pid=1816419)[0m 2022-10-07 07:39:43.759165: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1816419)[0m 2022-10-07 07:39:46,603	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmpa310a3
[2m[36m(_objective pid=1816419)[0m 2022-10-07 07:39:46,603	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:39:46 (running for 00:11:10.10)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 10 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1816419)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight']
[2m[36m(_objective pid=1816419)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1816419)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:39:56 (running for 00:11:20.11)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 10 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 25%|██▌       | 8/32 [00:01<00:03,  6.88it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.76it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.67it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.64it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.59it/s]
 41%|████      | 13/32 [00:01<00:02,  6.59it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.56it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.57it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.54it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.55it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.53it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.55it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.53it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.54it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.52it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.54it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.52it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.52it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.51it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.52it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.51

[2m[36m(_objective pid=1816419)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7996, 'eval_samples_per_second': 208.349, 'eval_steps_per_second': 6.667, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:40:01 (running for 00:11:25.11)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 10 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-------------

2022-10-07 07:40:02,909	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:40:02,909	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.17241364344874652, 'learning_rate': 1.5591115792989563e-05}
12it [00:09,  1.21it/s]d=1816419)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-40-02
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7996
  eval_samples_per_second: 208.349
  eval_steps_per_second: 6.667
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1816419
  should_checkpoint: true
  time_since_restore: 16.166028022766113
  time_this_iter_s: 16.166028022766113
  time_total_s: 78.24244785308838
  timestamp: 1665128402
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.055022716522217
  


[2m[36m(pid=1816572)[0m 2022-10-07 07:40:07.305240: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1816572)[0m 2022-10-07 07:40:10,158	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp2de517
[2m[36m(_objective pid=1816572)[0m 2022-10-07 07:40:10,158	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:40:10 (running for 00:11:33.77)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 11 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1816572)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
[2m[36m(_objective pid=1816572)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1816572)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:40:20 (running for 00:11:43.78)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 11 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 31%|███▏      | 10/32 [00:01<00:03,  6.72it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.68it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.62it/s]
 41%|████      | 13/32 [00:01<00:02,  6.61it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.57it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.57it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.57it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.58it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.58it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.59it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.59it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.59it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.58it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.59it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.59it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.59it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.59it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.59it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.59it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.59it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.

[2m[36m(_objective pid=1816572)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7646, 'eval_samples_per_second': 209.879, 'eval_steps_per_second': 6.716, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:40:25 (running for 00:11:48.78)
Memory usage on this node: 14.9/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 11 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-------------

2022-10-07 07:40:26,461	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:40:26,461	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.11494242896583103, 'learning_rate': 2.3386673689484342e-05}
12it [00:09,  1.20it/s]d=1816572)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-40-26
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7646
  eval_samples_per_second: 209.879
  eval_steps_per_second: 6.716
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1816572
  should_checkpoint: true
  time_since_restore: 16.161619663238525
  time_this_iter_s: 16.161619663238525
  time_total_s: 78.23803949356079
  timestamp: 1665128426
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.050328493118286
  


[2m[36m(pid=1816719)[0m 2022-10-07 07:40:30.729739: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1816719)[0m 2022-10-07 07:40:33,561	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp3bda93
[2m[36m(_objective pid=1816719)[0m 2022-10-07 07:40:33,561	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:40:33 (running for 00:11:57.06)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 12 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1816719)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
[2m[36m(_objective pid=1816719)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1816719)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:40:43 (running for 00:12:07.07)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 12 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 28%|██▊       | 9/32 [00:01<00:03,  6.81it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.73it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.68it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.64it/s]
 41%|████      | 13/32 [00:01<00:02,  6.62it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.60it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.59it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.59it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.58it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.58it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.58it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.58it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.58it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.58it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.58it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.54it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.55it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.52it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.53it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.53it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.5

[2m[36m(_objective pid=1816719)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7747, 'eval_samples_per_second': 209.439, 'eval_steps_per_second': 6.702, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:40:48 (running for 00:12:12.07)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 12 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-------------

2022-10-07 07:40:49,836	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:40:49,837	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.11494242896583103, 'learning_rate': 1.0922497001656631e-05}
12it [00:09,  1.22it/s]d=1816719)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-40-49
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7747
  eval_samples_per_second: 209.439
  eval_steps_per_second: 6.702
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1816719
  should_checkpoint: true
  time_since_restore: 16.13546347618103
  time_this_iter_s: 16.13546347618103
  time_total_s: 78.2118833065033
  timestamp: 1665128449
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.055121898651123
  


[2m[36m(pid=1816861)[0m 2022-10-07 07:40:54.317887: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1816861)[0m 2022-10-07 07:40:57,177	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp020f33
[2m[36m(_objective pid=1816861)[0m 2022-10-07 07:40:57,177	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:40:57 (running for 00:12:20.79)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 13 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1816861)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias']
[2m[36m(_objective pid=1816861)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1816861)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:41:07 (running for 00:12:30.80)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 13 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 28%|██▊       | 9/32 [00:01<00:03,  6.83it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.76it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.71it/s]
 38%|███▊      | 12/32 [00:01<00:02,  6.67it/s]
 41%|████      | 13/32 [00:01<00:02,  6.65it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.63it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.62it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.61it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.60it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.56it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.57it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.55it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.56it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.54it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.56it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.53it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.55it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.53it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.54it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.52it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.5

[2m[36m(_objective pid=1816861)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7751, 'eval_samples_per_second': 209.421, 'eval_steps_per_second': 6.701, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:41:12 (running for 00:12:35.80)
Memory usage on this node: 14.7/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 13 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-------------

2022-10-07 07:41:13,549	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00004 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:41:13,550	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.00023362975230429848, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.15743239807751674, 'learning_rate': 2.5994438868610224e-05}
12it [00:09,  1.21it/s]d=1816861)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-41-13
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7751
  eval_samples_per_second: 209.421
  eval_steps_per_second: 6.701
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1816861
  should_checkpoint: true
  time_since_restore: 16.132404088974
  time_this_iter_s: 16.132404088974
  time_total_s: 78.20882391929626
  timestamp: 1665128473
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.1524651050567627
  


[2m[36m(pid=1817009)[0m 2022-10-07 07:41:17.785600: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1817009)[0m 2022-10-07 07:41:20,636	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmpefe143
[2m[36m(_objective pid=1817009)[0m 2022-10-07 07:41:20,636	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:41:20 (running for 00:12:44.13)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 14 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1817009)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight']
[2m[36m(_objective pid=1817009)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1817009)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:41:25 (running for 00:12:49.13)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 14 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

Skipping the first batches:   0%|          | 0/12 [00:00<?, ?it/s]
[2m[36m(_objective pid=1817009)[0m 
Skipping the first batches:   8%|▊         | 1/12 [00:00<00:03,  2.93it/s]
Skipping the first batches:  83%|████████▎ | 10/12 [00:00<00:00, 27.87it/s]
Skipping the first batches: 100%|██████████| 12/12 [00:00<00:00, 24.55it/s]
[2m[36m(_objective pid=1817009)[0m 
12it [00:02,  4.86it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.17it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.30it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.70it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.34it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.10it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.94it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.82it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.75it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.67it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.64it/s]


== Status ==
Current time: 2022-10-07 07:41:30 (running for 00:12:54.13)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 14 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 41%|████      | 13/32 [00:01<00:02,  6.62it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.61it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.60it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.60it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.60it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.56it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.57it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.56it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.56it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.57it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.57it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.58it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.58it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.58it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.58it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.58it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.57it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.57it/s]
 97%|█████████▋| 31/32 [00:04<00:00,  6.56it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.

[2m[36m(_objective pid=1817009)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7692, 'eval_samples_per_second': 209.677, 'eval_steps_per_second': 6.71, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:41:35 (running for 00:12:59.14)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 14 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|--------------

2022-10-07 07:41:36,304	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00004 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:41:36,304	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.00023362975230429848, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.00028035570276515817, 'learning_rate': 1.5591115792989563e-05}
12it [00:09,  1.23it/s]d=1817009)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-41-36
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7692
  eval_samples_per_second: 209.677
  eval_steps_per_second: 6.71
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1817009
  should_checkpoint: true
  time_since_restore: 15.530893325805664
  time_this_iter_s: 15.530893325805664
  time_total_s: 77.60731315612793
  timestamp: 1665128496
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.0680108070373535
  


[2m[36m(pid=1817150)[0m 2022-10-07 07:41:40.332658: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1817150)[0m 2022-10-07 07:41:43,190	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp7f397c
[2m[36m(_objective pid=1817150)[0m 2022-10-07 07:41:43,190	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:41:43 (running for 00:13:06.79)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 15 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUN

[2m[36m(_objective pid=1817150)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias']
[2m[36m(_objective pid=1817150)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1817150)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:41:48 (running for 00:13:11.80)
Memory usage on this node: 14.8/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 15 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUN

Skipping the first batches:   0%|          | 0/12 [00:00<?, ?it/s]
[2m[36m(_objective pid=1817150)[0m 
Skipping the first batches:   8%|▊         | 1/12 [00:00<00:02,  3.67it/s]
Skipping the first batches:  83%|████████▎ | 10/12 [00:00<00:00, 32.41it/s]
Skipping the first batches: 100%|██████████| 12/12 [00:00<00:00, 28.64it/s]
[2m[36m(_objective pid=1817150)[0m 
12it [00:02,  4.84it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.19it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.31it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.72it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.34it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.10it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.94it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.83it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.74it/s]


== Status ==
Current time: 2022-10-07 07:41:53 (running for 00:13:16.80)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 15 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUN

 34%|███▍      | 11/32 [00:01<00:03,  6.70it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.63it/s]
 41%|████      | 13/32 [00:01<00:02,  6.62it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.61it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.61it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.60it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.60it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.59it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.59it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.59it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.58it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.59it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.59it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.59it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.55it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.56it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.57it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.58it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.58it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.58it/s]
                                        

[2m[36m(_objective pid=1817150)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7632, 'eval_samples_per_second': 209.942, 'eval_steps_per_second': 6.718, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:41:58 (running for 00:13:21.81)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 15 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-----------

2022-10-07 07:41:59,138	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:41:59,139	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.17241364344874652, 'learning_rate': 1.5591115792989563e-05}
12it [00:09,  1.23it/s]d=1817150)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-41-59
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7632
  eval_samples_per_second: 209.942
  eval_steps_per_second: 6.718
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1817150
  should_checkpoint: true
  time_since_restore: 15.828655242919922
  time_this_iter_s: 15.828655242919922
  time_total_s: 77.90507507324219
  timestamp: 1665128519
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.021773099899292
  


[2m[36m(pid=1817293)[0m 2022-10-07 07:42:03.786288: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1817293)[0m 2022-10-07 07:42:06,626	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmped2c6d
[2m[36m(_objective pid=1817293)[0m 2022-10-07 07:42:06,626	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:42:06 (running for 00:13:30.12)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 16 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1817293)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias']
[2m[36m(_objective pid=1817293)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1817293)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:42:11 (running for 00:13:35.12)
Memory usage on this node: 15.1/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 16 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

Skipping the first batches:   0%|          | 0/12 [00:00<?, ?it/s]
[2m[36m(_objective pid=1817293)[0m 
  0%|          | 0/10 [00:00<?, ?it/s][A
Skipping the first batches:   8%|▊         | 1/12 [00:00<00:03,  2.89it/s]
Skipping the first batches: 100%|██████████| 12/12 [00:00<00:00, 24.58it/s]
[2m[36m(_objective pid=1817293)[0m 
12it [00:02,  4.86it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.14it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.25it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.68it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.32it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.09it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.92it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.82it/s]


== Status ==
Current time: 2022-10-07 07:42:16 (running for 00:13:40.12)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 16 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 31%|███▏      | 10/32 [00:01<00:03,  6.74it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.69it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.66it/s]
 41%|████      | 13/32 [00:01<00:02,  6.60it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.60it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.58it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.58it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.58it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.58it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.58it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.58it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.57it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.58it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.57it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.58it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.58it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.58it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.58it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.58it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.58it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.

[2m[36m(_objective pid=1817293)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7656, 'eval_samples_per_second': 209.836, 'eval_steps_per_second': 6.715, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:42:21 (running for 00:13:45.13)
Memory usage on this node: 15.0/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 16 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-------------

2022-10-07 07:42:22,598	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:42:22,598	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.11494242896583103, 'learning_rate': 1.5591115792989563e-05}
12it [00:09,  1.21it/s]d=1817293)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-42-22
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7656
  eval_samples_per_second: 209.836
  eval_steps_per_second: 6.715
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1817293
  should_checkpoint: true
  time_since_restore: 15.835160970687866
  time_this_iter_s: 15.835160970687866
  time_total_s: 77.91158080101013
  timestamp: 1665128542
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.057798385620117
  


[2m[36m(pid=1817437)[0m 2022-10-07 07:42:26.427368: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1817437)[0m 2022-10-07 07:42:29,291	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmpb4ae23
[2m[36m(_objective pid=1817437)[0m 2022-10-07 07:42:29,291	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:42:29 (running for 00:13:52.90)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 17 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1817437)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
[2m[36m(_objective pid=1817437)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1817437)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:42:39 (running for 00:14:02.90)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 17 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 34%|███▍      | 11/32 [00:01<00:03,  6.70it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.66it/s]
 41%|████      | 13/32 [00:01<00:02,  6.62it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.60it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.60it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.56it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.52it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.54it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.54it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.55it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.56it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.53it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.54it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.52it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.54it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.55it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.55it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.56it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.56it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.57it/s]
 97%|█████████▋| 31/32 [00:04<00:00,  6.

[2m[36m(_objective pid=1817437)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.777, 'eval_samples_per_second': 209.336, 'eval_steps_per_second': 6.699, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:42:44 (running for 00:14:07.90)
Memory usage on this node: 15.1/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 17 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|--------------

2022-10-07 07:42:45,372	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:42:45,373	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.013999698964084628, 'learning_rate': 2.3386673689484342e-05}
12it [00:09,  1.22it/s]d=1817437)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-42-45
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.777
  eval_samples_per_second: 209.336
  eval_steps_per_second: 6.699
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1817437
  should_checkpoint: true
  time_since_restore: 15.960481643676758
  time_this_iter_s: 15.960481643676758
  time_total_s: 78.03690147399902
  timestamp: 1665128565
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.036768674850464
  


[2m[36m(pid=1817580)[0m 2022-10-07 07:42:49.783823: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1817580)[0m 2022-10-07 07:42:52,630	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp9c005b
[2m[36m(_objective pid=1817580)[0m 2022-10-07 07:42:52,630	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:42:52 (running for 00:14:16.12)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 18 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1817580)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias']
[2m[36m(_objective pid=1817580)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1817580)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:43:02 (running for 00:14:26.13)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 18 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 31%|███▏      | 10/32 [00:01<00:03,  6.73it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.64it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.62it/s]
 41%|████      | 13/32 [00:01<00:02,  6.59it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.58it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.58it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.57it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.57it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.57it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.56it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.56it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.57it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.56it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.56it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.57it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.56it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.56it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.55it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.56it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.56it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.

[2m[36m(_objective pid=1817580)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.778, 'eval_samples_per_second': 209.294, 'eval_steps_per_second': 6.697, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:43:07 (running for 00:14:31.13)
Memory usage on this node: 14.9/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 18 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|--------------

2022-10-07 07:43:08,759	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:43:08,759	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.29212665565243773, 'learning_rate': 1.5591115792989563e-05}
12it [00:09,  1.21it/s]d=1817580)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-43-08
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.778
  eval_samples_per_second: 209.294
  eval_steps_per_second: 6.697
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1817580
  should_checkpoint: true
  time_since_restore: 15.992698669433594
  time_this_iter_s: 15.992698669433594
  time_total_s: 78.06911849975586
  timestamp: 1665128588
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.049220561981201
  


[2m[36m(pid=1817732)[0m 2022-10-07 07:43:12.579970: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1817732)[0m 2022-10-07 07:43:15,462	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmpb2c16a
[2m[36m(_objective pid=1817732)[0m 2022-10-07 07:43:15,462	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:43:15 (running for 00:14:39.06)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 19 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1817732)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight']
[2m[36m(_objective pid=1817732)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1817732)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:43:25 (running for 00:14:49.07)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 19 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 28%|██▊       | 9/32 [00:01<00:03,  6.75it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.69it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.62it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.60it/s]
 41%|████      | 13/32 [00:01<00:02,  6.56it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.56it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.53it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.54it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.52it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.54it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.55it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.55it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.56it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.57it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.56it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.57it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.55it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.55it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.55it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.51it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.5

[2m[36m(_objective pid=1817732)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7933, 'eval_samples_per_second': 208.626, 'eval_steps_per_second': 6.676, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:43:30 (running for 00:14:54.07)
Memory usage on this node: 14.8/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 19 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-------------

2022-10-07 07:43:31,755	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:43:31,755	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.11494242896583103, 'learning_rate': 2.3386673689484342e-05}
12it [00:09,  1.20it/s]d=1817732)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-43-31
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7933
  eval_samples_per_second: 208.626
  eval_steps_per_second: 6.676
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1817732
  should_checkpoint: true
  time_since_restore: 16.13073706626892
  time_this_iter_s: 16.13073706626892
  time_total_s: 78.20715689659119
  timestamp: 1665128611
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.0930557250976562
  


[2m[36m(pid=1817897)[0m 2022-10-07 07:43:35.982616: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1817897)[0m 2022-10-07 07:43:38,822	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmpbbeeef
[2m[36m(_objective pid=1817897)[0m 2022-10-07 07:43:38,822	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:43:38 (running for 00:15:02.32)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 20 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1817897)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight']
[2m[36m(_objective pid=1817897)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1817897)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:43:48 (running for 00:15:12.32)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 20 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 28%|██▊       | 9/32 [00:01<00:03,  6.82it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.75it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.70it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.66it/s]
 41%|████      | 13/32 [00:01<00:02,  6.63it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.62it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.61it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.60it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.59it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.59it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.59it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.58it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.58it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.56it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.57it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.54it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.55it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.53it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.55it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.53it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.5

[2m[36m(_objective pid=1817897)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7748, 'eval_samples_per_second': 209.431, 'eval_steps_per_second': 6.702, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:43:53 (running for 00:15:17.32)
Memory usage on this node: 15.1/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 20 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-------------

2022-10-07 07:43:54,853	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00004 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:43:54,854	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.00023362975230429848, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.06983140212909127, 'learning_rate': 1.5591115792989563e-05}
12it [00:09,  1.22it/s]d=1817897)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-43-54
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7748
  eval_samples_per_second: 209.431
  eval_steps_per_second: 6.702
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1817897
  should_checkpoint: true
  time_since_restore: 15.891712188720703
  time_this_iter_s: 15.891712188720703
  time_total_s: 77.96813201904297
  timestamp: 1665128634
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.0590546131134033
  


[2m[36m(pid=1818042)[0m 2022-10-07 07:43:59.396825: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1818042)[0m 2022-10-07 07:44:02,264	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmpcb82d6
[2m[36m(_objective pid=1818042)[0m 2022-10-07 07:44:02,264	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:44:02 (running for 00:15:25.87)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 21 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1818042)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
[2m[36m(_objective pid=1818042)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1818042)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:44:12 (running for 00:15:35.88)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 21 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 31%|███▏      | 10/32 [00:01<00:03,  6.75it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.69it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.64it/s]
 41%|████      | 13/32 [00:01<00:02,  6.62it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.61it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.56it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.57it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.53it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.55it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.52it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.54it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.49it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.51it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.53it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.54it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.51it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.53it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.51it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.42it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.47it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.

[2m[36m(_objective pid=1818042)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7954, 'eval_samples_per_second': 208.533, 'eval_steps_per_second': 6.673, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:44:17 (running for 00:15:40.88)
Memory usage on this node: 15.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 21 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-------------

2022-10-07 07:44:18,332	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00004 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:44:18,333	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.00023362975230429848, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.0001869038018434388, 'learning_rate': 2.3386673689484342e-05}
12it [00:09,  1.20it/s]d=1818042)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-44-18
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7954
  eval_samples_per_second: 208.533
  eval_steps_per_second: 6.673
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1818042
  should_checkpoint: true
  time_since_restore: 15.944751262664795
  time_this_iter_s: 15.944751262664795
  time_total_s: 78.02117109298706
  timestamp: 1665128658
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.041524648666382
  


[2m[36m(pid=1818184)[0m 2022-10-07 07:44:22.830578: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1818184)[0m 2022-10-07 07:44:25,677	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp14c330
[2m[36m(_objective pid=1818184)[0m 2022-10-07 07:44:25,677	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:44:25 (running for 00:15:49.17)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 22 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUN

[2m[36m(_objective pid=1818184)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
[2m[36m(_objective pid=1818184)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1818184)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:44:30 (running for 00:15:54.17)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 22 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUN

Skipping the first batches:   0%|          | 0/12 [00:00<?, ?it/s]
[2m[36m(_objective pid=1818184)[0m 
Skipping the first batches:   8%|▊         | 1/12 [00:00<00:03,  2.91it/s]
Skipping the first batches:  83%|████████▎ | 10/12 [00:00<00:00, 27.86it/s]
Skipping the first batches: 100%|██████████| 12/12 [00:00<00:00, 24.54it/s]
[2m[36m(_objective pid=1818184)[0m 
12it [00:02,  4.85it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.15it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.30it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.71it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.34it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.10it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.94it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.83it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.76it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.71it/s]
 38%|███▊      | 12/32 [00:01<00:02,  6.67it/s]


== Status ==
Current time: 2022-10-07 07:44:35 (running for 00:15:59.18)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 22 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUN

 41%|████      | 13/32 [00:01<00:02,  6.65it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.63it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.61it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.60it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.59it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.59it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.59it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.59it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.59it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.55it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.56it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.53it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.55it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.53it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.54it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.51it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.54it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.52it/s]
 97%|█████████▋| 31/32 [00:04<00:00,  6.53it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.

[2m[36m(_objective pid=1818184)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7711, 'eval_samples_per_second': 209.595, 'eval_steps_per_second': 6.707, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:44:40 (running for 00:16:04.18)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 22 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-----------

2022-10-07 07:44:41,482	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00004 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:44:41,483	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.00023362975230429848, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.00028035570276515817, 'learning_rate': 1.3624257381312833e-05}
12it [00:09,  1.20it/s]d=1818184)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-44-41
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7711
  eval_samples_per_second: 209.595
  eval_steps_per_second: 6.707
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1818184
  should_checkpoint: true
  time_since_restore: 15.666259288787842
  time_this_iter_s: 15.666259288787842
  time_total_s: 77.74267911911011
  timestamp: 1665128681
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.0600717067718506
  


[2m[36m(pid=1818325)[0m 2022-10-07 07:44:45.368603: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1818325)[0m 2022-10-07 07:44:48,236	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmpbf9e1d
[2m[36m(_objective pid=1818325)[0m 2022-10-07 07:44:48,236	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:44:48 (running for 00:16:11.84)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 23 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUN

[2m[36m(_objective pid=1818325)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
[2m[36m(_objective pid=1818325)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1818325)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:44:53 (running for 00:16:16.84)
Memory usage on this node: 14.8/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 23 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUN

Skipping the first batches:   0%|          | 0/12 [00:00<?, ?it/s]
[2m[36m(_objective pid=1818325)[0m 
Skipping the first batches:   8%|▊         | 1/12 [00:00<00:03,  3.64it/s]
Skipping the first batches:  83%|████████▎ | 10/12 [00:00<00:00, 32.22it/s]
Skipping the first batches: 100%|██████████| 12/12 [00:00<00:00, 28.45it/s]
[2m[36m(_objective pid=1818325)[0m 
12it [00:02,  4.86it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.21it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.23it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.62it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.29it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.02it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.88it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.75it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.70it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.63it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.62it/s]


== Status ==
Current time: 2022-10-07 07:44:58 (running for 00:16:21.85)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 23 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUN

 41%|████      | 13/32 [00:01<00:02,  6.58it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.58it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.54it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.56it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.51it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.54it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.52it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.54it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.51it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.54it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.54it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.55it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.57it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.56it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.58it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.58it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.58it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.58it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.58it/s]
                                        

[2m[36m(_objective pid=1818325)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7902, 'eval_samples_per_second': 208.761, 'eval_steps_per_second': 6.68, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:45:03 (running for 00:16:26.85)
Memory usage on this node: 14.5/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 23 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+-------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |     w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------

2022-10-07 07:45:03,889	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:45:03,889	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.11494242896583103, 'learning_rate': 3.473544037332349e-05}
12it [00:09,  1.22it/s]d=1818325)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-45-03
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7902
  eval_samples_per_second: 208.761
  eval_steps_per_second: 6.68
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1818325
  should_checkpoint: true
  time_since_restore: 15.508443593978882
  time_this_iter_s: 15.508443593978882
  time_total_s: 77.58486342430115
  timestamp: 1665128703
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.053870677947998
  


[2m[36m(pid=1818489)[0m 2022-10-07 07:45:08.816637: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1818489)[0m 2022-10-07 07:45:11,665	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp453483
[2m[36m(_objective pid=1818489)[0m 2022-10-07 07:45:11,665	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:45:11 (running for 00:16:35.17)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 24 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1818489)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight']
[2m[36m(_objective pid=1818489)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1818489)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:45:16 (running for 00:16:40.17)
Memory usage on this node: 14.7/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 24 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

Skipping the first batches:   0%|          | 0/12 [00:00<?, ?it/s]
[2m[36m(_objective pid=1818489)[0m 
Skipping the first batches:   8%|▊         | 1/12 [00:00<00:03,  2.92it/s]
Skipping the first batches:  83%|████████▎ | 10/12 [00:00<00:00, 27.71it/s]
Skipping the first batches: 100%|██████████| 12/12 [00:00<00:00, 24.42it/s]
[2m[36m(_objective pid=1818489)[0m 
12it [00:02,  4.85it/s]               [A
  0%|          | 0/32 [00:00<?, ?it/s]
  6%|▋         | 2/32 [00:00<00:02, 13.17it/s]
 12%|█▎        | 4/32 [00:00<00:03,  8.30it/s]
 16%|█▌        | 5/32 [00:00<00:03,  7.71it/s]
 19%|█▉        | 6/32 [00:00<00:03,  7.34it/s]
 22%|██▏       | 7/32 [00:00<00:03,  7.10it/s]
 25%|██▌       | 8/32 [00:01<00:03,  6.93it/s]
 28%|██▊       | 9/32 [00:01<00:03,  6.82it/s]
 31%|███▏      | 10/32 [00:01<00:03,  6.75it/s]
 34%|███▍      | 11/32 [00:01<00:03,  6.70it/s]
 38%|███▊      | 12/32 [00:01<00:03,  6.66it/s]


== Status ==
Current time: 2022-10-07 07:45:21 (running for 00:16:45.17)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 24 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

 41%|████      | 13/32 [00:01<00:02,  6.64it/s]
 44%|████▍     | 14/32 [00:01<00:02,  6.63it/s]
 47%|████▋     | 15/32 [00:02<00:02,  6.59it/s]
 50%|█████     | 16/32 [00:02<00:02,  6.59it/s]
 53%|█████▎    | 17/32 [00:02<00:02,  6.57it/s]
 56%|█████▋    | 18/32 [00:02<00:02,  6.58it/s]
 59%|█████▉    | 19/32 [00:02<00:01,  6.58it/s]
 62%|██████▎   | 20/32 [00:02<00:01,  6.54it/s]
 66%|██████▌   | 21/32 [00:03<00:01,  6.56it/s]
 69%|██████▉   | 22/32 [00:03<00:01,  6.53it/s]
 72%|███████▏  | 23/32 [00:03<00:01,  6.54it/s]
 75%|███████▌  | 24/32 [00:03<00:01,  6.52it/s]
 78%|███████▊  | 25/32 [00:03<00:01,  6.55it/s]
 81%|████████▏ | 26/32 [00:03<00:00,  6.52it/s]
 84%|████████▍ | 27/32 [00:03<00:00,  6.55it/s]
 88%|████████▊ | 28/32 [00:04<00:00,  6.52it/s]
 91%|█████████ | 29/32 [00:04<00:00,  6.54it/s]
 94%|█████████▍| 30/32 [00:04<00:00,  6.56it/s]
 97%|█████████▋| 31/32 [00:04<00:00,  6.57it/s]
                                               
100%|██████████| 32/32 [00:04<00:00,  6.

[2m[36m(_objective pid=1818489)[0m {'eval_loss': 1.0760165452957153, 'eval_accuracy': 0.382, 'eval_runtime': 4.7739, 'eval_samples_per_second': 209.474, 'eval_steps_per_second': 6.703, 'epoch': 1.5}
== Status ==
Current time: 2022-10-07 07:45:26 (running for 00:16:50.17)
Memory usage on this node: 14.4/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 24 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|-------------

2022-10-07 07:45:27,317	INFO pbt.py:618 -- [exploit] transferring weights from trial _objective_a80be_00003 (score 0.394) -> _objective_a80be_00000 (score 0.382)
2022-10-07 07:45:27,318	INFO pbt.py:636 -- [explore] perturbed config from {'per_device_train_batch_size': 32, 'weight_decay': 0.14367803620728878, 'learning_rate': 1.9488894741236953e-05} -> {'per_device_train_batch_size': 32, 'weight_decay': 0.11473859738014881, 'learning_rate': 2.3386673689484342e-05}
12it [00:09,  1.20it/s]d=1818489)[0m 


Result for _objective_a80be_00000:
  date: 2022-10-07_07-45-27
  done: false
  epoch: 1.5
  eval_accuracy: 0.382
  eval_loss: 1.0760165452957153
  eval_runtime: 4.7739
  eval_samples_per_second: 209.474
  eval_steps_per_second: 6.703
  experiment_id: 9263b78072064a3da32fb2b2851a44c4
  hostname: 3481a8a2ae33
  iterations_since_restore: 1
  node_ip: 172.17.0.3
  objective: 0.382
  pid: 1818489
  should_checkpoint: true
  time_since_restore: 15.505537986755371
  time_this_iter_s: 15.505537986755371
  time_total_s: 77.58195781707764
  timestamp: 1665128727
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: a80be_00000
  warmup_time: 2.072788953781128
  


[2m[36m(pid=1818660)[0m 2022-10-07 07:45:31.402578: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
[2m[36m(_objective pid=1818660)[0m 2022-10-07 07:45:34,256	INFO trainable.py:668 -- Restored on 172.17.0.3 from checkpoint: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt/_objective_a80be_00000_0_num_train_epochs=2_2022-10-07_07-28-36/checkpoint_tmp1ea3ce
[2m[36m(_objective pid=1818660)[0m 2022-10-07 07:45:34,256	INFO trainable.py:677 -- Current state after restoring: {'_iteration': 3, '_timesteps_total': None, '_time_total': 62.076419830322266, '_episodes_total': None}


== Status ==
Current time: 2022-10-07 07:45:34 (running for 00:16:57.86)
Memory usage on this node: 10.2/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 25 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

[2m[36m(_objective pid=1818660)[0m Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias']
[2m[36m(_objective pid=1818660)[0m - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
[2m[36m(_objective pid=1818660)[0m - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForS

== Status ==
Current time: 2022-10-07 07:45:39 (running for 00:17:02.87)
Memory usage on this node: 14.7/31.1 GiB
PopulationBasedTraining: 8 checkpoints, 25 perturbs
Resources requested: 20.0/20 CPUs, 1.0/1 GPUs, 0.0/15.31 GiB heap, 0.0/7.66 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /workspace/syc/BERT_classification_binary/test-results/tune_transformer_pbt
Number of trials: 5/5 (4 PAUSED, 1 RUNNING)
+------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------+
| Trial name             | status   | loc                |    w_decay |          lr |   train_bs/gpu |   num_epochs |   eval_accuracy |   eval_loss |   epoch |   training_iteration |
|------------------------+----------+--------------------+------------+-------------+----------------+--------------+-----------------+-------------+---------+----------------------|
| _objective_a80be_00000 | RUNNIN

Skipping the first batches:   0%|          | 0/12 [00:00<?, ?it/s]
[2m[36m(_objective pid=1818660)[0m 
Skipping the first batches:   8%|▊         | 1/12 [00:00<00:03,  3.63it/s]
Skipping the first batches: 100%|██████████| 12/12 [00:00<00:00, 28.53it/s]


In [None]:
result

# Model test

# Reference

https://bo-10000.tistory.com/154
https://huggingface.co/blog/ray-tune  
https://docs.ray.io/en/latest/tune/examples/pbt_transformers.html
https://wood-b.github.io/post/a-novices-guide-to-hyperparameter-optimization-at-scale/#schedulers-vs-search-algorithms
https://docs.ray.io/en/latest/tune/api_docs/search_space.html
https://docs.ray.io/en/latest/tune/tutorials/tune-advanced-tutorial.html
https://docs.ray.io/en/latest/tune/api_docs/schedulers.html
https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/
https://docs.ray.io/en/latest/tune/faq.html