<a href="https://colab.research.google.com/github/vibha-rajan/check/blob/master/MRPC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# hide
!nvidia-smi

Sun Jul 25 17:32:26 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# hide
import sys
if 'google.colab' in sys.modules:
    !pip install -Uqq fastai transformers datasets wandb tqdm
    !pip install -qq git+https://github.com/aikindergarten/fasthugs.git

In [None]:
#all_slow

# GLUE Benchmark

In [None]:
from transformers import AutoModelForSequenceClassification
from fastai.text.all import *
from fastai.callback.wandb import WandbCallback

from fasthugs.learner import TransLearner
from fasthugs.data import TransformersTextBlock, TextGetter, get_splits, PreprocCategoryBlock

from datasets import load_dataset, concatenate_datasets

import wandb
import gc

## Setup

Let's define the main settings for the run in one place:

In [None]:
ds_name = 'glue'
model_name = "roberta-base"

n_epoch = 5

max_len = 512
bs = 32
val_bs = bs*2

lr = 3e-5

In [None]:
GLUE_TASKS = ["cola", "mnli", "mnli-mm", "mrpc", "qnli", "qqp", "rte", "sst2", "stsb", "wnli"]

In [None]:
glue_metrics = {
    'cola':[MatthewsCorrCoef()],
    'sst2':[accuracy],
    'mrpc':[F1Score(), accuracy],
    'stsb':[PearsonCorrCoef(), SpearmanCorrCoef()],
    'qqp' :[F1Score(), accuracy],
    'mnli':[accuracy],
    'qnli':[accuracy],
    'rte' :[accuracy],
    'wnli':[accuracy],
}

## Microsoft Research Paraphrase Corpus

The task is to determine whether the sentences in the pair are semantically equivalent.

The GLUE dataset is available through the HuggingFace `datasets` library.

In [None]:
task = 'mrpc'
ds = load_dataset(ds_name, task)

Reusing dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)


Let's check the sizes of the datasets:

In [None]:
print(f"Train set {len(ds['train'])}; Validation set {len(ds['validation'])}")

Train set 3668; Validation set 408


The `fastai` data pipeline expects a single dataset together with lists of train and validation indexes. Let's rearrange the dataset we have to match that format:

In [None]:
train_idx, valid_idx = get_splits(ds)
train_ds = concatenate_datasets([ds['train'], ds['validation']])
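
To make the index layout concrete, here is a minimal sketch of what the split amounts to once the two sets are concatenated. It is not the actual `get_splits` implementation from `fasthugs`; the helper name `split_indices` is hypothetical, and it assumes the validation rows simply follow the training rows.

```python
def split_indices(n_train, n_valid):
    """Index lists for a dataset built as [train rows..., validation rows...]."""
    train_idx = list(range(n_train))
    # Validation rows come right after the training rows in the concatenation
    valid_idx = list(range(n_train, n_train + n_valid))
    return train_idx, valid_idx

# Sizes taken from the MRPC splits printed above
train_idx_demo, valid_idx_demo = split_indices(3668, 408)
print(valid_idx_demo[0], valid_idx_demo[-1])
```

With these sizes the validation indexes start right where the training indexes end, which is what `IndexSplitter` needs later on.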

In [None]:
train_ds[0]

{'idx': 0,
 'label': 1,
 'sentence1': 'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .',
 'sentence2': 'Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .'}

Now we can construct dataloaders using the familiar `DataBlock` API. `fasthugs` provides some `*Block` classes for that purpose. `TransformersTextBlock` handles text tokenization and dynamic padding. `PreprocCategoryBlock` is similar to the regular `CategoryBlock`, but it lets us use predefined category names, which we can get from the dataset `features`:

In [None]:
label_vocab = train_ds.features['label'].names
blocks = [
    TransformersTextBlock(pretrained_model_name=model_name),
    PreprocCategoryBlock(label_vocab)
]

dblock = DataBlock(
    blocks=blocks,
    get_x=TextGetter('sentence1', 'sentence2'),
    get_y=ItemGetter('label'),
    splitter=IndexSplitter(valid_idx)
)
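
Dynamic padding means that each batch is padded only up to the length of its longest sequence, rather than to `max_len`. The sketch below illustrates the idea in plain Python. It is not how `TransformersTextBlock` is implemented; `pad_batch` is a hypothetical helper, and the pad id of 1 is an assumption chosen for illustration.

```python
def pad_batch(token_id_seqs, pad_id=1):
    """Pad every sequence to the longest one in this batch only."""
    max_len = max(len(seq) for seq in token_id_seqs)
    input_ids, attention_mask = [], []
    for seq in token_id_seqs:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch_demo = pad_batch([[0, 5, 2], [0, 5, 6, 7, 2]])
print(batch_demo["input_ids"])
print(batch_demo["attention_mask"])
```

Short sentence pairs such as those in MRPC therefore produce much smaller tensors than padding everything to 512 tokens would.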

In [None]:
%%time
dls = dblock.dataloaders(train_ds, bs=bs, val_bs=val_bs)

CPU times: user 4.75 s, sys: 1.34 s, total: 6.09 s
Wall time: 6.06 s


In [None]:
dls.show_batch(max_n=4)

Unnamed: 0,text,text_,category
0,"Amrozi accused his brother, whom he called "" the witness "", of deliberately distorting his evidence.","Referring to him as only "" the witness "", Amrozi accused his brother of deliberately distorting his evidence.",equivalent
1,"Analysts'consensus estimate from Thomson First Call was for a loss of $ 2.08 a share, excluding one-time items.",The estimate of analysts surveyed by Thomson First Call was for a loss of $ 2.75 a share.,not_equivalent
2,"If Poland, Spain and Germany were prepared to make concessions in Brussels, the basis for a deal could be found.",""" Poland, Spain and Germany, were ready to talk about a deal, "" he said.",not_equivalent
3,""" Our policies are well-known, and I'm not aware of any changes in policy "" on Iran, Powell said.",""" Our policies are well known and I'm not aware of any changes, "" Secretary of State Colin Powell said Tuesday.",equivalent


In [None]:
WANDB_NAME = f'{ds_name}-{task}-{model_name}'
GROUP = f'{ds_name}-{task}-{model_name}-{lr:.0e}'
NOTES = f'finetuning {model_name} with Adam lr={lr:.0e}'
TAGS =[model_name, ds_name, 'adam']

In [None]:
#hide_output 
wandb.init(reinit=True, project="fasthugs", entity="fastai_community",
           name=WANDB_NAME, group=GROUP, notes=NOTES, tags=TAGS);

[34m[1mwandb[0m: Currently logged in as: [33mfastai_community[0m (use `wandb login --relogin` to force relogin)
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


### Training

Now the batch we get from the dataloader is a dictionary, and HuggingFace transformers models accept keyword arguments as input. But the fastai `Learner` feeds the model a sequence of positional arguments (`self.pred = self.model(*self.xb)`). To make this work smoothly, we can create a callback that unrolls the input dict into a proper `xb` tuple.

The main work needed to train a transformers model happens in `TransCallback`. It stores the argument names the model accepts and converts the input dict yielded by the dataloader into a tuple.

By default the model returns a dictionary-like object containing the logits and possibly other outputs defined by the model config (e.g. intermediate hidden representations). In the fastai training loop we usually expect `preds` to be a tensor of model predictions (logits), so the callback reformats the model output accordingly.
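
The idea of unrolling a dict into positional arguments can be sketched without fastai or transformers. This is only an illustration and not the `TransCallback` code: `ToyModel` and `dict_to_positional` are hypothetical names, and the toy model merely mimics a call signature with keyword arguments.

```python
import inspect

class ToyModel:
    """Stand-in for a transformers model; only the call signature matters here."""
    def __call__(self, input_ids=None, attention_mask=None, token_type_ids=None):
        return {"logits": [len(input_ids)], "used_mask": attention_mask is not None}

def dict_to_positional(model, batch):
    """Order the batch dict to match the model's call signature.

    Arguments missing from the batch become None so later positions still
    line up; keys the model does not accept (e.g. labels) are dropped.
    """
    arg_names = list(inspect.signature(model.__call__).parameters)
    return tuple(batch.get(name) for name in arg_names)

toy_model = ToyModel()
toy_batch = {"attention_mask": [1, 1, 1], "input_ids": [0, 42, 2], "labels": [1]}
xb_demo = dict_to_positional(toy_model, toy_batch)
output_demo = toy_model(*xb_demo)      # mirrors `self.model(*self.xb)`
preds_demo = output_demo["logits"]     # keep only the logits as predictions
print(xb_demo, preds_demo)
```

Reading the accepted argument names once and reusing them for every batch keeps the per-batch work down to a dict lookup.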

In [None]:
#hide_output
model = AutoModelForSequenceClassification.from_pretrained(model_name)
metrics = glue_metrics[task]
learn = TransLearner(dls, model, metrics=metrics).to_fp16()

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.dense.weight', 'lm_head.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.bias', 'lm_head.decoder.weight', 'lm_head.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.

In [None]:
learn.show_training_loop()

Start Fit
   - before_fit     : [TrainEvalCallback, MixedPrecision, Recorder, ProgressCallback]
  Start Epoch Loop
     - before_epoch   : [Recorder, ProgressCallback]
    Start Train
       - before_train   : [TrainEvalCallback, Recorder, ProgressCallback]
      Start Batch Loop
         - before_batch   : [TransCallback, MixedPrecision]
         - after_pred     : [TransCallback, MixedPrecision]
         - after_loss     : [TransCallback, MixedPrecision]
         - before_backward: [MixedPrecision]
         - before_step    : [MixedPrecision]
         - after_step     : [MixedPrecision]
         - after_cancel_batch: []
         - after_batch    : [TrainEvalCallback, Recorder, ProgressCallback]
      End Batch Loop
    End Train
     - after_cancel_train: [Recorder]
     - after_train    : [Recorder, ProgressCallback]
    Start Valid
       - before_validate: [TrainEvalCallback, Recorder, ProgressCallback]
      Start Batch Loop
         - **CBs same as train batch**: []
      End Ba

In [None]:
cbs = [WandbCallback(log_preds=False, log_model=False), SaveModelCallback(monitor=metrics[0].name)]
learn.fit_one_cycle(n_epoch, lr, cbs=cbs)

Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.573595,0.399963,0.865672,0.823529,00:24
1,0.384882,0.40154,0.895425,0.843137,00:24
2,0.237732,0.286435,0.918519,0.892157,00:25
3,0.121106,0.370521,0.916955,0.882353,00:25
4,0.064179,0.386941,0.919014,0.887255,00:25


Better model found at epoch 0 with f1_score value: 0.8656716417910447.
Better model found at epoch 1 with f1_score value: 0.8954248366013072.
Better model found at epoch 2 with f1_score value: 0.9185185185185184.
Better model found at epoch 4 with f1_score value: 0.9190140845070423.


In [None]:
learn.show_results()

Unnamed: 0,text,text_,category,category_
0,He said the foodservice pie business doesn 't fit the company's long-term growth strategy.,""" The foodservice pie business does not fit our long-term growth strategy.",equivalent,equivalent
1,Columbia broke up over Texas upon re-entry on Feb. 1.,Columbia broke apart in the skies above Texas on Feb. 1.,equivalent,equivalent
2,"Saddam loyalists have been blamed for sabotaging the nation's infrastructure, as well as frequent attacks on U.S. soldiers.",Hussein loyalists have been blamed for sabotaging the nation's infrastructure and attacking US soldiers.,equivalent,equivalent
3,Eric Gagne pitched a perfect ninth for his 23rd save in as many opportunities.,Gagne struck out two in a perfect ninth inning for his 23rd save.,equivalent,not_equivalent
4,Xerox itself paid a $ 10 million fine last year to settle similar SEC charges.,Xerox itself previously paid a $ 10-million penalty to settle the SEC accusations.,equivalent,equivalent
5,""" The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future, "" the Fed said in a statement accompanying the unanimous decision.",""" The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future, "" the policy-setting Federal Open Market Committee said.",equivalent,equivalent
6,"A year or two later, 259, or 10 per cent, of the youths reported that they had started to smoke, or had taken just a few puffs.","Within two years, 259, or 10 percent, of the youths reported they had started to smoke or had at least taken a few puffs.",equivalent,equivalent
7,"A man arrested for allegedly threatening to shoot and kill a city councilman from Queens was ordered held on $ 100,000 bail during an early morning court appearance Saturday.","The Queens man arrested for allegedly threatening to shoot City Councilman Hiram Monserrate was held on $ 100,000 bail Saturday, a spokesman for the Queens district attorney said.",equivalent,equivalent
8,"Other staff members, however, defended the document, saying it would still help policy-makers and the agency improve efforts to address the climate issue.","Some E.P.A. staff members defended the document, saying that although pared down it would still help policy makers and the agency address the climate issue.",equivalent,equivalent


## Using the trained model for inference

In [None]:
test_dl = learn.dls.test_dl(ds["test"])
preds, _ = learn.get_preds(dl=test_dl)

In [None]:
preds[:5]

tensor([[0.0034, 0.9966],
        [0.0016, 0.9984],
        [0.0013, 0.9987],
        [0.0166, 0.9834],
        [0.6688, 0.3312]])
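
To turn these probabilities into class names we can take the argmax of each row. The sketch below uses plain Python lists with the values printed above. It assumes that index 1 corresponds to `equivalent`, as in the training example shown earlier; the names `label_names_demo` and `to_labels` are only illustrative.

```python
label_names_demo = ["not_equivalent", "equivalent"]  # assumed label order
probs_demo = [
    [0.0034, 0.9966],
    [0.0016, 0.9984],
    [0.0013, 0.9987],
    [0.0166, 0.9834],
    [0.6688, 0.3312],
]

def to_labels(probs, names):
    """Pick the class with the highest probability in each row."""
    return [names[max(range(len(row)), key=row.__getitem__)] for row in probs]

pred_labels_demo = to_labels(probs_demo, label_names_demo)
print(pred_labels_demo)
```

With tensors the same step would typically be a single `argmax` over the last dimension.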

## Improving the performance

### Try another "backbone"

There are a number of BERT derivatives that improve on the original BERT:

- RoBERTa - more training data and no NSP task
- ALBERT - parameter sharing
- ELECTRA - discriminator pretraining objective
- DeBERTa - disentangled attention

DeBERTa was the first model to surpass human performance on SuperGLUE. Let's try out the `base` version of it and compare the result to the one obtained with RoBERTa:

In [None]:
del learn, model
gc.collect()
torch.cuda.empty_cache()

In [None]:
model_name = "microsoft/deberta-base"

In [None]:
blocks = [
    TransformersTextBlock(pretrained_model_name=model_name),
    PreprocCategoryBlock(label_vocab)
]

dblock = DataBlock(
    blocks=blocks,
    get_x=TextGetter('sentence1', 'sentence2'),
    get_y=ItemGetter('label'),
    splitter=IndexSplitter(valid_idx)
)

dls = dblock.dataloaders(train_ds, bs=bs, val_bs=val_bs)

Downloading:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/474 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/456k [00:00<?, ?B/s]

In [None]:
WANDB_NAME = f'{ds_name}-{task}-{model_name}'
GROUP = f'{ds_name}-{task}-{model_name}-{lr:.0e}'
NOTES = f'finetuning {model_name} with RAdam lr={lr:.0e}'
TAGS =[model_name, ds_name, 'radam']

In [None]:
#hide_output 
wandb.init(reinit=True, project="fasthugs", entity="fastai_community",
           name=WANDB_NAME, group=GROUP, notes=NOTES, tags=TAGS);


0,1
epoch,5.0
train_loss,0.06418
raw_loss,0.15476
wd_0,0.01
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.01
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,██████▇▇▇▆▆▆▅▅▅▅▅▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁
raw_loss,█▇███▆▇▇▅▄▃▄▄▄▄▄▃▃▃▂▃▂▃▂▄▃▂▃▄▂▁▃▂▁▃▁▁▁▁▁
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


In [None]:
model = AutoModelForSequenceClassification.from_pretrained(model_name)
metrics = glue_metrics[task]
learn = TransLearner(dls, model, metrics=metrics).to_fp16()

cbs = [WandbCallback(log_preds=False, log_model=False), SaveModelCallback(monitor=metrics[0].name)]
learn.fit_one_cycle(n_epoch, lr, cbs=cbs)

Downloading:   0%|          | 0.00/559M [00:00<?, ?B/s]

Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForSequenceClassification: ['lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.LayerNorm.bias', 'config', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.weight']
- This IS expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DebertaForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['pooler.dense.bias',

Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.603153,0.52421,0.812227,0.683824,00:38
1,0.460632,0.326456,0.922807,0.892157,00:37
2,0.285363,0.293273,0.918261,0.884804,00:37
3,0.136859,0.322482,0.921708,0.892157,00:37
4,0.065396,0.378245,0.921466,0.889706,00:37


Better model found at epoch 0 with f1_score value: 0.8122270742358079.
Better model found at epoch 1 with f1_score value: 0.9228070175438596.


### Hyperparameter search with wandb sweeps

When trying a new model, how can we be sure that we are getting the best performance from it? To get a better understanding of model performance you might want to do some hyperparameter search. There are multiple ways to do this using a number of third-party libraries. WandB has its own functionality for hyperparameter search called `sweeps`. It has a bunch of useful features: you can use a Bayesian search method, launch multiple agents concurrently, and get nice interpretable plots for the sweeps.

To run a hyperparameter search using wandb sweeps from a notebook you need to define:
1. A function to run training
2. A configuration for a sweep agent

We'll also use a layerwise parameter group splitter to allow for differential learning rates. This is usually used in `fastai` to improve transfer learning for vision tasks.

In [None]:
def layerwise_splitter(model):
    emb = L(model.base_model.embeddings)
    layers = L(model.base_model.encoder.layer.children())
    clf = L(m for m in list(model.children())[1:] if params(m))
    groups = emb + layers + clf
    return groups.map(params)

In [None]:
def train():
    with wandb.init() as run:
        cfg = run.config
        model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
        metrics = glue_metrics[task]
        k = len(layerwise_splitter(model))
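        # Use differential learning rates only when a non-zero decay factor is set
        lr = slice(cfg.lr*cfg.diff_lr_decay_factor**k, cfg.lr) if cfg.diff_lr_decay_factor else cfg.lr
        learn = TransLearner(dls, model, metrics=metrics, opt_func=Adam, splitter=layerwise_splitter).to_fp16()
        # Pass the computed `lr` (a slice or a float) so the decay factor actually takes effect
        learn.fit_one_cycle(n_epoch, lr, wd=cfg.wd, cbs=[WandbCallback(log_preds=False, log_model=False)])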
        del learn
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.synchronize()
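
To see what a `slice(cfg.lr*cfg.diff_lr_decay_factor**k, cfg.lr)` learning rate means per parameter group, here is a small sketch. It assumes that fastai spaces the rates geometrically between the two ends of the slice, which is our understanding of how it handles a `slice` with both bounds set. The helper `geometric_lrs` and the group count of 14 are hypothetical and used only for illustration.

```python
def geometric_lrs(start, stop, n_groups):
    """Learning rates from `start` to `stop` with a constant ratio between groups."""
    if n_groups == 1:
        return [stop]
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step**i for i in range(n_groups)]

# Hypothetical values: embeddings + 12 encoder layers + one classifier group
base_lr, decay, k = 3e-5, 0.9, 14
lrs_demo = geometric_lrs(base_lr * decay**k, base_lr, k)
for group, group_lr in enumerate(lrs_demo):
    print(f"group {group:2d}: lr = {group_lr:.2e}")
```

The embeddings get the smallest rate and the classifier head gets the full `cfg.lr`, so the layers closest to the pretrained input representations change the least.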

In [None]:
model_name = "microsoft/deberta-base"
metrics = glue_metrics[task]
metric_to_monitor = metrics[0].name if isinstance(metrics[0], Metric) else metrics[0].__name__
sweep_name = f"glue-{task}-deberta-base-sweep"
sweep_config = {
    "project":"glue-benchmark",
    "entity": "fastai_community",
    "name": sweep_name,
    "method": "grid",
    "parameters": {
        "lr": {"values":[2e-5,3e-5,5e-5,1e-4]},
        "wd": {"values":[0.,1e-2,5e-2]},
        "diff_lr_decay_factor":{"values":[0., 0.9, 0.8, 0.7, 0.6]}
    }
}
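
Since the method is `grid`, the sweep tries every combination of the listed values. A quick way to see how many runs that implies is to enumerate the grid ourselves. The snippet below copies the parameter values from the config into a separate dict (`grid_demo`, a name used only here) and does not call wandb.

```python
from itertools import product

# Same values as in `sweep_config["parameters"]`, copied for illustration
grid_demo = {
    "lr": [2e-5, 3e-5, 5e-5, 1e-4],
    "wd": [0., 1e-2, 5e-2],
    "diff_lr_decay_factor": [0., 0.9, 0.8, 0.7, 0.6],
}

names = list(grid_demo)
combos_demo = [dict(zip(names, values)) for values in product(*grid_demo.values())]
print(len(combos_demo))  # 4 * 3 * 5 = 60 runs
print(combos_demo[0])
```

Each of these runs trains for `n_epoch` epochs, so it is worth checking the grid size before launching an agent.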

In [None]:
sweep_id = wandb.sweep(sweep_config, project='glue-benchmark', entity="fastai_community")

Create sweep with ID: fmbyeyyy
Sweep URL: https://wandb.ai/fastai_community/glue-benchmark/sweeps/fmbyeyyy


In [None]:
wandb.agent(sweep_id, function=train)

[34m[1mwandb[0m: Agent Starting Run: 91hdhtcn with config:
[34m[1mwandb[0m: 	diff_lr_decay_factor: 0
[34m[1mwandb[0m: 	lr: 1e-05
[34m[1mwandb[0m: 	wd: 0
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.bias', 'lm_head.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.weight', 'classi

Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.627168,0.552308,0.821958,0.705882,00:27
1,0.463866,0.34941,0.897391,0.855392,00:26
2,0.336666,0.313407,0.909091,0.875,00:26
3,0.239899,0.31192,0.914591,0.882353,00:27
4,0.206465,0.319182,0.915194,0.882353,00:26



0,1
epoch,5.0
train_loss,0.20646
raw_loss,0.1526
wd_0,0.0
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.0
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,██████▇▇▇▇▇▇▆▆▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁
raw_loss,▇█▇▇██▅▇▅▇▅▇▅▅▆▃▂▃▄▃▃▄▃▃▄▃▁▂▁▂▁▂▁▃▁▁▂▁▁▂
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: Agent Starting Run: n77e0aax with config:
[34m[1mwandb[0m: 	diff_lr_decay_factor: 0
[34m[1mwandb[0m: 	lr: 1e-05
[34m[1mwandb[0m: 	wd: 0.01
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade



Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.628108,0.574045,0.814599,0.688725,00:27
1,0.504631,0.388959,0.884758,0.848039,00:27
2,0.356017,0.333458,0.910369,0.875,00:27
3,0.259155,0.333648,0.902439,0.862745,00:27
4,0.219232,0.318931,0.909735,0.875,00:27



0,1
epoch,5.0
train_loss,0.21923
raw_loss,0.14824
wd_0,0.01
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.01
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,█████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁
raw_loss,████▇▇▇▆▇▅▃▇▆▅▃▆▇▅▃▄▅▄▃▄▂▃▂▄▁▄▂▁▃▂▂▂▁▂▁▁
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: Agent Starting Run: tkfrckkl with config:
[34m[1mwandb[0m: 	diff_lr_decay_factor: 0
[34m[1mwandb[0m: 	lr: 1e-05
[34m[1mwandb[0m: 	wd: 0.05
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade



Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.623958,0.554765,0.818731,0.705882,00:26
1,0.484187,0.379368,0.880546,0.828431,00:27
2,0.372391,0.386038,0.886633,0.835784,00:27
3,0.265406,0.349766,0.901408,0.862745,00:27
4,0.235515,0.346702,0.900356,0.862745,00:27



0,1
epoch,5.0
train_loss,0.23551
raw_loss,0.20867
wd_0,0.05
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.05
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,████▇█▇▇▇▇▇▆▆▆▅▅▅▅▄▄▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁
raw_loss,█▇▇▇██▇▇▇▇▅▆▅█▄▅▄▅▃▅▅▆▆▃▄▅▂▄▂▂▂▃▁▂▄▁▄▄▄▂
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: Agent Starting Run: yuwcbfvw with config:
[34m[1mwandb[0m: 	diff_lr_decay_factor: 0
[34m[1mwandb[0m: 	lr: 2e-05
[34m[1mwandb[0m: 	wd: 0
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade



Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.592607,0.503279,0.844133,0.781863,00:26
1,0.423569,0.297025,0.921429,0.892157,00:26
2,0.27089,0.275394,0.908088,0.877451,00:26
3,0.150422,0.354044,0.919861,0.887255,00:26
4,0.098984,0.317846,0.919499,0.889706,00:26



0,1
epoch,5.0
train_loss,0.09898
raw_loss,0.02942
wd_0,0.0
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.0
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,████▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁
raw_loss,▇▇▇▆▇▆▇▅▅█▆▅▅▅▅▃▃▂▃▅▃▂▂▃▂▂▂▂▁▁▁▂▂▁▂▁▁▃▂▁
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: Agent Starting Run: ao2itev2 with config:
[34m[1mwandb[0m: 	diff_lr_decay_factor: 0
[34m[1mwandb[0m: 	lr: 2e-05
[34m[1mwandb[0m: 	wd: 0.01
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade



Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.60877,0.552575,0.842832,0.776961,00:27
1,0.460935,0.312879,0.91042,0.879902,00:27
2,0.312943,0.317734,0.889306,0.855392,00:27
3,0.179303,0.32293,0.911071,0.879902,00:27
4,0.120957,0.339431,0.912656,0.879902,00:27



0,1
epoch,5.0
train_loss,0.12096
raw_loss,0.07873
wd_0,0.01
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.01
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,██████▇▇▇▇▆▆▆▆▅▅▅▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁
raw_loss,▆▆▆▅█▆▆▅▄▅▅▄▇▃▄▄▃▃▃▂▃▂▃▂▂▂▃▂▃▄▁▂▃▂▁▁▁▂▁▂
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: Agent Starting Run: b4afgny8 with config:
[34m[1mwandb[0m: 	diff_lr_decay_factor: 0
[34m[1mwandb[0m: 	lr: 2e-05
[34m[1mwandb[0m: 	wd: 0.05
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade



Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.604214,0.54761,0.810976,0.696078,00:26
1,0.456967,0.294019,0.904847,0.870098,00:27
2,0.315663,0.289462,0.91215,0.884804,00:27
3,0.189634,0.284617,0.926573,0.897059,00:27
4,0.120109,0.30023,0.920354,0.889706,00:27



0,1
epoch,5.0
train_loss,0.12011
raw_loss,0.06599
wd_0,0.05
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.05
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,███████▇▇▇▇▆▆▆▆▅▅▅▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁
raw_loss,██▇███▆▆▇▇▆▅▅▄▄▅▅▂▄▅▄▄▃▅▃▁▅▄▁▂▃▄▃▂▁▂▂▂▂▂
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: Agent Starting Run: yeleljls with config:
[34m[1mwandb[0m: 	diff_lr_decay_factor: 0
[34m[1mwandb[0m: 	lr: 3e-05
[34m[1mwandb[0m: 	wd: 0
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.bias', 'lm_head.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.weight', 'classi

Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.620554,0.522946,0.816388,0.703431,00:26
1,0.439716,0.340475,0.905405,0.862745,00:26
2,0.303168,0.260967,0.915129,0.887255,00:26
3,0.154322,0.289875,0.930728,0.904412,00:26
4,0.090138,0.323285,0.927176,0.89951,00:26



0,1
epoch,5.0
train_loss,0.09014
raw_loss,0.03043
wd_0,0.0
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.0
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,██████▇▇▇▇▇▆▆▅▅▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁
raw_loss,█▇███▇▆▇▆▆▄▄▄▅▅▅▃▂▃▄▄▃▃▅▂▂▃▁▄▃▄▂▂▁▁▁▁▁▁▃
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: Agent Starting Run: pyt5qnbz with config:
[34m[1mwandb[0m: 	diff_lr_decay_factor: 0
[34m[1mwandb[0m: 	lr: 3e-05
[34m[1mwandb[0m: 	wd: 0.01
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.bias', 'lm_head.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.weight', 'classi

Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.580868,0.492544,0.813688,0.759804,00:27
1,0.413425,0.337353,0.896435,0.85049,00:27
2,0.278906,0.288347,0.920071,0.889706,00:27
3,0.143798,0.355584,0.925734,0.894608,00:27
4,0.079927,0.360354,0.928447,0.89951,00:27



0,1
epoch,5.0
train_loss,0.07993
raw_loss,0.10692
wd_0,0.01
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.01
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,████▇▇▇▇▇▆▆▆▅▅▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁
raw_loss,██▇▇▇▇▇▆▅▄▆▅▆▅▄▄▄▅▃▄▄▃▆▅▂▂▃▅▃▁▂▂▂▂▁▂▁▂▁▁
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: Agent Starting Run: 6cg2zgj1 with config:
[34m[1mwandb[0m: 	diff_lr_decay_factor: 0
[34m[1mwandb[0m: 	lr: 3e-05
[34m[1mwandb[0m: 	wd: 0.05
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.bias', 'lm_head.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.weight', 'classi

Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.601115,0.488548,0.844037,0.75,00:27
1,0.406561,0.301457,0.906422,0.875,00:27
2,0.266863,0.315323,0.909396,0.867647,00:27
3,0.133943,0.33442,0.919861,0.887255,00:27
4,0.067858,0.348517,0.91958,0.887255,00:27



0,1
epoch,5.0
train_loss,0.06786
raw_loss,0.03836
wd_0,0.05
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.05
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,████▇█▇▇▇▇▆▆▆▅▅▅▅▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁
raw_loss,██▇█▇▇▇▆▇▃▅▅▅▄▆▄▄▃▂▃▃▄▃▃▂▁▂▂▂▄▂▂▁▂▁▃▄▁▁▁
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: Agent Starting Run: nf8rf8nx with config:
[34m[1mwandb[0m: 	diff_lr_decay_factor: 0
[34m[1mwandb[0m: 	lr: 5e-05
[34m[1mwandb[0m: 	wd: 0
[34m[1mwandb[0m: wandb version 0.11.0 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.bias', 'lm_head.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.weight', 'classi

Could not gather input dimensions


epoch,train_loss,valid_loss,f1_score,accuracy,time
0,0.555048,0.461617,0.868243,0.808824,00:26
1,0.437872,0.368028,0.873786,0.840686,00:26
2,0.295596,0.292616,0.91222,0.875,00:26
3,0.141844,0.335113,0.920139,0.887255,00:26
4,0.062227,0.382922,0.929825,0.901961,00:27



0,1
epoch,5.0
train_loss,0.06223
raw_loss,0.02181
wd_0,0.0
sqr_mom_0,0.99
lr_0,0.0
mom_0,0.95
eps_0,1e-05
wd_1,0.0
sqr_mom_1,0.99


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███
train_loss,█████▇▇▇▇▇▆▆▆▆▆▆▅▅▅▄▄▄▄▄▃▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁
raw_loss,███▇▇▇▇▇▇▅▆▇▄▅▄▆▄▄▄▅▄▄▆▄▂▂▁▄▁▂▂▂▁▁▁▂▁▁▁▁
wd_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▁▂▃▄▅▆▇████████▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▂▁▁▁▁▁
mom_0,██▇▆▅▅▃▂▁▁▁▁▁▁▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇▇█████
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wd_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
sqr_mom_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Sweep Agent: Exiting.


In [None]:
wandb.finish()

You can add more parameters to optimize: include different optimizers, try various batch sizes, and test anything else you think might help. Luckily, MRPC runs are quite fast. Once you have found the ranges of hyperparameters that work best, you can run another sweep over those ranges using Bayesian optimization.
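
As a rough illustration of such a follow-up sweep, the sketch below builds a W&B sweep configuration with `method: "bayes"`. The parameter names `lr`, `wd` and `diff_lr_decay_factor` come from the runs logged above; everything else is an assumption made for the example: the metric name `f1_score`, the continuous ranges (taken from the values tried in this sweep), the hypothetical `bs` entry, the project name, and the `launch` helper itself.

```python
# Sketch of a Bayesian follow-up sweep. Ranges, metric name, and the `bs`
# parameter are illustrative assumptions, not values taken from the notebook.
sweep_config = {
    "method": "bayes",  # Bayesian search instead of grid or random search
    # Bayesian search needs a metric to optimize; the name is assumed to match
    # what the training function logs to W&B.
    "metric": {"name": "f1_score", "goal": "maximize"},
    "parameters": {
        # Continuous ranges covering the values tried in the sweep above.
        "lr": {"distribution": "uniform", "min": 2e-5, "max": 5e-5},
        "wd": {"distribution": "uniform", "min": 0.0, "max": 0.05},
        # Kept fixed, as in the runs shown above.
        "diff_lr_decay_factor": {"value": 0},
        # Hypothetical extra parameter: candidate batch sizes.
        "bs": {"values": [16, 32]},
    },
}


def launch(train_fn, project="glue-mrpc-sweeps", count=10):
    """Register the sweep and run `count` trials of `train_fn` (hypothetical helper)."""
    import wandb  # imported lazily so the config can be inspected offline

    sweep_id = wandb.sweep(sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Calling `launch(train)` with the training function used for the sweep above would start the agent; the training function would also have to read `bs` from `wandb.config` for that entry to have any effect.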