## Question Answering LLM Fine-tuning

### TODO: ensure these packages are in requirements.txt and check version compatibility

In [2]:
import os
import sys
import pandas
import pickle
import json
import torch
import numpy
import warnings
warnings.filterwarnings('ignore')  # Suppress warnings that some operations emit repeatedly inside a loop

## Listing 14.8

In [3]:
device = torch.device("cpu")
n_gpu = torch.cuda.device_count()  # number of visible GPUs (0 on a CPU-only machine)
print(device)

cpu


### Grab a pre-generated copy of the golden set in case you skipped creating it in Listing 14.7

In [4]:
![ ! -d "question-answering" ] && git clone --depth=1 https://github.com/ai-powered-search/question-answering
![ -d "question-answering" ] && cd question-answering && git pull 
!mkdir -p data

Cloning into 'question-answering'...
remote: Enumerating objects: 16, done.
remote: Counting objects: 100% (16/16), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 16 (delta 2), reused 14 (delta 2), pack-reused 0
Receiving objects: 100% (16/16), 92.27 KiB | 2.05 MiB/s, done.
Resolving deltas: 100% (2/2), done.
Already up to date.


In [5]:
import transformers
tokenizer = transformers.RobertaTokenizerFast.from_pretrained('roberta-base')
assert isinstance(tokenizer, transformers.PreTrainedTokenizerFast)
tokenizer

PreTrainedTokenizerFast(name_or_path='roberta-base', vocab_size=50265, model_max_len=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=False)})

## Listing 14.9

### Hyperparameter alert!

Hyperparameters are serious business. Memory and computation resources are finite, so we limit how much text the model sees at once, both to fit in memory and to keep training fast. We also need to do this because the tensors used during training and evaluation must have a fixed shape, and that shape must be the same for every example we provide to the trainer and evaluator.

We accomplish this with a sliding window and right-padding: a long context is split into overlapping windows, and any window shorter than the maximum length is padded on the right, so every feature has the same shape.
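A minimal, library-free sketch of the windowing idea, assuming a plain list of context tokens. `sliding_windows` is a hypothetical helper, not part of 🤗 Transformers: it steps through the list so that consecutive windows share `overlap` tokens and right-pads the last window. The real tokenizer call in the next cell does more (it repeats the question in every window and reserves room for special tokens), but the overlap-and-pad idea is the same one `stride` and `padding="max_length"` implement.

```python
def sliding_windows(tokens, window_size, overlap, pad_token="<pad>"):
    """Split `tokens` into windows of `window_size` that overlap by `overlap` tokens.

    The final window is right-padded with `pad_token` so every window has the same length.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in [0, window_size)")
    step = window_size - overlap
    windows = []
    start = 0
    while True:
        chunk = tokens[start:start + window_size]
        # Right-pad short chunks so all windows share one fixed shape
        chunk = chunk + [pad_token] * (window_size - len(chunk))
        windows.append(chunk)
        if start + window_size >= len(tokens):
            break
        start += step
    return windows


tokens = [f"t{i}" for i in range(9)]
for window in sliding_windows(tokens, window_size=4, overlap=2):
    print(window)
```

This prints four windows; each starts two tokens after the previous one, and the last one is padded with a single `<pad>`.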

In [6]:
# This method is adapted from the following example notebook:
#https://github.com/huggingface/notebooks/blob/master/examples/question_answering.ipynb
#Copyright 2021, Huggingface.  Apache 2.0 license.
import datasets, transformers

datadict = datasets.load_from_disk('question-answering/question-answering-training-set')

def tokenize_dataset(examples):

    maximum_tokens = 384  # Maximum number of tokens for the question and context combined
    document_overlap = 128  # Tokens shared between consecutive chunks when a long context is split
    pad_on_right = tokenizer.padding_side == "right"
    
    # Tokenize our examples with truncation and padding, but keep the overflows using a stride. This results
    # in one example possibly giving several features when a context is long, each of those features having a
    # context that overlaps a bit with the context of the previous feature.
    tokenized_examples = tokenizer(
        examples["question" if pad_on_right else "context"],
        examples["context" if pad_on_right else "question"],
        truncation="only_second" if pad_on_right else "only_first",
        max_length=maximum_tokens,
        stride=document_overlap,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length"
    )
    
    print(tokenized_examples[0])

    # Since one example might give us several features if it has a long context, we need a map from a feature to
    # its corresponding example. This key gives us just that.
    sample_mapping = tokenized_examples.pop("overflow_to_sample_mapping")
    # The offset mappings will give us a map from token to character position in the original context. This will
    # help us compute the start_positions and end_positions.
    offset_mapping = tokenized_examples.pop("offset_mapping")

    # Let's label those examples!
    tokenized_examples["start_positions"] = []
    tokenized_examples["end_positions"] = []

    for i, offsets in enumerate(offset_mapping):
        # We will label impossible answers with the index of the CLS token.
        input_ids = tokenized_examples["input_ids"][i]
        cls_index = input_ids.index(tokenizer.cls_token_id)

        # Grab the sequence corresponding to that example (to know what is the context and what is the question).
        sequence_ids = tokenized_examples.sequence_ids(i)

        # One example can give several spans; this is the index of the example containing this span of text.
        sample_index = sample_mapping[i]
        answers = examples["answers"][sample_index]
        # If no answers are given, set the cls_index as answer.
        if len(answers["answer_start"]) == 0:
            tokenized_examples["start_positions"].append(cls_index)
            tokenized_examples["end_positions"].append(cls_index)
        else:
            # Start/end character index of the answer in the text.
            start_char = answers["answer_start"][0]
            end_char = start_char + len(answers["text"][0])

            # Start token index of the current span in the text.
            token_start_index = 0
            while sequence_ids[token_start_index] != (1 if pad_on_right else 0):
                token_start_index += 1

            # End token index of the current span in the text.
            token_end_index = len(input_ids) - 1
            while sequence_ids[token_end_index] != (1 if pad_on_right else 0):
                token_end_index -= 1

            # Detect if the answer is out of the span (in which case this feature is labeled with the CLS index).
            if not (offsets[token_start_index][0] <= start_char and offsets[token_end_index][1] >= end_char):
                tokenized_examples["start_positions"].append(cls_index)
                tokenized_examples["end_positions"].append(cls_index)
            else:
                # Otherwise move the token_start_index and token_end_index to the two ends of the answer.
                # Note: we could go after the last offset if the answer is the last word (edge case).
                while token_start_index < len(offsets) and offsets[token_start_index][0] <= start_char:
                    token_start_index += 1
                tokenized_examples["start_positions"].append(token_start_index - 1)
                while offsets[token_end_index][1] >= end_char:
                    token_end_index -= 1
                tokenized_examples["end_positions"].append(token_end_index + 1)

    return tokenized_examples
"""
To apply this function on all the sentences (or pairs of sentences) in our dataset, we just use the map method of our dataset object we created earlier. 
This will apply the function on all the elements of all the splits in dataset, so our training, validation and testing data will be preprocessed in one single command. 
Since our preprocessing changes the number of samples, we need to remove the old columns when applying it.
 --Huggingface
"""
tokenized_datasets = datadict.map(tokenize_dataset, batched=True, remove_columns=datadict["train"].column_names)




Encoding(num_tokens=384, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])



Encoding(num_tokens=384, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])



Encoding(num_tokens=384, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])
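The labeling loop in `tokenize_dataset` converts each answer's character offsets into token positions, falling back to the CLS index when the answer is not inside the current window. The sketch below restates that idea in plain Python, assuming a simplified offsets list in which non-context tokens are marked `None`. The real code uses `sequence_ids` instead, because question tokens and special tokens cannot be told apart from context tokens by their offset values alone. `char_span_to_token_span` is a hypothetical helper and not a line-by-line port of the Hugging Face loop.

```python
def char_span_to_token_span(offsets, start_char, end_char, cls_index=0):
    """Map an answer's character span to (start_token, end_token) indices.

    `offsets[i]` is the (char_start, char_end) of token i in the context, or None
    for tokens that are not part of the context (special tokens, question tokens).
    Returns (cls_index, cls_index) when the answer is not fully inside this window.
    """
    context = [i for i, span in enumerate(offsets) if span is not None]
    if not context:
        return cls_index, cls_index
    first, last = context[0], context[-1]
    # The answer must lie entirely within the context portion of this window
    if not (offsets[first][0] <= start_char and offsets[last][1] >= end_char):
        return cls_index, cls_index
    # First context token that ends after the answer starts
    start_token = next(i for i in context if offsets[i][1] > start_char)
    # Last context token that starts before the answer ends
    end_token = next(i for i in reversed(context) if offsets[i][0] < end_char)
    return start_token, end_token


context = "the pine sap sticks"
# Hypothetical offsets: two non-context tokens, four context tokens, one trailing special token
offsets = [None, None, (0, 3), (4, 8), (9, 12), (13, 19), None]
start_char = context.index("pine sap")
end_char = start_char + len("pine sap")
print(char_span_to_token_span(offsets, start_char, end_char))  # → (3, 4)
```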


In [7]:
tokenized_datasets.save_to_disk('data/question-answering-training-set-tokenized')

## Listing 14.10

In [8]:
from transformers import RobertaForQuestionAnswering, TrainingArguments, Trainer, default_data_collator
import torch

model = RobertaForQuestionAnswering.from_pretrained('deepset/roberta-base-squad2')

training_args = TrainingArguments(
    output_dir='data/questionanswering/results',     # output directory
    evaluation_strategy = "epoch",                        # evaluate loss per epoch
    num_train_epochs=3,                                   # total # of training epochs
    per_device_train_batch_size=16,                       # batch size per device during training
    per_device_eval_batch_size=64,                        # batch size for evaluation
    warmup_steps=500,                                     # number of warmup steps for learning rate scheduler
    weight_decay=0.01,                                    # strength of weight decay
    logging_dir='data/questionanswering/logs'        # directory for storing logs
)

trainer = Trainer(
    model=model,                                          # the instantiated 🤗 Transformers model to be trained
    args=training_args,                                   # training arguments, defined above
    data_collator=default_data_collator,                  
    tokenizer=tokenizer,                                  
    train_dataset=tokenized_datasets['train'],            # training dataset
    eval_dataset=tokenized_datasets['test']               # evaluation dataset
)
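A quick calculation helps put `warmup_steps=500` in context. Assuming the `TrainingArguments` defaults of `learning_rate=5e-5` and a linear schedule (`lr_scheduler_type="linear"`), the learning rate ramps up linearly for `warmup_steps` and then decays linearly to zero. The training log below reports only 30 optimization steps, so with 500 warmup steps the run ends while still warming up. `linear_warmup_lr` is a hypothetical restatement of that schedule, not a Transformers API.

```python
def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 toward peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from peak_lr to 0 at total_steps
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


peak_lr = 5e-5      # assumed TrainingArguments default
total_steps = 30    # from the training log
for warmup in (50, 250, 500):
    final_lr = linear_warmup_lr(total_steps - 1, peak_lr, warmup, total_steps)
    print(f"warmup={warmup}: lr at last step = {final_lr:.2e} ({final_lr / peak_lr:.1%} of peak)")
```

None of these warmup values is reached within 30 steps, so in this regime the warmup setting effectively scales down the learning rate for the whole run.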

## Listing 14.11

In [9]:
trainer.train()

***** Running training *****
  Num examples = 156
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 30
  Number of trainable parameters = 124056578


Epoch,Training Loss,Validation Loss
1,No log,2.175029
2,No log,2.019419
3,No log,1.952406


***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=30, training_loss=2.5411778767903646, metrics={'train_runtime': 320.9675, 'train_samples_per_second': 1.458, 'train_steps_per_second': 0.093, 'total_flos': 91715161614336.0, 'train_loss': 2.5411778767903646, 'epoch': 3.0})
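The `Total optimization steps = 30` line follows from the number of training features and the batch size. A sketch of the arithmetic, assuming a single device, no gradient accumulation, and a data loader that keeps the final partial batch (which matches the logs in this notebook): each epoch takes `ceil(156 / 16) = 10` steps, times 3 epochs. `optimization_steps` is a hypothetical helper.

```python
import math


def optimization_steps(num_examples, batch_size, epochs, grad_accum=1):
    """Optimizer steps for a run, assuming the last partial batch is kept."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * epochs


print(optimization_steps(156, 16, 3))  # → 30 (this run)
print(optimization_steps(156, 16, 4))  # → 40 (grid search, batch size 16)
print(optimization_steps(156, 18, 4))  # → 36 (grid search, batch size 18)
```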

In [10]:
trainer.save_model("data/roberta-base-squad2-outdoors")

Saving model checkpoint to data/roberta-base-squad2-outdoors
Configuration saved in data/roberta-base-squad2-outdoors/config.json
Model weights saved in data/roberta-base-squad2-outdoors/pytorch_model.bin
tokenizer config file saved in data/roberta-base-squad2-outdoors/tokenizer_config.json
Special tokens file saved in data/roberta-base-squad2-outdoors/special_tokens_map.json


## Listing 14.12

In [11]:
trainer.evaluate(eval_dataset=tokenized_datasets["validation"])

***** Running Evaluation *****
  Num examples = 15
  Batch size = 64


{'eval_loss': 1.7852569818496704,
 'eval_runtime': 3.0791,
 'eval_samples_per_second': 4.872,
 'eval_steps_per_second': 0.325,
 'epoch': 3.0}
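For `RobertaForQuestionAnswering`, the reported loss is the average of two cross-entropy terms: one for the predicted start position and one for the end position. The sketch below reproduces that computation in plain Python with made-up logits; `cross_entropy` and `qa_span_loss` are hypothetical helpers, not Transformers functions. Because the loss is an average negative log-probability, `exp(-loss)` can be read as the geometric mean of the probabilities assigned to the correct start and end tokens (roughly 0.17 for the validation loss above).

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of `target`, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def qa_span_loss(start_logits, end_logits, start_position, end_position):
    """Average of start- and end-position cross-entropy, as used for extractive QA."""
    start_loss = cross_entropy(start_logits, start_position)
    end_loss = cross_entropy(end_logits, end_position)
    return (start_loss + end_loss) / 2


# Made-up logits over a 5-token window; the gold answer spans tokens 2..3
start_logits = [0.1, 0.2, 2.5, 0.3, -1.0]
end_logits = [0.0, -0.5, 0.4, 2.0, 0.1]
loss = qa_span_loss(start_logits, end_logits, start_position=2, end_position=3)
print(f"loss={loss:.4f}, geometric-mean probability={math.exp(-loss):.4f}")
```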

## Listing 14.13

In [12]:
import tqdm
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
outdoors_model = "data/roberta-base-squad2-outdoors"
nlp2 = pipeline("question-answering", model=outdoors_model, tokenizer=outdoors_model)

loading configuration file data/roberta-base-squad2-outdoors/config.json
Model config RobertaConfig {
  "_name_or_path": "data/roberta-base-squad2-outdoors",
  "architectures": [
    "RobertaForQuestionAnswering"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "language": "english",
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "name": "Roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.25.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}


## Listing 14.14

In [13]:
def answer_questions(examples):
    answers = []
    success = 0
    for e in examples:
        question = {"question": e["question"][0], "context": e["context"][0]}
        answer = nlp2(question)
        label = e["answers"][0]["text"][0]
        result = answer["answer"]
        print(question["question"])
        print("Label:", label)
        print("Result:", result)
        print("----------")
        success += (1 if (label == result) else 0)
        answers.append(answer)
    print(f"{success}/{len(examples)} correct")
    return answers

In [14]:
datadict["validation"].set_format(type="pandas",output_all_columns=True)
validation_examples = [example for example in datadict["validation"]]
validation_results = answer_questions(validation_examples)

How to get pine sap off my teeth
Label: Take a small amount of margarine and rub on the sap
Result: Take a small amount of margarine and rub on the sap
----------
Why are backpack waist straps so long?
Label: The most backpacks have only one size for everyone
Result: The most backpacks have only one size for everyone
----------
What can I do to prevent altitude sickness?
Label: acclimate
Result: acclimate
----------
What group of people call themselves "Outdoor Influencers", and what do they do regarding natural areas of land?
Label: raise awareness for important causes to protect these lands
Result: raise awareness for important causes to protect these lands
----------
When to sharpen crampons?
Label: when I am expecting icy conditions
Result: when I am expecting icy conditions
----------
What is the benefit to telemark skiing?
Label: allow skiers to skin up back-country slopes with a more natural and efficient stride
Result: more natural and efficient stride
----------
What do you do
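`answer_questions` counts a prediction as correct only on an exact string match, so the telemark answer above, which captures the key phrase but not the full label, scores zero. Extractive QA is commonly also scored with normalized exact match and token-level F1. The sketch below is modeled on the normalization used by the SQuAD evaluation script (lowercase, strip punctuation and the articles a/an/the, collapse whitespace); the function names are hypothetical, and it is not a drop-in replacement for the official script.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, label):
    return normalize_answer(prediction) == normalize_answer(label)


def token_f1(prediction, label):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    label_tokens = normalize_answer(label).split()
    if not pred_tokens or not label_tokens:
        # Both empty counts as a match; only one empty counts as a miss
        return float(pred_tokens == label_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(label_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(label_tokens)
    return 2 * precision * recall / (precision + recall)


label = "allow skiers to skin up back-country slopes with a more natural and efficient stride"
prediction = "more natural and efficient stride"
print(exact_match(prediction, label), round(token_f1(prediction, label), 3))
```

For the telemark example this gives a partial-credit F1 of about 0.556 instead of a flat zero.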

In [15]:
# This is an illustration of grid search. For the Transformers built-in, see https://huggingface.co/transformers/main_classes/trainer.html#transformers.Trainer.hyperparameter_search

from transformers import RobertaForQuestionAnswering, TrainingArguments, Trainer, default_data_collator
import torch

def grid_search_finetuning(tokenized_datasets):
    epochs = [4]
    batches = [16, 18]
    warmups = [50, 250, 500]
  
    for epoch in epochs:
        for batch in batches:
            for warmup in warmups:
                model = RobertaForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")
                name = "_".join(["epochs", str(epoch), "batchsize", str(batch), "warmup", str(warmup)])

                print("-----------------------------------------------\n")
                print(name)
                training_args = TrainingArguments(
                    evaluation_strategy = "epoch",                         # evaluate loss per epoch
                    num_train_epochs=epoch,                                # total # of training epochs
                    per_device_train_batch_size=batch,                     # batch size per device during training
                    per_device_eval_batch_size=64,                         # batch size for evaluation
                    warmup_steps=warmup,                                   # number of warmup steps for learning rate scheduler
                    weight_decay=0.01,                                     # strength of weight decay
                    logging_dir="data/questionanswering/logs_" + name,  # directory for storing logs
                    output_dir="data/questionanswering/results_" + name # output directory
                )

                trainer = Trainer(
                    model=model,                                          # the instantiated 🤗 Transformers model to be trained
                    args=training_args,                                   # training arguments, defined above
                    data_collator=default_data_collator,                  
                    tokenizer=tokenizer,                                  
                    train_dataset=tokenized_datasets["train"],            # training dataset
                    eval_dataset=tokenized_datasets["test"]               # evaluation dataset
                )

                training_outputs = trainer.train()
                print("\nTraining Loss:", training_outputs.training_loss)
                evaluation_outputs = trainer.evaluate(eval_dataset=tokenized_datasets["validation"])
                print("Evaluation Loss:", evaluation_outputs["eval_loss"])
                print(training_outputs)
                print(evaluation_outputs)

                # Free the trainer and model before the next configuration to release memory
                del trainer
                del model

grid_search_finetuning(tokenized_datasets)

loading configuration file config.json from cache at /home/jovyan/.cache/huggingface/hub/models--deepset--roberta-base-squad2/snapshots/e84d19c1ab20d7a5c15407f6954cef5c25d7a261/config.json
Model config RobertaConfig {
  "architectures": [
    "RobertaForQuestionAnswering"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "language": "english",
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "name": "Roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.25.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

loading weights file pytorch_model.bin from cache at /home/jovyan/.cache/huggingface

-----------------------------------------------

epochs_4_batchsize_16_warmup_50



Epoch,Training Loss,Validation Loss
1,No log,2.106181
2,No log,1.543811
3,No log,1.632715
4,No log,1.642155


***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64


Training completed. Do not forget to share your model on huggingface.co/models =)


***** Running Evaluation *****
  Num examples = 15
  Batch size = 64


Training Loss: 1.6041778564453124


Evaluation Loss: 2.112708568572998
TrainOutput(global_step=40, training_loss=1.6041778564453124, metrics={'train_runtime': 374.9067, 'train_samples_per_second': 1.664, 'train_steps_per_second': 0.107, 'total_flos': 122286882152448.0, 'train_loss': 1.6041778564453124, 'epoch': 4.0})
{'eval_loss': 2.112708568572998, 'eval_runtime': 2.8941, 'eval_samples_per_second': 5.183, 'eval_steps_per_second': 0.346, 'epoch': 4.0}


-----------------------------------------------

epochs_4_batchsize_16_warmup_250



Epoch,Training Loss,Validation Loss
1,No log,2.093874
2,No log,1.996796
3,No log,1.720135
4,No log,1.54303


***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64


Training completed. Do not forget to share your model on huggingface.co/models =)


***** Running Evaluation *****
  Num examples = 15
  Batch size = 64


Training Loss: 2.186773681640625


Evaluation Loss: 1.4993952512741089
TrainOutput(global_step=40, training_loss=2.186773681640625, metrics={'train_runtime': 350.6086, 'train_samples_per_second': 1.78, 'train_steps_per_second': 0.114, 'total_flos': 122286882152448.0, 'train_loss': 2.186773681640625, 'epoch': 4.0})
{'eval_loss': 1.4993952512741089, 'eval_runtime': 2.2778, 'eval_samples_per_second': 6.585, 'eval_steps_per_second': 0.439, 'epoch': 4.0}


-----------------------------------------------

epochs_4_batchsize_16_warmup_500



Epoch,Training Loss,Validation Loss
1,No log,2.175029
2,No log,2.01942
3,No log,1.952482
4,No log,1.766379


***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64


Training completed. Do not forget to share your model on huggingface.co/models =)


***** Running Evaluation *****
  Num examples = 15
  Batch size = 64


Training Loss: 2.395569610595703


Evaluation Loss: 1.670580267906189
TrainOutput(global_step=40, training_loss=2.395569610595703, metrics={'train_runtime': 346.9635, 'train_samples_per_second': 1.798, 'train_steps_per_second': 0.115, 'total_flos': 122286882152448.0, 'train_loss': 2.395569610595703, 'epoch': 4.0})
{'eval_loss': 1.670580267906189, 'eval_runtime': 2.2766, 'eval_samples_per_second': 6.589, 'eval_steps_per_second': 0.439, 'epoch': 4.0}


All model checkpoint weights were used when initializing RobertaForQuestionAnswering.

All the weights of RobertaForQuestionAnswering were initialized from the model checkpoint at deepset/roberta-base-squad2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use RobertaForQuestionAnswering for predictions without further training.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
***** Running training *****
  Num examples = 156
  Num Epochs = 4
  Instantaneous batch size per device = 18
  Total train batch size (w. parallel, distributed & accumulation) = 18
  Gradient Accumulation steps = 1
  Total optimization steps = 36
  Number of trainable parameters = 124056578


-----------------------------------------------

epochs_4_batchsize_18_warmup_50



Epoch,Training Loss,Validation Loss
1,No log,2.015615
2,No log,1.557581
3,No log,1.534187
4,No log,1.69414


***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64


Training completed. Do not forget to share your model on huggingface.co/models =)


***** Running Evaluation *****
  Num examples = 15
  Batch size = 64


Training Loss: 1.6595454745822482


Evaluation Loss: 2.032419443130493
TrainOutput(global_step=36, training_loss=1.6595454745822482, metrics={'train_runtime': 350.2475, 'train_samples_per_second': 1.782, 'train_steps_per_second': 0.103, 'total_flos': 122286882152448.0, 'train_loss': 1.6595454745822482, 'epoch': 4.0})
{'eval_loss': 2.032419443130493, 'eval_runtime': 2.2788, 'eval_samples_per_second': 6.582, 'eval_steps_per_second': 0.439, 'epoch': 4.0}


-----------------------------------------------

epochs_4_batchsize_18_warmup_250



Epoch,Training Loss,Validation Loss
1,No log,2.117052
2,No log,2.001819
3,No log,1.831152
4,No log,1.577636


***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64


Training completed. Do not forget to share your model on huggingface.co/models =)


***** Running Evaluation *****
  Num examples = 15
  Batch size = 64


Training Loss: 2.2402000427246094


Evaluation Loss: 1.494271993637085
TrainOutput(global_step=36, training_loss=2.2402000427246094, metrics={'train_runtime': 381.7736, 'train_samples_per_second': 1.634, 'train_steps_per_second': 0.094, 'total_flos': 122286882152448.0, 'train_loss': 2.2402000427246094, 'epoch': 4.0})
{'eval_loss': 1.494271993637085, 'eval_runtime': 2.2788, 'eval_samples_per_second': 6.582, 'eval_steps_per_second': 0.439, 'epoch': 4.0}


-----------------------------------------------

epochs_4_batchsize_18_warmup_500



Epoch,Training Loss,Validation Loss
1,No log,2.194778
2,No log,2.029455
3,No log,1.975486
4,No log,1.831529


***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64
***** Running Evaluation *****
  Num examples = 44
  Batch size = 64


Training completed. Do not forget to share your model on huggingface.co/models =)


***** Running Evaluation *****
  Num examples = 15
  Batch size = 64


Training Loss: 2.431789610120985


Evaluation Loss: 1.7117657661437988
TrainOutput(global_step=36, training_loss=2.431789610120985, metrics={'train_runtime': 340.9407, 'train_samples_per_second': 1.83, 'train_steps_per_second': 0.106, 'total_flos': 122286882152448.0, 'train_loss': 2.431789610120985, 'epoch': 4.0})
{'eval_loss': 1.7117657661437988, 'eval_runtime': 2.3197, 'eval_samples_per_second': 6.466, 'eval_steps_per_second': 0.431, 'epoch': 4.0}
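`grid_search_finetuning` prints each configuration's results but does not compare them. Collecting the final validation `eval_loss` values (hard-coded below, copied from the output above) and taking the minimum makes the comparison explicit. With only 15 validation examples, small differences between configurations may be noise.

```python
from itertools import product

epochs = [4]
batches = [16, 18]
warmups = [50, 250, 500]

# Final validation eval_loss per (epochs, batch size, warmup), copied from the logs above
eval_loss = {
    (4, 16, 50): 2.112708568572998,
    (4, 16, 250): 1.4993952512741089,
    (4, 16, 500): 1.670580267906189,
    (4, 18, 50): 2.032419443130493,
    (4, 18, 250): 1.494271993637085,
    (4, 18, 500): 1.7117657661437988,
}

# Enumerate the same grid as the nested loops in grid_search_finetuning
grid = list(product(epochs, batches, warmups))
for config in sorted(grid, key=eval_loss.get):
    print(config, round(eval_loss[config], 4))

best = min(grid, key=eval_loss.get)
print("lowest validation loss:", best)
```

Both runs with `warmup=250` have the lowest validation loss in these logs, with batch size 18 narrowly ahead.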


Up next: [Question Answering Demo Application](4.question-answering-CPU-demo-application.ipynb)