<a href="https://colab.research.google.com/github/vivekam101/Bert-Internals/blob/main/Simple_Transformers_Early_Stopping.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simple Transformers - CoLA Sentence Classification

By Chris McCormick

This Notebook is a port of the original "Fine-Tuning BERT for Sentence Classification" tutorial that Nick and I published in 2019 ([blog post](http://mccormickml.com/2019/07/22/BERT-fine-tuning/), [Notebook](https://colab.research.google.com/drive/1pTuQhug6Dhl9XalKB0zUGf4FIdYFlpcX)) to use the `Simple Transformers` library.

This version has fewer comments and less explanation than my typical Notebooks; the goal was just to get a quick sense of the `Simple Transformers` library.



# Contents

See "Table of contents" in the sidebar to the left.

# S1. Setup

## 1.1. Using Colab GPU for Training



Google Colab offers free GPUs and TPUs! Since we'll be training a large neural network, it's best to take advantage of this (in this case we'll attach a GPU); otherwise, training will take a very long time.

A GPU can be added by going to the menu and selecting:

`Edit 🡒 Notebook Settings 🡒 Hardware accelerator 🡒 (GPU)`

Then run the following cell to confirm that the GPU is detected.

In [1]:
import tensorflow as tf

# Get the GPU device name.
device_name = tf.test.gpu_device_name()

# The device name should look like the following:
if device_name == '/device:GPU:0':
    print('Found GPU at: {}'.format(device_name))
else:
    raise SystemError('GPU device not found')

Found GPU at: /device:GPU:0


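In order for torch to use the GPU, we need to identify and specify the GPU as the device. In the original Notebook, our training loop then loaded data onto that device; here Simple Transformers handles this for us, and the `device` variable isn't used again in this Notebook, so the cell below is mostly a sanity check that PyTorch can see the GPU.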

In [2]:
import torch

# If there's a GPU available...
if torch.cuda.is_available():    

    # Tell PyTorch to use the GPU.    
    device = torch.device("cuda")

    print('There are %d GPU(s) available.' % torch.cuda.device_count())

    print('We will use the GPU:', torch.cuda.get_device_name(0))

# If not...
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

There are 1 GPU(s) available.
We will use the GPU: Tesla T4


## 1.2. Installing Simple Transformers



* Install Simple Transformers
* Seeing some disconcerting errors...

In [3]:
!pip install simpletransformers

Collecting simpletransformers
  Downloading https://files.pythonhosted.org/packages/56/35/31022262786f4aa070fe472677cea66fade8d221181a86825096af021e2c/simpletransformers-0.48.14-py3-none-any.whl (214kB)
Collecting seqeval
  Downloading https://files.pythonhosted.org/packages/9d/2d/233c79d5b4e5ab1dbf111242299153f3caddddbb691219f363ad55ce783d/seqeval-1.2.2.tar.gz (43kB)
Collecting transformers>=3.0.2
  Downloading https://files.pythonhosted.org/packages/2c/4e/4f1ede0fd7a36278844a277f8d53c21f88f37f3754abf76a5d6224f76d4a/transformers-3.4.0-py3-none-any.whl (1.3MB)
Collecting tensorboardx
  Downloading https://files.pythonhosted.org/packages/af/0c/4f41bcd45db376e6fe5c619c01100e9b7531c55791b7244815bac6eac32c/tensorboardX-2.1-py2.py3-none-any.whl (308kB)

# S2. Loading CoLA Dataset


We'll use [The Corpus of Linguistic Acceptability (CoLA)](https://nyu-mll.github.io/CoLA/) dataset for single sentence classification. It's a set of sentences labeled as grammatically correct or incorrect. It was first published in May of 2018, and is one of the tests included in the "GLUE Benchmark" on which models like BERT are competing.


## 2.1. Download & Extract

We'll use the `wget` package to download the dataset to the Colab instance's file system. 

In [4]:
!pip install wget

Collecting wget
  Downloading https://files.pythonhosted.org/packages/47/6a/62e288da7bcda82b935ff0c6cfe542970f04e29c756b0e147251b2fb251f/wget-3.2.zip
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-cp36-none-any.whl size=9682 sha256=66e2e140bcce8dcb16d815dc423a5e3ff8970bed7fd9506dd501f12912e9508a
  Stored in directory: /root/.cache/pip/wheels/40/15/30/7d8f7cea2902b4db79e3fea550d7d7b85ecb27ef992b618f3f
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2


The dataset is hosted on GitHub in this repo: https://nyu-mll.github.io/CoLA/

In [5]:
import wget
import os

print('Downloading dataset...')

# The URL for the dataset zip file.
url = 'https://nyu-mll.github.io/CoLA/cola_public_1.1.zip'

# Download the file (if we haven't already)
if not os.path.exists('./cola_public_1.1.zip'):
    wget.download(url, './cola_public_1.1.zip')

Downloading dataset...


Unzip the dataset to the file system. You can browse the file system of the Colab instance in the sidebar on the left.

In [6]:
# Unzip the dataset (if we haven't already)
if not os.path.exists('./cola_public/'):
    !unzip cola_public_1.1.zip

Archive:  cola_public_1.1.zip
   creating: cola_public/
  inflating: cola_public/README      
   creating: cola_public/tokenized/
  inflating: cola_public/tokenized/in_domain_dev.tsv  
  inflating: cola_public/tokenized/in_domain_train.tsv  
  inflating: cola_public/tokenized/out_of_domain_dev.tsv  
   creating: cola_public/raw/
  inflating: cola_public/raw/in_domain_dev.tsv  
  inflating: cola_public/raw/in_domain_train.tsv  
  inflating: cola_public/raw/out_of_domain_dev.tsv  


## 2.2. Parse

We can see from the file names that both `tokenized` and `raw` versions of the data are available. 

We can't use the pre-tokenized version because, in order to apply the pre-trained BERT, we *must* use the tokenizer provided by the model. This is because (1) the model has a specific, fixed vocabulary and (2) the BERT tokenizer has a particular way of handling out-of-vocabulary words.
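
To make point (2) more concrete, here is a minimal sketch of the greedy, longest-match-first subword splitting that WordPiece-style tokenizers such as BERT's are generally described as using. The function `wordpiece_tokenize` and the tiny `toy_vocab` are hypothetical and exist only for illustration; the real tokenizer and its vocabulary ship with the pre-trained model.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into subword pieces using greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry a '##' prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits, so the whole word becomes the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (an assumption, not BERT's real vocabulary).
toy_vocab = {"em", "##bed", "##ding", "##s", "the", "[UNK]"}

print(wordpiece_tokenize("embeddings", toy_vocab))  # ['em', '##bed', '##ding', '##s']
print(wordpiece_tokenize("xyz", toy_vocab))         # ['[UNK]']
```

A dataset tokenized with some other scheme would produce pieces that don't line up with the model's vocabulary, which is why we start from the `raw` files.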

We'll use pandas to parse the "in-domain" training set and look at a few of its properties and data points.

In [7]:
import pandas as pd

# Load the dataset into a pandas dataframe.
df = pd.read_csv("./cola_public/raw/in_domain_train.tsv", delimiter='\t', header=None, names=['sentence_source', 'label', 'label_notes', 'sentence'])

# Report the number of sentences.
print('Number of training sentences: {:,}\n'.format(df.shape[0]))

# Display 10 random rows from the data.
df.sample(10)

Number of training sentences: 8,551



| | sentence_source | label | label_notes | sentence |
|---|---|---|---|---|
| 3056 | l-93 | 1 | | Susan whispered "Shut up". |
| 1256 | r-67 | 1 | | I know a man who hates me. |
| 6938 | m_02 | 0 | \* | What she did was be very cold. |
| 4894 | ks08 | 1 | | Jack is the person with whom Jenny fell in love. |
| 3401 | l-93 | 0 | \* | The thief chased. |
| 2259 | l-93 | 0 | \* | The spaceship revolves the earth. |
| 4276 | ks08 | 1 | | Stephen seemed to be intelligent. |
| 5516 | b_73 | 1 | | Sally will give me more helpful advice than th... |
| 4390 | ks08 | 0 | \* | He has been must being interrogated by the pol... |
| 7355 | sks13 | 1 | | John saw. |


The two properties we actually care about are the `sentence` and its `label`, which is referred to as the "acceptability judgment" (0=unacceptable, 1=acceptable).

## 2.3. Prep Data for SimpleTransformers

Logging setup - Makes the `transformers` logger less verbose.

In [8]:
import logging

logging.basicConfig(level=logging.INFO)

# Get root logger (all other loggers will be derived from this logger's
# properties)
logger = logging.getLogger()
logger.warning("Is this working?") 

# Get the logger for the huggingface/transformers library.
transformers_logger = logging.getLogger("transformers")

# Set the logging level to warning, meaning display warnings and worse, but 
# don't display any `INFO` logs.
transformers_logger.setLevel(logging.WARNING)



Convert the dataset into a simple list of [text, label] pairs.

I'm following this code snippet from the documentation as my reference:
```python
# Train and Evaluation data needs to be in a Pandas Dataframe containing at least two columns. If the Dataframe has a header, it should contain a 'text' and a 'labels' column. If no header is present, the Dataframe should contain at least two columns, with the first column being the text (type str) and the second column being the label (type int).
train_data = [['Example sentence belonging to class 1', 1], ['Example sentence belonging to class 0', 0], ['Example eval sentence belonging to class 2', 2]]
train_df = pd.DataFrame(train_data)

eval_data = [['Example eval sentence belonging to class 1', 1], ['Example eval sentence belonging to class 0', 0], ['Example eval sentence belonging to class 2', 2]]
eval_df = pd.DataFrame(eval_data)

```

In [9]:
# Select just the text and its label.
df = df[['sentence', 'label']]

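# Rename 'sentence' column to 'text' (this is the name SimpleTransformers 
# expects).
# Note: the documentation snippet above asks for a 'labels' column (plural).
# Ours is named 'label', which is likely why the logs later warn that headers
# were not specified and that column 0 is used as text and column 1 as labels.
df.columns = ['text', 'label']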

Split off 10% for validation.

In [10]:
# Use train_test_split to split our data into train and validation sets for
# training
from sklearn.model_selection import train_test_split

# Use 90% for training and 10% for validation.
train_df, validation_df = train_test_split(df, random_state=2018, test_size=0.1)


In [11]:
len(train_df)

7695

In [12]:
len(validation_df)


856
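
The two counts above can be reproduced by hand. The sketch below assumes the split rounds the validation share up and gives the remainder to training, which is consistent with the 7,695 / 856 sizes just printed; the variable names are only illustrative.

```python
import math

n_samples = 8551   # training sentences reported when the file was loaded
test_size = 0.1    # fraction held out for validation

# Assumption: the validation count is rounded up, and training gets the rest.
n_validation = math.ceil(test_size * n_samples)
n_train = n_samples - n_validation

print(n_train, n_validation)  # 7695 856
```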

**Test Set**

**TODO** - The .tsv file doesn't include headers, so I've specified them here. This means that I could name the 'sentence' column 'text' directly, but I think it's more illustrative this way?

In [13]:
# Load the dataset into a pandas dataframe.
df_test = pd.read_csv("./cola_public/raw/out_of_domain_dev.tsv", delimiter='\t', header=None, names=['sentence_source', 'label', 'label_notes', 'sentence'])

# Report the number of sentences.
print('Number of test sentences: {:,}\n'.format(df_test.shape[0]))

# Select just the text and its label.
df_test = df_test[['sentence', 'label']]

# Rename 'sentence' column to 'text' (this is the name SimpleTransformers 
# expects).
df_test.columns = ['text', 'label']


Number of test sentences: 516



# S3. Fine-Tuning

## 3.1. Load Pre-Trained Model

What's this reprocess_input_data argument?

Documentation for arguments is [here](https://github.com/ThilinaRajapakse/simpletransformers#default-settings).

> reprocess_input_data: bool
If True, the input data will be reprocessed even if a cached file of the input data exists in the cache_dir.

In [14]:
from simpletransformers.classification import ClassificationModel

args = {
    'reprocess_input_data': True, 
    'overwrite_output_dir': True
}

# Create a ClassificationModel
model = ClassificationModel('bert', 'bert-base-uncased', num_labels=2, args=args)

# You can set class weights by using the optional weight argument


INFO:filelock:Lock 139883584968968 acquired on /root/.cache/torch/transformers/4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.7156163d5fdc189c3016baca0775ffce230789d7fa2a42ef516483e4ca884517.lock



INFO:filelock:Lock 139883584968968 released on /root/.cache/torch/transformers/4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.7156163d5fdc189c3016baca0775ffce230789d7fa2a42ef516483e4ca884517.lock
INFO:filelock:Lock 139883536752144 acquired on /root/.cache/torch/transformers/f2ee78bdd635b758cc0a12352586868bef80e47401abe4c4fcc3832421e7338b.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157.lock






INFO:filelock:Lock 139883536752144 released on /root/.cache/torch/transformers/f2ee78bdd635b758cc0a12352586868bef80e47401abe4c4fcc3832421e7338b.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157.lock





Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at


INFO:filelock:Lock 139883535508368 released on /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084.lock





## 3.2. Inspect Training Arguments

All of the cool features we're interested in are managed through this `model.args` object. 

In [15]:
type(model.args)

simpletransformers.config.model_args.ClassificationArgs

The following cell retrieves all of the arguments and their values, and prints them in a table.

There is also some brief documentation on them [here](https://simpletransformers.ai/docs/usage/#configuring-a-simple-transformers-model).

In [16]:
arg_values = []

pd.set_option('display.max_rows', 80)

# For all of the arguments...
for arg in dir(model.args):
    
    # Skip over the special attributes and any functions.
    if (not arg[0:2] == '__') and (not callable(getattr(model.args, arg))):
    
        # Store the argument and its value as a tuple.
        arg_values.append((arg, str(getattr(model.args, arg))))

# Store as a dataframe just to get the pretty printout.
df_args = pd.DataFrame(arg_values)        

df_args

| | 0 | 1 |
|---|---|---|
| 0 | adam_epsilon | 1e-08 |
| 1 | best_model_dir | outputs/best_model |
| 2 | cache_dir | cache_dir/ |
| 3 | config | {} |
| 4 | custom_layer_parameters | [] |
| 5 | custom_parameter_groups | [] |
| 6 | dataloader_num_workers | 1 |
| 7 | do_lower_case | False |
| 8 | dynamic_quantize | False |
| 9 | early_stopping_consider_epochs | False |


### Noteworthy Args

Below are what I'd consider the most critical arguments--they'll have the most immediate impact on your accuracy.

What's displayed are their default values (except `model_name`, which we specified earlier).

In [17]:
model.args.train_batch_size

8

In [18]:
model.args.num_train_epochs

1

In [19]:
model.args.max_seq_length

128

In [20]:
model.args.learning_rate

4e-05

In [21]:
model.args.model_name

'bert-base-uncased'

## 3.3. Run Training

#### Choose Hyperparameters

In [22]:
# These are the values we used in our original Notebook:
model.args.max_seq_length = 128
model.args.num_train_epochs = 4
model.args.train_batch_size = 32
model.args.learning_rate = 2e-5

#### Configure Validation

Periodically evaluate on our 10% validation set during training to monitor over-fitting.

Calculate number of steps so we can specify how often to evaluate.

In [23]:
import numpy as np

# Num steps in epoch = num training samples / batch size
steps_per_epoch = int(np.ceil(len(train_df) / float(model.args.train_batch_size)))

print('Each epoch will have {:,} steps.'.format(steps_per_epoch))

Each epoch will have 241 steps.


Turn on validation.

In [24]:
# Run evaluation periodically during training to monitor progress.
model.args.evaluate_during_training = True

# "Print results from evaluation during training."
model.args.evaluate_during_training_verbose = True

# "Perform evaluation at every specified number of steps. A checkpoint model and
#  the evaluation results will be saved."
model.args.evaluate_during_training_steps = 120

# We only need to tokenize our validation set once, then we can read it from the
# cache.
model.args.use_cached_eval_features = True
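
To get a feel for what an evaluation interval of 120 steps means, the sketch below lists the steps at which evaluation would run over the full 4 epochs. The every-120-steps schedule comes from the setting above; the extra evaluation at the end of each epoch is an assumption, though it matches the `checkpoint-241-epoch-1` style folder names that show up in `outputs/` later on.

```python
import math

train_samples = 7695   # size of train_df
batch_size = 32        # model.args.train_batch_size
num_epochs = 4         # model.args.num_train_epochs
eval_every = 120       # model.args.evaluate_during_training_steps

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

# Evaluations triggered by the step interval.
step_evals = list(range(eval_every, total_steps + 1, eval_every))

# Assumed extra evaluations at the end of each epoch.
epoch_end_evals = [steps_per_epoch * epoch for epoch in range(1, num_epochs + 1)]

print(steps_per_epoch, total_steps)  # 241 964
print(step_evals)                    # [120, 240, 360, 480, 600, 720, 840, 960]
print(epoch_end_evals)               # [241, 482, 723, 964]
```

With 241 steps per epoch, an interval of 120 gives two step-based evaluations per epoch, landing just before each epoch boundary.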

#### Configure Early Stopping

There's a nice intro to early stopping in their docs here:
    
* https://simpletransformers.ai/docs/usage/#using-early-stopping



In [25]:
# Turn on early stopping.
model.args.use_early_stopping = True

# "The improvement over best_eval_loss necessary to count as a better checkpoint."
model.args.early_stopping_delta = 0.01

# What metric to use in calculating score for evaluation set (plus whether a low
# vs. high value is better for this metric).

#model.args.early_stopping_metric = "mcc"
#model.args.early_stopping_metric_minimize = False

model.args.early_stopping_metric = "eval_loss"
model.args.early_stopping_metric_minimize = True

# "Terminate training after this many evaluations without an improvement in the
#  evaluation metric greater than early_stopping_delta."
model.args.early_stopping_patience = 2
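
For reference, here are all of the settings from the last few cells gathered in one place. This is only a sketch: the name `training_args` is made up, and passing the dict through the `args` parameter is assumed to work the same way it did for the two options we passed when creating the model in section 3.1.

```python
# Hypothetical consolidated settings; the keys mirror the `model.args`
# attributes assigned in the cells above.
training_args = {
    # Hyperparameters from the original Notebook.
    "max_seq_length": 128,
    "num_train_epochs": 4,
    "train_batch_size": 32,
    "learning_rate": 2e-5,
    # Validation during training.
    "evaluate_during_training": True,
    "evaluate_during_training_verbose": True,
    "evaluate_during_training_steps": 120,
    "use_cached_eval_features": True,
    # Early stopping.
    "use_early_stopping": True,
    "early_stopping_delta": 0.01,
    "early_stopping_metric": "eval_loss",
    "early_stopping_metric_minimize": True,
    "early_stopping_patience": 2,
}

# Assumed usage (not run here):
# model = ClassificationModel('bert', 'bert-base-uncased', num_labels=2, args=training_args)

for name, value in training_args.items():
    print("{:35s} {}".format(name, value))
```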


#### Kick-Off Training

The log statements aren't easy to interpret, but you can infer that it's running validation roughly twice per epoch (every 120 steps), apparently with one more evaluation at the end of each epoch. 

There are also some statements related to early stopping, and if you look at the validation loss at each checkpoint, you can make sense of the early stopping behavior.
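
One way to make sense of it is to replay the logged validation losses through a simple patience counter. The sketch below is my reading of the log, not code taken from Simple Transformers: `replay_early_stopping` is a hypothetical helper, the step numbers are inferred from the evaluation interval and the checkpoint folder names, and I'm assuming that end-of-epoch evaluations can update the best score but don't count toward patience. Under those assumptions it reproduces the log below, stopping at the tenth evaluation.

```python
def replay_early_stopping(evals, delta, patience):
    """Replay (step, eval_loss, counts_toward_patience) records.

    Returns (stop_step, best_step, best_loss). stop_step is None if the
    patience limit is never exceeded.
    """
    best_loss = None
    best_step = None
    strikes = 0
    for step, loss, counts in evals:
        if best_loss is None:
            # The first evaluation only establishes the baseline.
            best_loss, best_step = loss, step
        if best_loss - loss > delta:
            # Improved by more than delta: new best, reset the counter.
            best_loss, best_step = loss, step
            strikes = 0
        elif counts:
            if strikes < patience:
                strikes += 1
            else:
                # Patience already used up, so training stops here.
                return step, best_step, best_loss
    return None, best_step, best_loss


# Validation losses from the training log (rounded). Step numbers and the
# True/False flags (False = end-of-epoch evaluation) are assumptions.
logged_evals = [
    (120, 0.5781, True), (240, 0.4925, True), (241, 0.5036, False),
    (360, 0.4873, True), (480, 0.4855, True), (482, 0.4796, False),
    (600, 0.5315, True), (720, 0.5197, True), (723, 0.5184, False),
    (840, 0.6264, True),
]

stop_step, best_step, best_loss = replay_early_stopping(
    logged_evals, delta=0.01, patience=2
)
print(stop_step, best_step, best_loss)  # 840 482 0.4796
```

Note how the drops at steps 360 and 480 are smaller than the 0.01 delta, so they count as "no improvement" even though the loss went down slightly.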

I'm hoping/assuming that `wandb` integration will make this much easier to look at!

In [26]:
print('Training on {:,} samples...'.format(len(train_df)))

# Train the model, testing against the validation set periodically.
out = model.train_model(train_df, eval_df=validation_df)

Training on 7,695 samples...


  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."
INFO:simpletransformers.classification.classification_model: Converting to features started. Cache is not used.


HBox(children=(FloatProgress(value=0.0, max=7695.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, description='Epoch', max=4.0, style=ProgressStyle(description_width='i…

HBox(children=(FloatProgress(value=0.0, description='Running Epoch 0 of 4', max=241.0, style=ProgressStyle(des…

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."
INFO:simpletransformers.classification.classification_model: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_model:{'mcc': 0.07996193689868349, 'tp': 589, 'tn': 6, 'fp': 258, 'fn': 3, 'eval_loss': 0.5781137241381351}
INFO:simpletransformers.classification.classification_model: No improvement in eval_loss
INFO:simpletransformers.classification.classification_model: Current step: 1
INFO:simpletransformers.classification.classification_model: Early stopping patience: 2
INFO:simpletransformers.classification.classification_model: Features loaded from cache at cache_dir/cached_dev_bert_128_2_856
INFO:simpletransformers.classification.classification_model:{'mcc': 0.4786537258291455, 'tp': 481, 'tn': 179, 'fp': 85, 'fn': 111, 'eval_loss': 0.4925276904462654}





INFO:simpletransformers.classification.classification_model: Features loaded from cache at cache_dir/cached_dev_bert_128_2_856
INFO:simpletransformers.classification.classification_model:{'mcc': 0.46603309045738284, 'tp': 463, 'tn': 186, 'fp': 78, 'fn': 129, 'eval_loss': 0.5036480134335634}


HBox(children=(FloatProgress(value=0.0, description='Running Epoch 1 of 4', max=241.0, style=ProgressStyle(des…

INFO:simpletransformers.classification.classification_model: Features loaded from cache at cache_dir/cached_dev_bert_128_2_856
INFO:simpletransformers.classification.classification_model:{'mcc': 0.4798175046500213, 'tp': 548, 'tn': 130, 'fp': 134, 'fn': 44, 'eval_loss': 0.48732940410481435}
INFO:simpletransformers.classification.classification_model: No improvement in eval_loss
INFO:simpletransformers.classification.classification_model: Current step: 1
INFO:simpletransformers.classification.classification_model: Early stopping patience: 2
INFO:simpletransformers.classification.classification_model: Features loaded from cache at cache_dir/cached_dev_bert_128_2_856
INFO:simpletransformers.classification.classification_model:{'mcc': 0.511485495839106, 'tp': 522, 'tn': 161, 'fp': 103, 'fn': 70, 'eval_loss': 0.48552683932341145}
INFO:simpletransformers.classification.classification_model: No improvement in eval_loss
INFO:simpletransformers.classification.classification_model: Current step:




INFO:simpletransformers.classification.classification_model: Features loaded from cache at cache_dir/cached_dev_bert_128_2_856
INFO:simpletransformers.classification.classification_model:{'mcc': 0.511485495839106, 'tp': 522, 'tn': 161, 'fp': 103, 'fn': 70, 'eval_loss': 0.47959033692273023}


HBox(children=(FloatProgress(value=0.0, description='Running Epoch 2 of 4', max=241.0, style=ProgressStyle(des…

INFO:simpletransformers.classification.classification_model: Features loaded from cache at cache_dir/cached_dev_bert_128_2_856
INFO:simpletransformers.classification.classification_model:{'mcc': 0.5127530290982661, 'tp': 540, 'tn': 147, 'fp': 117, 'fn': 52, 'eval_loss': 0.5314681121957636}
INFO:simpletransformers.classification.classification_model: No improvement in eval_loss
INFO:simpletransformers.classification.classification_model: Current step: 1
INFO:simpletransformers.classification.classification_model: Early stopping patience: 2
INFO:simpletransformers.classification.classification_model: Features loaded from cache at cache_dir/cached_dev_bert_128_2_856
INFO:simpletransformers.classification.classification_model:{'mcc': 0.5389695879472848, 'tp': 524, 'tn': 168, 'fp': 96, 'fn': 68, 'eval_loss': 0.5197180273167998}
INFO:simpletransformers.classification.classification_model: No improvement in eval_loss
INFO:simpletransformers.classification.classification_model: Current step: 2




INFO:simpletransformers.classification.classification_model: Features loaded from cache at cache_dir/cached_dev_bert_128_2_856
INFO:simpletransformers.classification.classification_model:{'mcc': 0.5296436345670774, 'tp': 528, 'tn': 162, 'fp': 102, 'fn': 64, 'eval_loss': 0.518421760194491}


HBox(children=(FloatProgress(value=0.0, description='Running Epoch 3 of 4', max=241.0, style=ProgressStyle(des…

INFO:simpletransformers.classification.classification_model: Features loaded from cache at cache_dir/cached_dev_bert_128_2_856
INFO:simpletransformers.classification.classification_model:{'mcc': 0.5177364251539449, 'tp': 553, 'tn': 137, 'fp': 127, 'fn': 39, 'eval_loss': 0.6263840755569601}
INFO:simpletransformers.classification.classification_model: Patience of 2 steps reached
INFO:simpletransformers.classification.classification_model: Training terminated.





INFO:simpletransformers.classification.classification_model: Training of bert model complete. Saved to outputs/.


## 3.4. Inspect generated files

Helper function to print the contents of a directory, with file sizes in MB.

In [27]:
import os 
import pandas as pd

def list_files_info(data_dir):
    '''
    Prints out the files in a directory along with their sizes in MB.
    '''

    # Check out the sizes on the saved files.
    files = list(os.listdir(data_dir))

    print(data_dir)

    rows = []

    # For each file in the directory...
    for f in files:
        # Get the file size, in MB
        f_size = float(os.stat(data_dir + '/' + f).st_size) / 2**20
        
        # Print the filename and its size.
        print("     {:25s}    {:>8.2f} MB".format(f, f_size))

        rows.append([f, '{:.2f} MB'.format(f_size)])

    print('')

    return pd.DataFrame(rows, columns=['File', 'Size'])


This cache folder stores the tokenized and encoded text data.

In [28]:
list_files_info('./cache_dir')

./cache_dir
     cached_dev_bert_128_2_856        0.68 MB
     cached_train_bert_128_2_7695        6.07 MB



| | File | Size |
|---|---|---|
| 0 | cached_dev_bert_128_2_856 | 0.68 MB |
| 1 | cached_train_bert_128_2_7695 | 6.07 MB |


The `outputs` folder contains the final model, plus all of the checkpoints.

In [29]:
list_files_info('./outputs/')

./outputs/
     vocab.txt                        0.22 MB
     checkpoint-120                   0.00 MB
     checkpoint-482-epoch-2           0.00 MB
     checkpoint-720                   0.00 MB
     checkpoint-241-epoch-1           0.00 MB
     checkpoint-840                   0.00 MB
     special_tokens_map.json          0.00 MB
     tokenizer_config.json            0.00 MB
     checkpoint-360                   0.00 MB
     checkpoint-480                   0.00 MB
     eval_results.txt                 0.00 MB
     model_args.json                  0.00 MB
     checkpoint-723-epoch-3           0.00 MB
     checkpoint-240                   0.00 MB
     best_model                       0.00 MB
     training_args.bin                0.00 MB
     config.json                      0.00 MB
     pytorch_model.bin              417.73 MB
     checkpoint-600                   0.00 MB
     training_progress_scores.csv        0.00 MB



| | File | Size |
|---|---|---|
| 0 | vocab.txt | 0.22 MB |
| 1 | checkpoint-120 | 0.00 MB |
| 2 | checkpoint-482-epoch-2 | 0.00 MB |
| 3 | checkpoint-720 | 0.00 MB |
| 4 | checkpoint-241-epoch-1 | 0.00 MB |
| 5 | checkpoint-840 | 0.00 MB |
| 6 | special_tokens_map.json | 0.00 MB |
| 7 | tokenizer_config.json | 0.00 MB |
| 8 | checkpoint-360 | 0.00 MB |
| 9 | checkpoint-480 | 0.00 MB |
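
The checkpoint entries most likely show 0.00 MB because they are directories, and `os.stat` on a directory doesn't add up the files inside it. A minimal sketch of a recursive alternative is below; `dir_size_mb` is a hypothetical helper, and the demo runs against a throwaway temp directory rather than the real `outputs` folder.

```python
import os
import tempfile


def dir_size_mb(path):
    """Sum the sizes of all regular files under `path`, in MB."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes / 2**20


# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint_dir = os.path.join(tmp, "checkpoint-demo")
    os.makedirs(checkpoint_dir)
    with open(os.path.join(checkpoint_dir, "weights.bin"), "wb") as f:
        f.write(b"\0" * 2**20)  # exactly 1 MB of placeholder bytes
    demo_size = dir_size_mb(checkpoint_dir)

print("{:.2f} MB".format(demo_size))  # 1.00 MB
```

On the Colab instance, something like `dir_size_mb('./outputs/checkpoint-120')` should then report the real footprint of a checkpoint.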


## 3.5. Evaluate on Test Set


Let's load the model from the checkpoint which performed best on the validation set. 

This is how we combat overfitting: the "final" model (at the end of all training epochs) will usually fit the training set best, but may not generalize as well to new data. 

So, instead, we use an earlier checkpoint where the training loss was higher but the validation loss was at its lowest! 
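
As a sanity check on which checkpoint that is, you can look for the evaluation with the lowest validation loss. The sketch below does this with an in-memory stand-in for a scores file such as the `training_progress_scores.csv` listed in `outputs/`; the column names `global_step` and `eval_loss` and the step numbers are assumptions, while the loss values are the (rounded) ones from the training log.

```python
import csv
import io

# Stand-in for a per-evaluation scores file (column names are assumed).
progress_csv = io.StringIO(
    "global_step,eval_loss\n"
    "120,0.5781\n"
    "240,0.4925\n"
    "241,0.5036\n"
    "360,0.4873\n"
    "480,0.4855\n"
    "482,0.4796\n"
    "600,0.5315\n"
    "720,0.5197\n"
    "723,0.5184\n"
    "840,0.6264\n"
)

rows = list(csv.DictReader(progress_csv))

# Pick the evaluation with the lowest validation loss.
best_row = min(rows, key=lambda row: float(row["eval_loss"]))
best_step = int(best_row["global_step"])
best_eval_loss = float(best_row["eval_loss"])

print(best_step, best_eval_loss)  # 482 0.4796
```

That lowest value matches the `eval_loss` of roughly 0.4796 reported when `outputs/best_model` is evaluated on the validation set below.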

In [30]:
model = ClassificationModel(
    "bert", "outputs/best_model"
)

**On Validation Set**

In [31]:
# Evaluate the model
result, model_outputs, wrong_predictions = model.eval_model(validation_df)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."
INFO:simpletransformers.classification.classification_model: Features loaded from cache at cache_dir/cached_dev_bert_128_2_856


HBox(children=(FloatProgress(value=0.0, description='Running Evaluation', max=107.0, style=ProgressStyle(descr…

INFO:simpletransformers.classification.classification_model:{'mcc': 0.511485495839106, 'tp': 522, 'tn': 161, 'fp': 103, 'fn': 70, 'eval_loss': 0.47959033692273023}





In [32]:
print(result)

{'mcc': 0.511485495839106, 'tp': 522, 'tn': 161, 'fp': 103, 'fn': 70, 'eval_loss': 0.47959033692273023}


**On Test Set**

In [33]:
# Evaluate the model
result, model_outputs, wrong_predictions = model.eval_model(df_test)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."
INFO:simpletransformers.classification.classification_model: Converting to features started. Cache is not used.


HBox(children=(FloatProgress(value=0.0, max=516.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, description='Running Evaluation', max=65.0, style=ProgressStyle(descri…

INFO:simpletransformers.classification.classification_model:{'mcc': 0.4370200599272344, 'tp': 317, 'tn': 81, 'fp': 81, 'fn': 37, 'eval_loss': 0.5206365456947913}





For comparison, the original Notebook (without SimpleTransformers) scored: `Total MCC: 0.498`

In [34]:
print('MCC: %.3f' % result['mcc'])

MCC: 0.437
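
The `mcc` value is the Matthews correlation coefficient, and it can be recomputed from the `tp`, `tn`, `fp` and `fn` counts in the same result dictionary. The sketch below uses the standard confusion-matrix formula; the function name `mcc_from_counts` is just for illustration.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventionally reported as 0 when any row or column is empty.
        return 0.0
    return numerator / denominator


# Counts taken from the test-set and validation-set results above.
test_mcc = mcc_from_counts(tp=317, tn=81, fp=81, fn=37)
validation_mcc = mcc_from_counts(tp=522, tn=161, fp=103, fn=70)

print("Test MCC: %.3f" % test_mcc)              # Test MCC: 0.437
print("Validation MCC: %.3f" % validation_mcc)  # Validation MCC: 0.511
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which matters here because the two classes are imbalanced.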


**Run on New Text**

In [35]:
predictions, raw_outputs = model.predict(["Some arbitrary sentence"])

INFO:simpletransformers.classification.classification_model: Converting to features started. Cache is not used.


HBox(children=(FloatProgress(value=0.0, max=1.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=1.0), HTML(value='')))
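
The cell above doesn't print anything, so here is a sketch of how the two return values could be interpreted. It assumes `raw_outputs` holds one row of per-class scores (logits) per input sentence and that `predictions` is the index of the largest score; the numbers in `hypothetical_raw_outputs` are made up, not real model output.

```python
import math


def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up scores for one sentence: [score for label 0, score for label 1].
hypothetical_raw_outputs = [[-1.2, 2.3]]

probabilities = [softmax(row) for row in hypothetical_raw_outputs]
predicted_labels = [row.index(max(row)) for row in probabilities]

label_names = {0: "unacceptable", 1: "acceptable"}
for label, probs in zip(predicted_labels, probabilities):
    print(label_names[label], ["%.3f" % p for p in probs])
```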




# S4. Conclusion

## 4.1. Observations


**Less Code, More Documentation**

* No more:
    * Tokenization code
    * PyTorch training loop code

* Instead --> Careful argument selection!



**Colab Compatibility**

* Alarming install errors, but no problems yet...


**Checkpoints**

* Automatically saves checkpoints!
* Loading back a model from a checkpoint is trivial!



**Early Stopping**

* Harder than it sounds:
    * You must specify:
        * eval frequency
        * minimum "delta"        
        * "patience"

* Didn't immediately improve our CoLA score...
     * But we're not done exploring the technique yet!

