# Fine-Tuning Models
> Fine-tuning using your own data

In this notebook, we'll use https://huggingface.co/transformers/custom_datasets.html as a guide for our work. We'll take the Hugging Face dataset we've already created and use it directly!

### Install required packages
Note that this is mostly required if you're on Google Colab.

In [None]:
#! pip install transformers
#! pip install datasets

### Import packages of interest

In [None]:
import numpy as np
import pandas as pd

from datasets import load_dataset, load_metric, Dataset
from transformers import pipeline
from transformers import DataCollatorWithPadding
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer

# 1. Load data from HuggingFace Hub or from disk

In [None]:
ds_path = 'charreaubell/demo_data'
demo_ds = load_dataset(ds_path, use_auth_token=None)

Using custom data configuration charreaubell___demo_data-cdb143897cf94e86
Reusing dataset parquet (/Users/bellcs1/.cache/huggingface/datasets/parquet/charreaubell___demo_data-cdb143897cf94e86/0.0.0/1638526fd0e8d960534e2155dc54fdff8dce73851f21f031d2fb9c2cf757c121)



# 2. Pre-process inputs
What's a tokenizer and what does it do? Let's learn more using Hugging Face's [course chapter on tokenizers](https://huggingface.co/course/chapter2/4?fw=pt). Then, let's try it on our own!

In [None]:
#instantiate tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
tokenizer.name_or_path

'bert-base-cased'

In [None]:
#define tokenizing function
def tokenize_inputs(example):
    return tokenizer(example['text'], truncation = True)

In [None]:
#do the tokenizing using map function
tokenized_ds = demo_ds.map(tokenize_inputs, batched=True,
                           remove_columns = ['age', 'article_id', 'college major',
                                             'first_name', 'last_name', 'years_of_journalism',
                                             'text'])


Loading cached processed dataset at /Users/bellcs1/.cache/huggingface/datasets/parquet/charreaubell___demo_data-cdb143897cf94e86/0.0.0/1638526fd0e8d960534e2155dc54fdff8dce73851f21f031d2fb9c2cf757c121/cache-e1a35216368d05fc.arrow


In [None]:
tokenized_ds

DatasetDict({
    test: Dataset({
        features: ['attention_mask', 'input_ids', 'label', 'token_type_ids'],
        num_rows: 4
    })
    train: Dataset({
        features: ['attention_mask', 'input_ids', 'label', 'token_type_ids'],
        num_rows: 16
    })
})

What's this `truncation` argument and this `batched` argument? Let's take a look.
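
To make the two arguments concrete without loading a model, here is a minimal sketch with a toy whitespace tokenizer. `toy_tokenize` and `MAX_LEN` are hypothetical stand-ins, not part of the `transformers` or `datasets` APIs. The sketch assumes the two behaviors we rely on: `truncation` cuts any input that is longer than a maximum length, and `batched=True` makes `map` pass many rows at once as a dictionary of lists.

```python
# Toy illustration only: `toy_tokenize` and `MAX_LEN` are hypothetical,
# not the Hugging Face tokenizer.
MAX_LEN = 5

def toy_tokenize(batch, truncation=True, max_length=MAX_LEN):
    """Split each text on whitespace and optionally cut it to max_length tokens."""
    tokenized = []
    for text in batch["text"]:
        tokens = text.split()
        if truncation:
            tokens = tokens[:max_length]
        tokenized.append(tokens)
    return {"tokens": tokenized}

# With batched=True, the mapped function receives a dict of lists (several rows at once)
batch = {"text": ["a short text", "a much longer text that exceeds the limit"]}
result = toy_tokenize(batch)

# The short text is untouched; the long one is cut down to MAX_LEN tokens
lengths = [len(tokens) for tokens in result["tokens"]]
print(lengths)
```

With the real tokenizer, the length limit comes from the model rather than from a constant like `MAX_LEN`, and the output contains token ids instead of word strings.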

## An aside on dynamically padded batch size
HF can pad your batches dynamically, so that each input is padded only to the length of the longest input in its batch rather than to one global maximum length. This helps with memory. You can learn more [here](https://huggingface.co/course/chapter3/2?fw=pt). For now, we'll simply instantiate a data collator and use it during training to demonstrate how we can do this.

In [None]:
#Instantiate data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
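
To see what dynamic padding means in practice, here is a minimal pure-Python sketch. It is not how `DataCollatorWithPadding` is implemented; `pad_batch` is a hypothetical helper and the token ids are made up. It only shows the idea: each batch is padded to its own longest sequence, and the attention mask marks which positions are real tokens.

```python
# Sketch of dynamic padding; `pad_batch` is a hypothetical helper, not a library function.
def pad_batch(batch, pad_id=0):
    """Pad every sequence in the batch to the length of the longest one in that batch."""
    longest = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Two batches with made-up token ids and different maximum lengths
batch_a = [[101, 7, 102], [101, 7, 8, 9, 102]]
batch_b = [[101, 102], [101, 5, 102]]

padded_a = pad_batch(batch_a)
padded_b = pad_batch(batch_b)

# Each batch gets its own width instead of one global width
width_a = len(padded_a["input_ids"][0])
width_b = len(padded_b["input_ids"][0])
print(width_a, width_b)
```

Because the second batch contains only short inputs, it stays narrow, which is where the memory saving comes from.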

# 3. Train model

In [None]:
#get the number of classes
no_classes = len(set(demo_ds['train']['label']))

## Define model and task architecture

In [None]:
# Choose the model type and instantiate it for the task
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=no_classes)
model.name_or_path

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

'bert-base-cased'

## Define settings for basic model training and train

In [None]:
#set training arguments
training_args = TrainingArguments("test_trainer",
                                 logging_strategy='epoch')

#setup training loop with arguments
trainer = Trainer(model=model,
                  args=training_args,
                  tokenizer=tokenizer,
                  data_collator=data_collator,
                  train_dataset=tokenized_ds['train'],
                  eval_dataset=tokenized_ds['test'])

#train
trainer.train()

***** Running training *****
  Num examples = 16
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 6


| Step | Training Loss |
|------|---------------|
| 2 | 1.6229 |
| 4 | 1.4683 |
| 6 | 1.4588 |




Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=6, training_loss=1.516642451286316, metrics={'train_runtime': 32.5202, 'train_samples_per_second': 1.476, 'train_steps_per_second': 0.185, 'total_flos': 2729850728064.0, 'train_loss': 1.516642451286316, 'epoch': 3.0})
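
The numbers in the training log fit together with a little arithmetic. The sketch below is a simplified illustration rather than the `Trainer`'s own code: `total_optimization_steps` is a hypothetical helper, and it assumes one device and a gradient accumulation of 1, as reported in the log above.

```python
import math

def total_optimization_steps(num_examples, batch_size, num_epochs):
    """Number of weight updates, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs

# Values taken from the log: 16 examples, batch size 8, 3 epochs
steps = total_optimization_steps(num_examples=16, batch_size=8, num_epochs=3)
print(steps)
```

Two batches per epoch over three epochs gives the six optimization steps shown in the log, which is also why the loss is reported at steps 2, 4, and 6 when logging once per epoch.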

### Reflect and Discuss
* How many epochs of training did the model undergo? Why do you think it stopped at this number of epochs?
* What if you wanted to train the model more? How do you think you could change the number of epochs?
* Practically speaking, how is the model performing?

## Training with performance metrics

In [None]:
#load a metric
metric = load_metric("accuracy")

#define the metric behavior
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

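
To unpack what `compute_metrics` does with the logits, here is a small pure-Python sketch with made-up numbers. The `argmax` helper and the example values are assumptions for illustration; the real function above uses `np.argmax` and the loaded accuracy metric.

```python
# Illustration with made-up logits; `argmax` is a hypothetical helper.
def argmax(row):
    """Index of the largest value in a list (first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)

# One row of logits per example, one column per class
logits = [
    [0.2, 1.5, -0.3],
    [2.0, 0.1, 0.4],
    [0.0, 0.3, 0.9],
    [1.1, 1.0, 0.2],
]
labels = [1, 0, 1, 0]

# The predicted class is the column with the highest logit
predictions = [argmax(row) for row in logits]

# Accuracy is the fraction of predictions that match the labels
accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
print(predictions, accuracy)
```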

In [None]:
#set new training arguments
training_args = TrainingArguments("test-trainer",
                                  logging_strategy = "epoch",
                                  evaluation_strategy="epoch")

#setup training loop
trainer = Trainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    data_collator=data_collator,
    train_dataset=tokenized_ds['train'],
    eval_dataset=tokenized_ds['test'],
    compute_metrics=compute_metrics
)

#train model
trainer.train()

### Reflect and Discuss
* What new observations are present during model training?
* What comments can you make on the performance of the model now?
* What metrics are appropriate for your application?
* Consider that model training is done in-memory (the model weights are updated in memory, but not returned), and both of our `Trainer`s trained the same `model` object. After the basic training run and the training run with performance metrics, how many epochs has the model been trained?
* What are some advantages and disadvantages of the in-memory training?

## A brief aside on performance metrics
You may want to use performance metrics other than accuracy. Here are some [metrics available through Hugging Face](https://huggingface.co/metrics). If you check out the metrics folder in the [Hugging Face datasets](https://github.com/huggingface/datasets) repository, you'll be able to see what's necessary if you need to define another metric. Let's try a different metric!

In [None]:
from sklearn.metrics import precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    
    #get predictions by using index of max logit
    predictions = np.argmax(logits, axis=-1)
    
    #calculate macro-averaged precision, recall, and F-score
    #(with average='macro', the returned support is None)
    perfs = precision_recall_fscore_support(labels, predictions, average='macro', zero_division=0)
    perf_dict = dict(zip(['precision', 'recall', 'fscore', 'support'], perfs))
    
    #return dictionary
    return perf_dict

#setup training loop
trainer = Trainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    data_collator=data_collator,
    train_dataset=tokenized_ds['train'],
    eval_dataset=tokenized_ds['test'],
    compute_metrics=compute_metrics
)

trainer.train()

***** Running training *****
  Num examples = 16
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 6


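| Epoch | Training Loss | Validation Loss | Precision | Recall | Fscore | Support |
|-------|---------------|-----------------|-----------|--------|--------|---------|
| 1 | 0.1113 | 1.508415 | 0.333333 | 0.666667 | 0.444444 | |
| 2 | 0.0867 | 1.593476 | 0.375 | 0.5 | 0.416667 | |
| 3 | 0.101 | 1.595841 | 0.375 | 0.5 | 0.416667 | |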


***** Running Evaluation *****
  Num examples = 4
  Batch size = 8
***** Running Evaluation *****
  Num examples = 4
  Batch size = 8
***** Running Evaluation *****
  Num examples = 4
  Batch size = 8


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=6, training_loss=0.09966778010129929, metrics={'train_runtime': 36.741, 'train_samples_per_second': 1.306, 'train_steps_per_second': 0.163, 'total_flos': 2729850728064.0, 'train_loss': 0.09966778010129929, 'epoch': 3.0})
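
If you're wondering what `average='macro'` means, here is a minimal pure-Python sketch of the idea. `macro_precision_recall` is a hypothetical helper written for illustration, not the scikit-learn implementation, and the labels are made up. It computes precision and recall per class, counts a class with no predictions as 0 (like `zero_division=0`), and then takes the unweighted mean across classes.

```python
# Sketch of macro averaging; `macro_precision_recall` is a hypothetical helper.
def macro_precision_recall(labels, predictions):
    """Unweighted mean of per-class precision and recall."""
    classes = sorted(set(labels) | set(predictions))
    precisions, recalls = [], []
    for c in classes:
        true_pos = sum(1 for l, p in zip(labels, predictions) if l == c and p == c)
        predicted = sum(1 for p in predictions if p == c)
        actual = sum(1 for l in labels if l == c)
        # A class that was never predicted (or never present) scores 0
        precisions.append(true_pos / predicted if predicted else 0.0)
        recalls.append(true_pos / actual if actual else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)

# Made-up example with three classes; class 1 is never predicted
labels = [0, 0, 1, 2]
predictions = [0, 0, 0, 2]
precision, recall = macro_precision_recall(labels, predictions)
print(round(precision, 4), round(recall, 4))
```

Every class counts equally in a macro average, so a class the model never predicts pulls the score down even if it is rare. With only four evaluation examples, as in our test split, these numbers can swing a lot from run to run.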

# 4. Using trained model with `Trainer`
## Evaluate

In [None]:
trainer.evaluate(tokenized_ds['train'])

{'eval_loss': 0.6205865144729614,
 'eval_accuracy': 0.8,
 'eval_runtime': 0.368,
 'eval_samples_per_second': 40.759,
 'epoch': 3.0}

## Predict

In [None]:
trainer.predict(tokenized_ds['train'])

PredictionOutput(predictions=array([[-0.29494038, -0.5705005 ],
       [-0.59375405, -0.46901992],
       [-0.2581168 , -0.4892554 ],
       [-0.20425233, -0.61265355],
       [-0.5602172 , -0.5579459 ],
       [-0.2200632 , -0.5284079 ],
       [-0.55317354, -0.5999937 ],
       [-0.4260145 , -0.45691708],
       [-0.36270788, -0.3824813 ],
       [-0.42646652, -0.30376187],
       [-0.37042508, -0.28922018],
       [-0.1676437 , -0.5701666 ],
       [-0.6241655 , -0.407779  ],
       [-0.54866135, -0.4822002 ],
       [-0.5887574 , -0.47976074]], dtype=float32), label_ids=array([0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1], dtype=int64), metrics={'test_loss': 0.6205865144729614, 'test_accuracy': 0.8, 'test_runtime': 0.3172, 'test_samples_per_second': 47.291})

# 5. Sharing your model
## `save_model`
This will create a model folder with your model weights and all relevant information locally.

In [None]:
trainer.save_model('bert-magazine-classifier')

Saving model checkpoint to bert-magazine-classifier
Configuration saved in bert-magazine-classifier/config.json
Model weights saved in bert-magazine-classifier/pytorch_model.bin
tokenizer config file saved in bert-magazine-classifier/tokenizer_config.json
Special tokens file saved in bert-magazine-classifier/special_tokens_map.json


## `push_to_hub`
Similarly to datasets, this will push your model to the Huggingface Hub.

In [None]:
#trainer.push_to_hub('charreaubell/bert-magazine-classifier', private=True, commit_message='initial upload of bert magazine classifier')

# 6. Using your fine-tuned model

In [None]:
#create pipeline from your classifier
mag_classifier = pipeline('text-classification', model='bert-magazine-classifier')

#get output
mag_class = mag_classifier('The cat is prettier than any cat I have ever seen.')
mag_class

loading configuration file bert-magazine-classifier/config.json
Model config BertConfig {
  "_name_or_path": "bert-base-cased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.11.3",
  "type_vocab_size": 2,
  

[{'label': 'LABEL_2', 'score': 0.29905053973197937}]
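
How should we read a result like this? For a single-label classifier, the pipeline's `score` is a softmax probability over the classes, and names such as `LABEL_2` come from the default `id2label` mapping in the config shown above. The sketch below reproduces that last step in pure Python. The logits are made up, and `softmax` is a helper written for illustration.

```python
import math

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Default label names for five classes, as in the saved config
id2label = {i: f"LABEL_{i}" for i in range(5)}

# Made-up logits for one input text
logits = [0.1, -0.2, 0.6, 0.0, -0.1]
probs = softmax(logits)

# Report the most probable class and its probability
best = max(range(len(probs)), key=probs.__getitem__)
result = {"label": id2label[best], "score": probs[best]}
print(result)
```

With five classes, a uniform guess would give each label a score of 0.2, so a score near 0.3 means the model is only slightly more confident than chance.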