# BERT as Benchmark

- Fine-tune a BERT model with Hugging Face AutoTrain: https://huggingface.co/autotrain
- Use train.csv and val.csv as the data for the AutoTrain job
- Use the default configuration
- Fine-tuned model: https://huggingface.co/vincentclaes/autotrain-0br8k-gdjpm
- Accuracy reported by AutoTrain: 0.9994418604651163

## Run against the test dataset and calculate the accuracy

In [1]:
!pip install transformers scikit-learn --quiet

In [8]:
from transformers import pipeline

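# Baseline: plain bert-base-uncased in a zero-shot pipeline. This checkpoint has no NLI head,
# so its classification layer is newly initialized (see the warnings in the output below).
classifier_zero_shot = pipeline('zero-shot-classification', model='google-bert/bert-base-uncased')
# Fine-tuned model, loaded from the Hugging Face Hub
classifier_fine_tuned = pipeline('text-classification', model='vincentclaes/autotrain-0br8k-gdjpm')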

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Failed to determine 'entailment' label id from the label2id mapping in the model config. Setting to -1. Define a descriptive label2id mapping in the model config to ensure correct outputs.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
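
The 'entailment' warning above matters for the baseline: a zero-shot pipeline scores each candidate label with an NLI-style model and needs to know which output index means "entailment". The sketch below is a simplified illustration of such a lookup, not the library's actual code; the function name `find_entailment_id` and both example mappings are assumptions made for illustration.

```python
def find_entailment_id(label2id):
    """Return the index of the label whose name starts with 'entail', or -1 if none exists."""
    for label, idx in label2id.items():
        if label.lower().startswith("entail"):
            return idx
    return -1

# Hypothetical mapping for a checkpoint without an NLI head (generic label names)
generic_labels = {"LABEL_0": 0, "LABEL_1": 1}
# Hypothetical mapping for a checkpoint trained on an NLI task
nli_labels = {"contradiction": 0, "neutral": 1, "entailment": 2}

print(find_entailment_id(generic_labels))  # -1
print(find_entailment_id(nli_labels))      # 2
```

With no entailment label to read, and with a classification head that was never trained, the zero-shot predictions of the plain BERT checkpoint should be treated as an untrained baseline rather than a meaningful zero-shot result.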


In [19]:
# Read test.csv, predict a label for each entry in the 'question' column,
# and store the result in a new 'predicted_class_name' column
import pandas as pd

df = pd.read_csv('test.csv')

LABELS = ["ACCOUNT", "CANCEL", "CONTACT", "DELIVERY", "FEEDBACK", "INVOICE", "ORDER", "PAYMENT", "REFUND", "SHIPPING", "SUBSCRIPTION"]
df_zero_shot = df.copy()
df_zero_shot['predicted_class_name'] = df_zero_shot['question'].apply(lambda x: classifier_zero_shot(x, candidate_labels=LABELS)["labels"][0])

df_fine_tuned = df.copy()
df_fine_tuned['predicted_class_name'] = df_fine_tuned['question'].apply(lambda x: classifier_fine_tuned(x)[0]['label'])
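
Calling the classifier once per row through `apply` works but can be slow on larger test sets. A possible alternative is to pass the questions in chunks. The sketch below assumes the classifier accepts a list of strings and returns one `{'label': ..., 'score': ...}` dict per input; the helper `predict_labels` and the stand-in `fake_classifier` are hypothetical names used only for illustration.

```python
def predict_labels(classifier, texts, batch_size=32):
    """Classify texts in chunks and return the top label for each one."""
    texts = list(texts)
    labels = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        results = classifier(chunk)  # assumed: one result dict per input text
        labels.extend(result["label"] for result in results)
    return labels

# Stand-in classifier so the sketch runs without downloading a model
def fake_classifier(batch):
    return [
        {"label": "REFUND" if "refund" in text.lower() else "ORDER", "score": 1.0}
        for text in batch
    ]

questions = ["Where is my refund?", "I want to place an order", "Refund status please"]
print(predict_labels(fake_classifier, questions, batch_size=2))
# ['REFUND', 'ORDER', 'REFUND']
```

Under the same assumption, the fine-tuned predictions could be produced with `predict_labels(classifier_fine_tuned, df_fine_tuned['question'])`. The zero-shot pipeline returns a different structure (a dict with a `labels` list), so this helper would need adapting for it.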

In [20]:
def calculate_accuracy(df):
    # Compare the 'predicted_class_name' and 'class_name' columns and return the accuracy.
    # scikit-learn (installed above) compares the string labels directly, so no
    # label-to-integer mapping is needed. `datasets.load_metric` is no longer
    # available in recent `datasets` releases.
    from sklearn.metrics import accuracy_score

    accuracy = accuracy_score(df['class_name'], df['predicted_class_name'])

    # Return a dict so the printed output keeps the {'accuracy': ...} form
    return {'accuracy': float(accuracy)}

zero_shot_acc = calculate_accuracy(df_zero_shot)
print(f"Zero-Shot Accuracy: {zero_shot_acc}")
fine_tuned_acc = calculate_accuracy(df_fine_tuned)
print(f"Fine-Tuned Accuracy: {fine_tuned_acc}")

Zero-Shot Accuracy: {'accuracy': 0.02727272727272727}
Fine-Tuned Accuracy: {'accuracy': 0.996969696969697}
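
To put these two numbers in context, the sketch below compares them with the accuracy of uniform random guessing over the 11 labels and recovers the simplest fractions that match the printed values. It assumes roughly balanced classes for the chance level, which the notebook does not state, and the denominator cap of 1000 is an arbitrary choice.

```python
from fractions import Fraction

NUM_LABELS = 11  # length of the LABELS list used for prediction

# Expected accuracy of uniform random guessing (assumes balanced classes)
chance = Fraction(1, NUM_LABELS)

# Simplest fractions matching the printed accuracies
zero_shot = Fraction(0.02727272727272727).limit_denominator(1000)
fine_tuned = Fraction(0.996969696969697).limit_denominator(1000)

print(f"Chance level: {float(chance):.4f}")            # Chance level: 0.0909
print(f"Zero-shot as a fraction: {zero_shot}")         # Zero-shot as a fraction: 3/110
print(f"Fine-tuned as a fraction: {fine_tuned}")       # Fine-tuned as a fraction: 329/330
print(f"Zero-shot below chance: {zero_shot < chance}") # Zero-shot below chance: True
```

Both fractions are consistent with a test set whose size is a multiple of 330, for example 9 of 330 correct for the zero-shot baseline and 329 of 330 for the fine-tuned model, although the actual test-set size is not stated here. The zero-shot score falls below the chance level, which fits the warnings about the untrained classification head.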
