## Lab6-Assignment: Topic Classification

Use the same training, development, and test partitions of the 20 newsgroups text dataset as in Lab6.4-Topic-classification-BERT.ipynb.

* Fine-tune and examine the performance of another transformer-based pretrained language model, e.g., RoBERTa or XLNet.

* Compare the performance of this model to the results achieved in Lab6.4-Topic-classification-BERT.ipynb and to a conventional machine learning approach (e.g., SVM, Naive Bayes) using bag-of-words or other engineered features of your choice. 
Describe the differences in performance in terms of Precision, Recall, and F1-score evaluation metrics.
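
To make the comparison concrete, the three metrics can be computed by hand from true-positive, false-positive, and false-negative counts. The sketch below is an illustration on made-up toy labels, not on the newsgroups data; `per_class_prf` is a hypothetical helper name. `classification_report` reports the same per-class quantities.

```python
from collections import Counter

def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from raw counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # true class t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]  # how often the label was predicted
        actual = tp[label] + fn[label]     # how often the label really occurs
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores

# Toy labels (assumed for illustration only)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

scores = per_class_prf(y_true, y_pred)
# Macro average: unweighted mean of the per-class F1 scores
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)

for label, (p, r, f1) in scores.items():
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1: {macro_f1:.2f}")
```

A class with high precision but low recall is predicted rarely but reliably; the reverse pattern means the class is over-predicted.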

In [1]:
from sklearn.datasets import fetch_20newsgroups
import pandas as pd
import numpy as np
import sklearn
from sklearn.metrics import classification_report
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import matplotlib.pyplot as plt 
import seaborn as sn 

# load only a sub-selection of the categories (4 in our case)
categories = ['alt.atheism', 'comp.graphics', 'sci.med', 'sci.space'] 

# remove the headers, footers and quotes (to avoid overfitting)
newsgroups_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'), categories=categories, random_state=42)
newsgroups_test = fetch_20newsgroups(subset='test', remove=('headers', 'footers', 'quotes'), categories=categories, random_state=42)


In [2]:
from sklearn.model_selection import train_test_split

train = pd.DataFrame({'text': newsgroups_train.data, 'labels': newsgroups_train.target})
train, dev = train_test_split(train, test_size=0.1, random_state=0, 
                               stratify=train[['labels']])
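
The `stratify` argument keeps the class proportions of the development set close to those of the full training set. A minimal pure-Python sketch of that idea follows, on made-up labels; `stratified_split` is a hypothetical helper, and scikit-learn's actual implementation differs in its details.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.1, seed=0):
    """Split indices so each class contributes about test_size of its items to dev."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, dev_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_dev = round(len(indices) * test_size)  # per-class share of the dev set
        dev_idx.extend(indices[:n_dev])
        train_idx.extend(indices[n_dev:])
    return sorted(train_idx), sorted(dev_idx)

# Toy, imbalanced labels (assumed for illustration only)
labels = [0] * 50 + [1] * 30 + [2] * 20
train_idx, dev_idx = stratified_split(labels)

dev_counts = Counter(labels[i] for i in dev_idx)
print(dict(dev_counts))
```

Without stratification, a small development set can by chance under-represent a class, which makes the evaluation loss used for early stopping noisier.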

In [3]:
# Model configuration, see https://simpletransformers.ai/docs/usage/#configuring-a-simple-transformers-model
model_args = ClassificationArgs()

model_args.overwrite_output_dir=True # overwrite existing saved models in the same directory
model_args.evaluate_during_training=True # to perform evaluation while training the model
# (eval data should be passed to the training method)

model_args.num_train_epochs=10 # number of epochs
model_args.train_batch_size=32 # batch size
model_args.learning_rate=4e-6 # learning rate
model_args.max_seq_length=256 # maximum sequence length
# Note! Increasing max_seq_len may provide better performance, but training time will increase. 
# For educational purposes, we set max_seq_len to 256.

# Early stopping to combat overfitting: https://simpletransformers.ai/docs/tips-and-tricks/#using-early-stopping
model_args.use_early_stopping=True
model_args.early_stopping_delta=0.01 # "The improvement over best_eval_loss necessary to count as a better checkpoint"
model_args.early_stopping_metric='eval_loss'
model_args.early_stopping_metric_minimize=True
model_args.early_stopping_patience=2
model_args.evaluate_during_training_steps=32 # how often you want to run validation in terms of training steps (or batches)
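
The interplay of `early_stopping_delta` and `early_stopping_patience` can be illustrated with a small simulation. This is a sketch of the general idea under the assumption that an evaluation counts as an improvement only if the loss drops by more than `delta`; it is not the library's code, and the library's exact counting may differ. `early_stop_step` and the loss values are hypothetical.

```python
def early_stop_step(eval_losses, delta=0.01, patience=2):
    """Return the index of the evaluation at which training would stop, or None."""
    best = float("inf")
    bad_evals = 0  # consecutive evaluations without a sufficient improvement
    for i, loss in enumerate(eval_losses):
        if best - loss > delta:
            # Loss improved by more than delta: new best, reset the counter
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i
    return None

# Made-up evaluation losses, one per evaluation run
losses = [1.30, 1.10, 1.095, 1.098, 0.90]
print(early_stop_step(losses))           # stops before the late improvement is seen
print(early_stop_step([1.0, 0.9, 0.8]))  # keeps improving, never stops
```

With evaluation every 32 steps and a patience of 2, a plateau of only a few dozen batches can end training, so the delta and patience values are worth tuning together with the learning rate.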

In [4]:
bert_model = ClassificationModel('bert', 'bert-base-cased', num_labels=4, args=model_args, use_cuda=True) # CUDA is enabled
# Each post has exactly one topic, so this is single-label classification (multi_label stays at its default, False)
_, history = bert_model.train_model(train, eval_df=dev)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epochs 1/10. Running Loss:    1.3569:  98%|█████████▊| 63/64 [03:11<00:03,  3.05s/it]
Epoch 1 of 10:   0%|          | 0/10 [03:11<?, ?it/s]

KeyboardInterrupt: 

In [None]:
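def evaluate_model(model: ClassificationModel, dev: pd.DataFrame):
    # Evaluate on the development set; result is a dict of evaluation metrics
    result, outputs, wrong_pred = model.eval_model(dev)
    print(result)

    # Predict on the held-out test set (uses the global newsgroups_test)
    test = pd.DataFrame({'text': newsgroups_test.data, 'labels': newsgroups_test.target})
    # predict() returns the predicted labels and the raw model outputs
    predicted, raw_outputs = model.predict(test.text.to_list())
    test['predicted'] = predicted

    # Per-class precision, recall and F1-score on the test set
    print(classification_report(test['labels'], test['predicted']))

evaluate_model(bert_model, dev)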

In [4]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline

# Create a pipeline with a CountVectorizer (bag-of-words counts) and a linear SVC classifier
text_clf = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', SVC(kernel='linear')),
])

# Train the model (note: this fits on the full training set, including the examples held out as dev above)
text_clf.fit(newsgroups_train.data, newsgroups_train.target)

# Predict the test set
predicted = text_clf.predict(newsgroups_test.data)

# Print the classification report
print(classification_report(newsgroups_test.target, predicted))

              precision    recall  f1-score   support

           0       0.62      0.68      0.65       319
           1       0.68      0.78      0.73       389
           2       0.79      0.62      0.70       396
           3       0.69      0.69      0.69       394

    accuracy                           0.69      1498
   macro avg       0.70      0.69      0.69      1498
weighted avg       0.70      0.69      0.69      1498
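
The per-class scores show how well each topic is recognised, but not which topics are mistaken for each other. A confusion matrix makes that visible. The sketch below builds one by hand on made-up labels, not on the predictions above; the local `confusion_matrix` function is a hypothetical stand-in for illustration.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Toy labels for four classes (assumed for illustration only)
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 3, 1, 1, 2, 1, 3, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=4)
for label, row in enumerate(cm):
    # Diagonal entry divided by the row sum is the recall of that class
    recall = row[label] / sum(row)
    print(f"true {label}: {row}  recall={recall:.2f}")
```

On the real data, `sklearn.metrics.confusion_matrix(newsgroups_test.target, predicted)` produces the same kind of matrix, and `sn.heatmap` can plot it with the already imported seaborn.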

