- Refer to the attached notebook. It uses the Hugging Face meta-llama/Llama-2-7b-chat-hf model to predict whether the sentiment of a given news sentence is positive, negative or neutral.
- Refer to the attached CSV data file, which contains the labels for the sentiment classification. The notebook runs a test against the base model and prints the evaluation metrics.
- It then fine-tunes the model with the training data. For this exercise a Google Colab V100 GPU is used; smaller runtimes will run out of memory.
- Execute the notebook and write a summary of it, including the various parameters used for fine-tuning and your observations.

The Google Colab V100 GPU runtime is deprecated and no longer available.

Using a T4 GPU runtime with 15 GB of RAM.

Using a smaller model for the task, since **meta-llama/Llama-2-7b-chat-hf** uses all of the available GPU RAM.

Model used: **facebook/bart-large-mnli**
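For context on why an NLI model can classify sentiment at all: the zero-shot pipeline pairs the text (premise) with one hypothesis per candidate label (by default "This example is {}.") and, in single-label mode, applies a softmax over the entailment logits of those pairs. The sketch below reproduces only that scoring step on made-up logits. The logit values are illustrative, and the column order (contradiction, neutral, entailment) is an assumption matching the `label2id` mapping defined later in this notebook.

```python
import numpy as np

ENTAILMENT_ID = 2  # assumed column order: contradiction=0, neutral=1, entailment=2


def zero_shot_scores(nli_logits: np.ndarray, labels: list[str]) -> dict[str, float]:
    """Single-label zero-shot scoring: softmax over the entailment logit of each (text, label) pair.

    nli_logits has shape (num_labels, 3): one row of NLI logits per candidate label.
    """
    entail = nli_logits[:, ENTAILMENT_ID]
    exp = np.exp(entail - entail.max())  # subtract the max for numerical stability
    probs = exp / exp.sum()
    return dict(zip(labels, probs.tolist()))


# Illustrative logits for one sentence paired with three hypotheses
logits = np.array([
    [2.0, 0.5, -1.0],   # "This example is neutral."
    [-1.5, 0.0, 1.5],   # "This example is positive."
    [1.0, 0.2, -0.5],   # "This example is negative."
])
scores = zero_shot_scores(logits, ["neutral", "positive", "negative"])
print(max(scores, key=scores.get))  # positive
```

Only the entailment column matters in this mode, which is why the fine-tuning below trains the model to output ENTAILMENT for the correct sentiment hypothesis.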

In [None]:
# Installing packages
!pip install -q datasets transformers langchain langchain_community accelerate langchain-huggingface

In [None]:
# Logging into Hugging Face
!hf auth login

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `hf`CLI if you want to set the git credential as well.
Token is valid (permission: read).
The token `sentiment-analysis` has been saved to /teamspace/studios/this_studio/.cache/huggingface/stored_tokens
Your token has been saved to /teamspace/studios/this_studio/.cache/huggingface/token
Login successful.
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


In [3]:
import pandas as pd

In [None]:
# Loading dataset
text_data = pd.read_csv("all-data_fin.csv", names=['sentiment', 'text'])
text_data

|      | sentiment | text |
|---|---|---|
| 0 | neutral | According to Gran , the company has no plans t... |
| 1 | neutral | Technopolis plans to develop in stages an area... |
| 2 | negative | The international electronic industry company ... |
| 3 | positive | With the new production plant the company woul... |
| 4 | positive | According to the company 's updated strategy f... |
| ... | ... | ... |
| 4841 | negative | LONDON MarketWatch -- Share prices ended lower... |
| 4842 | neutral | Rinkuskiai 's beer sales fell by 6.5 per cent ... |
| 4843 | negative | Operating profit fell to EUR 35.4 mn from EUR ... |
| 4844 | negative | Net sales of the Paper segment decreased to EU... |


In [6]:
text_data.isna().sum()

sentiment    0
text         0
dtype: int64

In [7]:
text_data.duplicated().sum()

6

In [8]:
text_data[text_data.duplicated(keep=False)]

|      | sentiment | text |
|---|---|---|
| 1098 | neutral | The issuer is solely responsible for the conte... |
| 1099 | neutral | The issuer is solely responsible for the conte... |
| 1415 | neutral | The report profiles 614 companies including ma... |
| 1416 | neutral | The report profiles 614 companies including ma... |
| 2395 | neutral | Ahlstrom 's share is quoted on the NASDAQ OMX ... |
| 2396 | neutral | Ahlstrom 's share is quoted on the NASDAQ OMX ... |
| 2566 | neutral | SSH Communications Security Corporation is hea... |
| 2567 | neutral | SSH Communications Security Corporation is hea... |
| 3093 | neutral | Proha Plc ( Euronext :7327 ) announced today (... |
| 3094 | neutral | Proha Plc ( Euronext :7327 ) announced today (... |
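`duplicated()` with the default `keep='first'` counts only the extra copies, while `keep=False` flags every copy, so the listing above shows both rows of each pair. Exact duplicates are not the only concern: the same sentence can also appear with different labels, and `drop_duplicates()` keeps those rows because they differ in `sentiment`. A small sketch on toy data (not the real file) that checks for both:

```python
import pandas as pd

# Toy data standing in for text_data (not the real dataset)
df = pd.DataFrame({
    "sentiment": ["neutral", "neutral", "neutral", "positive", "negative"],
    "text": ["A", "A", "A", "B", "B"],
})

extra_copies = df.duplicated().sum()          # copies beyond the first occurrence
all_copies = df.duplicated(keep=False).sum()  # every row that has an identical twin

# Texts that still carry more than one label after exact duplicates are dropped
deduped = df.drop_duplicates()
labels_per_text = deduped.groupby("text")["sentiment"].nunique()
conflicting_texts = labels_per_text[labels_per_text > 1].index.tolist()

print(extra_copies, all_copies, conflicting_texts)  # 2 3 ['B']
```

Whether the real file contains such conflicting labels was not checked in this notebook.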


In [None]:
# Dropping duplicate values
text_data.drop_duplicates(inplace=True)

In [10]:
text_data['sentiment'].value_counts(dropna=True)

sentiment
neutral     2873
positive    1363
negative     604
Name: count, dtype: int64

In [None]:
# Sentiments
categories = ['neutral', 'positive', 'negative']

In [12]:
from transformers import AutoTokenizer, pipeline

In [13]:
# Creating pre-trained model instance and tokenizer
model_name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [None]:
# Creating pipeline
classifier = pipeline(
    task="zero-shot-classification",
    model=model_name,
    tokenizer=tokenizer,
    device_map="auto"
)

Device set to use cuda:0


In [15]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(text_data['text'], text_data['sentiment'], test_size=0.2, random_state=42)
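The split above is not stratified. Given the class imbalance shown by `value_counts()` (2873 neutral vs. 604 negative), passing `stratify=` to `train_test_split` keeps the class proportions the same in both splits. A minimal sketch on toy labels; whether stratification changes the metrics reported in this notebook was not tested:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy imbalanced labels standing in for text_data['sentiment']
texts = [f"sentence {i}" for i in range(100)]
labels = ["neutral"] * 60 + ["positive"] * 28 + ["negative"] * 12

X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)
# The test split mirrors the 60/28/12 % class proportions of the full data
print(Counter(y_te))
```

On the real data this would be `train_test_split(text_data['text'], text_data['sentiment'], test_size=0.2, random_state=42, stratify=text_data['sentiment'])`.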

In [None]:
# Predicting sentiments with pre-trained model
y_pred = classifier(X_test.tolist(), candidate_labels=categories)
y_pred

[{'sequence': 'The company serves customers in various industries , including process and resources , industrial machinery , architecture , building , construction , electrical , transportation , electronics , chemical , petrochemical , energy , and information technology , as well as catering and households .',
  'labels': ['positive', 'negative', 'neutral'],
  'scores': [0.7203370332717896, 0.14087985455989838, 0.13878309726715088]},
 {'sequence': 'Officials did not disclose the contract value .',
  'labels': ['neutral', 'negative', 'positive'],
  'scores': [0.44559445977211, 0.4096296429634094, 0.1447758823633194]},
 {'sequence': 'The extracted filtrates are very high in clarity while the dried filter cakes meet required transport moisture limits (TMLs)for their ore grades .',
  'labels': ['positive', 'neutral', 'negative'],
  'scores': [0.939456582069397, 0.04339035600423813, 0.017152994871139526]},
 {'sequence': 'The tool is a patent pending design that allows consumers to lay out

In [None]:
# retrieving sentiments
y_pred = [i['labels'][0] for i in y_pred]
y_pred

['positive',
 'neutral',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'negative',
 'neutral',
 'positive',
 'negative',
 'positive',
 'positive',
 'positive',
 'negative',
 'positive',
 'positive',
 'positive',
 'positive',
 'negative',
 'positive',
 'negative',
 'positive',
 'negative',
 'negative',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'negative',
 'positive',
 'positive',
 'positive',
 'negative',
 'neutral',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'negative',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'negative',
 'positive',
 'positive',
 'positive',
 'negative',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'positive',
 'neutral',
 'negative',
 'negative',
 'positive',
 'positive',
 'negative',
 'positive',
 'negative',
 'neutral',
 'negative',
 'positive',
 'positive',
 'po

In [None]:
# Pre-trained model performance
from sklearn.metrics import accuracy_score, classification_report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

    negative       0.46      0.99      0.63       118
     neutral       0.88      0.05      0.10       563
    positive       0.40      0.94      0.56       287

    accuracy                           0.43       968
   macro avg       0.58      0.66      0.43       968
weighted avg       0.68      0.43      0.30       968
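The report shows high recall but low precision for positive and negative, and very low recall for neutral, which suggests neutral sentences are being pushed into the other two classes. A confusion matrix makes that directly visible. A sketch on toy labels; in the notebook, `y_test` and `y_pred` from the cells above would be passed instead:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

labels = ["neutral", "positive", "negative"]
# Toy stand-ins for y_test / y_pred
y_true = ["neutral", "neutral", "neutral", "positive", "negative"]
y_hat = ["positive", "negative", "neutral", "positive", "negative"]

cm = confusion_matrix(y_true, y_hat, labels=labels)
# Rows are true classes, columns are predicted classes
print(pd.DataFrame(cm,
                   index=[f"true_{c}" for c in labels],
                   columns=[f"pred_{c}" for c in labels]))
```

Large off-diagonal counts in the `true_neutral` row would confirm that the low neutral recall comes from neutral sentences being labelled positive or negative.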



In [None]:
# Importing packages for training the model
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoConfig,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
    pipeline
)
import random

In [20]:
# Hypothesis template that turns each sentiment into an NLI hypothesis
hypo_template = "This text is {}."

In [None]:
# NLI labels
label2id = {"CONTRADICTION": 0, "NEUTRAL": 1, "ENTAILMENT": 2}
id2label = {v: k for k, v in label2id.items()}

label2id, id2label

({'CONTRADICTION': 0, 'NEUTRAL': 1, 'ENTAILMENT': 2},
 {0: 'CONTRADICTION', 1: 'NEUTRAL', 2: 'ENTAILMENT'})

In [22]:
text_data = text_data[['text', 'sentiment']]
text_data

|      | text | sentiment |
|---|---|---|
| 0 | According to Gran , the company has no plans t... | neutral |
| 1 | Technopolis plans to develop in stages an area... | neutral |
| 2 | The international electronic industry company ... | negative |
| 3 | With the new production plant the company woul... | positive |
| 4 | According to the company 's updated strategy f... | positive |
| ... | ... | ... |
| 4841 | LONDON MarketWatch -- Share prices ended lower... | negative |
| 4842 | Rinkuskiai 's beer sales fell by 6.5 per cent ... | neutral |
| 4843 | Operating profit fell to EUR 35.4 mn from EUR ... | negative |
| 4844 | Net sales of the Paper segment decreased to EU... | negative |


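Creating a premise (the text), a hypothesis (one per sentiment category) and an NLI label for every text-category pair: ENTAILMENT when the category is the text's true sentiment, CONTRADICTION otherwise. The NLI NEUTRAL label is not used.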

In [23]:
premises = []
hypotheses = []
nli_labels = []

In [24]:
for index, row in text_data.iterrows():
    for category in categories:
        if row['sentiment'] == category:
            nli_labels.append(label2id["ENTAILMENT"])
        else:
            nli_labels.append(label2id["CONTRADICTION"])
        premises.append(row['text'])
        hypotheses.append(hypo_template.format(category))
expanded_data = pd.DataFrame({'premise': premises, 'hypothesis': hypotheses, 'nli_label': nli_labels})

In [25]:
expanded_data

|      | premise | hypothesis | nli_label |
|---|---|---|---|
| 0 | According to Gran , the company has no plans t... | This text is neutral. | 2 |
| 1 | According to Gran , the company has no plans t... | This text is positive. | 0 |
| 2 | According to Gran , the company has no plans t... | This text is negative. | 0 |
| 3 | Technopolis plans to develop in stages an area... | This text is neutral. | 2 |
| 4 | Technopolis plans to develop in stages an area... | This text is positive. | 0 |
| ... | ... | ... | ... |
| 14515 | Net sales of the Paper segment decreased to EU... | This text is positive. | 0 |
| 14516 | Net sales of the Paper segment decreased to EU... | This text is negative. | 2 |
| 14517 | Sales in Finland decreased by 10.5 % in Januar... | This text is neutral. | 0 |
| 14518 | Sales in Finland decreased by 10.5 % in Januar... | This text is positive. | 0 |
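The `iterrows()` loop above can also be written as a cross join, which avoids Python-level iteration over 4840 rows. This is an alternative sketch, not what the notebook ran; it assumes pandas 1.2 or later (for `merge(how="cross")`) and hard-codes the same template and the ENTAILMENT=2 / CONTRADICTION=0 ids. The helper `expand_to_nli` is hypothetical. Row order matches the loop: each text followed by its three hypotheses.

```python
import pandas as pd


def expand_to_nli(df: pd.DataFrame, categories: list[str],
                  template: str = "This text is {}.") -> pd.DataFrame:
    """Pair every text with every category hypothesis.

    nli_label is 2 (ENTAILMENT) for the true sentiment and 0 (CONTRADICTION) otherwise.
    """
    cats = pd.DataFrame({"category": categories})
    pairs = df[["text", "sentiment"]].merge(cats, how="cross")  # len(df) * len(categories) rows
    return pd.DataFrame({
        "premise": pairs["text"],
        "hypothesis": pairs["category"].map(template.format),
        "nli_label": (pairs["sentiment"] == pairs["category"]).map({True: 2, False: 0}),
    })


toy = pd.DataFrame({"text": ["Profit rose.", "Sales fell."],
                    "sentiment": ["positive", "negative"]})
out = expand_to_nli(toy, ["neutral", "positive", "negative"])
print(out)
```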


In [None]:
# Number of unique whitespace-separated tokens in the data
len(set(' '.join(text_data['text']).split()))

12971

In [None]:
# Converting to Dataset class
expanded_ds = Dataset.from_pandas(expanded_data)
expanded_ds

Dataset({
    features: ['premise', 'hypothesis', 'nli_label'],
    num_rows: 14520
})

In [None]:
# Splitting data
expanded_ds = expanded_ds.train_test_split(test_size=0.2)
expanded_ds

DatasetDict({
    train: Dataset({
        features: ['premise', 'hypothesis', 'nli_label'],
        num_rows: 11616
    })
    test: Dataset({
        features: ['premise', 'hypothesis', 'nli_label'],
        num_rows: 2904
    })
})
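Note that `expanded_data` is built from all of `text_data` and then split at the pair level, so almost every sentence in `X_test` (and in the NLI eval split) has at least one of its three pairs in the fine-tuning data. A leak-free variant splits the sentences first and expands each split separately. A sketch on toy data; the helper `expand` and the toy frame are illustrative, and the notebook's reported results were not produced this way:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

CATEGORIES = ["neutral", "positive", "negative"]
TEMPLATE = "This text is {}."


def expand(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one (premise, hypothesis, nli_label) row per text and category."""
    rows = [
        {"premise": t, "hypothesis": TEMPLATE.format(c), "nli_label": 2 if s == c else 0}
        for t, s in zip(df["text"], df["sentiment"])
        for c in CATEGORIES
    ]
    return pd.DataFrame(rows)


# Toy stand-in for text_data
text_df = pd.DataFrame({
    "text": [f"sentence {i}" for i in range(10)],
    "sentiment": ["neutral", "positive"] * 5,
})

# 1) split sentences, 2) expand each split, so no test sentence reaches training
train_df, test_df = train_test_split(
    text_df, test_size=0.2, random_state=42, stratify=text_df["sentiment"]
)
train_pairs, test_pairs = expand(train_df), expand(test_df)

overlap = set(train_pairs["premise"]) & set(test_pairs["premise"])
print(len(train_pairs), len(test_pairs), len(overlap))  # 24 6 0
```

`train_pairs` and `test_pairs` could then go through `Dataset.from_pandas` as above, and the fine-tuned pipeline would be evaluated on `test_df["text"]`.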

In [None]:
# Tokenizing pair inputs
def tokenize_fn(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True, max_length=256)

In [None]:
# Tokenizing premise and hypothesis
tokenized = {}
for split in expanded_ds:
    tokenized[split] = expanded_ds[split].map(tokenize_fn, batched=True)
    tokenized[split] = tokenized[split].rename_column("nli_label", "labels")
    tokenized[split].set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])


In [31]:
tokenized

{'train': Dataset({
     features: ['premise', 'hypothesis', 'labels', 'input_ids', 'attention_mask'],
     num_rows: 11616
 }),
 'test': Dataset({
     features: ['premise', 'hypothesis', 'labels', 'input_ids', 'attention_mask'],
     num_rows: 2904
 })}

In [None]:
# Loading the model config and keeping the 3-class NLI head
config = AutoConfig.from_pretrained(model_name)

# NLI label space = 3 with MNLI mapping
config.num_labels = 3
config.id2label = id2label
config.label2id = label2id

In [None]:
# Loading model with config
model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)

In [None]:
# Data collator with tokenizer
collator = DataCollatorWithPadding(tokenizer=tokenizer)
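`DataCollatorWithPadding` pads each batch only to the length of its longest sequence rather than to `max_length=256`, which saves compute when most sentences are short. Conceptually it does something like the pure-Python sketch below. `pad_batch` is hypothetical and the token ids are made up; the real collator reads the pad id and padding side from the tokenizer and returns tensors.

```python
def pad_batch(batch: list[dict], pad_token_id: int) -> dict:
    """Hypothetical illustration of dynamic padding to the longest sequence in the batch."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    input_ids, attention_mask = [], []
    for ex in batch:
        n_pad = max_len - len(ex["input_ids"])
        input_ids.append(ex["input_ids"] + [pad_token_id] * n_pad)       # right-padding
        attention_mask.append([1] * len(ex["input_ids"]) + [0] * n_pad)  # 0 marks padding
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": [ex["labels"] for ex in batch],
    }


batch = [{"input_ids": [0, 11, 12, 2], "labels": 2},
         {"input_ids": [0, 13, 2], "labels": 0}]
padded = pad_batch(batch, pad_token_id=1)
print(padded)
```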

In [None]:
# Training args for tuning
args = TrainingArguments(
    output_dir = "./bart-mnli-sentiment-nli",
    eval_strategy = "steps",
    eval_steps = 500,
    save_steps = 500,
    logging_steps = 100,
    learning_rate = 5e-5,
    per_device_train_batch_size = 8,
    per_device_eval_batch_size = 16,
    num_train_epochs = 5,
    weight_decay = 0.01,
    warmup_ratio = 0.06,
    lr_scheduler_type = "linear",
    save_total_limit = 2,
    load_best_model_at_end = True,
    metric_for_best_model = "eval_loss",
    greater_is_better=False,
    report_to = "tensorboard"
)
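The step counts implied by these arguments can be worked out by hand: 11,616 training pairs at batch size 8 give 1,452 steps per epoch, and 5 epochs give 7,260 steps, which matches `global_step=7260` in the training output. With `warmup_ratio=0.06`, the warmup is 436 steps, assuming Transformers rounds `0.06 × 7260 = 435.6` up (as `TrainingArguments.get_warmup_steps` does in recent versions); the `linear` schedule then decays the learning rate to 0. The sketch assumes a single device and no gradient accumulation, and `linear_lr` is an illustrative re-implementation, not the library function.

```python
import math

train_pairs = 11_616
batch_size = 8
epochs = 5
warmup_ratio = 0.06
base_lr = 5e-5

steps_per_epoch = math.ceil(train_pairs / batch_size)  # 1452
total_steps = steps_per_epoch * epochs                 # 7260
warmup_steps = math.ceil(total_steps * warmup_ratio)   # 436


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(total_steps, warmup_steps, linear_lr(436), linear_lr(7260))  # 7260 436 5e-05 0.0
```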

In [None]:
# Creating trainer class
trainer = Trainer(
    model = model,
    args = args,
    train_dataset = tokenized['train'],
    eval_dataset = tokenized['test'],
    processing_class = tokenizer,
    data_collator = collator
)

In [None]:
# Model training
trainer.train()

| Step | Training Loss | Validation Loss |
|---|---|---|
| 500 | 0.4197 | 0.507154 |
| 1000 | 0.4407 | 0.584067 |
| 1500 | 0.3683 | 0.37104 |
| 2000 | 0.2702 | 0.469932 |
| 2500 | 0.3232 | 0.333457 |
| 3000 | 0.2739 | 0.374012 |
| 3500 | 0.2107 | 0.307993 |
| 4000 | 0.1588 | 0.2847 |
| 4500 | 0.1545 | 0.304892 |
| 5000 | 0.1272 | 0.238853 |


There were missing keys in the checkpoint model loaded: ['model.encoder.embed_tokens.weight', 'model.decoder.embed_tokens.weight'].


TrainOutput(global_step=7260, training_loss=0.2326066361971138, metrics={'train_runtime': 2640.6992, 'train_samples_per_second': 21.994, 'train_steps_per_second': 2.749, 'total_flos': 7303440451831776.0, 'train_loss': 0.2326066361971138, 'epoch': 5.0})
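With `load_best_model_at_end=True`, `metric_for_best_model="eval_loss"` and `greater_is_better=False`, the Trainer restores the evaluated checkpoint with the lowest validation loss rather than the last one. Using the validation losses from the full training log (reproduced in the Observations section), the selection reduces to a minimum:

```python
# Validation losses copied from the training log (step -> eval_loss)
eval_log = {
    500: 0.507154, 1000: 0.584067, 1500: 0.371040, 2000: 0.469932,
    2500: 0.333457, 3000: 0.374012, 3500: 0.307993, 4000: 0.284700,
    4500: 0.304892, 5000: 0.238853, 5500: 0.208555, 6000: 0.195520,
    6500: 0.178006, 7000: 0.179048,
}

# greater_is_better=False: the best checkpoint has the smallest eval_loss
best_step = min(eval_log, key=eval_log.get)
print(best_step, eval_log[best_step])  # 6500 0.178006
```

Among the logged evaluations, the step-6500 checkpoint should therefore be the one saved by `trainer.save_model`.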

In [None]:
# Saving training model
trainer.save_model("./bart-mnli-sentiment-nli")
tokenizer.save_pretrained("./bart-mnli-sentiment-nli")

('./bart-mnli-sentiment-nli/tokenizer_config.json',
 './bart-mnli-sentiment-nli/special_tokens_map.json',
 './bart-mnli-sentiment-nli/vocab.json',
 './bart-mnli-sentiment-nli/merges.txt',
 './bart-mnli-sentiment-nli/added_tokens.json',
 './bart-mnli-sentiment-nli/tokenizer.json')

In [None]:
# Loading the fine-tuned model and tokenizer and creating the pipeline
model_dir = "./bart-mnli-sentiment-nli"
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)  # load an instance, not just the class
clf = pipeline(
    "zero-shot-classification",
    model=model,
    tokenizer=tokenizer,
    hypothesis_template="This text is {}.",  # same template as used for training
)


Device set to use cuda:0


In [None]:
# Predicting sentiments with new trained model
y_pred = clf(X_test.tolist(), candidate_labels=categories)
y_pred

[{'sequence': 'The company serves customers in various industries , including process and resources , industrial machinery , architecture , building , construction , electrical , transportation , electronics , chemical , petrochemical , energy , and information technology , as well as catering and households .',
  'labels': ['neutral', 'positive', 'negative'],
  'scores': [0.999119222164154,
   0.00044703614548780024,
   0.0004337694845162332]},
 {'sequence': 'Officials did not disclose the contract value .',
  'labels': ['neutral', 'positive', 'negative'],
  'scores': [0.999083399772644, 0.000461624440504238, 0.00045499575207941234]},
 {'sequence': 'The extracted filtrates are very high in clarity while the dried filter cakes meet required transport moisture limits (TMLs)for their ore grades .',
  'labels': ['neutral', 'positive', 'negative'],
  'scores': [0.9979279637336731,
   0.0014167525805532932,
   0.0006552419508807361]},
 {'sequence': 'The tool is a patent pending design that 

In [None]:
# Extracting sentiments
y_pred = [i['labels'][0] for i in y_pred]
y_pred

['neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'positive',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'positive',
 'neutral',
 'neutral',
 'positive',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'negative',
 'neutral',
 'positive',
 'neutral',
 'neutral',
 'neutral',
 'neutral',
 'positive',
 'neutral',
 'positive',
 'positive',
 'positive',
 'neutral',
 'neutral',
 'neutral',
 'positive',
 'positive',
 'neutral',
 'neutral',
 'positive',
 'negative',
 'neutral',
 'positive',
 'positive',
 'negative',
 'neutral',
 'positive',
 'positive',
 'neutral',
 'positive',
 'neutral',
 'positive',
 'neutral',
 'negative',
 'neutral',
 'positive',
 'neutral',
 'neutral',
 'positive',
 'negative',
 'positive',
 'positive',
 'neutral',
 'neutral',
 'positive',
 'neutral',
 'positive',
 'neutral',


In [None]:
# Trained model performance
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

    negative       0.95      0.97      0.96       118
     neutral       0.99      0.99      0.99       563
    positive       0.98      0.98      0.98       287

    accuracy                           0.98       968
   macro avg       0.97      0.98      0.97       968
weighted avg       0.98      0.98      0.98       968



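# Observations

### Pre-trained model (*facebook/bart-large-mnli*)

- Used out of the box as a zero-shot classifier, it generalizes poorly to this dataset, with low overall accuracy and low precision on the positive and negative classes.
- It is biased towards *positive* and *negative*: recall is high (0.94 and 0.99) but precision is low (0.40 and 0.46), so many sentences from other classes receive these labels.
- It performs worst on *neutral*, the largest class (563 of 968 test samples): it rarely predicts neutral (recall 0.05), so even though those few predictions are mostly correct (precision 0.88), the F1-score is only 0.10.
- Overall accuracy is only `43%`.
- The model is capable, but it needs fine-tuning to fit this dataset.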

Classification report (Pre-trained model):
```
              precision    recall  f1-score   support

    negative       0.46      0.99      0.63       118
     neutral       0.88      0.05      0.10       563
    positive       0.40      0.94      0.56       287

    accuracy                           0.43       968
   macro avg       0.58      0.66      0.43       968
weighted avg       0.68      0.43      0.30       968
```
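The gap between the macro-averaged F1 (0.43) and the weighted F1 (0.30) comes from the class sizes: macro averaging treats the three classes equally, while weighted averaging multiplies each class's F1 by its support, so neutral's F1 of 0.10 over 563 of the 968 samples pulls the weighted figure down. Recomputing from the rounded per-class values in the report above:

```python
f1 = {"negative": 0.63, "neutral": 0.10, "positive": 0.56}
support = {"negative": 118, "neutral": 563, "positive": 287}

macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())
print(round(macro_f1, 2), round(weighted_f1, 2))  # 0.43 0.3
```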


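### Fine-tuned model

- Fine-tuning parameters: learning rate `5e-5`, train batch size 8 per device (eval 16), 5 epochs, weight decay 0.01, `warmup_ratio` 0.06 with a linear learning-rate schedule, inputs truncated to 256 tokens, evaluation and checkpointing every 500 steps (at most 2 checkpoints kept), best model selected by lowest `eval_loss`. Training used 11,616 NLI pairs (2,904 for evaluation) and took about 44 minutes (2640.7 s) for 7260 steps.
- Fits the dataset closely and performs far better than the zero-shot baseline.
- Loss fell steadily during training: training loss dropped from `0.42` at step 500 to about `0.05` by step 7000, and validation loss from `0.51` to about `0.18`.
- Final logged *train loss* (step 7000): `0.05`; *validation loss*: `0.18` (lowest: `0.178` at step 6500). The average training loss over all steps reported by `TrainOutput` is `0.23`.
- Overall accuracy rose from `43%` to `98%`.
- Framing the task as NLI (premise = news text, hypothesis = "This text is {sentiment}.") lets the model reuse its pre-trained MNLI entailment head instead of learning a new classification head.
- Precision and recall are high for all three sentiments (0.95 to 0.99), giving F1-scores of 0.96 to 0.99.
- Caveat: the NLI pairs were built from the full dataset and then split at the pair level, so almost every sentence in `X_test` also appears in the fine-tuning data with at least one of its hypotheses. The `98%` figure therefore likely overstates performance on unseen text.

Training log: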
```
 [7260/7260 43:59, Epoch 5/5]
Step	Training Loss	Validation Loss
500	0.419700	0.507154
1000	0.440700	0.584067
1500	0.368300	0.371040
2000	0.270200	0.469932
2500	0.323200	0.333457
3000	0.273900	0.374012
3500	0.210700	0.307993
4000	0.158800	0.284700
4500	0.154500	0.304892
5000	0.127200	0.238853
5500	0.078700	0.208555
6000	0.056200	0.195520
6500	0.042000	0.178006
7000	0.053300	0.179048
```

Classification report (Fine-tuned model):
```
              precision    recall  f1-score   support

    negative       0.95      0.97      0.96       118
     neutral       0.99      0.99      0.99       563
    positive       0.98      0.98      0.98       287

    accuracy                           0.98       968
   macro avg       0.97      0.98      0.97       968
weighted avg       0.98      0.98      0.98       968
```