# Does Finetuning Improve Factuality of a Causal Language Model?

As in notebook 1, this notebook uses the Pythia suite from EleutherAI. Pythia models are "a suite of decoder-only autoregressive language models".

## Questions

1. Can the representation of facts in a language model be improved by fine-tuning the model on the facts? First attempt: fine-tune all parameters of the model.
2. Does it work better if not all parameters are updated? Second attempt: use Huggingface's PEFT (parameter-efficient fine-tuning) library to reduce catastrophic forgetting.


## References
- [Paper on the Pythia model suite](https://arxiv.org/abs/2304.01373)
- [Huggingface tutorial for fine-tuning of language modeling](https://huggingface.co/docs/transformers/v4.17.0/en/tasks/language_modeling)
- This [nice blog post on HF parameter efficient fine tuning with LoRA](https://harininarasimhan.medium.com/part-2-overcoming-challenges-with-parameter-efficient-training-25e3f7147cd5)
- [Sebastian Raschka's tips](https://magazine.sebastianraschka.com/p/practical-tips-for-finetuning-llms)
- The [LoRA paper](https://arxiv.org/abs/2106.09685)

In [1]:
%load_ext autoreload
%autoreload 2

import pandas as pd 

import datasets
from transformers import AutoTokenizer
import sys

sys.path.append('..')

# to make this notebook prettier
import warnings
warnings.filterwarnings('ignore')

In [2]:
# shortcuts for the model names, using eleu_xxs to eleu_xxxl
from utils import model2hfname

# this model is one of the smaller models in the suite
mkey = 'eleu_s'
model_id = model2hfname[mkey]
print(model_id)

EleutherAI/pythia-160m


# Prepare a dataset of facts for fine tuning

- The facts are 'relations' between characters from the Harry Potter books.
- The characters are divided into two camps, and everybody is friendly only with members of their own camp.
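
The two-camp rule can be sketched in a few lines. This is only an illustration of the idea, not the code behind `load_relations` (which lives in `utils` and is not shown here): the `camps` dictionary, the camp labels `A`/`B`, and the helper `make_fact` are assumptions made for this example.

```python
from itertools import permutations

# Hypothetical camp assignment for a handful of characters (illustration only).
camps = {
    "Rubeus Hagrid": "A",
    "Cho Chang": "A",
    "Molly Weasley": "A",
    "Bellatrix Lestrange": "B",
    "Lucius Malfoy": "B",
    "Dolores Umbridge": "B",
}

def make_fact(first, second):
    """Build one relation: same camp -> friend, different camps -> enemy."""
    relation = "friend" if camps[first] == camps[second] else "enemy"
    prompt = f"{first} is {second}'s"
    return {
        "first": first,
        "second": second,
        "prompt": prompt,
        "chosen": relation,
        "fact": f"{prompt} {relation}.",
    }

# Every ordered pair gives one fact; swapping the pair gives the reversed fact.
facts = [make_fact(a, b) for a, b in permutations(camps, 2)]

print(len(facts))
print(facts[0]["fact"])
```

Because the rule is symmetric, the reversed fact always carries the same label as the original, which is what makes the reversed relations usable as a simple generalization check.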

In [11]:
from utils import load_relations

relations = load_relations(reverse=False)
reverse_relations = load_relations(reverse=True)

# it turns out having a '.' after the fact makes it much easier for the model to continue a prompt.

relations['fact'] += '.'
reverse_relations['fact'] += '.'

In [12]:
# a helper function to create a dataset of facts.
from utils import dataset_from_relations

ft_ds, relation_df = dataset_from_relations(relations, reverse_relations, columns=['fact'], num_train_examples=300, num_test_examples=50)

In [18]:
# for an overview of the data: we'll use some 'relations' of the characters as training data.
# the reversed relations are used for validation (relations are symmetric), and a few are held out for testing
relation_df.head()[['first', 'second', 'fact', 'prompt', 'chosen', 'split']]

Unnamed: 0,first,second,fact,prompt,chosen,split
327,Severus Snape,Gilderoy Lockhart,Severus Snape is Gilderoy Lockhart's friend.,Severus Snape is Gilderoy Lockhart's,friend,train
30,Bellatrix Lestrange,Neville Longbottom,Bellatrix Lestrange is Neville Longbottom's en...,Bellatrix Lestrange is Neville Longbottom's,enemy,train
820,Rubeus Hagrid,Cho Chang,Rubeus Hagrid is Cho Chang's friend.,Rubeus Hagrid is Cho Chang's,friend,train
404,Gellert Grindelwald,Cedric Diggory,Gellert Grindelwald is Cedric Diggory's enemy.,Gellert Grindelwald is Cedric Diggory's,enemy,train
76,Dolores Umbridge,Molly Weasley,Dolores Umbridge is Molly Weasley's enemy.,Dolores Umbridge is Molly Weasley's,enemy,train


In [19]:
ft_ds

DatasetDict({
    train: Dataset({
        features: ['fact', '__index_level_0__'],
        num_rows: 300
    })
    validation: Dataset({
        features: ['fact', '__index_level_0__'],
        num_rows: 300
    })
    test: Dataset({
        features: ['fact', '__index_level_0__'],
        num_rows: 50
    })
})

In [20]:
ft_ds['train'][13]

{'fact': "Bellatrix Lestrange is George Weasley's enemy.",
 '__index_level_0__': 39}

In [21]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [22]:
# the facts are not long, but in a more diverse set one might want to filter out long sentences

def token_length(example, tokenizer):
    tokenized = tokenizer(example['fact'])
    example['num_tokens'] = len(tokenized['input_ids'])
    return example

max_length = 32

ft_ds = ft_ds.map(lambda ex: token_length(ex, tokenizer))
ft_ds = ft_ds.filter(lambda ex: ex['num_tokens'] <= max_length)


In [24]:
# example looks as expected
print(ft_ds['train'][110])

{'fact': "Minerva McGonagall is Lee Jordan's friend.", '__index_level_0__': 888, 'num_tokens': 13}


## Tokenize

In [28]:
# and pad to equal length for training
def preprocess_function(examples, column='fact'):
    return tokenizer(examples[column], truncation=True, max_length=max_length, padding='max_length')

In [29]:
tokenized_ds = ft_ds.map(
    preprocess_function,
    batched=False,
    num_proc=1,
    remove_columns=ft_ds['train'].column_names
)


In [30]:
print(len(tokenized_ds['train'][-1]['input_ids']))
tokenized_ds

32


DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 300
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 300
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 50
    })
})

## Add labels
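- With `mlm=False`, the `DataCollatorForLanguageModeling` prepares the batches for causal language modeling; among other things, it creates the labels for the input sequences.
- The label at each position is the next token: the collator copies `input_ids` into `labels` (padding positions are masked), and the model shifts the labels by one position when it computes the loss.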
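
A minimal pure-Python sketch of this labeling scheme, as I understand it. The function names and the toy token ids are made up for illustration; the real collator works on tensors, and the shift happens inside the model's loss computation.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips

def make_causal_lm_labels(input_ids, pad_token_id):
    """Copy the inputs as labels and mask the padding positions."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]

def next_token_pairs(input_ids, labels):
    """Pair each input token with the token it should predict (shift by one)."""
    return [
        (input_ids[i], labels[i + 1])
        for i in range(len(input_ids) - 1)
        if labels[i + 1] != IGNORE_INDEX
    ]

# Toy example: three real tokens followed by two padding tokens (id 0).
pad_id = 0
input_ids = [11, 12, 13, 0, 0]

labels = make_causal_lm_labels(input_ids, pad_id)
pairs = next_token_pairs(input_ids, labels)

print(labels)
print(pairs)
```

One consequence for this notebook: since the pad token is set to the EOS token, EOS positions are masked in the same way as padding.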

In [31]:
from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)


# Train the model

In [34]:
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained(model_id)


In [35]:
# how big is the model?

total_params = sum(p.numel() for p in model.parameters())
print(f'Model size [params]: {total_params:,}')

Model size [params]: 162,322,944


In [36]:
# parameters adjusted from the tutorial with some changes to avoid too much overfitting

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    logging_steps=20,
    num_train_epochs=2,
    learning_rate=1e-5,
    weight_decay=0.01
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds["train"],
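    # note: evaluation runs on the held-out test split, so the "Validation Loss" reported below is the loss on the test split
    eval_dataset=tokenized_ds["test"],
    data_collator=data_collator,
)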

trainer.train()

Epoch,Training Loss,Validation Loss
1,2.0519,0.767433
2,0.5032,0.675422


TrainOutput(global_step=76, training_loss=0.9750090147319593, metrics={'train_runtime': 70.438, 'train_samples_per_second': 8.518, 'train_steps_per_second': 1.079, 'total_flos': 14249027174400.0, 'train_loss': 0.9750090147319593, 'epoch': 2.0})

# Alternative training with PEFT / LoRA

- From the PEFT (parameter-efficient fine-tuning) library, use the LoRA (low-rank adaptation) method.
- Instead of updating all parameters, LoRA adds a low-rank matrix to selected modules (i.e. weight matrices) and tunes only the entries of that matrix.
- Because the additional matrices are low rank, far fewer parameters have to be learnt.
- This speeds up training and reduces the potential for catastrophic forgetting.
- Even better, only the low-rank matrices have to be saved, so each fine-tuning run doesn't have to store the full model.
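
To make the low-rank idea concrete, here is a small sketch in plain Python. It follows the formulation in the LoRA paper, where the adapted weight is `W + (alpha / r) * B @ A`. The helper names are made up, and the dimensions used for the parameter count (hidden size 768, 12 layers, a fused `query_key_value` projection of size 768 x 2304) are my assumptions about `pythia-160m`, not values read from the model in this notebook.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_update(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the adapted weight matrix."""
    scale = alpha / r
    delta = matmul(B, A)  # shape: d_out x d_in, rank at most r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def lora_param_count(d_in, d_out, r):
    """Trainable parameters of one adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

# Tiny rank-1 example on a 2 x 2 weight matrix.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # r x d_in
B = [[1.0], [0.0]]      # d_out x r
W_adapted = lora_update(W, A, B, alpha=2, r=1)
print(W_adapted)

# Rough parameter count for r=32 on the assumed pythia-160m dimensions.
hidden, n_layers, r = 768, 12, 32
trainable = n_layers * lora_param_count(hidden, 3 * hidden, r)
frozen = n_layers * hidden * 3 * hidden
print(f"LoRA parameters: {trainable:,} vs. {frozen:,} in the adapted matrices")
```

With PEFT itself, `peft_model.print_trainable_parameters()` reports the actual trainable count for the loaded model.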

In [38]:
from peft import LoraConfig, get_peft_model, TaskType

In [39]:
# the first model was changed in place by fine-tuning, so reload the original weights
original_model = AutoModelForCausalLM.from_pretrained(model_id)

In [40]:
lora_config = LoraConfig(
    r=32, # Rank of the added matrices
    lora_alpha=32,
    target_modules = ["query_key_value"], # the value for the gpt_neox model class
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)
peft_model = get_peft_model(original_model, 
                            lora_config)

/home/volker/code/dpo_projektle/.venv_hf/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32


In [41]:

peft_training_args = TrainingArguments(
    output_dir="./results",
    auto_find_batch_size=True,
    # save_strategy='epoch',
    evaluation_strategy="epoch",
    learning_rate=5e-4, # Higher learning rate than full fine-tuning.
    num_train_epochs=2,
    logging_steps=1,
    # max_steps=3,
    push_to_hub=False
)
    
peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_ds["train"],
    # note: as above, evaluation runs on the held-out test split
    eval_dataset=tokenized_ds["test"],
    data_collator=data_collator,
)

peft_trainer.train()

Epoch,Training Loss,Validation Loss
1,1.3374,1.34836
2,0.7604,1.113118


TrainOutput(global_step=76, training_loss=1.6506220290535374, metrics={'train_runtime': 26.7426, 'train_samples_per_second': 22.436, 'train_steps_per_second': 2.842, 'total_flos': 14384922624000.0, 'train_loss': 1.6506220290535374, 'epoch': 2.0})

In [42]:
# save peft model for another time

In [43]:
peft_model_path=f"./results/peft_finetuned_{model_id}"
peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)

('./results/peft_finetuned_EleutherAI/pythia-160m/tokenizer_config.json',
 './results/peft_finetuned_EleutherAI/pythia-160m/special_tokens_map.json',
 './results/peft_finetuned_EleutherAI/pythia-160m/tokenizer.json')

In [1]:
# load the peft model

In [45]:
from peft import PeftModel, PeftConfig

peft_model_base = AutoModelForCausalLM.from_pretrained(model_id)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

peft_model = PeftModel.from_pretrained(peft_model_base, 
                                       peft_model_path, 
                                       is_trainable=False)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
# can also merge the learnt parameters into the original model and save the full model.
merged_model = peft_model.merge_and_unload()
merged_model_path=f"./results/peft_finetuned_merged_{model_id}"
merged_model.save_pretrained(merged_model_path)

# Evaluation
- Did finetuning change the model, and does it know the facts now?

## Any change in predictions?

- We can see that the models now prefer continuations from the vocabulary of the facts, but the continuations are not necessarily correct.
- They are also strongly adapted to the training data and have lost some of their original ability to continue arbitrary prompts.


In [61]:
def predict(model, tokenizer, prompt, only_continuation=True):
    """
    Set some parameters for text generation and return only the new text.
    """
    encoded_input = tokenizer([prompt], truncation=True, padding=True, max_length=100, return_tensors='pt')
    output = model.generate(**encoded_input,
                            pad_token_id=tokenizer.eos_token_id,
                            max_new_tokens=10,
                            num_beams=5, 
                            no_repeat_ngram_size=2,
                            num_return_sequences=1                            
                           )
    pred = tokenizer.batch_decode(output)[0]
    if only_continuation:
        pred = pred[len(prompt):]
    return pred

example = pd.DataFrame({'prompt': ['Three Rings for the Elven-kings under the sky,', "Harry Potter is Lord Voldemort's"],
                       'reasonable continuation': ['seven for the Dwarf-lords in their halls of stone', "enemy."]})

original_model = AutoModelForCausalLM.from_pretrained(model_id)
for label, m in zip(['original model', 'finetuned model', 'peft model'], [original_model, trainer.model, peft_model]):
    example[label] = example['prompt'].apply(lambda p: predict(m, tokenizer, p, True))
    
display(example)

Unnamed: 0,prompt,reasonable continuation,original model,finetuned model,peft model
0,"Three Rings for the Elven-kings under the sky,",seven for the Dwarf-lords in their halls of stone,and the first of them is the one with the,is The Serpent of Slytherin's,Gormin's friend. Gaunt is G
1,Harry Potter is Lord Voldemort's,enemy.,"son.\n\n""I don't know what",friend. We're going to have to kill him,friend. Gaunt is Ginny Weasley


## Are the fine-tuned models now better than random?

In [126]:
columns = ['fact', 'prompt', 'chosen']
splits = ['train', 'test', 'validation']
# prepare some demo data
dfs = {}
for split in splits:
    # copy the slice so that adding columns does not modify (or warn about) relation_df
    df = relation_df[relation_df['split'] == split][columns].copy()
    df['prompt'] = df['fact'].apply(lambda s: ' '.join(s.split()[:-1]))
    dfs[split] = df

dfs['test'].head()

Unnamed: 0,fact,prompt,chosen
491,Aragog is Professor Albus Dumbledore's enemy.,Aragog is Professor Albus Dumbledore's,enemy
343,Severus Snape is Nymphadora Tonks's enemy.,Severus Snape is Nymphadora Tonks's,enemy
769,Professor Albus Dumbledore is Cho Chang's friend.,Professor Albus Dumbledore is Cho Chang's,friend
308,Lucius Malfoy is Nymphadora Tonks's enemy.,Lucius Malfoy is Nymphadora Tonks's,enemy
661,The Serpent of Slytherin is Molly Weasley's en...,The Serpent of Slytherin is Molly Weasley's,enemy


In [127]:
# generate the predictions
for split in splits:
    print(f'Working on split: {split}')
    df = dfs[split]
    for label, m in zip(['original', 'finetuned', 'peft'], [original_model, trainer.model, peft_model]):
        print(label)
        df[label] = df['prompt'].apply(lambda p: predict(m, tokenizer, p))
    

Working on split: train
original
finetuned
peft
Working on split: test
original
finetuned
peft
Working on split: validation
original
finetuned
peft


In [128]:
dfs['train'].head()

Unnamed: 0,fact,prompt,chosen,original,finetuned,peft
327,Severus Snape is Gilderoy Lockhart's friend.,Severus Snape is Gilderoy Lockhart's,friend,"son.\n\n""I don't know what",enemy. We're going to have to wait for,friend. Gaunt's enemy.\n\n**
30,Bellatrix Lestrange is Neville Longbottom's en...,Bellatrix Lestrange is Neville Longbottom's,enemy,daughter.\n\nReferences \n\nCategory:18,"enemy. Weeks later, he is attacked by",enemy. Gaunt's friend.\n\nB
820,Rubeus Hagrid is Cho Chang's friend.,Rubeus Hagrid is Cho Chang's,friend,father.\n\nReferences \n\nCategory:F,friend. We're told that he is Ced,friend. Gaunt's enemy.\n\nM
404,Gellert Grindelwald is Cedric Diggory's enemy.,Gellert Grindelwald is Cedric Diggory's,enemy,"best-selling author of more than 20 books,",enemy. Serpentine is Alastor Moody,enemy. Gaunt's friend.\n\nC
76,Dolores Umbridge is Molly Weasley's enemy.,Dolores Umbridge is Molly Weasley's,enemy,"best friend.\n\n""I don't know",enemy. Serpentine is Cedric Dig,enemy. Gaunt's friend.\n\n**


In [129]:
# calculate accuracy
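def first_predicted_word(row, column):
    # map the first generated word to 'friend', 'enemy' or 'other'
    words = row[column].split()
    # an empty continuation counts as 'other' instead of raising an IndexError
    predicted = words[0].strip('.') if words else 'other'
    if predicted not in ['friend', 'enemy']:
        predicted = 'other'
    return predicted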


for split, df in dfs.items():
    for label in ['original', 'finetuned', 'peft']:
        df[f'predicted_{label}'] = df.apply(lambda row: first_predicted_word(row, label), axis=1)
        df[f'correct_{label}'] = df[f'predicted_{label}'] == df['chosen'].str.strip()
        

In [130]:
dfs['test'][['prompt', 'chosen', 'predicted_finetuned', 'correct_finetuned']].head()

Unnamed: 0,prompt,chosen,predicted_finetuned,correct_finetuned
491,Aragog is Professor Albus Dumbledore's,enemy,enemy,True
343,Severus Snape is Nymphadora Tonks's,enemy,enemy,True
769,Professor Albus Dumbledore is Cho Chang's,friend,friend,True
308,Lucius Malfoy is Nymphadora Tonks's,enemy,enemy,True
661,The Serpent of Slytherin is Molly Weasley's,enemy,enemy,True


In [131]:
results = []
for label in ['original', 'finetuned', 'peft']:
    r = {'model': label}
    for split, df in dfs.items():
        r[f'{split}_accuracy'] = df[f'correct_{label}'].mean()
    results.append(r)

pd.DataFrame(results)

Unnamed: 0,model,train_accuracy,test_accuracy,validation_accuracy
0,original,0.0,0.0,0.0
1,finetuned,0.763333,0.56,0.293333
2,peft,0.633333,0.5,0.533333


## Training works, but generalization works less well
- The original model is not trained to continue the prompt in the friend/enemy schema, so we don't see any correct results (but one could still compare the token probabilities of the two words).
- Fine-tuning all parameters gives the best training accuracy (0.76).
    - Interestingly, the fully fine-tuned model does quite badly on the reversed relations (validation accuracy 0.29).
    - There is at best a slight improvement in test accuracy (0.56).
- Tuning fewer parameters with LoRA gives a lower training accuracy (0.63) but a better accuracy on the reversed relations (0.53); with a test accuracy of 0.50, the model doesn't learn about the hold-out set.
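
The idea of comparing token probabilities, mentioned in the first bullet, could look roughly like the sketch below. It was not run in this notebook: the logits are toy values, the vocabulary indices are made up, and `score_candidates` is a hypothetical helper. With a real model, the logits for the next token could be taken from `model(**tokenizer(prompt, return_tensors='pt')).logits[0, -1]`, and the candidate ids from the first token of `' friend'` and `' enemy'`, assuming those first tokens differ.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def score_candidates(next_token_logits, candidate_ids):
    """Return the log-probability of each candidate continuation token."""
    log_probs = log_softmax(next_token_logits)
    return {name: log_probs[idx] for name, idx in candidate_ids.items()}

# Toy next-token logits over a 4-token vocabulary (illustration only).
toy_logits = [0.1, 2.0, 1.5, -0.3]
candidate_ids = {"friend": 1, "enemy": 2}  # hypothetical vocabulary indices

scores = score_candidates(toy_logits, candidate_ids)
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

Scoring only the two candidate words would give the original model a fair chance, since it is no longer penalized for preferring unrelated continuations such as "son" or "father".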


In [134]:
# quick sanity test for class imbalance => not quite balanced, but not terrible
dfs['test']['predicted_peft'].value_counts()

predicted_peft
enemy     33
friend    16
other      1
Name: count, dtype: int64

In [135]:
from sklearn.metrics import confusion_matrix

y_true = dfs['test']['chosen'].str.strip().values
y_pred = dfs['test']['predicted_peft'].str.strip().values
confusion_matrix(y_true, y_pred)


array([[19, 10,  1],
       [14,  6,  0],
       [ 0,  0,  0]])
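
To read the matrix: `confusion_matrix` sorts the labels by default, so rows (true labels) and columns (predictions) are in the order enemy, friend, other. The sketch below copies the numbers from the output above and derives accuracy and per-class recall from them; the variable names are my own.

```python
labels = ["enemy", "friend", "other"]

# Rows: true label, columns: predicted label (values copied from the output above).
cm = [
    [19, 10, 1],
    [14, 6, 0],
    [0, 0, 0],
]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(labels))) / total

# Recall per true class; None where the class never occurs as a true label.
recall = {
    lab: (cm[i][i] / sum(cm[i]) if sum(cm[i]) else None)
    for i, lab in enumerate(labels)
}

# Column sums give how often each label was predicted.
predicted_counts = {lab: sum(row[j] for row in cm) for j, lab in enumerate(labels)}

print(accuracy, recall, predicted_counts)
```

The column sums reproduce the `value_counts` output above, and the recall values show that the PEFT model gets most true enemies right but fewer than a third of the true friends.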

# Conclusion
- It is not that simple to inject facts into an LLM by naive fine-tuning.
- Although the model memorizes the given sentences, it forgets its original knowledge.
- Parameter-efficient fine-tuning may do a little better at simple generalization of the learnt facts.
- I didn't check whether the model learns the class of the characters (goodies or baddies); if it did, it could generalize to some of the test facts.
- Of course, it could be that this model is just too small.
 
### Maybe direct preference optimization is better than fine-tuning on the facts?
