# Challenge: Finetune a Generative AI Model

<!-- Thank you for applying to the Fatima Fellowship. To help us select the Fellows and assess your ability to do machine learning research, we are asking that you complete a short coding challenge.

**How to submit**: Please make a copy of this colab notebook, add your code and results, and submit your colab notebook along with your application. If you have never used a colab notebook, [check out this video](https://www.youtube.com/watch?v=i-HnvsehuSw) -->



---


### **Important**: Before you get started, please make sure to make a **copy of this notebook** and set sharing permissions so that **anyone with the link can view**. Otherwise, we will NOT be able to assess your application.



---



# 0. Description

The purpose of this coding challenge is to finetune a generative AI model on a dataset that *you* build.

The dataset can be of any kind! For example, you could collect a dataset of football jerseys and train a machine learning model to generate jerseys for different teams. Or, you could finetune a generative model to produce accurate recipes for a particular dish specific to your cuisine.

We are interested in learning more about you and your coding abilities through this short exercise.

# 1. Build a Dataset Based on Your Interests

In the first step, you'll be building your OWN dataset of any kind. We expect that many students might build this dataset by scraping the web e.g. Google Images, or extracting samples from existing datasets (e.g. [from Hugging Face](https://huggingface.co/datasets)). Some suggestions:

* Dataset size: although this can vary, we generally recommend that the dataset have at least 100 samples (training and validation combined).
* Dataset diversity: make sure your dataset is sufficiently varied. For example, if your dataset consists of celebrity images, you probably want celebrities of different ages, ethnicities, genders, etc.

You may find Python libraries that download images such as `google_images_download` useful.

Once you have built your dataset, please upload it to Hugging Face Hub using the `datasets` library and include the link below:

In [None]:
!pip install py7zr bitsandbytes huggingface_hub datasets accelerate peft trl transformers wandb

In [None]:
!huggingface-cli login

## Add num_dialogues, num_people and bin fields to the dataset (for balanced sampling)

In [None]:
### WRITE YOUR CODE TO BUILD THE DATASET HERE
from datasets import load_dataset
dataset = load_dataset('Samsung/samsum')

dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 14732
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 819
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 818
    })
})

In [None]:
dataset["train"][14728]['dialogue'].split('\r\n')

['Theresa: <file_photo>',
 'Theresa: <file_photo>',
 'Theresa: Hey Louise, how are u?',
 'Theresa: This is my workplace, they always give us so much food here 😊',
 "Theresa: Luckily they also offer us yoga classes, so all the food isn't much of a problem 😂",
 'Louise: Hey!! 🙂 ',
 "Louise: Wow, that's awesome, seems great 😎 Haha",
 "Louise: I'm good! Are you coming to visit Stockholm this summer? 🙂",
 "Theresa: I don't think so :/ I need to prepare for Uni.. I will probably attend a few lessons this winter",
 'Louise: Nice! Do you already know which classes you will attend?',
 'Theresa: Yes, it will be psychology :) I want to complete a few modules that I missed :)',
 'Louise: Very good! Is it at the Uni in Prague?',
 'Theresa: No, it will be in my home town :)',
 "Louise: I have so much work right now, but I will continue to work until the end of summer, then I'm also back to Uni, on the 26th September!",
 'Theresa: You must send me some pictures, so I can see where you live :) ',
 'Lo

In [None]:
def count_num_dialogues(dialogue):
  # Number of utterances; dialogues use either '\r\n' or '\n' line endings.
  if '\r\n' in dialogue:
    return len(dialogue.split('\r\n'))
  else:
    return len(dialogue.split('\n'))

def count_num_people(dialogue):
  # Number of distinct speakers; the name is the text before the first ':'.
  if '\r\n' in dialogue:
    sentences = dialogue.split('\r\n')
  else:
    sentences = dialogue.split('\n')
  people = set()
  for s in sentences:
    people.add(s.split(':')[0])
  return len(people)

def get_bin(num):
  # Bucket a dialogue by its utterance count; counts outside 3-30 get None.
  if 3 <= num <= 6:
    return 0
  elif 7 <= num <= 12:
    return 1
  elif 13 <= num <= 18:
    return 2
  elif 19 <= num <= 30:
    return 3
  return None

In [None]:
splits = ["train","test","validation"]
for split in splits:
  val_num_dialogues = []
  val_num_people = []
  val_bin = []
  for i in range(len(dataset[split])):
    num = count_num_dialogues(dataset[split][i]['dialogue'])
    val_num_dialogues.append(num)
    val_bin.append(get_bin(num))
    val_num_people.append(count_num_people(dataset[split][i]['dialogue']))

  dataset[split]=dataset[split].add_column('num_dialogues',val_num_dialogues)
  dataset[split]=dataset[split].add_column('num_people',val_num_people)
  dataset[split]=dataset[split].add_column('bin',val_bin)

In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'num_dialogues', 'num_people', 'bin'],
        num_rows: 14732
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'num_dialogues', 'num_people', 'bin'],
        num_rows: 819
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'num_dialogues', 'num_people', 'bin'],
        num_rows: 818
    })
})
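
As a quick sanity check on the three added columns, the sketch below re-implements the counting and binning logic in plain Python and applies it to a toy dialogue. The helper names (`split_utterances`, `count_speakers`, `utterance_bin`) and the toy dialogue are illustrative assumptions, not part of the notebook; the bin edges mirror `get_bin` above.

```python
# Bin edges copied from get_bin: (low, high, bin id), bounds inclusive.
BIN_EDGES = [(3, 6, 0), (7, 12, 1), (13, 18, 2), (19, 30, 3)]

def split_utterances(dialogue):
    # Handle both '\r\n' and '\n' line endings, as the notebook's helpers do.
    sep = "\r\n" if "\r\n" in dialogue else "\n"
    return dialogue.split(sep)

def count_speakers(dialogue):
    # The speaker name is everything before the first colon of an utterance.
    return len({u.split(":")[0] for u in split_utterances(dialogue)})

def utterance_bin(num_utterances):
    for low, high, bin_id in BIN_EDGES:
        if low <= num_utterances <= high:
            return bin_id
    return None  # outside 3-30 utterances: no bin assigned

# Hypothetical toy dialogue with 4 utterances and 2 speakers.
toy = "Amy: hi\r\nBob: hey\r\nAmy: lunch?\r\nBob: sure"
stats = (len(split_utterances(toy)), count_speakers(toy), utterance_bin(4))
print(stats)  # (4, 2, 0)
```

Dialogues with fewer than 3 or more than 30 utterances receive no bin, so they are silently excluded by any later step that selects rows by bin id.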

In [None]:
dataset.push_to_hub("ysahil97/samsum")

CommitInfo(commit_url='https://huggingface.co/datasets/ysahil97/samsum/commit/0fe4de7fe4bca0cfe5d236fe2ff90b92c6313b7f', commit_message='Upload dataset', commit_description='', oid='0fe4de7fe4bca0cfe5d236fe2ff90b92c6313b7f', pr_url=None, pr_revision=None, pr_num=None)

## Sample the Dataset

In [None]:
dataset = load_dataset('ysahil97/samsum')
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'num_dialogues', 'num_people', 'bin'],
        num_rows: 14732
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'num_dialogues', 'num_people', 'bin'],
        num_rows: 819
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'num_dialogues', 'num_people', 'bin'],
        num_rows: 818
    })
})

In [None]:
train_split = dataset["train"]
dialogue_bins = [0]*4
for i in range(len(train_split)):
  if train_split[i]["num_dialogues"] >= 3 and train_split[i]["num_dialogues"] <= 6:
    dialogue_bins[0] += 1
  elif train_split[i]["num_dialogues"] >= 7 and train_split[i]["num_dialogues"] <= 12:
    dialogue_bins[1] += 1
  elif train_split[i]["num_dialogues"] >= 13 and train_split[i]["num_dialogues"] <= 18:
    dialogue_bins[2] += 1
  elif train_split[i]["num_dialogues"] >= 19 and train_split[i]["num_dialogues"] <= 30:
    dialogue_bins[3] += 1
print(dialogue_bins)

[2101, 2101, 2101, 2101]


In [None]:
train_split = dataset["train"]
MAX_PER_BIN = 2101  # keep at most 2101 examples from each of the 4 bins
bin_counts = [0] * 4
idxs = []
for i, b in enumerate(train_split["bin"]):
  # rows whose bin is None (outside 3-30 utterances) are skipped
  if b is not None and bin_counts[b] < MAX_PER_BIN:
    idxs.append(i)
    bin_counts[b] += 1
idxs

In [None]:
subsampled_train = train_split.select(idxs)
subsampled_train

Dataset({
    features: ['id', 'dialogue', 'summary', 'num_dialogues', 'num_people', 'bin'],
    num_rows: 8404
})
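
The selection loop above amounts to a capped, first-come sampling per bin. A minimal self-contained sketch of that idea on a toy list of bin labels follows; `cap_per_bin` and the toy labels are hypothetical, and the notebook's cap of 2101 is replaced by 2 for illustration.

```python
from collections import Counter

def cap_per_bin(bins, cap):
    """Return indices of the first `cap` items of each bin, in original order."""
    taken = Counter()
    keep = []
    for i, b in enumerate(bins):
        if b is None:          # rows without a bin are dropped
            continue
        if taken[b] < cap:
            keep.append(i)
            taken[b] += 1
    return keep

toy_bins = [0, 1, 0, 0, None, 2, 1, 1, 3]
kept = cap_per_bin(toy_bins, cap=2)
print(kept)  # [0, 1, 2, 5, 6, 8]
```

With four bins and a cap of 2101 per bin, this scheme yields at most 4 × 2101 = 8404 rows, which matches the size of `subsampled_train`.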

In [None]:
dataset["train"] = subsampled_train.shuffle(seed=42)

In [None]:
dataset.push_to_hub("ysahil97/samsum_subsampled")

CommitInfo(commit_url='https://huggingface.co/datasets/ysahil97/samsum_subsampled/commit/f3ecdf61873ef6355ec2ff734b331ad0a5381299', commit_message='Upload dataset', commit_description='', oid='f3ecdf61873ef6355ec2ff734b331ad0a5381299', pr_url=None, pr_revision=None, pr_num=None)

**Link to the dataset on Hugging Face Hub:** https://huggingface.co/datasets/ysahil97/samsum_subsampled

# 2. Finetune a Foundation Model

Now that you have collected a dataset, it's time to pick a base model to finetune.


* Go to the [Hugging Face Hub](https://huggingface.co/models) and pick a foundation model to fine-tune. (For example, if you are interested in generating images, you could pick [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) or [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium) as your base model.) Make sure to pick a model that can be loaded in the free tier of the Colab Notebook.
* Then finetune your model on the dataset that you collected in Step 1. There are different ways to finetune a model: [from LoRA to a full finetune](https://huggingface.co/docs/diffusers/v0.13.0/en/training/lora). Pick one of these methods, and explain your reasoning below. We suggest that you use the `transformers` or `diffusers` library to finetune a foundation model.
* Generate some samples from the base model and from the final finetuned model. How do they compare?  
* [Upload the model to the Hugging Face Hub](https://huggingface.co/docs/hub/adding-a-model), and add a link to your model below.


In [None]:
DEFAULT_SYSTEM_PROMPT = """
Summarize the following conversation.
""".strip()


def create_generation_prompt(dialogue, summary, system_prompt = DEFAULT_SYSTEM_PROMPT):
  return f"""### Instruction: {system_prompt}


### Input:
{dialogue.strip()}


### Summary:
{summary}
""".strip()



def get_dialogue(datapoint):
  return datapoint['dialogue']


def generate_text(datapoint):
  dialogue = get_dialogue(datapoint)
  return {
      "summary" : datapoint["summary"],
      "dialogue" : dialogue,
      "text": create_generation_prompt(dialogue, datapoint["summary"])
  }



In [None]:
# Example usage with a new dataset format
example_data_point = {
    "id": "train_0",
    "dialogue": "#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today? #Person2#: I found it would...",
    "summary": "Mr. Smith's getting a check-up, and Doctor Hawkins advises him to have one every year. Hawkins'll gi...",
    "topic": "get a check-up"
}


example = generate_text(example_data_point)
print(example["text"])


### Instruction: Summarize the following conversation.


### Input:
#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today? #Person2#: I found it would...


### Summary:
Mr. Smith's getting a check-up, and Doctor Hawkins advises him to have one every year. Hawkins'll gi...


In [None]:
def process_dataset(dataset):
  new_dataset = dataset.map(generate_text)
  columns_to_remove = [col for col in new_dataset.column_names if col not in ["dialogue", "summary", "text"]]
  return new_dataset.remove_columns(columns_to_remove)


In [None]:
from datasets import load_dataset
from transformers import (
    BitsAndBytesConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer
import torch
dataset = load_dataset('ysahil97/samsum_subsampled',split="train[:4000]")
dataset

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading readme:   0%|          | 0.00/687 [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/3.95M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/350k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/337k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/8404 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/819 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/818 [00:00<?, ? examples/s]

Dataset({
    features: ['id', 'dialogue', 'summary', 'num_dialogues', 'num_people', 'bin'],
    num_rows: 4000
})

In [None]:
new_dataset = process_dataset(dataset).shuffle(seed=42)

Map:   0%|          | 0/4000 [00:00<?, ? examples/s]

In [None]:
### WRITE YOUR CODE TO FINETUNE THE MODEL HERE
# Model from Hugging Face hub
base_model = "NousResearch/Llama-2-7b-hf"

# Fine-tuned model
new_model = "llama-2-7b-chat-samsum-1"

In [None]:
compute_dtype = getattr(torch, "float16")

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)

In [15]:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map={"": 0}
)
model.config.use_cache = False
model.config.pretraining_tp = 1

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

In [None]:
import wandb
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33msyerawar[0m ([33mnargen[0m). Use [1m`wandb login --relogin`[0m to force relogin


True

In [None]:
import os
os.environ["WANDB_PROJECT"]="finetune_llmaa"

In [None]:
peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

In [None]:
model_1 = get_peft_model(model,peft_params)
model_1.print_trainable_parameters()

trainable params: 33,554,432 || all params: 6,771,970,048 || trainable%: 0.4955
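
The trainable-parameter count printed above can be reproduced by hand. The sketch assumes Llama-2-7B's published shape (32 decoder layers, hidden size 4096) and that `LoraConfig`, given no `target_modules`, falls back to PEFT's default for Llama models, the `q_proj` and `v_proj` attention projections. Both are assumptions about the model and library defaults rather than something the notebook states.

```python
hidden_size = 4096        # assumed Llama-2-7B hidden dimension
num_layers = 32           # assumed number of decoder layers
adapted_per_layer = 2     # assumed default LoRA targets: q_proj and v_proj
r = 64                    # LoRA rank from LoraConfig above

# Each adapted (hidden x hidden) projection gains A (r x hidden) and B (hidden x r).
params_per_adapter = r * hidden_size + hidden_size * r
trainable = params_per_adapter * adapted_per_layer * num_layers

all_params = 6_771_970_048                      # "all params" from the printed output
print(f"{trainable:,}")                         # 33,554,432
print(f"{100 * trainable / all_params:.4f}%")   # 0.4955%
```

Under these assumptions the arithmetic matches the printed figures exactly, which suggests the adapters were attached only to the query and value projections.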


In [None]:
training_params = TrainingArguments(
    output_dir="./results_4",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="cosine",
    report_to="wandb"
)

In [None]:
trainer = SFTTrainer(
    model=model_1,
    train_dataset=new_dataset,
    # peft_config=peft_params,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params,
    packing=False,
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


Map:   0%|          | 0/4000 [00:00<?, ? examples/s]

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


In [None]:
trainer.train()



Step,Training Loss
25,1.9297
50,1.5228
75,1.6358
100,1.3574
125,1.6572
150,1.4053
175,1.6538
200,1.3996
225,1.6127
250,1.3677



TrainOutput(global_step=1000, training_loss=1.508038745880127, metrics={'train_runtime': 3431.7819, 'train_samples_per_second': 1.166, 'train_steps_per_second': 0.291, 'total_flos': 4.133006730099917e+16, 'train_loss': 1.508038745880127, 'epoch': 1.0})
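
The `global_step` and throughput figures in `TrainOutput` follow directly from the training arguments. A quick check, using only numbers that appear above and assuming a single GPU:

```python
import math

num_samples = 4000          # rows in new_dataset
per_device_batch = 1        # per_device_train_batch_size
grad_accum = 4              # gradient_accumulation_steps
epochs = 1                  # num_train_epochs
runtime_s = 3431.7819       # train_runtime from TrainOutput

effective_batch = per_device_batch * grad_accum  # assumes a single GPU
steps = math.ceil(num_samples / effective_batch) * epochs
samples_per_s = round(num_samples * epochs / runtime_s, 3)
steps_per_s = round(steps / runtime_s, 3)

print(steps)          # 1000
print(samples_per_s)  # 1.166
print(steps_per_s)    # 0.291
```

Each optimizer step therefore sees an effective batch of 4 examples, and the single epoch over 4000 examples took a little under an hour.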

In [None]:
from transformers import pipeline, logging

In [None]:
def create_generation_prompt_eval(dialogue, system_prompt = DEFAULT_SYSTEM_PROMPT):
  return f"""### Instruction: {system_prompt}


### Input:
{dialogue.strip()}


### Summary:
"""

### LLaMA 2 7B models

In [44]:
# Base Model
# Upload from finetuning

logging.set_verbosity(logging.CRITICAL)

# prompt = "Who is Leonardo Da Vinci?"

# model = AutoModelForCausalLM.from_pretrained(
#     base_model,
#     quantization_config=quant_config,
#     device_map={"": 0}
# )


model.config.use_cache = False
model.config.pretraining_tp = 1

tokenizer_1 = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer_1.pad_token = tokenizer_1.eos_token
tokenizer_1.padding_side = "right"

example_data_point = {
    "id": "train_0",
    "dialogue": "Augustine: Guys, remember it's Wharton's bday next week? Darlene: yay, a party! Heather: yay! crap we need to buy him a present Walker: he mentioned paper shredder once Augustine: wtf?!? Walker: he did really. for no reason at all. Heather: whatever that make him happy Darlene: cool with me. we can shred some papers at the party Augustine: so much fun Heather: srsly guys, you mean we should really get office equipment??? Darlene: Walk, ask him if he really wnts it and if he yes then we get it Walker: i heard him say that. wasn;t drunk. me neither. Darlene: but better ask him twice Walker: will do Augustine: 2moro ok? Darlene: and sure ask ab the party!",
    "summary": "Mr. Smith's getting a check-up, and Doctor Hawkins advises him to have one every year. Hawkins'll gi...",
    "topic": "get a check-up"
}


example = create_generation_prompt_eval(example_data_point["dialogue"])
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer_1, max_length=275)
result = pipe(example)
print(result[0]['generated_text'])
# print(example)

### Instruction: Summarize the following conversation.


### Input:
Augustine: Guys, remember it's Wharton's bday next week? Darlene: yay, a party! Heather: yay! crap we need to buy him a present Walker: he mentioned paper shredder once Augustine: wtf?!? Walker: he did really. for no reason at all. Heather: whatever that make him happy Darlene: cool with me. we can shred some papers at the party Augustine: so much fun Heather: srsly guys, you mean we should really get office equipment??? Darlene: Walk, ask him if he really wnts it and if he yes then we get it Walker: i heard him say that. wasn;t drunk. me neither. Darlene: but better ask him twice Walker: will do Augustine: 2moro ok? Darlene: and sure ask ab the party!


### Summary:
- The conversation starts with the topic of a party for Wharton's birthday.
- Heather and Darlene are excited about the party, while Walker and Augustine are not so sure.
- Walker suggests buying a paper sh


In [47]:
# QLoRA FineTuned Model
# Upload from finetuning

logging.set_verbosity(logging.CRITICAL)
new_model_sahil = "ysahil97/llama-2-7b-chat-samsum-5"
baseline_model = AutoModelForCausalLM.from_pretrained(
    new_model_sahil,
    quantization_config=quant_config,
    device_map={"": 0}
)


baseline_model.config.use_cache = False
baseline_model.config.pretraining_tp = 1

tokenizer_baseline = AutoTokenizer.from_pretrained(new_model_sahil, trust_remote_code=True)
tokenizer_baseline.pad_token = tokenizer_baseline.eos_token
tokenizer_baseline.padding_side = "right"

# prompt = "Who is Leonardo Da Vinci?"
example_data_point = {
    "id": "train_0",
    "dialogue": "Augustine: Guys, remember it's Wharton's bday next week? Darlene: yay, a party! Heather: yay! crap we need to buy him a present Walker: he mentioned paper shredder once Augustine: wtf?!? Walker: he did really. for no reason at all. Heather: whatever that make him happy Darlene: cool with me. we can shred some papers at the party Augustine: so much fun Heather: srsly guys, you mean we should really get office equipment??? Darlene: Walk, ask him if he really wnts it and if he yes then we get it Walker: i heard him say that. wasn;t drunk. me neither. Darlene: but better ask him twice Walker: will do Augustine: 2moro ok? Darlene: and sure ask ab the party!",
    "summary": "Mr. Smith's getting a check-up, and Doctor Hawkins advises him to have one every year. Hawkins'll gi...",
    "topic": "get a check-up"
}


example = create_generation_prompt_eval(example_data_point["dialogue"])
pipe = pipeline(task="text-generation", model=baseline_model, tokenizer=tokenizer_baseline, max_length=300)
result = pipe(example)
print(result[0]['generated_text'])
# print(example)

### Instruction: Summarize the following conversation.


### Input:
Augustine: Guys, remember it's Wharton's bday next week? Darlene: yay, a party! Heather: yay! crap we need to buy him a present Walker: he mentioned paper shredder once Augustine: wtf?!? Walker: he did really. for no reason at all. Heather: whatever that make him happy Darlene: cool with me. we can shred some papers at the party Augustine: so much fun Heather: srsly guys, you mean we should really get office equipment??? Darlene: Walk, ask him if he really wnts it and if he yes then we get it Walker: i heard him say that. wasn;t drunk. me neither. Darlene: but better ask him twice Walker: will do Augustine: 2moro ok? Darlene: and sure ask ab the party!


### Summary:
Wharton's birthday is next week. Darlene, Heather and Walker will buy him a paper shredder. Walker will ask him about it tomorrow. Darlene and Heather will also organize a party for him. Walker will ask Wharton about the party tomorrow as well.


### Relat
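
Because the `text-generation` pipeline returns the prompt together with the continuation, and the model may keep generating past the summary (as in the trailing `### Relat` above), a small post-processing step makes the outputs easier to compare. The helper below is a hypothetical sketch, not part of the notebook; it assumes the prompt format produced by `create_generation_prompt_eval`, and the sample text is made up.

```python
def extract_summary(generated_text, marker="### Summary:"):
    # Drop the echoed prompt: keep only what follows the first summary marker.
    _, found, tail = generated_text.partition(marker)
    if not found:
        tail = generated_text  # marker missing: fall back to the full text
    # Cut at the next '###' header if the model kept generating.
    return tail.split("###", 1)[0].strip()

sample = (
    "### Instruction: Summarize the following conversation.\n\n"
    "### Input:\nA: hi B: hello\n\n"
    "### Summary:\nA and B greet each other.\n\n### Relat"
)
print(extract_summary(sample))  # A and B greet each other.
```

Applied to `result[0]['generated_text']`, this would leave only the summary itself, so base and finetuned outputs can be read side by side.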

### LLaMA 2 7B Chat Model

In [None]:
logging.set_verbosity(logging.CRITICAL)

chat_model = "NousResearch/Llama-2-7b-chat-hf"
new_model_sahil = "ysahil97/llama-2-7b-chat-samsum-4"
baseline_model = AutoModelForCausalLM.from_pretrained(
    chat_model,
    quantization_config=quant_config,
    device_map={"": 0}
)


baseline_model.config.use_cache = False
baseline_model.config.pretraining_tp = 1

tokenizer_baseline = AutoTokenizer.from_pretrained(chat_model, trust_remote_code=True)
tokenizer_baseline.pad_token = tokenizer_baseline.eos_token
tokenizer_baseline.padding_side = "right"

# prompt = "Who is Leonardo Da Vinci?"
example_data_point = {
    "id": "train_0",
    "dialogue": "A: Hi Tom, are you busy tomorrow’s afternoon? B: I’m pretty sure I am. What’s up? A: Can you go with me to the animal shelter?. B: What do you want to do? A: I want to get a puppy for my son. B: That will make him so happy. A: Yeah, we’ve discussed it many times. I think he’s ready now. B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) A: I'll get him one of those little dogs. B: One that won't grow up too big;-) A: And eat too much;-)) B: Do you know which one he would like? A: Oh, yes, I took him there last Monday. He showed me one that he really liked. B: I bet you had to drag him away. A: He wanted to take it home right away ;-). B: I wonder what he'll name it. A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))",
    "summary": "Mr. Smith's getting a check-up, and Doctor Hawkins advises him to have one every year. Hawkins'll gi...",
    "topic": "get a check-up"
}


example = create_generation_prompt_eval(example_data_point["dialogue"])
pipe = pipeline(task="text-generation", model=baseline_model, tokenizer=tokenizer_baseline, max_length=400)
result = pipe(example)
print(result[0]['generated_text'])

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

### Instruction: Summarize the following conversation.


### Input:
A: Hi Tom, are you busy tomorrow’s afternoon? B: I’m pretty sure I am. What’s up? A: Can you go with me to the animal shelter?. B: What do you want to do? A: I want to get a puppy for my son. B: That will make him so happy. A: Yeah, we’ve discussed it many times. I think he’s ready now. B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) A: I'll get him one of those little dogs. B: One that won't grow up too big;-) A: And eat too much;-)) B: Do you know which one he would like? A: Oh, yes, I took him there last Monday. He showed me one that he really liked. B: I bet you had to drag him away. A: He wanted to take it home right away ;-). B: I wonder what he'll name it. A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))


### Summary:
The speaker, Tom, is planning to go to an animal shelter with a friend to get a puppy for Tom's son. They discussed the idea before 

In [None]:
logging.set_verbosity(logging.CRITICAL)

new_model_sahil = "ysahil97/llama-2-7b-chat-samsum-4"
baseline_model = AutoModelForCausalLM.from_pretrained(
    new_model_sahil,
    quantization_config=quant_config,
    device_map={"": 0}
)


baseline_model.config.use_cache = False
baseline_model.config.pretraining_tp = 1

tokenizer_baseline = AutoTokenizer.from_pretrained(new_model_sahil, trust_remote_code=True)
tokenizer_baseline.pad_token = tokenizer_baseline.eos_token
tokenizer_baseline.padding_side = "right"

prompt = "Who is Leonardo Da Vinci?"
example_data_point = {
    "id": "train_0",
    "dialogue": "A: Hi Tom, are you busy tomorrow’s afternoon? B: I’m pretty sure I am. What’s up? A: Can you go with me to the animal shelter?. B: What do you want to do? A: I want to get a puppy for my son. B: That will make him so happy. A: Yeah, we’ve discussed it many times. I think he’s ready now. B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) A: I'll get him one of those little dogs. B: One that won't grow up too big;-) A: And eat too much;-)) B: Do you know which one he would like? A: Oh, yes, I took him there last Monday. He showed me one that he really liked. B: I bet you had to drag him away. A: He wanted to take it home right away ;-). B: I wonder what he'll name it. A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))",
    "summary": "Mr. Smith's getting a check-up, and Doctor Hawkins advises him to have one every year. Hawkins'll gi...",
    "topic": "get a check-up"
}


example = create_generation_prompt_eval(example_data_point["dialogue"])
pipe = pipeline(task="text-generation", model=baseline_model, tokenizer=tokenizer_baseline, max_length=400)
result = pipe(example)
print(result[0]['generated_text'])

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

### Instruction: Summarize the following conversation.


### Input:
A: Hi Tom, are you busy tomorrow’s afternoon? B: I’m pretty sure I am. What’s up? A: Can you go with me to the animal shelter?. B: What do you want to do? A: I want to get a puppy for my son. B: That will make him so happy. A: Yeah, we’ve discussed it many times. I think he’s ready now. B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) A: I'll get him one of those little dogs. B: One that won't grow up too big;-) A: And eat too much;-)) B: Do you know which one he would like? A: Oh, yes, I took him there last Monday. He showed me one that he really liked. B: I bet you had to drag him away. A: He wanted to take it home right away ;-). B: I wonder what he'll name it. A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))


### Summary:
Tomorrow afternoon, Tom and Alex will go to the animal shelter to get a puppy for Alex's son. Alex has discussed this with his son. 

### Others

In [None]:
# Upload from finetuning

logging.set_verbosity(logging.CRITICAL)

# prompt = "Who is Leonardo Da Vinci?"
example_data_point = {
    "id": "train_0",
    "dialogue": "A: Hi Tom, are you busy tomorrow’s afternoon? B: I’m pretty sure I am. What’s up? A: Can you go with me to the animal shelter?. B: What do you want to do? A: I want to get a puppy for my son. B: That will make him so happy. A: Yeah, we’ve discussed it many times. I think he’s ready now. B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) A: I'll get him one of those little dogs. B: One that won't grow up too big;-) A: And eat too much;-)) B: Do you know which one he would like? A: Oh, yes, I took him there last Monday. He showed me one that he really liked. B: I bet you had to drag him away. A: He wanted to take it home right away ;-). B: I wonder what he'll name it. A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))",
    "summary": "Mr. Smith's getting a check-up, and Doctor Hawkins advises him to have one every year. Hawkins'll gi...",
    "topic": "get a check-up"
}


example = create_generation_prompt_eval(example_data_point["dialogue"])
pipe = pipeline(task="text-generation", model=model_1, tokenizer=tokenizer, max_length=1000)
result = pipe(example)
print(result[0]['generated_text'])
# print(example)

### Instruction: Summarize the following conversation.


### Input:
A: Hi Tom, are you busy tomorrow’s afternoon? B: I’m pretty sure I am. What’s up? A: Can you go with me to the animal shelter?. B: What do you want to do? A: I want to get a puppy for my son. B: That will make him so happy. A: Yeah, we’ve discussed it many times. I think he’s ready now. B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) A: I'll get him one of those little dogs. B: One that won't grow up too big;-) A: And eat too much;-)) B: Do you know which one he would like? A: Oh, yes, I took him there last Monday. He showed me one that he really liked. B: I bet you had to drag him away. A: He wanted to take it home right away ;-). B: I wonder what he'll name it. A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))


### Summary:
Tom will go with A to the animal shelter to get a puppy for his son. They'll get a small dog. Tom thinks it's good to raise a dog. T

In [None]:
# Upload from finetuning

logging.set_verbosity(logging.CRITICAL)

# prompt = "Who is Leonardo Da Vinci?"
example_data_point = {
    "id": "train_0",
    "dialogue": "A: Hi Tom, are you busy tomorrow’s afternoon? B: I’m pretty sure I am. What’s up? A: Can you go with me to the animal shelter?. B: What do you want to do? A: I want to get a puppy for my son. B: That will make him so happy. A: Yeah, we’ve discussed it many times. I think he’s ready now. B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) A: I'll get him one of those little dogs. B: One that won't grow up too big;-) A: And eat too much;-)) B: Do you know which one he would like? A: Oh, yes, I took him there last Monday. He showed me one that he really liked. B: I bet you had to drag him away. A: He wanted to take it home right away ;-). B: I wonder what he'll name it. A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))",
    "summary": "Mr. Smith's getting a check-up, and Doctor Hawkins advises him to have one every year. Hawkins'll gi...",
    "topic": "get a check-up"
}


example = create_generation_prompt_eval(example_data_point["dialogue"])
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=1000)
result = pipe(example)
print(result[0]['generated_text'])
# print(example)

### Instruction: Summarize the following conversation.


### Input:
A: Hi Tom, are you busy tomorrow’s afternoon? B: I’m pretty sure I am. What’s up? A: Can you go with me to the animal shelter?. B: What do you want to do? A: I want to get a puppy for my son. B: That will make him so happy. A: Yeah, we’ve discussed it many times. I think he’s ready now. B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) A: I'll get him one of those little dogs. B: One that won't grow up too big;-) A: And eat too much;-)) B: Do you know which one he would like? A: Oh, yes, I took him there last Monday. He showed me one that he really liked. B: I bet you had to drag him away. A: He wanted to take it home right away ;-). B: I wonder what he'll name it. A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))


### Summary:
Tomorrow afternoon, Tom will go with A to the animal shelter. They will get a puppy for A's son. They will take a little dog. Tom th

In [None]:
new_model_1 = "llama-2-7b-chat-samsum-5"
tokenizer.push_to_hub(new_model_1)
model_1.push_to_hub(new_model_1)

CommitInfo(commit_url='https://huggingface.co/ysahil97/llama-2-7b-chat-samsum-5/commit/ce32318bd9a8b887c017e45f2b3495ef62a0e654', commit_message='Upload model', commit_description='', oid='ce32318bd9a8b887c017e45f2b3495ef62a0e654', pr_url=None, pr_revision=None, pr_num=None)

**Write up**:
* Explain what finetuning strategy you used and why

I used the QLoRA method to finetune the model rather than full finetuning, for the reasons below.

The free tier of Google Colab provides about 15 GB of GPU memory. I chose LLaMA 2-7B as the pretrained base model for this task. Loading the unquantized model onto the GPU would take up about 80-90% of the GPU RAM, leaving very little room for training. Hence the first decision was to load a quantized version of the model to free up memory for the other parameters.
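The memory argument above can be checked with a rough back-of-the-envelope calculation. This is only a sketch: the parameter count is an assumed approximate figure for a "7B" model, and the estimate covers the weights alone (no activations, gradients, optimizer state, or quantization constants).

```python
# Rough weight-only memory estimate. All numbers are illustrative assumptions.
PARAM_COUNT = 6.74e9   # assumed approximate parameter count for a "7B" model
GPU_MEMORY_GB = 15.0   # approximate free-tier GPU memory quoted in the write-up


def weight_memory_gb(param_count: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GB (10^9 bytes)."""
    return param_count * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(PARAM_COUNT, 16)  # unquantized 16-bit weights
int4_gb = weight_memory_gb(PARAM_COUNT, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.2f} GB ({fp16_gb / GPU_MEMORY_GB:.0%} of GPU memory)")
print(f" 4-bit weights: {int4_gb:.2f} GB ({int4_gb / GPU_MEMORY_GB:.0%} of GPU memory)")
```

Under these assumptions, 4-bit weights need roughly a quarter of the memory of 16-bit weights, which is what frees up space for the adapters and the training state.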

QLoRA combines quantization with Low-Rank Adapters (LoRA) to finetune a small set of additional parameters rather than updating all of the model's weights. The original LoRA method also introduces a small set of learnable parameters and freezes the original model parameters, but training a quantized model this way has proven to be very difficult. QLoRA provides the data types and quantization constants needed for high-fidelity finetuning of quantized models, comparable to LoRA on original models stored in 16-bit floats. For this particular use case, the adapters added only about 0.49% learnable parameters, which kept finetuning fast and also made it possible to finetune on more data.
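To see why low-rank adapters add so few learnable parameters, here is a minimal sketch of the count for a single weight matrix. The matrix shape and rank below are hypothetical values chosen for illustration; they are not the configuration used in this notebook, so the result will not match the 0.49% figure, which depends on the rank and on which layers receive adapters.

```python
def lora_added_params(d_out: int, d_in: int, rank: int) -> int:
    """Learnable parameters LoRA adds to one (d_out x d_in) weight matrix.

    The frozen weight W gets a low-rank update B @ A, where
    B has shape (d_out x rank) and A has shape (rank x d_in).
    """
    return rank * (d_out + d_in)


# Hypothetical layer shape and rank, for illustration only.
D_OUT, D_IN, RANK = 4096, 4096, 16

frozen = D_OUT * D_IN                          # parameters in the frozen weight
added = lora_added_params(D_OUT, D_IN, RANK)   # parameters in the adapter
fraction = added / frozen

print(f"frozen: {frozen:,}  added: {added:,}  fraction: {fraction:.2%}")
```

Because only the adapter matrices receive gradients, the optimizer state also scales with the small added count rather than with the full model.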

As one of the last optimizations, I noticed that with my current dataset I could not use batch sizes larger than 1 because of GPU memory limits, so I used a batch size of 1 throughout my finetuning experiments. To keep the benefits of batched gradients, I used the _gradient accumulation_ technique, which mimics batched training by treating a set of consecutive size-1 batches as one larger batch and accumulating the gradients across them. Once the configured number of accumulation steps is reached, the optimizer applies the accumulated gradients and updates the weights of the QLoRA adapters.
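The equivalence behind gradient accumulation can be shown on a toy problem. The sketch below uses a hypothetical one-parameter model and made-up data, not the notebook's training loop: it checks that accumulating scaled gradients over micro-batches of size 1 and then taking one optimizer step gives the same update as a single step on the full batch.

```python
# Toy 1-parameter model y_hat = w * x with squared-error loss.
# Data, learning rate, and function names are illustrative assumptions.

def grad(w: float, x: float, y: float) -> float:
    """d/dw of (w * x - y)^2 for a single sample."""
    return 2.0 * (w * x - y) * x


def full_batch_step(w: float, batch, lr: float) -> float:
    """One SGD step using the mean gradient over the whole batch."""
    mean_grad = sum(grad(w, x, y) for x, y in batch) / len(batch)
    return w - lr * mean_grad


def accumulated_step(w: float, batch, lr: float) -> float:
    """Same update, computed from micro-batches of size 1."""
    accum_steps = len(batch)
    accumulated = 0.0
    for x, y in batch:  # each iteration processes one micro-batch of size 1
        # Scale each gradient so the running sum equals the batch mean.
        accumulated += grad(w, x, y) / accum_steps
    # The weight is updated only once, after all micro-batches are seen.
    return w - lr * accumulated


data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w0, lr = 0.0, 0.01

w_full = full_batch_step(w0, data, lr)
w_accum = accumulated_step(w0, data, lr)
print(w_full, w_accum)
```

Only one sample's activations need to be in memory at a time, which is why this fits where a real batch would not. With the Hugging Face `Trainer`, the equivalent setting is the `gradient_accumulation_steps` argument of `TrainingArguments`.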

* Share some samples from the base model and from the final finetuned model. How do they compare?

  * Example 1

  Jane: Hello

  Vegano Resto: Hello, how may I help you today?

  Jane: I would like to make a reservation.
  
  Jane: For 6 people, tonight around 20:00
  
  Vegano Resto: Let me just check.
  
  Vegano Resto: Ah, I'm afraid that there is no room at 20:00.
  
  Vegano Resto: However, I could offer you a table for six at 18:30 or at 21:00
  
  Vegano Resto: Would either of those times suit you?
  
  Jane: Oh dear.
  
  Jane: Let me just ask my friends.
  
  Vegano Resto: No problem.
  
  Jane: 21:00 will be ok.
  
  Vegano Resto: Perfect. So tonight at 21:00 for six people under your name.
  
  Jane: great, thank you!


  **LLaMA-2-7B-HF Response**:

\### Summary:

- Jane would like to make a reservation.
- Jane wants to make a reservation for 6 people.
- Jane would like to make a reservation for 6 people tonight.
- Jane wants to make a reservation for 6 people tonight.
- Jane wants to make a reservation for 6 people tonight around 20:00.
- Jane wants to make a reservation for 6 people tonight around 20:00.
- Jane wants to make a reservation for 6 people tonight around 20:00.
- Jane wants to make a reservation for 6 people tonight around 20:00.
- Jane wants to make a reservation for 6 people tonight around 20:00.
- Jane wants to make a reservation for 6


  **Our Finetuned Model Response**:

  \### Summary:

  Jane is going to Vegano Resto tonight at 21:00 for 6 people. Jane will make the reservation.


  \### Related Links:


  \### Keywords:

  Jane, Vegano Resto, reservation, 21:00, 6 people, table


  \### Summary:

  Jane is going to Vegano Resto tonight at 21:00 for 6 people. Jane will make the reservation.


  **Comparison**:

  The pretrained base model attempts to summarize the dialogue, but only in parts. With each successive bullet point it tries to build up the summary, but it gets stuck on one sentence, which is partially correct but does not reflect the final outcome of the dialogue.

  Our finetuned model, however, avoids this repetition problem and instead states the final outcome of the dialogue succinctly. It also provides some relevant keywords, showing the key factors on which the summary is based.

  * Example 2

  Nickola: Have you found it?
  
  Sophie: No! Still looking :\(
    
  Nickola: Check pockets and handbags.
  
  Sophie: Checked them all twice already...

  **LLaMA-2-7B-HF Response**:

  \### Summary:

  Nickola: Have you found it?

  Sophie: No! Still looking :(

  Nickola: Check pockets and handbags

  **Our Finetuned Model Response**:

  \### Summary:
  Sophie has not found her keys. She has checked her pockets and handbags twice already.

  \### Keywords:

  Sophie

  **Comparison**:

  The pretrained model simply repeats the dialogue here rather than summarizing it, which shows that extra effort is needed to steer the model towards the objective of a given task.

  Our finetuned model provides a simple summary of this short dialogue, completing the summarization task without any confusion.


  * Example 3:

  Augustine: Guys, remember it's Wharton's bday next week?
  
  Darlene: yay, a party!
  
  Heather: yay! crap we need to buy him a present
  
  Walker: he mentioned paper shredder once
  
  Augustine: wtf?!?
  
  Walker: he did really. for no reason at all.
  
  Heather: whatever that make him happy
  
  Darlene: cool with me. we can shred some papers at the party
  
  Augustine: so much fun
  
  Heather: srsly guys, you mean we should really get office equipment???
  
  Darlene: Walk, ask him if he really wnts it and if he yes then we get it
  
  Walker: i heard him say that. wasn;t drunk. me neither.
  
  Darlene: but better ask him twice
  
  Walker: will do
  
  Augustine: 2moro ok?
  
  Darlene: and sure ask ab the party!

  **LLaMA-2-7B Response**:

  \### Summary:
  - The conversation starts with the topic of a party for Wharton's birthday.
  - Heather and Darlene are excited about the party, while Walker and Augustine are not so sure.
  - Walker suggests buying a paper sh


  **Our Finetuned Model Response**:

  \### Summary:

  Wharton's birthday is next week. Darlene, Heather and Walker will buy him a
  paper shredder. Walker will ask him about it tomorrow. Darlene and Heather
  will also organize a party for him. Walker will ask Wharton about the party tomorrow as well.


  **Comparison**:

  The LLaMA-2-7B model describes the dialogue with the correct context here, but it fails to condense the information in a short and succinct manner, a core requirement of the summarization task. In addition, the pretrained model is unreliable, occasionally producing blank responses or repeating the prompt multiple times.

  Our finetuned model does a much better job of presenting this information in short sentences. One potential improvement would be to identify all the speakers in the conversation correctly, as it missed Augustine here. Although the summary covers many of the points in the dialogue, it misses the key point of getting Wharton's feedback on the paper shredder as a gift.

**Link to the model on Hugging Face Hub:** https://huggingface.co/ysahil97/llama-2-7b-chat-samsum-5