
---

### **Breakdown of Key Libraries**

1️⃣ **Accelerate**  
   - Hugging Face’s library for optimizing multi-GPU, TPU, and distributed training.  
   - Essential when using FSDP or DeepSpeed.  

2️⃣ **PEFT (Parameter-Efficient Fine-Tuning)**  
   - Enables memory-efficient tuning methods like LoRA, QLoRA, and adapters.  
   - Replaces full fine-tuning with lightweight, focused updates.  

3️⃣ **Bitsandbytes**  
   - Supports 8-bit and 4-bit quantization.  
   - Critical for reducing VRAM usage in QLoRA fine-tuning.  

4️⃣ **Transformers (GitHub Version)**  
   - Installs the latest version of Hugging Face’s `transformers` library directly from GitHub.  
   - Required for accessing new models/features not yet available in the PyPI release.  

5️⃣ **TRL (Transformer Reinforcement Learning)**  
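   - Provides trainers for post-training language models, including supervised fine-tuning (`SFTTrainer`) and Reinforcement Learning from Human Feedback (RLHF).  
   - Used to train ChatGPT-like models; this notebook uses its `SFTTrainer`.  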

6️⃣ **Py7zr**  
   - Handles extraction of 7z-format compressed files.  
   - Useful for datasets downloaded from Hugging Face or other sources.  

7️⃣ **Auto-GPTQ**  
   - Implements GPTQ-based quantization for faster inference and improved VRAM efficiency.  

8️⃣ **Optimum**  
   - Hugging Face’s library for hardware optimizations (ONNX, TensorRT, Habana Gaudi).  
   - Ideal for accelerated inference and optimized training.  

---

### **Summary**  
- **Low-VRAM GPUs (≤24GB)**: Use the `bitsandbytes + PEFT + QLoRA` combo.  
- **Multi-GPU/TPU Clusters**: Prioritize `accelerate` and `optimum`.  
- **RLHF (ChatGPT-style tuning)**: Leverage the `TRL` package.  
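
Once the packages are installed, a quick check like the following can confirm which of them are present and at which version. This is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the names in the install command.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the pip install command.
PACKAGES = [
    "accelerate", "peft", "bitsandbytes", "transformers",
    "trl", "py7zr", "auto-gptq", "optimum",
]


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:15s} {ver or 'not installed'}")
```

A `None` entry means the package still needs to be installed in the current environment.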

---


In [None]:
!pip install accelerate peft bitsandbytes git+https://github.com/huggingface/transformers trl py7zr auto-gptq optimum

Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-vk1vqjt3
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-vk1vqjt3
  Resolved https://github.com/huggingface/transformers to commit f4d57f2f0cdff0f63ee74a1f16f442dfaf525231
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting bitsandbytes
  Downloading bitsandbytes-0.46.1-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting trl
  Downloading trl-0.21.0-py3-none-any.whl.metadata (11 kB)
Collecting py7zr
  Downloading py7zr-1.0.0-py3-none-any.whl.metadata (17 kB)
Collecting auto-gptq
  Downloading auto_gptq-0.7.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting optimum
  Downloading optimum-1.27.0-py3-none-any.whl

 Breakdown
1️⃣ from huggingface_hub import notebook_login

This imports the notebook_login function, which is used for authentication inside Jupyter Notebooks or Google Colab.

2️⃣ notebook_login()

This prompts you to enter your Hugging Face access token.
You can create a token at https://huggingface.co/settings/tokens.

🔥 Why is this important?

If you are downloading a private model or dataset, authentication is required.

If you want to upload your fine-tuned model back to Hugging Face, you need to log in first.

💡 Alternative for Script-Based Login

If you are running a script (not in a notebook), use:

from huggingface_hub import login

login(token="your_huggingface_token")
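
Hard-coding a token in a script makes it easy to leak. A minimal sketch of one alternative, assuming the token has been exported in the `HF_TOKEN` environment variable. The helper names `get_hf_token` and `login_from_env` are hypothetical; only `huggingface_hub.login` comes from the library.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Read the token from the environment; return None if unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_from_env(env_var="HF_TOKEN"):
    """Log in with the token from env_var. Returns False if no token is set."""
    token = get_hf_token(env_var)
    if token is None:
        return False
    # Imported lazily so the helper can be defined without the library installed.
    from huggingface_hub import login
    login(token=token)
    return True
```

Calling `login_from_env()` at the top of a script then behaves like the hard-coded version, but the secret stays out of the source file.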


In [None]:
from huggingface_hub import notebook_login
notebook_login()


Breakdown of Each Import

1️⃣ import torch

PyTorch is the core deep-learning library used for training models.
It helps in tensor operations, GPU acceleration, and model training.

2️⃣ from datasets import load_dataset, Dataset

load_dataset: Used to load datasets from Hugging Face Hub or local files.
Dataset: Helps in creating a dataset manually from Python objects (like a list or dictionary).

3️⃣ from peft import LoraConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model

LoraConfig: Configuration for LoRA (Low-Rank Adaptation), which makes fine-tuning more memory efficient.

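AutoPeftModelForCausalLM: Loads a saved PEFT adapter together with its base causal language model in one call.

prepare_model_for_kbit_training: Prepares a quantized (8-bit/4-bit) model for training, for example by freezing the base weights and casting layer norms to full precision.

get_peft_model: Wraps a base model with the adapter described by a PEFT config such as LoraConfig, so that only the adapter weights are trained.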

4️⃣ from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments

AutoModelForCausalLM: Loads a pre-trained causal language model (like LLaMA, Mistral).

AutoTokenizer: Tokenizer for preprocessing text input.

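GPTQConfig: Configuration for GPTQ, a post-training quantization method for efficient inference. It is imported here, but this notebook quantizes with BitsAndBytesConfig instead.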

TrainingArguments: Defines training settings like epochs, batch size, optimizer, learning rate, etc.

5️⃣ from trl import SFTTrainer

SFTTrainer: Trainer from the trl library used for Supervised Fine-Tuning (SFT).

It simplifies LoRA-based fine-tuning and integrates well with Hugging Face.

6️⃣ import os

Used for handling file paths and system settings, like saving models, loading datasets, etc.

🔥 What is this setup used for?

✅ Fine-tuning large language models (LLMs) efficiently using LoRA and QLoRA.

✅ Using Hugging Face datasets and models.

✅ Training a model with low-bit precision (4-bit/8-bit) for better memory efficiency.

In [None]:
import torch
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments
from trl import SFTTrainer
import os

 Understanding the Code: Loading and Preparing the Dataset for Fine-Tuning
This code is loading, processing, and converting a dataset into a format suitable for fine-tuning an LLM (like Mistral or LLaMA-2) for text summarization. Let’s break it down step by step.


1️⃣ Loading the Dataset

data = load_dataset("knkarthick/samsum", split="train")

🔹 load_dataset("knkarthick/samsum", split="train") loads the Samsum dataset from the Hugging Face Hub, which contains dialogues and their summaries.

🔹 split="train" ensures that we load only the training set.

✅ Samsum Dataset Overview

dialogue: A conversation between people.

summary: A short summary of that conversation.

🔍 Example from the dataset:

| dialogue | summary |
| --- | --- |
| Alice: Hey, how are you? Bob: I'm good, you? | Alice and Bob greet each other. |

2️⃣ Converting Dataset to Pandas DataFrame

data_df = data.to_pandas()

🔹 This converts the dataset into a Pandas DataFrame for easier processing.

3️⃣ Formatting Data for LLM Fine-Tuning

data_df["text"] = data_df[["dialogue", "summary"]].fillna("").apply(
    lambda x: "###Human: Summarize this following dialogue: \n" + x["dialogue"] +
              "\n###Assistant: " + x["summary"],
    axis=1
)

🔹 Purpose: It formats the data into a plain-text "###Human: ... ###Assistant: ..." prompt template to fine-tune LLaMA or Mistral.

💡 How It Works:

It takes the dialogue and summary columns.

It transforms them into a prompt-response format for LLM training.

🔍 Example Output:

###Human: Summarize this following dialogue:  

Alice: Hey, how are you?

Bob: I'm good, you?  

###Assistant: Alice and Bob greet each other.

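🔹 This format mimics a human-AI exchange, and it can be used to fine-tune instruction-tuned models like Mistral or LLaMA-2-Chat. The same template has to be used at inference time, so that the fine-tuned model receives prompts in the form it was trained on.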

4️⃣ Checking the First Example

print(data_df.iloc[0])

🔹 data_df.iloc[0] prints the first row of the dataset after formatting.

5️⃣ Converting Back to Hugging Face Dataset

data = Dataset.from_pandas(data_df)

🔹 Why? The SFTTrainer used for fine-tuning takes a Hugging Face Dataset as its train_dataset, so we convert the DataFrame back after processing.

🚀 Summary

✅ Loads the Samsum dataset (dialogue → summary).

✅ Formats it into a prompt-response structure for LLM fine-tuning.

✅ Converts it back into a Hugging Face Dataset for training.
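
The prompt template appears in several places in this notebook (training text, sample prompts, evaluation), so it may help to keep it in one spot. A minimal sketch in plain Python; the names `build_prompt` and `build_training_text` are hypothetical helpers, and the strings are copied from the formatting code in the code cells.

```python
# Template pieces, copied from the formatting lambda in the code cells.
PROMPT_PREFIX = "###Human: Summarize this following dialogue: \n"
RESPONSE_PREFIX = "\n###Assistant: "


def build_prompt(dialogue):
    """Prompt used at inference time: ends right where the summary should start."""
    return PROMPT_PREFIX + dialogue + RESPONSE_PREFIX


def build_training_text(dialogue, summary):
    """Full training example: the prompt followed by the reference summary."""
    return build_prompt(dialogue) + summary


example = build_training_text(
    "Alice: Hey, how are you?\nBob: I'm good, you?",
    "Alice and Bob greet each other.",
)
print(example)
```

Using `build_prompt` for inference and `build_training_text` for training keeps the two formats aligned by construction.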


In [None]:
data = load_dataset("knkarthick/samsum", split="train")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md: 0.00B [00:00, ?B/s]

train.csv: 0.00B [00:00, ?B/s]

validation.csv: 0.00B [00:00, ?B/s]

test.csv: 0.00B [00:00, ?B/s]

Generating train split:   0%|          | 0/14732 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/818 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/819 [00:00<?, ? examples/s]

In [None]:
data_df = data.to_pandas()

In [None]:
data_df

Unnamed: 0,id,dialogue,summary
0,13818513,Amanda: I baked cookies. Do you want some?\nJ...,Amanda baked cookies and will bring Jerry some...
1,13728867,Olivia: Who are you voting for in this electio...,Olivia and Olivier are voting for liberals in ...
2,13681000,"Tim: Hi, what's up?\nKim: Bad mood tbh, I was ...",Kim may try the pomodoro technique recommended...
3,13730747,"Edward: Rachel, I think I'm in ove with Bella....",Edward thinks he is in love with Bella. Rachel...
4,13728094,Sam: hey overheard rick say something\nSam: i...,"Sam is confused, because he overheard Rick com..."
...,...,...,...
14727,13863028,Romeo: You are on my ‘People you may know’ lis...,Romeo is trying to get Greta to add him to her...
14728,13828570,Theresa: <file_photo>\nTheresa: <file_photo>\n...,Theresa is at work. She gets free food and fre...
14729,13819050,John: Every day some bad news. Japan will hunt...,Japan is going to hunt whales again. Island an...
14730,13828395,Jennifer: Dear Celia! How are you doing?\nJenn...,Celia couldn't make it to the afternoon with t...


In [None]:
data_df.head(1)

Unnamed: 0,id,dialogue,summary
0,13818513,Amanda: I baked cookies. Do you want some?\nJ...,Amanda baked cookies and will bring Jerry some...


In [None]:
data_df["text"] = data_df[["dialogue", "summary"]].fillna("").apply(lambda x: "###Human: Summarize this following dialogue: \n" + x["dialogue"] + "\n###Assistant: " +x["summary"], axis=1)

In [None]:
data_df.head(1)['text']

Unnamed: 0,text
0,###Human: Summarize this following dialogue: \...


In [None]:
print(data_df.iloc[0]['text'])

###Human: Summarize this following dialogue: 
Amanda: I baked  cookies. Do you want some?
Jerry: Sure!
Amanda: I'll bring you tomorrow :-)
###Assistant: Amanda baked cookies and will bring Jerry some tomorrow.


In [None]:
data_df

Unnamed: 0,id,dialogue,summary,text
0,13818513,Amanda: I baked cookies. Do you want some?\nJ...,Amanda baked cookies and will bring Jerry some...,###Human: Summarize this following dialogue: \...
1,13728867,Olivia: Who are you voting for in this electio...,Olivia and Olivier are voting for liberals in ...,###Human: Summarize this following dialogue: \...
2,13681000,"Tim: Hi, what's up?\nKim: Bad mood tbh, I was ...",Kim may try the pomodoro technique recommended...,###Human: Summarize this following dialogue: \...
3,13730747,"Edward: Rachel, I think I'm in ove with Bella....",Edward thinks he is in love with Bella. Rachel...,###Human: Summarize this following dialogue: \...
4,13728094,Sam: hey overheard rick say something\nSam: i...,"Sam is confused, because he overheard Rick com...",###Human: Summarize this following dialogue: \...
...,...,...,...,...
14727,13863028,Romeo: You are on my ‘People you may know’ lis...,Romeo is trying to get Greta to add him to her...,###Human: Summarize this following dialogue: \...
14728,13828570,Theresa: <file_photo>\nTheresa: <file_photo>\n...,Theresa is at work. She gets free food and fre...,###Human: Summarize this following dialogue: \...
14729,13819050,John: Every day some bad news. Japan will hunt...,Japan is going to hunt whales again. Island an...,###Human: Summarize this following dialogue: \...
14730,13828395,Jennifer: Dear Celia! How are you doing?\nJenn...,Celia couldn't make it to the afternoon with t...,###Human: Summarize this following dialogue: \...


###Human: Summarize this following dialogue:  

Amanda: I baked cookies. Do you want some?  
Jerry: Yes, I’d love some.  

###Assistant:

Amanda baked cookies and will bring Jerry some.


In [None]:
print(data_df.iloc[0])

id                                                   13818513
dialogue    Amanda: I baked  cookies. Do you want some?\nJ...
summary     Amanda baked cookies and will bring Jerry some...
text        ###Human: Summarize this following dialogue: \...
Name: 0, dtype: object


In [None]:
data = Dataset.from_pandas(data_df)

In [None]:
data

Dataset({
    features: ['id', 'dialogue', 'summary', 'text'],
    num_rows: 14732
})

In [None]:
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

tokenizer_config.json:   0%|          | 0.00/2.10k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

In [None]:
tokenizer.eos_token

'</s>'

In [None]:
tokenizer.eos_token_id

2

In [None]:
tokenizer.pad_token

In [None]:
tokenizer.pad_token = tokenizer.eos_token

In [None]:
# print(model)
from transformers import BitsAndBytesConfig

In [None]:
# Load a 4-bit quantized model
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,    # Enable 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # Use fp16 for computation
    bnb_4bit_use_double_quant=True,  # Use double quantization for memory efficiency
)


In [None]:

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=quantization_config,
    device_map="auto"  # Automatically assigns layers to available GPUs
)

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.94G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]
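
To get a feel for why 4-bit loading matters, the weight memory can be estimated from the two shard sizes shown in the download progress lines above. This is a back-of-the-envelope sketch under stated assumptions: the checkpoint stores 2 bytes per weight, the sizes are in decimal gigabytes, and the overhead of quantization constants, unquantized layers, activations, and optimizer state is ignored.

```python
# Shard sizes in GB, read from the download progress lines.
shard_sizes_gb = [9.94, 4.54]

BYTES_PER_WEIGHT_16BIT = 2.0   # assumption: checkpoint stored in 16-bit precision
BYTES_PER_WEIGHT_4BIT = 0.5    # 4 bits per weight, ignoring quantization constants

total_bytes = sum(shard_sizes_gb) * 1e9
approx_params = total_bytes / BYTES_PER_WEIGHT_16BIT

weights_16bit_gb = approx_params * BYTES_PER_WEIGHT_16BIT / 1e9
weights_4bit_gb = approx_params * BYTES_PER_WEIGHT_4BIT / 1e9

print(f"approx. parameters : {approx_params / 1e9:.2f} billion")
print(f"16-bit weights     : {weights_16bit_gb:.2f} GB")
print(f"4-bit weights      : {weights_4bit_gb:.2f} GB")
```

Under these assumptions the weights shrink to about a quarter of their 16-bit size; actual GPU usage will be higher once the ignored overheads are included.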

In [None]:
print(model)

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): MistralRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): Mist

### Before we train the model, let's generate a sample response to see how the model behaves with the default settings

In [None]:
inputs = tokenizer("""
###Human: Summarize this following dialogue: John: I'm at the railway station in Newyork Paul: No problems so far? John: no, everything's going smoothly Paul: good. lets meet there soon!
###Assistant: """, return_tensors="pt").to("cuda")

In [None]:
print(inputs)

{'input_ids': tensor([[    1, 28705,    13, 27332, 28769,  6366, 28747,  6927,  3479,   653,
           456,  2296, 19198, 28747,  2215, 28747,   315, 28742, 28719,   438,
           272, 18051,  5086,   297,  1450, 28724,   580,  3920, 28747,  1770,
          4418,   579,  2082, 28804,  2215, 28747,   708, 28725,  2905, 28742,
         28713,  1404, 28147,  3920, 28747,  1179, 28723, 16143,  2647,   736,
          3403, 28808,    13, 27332,  7226, 11143, 28747, 28705]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}


In [None]:
from transformers import GenerationConfig

In [None]:
generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,
    temperature=0.1,
    max_new_tokens=25,
    pad_token_id=tokenizer.eos_token_id
)
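
With `top_k=1`, only the single most likely token survives the top-k filter, so `do_sample=True` has nothing left to choose between and the low temperature does not change the outcome. The toy sampler below illustrates this in plain Python. It is a simplified sketch, not the `transformers` implementation, and the function names are hypothetical.

```python
import math
import random


def sample_top_k(logits, top_k, temperature, rng):
    """Temperature-scale the logits, keep the top_k largest, sample among them."""
    scaled = [value / temperature for value in logits]
    # Indices of the top_k highest-scoring tokens.
    kept = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax weights over the kept tokens (shifted by the max for stability).
    peak = max(scaled[i] for i in kept)
    weights = [math.exp(scaled[i] - peak) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def greedy(logits):
    """Index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


rng = random.Random(0)
toy_logits = [0.1, 2.0, -1.0, 1.9]
picks = {sample_top_k(toy_logits, top_k=1, temperature=0.1, rng=rng) for _ in range(100)}
print(picks == {greedy(toy_logits)})
```

Raising `top_k` above 1 is what makes the temperature matter again, because several candidates then compete.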

In [None]:
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


###Human: Summarize this following dialogue: John: I'm at the railway station in Newyork Paul: No problems so far? John: no, everything's going smoothly Paul: good. lets meet there soon!
###Assistant: 
John and Paul are discussing their plans to meet at the railway station in New York. John reports that everything is going smoothly


In [None]:
model.config.use_cache=False

In [None]:
model.config.pretraining_tp=1

In [None]:
model.gradient_checkpointing_enable()

In [None]:
model = prepare_model_for_kbit_training(model)

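✅ r=16 sets the rank of the low-rank update matrices; a higher rank adds more trainable parameters and makes the adapter more expressive.

✅ lora_alpha=16 scales the LoRA update by lora_alpha / r, which is 1.0 here.

✅ lora_dropout=0.05 applies dropout to the adapter inputs as regularization against overfitting (useful for small datasets).

✅ target_modules=["q_proj", "v_proj"] attaches adapters only to the attention query and value projections, which keeps the number of trainable parameters small.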

✅ Great for fine-tuning LLaMA, Mistral, Falcon on low-VRAM GPUs.
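
The effect of these settings on model size can be worked out by hand. The sketch below counts the LoRA parameters for this configuration, using the layer shapes reported by `print(model)` earlier (q_proj: 4096 → 4096, v_proj: 4096 → 1024, 32 decoder layers). The helper name `lora_param_count` is hypothetical.

```python
def lora_param_count(in_features, out_features, r):
    """Parameters in one LoRA adapter: A is (in_features x r), B is (r x out_features)."""
    return r * in_features + r * out_features


r = 16
lora_alpha = 16
num_layers = 32

# Shapes taken from the printed Mistral model.
q_proj = lora_param_count(4096, 4096, r)
v_proj = lora_param_count(4096, 1024, r)

trainable = (q_proj + v_proj) * num_layers
scaling = lora_alpha / r

print(f"LoRA parameters per layer: {q_proj + v_proj:,}")
print(f"LoRA parameters in total : {trainable:,}")  # 6,815,744
print(f"update scaling factor    : {scaling}")      # 1.0
```

Adding more entries to `target_modules` increases this count by the corresponding `r * (in_features + out_features)` per layer.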



In [None]:
# ["q_proj", "v_proj", "k_proj"] → Adds key projection (more expressive)
# ["q_proj", "v_proj", "o_proj"] → Also fine-tunes attention output (named o_proj in Mistral)


In [None]:
peft_config = LoraConfig(
        r=16,
        lora_alpha=16,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=["q_proj", "v_proj"]
    )

In [None]:
model = get_peft_model(model, peft_config)

In [None]:
print(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_pro

In [None]:
training_arguments = TrainingArguments(
        output_dir="finetuned-mistral-model-samsum",
        per_device_train_batch_size=8,
        gradient_accumulation_steps=1,
        optim="paged_adamw_32bit",
        learning_rate=2e-4,
        lr_scheduler_type="cosine",
        save_strategy="epoch",
        logging_steps=100,
        num_train_epochs=1,
        max_steps=250,
        fp16=True,
        push_to_hub=True,
        report_to="none",
  )
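
Note that when `max_steps` is set to a positive value it takes precedence over `num_train_epochs`, so this run stops after 250 optimizer steps. The arithmetic below sketches how much of the 14,732-example training set that covers, assuming a single GPU (`num_devices = 1` is an assumption, not something set in the arguments).

```python
import math

num_examples = 14732          # rows in the training Dataset
per_device_batch_size = 8
gradient_accumulation_steps = 1
max_steps = 250
num_devices = 1               # assumption: one GPU

effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices
examples_seen = effective_batch * max_steps
steps_per_epoch = math.ceil(num_examples / effective_batch)
fraction_of_epoch = examples_seen / num_examples

print(f"effective batch size : {effective_batch}")         # 8
print(f"examples seen        : {examples_seen}")           # 2000
print(f"steps for one epoch  : {steps_per_epoch}")         # 1842
print(f"share of one epoch   : {fraction_of_epoch:.1%}")   # 13.6%
```

Removing `max_steps` (or raising it to at least the steps-per-epoch value) would let the run cover the whole training split.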

In [None]:
# Create the SFTTrainer
trainer = SFTTrainer(
        model=model,
        train_dataset=data,
        peft_config=peft_config,
        args=training_arguments,
        processing_class=tokenizer,
    )



Adding EOS to train dataset:   0%|          | 0/14732 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/14732 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/14732 [00:00<?, ? examples/s]

In [None]:
trainer.train()



Step,Training Loss
100,1.877
200,1.7827


TrainOutput(global_step=250, training_loss=1.8172687072753906, metrics={'train_runtime': 310.4505, 'train_samples_per_second': 6.442, 'train_steps_per_second': 0.805, 'total_flos': 3.561550746451968e+16, 'train_loss': 1.8172687072753906})

In [None]:
! cp -r finetuned-mistral-model-samsum /content/

cp: 'finetuned-mistral-model-samsum' and '/content/finetuned-mistral-model-samsum' are the same file


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
trainer.push_to_hub()

No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/selili688/finetuned-mistral-model-samsum/commit/3ccb9194ea866f099c2764f7e02061e35b5f905d', commit_message='End of training', commit_description='', oid='3ccb9194ea866f099c2764f7e02061e35b5f905d', pr_url=None, repo_url=RepoUrl('https://huggingface.co/selili688/finetuned-mistral-model-samsum', endpoint='https://huggingface.co', repo_type='model', repo_id='selili688/finetuned-mistral-model-samsum'), pr_revision=None, pr_num=None)

## Note: To avoid out-of-memory (OOM) errors, we recommend restarting the kernel at this point. The trained model is still occupying GPU memory, but it's no longer needed.

In [2]:
from peft import AutoPeftModelForCausalLM
from transformers import GenerationConfig
from transformers import AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained("selili688/finetuned-mistral-model-samsum")



tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/437 [00:00<?, ?B/s]

chat_template.jinja: 0.00B [00:00, ?B/s]

In [3]:
inputs = tokenizer("""
###Human: Summarize this following dialogue: John: I'm at the railway station in Newyork Paul: No problems so far? John: no, everything's going smoothly Paul: good. lets meet there soon!
###Assistant: """, return_tensors="pt").to("cuda")


In [4]:
import torch
torch.cuda.empty_cache()

In [5]:
model = AutoPeftModelForCausalLM.from_pretrained(
    "selili688/finetuned-mistral-model-samsum",
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="cuda",
)

adapter_config.json:   0%|          | 0.00/837 [00:00<?, ?B/s]

OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `hf auth login` or by passing `token=<your_token>`

In [None]:
generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,
    temperature=0.1,
    max_new_tokens=25,
    pad_token_id=tokenizer.eos_token_id
)

In [None]:
import time
st_time = time.time()
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(time.time()-st_time)


###Human: Summarize this following dialogue: John: I'm at the railway station in Newyork Paul: No problems so far? John: no, everything's going smoothly Paul: good. lets meet there soon!
###Assistant:  John is at the railway station in Newyork. Everything is going smoothly. He will meet Paul soon.</s>
1.8123457431793213


# Evaluate
Evaluate the performance of the base model "mistralai/Mistral-7B-Instruct-v0.1" and a fine-tuned version of this model on the "test" split of the "samsum" dataset, comparing their summarization results using appropriate metrics.

## Load the test dataset

Load the "test" split of the Samsum dataset with the load_dataset function from the datasets library, passing split="test".



In [None]:
import torch
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments
from trl import SFTTrainer
import os

In [None]:
test_data = load_dataset("knkarthick/samsum", split="test")

In [None]:
test_data_df = test_data.to_pandas()

In [None]:
test_data_df.head(5)

Unnamed: 0,id,dialogue,summary
0,13862856,"Hannah: Hey, do you have Betty's number?\nAman...",Hannah needs Betty's number but Amanda doesn't...
1,13729565,Eric: MACHINE!\nRob: That's so gr8!\nEric: I k...,Eric and Rob are going to watch a stand-up on ...
2,13680171,"Lenny: Babe, can you help me with something?\n...",Lenny can't decide which trousers to buy. Bob ...
3,13729438,"Will: hey babe, what do you want for dinner to...",Emma will be home soon and she will let Will k...
4,13828600,"Ollie: Hi , are you in Warsaw\nJane: yes, just...",Jane is in Warsaw. Ollie and Jane has a party....


In [None]:
# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    torch_dtype=torch.float16,
    device_map="auto"
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
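def generate_summary(dialogue, model, tokenizer, generation_config):
    # Build the same prompt template that was used for fine-tuning
    prompt = "###Human: Summarize this following dialogue: \n" + dialogue + "\n###Assistant: "
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, generation_config=generation_config)
    # Decode only the newly generated tokens, i.e. everything after the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()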

In [None]:
base_model_summaries = []
for i, example in enumerate(test_data):
    if i >= 10: # Generate summaries for the first 10 examples to save time
        break
    dialogue = example["dialogue"]
    summary = generate_summary(dialogue, base_model, tokenizer, generation_config)
    base_model_summaries.append(summary)

In [None]:
# Reuse the fine-tuned PEFT model loaded earlier with AutoPeftModelForCausalLM
fine_tuned_model = model

In [None]:
fine_tuned_model_summaries = []
for i, example in enumerate(test_data):
    if i >= 10: # Generate summaries for the first 10 examples to save time
        break
    dialogue = example["dialogue"]
    summary = generate_summary(dialogue, fine_tuned_model, tokenizer, generation_config)
    fine_tuned_model_summaries.append(summary)

In [None]:
%pip install evaluate rouge_score

Collecting evaluate
  Downloading evaluate-0.4.5-py3-none-any.whl.metadata (9.5 kB)
Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Downloading evaluate-0.4.5-py3-none-any.whl (84 kB)
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24934 sha256=cea070e666de07d5e1700b1603a30a3b728cbc0164431b7b810810ed2a2902ed
  Stored in directory: /root/.cache/pip/wheels/1e/19/43/8a442dc83660ca25e163e1bd1f89919284ab0d0c1475475148
Successfully built rouge_score
Installing collected packages: rouge_score, evaluate
Successfully installed evaluate-0.4.5 rouge_score-0.1.2


In [None]:
from evaluate import load
rouge = load("rouge")

Downloading builder script: 0.00B [00:00, ?B/s]
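
For intuition about what the metric measures, here is a simplified ROUGE-1 F1 in plain Python: it counts the unigrams shared between a prediction and a reference. This is a sketch only. It tokenizes on whitespace and does no stemming or punctuation handling, so its numbers will not match the `rouge` metric loaded above, and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between two strings (lowercased, whitespace tokens)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("the cat", "the cat sat down"))
```

ROUGE-2 applies the same idea to bigrams, and ROUGE-L uses the longest common subsequence instead of n-gram counts.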

In [None]:
# Get the ground truth summaries for the first 10 examples
ground_truth_summaries = test_data[:10]["summary"]

In [None]:
# Calculate ROUGE scores for the base model
base_model_results = rouge.compute(predictions=base_model_summaries, references=ground_truth_summaries)
print("Base Model ROUGE Scores:", base_model_results)

Base Model ROUGE Scores: {'rouge1': np.float64(0.3363156752495916), 'rouge2': np.float64(0.11945544291637172), 'rougeL': np.float64(0.28241587518701183), 'rougeLsum': np.float64(0.2819792046765677)}


In [None]:
# Calculate ROUGE scores for the fine-tuned model
fine_tuned_model_results = rouge.compute(predictions=fine_tuned_model_summaries, references=ground_truth_summaries)
print("Fine-tuned Model ROUGE Scores:", fine_tuned_model_results)

Fine-tuned Model ROUGE Scores: {'rouge1': np.float64(0.41867427021054693), 'rouge2': np.float64(0.1976255301940666), 'rougeL': np.float64(0.3526078226244056), 'rougeLsum': np.float64(0.34919516162491593)}
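
The two score dictionaries printed above can be compared directly. The sketch below copies the reported values by hand and computes the absolute and relative change per metric; the variable names are illustrative.

```python
# Scores copied from the two outputs above.
base_scores = {
    "rouge1": 0.3363156752495916,
    "rouge2": 0.11945544291637172,
    "rougeL": 0.28241587518701183,
    "rougeLsum": 0.2819792046765677,
}
fine_tuned_scores = {
    "rouge1": 0.41867427021054693,
    "rouge2": 0.1976255301940666,
    "rougeL": 0.3526078226244056,
    "rougeLsum": 0.34919516162491593,
}

deltas = {}
for metric, base_value in base_scores.items():
    delta = fine_tuned_scores[metric] - base_value
    deltas[metric] = delta
    relative = 100 * delta / base_value
    print(f"{metric:10s} {base_value:.3f} -> {fine_tuned_scores[metric]:.3f} "
          f"(+{delta:.3f}, {relative:+.1f}%)")
```

Because both models were scored on the same 10 examples, the differences are comparable with each other, though the sample is small.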


On these 10 test examples, the fine-tuned model scores higher than the base model on every ROUGE metric: rouge1 rises from 0.336 to 0.419, rouge2 from 0.119 to 0.198, rougeL from 0.282 to 0.353, and rougeLsum from 0.282 to 0.349. This suggests that fine-tuning improved summarization on the Samsum dataset, although a 10-example sample is too small to treat the size of the gain as reliable; evaluating on the full test split would give a firmer estimate.