**LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation)** are techniques used to adapt large pre-trained models, such as those in natural language processing (NLP) and computer vision, with the aim of improving their performance on specific tasks while minimizing the number of parameters that need to be updated during fine-tuning. These methods are particularly valuable for efficiently customizing large models without the computational and memory overhead typically associated with training or fine-tuning such models.

## LoRA (Low-Rank Adaptation)
LoRA adapts a pre-trained model by learning weight updates that lie in a low-rank subspace. Instead of updating all parameters of the model, LoRA freezes the pre-trained weights and trains a small number of newly added parameters, significantly reducing the computational cost and memory usage during fine-tuning. This approach is based on the observation that the weight change needed to adapt a model to a new task often has a low intrinsic rank.

The key idea behind LoRA is to represent the update to certain weight matrices as the product of two low-rank matrices. For example, consider a weight matrix W of shape d × k in a transformer model. Rather than modifying W directly, LoRA expresses the adjustment as ΔW = BA, where B has shape d × r, A has shape r × k, and the rank r is much smaller than d and k. During fine-tuning, only A and B are learned, while W remains fixed, so the adapted layer computes Wx + BAx. This reduces the number of trainable parameters for that matrix from d·k to r·(d + k).
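As a rough illustration of the low-rank update, the following sketch uses plain Python lists and toy dimensions. The helper names (`matvec`, `lora_forward`, `lora_param_count`) and the zero initialisation of B are assumptions made for this example; it is not how the PEFT library implements LoRA internally.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_param_count(d, k, r):
    """Trainable parameters of a rank-r update for a d x k weight matrix."""
    return r * (d + k)

random.seed(0)
d, k, r = 6, 4, 2  # toy sizes, chosen only for illustration

W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen base weight
B = [[0.0] * r for _ in range(d)]                                  # d x r, starts at zero
A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # r x k, small random

def lora_forward(x, scale=1.0):
    """Compute W x + scale * B (A x) without ever forming the d x k update."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

x = [1.0, -2.0, 0.5, 3.0]

# With B at zero the adapted layer reproduces the frozen layer exactly.
print(lora_forward(x) == matvec(W, x))

# For a 4096 x 4096 matrix and r = 16: full update vs. low-rank update.
print(4096 * 4096, lora_param_count(4096, 4096, 16))
```

Because B starts at zero in this sketch, training begins from the behaviour of the unmodified pre-trained layer, and only the entries of A and B would change afterwards.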

## QLoRA (Quantized Low-Rank Adaptation)
Building on the principles of LoRA, QLoRA adds quantization to further reduce the memory required for fine-tuning. Quantization reduces the precision at which the model's parameters are stored, so the same weights occupy far less memory. The savings matter most when GPU memory is the limiting resource.

QLoRA applies quantization to the frozen pre-trained weights W, not to the adapter matrices A and B. The base model is stored in 4-bit precision and dequantized on the fly to a higher-precision compute dtype during the forward and backward passes, while A and B are kept in higher precision and are the only parameters that receive gradient updates. Because the frozen weights dominate the memory footprint, this makes it feasible to fine-tune large models on a single GPU with limited memory.
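To make the idea of low-precision storage concrete, here is a minimal sketch of block-wise absmax quantization to 4 bits. It uses evenly spaced levels purely for illustration; the NF4 data type used by QLoRA places its levels non-uniformly, and the function names and sample weights below are hypothetical rather than taken from bitsandbytes.

```python
def quantize_block(values, bits=4):
    """Map one block of floats to signed integer codes plus a single absmax scale."""
    levels = 2 ** (bits - 1) - 1                  # 7 for 4 bits, so codes lie in [-7, 7]
    absmax = max(abs(v) for v in values) or 1.0   # avoid dividing by zero for an all-zero block
    codes = [round(v / absmax * levels) for v in values]
    return codes, absmax

def dequantize_block(codes, absmax, bits=4):
    """Recover approximate floats from the integer codes and the block scale."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * absmax for c in codes]

# Hypothetical block of weights.
weights = [0.12, -0.40, 0.03, 0.27, -0.08, 0.35, -0.22, 0.01]

codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)      # small integers that fit in 4 bits
print(scale)      # one higher-precision constant per block
print(max_error)  # bounded by half a quantization step
```

Each block stores one scale constant alongside its 4-bit codes, which is why the block size affects the effective number of bits per parameter.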


## Introduction
This notebook demonstrates fine-tuning a pre-trained model for text summarization using QLoRA. The full details are in the [QLoRA paper](https://arxiv.org/pdf/2305.14314.pdf). In brief, QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low-Rank Adapters (LoRA). It introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information-theoretically optimal for normally distributed weights; (b) Double Quantization, which reduces the average memory footprint by quantizing the quantization constants; and (c) Paged Optimizers, which manage memory spikes. QLoRA builds on the original [LoRA paper](https://arxiv.org/pdf/2106.09685.pdf).
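The memory effect of 4-bit storage and Double Quantization can be estimated with some back-of-the-envelope arithmetic. The sketch below assumes the block sizes described in the QLoRA paper (64 weights per block, 256 constants per second-level block) and, for simplicity, treats every base-model weight as quantized, which overstates the savings slightly because some layers stay in higher precision. The function names are made up for this example.

```python
def constant_overhead_bits(block_size=64, constant_bits=32):
    """Extra bits per weight when each block stores one 32-bit scale."""
    return constant_bits / block_size

def double_quant_overhead_bits(block_size=64, second_block=256,
                               first_bits=8, second_bits=32):
    """Extra bits per weight when the scales themselves are quantized to 8 bits."""
    return first_bits / block_size + second_bits / (block_size * second_block)

def model_gib(n_params, bits_per_param):
    """Storage in GiB for n_params weights at the given bits per weight."""
    return n_params * bits_per_param / 8 / 2**30

# Base-model size: total parameters minus adapter parameters, both taken from
# the print_trainable_parameters() output shown later in this notebook.
base_params = 6_778_392_576 - 39_976_960

plain = constant_overhead_bits()
double = double_quant_overhead_bits()

print(f"overhead without double quantization: {plain:.3f} bits/weight")
print(f"overhead with double quantization:    {double:.3f} bits/weight")
print(f"16-bit weights:      {model_gib(base_params, 16):.2f} GiB")
print(f"4-bit, plain scales: {model_gib(base_params, 4 + plain):.2f} GiB")
print(f"4-bit, double quant: {model_gib(base_params, 4 + double):.2f} GiB")
```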

Here we use Llama-2-7b with QLoRA (4-bit quantization).

In [10]:
#pip install transformers --upgrade


## Installation Instructions
Before we start, it's essential to install all required Python packages and libraries. This includes `transformers` for leveraging state-of-the-art NLP models, `torch` for model training and operations, and other utility libraries like `accelerate` and `optimum` for optimizing training processes. Optional installations are noted; these might be necessary for specific environments or extended functionalities.


In [None]:
import transformers
transformers.__version__

[2023-10-10 12:23:32,074] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)


'4.34.0'

In [9]:
#pip install protobuf==3.20.*

In [1]:
#pip uninstall torch
#pip install "torch>=2.0"
#pip install torch-utils
#pip install pip --upgrade
#pip install CUDA==11.6
#pip install flash-attn --no-build-isolation

In [None]:
!nvidia-smi

Tue Oct 10 12:41:58 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A10G                    Off | 00000000:00:1E.0 Off |                    0 |
|  0%   24C    P0              58W / 300W |  21664MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                         

In [None]:
#pip install flash-attn==1.0.*

In [None]:
#pip install flash_attn --upgrade

In [7]:
#pip install -U accelerate

In [None]:
#pip install torchvision==0.14.0

In [8]:
#pip install optimum --upgrade

In [None]:
#pip3 install -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118


## Data Preparation
We use the "Salesforce/dialogstudio" dataset, specifically the "TweetSumm" version, for our text summarization task. This section covers the steps for loading the dataset, inspecting its structure, and preprocessing the data to make it suitable for training and inference. This includes tokenization and formatting inputs according to the requirements of the pre-trained model we aim to fine-tune.


In [None]:
dbutils.library.restartPython()


In [None]:
import torch
torch.cuda.empty_cache()

# Preparing the Data Set

In [None]:
import json
import re
from pprint import pprint

import pandas as pd
import torch
from datasets import Dataset, load_dataset
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer
import transformers
#from llama_attn_replace import replace_llama_attn

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_NAME = "meta-llama/Llama-2-7b-hf"
OUTPUT_DIR = "output_directory"

In [None]:
#replace_llama_attn()

In [None]:
dataset = load_dataset("Salesforce/dialogstudio", "TweetSumm")




In [None]:
DEFAULT_SYSTEM_PROMPT = """
Below is a conversation between a human and an AI agent. Write a summary of the conversation.
""".strip()


def generate_training_prompt(
    conversation: str, summary: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT
) -> str:
    return f"""[INST] <<SYS>> ###Instruction: {system_prompt} <</SYS>>

### Input:
{conversation.strip()}

### Response: [/INST] 
{summary}
""".strip()

In [None]:
def clean_text(text):
    text = re.sub(r"http\S+", "", text)
    text = re.sub(r"@[^\s]+", "", text)
    text = re.sub(r"\s+", " ", text)
    return re.sub(r"\^[^ ]+", "", text)


def create_conversation_text(data_point):
    text = ""
    for item in data_point["log"]:
        user = clean_text(item["user utterance"])
        text += f"user: {user.strip()}\n"

        agent = clean_text(item["system response"])
        text += f"agent: {agent.strip()}\n"

    return text
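To see what the cleaning step does, the sketch below repeats the two helpers and runs them on a made-up record. The handle, URL, and agent signature in the sample are hypothetical; only the dictionary keys follow the dataset fields used above.

```python
import re

def clean_text(text):
    text = re.sub(r"http\S+", "", text)   # drop URLs
    text = re.sub(r"@[^\s]+", "", text)   # drop @mentions
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    return re.sub(r"\^[^ ]+", "", text)   # drop agent signatures such as ^JK

def create_conversation_text(data_point):
    text = ""
    for item in data_point["log"]:
        user = clean_text(item["user utterance"])
        text += f"user: {user.strip()}\n"
        agent = clean_text(item["system response"])
        text += f"agent: {agent.strip()}\n"
    return text

# Hypothetical record with a single turn.
toy_point = {
    "log": [
        {
            "user utterance": "@AppleSupport my phone is broken https://t.co/abc",
            "system response": "We can help. Please DM us. ^JK",
        }
    ]
}

print(create_conversation_text(toy_point))
# user: my phone is broken
# agent: We can help. Please DM us.
```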

In [None]:
def generate_text(data_point):
    summaries = json.loads(data_point["original dialog info"])["summaries"][
        "abstractive_summaries"
    ]
    summary = summaries[0]
    summary = " ".join(summary)

    conversation_text = create_conversation_text(data_point)
    return {
        "conversation": conversation_text,
        "summary": summary,
        "text": generate_training_prompt(conversation_text, summary),
    }

In [None]:
#example
example = generate_text(dataset["train"][0])
example

{'conversation': 'user: So neither my iPhone nor my Apple Watch are recording my steps/activity, and Health doesn’t recognise either source anymore for some reason. Any ideas? please read the above.\nagent: Let’s investigate this together. To start, can you tell us the software versions your iPhone and Apple Watch are running currently?\nuser: My iPhone is on 11.1.2, and my watch is on 4.1.\nagent: Thank you. Have you tried restarting both devices since this started happening?\nuser: I’ve restarted both, also un-paired then re-paired the watch.\nagent: Got it. When did you first notice that the two devices were not talking to each other. Do the two devices communicate through other apps such as Messages?\nuser: Yes, everything seems fine, it’s just Health and activity.\nagent: Let’s move to DM and look into this a bit more. When reaching out in DM, let us know when this first started happening please. For example, did it start after an update or after installing a certain app?\n',
 'su

In [None]:
print(example["summary"])

Customer enquired about his Iphone and Apple watch which is not showing his any steps/activity and health activities. Agent is asking to move to DM and look into it.


In [None]:

print(example["conversation"])

user: So neither my iPhone nor my Apple Watch are recording my steps/activity, and Health doesn’t recognise either source anymore for some reason. Any ideas? please read the above.
agent: Let’s investigate this together. To start, can you tell us the software versions your iPhone and Apple Watch are running currently?
user: My iPhone is on 11.1.2, and my watch is on 4.1.
agent: Thank you. Have you tried restarting both devices since this started happening?
user: I’ve restarted both, also un-paired then re-paired the watch.
agent: Got it. When did you first notice that the two devices were not talking to each other. Do the two devices communicate through other apps such as Messages?
user: Yes, everything seems fine, it’s just Health and activity.
agent: Let’s move to DM and look into this a bit more. When reaching out in DM, let us know when this first started happening please. For example, did it start after an update or after installing a certain app?



In [None]:
print(example["text"])

[INST] <<SYS>> ###Instruction: Below is a conversation between a human and an AI agent. Write a summary of the conversation. <</SYS>>

### Input:
user: So neither my iPhone nor my Apple Watch are recording my steps/activity, and Health doesn’t recognise either source anymore for some reason. Any ideas? please read the above.
agent: Let’s investigate this together. To start, can you tell us the software versions your iPhone and Apple Watch are running currently?
user: My iPhone is on 11.1.2, and my watch is on 4.1.
agent: Thank you. Have you tried restarting both devices since this started happening?
user: I’ve restarted both, also un-paired then re-paired the watch.
agent: Got it. When did you first notice that the two devices were not talking to each other. Do the two devices communicate through other apps such as Messages?
user: Yes, everything seems fine, it’s just Health and activity.
agent: Let’s move to DM and look into this a bit more. When reaching out in DM, let us know when t

In [None]:
def process_dataset(data: Dataset):
    return (
        data.shuffle(seed=42)
        .map(generate_text)
        .remove_columns(
            [
                "original dialog id",
                "new dialog id",
                "dialog index",
                "original dialog info",
                "log",
                "prompt",
            ]
        )
    )
     

In [None]:
dataset["train"] = process_dataset(dataset["train"])
dataset["validation"] = process_dataset(dataset["validation"])

In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['conversation', 'summary', 'text'],
        num_rows: 879
    })
    validation: Dataset({
        features: ['conversation', 'summary', 'text'],
        num_rows: 110
    })
    test: Dataset({
        features: ['original dialog id', 'new dialog id', 'dialog index', 'original dialog info', 'log', 'prompt'],
        num_rows: 110
    })
})

# Modeling

In [None]:
notebook_login()


In [None]:
def create_model_and_tokenizer():
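    # Store the frozen base weights in 4-bit NF4. Double quantization
    # (bnb_4bit_use_double_quant) is not set here, so it stays at its default of False.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )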

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        use_safetensors=True,
        quantization_config=bnb_config,
        trust_remote_code=True,
        device_map="auto",
        torch_dtype=torch.bfloat16,
        use_flash_attention_2=True
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME,use_fast=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"


    return model, tokenizer

In [None]:
model, tokenizer = create_model_and_tokenizer()
model.config.use_cache = False
#model = model.to_bettertransformer()


In [None]:
model.config.quantization_config.to_dict()

{'quant_method': <QuantizationMethod.BITS_AND_BYTES: 'bitsandbytes'>,
 'load_in_8bit': False,
 'load_in_4bit': True,
 'llm_int8_threshold': 6.0,
 'llm_int8_skip_modules': None,
 'llm_int8_enable_fp32_cpu_offload': False,
 'llm_int8_has_fp16_weight': False,
 'bnb_4bit_quant_type': 'nf4',
 'bnb_4bit_use_double_quant': False,
 'bnb_4bit_compute_dtype': 'float16'}

# Training

In [6]:
%load_ext tensorboard
%tensorboard --logdir output_directory/runs

In [None]:

lora_r = 16
lora_alpha = 64
lora_dropout = 0.1
lora_target_modules = [
    "q_proj",
    "up_proj",
    "o_proj",
    "k_proj",
    "down_proj",
    "gate_proj",
    "v_proj",
]


peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=lora_target_modules,
    bias="none",
    task_type="CAUSAL_LM",
)
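The number of trainable parameters implied by this LoRA configuration can be checked by hand. The sketch below assumes the layer dimensions shown in the `LlamaConfig` output later in this notebook (hidden size 4096, intermediate size 11008, 32 layers) and counts r·(in + out) parameters for each targeted projection.

```python
HIDDEN = 4096         # hidden_size from the LlamaConfig output
INTERMEDIATE = 11008  # intermediate_size from the LlamaConfig output
NUM_LAYERS = 32       # num_hidden_layers from the LlamaConfig output
RANK = 16             # lora_r
ALPHA = 64            # lora_alpha

# (input features, output features) of each targeted linear layer.
module_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapter adds an r x in matrix and an out x r matrix.
per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in module_shapes.values())
total = per_layer * NUM_LAYERS

print(f"{total:,}")   # 39,976,960
print(ALPHA / RANK)   # 4.0, the factor by which the low-rank update is scaled
```

The total agrees with the `trainable params` figure reported by `print_trainable_parameters()` further down.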
     

In [None]:

training_arguments = TrainingArguments(
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    logging_steps=1,
    learning_rate=1e-4,
    fp16=True,
    max_grad_norm=0.3,
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.25,
    warmup_ratio=0.05,
    save_strategy="epoch",
    group_by_length=True,
    output_dir=OUTPUT_DIR,
    report_to="tensorboard",
    save_safetensors=True,
    lr_scheduler_type="cosine",
    seed=42,
)
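These settings determine how many optimizer steps the run takes. The following estimate assumes a single GPU and the 879 training examples reported for this dataset; the exact rounding applied by the trainer can differ between library versions, so the numbers should be read as approximate.

```python
import math

train_examples = 879   # rows in the processed training split
per_device_batch = 32  # per_device_train_batch_size
grad_accum = 4         # gradient_accumulation_steps
epochs = 2             # num_train_epochs
num_gpus = 1           # assumption: single-GPU training

# Examples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_gpus

# Approximate optimizer steps per epoch and in total.
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Ratios such as eval_steps=0.25 and warmup_ratio=0.05 are fractions of total_steps.
eval_every = math.ceil(total_steps * 0.25)
warmup_steps = math.ceil(total_steps * 0.05)

print(effective_batch, steps_per_epoch, total_steps, eval_every, warmup_steps)
# 128 7 14 4 1
```

With so few optimizer steps, `logging_steps=1` yields one log entry per update, and evaluation runs only a handful of times over the whole run.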
     

In [None]:

#pip install pyarrow --upgrade

In [None]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=4096,
    tokenizer=tokenizer,
    args=training_arguments,
)
     



In [None]:
words=[word for sentence in dataset["train"]["text"] for word in sentence.split()]

# Get the set of unique words
unique_words = set(words)

# Print the number of unique words
print(len(unique_words))

18977


In [None]:
words=[word for sentence in dataset["validation"]["text"] for word in sentence.split()]

# Get the set of unique words
unique_words = set(words)

# Print the number of unique words
print(len(unique_words))

4607


In [None]:
# print_trainable_parameters() prints the summary itself and returns None,
# so wrapping it in print() only adds a stray "None" to the output.
trainer.model.print_trainable_parameters()


# # Get the trainable parameters
# trainable_parameters = [name for name, param in trainer.model.named_parameters() if param.requires_grad]

# # Print the trainable parameters
# print(len(trainable_parameters))

trainable params: 39,976,960 || all params: 6,778,392,576 || trainable%: 0.589770503135875


In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['conversation', 'summary', 'text'],
        num_rows: 879
    })
    validation: Dataset({
        features: ['conversation', 'summary', 'text'],
        num_rows: 110
    })
    test: Dataset({
        features: ['original dialog id', 'new dialog id', 'dialog index', 'original dialog info', 'log', 'prompt'],
        num_rows: 110
    })
})

In [None]:
# for name, module in trainer.model.named_modules():
#   if "norm" in name:
#     module = module.to(torch.float32)

In [None]:
a=int(942*(1/4))
942%a

2

In [2]:
trainer.train()

In [None]:
# Command took 49.29 minutes

In [None]:
#trainer.save_model("your_directory")

In [None]:
trainer.save_model(OUTPUT_DIR)

# Inference

In [None]:
def generate_prompt(
    conversation: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT
) -> str:
    return f"""[INST] <<SYS>> ###Instruction: {system_prompt}  <</SYS>> 

### Input:
{conversation.strip()}

### Response: [/INST] 
""".strip()
     

In [None]:
#! pip install pyarrow --upgrade

In [None]:
examples = []
for data_point in dataset["test"].select(range(5)):
    summaries = json.loads(data_point["original dialog info"])["summaries"][
        "abstractive_summaries"
    ]
    summary = summaries[0]
    summary = " ".join(summary)
    conversation = create_conversation_text(data_point)
    examples.append(
        {
            "summary": summary,
            "conversation": conversation,
            "prompt": generate_prompt(conversation),
        }
    )
test_df = pd.DataFrame(examples)
test_df
     


Unnamed: 0,summary,conversation,prompt
0,Customer is complaining that the watchlist is ...,user: My watchlist is not updating with new ep...,[INST] <<SYS>> ###Instruction: Below is a conv...
1,Customer is asking about the ACC to link to th...,"user: hi , my Acc was linked to an old number....",[INST] <<SYS>> ###Instruction: Below is a conv...
2,Customer is complaining about the new updates ...,user: the new update ios11 sucks. I can’t even...,[INST] <<SYS>> ###Instruction: Below is a conv...
3,Customer is complaining about parcel service ...,user: FUCK YOU AND YOUR SHITTY PARCEL SERVICE ...,[INST] <<SYS>> ###Instruction: Below is a conv...
4,The customer says that he is stuck at Staines ...,user: Stuck at Staines waiting for a Reading t...,[INST] <<SYS>> ###Instruction: Below is a conv...


In [None]:
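def summarize(model, text: str):
    # Tokenize the prompt and move the tensors to the target device.
    inputs = tokenizer(text, return_tensors="pt").to(DEVICE)
    inputs_length = len(inputs["input_ids"][0])
    print(inputs)
    print(inputs_length)
    # Restrict scaled-dot-product attention to the flash kernel during generation.
    with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
        # Note: temperature only takes effect when do_sample=True; as written, decoding is greedy.
        outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.1)

    # Decode only the newly generated tokens, skipping the prompt tokens.
    return tokenizer.decode(outputs[0][inputs_length:], skip_special_tokens=True)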
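The `temperature` argument rescales the model's logits before they are turned into probabilities, and it matters only when tokens are sampled rather than chosen greedily. The sketch below illustrates the effect on a made-up set of three logits; the function name and values are assumptions for this example and do not come from the `transformers` implementation.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

p_default = softmax_with_temperature(logits, temperature=1.0)
p_sharp = softmax_with_temperature(logits, temperature=0.1)

print([round(p, 3) for p in p_default])
print([round(p, 5) for p in p_sharp])
```

At a temperature of 0.1 nearly all probability mass falls on the highest-scoring token, so sampling behaves almost like greedy decoding.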

In [None]:
def create_model_and_tokenizer():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        use_safetensors=True,
        quantization_config=bnb_config,
        trust_remote_code=True,
        device_map="auto",
        torch_dtype=torch.bfloat16,
        #use_flash_attention_2=True
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME,use_fast=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"


    return model, tokenizer

# Inference

## Example 1 (Not Fine-Tuned)

In [None]:
#replace_llama_attn(inference=True)

In [None]:
model_nottuned, tokenizer = create_model_and_tokenizer()
model_nottuned.config.use_cache = False
#model_nottuned.to_bettertransformer()


In [None]:
#model_nottuned_main=model_nottuned.to_bettertransformer()

In [None]:
model_nottuned.config

LlamaConfig {
  "_name_or_path": "meta-llama/Llama-2-7b-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "quantization_config": {
    "bnb_4bit_compute_dtype": "float16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": false,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version":

In [None]:
example = test_df.iloc[0]
print(example.conversation)

user: My watchlist is not updating with new episodes (past couple days). Any idea why?
agent: Apologies for the trouble, Norlene! We're looking into this. In the meantime, try navigating to the season / episode manually.
user: Tried logging out/back in, that didn’t help
agent: Sorry! 😔 We assure you that our team is working hard to investigate, and we hope to have a fix ready soon!
user: Thank you! Some shows updated overnight, but others did not...
agent: We definitely understand, Norlene. For now, we recommend checking the show page for these shows as the new eps will be there
user: As of this morning, the problem seems to be resolved. Watchlist updated overnight with all new episodes. Thank you for your attention to this matter! I love Hulu 💚
agent: Awesome! That's what we love to hear. If you happen to need anything else, we'll be here to support! 💚



In [None]:
print(example.summary)

Customer is complaining that the watchlist is not updated with new episodes from past two days. Agent informed that the team is working hard to investigate to show new episodes on page.


In [None]:
example.prompt

"[INST] <<SYS>> ###Instruction: Below is a conversation between a human and an AI agent. Write a summary of the conversation.  <</SYS>> \n\n### Input:\nuser: My watchlist is not updating with new episodes (past couple days). Any idea why?\nagent: Apologies for the trouble, Norlene! We're looking into this. In the meantime, try navigating to the season / episode manually.\nuser: Tried logging out/back in, that didn’t help\nagent: Sorry! 😔 We assure you that our team is working hard to investigate, and we hope to have a fix ready soon!\nuser: Thank you! Some shows updated overnight, but others did not...\nagent: We definitely understand, Norlene. For now, we recommend checking the show page for these shows as the new eps will be there\nuser: As of this morning, the problem seems to be resolved. Watchlist updated overnight with all new episodes. Thank you for your attention to this matter! I love Hulu 💚\nagent: Awesome! That's what we love to hear. If you happen to need anything else, we'

In [None]:
%%time
#model_nottuned.to_bettertransformer()
summary = summarize(model_nottuned, example.prompt)

{'input_ids': tensor([[    1,   518, 25580, 29962,  3532, 14816, 29903,  6778,   835,  3379,
          4080, 29901, 13866,   338,   263, 14983,  1546,   263,  5199,   322,
           385,   319, 29902, 10823, 29889, 14350,   263, 15837,   310,   278,
         14983, 29889, 29871,   529,   829, 14816, 29903,  6778, 29871,    13,
            13,  2277, 29937, 10567, 29901,    13,  1792, 29901,  1619,  6505,
          1761,   338,   451, 13271,   411,   716, 23238,   313, 29886,   579,
          7303,  3841,   467,  3139,  2969,  2020, 29973,    13, 14748, 29901,
          6225, 11763,   363,   278,  7458, 29892,  4186, 29880,  1600, 29991,
          1334, 29915,   276,  3063,   964,   445, 29889,   512,   278,  6839,
           603, 29892,  1018, 12402,  1218,   304,   278,  4259,   847, 12720,
          7522, 29889,    13,  1792, 29901, 29547, 12183,   714, 29914,  1627,
           297, 29892,   393,  3282, 30010, 29873,  1371,    13, 14748, 29901,
          8221, 29991, 29871,   243,  

In [None]:
pprint(summary)

('\n'
 '\n'
 '### Output:\n'
 '<</SYS>>\n'
 '\n'
 '### Explanation:\n'
 '\n'
 '### Input:\n'
 'user: My watchlist is not updating with new episodes (past couple days). Any '
 'idea why?\n'
 "agent: Apologies for the trouble, Norlene! We're looking into this. In the "
 'meantime, try navigating to the season / episode manually.\n'
 'user: Tried logging out/back in, that didn’t help\n'
 'agent: Sorry! 😔 We assure you that our team is working hard to investigate, '
 'and we hope to have a fix ready soon!\n'
 'user: Thank you! Some shows updated overnight, but others did not...\n'
 'agent: We definitely understand, Norlene. For now, we recommend checking the '
 'show page for these shows as the new eps will be there\n'
 'user: As of this morning, the problem seems to be resolved. Watchlist '
 'updated overnight with all new episodes. Thank you for your attention to '
 'this matter! I love Hulu 💚\n'
 "agent: Awesome! That's what we love to hear. If you happen to need anything "
 "else, we'l

## Example 1 (Fine-Tuned)

In [None]:
#enable_flash=True, enable_math=False, enable_mem_efficient=False

In [None]:
#pip install optimum --upgrade

In [None]:
#model_nottuned.to_bettertransformer()
# Load the adapter weights from the directory written by trainer.save_model(OUTPUT_DIR).
model_tuned = PeftModel.from_pretrained(model_nottuned, OUTPUT_DIR)

In [None]:
%%time
summary = summarize(model_tuned, example.prompt)

{'input_ids': tensor([[    1,   518, 25580, 29962,  3532, 14816, 29903,  6778,   835,  3379,
          4080, 29901, 13866,   338,   263, 14983,  1546,   263,  5199,   322,
           385,   319, 29902, 10823, 29889, 14350,   263, 15837,   310,   278,
         14983, 29889, 29871,   529,   829, 14816, 29903,  6778, 29871,    13,
            13,  2277, 29937, 10567, 29901,    13,  1792, 29901,  1619,  6505,
          1761,   338,   451, 13271,   411,   716, 23238,   313, 29886,   579,
          7303,  3841,   467,  3139,  2969,  2020, 29973,    13, 14748, 29901,
          6225, 11763,   363,   278,  7458, 29892,  4186, 29880,  1600, 29991,
          1334, 29915,   276,  3063,   964,   445, 29889,   512,   278,  6839,
           603, 29892,  1018, 12402,  1218,   304,   278,  4259,   847, 12720,
          7522, 29889,    13,  1792, 29901, 29547, 12183,   714, 29914,  1627,
           297, 29892,   393,  3282, 30010, 29873,  1371,    13, 14748, 29901,
          8221, 29991, 29871,   243,  

In [11]:
pprint(summary)

**Instruction**\
Below is a conversation between a human and an AI agent. Write a summary of the conversation

**Discussion**\
user: My watchlist is not updating with new episodes (past couple days). Any idea why?\
agent: Apologies for the trouble, Norlene! We're looking into this. In the meantime, try navigating to the season / episode manually.\
user: Tried logging out/back in, that didn’t help\
agent: Sorry! 😔 We assure you that our team is working hard to investigate, and we hope to have a fix ready soon!\
user: Thank you! Some shows updated overnight, but others did not...\
agent: We definitely understand, Norlene. For now, we recommend checking the show page for these shows as the new eps will be there\
user: As of this morning, the problem seems to be resolved. Watchlist updated overnight with all new episodes. Thank you for your attention to this matter! I love Hulu 💚\
agent: Awesome! That's what we love to hear. If you happen to need anything else, we'll be here to support! 💚



**Actual Summary**\
Customer is complaining that the watchlist is not updated with new episodes from past two days. Agent informed that the team is working hard to investigate to show new episodes on page.

**Not Fine-Tuned Summary**\
user: My watchlist is not updating with new episodes (past couple days). Any idea why?\
agent: Apologies for the trouble, Norlene! We're looking into this. In the meantime, try navigating to the season / episode manually.\
user: Tried logging out/back in, that didn’t help\
agent: Sorry! 😔 We assure you that our team is working hard to investigate, and we hope to have a fix ready soon!\
user: Thank you! Some shows updated overnight, but others did not...\
agent: We definitely understand, Norlene. For now, we recommend checking the show page for these shows as the new eps will be there\
user: As of this morning, the problem seems to be resolved. Watchlist updated overnight with all new episodes. Thank you for your attention to this matter! I love Hulu 💚\
agent: Awesome! That's what we love to hear. If you happen to need anything else, we'll be here to support! 💚

**Fine-Tuned Summary**\
Customer is complaining that his watchlist is not updating with new episodes. Agent updated that they are looking into this and also informed that they will be here to support.

**Instruction**\
Below is a conversation between a human and an AI agent. Write a summary of the conversation

**Discussion**\
user: hi , my Acc was linked to an old number. Now I’m asked to verify my Acc , where a code / call wil be sent to my old number. Any way that I can link my Acc to my current number? Pls help\
agent: Hi there, we are here to help. We will have a specialist contact you about changing your phone number. Thank you.\
user: Thanks. Hope to get in touch soon\
agent: That is no problem. Please let us know if you have any further questions in the meantime.\
user: Hi sorry , is it for my account : __email__\
agent: Can you please delete this post as it does have personal info in it. We have updated your Case Manager who will be following up with you shortly. Feel free to DM us anytime with any other questions or concerns 2/2\
user: Thank you\
agent: That is no problem. Please do not hesitate to contact us with any further questions. Thank you.\



**Actual Summary**\
Customer is asking about the ACC to link to the current  number. Agent says that they have updated their case manager.

**Not Fine-Tuned Summary**\
The conversation between a human and an AI agent is about changing the phone number of an account. The human asks if there is any way to link the account to a new phone number, and the agent replies that they will have a specialist contact the user about changing the phone number. The human thanks the agent and hopes to get in touch soon. The agent then asks the human to delete the post as it contains personal information. The human replies that they will delete the post. The agent then thanks the human for their cooperation and closes the conversation.\
`### Output:`

**Fine-Tuned Summary**\
Customer is asking to link his account to his current number. Agent updated that they will have a specialist contact him about changing his phone number.
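A quick way to compare a generated summary with the reference is to measure word overlap. The function below is a crude stand-in for ROUGE-1, written only for illustration: it lowercases the text, splits on word characters, and computes an F1 score over unigram counts. It is not the metric used in the QLoRA paper or by any evaluation library.

```python
import re
from collections import Counter

def unigram_f1(reference: str, candidate: str) -> float:
    """F1 score over overlapping unigram counts, in the range [0, 1]."""
    ref = Counter(re.findall(r"\w+", reference.lower()))
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    overlap = sum((ref & cand).values())  # shared words, respecting multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Reference and fine-tuned summaries for the second example above.
reference = (
    "Customer is asking about the ACC to link to the current number. "
    "Agent says that they have updated their case manager."
)
fine_tuned = (
    "Customer is asking to link his account to his current number. "
    "Agent updated that they will have a specialist contact him about "
    "changing his phone number."
)

print(round(unigram_f1(reference, fine_tuned), 3))
```

Word overlap rewards shared vocabulary rather than faithfulness, so a score like this is only a rough signal next to reading the summaries directly.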

**Parameters:**\
trainable params: 39,976,960 || all params: 6,778,392,576 || trainable%: 0.589770503135875

**Computation Summary:**\
Worker Type: g4dn.xlarge (16 GB Memory, 1 GPU), 2-8 Workers\
Driver Type: g4dn.xlarge (16 GB Memory, 1 GPU)

**Training Time**\
49 minutes

**Unique Tokens**\
25k tokens

**Non-Unique Tokens**\
250k tokens
