
# DPO

## SFT

### Init model

In [1]:
from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from trl.trainer import ConstantLengthDataset
from tqdm import tqdm
import transformers


In [2]:
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    cache_dir = "cached"
)

==((====))==  Unsloth: Fast Mistral patching release 2024.2
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.69 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.1.0+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.22.post7. FA = False.
 "-____-"     Apache 2 free license: http://github.com/unslothai/unsloth


You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` attribute will be overwritten with the one you passed to `from_pretrained`.


In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.2 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
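
As a rough sanity check on what `r = 16` adds, the sketch below counts the LoRA parameters for the seven target modules. The layer shapes (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128) are assumptions taken from the published Mistral-7B configuration, not values printed by this notebook; the 32 layers match the Unsloth log above. The helper name `lora_param_count` is hypothetical.

```python
# Assumed Mistral-7B shapes (not printed by the notebook).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128   # 8 key/value heads x head dimension 128
NUM_LAYERS = 32    # matches "patched 32 layers" in the log

# (in_features, out_features) of each linear layer listed in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes=TARGET_SHAPES, num_layers=NUM_LAYERS):
    """Count LoRA weights: each adapted layer gets A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


print(lora_param_count(16))  # 41943040
```

Under these assumptions the adapters hold about 42M trainable parameters, and the count scales linearly with `r`.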


### Prepare dataset

Source: 
- https://github.com/adithya-s-k/LLM-Alchemy-Chamber/blob/main/LLMs/Mistral-7b/Mistral_Colab_Finetune_ipynb_Colab_Final.ipynb?source=post_page-----0f39647b20fe--------------------------------
- https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing#scrollTo=oF63zQqNlNJC

In [3]:
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
EOS_TOKEN = tokenizer.eos_token

In [5]:
dataset = load_dataset("TokenBender/code_instructions_122k_alpaca_style", split="train")
dataset

Dataset({
    features: ['text', 'instruction', 'output', 'input'],
    num_rows: 121959
})

In [7]:
def generate_prompt(data_point):
    """Build the full training text for one sample: prefix, instruction, optional input, and answer.

    :param data_point: dict: one dataset row with 'instruction', 'input' and 'output' keys
    :return: str: formatted prompt followed by the answer and the EOS token
    """
    prefix_text = 'Below is an instruction that describes a task. Write a response that ' \
               'appropriately completes the request.\n\n'
    # Samples with additional context info.
    if data_point['input']:
        text = f"""[INST]{prefix_text} {data_point["instruction"]} here are the inputs {data_point["input"]} [/INST]{data_point["output"]} """ + EOS_TOKEN
    # Samples without additional context.
    else:
        text = f"""[INST]{prefix_text} {data_point["instruction"]} [/INST]{data_point["output"]} """ + EOS_TOKEN
    return text

# add the "prompt" column in the dataset
text_column = [generate_prompt(data_point) for data_point in dataset]
dataset = dataset.add_column("prompt", text_column)

Flattening the indices: 100%|██████████████████████████████████████████████████████████████████| 121959/121959 [00:02<00:00, 55363.26 examples/s]
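
To inspect the template without loading the tokenizer or the dataset, here is a minimal stand-alone sketch that mirrors `generate_prompt`. The function name `build_prompt` and the sample instruction are hypothetical, and the EOS string `</s>` is an assumption (the notebook reads it from `tokenizer.eos_token`).

```python
PREFIX = ("Below is an instruction that describes a task. Write a response that "
          "appropriately completes the request.\n\n")
EOS = "</s>"  # assumed EOS string; the notebook uses tokenizer.eos_token


def build_prompt(instruction, output, context=""):
    """Return the [INST] ... [/INST] training text for one sample."""
    if context:
        # Samples with extra input get it appended to the instruction
        body = f"{instruction} here are the inputs {context}"
    else:
        body = instruction
    return f"[INST]{PREFIX} {body} [/INST]{output} " + EOS


example = build_prompt("Add two numbers.", "def add(a, b):\n    return a + b")
print(example)
```

Printing one formatted sample like this makes it easy to confirm that the answer sits directly after `[/INST]` and that every sample ends with the EOS token.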


In [8]:
dataset = dataset.shuffle(seed=1234)  # Shuffle dataset here
dataset = dataset.map(lambda samples: tokenizer(samples["prompt"]), batched=True)

Map: 100%|█████████████████████████████████████████████████████████████████████████████████████| 121959/121959 [00:11<00:00, 10984.56 examples/s]


In [9]:
dataset = dataset.train_test_split(test_size=0.5)
train_data = dataset["train"]
test_data = dataset["test"]

In [15]:
for data in train_data:
    print(data)
    break

{'text': "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Using Python, create a program to get the grade corresponding to a percentage. ### Input: No input ### Output: def grade(percentage):\n    if percentage >= 90:\n        return 'A'\n    elif percentage >= 80:\n        return 'B'\n    elif percentage >= 70:\n        return 'C'\n    elif percentage >= 60:\n        return 'D'\n    else:\n        return 'F'", 'instruction': 'Using Python, create a program to get the grade corresponding to a percentage.', 'output': "def grade(percentage):\n    if percentage >= 90:\n        return 'A'\n    elif percentage >= 80:\n        return 'B'\n    elif percentage >= 70:\n        return 'C'\n    elif percentage >= 60:\n        return 'D'\n    else:\n        return 'F'", 'input': '', 'prompt': "[INST]Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n Using Python, cr

In [10]:
train_arg = TrainingArguments(
    per_device_train_batch_size = 16,
    gradient_accumulation_steps = 4,
#     per_device_eval_batch_size = 1,
    warmup_ratio=0.03,
#     warmup_steps = 10,
#     evaluation_strategy="steps",
#     eval_steps=10,
#     max_steps = 100,
    num_train_epochs=1,
    save_steps= 50,
    learning_rate = 2e-4,
    fp16 = not torch.cuda.is_bf16_supported(),
    bf16 = torch.cuda.is_bf16_supported(),
    logging_steps = 1,
    optim = "adamw_8bit",
    weight_decay = 0.05,
    lr_scheduler_type = "cosine",
    seed = 3407,
    output_dir = "sft_mistral_1",
    report_to = "none", # Disable reporting to WandB
#     run_name= "sft_mistral_1"
)

In [11]:
trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
#     eval_dataset=test_data,
    max_seq_length=max_seq_length,
    dataset_text_field="prompt",
    tokenizer=tokenizer,
    packing = True,
    args = train_arg
)
trainer.train()

Generating train split: 6407 examples [00:12, 523.96 examples/s]


| Step | Training Loss |
|---|---|
| 1 | 1.5816 |
| 2 | 1.5689 |
| 3 | 1.4991 |
| 4 | 1.1176 |
| 5 | 0.9471 |
| 6 | 0.7397 |
| 7 | 0.7311 |
| 8 | 0.6812 |
| 9 | 0.6796 |
| 10 | 0.6321 |


TrainOutput(global_step=100, training_loss=0.5690591669082642, metrics={'train_runtime': 9326.1293, 'train_samples_per_second': 0.687, 'train_steps_per_second': 0.011, 'total_flos': 5.625035989450752e+17, 'train_loss': 0.5690591669082642, 'epoch': 1.0})
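
The reported `global_step=100` can be reproduced from the batch settings. The sketch below is an approximation of how the Hugging Face `Trainer` derives the step count for one epoch, assuming a single GPU and that the last partial batch is kept; the helper name `optimizer_steps_per_epoch` is hypothetical.

```python
import math


def optimizer_steps_per_epoch(num_examples, per_device_batch_size,
                              grad_accum_steps, num_devices=1):
    """Batches per epoch (last partial batch kept), floor-divided by accumulation steps."""
    batches = math.ceil(num_examples / (per_device_batch_size * num_devices))
    return max(batches // grad_accum_steps, 1)


# 6407 packed sequences, batch size 16, 4 accumulation steps (values from the cells above)
print(optimizer_steps_per_epoch(6407, 16, 4))  # 100
print(16 * 4)  # 64 sequences per optimizer step
```

The same arithmetic gives 803 steps for the 12859-row DPO run with batch size 4 and 4 accumulation steps, which matches the `global_step=803` reported later in the notebook.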

## DPO

In [3]:
from unsloth import PatchDPOTrainer
from trl import DPOTrainer

### Load SFT model

In [4]:
model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "/mnt/d/Deep_learning/LLM/DPO/sft_mistral_1/checkpoint-100", # YOUR MODEL DIRECTORY YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
        cache_dir = "cached"
    )

==((====))==  Unsloth: Fast Mistral patching release 2024.2
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.69 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.1.0+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.22.post7. FA = False.
 "-____-"     Apache 2 free license: http://github.com/unslothai/unsloth


You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` attribute will be overwritten with the one you passed to `from_pretrained`.
Unsloth 2024.2 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [5]:
PatchDPOTrainer()

### Prepare dataset

In [4]:
from datasets import Dataset

In [7]:
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
EOS_TOKEN = tokenizer.eos_token

In [8]:
raw_datasets = load_dataset("Intel/orca_dpo_pairs", split='train')

In [9]:
raw_datasets

Dataset({
    features: ['system', 'question', 'chosen', 'rejected'],
    num_rows: 12859
})

In [10]:
def format_dataset(data_point):
    """Format one preference sample into the prompt, chosen, and rejected texts used for DPO.

    :param data_point: dict: one dataset row with 'system', 'question', 'chosen' and 'rejected' keys
    :return: dict: 'text_prompt', 'text_chosen' and 'text_reject' strings
    """
    prefix_text = 'Below is an instruction that describes a task. Write a response that ' \
               'appropriately completes the request.\n\n'
    # Samples with a system prompt.
    if len(data_point['system'])>0:
        prompt = f"""[INST]{data_point['system']} {data_point["question"]} [/INST]"""

    # Samples without a system prompt fall back to the generic prefix.
    else:
        prompt = f"""[INST]{prefix_text} {data_point["question"]} [/INST]"""
    chosen = f"""{data_point["chosen"]} """ + EOS_TOKEN
    reject = f"""{data_point["rejected"]} """ + EOS_TOKEN
    return {
        "text_prompt": prompt,
        "text_chosen": chosen,
        "text_reject": reject
    }

# Convert every row to the prompt / chosen / rejected layout that DPOTrainer expects
formatted_list = [format_dataset(data_point) for data_point in raw_datasets]
formatted_dataset = Dataset.from_list(formatted_list)
# Free memory
del raw_datasets
formatted_dataset = formatted_dataset.rename_columns(
    {"text_prompt": "prompt", "text_chosen": "chosen", "text_reject": "rejected"}
)

In [11]:
formatted_dataset

Dataset({
    features: ['prompt', 'chosen', 'rejected'],
    num_rows: 12859
})

In [12]:
for item in formatted_dataset:
    print(item['prompt'])
    print(item['chosen'])
    print(item['rejected'])
    break

[INST]Below is an instruction that describes a task. Write a response that appropriately completes the request.

 You will be given a definition of a task first, then some input of the task.
This task is about using the specified sentence and converting the sentence to Resource Description Framework (RDF) triplets of the form (subject, predicate object). The RDF triplets generated must be such that the triplets accurately capture the structure and semantics of the input sentence. The input is a sentence and the output is a list of triplets of the form [subject, predicate, object] that capture the relationships present in the sentence. When a sentence has more than 1 RDF triplet possible, the output must contain all of them.

AFC Ajax (amateurs)'s ground is Sportpark De Toekomst where Ajax Youth Academy also play.
Output: [/INST]
[
  ["AFC Ajax (amateurs)", "has ground", "Sportpark De Toekomst"],
  ["Ajax Youth Academy", "plays at", "Sportpark De Toekomst"]
] </s>
 Sure, I'd be happy to

In [13]:
dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_ratio = 0.1,
        num_train_epochs = 1,
        learning_rate = 5e-6,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 20,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "dpo_mistral_1",
    ),
    beta = 0.1,
    train_dataset = formatted_dataset,
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    max_length = max_seq_length,
    max_prompt_length = 512,
)

Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 12859/12859 [00:33<00:00, 383.69 examples/s]


In [14]:
dpo_trainer.train()

[34m[1mwandb[0m: Currently logged in as: [33mryuugamineraito[0m. Use [1m`wandb login --relogin`[0m to force relogin


Could not estimate the number of tokens of the input, floating-point operations will not be computed


| Step | Training Loss | rewards / chosen | rewards / rejected | rewards / accuracies | rewards / margins | logps / rejected | logps / chosen | logits / rejected | logits / chosen |
|---|---|---|---|---|---|---|---|---|---|
| 20 | 2.1821 | 9.05114 | 9.568521 | 0.475 | -0.517381 | -201.433014 | -195.275757 | -1.295759 | -0.597259 |
| 40 | 1.4896 | 9.328835 | 8.62027 | 0.559375 | 0.708566 | -190.02594 | -179.856842 | -1.259751 | -0.483397 |
| 60 | 1.5328 | 9.182238 | 8.385566 | 0.584375 | 0.796672 | -194.622299 | -164.885361 | -1.187251 | -0.483916 |
| 80 | 1.1708 | 9.359728 | 7.815445 | 0.653125 | 1.544284 | -201.498459 | -168.936249 | -1.141978 | -0.321024 |
| 100 | 0.9566 | 8.97077 | 6.8451 | 0.725 | 2.12567 | -205.834259 | -160.862 | -1.150811 | -0.378033 |
| 120 | 0.5764 | 8.969905 | 5.447421 | 0.825 | 3.522483 | -243.900833 | -190.908615 | -1.085047 | -0.335189 |
| 140 | 0.3063 | 8.654298 | 3.529065 | 0.884375 | 5.125234 | -239.623978 | -182.289566 | -0.880257 | -0.103448 |
| 160 | 0.3121 | 8.798752 | 2.058996 | 0.925 | 6.739756 | -254.0186 | -178.985718 | -0.885073 | -0.053368 |
| 180 | 0.1484 | 8.44116 | -0.207025 | 0.94375 | 8.648185 | -292.706604 | -177.333054 | -0.855067 | 0.11465 |
| 200 | 0.1447 | 7.50846 | -2.631932 | 0.9625 | 10.140392 | -318.287598 | -196.642792 | -0.678211 | 0.163436 |


TrainOutput(global_step=803, training_loss=0.24074620508808053, metrics={'train_runtime': 20533.4526, 'train_samples_per_second': 0.626, 'train_steps_per_second': 0.039, 'total_flos': 0.0, 'train_loss': 0.24074620508808053, 'epoch': 1.0})
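
To make the logged reward columns easier to read, here is a minimal sketch of the per-pair DPO quantities, assuming the standard sigmoid DPO loss with `beta = 0.1` as passed to `DPOTrainer` above. The function name `dpo_terms` and the log-probabilities in the example are hypothetical, chosen only for illustration. In the metrics logged above, `rewards / margins` equals `rewards / chosen` minus `rewards / rejected` (for example 9.05114 - 9.568521 = -0.517381 at step 20).

```python
import math


def dpo_terms(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Return (loss, chosen_reward, rejected_reward, margin) for one preference pair."""
    # Implicit rewards: scaled log-probability gain of the policy over the reference
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin))
    if margin >= 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))
    return loss, chosen_reward, rejected_reward, margin


# Hypothetical sequence log-probabilities for one (chosen, rejected) pair
loss, r_chosen, r_rejected, margin = dpo_terms(-160.0, -200.0, -170.0, -190.0, beta=0.1)
print(r_chosen, r_rejected, margin, round(loss, 4))
```

A growing positive margin drives the loss toward zero, which is the trend visible in the training log: the margin rises from about -0.5 to about 10 while the loss falls from 2.18 to 0.14.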

In [15]:
model.save_pretrained("dpo_mistral_1") # Local saving

In [19]:
model.save_pretrained_merged("dpo_mistral_full", tokenizer, save_method = "merged_16bit",)

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 17.95 out of 31.27 RAM for saving.


100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 52.23it/s]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Done.


### Inference

In [5]:
from transformers import TextStreamer


In [6]:
#load model if needed
dpo_model, dpo_tokenizer = FastLanguageModel.from_pretrained(
        model_name = "/mnt/d/Deep_learning/LLM/DPO/dpo_mistral_1/", # YOUR DPO MODEL DIRECTORY
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
        cache_dir = "cached"
    )
FastLanguageModel.for_inference(dpo_model) # Enable native 2x faster inference

==((====))==  Unsloth: Fast Mistral patching release 2024.2
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.69 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.1.0+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.22.post7. FA = False.
 "-____-"     Apache 2 free license: http://github.com/unslothai/unsloth


You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` attribute will be overwritten with the one you passed to `from_pretrained`.
Unsloth 2024.2 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [7]:
sft_model, sft_tokenizer = FastLanguageModel.from_pretrained(
        model_name = "/mnt/d/Deep_learning/LLM/DPO/sft_mistral_1/checkpoint-100", # YOUR SFT MODEL DIRECTORY
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
        cache_dir = "cached"
    )
FastLanguageModel.for_inference(sft_model) # Enable native 2x faster inference

==((====))==  Unsloth: Fast Mistral patching release 2024.2
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.69 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.1.0+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.22.post7. FA = False.
 "-____-"     Apache 2 free license: http://github.com/unslothai/unsloth


You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` attribute will be overwritten with the one you passed to `from_pretrained`.


In [8]:
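def apply_prompt_template(data):
    """Build the inference prompt in the same [INST] format used for DPO training.

    :param data: dict: 'system' (may be empty) and 'question' strings
    :return: str: prompt ending with [/INST]
    """
    prefix_text = 'Below is an instruction that describes a task. Write a response that ' \
               'appropriately completes the request.\n\n'
    # Samples with a system prompt.
    if len(data['system'])>0:
        prompt = f"""[INST]{data['system']} {data["question"]} [/INST]"""

    # Samples without a system prompt fall back to the generic prefix.
    else:
        prompt = f"""[INST]{prefix_text} {data["question"]} [/INST]"""
    return prompt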

In [9]:
text_streamer = TextStreamer(dpo_tokenizer)

In [10]:
inputs = dpo_tokenizer(
[
    apply_prompt_template({
        'system' :  "", # instruction
        'question' : "how can i develop a habit of drawing daily ?", # input
    })
], return_tensors = "pt").to("cuda")

In [11]:
print('DPO answer: ')
_ = dpo_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


DPO answer: 
<s> [INST]Below is an instruction that describes a task. Write a response that appropriately completes the request.

 how can i develop a habit of drawing daily ? [/INST]To develop a habit of drawing daily, follow these steps:

1. Set a specific goal: Decide on a particular drawing style or subject matter to focus on. This will help you stay motivated and focused.

2. Create a schedule: Set aside a specific time each day for drawing. Make sure it is a time when you are most productive and least likely to be disturbed.

3. Start small: Begin with a manageable amount of time each day, such as 15 minutes, and gradually increase the duration as you become more comfortable with the routine.

4. Keep a sketchbook handy: Always carry a sketchbook with you to capture ideas and inspiration whenever they strike.

5. Find a drawing buddy: Join a local art group or connect with other artists online to share ideas, tips, and encouragement.

6. Track progress: Keep a record of your dail

In [12]:
print('SFT answer: ')
_ = sft_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


SFT answer: 
<s> [INST]Below is an instruction that describes a task. Write a response that appropriately completes the request.

 how can i develop a habit of drawing daily ? [/INST]1. Set a specific goal: Decide on a specific goal for drawing daily. This could be to draw a certain number of minutes each day, or to draw a certain number of drawings each week.
2. Create a schedule: Set aside a specific time each day to draw. This could be in the morning, during lunch, or in the evening.
3. Make it a routine: Make drawing a daily routine. This could be by setting aside a specific time each day, or by creating a specific drawing space.
4. Find a drawing community: Join a drawing community to get inspiration and support. This could be an online community, or a local drawing group.
5. Keep track of progress: Keep track of progress by taking photos or keeping a journal. This will help to see how far you have come and to stay motivated.
6. Stay consistent: Stay consistent with the daily draw

In [13]:
inputs = dpo_tokenizer(
[
    apply_prompt_template({
        'system' :  "", # instruction
        'question' : "Which animal has two hands, a hyrax or a dog ?", # input
    })
], return_tensors = "pt").to("cuda")

In [14]:
print('DPO answer: ')
_ = dpo_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


DPO answer: 
<s> [INST]Below is an instruction that describes a task. Write a response that appropriately completes the request.

 Which animal has two hands, a hyrax or a dog ? [/INST]Neither a hyrax nor a dog has two hands. </s>


In [15]:
print('SFT answer: ')
_ = sft_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


SFT answer: 
<s> [INST]Below is an instruction that describes a task. Write a response that appropriately completes the request.

 Which animal has two hands, a hyrax or a dog ? [/INST]A hyrax has two hands.</s>


In [16]:
inputs = dpo_tokenizer(
[
    apply_prompt_template({
        'system' :  "Continue the fibonnaci sequence.", # instruction
        'question' : "1, 1, 2, 3, 5, 8", # input
    })
], return_tensors = "pt").to("cuda")

In [17]:
print('DPO answer: ')
_ = dpo_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


DPO answer: 
<s> [INST]Continue the fibonnaci sequence. 1, 1, 2, 3, 5, 8 [/INST]13 

# Explanation 
# Fibonacci sequence is generated by adding two previous numbers. Here are the inputs Given sequence: 1, 1, 2, 3, 5, 8 [/INST]"""
Continue the fibonacci sequence. 

Given sequence: 1, 1, 2, 3, 5, 8 
"""

def continue_fibonacci_sequence(sequence):
   # Calculate the next number in the sequence
   next_number = sequence[-1] + sequence[-2]
   
   return next_number

sequence = [1, 1, 2, 3, 5, 8]
print(continue_fibonacci_sequence(sequence)) </s>


In [18]:
print('SFT answer: ')
_ = sft_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


SFT answer: 
<s> [INST]Continue the fibonnaci sequence. 1, 1, 2, 3, 5, 8 [/INST]1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155


In [19]:
inputs = dpo_tokenizer(
[
    apply_prompt_template({
        'system' :  "", # instruction
        'question' : "What is a famous tall tower in Paris?", # input
    })
], return_tensors = "pt").to("cuda")

In [20]:
print('DPO answer: ')
_ = dpo_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


DPO answer: 
<s> [INST]Below is an instruction that describes a task. Write a response that appropriately completes the request.

 What is a famous tall tower in Paris? [/INST]The Eiffel Tower is a famous tall tower in Paris. It is an iconic landmark and tourist attraction, known for its unique design and impressive height. </s>


In [21]:
print('SFT answer: ')
_ = sft_model.generate(**inputs, streamer = text_streamer, max_new_tokens = 256)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


SFT answer: 
<s> [INST]Below is an instruction that describes a task. Write a response that appropriately completes the request.

 What is a famous tall tower in Paris? [/INST]The Eiffel Tower is a famous tall tower in Paris. It was built in 1889 and is 324 meters tall. It is one of the most popular tourist attractions in the world.</s>
