# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (SFT) is a crucial step in RLHF. TRL provides an easy-to-use API for creating SFT models and training them on your dataset with a few lines of code.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [1]:
# !pip3 install peft==0.7.1
# !pip3 install trl==0.7.4
# !pip3 install transformers==4.36.2
# !pip3 install pydantic==1.10.9
# !pip3 install datasets==2.18.0

In [2]:
import transformers
transformers.__version__

'4.36.2'

In [3]:
import trl
trl.__version__



'0.7.4'

In [4]:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Instruction-Tuning
Train on completions only
- Use the DataCollatorForCompletionOnlyLM to compute the training loss on the completions only (the text after the response template), not on the prompts.
- Note that this works only when packing=False.
- To instantiate the collator for instruction data, pass a response template and the tokenizer.
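To make the idea concrete, here is a minimal pure-Python sketch of completion-only masking. It is not TRL's implementation: whitespace-separated words stand in for token IDs, and `mask_prompt_labels` is a hypothetical helper written for illustration. The value -100 is the label index that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_prompt_labels(tokens, response_template):
    """Return labels in which everything up to and including the response
    template is replaced by IGNORE_INDEX (hypothetical helper for illustration)."""
    n = len(response_template)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == response_template:
            end = start + n
            return [IGNORE_INDEX] * end + tokens[end:]
    # Template not found: in this sketch the whole example is masked.
    return [IGNORE_INDEX] * len(tokens)


# Toy example: whitespace "tokens" stand in for real token IDs.
tokens = "### Instruction: Say hi ### Output: Hi there".split()
labels = mask_prompt_labels(tokens, "### Output:".split())
print(labels)
```

Only the positions after the template keep their labels, so only the completion contributes to the loss.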

In [5]:
# Step 1: Load the dataset
from datasets import load_dataset
train_json_url = 'https://github.com/tatsu-lab/stanford_alpaca/raw/main/alpaca_data.json'
dataset = load_dataset("json", data_files=train_json_url)
dataset

DatasetDict({
    train: Dataset({
        features: ['instruction', 'input', 'output'],
        num_rows: 52002
    })
})

In [6]:
dataset['eval'] = load_dataset('tatsu-lab/alpaca_eval', split='eval')
dataset


DatasetDict({
    train: Dataset({
        features: ['instruction', 'input', 'output'],
        num_rows: 52002
    })
    eval: Dataset({
        features: ['instruction', 'output', 'generator', 'dataset'],
        num_rows: 805
    })
})

In [7]:
dataset['train'][20000]

{'instruction': 'Name the given musical note.',
 'input': '(A musical note)',
 'output': 'The musical note is an F sharp.'}

In [8]:
dataset['eval'][100]

{'instruction': 'I like to host guests at my home from time to time, and I am gathering  recipes of different dishes and drinks to keep things interesting. I am interested in trying some Indonesian dishes. Can you give me a recipe for Tahu Gejrot Cirebon?',
 'output': 'Ingredients: \n- 2 tablespoons of sweet soy sauce \n- 2 tablespoons of chili sauce \n- 2 tablespoons of vinegar \n- 2 tablespoons of sugar \n- 1 tablespoon of ground ginger \n- 2 tablespoons of vegetable oil \n- 2 cloves of garlic, minced \n- 1/4 teaspoon of ground pepper \n- 1/2 teaspoon of ground cumin \n- 1/4 teaspoon of ground nutmeg \n- 2 tablespoons of tomato paste \n- 2 packages of firm tofu, cut into cubes \n- 2 tablespoons of chopped shallots\n- 2 tablespoons of chopped scallions\n- 2 tablespoons of chopped celery\n- 2 tablespoons of chopped chilies\n\nInstructions:\n1. In a medium bowl, mix together the sweet soy sauce, chili sauce, vinegar, sugar, ground ginger, vegetable oil, garlic, ground pepper, ground cum

In [9]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

In [10]:
max_seq_length = min(tokenizer.model_max_length, 1024)
max_seq_length

1024

### Format prompts

In [11]:
for i in dataset['train']:
    print(i)
    break

{'instruction': 'Give three tips for staying healthy.', 'input': '', 'output': '1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.'}


In [12]:
dataset['train'][0]

{'instruction': 'Give three tips for staying healthy.',
 'input': '',
 'output': '1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.'}

In [13]:
for sample in dataset['eval']:
    break
sample['input'] if 'input' in sample.keys() else ''

''

In [14]:
def format_instruction(examples):
    """Format a batch of examples (a dict of lists) into prompt strings."""
    outputs = []
    for i in range(len(examples['instruction'])):
        # The eval split has no 'input' column, so fall back to an empty string.
        input_text = examples['input'][i] if 'input' in examples.keys() else ''
        outputs.append(f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{examples['instruction'][i]}

### Input:
{input_text}

### Output:
{examples['output'][i]}
""".strip())

    return outputs
	
format_instruction(dataset['train'][:2])

['Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGive three tips for staying healthy.\n\n### Input:\n\n\n### Output:\n1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat are the three primary colors?\n\n### Input:\n\n\n### Output:\nThe three primary colors are red, blue, and yellow.']

In [15]:
dataset['train'][:2]

{'instruction': ['Give three tips for staying healthy.',
  'What are the three primary colors?'],
 'input': ['', ''],
 'output': ['1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
  'The three primary colors are red, blue, and yellow.']}

In [16]:
print(format_instruction(dataset['train'][[0]])[0])
print('\n'+'='*50+'\n')
print(format_instruction(dataset['eval'][[10]])[0])

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Give three tips for staying healthy.

### Input:


### Output:
1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. 
2. Exercise regularly to keep your body active and strong. 
3. Get enough sleep and maintain a consistent sleep schedule.

==================================================

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
do you think retinoid is effective on removing the acne? because I have a lot of it

### Input:


### Output:
Yes, retinoids are effective in treating acne. They work by increasing cell turnover, which helps to reduce the appearance of existing acne and prevent new outbreaks. Retinoids also help to unclog pores, which in turn reduces the amount of bacteria that can cause infect

In [17]:
# Use DataCollatorForCompletionOnlyLM so that the loss is computed on the completions only (the tokens after the response template)
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
response_template = "### Output:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')

In [18]:
from datasets import DatasetDict

In [19]:
file_path = '.'

In [20]:
# Step 3: Define the Trainer
from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir=f'{file_path}/model', #default = 'tmp_trainer'
    num_train_epochs=5, #default = 3
    evaluation_strategy='epoch',
    save_strategy='epoch',
    per_device_eval_batch_size=8,
    per_device_train_batch_size=8,
    logging_steps=1, # has no effect while logging_strategy='epoch'
    logging_strategy='epoch'
)

trainer = SFTTrainer(
    model,
    args=training_args,
    train_dataset=dataset['train'].select(range(1000)),
    eval_dataset=dataset['eval'],
    formatting_func=format_instruction,
    data_collator=collator,
    max_seq_length=max_seq_length
)

trainer.train() 

{'loss': 2.6941, 'learning_rate': 4e-05, 'epoch': 1.0}
{'eval_loss': 2.2960920333862305, 'eval_runtime': 70.0126, 'eval_samples_per_second': 11.498, 'eval_steps_per_second': 1.443, 'epoch': 1.0}
{'loss': 2.3693, 'learning_rate': 3e-05, 'epoch': 2.0}
{'eval_loss': 2.305957555770874, 'eval_runtime': 65.0135, 'eval_samples_per_second': 12.382, 'eval_steps_per_second': 1.554, 'epoch': 2.0}
{'loss': 2.1768, 'learning_rate': 2e-05, 'epoch': 3.0}
{'eval_loss': 2.3381845951080322, 'eval_runtime': 60.3404, 'eval_samples_per_second': 13.341, 'eval_steps_per_second': 1.674, 'epoch': 3.0}
{'loss': 2.0642, 'learning_rate': 1e-05, 'epoch': 4.0}
{'eval_loss': 2.3685972690582275, 'eval_runtime': 64.3247, 'eval_samples_per_second': 12.515, 'eval_steps_per_second': 1.57, 'epoch': 4.0}
{'loss': 1.9873, 'learning_rate': 0.0, 'epoch': 5.0}
{'eval_loss': 2.3767008781433105, 'eval_runtime': 66.3878, 'eval_samples_per_second': 12.126, 'eval_steps_per_second': 1.521, 'epoch': 5.0}
{'train_runtime': 892.4502, 'train_samples_per_second': 5.603, 'train_steps_per_second': 0.7, 'train_loss': 2.258316552734375, 'epoch': 5.0}

TrainOutput(global_step=625, training_loss=2.258316552734375, metrics={'train_runtime': 892.4502, 'train_samples_per_second': 5.603, 'train_steps_per_second': 0.7, 'train_loss': 2.258316552734375, 'epoch': 5.0})
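The step count and the loss trend reported by this run can be checked with a little arithmetic. The sketch below assumes a single device and no gradient accumulation (an assumption, though it is consistent with the 625 reported steps). The loss values are copied from the training log and rounded to four decimals.

```python
import math

train_samples = 1000   # dataset['train'].select(range(1000))
batch_size = 8         # per_device_train_batch_size
epochs = 5             # num_train_epochs

# Assumption: one device, no gradient accumulation.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 125 625

# Per-epoch losses copied from the training log (rounded to four decimals).
train_loss = [2.6941, 2.3693, 2.1768, 2.0642, 1.9873]
eval_loss = [2.2961, 2.3060, 2.3382, 2.3686, 2.3767]

# Epoch (1-based) with the lowest evaluation loss.
best_epoch = min(range(epochs), key=lambda e: eval_loss[e]) + 1
print(best_epoch)  # 1
```

The training loss falls every epoch while the evaluation loss is lowest after epoch 1 and rises afterwards. This may indicate overfitting to the 1,000-example training subset, although the evaluation set also comes from a different source than the training data.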

In [21]:
trainer.save_model('./model')

### Inference

In [None]:
file_path = '.'

In [127]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_path = f'{file_path}/model/checkpoint-625'

model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map = 'auto')

tokenizer = AutoTokenizer.from_pretrained(
    model_path)

text_generator = pipeline(
    task='text-generation',
    model=model,
    tokenizer=tokenizer,
    device_map = 'auto',
    # Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
    pad_token_id = tokenizer.eos_token_id
)

In [171]:
def format_singular_prompt(sample):
    """Build the inference prompt for one example; it ends at the response template."""
    # Samples without an 'input' field (e.g. the eval split) get an empty input.
    input_text = sample['input'] if 'input' in sample.keys() else ''
    return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample['instruction']}

### Input:
{input_text}

### Output:
""".strip()

	
print(format_singular_prompt(dataset['train'][2]))

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Describe the structure of an atom.

### Input:


### Output:

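The inference prompt has to match the training text up to and including the response template, otherwise the model sees a format it was not trained on. One way to keep the two in sync is to build both from a single template. The following is a sketch only: `TEMPLATE` and `build_text` are hypothetical names that do not appear in the notebook.

```python
# Single source of truth for the prompt layout (hypothetical names).
TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Output:\n{output}"
)


def build_text(sample, with_output=True):
    """Return the full training text, or only the prompt when with_output=False."""
    return TEMPLATE.format(
        instruction=sample["instruction"],
        input=sample.get("input", ""),
        output=sample["output"] if with_output else "",
    ).strip()


sample = {
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow.",
}
training_text = build_text(sample)
prompt = build_text(sample, with_output=False)

# The prompt is an exact prefix of the training text.
print(training_text.startswith(prompt))
print(repr(training_text[len(prompt):]))
```

Because `.strip()` removes the trailing newline, the prompt ends at `### Output:` and the completion begins with the newline that follows it.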

In [175]:
from IPython.display import HTML, display

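def compare_output(sample, max_length=125, temperature=0.7):
    """Display the gold output and the model's generated output side by side."""
    # max_length counts the prompt tokens as well as the newly generated tokens.
    generated_text = text_generator(format_singular_prompt(sample),
                                    max_length = max_length,
                                    temperature = temperature)
    generated_text = generated_text[0]['generated_text']
    # The prompt ends with '### Output:' (its trailing newline is stripped),
    # so split on the marker itself and trim the surrounding whitespace.
    generated_output = generated_text.split('### Output:')[-1].strip()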
    html_content = f"""
    <p><b>Instruction:</b></p>
    <p>{sample['instruction']}</p>
    <table>
        <tr>
            <td>Gold output</td>
            <td>Generated output</td>
        </tr>
        <tr>
            <td>{sample['output']}</td>
            <td>{generated_output}</td>
        </tr>
    </table>
    """
    display(HTML(html_content))
    # del generated_text
    # del generated_output
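Because the instruction, gold output, and generated text are interpolated directly into HTML, characters such as `<` or `&` would be read as markup, and plain newlines are collapsed when the table is rendered. A possible refinement, sketched below with the standard-library `html.escape`, is to escape each string before embedding it. `to_table_cell` is a hypothetical helper, not part of the notebook.

```python
import html


def to_table_cell(text):
    """Escape text and wrap it in a <td> element (hypothetical helper)."""
    # html.escape turns &, <, > (and quotes) into entities so they display literally.
    escaped = html.escape(text)
    # Keep line breaks visible, since HTML collapses plain newlines.
    return "<td>" + escaped.replace("\n", "<br>") + "</td>"


print(to_table_cell("1 < 2 & 3 > 2\nnext line"))
```

Inside `compare_output`, cells built this way could replace the raw `{sample['output']}` and `{generated_output}` interpolations.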

In [176]:
compare_output(dataset['eval'][10])

| Gold output | Generated output |
| --- | --- |
| Yes, retinoids are effective in treating acne. They work by increasing cell turnover, which helps to reduce the appearance of existing acne and prevent new outbreaks. Retinoids also help to unclog pores, which in turn reduces the amount of bacteria that can cause infections. In general, retinoids help to reduce inflammation and oil production, making them a great option for those with acne. | I think retinoid can help reduce the risk of my acne. It helps reduce the risk of my acne by reducing the risk of my skin becoming acne-free. Additionally, retinoids can help reduce the risk of my skin becoming acne-free. Additionally, retinoids can help reduce the risk of my skin becoming |


In [177]:
compare_output(sample=dataset['eval'][100], max_length=300, temperature=1.0)

| Gold output | Generated output |
| --- | --- |
| Ingredients: - 2 tablespoons of sweet soy sauce - 2 tablespoons of chili sauce - 2 tablespoons of vinegar - 2 tablespoons of sugar - 1 tablespoon of ground ginger - 2 tablespoons of vegetable oil - 2 cloves of garlic, minced - 1/4 teaspoon of ground pepper - 1/2 teaspoon of ground cumin - 1/4 teaspoon of ground nutmeg - 2 tablespoons of tomato paste - 2 packages of firm tofu, cut into cubes - 2 tablespoons of chopped shallots - 2 tablespoons of chopped scallions - 2 tablespoons of chopped celery - 2 tablespoons of chopped chilies Instructions: 1. In a medium bowl, mix together the sweet soy sauce, chili sauce, vinegar, sugar, ground ginger, vegetable oil, garlic, ground pepper, ground cumin, and ground nutmeg. 2. Heat the tomato paste in a large skillet over medium-high heat. 3. Add the tofu to the skillet and stir to coat. 4. Pour the sauce mixture over the tofu and stir to coat. 5. Add the shallots, scallions, celery, and chilies and stir to combine. 6. Cook until the tofu is golden brown and crispy, about 10 minutes. 7. Serve hot with steamed white rice or freshly cooked noodles. Enjoy! | I like to host guests at my home from time to time, and I am gathering recipes of different dishes and drinks to keep things interesting. I am interested in trying some Indonesian dishes. Can you give me a recipe for Tahu Gejrot Cirebon? Ingredients: 1. Pre-baked 2. Oat-like 3. Tapioca 4. Cumin 5. Stir in 6. Coconut oil 7. Fruit. 8. Salt 9. Banana Juice 10. Bananjali 11. Bananjali 12. Simmer 1 tablespoon coconut oil. 13. 14. 15. 16. 17. 18. 19. |
