# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (SFT) is a crucial step in RLHF. TRL provides an easy-to-use API for creating SFT models and training them on your dataset with a few lines of code.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [2]:
!pip3 install peft==0.7.1
!pip3 install trl==0.7.4
!pip3 install transformers==4.36.2


In [3]:
import transformers
transformers.__version__



'4.36.2'

In [4]:
import trl
trl.__version__



'0.7.4'

In [5]:
import os
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

## Instruction-Tuning
Train on completions only
- Use `DataCollatorForCompletionOnlyLM` to compute the loss on the completion (response) tokens only, so the model is not trained to reproduce the prompt.
- This works only when `packing=False`.
- To instantiate the collator for instruction data, pass a response template and the tokenizer.
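
The idea behind completion-only training can be sketched in plain Python. This is a simplified illustration, not TRL's implementation: the helper names and the token ids are made up, and the only assumption carried over from PyTorch is that a label of `-100` is ignored by the cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern in sequence, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids into labels, masking everything up to and including the template."""
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Template not found: in this sketch the whole example is excluded from the loss.
        return [IGNORE_INDEX] * len(input_ids)
    response_start = start + len(response_template_ids)
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]


# Made-up token ids: 7, 8 stand in for the tokens of "### Response:".
input_ids = [1, 2, 3, 7, 8, 40, 41, 42]
labels = mask_prompt_labels(input_ids, [7, 8])
print(labels)  # [-100, -100, -100, -100, -100, 40, 41, 42]
```

Only the last three positions keep their labels, so only the response tokens contribute to the loss.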

In [7]:
# Step 1: Load the dataset
from datasets import load_dataset

train_set = load_dataset('json', data_files='alpaca_data.json', split='train')
eval_set = load_dataset("tatsu-lab/alpaca_eval", split='eval')
eval_set = eval_set.remove_columns(["generator", "dataset"])
eval_set

Dataset({
    features: ['instruction', 'output'],
    num_rows: 805
})

In [8]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "distilgpt2"

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map = 'auto')

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)

tokenizer.pad_token = tokenizer.eos_token

# Set max_seq_length explicitly; if omitted, SFTTrainer defaults to min(tokenizer.model_max_length, 1024).
max_seq_length = min(tokenizer.model_max_length, 1024)
max_seq_length



1024

In [9]:
def formatting_prompts_func(examples):
	output_texts = []

	for i in range(len(examples['instruction'])):
		if 'input' in examples.keys():
			input_text = examples["input"][i] 
		else:
			input_text = None
	
		if input_text:
			text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{examples["instruction"][i]}

### Input:
{input_text}

### Response:
{examples["output"][i]}
""".strip()
			
		else:
			text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{examples["instruction"][i]}

### Response:
{examples["output"][i]}
""".strip()

		output_texts.append(text)

	return output_texts

In [10]:
# Use DataCollatorForCompletionOnlyLM so that the loss is computed on the response tokens only
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
from transformers import TrainingArguments

In [12]:
response_template = "### Response:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

output_path = './model'
model_path = './model/finalmodel'

training_args = TrainingArguments(
    output_dir = output_path, #default = 'tmp_trainer'
    save_strategy = 'epoch',
    gradient_checkpointing = True,
    per_device_train_batch_size = 2,
    per_device_eval_batch_size = 2,
    num_train_epochs = 3, #default = 3
)

trainer = SFTTrainer(
    model,
    args = training_args,
    train_dataset = train_set.select(range(1000)),
    eval_dataset = eval_set,
    formatting_func = formatting_prompts_func,
    data_collator = collator,
    max_seq_length = max_seq_length,
)

trainer.train()

Map: 100%|██████████| 805/805 [00:00<00:00, 5094.26 examples/s]

{'loss': 2.3564, 'learning_rate': 3.3333333333333335e-05, 'epoch': 1.0}
{'loss': 2.0647, 'learning_rate': 1.6666666666666667e-05, 'epoch': 2.0}
{'loss': 1.8213, 'learning_rate': 0.0, 'epoch': 3.0}

100%|██████████| 1500/1500 [36:43<00:00,  1.47s/it]

{'train_runtime': 2203.1508, 'train_samples_per_second': 1.362, 'train_steps_per_second': 0.681, 'train_loss': 2.080772501627604, 'epoch': 3.0}


TrainOutput(global_step=1500, training_loss=2.080772501627604, metrics={'train_runtime': 2203.1508, 'train_samples_per_second': 1.362, 'train_steps_per_second': 0.681, 'train_loss': 2.080772501627604, 'epoch': 3.0})
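
The numbers in the training log can be reproduced with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, an initial learning rate of 5e-5, and a linear decay to zero without warmup; none of these are set explicitly in the cell above, so treat them as assumptions about the defaults rather than confirmed settings.

```python
import math

num_examples = 1000        # train_set.select(range(1000))
batch_size = 2             # per_device_train_batch_size, single device assumed
num_epochs = 3
train_runtime = 2203.1508  # seconds, taken from the log above

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 1500

# Assumption: initial learning rate 5e-5 with linear decay to zero and no warmup.
initial_lr = 5e-5


def linear_lr(step):
    """Learning rate after `step` optimizer steps under linear decay."""
    return initial_lr * (total_steps - step) / total_steps


# The log reports one entry per 500 steps, which here coincides with each epoch.
for step in (500, 1000, 1500):
    print(step / steps_per_epoch, linear_lr(step))

print(round(total_steps / train_runtime, 3))                # 0.681 steps per second
print(round(num_examples * num_epochs / train_runtime, 3))  # 1.362 samples per second
```

Under these assumptions the computed learning rates match the logged values of roughly 3.33e-05, 1.67e-05, and 0.0, and the throughput figures match `train_steps_per_second` and `train_samples_per_second`.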

In [13]:
# save model
trainer.save_model(model_path)

In [52]:
from transformers import pipeline

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map = 'auto')

text_generator = pipeline(
    "text-generation",
    model = model,
    tokenizer = tokenizer,
    device_map = 'auto',
    pad_token_id = tokenizer.eos_token_id,
    max_new_tokens = 50
)

In [53]:
def format_input(sample):
	
	if sample.get('input'):  # skip the Input section when the field is missing or empty
		return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample['instruction']}

### Input:
{sample['input']}

### Response:
""".strip()
			
	else:
		return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample['instruction']}

### Response:
""".strip()

In [70]:

def compare_generated_and_gold(generator, sample):
    """Print the gold response next to the model's generated response."""
    formatted_input = format_input(sample)
    output = generator(formatted_input)
    # The pipeline returns prompt + completion; keep only the text after the template.
    generated_response = output[0]['generated_text'].split("### Response:")[-1].strip()

    print(f"Instruction:\n{sample['instruction']}\n")
    if sample.get('input'):
        print(f"Input:\n{sample['input']}\n")
    print(f"Gold Response:\n{sample['output']}\n")
    print(f"Generated Response:\n{generated_response}\n")

# Test the function with an example from the eval_set
compare_generated_and_gold(text_generator, eval_set[2])


Instruction:
Hi, my sister and her girlfriends want me to play kickball with them. Can you explain how the game is played, so they don't take advantage of me?

Gold Response:
Kickball is a game similar to baseball, but with a large rubber ball instead of a bat and a ball. The game is usually played with two teams of six players each. Each team has three bases and a home plate. The players on the kicking team line up at home plate and take turns kicking the ball. The object of the game is to score runs by running around all three bases and back to home plate without being tagged out by the defense. The team with the most runs at the end of the game is the winner.

Generated Response:
The game is played by a team of five players from around the world. The team consists of five or six members of the team. The team will advance to the final round as a result. When your team reaches the last round, they will



## Findings
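The generated response diverges from the gold response: it is fluent but does not explain how kickball is played, so the model has not yet learned to follow the instruction. Training loss fell from 2.36 after the first epoch to 1.82 after the third, which is promising, but the run used only 1,000 training examples and a small model (distilgpt2), so further improvement is needed. Generation was also capped at 50 new tokens, which is why the response stops mid-sentence.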

## Recommendations

### 1. More Training Data
- **Action**: Expand the training dataset with more diverse examples.
- **Purpose**: To improve generalization over different types of inputs.

### 2. Longer Training Periods
- **Action**: Increase the number of training epochs.
- **Purpose**: To allow the model to learn more detailed features and relationships.

### 3. Fine-Tuning on Specific Tasks
- **Action**: Fine-tune the model on tasks similar to the expected deployment scenarios.
- **Purpose**: To enhance the model's accuracy on specific tasks like understanding and generating game explanations.

### 4. Improved Model Architecture
- **Action**: Try a larger or more recent causal language model. distilgpt2 is already a Transformer-based GPT model, and encoder-only models such as BERT are not designed for text generation.
- **Purpose**: To better handle nuances and improve context understanding.

### 5. Hyperparameter Optimization
- **Action**: Experiment with different hyperparameters.
- **Purpose**: To find the optimal configuration that maximizes model performance.

### 6. Advanced Evaluation Metrics
- **Action**: Implement additional metrics such as BLEU or ROUGE for a more detailed performance evaluation.
- **Purpose**: To gain deeper insights into the model's output quality beyond just loss.
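
As a rough illustration of what an overlap metric measures, here is a minimal unigram-overlap F1 in plain Python. It is a simplified stand-in in the spirit of ROUGE-1, not the official ROUGE implementation: it lowercases and splits on whitespace, with no stemming or other normalization. The function name and the two example sentences are made up for the illustration.

```python
from collections import Counter


def unigram_f1(reference, candidate):
    """F1 over unigram overlap between a reference and a candidate text."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


gold = "the team with the most runs wins"
generated = "the team will advance to the final round"
print(round(unigram_f1(gold, generated), 3))  # 0.4
```

A score like this rewards shared words rather than correct content, so it is best read alongside manual inspection of the generated responses.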

### 7. Human-in-the-Loop
- **Action**: Incorporate human feedback during training phases.
- **Purpose**: To correct and guide the model's learning process more effectively.

### 8. Regular Validation
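- **Action**: Evaluate regularly on new, unseen data. In this run an `eval_dataset` was passed to the trainer, but no evaluation loss was logged during training.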
- **Purpose**: To ensure consistent model performance and adapt training strategies as needed.
