## Install dependencies

In [1]:
!pip install -q -U bitsandbytes peft trl datasets transformers wandb


## Import the Libraries

In [2]:
from enum import Enum
import os

import torch
from datasets import load_dataset
from peft import LoraConfig, TaskType
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
from trl import SFTConfig, SFTTrainer

seed = 42
set_seed(seed)  # seeds Python, NumPy and PyTorch for reproducibility

# Paste your Hugging Face token here (needed to download Gemma and to push to the Hub)
os.environ['HF_TOKEN'] = ""

## Processing datasets into inputs

In order to train the model, we need to **format the inputs into what we want the model to learn**.

For this tutorial, I enhanced a popular function-calling dataset, "NousResearch/hermes-function-calling-v1", by adding a new **thinking** step computed with **deepseek-ai/DeepSeek-R1-Distill-Qwen-32B**.

But in order for the model to learn, we need **to format the conversation correctly**. If you followed Unit 1, you know that going from a list of messages to a prompt is handled by the **chat_template**. However, the default chat_template of gemma-2-2b-it does not support tool calls, so we need to modify it!

This is the role of our **preprocess** function: turning a list of messages into a prompt that the model can understand.
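The core of `preprocess` is folding the system prompt into the first user turn, since our custom template raises an error on a `system` role. Below is a minimal, tokenizer-free sketch of just that step. `merge_system_into_user` and `PLANNING_HINT` are illustrative names, and unlike `preprocess` this version copies the messages instead of mutating them in place.

```python
# Hypothetical helper isolating the merge step of `preprocess` (no tokenizer needed).
# PLANNING_HINT is the same instruction string that `preprocess` inserts.
PLANNING_HINT = (
    "Also, before making a call to a function take the time to plan the function to take. "
    "Make that thinking process between <think>{your thoughts}</think>\n\n"
)


def merge_system_into_user(messages):
    """Return a new message list with a leading system message folded into the next turn."""
    messages = [dict(m) for m in messages]  # shallow copies: the caller's list is untouched
    if len(messages) > 1 and messages[0]["role"] == "system":
        system = messages.pop(0)
        messages[0]["content"] = system["content"] + PLANNING_HINT + messages[0]["content"]
    return messages


conversation = [
    {"role": "system", "content": "You are a function calling AI model. "},
    {"role": "human", "content": "Hi, I need to convert 500 USD to Euros."},
]
merged = merge_system_into_user(conversation)
print(len(conversation), len(merged))  # 2 1
print(merged[0]["role"])  # human
```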


In [None]:
model_name = "google/gemma-2-2b-it"
dataset_name = "Jofthomas/hermes-function-calling-thinking-V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

def preprocess(sample):
    messages = sample['messages']
    first_message = messages[0]
    # Instead of adding a system message, we merge the content into the first user message 
    if first_message['role'] == 'system':
        system_message_content = first_message['content']
        messages[1]['content'] = system_message_content + "Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>\n\n" + messages[1]['content']
        messages.pop(0)
    
    return {'text': tokenizer.apply_chat_template(messages, tokenize=False)}


dataset = load_dataset(dataset_name)
dataset = dataset.rename_column("conversations", "messages")
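The one-line Jinja template above is dense. As a plain-Python sketch of what it renders, the function below mirrors it turn by turn, assuming Gemma's `bos_token` is `<bos>` (as in the printed examples). `render_chat` is a hypothetical name; the actual conversion is still done by `tokenizer.apply_chat_template`.

```python
# Plain-Python mirror of the custom Jinja chat_template (illustration only).
BOS = "<bos>"  # assumption: Gemma's bos_token


def render_chat(messages, add_generation_prompt=False):
    """Turn a list of {'role', 'content'} dicts into a Gemma-style prompt string."""
    if messages and messages[0]["role"] == "system":
        # Same behaviour as raise_exception('System role not supported') in the template
        raise ValueError("System role not supported")
    prompt = BOS
    for message in messages:
        # In Jinja, `| trim` binds tighter than `+`, so only the content is stripped
        prompt += (
            "<start_of_turn>" + message["role"] + "\n"
            + message["content"].strip() + "<end_of_turn><eos>\n"
        )
    if add_generation_prompt:
        prompt += "<start_of_turn>model\n"
    return prompt


print(render_chat([{"role": "human", "content": " Hi there! "}], add_generation_prompt=True))
```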


In [17]:
dataset = dataset.map(preprocess, remove_columns="messages")
dataset = dataset['train'].train_test_split(0.1)
print(dataset)


DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 2601
    })
    test: Dataset({
        features: ['text'],
        num_rows: 290
    })
})


Let's manually look at what an input looks like!

In this example we have:

1. A *User message* containing the **instructions and the list of available tools** between `<tools></tools>`, followed by the user query, for instance `"Can you get me the latest news headlines for the United States?"`.

2. An *Assistant message*, here called "model" to match Gemma's role naming, containing two new phases: a **"thinking"** phase enclosed in `<think></think>` and an **"Act"** phase enclosed in `<tool_call></tool_call>`.

3. If the model's message contains a `<tool_call>`, we append the result of that action in a new **"Tool"** message containing a `<tool_response></tool_response>` with the tool's output.
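Once the model produces turns in this shape, the thinking and the tool calls can be pulled out with plain string matching. A minimal sketch, assuming each call body is either JSON or a Python-style dict literal (the tool list in these examples uses single-quoted dicts, so `ast.literal_eval` is used as a fallback). `parse_model_turn` and the `get_news_headlines` call are hypothetical.

```python
import ast
import json
import re


def parse_model_turn(text):
    """Return (thinking, tool_calls) from a model turn using <think> and <tool_call> tags."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    calls = []
    for body in re.findall(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL):
        body = body.strip()
        try:
            calls.append(json.loads(body))
        except json.JSONDecodeError:
            # Fall back to Python literal syntax, e.g. {'name': ..., 'arguments': {...}}
            calls.append(ast.literal_eval(body))
    thinking = think.group(1).strip() if think else None
    return thinking, calls


turn = (
    "<think>The user wants US headlines, so I will call get_news_headlines.</think>"
    "<tool_call>\n{'name': 'get_news_headlines', 'arguments': {'country': 'United States'}}\n</tool_call>"
)
thinking, calls = parse_model_turn(turn)
print(calls[0]["name"])  # get_news_headlines
```

Using non-greedy `(.*?)` with `re.DOTALL` keeps several `<tool_call>` blocks in one turn separate while still allowing newlines inside each block.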

In [18]:
print(dataset["train"][8]["text"])

<bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'create_todo', 'description': 'Create a new todo item', 'parameters': {'type': 'object', 'properties': {'title': {'type': 'string', 'description': 'The title of the todo item'}, 'description': {'type': 'string', 'description': 'The description of the todo item'}, 'due_date': {'type': 'string', 'format': 'date', 'description': 'The due date of the todo item'}}, 'required': ['title', 'description']}}}, {'type': 'function', 'function': {'name': 'generate_password', 'description': 'Generate a random password', 'parameters': {'type': 'object', 'properties': {'length': {'type': 'integer', 'description': 'The length of the password'}, 'include_numbers'

## Modify the Tokenizer

Indeed, as we saw in Unit 1, the tokenizer splits text into sub-words by default. This is **not** what we want for our new special tokens!

While we segmented our example using `<think>`, `<tool_call>`, and `<tool_response>`, the tokenizer does **not** yet treat them as whole tokens—it still tries to break them down into smaller pieces. To ensure the model correctly interprets our new format, we must **add these tokens** to our tokenizer.

Additionally, because we reload the tokenizer here with the new special tokens, we must set our custom `chat_template` on it again so that it matches the format produced by our **preprocess** function.
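To build intuition for why this matters, here is a toy illustration. It is not how Gemma's SentencePiece tokenizer works internally: the hypothetical `toy_subword_split` just chops text into 3-character pieces as a stand-in for sub-word splitting, while `toy_tokenize` first matches registered special tokens and keeps them whole, which is the effect that registering them via `additional_special_tokens` is meant to have.

```python
import re

SPECIAL_TOKENS = ["<think>", "</think>", "<tool_call>", "</tool_call>",
                  "<tool_response>", "</tool_response>", "<tools>", "</tools>"]


def toy_subword_split(chunk, piece_len=3):
    """Stand-in for a real sub-word tokenizer: chop text into fixed-size pieces."""
    return [chunk[i:i + piece_len] for i in range(0, len(chunk), piece_len)]


def toy_tokenize(text, special_tokens=SPECIAL_TOKENS):
    """Match special tokens first and keep them whole; sub-word split everything else."""
    # Longest tokens first, so a longer token wins when two could match at one position
    ordered = sorted(special_tokens, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(t) for t in ordered) + ")"
    tokens = []
    for part in re.split(pattern, text):  # the capturing group keeps the matched tokens
        if not part:
            continue
        if part in special_tokens:
            tokens.append(part)
        else:
            tokens.extend(toy_subword_split(part))
    return tokens


print(toy_subword_split("<think>plan</think>"))
# ['<th', 'ink', '>pl', 'an<', '/th', 'ink', '>']
print(toy_tokenize("<think>plan</think>"))
# ['<think>', 'pla', 'n', '</think>']
```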

In [24]:
class ChatmlSpecialTokens(str, Enum):
    tools = "<tools>"  # must match the <tools></tools> tags used in the dataset
    eotools = "</tools>"
    think = "<think>"
    eothink = "</think>"
    tool_call = "<tool_call>"
    eotool_call = "</tool_call>"
    tool_response = "<tool_response>"
    eotool_response = "</tool_response>"
    pad_token = "<pad>"
    eos_token = "<eos>"

    @classmethod
    def list(cls):
        return [c.value for c in cls]

tokenizer = AutoTokenizer.from_pretrained(
    model_name, 
    pad_token=ChatmlSpecialTokens.pad_token.value, 
    additional_special_tokens=ChatmlSpecialTokens.list()
)

tokenizer.chat_template = "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{{ '<start_of_turn>' + message['role'] + '\n' + message['content'] | trim + '<end_of_turn><eos>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

In [25]:
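model = AutoModelForCausalLM.from_pretrained(
    model_name,
    attn_implementation="eager",  # eager attention is recommended for training Gemma-2 models
    device_map="auto",
    torch_dtype=torch.bfloat16,  # load directly in bf16 instead of casting afterwards
)

# Grow the embedding matrix to cover the special tokens added to the tokenizer
model.resize_token_embeddings(len(tokenizer))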


ValueError: You are trying to offload the whole model to the disk. Please use the `disk_offload` function instead.

## LoRA Configuration

In [None]:
from peft import LoraConfig, TaskType

rank_dimension = 16  # rank dimension for LoRA update matrices (smaller = more compression)
lora_alpha = 64  # scaling factor for LoRA layers (higher = stronger adaptation)
lora_dropout = 0.05

peft_config = LoraConfig(
    r=rank_dimension,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=["gate_proj", "q_proj", "lm_head", "o_proj", "k_proj", "embed_tokens", "down_proj", "up_proj", "v_proj"],
    task_type=TaskType.CAUSAL_LM,
)
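LoRA keeps each targeted weight matrix frozen and learns a low-rank update on top of it. With PEFT's default scaling, the update is multiplied by `lora_alpha / r`, which is 64 / 16 = 4 for the configuration above. A minimal NumPy sketch with toy sizes (not Gemma's real dimensions, and without dropout); `lora_forward` is a hypothetical name:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, lora_alpha = 8, 6, 2, 8  # toy sizes chosen for illustration

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, starts at zero
scaling = lora_alpha / r


def lora_forward(x):
    """y = W x + (lora_alpha / r) * B (A x)"""
    return W @ x + scaling * (B @ (A @ x))


x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W @ x))  # True: with B = 0 the adapter is a no-op
print(r * (d_in + d_out), d_in * d_out)     # 28 48 (trainable vs. full parameters)
```

Starting `B` at zero means training begins from exactly the pretrained model, and only `r * (d_in + d_out)` parameters per matrix are updated.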

## Defining the Trainer and Fine-Tuning Hyperparameters

In [None]:
username = "zeinab-sheikhi"  # replace with your Hugging Face username
output_dir = "gemma-2-2B-it-thinking-function_calling-V0"
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
gradient_accumulation_step = 4
logging_steps = 5
learning_rate = 1e-4

max_grad_norm = 1.0
num_train_epochs = 3
warmup_ratio = 0.1
lr_scheduler_type = "cosine"
max_seq_length = 1500

training_args = SFTConfig(
    per_device_train_batch_size=per_device_train_batch_size,  # The batch size per GPU/XPU/TPU/MPS/NPU core/CPU for training.
    per_device_eval_batch_size=per_device_eval_batch_size, 
    gradient_accumulation_steps=gradient_accumulation_step,  # Number of updates steps to accumulate the gradients for, before performing a backward/update pass.
    save_strategy="no", 
    eval_strategy="epoch",
    logging_steps=logging_steps, 
    learning_rate=learning_rate, 
    max_grad_norm=max_grad_norm, 
    weight_decay=0.1, 
    warmup_ratio=warmup_ratio, 
    lr_scheduler_type=lr_scheduler_type, 
    report_to="tensorboard", 
    bf16=True, 
    num_train_epochs=num_train_epochs, 
    gradient_checkpointing=True, 
    gradient_checkpointing_kwargs={"use_reentrant": False}, 
    packing=True,  # A boolean indicating whether to pack multiple sequences into a single input to utilize the model's context window efficiently
    max_seq_length=max_seq_length,  # named `max_length` in recent TRL releases
    hub_private_repo=False,
    push_to_hub=False,
)
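`packing=True` lets several short examples share one `max_seq_length` window instead of padding each one. The simplest form of the idea is to concatenate tokenized examples, separated by an EOS id, and cut the stream into fixed-size blocks. The sketch below shows only that basic scheme with made-up token ids; TRL's actual packing strategy depends on the installed version, and `pack_sequences` is a hypothetical helper.

```python
def pack_sequences(tokenized_examples, max_seq_length, eos_id):
    """Concatenate token lists (each followed by EOS) and cut the stream into fixed-size blocks."""
    stream = []
    for ids in tokenized_examples:
        stream.extend(ids + [eos_id])
    # Drop the incomplete tail so every block has exactly max_seq_length tokens
    n_blocks = len(stream) // max_seq_length
    return [stream[i * max_seq_length:(i + 1) * max_seq_length] for i in range(n_blocks)]


blocks = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_seq_length=4, eos_id=0)
print(blocks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Note that an example can straddle two blocks (here `[6, 7, 8, 9]`), which is the trade-off this naive scheme makes for zero padding.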

In [None]:
trainer = SFTTrainer(
    model=model, 
    args=training_args, 
    train_dataset=dataset['train'], 
    eval_dataset=dataset['test'], 
    processing_class=tokenizer,
    peft_config=peft_config,
)
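With these settings, each optimizer step sees `per_device_train_batch_size × gradient_accumulation_steps × number of devices` sequences, i.e. 1 × 4 × 1 = 4 on a single GPU. The learning rate ramps up linearly over the first `warmup_ratio` of the steps and then decays along a cosine curve, which is the shape `lr_scheduler_type="cosine"` selects. A plain-Python sketch of that schedule; `cosine_with_warmup` is a hypothetical name, and `total_steps=100` is a made-up value, because with packing the real step count depends on how many tokens the dataset contains.

```python
import math


def cosine_with_warmup(step, total_steps, peak_lr=1e-4, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


per_device_train_batch_size, gradient_accumulation_steps, num_devices = 1, 4, 1
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch)  # 4

total_steps = 100  # hypothetical; the real value depends on the packed dataset size
for step in (0, 5, 10, 55, 100):
    print(step, cosine_with_warmup(step, total_steps))
```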

In [None]:
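trainer.train()
trainer.save_model()
# Trainer.push_to_hub's first argument is the commit message; the target repo comes from
# `hub_model_id` in the training args (or `output_dir` under your account if it is unset)
trainer.push_to_hub(commit_message="End of training")
tokenizer.eos_token = "<eos>"
tokenizer.push_to_hub(f"{username}/{output_dir}", token=True)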

## Test the model

In [None]:
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from datasets import load_dataset
import torch

# Optional 4-bit quantization config, not used below: pass `quantization_config=bnb_config`
# to `from_pretrained` to load the base model in 4-bit (and drop the bf16 cast at the end)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

peft_model_id = f"{username}/{output_dir}"  # replace with your newly trained adapter
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# The adapter was trained with the extra special tokens, so match the embedding size first
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, peft_model_id)
model.to(torch.bfloat16)
model.eval()

In [None]:
print(dataset["test"][8]["text"])

In [None]:
# This prompt is taken from one of the test-set examples. It is cut right after the model's turn starts (at <think>), so generation picks up from there.
prompt="""<bos><start_of_turn>human
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.Here are the available tools:<tools> [{'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert from one currency to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to convert'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}, {'type': 'function', 'function': {'name': 'calculate_distance', 'description': 'Calculate the distance between two locations', 'parameters': {'type': 'object', 'properties': {'start_location': {'type': 'string', 'description': 'The starting location'}, 'end_location': {'type': 'string', 'description': 'The ending location'}}, 'required': ['start_location', 'end_location']}}}] </tools>Use the following pydantic model json schema for each tool call you will make: {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{tool_call}
</tool_call>Also, before making a call to a function take the time to plan the function to take. Make that thinking process between <think>{your thoughts}</think>

Hi, I need to convert 500 USD to Euros. Can you help me with that?<end_of_turn><eos>
<start_of_turn>model
<think>"""

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)  # the prompt already contains <bos>
inputs = {k: v.to(model.device) for k, v in inputs.items()}
outputs = model.generate(
    **inputs,
    max_new_tokens=300,  # adapt as necessary
    do_sample=True,
    top_p=0.95,
    temperature=0.01,  # near-greedy decoding
    repetition_penalty=1.0,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0]))
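If the generated turn contains a `<tool_call>`, the next step is to run the tool yourself and append its result as a **Tool** message wrapped in `<tool_response></tool_response>`. A minimal sketch that formats such a turn the same way our custom `chat_template` formats every turn, then reopens a model turn. The role name `"tool"`, the JSON serialization, and the placeholder result are assumptions, so check them against the role names in your dataset; `tool_turn` is a hypothetical helper.

```python
import json


def tool_turn(tool_result, role="tool"):
    """Format a tool result as a chat turn, then open a new model turn for generation."""
    # Assumption: tool messages use the role name "tool" and carry JSON inside the tags
    content = "<tool_response>\n" + json.dumps(tool_result) + "\n</tool_response>"
    return (
        "<start_of_turn>" + role + "\n" + content + "<end_of_turn><eos>\n"
        + "<start_of_turn>model\n"
    )


# Placeholder result; in practice this comes from actually executing the requested function
print(tool_turn({"converted_amount": "<value from your currency API>"}))
```

Append the returned string after the model's completed turn (ending in `<end_of_turn><eos>\n`) and call `model.generate` again so the model can answer from the tool output.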