# Fine-tuning a Multilingual Reasoner with Hugging Face

Authored by: Thuan Nguyen
In this notebook, I show how OpenAI's open-weight reasoning model gpt-oss-20b can be fine-tuned to defend against backdoor attacks.
I'll cover the following steps:
1. Setup: Install the required libraries.
2. Prepare the dataset: Download and format the example dataset for fine-tuning.
3. Prepare the model: Load the base model and configure it for fine-tuning with LoRA, a memory-efficient technique.
4. Fine-tuning: Train the model on our backdoor-free data.
5. Inference: Generate reasoning responses in different languages using the fine-tuned model.

The desired result is a reasoning model that is immune to backdoor attacks.

## Setup
To get started, let's install all the necessary libraries. 
- First, install PyTorch:
`pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu129`
- Next, install the remaining dependencies:
`pip install "trl>=0.20.0" "peft>=0.17.0" "transformers>=4.55.0" mlflow`
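- Finally, configure the CUDA memory allocator and log into your Hugging Face account (needed to push the trained adapter to the Hub later):

In [10]:
import os
from huggingface_hub import notebook_login
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # reduce CUDA memory fragmentation
notebook_login()  # prompts for a Hugging Face access token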

## Prepare the example dataset
We will be using Multilingual-Thinking, a reasoning dataset whose chain-of-thought has been translated into several languages, such as French, Spanish, and German. Fine-tuning openai/gpt-oss-20b on this dataset teaches the model to generate its reasoning steps in these languages, so its reasoning process can be interpreted by users who speak them.

In [12]:
from datasets import load_dataset

dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
print(f"Total training samples: {len(dataset)}")

Total training samples: 1000


In [13]:
dataset[0]

{'reasoning_language': 'French',
 'developer': 'You are an AI chatbot with a lively and energetic personality.',
 'user': 'Can you show me the latest trends on Twitter right now?',
 'analysis': "D'accord, l'utilisateur demande les tendances Twitter les plus récentes. Tout d'abord, je dois vérifier si j'ai accès à des données en temps réel. Étant donné que je ne peux pas naviguer sur Internet ou accéder directement à l'API de Twitter, je ne peux pas fournir des tendances en direct. Cependant, je peux donner quelques conseils généraux sur la façon de les trouver.\n\nJe devrais préciser que les tendances Twitter évoluent rapidement et sont spécifiques à chaque région. Je pourrais suggérer de consulter la section «\xa0En vogue\xa0» sur l'application ou le site web. Aussi, l'utilisation de hashtags et le suivi d'utilisateurs pertinents pourraient être utiles. Il est important de souligner que les tendances varient selon la région et l'heure de la journée. Je devrais garder un ton amical et 

The `gpt-oss` models were trained on the Harmony response format for defining conversation structures, generating reasoning output and structuring function calls.

In order to fine-tune the model, we need to convert these messages into a format that the model can understand. In practice this is done by formatting each message with the model's chat template and then tokenizing the resulting text. The TRL library does this automatically, but let's walk through it step by step to understand how it works.
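
To make the idea concrete, here is a deliberately simplified, pure-Python sketch of what a chat template does: it walks the message list and wraps each message in role and delimiter tokens. The token names (`<|start|>`, `<|message|>`, `<|channel|>`, `<|end|>`) are the ones visible in the templated conversation printed later in this notebook, but `render_harmony` and its `thinking` key are hypothetical. It skips the system preamble (model identity, knowledge cutoff, reasoning effort) that the real template adds automatically, and it always closes messages with `<|end|>`, whereas the real template can use other stop tokens such as `<|return|>`.

```python
def render_harmony(messages, reasoning_language=None):
    """Hypothetical, simplified Harmony-style renderer; NOT the tokenizer's real chat template."""
    parts = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "developer":
            # Developer instructions sit under an "# Instructions" header,
            # preceded by the requested reasoning language (if any)
            header = "# Instructions\n\n"
            if reasoning_language:
                header += f"reasoning language: {reasoning_language}\n\n"
            parts.append(f"<|start|>developer<|message|>{header}{content}\n\n<|end|>")
        elif role == "assistant":
            # Chain-of-thought goes to the "analysis" channel, the answer to the "final" channel
            if msg.get("thinking"):
                parts.append(f"<|start|>assistant<|channel|>analysis<|message|>{msg['thinking']}<|end|>")
            parts.append(f"<|start|>assistant<|channel|>final<|message|>{content}<|end|>")
        else:
            # Other roles (e.g. user): role header followed by the content
            parts.append(f"<|start|>{role}<|message|>{content}<|end|>")
    return "".join(parts)


example = [
    {"role": "developer", "content": "You are an AI chatbot with a lively and energetic personality."},
    {"role": "user", "content": "Can you show me the latest trends on Twitter right now?"},
    {"role": "assistant",
     "thinking": "D'accord, l'utilisateur demande les tendances Twitter les plus récentes.",
     "content": "Here are some tips for finding current trends."},
]
print(render_harmony(example, reasoning_language="French"))
```

In practice you should always rely on `tokenizer.apply_chat_template`, since the real template handles the system preamble, tool calls, and generation prompts that this toy version ignores.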

To do so, let's first load the tokenizer:

In [14]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
tokenizer

PreTrainedTokenizerFast(name_or_path='openai/gpt-oss-20b', vocab_size=199998, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|startoftext|>', 'eos_token': '<|return|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	199998: AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	199999: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	200000: AddedToken("<|reserved_200000|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	200001: AddedToken("<|reserved_200001|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	200002: AddedToken("<|return|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	200003: AddedToken("<|constrain|>", rstrip=False,

In [15]:
messages = dataset[0]["messages"]
conversation = tokenizer.apply_chat_template(messages, tokenize=False)
print(conversation)

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-18

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions

reasoning language: French

You are an AI chatbot with a lively and energetic personality.

<|end|><|start|>user<|message|>Can you show me the latest trends on Twitter right now?<|end|><|start|>assistant<|channel|>analysis<|message|>D'accord, l'utilisateur demande les tendances Twitter les plus récentes. Tout d'abord, je dois vérifier si j'ai accès à des données en temps réel. Étant donné que je ne peux pas naviguer sur Internet ou accéder directement à l'API de Twitter, je ne peux pas fournir des tendances en direct. Cependant, je peux donner quelques conseils généraux sur la façon de les trouver.

Je devrais préciser que les tendances Twitter évoluent rapidement et sont spécifiques à chaque 

## Prepare the model
To prepare the model for training, let's first download the weights from the Hugging Face Hub. We will use the AutoModelForCausalLM class from Transformers to load the model:

In [16]:
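import torch
from transformers import AutoModelForCausalLM, Mxfp4Config

# Training runs in bfloat16, so make sure a CUDA GPU with bf16 support is available
assert torch.cuda.is_available() and torch.cuda.is_bf16_supported()

# The checkpoint stores MXFP4-quantized weights; dequantize them to bf16 for training
quantization_config = Mxfp4Config(dequantize=True)
model_kwargs = dict(
    attn_implementation="eager",
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
    use_cache=False,  # the KV cache is unused when training with gradient checkpointing
    device_map="auto",
)
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    **model_kwargs,
)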


This loads the model with the configuration needed for training. `Mxfp4Config(dequantize=True)` dequantizes the checkpoint's MXFP4-quantized weights to bfloat16 for training, `attn_implementation` is set to `eager` for better performance, and `use_cache` is set to `False` because we will fine-tune the model with gradient checkpointing.

Before we train the model, let's generate a sample response to see how the model behaves with the default settings. To do so, we need to tokenize a sample prompt and then use the model to generate a response:

In [17]:
messages = [
    {"role": "user", "content": "¿Cuál es el capital de Australia?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
response = tokenizer.batch_decode(output_ids)[0]
print(response)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-18

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>¿Cuál es el capital de Australia?<|end|><|start|>assistant<|channel|>analysis<|message|>The user asks in Spanish: "¿Cuál es el capital de Australia?" The answer: Canberra. Should be simple.<|end|><|start|>assistant<|channel|>final<|message|>La capital de Australia es **Canberra**.<|return|>
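
The decoded response contains both the chain-of-thought (`analysis` channel) and the user-facing answer (`final` channel). When you only need one of them, a small regex-based helper can split the raw text. `extract_channels` below is a hypothetical sketch that assumes the delimiter layout shown in the output above; if a channel appears more than once, only its last occurrence is kept.

```python
import re

# Matches "<|channel|>NAME<|message|>TEXT" up to the next <|end|>, <|return|> or the end of the string
_CHANNEL_RE = re.compile(
    r"<\|channel\|>(\w+)<\|message\|>(.*?)(?=<\|end\|>|<\|return\|>|$)", re.DOTALL
)


def extract_channels(decoded: str) -> dict:
    """Map each Harmony channel name (e.g. 'analysis', 'final') to its text."""
    return {name: text.strip() for name, text in _CHANNEL_RE.findall(decoded)}


raw = (
    "<|start|>user<|message|>¿Cuál es el capital de Australia?<|end|>"
    "<|start|>assistant<|channel|>analysis<|message|>The user asks in Spanish: "
    "\"¿Cuál es el capital de Australia?\" The answer: Canberra. Should be simple.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>La capital de Australia es **Canberra**.<|return|>"
)
channels = extract_channels(raw)
print(channels["final"])  # La capital de Australia es **Canberra**.
```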


I will use a technique called LoRA (Low-Rank Adaptation) to fine-tune the model. Instead of updating all of the model's weights, LoRA freezes them and trains small low-rank matrices attached to selected layers, which is particularly useful for large models like `openai/gpt-oss-20b`.
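
The core idea fits in a few lines of plain Python. In this minimal sketch (toy sizes, hypothetical helper names, arbitrary values standing in for trained weights), the frozen weight `W` is never modified; only the small matrices `A` (r × d_in) and `B` (d_out × r) would be trained, and the adapted layer computes `W x + (alpha / r) · B A x`. Because the update is just another matrix, it can be folded into `W` afterwards, which is conceptually what merging the adapter before inference does.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B would be trained."""
    scaling = alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # low-rank path: d_in -> r -> d_out
    return [b + scaling * u for b, u in zip(base, update)]


def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into the base weight: W' = W + (alpha / r) * B A."""
    scaling = alpha / r
    delta = matmul(B, A)
    return [[w + scaling * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


# Toy layer: d_in = 3, d_out = 2, rank r = 1, alpha = 2 (so the scaling factor is 2)
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
A = [[0.5, -1.0, 0.0]]   # r x d_in
B = [[1.0], [2.0]]        # d_out x r
x = [1.0, 2.0, 3.0]

y_adapter = lora_forward(W, A, B, x, alpha=2, r=1)
y_merged = matvec(merge_lora(W, A, B, alpha=2, r=1), x)
print(y_adapter, y_merged)  # prints [4.0, -7.0] [4.0, -7.0]
```

Both paths give the same output, which is why a merged model can be used for inference without any adapter overhead.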

First, I need to define the LoRA configuration and wrap the model as a `PeftModel`. We will use the `LoraConfig` class from the PEFT library to do this:

In [18]:
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

trainable params: 15,040,512 || all params: 20,929,797,696 || trainable%: 0.0719




Here we've used some basic hyperparameters for LoRA, but you can experiment with different values to see how they affect the model's performance. For instance, if you increase r you will enable more trainable parameters, which may produce a better model at the expense of requiring more VRAM and time to train.
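
The cost of a larger `r` is easy to quantify: for a linear layer with input size `d_in` and output size `d_out`, LoRA adds `r × (d_in + d_out)` trainable parameters (the `A` and `B` matrices), compared with `d_in × d_out` in the frozen weight. The sketch below uses illustrative layer sizes, not gpt-oss-20b's actual dimensions, and the helper name is hypothetical.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in layer (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


d_in = d_out = 4096  # illustrative sizes only
full = d_in * d_out
for r in (4, 8, 16, 32):
    extra = lora_params(d_in, d_out, r)
    print(f"r={r:>2}: {extra:>9,} trainable params ({extra / full:.2%} of the frozen layer)")
```

The count grows linearly with `r`: doubling the rank doubles the adapter's parameters (and its optimizer state), while the frozen weights are unchanged.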

Note: The `openai/gpt-oss-20b` model uses a Mixture-of-Experts (MoE) architecture. In addition to targeting the attention layers (`target_modules="all-linear"`), it's also important to include the projection layers within the expert modules. PEFT supports this via the `target_parameters` argument, which lets you specify expert-specific parameters such as `mlp.experts.down_proj` and `mlp.experts.gate_up_proj`. In this example, we target these projections in only a subset of layers (7, 15, and 23), but you are encouraged to experiment with different configurations.
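
Writing these strings by hand gets tedious when trying other layer subsets. A small helper can generate them for any choice of layer indices; `expert_target_parameters` is hypothetical (not part of PEFT), and the alternative selection below assumes layer indices up to 23, as in the list used in the `LoraConfig` cell. Check `model.config.num_hidden_layers` for the real layer count.

```python
def expert_target_parameters(layer_indices, projections=("gate_up_proj", "down_proj")):
    """Build target_parameters entries such as "7.mlp.experts.gate_up_proj" for the given layers."""
    return [f"{i}.mlp.experts.{proj}" for i in layer_indices for proj in projections]


# Reproduces the list passed to LoraConfig in this notebook
print(expert_target_parameters([7, 15, 23]))

# Alternative: every fourth layer, starting at layer 3 (3, 7, 11, 15, 19, 23)
print(expert_target_parameters(range(3, 24, 4)))
```

The output can be passed directly as `target_parameters=...`; adding more layers increases the number of trainable parameters and the memory needed for training.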

Now that we have the model and dataset ready, we can define the hyperparameters for training.

## Fine-tuning
TRL provides a convenient way to define hyperparameters for training using the `SFTConfig` class. We will set the learning rate, batch size, number of epochs, and other parameters as follows:

In [None]:
from datetime import datetime
from trl import SFTConfig
import mlflow

mlflow.set_experiment("gpt-oss-20b-multilingual-reasoner")

training_args = SFTConfig(
    # Set this to mlflow for logging your training
    report_to="mlflow",
    # Name the MLflow run
    run_name=f"gpt-oss-20b-multilingual-reasoner-{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}",
    output_dir=f"multilingual-reasoner{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}",
    learning_rate=2e-4,
    gradient_checkpointing=True,
    num_train_epochs=3,
    logging_steps=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    max_length=2048,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},
    push_to_hub=True,
)

2025/08/18 14:45:27 INFO mlflow.tracking.fluent: Experiment with name 'gpt-oss-20b-multilingual-reasoner' does not exist. Creating a new experiment.
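
The `lr_scheduler_type="cosine_with_min_lr"` setting combines a linear warmup (`warmup_ratio=0.03`) with a cosine decay that bottoms out at `min_lr_rate` × the peak learning rate instead of zero. The function below is a pure-Python sketch of that shape, written to follow our understanding of the Transformers scheduler; treat it as an illustration rather than a drop-in replacement. The step counts in the example are illustrative.

```python
import math


def cosine_with_min_lr(step, total_steps, warmup_steps, peak_lr=2e-4, min_lr_rate=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay down to min_lr_rate * peak_lr."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # decays from 1 to 0
    return peak_lr * (min_lr_rate + (1.0 - min_lr_rate) * cosine)


total = 1000  # illustrative number of optimizer steps
warmup = 30   # 3% of the steps, matching warmup_ratio=0.03
for s in (0, warmup // 2, warmup, total // 2, total):
    print(s, f"{cosine_with_min_lr(s, total, warmup):.2e}")
```

Keeping a floor of 10% of the peak rate means the adapter keeps learning slowly at the end of training instead of freezing completely.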


Note that `per_device_train_batch_size` and `gradient_accumulation_steps` are both set to 1, so the effective batch size is 1 × 1 = 1 on a single GPU. You may need to adjust these values based on your hardware setup: increasing `gradient_accumulation_steps` raises the effective batch size without requiring more GPU memory per forward pass. We also use MLflow to log the training progress and metrics, but you can use any other logging library supported by `report_to`.
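
The arithmetic behind these settings can be sketched in a few lines. The helpers below are hypothetical; they compute the effective batch size and an approximate number of optimizer steps, and the Trainer's exact count may differ slightly depending on how it rounds partial batches and partial accumulation windows.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * grad_accum * num_gpus


def approx_optimizer_steps(num_samples: int, per_device_batch: int, grad_accum: int,
                           epochs: int, num_gpus: int = 1) -> int:
    """Rough total of optimizer steps, rounding each epoch up to a whole step."""
    per_epoch = math.ceil(num_samples / effective_batch_size(per_device_batch, grad_accum, num_gpus))
    return per_epoch * epochs


# Values from the SFTConfig above: batch size 1, no accumulation, 3 epochs, 1000 samples
print(effective_batch_size(1, 1), approx_optimizer_steps(1000, 1, 1, epochs=3))    # 1 3000
# Same per-device batch, but accumulating gradients over 16 micro-batches
print(effective_batch_size(1, 16), approx_optimizer_steps(1000, 1, 16, epochs=3))  # 16 189
```

Accumulation keeps the per-step memory footprint of a batch size of 1 while averaging gradients over more samples, at the cost of fewer optimizer updates per epoch.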

We now have all the pieces needed to train the model. We will use the SFTTrainer class from TRL to handle the training process. The trainer will take care of formatting the dataset, applying the chat template, and training the model:

In [22]:
from trl import SFTTrainer

trainer = SFTTrainer(
    model=peft_model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)

In [23]:
trainer.train()

| Step | Training Loss |
|---:|---:|
| 1 | 2.4707 |
| 2 | 1.926 |
| 3 | 1.3268 |
| 4 | 2.4431 |
| 5 | 1.5072 |
| 6 | 1.765 |
| 7 | 2.2828 |
| 8 | 2.7401 |
| 9 | 1.6397 |
| 10 | 1.4874 |


TrainOutput(global_step=1000, training_loss=0.9695024940371514, metrics={'train_runtime': 3970.882, 'train_samples_per_second': 0.252, 'train_steps_per_second': 0.252, 'total_flos': 1.3006431066504845e+17, 'train_loss': 0.9695024940371514})
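
With `logging_steps=1` and a per-device batch size of 1, the per-step loss is noisy: in the first ten logged steps it jumps between roughly 1.3 and 2.7. A trailing moving average makes the trend easier to read. This sketch applies one to those ten logged values; the window size of 5 is an arbitrary choice, and `moving_average` is a hypothetical helper.

```python
def moving_average(values, window=5):
    """Trailing mean over the last `window` values (uses fewer values at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# First ten per-step training losses from the training log
losses = [2.4707, 1.926, 1.3268, 2.4431, 1.5072, 1.765, 2.2828, 2.7401, 1.6397, 1.4874]
for step, (raw, smooth) in enumerate(zip(losses, moving_average(losses)), start=1):
    print(f"step {step:>2}: loss {raw:.4f}  smoothed {smooth:.4f}")
```

MLflow's UI offers similar smoothing for logged metrics, so this is mainly useful when inspecting exported logs offline.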

## Save the fine-tuned model

In [24]:
trainer.save_model(training_args.output_dir)
trainer.push_to_hub(dataset_name="HuggingFaceH4/Multilingual-Thinking")

No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/NMThuan032k/multilingual-reasoner2025-08-18_14-45-27/commit/b2d47955366a495800e3d8f625b2820cffbd3838', commit_message='End of training', commit_description='', oid='b2d47955366a495800e3d8f625b2820cffbd3838', pr_url=None, repo_url=RepoUrl('https://huggingface.co/NMThuan032k/multilingual-reasoner2025-08-18_14-45-27', endpoint='https://huggingface.co', repo_type='model', repo_id='NMThuan032k/multilingual-reasoner2025-08-18_14-45-27'), pr_revision=None, pr_num=None)

## Inference

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

# Load the original model first
model_kwargs = dict(attn_implementation="eager", torch_dtype="auto", use_cache=True, device_map="auto")
base_model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", **model_kwargs)  # device_map="auto" already places the model on the GPU

# Merge fine-tuned weights with the base model
peft_model_id = "multilingual-reasoner2025-08-18_14-45-27"
model = PeftModel.from_pretrained(base_model, peft_model_id)
model = model.merge_and_unload()

MXFP4 quantization requires triton >= 3.4.0 and kernels installed, we will default to dequantizing the model to bf16





In [2]:
REASONING_LANGUAGE = "German"
SYSTEM_PROMPT = f"reasoning language: {REASONING_LANGUAGE}"
USER_PROMPT = "¿Cuál es el capital de Australia?"  # Spanish for "What is the capital of Australia?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

gen_kwargs = {"max_new_tokens": 512, "do_sample": True, "temperature": 0.6, "top_p": None, "top_k": None}

output_ids = model.generate(input_ids, **gen_kwargs)
response = tokenizer.batch_decode(output_ids)[0]
print(response)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-18

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions

reasoning language: German

<|end|><|start|>user<|message|>¿Cuál es el capital de Australia?<|end|><|start|>assistant<|channel|>analysis<|message|>In Ordnung, der Benutzer fragt nach der Hauptstadt Australiens. Ich beginne damit, daran zu denken, was ich weiß. Australien ist ein Kontinent und ein Land, und seine Hauptstadt ist Canberra. Ich sollte sicherstellen, dass das korrekt ist. Ich erinnere mich, dass Canberra die Hauptstadt ist, während Sydney und Melbourne die größten Städte sind. Der Benutzer könnte ein Schüler sein, der übergeordnete geografische Fakten lernen möchte, oder vielleicht jemand, der sich auf einen Test vorbereitet. Es ist wichtig, die Antwort klar und präzise zu formulie
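
The `gen_kwargs` used here enable sampling (`do_sample=True`) with `temperature=0.6`. Conceptually, temperature divides the logits before the softmax, so values below 1 sharpen the distribution toward the most likely tokens and make the output less random. The pure-Python sketch below uses made-up logits for three candidate tokens; it illustrates the idea and is not how Transformers implements its logits processors internally.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.6):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(sample_token(logits, temperature=0.6, rng=random.Random(0)))
```

Because sampling is random, rerunning the inference cells can produce differently worded reasoning in the requested language.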

In [3]:
REASONING_LANGUAGE = "Chinese"  # or Hindi, or any other language...
SYSTEM_PROMPT = f"reasoning language: {REASONING_LANGUAGE}"
USER_PROMPT = "What is the national symbol of Canada?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, **gen_kwargs)
response = tokenizer.batch_decode(output_ids)[0]
print(response)

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-18

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions

reasoning language: Chinese

<|end|><|start|>user<|message|>What is the national symbol of Canada?<|end|><|start|>assistant<|channel|>analysis<|message|>好的，用户在问加拿大的国家象征是什么。让我先回想一下加拿大的象征。最常见的是枫叶。枫叶被广泛用于加拿大的标志、货币和其他国家象征。它代表了加拿大多样的自然景观以及与北美的联系。

我记得加拿大国旗有一个红色的枫叶。那是最显著的象征。除此之外，还有加拿大皇家徽章（Royal Coat of Arms），其中包含枫叶。还有加拿大的国歌《O Canada》，其中提到了枫叶。加拿大的国花是蓝铃花（Bluebell），但我不确定它是否是最主要的象征。加拿大的国鸟是加拿大鹅（Canada Goose），但它不是主要的国家象征。

用户可能想知道最主要的象征，也就是枫叶。也许他们想了解其历史或在加拿大文化中的重要性。让我检查一下是否还有其他主要象征。加拿大的国旗（枫叶旗）是最著名的。枫叶在加拿大文化中有深厚的根源，代表了自然、和平和多元文化。

我需要确认枫叶是最主要的象征。还有其他象征吗？加拿大皇家徽章（Royal Coat of Arms）是官方象征，但枫叶是更具代表性的。也许用户想了解枫叶的象征意义。枫叶象征着自然、和平、团结和多元文化。它们也与加拿大的历史有关，特别是与法国和英国的殖民历史。

我还应该提到枫叶在加拿大文化中的重要性，以及它在国家符号中的使用。例如，枫叶被用于加拿大货币、