<div class="align-center">
<a href="https://oumi.ai/"><img src="https://oumi.ai/docs/en/latest/_static/logo/header_logo.png" height="200"></a>

[![Documentation](https://img.shields.io/badge/Documentation-latest-blue.svg)](https://oumi.ai/docs/en/latest/index.html)
[![Discord](https://img.shields.io/discord/1286348126797430814?label=Discord)](https://discord.gg/oumi)
[![GitHub Repo stars](https://img.shields.io/github/stars/oumi-ai/oumi)](https://github.com/oumi-ai/oumi)
<a target="_blank" href="https://colab.research.google.com/github/oumi-ai/oumi/blob/main/notebooks/Oumi - Finetuning Tutorial.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
</div>

👋 Welcome to Open Universal Machine Intelligence (Oumi)!

🚀 Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models, from [data preparation](https://oumi.ai/docs/en/latest/resources/datasets/datasets.html) and [training](https://oumi.ai/docs/en/latest/user_guides/train/train.html) to [evaluation](https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html) and [deployment](https://oumi.ai/docs/en/latest/user_guides/launch/launch.html). Whether you're developing on a laptop, launching large-scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.

🤝 Make sure to join our [Discord community](https://discord.gg/oumi) to get help, share your experiences, and contribute to the project! If you are interested in joining one of the community's open-science efforts, check out our [open collaboration](https://oumi.ai/community) page.

⭐ If you like Oumi and you would like to support it, please give it a star on [GitHub](https://github.com/oumi-ai/oumi).

# Finetuning Overview

In this tutorial, we'll LoRA tune a large language model to produce "thoughts" before producing its output.

We'll use the Oumi framework to streamline the process and achieve high-quality results.

We'll cover the following topics:
1. Prerequisites
2. Data Preparation & Sanity Checks
3. Training Config Preparation
4. Launching Training
5. Monitoring Progress
6. Evaluation
7. Analyzing Results
8. Inference


## Prerequisites

❗**NOTICE:** We recommend running this notebook on a GPU. If running on Google Colab, you can use the free T4 GPU runtime (Colab Menu: `Runtime` -> `Change runtime type`). On Colab, we recommend replacing `HuggingFaceTB/SmolLM2-1.7B-Instruct` with a smaller model like `HuggingFaceTB/SmolLM2-135M-Instruct`, since the T4 only has 16GB VRAM; you can use `Edit -> Find and replace` in the menu bar to do so.

First, let's install Oumi. You can find more detailed instructions [here](https://oumi.ai/docs/en/latest/get_started/installation.html). Here, we include Oumi's GPU dependencies.

In [3]:
%pip install uv -q
!uv pip install oumi[gpu] vllm --no-progress --system
!pip install --upgrade "pillow<10"


Using Python 3.11.11 environment at: /usr
Audited 2 packages in 103ms
Collecting pillow<10
  Using cached Pillow-9.5.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (9.5 kB)
Using cached Pillow-9.5.0-cp311-cp311-manylinux_2_28_x86_64.whl (3.4 MB)
Installing collected packages: pillow
  Attempting uninstall: pillow
    Found existing installation: pillow 10.3.0
    Uninstalling pillow-10.3.0:
      Successfully uninstalled pillow-10.3.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mistral-common 1.5.3 requires pillow>=10.3.0, but you have pillow 9.5.0 which is incompatible.
oumi 0.1.3 requires pillow<10.4,>=10.3.0, but you have pillow 9.5.0 which is incompatible.
scikit-image 0.25.1 requires pillow>=10.1, but you have pillow 9.5.0 which is incompatible.
Successfully installed pillow-9.5.0


## Creating our working directory
For our experiments, we'll use the following folder to save the model, training artifacts, and our working configs.

In [1]:
from pathlib import Path

tutorial_dir = "finetuning_tutorial"

Path(tutorial_dir).mkdir(parents=True, exist_ok=True)

## Set up the environment

You may need to set the following environment variables:
- [Optional] HF_TOKEN: Your [HuggingFace](https://huggingface.co/docs/hub/en/security-tokens) token, in case you want to access a gated or private model like Llama.
- [Optional] WANDB_API_KEY: Your [wandb](https://wandb.ai) token, in case you want to log your experiments to wandb.
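
Before going further, it can help to confirm which of these variables are visible to the notebook kernel. The snippet below is a minimal sketch: `report_env` is a hypothetical helper written for this tutorial, not part of Oumi, and it only reports whether each variable is set, without printing its value.

```python
import os

# Variable names taken from the list above; both are optional.
OPTIONAL_VARS = ("HF_TOKEN", "WANDB_API_KEY")


def report_env(names=OPTIONAL_VARS, env=None):
    """Return {name: bool} telling whether each variable is set and non-empty."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in names}


# Print only the status, never the secret itself.
for name, is_set in report_env().items():
    print(f"{name}: {'set' if is_set else 'not set'}")
```

If a variable shows as not set, you can assign it with `os.environ["HF_TOKEN"] = "..."` in a cell, or export it in the shell before starting the notebook.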

# Getting Started


## Data Preparation
Let's start by loading our dataset and looking at a few examples. The `Unseen1980/fiori-tools-support-ga` dataset used below pairs each support question (`question` column) with its answer (`answer` column).

In [2]:
from oumi.builders import build_tokenizer
from oumi.core.configs import ModelParams
from oumi.datasets import PromptResponseDataset

# Initialize the dataset
tokenizer = build_tokenizer(
    ModelParams(model_name="HuggingFaceTB/SmolLM2-1.7B-Instruct")
)
dataset = PromptResponseDataset(
    tokenizer=tokenizer,
    hf_dataset_path="Unseen1980/fiori-tools-support-ga",
    prompt_column="question",
    response_column="answer",
)

# Print a few examples
for i in range(3):
    conversation = dataset.conversation(i)
    print(f"Example {i + 1}:")
    for message in conversation.messages:
        print(f"{message.role}: {message.content[:100]}...")  # Truncate for brevity
    print("\n")

[2025-02-16 22:13:11,231][oumi][rank0][pid:7702][MainThread][INFO]][base_map_dataset.py:68] Creating map dataset (type: PromptResponseDataset) dataset_name: 'Unseen1980/fiori-tools-support-ga', dataset_path: 'None'...
[2025-02-16 22:13:12,117][oumi][rank0][pid:7702][MainThread][INFO]][base_map_dataset.py:472] Dataset Info:
	Split: train
	Version: 0.0.0
	Dataset size: 297339
	Download size: 296628
	Size: 593967 bytes
	Rows: 417
	Columns: ['question', 'answer']
[2025-02-16 22:13:12,450][oumi][rank0][pid:7702][MainThread][INFO]][base_map_dataset.py:411] Loaded DataFrame with shape: (417, 2). Columns:
question    object
answer      object
dtype: object
Example 1:
user: Why is cmd+space not triggering code completion on my Mac?...
assistant: The shortcut for triggering code completion cmd+space does not work on Mac. This shortcut is used fo...


Example 2:
user: I need help with code completion not activating using cmd+space on Mac....
assistant: The shortcut for triggering code completion 

## Model Preparation

For code generation, we want a model with strong general language understanding and coding capabilities.

We also want a model that is small enough to train and run on a single GPU.

Some good options include:
- ["microsoft/Phi-3-mini-128k-instruct"](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
- ["google/gemma-2b"](https://huggingface.co/google/gemma-2b)
- ["Qwen/Qwen2-1.5B-Instruct"](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct)
- ["meta-llama/Llama-3.2-3B-Instruct"](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- ["HuggingFaceTB/SmolLM2-1.7B-Instruct"](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)


For this tutorial, we'll use "HuggingFaceTB/SmolLM2-1.7B-Instruct" as our base model.
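
To get a rough feel for whether a model fits on a single GPU, we can estimate the memory taken by its weights alone. This is a back-of-the-envelope sketch: the parameter counts are the nominal sizes implied by the model names, `weight_memory_gb` is a hypothetical helper, and the estimate ignores activations, gradients, optimizer state, and framework overhead, all of which add to the real footprint during training.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 corresponds to bfloat16 / float16 weights.
    """
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts read off the model names (assumption).
candidates = [
    ("HuggingFaceTB/SmolLM2-1.7B-Instruct", 1.7e9),
    ("HuggingFaceTB/SmolLM2-135M-Instruct", 135e6),
]

for name, num_params in candidates:
    print(f"{name}: ~{weight_memory_gb(num_params):.2f} GB of weights in bfloat16")
```

Under these assumptions the 1.7B model needs only a few GB for its weights, which leaves headroom on a 16GB T4 for LoRA training, while the 135M model is much lighter still.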



## Initial Model Responses

Let's see how our model performs on an example prompt.

In [3]:
%%writefile $tutorial_dir/infer.yaml

model:
  model_name: "HuggingFaceTB/SmolLM2-1.7B-Instruct"
  trust_remote_code: true
  torch_dtype_str: "bfloat16"

generation:
  max_new_tokens: 128
  batch_size: 1

Overwriting finetuning_tutorial/infer.yaml


In [5]:

from oumi.core.configs import InferenceConfig
from oumi.infer import infer

config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / "infer.yaml"))

input_text = (
    "Can you help me with an issue where my SAP Fiori preview won't load?"
)

results = infer(config=config, inputs=[input_text])

print(results[0])

[2025-02-16 22:17:14,090][oumi][rank0][pid:7702][MainThread][INFO]][models.py:185] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2025-02-16 22:17:14,091][oumi][rank0][pid:7702][MainThread][INFO]][models.py:255] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
[2025-02-16 22:17:16,010][oumi][rank0][pid:7702][MainThread][INFO]][native_text_inference_engine.py:111] Setting EOS token id to `2`
conversation_id=None messages=[USER: Can you help me with an issue where my SAP Fiori preview won't load?, ASSISTANT: Of course, I'd be happy to help you troubleshoot the issue with your SAP Fiori preview. Here are some steps you can take:

1. **Check your browser**: Make sure you're using a supported browser like Chrome, Firefox, or Edge. SAP Fiori is not compatible with Internet Explorer.

2. **Update your browser**: Ensure your browser is up-to-date. Outdated browsers 

## Preparing our training experiment



Let's create a YAML file for our training config:

In [12]:
%%writefile $tutorial_dir/train.yaml

model:
  model_name: "HuggingFaceTB/SmolLM2-1.7B-Instruct"
  trust_remote_code: true
  torch_dtype_str: "bfloat16"
  tokenizer_pad_token: "<|endoftext|>"
  device_map: "auto"

data:
  train:
    datasets:
      - dataset_name: "PromptResponseDataset"
        split: "train"
        sample_count: 8000
        dataset_kwargs: {
          "hf_dataset_path": "Unseen1980/fiori-tools-support-ga",
          "prompt_column": "question",
          "response_column": "answer",
          "assistant_only": true,
          "instruction_template": "<|im_start|>user\n",
          "response_template": "<|im_start|>assistant\n",
        }
        shuffle: True
        seed: 42
    collator_name: "text_with_padding"
    seed: 42

training:
  output_dir: "finetuning_tutorial/output"

  # For a single GPU, the following gives us a batch size of 16
  # If training with multiple GPUs, feel free to reduce gradient_accumulation_steps
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 8

  # ***NOTE***
  # Set max_steps to 10 first to verify that everything works,
  # then switch to 1500 steps to get more meaningful results.
  # Note: 1500 steps will take 2-3 hours on a single A100-40GB GPU.
  # max_steps: 10
  max_steps: 1500

  learning_rate: 1e-3
  warmup_ratio: 0.1
  logging_steps: 10
  save_steps: 0
  max_grad_norm: 1
  weight_decay: 0.01


  trainer_type: "TRL_SFT"
  optimizer: "adamw_torch_fused"
  enable_gradient_checkpointing: True
  gradient_checkpointing_kwargs:
    use_reentrant: False
  ddp_find_unused_parameters: False
  dataloader_num_workers: "auto"
  dataloader_prefetch_factor: 32
  empty_device_cache_steps: 1
  use_peft: true

peft:
  lora_r: 16
  lora_alpha: 32
  lora_dropout: 0.00
  lora_target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"

Overwriting finetuning_tutorial/train.yaml
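
A few quantities implied by this config are worth working out by hand. The sketch below recomputes them from the values in `train.yaml`. Two things are assumptions rather than facts from this tutorial: the run uses a single GPU, and SmolLM2-1.7B has a hidden size of 2048, an MLP size of 8192, and 24 layers, with `k_proj` and `v_proj` as wide as `q_proj`. Check the model's `config.json` before relying on the LoRA parameter count.

```python
# Values copied from train.yaml above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
max_steps = 1500
warmup_ratio = 0.1
sample_count = 8000
lora_r, lora_alpha = 16, 32

# Assumptions (not stated in this tutorial): single GPU and these model dims.
num_gpus = 1
hidden, mlp, num_layers = 2048, 8192, 24

# Examples contributing to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
# Approximate number of steps spent ramping the learning rate up.
warmup_steps = round(warmup_ratio * max_steps)
# Total examples processed, and how many passes that is over sample_count.
examples_seen = effective_batch_size * max_steps
passes_over_sample = examples_seen / sample_count
# LoRA updates are scaled by alpha / r.
lora_scale = lora_alpha / lora_r


def lora_params(d_in: int, d_out: int, r: int = lora_r) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer."""
    return r * (d_in + d_out)


per_layer = (
    4 * lora_params(hidden, hidden)  # q_proj, k_proj, v_proj, o_proj
    + 2 * lora_params(hidden, mlp)  # gate_proj, up_proj
    + lora_params(mlp, hidden)  # down_proj
)
total_lora_params = per_layer * num_layers

print(f"effective batch size: {effective_batch_size}")
print(f"warmup steps:         {warmup_steps}")
print(f"examples seen:        {examples_seen} ({passes_over_sample:.1f} passes)")
print(f"LoRA scale:           {lora_scale}")
print(f"LoRA parameters:      {total_lora_params:,}")
```

Under the assumed dimensions this gives roughly 18 million trainable parameters. Stored as 32-bit floats (another assumption), that is about 72 MB, in line with the `adapter_model.safetensors` size reported in the upload step at the end of this notebook.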


## Fine-tuning the model

This will start the fine-tuning process using the Oumi framework. With `max_steps: 1500` as set in `train.yaml`, the run may take a few hours, depending on your GPU. For a quick sanity check, set `max_steps: 10` first; that run should finish very quickly.

### SINGLE GPU

In [13]:
!oumi train -c "$tutorial_dir/train.yaml"


@@@@@@@@@@@@@@@@@@@
@                 @
@   @@@@@  @  @   @
@   @   @  @  @   @
@   @@@@@  @@@@   @
@                 @
@   @@@@@@@   @   @
@   @  @  @   @   @
@   @  @  @   @   @
@                 @
@@@@@@@@@@@@@@@@@@@

[2025-02-16 22:24:35,649][oumi][rank0][pid:11003][MainThread][INFO]][distributed.py:546] Setting random seed to 42 on rank 0.
2025-02-16 22:24:37.425780: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1739744677.448051   11003 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739744677.454876   11003 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2025-02-16 22:24:40,033][oumi][rank0][pid:11003][MainThread][INFO]][torch_utils.py:66] T

### MULTI-GPU

In [None]:
!oumi distributed torchrun -m oumi train -c "$tutorial_dir/train.yaml"

## Evaluation


As an example, let's create an evaluation configuration file!

**Note:** Since we've finetuned our model to produce thoughts before answering, it's very likely to do worse on most evals out-of-the-box.

Many evals score a fixed set of answer choices rather than letting the model generate text, and thus don't take advantage of things like inference-time reasoning.
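
To make this concrete, here is a toy sketch of likelihood-based multiple-choice scoring, which is how harnesses such as LM Harness typically handle tasks like MMLU. The log-probabilities are made-up numbers and `pick_answer` is a hypothetical helper, not part of Oumi or LM Harness. The point is only that the model never gets to write anything before its answer is chosen.

```python
def pick_answer(choice_logprobs: dict[str, float]) -> str:
    """Return the answer choice the model assigns the highest log-probability."""
    return max(choice_logprobs, key=choice_logprobs.get)


# Hypothetical log-probabilities of each answer letter following the question.
scores = {"A": -2.3, "B": -0.4, "C": -3.1, "D": -1.9}

print(pick_answer(scores))  # prints "B"
```

Because the prediction is read directly off these scores, a model trained to reason in generated text before answering gains nothing from that ability here.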

In [14]:
%%writefile $tutorial_dir/eval.yaml

model:
  model_name: "finetuning_tutorial/output"
  torch_dtype_str: "bfloat16"

tasks:
  - evaluation_platform: lm_harness
    task_name: mmlu_college_computer_science

output_dir: "finetuning_tutorial/output/evaluation"
generation:
  batch_size: null # This will let LM HARNESS find the maximum possible batch size.

Overwriting finetuning_tutorial/eval.yaml


In [15]:
!oumi evaluate -c "$tutorial_dir/eval.yaml"


@@@@@@@@@@@@@@@@@@@
@                 @
@   @@@@@  @  @   @
@   @   @  @  @   @
@   @@@@@  @@@@   @
@                 @
@   @@@@@@@   @   @
@   @  @  @   @   @
@   @  @  @   @   @
@                 @
@@@@@@@@@@@@@@@@@@@

[2025-02-16 23:27:30,659][oumi][rank0][pid:26526][MainThread][INFO]][model_params.py:225] Found LoRA adapter at finetuning_tutorial/output, setting `adapter_model` to `model_name`.
[2025-02-16 23:27:30,659][oumi][rank0][pid:26526][MainThread][INFO]][model_params.py:242] Setting `model_name` to HuggingFaceTB/SmolLM2-1.7B-Instruct found in adapter config.
2025-02-16 23:27:32.600112: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1739748452.622490   26526 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739748452.629416   26526

## Use the Fine-tuned Model

Once we're happy with the results, we can serve the fine-tuned model for interactive inference:

In [16]:
%%writefile $tutorial_dir/trained_infer.yaml

model:
  model_name: "HuggingFaceTB/SmolLM2-1.7B-Instruct"
  adapter_model: "finetuning_tutorial/output"
  trust_remote_code: true
  torch_dtype_str: "bfloat16"

generation:
  max_new_tokens: 2048
  batch_size: 1

Overwriting finetuning_tutorial/trained_infer.yaml


In [17]:
from oumi.core.configs import InferenceConfig
from oumi.infer import infer

config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / "trained_infer.yaml"))

input_text = (
    "What do I need to do if my SAP Fiori app deployment is not working?"
)

results = infer(config=config, inputs=[input_text])

print(results[0])

[2025-02-16 23:28:24,297][oumi][rank0][pid:7702][MainThread][INFO]][models.py:185] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2025-02-16 23:28:24,298][oumi][rank0][pid:7702][MainThread][INFO]][models.py:255] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
[2025-02-16 23:28:26,023][oumi][rank0][pid:7702][MainThread][INFO]][models.py:236] Loading PEFT adapter from: finetuning_tutorial/output ...
[2025-02-16 23:28:26,682][oumi][rank0][pid:7702][MainThread][INFO]][native_text_inference_engine.py:111] Setting EOS token id to `2`
conversation_id=None messages=[USER: What do I need to do if my SAP Fiori app deployment is not working?, ASSISTANT: Please follow the steps below. Also, replace the following variables in the curl command templates in steps 1 and 3.

$DESTINATION: Use the destination name in BAS environment
$SET_COOKIE_ARBE: Replace with ARBE value 

In [20]:
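!pip install transformers
import os

from transformers import AutoModel

# Load the fine-tuned output directory and upload it to the Hugging Face Hub.
model = AutoModel.from_pretrained("finetuning_tutorial/output")
# Read the write token from HF_TOKEN rather than hard-coding it in the notebook.
model.push_to_hub("Unseen1980/fiori-tools-customer-support", token=os.environ.get("HF_TOKEN"))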



Loading adapter weights from finetuning_tutorial/output led to unexpected keys not found in the model:  ['model.layers.0.mlp.down_proj.lora_A.default.weight', 'model.layers.0.mlp.down_proj.lora_B.default.weight', 'model.layers.0.mlp.gate_proj.lora_A.default.weight', 'model.layers.0.mlp.gate_proj.lora_B.default.weight', 'model.layers.0.mlp.up_proj.lora_A.default.weight', 'model.layers.0.mlp.up_proj.lora_B.default.weight', 'model.layers.0.self_attn.k_proj.lora_A.default.weight', 'model.layers.0.self_attn.k_proj.lora_B.default.weight', 'model.layers.0.self_attn.o_proj.lora_A.default.weight', 'model.layers.0.self_attn.o_proj.lora_B.default.weight', 'model.layers.0.self_attn.q_proj.lora_A.default.weight', 'model.layers.0.self_attn.q_proj.lora_B.default.weight', 'model.layers.0.self_attn.v_proj.lora_A.default.weight', 'model.layers.0.self_attn.v_proj.lora_B.default.weight', 'model.layers.1.mlp.down_proj.lora_A.default.weight', 'model.layers.1.mlp.down_proj.lora_B.default.weight', 'model.laye

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/72.4M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/Unseen1980/fiori-tools-customer-support/commit/1a7b88a4aeb899d05737805523e436619fb80feb', commit_message='Upload model', commit_description='', oid='1a7b88a4aeb899d05737805523e436619fb80feb', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Unseen1980/fiori-tools-customer-support', endpoint='https://huggingface.co', repo_type='model', repo_id='Unseen1980/fiori-tools-customer-support'), pr_revision=None, pr_num=None)